Bedrock Engineering
Computer Vision · Python · Memory Optimization

How We Compare 1000-Page Drawing Sets Without Running Out of Memory

Construction drawings at 300 DPI can exceed 200 million pixels per sheet. Here's how we built a system that handles 1000+ page comparisons without crashing.

Stan Liu · Co-Founder · 9 min read

TL;DR

  • We use pre-allocated memory buffers and explicit garbage collection to process 1000+ page drawing sets without crashing. Standard NumPy operations create hidden temporaries that spike RAM unpredictably.
  • A single 24x36" construction drawing at 300 DPI is 77.8 million pixels. A 500-sheet comparison means 233 GB of raw pixel data.
  • The three hardest problems: memory management, automatic alignment (no grid lines, rotated sheets), and batch throughput (a 500-pair run has to finish the same day).
  • McKinsey research shows construction still relies mainly on paper. Digital tools that actually work at scale are rare.

The Scale Problem

Construction drawings are big. Not "large JPEG" big. Enormous.

A standard US architectural sheet is 24x36 inches. At 300 DPI (the minimum for legible detail), that's 7,200 x 10,800 pixels per sheet. That's 77.8 million pixels. Per sheet.

Sheet Size        DPI   Pixels   Memory (RGB)
ARCH D (24x36")   300   77.8M    233 MB
ARCH E (36x48")   300   155.5M   467 MB
ARCH D (24x36")   400   138.2M   415 MB

A mid-size commercial project might have 500+ sheets across architectural, structural, and MEP disciplines. Comparing two revisions means processing 1,000 images. At 233 MB each, that's 233 GB of raw pixel data.

Most comparison tools work fine on 50-page samples. They fail on real project volumes.

"One reason for the industry's poor productivity record is that it still relies mainly on paper to manage its processes and deliverables such as blueprints, design drawings, procurement and supply-chain orders, equipment logs, daily progress reports, and punch lists."

— McKinsey Global Institute, Imagining Construction's Digital Future

Why Existing Approaches Break

The NumPy Temporary Problem

Python's NumPy is the standard for image processing. It's fast and convenient. It's also a memory landmine.

Consider this innocent-looking code:

import numpy as np

# Seems reasonable: blend two large uint8 images
result = (image_a * 0.5 + image_b * 0.5).astype(np.uint8)

What actually happens in memory:

  1. image_a * 0.5 creates a temporary float64 array (8 bytes/pixel)
  2. image_b * 0.5 creates another temporary
  3. The addition creates a third temporary
  4. .astype() creates the final output

For a single 8000x8000 RGB image, each of those float64 temporaries is about 1.5 GB, and up to three can be live at once. Stack two of these operations and you've blown past typical container limits.
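The arithmetic behind that number is easy to check (a back-of-envelope sketch, not production code):

```python
def temporary_bytes(h, w, channels=3, itemsize=8):
    """Size of one float64 temporary created by an elementwise NumPy expression."""
    return h * w * channels * itemsize

per_temp = temporary_bytes(8000, 8000)
print(per_temp)  # 1536000000 bytes, about 1.5 GB per temporary
```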

The Alignment Challenge

Drawing comparison requires pixel-perfect alignment. If sheet B is rotated 0.3 degrees relative to sheet A, the overlay shows false changes everywhere.

Tools like Bluebeam require manual 3-point alignment. An operator clicks three corresponding points on each drawing, and software computes the transformation. This works but doesn't scale. For 500 sheet pairs, that's 3,000 clicks minimum. For an overview of different comparison approaches, see 3 Drawing Comparison Methods That Scale.

Automatic alignment is hard because:

  • Not all drawings have grid lines
  • Title blocks change between revisions
  • Scales can vary slightly between issues
  • Scanned drawings may be rotated

The Throughput Problem

Even with infinite memory and perfect alignment, there's throughput. A project team receiving updated drawings needs comparison results the same day. Processing 500 pairs at 10 minutes each means 83 hours. That's not useful.

How We Solved Memory

Pre-allocated Scratch Buffers

The fix for NumPy temporaries is explicit memory control. Instead of letting NumPy allocate on every operation, we pre-allocate fixed buffers and reuse them.

# Pre-allocate output and scratch buffers
# (a_f, b_f: float32 grayscale inputs; coef_low, coef_high: blend weights)
overlay = np.empty((h, w, 3), dtype=np.uint8)
scratch1 = np.empty((h, w), dtype=np.float32)
scratch2 = np.empty((h, w), dtype=np.float32)

# Use in-place operations with out= parameter
np.multiply(b_f, coef_high, out=scratch1)
np.multiply(a_f, coef_low, out=scratch2)
np.add(scratch1, scratch2, out=scratch1)
overlay[:, :, 0] = scratch1.astype(np.uint8)

The out= parameter tells NumPy to write directly to an existing array instead of allocating new memory. This turns unpredictable memory spikes into controlled, measurable peaks.
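As a small sanity check (tiny arrays, simplified single-channel blend, our own variable names), the buffer-reuse version produces exactly the same output as the naive expression:

```python
import numpy as np

h, w = 4, 4
rng = np.random.default_rng(0)
a = rng.integers(0, 255, size=(h, w), dtype=np.uint8)
b = rng.integers(0, 255, size=(h, w), dtype=np.uint8)

# Naive version: every operator allocates a fresh array
naive = (a * 0.5 + b * 0.5).astype(np.uint8)

# Buffer-reuse version: all arithmetic lands in pre-allocated scratch space
scratch1 = np.empty((h, w), dtype=np.float64)
scratch2 = np.empty((h, w), dtype=np.float64)
np.multiply(a, 0.5, out=scratch1)
np.multiply(b, 0.5, out=scratch2)
np.add(scratch1, scratch2, out=scratch1)
reused = scratch1.astype(np.uint8)

assert np.array_equal(naive, reused)
```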

Aggressive Garbage Collection

Python's garbage collector runs when it feels like it. For image processing, that's not good enough. We force collection after every large array is done.

a_gray = convert_to_grayscale(aligned_a)
a_f = a_gray.astype(np.float32)
del a_gray    # Drop the last reference so NumPy can free the buffer
gc.collect()  # Sweep any reference cycles still pinning large arrays

This pattern appears after every intermediate step. It's verbose but keeps memory predictable.
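One way to contain the verbosity is to bury the del/collect pair inside each conversion helper. A minimal sketch (hypothetical helper, assuming an RGB array input; not our production code):

```python
import gc
import numpy as np

def to_gray_f32(rgb):
    """Grayscale-convert, downcast, and free the float64 intermediate immediately."""
    gray = rgb @ np.array([0.299, 0.587, 0.114])  # float64 intermediate
    gray_f32 = gray.astype(np.float32)
    del gray       # drop the only reference to the float64 buffer
    gc.collect()   # force a sweep instead of waiting for the collector
    return gray_f32
```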

Decompression Bomb Limits

PIL (Python Imaging Library) has a safety limit preventing "decompression bomb" attacks. The default is 89 million pixels. Construction drawings exceed this routinely.

from PIL import Image
Image.MAX_IMAGE_PIXELS = 250_000_000  # 250 million pixels

This allows processing images up to roughly 15,800 x 15,800 pixels.
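The limit you actually need follows directly from sheet size and DPI (a quick helper; the function name is ours):

```python
def pixel_count(width_in, height_in, dpi):
    """Raster pixel count for a sheet of the given physical size."""
    return int(width_in * dpi) * int(height_in * dpi)

print(pixel_count(24, 36, 300))  # 77760000 -- ARCH D, already near PIL's 89M default
print(pixel_count(36, 48, 300))  # 155520000 -- ARCH E blows straight past it
```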

How We Solved Alignment

SIFT Feature Matching

Scale-Invariant Feature Transform (SIFT) detects distinctive points in images that remain stable across scale, rotation, and lighting changes. According to OpenCV documentation, SIFT is "the standard for image registration" because features remain consistent even when images are scaled or rotated.

Our approach:

  1. Convert both images to grayscale
  2. Extract up to 10,000 SIFT keypoints from each
  3. Match features using Lowe's ratio test (filters bad matches)
  4. Compute transformation matrix using RANSAC (rejects outliers)
  5. Apply the transformation to align images

def extract_sift_features(
    gray_image: np.ndarray,
    n_features: int = 10_000,
    exclude_margin: float = 0.2,
) -> tuple[tuple, np.ndarray]:
    """Extract SIFT features, excluding margins to avoid title blocks."""

The exclude_margin parameter is key for construction drawings. Title blocks and borders contain text that changes between revisions. Excluding the outer 20% focuses feature detection on the actual drawing content.
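OpenCV's detectAndCompute accepts an optional mask, which is one way to implement that exclusion; the mask itself is plain NumPy (helper name is ours, a sketch rather than our exact implementation):

```python
import numpy as np

def margin_mask(h, w, exclude_margin=0.2):
    """255 inside the drawing area, 0 over the outer margin (title blocks, borders)."""
    mask = np.zeros((h, w), dtype=np.uint8)
    mh, mw = int(h * exclude_margin), int(w * exclude_margin)
    mask[mh:h - mh, mw:w - mw] = 255
    return mask

# Usage with OpenCV would look like:
#   sift = cv2.SIFT_create(nfeatures=10_000)
#   keypoints, descriptors = sift.detectAndCompute(gray, margin_mask(*gray.shape))
```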

Constrained Optimization

Standard RANSAC can produce wildly wrong transformations when matches are ambiguous. A 180-degree rotation might have more "inliers" than the correct 0.3-degree rotation.

We add constraints using scipy's L-BFGS-B optimizer:

  • Scale must be between 0.9x and 1.1x (drawings don't shrink by half)
  • Rotation must be under 5 degrees (drawings aren't flipped upside down)

This turns "find the transformation with most inliers" into "find the transformation with most inliers that's actually plausible."
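A toy illustration of how bounds change the answer (the residual function here is made up for the demo; scipy's L-BFGS-B and its bounds= parameter are the real pieces):

```python
import numpy as np
from scipy.optimize import minimize

# Toy residual whose unconstrained optimum is an implausible transformation:
# scale 0.5, rotation 180 degrees (e.g. an ambiguous-match failure mode).
def residual(params):
    scale, rot = params
    return (scale - 0.5) ** 2 + (rot - 180.0) ** 2

res = minimize(
    residual,
    x0=np.array([1.0, 0.0]),          # start from the identity transform
    method="L-BFGS-B",
    bounds=[(0.9, 1.1), (-5.0, 5.0)], # scale in [0.9, 1.1], |rotation| < 5 deg
)
print(res.x)  # clipped to the nearest plausible values, [0.9, 5.0]
```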

Fallback to Grid Detection

When SIFT fails (too few features, too much noise), we fall back to grid line detection. Construction drawings often have reference grids with labeled intersections (A1, B2, etc.).

  1. Detect horizontal and vertical lines using Hough transform
  2. Find intersection points
  3. Match intersections between images by relative position
  4. Compute transformation from matched intersections

This works even when the drawing content has changed significantly between revisions.
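The line detection itself is a Hough transform (cv2.HoughLinesP in OpenCV); the intersection and matching steps downstream reduce to simple geometry. A sketch with hypothetical line positions (helper names are ours):

```python
import numpy as np

def grid_intersections(vertical_xs, horizontal_ys):
    """All crossings of detected vertical and horizontal grid lines, as (x, y) points."""
    return np.array([(x, y) for y in horizontal_ys for x in vertical_xs], dtype=np.float64)

# Revision A's grid, and revision B's grid offset by a scanner shift
grid_a = grid_intersections([100, 500, 900], [200, 600, 1000])
grid_b = grid_a + np.array([12.0, -3.0])

# With intersections matched by relative position, the translation
# component of the transform is just the mean offset
shift = (grid_b - grid_a).mean(axis=0)
print(shift)  # [12. -3.]
```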

How We Solved Throughput

Parallel Processing

Each sheet comparison is independent. We use Python's ThreadPoolExecutor to process multiple pairs simultaneously.

from concurrent.futures import ThreadPoolExecutor
from contextvars import copy_context
 
def process_batch(sheet_pairs):
    with ThreadPoolExecutor(max_workers=4) as executor:
        # Each task needs its own context copy for tracking
        futures = []
        for pair in sheet_pairs:
            ctx = copy_context()
            futures.append(executor.submit(ctx.run, compare_pair, pair))
        return [f.result() for f in futures]

The copy_context() call is important. Context variables (used for logging, tracing, and usage tracking) need separate copies per thread. A single context cannot be entered concurrently from multiple threads.

Event-Driven Architecture

The comparison pipeline runs as a series of jobs triggered by Pub/Sub messages:

  1. Convert job: PDF to PNG images
  2. Preprocess job: Extract metadata, detect sheet numbers
  3. Match job: Pair sheets between revisions
  4. Align job: Compute transformation matrices
  5. Overlay job: Generate visual diffs

Each job is stateless and idempotent. If a job fails, it can be retried. If load increases, more workers can be added. This scales horizontally without code changes.
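Idempotency is what makes the retries safe: a redelivered message must not redo completed work. A minimal sketch of the pattern (function names and message shape are ours, not the production schema):

```python
def handle_align_job(message, processed_ids, align_fn):
    """Process a Pub/Sub-style job message exactly once, even if redelivered."""
    job_id = message["job_id"]
    if job_id in processed_ids:
        return "skipped"           # duplicate delivery: ack and move on
    align_fn(message["pair"])      # the actual work (compute the transformation)
    processed_ids.add(job_id)      # record completion only after success
    return "done"
```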

Smart Caching

Alignment parameters rarely change between comparison runs of the same sheet. If sheet A-101 aligned with scale 1.02 and rotation 0.3 degrees last time, it'll probably be similar this time.

We cache alignment results and use them as starting points for optimization. This cuts alignment time by 60-70% on subsequent comparisons.
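A sketch of the warm-start lookup (in production this would live in a persistent store; the in-memory dict and names here are ours):

```python
alignment_cache = {}  # sheet id -> (scale, rotation_deg) from the last run

def warm_start(sheet_id, default=(1.0, 0.0)):
    """Seed the optimizer with last run's result instead of the identity transform."""
    return alignment_cache.get(sheet_id, default)

alignment_cache["A-101"] = (1.02, 0.3)
print(warm_start("A-101"))  # (1.02, 0.3)
print(warm_start("S-201"))  # falls back to the identity transform, (1.0, 0.0)
```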

Results

With these techniques, we can process a 500-sheet comparison in under 30 minutes on a single 8-core machine with 16 GB RAM. That's:

Metric             Before                  After
Memory per sheet   2+ GB (unpredictable)   ~800 MB (controlled)
Alignment time     45-60 sec/pair          8-12 sec/pair
Total throughput   ~10 pairs/hour          ~1000 pairs/hour
Failure rate       15-20% (OOM)            Under 1%

The key insight: a higher but controlled memory peak beats a lower but unpredictable one. A system that reliably uses 800 MB is more useful than one that usually uses 400 MB but sometimes spikes to 3 GB and crashes.

Key Takeaways

  • Construction drawings at 300 DPI can exceed 200 million pixels. Standard image processing libraries aren't designed for this scale.
  • NumPy's convenience syntax hides memory allocations. Use out= parameters and explicit garbage collection for predictable memory.
  • Automatic alignment requires constraints. Unconstrained optimization produces implausible results on noisy real-world data.
  • Horizontal scaling beats vertical scaling. An event-driven architecture with stateless jobs handles load spikes gracefully.
  • Cache intermediate results. Alignment parameters are stable across runs; recomputing them wastes time.

FAQ

Why not use GPU acceleration?

GPU libraries like CUDA require specific hardware that's not always available in cloud environments. Our CPU-based approach runs anywhere. For teams with GPU infrastructure, the same algorithms can be ported to CUDA for additional speedup.

How do you handle scanned drawings vs. native PDFs?

Scanned drawings have more noise and artifacts. We use adaptive thresholding during preprocessing to clean up scan artifacts, and we're more aggressive with morphological operations to filter small differences. Native PDFs produce cleaner comparisons because the pixel data is generated directly from vectors.

What happens when alignment fails?

Failed alignments are flagged for manual review. The system provides the best-guess transformation with a confidence score. Users can adjust alignment manually if needed, and those corrections feed back into the matching algorithm for future comparisons.

Can this approach work for CAD files (DWG)?

DWG files would need to be exported to PDF first. We process rasterized images, not vector data. The advantage is format independence; the disadvantage is we lose semantic information about what each line represents.


Bedrock's drawing comparison handles 1000+ page sets with automatic alignment. Try it free with 50 comparisons.
