Why I rebuilt the PDF pipeline in Python

The neatworld render pipeline drives Adobe InDesign on a Mac Mini via FastAPI and AppleScript, generating PDFs from IDML templates through a Tailscale tunnel. Complex, but it works. Here's why Python instead of Node, and the tradeoffs I'm living with.

The PDF render pipeline is one of those “it shouldn’t work but it does” systems. Adobe InDesign on a Mac Mini, driven by Python via AppleScript, serving IDML-to-PDF conversion through a Tailscale tunnel. Complex, ugly, and absolutely critical to everything I build.

Here’s why I rebuilt it in Python instead of Node, and what I learned.

The Problem

Every pipeline deliverable needs a PDF. Brand guidelines, project specs, weekly reviews, client reports. InDesign is the only tool I’ve found that handles typography, layout, and print specs correctly.

But InDesign doesn’t have a web API. For automation it offers ExtendScript (ancient JavaScript) and, on the Mac, AppleScript. Both are terrible, but AppleScript is slightly less terrible.

The pipeline fills IDML templates with injected content, ships them to the Mac Mini, opens them in InDesign, applies the master template for typography, exports to PDF, and serves the result back. One HTTP request, multiple steps, file system coordination, process management.

Why Python Won

Node was the obvious choice. Most of the neatworld platform is JavaScript/TypeScript. The backend is FastAPI (Python), but the mental model was “keep it simple, JavaScript everywhere.”

Wrong.

AppleScript integration is painful in Node. You shell out to osascript, parse stdout, and handle errors at the shell level. It works, but you’re debugging three languages at once.

Python’s subprocess handling is better. The subprocess module gives you process control, timeout handling, and error capture that Just Works. When you’re shelling out to AppleScript and ExtendScript, that matters.
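A minimal sketch of what process control, timeout handling, and error capture look like with only the standard library. `run_with_timeout` and `RunResult` are illustrative names of my own, not code from the pipeline:

```python
import subprocess
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RunResult:
    ok: bool
    stdout: str
    stderr: str
    returncode: Optional[int] = None  # None when the child was killed on timeout
    timed_out: bool = False


def run_with_timeout(cmd: List[str], timeout: float) -> RunResult:
    """Run cmd, capture its output as text, and never wait longer than timeout seconds."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run() has already killed and reaped the child at this point.
        return RunResult(ok=False, stdout="", stderr="", timed_out=True)
    return RunResult(ok=proc.returncode == 0, stdout=proc.stdout,
                     stderr=proc.stderr, returncode=proc.returncode)
```

In the pipeline, `cmd` would be the `osascript` invocation; a hung InDesign then shows up as `timed_out=True` instead of a request that never returns.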

File system operations are cleaner. IDML files are zip archives with XML inside. Python’s zipfile and xml.etree handle the content injection without external dependencies. In Node you’d need third-party packages for both the zip handling and the XML parsing.
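Here’s a minimal sketch of that injection, assuming placeholders written as `{{name}}` inside story text. The placeholder syntax and the `inject_content` name are illustrative assumptions, and a real template would also have to cope with a placeholder split across character-style runs, which this ignores:

```python
import xml.etree.ElementTree as ET
import zipfile
from typing import Mapping

# Keep the conventional "idPkg" prefix instead of ElementTree's default "ns0".
ET.register_namespace("idPkg", "http://ns.adobe.com/AdobeInDesign/idml/1.0/packaging")


def inject_content(src_idml: str, dst_idml: str, values: Mapping[str, str]) -> None:
    """Copy an IDML package, replacing {{key}} placeholders in story text."""
    with zipfile.ZipFile(src_idml) as src, zipfile.ZipFile(dst_idml, "w") as dst:
        # infolist() preserves archive order, so "mimetype" stays the first entry.
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename.startswith("Stories/") and info.filename.endswith(".xml"):
                # insert_pis=True keeps processing instructions that ElementTree
                # would otherwise drop from the story XML.
                parser = ET.XMLParser(target=ET.TreeBuilder(insert_pis=True))
                root = ET.fromstring(data, parser=parser)
                for content in root.iter("Content"):
                    if content.text:
                        for key, value in values.items():
                            content.text = content.text.replace("{{" + key + "}}", value)
                data = ET.tostring(root, encoding="UTF-8", xml_declaration=True)
            # Reusing the original ZipInfo keeps each entry's compression method,
            # so an uncompressed "mimetype" entry stays uncompressed.
            dst.writestr(info, data)
```

Note that re-serializing with ElementTree rewrites the XML declaration and attribute quoting; the content is equivalent, but the bytes are not identical to the template.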

Error handling at the system level. When InDesign crashes (and it will), you need process cleanup, file locking detection, and restart logic. Python’s system-level tooling is more mature.
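A sketch of the restart-and-retry part, assuming the render and the restart are passed in as plain callables. `with_restart` is a hypothetical helper; in practice `restart` might quit InDesign through osascript so the next attempt relaunches it:

```python
import logging
from typing import Callable, Optional, TypeVar

T = TypeVar("T")
log = logging.getLogger("nw-render")


def with_restart(job: Callable[[], T], restart: Callable[[], None], attempts: int = 2) -> T:
    """Run job(); after each failure except the last, call restart() and retry."""
    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            return job()
        except Exception as exc:  # crashes surface as timeouts, RuntimeErrors, OSErrors...
            last_error = exc
            log.warning("render attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt < attempts:
                restart()
    raise RuntimeError(f"render failed after {attempts} attempts") from last_error
```

Chaining the final error with `from last_error` keeps the original InDesign failure visible in the traceback.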

The Architecture

```
# Render request comes in
POST /render/pdf
{
  "template": "weekly_brief",
  "data": {...},
  "format": "indesign"
}

# Pipeline:
# 1. Generate IDML with content injection
# 2. Send to Mac Mini via HTTP
# 3. Mac Mini runs AppleScript → ExtendScript
# 4. InDesign opens, applies master template, exports PDF
# 5. Return download URL
```

The Mac Mini runs a separate FastAPI server (nw-render) because it needs InDesign installed and must run on macOS. Everything else runs on Linux.

Why Tailscale Funnel? The Mac Mini sits on my home network. Tailscale Funnel exposes it as https://shawns-mac-mini.tail32c617.ts.net without port forwarding or dynamic DNS. The Railway backend calls the public Funnel URL. Simple.
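On the Railway side, the call through the Funnel can be a plain HTTPS POST. A minimal standard-library sketch, assuming the Mac Mini service accepts IDML bytes at a `/render` route and answers with the PDF bytes; the route, headers, and function name are assumptions, not the actual nw-render API:

```python
import urllib.request

IDML_MIME = "application/vnd.adobe.indesign-idml-package"


def send_to_render_host(base_url: str, idml: bytes, timeout: float = 90) -> bytes:
    """POST an IDML package to the render host and return the response body (the PDF).

    The "/render" route is hypothetical; the real nw-render API may differ.
    """
    req = urllib.request.Request(
        base_url.rstrip("/") + "/render",
        data=idml,
        headers={"Content-Type": IDML_MIME, "Accept": "application/pdf"},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses and
    # urllib.error.URLError if the Funnel host is unreachable.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()
```

Usage would look like `send_to_render_host("https://shawns-mac-mini.tail32c617.ts.net", idml_bytes)`. The 90-second default is a guess chosen to sit above a cold InDesign start plus the render itself.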

The AppleScript Problem

InDesign automation is AppleScript calling ExtendScript. Two levels of abstraction, both with terrible debugging.

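```applescript
-- Usage: osascript render.applescript /path/to/in.idml /path/to/out.pdf "<ExtendScript source>"
on run argv
    set myFilePath to POSIX file (item 1 of argv)
    set myOutputPath to POSIX file (item 2 of argv)
    set myExtendScriptCode to item 3 of argv
    tell application "Adobe InDesign 2024"
        set myDocument to open myFilePath
        do script myExtendScriptCode language javascript
        tell myDocument
            export format PDF type to myOutputPath without showing options
            close saving no
        end tell
    end tell
end run
```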

ExtendScript is JavaScript from 2005: no modern syntax, weak error reporting, no package management. You write it as strings inside AppleScript.

The key insight: Use Python to generate the ExtendScript, not write it by hand. Templates with proper escaping, error checking, and logging.

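```python
import json
import subprocess

# AppleScript wrapper: the ExtendScript source arrives as argv, so it never
# has to be escaped into an AppleScript string literal.
APPLESCRIPT = [
    "on run argv",
    'tell application "Adobe InDesign 2024" to do script (item 1 of argv) language javascript',
    "end run",
]

def render_indesign_pdf(idml_path, output_path):
    # json.dumps() produces correctly escaped JavaScript string literals for the paths.
    script = f"""
    try {{
        var doc = app.open(File({json.dumps(str(idml_path))}));
        app.scriptPreferences.measurementUnit = MeasurementUnits.POINTS;
        doc.exportFile(ExportFormat.PDF_TYPE, File({json.dumps(str(output_path))}));
        doc.close(SaveOptions.NO);
        "SUCCESS";
    }} catch (e) {{
        "ERROR: " + e.toString();
    }}
    """

    cmd = ["osascript"] + [arg for line in APPLESCRIPT for arg in ("-e", line)]
    # Raises subprocess.TimeoutExpired if InDesign hangs for more than 60 seconds.
    result = subprocess.run(cmd + [script], capture_output=True, text=True, timeout=60)
    output = result.stdout.strip()
    if result.returncode != 0 or output != "SUCCESS":
        raise RuntimeError(f"InDesign render failed: {output or result.stderr.strip()}")
```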

What Actually Breaks

Memory leaks. InDesign doesn’t clean up properly when driven by automation. The Mac Mini restarts nightly.

File locking. IDML files stay locked if the script crashes. Python handles cleanup better than Node for this specific case.
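One way to keep crashes from leaving stale files behind, sketched with the standard library: give every render its own temporary directory and let a context manager delete it whether the render succeeds or raises. `staged_render` is a hypothetical helper, and it does not make InDesign release a document it still has open; it only guarantees the service’s own files get cleaned up:

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator, Tuple


@contextmanager
def staged_render(idml_src: Path) -> Iterator[Tuple[Path, Path]]:
    """Copy the IDML into a private temp dir and yield (input_path, output_path).

    The directory and everything in it is deleted when the with-block exits,
    whether the render succeeded or raised.
    """
    with tempfile.TemporaryDirectory(prefix="nw-render-") as tmp:
        work = Path(tmp)
        idml = work / "input.idml"
        shutil.copy2(idml_src, idml)
        yield idml, work / "output.pdf"
```

The caller has to copy the PDF somewhere permanent (for example with `shutil.copy2(pdf, final_path)`) before the `with` block ends.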

Unicode in file paths. AppleScript has opinions about character encoding. Python’s pathlib makes this manageable.
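A sketch of one defensive approach, which is my assumption rather than necessarily what the pipeline does: never hand InDesign a non-ASCII path at all, and stage files under an ASCII-only name derived from the original. `ascii_safe_name` is hypothetical:

```python
import hashlib
import unicodedata
from pathlib import Path


def ascii_safe_name(path: Path) -> str:
    """Return an ASCII-only file name for path, keeping its extension."""
    # NFKD splits "é" into "e" plus a combining accent, so the ASCII
    # encode below keeps the "e" and drops only the accent.
    stem = unicodedata.normalize("NFKD", path.stem).encode("ascii", "ignore").decode("ascii")
    # Replace anything that is not a letter, digit, "-" or "_".
    stem = "".join(c if c.isalnum() or c in "-_" else "_" for c in stem) or "file"
    suffix = path.suffix.encode("ascii", "ignore").decode("ascii")
    # A short hash of the original name keeps "Café.idml" and "Cafe.idml" distinct.
    digest = hashlib.sha1(path.name.encode("utf-8")).hexdigest()[:8]
    return f"{stem}-{digest}{suffix}"
```

The original name can still be used for the final download; only the path InDesign sees is sanitized.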

Process coordination. Multiple render requests can’t run simultaneously. Python’s asyncio.Lock handles the queue.
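A minimal sketch of that serialization, assuming the blocking render function (such as `render_indesign_pdf` from earlier) is wrapped once at startup. `SerialRenderer` is a hypothetical name; a FastAPI endpoint would simply `await renderer(idml_path, pdf_path)`:

```python
import asyncio
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class SerialRenderer(Generic[T]):
    """Run a blocking render function one call at a time from async code."""

    def __init__(self, render: Callable[[str, str], T]) -> None:
        self._render = render
        # Safest to construct inside the running event loop (e.g. at app startup).
        self._lock = asyncio.Lock()

    async def __call__(self, idml_path: str, pdf_path: str) -> T:
        # asyncio.Lock wakes waiters in the order they called acquire(),
        # so queued requests are served first come, first served.
        async with self._lock:
            # The render blocks on subprocess.run, so run it in a worker
            # thread to keep the event loop responsive (Python 3.9+).
            return await asyncio.to_thread(self._render, idml_path, pdf_path)
```

Running the render in a thread matters: holding the lock while calling `subprocess.run` directly would freeze every other request on the event loop, not just other renders.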

The Tradeoffs I’m Living With

Complexity. This pipeline has four components: Railway backend, Tailscale tunnel, Mac Mini service, InDesign automation. Any of them can break.

Platform dependency. Locked to macOS and InDesign. No containerization possible.

Performance. Cold start is 8-12 seconds because InDesign has to launch and load fonts. Warm renders are 2-3 seconds.

Maintenance. InDesign updates break the automation. Adobe’s licensing model creates its own problems.

Would I Do It Again?

Yes, but I’d architect it differently.

Option 1: Web-based layout engine. CSS Grid + Puppeteer for PDF generation. Faster, containerizable, no Adobe dependency. But the typography never quite matches print specs.

Option 2: InDesign on AWS WorkSpaces. Windows virtual desktop with InDesign installed. More reliable than my Mac Mini, still terrible automation model.

Option 3: Pay for a real service. Something like Bannerbear or HTMLCSStoImage. Less control, ongoing costs, design constraints.

I chose control over simplicity. The pipeline produces exactly the PDFs I want, with exactly the typography and spacing I need. That’s worth the complexity for now.

But if I were starting today, I’d probably try Puppeteer first.

The Lesson

Sometimes the “right” technical choice isn’t the obvious one. Node would have been simpler for integration but harder for system-level process management. Python’s maturity for file operations, subprocess handling, and error recovery made the difference.

When you’re building something that bridges multiple systems (web service + desktop application + file system + networking), pick the tool that handles the integration points best. Not the one that fits your mental model.

The PDF pipeline is running in production. It’s ugly, complex, and fragile. But it works, and the PDFs are beautiful.

That’s the tradeoff.