Building a Searchable Recipe Archive from Scanned Cards

A recipe archive is a useful example of a document-ingestion system: scanned cards come in, multiple extraction approaches compete to produce structured recipes, human review resolves the edge cases, and a public Recipe Book serves the best available version through a shared API.

This post was written with AI assistance.

What the system needs to do

A recipe archive looks simple until it has to be usable. The system has to preserve the original front and back of the card, produce structured ingredients and steps, let multiple extraction methods coexist, make room for human correction, and still serve a public recipe browser that feels lightweight. That combination makes the problem look more like a document pipeline than a static recipe site.

  • 3 extraction approaches, each stored as a separate variant rather than overwriting the last result.
  • 1 shared FastAPI and SQLite layer powering the public Recipe Book, admin tools, and future clients.
  • 2-3 scans per imported recipe: front required, back optional, augmented front available for OCR help.
  • Review iterations, because machine output is treated as draft material instead of final truth.

System architecture

The Recipe Book now has three connected surfaces: the public recipe browser, the admin dashboard, and the extraction pipeline. All three sit on top of the same FastAPI and SQLite backend, so the website is no longer just presenting files. It is querying and updating a shared recipe system.

Input model: imported recipes are scan sets

Scan roles

  • Front scan: the primary source image and the archival baseline.
  • Back scan: captures notes, alternate instructions, and handwritten context.
  • Front-augmented scan: a utility image used to improve OCR quality when the original is difficult.

A recipe is not treated as a single image. Imported recipes are organized as scan sets: a front scan is always present, and most sets also include a front-augmented scan and a back scan. That gives the pipeline more than one representation of the card to work with.

The front and back scans preserve the original object. The front-augmented scan exists mainly to improve OCR reliability. This is a small design choice, but it matters: the system can treat archival fidelity and extraction quality as separate goals instead of forcing one compromise image to serve both.
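One way to picture a scan set is as a small record with one required image and two optional ones. The sketch below is illustrative only: the `ScanSet` class, its field names, and both helper methods are assumptions made for this post, not the project's actual code.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class ScanSet:
    """One imported recipe card, addressed by slug (hypothetical shape)."""

    slug: str
    front: Path                              # required: archival baseline
    back: Optional[Path] = None              # optional: notes, handwriting
    front_augmented: Optional[Path] = None   # optional: OCR helper image

    def ocr_inputs(self) -> list[Path]:
        """Scans worth running OCR on, augmented front first when present."""
        candidates = [self.front_augmented, self.front, self.back]
        return [p for p in candidates if p is not None]

    def archival_images(self) -> list[Path]:
        """Scans that represent the original card (never the augmented one)."""
        return [p for p in (self.front, self.back) if p is not None]
```

Keeping `ocr_inputs` and `archival_images` as separate views mirrors the split described above: the augmented image helps extraction but is never presented as the original object.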

Why the project stores variants

The key architectural decision is that extraction is variant-producing rather than overwrite-producing. Each approach writes a separate structured result for the same slug, along with metadata such as model name, review status, and priority. That means the archive can compare approaches, rerun one model without losing another, and promote a human-reviewed correction without destroying the machine-generated history.
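A minimal sketch of variant-producing storage, using Python's built-in `sqlite3` module. The table name `extraction_variants` comes from the pipeline description. The column names, the `(slug, approach)` uniqueness rule, the `save_variant` helper, and the choice to reset review status on a rerun are all assumptions made for illustration.

```python
import json
import sqlite3

# Hypothetical schema: one row per (recipe slug, extraction approach).
SCHEMA = """
CREATE TABLE IF NOT EXISTS extraction_variants (
    id            INTEGER PRIMARY KEY,
    slug          TEXT NOT NULL,
    approach      TEXT NOT NULL,
    model_name    TEXT NOT NULL,
    review_status TEXT NOT NULL DEFAULT 'unreviewed',
    priority      INTEGER NOT NULL DEFAULT 0,
    payload       TEXT NOT NULL,
    UNIQUE (slug, approach)
);
"""


def save_variant(conn, slug, approach, model_name, recipe, priority=0):
    """Insert or refresh one approach's result without touching the others."""
    conn.execute(
        """
        INSERT INTO extraction_variants
            (slug, approach, model_name, priority, payload)
        VALUES (?, ?, ?, ?, ?)
        ON CONFLICT (slug, approach) DO UPDATE SET
            model_name = excluded.model_name,
            priority = excluded.priority,
            payload = excluded.payload,
            review_status = 'unreviewed'  -- a rerun is a fresh draft
        """,
        (slug, approach, model_name, priority, json.dumps(recipe)),
    )
    conn.commit()
```

Rerunning one approach only replaces that approach's row for the slug. Rows written by other approaches, including a human-reviewed one, stay as they were.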

Approach 1

PaddleOCR plus Qwen text-only extraction. This treats OCR as the main signal and hands the model a text bundle.

Approach 2

PaddleOCR plus image-aware Qwen extraction. This keeps the OCR text, but also lets the model see the front and back card images.

Approach 3

PaddleOCR plus OpenAI vision. This is a paid, explicitly approved path used as a higher-capability prototype.

Approach 1: PaddleOCR + Qwen text-only

Approach 1 flow

  • Front / augmented / back scans: three potential text sources from the same physical card.
  • PaddleOCR: run OCR independently on each scan and collect the extracted text.
  • Prompt builder: merge OCR output into a structured prompt with schema instructions and allowed tags.
  • Qwen text-only: infer title, ingredients, steps, notes, and tags without image input.
  • Extraction variant: store structured JSON in extraction_variants with approach metadata.

This approach is cheap and local, but it is limited by OCR quality. If OCR misses formatting cues or handwriting, the model can only recover so much.
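The prompt-builder step could look roughly like the sketch below. It assumes the OCR text is already available as a mapping from scan role to string. The `build_prompt` function, the schema shape, and the prompt wording are hypothetical, not the project's actual prompt.

```python
import json


def build_prompt(ocr_by_scan: dict[str, str], allowed_tags: list[str]) -> str:
    """Merge per-scan OCR text into one text-only extraction prompt."""
    # Assumed output schema: the fields named in the flow above.
    schema = {
        "title": "string",
        "ingredients": ["string"],
        "steps": ["string"],
        "notes": "string",
        "tags": ["string"],
    }
    # One labelled section per scan; scans with no OCR text are skipped.
    sections = [
        f"--- OCR ({role}) ---\n{text.strip()}"
        for role, text in ocr_by_scan.items()
        if text.strip()
    ]
    return "\n\n".join(
        [
            "Extract a recipe from the OCR text below.",
            "Return only JSON matching this schema:",
            json.dumps(schema, indent=2),
            "Use only these tags: " + ", ".join(allowed_tags),
            *sections,
        ]
    )
```

Labelling each section by scan role lets the model tell front-of-card text from back-of-card notes, which is the only layout signal a text-only approach receives.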

Why keep it

This is the simplest local baseline and a useful control. It is fast, cheap to rerun, and makes it obvious when OCR quality is the actual bottleneck.

  • Best when cards are clean and legible.
  • Most vulnerable to OCR mistakes and lost layout cues.
  • Useful as a baseline even when a stronger approach exists.

Approach 2: PaddleOCR + image-aware Qwen

Approach 2 flow

  • Front / augmented / back scans: the same scan set used by approach 1.
  • PaddleOCR: create text transcripts for all available scans.
  • Dual-context prompt: send OCR text plus selected scan images so the model can reconcile both sources.
  • Image-aware Qwen: use the text for structure and the images for disambiguation, layout cues, and missed words.
  • Extraction variant: persist a second candidate with its own review status and priority.

This is usually the most interesting local approach because it benefits from OCR and direct visual access at the same time.

What improves here

Approach 2 keeps the OCR bundle but also passes the actual scan images into the model. That helps when section breaks, punctuation, handwriting, or ingredient boundaries survive visually but not textually.

  • Better at recovering layout and intent.
  • Still local, so it fits well into batch experimentation.
  • Most representative of the “OCR plus vision” design pattern in this project.
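The dual-context idea can be sketched as a single payload that carries both signals. The structure below is deliberately provider-neutral and hypothetical: the `build_dual_context` function and its key names are assumptions, and a real client would translate this into whatever message format the chosen model runtime expects.

```python
import base64


def build_dual_context(ocr_text: str, images: dict[str, bytes]) -> dict:
    """Bundle OCR text with base64-encoded scan images, keyed by scan role."""
    return {
        "ocr_text": ocr_text,
        "images": [
            {"role": role, "base64": base64.b64encode(data).decode("ascii")}
            for role, data in images.items()
        ],
    }
```

Passing only selected scans in `images` keeps the request small while still giving the model something to check the OCR text against.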

Approach 3: PaddleOCR + OpenAI vision

Approach 3 flow

  • Front / back scans: the augmented front can be skipped here to reduce paid-image usage.
  • PaddleOCR: preserve the OCR transcript as a supporting signal.
  • Responses API request: send images, OCR text, schema constraints, and tag vocabulary to a multimodal model.
  • OpenAI vision model: produce structured recipe JSON with stronger multimodal reasoning than the local baseline.
  • Paid extraction variant: stored like the other variants, but only after an explicitly approved run.

This approach exists as a prototype for higher-quality extraction, but the project is designed to prevent accidental API charges during normal batch processing.

Why it is gated

Approach 3 is structurally similar to approach 2, but it swaps in the OpenAI API as the multimodal reasoning layer. Since it can incur cost, it is treated as an explicitly approved tool instead of a default batch step.

  • Best used selectively on harder recipes.
  • Excluded from default batch runs to prevent accidental spend.
  • Useful as a quality ceiling when comparing local approaches.
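One simple way to enforce that kind of gate is to keep paid approaches out of the default batch list and refuse them unless the caller names them as approved. This is a sketch under assumptions: the approach identifiers, `select_approaches`, and `PaidRunNotApproved` are hypothetical names, not the project's.

```python
# Hypothetical approach identifiers for illustration.
PAID_APPROACHES = frozenset({"openai_vision"})
DEFAULT_BATCH = ("ocr_text", "ocr_image")  # local approaches only


class PaidRunNotApproved(RuntimeError):
    """Raised when a paid approach is requested without explicit approval."""


def select_approaches(requested=DEFAULT_BATCH, approved_paid=frozenset()):
    """Return the approaches to run, refusing any unapproved paid ones."""
    blocked = [
        name
        for name in requested
        if name in PAID_APPROACHES and name not in approved_paid
    ]
    if blocked:
        raise PaidRunNotApproved(f"approval required for: {', '.join(blocked)}")
    return list(requested)
```

With this shape, a plain batch run can never reach the paid path by accident. Running it requires both requesting the approach and passing it in `approved_paid`.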

Review and publication flow

Editorial layer

Once variants exist, the problem changes from extraction to editorial control. The admin side of the system allows reviewers to inspect variants, mark them reviewed, create a dedicated human-reviewed variant, upload display photos, adjust tags, and publish the version that should power the public site.

This is the bridge between the pipeline and the public Recipe Book. Without it, the system would be good at generating drafts but weak at maintaining a trustworthy archive.

The data model

The backend uses FastAPI with SQLite and stores recipes in normalized tables for scans, variants, ingredients, steps, tags, and publication choices. The important thing is not the specific stack so much as the shape of the data: the schema is built around the idea that raw scans, machine outputs, and reviewed public recipes are different states of the same object.

  • Scans remain reference material and can be shown in both the admin editor and the public recipe page.
  • Variants retain their approach metadata, so the system can explain where a version came from.
  • Publishing is separate from generation, which prevents a newly created draft from automatically becoming the public truth.
  • Tags and display photos behave like editorial metadata that can be attached and refined over time.
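The separation between generating and publishing can be sketched with two tables: one holding variants and one holding a single publication choice per slug. The schema here is trimmed down and hypothetical, as are the `published_choices` table name and the `add_variant`, `publish`, and `public_title` helpers.

```python
import sqlite3

# Hypothetical, trimmed-down schema for illustration.
PUBLISH_SCHEMA = """
CREATE TABLE extraction_variants (
    id       INTEGER PRIMARY KEY,
    slug     TEXT NOT NULL,
    approach TEXT NOT NULL,
    title    TEXT NOT NULL
);
CREATE TABLE published_choices (
    slug       TEXT PRIMARY KEY,
    variant_id INTEGER NOT NULL REFERENCES extraction_variants(id)
);
"""


def add_variant(conn, slug, approach, title):
    """Store a new draft; this never changes what the public site shows."""
    cur = conn.execute(
        "INSERT INTO extraction_variants (slug, approach, title) VALUES (?, ?, ?)",
        (slug, approach, title),
    )
    conn.commit()
    return cur.lastrowid


def publish(conn, slug, variant_id):
    """Point the public recipe at one specific variant of the same slug."""
    owner = conn.execute(
        "SELECT slug FROM extraction_variants WHERE id = ?", (variant_id,)
    ).fetchone()
    if owner is None or owner[0] != slug:
        raise ValueError(f"variant {variant_id} does not belong to {slug!r}")
    conn.execute(
        """
        INSERT INTO published_choices (slug, variant_id) VALUES (?, ?)
        ON CONFLICT (slug) DO UPDATE SET variant_id = excluded.variant_id
        """,
        (slug, variant_id),
    )
    conn.commit()


def public_title(conn, slug):
    """Title the public site would show, or None if nothing is published."""
    row = conn.execute(
        """
        SELECT v.title
        FROM published_choices AS p
        JOIN extraction_variants AS v ON v.id = p.variant_id
        WHERE p.slug = ?
        """,
        (slug,),
    ).fetchone()
    return row[0] if row else None
```

Because the public read path goes through `published_choices`, adding a new machine draft leaves the published recipe unchanged until a reviewer explicitly calls `publish`.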

The website app

Even though the pipeline and admin tools are the most technical parts of the system, the public Recipe Book is where the architecture is validated. Search, tags, scan references, display photos, and the in-page Data Extraction Method selector are what turn extracted data into a usable archive. The backend exists to support reading and cooking workflows, not just to run models.

Future plans

The long-term direction is to let the same recipe API power more than one interface. The current public site is one client. The admin dashboard is another. A Raspberry Pi kitchen display is a natural next client. That only works because the underlying system is structured around reusable recipe data rather than a one-off website.

Seen that way, the Recipe Book is not just “OCR for recipe cards.” It is a small archive platform with multiple ingestion paths, multiple extraction methods, human review, and a public-facing interface that stays connected to the original artifacts.