What the system needs to do
A recipe archive looks simple until it has to be usable. The system has to preserve the original front and back of the card, produce structured ingredients and steps, let multiple extraction methods coexist, make room for human correction, and still serve a public recipe browser that feels lightweight. That combination makes the problem look more like a document pipeline than a static recipe site.
System architecture
The Recipe Book now has three connected surfaces: the public recipe browser, the admin dashboard, and the extraction pipeline. All three sit on top of the same FastAPI and SQLite backend, so the website is no longer just presenting files. It is querying and updating a shared recipe system.
Input model: imported recipes are scan sets
Scan roles
- Front scan: the primary source image and the archival baseline.
- Back scan: captures notes, alternate instructions, and handwritten context.
- Front-augmented scan: a utility image used to improve OCR quality when the original is difficult.
A recipe is not treated as a single image. Imported recipes are organized as scan sets, usually containing a front scan, a front-augmented scan, and a back scan. That gives the pipeline more than one representation of the card to work with.
The front and back scans preserve the original object. The front-augmented scan exists mainly to improve OCR reliability. This is a small design choice, but it matters because the system can separately value archival fidelity and extraction quality instead of forcing one compromise image to do everything.
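One way to picture a scan set is as a small record with a role per image. The sketch below is illustrative only: the `ScanSet` class, its field names, and the two helper methods are assumptions rather than the project's actual code. It shows how archival images and the OCR helper image can be kept apart.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class ScanSet:
    """One imported recipe card, represented by up to three images (hypothetical model)."""

    slug: str
    front: Path                             # primary source image, archival baseline
    back: Optional[Path] = None             # notes, alternate instructions, handwriting
    front_augmented: Optional[Path] = None  # utility image for OCR, not archival

    def ocr_source(self) -> Path:
        """Prefer the augmented image for OCR and fall back to the original front."""
        return self.front_augmented or self.front

    def archival_images(self) -> list[Path]:
        """Images that preserve the original object; the augmented scan is excluded."""
        return [p for p in (self.front, self.back) if p is not None]
```

With this shape, the extraction code can ask for `ocr_source()` while the public page asks for `archival_images()`, so neither use has to compromise for the other.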
Why the project stores variants
The key architectural decision is that extraction is variant-producing rather than overwrite-producing. Each approach writes a separate structured result for the same slug, along with metadata such as model name, review status, and priority. That means the archive can compare approaches, rerun one model without losing another, and promote a human-reviewed correction without destroying the machine-generated history.
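A minimal sketch of variant-producing storage follows, using Python's built-in `sqlite3`. Only the table name `extraction_variants` comes from the project. The column names, the `save_variant` helper, and the uniqueness rule on `(slug, approach)` are assumptions made for illustration.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE extraction_variants (
        id            INTEGER PRIMARY KEY,
        slug          TEXT NOT NULL,
        approach      TEXT NOT NULL,
        model_name    TEXT NOT NULL,
        review_status TEXT NOT NULL DEFAULT 'unreviewed',
        priority      INTEGER NOT NULL DEFAULT 0,
        payload       TEXT NOT NULL,          -- structured recipe as JSON
        UNIQUE (slug, approach)               -- assumed: one row per approach per recipe
    )
    """
)


def save_variant(conn, slug, approach, model_name, recipe, priority=0):
    """Insert or refresh one approach's result without touching other approaches."""
    conn.execute(
        """
        INSERT INTO extraction_variants (slug, approach, model_name, priority, payload)
        VALUES (?, ?, ?, ?, ?)
        ON CONFLICT (slug, approach) DO UPDATE SET
            model_name    = excluded.model_name,
            priority      = excluded.priority,
            payload       = excluded.payload,
            review_status = 'unreviewed'      -- a rerun needs a fresh review
        """,
        (slug, approach, model_name, priority, json.dumps(recipe)),
    )
```

Rerunning one approach only rewrites that approach's row, so the other variants for the same slug stay available for comparison.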
Approach 1
PaddleOCR plus Qwen text-only extraction. This treats OCR as the main signal and hands the model a text bundle.
Approach 2
PaddleOCR plus image-aware Qwen extraction. This keeps the OCR text, but also lets the model see the front and back card images.
Approach 3
PaddleOCR plus OpenAI vision. This is a paid, explicitly approved path used as a higher-capability prototype.
Approach 1: PaddleOCR + Qwen text-only
Results are written to extraction_variants with approach metadata.
This approach is cheap and local, but it is limited by OCR quality. If OCR misses formatting cues or handwriting, the model can only recover so much.
Why keep it
This is the simplest local baseline and a useful control. It is fast, cheap to rerun, and makes it obvious when OCR quality is the actual bottleneck.
- Best when cards are clean and legible.
- Most vulnerable to OCR mistakes and lost layout cues.
- Useful as a baseline even when a stronger approach exists.
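The "text bundle" handed to the model in approach 1 could be assembled roughly as below. This is a sketch under assumptions: OCR is taken to have already produced a list of text lines per scan, and the `build_text_bundle` helper and the `[FRONT]` / `[BACK]` labels are hypothetical, not the project's actual format.

```python
def build_text_bundle(front_lines, back_lines):
    """Join OCR lines from each side of the card into one labeled text block.

    Blank lines are dropped and a side with no text is omitted entirely.
    """
    sections = []
    for label, lines in (("FRONT", front_lines), ("BACK", back_lines)):
        cleaned = [line.strip() for line in lines if line.strip()]
        if cleaned:
            sections.append(f"[{label}]\n" + "\n".join(cleaned))
    return "\n\n".join(sections)
```

Because the model sees nothing but this string, anything OCR dropped (layout, handwriting, section breaks) is invisible to it, which is why this approach works well as a control.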
Approach 2: PaddleOCR + image-aware Qwen
This is usually the most interesting local approach because it benefits from OCR and direct visual access at the same time.
What improves here
Approach 2 keeps the OCR bundle but also passes the actual scan images into the model. That helps when section breaks, punctuation, handwriting, or ingredient boundaries survive visually but not textually.
- Better at recovering layout and intent.
- Still local, so it fits well into batch experimentation.
- Most representative of the “OCR plus vision” design pattern in this project.
Approach 3: PaddleOCR + OpenAI vision
This approach exists as a prototype for higher-quality extraction, but the project is designed to prevent accidental API charges during normal batch processing.
Why it is gated
Approach 3 is structurally similar to approach 2, but it swaps in the OpenAI API as the multimodal reasoning layer. Since it can incur cost, it is treated as an explicitly approved tool instead of a default batch step.
- Best used selectively on harder recipes.
- Excluded from default batch runs to prevent accidental spend.
- Useful as a quality ceiling when comparing local approaches.
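The gating idea can be sketched as a small approach registry in which paid approaches are never selected by default. The `Approach` class, the `approach_N` keys, and the `allow_paid` flag are illustrative assumptions. The project's real approval mechanism may look different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Approach:
    key: str
    description: str
    paid: bool = False


APPROACHES = [
    Approach("approach_1", "PaddleOCR + Qwen text-only"),
    Approach("approach_2", "PaddleOCR + image-aware Qwen"),
    Approach("approach_3", "PaddleOCR + OpenAI vision", paid=True),
]


def select_approaches(requested=None, allow_paid=False):
    """Return the approaches a batch run may execute.

    With no explicit request, only local (unpaid) approaches are selected.
    A paid approach must be requested by key and approved with allow_paid=True.
    """
    if requested is None:
        return [a for a in APPROACHES if not a.paid]
    by_key = {a.key: a for a in APPROACHES}
    selected = []
    for key in requested:
        approach = by_key[key]  # raises KeyError for unknown keys
        if approach.paid and not allow_paid:
            raise PermissionError(
                f"{key} can incur API cost; pass allow_paid=True to approve it"
            )
        selected.append(approach)
    return selected
```

Failing loudly on an unapproved paid approach, rather than silently skipping it, makes it clear to the operator why a requested run did not happen.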
Review and publication flow
Editorial layer
Once variants exist, the problem changes from extraction to editorial control. The admin side of the system allows reviewers to inspect variants, mark them reviewed, create a dedicated human-reviewed variant, upload display photos, adjust tags, and publish the version that should power the public site.
This is the bridge between the pipeline and the public Recipe Book. Without it, the system would be good at generating drafts but weak at maintaining a trustworthy archive.
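Creating a dedicated human-reviewed variant can be pictured as deriving a corrected copy instead of editing the machine output in place. The sketch below assumes a simplified in-memory `Variant` record. The field names, the `human_reviewed` label, and the priority bump are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Variant:
    slug: str
    approach: str
    review_status: str
    priority: int
    ingredients: tuple
    steps: tuple


def make_human_reviewed(source, *, ingredients=None, steps=None):
    """Derive a corrected copy; the machine-generated source is left unchanged."""
    return replace(
        source,
        approach="human_reviewed",
        review_status="reviewed",
        priority=source.priority + 100,  # arbitrary bump so the reviewed copy ranks first
        ingredients=tuple(ingredients) if ingredients is not None else source.ingredients,
        steps=tuple(steps) if steps is not None else source.steps,
    )
```

Since the source variant is frozen and only copied, the machine-generated history remains available next to the reviewed version.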
The data model
The backend uses FastAPI with SQLite and stores recipes in normalized tables for scans, variants, ingredients, steps, tags, and publication choices. The important thing is not the specific stack so much as the shape of the data: the schema is built around the idea that raw scans, machine outputs, and reviewed public recipes are different states of the same object.
- Scans remain reference material and can be shown in both the admin editor and the public recipe page.
- Variants retain their approach metadata, so the system can explain where a version came from.
- Publishing is separate from generation, which prevents a newly created draft from automatically becoming the public truth.
- Tags and display photos behave like editorial metadata that can be attached and refined over time.
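The separation between generating a variant and publishing it can be sketched with a trimmed-down schema. Only the general table categories come from the description above. The exact table and column names, and the helper functions, are assumptions, and the ingredient, step, and tag tables are omitted to keep the example short.

```python
import sqlite3

SCHEMA = """
CREATE TABLE recipes (
    slug  TEXT PRIMARY KEY,
    title TEXT NOT NULL
);
CREATE TABLE scans (
    id   INTEGER PRIMARY KEY,
    slug TEXT NOT NULL REFERENCES recipes(slug),
    role TEXT NOT NULL CHECK (role IN ('front', 'back', 'front_augmented')),
    path TEXT NOT NULL
);
CREATE TABLE extraction_variants (
    id            INTEGER PRIMARY KEY,
    slug          TEXT NOT NULL REFERENCES recipes(slug),
    approach      TEXT NOT NULL,
    review_status TEXT NOT NULL DEFAULT 'unreviewed'
);
-- The public choice lives in its own table, one row per recipe.
CREATE TABLE publications (
    slug       TEXT PRIMARY KEY REFERENCES recipes(slug),
    variant_id INTEGER NOT NULL REFERENCES extraction_variants(id)
);
"""


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_variant(conn, slug, approach):
    """Store a new draft variant; this never changes what is published."""
    cur = conn.execute(
        "INSERT INTO extraction_variants (slug, approach) VALUES (?, ?)",
        (slug, approach),
    )
    return cur.lastrowid


def publish(conn, slug, variant_id):
    """Point the public recipe at a specific variant (an explicit editorial act)."""
    conn.execute(
        "INSERT INTO publications (slug, variant_id) VALUES (?, ?) "
        "ON CONFLICT (slug) DO UPDATE SET variant_id = excluded.variant_id",
        (slug, variant_id),
    )


def published_variant(conn, slug):
    row = conn.execute(
        "SELECT variant_id FROM publications WHERE slug = ?", (slug,)
    ).fetchone()
    return row[0] if row else None
```

In this layout, adding a fresh draft with `add_variant` leaves `published_variant` unchanged until someone calls `publish`, which mirrors the rule that new drafts do not become the public version on their own.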
The website app
Even though the pipeline and admin tools are the most technical parts of the system, the public Recipe Book is where the architecture is validated. Search, tags, scan references, display photos, and the in-page Data Extraction Method selector are what turn extracted data into a usable archive. The backend exists to support reading and cooking workflows, not just to run models.
Future plans
The long-term direction is to let the same recipe API power more than one interface. The current public site is one client. The admin dashboard is another. A Raspberry Pi kitchen display is a natural next client. That only works because the underlying system is structured around reusable recipe data rather than a one-off website.
Seen that way, the Recipe Book is not just “OCR for recipe cards.” It is a small archive platform with multiple ingestion paths, multiple extraction methods, human review, and a public-facing interface that stays connected to the original artifacts.