A production-grade RAG pipeline built specifically for science, technology, engineering, and mathematics documents. Combines ChromaDB vector search, dense embeddings, cross-encoder reranking, and Claude/OpenRouter LLMs to deliver precise, grounded answers from your technical corpus.
A layered, modular architecture that separates ingestion, retrieval, reranking, and generation cleanly.
From raw document to grounded answer — every stage is optimised for STEM content.
ingestion.py reads every supported format, extracts text, tables, and images page-by-page
Text split into overlapping chunks preserving equations, tables, and STEM notation
embeddings.py encodes chunks into dense vectors ready for semantic similarity search
vectorstore.py persists all vectors into ChromaDB on disk for fast ANN queries
retriever.py encodes query, runs ANN search, returns Top-K candidate chunks
Cross-encoder scores each candidate against the query and picks the best Top-3
llm.py fills the prompt template with context and sends to Claude / OpenRouter
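The stages above compose into one query-time flow. The sketch below is a simplified stand-in, not the project's actual code: each callable mirrors one module's responsibility, and the function signatures are illustrative assumptions.

```python
from typing import Callable

def answer_query(
    query: str,
    embed: Callable[[str], list[float]],                  # embeddings.py
    ann_search: Callable[[list[float], int], list[str]],  # vectorstore.py
    rerank: Callable[[str, list[str]], list[str]],        # cross-encoder
    generate: Callable[[str, list[str]], str],            # llm.py
    top_k: int = 10,
) -> str:
    vector = embed(query)                    # 1. encode the query
    candidates = ann_search(vector, top_k)   # 2. fast ANN search over the index
    context = rerank(query, candidates)[:3]  # 3. keep only the best Top-3
    return generate(query, context)          # 4. grounded answer from the LLM
```

The point of the shape: only step 2 touches the whole corpus, and it is the cheap approximate one; the expensive reranker and LLM only ever see a handful of chunks.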
Every Python file in the backend/ folder has a single clear responsibility.
Entry point for all document processing. Handles PDF, DOCX, PPTX, XLSX, TXT, CSV, HTML. Extracts text per page, identifies table regions, and routes images to the extracted_images/ directory. Produces a uniform list of content objects for the embedding stage.
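Overlapping chunking is the step most likely to mangle STEM content, so it is worth seeing concretely. A minimal sketch, with illustrative defaults — the real ingestion.py's chunk sizes and boundary logic may differ:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into overlapping character windows.

    The overlap means an equation or table row that straddles a chunk
    boundary still appears intact in at least one chunk. chunk_size and
    overlap are illustrative defaults, not the project's settings.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - overlap, 1), step)]
```

Each window starts `chunk_size - overlap` characters after the previous one, so consecutive chunks share their boundary region.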
Wraps the chosen embedding model and exposes a clean embed(texts) interface. Handles batching, normalisation, and any model-specific quirks. Used by both the indexing pipeline and the query-time retrieval path so both use identical vector representations.
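The value of the wrapper is that batching and normalisation live in exactly one place. A sketch of that interface, with the inner encoder left as a placeholder (a real deployment would plug in the actual embedding model here):

```python
import math
from typing import Callable

class Embedder:
    """Single embed(texts) entry point shared by indexing and querying.

    encode_batch is a placeholder for the real model; batch_size is an
    illustrative default.
    """

    def __init__(self, encode_batch: Callable[[list[str]], list[list[float]]],
                 batch_size: int = 32):
        self.encode_batch = encode_batch
        self.batch_size = batch_size

    def embed(self, texts: list[str]) -> list[list[float]]:
        vectors: list[list[float]] = []
        # Batching: feed the model fixed-size slices, not the whole corpus.
        for start in range(0, len(texts), self.batch_size):
            vectors.extend(self.encode_batch(texts[start:start + self.batch_size]))
        # L2-normalise so cosine similarity reduces to a dot product.
        return [self._normalise(v) for v in vectors]

    @staticmethod
    def _normalise(v: list[float]) -> list[float]:
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / norm for x in v]
```

Because both pipelines go through the same object, an indexed chunk and a query are guaranteed to live in the same vector space.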
Thin abstraction layer over ChromaDB. Manages collection creation, bulk upsert of (vector, metadata, id) triples, and similarity queries. Loads the persistent chroma_db/ directory on startup so no re-indexing is needed between sessions.
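ChromaDB itself provides the persistence and the ANN index; the sketch below mirrors only the wrapper's interface — upsert of (vector, metadata, id) triples and a similarity query — using a plain in-memory dict and exact cosine search, so the shape of the abstraction is visible without the dependency:

```python
import math

class InMemoryVectorStore:
    """Toy stand-in for the ChromaDB wrapper: same upsert/query shape,
    exact cosine search instead of a persistent ANN index."""

    def __init__(self):
        self._items: dict[str, tuple[list[float], dict]] = {}

    def upsert(self, ids, vectors, metadatas):
        # Upsert semantics: an existing id is overwritten, not duplicated.
        for id_, vec, meta in zip(ids, vectors, metadatas):
            self._items[id_] = (vec, meta)

    def query(self, vector, top_k=10):
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a))
            nb = math.sqrt(sum(x * x for x in b))
            return dot / (na * nb) if na and nb else 0.0
        scored = [(cosine(vector, vec), id_, meta)
                  for id_, (vec, meta) in self._items.items()]
        scored.sort(key=lambda item: item[0], reverse=True)
        return scored[:top_k]
```

The real wrapper swaps this dict for a ChromaDB collection loaded from the persistent chroma_db/ directory, which is what makes re-indexing between sessions unnecessary.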
Orchestrates the two-stage retrieval: first a fast approximate-nearest-neighbour search over ChromaDB returns the top-K candidates, then the cross-encoder reranker rescores them and returns only the highest-confidence Top-3 chunks for context injection.
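The retrieve-then-rerank funnel can be sketched with the two scoring functions left abstract — in the real pipeline they are ChromaDB's ANN search and a trained cross-encoder; here they are placeholders:

```python
from typing import Callable

def retrieve(
    query: str,
    ann_search: Callable[[str, int], list[str]],  # stage 1: fast, approximate
    cross_score: Callable[[str, str], float],     # stage 2: slow, accurate
    top_k: int = 10,
    top_n: int = 3,
) -> list[str]:
    """Cheap ANN narrows the corpus to top_k candidates; the expensive
    cross-encoder rescores only those and keeps the best top_n."""
    candidates = ann_search(query, top_k)
    candidates.sort(key=lambda chunk: cross_score(query, chunk), reverse=True)
    return candidates[:top_n]
```

This is why the cross-encoder's per-pair cost is affordable: it runs top_k times per query, never once per corpus chunk.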
Manages LLM inference. Primary path uses the Claude API (key from claude api key.txt). Falls back to OpenRouter for alternative models (configured in openrouter_related.txt). Takes the ranked context, fills prompt.txt, and streams or returns the final answer.
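The context-injection step is simple string templating; a sketch of what filling the template might look like. The placeholder names and layout here are assumptions — the real prompt.txt is not shown in this document — and the API call itself is omitted:

```python
# Hypothetical template; the real prompt.txt may use different
# placeholder names and a different layout.
TEMPLATE = """Answer using only the context below.

Context:
{context}

Question: {question}
Answer:"""

def build_prompt(question: str, ranked_chunks: list[str]) -> str:
    # Number the Top-3 chunks so the model can refer to them by index.
    context = "\n\n".join(f"[{i + 1}] {chunk}"
                          for i, chunk in enumerate(ranked_chunks))
    return TEMPLATE.format(context=context, question=question)
```

The resulting string is what gets sent to Claude, or to an OpenRouter model on the fallback path.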
Top-level application orchestrator. Wires together all backend modules, handles the request/response lifecycle, and exposes the query interface. Reads config at startup to determine model settings, chunk parameters, and retrieval depth.
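One way the startup config-read might look — every key name and default below is an illustrative assumption, not the project's actual schema:

```python
from dataclasses import dataclass

@dataclass
class Settings:
    """Hypothetical startup configuration; key names and defaults are
    assumptions, not the project's actual config schema."""
    model: str = "claude-sonnet"
    chunk_size: int = 500
    chunk_overlap: int = 100
    top_k: int = 10   # ANN candidates (retrieval depth)
    top_n: int = 3    # chunks kept after reranking

def load_settings(raw: dict) -> Settings:
    # Unknown keys are ignored; missing keys fall back to the defaults above.
    known = set(Settings.__dataclass_fields__)
    return Settings(**{k: v for k, v in raw.items() if k in known})
```

Centralising these values is what lets model settings, chunk parameters, and retrieval depth change without touching the individual backend modules.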
Why reranking matters — and why it's critical for STEM content specifically.
The embedding model (bi-encoder) independently encodes the query and all document chunks into dense vectors once during indexing. At query time, ChromaDB's approximate nearest-neighbour search finds the top-K chunks in milliseconds. This is fast but approximate — some genuinely relevant chunks may score lower than less-relevant ones because bi-encoders don't model the interaction between query and passage directly.
The cross-encoder sees the full (query, passage) pair together and computes a deeper relevance score. It's too slow to run over the entire corpus, but rescoring only the 10–20 candidates from Stage 1 adds negligible latency per query. In STEM contexts, this is critical: a passage about "fluid dynamics pressure" may look geometrically similar to "blood pressure dynamics" — the cross-encoder correctly distinguishes them.
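The pressure example can be made concrete with a deliberately tiny toy: a bag-of-words cosine stands in for the bi-encoder (order-blind, query and passage scored independently of word order), and a bigram overlap stands in for the cross-encoder (order-aware, computed jointly over the pair). Real systems use trained neural models on both sides; this only illustrates why joint, order-sensitive scoring can flip a ranking.

```python
import math

def bow_cosine(a: str, b: str) -> float:
    """Order-blind similarity: a crude stand-in for bi-encoder vectors."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / (math.sqrt(len(sa)) * math.sqrt(len(sb)))

def bigram_score(query: str, passage: str) -> int:
    """Order-aware joint score: a crude stand-in for a cross-encoder."""
    def bigrams(s):
        words = s.lower().split()
        return set(zip(words, words[1:]))
    return len(bigrams(query) & bigrams(passage))

query = "fluid dynamics pressure"
off_topic = "blood pressure dynamics"
on_topic = ("the pressure term appears in the fluid dynamics "
            "equations for incompressible flow")

# Stage 1 (order-blind) prefers the short off-topic passage, because
# "pressure" and "dynamics" overlap and the on-topic passage is diluted
# by extra words; stage 2 (order-aware) recovers the right one via the
# intact phrase "fluid dynamics".
```

The failure mode is exactly the one in the table below: word-level similarity without pair-level interaction cannot tell which domain the shared terms belong to.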
Semantic embedding similarity alone struggles with:
| Challenge | Example | Reranker Fix |
|---|---|---|
| Symbol ambiguity | "σ" in stress vs. statistics | Context from surrounding text resolves domain |
| Formula lookups | "Navier-Stokes derivation" | Cross-encoder scores equation-heavy passages higher |
| Units & notation | N/m² vs. Pa vs. kPa | Joint query-passage model handles unit equivalence |
| Near-duplicate chunks | Same theorem, different notation | Reranker picks the clearest, most complete version |
Carefully chosen for local-first, privacy-preserving, STEM-optimised operation.
Core Stack
Document & Utility