Documents QAG
NVIDIA's RAG blueprint, remade deterministic
The NVIDIA RAG reference blueprint is among the most widely cloned enterprise retrieval starters. We forked it into the Enterprise Factory, swapped the embedding layer for Q-Prime, and routed reasoning through the QAG Engine, turning probabilistic RAG into deterministic QAG.
Same pipeline surface. Deterministic core.
We do not replace the NVIDIA RAG blueprint — we re-spine it. The ingestion, retrieval API, and tooling stay recognizable. The encoding layer becomes Q-Prime, and the reasoning layer becomes the QAG Engine.
Classical NVIDIA RAG
Retrieve similar chunks → LLM
1. Chunk the corpus and embed each chunk with SBERT-style embeddings.
2. Rank chunks by top-k cosine similarity to the question.
3. Stuff the top chunks into the generator's context window.
4. Generate, and hope the generator picks the right version, catches the contradiction, and notices the missing overlay.
Risk
Conflict, coverage gaps, and scope violations surface only after the generator speaks — if at all.
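Steps 2 and 3 of the classical pipeline can be sketched in a few lines. This is a minimal, generic illustration of top-k cosine retrieval plus context stuffing, not NVIDIA's code; the 2-D vectors, chunk ids, and texts are made-up toy values. Note that both policy versions are retrieved and passed to the generator with nothing marking that they disagree.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec: list[float], chunk_vecs: dict[str, list[float]], k: int) -> list[str]:
    """Step 2: ids of the k chunks most cosine-similar to the query."""
    ranked = sorted(chunk_vecs, key=lambda cid: cosine(query_vec, chunk_vecs[cid]), reverse=True)
    return ranked[:k]

def stuff_context(question: str, chunk_ids: list[str], chunk_text: dict[str, str]) -> str:
    """Step 3: concatenate retrieved chunks into one prompt; nothing flags conflicts."""
    context = "\n\n".join(chunk_text[cid] for cid in chunk_ids)
    return f"Context:\n{context}\n\nQuestion: {question}"

# Toy, hypothetical corpus: two versions of the same rule plus an unrelated chunk.
chunk_vecs = {"policy_v1": [0.9, 0.1], "policy_v2": [0.85, 0.2], "faq": [0.1, 0.9]}
chunk_text = {
    "policy_v1": "Version 1: condos are eligible.",
    "policy_v2": "Version 2: condos are not eligible.",
    "faq": "FAQ: how to contact support.",
}
hits = top_k([1.0, 0.1], chunk_vecs, k=2)
prompt = stuff_context("Are condos eligible?", hits, chunk_text)
print(hits)  # ['policy_v1', 'policy_v2']
```

Both contradictory versions land in the prompt, and the generator alone must notice the disagreement.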
Documents QAG
Q-Prime → HSC signals → QAG Engine
1. Chunk the corpus. Q-Prime encodes each chunk's polarity, scope, conditions, and dependencies.
2. Compute Relevance over the hypergraph, plus Conflict, Overlap, and Coverage signals.
3. The QAG Engine evaluates those signals against the question's scope and time frame.
4. The generator receives named structure, such as "these two sources conflict on scope X between dates Y and Z", and responds deterministically.
Outcome
Conflict, coverage gaps, and scope violations surface as first-class signals before generation — visible to the auditor.
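Step 4 hands the generator named structure instead of raw chunks. One ingredient of a deterministic response is a canonical input: the same set of findings should always render to byte-identical text. The sketch below shows that idea under stated assumptions; `Finding`, `render`, and the scope and source names are hypothetical, not the QAG Engine's interface. Canonical input alone does not make a generator deterministic; decoding settings matter too.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class Finding:
    """Hypothetical shape of one named signal passed to the generator."""
    kind: str                          # e.g. "Conflict", "Coverage"
    scope: str                         # scope the finding applies to
    sources: tuple[str, ...] = ()      # document ids involved, if any
    start: Optional[date] = None       # start of the affected date range
    end: Optional[date] = None         # end of the affected date range

def render(findings: list[Finding]) -> str:
    """Render findings in a canonical order so identical inputs give identical text."""
    lines = []
    for f in sorted(findings, key=lambda f: (f.kind, f.scope, tuple(sorted(f.sources)))):
        window = f" between {f.start.isoformat()} and {f.end.isoformat()}" if f.start and f.end else ""
        who = f": {', '.join(sorted(f.sources))}" if f.sources else ""
        lines.append(f"{f.kind} on scope {f.scope}{window}{who}")
    return "\n".join(lines)

findings = [
    Finding("Coverage", "state-overlay-TX"),
    Finding("Conflict", "condo-eligibility", ("guide-v2", "guide-v1"), date(2024, 1, 1), date(2024, 6, 30)),
]
print(render(findings))
# Conflict on scope condo-eligibility between 2024-01-01 and 2024-06-30: guide-v1, guide-v2
# Coverage on scope state-overlay-TX
```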
Six capabilities the base blueprint does not ship
Q-Prime encoding, not SBERT
The base NVIDIA RAG blueprint uses sentence-transformer embeddings. We replace that layer with Q-Prime — so polarity, scope, conditions, and cross-rule dependencies survive the chunking step instead of being averaged away.
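A toy way to see how pooling can wash out polarity: compare bag-of-words count vectors for a rule and its negation. This is only an intuition pump under a crude assumption (word counts as a stand-in for a pooled representation); real sentence-transformer embeddings are far richer, so it is not a measurement of SBERT. It shows why a single negation token barely moves an averaged vector.

```python
import math
from collections import Counter

def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity of bag-of-words count vectors (toy stand-in for pooled embeddings)."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm

allow = "the borrower is eligible for the loan"
deny = "the borrower is not eligible for the loan"
print(round(bow_cosine(allow, deny), 3))  # 0.949
```

Opposite decisions score about 0.95 similar, which is the kind of signal loss an encoding that keeps polarity explicit is meant to avoid.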
HSC signals over similarity scores
Retrieval no longer returns "top-k cosine-similar chunks". It returns the seven HSC signals: Relevance, Conflict, Overlap, Redundancy, Coverage, Coherence, Topology. The generator sees structure, not a similarity vector.
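To make the contrast in result shape concrete, here is a hypothetical typing of both. `HSCSignal` mirrors the seven signal names listed above; `ClassicalResult`, `SignalResult`, and the finding strings are illustrative assumptions, not the product's API.

```python
from dataclasses import dataclass, field
from enum import Enum

class HSCSignal(Enum):
    """The seven signal names from the text; the enum itself is illustrative."""
    RELEVANCE = "Relevance"
    CONFLICT = "Conflict"
    OVERLAP = "Overlap"
    REDUNDANCY = "Redundancy"
    COVERAGE = "Coverage"
    COHERENCE = "Coherence"
    TOPOLOGY = "Topology"

# Classical retrieval result: a ranked list of (chunk id, similarity score).
ClassicalResult = list[tuple[str, float]]

@dataclass
class SignalResult:
    """Hypothetical structured result: findings grouped under each named signal."""
    findings: dict[HSCSignal, list[str]] = field(
        default_factory=lambda: {s: [] for s in HSCSignal}
    )

    def fired(self) -> list[HSCSignal]:
        """Signals with at least one finding, in declaration order."""
        return [s for s in HSCSignal if self.findings[s]]

result = SignalResult()
result.findings[HSCSignal.CONFLICT].append("guide-v1 vs guide-v2 on condo-eligibility")
print([s.value for s in result.fired()])  # ['Conflict']
```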
Conflict surfaces before generation
When two policy versions disagree, classical RAG serves both chunks and hopes the generator picks correctly. Documents QAG emits an explicit Conflict signal — the generator is told "these sources disagree on this scope" before it writes a word.
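A minimal sketch of the condition a Conflict signal implies, assuming each encoded rule carries a scope, a polarity, and an inclusive validity window (all hypothetical fields, not Q-Prime's actual encoding): two rules conflict when they share a scope, disagree in polarity, and their windows overlap. The overlap interval is what lets the generator be told "between dates Y and Z".

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Rule:
    """Hypothetical encoded rule: one statement from one document version."""
    source: str
    scope: str
    polarity: int        # +1 permits, -1 prohibits (illustrative encoding)
    valid_from: date
    valid_to: date       # inclusive end of the validity window

def conflicts(rules: list[Rule]) -> list[tuple[str, str, str, date, date]]:
    """Pairs sharing a scope, disagreeing in polarity, with overlapping windows."""
    found = []
    for i, a in enumerate(rules):
        for b in rules[i + 1:]:
            if a.scope != b.scope or a.polarity == b.polarity:
                continue
            start, end = max(a.valid_from, b.valid_from), min(a.valid_to, b.valid_to)
            if start <= end:  # the two windows overlap
                found.append((a.source, b.source, a.scope, start, end))
    return found

v1 = Rule("guide-v1", "condo-eligibility", +1, date(2023, 1, 1), date(2024, 6, 30))
v2 = Rule("guide-v2", "condo-eligibility", -1, date(2024, 1, 1), date(2024, 12, 31))
print(conflicts([v1, v2]))
```

Comparing every pair is quadratic; a real index would group by scope first, but the pairwise form keeps the condition easy to read.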
Coverage gaps are named
If the required regulatory overlay is missing from the corpus, QAG names the gap as a Coverage signal. Classical RAG has no such signal, so the generator may silently hallucinate to fill the gap. A compliance team can route the gap to a rule-curation task rather than ship a wrong decision.
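At its simplest, naming a Coverage gap is a set difference between the scopes a question requires and the scopes the corpus covers. The sketch assumes both are already known as sets of scope labels (hypothetical names); deciding which overlays a question requires is the hard part and is not shown. Sorting keeps the output stable across runs.

```python
def coverage_gaps(required: set[str], present_scopes: set[str]) -> list[str]:
    """Required scopes with no supporting document in the corpus, sorted for stable output."""
    return sorted(required - present_scopes)

required = {"base-guideline", "investor-overlay", "state-overlay-TX"}
present = {"base-guideline", "investor-overlay"}
for gap in coverage_gaps(required, present):
    print(f"Coverage gap: {gap} -> route to rule curation")
```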
Replay-grade provenance
Every chunk that enters the reasoning graph carries its source, version, and validity window. When a decision is replayed later, you see exactly which document version — at which date — produced the output.
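Replaying a decision comes down to asking which version of a source was in force on a given date. A minimal sketch, assuming each chunk's provenance record holds a source, a version label, and a validity window with an open end for the current version; `Version`, `version_at`, and the document names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class Version:
    """Hypothetical provenance record carried by each chunk."""
    source: str
    version: str
    valid_from: date
    valid_to: Optional[date]   # None means still in force

def version_at(versions: list[Version], source: str, when: date) -> Optional[Version]:
    """Return the version of `source` whose validity window contains `when`, if any."""
    for v in versions:
        if v.source == source and v.valid_from <= when and (v.valid_to is None or when <= v.valid_to):
            return v
    return None

history = [
    Version("underwriting-guide", "v1", date(2023, 1, 1), date(2024, 3, 31)),
    Version("underwriting-guide", "v2", date(2024, 4, 1), None),
]
print(version_at(history, "underwriting-guide", date(2024, 2, 15)).version)  # v1
```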
Drop-in for the NVIDIA RAG pipeline
The Documents QAG fork preserves the NVIDIA RAG ingestion, retrieval API surface, and tooling. Teams already running the reference blueprint swap encoding and reasoning in-place — and keep their existing pipelines, observability, and scaffolding.
First home: Financial Services
Documents QAG is designed for mortgage overlays, underwriting guidelines, investor compliance corpora, and other policy-heavy workflows. The same blueprint ports to credit, claims, AML/KYC, and other regulated domains.
Questions teams ask before they migrate from NVIDIA RAG.
What is Documents QAG?
How is this different from the classical NVIDIA RAG blueprint?
Is Documents QAG a full rewrite, or a migration?
Which industries is Documents QAG suited for?
How do I evaluate it?
Already running the NVIDIA RAG blueprint?
We will help you migrate the encoding and reasoning layers in place, keep your existing ingestion and observability, and retire the probabilistic core of the pipeline.