Active Delivery · Integrations · NVIDIA-AI-Blueprints/rag

Documents QAG

NVIDIA's RAG blueprint, remade deterministic

The NVIDIA RAG reference blueprint is the most-cloned enterprise retrieval starter there is. We forked it into the Enterprise Factory, swapped the embedding layer for Q-Prime, and routed reasoning through the QAG Engine — turning probabilistic RAG into deterministic QAG.

The migration

Same pipeline surface. Deterministic core.

We do not replace the NVIDIA RAG blueprint — we re-spine it. The ingestion, retrieval API, and tooling stay recognizable. The encoding layer becomes Q-Prime, and the reasoning layer becomes the QAG Engine.

Classical NVIDIA RAG

Retrieve similar chunks → LLM

  1. Chunk the corpus. SBERT-style embeddings.
  2. Top-k cosine similarity on the question.
  3. Stuff the chunks into the generator's context window.
  4. Generate. Hope the generator picks the right version, catches the contradiction, notices the missing overlay.
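
For reference, a minimal sketch of that classical flow, assuming a sentence-transformers model; `load_corpus_chunks` and `generate` are stand-ins for whatever ingestion and LLM client the deployment already wraps.

```python
# Minimal classical RAG loop: embed, rank by cosine similarity, stuff the context.
# `load_corpus_chunks` and `generate` are assumed wrappers, not blueprint APIs.
from sentence_transformers import SentenceTransformer, util

chunks = load_corpus_chunks()                       # pre-chunked policy text
encoder = SentenceTransformer("all-MiniLM-L6-v2")   # SBERT-style embeddings
chunk_vecs = encoder.encode(chunks, convert_to_tensor=True)

def answer(question: str, k: int = 5) -> str:
    q_vec = encoder.encode(question, convert_to_tensor=True)
    hits = util.semantic_search(q_vec, chunk_vecs, top_k=k)[0]
    context = "\n\n".join(chunks[hit["corpus_id"]] for hit in hits)
    # The generator alone must notice conflicting versions or missing overlays here.
    return generate(f"Context:\n{context}\n\nQuestion: {question}")
```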

Risk

Conflict, coverage gaps, and scope violations surface only after the generator speaks — if at all.

Documents QAG

Q-Prime → HSC signals → QAG Engine

  1. Chunk the corpus. Q-Prime encodes polarity, scope, conditions, dependencies.
  2. Relevance over the hypergraph — plus Conflict, Overlap, Coverage signals.
  3. The QAG Engine evaluates the signals against the question's scope and time frame.
  4. Generator receives named structure — "these two sources conflict on scope X between dates Y and Z" — and responds deterministically.
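
The same four steps, as a sketch only. `QPrimeEncoder`, `QAGEngine`, `generate_with_structure`, and the signal fields below are illustrative names, not the shipped client API.

```python
# Illustrative QAG flow: encode with Q-Prime, evaluate HSC signals, then generate.
# Every name here is a hypothetical stand-in used for exposition.
encoder = QPrimeEncoder()                            # keeps polarity, scope, conditions, dependencies
graph = encoder.encode_corpus(load_corpus_chunks())  # hypergraph rather than a flat vector index

engine = QAGEngine(graph)
signals = engine.evaluate(question, scope="state:FL", as_of="2024-06-30")

# Conflicts are named before the generator writes a word, e.g.
# "Source A and Source B disagree on scope X between dates Y and Z".
notes = [f"Conflict: {finding}" for finding in signals.conflict]
answer = generate_with_structure(question, signals, notes=notes)
```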

Outcome

Conflict, coverage gaps, and scope violations surface as first-class signals before generation — visible to the auditor.

What the fork adds

Six capabilities the base blueprint does not ship

Encoding

Q-Prime encoding, not SBERT

The base NVIDIA RAG blueprint uses sentence-transformer embeddings. We replace that layer with Q-Prime — so polarity, scope, conditions, and cross-rule dependencies survive the chunking step instead of being averaged away.
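
One way to picture the difference at encode time. The `qprime` output shown is an illustrative assumption about what a structure-preserving encoding records, not the actual representation; `sbert` is any sentence-transformers model.

```python
# Illustrative contrast only: a flat embedding vs a structure-preserving encoding.
# `sbert` is a sentence-transformers model; `qprime` and its output are hypothetical.
rule = "Loans above 80% LTV require mortgage insurance unless the state overlay applies."

flat = sbert.encode(rule)
# -> a single dense vector; the obligation and its carve-out are averaged together

structured = qprime.encode(rule)
# -> e.g. {"polarity": "obligation", "scope": {"ltv": "> 0.80"},
#          "conditions": ["state overlay not applicable"],
#          "dependencies": ["mortgage insurance definition"]}
```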

Reasoning

HSC signals over similarity scores

Retrieval no longer returns "top-k cosine-similar chunks". It returns the seven HSC signals: Relevance, Conflict, Overlap, Redundancy, Coverage, Coherence, Topology. The generator sees structure, not a similarity vector.
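
A sketch of what the retrieval layer might hand downstream: a record carrying the seven named signals rather than one score. The field names and types are assumptions for illustration, not the shipped schema.

```python
# Hypothetical shape of a retrieval result: named signals, not a similarity score.
from dataclasses import dataclass, field

@dataclass
class HSCSignals:
    relevance: float                                    # how directly the material answers the question
    conflict: list[str] = field(default_factory=list)   # descriptions of sources that disagree
    overlap: float = 0.0                                # shared scope across retrieved rules
    redundancy: float = 0.0                             # near-duplicate material in the result set
    coverage: float = 0.0                               # fraction of the question's scope actually answered
    coherence: float = 0.0                              # internal consistency of the retrieved set
    topology: dict[str, list[str]] = field(default_factory=dict)  # dependency edges between rules
```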

Deterministic

Conflict surfaces before generation

When two policy versions disagree, classical RAG serves both chunks and hopes the generator picks correctly. Documents QAG emits an explicit Conflict signal — the generator is told "these sources disagree on this scope" before it writes a word.

Audit

Coverage gaps are named

If the required regulatory overlay is missing from the corpus, QAG names the gap as a Coverage signal. Classical RAG would silently hallucinate to fill it. A compliance team can route the gap to a rule-curation task rather than ship a wrong decision.
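
A sketch of how a compliance pipeline could act on that signal, assuming a `signals` object shaped like the illustration above; `task_queue` stands in for whatever ticketing or workflow client the team already runs.

```python
# Route a named Coverage gap to curation instead of generating around it.
# `signals`, `task_queue`, and `generate_with_structure` are illustrative names.
COVERAGE_FLOOR = 0.95

if signals.coverage < COVERAGE_FLOOR:
    task_queue.create(
        kind="rule-curation",
        summary="Required regulatory overlay missing from corpus",
        details={"question": question, "coverage": signals.coverage},
    )
else:
    answer = generate_with_structure(question, signals)
```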

Provenance

Replay-grade provenance

Every chunk that enters the reasoning graph carries its source, version, and validity window. When a decision is replayed later, you see exactly which document version — at which date — produced the output.
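
A sketch of the provenance record each chunk might carry, with field names chosen to match the description above rather than taken from the product.

```python
# Illustrative provenance attached to every chunk in the reasoning graph.
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Provenance:
    source: str             # the guideline or overlay document
    version: str            # the specific published revision
    valid_from: date        # start of the validity window
    valid_to: date | None   # None while the revision is still in force

def replay_view(chunks, as_of: date):
    """Keep only the chunk versions that were in force on the replay date."""
    return [
        c for c in chunks
        if c.provenance.valid_from <= as_of
        and (c.provenance.valid_to is None or as_of <= c.provenance.valid_to)
    ]
```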

Migration

Drop-in for the NVIDIA RAG pipeline

The Documents QAG fork preserves the NVIDIA RAG ingestion, retrieval API surface, and tooling. Teams already running the reference blueprint swap encoding and reasoning in-place — and keep their existing pipelines, observability, and scaffolding.
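
In practice the swap can be pictured as two constructor changes inside the existing pipeline. The names below are again illustrative, and the surrounding ingestion and retrieval API are assumed to stay as the blueprint ships them.

```python
# Before (simplified): the blueprint's sentence-transformer embedder and top-k retriever.
#   encoder   = SentenceTransformer(EMBEDDING_MODEL)
#   retriever = VectorStoreRetriever(encoder, vector_store, top_k=5)

# After: the Documents QAG fork, with illustrative names and the same call sites upstream.
encoder = QPrimeEncoder()                       # structure-preserving encoding layer
retriever = QAGEngine(encoder, corpus_graph)    # emits HSC signals instead of top-k chunks

# Ingestion jobs, the retrieval API endpoints, and observability keep calling
# the same interfaces they called before the swap.
```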

Where it ships

First home: Financial Services

Documents QAG is designed for mortgage overlays, underwriting guidelines, investor compliance corpora, and other policy-heavy document sets. The same blueprint ports into credit, claims, AML/KYC, and analogous regulated workflows.

Frequently asked

Questions teams ask before they migrate from NVIDIA RAG.

What is Documents QAG?
Documents QAG is the QGI Enterprise Factory fork of NVIDIA's RAG reference blueprint. It preserves NVIDIA's ingestion, retrieval API surface, and tooling — so teams already running the reference blueprint can migrate in place — while replacing the probabilistic core: the embedding layer becomes Q-Prime, retrieval becomes the seven HSC signals, and generation is routed through the QAG Engine. The result is a drop-in deterministic upgrade for regulated retrieval pipelines.
How is this different from the classical NVIDIA RAG blueprint?
Classical NVIDIA RAG uses sentence-transformer embeddings, retrieves top-k cosine-similar chunks, and hopes the generator composes a defensible answer. Polarity, scope, conditions, and cross-rule dependencies get averaged into a similarity score. Documents QAG replaces retrieval with Quantum-Augmented Generation: Q-Prime preserves structure at encode time, and the QAG Engine surfaces Relevance, Conflict, Overlap, Redundancy, Coverage, Coherence, and Topology as explicit signals before generation. Conflicts between two policy versions become a Conflict signal — not a silent averaging.
Is Documents QAG a full rewrite, or a migration?
A migration. The Documents QAG fork is designed to be a drop-in swap for the encoding and reasoning layers of an existing NVIDIA RAG deployment. Ingestion, the retrieval API surface, observability, and orchestration stay in place. You swap embeddings to Q-Prime and route reasoning through the QAG Engine client — and retire the probabilistic core of the pipeline, not the whole stack.
Which industries is Documents QAG suited for?
Financial services is live today — mortgage compliance, guideline intake, underwriting narratives, investor-overlay review. Insurance, Healthcare, and Government are on the roadmap as waitlisted verticals. The common pattern is regulated document retrieval where a wrong answer is not a UX annoyance but an audit event.
How do I evaluate it?
Start by cloning the public repo from GitHub and reading the migration guide, or contact us to scope a pilot — we'll run a structured evaluation against your existing NVIDIA RAG baseline with a specific regulated workflow in mind. Pilots come with a signed governance contract, model cards, signal-level dashboards, and replay tooling from day one.

Already running the NVIDIA RAG blueprint?

We will help you migrate the encoding and reasoning layers in place, keep your existing ingestion and observability, and retire the probabilistic core of the pipeline.

Partner with QGI