Deep Dive · #QAG · #Architecture · #Deterministic AI · 12 min read

From black-box RAG to deterministic QAG: a map of the five-layer stack

The QGI stack replaces retrieval-augmented generation with Quantum-Augmented Generation. Here is how each of the five layers — Q-Prime, QAG Engine, Qualtron, QGM (Quantum Graph Memory), Neural Symbolic Agents — removes a specific class of probabilistic failure from regulated decisions.

Dr. Sam Sammane

Co-Founder & Chief Scientist

The regulated-AI problem in one paragraph

When a mortgage underwriter denies a loan, an adjuster rules on a claim, or a compliance officer declares an AML exception, the output isn't just an answer — it's a defensible record. Someone, somewhere, will read that record in a deposition or a regulatory examination. The reasoning has to survive that reading.

The trouble with retrieval-augmented generation is that it produces an answer the model cannot defend even to itself. Ask why the model returned a particular conclusion and you get a post-hoc rationalization, not the actual chain of reasoning. That gap — between the model's computation and its explanation — is where regulated-AI deployments break.

The five-layer QGI stack exists to close that gap by construction. Each layer is designed so that “the answer” and “the defense of the answer” are the same artifact.

Layer 1 — Q-Prime: encoding that preserves meaning

Standard embedding models compress a document into a dense vector and then lose the structure of what was said. Scope, polarity, negation, conditionality, cross-rule dependency — all of it collapses into a point in high-dimensional space. That's fine for “find similar documents.” It's ruinous for “reason about conditional regulatory clauses.”
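The polarity loss is easy to demonstrate even without a real embedding model. The toy below uses bag-of-words vectors and cosine similarity (a cruder stand-in for dense embeddings, so the effect is illustrative, not a measurement of any specific model): two sentences stating opposite obligations come out nearly identical, because the entire polarity difference lives in one token.

```python
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm

rule = Counter("additional verification is required for this income".split())
negated = Counter("additional verification is not required for this income".split())

# Opposite obligations, yet ~0.94 similarity: polarity is one token,
# and similarity search treats it as noise.
print(round(cosine(rule, negated), 3))
```

Real dense embeddings blur the distinction less crudely than raw word counts, but the failure mode is the same: "most similar" is not "same meaning," and in a regulated clause the one-token difference is the decision.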

Q-Prime encodes the document into a quantum hypergraph instead of a flat vector. A hypergraph edge can connect any number of nodes — so a rule like “if the applicant has less than 24 months of employment AND the property is non-owner-occupied, THEN the income calculation requires additional verification” lives as a single structured object, not as three fragmented phrases.
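Q-Prime's actual encoding is proprietary, so the sketch below is only a classical illustration of why a hyperedge preserves what a flat vector loses: the two conditions and the consequent stay attached as one object. All field and node names here are invented for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class HyperEdge:
    """One rule as a single structured object: an edge joining
    every condition node and the consequent node at once."""
    conditions: frozenset  # all antecedent nodes, conjoined
    consequent: str        # the node the rule triggers

# The mortgage rule from the text as a single hyperedge,
# rather than three fragmented phrases in a vector store.
rule = HyperEdge(
    conditions=frozenset({
        "employment_months < 24",
        "occupancy == non_owner_occupied",
    }),
    consequent="income_calc.requires_additional_verification",
)

# Scope is recoverable: every condition is still attached
# to the consequent it governs.
assert len(rule.conditions) == 2
```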

Q-Prime ships today on HuggingFace under QGI's Commercial Model License. It's the only part of the stack you can pull down and instrument this afternoon.

Layer 2 — QAG Engine: reasoning that can be replayed

The QAG Engine is what replaces the “retrieve, rank, feed-to-LLM” pipeline. Instead of handing a stack of documents to a general-purpose model, QAG runs a reasoning pass over the Q-Prime hypergraph and produces seven interpretable signals that summarize the decision space:

  • Polarity — does the evidence support or contradict the claim?
  • Scope — under what conditions does the rule apply?
  • Coherence — are the retrieved facts internally consistent?
  • Conflict — which rules or facts disagree, explicitly?
  • Provenance — where did each fact come from?
  • Temporality — is the rule/fact still current?
  • Confidence — how robust is the signal under perturbation?

Those seven signals are the Hilbert-Space Compacting layer. They are computed, stored, and replayable. If a regulator asks, two years after a decision, “why did your AI deny this claim?”, the answer is not a re-run of the model. It's the persisted seven-signal profile at decision time.
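What "computed, stored, and replayable" could look like in practice: a minimal sketch of a persisted seven-signal record. The field names follow the list above; the value types, ranges, and example values are assumptions for illustration, not QGI's actual schema.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class SignalProfile:
    """The seven interpretable signals, persisted at decision time."""
    polarity: float    # -1 (contradicts) .. +1 (supports)
    scope: str         # conditions under which the rule applies
    coherence: float   # 0..1, internal consistency of retrieved facts
    conflict: list     # explicit list of disagreeing rules/facts
    provenance: list   # source identifier for each fact
    temporality: str   # rule version in force at decision time
    confidence: float  # 0..1, robustness under perturbation

profile = SignalProfile(
    polarity=-0.8,
    scope="non-owner-occupied, <24 months employment",
    coherence=0.95,
    conflict=[],
    provenance=["guideline_2026_rev3"],
    temporality="2026-03-01",
    confidence=0.91,
)

# Persist the profile with the decision. A replay years later
# reads this record instead of re-running any model.
record = json.dumps(asdict(profile))
```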

RAG

  • Retrieval score is probabilistic
  • Contradictions are averaged away
  • No persisted intermediate state
  • Explanation is post-hoc

QAG

  • Retrieval is structural, not semantic
  • Contradictions surface as explicit signals
  • Seven-signal state is persisted
  • Decision IS its own explanation

The QAG Engine is in public preview. Early pilots run it alongside existing RAG pipelines for A/B comparison on regulated workloads.

Layer 3 — Qualtron: generation at regulated precision

Once the QAG Engine has produced its seven-signal profile, something still has to generate the written output — the credit memo, the claim determination, the compliance narrative. Qualtron is that generation layer.

Qualtron is a composite architecture of specialized small models that together provide a 4M-token working context. It's designed specifically to sit inside the QAG Engine and respect the seven-signal profile rather than override it — so the text the customer actually reads matches the reasoning the engine actually did.

Qualtron is coming soon. The waitlist is open for customers who want to help shape the first regulated-generation release.

Layer 4 — Quantum Graph Memory (QGM): time as a first-class citizen

Quantum Graph Memory (QGM) is the time-aware substrate that keeps QAG reasoning consistent across sessions, audits, and regulatory replays. In a regulated workflow, the question isn't just what did the system decide? — it's what did the system decide, given what it knew at the time?

QGM preserves, for every fact, decision, and revision, the provenance and the temporal context. When a rule changes in 2027, your 2026 decisions don't silently become wrong — they stay defended against the rule that was in force when they were made. This is the difference between an AI system that is “explainable today” and one that is replayable tomorrow.
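The temporal lookup this implies can be shown in a few lines. This is a classical sketch of the replay semantics, not QGM's internals: rule versions carry their effective dates (the dates and rule texts below are invented), and a decision replays against whichever version was in force at decision time.

```python
from datetime import date

# Illustrative rule history: the rule changes in 2027.
RULE_VERSIONS = [
    (date(2025, 1, 1), "verification required if employment < 24 months"),
    (date(2027, 1, 1), "verification required if employment < 12 months"),
]

def rule_in_force(as_of: date) -> str:
    """Return the rule text in force at decision time,
    not the one in force today."""
    current = None
    for effective, text in RULE_VERSIONS:  # sorted by effective date
        if effective <= as_of:
            current = text
    if current is None:
        raise ValueError(f"no rule in force on {as_of}")
    return current

# A 2026 decision stays defended against the 2025 rule,
# even after the 2027 revision lands.
assert "24 months" in rule_in_force(date(2026, 6, 1))
assert "12 months" in rule_in_force(date(2027, 6, 1))
```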

QGM is coming soon.

Layer 5 — Neural Symbolic Agents: orchestration with provenance

Neural Symbolic Agents (NSA) is the runtime layer — the execution and orchestration environment that turns the lower four layers into a multi-agent regulated workflow. It brings persistent memory, dependency tracking, and conflict coordination to agent systems that other frameworks treat as stateless chains of LLM calls.

If QAG is the reasoning kernel, NSA is the operating system. It's how you build an adjuster-copilot, an underwriter-copilot, or a compliance-review pipeline that can hand off between agents without losing the chain of custody.
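The chain-of-custody hand-off can be sketched classically: instead of each LLM call starting stateless, the work item carries its full transfer history. NSA's actual runtime is not public, so the structure and agent names below are assumptions chosen to mirror the copilot examples in the text.

```python
from dataclasses import dataclass, field

@dataclass
class WorkItem:
    """A unit of regulated work passed between agents,
    carrying its custody trail with it."""
    payload: dict
    custody: list = field(default_factory=list)

def hand_off(item: WorkItem, from_agent: str, to_agent: str) -> WorkItem:
    """Record the transfer rather than discarding state between calls."""
    item.custody.append(f"{from_agent} -> {to_agent}")
    return item

case = WorkItem(payload={"claim_id": "C-1042"})
case = hand_off(case, "intake-agent", "adjuster-copilot")
case = hand_off(case, "adjuster-copilot", "compliance-review")

# The full chain of custody survives every hand-off.
assert case.custody == [
    "intake-agent -> adjuster-copilot",
    "adjuster-copilot -> compliance-review",
]
```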

Neural Symbolic Agents is available under enterprise engagement today.

Delivery — Enterprise Blueprints

The five-layer stack is delivered through Enterprise Blueprints — reference integrations, mostly built on top of NVIDIA AI Blueprints, that wire Q-Prime and QAG into regulated workflows: a mortgage blueprint, a claims blueprint, a compliance blueprint. You don't start from a blank-sheet integration; you start from a blueprint that already ships with deterministic reasoning baked in.

The bigger picture

The easy mental model for QGI is that we're building a better RAG. We're not. We are building the decision-grade replacement for RAG in regulated industries — a stack where every layer is designed around the question that actually matters in credit, claims, compliance, and risk scoring: can you defend this output?

The answer the stack is designed to produce — at every layer — is yes, and here is the record.

Run a pilot

We run a small number of enterprise pilots per quarter. If you need to defend an AI-driven decision under audit, tell us about the workflow.

Request enterprise pilot

Read next

Why Q-Prime has no public weights — Dain Ehring on what the managed-access license means for regulated buyers.
