Viewport 04 · How injection works

How a billion-node graph runs locally

The injection is two-layer: a parametric expert bank holds the structure and the rubric, while retrieval carries the billion-node detail. Here is how the graph reaches a model the client runs offline.

KiB per LoRA adapter (one knowledge unit)

to train one expert (single GPU)

add / update / delete, no backbone retrain

node target the retrieval layer carries
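
For a rough sense of why one adapter can be measured in KiB, the storage cost of a LoRA pair can be worked out from its shapes. The sketch below is a back-of-envelope calculation, not the figures from this page: the layer width, rank, and 16-bit precision are hypothetical values chosen for illustration, and `lora_adapter_bytes` is a helper name introduced here.

```python
def lora_adapter_bytes(d_in: int, d_out: int, rank: int, bytes_per_param: int = 2) -> int:
    """Bytes needed to store one LoRA pair for a single weight matrix.

    LoRA replaces a full d_out x d_in update with two low-rank factors:
    A with shape (rank, d_in) and B with shape (d_out, rank).
    """
    params = rank * (d_in + d_out)
    return params * bytes_per_param


# Hypothetical shapes (assumptions, not values from the source):
# one 1024 x 1024 projection, rank 4, 16-bit weights.
size_bytes = lora_adapter_bytes(d_in=1024, d_out=1024, rank=4)
print(f"{size_bytes / 1024:.1f} KiB")  # 16.0 KiB
```

The cost grows linearly with rank and with the number of adapted matrices, so an adapter that touches several layers scales this figure by the number of matrices it covers.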

Why hybrid: parametric injection on its own stays faithful when sources conflict but loses fine-grained facts, while retrieval carries the detail; combined, the two outperform either one alone (arXiv 2510.12668). DMoE (arXiv 2606.14243) makes each knowledge unit a hot-swappable adapter, which is what a graph that grows nightly needs. Agents-K1 (arXiv 2606.13669) is the document-to-graph factory that feeds it.
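
The two layers and the add / update / delete lifecycle can be sketched as a small registry. This is a minimal illustration under stated assumptions, not the DMoE or Agents-K1 implementation: `Adapter`, `ExpertBank`, and `build_request` are hypothetical names, adapter weights are stood in for by opaque bytes, and retrieval is represented by a list of already-fetched passages.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Adapter:
    """One knowledge unit: an opaque weight blob plus a version counter."""
    unit_id: str
    weights: bytes
    version: int = 1


@dataclass
class ExpertBank:
    """Parametric layer: adapters keyed by knowledge unit, backbone untouched."""
    adapters: Dict[str, Adapter] = field(default_factory=dict)

    def add(self, unit_id: str, weights: bytes) -> None:
        if unit_id in self.adapters:
            raise KeyError(f"unit already exists: {unit_id}")
        self.adapters[unit_id] = Adapter(unit_id, weights)

    def update(self, unit_id: str, weights: bytes) -> None:
        # Swap in new weights for one unit; no other adapter is affected.
        old = self.adapters[unit_id]
        self.adapters[unit_id] = Adapter(unit_id, weights, old.version + 1)

    def delete(self, unit_id: str) -> None:
        del self.adapters[unit_id]


def build_request(bank: ExpertBank, unit_ids: List[str],
                  retrieved: List[str], query: str) -> dict:
    """Combine both layers: active adapters (structure) plus retrieved detail."""
    active = [bank.adapters[u] for u in unit_ids if u in bank.adapters]
    return {
        "query": query,
        "adapters": [(a.unit_id, a.version) for a in active],
        "context": list(retrieved),
    }


bank = ExpertBank()
bank.add("unit-a", b"\x00" * 8)
bank.add("unit-b", b"\x01" * 8)
bank.update("unit-a", b"\x02" * 8)   # nightly refresh of one unit
bank.delete("unit-b")                # retire a stale unit

request = build_request(bank, ["unit-a", "unit-b"], ["passage 1"], "example query")
print(request["adapters"])
```

Keeping each knowledge unit behind its own key means a nightly graph update only touches the adapters whose units changed, while the retrieval side supplies whatever fine-grained passages the query needs.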