Capability without custody.
A private-by-construction intelligence stack: ShurIQ's accumulated knowledge graph, injected into an open-weights model the client runs entirely on its own hardware.
Frontier models do ShurIQ's own higher-level work: building the graph, training the scoring rubric, running the fact-check ledger. What ships to the client runs entirely locally. The confidential research never reaches a hosted API.
The custody trap
Two firms now sit between every enterprise and the frontier of language-model capability, and both default to keeping a copy of what passes through. For a fund running adversarial research on a position it has not yet taken, the act of sending that research to a hosted model is itself the exposure.
The capability is wanted. The custody is the problem. ShurIQ sells the capability and removes the custody.
The market moved underneath the duopoly
Distrust of the OpenAI / Anthropic frontier-API duopoly is now operational, and it carries dates and penalties.
OpenAI and Anthropic each retain API inputs for 30 days by default; zero-data-retention is reserved for eligible enterprise contracts, not the standard tier most teams run on. On 2026-06-09 Anthropic's 30-day policy for Claude Fable 5 led Microsoft to restrict employee access pending legal review.
Deutsche Bank, Goldman Sachs, JPMorgan, Bank of America, Citigroup, and Wells Fargo have already restricted or banned public ChatGPT inside their walls. A silent ChatGPT leakage flaw ran from December 2025 to February 2026 before it was patched, a live demonstration of the custody risk.
The regulatory layer hardens the same point. The EU AI Act's full high-risk obligations land on 2026-08-02, carrying fines up to €35M or 7% of global turnover. DORA has applied to financial entities since 2025-01-17. GDPR treats training on EU data over non-EEA infrastructure as a cross-border transfer. A firm operating in the EU faces all three at once.
The market reads this as structural. Enterprise LLM spend is roughly $5.9–8.2B in 2026 and projected to reach $48.25B by 2034. On-premises and hybrid deployments could absorb 30–40% of regulated US workloads by 2027, and Deloitte projects more than 70% of enterprises scaling on-prem or edge inference by 2028. Sovereign AI is now a board-level line item.
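The two spend figures imply a compound annual growth rate. A quick check of that arithmetic, assuming 2026 as the base year and eight years of compounding to 2034; the helper name is ours, not from any source:

```python
def implied_cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate that turns `start` into `end` over `years` years."""
    return (end / start) ** (1 / years) - 1

YEARS = 2034 - 2026  # eight compounding periods

# Market figures from the text, in $B. The high 2026 estimate gives the lower rate.
slow = implied_cagr(8.2, 48.25, YEARS)
fast = implied_cagr(5.9, 48.25, YEARS)
print(f"implied CAGR: {slow:.1%} to {fast:.1%}")
```

With these inputs the implied growth lands at roughly 25% to 30% a year.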
What ShurIQ ships, and where the frontier model stays
The line is precise. Frontier models (Claude Opus 4.8, Sonnet 4.6) do ShurIQ's own higher-level work, inside ShurIQ. What ships to the client runs entirely on the client's own hardware, and the client's confidential research never reaches a hosted API.
Runs on client hardware
- Frozen open-weights base model (Apache / MIT family)
- Per-client LoRA expert bank, compiled from the brand-vertical graph
- Retrieval index over the relevant graph slice (Oxigraph SPARQL)
- Agent CLI / Report Engine, behind a local endpoint
- SBPI scoring the client re-computes as its graph grows
Stays inside ShurIQ (R&D)
- Frontier-model graph construction & rubric training
- The multi-agent fact-check ledger
- Authoring of the editorial render harness
- The compounding billion-node knowledge graph (full)
- The vertical structural priors that grow it
Open weights closed the frontier gap during 2026: DeepSeek V4 Pro leads the open agentic field, joined by Qwen 3.6, Mistral's Apache-licensed line for the EU, and Llama 4 Scout with its long context. The precondition for a local-run wedge is met: a client can run a model good enough for the work on a machine it owns. ShurIQ holds the part that stays scarce: the graph, the rubric, the vertical priors, and the frontier-model R&D that grows them.
License discipline. Client-shipped base models come from the commercial-clean Apache 2.0 and MIT families: Qwen 3.5, DeepSeek V3.2/V4, Mistral, gpt-oss. Gemma 3 is excluded because its custom license restricts financial, legal, and medical generation and reserves remote-shutdown rights. Llama 4 carries caution flags: a 700M-MAU cap and a clause barring use of its outputs to train other models.
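The license rule reduces to an allowlist check that a packaging step could run before any base model ships. A minimal sketch; the `CandidateModel` type and `shippable` function are hypothetical, and the license string attached to each model is an illustrative label for the family the text names, to be confirmed against each model card:

```python
from dataclasses import dataclass
from typing import List

# Licenses the text treats as commercial-clean for client-shipped base models.
ALLOWED_LICENSES = frozenset({"Apache-2.0", "MIT"})

@dataclass(frozen=True)
class CandidateModel:
    name: str
    license: str  # illustrative label; verify against the actual model card

def shippable(models: List[CandidateModel]) -> List[str]:
    """Names of candidate base models whose license is on the allowlist, in input order."""
    return [m.name for m in models if m.license in ALLOWED_LICENSES]

candidates = [
    CandidateModel("Qwen 3.5", "Apache-2.0"),
    CandidateModel("DeepSeek V4", "MIT"),
    CandidateModel("Mistral", "Apache-2.0"),
    CandidateModel("gpt-oss", "Apache-2.0"),
    CandidateModel("Gemma 3", "Gemma Terms of Use"),         # excluded: custom license
    CandidateModel("Llama 4", "Llama 4 Community License"),  # flagged: MAU cap, output clause
]
print(shippable(candidates))  # ['Qwen 3.5', 'DeepSeek V4', 'Mistral', 'gpt-oss']
```

An allowlist rather than a blocklist means a new model family stays out until someone reviews its license, which matches the conservative default described above.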
The pivot was written into the plan on day one
This is the literal execution of a budget line approved before the pivot had a name. The production system already runs the central separation: the auto-research pipeline uses a locally hosted, quantized Llama-3-70B to extract knowledge-graph nodes at 2:00 AM while frontier models do the higher-level synthesis. That one separation (local model on the mechanical extraction, frontier model on the reasoning) is the template for the entire sovereign product.
The MiroFish-Offline configuration proves the full local pattern end to end: Ollama replaces the hosted LLM, Neo4j Community replaces the hosted memory layer, and the whole stack comes up with a single Docker command. The Hermes runtime runs a deterministic gate plus a read-only frontier call on the Max subscription at zero marginal cost. The target stack (mem0 over REST, InfraNodus over MCP, Oxigraph for SPARQL, Postgres for Aethelgard) is an open toolchain from day one. The feasibility question is already answered by the running system.
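A local extraction call of the kind the Ollama-based setup makes can be sketched against Ollama's HTTP API (`POST /api/generate` on its default port 11434). The prompt wording, the `llama3:70b` model tag, and the function names are illustrative assumptions, not ShurIQ's pipeline; the point is that the request targets `localhost`, so the document never leaves the machine.

```python
import json
import urllib.request
from typing import List

# Ollama's default local endpoint; no hosted API is involved.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_extraction_request(model: str, document: str) -> urllib.request.Request:
    """Build a non-streaming Ollama request that asks for knowledge-graph triples as JSON."""
    prompt = (
        "Extract knowledge-graph facts from the text below. Reply with a JSON object "
        'of the form {"triples": [[subject, predicate, object], ...]}.\n\n' + document
    )
    body = {"model": model, "prompt": prompt, "stream": False, "format": "json"}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def extract_triples(model: str, document: str) -> List[list]:
    """Send the request to the local Ollama server and parse the triples it returns."""
    req = build_extraction_request(model, document)
    with urllib.request.urlopen(req, timeout=600) as resp:
        reply = json.loads(resp.read())["response"]  # the model's text output
    return json.loads(reply).get("triples", [])
```

Calling `extract_triples("llama3:70b", text)` needs a running Ollama server with that model pulled; a nightly job would loop it over the day's new documents.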
Seven layers, one boundary
Everything from L0 to L6 runs on the client's premises; only a slice of the L5 graph ships, and the full graph stays with ShurIQ. The frontier layer is internal R&D only and never client-facing.
| Layer | Component | Open basis | Runs where |
|---|---|---|---|
| L0 | Hardware: Mac mini M4 Pro · RTX 5090 · DGX Spark · Mac Studio M3 Ultra · RTX PRO 6000 · H100 node | n/a | Client premises (or transitional rented GPU) |
| L1 | Inference runtime: Osaurus (single-analyst) / vLLM (multi-seat) | MIT / Apache | Local |
| L2 | Base model (frozen): Qwen 3.5, DeepSeek V3.2/V4, Mistral, gpt-oss at Q4_K_M minimum | Apache 2.0 / MIT | Local |
| L3 | Parametric injection: DMoE LoRA expert bank (client corpus + vertical priors + SBPI rubric) | DMoE method | Local |
| L4 | Retrieval / orchestration: Agents-K1 · GraphRAG over Oxigraph SPARQL · BM25 · mem0 | Agents-K1 / Oxigraph | Local |
| L5 | Knowledge graph: RDF triple store (105,738 → ~1B facts), SBPI ontology | Oxigraph / InfraNodus | Local (slice) + ShurIQ (full) |
| L6 | Agent harness: Report Engine / SBPI scorer / fact-check ledger | ShurIQ harness | Local |
| R&D | Frontier: Claude Opus 4.8 / Sonnet 4.6 (graph construction, rubric training, higher-level synthesis) | n/a | ShurIQ only · never client-facing |
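The boundary in the table can be enforced mechanically: tag each layer with where it runs and refuse to package anything that is ShurIQ-only. A minimal sketch; the layer keys, the `Location` labels, and `check_client_bundle` are hypothetical names that mirror the table rather than any existing tool:

```python
from enum import Enum
from typing import List

class Location(Enum):
    CLIENT = "client premises"
    CLIENT_SLICE = "client holds a slice; ShurIQ holds the full asset"
    SHURIQ_ONLY = "ShurIQ only, never client-facing"

# Layer -> where it runs, following the layer table.
LAYERS = {
    "L0 hardware": Location.CLIENT,
    "L1 inference runtime": Location.CLIENT,
    "L2 base model": Location.CLIENT,
    "L3 LoRA expert bank": Location.CLIENT,
    "L4 retrieval": Location.CLIENT,
    "L5 knowledge graph": Location.CLIENT_SLICE,
    "L6 agent harness": Location.CLIENT,
    "R&D frontier": Location.SHURIQ_ONLY,
}

def check_client_bundle(requested: List[str]) -> None:
    """Raise ValueError if a client bundle would include a ShurIQ-only layer."""
    leaked = [name for name in requested if LAYERS[name] is Location.SHURIQ_ONLY]
    if leaked:
        raise ValueError(f"not client-shippable: {leaked}")

# The L0-L6 bundle passes; adding the R&D layer would raise.
check_client_bundle([name for name in LAYERS if not name.startswith("R&D")])
```

Making the check part of packaging, rather than a policy document, is what turns "never client-facing" into something a build can fail on.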
How a billion-node graph becomes something the client runs locally
The injection mechanism is two-layer, because one billion nodes cannot all be parameterized.
Parametric layer: Decoupled Mixture-of-Experts (DMoE, arXiv 2606.14243, 2026-06-12). Each knowledge unit becomes one rank-4 LoRA adapter of roughly 481 KiB, attached to the final-layer feed-forward network while the base model stays frozen. Each adapter trains in about 10 seconds on a single GPU and can be added, updated, or deleted without retraining the backbone. That last property is exactly what a nightly-growing graph requires. A client's confidential brand corpus, the vertical's structural priors, and the SBPI rubric all become adapters, shipped as files.
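To make the add/update/delete property concrete, here is a toy LoRA expert bank in plain Python: a frozen weight matrix W and per-key adapters that contribute (alpha / r) · B(Ax) on top of Wx. Everything here is an illustrative assumption rather than DMoE's implementation: real adapters attach to a transformer's final feed-forward layer (typically through a library such as Hugging Face PEFT), DMoE's own routing is not modeled (the caller picks the expert by key), and the 2×2 dimensions are toys. A rank-r adapter on a d_in → d_out matrix adds only r·(d_in + d_out) parameters, which is what keeps each expert small enough to ship as a file.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

Matrix = List[List[float]]
Vector = List[float]

def matvec(m: Matrix, v: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

@dataclass
class LoRAAdapter:
    a: Matrix  # r x d_in down-projection
    b: Matrix  # d_out x r up-projection
    alpha: float = 8.0  # scaling numerator; the update is scaled by alpha / r

    def delta(self, x: Vector) -> Vector:
        """Low-rank update (alpha / r) * B @ (A @ x), computed without forming B @ A."""
        scale = self.alpha / len(self.a)
        return [scale * v for v in matvec(self.b, matvec(self.a, x))]

class ExpertBank:
    """A frozen weight matrix plus named LoRA experts that can be added, replaced, or removed."""

    def __init__(self, frozen_weight: Matrix):
        self._w = frozen_weight  # never written to: the backbone stays frozen
        self._experts: Dict[str, LoRAAdapter] = {}

    def put(self, key: str, adapter: LoRAAdapter) -> None:
        self._experts[key] = adapter  # add a new expert or update an existing one

    def delete(self, key: str) -> None:
        self._experts.pop(key, None)  # removal leaves the backbone untouched

    def forward(self, x: Vector, key: Optional[str] = None) -> Vector:
        """Base output, plus the selected expert's update if `key` names one."""
        y = matvec(self._w, x)
        expert = self._experts.get(key) if key is not None else None
        if expert is not None:
            y = [yi + di for yi, di in zip(y, expert.delta(x))]
        return y

bank = ExpertBank([[1.0, 0.0], [0.0, 1.0]])
bank.put("brand:acme", LoRAAdapter(a=[[1.0, 1.0]], b=[[0.5], [0.0]], alpha=1.0))
print(bank.forward([2.0, 3.0], "brand:acme"))  # [4.5, 3.0]
bank.delete("brand:acme")
print(bank.forward([2.0, 3.0], "brand:acme"))  # [2.0, 3.0]
```

Deleting an expert restores the base output exactly, because the backbone weight was never modified; replacing an adapter works the same way.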
Retrieval layer: Agents-K1 (arXiv 2606.13669, 2026-06-11). A document-to-graph factory, adapted from scientific-paper extraction to the SBPI brand ontology, that carries the freshness and long tail the parametric experts cannot hold. It is the locally runnable document-to-KG pipeline that feeds the billion-node graph.
Why hybrid. An independent honesty check (arXiv 2510.12668) finds pure parametric injection loses fine-grained facts but holds faithfulness under conflict, while retrieval carries detail; together they beat either alone. The parametric experts carry the structure and the rubric; the retrieval layer carries the facts.
The physical deliverable. A frozen 1–1.5B open-weights model, a per-client LoRA expert bank compiled from the knowledge graph, a BM25 / GraphRAG index, and an agent CLI. Frontier models are used only internally to build the artifact. The client runs it offline.
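BM25 is the lexical half of the retrieval index in that deliverable. A textbook Okapi BM25 scorer fits in a few lines; the whitespace tokenizer, the common k1 = 1.5 and b = 0.75 defaults, and the sample documents are assumptions for illustration, not the shipped index, which would sit alongside GraphRAG queries over Oxigraph.

```python
import math
from collections import Counter
from typing import List

def tokenize(text: str) -> List[str]:
    """Lowercase whitespace tokenizer; a real index would use a proper analyzer."""
    return text.lower().split()

class BM25:
    """Okapi BM25 over an in-memory corpus, with the non-negative (Lucene-style) IDF."""

    def __init__(self, docs: List[str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.docs = [tokenize(d) for d in docs]
        self.tf = [Counter(d) for d in self.docs]
        self.avgdl = sum(len(d) for d in self.docs) / len(self.docs)
        n = len(self.docs)
        df = Counter(term for d in self.docs for term in set(d))  # document frequency
        self.idf = {t: math.log((n - c + 0.5) / (c + 0.5) + 1) for t, c in df.items()}

    def score(self, query: str, i: int) -> float:
        """BM25 score of document i for the query."""
        tf, dl = self.tf[i], len(self.docs[i])
        total = 0.0
        for term in tokenize(query):
            f = tf[term]  # Counter returns 0 for absent terms
            if f:
                norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
                total += self.idf[term] * f * (self.k1 + 1) / norm
        return total

    def top_k(self, query: str, k: int = 3) -> List[int]:
        """Indices of the k highest-scoring documents."""
        order = sorted(range(len(self.docs)), key=lambda i: self.score(query, i), reverse=True)
        return order[:k]

index = BM25([
    "Acme expands retail footprint in Q3",
    "Beta brand loses share in EU",
    "Acme supplier dispute in Q2",
])
print(index.top_k("acme q3", k=2))  # [0, 2]
```

The rare term "q3" carries more weight than "acme", which appears in two of the three documents, so the document matching both ranks first.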
The graph computes a Structural Brand Power Index, a 100-point composite across five vertical-weighted dimensions. When it runs on the client's premises over Oxigraph SPARQL, the SBPI stops being a number ShurIQ reports and becomes a number the client's own installation computes and re-computes as the graph updates.
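The text fixes only the shape of the index: five dimensions, weighted per vertical, combined into a 100-point composite. A minimal sketch, assuming a linear weighted sum of 0–100 dimension scores with vertical weights that sum to 1; the dimension names `dim_1` to `dim_5` and the weights are placeholders, not the SBPI rubric.

```python
from typing import Dict

def sbpi(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted composite on a 0-100 scale from five dimension scores (each 0-100)."""
    if len(scores) != 5 or set(scores) != set(weights):
        raise ValueError("expected the same five dimensions in scores and weights")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("a vertical's weights must sum to 1")
    if any(not 0.0 <= s <= 100.0 for s in scores.values()):
        raise ValueError("each dimension score must lie in [0, 100]")
    return sum(weights[d] * scores[d] for d in scores)

# Hypothetical vertical weights and scores; the real dimensions are not given in the text.
weights = {"dim_1": 0.30, "dim_2": 0.25, "dim_3": 0.20, "dim_4": 0.15, "dim_5": 0.10}
scores = {"dim_1": 72.0, "dim_2": 64.0, "dim_3": 90.0, "dim_4": 55.0, "dim_5": 40.0}
print(round(sbpi(scores, weights), 2))
```

Because the function is pure, a local installation can recompute the composite whenever a graph update changes any dimension score.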
Local from day one, owned over time
The boundary that never moves: frontier models are an internal R&D line, never client-facing COGS.
MVP: Hybrid
Client-facing inference runs the open-weights model and the injected expert bank locally for all confidential research. ShurIQ uses frontier models internally to build the vertical graph and train the rubric.
Transitional
Before the client owns hardware, ShurIQ rents H100 or RTX 5090 capacity to run the open-weights deliverable, not a frontier API. Only the hardware location moves; the privacy property holds on day one.
Fully Local
The client owns hardware sized to its tier. The expert bank, retrieval index, and harness run entirely on-premises. ShurIQ ships graph and rubric updates as adapter files and index deltas.
Hardware mapped to commercial tiers
| Commercial tier | Reference hardware | Model class |
|---|---|---|
| Self-serve: Market Intelligence SaaS ($750–6K/mo) | Mac mini M4 Pro 64GB (~$2.0–2.6K) · RTX 5090 · DGX Spark 128GB | 70B Q4 dense / 30B-class MoE |
| Bloomberg-Terminal: Company Intel ($7–100K + $8–25K/mo) · primary driver | Mac Studio M3 Ultra 256GB (~$7–9.5K) · vLLM multi-seat | Qwen3-235B-A22B Q4 (~793 tok/s batched vs ~41 tok/s unbatched) |
| Enterprise / Architecture ($75–200K + retainer) | RTX PRO 6000 Blackwell 96GB · H100 node (640GB / 8×) | Dense 70B on-GPU, long context, high concurrency |
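A rough way to see why each model class pairs with its box is to estimate weight memory as parameters × bits per weight / 8. A sketch under a stated assumption: 4.85 bits per weight is an approximate average for Q4_K_M-style quantization, and the real figure varies by model and quantizer.

```python
def weight_gb(params_billion: float, bits_per_weight: float = 4.85) -> float:
    """Approximate weight memory in GB for a quantized model.

    4.85 bits/weight is an assumed Q4_K_M average. KV cache, activations,
    and runtime overhead come on top of this figure.
    """
    return params_billion * bits_per_weight / 8  # 1e9 params * bits / 8 bits-per-byte / 1e9

for label, params_b, box_gb in [
    ("70B dense on Mac mini M4 Pro 64GB", 70, 64),
    ("Qwen3-235B-A22B on Mac Studio M3 Ultra 256GB", 235, 256),
]:
    need = weight_gb(params_b)
    print(f"{label}: ~{need:.0f} GB of weights, ~{box_gb - need:.0f} GB headroom")
```

For the MoE row, memory follows the 235B total parameters, since every expert must be resident, while per-token compute follows the roughly 22B active parameters; that split is consistent with pairing it with a 256GB unified-memory box.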
One outcome, several ways to ship it
A client-owned local runtime is one corner of the space: maximum sovereignty, maximum burden. Sovereignty is a position on a curve, and ShurIQ sells the whole curve.
| Pattern | What it is | Best fit |
|---|---|---|
| Client-owned local runtime | Osaurus / Odysseus / Ollama; client owns the box | Enterprise · max sovereignty |
| ShurIQ sovereign appliance | Sealed box, ShurIQ operates remotely; data plane local | Mid / enterprise · solves support burden |
| Deploy-into-client-VPC (BYOC) | Single-tenant in the client's cloud, their keys | Enterprise · the SaaS standard |
| Confidential computing / TEE | Sealed + attested on shared GPU; buy via Phala/Tinfoil | Bloomberg mid · no client hardware |
| Small / distilled specialist models | SLMs do extraction/scoring/draft on a laptop | All tiers · cheapest, already built |
| Redaction-boundary / split | Secret stays local; only safe content hits a bigger model | Self-serve / standard · pragmatic |
| Frontier-private-tier | Bedrock/Vertex + ZDR, the honest baseline | Entry · lowest sovereignty |
The recommendation: lead with the cheap, high-sovereignty corner (small local models and the redaction boundary, both mostly built already), add a TEE partner for the mid-tier, and reserve client-owned or BYOC deployments for enterprises that demand ownership. Across every tier, small local models do the confidential work; frontier models stay internal R&D. FHE/MPC remains a watch-item: even crypto-native players use a hardware enclave, not homomorphic encryption, for real inference.
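The redaction-boundary pattern is simple to state: secret terms are swapped for placeholders on the client side, only the placeholder text crosses to the larger model, and the mapping is applied in reverse to the reply. A minimal sketch, assuming the secret terms are already known as a list (in practice a small local model would find them); `redact` and `restore` are hypothetical names:

```python
from typing import Dict, List, Tuple

def redact(text: str, secrets: List[str]) -> Tuple[str, Dict[str, str]]:
    """Swap each secret term for a placeholder; the mapping stays on the local machine."""
    mapping: Dict[str, str] = {}
    # Longest terms first, so "Acme Corp" is replaced before a shorter "Acme" could split it.
    for i, secret in enumerate(sorted(set(secrets), key=lambda s: (-len(s), s))):
        token = f"[[ENTITY_{i}]]"
        if secret in text:
            text = text.replace(secret, token)
            mapping[token] = secret
    return text, mapping

def restore(text: str, mapping: Dict[str, str]) -> str:
    """Reinsert the original terms into text returned from the larger model."""
    for token, secret in mapping.items():
        text = text.replace(token, secret)
    return text

safe, mapping = redact("Draft a note on shorting Acme Corp before its Q3 print.", ["Acme Corp", "Q3"])
print(safe)  # Draft a note on shorting [[ENTITY_0]] before its [[ENTITY_1]] print.
```

Only `safe` would leave the machine; `mapping` never does, so the larger model sees the structure of the request but not the position being researched.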
The moat and the privacy wedge are the same spend
The seed target is $1.5M, split 40/30/20/10. Money put into the billion-node graph buys both the compounding asset that flips the business to infrastructure and the thing that lets ShurIQ sell a private-by-construction product no hosted competitor can match.
Self-hosting beats frontier APIs above roughly 2–5M tokens/day, and on-premises hardware wins on three-year total cost of ownership at 80%-plus GPU utilization, with 60–80% savings at the high end. The transitional crossover is clean: an H100 rented at roughly $3.60/hr costs about $31.5K for a year of round-the-clock use (8,760 hours), close to the capex of one RTX PRO 6000 workstation. Because rental cost scales with the hours used while owned hardware is a fixed cost, rental wins below about 50% utilization and owned hardware wins above it. Each client crosses to ownership on its own utilization curve rather than a forced timeline.
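The crossover follows from two lines of arithmetic: rental cost scales with the hours used, while owned hardware costs roughly the same per year whether it is busy or idle. A sketch of that comparison; the $3.60/hr rate is from the text, while the annualized owned cost is a hypothetical input that each client would replace with its own amortized capex, power, and support figures.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760

def annual_rental_cost(rate_per_hour: float, utilization: float) -> float:
    """Yearly rental bill when only the hours actually used are paid for."""
    return rate_per_hour * HOURS_PER_YEAR * utilization

def breakeven_utilization(rate_per_hour: float, owned_cost_per_year: float) -> float:
    """Utilization at which renting costs the same as owning; above it, owning is cheaper."""
    return owned_cost_per_year / (rate_per_hour * HOURS_PER_YEAR)

print(f"H100 at $3.60/hr, full time: ${annual_rental_cost(3.60, 1.0):,.0f}/yr")
# Hypothetical owned-hardware cost per year (amortized capex plus power and support).
owned = 16_000.0
print(f"break-even utilization: {breakeven_utilization(3.60, owned):.0%}")
```

Under that hypothetical figure the break-even lands near 50%; a longer amortization period lowers the owned cost per year and pulls the break-even utilization down.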
The line that matters for investors: frontier API spend is a cost of ShurIQ's own R&D (building the harness and growing the graph), not a cost of goods sold per client. It amortizes across every client in a vertical. The client COGS is hardware and support, both of which fall as the open-weights stack matures.
The business model canvas
Three views of one model: standard, Lean, and temporal. The temporal view tags each block by maturity: in production, designed, conceptual, or missing.
The central separation (local extract / frontier reason) and the moat (the knowledge graph and the SBPI rubric) are Observation: real and running. The sovereign product (per-client injection, local packaging, model licensing) is Plan / Recipe: designed, feasibility-proven, not yet built. Two things are Missing, and they are the first things the seed money buys: a support model for on-premises installs, and one instrumented proof that confidential research entered the system and nothing crossed a boundary the client didn't control.
A planning session per aspect
The near-term track (2026-06-16 → 2026-07-31) builds the Bloomberg-Terminal demo on a Mac Studio M3 Ultra reference box, with the "unplug the cable" offline-brief proof, and the first design-partner offer gated through Limore.
Where the argument becomes a record
The first test clients should be the firms whose pain is sharpest and most documented: a fund or analyst-team running adversarial research it cannot send to a hosted model, and an EU-exposed firm facing the 2026-08-02 AI Act obligations. Each gets a custody guarantee backed by a running system.
Lead with the running proof. The 2:00 AM local extraction, the MiroFish-Offline one-command stack, and the Hermes zero-marginal-cost runtime are already operating. Start each client in the transitional phase (a ShurIQ-rented GPU running the open-weights deliverable), so the privacy property holds on day one without waiting on procurement.