Capability without custody.
A private-by-construction intelligence stack: ShurIQ's accumulated knowledge graph, injected into an open-weights model the client runs entirely on its own hardware.
Frontier models do ShurIQ's own higher-level work: building the graph, training the scoring rubric, and running the fact-check ledger. What ships to the client runs entirely on the client's own hardware, so confidential research stays there.
What enterprises can't send to a hosted model
Two firms now sit between every enterprise and the frontier of language-model capability, and both default to keeping a copy of what passes through. For a fund running adversarial research on a position it has not yet taken, sending that research to a hosted model is itself the leak.
Enterprises want the capability. They can't accept the retained copy of their research. ShurIQ delivers the capability and keeps the research on the client's own hardware.
The market moved underneath the duopoly
Distrust of the OpenAI / Anthropic frontier-API duopoly now comes with dates and penalties.
OpenAI and Anthropic each retain API inputs for 30 days by default; zero-data-retention is reserved for eligible enterprise contracts, not the standard tier most teams run on. On 2026-06-09 Anthropic's 30-day policy for Claude Fable 5 led Microsoft to restrict employee access pending legal review.
Deutsche Bank, Goldman Sachs, JPMorgan, Bank of America, Citigroup, and Wells Fargo have already restricted or banned public ChatGPT inside their walls. A silent ChatGPT leakage flaw ran from 2025-12 to 2026-02 before it was patched, a live demonstration of the custody risk.
The EU AI Act's full high-risk obligations land on 2026-08-02. The Act's fines run up to €35M or 7% of global turnover at the top tier, which applies to prohibited practices, and up to €15M or 3% for breaches of the high-risk obligations. The EU's Digital Operational Resilience Act (DORA) has applied to financial entities since 2025-01-17. GDPR treats training on EU data over non-EEA infrastructure as a cross-border transfer. A firm operating in the EU faces all three at once.
Enterprise LLM spend runs roughly $5.9–8.2B in 2026, headed toward $48.25B by 2034. On-premises and hybrid could absorb 30–40% of regulated US workloads by 2027. Deloitte projects more than 70% of enterprises scaling on-prem or edge inference by 2028. Sovereign AI is now a board-level line item.
What ShurIQ ships, and where the frontier model stays
Claude Opus 4.8 and Sonnet 4.6 do ShurIQ's higher-level work inside ShurIQ. What ships to the client runs entirely on the client's own hardware, and the confidential research stays there.
Runs on client hardware
- Frozen open-weights base model (Apache / MIT family)
- Per-client adapter bank (small LoRA fine-tuning files), compiled from the brand-vertical graph
- Retrieval index over the relevant graph slice (Oxigraph SPARQL)
- Agent CLI / Report Engine, behind a local endpoint
- SBPI scoring the client re-computes as its graph grows
Stays inside ShurIQ (R&D)
- Frontier-model graph construction & rubric training
- The multi-agent fact-check ledger
- Authoring of the editorial render harness
- The compounding billion-node knowledge graph (full)
- The vertical structural priors that grow it
Open weights closed the frontier gap during 2026. DeepSeek V4 Pro leads the open agentic field, alongside Qwen 3.6, Mistral's Apache-licensed line for the EU, and Llama 4 Scout's long context. A client can now run a model good enough for the work on a machine it owns. What stays scarce is the graph, the rubric, and the vertical priors, all built inside ShurIQ.
License discipline. Client-shipped base models come from the commercial-clean Apache 2.0 and MIT families: Qwen 3.5, DeepSeek V3.2/V4, Mistral, gpt-oss. Gemma 3 is excluded: its custom license restricts financial, legal, and medical generation and reserves remote-shutdown rights. Llama 4 carries caution flags: a cap at 700 million monthly active users and a clause barring use of its outputs to train other models.
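The license rule above could be enforced as a gate in the packaging step. The following is a minimal sketch, not ShurIQ's actual tooling: the `LicenseStatus` enum, the `LICENSE_TABLE` keys, and the `can_ship` helper are hypothetical names, and treating a caution-flagged model as shippable only after review is an assumption. The table entries only restate the paragraph above and are not a legal determination.

```python
from enum import Enum


class LicenseStatus(Enum):
    CLEAN = "clean"        # Apache 2.0 / MIT family
    CAUTION = "caution"    # assumption: shippable only after a manual review
    EXCLUDED = "excluded"  # never shipped to clients


# Hypothetical table; keys are illustrative identifiers, not official model IDs.
LICENSE_TABLE = {
    "qwen-3.5": LicenseStatus.CLEAN,
    "deepseek-v3.2": LicenseStatus.CLEAN,
    "deepseek-v4": LicenseStatus.CLEAN,
    "mistral": LicenseStatus.CLEAN,
    "gpt-oss": LicenseStatus.CLEAN,
    "llama-4": LicenseStatus.CAUTION,
    "gemma-3": LicenseStatus.EXCLUDED,
}


def can_ship(model: str, reviewed: bool = False) -> bool:
    """Return True if the base model may be packaged for a client."""
    status = LICENSE_TABLE.get(model.lower())
    if status is None:
        return False  # unknown models are rejected by default
    if status is LicenseStatus.CLEAN:
        return True
    if status is LicenseStatus.CAUTION:
        return reviewed
    return False
```

Rejecting unknown models by default keeps the gate conservative: a new base model has to be classified before it can ship.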
The pivot was already a budget line in the seed plan
The production system already runs this split, on a budget line approved before the pivot had a name. The auto-research pipeline uses a locally hosted, quantized Llama-3-70B to extract knowledge-graph nodes at 2:00 AM. Frontier models do the higher-level synthesis. The product just widens that running split.
The MiroFish-Offline configuration proves the full local pattern end to end: Ollama replaces the hosted LLM, and Neo4j Community replaces the hosted memory layer, all in one-command Docker. The Hermes runtime runs a deterministic gate plus a read-only frontier call on the Max subscription at zero marginal cost. The target stack is an open toolchain from day one: mem0 over REST, InfraNodus over MCP, Oxigraph for SPARQL, and Postgres for Aethelgard. The running system already proves feasibility.
Seven layers, all local except the frontier R&D layer
Everything from L0 to L6 runs on the client's premises. The frontier layer, above the line, is the one exception.
| Layer | Component | Open basis | Runs where |
|---|---|---|---|
| L0 | Mac mini M4 Pro · RTX 5090 · DGX Spark · Mac Studio M3 Ultra · RTX PRO 6000 · H100 node | n/a | Client premises (or transitional rented GPU) |
| L1 | Inference runtime: Osaurus (single-analyst) / vLLM (multi-seat) | MIT / Apache | Local |
| L2 | Base model (frozen): Qwen 3.5, DeepSeek V3.2/V4, Mistral, gpt-oss @ Q4_K_M min | Apache 2.0 / MIT | Local |
| L3 | Parametric injection: DMoE LoRA expert bank (client corpus + vertical priors + SBPI rubric) | DMoE method | Local |
| L4 | Retrieval / orchestration: Agents-K1 · GraphRAG over Oxigraph SPARQL · BM25 · mem0 | Agents-K1 / Oxigraph | Local |
| L5 | Knowledge graph: RDF triple store (105,738 → ~1B facts), SBPI ontology | Oxigraph | Local (slice) + ShurIQ (full) |
| L6 | Agent harness: Report Engine / SBPI scorer / fact-check ledger | ShurIQ harness | Local |
| R&D | Frontier: Claude Opus 4.8 / Sonnet 4.6 for graph construction, rubric training, higher-level synthesis | n/a | ShurIQ only · never client-facing |
How a billion-node graph becomes something the client runs locally
The injection mechanism is two-layer, because one billion nodes cannot all be parameterized.
Parametric layer: Decoupled Mixture-of-Experts (DMoE, arXiv 2606.14243, 2026-06-12). Each knowledge unit becomes one adapter file: a rank-4 low-rank adapter (LoRA) of roughly 481 KiB, attached to the model's final feed-forward layer while the base model stays frozen. Each one trains in about 10 seconds on a single GPU. Adapters can be added, updated, or deleted without retraining the backbone. That last property is exactly what a nightly-growing graph requires. A client's confidential brand corpus, the vertical's structural priors, and the SBPI rubric all become adapters, shipped as files.
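The add, update, and delete property can be illustrated with a toy adapter bank. This is a sketch of the general LoRA idea (frozen weight plus a low-rank update `scale * B(Ax)`), not the DMoE implementation: the `LoRAAdapter` and `AdapterBank` classes are hypothetical, the example uses rank 1 on a 2×2 weight for readability where the text describes rank 4, and the caller passes the list of active adapters because the routing method is not described here.

```python
from dataclasses import dataclass, field

Matrix = list[list[float]]
Vector = list[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


@dataclass
class LoRAAdapter:
    a: Matrix            # down-projection, shape (rank, d_in)
    b: Matrix            # up-projection, shape (d_out, rank)
    scale: float = 1.0

    def delta(self, x: Vector) -> Vector:
        # Low-rank update: scale * B (A x)
        return [self.scale * y for y in matvec(self.b, matvec(self.a, x))]


@dataclass
class AdapterBank:
    base_weight: Matrix  # frozen: no method below ever writes to it
    adapters: dict[str, LoRAAdapter] = field(default_factory=dict)

    def put(self, unit_id: str, adapter: LoRAAdapter) -> None:
        self.adapters[unit_id] = adapter   # add a new unit or overwrite an old one

    def delete(self, unit_id: str) -> None:
        self.adapters.pop(unit_id, None)

    def forward(self, x: Vector, active: list[str]) -> Vector:
        y = matvec(self.base_weight, x)
        for unit_id in active:             # assumption: routing is decided elsewhere
            adapter = self.adapters.get(unit_id)
            if adapter is not None:
                y = [yi + di for yi, di in zip(y, adapter.delta(x))]
        return y


bank = AdapterBank(base_weight=[[1.0, 0.0], [0.0, 1.0]])
bank.put("unit-1", LoRAAdapter(a=[[1.0, 0.0]], b=[[0.0], [1.0]]))
with_adapter = bank.forward([1.0, 2.0], ["unit-1"])
bank.delete("unit-1")
without_adapter = bank.forward([1.0, 2.0], ["unit-1"])
```

Deleting the adapter returns the layer to its base output, and the base weight is never modified, which is why a knowledge unit can be retired without touching the backbone.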
Retrieval layer: Agents-K1 (arXiv 2606.13669, 2026-06-11). A document-to-graph factory adapted from scientific-paper extraction to the SBPI brand ontology, carrying the newest and rarest facts the parametric experts cannot hold. It is the locally-runnable document-to-KG pipeline that feeds the billion-node graph.
Why hybrid. An independent honesty check (arXiv 2510.12668) finds pure parametric injection loses fine-grained facts but holds faithfulness under conflict, while retrieval carries detail; together they beat either alone. The parametric experts carry the structure and the rubric; the retrieval layer carries the facts.
The physical deliverable. A frozen 1–1.5B open-weights model, a per-client adapter bank compiled from the knowledge graph, a keyword-plus-graph retrieval index (BM25 and GraphRAG), and an agent command-line tool, all shipped as files the client runs offline.
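The keyword half of that retrieval index can be sketched with a small Okapi BM25 scorer. This is a generic textbook formulation, not the shipped index: the `BM25Index` class, the whitespace tokenizer, the `k1` and `b` defaults, and the sample documents are all illustrative assumptions, and the sketch assumes a non-empty corpus.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase whitespace tokens are enough for the sketch.
    return text.lower().split()


class BM25Index:
    def __init__(self, docs: dict[str, str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.term_freqs = {d: Counter(tokenize(t)) for d, t in docs.items()}
        self.lengths = {d: sum(tf.values()) for d, tf in self.term_freqs.items()}
        self.avg_len = sum(self.lengths.values()) / len(docs)
        self.n_docs = len(docs)
        # Document frequency: in how many documents each term appears.
        self.doc_freq: Counter = Counter()
        for tf in self.term_freqs.values():
            self.doc_freq.update(tf.keys())

    def idf(self, term: str) -> float:
        n = self.doc_freq.get(term, 0)
        return math.log((self.n_docs - n + 0.5) / (n + 0.5) + 1.0)

    def score(self, query: str, doc_id: str) -> float:
        tf = self.term_freqs[doc_id]
        # Length normalization: longer-than-average documents are penalized.
        norm = self.k1 * (1 - self.b + self.b * self.lengths[doc_id] / self.avg_len)
        return sum(
            self.idf(t) * tf[t] * (self.k1 + 1) / (tf[t] + norm)
            for t in tokenize(query)
            if t in tf
        )

    def search(self, query: str, top_k: int = 3) -> list[tuple[str, float]]:
        scored = [(d, self.score(query, d)) for d in self.term_freqs]
        hits = [(d, s) for d, s in scored if s > 0]
        return sorted(hits, key=lambda pair: pair[1], reverse=True)[:top_k]


# Hypothetical corpus, for illustration only.
index = BM25Index({
    "a": "brand pricing power in streaming",
    "b": "streaming churn rises",
    "c": "quarterly earnings call notes",
})
hits = index.search("streaming")
```

In a hybrid setup these keyword hits would be merged with graph-derived results; how the two are combined is not specified here and is left out of the sketch.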
The graph computes a Structural Brand Power Index, a 100-point composite across five vertical-weighted dimensions. When it runs on the client's premises over Oxigraph SPARQL, the client's own installation computes and re-computes the SBPI as the graph updates, instead of receiving it from ShurIQ.
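A weighted composite of this shape can be sketched as follows. The five dimension names, the weights, and the scores are placeholders, not the SBPI rubric; the sketch only assumes that each dimension is scored from 0 to 100 and that a vertical's weights sum to 1, so the composite also lands on a 100-point scale.

```python
import math


def sbpi_composite(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted 100-point composite over exactly five dimensions."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same dimensions")
    if len(scores) != 5:
        raise ValueError("expected exactly five dimensions")
    if not math.isclose(sum(weights.values()), 1.0, abs_tol=1e-9):
        raise ValueError("weights must sum to 1")
    for name, value in scores.items():
        if not 0.0 <= value <= 100.0:
            raise ValueError(f"score for {name!r} is outside 0..100")
    return sum(weights[name] * scores[name] for name in scores)


# Hypothetical dimensions and vertical weights, for illustration only.
weights = {"dim_1": 0.30, "dim_2": 0.25, "dim_3": 0.20, "dim_4": 0.15, "dim_5": 0.10}
scores = {"dim_1": 80.0, "dim_2": 60.0, "dim_3": 70.0, "dim_4": 50.0, "dim_5": 90.0}
composite = sbpi_composite(scores, weights)  # roughly 69.5 on the 100-point scale
```

Because the function is pure, a client installation can call it again whenever the local graph slice yields new dimension scores.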
The client runs locally on day one and owns the hardware over time
Frontier models stay an internal research cost, never a per-client cost of goods. That holds in every phase.
MVP, Hybrid
Client-facing inference runs the open-weights model and the injected expert bank locally for all confidential research. ShurIQ uses frontier models internally to build the vertical graph and train the rubric.
Transitional
Before the client owns hardware, ShurIQ rents H100 or RTX 5090 capacity and runs the same open-weights deliverable there. The hardware location is the only change, and the privacy property holds from day one.
Fully Local
The client owns hardware sized to its tier. The expert bank, retrieval index, and harness run entirely on-premises. ShurIQ ships graph and rubric updates as adapter files and index deltas.
Hardware mapped to commercial tiers
| Commercial tier | Reference hardware | Model class |
|---|---|---|
| Self-serve, Market Intelligence SaaS ($750–6K/mo) | Mac mini M4 Pro 64GB (~$2.0–2.6K) · RTX 5090 · DGX Spark 128GB | 70B Q4 dense / 30B-class MoE |
| Bloomberg-Terminal, Company Intel ($7–100K + $8–25K/mo) · primary driver | Mac Studio M3 Ultra 256GB (~$7–9.5K) · vLLM multi-seat | Qwen3-235B-A22B Q4 (~793 vs 41 tok/s batched) |
| Enterprise / Architecture ($75–200K + retainer) | RTX PRO 6000 Blackwell 96GB · H100 node (640GB / 8×) | Dense 70B on-GPU, long context, high concurrency |
Several ways to deliver the same local intelligence stack
A client-owned local runtime is the most sovereign option and the heaviest to support. It is one of seven delivery patterns, and ShurIQ sells the full range, from a hosted private tier to a box the client owns.
| Pattern | What it is | Best fit |
|---|---|---|
| Client-owned local runtime | Osaurus/Odysseus/Ollama, client owns the box | Enterprise · max sovereignty |
| ShurIQ sovereign appliance | Sealed box, ShurIQ operates remotely; data plane local | Mid / enterprise · solves support burden |
| Deploy-into-client-VPC (BYOC) | Single-tenant in the client's cloud, their keys | Enterprise · the SaaS standard |
| Confidential computing / TEE | Sealed + attested on shared GPU; buy via Phala/Tinfoil | Bloomberg mid · no client hardware |
| Small / distilled specialist models | SLMs do extraction/scoring/draft on a laptop | All tiers · cheapest, already built |
| Redaction-boundary / split | Secret stays local; only safe content hits a bigger model | Self-serve / standard · pragmatic |
| Frontier-private-tier | Bedrock/Vertex with a zero-data-retention contract, the honest baseline | Entry · lowest sovereignty |
Lead with the cheap, high-sovereignty options (small local models and the redaction boundary), which are mostly already built. Add a confidential-computing partner (a trusted execution environment, TEE) for the mid-tier, and reserve a client-owned box or the client's own cloud (bring-your-own-cloud) for enterprises that demand ownership. The confidential work runs on small local models at every tier. Fully encrypted computation remains a watch-item: even crypto-native players use a sealed hardware chip, not full encryption, for real inference.
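The redaction-boundary pattern can be sketched as a local pre- and post-processing step around any call to a larger model. This is an illustration under simple assumptions, not a complete redaction strategy: the `redact` and `restore` helpers and the placeholder format are hypothetical, and matching a fixed list of sensitive terms will miss anything not on the list.

```python
import re


def redact(text: str, secrets: list[str]) -> tuple[str, dict[str, str]]:
    """Replace each sensitive term with a placeholder; runs locally."""
    mapping: dict[str, str] = {}
    redacted = text
    # Longest terms first, so a term containing a shorter one is replaced whole.
    for i, secret in enumerate(sorted(secrets, key=len, reverse=True)):
        placeholder = f"[[ENTITY_{i}]]"
        pattern = re.compile(re.escape(secret), re.IGNORECASE)
        if pattern.search(redacted):
            redacted = pattern.sub(placeholder, redacted)
            mapping[placeholder] = secret
    return redacted, mapping


def restore(text: str, mapping: dict[str, str]) -> str:
    """Put the original terms back into a response; also runs locally."""
    for placeholder, secret in mapping.items():
        text = text.replace(placeholder, secret)
    return text


# Hypothetical example: only `safe_text` would ever leave the machine.
original = "Short thesis on Acme Corp ahead of the Acme Corp earnings call."
safe_text, mapping = redact(original, ["Acme Corp"])
round_trip = restore(safe_text, mapping)
```

The mapping never leaves the client machine, so the larger model only ever sees placeholders.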
One spend buys both the knowledge graph and the privacy
The seed target is $1.5M, split 40/30/20/10. Most of it goes into the billion-node graph. The graph is the compounding asset that turns ShurIQ into infrastructure, and it is what lets a client run everything privately on its own hardware.
Self-hosting beats frontier APIs above roughly 2–5M tokens/day, and on-premises wins three-year total cost of ownership at 80%-plus GPU utilization, with 60–80% savings at the high end. The transitional crossover is clean. An H100 at roughly $3.60/hr is about $31K/yr, near the capex of one RTX PRO 6000 workstation. So rental wins below about 50% utilization, and owned hardware wins above it. Each client crosses to ownership on its own utilization curve rather than a forced timeline.
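The rental-versus-ownership crossover can be checked with a few lines of arithmetic. The hourly rate comes from the paragraph above; the $31,000 capex figure and the two-year amortization period are assumptions chosen for illustration (the text says only that the capex is near one year of rental), and the sketch ignores power, support, and resale value.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours


def annual_rental_cost(rate_per_hour: float, utilization: float) -> float:
    """Yearly rental spend when the GPU is rented only while in use."""
    return rate_per_hour * HOURS_PER_YEAR * utilization


def breakeven_utilization(capex: float, amortization_years: float,
                          rate_per_hour: float) -> float:
    """Utilization at which yearly rental equals amortized ownership cost."""
    annual_owned = capex / amortization_years
    return annual_owned / (rate_per_hour * HOURS_PER_YEAR)


full_year_rental = annual_rental_cost(3.60, 1.0)       # about $31.5K per year
crossover = breakeven_utilization(31_000, 2.0, 3.60)   # just under 0.5
```

Under these assumptions the break-even sits just below 50% utilization; a longer amortization period would move it lower, and a shorter one would move it higher.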
For investors: frontier API spend is part of ShurIQ's own R&D budget, building the harness and growing the graph, and amortizes across every client in a vertical rather than landing on each client's bill. The client's own cost is hardware and support, both of which fall as the open-weights stack matures.
The business model canvas
Three views of one model: standard, Lean, and temporal. The temporal view tags each block by maturity: in production, designed, conceptual, or missing.
The central separation (local extract, frontier reason) and the compounding asset (the knowledge graph and the SBPI rubric) are Observation: real and running. The sovereign product (per-client injection, local packaging, model licensing) is Plan / Recipe: designed, feasibility-proven, not yet built. Two things are Missing, and they are the first things the seed money buys: a support model for on-premises installs, and one instrumented proof that confidential research entered the system and every step ran on hardware the client controlled.
The near-term build plan
The near-term track runs 2026-06-16 → 2026-07-31. It builds the Bloomberg-Terminal demo on a Mac Studio M3 Ultra box and the "unplug the cable" offline-brief proof. The first design-partner offer is gated through Limore.
The first clients to prove it
The first test clients should be the firms whose pain is sharpest and most documented: a fund or analyst team running adversarial research it cannot send to a hosted model, and an EU-exposed firm facing the 2026-08-02 AI Act obligations. Each gets a custody guarantee backed by a running system.
Start with what already runs. The 2:00 AM local extraction, the MiroFish-Offline one-command stack, and the Hermes zero-marginal-cost runtime are already operating. Start each client in the transitional phase (Phase 2) so the privacy holds on day one, without waiting on procurement.