[

]

[

]

OLLM vs API Aggregators: What’s the Difference?

Marina Romero

May 20, 2026

TL;DR

An LLM API aggregator like OpenRouter gives you one API key, a catalog of 200+ models, and consolidated billing. A confidential AI gateway like OLLM routes those same requests through hardware-isolated execution environments and hands back a cryptographic receipt proving your data was never exposed.
When an aggregator advertises ZDR, the provider is promising not to log your prompts. When a confidential gateway enforces it, the inference runs inside an Intel TDX confidential VM with NVIDIA GPU attestation, and the host OS itself cannot read the memory. One depends on trust; the other makes trust irrelevant.
If you're benchmarking models or shipping a prototype with no PII in the prompt, an aggregator is the right call. Once patient records, financial data, or legal documents are included in the prompt, the aggregator's convenience layer becomes a compliance liability.
Per OLLM's published architecture, the router authenticates requests, validates model availability, and coordinates attestation. It doesn't inspect prompt or response data, doesn't substitute models, and doesn't perform inference. That architectural constraint is what makes per-request verification possible.
OLLM exposes an OpenAI-compatible API. Existing applications using any OpenAI SDK connect with a base URL change and a new API key. The engineering cost is low; the compliance posture change is not.
After every inference on OLLM, the TEE produces attestation artifacts that prove the specified model ran inside a verified hardware boundary. Auditors don't get a policy document; they get a cryptographic receipt that they can independently verify with the OLLM scanner every single time.

Introduction

Gartner predicted in March 2024 that over 30% of the increase in API demand by 2026 would come directly from LLMs and GenAI tools, and, separately, that more than 80% of enterprises will have deployed GenAI APIs or applications by 2026, up from less than 5% in 2023. Both predictions are landing on schedule. What neither forecast accounted for is the infrastructure panic that follows when a team that started with a quick API key discovers, six months into production, that their routing layer was never built to answer the question an auditor is about to ask.

A thread on r/LLMDevs titled "Best LLM Gateway" collected hundreds of developer responses debating OpenRouter, LiteLLM, and custom proxies. The clearest pattern is this: developers who started with aggregators for convenience hit a wall the moment compliance requirements came into play. The conversation keeps cycling back to the same gap: the tool handles routing beautifully, but nobody can tell the security team what happened to the prompt after it left the aggregator's network. The confusion is partly a naming problem; "Gateway" now covers everything from a hosted routing proxy to a hardware-secured execution environment with per-request cryptographic attestation. Treating them as interchangeable is how teams end up shipping production AI infrastructure that passes a developer demo and fails a SOC 2 review.

OLLM was built specifically for the teams that ran into that ceiling. Its architecture separates TEE-backed model execution from the routing layer entirely: TEE models run inside hardware-backed Trusted Execution Environments with hardware-enforced memory isolation from the host OS, hypervisor, and infrastructure, with encryption of data in use via Intel TDX confidential VMs and NVIDIA H100 GPU attestation, and a cryptographic attestation receipt generated per request. Over the next several sections, we'll break down exactly what an API aggregator controls and what it doesn't, how a confidential AI gateway changes the execution trust model at the hardware level, how the request lifecycle differs between the two architectures step by step, and when each is the right call for the workload you're actually running.

Why Developers Keep Confusing These Two Infrastructure Layers

Every vendor in the LLM infrastructure space uses "gateway" in their copy. OpenRouter calls itself a gateway. LiteLLM calls itself a gateway. OLLM calls itself a gateway. None of them are wrong, but they're describing entirely different layers of a system, and that loose terminology has a cost.

The Naming Problem That Obscures Real Architectural Tradeoffs

The confusion starts with how these tools market themselves. A quick scan of the LLM tooling space reveals that "gateway" has come to mean any layer between an application and a model provider, regardless of whether it performs routing, policy enforcement, observability, or hardware-level execution isolation. When every proxy, aggregator, and confidential execution layer shares the same label, developers end up picking infrastructure based on feature checklists rather than understanding what each architecture actually guarantees.

Teams choose an aggregator because it's easy to set up and offers hundreds of models through a single key. Months later, when the application processes real user data and a security review asks how prompts are handled, the answer is "the aggregator relays them to the upstream provider, and we trust the provider's ZDR policy." That's not an architectural answer; it's a contract answer. The gap only becomes visible when the stakes are high enough to care about the difference.

What Decision Engineers Are Actually Making When They Pick a Tool

There are two independent axes to consider when evaluating LLM infrastructure. The first is access breadth: how many providers and models can the tool reach, how is billing consolidated, and how quickly can you switch from GPT-4o to Claude Sonnet without touching application code? The second is execution trust: can you verify, per request and cryptographically, that your data was processed inside a trusted boundary?

For every TEE AI model inference request processed through OLLM, privacy guarantees are request-scoped, not platform-scoped. Each response stands on its own, with its own proof.

What is proven cryptographically

Using hardware attestation, OLLM enables customers to verify that:

The inference ran inside a genuine TEE
The execution environment was not tampered with
The environment matched the expected security measurements
The response was generated within that trusted boundary
These proofs are anchored in hardware root-of-trust mechanisms, not software assertions.

API aggregators are built almost entirely around the first axis. Confidential gateways like OLLM are built around the second. Most tools occupy exactly one of these axes, and the mistake is assuming that strong performance on one implies anything about the other. A tool with 600 models and excellent routing logic tells you nothing about what happens to your prompt once it leaves the aggregator's network and reaches the provider.

What an LLM API Aggregator Actually Does at the System Level

Understanding an aggregator at the system level requires looking past the developer experience and at the data path. The convenience is real, but so are the structural boundaries of what an aggregator can and cannot control.

Routing, Model Catalog, and Unified Billing as Core Functions

An API aggregator's primary job is normalization. Different LLM providers expose different API shapes, authentication schemes, and response structures. OpenAI uses one schema for chat completions; Anthropic uses another; models on HuggingFace add their own conventions for tool calling and context formatting. An aggregator absorbs all these differences and presents a single OpenAI-compatible endpoint to the calling application.

Billing consolidation is the other core function, i.e., instead of managing separate API keys, quota limits, and invoices for Anthropic, Google, Mistral, and a handful of open-source hosts, a team pays one aggregator and draws from a credit pool. For teams running experiments across many models, that operational simplicity has genuine value. OpenRouter, one of the most widely used aggregators, handles this for 200+ models and routes requests to provider infrastructure without requiring the caller to manage any of the provider-level credentials.

What Aggregators Do Not Own: The Execution Environment

The important architectural constraint is what an aggregator does not control. When a request leaves the aggregator's network and reaches the upstream provider, inference runs on the provider's own infrastructure. The aggregator has no visibility into, and no control over, how the provider handles the prompt during execution. The aggregator can contractually promise not to log data in transit. It cannot guarantee what happens inside the provider's inference stack.

ZDR flags on aggregators are policy instruments. They tell you the aggregator isn't retaining your data on its own systems. They say nothing about what happens at the provider's inference layer, what the provider logs, or whether the compute environment is isolated at the hardware level. For most workloads, that distinction doesn't matter. For workloads in regulated industries, it's the only distinction that matters.

Where Aggregators Break Down Under Compliance and Audit Requirements

Regulated environments, financial services under SOC 2, healthcare under HIPAA, and legal work involving privileged communications require more than policy assertions. An auditor asking, "Prove that this prompt was processed in a compliant environment," needs a verifiable answer. An aggregator produces documentation: terms of service, ZDR agreements, and provider compliance certifications. None of those is wrong, but none of them is proof.

When teams build on a hosted aggregator, they route through third-party-controlled infrastructure, which raises compliance concerns for some teams and adds latency for everyone. That's not a hypothetical risk; it's an architectural fact. The aggregator sits in the data path, and the application has no independent mechanism to verify what occurred during execution. Compliance teams working under GDPR Article 32 or HIPAA's technical safeguards requirements increasingly treat that gap as a blocker for production deployment.

How a Confidential AI Gateway Changes the Execution Trust Model

Where an aggregator abstracts provider differences, a confidential gateway changes what's provable about execution. The architecture shift is not at the API surface; it's in what happens between the request leaving the client and the response returning.

Trusted Execution Environments as the Foundation of Verifiable Privacy

A Trusted Execution Environment is a hardware-isolated region of memory and compute resources where code runs on the processor, with encryption enforced. The host operating system, the hypervisor, and any other process running on the same machine cannot read the contents of a TEE's memory, even with root access. The hardware enforces the isolation; no software policy can override it.

In the context of LLM inference, a TEE means that the model processes the prompt inside an enclave where the provider's own infrastructure has no visibility into the plaintext data. Intel TDX creates confidential virtual machines at this level. NVIDIA's confidential compute extends the same guarantees to GPU workloads, which is where actual inference runs for large models. OLLM's published architecture states explicitly that all inference for TEE AI models runs inside hardware-backed TEEs provided by OLLM's supported providers, with hardware-enforced memory isolation protecting data from the host OS, hypervisor, and infrastructure access.

Per-Request Attestation: What It Produces and How to Verify It

Attestation is the mechanism that makes TEE-backed inference verifiable to someone outside the execution environment. After inference runs inside a TEE, the hardware generates a cryptographic artifact that contains measurements of the execution environment: which model ran, which TEE platform was used, and whether the environment matched expected values. Anyone with the right public key can verify that artifact, without relying on the provider's word.

After sending a request, you can inspect it directly in the OLLM dashboard.

From the Messages view, you can:

See request status (Success, Failed, Pending)
View the selected model and provider
Confirm TEE attestation status (Verified / Pending / Failed)
Inspect cryptographic verification details

Per OLLM's architecture documentation, every TEE AI model request produces attestation artifacts that prove three things: the specified model ran inside a valid TEE, the execution environment matched expected measurements, and the response was generated within the trusted boundary. These artifacts are returned in the API response and can be independently verified using the OLLM scanner. OLLM's partnership with Phala extends this further: teams can now run frontier models on NVIDIA H200 GPUs with Intel TDX and AMD SEV protection, incurring only 0.5%-5% performance overhead, making full hardware-level privacy viable for production workloads without sacrificing meaningful throughput.

The Control Plane Constraint That Makes the Trust Model Work

OLLM's architecture makes an explicit design choice that most developers initially read as a limitation: the router does not perform automatic model selection or dynamic routing. The model specified in the request is the model that executes. OLLM's router authenticates the request, validates model availability and permissions, enforces security constraints, and coordinates attestation. It doesn't choose models, inspect inference data, or perform inference itself.

Verifiable inference is the core security guarantee provided by OLLM. It ensures that privacy is not based on policy, contractual assurances, or provider claims, but on cryptographic proof tied to each individual request.

Read as a security property rather than a missing feature, this constraint is load-bearing. Automatic model substitution would break the attestation chain: if the router can swap in a different model, the attestation artifact no longer proves what the caller requested. By making model selection entirely user-controlled, OLLM ensures that the cryptographic proof returned with the response refers to the exact model the application specified, rather than a "similar" model the router judged equivalent.

The Security Architecture Gap Between Routing and Attestation

Security claims in LLM infrastructure tend to cluster around data in transit, because TLS is table stakes and easy to advertise. The more substantive gap is what happens to data while it's being used.

Data in Transit vs. Data in Use: Where Encryption Differs

TLS encrypts data moving between your application and the aggregator, and between the aggregator and the provider. That's necessary, but it doesn't address data in use, which is the period when the model processes the prompt inside the provider's compute environment. Standard inference infrastructure decrypts the prompt at the compute layer so the GPU can process it. During that window, the prompt is in plaintext inside the provider's infrastructure, accessible to anyone with sufficient access to that environment.

Confidential computing closes that window; as per OLLM's documentation, the platform enforces encryption across the full request lifecycle: TLS between client, OLLM, and model providers for data in transit; hardware memory encryption via Intel TDX and NVIDIA confidential compute during inference for data in use; and encrypted configuration across the control plane. The prompt stays encrypted until the TEE processes it, and the TEE's hardware isolation prevents the provider's own infrastructure from reading the plaintext.

Insider Threat Surface and Infrastructure-Level Risks

An aggregator's managed infrastructure introduces a third-party network path with its own attack surface. A compromised aggregator, misconfigured logging, or an insider with access to request logs can expose prompt data even if the aggregator has good intentions. With a self-managed or hosted aggregator, your data passes through their infrastructure, and some organizations can't accept this due to regulatory requirements or data sensitivity.

OLLM's TEE model removes the aggregator itself from the threat model. Because OLLM's router cannot inspect raw prompt or response data outside the TEE boundary, a compromised OLLM infrastructure layer cannot expose inference data. The hardware guarantees what software policy cannot: even a fully compromised intermediary layer yields nothing from the TEE-protected inference. That's a materially different risk posture for any workload involving sensitive data.

Audit Trails and What Passes a Security Review

An aggregator's audit trail documents that a request was made, to which model, at what time, and for what cost. That's useful for billing attribution and basic operational monitoring. What it doesn't contain is any evidence about the execution environment, whether the prompt was processed in an isolated context, or whether the model that responded is the one requested.

You can track your OLLM usage in the OLLM dashboard, including:

Total requests
Token usage
Model usage
Verification status

OLLM's per-request attestation artifacts directly fill that gap. Each artifact contains hardware-level proof of execution that can be verified independently against the TEE's cryptographic measurements. When a security review asks for evidence of compliant processing, the answer is a signed artifact with a timestamp and model measurement, not a link to a compliance certificate page. For teams operating under HIPAA, GDPR Article 32, or financial data handling requirements, that distinction separates a tool that passes an audit from one that requires an exemption.

Request Lifecycle Compared: Aggregator Path vs. Confidential Gateway Path

Walking through the actual mechanics of how a request travels through each system makes the architectural difference concrete rather than abstract.

Aggregator Request Lifecycle: Client to Provider via Third-Party Network Path

With a typical aggregator, the flow looks like this: the application sends a request to the aggregator's endpoint; the aggregator authenticates the API key, resolves the model name to a specific provider, and may apply routing logic like failover or load balancing; the request travels through the aggregator's network to the upstream provider's API; the provider runs inference on its standard compute infrastructure; the response returns through the aggregator to the calling application.

At no point in this flow does the calling application have any visibility into the provider's execution environment. The aggregator can tell you which provider handled the request and how many tokens were consumed. It cannot tell you whether the prompt was processed in isolated memory, whether the provider logged it for model training, or whether the execution environment matches any particular security standard. The aggregator provides a unified API across multiple models; what happens beyond that proxy layer depends entirely on the provider's infrastructure and policies.

Confidential Gateway Request Lifecycle: Client to TEE with Hardware Receipt

OLLM's request flow, as documented in its architecture reference, works differently at every step after authentication. The client sends a request with an explicit model specification; OLLM authenticates the request and verifies the model is available and supported; the request is forwarded to the selected model's TEE-backed execution environment; hardware attestation is generated as part of the execution process; and the model output plus verification metadata return to the client together.

Two properties of this flow matter most. First, OLLM does not alter the model choice at any point. The requested model is the one that runs. Second, OLLM's software layer generates the attestation artifact by the hardware, so neither OLLM nor the provider can fabricate a passing attestation for a non-compliant execution.

OLLM supports multiple hardware attestation technologies to enable verifiable inference across different execution environments.

Migration Path for Teams Moving from Aggregator to Confidential Gateway

The API-level migration is straightforward. OLLM is OpenAI-compatible, so any application that uses an OpenAI SDK can connect by changing the base URL and swapping in an OLLM API key. Model name conventions may differ slightly between providers, but the request and response structures remain consistent. An existing application can migrate with minimal changes, without custom clients, wrappers, or SDK rewrites.

The non-trivial work is on the verification side, specifically deciding whether to act on the attestation artifacts OLLM returns with each response. For teams that want to verify execution per request, the response includes attestation metadata that can be checked against the OLLM scanner. For teams that primarily need the hardware-level isolation guarantee without building custom verification logic, the trust model holds by default, with no additional integration.

When Each Architecture Is the Right Fit

Neither architecture is universally correct. The right choice depends on what data moves through the prompt, what regulatory environment the application operates in, and what an audit actually requires.

Aggregators as the Correct Choice for Prototype, Experimentation, and Low-Stakes Workloads

For teams benchmarking models, building internal tools with no PII in the prompt, or running development and staging environments, an aggregator is the right tool. The breadth of the catalog, consolidated billing, and zero infrastructure overhead genuinely reduce friction. A hosted aggregator removes the need to maintain gateway infrastructure, making it a fast starting point when breadth across models matters more than execution verification.

The workload signals that point toward an aggregator: no regulated data in prompts, no audit requirement for execution provenance, development iteration speed as the primary concern, and willingness to accept the provider's standard compliance posture for each model. Most prototype workloads check all these boxes, and an aggregator handles them well.

Confidential Gateways as the Correct Choice for Regulated, Sensitive, and Production-Grade Workloads

Once the application processes PII, PHI, financial records, legal documents, or any data that triggers a compliance obligation, the aggregator's policy layer no longer satisfies the requirement. HIPAA's technical safeguard rules require access controls and audit controls that can be demonstrated, not merely claimed. GDPR Article 32 requires appropriate technical measures to ensure a level of security appropriate to the risk. A cryptographic attestation artifact is a technical measure; a ZDR agreement is a contractual one.

OLLM's architecture is designed for organizations that cannot rely on trust statements alone when handling sensitive data. Rather than replacing implicit trust with better contracts, it replaces implicit trust with cryptographic verification and hardware-enforced isolation. For a healthcare application processing clinical notes, a financial platform handling transaction data, or a legal tool working with privileged documents, the confidential gateway is the architecture that passes a meaningful security review, not just a checkbox audit.

The Hybrid Stack Reality: Aggregator for Dev, Confidential Gateway for Production

Many teams run both. An aggregator in development provides the iteration speed and model breadth that prototyping requires, without the latency overhead of confidential computing. Once the application moves to production and real user data enters the prompt, the base URL changes, and the application now routes through OLLM's confidential infrastructure. The performance overhead of TEE-backed inference runs between 0.5% and 5%, a trade-off that's operationally negligible for most production workloads compared to the compliance risk of running sensitive data through a standard aggregator.

The OpenAI SDK compatibility on both sides makes the switch mechanical. The application code doesn't change; the infrastructure posture does. That's the right division of concern: use the aggregator for speed when data sensitivity is low, and switch to the confidential gateway when it isn't.

Conclusion

LLM API aggregators and confidential AI gateways solve different problems at different layers of the stack. Aggregators solve the integration and access problem: one key, many models, consolidated billing, and no per-provider SDK work. Confidential gateways solve the execution trust problem: hardware-isolated inference, cryptographic proof of processing, and a control plane that cannot access the data it routes. Using an aggregator where a confidential gateway is needed doesn't leave a small gap; it leaves the entire execution layer unverified, and no contract fills an architectural gap.

The distinction becomes consequential the moment a production application handles data that a regulator, a legal team, or a security review will scrutinize. At that point, the question isn't "which tool has more models," it's "what can I prove about how this data was handled." OLLM's architecture answers that question with a hardware receipt per request, not a terms-of-service page.

FAQs

1. What Is The Difference Between An LLM API Aggregator And An LLM Gateway?

An LLM API aggregator, such as OpenRouter, normalizes multiple provider APIs into a single endpoint and consolidates billing. An LLM gateway, in the confidential computing sense, adds a hardware-enforced execution layer with per-request attestation. The API surface looks identical; the security posture differs at the execution layer, not the integration layer.

2. Can An API Aggregator Like OpenRouter Provide Zero Data Retention Guarantees?

Aggregators can contractually commit not to retain data in their own systems, but they relay requests to upstream providers whose infrastructure they don't control. ZDR on an aggregator is a policy commitment. ZDR on an aggregator is a policy commitment from the upstream provider. On OLLM, ZDR models carry the same policy-based guarantee: no prompts or responses are retained. Hardware-enforced guarantees with cryptographic attestation apply to OLLM's TEE models, which run inside Intel TDX confidential VMs provided by Phala and NEAR.

3. How Does Per-Request Attestation Work In A Confidential AI Gateway?

During inference within a TEE, the hardware generates a cryptographic artifact that contains measurements of the execution environment, thereby confirming that the specified model ran in a verified, isolated context. OLLM returns this artifact alongside the API response, and it can be independently verified through the OLLM scanner without trusting OLLM's own assertion that the execution was secure.

4. When Should A Team Switch From An LLM Aggregator To A Confidential AI Gateway In Production?

The moment production prompts contain PII, PHI, financial records, or legally privileged content, the aggregator's policy-layer guarantees are insufficient for most regulatory frameworks. Because OLLM is OpenAI-compatible, the migration is a base URL change, so the cost of switching is low relative to the compliance risk of not switching.

Build on Any Axis With Origin

Transform your development process with Origin's intelligent automation and persistent context management.

Start Building

Contact Sales

View Documentation

All logos, trademarks, and brand names of other companies displayed on this site are the property of their respective owners AND ARE ONLY INTENDED TO SHOWCASE THE MODELS AND INTEGRATIONS SUPPORTED, WITH NO CLAIMS OF PARTNERSHIP. All rights reserved to the respective companies.

Build on Any Axis With Origin

Transform your development process with Origin's intelligent automation and persistent context management.

Start Building

Contact Sales

View Documentation

Build on Any Axis With Origin

Transform your development process with Origin's intelligent automation and persistent context management.

Start Building

Contact Sales

View Documentation

ollm°

PROVIDERS

BLOG

SCANNER

GENERATE API KEY

ollm°

SCANNER

GENERATE API KEY

SCANNER