AI Controls & Guardrails

Token cap per request

Live

200K input / 8K output limits with graceful truncation and a clear notice.

What you get
  • 200K input cap with structural truncation
  • 8K output cap
  • Truncation notice in every response
  • Per-org override on enterprise plans

Overview

What it is.

Every LLM call has a 200K input token cap (≈400-500 pages depending on density) and an 8K output token cap. If your document exceeds the input cap, we truncate it by section, preserving its structure, and report the truncation in the response so you know what was dropped.

Caps prevent runaway costs and keep responses within the token-budget guarantees we make to you.

How it works

Three steps.
End to end.

1. Call goes out

Input checked against the 200K cap.

2. Intelligent truncation if needed

We truncate by section, preserving structure. Truncation is reported to you.
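As a rough illustration of what section-level truncation could look like, here is a minimal sketch. It is not the production implementation: the `Section` type, the `truncate_by_section` helper, and the whitespace-based `count_tokens` are all hypothetical stand-ins, and a real tokenizer would count differently. The sketch keeps each whole section that still fits in the remaining budget, in document order, and records the titles of the sections it drops so they can be reported.

```python
from dataclasses import dataclass

INPUT_CAP = 200_000  # input token cap stated on this page


@dataclass
class Section:
    title: str
    body: str


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words stand in for tokens.
    return len(text.split())


def truncate_by_section(sections, cap=INPUT_CAP):
    """Keep every whole section that fits in the remaining budget.

    Sections are never cut in the middle. Returns a tuple of
    (kept_sections, dropped_titles) so the caller can report what was dropped.
    """
    kept, dropped, used = [], [], 0
    for section in sections:
        cost = count_tokens(section.title) + count_tokens(section.body)
        if used + cost <= cap:
            kept.append(section)
            used += cost
        else:
            dropped.append(section.title)
    return kept, dropped
```

Dropping whole sections rather than cutting at an arbitrary token offset is one way to keep headings and their content together, which is the sense of "preserving structure" assumed here.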

3. Response capped at 8K out

Output capped so a runaway model can't burn through your budget on a single call.
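To show why an output cap bounds the cost of a single call, the sketch below stops reading from a token stream once the cap is reached. The `runaway_model` generator and the `capped_output` helper are hypothetical names used only for illustration; they do not describe how the service enforces the cap internally.

```python
from itertools import count, islice

OUTPUT_CAP = 8_000  # output token cap stated on this page


def runaway_model():
    # Hypothetical stand-in for a model that never stops generating.
    for i in count():
        yield f"tok{i}"


def capped_output(token_stream, cap=OUTPUT_CAP):
    """Consume at most `cap` tokens from the stream and return them as a list."""
    return list(islice(token_stream, cap))
```

Even with an endless stream, `capped_output(runaway_model())` returns after reading 8,000 tokens, so the worst-case output cost of one call is fixed in advance.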

FAQ

Quick answers.

How do you handle a 1000-page document?

Use bulk analysis to chunk it across multiple calls, or split it into logical sections yourself. The single-call cap stays at 200K input tokens.
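If you split the document yourself, one possible approach is to group consecutive sections into chunks that each fit under the cap. At the 400-500 tokens per page implied by the figures above, a 1000-page document is roughly 400K-500K tokens, so expect at least two to three calls. The sketch below is an assumption-laden illustration, not the bulk analysis feature itself: `chunk_sections` and the whitespace-based `count_tokens` are hypothetical helpers.

```python
INPUT_CAP = 200_000  # single-call input token cap stated on this page


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words stand in for tokens.
    return len(text.split())


def chunk_sections(sections, cap=INPUT_CAP):
    """Group consecutive section texts into chunks that each fit under `cap`.

    Raises ValueError if a single section is larger than the cap,
    because it would need to be split further before it can be sent.
    """
    chunks, current, used = [], [], 0
    for text in sections:
        cost = count_tokens(text)
        if cost > cap:
            raise ValueError("a single section exceeds the cap; split it further")
        if current and used + cost > cap:
            # The current chunk is full: close it and start a new one.
            chunks.append(current)
            current, used = [], 0
        current.append(text)
        used += cost
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be sent as its own call, and because chunk boundaries fall between sections, no section is split across calls.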

Related

More in AI Controls & Guardrails.

No model training on your data
Live

We do not train models on customer content without explicit opt-in. Default is off.

Prompt injection defence
Live

User-pasted text is sanitised before reaching the LLM.

Per-org cost guardrails
Live

Daily and monthly cost ceilings per organisation. Soft warnings, hard cutoffs.

Prompt caching
Live

Stable prompt prefixes are cached. Faster responses and a lower bill, with the savings passed on to you.

Want to try Token cap per request?
Get started in 60 seconds.

Sign up →