Token cap per request
Live
200K input / 8K output limits with graceful truncation and a clear notice.
- ✓ 200K input cap with structural truncation
- ✓ 8K output cap
- ✓ Truncation notice in every response
- ✓ Per-org override on enterprise plans
What it is.
Every LLM call has a 200K-token input cap (≈400-500 pages, depending on density) and an 8K-token output cap. If your document exceeds the input cap, we truncate it by section, preserving its structure, and tell you in the response what was dropped.
Caps prevent runaway costs and keep responses within the token-budget guarantees we make to you.
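As a rough illustration of the page estimate above, the sketch below works out the density implied by "≈400-500 pages" and uses it for a pre-flight check. It assumes "200K" means 200,000 tokens; the function names are hypothetical and not part of any product API, and real token counts depend on the tokenizer and the text.

```python
INPUT_CAP_TOKENS = 200_000  # assumption: "200K" read as 200,000 tokens


def tokens_per_page(cap_tokens: int, pages: int) -> float:
    """Average tokens per page if `pages` pages exactly fill the cap."""
    return cap_tokens / pages


def fits_input_cap(page_count: int, density: float) -> bool:
    """Rough check: does a document of `page_count` pages fit in one call?"""
    return page_count * density <= INPUT_CAP_TOKENS


dense = tokens_per_page(INPUT_CAP_TOKENS, 400)   # 500.0 tokens per page
sparse = tokens_per_page(INPUT_CAP_TOKENS, 500)  # 400.0 tokens per page

print(fits_input_cap(450, sparse))  # True: 450 * 400 = 180,000 tokens
print(fits_input_cap(450, dense))   # False: 450 * 500 = 225,000 tokens
```

So a 450-page document may or may not fit in a single call, depending on how dense its pages are.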
Three steps.
End to end.
Your input is checked against the 200K cap.
If it exceeds the cap, we truncate by section, preserving structure, and report the truncation in the response.
Output is capped at 8K so a runaway model can't burn through your budget on a single call.
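The three steps can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: `count_tokens` is a hypothetical word-count stand-in for a real tokenizer, the rule of keeping whole sections in order until the budget runs out is one assumed reading of "by section, preserving structure", and the shape of the returned notice is invented for the example.

```python
from dataclasses import dataclass
from typing import List

INPUT_CAP_TOKENS = 200_000  # input cap stated in the text
OUTPUT_CAP_TOKENS = 8_000   # assumption: "8K" read as 8,000 tokens


@dataclass
class Section:
    title: str
    body: str


def count_tokens(text: str) -> int:
    """Hypothetical stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def truncate_by_section(sections: List[Section], cap: int = INPUT_CAP_TOKENS) -> dict:
    """Keep whole sections in document order until the cap is reached."""
    kept: List[Section] = []
    dropped: List[str] = []
    used = 0
    cap_reached = False
    for section in sections:
        cost = count_tokens(section.title) + count_tokens(section.body)
        if not cap_reached and used + cost <= cap:
            kept.append(section)
            used += cost
        else:
            # Once one section does not fit, drop it and everything after it,
            # so the kept text is a clean prefix cut at a section boundary.
            cap_reached = True
            dropped.append(section.title)
    return {
        "sections": kept,
        "max_output_tokens": OUTPUT_CAP_TOKENS,  # step 3: output cap
        "notice": {"truncated": bool(dropped), "dropped_sections": dropped},
    }


doc = [
    Section("Intro", "short opening text"),         # 4 tokens with the title
    Section("Methods", "five words of body text"),  # 6 tokens with the title
    Section("Appendix", "extra tables"),            # 3 tokens with the title
]
result = truncate_by_section(doc, cap=10)
print(result["notice"])  # {'truncated': True, 'dropped_sections': ['Appendix']}
```

The notice is returned even when nothing is dropped, which matches the idea of a truncation notice in every response.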
Quick answers.
If a document is larger than 200K tokens, use bulk analysis to chunk it across multiple calls, or split it into logical sections yourself. The single-call cap stays at 200K.
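If you split a document yourself, one simple approach is to pack consecutive sections into batches that each stay under the single-call cap. The sketch below assumes you already have a token count per section; `batch_sections` is a hypothetical helper, not part of the bulk analysis feature.

```python
from typing import List, Tuple

INPUT_CAP_TOKENS = 200_000  # single-call input cap from the text


def batch_sections(
    sections: List[Tuple[str, int]], cap: int = INPUT_CAP_TOKENS
) -> List[List[str]]:
    """Group (title, token_count) pairs, in order, into batches under the cap."""
    batches: List[List[str]] = []
    current: List[str] = []
    used = 0
    for title, tokens in sections:
        if tokens > cap:
            # A single oversized section has to be split further before batching.
            raise ValueError(f"section {title!r} alone exceeds the cap")
        if current and used + tokens > cap:
            batches.append(current)  # close the current batch, start a new one
            current, used = [], 0
        current.append(title)
        used += tokens
    if current:
        batches.append(current)
    return batches


calls = batch_sections(
    [("A", 120_000), ("B", 90_000), ("C", 60_000), ("D", 150_000)]
)
print(calls)  # [['A'], ['B', 'C'], ['D']]
```

Each inner list would become one call, so this example document needs three calls.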
More in AI Controls & Guardrails.
We do not train models on customer content without explicit opt-in. Default is off.
User-pasted text is sanitised before reaching the LLM.
Daily and monthly cost ceilings per organisation. Soft warnings, hard cutoffs.
Stable prompt prefixes are cached. You get faster responses and a lower bill, because the savings are passed on to you.