Cheap inference for open-weight models
Run open-weight models — MiniMax, Kimi, GLM 5.2 — at fixed USD per-token rates with a drop-in OpenAI-compatible endpoint.
Open-weight, priced low
Open-weight models on cost-efficient infrastructure at a fixed USD margin — roughly 10× less per token than OpenRouter on the same model. Predictable, no subscription.
Drop-in OpenAI-compatible
Same OpenAI API, same open-weight models — just a different base URL and key. Pay-as-you-go from a USD balance topped up with USDT. No cards, no KYC.
Drop-in: change the base URL
# Open-weight, OpenAI-compatible
export OPENAI_BASE_URL="https://api.cheaptokens.dev/v1"
export OPENAI_API_KEY="YOUR_CHEAPTOKENS_KEY" Same model, a fraction of the price
Per-million-token rates versus OpenRouter on the same model.
| Model | OpenRouter | cheaptokens | You save |
|---|---|---|---|
| MiniMax-M2.7 | $0.24 / $0.96 | $0.025 / $0.1 | ~10× cheaper |
| Kimi-K2.6 | $0.66 / $3.41 | $0.07 / $0.35 | ~9× cheaper |
Same models. USD per 1M tokens (input / output). OpenRouter shown for comparison. · Open cost calculator
FAQ
Does Hermes connect over the OpenAI API?
Yes — point Hermes at our OpenAI-compatible base URL and key.
Are agentic, multi-step runs supported?
Yes — tool use and multi-step workflows pass through unchanged.
Is pricing fixed during a run?
Yes — per-model, per-token USD that doesn't change mid-run.
Related use cases
Cheap open-weight inference
Create an account, top up with USDT, and point your client at cheaptokens.
Get startedOpen-weight model names are trademarks of their respective owners. Compatible via the standard OpenAI API.