Cheap inference for open-weight models

Run open-weight models — MiniMax, Kimi, GLM 5.2 — at fixed USD per-token rates with a drop-in OpenAI-compatible endpoint.

Open-weight, priced low

Open-weight models on cost-efficient infrastructure at a fixed USD margin — roughly 10× less per token than OpenRouter on the same model. Predictable, no subscription.

Drop-in OpenAI-compatible

Same OpenAI API, same open-weight models — just a different base URL and key. Pay-as-you-go from a USD balance topped up with USDT. No cards, no KYC.

Drop-in: change the base URL

env
# Open-weight, OpenAI-compatible
export OPENAI_BASE_URL="https://api.cheaptokens.dev/v1"
export OPENAI_API_KEY="YOUR_CHEAPTOKENS_KEY"

Same model, a fraction of the price

Per-million-token rates versus OpenRouter on the same model.

Model OpenRouter cheaptokens You save
MiniMax-M2.7 $0.24 / $0.96 $0.025 / $0.1 ~10× cheaper
Kimi-K2.6 $0.66 / $3.41 $0.07 / $0.35 ~9× cheaper

Same models. USD per 1M tokens (input / output). OpenRouter shown for comparison. · Open cost calculator

FAQ

Does Hermes connect over the OpenAI API?

Yes — point Hermes at our OpenAI-compatible base URL and key.

Are agentic, multi-step runs supported?

Yes — tool use and multi-step workflows pass through unchanged.

Is pricing fixed during a run?

Yes — per-model, per-token USD that doesn't change mid-run.

Cheap open-weight inference

Create an account, top up with USDT, and point your client at cheaptokens.

Get started

Open-weight model names are trademarks of their respective owners. Compatible via the standard OpenAI API.