Cheap inference for open-weight models

Run open-weight models — MiniMax, Kimi, GLM 5.2 — at fixed USD per-token rates with a drop-in OpenAI-compatible endpoint.

Get started

inference

$ export OPENAI_BASE_URL=https://api.cheaptokens.dev/v1
› ready · model minimax-m2.7
  open-weight · fixed USD · ~10× cheaper
▌

Open-weight, priced low

Open-weight models on cost-efficient infrastructure at a fixed USD margin — roughly 10× less per token than OpenRouter on the same model. Predictable, no subscription.

Drop-in OpenAI-compatible

Same OpenAI API, same open-weight models — just a different base URL and key. Pay-as-you-go from a USD balance topped up with USDT. No cards, no KYC.

Drop-in: change the base URL

env

# Open-weight, OpenAI-compatible
export OPENAI_BASE_URL="https://api.cheaptokens.dev/v1"
export OPENAI_API_KEY="YOUR_CHEAPTOKENS_KEY"

Same model, a fraction of the price

Per-million-token rates versus OpenRouter on the same model.

Model	OpenRouter	cheaptokens	You save
MiniMax-M2.7	$0.24 / $0.96	$0.025 / $0.1	~10× cheaper
Kimi-K2.6	$0.66 / $3.41	$0.07 / $0.35	~9× cheaper

Same models. USD per 1M tokens (input / output). OpenRouter shown for comparison. · Open cost calculator

FAQ

Does Hermes connect over the OpenAI API?

Yes — point Hermes at our OpenAI-compatible base URL and key.

Are agentic, multi-step runs supported?

Yes — tool use and multi-step workflows pass through unchanged.

Is pricing fixed during a run?

Yes — per-model, per-token USD that doesn't change mid-run.

Related use cases

For AI agents Cheapest AI tokens For coding

Cheap open-weight inference

Create an account, top up with USDT, and point your client at cheaptokens.

Get started

Open-weight model names are trademarks of their respective owners. Compatible via the standard OpenAI API.