Question 1

Use model routing

Accepted Answer

Configure different models for different tasks. Use a cheap model (Haiku, GPT-4o-mini) for simple queries and a powerful model (Sonnet, GPT-4) only for complex reasoning. This alone can cut costs by 40-60%.

Question 2

Enable context compaction

Accepted Answer

Use the /compact command regularly to summarize older conversation turns. This reduces token usage on every subsequent message. Configure auto-compaction in your settings.

Question 3

Set token budgets

Accepted Answer

Configure maximum token limits per conversation and per day. This prevents runaway costs from long conversations or looping agent behaviors.

Question 4

Use local models for simple tasks

Accepted Answer

Run Ollama with a small model (Llama 3 8B, Qwen 7B) for simple tasks like calendar checks, quick answers, and file operations. Reserve cloud API calls for complex reasoning.

Spending Too Much on OpenClaw? 10 Ways to Cut Your Existing Bill

Need professional help with OpenClaw?