Configure different models for different tasks. Use a cheap model (Haiku, GPT-4o-mini) for simple queries and a powerful model (Sonnet, GPT-4) only for complex reasoning. This alone can cut costs by 40-60%.
Use the /compact command regularly to summarize older conversation turns. This reduces token usage on every subsequent message. Configure auto-compaction in your settings.
Configure maximum token limits per conversation and per day. This prevents runaway costs from long conversations or looping agent behaviors.
Run Ollama with a small model (Llama 3 8B, Qwen 7B) for simple tasks like calendar checks, quick answers, and file operations. Reserve cloud API calls for complex reasoning.
Need professional help with OpenClaw?
Skip the troubleshooting. We handle setup, security, and integrations.
Book a Setup Call → View PricingSee our full pricing breakdown for iClaudebot setup services.