One of the most common questions we get: "How much does it cost to run 8 AI agents every day?"
The answer shocks people: most months, very little. Some months, nothing for model API costs at all.
This isn't magic; it's model routing. OpenClaw lets you define which model handles which type of task, and both Groq and Google offer genuinely usable free tiers that cover the vast majority of what a small AI agency actually does. This guide shows exactly how we configure it.
Understanding the Free Tiers
Before touching OpenClaw config, it's worth understanding what you're working with:
| Provider | Model | Rate Limit | Cost | Context | Best For |
|---|---|---|---|---|---|
| Groq | llama-3.3-70b-versatile | 30 req/min | Free | 128k | Fast inference, summaries, analysis |
| Groq | gemma2-9b-it | 30 req/min | Free | 8k | Lightweight tasks, quick classification |
| Google | gemini-1.5-flash | 15 req/min | Free | 1M | Long docs, research, multimodal |
| Google | gemini-2.0-flash | 15 req/min | Free | 1M | Research, complex reasoning, long context |
| Anthropic | claude-haiku-3.5 | Pay per token | Paid | 200k | Orchestration, nuanced instructions |
| OpenAI | gpt-4.1-mini | Pay per token | Paid | 1M | High-quality writing, client-facing content |
The strategy is simple: use free models for everything that doesn't touch a client or go live. Internal analysis, system monitoring, data aggregation, draft summaries: all of this goes through Groq or Gemini. Only the final outputs that clients see get routed to paid models.
Adding Free Models to OpenClaw
OpenClaw manages all model credentials in its config. Adding Groq and Gemini is straightforward:
```shell
# Get your free API keys
# Groq: https://console.groq.com (free account, no card required)
# Gemini: https://aistudio.google.com/app/apikey (free tier, Google account)

# Add them to OpenClaw
openclaw config add-provider groq \
  --api-key YOUR_GROQ_KEY \
  --base-url https://api.groq.com/openai/v1

openclaw config add-provider gemini \
  --api-key YOUR_GEMINI_KEY \
  --base-url https://generativelanguage.googleapis.com/v1beta/openai
```
Both Groq and Gemini expose OpenAI-compatible endpoints. That's why the --base-url flag works: OpenClaw treats them identically to OpenAI, so no special driver is needed.
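Because both providers speak the same dialect, an identical request body works against either endpoint; only the base URL and model name change. A quick illustration (the helper below is our own sketch, not part of OpenClaw):

```python
# Build an OpenAI-style chat-completions request for any compatible provider.
# Only base_url and model differ; the payload shape is identical.

PROVIDER_BASE_URLS = {
    "groq": "https://api.groq.com/openai/v1",
    "gemini": "https://generativelanguage.googleapis.com/v1beta/openai",
}

def build_chat_request(provider: str, model: str, prompt: str) -> dict:
    """Return the URL and JSON body for a chat-completions call."""
    return {
        "url": f"{PROVIDER_BASE_URLS[provider]}/chat/completions",
        "body": {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        },
    }

req = build_chat_request("groq", "llama-3.3-70b-versatile", "Summarize this log.")
# The same body shape would go to Gemini's endpoint unchanged.
```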
Verify both providers are registered:
```shell
openclaw config list-providers

# Expected output:
Providers registered: 4
✓ anthropic [claude-haiku-3.5, claude-sonnet-4-5]
✓ openai [gpt-4.1-mini]
✓ groq [llama-3.3-70b-versatile, gemma2-9b-it]
✓ gemini [gemini-1.5-flash, gemini-2.0-flash]
```
Configuring Agent-Level Model Routing
This is where the cost savings happen. Each agent in OpenClaw can be assigned a default model. Here's how our eight agents are configured:
```yaml
# /opt/openclaw/agents.yaml
agents:
  cobalt:
    model: claude-haiku-3.5
    provider: anthropic
    reason: "Orchestrator: needs reliable instruction-following"
  helix:
    model: gemini-2.0-flash
    provider: gemini
    reason: "Research: long context, free tier, perfect fit"
  surge:
    model: gpt-4.1-mini
    provider: openai
    reason: "Blog writing: client-facing, quality matters"
  vega:
    model: llama-3.3-70b-versatile
    provider: groq
    reason: "Social posts: fast, short output, free"
  lyra:
    model: gemini-1.5-flash
    provider: gemini
    reason: "SEO analysis: long context docs, free"
  prism:
    model: fal-ai/flux/schnell
    provider: fal
    reason: "Image generation: separate billing, minimal usage"
  kova:
    model: llama-3.3-70b-versatile
    provider: groq
    reason: "Report compilation: aggregation task, free"
  optimum:
    model: gemma2-9b-it
    provider: groq
    reason: "System monitoring: lightweight checks, free"
```
With this config, six of eight agents run entirely on free tiers. Only Cobalt (Haiku) and Surge (GPT-4.1-mini) touch paid models, and Surge only runs once per week.
The Routing Decision Tree
*[Decision tree: How We Decide Which Model Gets a Task]*
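In code form, the tree boils down to two questions: is the output client-facing, and how much context does the task need? A simplified sketch (the `route` function and its thresholds are illustrative, not OpenClaw config):

```python
# Hypothetical sketch of our routing rules as code. The questions mirror
# the decision tree: client-facing output? long context? lightweight check?

def route(task_type: str, client_facing: bool, context_tokens: int) -> str:
    if client_facing:
        # Paid models only for work a client will actually see.
        return "gpt-4.1-mini" if task_type == "writing" else "claude-haiku-3.5"
    if context_tokens > 100_000:
        # Long documents go to Gemini's free 1M-context models.
        return "gemini-2.0-flash"
    if task_type == "monitoring":
        return "gemma2-9b-it"         # lightweight checks
    return "llama-3.3-70b-versatile"  # default free workhorse

route("summary", client_facing=False, context_tokens=4_000)
# -> "llama-3.3-70b-versatile"
```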
Rate Limit Management
Free tiers have rate limits: Groq allows 30 requests per minute; Gemini allows 15. In a multi-agent system, you need to make sure several agents aren't hammering the same provider's limit at once.
OpenClaw handles this with a built-in rate limiter per provider:
```shell
# In openclaw.json (edit via the openclaw CLI, not directly)
openclaw config set groq.rate_limit_rpm 25
openclaw config set gemini.rate_limit_rpm 12
```
We set these slightly below the actual limits as a safety margin. The gateway queues requests automatically; agents don't need to know about this.
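Conceptually, a per-provider limiter like this behaves as a sliding window over the last 60 seconds. A rough Python sketch of the idea (not OpenClaw's actual implementation):

```python
import time
from collections import deque

class RateLimiter:
    """Minimal sliding-window limiter, similar in spirit to what the
    gateway does per provider: block until a request slot is free."""

    def __init__(self, max_per_minute: int):
        self.max_per_minute = max_per_minute
        self.sent = deque()  # timestamps of recent requests

    def acquire(self):
        now = time.monotonic()
        # Drop timestamps older than the 60-second window.
        while self.sent and now - self.sent[0] >= 60:
            self.sent.popleft()
        if len(self.sent) >= self.max_per_minute:
            # Sleep until the oldest request ages out of the window.
            time.sleep(60 - (now - self.sent[0]))
            self.sent.popleft()  # that slot has now expired
        self.sent.append(time.monotonic())

groq_limiter = RateLimiter(max_per_minute=25)  # mirrors rate_limit_rpm 25
```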
Don't schedule multiple free-tier agents on the same provider to trigger at the same minute. Stagger your crons by at least 2-3 minutes between agents. If Lyra and Helix (both on Gemini) both fire at 0 8 * * 1, you may hit limits and get silent failures.
Our weekly schedule is deliberately staggered:
```shell
# Staggered cron schedule to avoid rate limit collisions
0 8 * * 1    /opt/masterclaw/crons/trigger-lyra.sh        # Lyra: Monday 08:00 (Gemini)
30 8 * * 1   /opt/masterclaw/crons/trigger-helix.sh       # Helix: Monday 08:30 (Gemini)
0 9 * * 1    /opt/masterclaw/crons/trigger-surge.sh       # Surge: Monday 09:00 (GPT)
0 18 * * *   /opt/masterclaw/crons/trigger-vega.sh        # Vega: Daily 18:00 (Groq)
0 10 * * *   /opt/masterclaw/crons/trigger-optimum.sh     # Optimum: Daily 10:00 (Groq)
30 23 * * *  /opt/masterclaw/crons/kova-daily-report.sh   # Kova: Daily 23:30 (Groq)
```
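If you want to catch collisions before they bite, a few lines of Python can check that no two agents on the same provider share a firing minute (the schedule data mirrors our crontab; the checker itself is our own helper, deliberately coarse about weekly-vs-daily cadence):

```python
# Sanity check: no two agents on the same free-tier provider should
# share a firing minute.

schedule = [
    ("lyra",    "gemini", "08:00"),
    ("helix",   "gemini", "08:30"),
    ("vega",    "groq",   "18:00"),
    ("optimum", "groq",   "10:00"),
    ("kova",    "groq",   "23:30"),
]

def find_collisions(entries):
    seen, collisions = {}, []
    for agent, provider, minute in entries:
        key = (provider, minute)
        if key in seen:
            collisions.append((seen[key], agent, key))
        seen[key] = agent
    return collisions

find_collisions(schedule)  # -> [] for the schedule above
```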
Fallback Routing for Reliability
Free-tier outages are rare, but they happen. We configure fallback models so tasks don't silently fail:
```shell
openclaw config set-fallback helix \
  --primary gemini-2.0-flash \
  --fallback llama-3.3-70b-versatile

openclaw config set-fallback lyra \
  --primary gemini-1.5-flash \
  --fallback llama-3.3-70b-versatile
```
If Gemini is unreachable, these agents automatically retry on Groq. The fallback itself is also free. Only if both fail does the task error out and get logged for Kova's daily report.
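The fallback logic itself is simple: try the primary, retry once on the backup, and let a double failure propagate. A minimal sketch with stand-in callables (illustrative, not OpenClaw internals):

```python
# Sketch of primary/fallback routing: try the primary provider, fall back
# to the free Groq model if it fails.

def call_with_fallback(primary, fallback):
    """primary/fallback are callables that perform the model request."""
    try:
        return primary()
    except Exception as exc:
        # Log and retry on the fallback; if that also fails, the error
        # propagates and lands in the daily report.
        print(f"primary failed ({exc}); retrying on fallback")
        return fallback()

def flaky_gemini():
    raise ConnectionError("Gemini unreachable")

def groq_backup():
    return "response from llama-3.3-70b-versatile"

call_with_fallback(flaky_gemini, groq_backup)
# -> "response from llama-3.3-70b-versatile"
```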
Our monthly spend: ~$2–6 on Haiku for Cobalt orchestration, ~$1–3 on GPT-4.1-mini for Surge's weekly article. Everything else is zero. Six agents, zero cost, running daily.
Monitoring Free Tier Usage
You need to know when you're approaching rate limits, especially if your usage grows. Optimum checks token usage as part of its daily monitoring sweep:
```shell
# Optimum checks usage via OpenClaw's usage endpoint
curl -s http://localhost:18789/v1/usage/today | \
  jq '.providers[] | select(.name == "groq" or .name == "gemini")'
```

Example response (jq emits one object per match, newline-separated):

```json
{
  "name": "groq",
  "requests_today": 47,
  "tokens_today": 82400,
  "estimated_cost": 0.00
}
{
  "name": "gemini",
  "requests_today": 23,
  "tokens_today": 341200,
  "estimated_cost": 0.00
}
```
Optimum writes this to its daily output file, and Kova includes a usage summary in the nightly report. If `requests_today` on either provider approaches 1,000, Kova flags it for review.
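The flagging rule is easy to sketch: parse the usage JSON and warn when a provider crosses a chosen fraction of the daily cap (the 80% warning threshold below is our choice; the ~1,000/day figure comes from the rule above):

```python
import json

# Sketch of the usage check: flag any provider whose daily request
# count is approaching the ballpark daily cap.

DAILY_LIMIT = 1000
WARN_RATIO = 0.8  # flag at 80% of the limit

usage_json = '''
[{"name": "groq",   "requests_today": 847, "tokens_today": 82400},
 {"name": "gemini", "requests_today": 23,  "tokens_today": 341200}]
'''

def flag_providers(raw: str):
    return [p["name"] for p in json.loads(raw)
            if p["requests_today"] >= DAILY_LIMIT * WARN_RATIO]

flag_providers(usage_json)  # -> ["groq"]
```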
When to Upgrade from Free
Free tiers are a starting point, not a ceiling. You'll know it's time to move to paid when: your agents need to respond in real-time (free tiers have variable latency), you're regularly hitting daily token caps, or a client deliverable depends on a Gemini/Groq task that's timing out during rate-limit queuing.
When that happens, the upgrade path is just a one-line config change in OpenClaw; the agents themselves don't change at all.
For the full picture of how these agents coordinate, see our guide on running 8 AI agents on one server. And if you're starting from scratch and hitting gateway issues before even reaching model routing, check out our OpenClaw error troubleshooting guide first.