From $0.60 to $0.02: My Journey into Agentic Vibe Coding and Extreme Cost Optimization

In the world of modern software development, we are witnessing a paradigm shift. We’ve moved past simple "chat-with-code" to something far more potent: Agentic Vibe Coding. It’s the art of steering high-level intent while AI agents handle the "how"—the terminal commands, the file searching, and the multi-file refactoring.
But as I discovered on my Arch Linux dev machine while building this-is-adix.dev, this power comes with a price tag. If you aren't careful, a single "vibe" session can cost you as much as a fancy coffee.
Here is the story of how I tamed the beast, optimized my Continue setup, and slashed my API costs by over 90%.
The "Context Bloat" Wake-up Call
I started using Continue with Claude 4.6 Sonnet via OpenRouter. The performance was breathtaking, but the bills were... eye-opening. A simple request to add a theme-aligned popup cost me $0.60.
Why? Because the agent was "blindly" reading full files and I was paying the "Open Tabs Tax." Every time I asked a question, Continue was sending every open file in my IDE as context. Combined with the agent's tendency to be overly chatty, I was burning tokens for "politeness" and redundant code explanations.
The Strategy: The "Silent Detective" Protocol
To fix this, I redesigned my config.yaml and systemMessage around three pillars: Frugality, Autonomy, and Silence.
1. The Detective Rule
Instead of giving the model all the context upfront, I taught it to be a detective. I instructed the agent to use terminal tools (rg, grep, ls) to map the codebase first. It now reads only the specific lines it needs.
2. The Silent Protocol
I stripped away the AI’s "personality." I don't need a "Sure, I can help with that!" I need code changes. I implemented a rule where responses must be under 15 words unless explaining complex logic.
3. The Hybrid Model Approach
Why use a "heavy" model for everything? I moved to a hybrid setup:
Claude 4.6 Sonnet: The "Lead Engineer" for complex refactoring and security logic.
DeepSeek V4 Pro: The "Fast Executor" for boilerplate, unit tests, and—crucially—Tab Autocomplete. It’s 10x cheaper and incredibly snappy.
The Modern config.yaml
Here is the "battle-tested" configuration I’m running now. It uses the latest 2026 Continue schema, focusing on Prompt Caching and Agentic Awareness.
YAML
# ~/.continue/config.yaml
ui:
allowAutomaticallyAddedContext: false # No more "Open Tabs Tax"
models:
- name: "Claude 4.6 Sonnet (Engineer)"
provider: openrouter
model: anthropic/claude-3.5-sonnet
requestOptions:
headers:
"X-OpenRouter-Caching": "true" # Massive savings on repeated context
- name: "DeepSeek V4 Pro (Worker)"
provider: openrouter
model: deepseek/deepseek-v4-pro
tabAutocompleteModel:
title: "DeepSeek Coder"
provider: openrouter
model: deepseek/deepseek-v4-pro
context:
- provider: os # Agent knows it's on Arch Linux
- provider: diff # Essential for cheap, effective Code Reviews
- provider: repo-map
params:
includeSignatures: false # Just the map, save those tokens!
The System Message: My Secret Sauce
This is what keeps the agent in check. It’s a "Token-Frugal" instruction set that forces the model to act instead of talk.
XML
<important_rules>
You are a Token-Frugal, Silent Agent.
1. THE SILENCE RULE: Responses under 15 words. No filler. No pleasantries.
2. TOOL-CENTRIC: Always 'grep' or 'rg' before reading. Never 'cat' a 3000-line file.
3. FORBIDDEN: Do not repeat code in chat that you are already applying via 'edit' tools.
</important_rules>
The Result: Real-world Testing
I put this setup to the test with a security task: Implementing a multi-step password change feature.
The Implementation: The agent used
grepto find existing auth patterns, copied the BCRYPT cost settings, updated the routes, and built the frontend. Total cost: $0.29.The Code Review: I used the
@git-diffprovider to ask for a review of the changes.Before Optimization: This would have sent the whole controller and cost ~$0.16.
After Optimization: It sent only the diff. Total cost: $0.02.
Final Thoughts
Vibe coding isn't about being lazy; it's about being an orchestrator. By treating context as a precious resource and moving to an agentic "pull" model (where the AI finds info) rather than a "push" model (where you give it everything), you can build complex systems for pennies.
Stay lean, stay silent, and let the agents do the heavy lifting.
Happy (Frugal) Coding!