From $0.60 to $0.02: My Journey into Agentic Vibe Coding and Extreme Cost Optimization

Adrian Kuczyński
Senior Security Developer
Software development is going through a paradigm shift. We’ve moved past simple "chat-with-code" to something far more potent: Agentic Vibe Coding. It’s the art of steering high-level intent while AI agents handle the "how": the terminal commands, the file searching, and the multi-file refactoring.

But as I discovered on my Arch Linux dev machine while building this-is-adix.dev, this power comes with a price tag. If you aren't careful, a single "vibe" session can cost you as much as a fancy coffee.

Here is the story of how I tamed the beast, optimized my Continue setup, and slashed my API costs by over 90%.

The "Context Bloat" Wake-up Call

I started using Continue with Claude 4.6 Sonnet via OpenRouter. The performance was breathtaking, but the bills were... eye-opening. A simple request to add a theme-aligned popup cost me $0.60.

Why? Because the agent was "blindly" reading full files and I was paying the "Open Tabs Tax." Every time I asked a question, Continue was sending every open file in my IDE as context. Combined with the agent's tendency to be overly chatty, I was burning tokens for "politeness" and redundant code explanations.
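To get a feel for how quickly the "Open Tabs Tax" adds up, here is a rough back-of-the-envelope sketch. The 4-characters-per-token rule of thumb, the price per million input tokens, and the sample session are all assumptions for illustration, not numbers from my bills or from any provider's price list.

```python
# Rough estimate of what it costs to resend every open file as context.
# ASSUMPTIONS: ~4 characters per token and a hypothetical input price.

CHARS_PER_TOKEN = 4            # rule of thumb, not a real tokenizer
PRICE_PER_MILLION_INPUT = 3.0  # hypothetical USD price per 1M input tokens


def estimate_tokens(text: str) -> int:
    """Approximate the token count of a piece of text."""
    return len(text) // CHARS_PER_TOKEN


def context_cost(open_files: dict[str, str], requests: int) -> float:
    """Estimated USD cost of sending all open files on every request."""
    tokens_per_request = sum(estimate_tokens(body) for body in open_files.values())
    return tokens_per_request * requests * PRICE_PER_MILLION_INPUT / 1_000_000


if __name__ == "__main__":
    # Hypothetical session: five open files of 40,000 characters each,
    # resent on each of ten requests.
    tabs = {f"file_{i}.txt": "x" * 40_000 for i in range(5)}
    print(f"${context_cost(tabs, requests=10):.2f}")
```

The exact figures will differ with a real tokenizer and real pricing, but the shape holds: the cost scales with open files multiplied by requests, not with the size of the change you asked for.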

The Strategy: The "Silent Detective" Protocol

To fix this, I redesigned my config.yaml and systemMessage around three pillars: Frugality, Autonomy, and Silence.

1. The Detective Rule

Instead of giving the model all the context upfront, I taught it to be a detective. I instructed the agent to use terminal tools (rg, grep, ls) to map the codebase first. It now reads only the specific lines it needs.
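The "search first, then read only what matched" idea can be sketched in a few lines. This is a plain-Python stand-in for what rg -n -C 2 does, so it runs without extra tools. The function name find_matches and its parameters are hypothetical, and the agent itself calls the real terminal tools rather than anything like this.

```python
import re
from pathlib import Path


def find_matches(root: str, pattern: str, context: int = 2) -> list[str]:
    """Return only the lines around each match, prefixed with file and line number."""
    regex = re.compile(pattern)
    snippets: list[str] = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            lines = path.read_text(encoding="utf-8").splitlines()
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        wanted: set[int] = set()
        for i, line in enumerate(lines):
            if regex.search(line):
                # Keep the match plus `context` lines on each side.
                wanted.update(range(max(0, i - context), min(len(lines), i + context + 1)))
        snippets.extend(f"{path}:{n + 1}: {lines[n]}" for n in sorted(wanted))
    return snippets
```

Calling find_matches(".", r"password_hash") on a project returns a handful of lines per hit instead of whole files, which is the difference between paying for a few dozen tokens and paying for a few thousand.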

2. The Silent Protocol

I stripped away the AI’s "personality." I don't need a "Sure, I can help with that!" I need code changes. I implemented a rule where responses must be under 15 words unless explaining complex logic.
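If you want to audit a session log against this rule, a tiny checker is enough. This is a sketch with a hypothetical function name. It only counts words, so it cannot tell when the "complex logic" exception legitimately applies.

```python
def breaks_silence_rule(response: str, limit: int = 15) -> bool:
    """True if a reply has `limit` words or more (the rule asks for fewer)."""
    return len(response.split()) >= limit


if __name__ == "__main__":
    # Hypothetical replies, one terse and one chatty.
    replies = [
        "Done. Patched the route.",
        "Sure, I can help with that! Let me walk you through every "
        "single change I am about to make.",
    ]
    for reply in replies:
        flag = "TOO LONG" if breaks_silence_rule(reply) else "ok"
        print(f"{flag}: {reply}")
```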

3. The Hybrid Model Approach

Why use a "heavy" model for everything? I moved to a hybrid setup:

  • Claude 4.6 Sonnet: The "Lead Engineer" for complex refactoring and security logic.

  • DeepSeek V4 Pro: The "Fast Executor" for boilerplate, unit tests, and, crucially, Tab Autocomplete. It’s 10x cheaper than Sonnet and incredibly snappy.
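The split above boils down to a simple decision rule. Here is a minimal sketch of it. The task labels and the pick_model function are hypothetical and only illustrate the idea; they are not Continue identifiers or a Continue feature.

```python
# Hypothetical routing rule for the hybrid setup.
HEAVY_MODEL = "Claude 4.6 Sonnet (Engineer)"
LIGHT_MODEL = "DeepSeek V4 Pro (Worker)"

# Task labels are illustrative assumptions.
HEAVY_TASKS = {"refactor", "security"}
LIGHT_TASKS = {"boilerplate", "unit-test", "autocomplete"}


def pick_model(task: str) -> str:
    """Send complex work to the heavy model and routine work to the light one."""
    if task in HEAVY_TASKS:
        return HEAVY_MODEL
    if task in LIGHT_TASKS:
        return LIGHT_MODEL
    # Unknown work defaults to the cheap model; escalate by hand if it struggles.
    return LIGHT_MODEL


if __name__ == "__main__":
    for task in ("security", "unit-test", "docs"):
        print(f"{task} -> {pick_model(task)}")
```

Defaulting unknown tasks to the cheap model is a design choice that favors frugality. Flip the default if a wrong answer costs you more than a few extra cents.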

The Modern config.yaml

Here is the "battle-tested" configuration I’m running now. It uses the latest 2026 Continue schema, focusing on Prompt Caching and Agentic Awareness.

YAML

# ~/.continue/config.yaml
ui:
  allowAutomaticallyAddedContext: false # No more "Open Tabs Tax"

models:
  - name: "Claude 4.6 Sonnet (Engineer)"
    provider: openrouter
    # This slug points at 3.5 Sonnet, not 4.6: set it to the OpenRouter slug of the model named above
    model: anthropic/claude-3.5-sonnet
    requestOptions:
      headers:
        "X-OpenRouter-Caching": "true" # Massive savings on repeated context

  - name: "DeepSeek V4 Pro (Worker)"
    provider: openrouter
    model: deepseek/deepseek-v4-pro

tabAutocompleteModel:
  title: "DeepSeek Coder"
  provider: openrouter
  model: deepseek/deepseek-v4-pro

context:
  - provider: os      # Agent knows it's on Arch Linux
  - provider: diff    # Essential for cheap, effective Code Reviews
  - provider: repo-map
    params:
      includeSignatures: false # Just the map, save those tokens!

The System Message: My Secret Sauce

This is what keeps the agent in check. It’s a "Token-Frugal" instruction set that forces the model to act instead of talk.

XML

<important_rules>
  You are a Token-Frugal, Silent Agent.
  1. THE SILENCE RULE: Responses under 15 words. No filler. No pleasantries.
  2. TOOL-CENTRIC: Always 'grep' or 'rg' before reading. Never 'cat' a 3000-line file.
  3. FORBIDDEN: Do not repeat code in chat that you are already applying via 'edit' tools.
</important_rules>

The Result: Real-world Testing

I put this setup to the test with a security task: implementing a multi-step password change feature.

  • The Implementation: The agent used grep to find existing auth patterns, reused the project's bcrypt cost settings, updated the routes, and built the frontend. Total cost: $0.29.

  • The Code Review: I used the diff context provider from the config above to ask for a review of the changes.

    • Before Optimization: This would have sent the whole controller and cost ~$0.16.

    • After Optimization: It sent only the diff. Total cost: $0.02.
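For the record, here is the arithmetic behind those numbers, using only the figures quoted in this post. The first line compares like with like (the same code review, before and after). The second compares the two headline figures, which come from different tasks.

```python
def reduction(before: float, after: float) -> float:
    """Percentage saved when a cost drops from `before` to `after`."""
    return (before - after) / before * 100


if __name__ == "__main__":
    # Code review: whole controller vs. diff only.
    print(f"{reduction(0.16, 0.02):.1f}%")
    # Headline figures: the $0.60 popup request vs. the $0.02 review.
    print(f"{reduction(0.60, 0.02):.1f}%")
```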

Final Thoughts

Vibe coding isn't about being lazy; it's about being an orchestrator. By treating context as a precious resource and moving to an agentic "pull" model (where the AI finds info) rather than a "push" model (where you give it everything), you can build complex systems for pennies.

Stay lean, stay silent, and let the agents do the heavy lifting.

Happy (Frugal) Coding!
