Human Capital Meets Token Capital - How I Cut My Claude API Costs Before They Bankrupted Me
Posted June 27, 2026 by Gowri Shankar ‐ 12 min read
Somewhere between I'll just tweak this prompt one last time and How on earth did I spend another fifty dollars?, it happened. Not a breakthrough. Not an elegant architectural insight. Just me... staring at Anthropic's billing dashboard like it had personally betrayed my family. For twenty-six years of writing software, pressing Run had never required financial planning. Compilers didn't charge me. Python didn't invoice me. Docker never interrupted halfway through a build to politely ask if I was sure I wanted to continue spending. Then came AI APIs, and suddenly every enthusiastic press of Shift+Enter carried the same emotional weight as swiping my credit card at an airport duty-free store. Surely this Jupyter notebook cell can't possibly cost four dollars, I told myself. It absolutely could. The scary part wasn't the money. Well... it was the money. But it was also what the money represented. I wasn't paying for servers. I wasn't paying for GPUs. I wasn't even paying for code execution. For the first time in my life, I was directly paying for thinking.
Every failed prompt. Every let's try a different approach. Every hallucination. Every beautifully reasoned answer that turned out to solve a completely different problem. Somewhere during six blissfully focused hours of building a Jupyter notebook, Claude happily consumed two hundred dollars' worth of my optimism. I topped up my credits, convinced I'd learned my lesson. An hour later, they were gone too. At that point I stopped debugging my notebook and started debugging my spending habits.

The Day “Run” Started Costing Money
For nearly three decades, software engineering has been a wonderfully forgiving profession. You could write terrible code, delete it, write even worse code, delete that too, and nobody from the Python Foundation would send you an invoice. Compilers judged your syntax, not your spending habits. Docker happily spun up containers without quietly deducting ₹300 from your account every time you forgot an environment variable. Experimentation was gloriously cheap. The only currency you spent was time… and occasionally your sanity after chasing a missing semicolon for forty-five minutes.
The Software Engineer’s Assumption
We built an entire generation of engineers around that assumption. Run. Break. Fix. Repeat. The faster you could iterate, the better engineer you became. Nobody ever asked, “Can we afford another unit test?” Nobody scheduled a meeting before running an integration test suite. We optimized for developer velocity because, economically speaking, thinking was free. Computers did the expensive work. We just told them what to do.
Then AI quietly rewrote the rules.
Welcome to the AI Economy
Today, every prompt has a price. Every retry has a cost. Every “let me rephrase that” is another line item on your invoice. Hallucinations aren’t just amusing screenshots for Twitter anymore; they’re billable events. Somewhere along the way, we crossed an invisible line. We stopped paying primarily for compute and infrastructure, and started paying for something far more interesting… reasoning itself.
That’s a subtle shift, but an important one. For the first time in software history, pressing Run isn’t just executing code. It’s authorizing a financial transaction.
My $200 Afternoon: Building a Complex Jupyter Notebook
I wasn’t doing anything outrageous. I was simply building a Jupyter notebook to solve a fairly complex AI orchestration problem. Like every engineering session, it followed a familiar rhythm: think, write, Shift+Enter, observe, repeat. Hours disappeared. Coffee went cold. The notebook slowly came alive, one cell at a time. Nothing felt expensive because nothing looked expensive.
About five or six hours later, Claude politely informed me that I’d exhausted my monthly API limit. Fair enough, I thought. Hard problem, lots of experimentation. I topped up my account, convinced I’d bought myself another productive afternoon.
An hour later, it was gone.
That’s when curiosity replaced panic. I started calculating what each notebook cell was actually costing me. Some of them were quietly burning three or four dollars every time I pressed Shift+Enter. A failed experiment wasn’t just wasted time anymore… it was a line item on my credit card.
For the first time in my career, I wasn’t measuring execution time.
I was measuring the cost of curiosity.
The Invisible Leak Nobody Talks About
For decades, software engineering had a wonderfully simple economic model. During development, you could experiment as much as you wanted. Run another test. Add another log statement. Rewrite the function for the fifth time because “this one feels cleaner.” The cost of experimentation was almost entirely your own time.
AI quietly broke that contract.
Why AI Development Feels Different
Every prompt has a price. Every retry has a cost. Every “let’s see what happens if…” becomes a billable event. The faster you iterate, the faster your token balance disappears. Curiosity, once the cheapest part of software engineering, suddenly became one of the most expensive.
The strange part is that we’ve already built entire disciplines around measuring expensive resources. We monitor CPU utilization, memory consumption, latency, disk I/O, database queries, cache hit ratios… we can probably tell you how many milliseconds a request spent waiting on Redis. Yet somehow, the one resource we’re actively burning while building AI systems is the one we rarely observe.
We treat tokens like electricity. They behave more like capital.
And that’s the invisible leak.
The Birth of an AI Accountant
My first instinct was the same as every engineer’s: optimize the prompts. Shorten the context. Switch models. Cache responses. Batch requests. Surely there had to be a clever trick to make Claude cheaper.
Stop Optimizing Blindly
Then I realized I was making the same mistake I warn people about when they optimize cloud infrastructure.
You don’t start by optimizing.
You start by measuring.
Until that afternoon, I couldn’t answer embarrassingly simple questions. Which notebook cell was the most expensive? Which prompt consumed the most tokens? How much had I spent so far? How many more experiments could I afford before blowing my budget?
I had observability for my application.
I had none for my AI spending.
That realization changed everything. Before building smarter agents, I needed to build an accountant. Something that could track every API call, maintain a running balance, enforce budgets, and tell me exactly where my Token Capital was going.
Because you cannot optimize what you cannot see.
So I built one.
Building Guardrails
Not another AI agent. An accountant.
Every API call now leaves behind a receipt. Every workflow has a budget. Every token is accounted for. If a notebook starts behaving like a billionaire with someone else’s credit card, it gets cut off.

Building an AI Accountant
Simple guardrails that let me focus on solving problems instead of worrying about the bill.
By remembering and tracking the following:
- Budget tracking
- Cost accounting
- Spending audits
- Maximum iteration limits
- Output truncation
- Budget exceptions
# ============================================================
# Guardrails -- every model call in this notebook is routed through
# run_agent_loop(), which enforces these caps unconditionally.
# ============================================================
MAX_COORDINATOR_TURNS = 12 # safety cap for a coordinator-level loop (used in later notebooks)
MAX_SUBAGENT_TURNS = 3 # safety cap for a single-purpose loop -- this notebook's default
MAX_BUDGET_USD = 0.10 # hard stop for this notebook's cumulative spend
MAX_OUTPUT_CHARS = 5000 # truncate any single response's text before returning it
# Approximate Claude Sonnet rates. Verify current pricing at
# https://docs.claude.com/en/docs/about-claude/pricing -- do not treat these
# as authoritative for real billing decisions, only for this notebook's
# illustrative budget guardrail.
INPUT_COST_PER_MTOK = 3.00
OUTPUT_COST_PER_MTOK = 15.00
class BudgetExceeded(Exception):
pass
class Guardrails:
# Tracks cumulative spend across every model call made with this instance
# and enforces MAX_BUDGET_USD as a hard stop.
def __init__(self, max_budget_usd=MAX_BUDGET_USD):
self.max_budget_usd = max_budget_usd
self.spent_usd = 0.0
self.api_calls = 0
def record(self, response):
usage = response.usage
cost = (
(usage.input_tokens / 1_000_000) * INPUT_COST_PER_MTOK
+ (usage.output_tokens / 1_000_000) * OUTPUT_COST_PER_MTOK
)
self.spent_usd += cost
self.api_calls += 1
if self.spent_usd > self.max_budget_usd:
raise BudgetExceeded(
f"Stopped after {self.api_calls} calls: spent ${self.spent_usd:.4f}, "
f"exceeding MAX_BUDGET_USD=${self.max_budget_usd:.2f}"
)
return cost
def truncate(self, text):
if len(text) > MAX_OUTPUT_CHARS:
omitted = len(text) - MAX_OUTPUT_CHARS
return text[:MAX_OUTPUT_CHARS] + f"\n...[truncated -- {omitted} chars omitted, MAX_OUTPUT_CHARS={MAX_OUTPUT_CHARS}]"
return text
def status(self):
return f"${self.spent_usd:.4f} / ${self.max_budget_usd:.2f} spent across {self.api_calls} calls"
budget = Guardrails()
Teaching a Notebook About Money
Proving the guardrail actually works, not just trusting that it would: instantiate a separate, deliberately tiny budget and run a few real calls against it until it trips.
Every API Call Leaves a Receipt
Observe how every model call now records:
- Input tokens
- Output tokens
- Estimated cost
- Cumulative spend
- Remaining budget
tiny_budget = Guardrails(max_budget_usd=0.01)
try:
for i in range(20):
response = client.messages.create(
model=MODEL,
max_tokens=200,
messages=[{"role": "user", "content": f"Say the number {i} and nothing else."}]
)
cost = tiny_budget.record(response)
print(f"call {i}: cost=${cost:.5f} | {tiny_budget.status()}")
except BudgetExceeded as e:
print()
print("BudgetExceeded raised as expected:")
print(" ", e)
Failing Fast Instead of Spending Forever
You should see the loop stop partway through with a clean BudgetExceeded exception rather than silently continuing to spend when you run with a small budget. This is the same mechanism run_agent_loop() below uses against budget (the real, $0.10 instance) for the rest of this notebook.
call 0: cost=$0.00013 | $0.0001 / $0.01 spent across 1 calls
call 1: cost=$0.00013 | $0.0003 / $0.01 spent across 2 calls
call 2: cost=$0.00013 | $0.0004 / $0.01 spent across 3 calls
call 3: cost=$0.00013 | $0.0005 / $0.01 spent across 4 calls
call 4: cost=$0.00013 | $0.0006 / $0.01 spent across 5 calls
call 5: cost=$0.00013 | $0.0008 / $0.01 spent across 6 calls
call 6: cost=$0.00013 | $0.0009 / $0.01 spent across 7 calls
call 7: cost=$0.00013 | $0.0010 / $0.01 spent across 8 calls
call 8: cost=$0.00013 | $0.0011 / $0.01 spent across 9 calls
call 9: cost=$0.00013 | $0.0013 / $0.01 spent across 10 calls
call 10: cost=$0.00013 | $0.0014 / $0.01 spent across 11 calls
call 11: cost=$0.00013 | $0.0015 / $0.01 spent across 12 calls
call 12: cost=$0.00013 | $0.0016 / $0.01 spent across 13 calls
call 13: cost=$0.00013 | $0.0018 / $0.01 spent across 14 calls
call 14: cost=$0.00013 | $0.0019 / $0.01 spent across 15 calls
call 15: cost=$0.00013 | $0.0020 / $0.01 spent across 16 calls
call 16: cost=$0.00013 | $0.0021 / $0.01 spent across 17 calls
call 17: cost=$0.00013 | $0.0023 / $0.01 spent across 18 calls
call 18: cost=$0.00013 | $0.0024 / $0.01 spent across 19 calls
call 19: cost=$0.00013 | $0.0025 / $0.01 spent across 20 calls
- BudgetExceeded exception
- Automatic termination
- Predictable experimentation instead of runaway costs

AI Agent Orchestrator
At the heart of the notebook sits a simple run_agent_loop() function. Its job isn’t to be clever… it’s to keep the agent honest.
- Receives a user request and starts an agentic reasoning loop.
- Invokes Claude and immediately records the token usage and estimated cost.
- Executes tool calls whenever the model requests them.
- Feeds tool results back to Claude for the next round of reasoning.
- Enforces guardrails like maximum iterations, output truncation, and spending limits on every loop.
- Stops automatically when the task is complete, the iteration limit is reached, or the budget is exhausted.
The control flow barely changed. The accounting did.
def run_agent_loop(
user_message,
tools,
tool_registry,
system=None,
max_iterations=MAX_SUBAGENT_TURNS,
verbose=False,
):
# Guardrail-wrapped agentic loop. Identical control flow to Domain 1's
# run_agent_loop(), with budget.record() enforced after every call.
messages = [{"role": "user", "content": user_message}]
for iteration in range(1, max_iterations + 1):
kwargs = {
"model": MODEL,
"max_tokens": 1024,
"tools": tools,
"messages": messages,
}
if system:
kwargs["system"] = system
response = client.messages.create(**kwargs)
budget.record(response)
if verbose:
print(f" [iteration {iteration}] stop_reason={response.stop_reason} | {budget.status()}")
if response.stop_reason == "end_turn":
final_text = "".join(b.text for b in response.content if b.type == "text")
return {
"status": "completed",
"text": budget.truncate(final_text),
"iterations": iteration,
"tool_calls": [],
}
if response.stop_reason == "tool_use":
tool_block = next(b for b in response.content if b.type == "tool_use")
tool_function = tool_registry[tool_block.name]
result = tool_function(**tool_block.input)
messages.append({"role": "assistant", "content": response.content})
messages.append({
"role": "user",
"content": [{
"type": "tool_result",
"tool_use_id": tool_block.id,
"content": json.dumps(result),
}],
})
continue
return {"status": "incomplete", "stop_reason": response.stop_reason, "iterations": iteration}
return {"status": "max_iterations_exceeded", "iterations": max_iterations}
def run_and_get_tool_called(user_message, tools, tool_registry, system=None):
# Helper for the routing-accuracy experiments below: returns the NAME of
# the first tool Claude called, or None if it answered without one.
response = client.messages.create(
model=MODEL,
max_tokens=300,
tools=tools,
system=system,
messages=[{"role": "user", "content": user_message}],
)
budget.record(response)
for block in response.content:
if block.type == "tool_use":
return block.name, block.input
return None, None
print("Final notebook spend:", budget.status())
Final notebook spend: $0.0326 / $0.10 spent across 10 calls

Human Capital Meets Token Capital
A Conversation That Changed My Perspective
The idea of Token Capital didn’t come from my notebook. It came from a conversation with my friend, Karan Kakwani.
Reach Karan,
He casually mentioned that his employer gives every engineer $500 worth of Claude credits every month. Sounds luxurious… until he added that he usually burns through the allocation by the third week. The final week? Waiting for next month’s credits.
What struck me wasn’t the number. It was the mindset.
The New Form of Capital
For years, companies have invested in developer productivity through better laptops, bigger monitors, ergonomic chairs, cloud infrastructure, and generous learning budgets. AI has quietly introduced a new line item.
Thinking budgets.
Those Claude credits aren’t an operational expense. They’re an investment in engineering velocity. They’re a resource that enables experimentation, exploration, and ultimately, better software.
That’s when it clicked.
We’re entering an era where the best engineers won’t be constrained only by Human Capital… their knowledge, creativity, and experience… but also by Token Capital: the reasoning budget available to amplify those abilities.
The Solopreneur’s Reality
Venture Capital vs Personal Capital
Large companies:
- Hundreds or thousands of dollars in monthly AI credits.
Solopreneurs:
- Every unnecessary prompt comes out of personal savings.
Why Spending Smarter Beats Spending More
- Bigger budgets aren’t always available.
- Better accounting is.
Accounting Before Optimization
Everyone talks about:
- Prompt engineering
- Context compression
- RAG
- Model routing
- Caching
Very few start with:
- Measuring spend
The Infrastructure Analogy
The same lesson learned from cloud infrastructure:
Observe first.
Optimize second.
A Bigger Shift Is Happening
We Are Paying for Thinking
This is the first generation of engineers whose daily work has a direct reasoning cost.
You’re no longer paying for execution.
You’re paying for intelligence.

Companies Will Compete on Token Capital
Tomorrow’s engineering perks won’t just be:
- Salary
- Stock options
- Remote work
They’ll also include:
- AI credit allowances
- Better models
- Larger context windows
- Faster inference
Companies that invest in developer reasoning capacity will move faster than those that don’t.
Closing Thoughts
If there’s one lesson this expensive afternoon taught me, it’s this:
Build the accountant before you build the agent.
Because AI agents are a lot like teenagers. Give them intelligence without a budget, and they’ll enthusiastically spend all your money exploring things that seemed like a great idea at the time.
The previous generation obsessed over CPU cycles.
The cloud generation obsessed over infrastructure costs.
Our generation gets to explain to finance why pressing Shift+Enter twelve times somehow cost forty dollars.
Welcome to the age of Token Capital.
Where the smartest engineer in the room isn’t necessarily the one with the biggest context window, the fanciest agent architecture, or the latest model.
It’s the one who can still afford to press Run tomorrow.
- Human Capital: your ideas, creativity, and engineering judgment.
- Token Capital: the fuel that allows those ideas to execute.
Manage both wisely.
Who is Karan Kakwani?
Karan Kakwani is a Machine Learning Engineer and software industry veteran, and an alumnus of the prestigious IIT Kharagpur. Having worked with esteemed companies like Samsung and high-growth startups, he specializes in software engineering, computer vision, data science, and generative AI applications, and his contributions have led to the development of multiple user-centric products benefiting thousands of users. Passionate about bridging the gap between cutting-edge research and real-world applications, Karan is also an educator who shares his expertise through online platforms, helping learners navigate the rapidly evolving landscape of AI and machine learning.