Burning Through My Claude Pro Limit: What the Usage Data Revealed

If you've been using Claude Code heavily, you've probably hit the usage limit on your Pro plan mid-session. I certainly have — multiple times, often at the worst moment: right when the AI is in the middle of a complex refactor and you're watching it piece together the puzzle. The session dies, you start fresh, and you lose momentum.
This is worth separating from the context window — that's about how much text fits in a single conversation before the model can't "see" earlier messages. The usage limit is something different: it's about how many tokens your plan allows you to consume across all your Claude Code sessions. With enough agentic activity, you can burn through it faster than you'd expect.
But instead of just complaining about it, I decided to understand it. I ran claudespend against my Claude Code logs to get the raw numbers. What came back was both surprising and humbling.
The numbers don't lie
Over roughly one month of daily AI-assisted development across five personal projects, here's what I found:
- 258.8 million tokens consumed across 83 sessions and 4,394 queries
- $138.88 API-equivalent cost — what this usage would cost hitting the API directly, with caching
- $804.51 estimated without caching — what it would cost if prompt caching didn't exist
- 66% of all tokens went to a single project: my personal finance app
A quick note on the cost figures: I'm on the Claude Pro plan, so I'm not actually paying per token. Claudespend calculates the API-equivalent bill as a proxy for understanding how much you're consuming, even if you're not paying per query. Think of it less as a bill and more as a consumption gauge.
With that framing, the caching numbers are striking. Claude Code's prompt caching means that conversation history, file contents, and system context are served at 10x lower cost when they haven't changed between turns. The estimated saving — $665 over one month — shows just how much repeated context re-reading is happening under the hood.
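The cache arithmetic is worth sketching out. A back-of-the-envelope model, where the per-token price is an illustrative placeholder (not actual API pricing) and every re-read is assumed to hit the cache:

```python
# Back-of-the-envelope cache arithmetic. INPUT_PRICE is an illustrative
# placeholder, not actual API pricing; the point is the 10x ratio.
INPUT_PRICE = 3.00 / 1_000_000        # $ per uncached input token (assumed)
CACHE_READ_PRICE = INPUT_PRICE / 10   # cached reads are ~10x cheaper

reread_tokens = 250_000_000           # context re-read over a month of turns

cost_without_cache = reread_tokens * INPUT_PRICE       # ≈ $750
cost_with_cache = reread_tokens * CACHE_READ_PRICE     # ≈ $75
saving = cost_without_cache - cost_with_cache          # ≈ $675
```

In this idealised model the saving is 90% of the uncached bill; my real saving ($665 of $804) is a bit lower, because the first read of any piece of context is always uncached.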
But here's the number that made me stop: only 0.6% of my tokens were Claude actually writing responses. The other 99.4% was the model re-reading context before every single reply.
The real problem: token consumption compounds with session length
This is an architectural reality of LLMs that most developers don't fully internalise until they see their own data. There is no persistent memory between turns. Every time you send a message — no matter how short — the model processes the entire conversation from the beginning.
Message #5 in a session is cheap. Message #80 is expensive, because Claude is re-reading 79 previous messages, all the file contents it read, all the code it wrote, and every tool call it made.
The data shows this clearly. In short conversations (under 15 messages), each message costs around 23,000 tokens. In long ones (80+ messages), the per-message cost climbs to 68,000 tokens. That is roughly 3x more per message, and because the per-message cost keeps growing as the session lengthens, the total cost of a session grows faster than linearly.
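The shape of that growth is easy to model. A minimal sketch, assuming each turn adds a roughly constant slab of new context on top of everything before it — the base and growth values here are fitted to my own averages, not measured constants:

```python
def per_message_tokens(turn: int, base: int = 23_000, growth: int = 570) -> int:
    """Tokens processed on a given turn: all prior context plus the new turn.
    base and growth are illustrative values fitted to my session averages."""
    return base + growth * (turn - 1)

def session_total(turns: int) -> int:
    """Cumulative tokens for a whole session: quadratic in the turn count."""
    return sum(per_message_tokens(t) for t in range(1, turns + 1))

# An 80-turn session is only ~5x longer than a 15-turn one,
# but costs roughly 9x the tokens, because late turns re-read more.
short_session = session_total(15)
long_session = session_total(80)
```

This is why one 80-message marathon costs far more than five 16-message sessions covering the same ground.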
My worst single day was March 22nd: 53 million tokens in one day, mostly driven by a single mammoth session implementing a multi-workspace feature in the finance app. The session kept growing, each message got more expensive, and eventually I hit the plan's usage limit.
The "go for it" tax
Here's the insight that made me genuinely laugh at myself.
The tool flagged 53 instances of short, vague approval messages: "go for it", "go ahead", "yes", "go go go", "go", "ok, let's go". Between them, those 53 tiny instructions consumed 63.2 million tokens.
Why? Because without a specific target, Claude had to do reconnaissance. Each vague "go" triggered a cascade of tool calls — read files, search the codebase, glob for related modules, figure out what "go" actually meant in context — before doing any real work. A message costing 2 tokens of input generated thousands of tokens of exploratory tool use.
The fix is simple in theory, hard in practice when you're in flow: be specific. Instead of "go for it", say "go ahead with the authentication service refactor in src/auth/service.ts, don't touch the tests". Claude gets a clear target, makes fewer exploratory tool calls, and reaches the right answer faster.
The tool-call multiplier
Related to the above: in 47 of my conversations, Claude made roughly 8 tool calls for every message I sent. Each tool call (reading a file, running a bash command, searching the codebase) is a full round-trip that re-reads the entire conversation. Those 47 sessions consumed 225.1 million tokens combined.
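The multiplier is easy to underestimate. A sketch of the arithmetic, under the simplifying assumption that each tool call re-processes roughly the full current context:

```python
def message_cost(context_tokens: int, tool_calls: int) -> int:
    """Approximate tokens for one user message: the model processes the
    context once for its reply, plus once per tool-call round trip."""
    return context_tokens * (1 + tool_calls)

# Same 50k-token context, different prompt specificity (counts illustrative):
vague = message_cost(50_000, tool_calls=8)     # "fix the split bug" -> hunting
specific = message_cost(50_000, tool_calls=2)  # file + line given up front
```

With these illustrative numbers the vague prompt costs 3x the specific one, before Claude has written a single line of the actual fix.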
Pointing Claude at specific files and line numbers when you know where the problem is makes a measurable difference. "Fix the bug in src/payments/split.ts around line 142" triggers fewer search calls than "fix the split transaction bug" where Claude has to hunt first.
What I'm changing: a practical guide
The data points to four concrete habits. Each one has a before/after drawn from my own actual prompts — because seeing the real thing is more useful than abstract advice.
1. Start fresh more often — and hand off cleanly
The biggest lever is session length. A 5-conversation workflow where each session focuses on one task costs far less than a single 500-message marathon.
The fear is losing context. The solution is to externalise it before you clear. At the end of a session (or at any natural task boundary), ask Claude to write a CONTEXT.md file summarising what was done, what's pending, and any gotchas. The next session opens with that file and picks up with almost zero overhead.
❌ What I was doing (actual prompt from my data):
```
the plan looks good to me. You have green light to do it till the end.
Don't ask, just code ;)
```
This ran for 172 continuations in a single session. It consumed over 18 million tokens as the context swelled with every file read, every edit, every bash command piling on top.
✅ Better approach:
```
Read CONTEXT.md to get up to speed. Your task for this session:
implement the workspace switcher in src/workspaces/Switcher.tsx.
Scope: only touch files under src/workspaces/. When done, update
CONTEXT.md with what was completed and what's next, then stop.
```
One session, one task, bounded scope, clean handoff. The next session starts fresh with full context in 200 tokens instead of 18 million.
2. Replace vague approvals with specific targets
Every time you send a short approval without a target, Claude does reconnaissance before it does work. That exploration — globbing directories, reading files it already read, searching for the right entry point — costs tokens without producing output.
❌ What I was doing (top offenders from my data):
```
go
go for it
go ahead
yes
ok, let's go
go go go
```
53 of these in one month. 63.2 million tokens between them.
✅ Better approach — attach the target:
| Vague | Specific |
|---|---|
| go | Go ahead — implement the budget allocation logic in src/budget/allocate.ts |
| go for it | Go for it. Focus on the split transaction editor in src/payments/SplitEditor.tsx, don't touch the API layer |
| yes | Yes — update the e2e tests in tests/workspaces.spec.ts to cover the new rename flow |
| proceed | Proceed with phase 2 from PLAN.md — only the database migration, leave the UI for later |
The pattern is always the same: approval + file or module + constraint. Claude gets a clear target, makes fewer tool calls, and finishes faster.
3. Point to files, not symptoms
When you describe a bug or feature without a location, Claude hunts. It reads your project structure, searches for likely files, reads those files, and sometimes reads the wrong ones before finding the right ones. Each of those reads re-processes the entire conversation.
❌ What I was doing:
```
it looks like there is an issue calculating pending to budget
```
This triggered 5 Grep calls, 10 Read calls, and 1 Glob call just to find where the budget calculation lived — before any fixing happened.
✅ Better approach:
```
There's a bug in the "pending to budget" calculation.
File: src/budget/Summary.tsx, function: calculatePendingToBudget().
The total account balance (~26K) doesn't match the unspent budget
(~16K) — roughly 10K unaccounted for. Read that function and the
related hook in src/budget/useBudgetSummary.ts, then fix it.
```
You already know where the bug is — you just found it. Telling Claude up front saves it 16 tool calls' worth of searching.
4. Use /clear at task boundaries, not just at the limit
Most people (myself included) only reach for /clear when they're forced to — when the session gets killed by the usage limit or things just get sluggish. By then you've already paid the compounding token cost across dozens of messages.
The data suggests the inflection point is around turn 19. After that, per-message cost grows noticeably because the accumulated tool call history, file contents, and back-and-forth becomes dead weight.
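That threshold is simple enough to turn into a rule of thumb. A sketch — the turn-19 cutoff comes from my data above, and a real habit would also watch whether the current task has actually changed:

```python
def should_clear(turn: int, task_changed: bool, turn_threshold: int = 19) -> bool:
    """Rule of thumb: /clear at any task boundary, or once the session
    passes the inflection point where accumulated context is dead weight.
    turn_threshold = 19 is the inflection point from my own usage data."""
    return task_changed or turn >= turn_threshold
```

The point is to make clearing proactive (a boundary you choose) rather than reactive (a limit that chooses for you).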
The workflow I'm adopting:
```
# End of a task — before clearing
"Before we wrap up: write a CONTEXT.md file with:
1. What we implemented today (bullet points)
2. Files modified
3. Known issues or open questions
4. Suggested starting point for the next session"

# Then: /clear

# Start of next session
"Read CONTEXT.md. Your task: [new specific task]."
```
This keeps every session focused, the context lean, and the handoff nearly free.
The broader lesson
There's a tempting mental model when working with Claude Code where you treat it like a very smart colleague who remembers everything you've discussed. The reality is closer to working with someone who has perfect short-term recall but has to re-read the entire conversation file before they can answer you — every single time.
Once I internalised that model, the optimisations became obvious. Keep sessions short and focused. Be specific. Start fresh when the task changes. Use caching-friendly patterns (long stable system prompts, short user messages with specific targets).
The tools are extraordinary. But like any tool, understanding how they work under the hood lets you use them better.
Current target: keep per-session query counts under 30 and per-session token costs under 5 million. Let's see how long that discipline lasts.
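Those two targets are mechanical enough to check. A hypothetical guardrail sketch — the function and its defaults are mine, built from the targets just stated, not anything claudespend provides:

```python
def within_budget(queries: int, tokens: int,
                  max_queries: int = 30, max_tokens: int = 5_000_000) -> bool:
    """Check one session against my self-imposed targets:
    under 30 queries and under 5 million tokens per session.
    (Hypothetical helper; not part of claudespend.)"""
    return queries < max_queries and tokens < max_tokens

# A focused session passes; my March 22nd marathon would not have.
ok = within_budget(queries=12, tokens=1_200_000)
marathon = within_budget(queries=172, tokens=18_000_000)
```

Running something like this at each task boundary is the kind of nudge that might make the discipline last.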