Artificial Intelligence · Sunday, April 19, 2026 · 10 min read

Claude Opus 4.7: What's New and Why Developers Are Frustrated

Anthropic released Claude Opus 4.7 on April 16, 2026. It comes with real improvements — better vision, task budgets, stronger coding. But the community reacted fast and not entirely well. Here's an honest look at what changed, what regressed, and what the tokenizer controversy actually means for your API bill.


Anthropic released Claude Opus 4.7 on April 16, 2026. Within 48 hours, a Reddit post titled "Opus 4.7 is not an upgrade but a serious regression" had 2,300 upvotes. On X, a post saying Opus 4.7 showed no improvement over 4.6 collected 14,000 likes.

The community reaction was fast, loud, and — to an extent — justified. But the full picture is more complicated than a simple regression story. There are real improvements in Opus 4.7. There are also real regressions. And there is a tokenizer change that, while not technically a price hike, functions like one in practice.

This post walks through what Opus 4.7 actually delivers, where it falls short of expectations, and what the backlash tells us about the gap between model release cycles and developer trust.

1. What Is Claude Opus 4.7?

Claude Opus 4.7 is Anthropic's latest generally available model, released April 16, 2026. It is available through the Claude API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry — the same distribution channels as previous Opus versions.

Pricing is unchanged from Opus 4.6: $5 per million input tokens and $25 per million output tokens. That's the official number. What the official number doesn't account for — and what much of the backlash is about — is a new tokenizer that changes how those tokens are counted.

Anthropic describes Opus 4.7 as their most capable generally available model to date, built for long-horizon agentic work, knowledge tasks, vision, and memory. That's a broad claim. The benchmarks support some of it. The community's experience supports less of it.

2. What's Actually New in Opus 4.7

There are four changes worth understanding before getting into the complaints. Three are genuine improvements.

  • High-resolution image support: Opus 4.7 is Anthropic's first model with true high-resolution vision. Maximum image resolution increased from 1568px (1.15MP) to 2576px (3.75MP). If your application processes images — product photos, scanned documents, charts — this is a real, tangible upgrade.
  • Task budgets: A new feature that lets you give the model a rough estimate of how many tokens to target for a full agentic loop, including thinking steps, tool calls, and final output. This matters for agentic workflows where runaway token usage has been a problem.
  • Stronger software engineering: On advanced coding benchmarks, Opus 4.7 shows meaningful gains over Opus 4.6, particularly on the hardest tasks. Developers doing complex refactors or multi-file code generation report better results here than with 4.6.
  • New tokenizer: Opus 4.7 ships with a redesigned tokenizer. Anthropic says this contributes to improved performance. What it also does is consume more tokens for the same input — something we will address in detail below.

Those first three items are real gains. Anthropic is not lying when they say this is a capable model. The problem is that capability gains in some areas came alongside regressions in others, and the tokenizer change turned a mixed upgrade story into a controversy.

3. What Is Adaptive Reasoning — And Why Are People Complaining About It?

Opus 4.7 replaces the Extended Thinking toggle — which let users manually enable deep reasoning — with something Anthropic calls Adaptive Reasoning. The idea is straightforward: instead of you deciding when the model should think hard, the model decides for itself based on how complex it thinks the task is.

In the Claude web and desktop apps, the old Extended Thinking toggle is gone entirely. You cannot manually force high or low effort. The model decides. In Claude Code, the terminal-based developer tool, you can still set effort levels explicitly — low, medium, high, xhigh, and max — with xhigh as the new default.

The backlash here comes from two directions. The first is loss of control: developers who built workflows around reliable extended thinking behavior now get unpredictable reasoning depth. The second is a mismatch between the model's judgment of task complexity and the user's. Multiple reports describe Opus 4.7 declining to think deeply on questions where users expected it to — treating as simple something that 4.6 would have reasoned through carefully.

💡 Pro tip

Adaptive Reasoning is mandatory and cannot be overridden in the chat interface. For production API use cases where consistent reasoning depth matters, this is a meaningful behavior change — not a feature upgrade.

4. The MRCR Benchmark Collapse

The most documented regression in Opus 4.7 is on the MRCR benchmark, which tests long-context retrieval — how well the model finds and uses specific information buried inside very long documents.

The numbers are hard to dismiss. On MRCR v2 at 1 million tokens, Opus 4.7 scores 32.2%. Opus 4.6 scored 78.3% on the same benchmark — a 46-point drop. At shorter contexts (256k tokens, 8-needle retrieval), performance fell from 91.9% to 59.2%.

Anthropic's response: they say they are phasing out MRCR because it stacks distractors in ways that don't reflect real usage, and point to GraphWalks as a better long-context benchmark where Opus 4.7 actually improves. Whether you find that convincing depends on how much you trust the model maker to choose its own evaluation benchmarks.

The practical implication is real regardless of the benchmark debate: if you run RAG pipelines, document analysis agents, or any workflow that depends on retrieving specific facts from long documents, you should test Opus 4.7 against your actual data before migrating from 4.6. Keep 4.6 available as a fallback.
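One way to run that test is to build needle-in-a-haystack cases from your own data before pointing either model at them. Below is a minimal sketch; the needle, filler text, and paragraph count are placeholders, and the step of sending the document to each model and scoring the answers is deliberately left out:

```python
import random

def make_needle_doc(needle: str, filler: str, n_paragraphs: int,
                    seed: int = 0) -> tuple[str, int]:
    """Build a long synthetic document with `needle` buried at a
    seeded-random paragraph. Returns the document and the needle's
    paragraph index, so a harness can score model answers against it."""
    rng = random.Random(seed)
    paragraphs = [filler] * n_paragraphs
    pos = rng.randrange(n_paragraphs)
    paragraphs[pos] = needle
    return "\n\n".join(paragraphs), pos

doc, pos = make_needle_doc(
    needle="The vault code is 4-7-2-9.",
    filler="Lorem ipsum dolor sit amet, consectetur adipiscing elit.",
    n_paragraphs=500,
)
# Send `doc` plus a retrieval question to both 4.6 and 4.7, then check
# whether each answer contains the needle's fact.
```

Swapping the filler for paragraphs sampled from your real documents makes the test far more representative than synthetic lorem ipsum.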

5. The Tokenizer Change: A Stealth Price Increase?

This is the part of the Opus 4.7 story that generated the most sustained developer anger. Understanding it requires a short explanation of how tokenization works.

Language models don't process text character by character. They break text into chunks called tokens — roughly 3-4 characters each on average for English text. You pay per token. When Anthropic ships a new tokenizer, the same text can be split into a different number of tokens than before.
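The 3-4 characters-per-token rule of thumb can be sketched as a crude estimator. This is only a heuristic for back-of-envelope budgeting — it is not Anthropic's actual tokenizer, which is exactly why two tokenizers can count the same text differently:

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate using the ~4-characters-per-token rule of
    thumb for English prose. Real tokenizers vary widely, especially
    on code, structured data, and non-English text."""
    return max(1, round(len(text) / chars_per_token))

prose = "The quick brown fox jumps over the lazy dog."
print(estimate_tokens(prose))  # 11 (44 characters / 4)
```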

Opus 4.7's new tokenizer can produce up to 35% more tokens for identical input text. The effect is most pronounced on code, structured data like JSON and CSV, and non-English text. For plain English prose, the difference is smaller — often near zero. But for developers sending API requests with code, data schemas, or multilingual content, costs can increase substantially on the exact same workloads.

Anthropic did not raise the stated per-token price. But if the same prompt now generates 1.35x more tokens, your effective cost increased by up to 35%. The developer community's characterization of this as a "stealth price hike" is, at minimum, understandable — even if Anthropic's position is that the tokenizer improves model performance in ways that offset the cost.
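To make the arithmetic concrete, here is a minimal sketch of the effective-cost math at the stated Opus prices ($5/M input, $25/M output). The token counts are hypothetical, chosen to illustrate the worst-case 35% inflation:

```python
def effective_cost_usd(input_tokens: int, output_tokens: int,
                       in_price: float = 5.0, out_price: float = 25.0) -> float:
    """Per-request cost at the stated Opus pricing:
    $5 per million input tokens, $25 per million output tokens."""
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

base = effective_cost_usd(100_000, 10_000)    # old tokenizer's counts
worst = effective_cost_usd(135_000, 13_500)   # same text, 35% more tokens
print(f"${base:.4f} -> ${worst:.4f} ({worst / base:.2f}x)")
```

Same prompt, same stated price, 1.35x the bill — which is why "stealth price hike" stuck as a description.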

💡 Pro tip

If you use Opus 4.6 via API today, test your highest-volume prompts against Opus 4.7 before switching. Measure token counts on identical inputs. On code-heavy or structured-data-heavy requests, budget for a possible 20-35% cost increase before declaring the model "same price."

6. Claude Code: False Malware Flags

Claude Code is Anthropic's developer tool for AI-assisted coding — one of their most-used products among software engineers. It got its own wave of complaints after the Opus 4.7 rollout.

Multiple developers reported that Opus 4.7, when used through Claude Code, was flagging routine benign code as malware and refusing to complete basic edits. This includes ordinary file operations, network calls, and standard library usage that 4.6 handled without issue.

Anthropic acknowledged the issue and said it adjusted the default reasoning level in Claude Code after reports came in. The company denied the changes were related to compute constraints or the Mythos development track. Whether the fix resolved all cases is still being tracked in developer communities.

7. Specific Failure Examples That Spread Online

Beyond the benchmark regressions and tokenizer math, a set of concrete failure examples circulated on social media and developer forums that made the backlash feel personal rather than statistical.

  • Spelling errors: Opus 4.7 answered that there are 2 P's in "strawberry" — a type of simple factual failure that made earlier models look better.
  • Resume hallucinations: In one documented case, the model rewrote a resume with a different school name and a different surname than the user provided.
  • Self-acknowledged laziness: Screenshots circulated of the model stating it "was acting lazily" in response to a user complaint about incomplete answers.

These are anecdotal and not representative of systematic testing. But they spread because they are specific, verifiable, and embarrassing — and because they fit a narrative that resonated with users who felt performance had declined.

8. Is Opus 4.7 Actually Worse Than Opus 4.6?

The honest answer is: it depends on your workload.

For coding tasks — especially advanced, multi-file, complex engineering work — Opus 4.7 appears to be a genuine improvement. Anthropic's internal benchmarks show gains, and developer reports in this area are more positive.

For long-context document retrieval — the MRCR benchmark case — the regression is documented and significant. If your application searches through large documents to find specific information, 4.7 may perform worse than 4.6 on your actual use case.

For workflows that relied on reliable Extended Thinking behavior in the web interface, the shift to Adaptive Reasoning is a real change in how the model works. Whether it is better or worse depends on whether the model's judgment about task complexity matches yours.

The tokenizer change affects everyone sending API requests. The impact varies by content type. It is not a regression in capability, but it is a real cost change that Anthropic did not communicate clearly at launch.

9. Anthropic's Position — and Why It Didn't Land

Anthropic's public response to the backlash has been measured. They adjusted Claude Code behavior after the malware-flagging reports. They defended the MRCR regression by pointing to a different benchmark. They maintain that the tokenizer change improves model quality.

None of this is wrong. But the communication gap at launch — no explicit call-out of the tokenizer's cost implications, no warning about the MRCR regression for long-context users — left developers feeling like they were discovering problems rather than being prepared for tradeoffs.

The AI community has a short memory for mixed launches. If Opus 4.7's coding improvements prove out over the next few weeks, the narrative will shift. If the long-context regressions become blockers for real production use cases, the backlash will deepen.

10. What This Means If You Are Building on Claude

Here are a few practical steps worth taking before you migrate any production workload to Opus 4.7.

  • Measure token counts on your real prompts: Run your highest-volume API requests through both models and compare token counts directly. Do not assume costs are identical.
  • Test long-context retrieval: If your application sends large documents and asks the model to find specific information, test this against Opus 4.6 before switching. The MRCR regression may or may not affect your use case — but you should find out before it does in production.
  • Check Claude Code behavior: If you use Claude Code in your development workflow, verify that the code patterns you work with are not triggering the false malware flags that were reported post-launch.
  • Evaluate Adaptive Reasoning for your tasks: If you built workflows around extended thinking, test whether Opus 4.7's automatic reasoning allocation gives you equivalent depth on your specific prompts.

Opus 4.6 is not being deprecated immediately. You have time to evaluate before committing.
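The token-count comparison in the first step above can be run with a small harness that takes two counting functions. The stub counters below are illustrative placeholders — in practice you would substitute whatever real counting method you use, such as a provider token-counting endpoint, if available:

```python
from typing import Callable

def compare_token_costs(prompts: list[str],
                        count_old: Callable[[str], int],
                        count_new: Callable[[str], int]):
    """Count each prompt under two tokenizers and report per-prompt
    counts plus the worst-case inflation ratio across the set."""
    rows = []
    for p in prompts:
        old, new = count_old(p), count_new(p)
        rows.append((p[:30], old, new, new / old))
    worst = max(r[3] for r in rows)
    return rows, worst

# Stub counters stand in for real tokenizer calls; the ratios they
# produce here are illustrative, not measurements of Opus 4.7.
rows, worst = compare_token_costs(
    ["SELECT * FROM users;", "Hello world"],
    count_old=lambda p: max(1, len(p) // 4),
    count_new=lambda p: max(1, len(p) // 3),
)
print(f"worst-case inflation: {worst:.2f}x")  # worst-case inflation: 1.50x
```

Run it over your highest-volume production prompts, not toy strings — the inflation is workload-dependent, so the only number that matters is yours.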

11. A Note on AI Model Launches and Developer Trust

The Claude Opus 4.7 backlash is not purely about this model. It is about a pattern that developers have noticed across AI providers: models get released with marketing framing that emphasizes capability gains, and the regressions and cost changes are left for users to discover.

Developers building production systems need honest information at launch — not after the community has run its own tests. The tokenizer cost change was discoverable. The MRCR regression was measurable. Publishing both clearly at release, alongside the genuine improvements, would have produced a more credible story than the one that played out over 48 hours on Reddit and X.

This is the same principle that matters in data tooling. At Xlork, we work with developers who are importing customer data — CSVs, spreadsheets, XML exports — and the thing they need most is predictable, transparent behavior. When a library silently changes how it parses a date format, or a column mapper starts inferring differently, developers find out in production. That's the worst place to find out. The expectation is the same for AI models: communicate changes clearly, including the ones that cost users money or change behavior they depended on.

💡 Pro tip

Claude Opus 4.7 is a capable model with real improvements in vision and coding. It also has documented regressions in long-context retrieval and a tokenizer change that increases effective API costs. Test before you migrate, and keep 4.6 available as a fallback for workloads where retrieval depth matters.

#csv-import #data-engineering #best-practices #artificial-intelligence

Ready to simplify data imports?

Drop a production-ready CSV importer into your app. Free tier included, no credit card required.