Anthropic Unveils Claude Sonnet 4.5, Touting Best-in-Class Coding AI

Anthropic has officially released Claude Sonnet 4.5, a new AI model the company calls “the best coding model in the world”. Aimed at developers and organizations that need serious software assistance, Claude Sonnet 4.5 comes packed with upgrades that let it not only write code, but also use computer tools and even act as a junior software agent for extended periods.

This launch underscores a growing focus in the AI industry on specialized models – in this case, one tuned for programming – and marks Anthropic’s bid to outshine coding assistants like OpenAI’s Codex (the model that originally powered GitHub Copilot) and DeepMind’s AlphaCode.

Coding superpowers: According to Anthropic, Sonnet 4.5 has demonstrated remarkable coding abilities. The company reports state-of-the-art results on SWE-bench Verified, an evaluation that tests how well AI can handle real-world software engineering tasks.

Engineers often face multi-step coding problems that demand sustained focus over many hours, and Anthropic claims Claude Sonnet 4.5 can stay on track for more than 30 hours on such tasks without losing context. The model also reportedly excels in reasoning and mathematics, which are crucial for debugging and algorithmic thinking.

On OSWorld, a benchmark that measures how well AI can perform actual computer operations, Sonnet 4.5 now leads with a score of 61.4%, up from the 42.2% its predecessor, Sonnet 4, scored just a few months earlier. This leap suggests rapid progress; Anthropic attributes it to architectural improvements and more extensive training specifically on coding and tool-use data.
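To put the OSWorld jump in perspective, the two scores can be compared in both absolute and relative terms. The short calculation below uses only the figures quoted above; the variable names are illustrative.

```python
# OSWorld scores quoted in the article (percent of tasks completed).
sonnet_4 = 42.2
sonnet_4_5 = 61.4

# Absolute gain is measured in percentage points.
absolute_gain = sonnet_4_5 - sonnet_4
# Relative gain is the improvement as a share of the older score.
relative_gain = absolute_gain / sonnet_4 * 100

print(f"Absolute gain: {absolute_gain:.1f} percentage points")
print(f"Relative gain: {relative_gain:.1f}%")
```

That works out to 19.2 percentage points, or roughly a 45% improvement over the earlier score.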

New developer tools: Alongside the model, Anthropic rolled out a suite of developer-focused features:

Checkpoints for code – One of the features users requested most was a way to let the AI make changes without the risk of messing up a codebase. Alongside Sonnet 4.5, Anthropic is introducing checkpoints in the Claude Code interface. Before Claude applies any edit or writes a block of code, it can save the current state as a checkpoint. If something goes wrong or the developer doesn’t like the direction, they can use a simple “rewind” command (or press Esc twice) to roll everything back to a previous checkpoint. This effectively gives developers an undo button for the AI’s actions, which is a huge confidence booster for using autonomous coding help. Checkpoints can revert code changes, the conversation state, or both, and are stored locally so developers maintain control. Anthropic still recommends using version control like Git, but checkpoints add an extra layer of safety during live collaboration with Claude.
VS Code Extension – Recognizing that many developers work in Visual Studio Code, Anthropic released a native VS Code extension for Claude. Once installed (currently in beta on the VS Code Marketplace), developers can chat with Claude in a sidebar and get inline code suggestions within their editor. The extension shows diffs in real time – if Claude suggests changing a function, you see those changes highlighted in your code immediately. This tight integration means developers no longer need to copy-paste between a separate Claude terminal and their IDE; it’s all in one interface, making the workflow smoother. Early users say it brings Claude a step closer to feeling like an actual pair programmer looking over your shoulder and typing suggestions in your editor.
Claude Agent SDK (formerly Claude Code SDK) – Anthropic is also encouraging companies to build custom AI agents using Claude’s tech. They rebranded their developer toolkit as the Claude Agent SDK and added support for subagents and hooks. Subagents let one main Claude instance spawn specialized “helper” AIs for specific tasks. For instance, if you’re building an AI to assist with software development, one subagent could handle frontend UI code while another focuses on backend logic – working in parallel. Hooks allow automatic triggers; e.g., after Claude writes code, a hook could run the test suite to verify nothing broke. Combined with background tasks (Claude can now run longer processes without stopping the conversation), these features enable a more autonomous and scalable AI workflow. Anthropic says teams are already using the SDK to build things like cybersecurity code auditors and financial compliance bots. Essentially, Anthropic is giving others the same building blocks it used to create Claude’s coding agent, which could seed an ecosystem of specialized AI “employees” in various domains.
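The checkpoint and hook ideas described above can be illustrated with a toy model. What follows is a minimal sketch, not the Claude Code or Claude Agent SDK API: the Workspace class, its methods, and the in-memory "files" dictionary are all hypothetical names invented for this example. It assumes a checkpoint is simply a snapshot taken before each edit, and a hook is a callback that fires after each edit.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Workspace:
    """Toy in-memory codebase: file name -> contents (hypothetical, not a real API)."""
    files: Dict[str, str] = field(default_factory=dict)
    _checkpoints: List[Dict[str, str]] = field(default_factory=list)
    _post_edit_hooks: List[Callable[["Workspace", str], None]] = field(default_factory=list)

    def on_post_edit(self, hook: Callable[["Workspace", str], None]) -> None:
        """Register a callback that runs after every edit (e.g. 'run the tests')."""
        self._post_edit_hooks.append(hook)

    def edit(self, name: str, contents: str) -> None:
        """Snapshot the current state, apply the edit, then fire the hooks."""
        self._checkpoints.append(dict(self.files))  # checkpoint before the change
        self.files[name] = contents
        for hook in self._post_edit_hooks:
            hook(self, name)

    def rewind(self, steps: int = 1) -> None:
        """Undo the last `steps` edits by restoring the matching snapshot."""
        if steps < 1 or steps > len(self._checkpoints):
            raise ValueError("no such checkpoint")
        snapshot = self._checkpoints[-steps]
        del self._checkpoints[-steps:]
        self.files = snapshot


ws = Workspace(files={"app.py": "print('v1')\n"})
log: List[str] = []
# Stand-in for a real test run: just record that the hook fired.
ws.on_post_edit(lambda w, name: log.append(f"tests run after editing {name}"))

ws.edit("app.py", "print('v2')\n")
ws.edit("app.py", "print('broken'\n")  # a bad edit we want to undo
ws.rewind()                            # roll back only the last edit
print(ws.files["app.py"], end="")
print(log)
```

In this sketch, rewinding restores the file contents but leaves the hook log untouched, which loosely mirrors the idea that code state and conversation state can be reverted separately.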

Alignment and autonomy: Claude Sonnet 4.5 is touted as Anthropic’s “most aligned frontier model” to date. That means they’ve put extra work into ensuring it adheres to desired behaviors and ethical guidelines. Coding AIs can be double-edged – they might inadvertently generate insecure code or allow misuse (for example, writing malware if asked).

Anthropic’s Constitutional AI approach and red-teaming aim to prevent such outcomes. They note that Sonnet 4.5 has shown large improvements in avoiding harmful instructions compared to prior Claude models.

It’s also been trained to recognize when it should defer to a human or decline to execute certain commands. This is critical given Sonnet 4.5’s autonomy: with features like background tasks and subagents, Claude can do a lot on its own.

But Anthropic has a permissions framework in place (exposed through the SDK) that limits what actions Claude can take on a developer’s machine without explicit approval. For example, if Claude is about to run a shell command that could modify files, it may seek confirmation or be sandboxed depending on settings.
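A permission gate of this kind can be sketched in a few lines. The example below is an illustration under stated assumptions, not Anthropic’s implementation: the Decision enum, the two command sets, and the function names are all hypothetical. It classifies a shell command as allowed, needing approval, or denied, and it never actually executes anything.

```python
import shlex
from enum import Enum
from typing import Callable


class Decision(Enum):
    ALLOW = "allow"
    ASK = "ask"
    DENY = "deny"


# Hypothetical policy buckets, keyed by executable name.
READ_ONLY = {"ls", "cat", "grep"}
NEEDS_APPROVAL = {"rm", "mv", "cp", "pip", "npm"}
# Characters that enable redirection, chaining, or substitution in a shell.
SHELL_METACHARACTERS = ";|&><`$"


def check_command(command: str) -> Decision:
    """Classify a shell command without running it."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        return Decision.DENY  # unparseable input, e.g. unbalanced quotes
    if not tokens:
        return Decision.DENY
    if any(ch in command for ch in SHELL_METACHARACTERS):
        return Decision.ASK  # redirection or chaining could modify files
    program = tokens[0]
    if program in READ_ONLY:
        return Decision.ALLOW
    if program in NEEDS_APPROVAL:
        return Decision.ASK
    return Decision.DENY  # default-deny anything unrecognized


def is_permitted(command: str, approve: Callable[[str], bool]) -> bool:
    """Return True only if policy allows the command or a human approves it."""
    decision = check_command(command)
    if decision is Decision.ALLOW:
        return True
    if decision is Decision.ASK:
        return approve(command)
    return False


print(check_command("ls -la"))
print(check_command("rm -rf build"))
print(is_permitted("rm -rf build", approve=lambda cmd: False))
```

The default-deny fallthrough is the key design choice here: anything the policy does not recognize is refused rather than silently run.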

Industry reception: Early testers from companies like Cursor (an AI-enabled IDE) have praised Sonnet 4.5. “State-of-the-art coding performance… significant improvements on longer horizon tasks,” said the CEO of Cursor, noting many developers choose Claude for the toughest problems.

GitHub’s product chief was quoted as saying Claude Sonnet 4.5 “amplifies Copilot’s core strengths,” particularly in understanding multi-step coding instructions and handling code that spans multiple files.

These testimonials, provided by Anthropic, suggest Claude is being used as part of workflows even alongside other tools (like Copilot) to boost their capabilities.

Another early user in the legal tech field (at Casetext’s CoCounsel) noted Claude Sonnet 4.5’s prowess at reading entire litigation records and drafting summaries or opinions, calling it “state of the art on the most complex litigation tasks”.

This indicates the model’s use cases go beyond traditional software development into any field where large, complex bodies of text need analysis and transformation – which makes sense given coding often overlaps with data parsing and writing formal documents.

Availability and pricing: Claude Sonnet 4.5 is available immediately via Anthropic’s API (developers just call it with the model name claude-sonnet-4-5). In good news for users, Anthropic is not raising the price for using it: it remains at $3 per million input tokens and $15 per million output tokens, identical to the prior Claude Sonnet 4 model.
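To see what those rates mean in practice, a job’s cost can be estimated from its token counts. This is a rough sketch using only the two prices quoted above; the function name and the example workload are hypothetical, and it ignores any discounts or surcharges a real bill might include.

```python
# Prices quoted in the article, in US dollars per million tokens.
INPUT_USD_PER_MTOK = 3.00
OUTPUT_USD_PER_MTOK = 15.00


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost in USD of a request from its token counts."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# Hypothetical workload: the model reads 200k tokens of code and writes 20k tokens.
cost = estimate_cost(200_000, 20_000)
print(f"${cost:.2f}")
```

Because output tokens cost five times as much as input tokens, tasks that generate a lot of code are priced quite differently from tasks that mostly read it.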

This pricing is far below what OpenAI originally charged for GPT-4 with a 32K context window ($60 per million input tokens and $120 per million output tokens), and it could entice budget-conscious developers to switch to or try Claude for large-scale coding tasks.

The model is also available through interfaces like Claude.ai for subscribers, and through AWS (Amazon Bedrock) and Google Cloud’s Vertex AI marketplace, reflecting Anthropic’s multi-platform strategy.

With Sonnet 4.5, Anthropic isn’t just iterating on Claude – it’s doubling down on the vision of AI as an autonomous collaborator in coding and other complex workflows. The timing is interesting: it launched just ahead of OpenAI’s expected DevDay in late 2025, where some anticipate OpenAI might reveal coding enhancements of its own.

By beating them to the punch and focusing on developer experience (VS Code plugin, checkpoints, etc.), Anthropic is trying to position Claude as the preferred tool for the software development crowd. Given how crucial developers are as early adopters and multipliers of AI technology, winning their favor can have outsized effects on an AI platform’s success.

In summary, Claude Sonnet 4.5 is Anthropic’s most powerful and user-friendly AI yet for technical work. It promises to write and refactor code with greater skill, keep projects on track for longer, and do so under the watchful eye of alignment techniques that aim to ensure it’s a helpful teammate, not a rogue agent.

For anyone who has wrestled with a coding problem at 2 AM, the idea of a tireless AI partner that can brainstorm, execute, and even roll back mistakes is certainly appealing. Time will tell how widely it gets adopted, but it’s clear the race to build the ultimate AI coder is accelerating – and Anthropic is intent on leading it.
