Anthropic Launches Claude 3.5 with Expanded Vision and 200K Context Window

Anthropic has unveiled Claude 3.5 Sonnet, a major upgrade to its Claude AI assistant that significantly boosts both the model’s raw power and its toolset. Billed as the first in a new “Claude 3.5” family, the Sonnet model raises the industry bar for intelligence – outperforming Anthropic’s previous Claude 3 and rival models on many benchmarks – all while operating at twice the speed of its predecessor.

Importantly, Claude 3.5 comes with enhanced vision capabilities and a massive 200,000-token context window, positioning it as one of the most context-aware and versatile AI systems available.

Performance leaps: According to Anthropic, Claude 3.5 Sonnet sets new state-of-the-art results on tests of advanced reasoning and knowledge. It tops the charts in graduate-level reasoning (the GPQA benchmark) and undergraduate-level subject knowledge (MMLU), indicating it can handle complex, nuanced questions better than before.

In coding, it substantially improved on the HumanEval programming benchmark – a key measure of a model's ability to write correct code. Perhaps more tangibly for users, Anthropic says Claude 3.5 exhibits a better grasp of nuance, humor, and context in conversation, producing writing that feels natural and relatable.
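
For readers unfamiliar with HumanEval, the benchmark hands a model a Python function signature and docstring and checks whether the generated body passes hidden unit tests. The sketch below is adapted from the benchmark's first problem to illustrate the format; the completion shown is one correct solution, not Claude's verbatim output.

```python
# A HumanEval-style task (adapted from the benchmark's first problem):
# the model receives the signature and docstring and must write a body
# that passes hidden unit tests.

def has_close_elements(numbers: list[float], threshold: float) -> bool:
    """Check if any two numbers in the list are closer to each other
    than the given threshold."""
    # One correct completion a model is expected to produce:
    for i, a in enumerate(numbers):
        for b in numbers[i + 1:]:
            if abs(a - b) < threshold:
                return True
    return False

# Grading runs assertions like these against the generated body:
assert has_close_elements([1.0, 2.0, 3.9, 4.0], 0.3) is True
assert has_close_elements([1.0, 2.8, 5.0], 0.5) is False
```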

Despite these gains, it is optimized to run twice as fast as the previous high-end Claude model (Claude 3 “Opus”), making it viable for real-time applications. Moreover, Anthropic is offering Claude 3.5 Sonnet at a cost comparable to its mid-tier models – $3 per million input tokens and $15 per million output tokens – which is remarkable given the jump in capability.
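
For developers, access works through Anthropic's standard Messages API. The snippet below is a minimal sketch using the official Python SDK and the claude-3-5-sonnet-20240620 model identifier; the prompt itself is purely illustrative.

```python
# Minimal sketch: calling Claude 3.5 Sonnet via Anthropic's Python SDK.
# Assumes `pip install anthropic` and an ANTHROPIC_API_KEY environment variable.
import anthropic

client = anthropic.Anthropic()  # picks up ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Explain the difference between a mutex and a semaphore."}
    ],
)

# The reply arrives as a list of content blocks; print the text of the first one.
print(message.content[0].text)
```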

Vision and multimodality: Claude 3.5 Sonnet is described as Anthropic’s “strongest vision model yet,” able to interpret images and graphics better than any prior Claude. In internal evaluations, it surpassed older models on tasks like reading charts, graphs, and diagrams – demonstrating a genuine degree of visual reasoning. For instance, given a bar chart, Claude 3.5 can analyze it and describe insights (e.g. which category is highest, or how values trend over time) with improved accuracy.

It can also perform OCR (optical character recognition) on imperfect images, meaning that if you snap a photo of a printed page or a screenshot of text, Claude can transcribe and understand it. These visual skills are not yet on par with dedicated image-understanding systems such as GPT-4’s vision mode, but they mark a significant advance for Claude.
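
In API terms, images are passed as content blocks alongside text in the same message. The sketch below assumes a hypothetical local chart image, quarterly_sales.png; the same pattern works for a photographed page of text that needs transcription.

```python
# Sketch: asking Claude 3.5 Sonnet about an image via the Messages API.
# "quarterly_sales.png" is a hypothetical local file used for illustration.
import base64
import anthropic

client = anthropic.Anthropic()

with open("quarterly_sales.png", "rb") as f:
    image_b64 = base64.standard_b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": "image/png",
                    "data": image_b64,
                },
            },
            {
                "type": "text",
                "text": "Which category is highest in this chart, and how does it trend over time?",
            },
        ],
    }],
)
print(message.content[0].text)
```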

Notably, Anthropic integrated these features into its Claude iOS app, so mobile users can take a picture with their phone and ask Claude questions about it – a very practical use-case introduced with this update.

This positions Claude to compete in the growing arena of multimodal AI, alongside OpenAI’s GPT-4 (which has vision) and Google’s Gemini (which is likewise multimodal).

Context window 200K: Perhaps the most eye-popping feature is that Claude 3.5 Sonnet can handle up to 200,000 tokens of context. That’s roughly 150,000 words of text it can consider at once – equivalent to about 500 pages of a book. This doubles Claude’s previous 100K-token context length and comfortably exceeds the 128K offered by GPT-4 Turbo.

In practical terms, users can feed Claude 3.5 huge collections of text – say, an entire academic thesis with all its references, or months of chat transcripts, or a large codebase – and Claude can analyze it in one session without losing the thread.
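
In practice, exploiting the 200K-token window can be as simple as placing the full document inside a single message. The sketch below assumes a hypothetical plain-text export, policy_manual.txt, that fits within the limit.

```python
# Sketch: long-context question answering over a large document.
# "policy_manual.txt" is a hypothetical plain-text export that fits
# within the 200,000-token context window.
import anthropic

client = anthropic.Anthropic()

with open("policy_manual.txt", encoding="utf-8") as f:
    manual = f.read()

message = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=2048,
    messages=[{
        "role": "user",
        "content": (
            "Here is our complete policy manual:\n\n"
            + manual
            + "\n\nWhich sections contradict each other on remote-work "
              "eligibility? Cite the sections you are cross-referencing."
        ),
    }],
)
print(message.content[0].text)
```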

This opens up possibilities like comprehensive research assistance: Claude could take in a company’s entire policy manual or a trove of scientific papers and then answer detailed questions that require cross-referencing across the whole dataset.

It also means fewer truncation issues; users don’t have to worry about their earlier conversation context dropping out as quickly. Anthropic has clearly been prioritizing these long-context capabilities as a differentiator, and with 200K tokens, Claude 3.5 arguably becomes the best AI for long-form and broad-context tasks on the market.

New features – “Artifacts” and collaboration: Alongside the model upgrade, Anthropic is evolving how users interact with Claude. They introduced a feature called Artifacts on the Claude.ai platform. When Claude generates content like a piece of code, an essay, or a simple design expressed as text (such as an SVG graphic or web page mock-up), these Artifacts appear in a panel separate from the chat.

Users can interact with them directly – for example, editing a code snippet in the Artifact window or copying it out – while continuing the conversation. This essentially turns Claude into a more collaborative workspace.

Anthropic says this is part of a broader vision to support team collaboration: soon, multiple people within an organization might be able to share a Claude chat, co-editing Artifacts or asking questions together, with Claude tapping into a shared knowledge base.

It’s reminiscent of Google Docs, but with an AI agent as both a collaborator and a sort of document editor. Anthropic’s roadmap hints that down the line entire departments could rely on a shared Claude that “remembers” their projects (via a feature called Memory in development) and assists continuously.

Safety maintained: Given Claude 3.5’s more powerful capabilities, Anthropic took steps to ensure it doesn’t become less safe. They report that despite the model’s greater intelligence, it remains at AI Safety Level 2 (ASL-2) in their internal classification – meaning their evaluations found no signs of capabilities that would pose catastrophic risks, and the model continues to refuse requests for clearly harmful content.

They subjected Claude 3.5 to rigorous red-teaming (testing for misuse), and notably, they worked with the UK’s new AI Safety Institute to evaluate the model before deployment. (The UK institute tested Claude 3.5 and shared its results with the U.S. AI Safety Institute at NIST, as part of transatlantic cooperation on frontier AI oversight.) Anthropic also incorporated feedback from external experts – for example, the child safety organization Thorn helped fine-tune Claude to better detect and avoid harmful content related to minors.

Anthropic’s transparency practices also continue: they published an updated model card detailing Claude 3.5’s strengths and limitations, and restated that they do not train on user conversations without permission, addressing privacy concerns.

What’s next: Claude 3.5 Sonnet is the first of its line; Anthropic announced that Claude 3.5 “Haiku” and “Opus” – likely lighter or heavier versions – will come later in 2024.

This suggests a tiered approach: Claude 3.5 Haiku is likely to be a faster, cheaper model for quick responses, while Claude 3.5 Opus should be a larger, more powerful one for complex tasks, giving users choices based on their needs.

Anthropic also teased that they are working on “new modalities” beyond text and vision, which could mean audio (allowing Claude to speak or understand spoken language) or other forms of data.

And integration with enterprise applications is on the horizon – likely deeper tie-ins with tools like Slack, Jira, and databases, building on the workplace integrations Anthropic has already been rolling out.

With Claude 3.5’s launch, Anthropic isn’t just playing catch-up with OpenAI; in some areas, it’s sprinting ahead (especially in context length).

The competitive landscape in mid-2024 has thus evolved: OpenAI’s GPT-4 still leads on some evaluations and has far broader usage, but Claude 3.5 is closing the gap and even leads in specific capabilities, such as context length and, potentially, document and image understanding in enterprise settings.

For AI enthusiasts and developers, Claude 3.5 offers an exciting new tool – one that can remember more, see more, and respond faster than before. For Anthropic, it’s another step in its mission to create AI that is “steerable” and beneficial – proving that safety and performance can advance hand in hand.
