The AI research company Anthropic has released Claude 2, the second generation of its AI chatbot, and for the first time is opening it up for the general public to use.
Claude 2 arrives with a suite of improvements in performance – from math and coding to a greater ability to follow user instructions – as Anthropic positions it as a rival to OpenAI’s ChatGPT and Google’s Bard.
Unlike its predecessor, which was available only to a limited set of business clients, Claude 2 is accessible in beta to anyone in the United States and the UK via a new public website (claude.ai) and through an API.
Anthropic’s head of go-to-market, Sandy Banerjee, said the company released Claude 2 in beta because “we believe that it’s important to deploy these systems to the market and understand how people actually use them”. In other words, Anthropic wants real-world feedback to further refine Claude.
API access lets developers integrate Claude 2 into their apps, with pricing unchanged from the previous model, and several partner companies, including AI content platform Jasper and coding search tool Sourcegraph, were already piloting Claude 2 before launch.
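For developers wondering what that integration looks like in practice, here is a minimal sketch using Anthropic’s Python SDK. The API key and prompt text are placeholders, and the method names reflect the text-completions interface available around Claude 2’s launch, so the current SDK documentation should be checked for exact signatures.

```python
# Minimal sketch: calling Claude 2 via Anthropic's Python SDK
# (launch-era text-completions interface; the SDK has since evolved).
import anthropic

client = anthropic.Anthropic(api_key="YOUR_API_KEY")  # placeholder key

response = client.completions.create(
    model="claude-2",
    max_tokens_to_sample=300,
    prompt=(
        f"{anthropic.HUMAN_PROMPT} Summarize the key points of this clause: ..."
        f"{anthropic.AI_PROMPT}"
    ),
)
print(response.completion)
```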
Claude 2’s skill upgrades are evident on standardized tests. Anthropic reports that Claude 2 scores 76.5% on the multiple-choice section of the bar exam, slightly higher than Claude 1.3’s 73%, and that it can also pass the multiple-choice portion of the U.S. Medical Licensing Exam, a notable benchmark of medical knowledge. For coding, Claude 2 achieved 71.2% on a Python programming test (Codex HumanEval), a significant leap from the 56% scored by the earlier Claude model. It has also improved at solving math word problems, scoring 88% on a set of grade-school math questions (GSM8K). These gains suggest Claude 2 has a better grasp of reasoning and complex problem-solving.
“We’ve been working on improving the reasoning and sort of self-awareness of the model,” Banerjee said, noting that Claude 2 is more attentive to following instructions and more willing to acknowledge its own limitations.
Users in early tests have found Claude 2’s answers to be more detailed and on-topic, with fewer inexplicable tangents.
One of Claude 2’s standout features is its expanded memory – it boasts a context window of 100,000 tokens, meaning it can remember and take into account far more text from a conversation or document than most AI models.
In practical terms, 100k tokens is roughly 75,000 words. This huge context window is the largest of any widely available model at launch, surpassing OpenAI’s GPT-4 32k model, which can handle about 25,000 words.
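Those figures follow from a common rule of thumb of roughly 0.75 English words per token; the sketch below shows the back-of-the-envelope arithmetic, with the caveat that the ratio is an approximation rather than an exact property of Claude’s tokenizer.

```python
# Rough rule-of-thumb conversion between tokens and English words.
# The ~0.75 words-per-token ratio is an approximation, not exact tokenizer math.
WORDS_PER_TOKEN = 0.75

def approx_words(tokens: int) -> int:
    return round(tokens * WORDS_PER_TOKEN)

print(approx_words(100_000))  # ~75,000 words (Claude 2's window)
print(approx_words(32_000))   # ~24,000 words (GPT-4 32k, close to the ~25,000 cited above)
```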
The benefit is that Claude 2 can ingest very large files or long dialogues without losing track of details. For example, you could feed Claude 2 an entire book or a lengthy legal contract and ask it to summarize the text or answer questions about specific sections, all in one go.
Researchers and analysts are excited about this capability: it opens the door to using AI on tasks like scanning through thousands of lines of code or analyzing transcripts of day-long meetings, which previously required chopping the input into smaller pieces for other models.
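To illustrate that single-pass workflow, the sketch below loads a long document from disk and asks Claude 2 about it in one request. The file name is hypothetical, and it reuses the same launch-era completions interface assumed in the earlier example.

```python
# Sketch: asking Claude 2 about a long document in a single request,
# relying on the 100k-token context window instead of chunking the text.
import anthropic

client = anthropic.Anthropic(api_key="YOUR_API_KEY")  # placeholder key

with open("long_contract.txt", "r", encoding="utf-8") as f:  # hypothetical file
    document = f.read()

prompt = (
    f"{anthropic.HUMAN_PROMPT} Here is a contract:\n\n{document}\n\n"
    f"Summarize the termination clauses in plain language.{anthropic.AI_PROMPT}"
)

response = client.completions.create(
    model="claude-2",
    max_tokens_to_sample=1000,
    prompt=prompt,
)
print(response.completion)
```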
In terms of use cases, Claude 2 continues to excel at tasks like summarization, drafting, Q&A, and coding assistance. Anthropic has also tried to make it “friendlier” and more aligned with user needs.
The model has been trained on more recent data (including information up to early 2023) and a portion of non-English content, broadening its knowledge and making it more useful for queries about current events or in languages beyond English.
However, Claude 2 is not connected to the live internet – it can’t browse the web in real time (unlike some versions of ChatGPT with the browsing plugin).
That means it won’t have information on events that happened after its training cutoff, aside from what users provide it. Anthropic emphasizes that this is a deliberate approach to mitigate risks: they want to carefully monitor Claude 2’s behavior before potentially enabling internet access down the line.
Early adopters have compared Claude 2 with ChatGPT (GPT-4) and report pros and cons. Claude 2 is described as “less terse and more conversational,” often giving very lengthy and structured answers. It tends to make its reasoning process visible – a trait of Anthropic’s training method – which some users find insightful, though others might prefer more concise answers.
On the safety front, Claude 2 inherits Anthropic’s Constitutional AI guardrails, meaning it follows a set of guidelines to avoid toxic or illegal content. Tests by users show it generally refuses to engage in extremist or obviously harmful requests, and it sometimes even gives a brief explanation of why it can’t comply (citing its principles).
Notably, because of its large context, some users are leveraging Claude 2 to analyze big data sets or code bases; one Reddit user, for example, had Claude 2 review a long open-source license document for potential issues – something hard to do with other models without splitting the text.
With Claude 2’s launch, Anthropic has firmly planted itself as the main competitor to OpenAI in the large-scale chatbot arena. TechCrunch dubbed Claude 2 “a new text-generating model that is ostensibly improved in several ways” over its predecessor.
The public rollout in the U.S. and UK is cautious; Anthropic is likely testing the waters of broader use, ensuring its safety measures hold up under diverse inputs from millions of users.
As AI enthusiasts and developers experiment with Claude 2, Anthropic will be watching closely. The company’s approach is iterative: learn from real-world usage, then refine the model (and possibly expand availability).
If Claude 2’s debut goes well, we can expect Anthropic to open it to more countries and perhaps even release more powerful versions (Claude 2.1, 2.2, etc.) in the coming months.
For now, users have another state-of-the-art AI chatbot at their fingertips – one with a prodigious memory and a design philosophy that places a premium on helpfulness and harmlessness in equal measure.