How to Use Claude with Python (Requests + Examples)

Claude is a powerful large language model (LLM) developed by Anthropic, known for its strong reasoning abilities and safety-oriented responses. Integrating Claude into your Python projects can unlock advanced AI features like summarization, content generation, and conversational intelligence. This guide will walk intermediate Python developers and backend engineers through using the Claude API with Python’s requests library, supplemented by examples using httpx for asynchronous calls.

We’ll cover everything from basic API requests and authentication to error handling, streaming responses, and deploying Claude in real applications (FastAPI services, automation scripts, and scheduled tasks). By the end, you’ll have a comprehensive understanding of how to call Claude’s API in Python and best practices to make it reliable and production-ready.

Target Audience & Prerequisites: This guide assumes you have experience with Python scripting, virtual environments, RESTful APIs, JSON data, and basic error handling. The content is technical and structured, aiming to be both a practical tutorial and a reference for best practices. If you’re comfortable with Python and want to integrate Claude’s AI capabilities into your backend services or automation workflows, this article is for you.

Setting Up Your Python Environment and API Access

Before diving into code, make sure your environment is prepared:

  • Python Installation: Ensure you have Python 3.8+ installed (Python 3.10 or newer recommended). You can check by running python --version in your terminal.
  • Virtual Environment: It’s good practice to use a virtual environment for your project to manage dependencies and isolate your setup. For example, create one with python -m venv venv and activate it (source venv/bin/activate on Linux/Mac, or venv\Scripts\activate on Windows).
  • Install Required Libraries: This guide uses Python’s requests library for HTTP calls. Install it with pip install requests. For optional examples, we’ll also use httpx (for async support) – install with pip install httpx. If working with FastAPI later, also do pip install fastapi uvicorn.
  • Obtain a Claude API Key: Sign up for an Anthropic developer account and obtain an API key from the Anthropic Console (in the API Keys section of your account). Anthropic may require an application or approval to grant API access, and new accounts may include a small amount of free credit for testing.
  • Store the API Key Securely: Do not hardcode your API key in your scripts. Instead, store it as an environment variable for security. For example, on Linux/Mac, add a line like export ANTHROPIC_API_KEY="your-key-here" to your ~/.bash_profile or relevant shell profile, then reload it. On Windows, you can use the Environment Variables settings or setx ANTHROPIC_API_KEY "your-key-here" in Command Prompt. This way, your Python code can read the key from the environment at runtime. (Using a .env file with python-dotenv is another convenient approach, if preferred.)
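
For example, here's a minimal sketch of reading the key at runtime (the python-dotenv lines are optional and assume you've created a .env file containing ANTHROPIC_API_KEY=...):

import os

# Optional: load variables from a .env file (requires: pip install python-dotenv)
# from dotenv import load_dotenv
# load_dotenv()

API_KEY = os.environ.get("ANTHROPIC_API_KEY")
if not API_KEY:
    raise RuntimeError("ANTHROPIC_API_KEY is not set")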

Once you have your API key configured, you’re ready to start making requests to Claude.

Basic Usage: Calling the Claude API with Python requests

The Claude API is a RESTful interface that expects HTTP POST requests with JSON payloads. In this section, we’ll construct a basic API call using the requests library, demonstrating how to send a prompt to Claude and receive a completion.

API Endpoint and Authentication: The base URL for Claude’s API is https://api.anthropic.com/v1/messages. Every request must include your API key in the headers. Anthropic’s API expects the header x-api-key: <YOUR_API_KEY> for authentication. It also requires a version header anthropic-version to specify the API version (e.g. anthropic-version: 2023-06-01). Additionally, include Content-Type: application/json to indicate you’re sending JSON data.

Request Payload Format: Claude’s API uses a chat-style Messages API, similar to OpenAI’s chat completions. Your JSON payload should include:

  • model – the ID of the Claude model you want to use (for example, "claude-3-opus-20240229" for Claude 3 Opus, or a newer model name as appropriate). Anthropic provides different model families and sizes (Opus, Sonnet, and Haiku variants across generations), and model IDs often include version dates – check the Anthropic documentation for the latest model names.
  • messages – an array of message objects, each with a role and content. Typically you start with a user message. For example: [ {"role": "user", "content": "Hello, world"} ]. Claude will then respond with an assistant message. (If you have a multi-turn conversation, you can include prior messages in this list, including assistant responses, to give context.)
  • max_tokens – the maximum number of tokens to generate in the response. This bounds the length of Claude’s reply. For instance, max_tokens: 100 would allow up to roughly 100 tokens in the output.
  • Optional parameters: You can include additional fields like system, temperature, etc. A system message is a special instruction that sets the context or behavior for Claude throughout the conversation (for example, you might set system: "You are an AI writing assistant that responds in pirate slang."). The temperature parameter (0.0 to 1.0) controls randomness – higher values yield more creative or varied outputs, while lower values produce more deterministic results. If not provided, Claude uses a default temperature (typically 1.0). You can also specify stop_sequences to tell Claude when to stop (e.g., you might set a stop sequence to "\nHuman:" if simulating a chat exchange), but by default Claude will stop at a natural completion or when hitting the token limit.
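
For instance, a request payload that uses these optional fields might look like this (the model name and parameter values are just illustrative):

data = {
    "model": "claude-3-opus-20240229",   # use a model available to your account
    "max_tokens": 300,
    "temperature": 0.3,                  # lower = more deterministic output
    "system": "You are an AI writing assistant that responds in pirate slang.",
    "stop_sequences": ["\nHuman:"],      # optional: custom stop sequence
    "messages": [
        {"role": "user", "content": "Describe today's weather in one paragraph."}
    ]
}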

Let’s put this together in a Python example. We’ll send a simple prompt to Claude and print the response:

import os, requests, json

API_URL = "https://api.anthropic.com/v1/messages"
API_KEY = os.environ.get("ANTHROPIC_API_KEY")  # ensure your env var is set

# Prepare headers for authentication and content type
headers = {
    "x-api-key": API_KEY,
    "anthropic-version": "2023-06-01",        # required API version header:contentReference[oaicite:24]{index=24}
    "Content-Type": "application/json"
}

# Construct the request payload
data = {
    "model": "claude-2.1",  # example model (use a valid model name available to you)
    "max_tokens": 100,
    "messages": [
        {"role": "user", "content": "Hello Claude, can you tell me a joke?"}
    ]
    # We could add "system": "You are a friendly assistant..." or other params if needed
}

# Make the API request
response = requests.post(API_URL, headers=headers, json=data)
# Check for HTTP errors
if response.status_code != 200:
    print(f"Request failed: {response.status_code} - {response.text}")
else:
    result = response.json()  # parse JSON response
    print(result["content"][0]["text"])  # the reply text lives in the first content block

In this code:

  • We read the API key from an environment variable for safety. If API_KEY is None, ensure you set the ANTHROPIC_API_KEY environment variable as described earlier.
  • We define the headers with the API key and version. (Anthropic’s API documentation mandates an x-api-key header and a specific anthropic-version for compatibility. If these are missing or incorrect, you’ll get an authentication error or version error.)
  • The payload specifies a model, a max token limit, and a messages list with one user message. We keep it simple: the user asks for a joke.
  • We send a POST request to the v1/messages endpoint with this JSON data.
  • If the response status is 200 OK, we parse the JSON and print the assistant’s reply (the text block inside result["content"]). The Claude API response JSON contains an assistant message whose content is a list of content blocks (for a plain text reply, a single block of type "text"), along with metadata like stop_reason and token usage. For example, a successful response might look like:
{
  "id": "msg_abcdef1234567890",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Sure! Here's a joke: Why did the developer go broke? Because they used up all their cache."
    }
  ],
  "model": "claude-2.1",
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 9,
    "output_tokens": 22
  }
}

The content field is a list of content blocks; here it holds one text block whose text is Claude’s generated answer (in this case, a joke). The stop_reason will typically be "end_turn" (Claude finished naturally), "max_tokens" (it hit the token limit), or "stop_sequence" (it hit one of your stop sequences). The usage section shows how many tokens were consumed by the prompt and response, which is useful for tracking costs.
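
Because content is a list of blocks, a small helper makes it easy to pull out just the text (a minimal sketch):

def extract_text(result):
    """Concatenate the text from all text blocks in a Messages API response."""
    return "".join(
        block.get("text", "")
        for block in result.get("content", [])
        if block.get("type") == "text"
    )

# Usage: reply = extract_text(response.json())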

Authentication and Security Best Practices

When working with the Claude API (or any API), proper handling of authentication credentials is crucial:

  • API Key in Headers: As shown, use the x-api-key header to pass your key – this is the authentication method the API documentation specifies. Keep this key secret – treat it like a password. Do not expose it in front-end code or public repos.
  • Environment Variables: Use environment variables or a secure key vault to store the API key, rather than hardcoding it. This reduces the risk of accidental exposure of the key in your codebase. For example, in your shell you might run export ANTHROPIC_API_KEY="abc123..." and then access it in Python with os.environ.get("ANTHROPIC_API_KEY").
  • Multiple Environments: For deployment (staging, production), manage keys via config files or environment-specific settings. Avoid mixing up keys by clearly labeling them, and consider using separate keys for development vs production as needed (Anthropic lets you create multiple API keys).
  • Rotation and Revocation: If you suspect a key is compromised, revoke it from the Anthropic console (you can disable/delete keys) and generate a new one. Periodic rotation of API keys is a good security practice.
  • Rate Limits and Quotas: Your API key is associated with usage limits (more on rate limiting below). Keep track of your usage in the Anthropic dashboard to ensure you don’t exceed monthly credit or rate limits.
  • Avoid Client-Side Exposure: The Claude API doesn’t support direct calls from the browser (it will fail CORS checks). Always route requests through a secure backend where your API key can be kept hidden.

By following these practices, you help ensure that your Claude integration remains secure and reliable.

Handling API Errors and Exceptions

When calling an external API, robust error handling is essential. The Claude API will return standard HTTP status codes to indicate success or various error conditions. Here are common status codes and how to handle them:

  • 200 OK: Success – your request was processed and you received a response. The response body will contain the assistant’s message or other data.
  • 400 Bad Request (invalid_request_error): The request was malformed or missing required fields. This could happen if your JSON is structured incorrectly, a parameter value is invalid, or you hit some format constraint. Check the error message in the response JSON for details.
  • 401 Unauthorized (authentication_error): Your API key is missing, invalid, or not authorized. Ensure the x-api-key header is set correctly and the key is active (if you just generated it, confirm you copied it right).
  • 403 Forbidden (permission_error): The API key is valid but doesn’t have permission for the operation. For example, trying to use a feature your account isn’t allowed to, or using an admin-only endpoint.
  • 404 Not Found (not_found_error): The endpoint or resource wasn’t found. This might occur if the URL is wrong. Double-check the endpoint (/v1/messages is correct for the chat API).
  • 413 Request Too Large (request_too_large): Your request payload is too big. Claude’s API has a maximum request size of 32 MB (which includes your prompt and any attached data). This error can occur if you send an excessively large prompt or file. To fix, reduce the input size (you may need to chunk the input, discussed later) or use the Files API for large documents.
  • 429 Too Many Requests (rate_limit_error): You’ve hit a rate limit. This means you are sending requests too frequently or using too many tokens per minute. The response headers will include a Retry-After value indicating how long to wait before retrying. We’ll discuss strategies to avoid and handle rate limits in the next section.
  • 500 Internal Server Error (api_error): A generic error on Anthropic’s side. You may occasionally see this if something goes wrong with the service; retrying after a short delay is usually reasonable.
  • 529 Service Overloaded (overloaded_error): Anthropic’s service is temporarily overloaded with requests. This is similar to HTTP 503 but Anthropic uses 529. The best response is to wait a bit and try again. In streaming mode, an overload might be communicated via an SSE error event instead of HTTP status.

When you get an error status, the response body will usually contain a JSON structure with details. For example, a 401 might return:

{
  "type": "error",
  "error": {
    "type": "authentication_error",
    "message": "Invalid API key."
  },
  "request_id": "req_0123456789ABCDEF"
}

The request_id is a unique ID for your call – useful for debugging or when contacting support.

Python Exception Handling: Use try/except around your request call to handle exceptions that the requests library might raise (timeouts, connection errors, etc.). For example:

import requests, time

try:
    resp = requests.post(API_URL, headers=headers, json=data, timeout=10)
    resp.raise_for_status()  # Raise HTTPError for bad status codes (4xx/5xx)
except requests.exceptions.HTTPError as http_err:
    status = resp.status_code
    if status == 401:
        print("Authentication failed – check your API key.")
    elif status == 429:
        retry_after = resp.headers.get("Retry-After", "30")  # default to 30s if not provided
        print(f"Rate limit hit. Retry after {retry_after} seconds.")
    else:
        # Print error message from Claude if available
        try:
            err_info = resp.json()
            err_msg = err_info.get("error", {}).get("message")
        except ValueError:
            err_msg = resp.text  # not JSON, use raw text
        print(f"HTTP {status} error: {err_msg}")
except requests.exceptions.Timeout:
    print("Request timed out. The operation took too long.")
except requests.exceptions.RequestException as e:
    # Catch-all for other request errors (network issues, etc.)
    print(f"Request failed: {e}")
else:
    # No exception, proceed to process response
    result = resp.json()
    print("Claude response:", result.get("content"))

In this snippet:

  • We set a timeout of 10 seconds on the request. If Claude hasn’t responded by then, requests will raise a Timeout exception, which we catch separately. (Timeouts are important to prevent hanging indefinitely – more on this in the next section.)
  • We use resp.raise_for_status() to throw an HTTPError if a bad status code was returned. We then inspect resp.status_code to handle specific errors.
  • For 429, we read the Retry-After header to know how long to wait. (Anthropic uses token-bucket rate limiting; exceeding the limit triggers a 429 with a retry window.)
  • We attempt to parse the error message from the JSON (if present) to print a meaningful message. Anthropic’s errors usually have an "error": {"message": "..."} field.
  • We separate handling for Timeout and a generic RequestException which could indicate a network error (DNS failure, no internet, etc.).

By implementing error handling, your integration can gracefully inform you (or the end-user) what went wrong and possibly recover (for example, by retrying after a delay when rate-limited or on transient network errors).

Retry Logic and Rate Limiting Strategies

When working with an API in production, you should plan for retries in case of transient failures or rate limits. However, retries should be done carefully to avoid exacerbating the problem (like a thundering herd of retries). Let’s break this down into two concerns: rate limiting and general retry strategy.

Dealing with Rate Limits

Anthropic imposes rate limits on the number of requests and tokens per minute your organization can use. If you exceed these, you’ll get HTTP 429 errors (rate_limit_error). The response will include a Retry-After header specifying how many seconds to wait before retrying. Here are some strategies to handle and avoid hitting the limits:

  • Respect Retry-After: If you receive a 429, always pause for at least the duration specified in the Retry-After header before retrying. The API is explicitly telling you the cooldown time.
  • Exponential Backoff: For multiple consecutive 429s or 529s (overload), implement exponential backoff in your retries. For example, wait 1 second, then 2, then 4, etc., up to a maximum interval. This gives the service time to recover and avoids constant hammering.
  • Token-Based Throttling: Keep track of how many tokens you’re sending/receiving per minute. Claude’s response includes a usage field with token counts, and headers like anthropic-ratelimit-tokens-remaining might be present in responses to indicate your remaining quota. If you are processing very large prompts or outputs, insert deliberate pauses between calls to stay under token-per-minute limits.
  • Request Rate Throttling: Similarly, if you know your plan allows (for example) 60 requests per minute, avoid bursts above 1 request/second. You can enforce a short sleep (e.g., time.sleep(0.2)) between calls in a tight loop, or use a rate limiting utility or queue system in your application (see the sketch after this list).
  • Batch Requests if Possible: Anthropic offers a Batch API to send multiple prompts in one request (with cost savings and shared overhead). If you need to process many small prompts, batching them into one API call can be more efficient and less likely to hit rate limits (since it counts as one request). However, using the Batch API is beyond the scope of this article (and it has its own limits per batch). For straightforward cases, you can also simply run requests concurrently, as shown with async later.
  • Monitor Usage: Use the Anthropic Console’s usage dashboards or the API usage reports to monitor your consumption. This helps anticipate hitting quotas before they become an issue. You can also programmatically check usage if needed (Anthropic provides usage/cost endpoints).
  • Graceful Degradation: In a production app, decide what to do if the rate limit is reached – perhaps queue the request for later processing, or return a friendly error message to the user indicating the service is busy and to try again later.
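
As a simple illustration of the request-rate throttling idea, here's a minimal sketch that spaces out calls to stay under a fixed requests-per-minute budget (the budget value is an assumption – use your plan's actual limit):

import time
import requests

REQUESTS_PER_MINUTE = 50                  # assumed budget
MIN_INTERVAL = 60.0 / REQUESTS_PER_MINUTE
_last_call = 0.0

def throttled_post(url, **kwargs):
    """Send a POST request, sleeping first if the previous call was too recent."""
    global _last_call
    wait = MIN_INTERVAL - (time.time() - _last_call)
    if wait > 0:
        time.sleep(wait)
    _last_call = time.time()
    return requests.post(url, **kwargs)

# Usage: resp = throttled_post(API_URL, headers=headers, json=data, timeout=10)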

Implementing Retries in Code

We partially demonstrated retry logic in the error handling snippet above. Here’s a more complete example using a simple loop for retries:

import time

MAX_RETRIES = 3
backoff = 1  # starting backoff in seconds
success = False  # track whether any attempt succeeded

for attempt in range(1, MAX_RETRIES+1):
    try:
        resp = requests.post(API_URL, headers=headers, json=data, timeout=10)
    except requests.exceptions.RequestException as e:
        # Network or other low-level error (could retry or break depending on error)
        print(f"Attempt {attempt}: Request failed due to network error: {e}")
        success = False
    else:
        if resp.status_code == 200:
            result = resp.json()
            print("Got response:", result.get("content"))
            success = True
            break  # exit loop on success
        elif resp.status_code == 429:
            # Rate limit hit – get retry-after if available
            retry_after = int(resp.headers.get("Retry-After", "1"))
            print(f"Rate limited. Waiting {retry_after} seconds before retry...")
            time.sleep(retry_after)
            success = False
        elif resp.status_code == 529:
            # Overloaded – treat similarly by waiting
            wait_time = backoff
            print(f"Server overloaded (529). Backing off for {wait_time}s...")
            time.sleep(wait_time)
            backoff *= 2  # exponential backoff: 1s, 2s, 4s, ...
            success = False
        else:
            # Some other error, print and decide not to retry (could retry on 500 if desired)
            err_info = resp.text
            print(f"Error {resp.status_code}: {err_info}")
            success = False
            break  # break out on non-retriable error
    # small delay before next attempt to avoid tight loop (if not already sleeping)
    time.sleep(0.1)

if not success:
    print("Failed to get a successful response after retries.")

Key points in this retry logic:

  • We limit the number of attempts (MAX_RETRIES) to avoid infinite loops.
  • On a 429, we strictly wait the Retry-After seconds before retrying.
  • On a 529, we apply an exponential backoff (starting at 1s, then 2s, etc.). The code above uses a doubling strategy for simplicity.
  • We consider network errors (RequestException) as potentially transient and we do retry them, but you might decide to break immediately on certain errors (e.g., DNS not found probably shouldn’t retry immediately).
  • For other HTTP errors (400, 401, 403, etc.), we break out without retry because those won’t succeed with retries unless something external changes (e.g., fixing a bug or waiting for permissions).
  • Between attempts, we add a tiny sleep (0.1s) to avoid a tight loop in case an error is thrown very quickly.

Note: In a more advanced setup, you might use a library like Tenacity (which simplifies retry logic with decorators) or configure urllib3’s Retry class on a requests Session via an HTTPAdapter. But the above manual approach makes it clear what’s happening and is customizable to Claude’s specific statuses.
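
For example, here's a minimal sketch of the adapter approach (it assumes a recent urllib3 where the parameter is named allowed_methods; exhausted retries surface as requests.exceptions.RetryError):

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

retry = Retry(
    total=3,
    backoff_factor=1,                    # sleep roughly 1s, 2s, 4s between attempts
    status_forcelist=[429, 500, 529],    # status codes worth retrying
    allowed_methods=["POST"],            # POST is not retried by default
    respect_retry_after_header=True,     # honor Retry-After on 429s
)

session = requests.Session()
session.mount("https://", HTTPAdapter(max_retries=retry))

resp = session.post(API_URL, headers=headers, json=data, timeout=10)
resp.raise_for_status()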

Processing Large Inputs with Chunking

Claude’s models (especially the latest versions) support very large context windows (around 200K tokens for recent models, and even more in some cases). However, you may still encounter scenarios where your input data (or conversation history) is too large to send in a single API call or approaches the limits of what’s practical. Also, remember the 32 MB request size limit – sending extremely large payloads might hit that cap and get a 413 error. In such cases, you need to chunk your input and process it piecewise.

When to Chunk: If you’re feeding a long document (e.g., a book or lengthy report) to Claude for summarization or QA, or if you have a huge conversation history, it might be wise to break the text into smaller chunks rather than one giant prompt. Chunking can also improve response speed and allow you to parallelize work (summarizing multiple sections concurrently).

Basic Chunking Strategy:

  1. Split the Input: Divide your text into manageable chunks. The chunk size could be based on characters, words, or tokens. For example, you might split by paragraphs or by a fixed number of tokens (say 1000 tokens per chunk). Make sure splits occur at sensible boundaries (e.g., end of a sentence or paragraph) to avoid cutting in the middle of a thought.
  2. Process Each Chunk: For each chunk, send a prompt to Claude. What the prompt is depends on your task. If summarizing, you might prompt Claude with something like: “Summarize the following text:\n<CHUNK_TEXT>”. If doing question-answering, you might do: “Given the following context, answer the question X:\n<CHUNK_TEXT>”. Each chunk will yield a partial result (e.g., a summary of that chunk or an answer).
  3. Combine or Further Process Results: If you got summaries of each chunk, you might then combine those summaries and ask Claude to summarize the summaries (a recursive approach to handle very large documents). If you got answers from each chunk and need a final answer, you might need to decide which chunk’s answer is best or ask Claude to synthesize from all answers.
  4. Iterate or refine: Sometimes you might do multiple passes – e.g., first extract key points from each chunk, then feed those into another Claude call to get an overall summary or conclusion.

Example – Summarizing a Large Text File: Suppose we have a large log file or document and we want Claude to summarize it. We’ll read the file, chunk it by lines, and have Claude summarize each chunk, then summarize the summaries:

import math

# Read a large text file
with open("large_report.txt", "r") as f:
    text = f.read()

# Split into roughly N chunks
N = 5  # say we want 5 chunks
lines = text.splitlines()
chunk_size = math.ceil(len(lines) / N)
chunks = [ "\n".join(lines[i:i+chunk_size]) for i in range(0, len(lines), chunk_size) ]

summaries = []
for i, chunk in enumerate(chunks, start=1):
    prompt = f"Text chunk {i} of {len(chunks)}:\n\"\"\"\n{chunk}\n\"\"\"\n\nProvide a concise summary of the above text."
    data = {
        "model": "claude-2.1",
        "max_tokens": 300,
        "messages": [ {"role": "user", "content": prompt} ]
    }
    resp = requests.post(API_URL, headers=headers, json=data)
    resp.raise_for_status()
    summary = resp.json()["content"][0]["text"]
    summaries.append(summary)
    print(f"Chunk {i} summary:\n{summary}\n{'-'*40}")

# Combine summaries and get an overall summary
overall_prompt = "Here are summaries of parts of a report:\n" + "\n\n".join(summaries) + "\n\nProvide an overall summary of the report."
data = {
    "model": "claude-2.1",
    "max_tokens": 200,
    "messages": [ {"role": "user", "content": overall_prompt} ]
}
resp = requests.post(API_URL, headers=headers, json=data)
resp.raise_for_status()
overall_summary = resp.json()["content"][0]["text"]
print("Overall Summary:\n", overall_summary)

In this example, we manually divided the text into 5 chunks. Depending on the length, you might choose chunk sizes by token count instead (you can estimate counts with a tokenizer library or use Anthropic’s token-counting endpoint, but here we split by lines for simplicity). Each chunk is summarized, then we ask Claude to summarize the summaries. This technique allows us to handle texts much longer than Claude’s single-run capacity by recursively condensing information. It’s a common approach to tackling long documents with LLMs.

Important: When chunking, be mindful of context overlap. If each chunk has no overlap, Claude sees them in isolation. For some tasks (like Q&A), you might miss references that cross chunk boundaries. A strategy is to include some overlap between chunks (e.g., last sentence of chunk i in chunk i+1) or to include a brief recap when moving to the next chunk (“Previously, the text discussed XYZ. Now continue with…”). However, overlapping too much can introduce redundancy and extra token usage. Tailor the approach to your use case.
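
Here's one way to build overlapping chunks (a minimal sketch; it assumes chunk_size is larger than overlap):

def chunk_with_overlap(lines, chunk_size, overlap):
    """Split a list of lines into chunks of chunk_size lines,
    repeating the last `overlap` lines of each chunk at the start of the next."""
    chunks = []
    start = 0
    while start < len(lines):
        end = start + chunk_size
        chunks.append("\n".join(lines[start:end]))
        if end >= len(lines):
            break
        start = end - overlap  # step back so adjacent chunks share some context
    return chunks

# e.g. chunks = chunk_with_overlap(text.splitlines(), chunk_size=200, overlap=10)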

Also note, Anthropic has introduced a Files API and features like a 1M-token context window for certain models. In some cases, uploading a file via the Files API (if available to you) might be better than sending a huge prompt. The Files API lets Claude refer to an uploaded document without including the full content in each request. Since we’re focusing on direct requests usage and not relying on SDKs, we won’t deep dive into that, but it’s good to know such options exist for handling large data.

Streaming Responses from Claude

By default, when you call Claude’s API, the response is returned only after the completion is fully generated. However, Claude supports streaming responses, meaning the API can send back tokens incrementally as they are generated. This is useful for real-time applications or simply to start processing the output sooner (especially for long answers). Streaming in Claude’s API is implemented via Server-Sent Events (SSE), which is a standard for real-time, event-driven data over HTTP.

To receive a streamed response, you need to include "stream": true in your request JSON. When stream is enabled, the HTTP response will have Content-Type: text/event-stream and Claude’s answer will come in chunks as a series of SSE events rather than one JSON blob.

How SSE Streaming Works: SSE sends events as plain text. Each event is terminated by a blank line (e.g. \n\n). An event typically has an event: <event_name> line and a data: <json_payload> line. For Claude streaming, you will receive events like:

  • event: message_start – indicates the start of the assistant’s message (content might be empty here).
  • A sequence of content_block_delta events – these contain partial pieces of Claude’s reply. Each such event’s data will include a snippet of text (e.g., a few tokens).
  • Periodically, message_delta events – these provide updates to the overall message object (like updated token usage so far, or stop reason if it ends).
  • event: message_stop – indicates the end of the streaming response.
  • You may also see ping events (keep-alives) or error events if something goes wrong mid-stream.

Consuming Streaming Responses in Python (requests): The requests library can handle streamed responses by setting stream=True in the request. Then you can iterate over the response’s content in chunks. However, parsing SSE format manually can be a bit involved. Essentially, you read line by line (using response.iter_lines()) and buffer until you hit a blank line that denotes the end of an event. There are libraries like sseclient that can help parse SSE streams from requests, or you can use the Async approach with httpx (which we’ll show next) for a cleaner solution.

For illustration, here’s how you might use requests to stream:

response = requests.post(API_URL, headers=headers, json={**data, "stream": True}, stream=True)
if response.status_code != 200:
    print("Stream request failed:", response.text)
else:
    for line in response.iter_lines(decode_unicode=True):
        if line is None or line.strip() == "":
            # empty line indicates event separation
            continue
        if line.startswith("data:"):
            json_payload = line[len("data:"):].strip()
            if json_payload == "[DONE]":
                # In some APIs like OpenAI, [DONE] marks end (Anthropic may use message_stop instead)
                continue
            try:
                event = json.loads(json_payload)
            except json.JSONDecodeError:
                continue
            # event is a dict with fields like "type", etc.
            if event.get("type") == "content_block_delta":
                delta = event.get("delta", {})
                if delta.get("type") == "text_delta":
                    text_chunk = delta.get("text", "")
                    print(text_chunk, end="", flush=True)  # print partial text
            elif event.get("type") == "message_stop":
                print("\n[Stream complete]")

This code is a bit low-level: it looks for lines starting with data:, then parses the JSON after it. Claude’s SSE events provide JSON containing the content deltas. For simplicity, we print out text as it comes. In a real app, you might update a UI incrementally with this output. Notice we check for event["type"] == "content_block_delta" with a text_delta inside – that’s where the actual text is. Other event types (like message_delta for usage or final message_stop) we handled minimally. (Anthropic doesn’t use the data: [DONE] convention that OpenAI does; instead you look for the message_stop event.)

One caveat: if a streaming connection is open for a long time, you might need to handle network hiccups or timeouts. Anthropic recommends using streaming for long responses, or using their batch API to avoid long single requests. Also, some networks/proxies may buffer or disconnect long SSE streams, so ensure your infrastructure supports SSE.

Async Streaming with httpx: If you prefer an asynchronous approach, the httpx library combined with an SSE client can make things cleaner. For example, using the httpx_sse extension, you can do:

import httpx, asyncio, json
from httpx_sse import aconnect_sse

async def stream_claude(prompt):
    data = {
        "model": "claude-2.1",
        "messages": [ {"role": "user", "content": prompt} ],
        "max_tokens": 200,
        "stream": True
    }
    headers = {
        "x-api-key": API_KEY,
        "anthropic-version": "2023-06-01",
        "Content-Type": "application/json"
    }
    async with httpx.AsyncClient() as client:
        async with aconnect_sse(client, "POST", API_URL, json=data, headers=headers) as event_source:
            async for sse in event_source.aiter_sse():
                if sse.event == "content_block_delta":
                    payload = json.loads(sse.data)  # the data field is a JSON string
                    delta = payload.get("delta", {})
                    if delta.get("type") == "text_delta":
                        print(delta.get("text", ""), end="", flush=True)
                elif sse.event == "message_stop":
                    print("\n[done]")

# Run the async function
asyncio.run(stream_claude("Hello Claude, tell me a story."))

Here, httpx_sse.aconnect_sse handles the SSE connection for us, yielding event objects where sse.event is the SSE event name and sse.data is the raw data string (JSON, which we parse with json.loads). We then filter for the same event types and print out text chunks. Using asyncio allows your program to remain responsive (or handle multiple streams) while waiting for data.

When to use streaming? Whenever the latency of getting the first part of the answer is important (e.g. interactive chat UIs), or the answer might be very large. For example, if you use Claude to generate a long report or piece of code, streaming lets you start showing progress or partial output to the user. If you’re just doing backend processing where waiting for the full response is fine, streaming might not be necessary. Keep in mind that streaming responses require your client to keep the connection open and process events, which adds complexity.

Using httpx for Asynchronous and High-Throughput Calls

While requests is simple and works well for synchronous use, sometimes you need to call the Claude API many times or handle multiple requests in parallel. For instance, you might be processing a batch of data or handling concurrent user requests in a web service. Python’s httpx library is a great alternative that supports asyncio for concurrency and generally performs better for many requests (it provides connection pooling out of the box and HTTP/2 support via an optional extra).

Here’s how you can use httpx in async mode to send multiple requests concurrently:

import httpx, asyncio

API_URL = "https://api.anthropic.com/v1/messages"
headers = {
    "x-api-key": API_KEY,
    "anthropic-version": "2023-06-01",
    "Content-Type": "application/json"
}

async def ask_claude(prompt):
    """Send a single prompt to Claude and return the assistant's response."""
    data = {
        "model": "claude-2.1",
        "max_tokens": 100,
        "messages": [ {"role": "user", "content": prompt} ]
    }
    async with httpx.AsyncClient(timeout=10) as client:
        resp = await client.post(API_URL, json=data, headers=headers)
        resp.raise_for_status()
        result = resp.json()
        return result["content"][0]["text"]

async def main():
    prompts = [
        "Give me a quick fact about Python programming.",
        "What's the capital of France?",
        "Summarize the philosophy of Stoicism in one sentence."
    ]
    # Schedule all requests concurrently
    tasks = [ask_claude(p) for p in prompts]
    responses = await asyncio.gather(*tasks, return_exceptions=True)
    for i, res in enumerate(responses, 1):
        if isinstance(res, Exception):
            print(f"Prompt {i} failed: {res}")
        else:
            print(f"Prompt {i} answer: {res}")

# Run the async main function
asyncio.run(main())

In this code:

  • We define ask_claude(prompt) as an async function that creates an httpx.AsyncClient and posts to the Claude API. The logic is similar to before, but using await for the post request.
  • In main(), we prepare a list of example prompts. We create a list of coroutines (tasks) to handle each prompt by calling ask_claude for each.
  • We use asyncio.gather to run them all in parallel. This sends all requests nearly simultaneously, rather than one after the other. (For simplicity, each call here opens its own AsyncClient; to reuse pooled connections across calls, create one shared client and pass it in – see the performance notes below.)
  • We collect the results. If any task raised an exception (like an HTTP error we didn’t catch inside ask_claude), gather returns that as an Exception in the results, so we check for that and print an error.
  • Each successful result is printed.

By running the three prompts concurrently, the overall time taken will be roughly the slowest single API call, rather than the sum of all three. This approach can dramatically speed up batch processing. For example, if one Claude call takes ~1 second, running 10 of them sequentially takes ~10 seconds, but with asyncio they might still complete in ~1–2 seconds (plus overhead), assuming your CPU and network can handle it and you aren’t hitting rate limits.

Performance considerations: If you have a very high volume of requests, be mindful of:

  • Connection limits: httpx’s AsyncClient will default to a certain number of connections. You can configure a limits=httpx.Limits(max_connections=..., max_keepalive_connections=...) if needed.
  • HTTP/2: httpx can speak HTTP/2 if you install the optional extra (pip install "httpx[http2]") and create the client with http2=True. Where the server supports it, this allows multiple requests to share a single TCP connection (multiplexing), which is efficient. A configuration sketch follows this list.
  • Error Handling: In the async example, we relied on raise_for_status() to propagate HTTP errors as exceptions. We might enhance ask_claude to catch and handle specific status codes similar to the earlier section.
  • Timeouts: We set a 10-second timeout on the client. If a request takes longer, it will raise httpx.TimeoutException. Adjust timeouts as appropriate (especially if you expect Claude to sometimes take longer for big requests or if using streaming which could be open for minutes).
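
Putting those settings together, a shared client for a high-throughput app might be configured roughly like this (http2=True assumes you've installed the httpx[http2] extra):

import httpx

limits = httpx.Limits(max_connections=20, max_keepalive_connections=10)
client = httpx.AsyncClient(
    timeout=httpx.Timeout(30.0),   # allow longer generations to finish
    limits=limits,
    http2=True,                    # requires: pip install "httpx[http2]"
)

# Reuse this one client for all Claude calls (e.g. pass it into ask_claude),
# and close it once at application shutdown with: await client.aclose()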

In summary, httpx allows you to boost throughput with concurrency. If you integrate Claude into a web framework like FastAPI (which is async by nature), using httpx (or the official Anthropic SDK’s async methods) will let you take advantage of non-blocking IO so your server can handle other requests while waiting for Claude’s response.

Integrating Claude with a FastAPI Web Service

One common scenario is to wrap Claude’s functionality behind your own API – for example, a FastAPI service that receives requests from a frontend or other services, calls Claude, and returns results. FastAPI is a popular Python web framework for building APIs quickly, and it works well with async code (though you can also call sync code in threads). We’ll demonstrate a simple FastAPI app that exposes an endpoint to query Claude.

Setup: Ensure you have FastAPI and Uvicorn installed (pip install fastapi uvicorn). We’ll assume the same environment variable ANTHROPIC_API_KEY is set for the API key.

Example FastAPI App (sync example):

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import os, requests

app = FastAPI()
API_URL = "https://api.anthropic.com/v1/messages"
API_KEY = os.environ.get("ANTHROPIC_API_KEY")
HEADERS = {
    "x-api-key": API_KEY,
    "anthropic-version": "2023-06-01",
    "Content-Type": "application/json"
}

@app.post("/claude-complete")
def claude_complete(prompt: str, max_tokens: int = 100):
    """
    Accepts a prompt and returns Claude's completion as JSON.
    """
    if not API_KEY:
        # If key is not set, return a server error
        raise HTTPException(status_code=500, detail="API key not configured")
    payload = {
        "model": "claude-2.1",
        "max_tokens": req.max_tokens,
        "messages": [ {"role": "user", "content": req.prompt} ]
    }
    try:
        resp = requests.post(API_URL, headers=HEADERS, json=payload, timeout=15)
    except requests.exceptions.RequestException as e:
        raise HTTPException(status_code=502, detail=f"Failed to reach Claude API: {e}")
    if resp.status_code != 200:
        # propagate error from Claude to client
        detail = resp.text
        try:
            err = resp.json()
            detail = err.get("error", {}).get("message", detail)
        except ValueError:
            pass
        raise HTTPException(status_code=resp.status_code, detail=f"Claude API error: {detail}")
    data = resp.json()
    return {"prompt": prompt, "completion": data.get("content")}

This FastAPI endpoint /claude-complete expects a JSON body with a prompt field (and an optional max_tokens), which FastAPI validates and parses into the PromptRequest model. It then calls Claude synchronously. Key points:

  • We prepare the payload and use requests.post just like before.
  • If the request fails due to network issues, we return an HTTP 502 (Bad Gateway) to indicate an upstream failure.
  • If Claude returns a non-200 status, we capture the error and return an HTTP error with Claude’s message. For instance, if you exceed your quota and Anthropic returns 429, the API user will get a 429 with message “Claude API error: <message>”.
  • On success, we return a JSON containing the original prompt and the completion. (You might structure your API response differently, e.g. just return the completion text directly. Here we show both for clarity.)

To run this FastAPI app, save it as main.py and run uvicorn main:app --reload. You can then test it (e.g. with curl or a REST client):

curl -X POST "http://localhost:8000/claude-complete" -H "Content-Type: application/json" \
  -d '{"prompt": "Hello Claude, how are you?"}'

The response should be a JSON with Claude’s reply.

Using Async in FastAPI: The example above uses a normal def (which FastAPI treats as sync and runs in a threadpool). Alternatively, you can define it with async def and use httpx inside for non-blocking behavior:

@app.post("/claude-complete-async")
async def claude_complete_async(prompt: str, max_tokens: int = 100):
    payload = { ...same as above... }
    async with httpx.AsyncClient() as client:
        try:
            resp = await client.post(API_URL, headers=HEADERS, json=payload, timeout=15.0)
        except httpx.RequestError as e:
            raise HTTPException(status_code=502, detail=f"Claude API connection error: {e}")
    if resp.status_code != 200:
        # similar error handling
        ...
    data = resp.json()
    return {"completion": data.get("content")}

Using async avoids blocking the event loop on I/O. If your service expects high load or you want to handle many requests concurrently, the async approach is more scalable. On the other hand, if calls to Claude are infrequent or latency is not critical, the simpler sync approach is fine. (FastAPI runs sync endpoint functions in a thread pool, so they don’t block other requests.)

Logging and Middleware: In a production API, you’d want to log requests and responses, both for monitoring usage and debugging. FastAPI allows middleware to intercept requests. For example, to log each request’s method, URL, and processing time:

import time, logging

logger = logging.getLogger("claude_service")
logging.basicConfig(level=logging.INFO)

@app.middleware("http")
async def log_requests(request, call_next):
    start_time = time.time()
    response = await call_next(request)
    duration = time.time() - start_time
    client_ip = request.client.host
    logger.info(f"{request.method} {request.url.path} - {response.status_code} [{client_ip}] completed in {duration:.2f}s")
    return response

This middleware logs the method, path, status, client IP, and time taken for each request. You can expand this to log request bodies or response content for debugging (just be careful with logging sensitive data or too much information).

Configuration Management: For an actual deployment, instead of scattering API_KEY and HEADERS in your code, you might use a config file or environment variables. FastAPI can read environment variables (like our ANTHROPIC_API_KEY) directly, or you can use a library like Pydantic’s BaseSettings to manage configuration. The key point is to ensure your API key is available to the app (for example, if you use Docker, you’d pass the env var into the container).
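
For example, a minimal settings class using the pydantic-settings package (pip install pydantic-settings; in Pydantic v1, BaseSettings is imported from pydantic itself) might look like this:

from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    anthropic_api_key: str                     # read from ANTHROPIC_API_KEY
    anthropic_version: str = "2023-06-01"      # overridden by ANTHROPIC_VERSION if set

settings = Settings()
HEADERS = {
    "x-api-key": settings.anthropic_api_key,
    "anthropic-version": settings.anthropic_version,
    "Content-Type": "application/json",
}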

Scaling Considerations: Hosting a Claude-integrated service means considering:

  • Throughput vs Rate Limits: If your FastAPI endpoint is called by many clients, ensure you handle the Anthropic rate limits (you might need a queue or a shared throttle if it’s high volume).
  • Latency: Each request to Claude adds latency (hundreds of milliseconds to seconds). If your app requires faster responses or many calls, you might need to design around that (maybe by caching results or using smaller models for certain queries).
  • Error Handling for Clients: Translate Anthropic errors into meaningful HTTP responses as we did, so that clients of your API know what happened (e.g., 429 Too Many Requests if your service itself is overloaded or hit the upstream limit).

FastAPI integration allows you to embed Claude in web applications, chatbots, or microservices easily. With the examples above, you can build endpoints that harness Claude’s intelligence within a standard API that your frontend or other systems can consume.

Automation and Scheduling: Running Claude API Tasks in Scripts

Beyond interactive use and web services, a powerful way to use Claude is in automated scripts – for instance, to generate reports daily, process logs, or periodically analyze data. Python’s versatility makes it easy to set up such automation. Here we discuss how to incorporate Claude API calls into scripts and schedule them to run regularly (cron jobs or scheduled tasks).

Example Use Cases:

  • Daily Report Generation: Every day, summarize yesterday’s sales or support tickets using Claude and email it to your team.
  • Log Analysis: Every hour, have Claude read a log file and extract anomalies or important events.
  • Content Updates: On a schedule, use Claude to draft social media posts or summaries of new articles from a feed.
  • Batch Data Processing: Process a batch of inputs (e.g., a list of customer reviews) to produce outputs (e.g., sentiment analysis or summaries) in an offline manner.

Writing a Script for Claude API Calls

Suppose we want Claude to summarize a certain file daily. We can write a Python script daily_summary.py:

import os, requests, datetime

API_URL = "https://api.anthropic.com/v1/messages"
API_KEY = os.environ.get("ANTHROPIC_API_KEY")
if not API_KEY:
    raise RuntimeError("Missing API key! Set ANTHROPIC_API_KEY environment variable.")

headers = {
    "x-api-key": API_KEY,
    "anthropic-version": "2023-06-01",
    "Content-Type": "application/json"
}

# Define the target file (yesterday's log, for example)
yesterday = (datetime.date.today() - datetime.timedelta(days=1)).strftime("%Y-%m-%d")
log_file = f"/var/log/myapp/log-{yesterday}.txt"

if not os.path.exists(log_file):
    print(f"No log file for {yesterday}, exiting.")
    exit(0)

with open(log_file, "r") as f:
    log_text = f.read()

prompt = f"Summarize the following application log and highlight any errors or unusual events:\n\n{log_text}"
payload = {
    "model": "claude-2.1",
    "max_tokens": 300,
    "messages": [ {"role": "user", "content": prompt} ]
}

try:
    resp = requests.post(API_URL, headers=headers, json=payload, timeout=30)
    resp.raise_for_status()
except requests.RequestException as e:
    print(f"Error calling Claude API: {e}")
    exit(1)

summary = resp.json()["content"][0]["text"]
# Save the summary to a file (or send it via email, etc.)
out_file = f"/var/log/myapp/summary-{yesterday}.txt"
with open(out_file, "w") as f:
    f.write(summary or "")
print(f"Summary for {yesterday} saved to {out_file}")

This script does the following:

  • Loads the API key and checks that it’s set.
  • Determines yesterday’s date and constructs a log file path (assuming logs are named by date).
  • If the file exists, reads its content.
  • Creates a prompt asking Claude to summarize the log and point out issues.
  • Calls Claude API with a generous timeout (30 seconds, since logs might be long and summarization could take a bit).
  • If successful, writes the summary to an output file. (You could extend this to email the summary or upload it somewhere.)
  • If there’s an error calling Claude (network issue or API error), it prints an error and exits with a non-zero status (so cron can detect a failure via exit code if needed).

Scheduling the Script with Cron

To have this run daily, you can use cron (on Linux/Mac). Edit your crontab with crontab -e and add a line like:

0 6 * * * /usr/bin/python3 /path/to/daily_summary.py

This would run the script every day at 6:00 AM. Adjust the schedule and paths as needed. Make sure the environment variable ANTHROPIC_API_KEY is available in the cron environment – you might need to export it at the top of the crontab or source a profile. Alternatively, you can embed the key in a config file the script reads (but keep it secure).
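
For example, a crontab that sets the key and keeps a simple log of each run might look like this (the key value is a placeholder):

ANTHROPIC_API_KEY=your-key-here
0 6 * * * /usr/bin/python3 /path/to/daily_summary.py >> /var/log/myapp/daily_summary.log 2>&1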

For Windows, you’d use Task Scheduler to run the script on a schedule (or use a third-party scheduler, or the schedule Python package to create a long-running script that sleeps and runs tasks at intervals).
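
If you'd rather keep everything in Python, a long-running script using the schedule package (pip install schedule) could look roughly like this – run_daily_summary is a placeholder for the logic in daily_summary.py wrapped in a function:

import time
import schedule

def run_daily_summary():
    # placeholder: call the same logic as daily_summary.py, e.g. a main() function
    print("Running daily summary...")

schedule.every().day.at("06:00").do(run_daily_summary)

while True:
    schedule.run_pending()
    time.sleep(60)  # check once a minute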

Batch Processing in Scripts

If you have many items to process (say hundreds of files, or a CSV with thousands of rows needing an AI-generated field), you can still use Claude in a script, but consider the rate limit and cost. Here are some tips:

Process in batches with pauses if needed to avoid 429 errors (you can reuse the Rate Limiting strategies discussed).

If each item is small, you might use the Batch API to send many prompts in one request to save overhead. Without the official SDK, you’d call the Message Batches endpoint (/v1/messages/batches) with a list of requests. But a simpler approach if not using that: combine multiple small prompts into one prompt that Claude can handle at once (though that might complicate parsing results).

Use asynchronous calls (like with httpx) in a script to speed it up. You can use asyncio.run() even in a normal script (as shown above) to concurrently process items. Just be careful to not spawn too many concurrent tasks – you might want to limit concurrency (e.g., process 5 or 10 at a time) to avoid hitting the API too hard or running out of memory.

Monitor partial results. If your script is long-running, write results to disk as you go, so if it crashes in the middle, you don’t lose everything. Also log any failures to revisit later.

Example: Batch Summarizing Multiple Files

Imagine you have a folder of text files and you want summaries for each:

import glob, asyncio, httpx

files = glob.glob("/data/reports/*.txt")

async def summarize_file(path):
    with open(path, "r") as f:
        content = f.read(5000)  # read first 5000 chars to limit size (if huge)
    prompt = f"Summarize the following document:\n\n{content}"
    data = {
        "model": "claude-2.1",
        "max_tokens": 150,
        "messages": [ {"role": "user", "content": prompt} ]
    }
    async with httpx.AsyncClient(timeout=20) as client:
        resp = await client.post(API_URL, headers=headers, json=data)
        resp.raise_for_status()
        summary = resp.json()["content"][0]["text"]
    out_path = path.replace(".txt", "_summary.txt")
    with open(out_path, "w") as f:
        f.write(summary or "")
    print(f"Summarized {path} -> {out_path}")

async def process_all():
    tasks = [summarize_file(path) for path in files]
    # limit concurrency to 5 at a time
    for i in range(0, len(tasks), 5):
        batch = tasks[i:i+5]
        await asyncio.gather(*batch)

asyncio.run(process_all())

This script finds all .txt files in a folder, then uses httpx async to summarize each. We throttle by only running 5 at a time (by slicing tasks) to be polite to the API. You could adjust that number or make it dynamic based on observed latency and rate limits. After running, each file will have a corresponding _summary.txt.
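
An alternative to slicing the task list is an asyncio.Semaphore, which keeps a fixed number of requests in flight without waiting for each batch of 5 to finish completely (a sketch reusing summarize_file and files from above):

import asyncio

async def process_all_with_semaphore():
    semaphore = asyncio.Semaphore(5)  # at most 5 requests in flight at once

    async def limited(path):
        async with semaphore:
            await summarize_file(path)

    await asyncio.gather(*(limited(p) for p in files))

asyncio.run(process_all_with_semaphore())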

Ensuring Reliability in Automation

When running unattended, consider adding extra safeguards:

Retries: As shown earlier, incorporate retry logic, because a transient error shouldn’t abort the whole scheduled task. Perhaps wrap each file’s processing in a retry loop, or use a helper library such as Tenacity.

Logging: Print timestamps and results to a log file so you can audit what happened later. Cron typically emails the output of a job if there’s any – redirect output to a log or use Python’s logging module to append to a file.

Notifications: If a job fails (non-zero exit), you might want an email or alert. Cron can be configured to send email on failure, or your script can integrate with a chat/alert system to notify you (this is beyond scope, but consider it for critical tasks).

Graceful degradation: If the API is down or you run out of credits, have the script handle that (maybe skip processing and try next time rather than crash repeatedly). The Anthropic API will not fulfill requests if your credit is exhausted, so watch for messages like “insufficient funds” in errors.

By automating tasks with Claude, you can leverage AI to continuously extract insights or generate content without manual intervention. Just be mindful of the usage limits and cost implications of running such jobs regularly – always test on small scale and estimate token usage so there are no surprises in your bill.

Conclusion

Using Claude via its Python API opens up a world of possibilities – from building smart chatbots and assistants, to summarizing vast amounts of data, to automating creative tasks. In this comprehensive guide, we covered how to set up your environment and authenticate to Anthropic’s Claude API, how to craft requests with requests (and the request/response format expected by Claude), and included example code for robust usage: handling errors, implementing retries, and managing timeouts and sessions. We also explored advanced topics like streaming responses for real-time capabilities, using the httpx library for asynchronous concurrency, and integrating Claude into a FastAPI application to serve responses via a web API. Finally, we discussed deploying Claude-powered scripts in production-like scenarios – from FastAPI web services with proper logging to background jobs scheduled with cron.

A few parting best practices and tips:

  • Keep your API key safe – use environment variables and never expose it publicly.
  • Start small and experiment – test your prompts and settings (model, max_tokens, etc.) with short scripts before scaling up to ensure you get the desired outputs.
  • Monitor usage and costs – Claude’s capabilities are powerful, but token usage can add up. Use the provided usage reports and design your application to stay within reasonable limits.
  • Handle errors gracefully – network issues and rate limits happen; your code should handle these so that a temporary glitch doesn’t break your entire app.
  • Stay updated – AI APIs evolve quickly. Anthropic may release new model versions or features (like larger context windows, new parameters, or improved SDKs). Keep an eye on the official Claude documentation and update your code as needed for new versions (for example, the required anthropic-version header may change over time).

With the information and examples provided here, you should be well-equipped to build and deploy Python applications that harness Claude’s AI strengths. Whether you’re enhancing an existing backend service or creating a new AI-driven tool, Claude’s API combined with Python gives you a flexible and powerful toolkit. Happy coding, and may your integration with Claude be both fun and fruitful!
