API Reference
Claude Batch API
Thousands of prompts, 50% lower cost
The Claude Batch API lets you submit thousands of prompts at once for asynchronous processing at 50% lower cost. Instead of sending requests one at a time and waiting for each response, you package your entire workload into a single batch and retrieve results when processing completes. This is ideal for data labeling, content generation, document analysis, and any high-volume AI workflow where real-time responses aren't required.
How It Works
Building a batch pipeline
The batch workflow is straightforward: prepare your requests in JSONL format, submit them, wait for processing, and download structured results. Each request is independent, so failures in one don't affect others.
Each line in the file contains one request with a custom_id and the messages payload. The custom_id lets you match results back to your inputs when processing completes.
{"custom_id":"task-001","params":{"model":"claude-sonnet-4-20250514","max_tokens":1024,"messages":[{"role":"user","content":"Classify this support ticket: ..."}]}}
Submit your requests (for example, parsed from your JSONL file) to create a batch via the API. You get back a batch ID that you use to check status and retrieve results.
import anthropic
client = anthropic.Anthropic()
batch = client.messages.batches.create(
    requests=[...],  # Your list of batch requests
)
print(batch.id)  # Save this for polling
Batches process asynchronously. Check the status endpoint periodically or set up a webhook. Processing time depends on batch size and current load, but most batches complete within 24 hours.
batch = client.messages.batches.retrieve(batch.id)
print(batch.processing_status)  # "in_progress" | "canceling" | "ended"
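A polling loop around that status check might look like the sketch below. It is only an illustration: `wait_for_batch`, `get_status`, `poll_seconds`, `timeout_seconds`, and `sleep` are names introduced here, not part of the SDK, and the 24-hour default timeout simply mirrors the completion window mentioned above.

```python
import time
from typing import Callable


def wait_for_batch(
    get_status: Callable[[], str],
    poll_seconds: float = 60.0,
    timeout_seconds: float = 24 * 60 * 60,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the batch reports "ended" or the timeout elapses.

    get_status is a hypothetical callable that returns the current
    processing_status string, for example:
        lambda: client.messages.batches.retrieve(batch_id).processing_status
    """
    waited = 0.0
    while True:
        status = get_status()
        if status == "ended":
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"batch still {status!r} after {waited:.0f}s")
        # Sleep between checks so the status endpoint is not hammered.
        sleep(poll_seconds)
        waited += poll_seconds
```

Passing the status check in as a callable keeps the loop independent of the client object, so it can be exercised with a stub before it is pointed at a real batch.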
Once complete, download the results file. Each line contains the custom_id and the model's response, making it straightforward to rejoin with your original data.
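for result in client.messages.batches.results(batch.id):
    # Only succeeded results carry a message; failed ones report an error instead
    if result.result.type == "succeeded":
        print(result.custom_id, result.result.message.content)
Comparison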
When to use batch vs. real-time
The Batch API is not a replacement for real-time requests — it's a complement. Use real-time for interactive applications where users are waiting for responses. Use batch for everything else.
| Feature | Real-Time API | Batch API |
|---|---|---|
| Latency | Seconds | Up to 24 hours |
| Cost | Standard pricing | 50% discount |
| Rate limits | Per-minute limits apply | Higher effective throughput |
| Best for | Interactive apps, chat, real-time UX | Offline processing, bulk analysis, data pipelines |
| Error handling | Retry individual requests | Per-request status in results file |
If your workflow can tolerate latency measured in minutes or hours rather than seconds, use the Batch API. The 50% cost reduction adds up at scale: a workflow that costs $1,000/month in real-time mode costs $500/month in batch mode for the same requests.
Use Cases
Common batch processing use cases
Sort thousands of support tickets, legal documents, or research papers into categories. Feed each document as a separate batch request with classification instructions. Typical accuracy exceeds 95% with well-crafted prompts.
Screen user-generated content at scale. Claude evaluates each piece of content against your moderation policy and returns a structured decision (approve, flag, reject) with reasoning. Process an entire day's content queue in one batch.
Extract structured fields from unstructured text — names, dates, amounts, sentiment, categories. Transform messy data into clean, typed records. One batch can process an entire dataset that would take a team of analysts weeks.
Generate product descriptions, email variations, social media posts, or translations in bulk. Each request can include different product data and target audience parameters. Useful for e-commerce catalogs and marketing campaigns.
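The use cases above share one pattern: one input record becomes one batch request. A minimal sketch of that step follows, producing requests in the `custom_id` plus `params` shape shown earlier. The helper names `build_requests` and `to_jsonl` and the sample tickets are made up for illustration; the model name is the one used in the example request.

```python
import json


def build_requests(documents, instructions, model, max_tokens=1024):
    """Turn a {custom_id: text} mapping into a list of batch requests."""
    return [
        {
            "custom_id": doc_id,
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [
                    {"role": "user", "content": f"{instructions}\n\n{text}"}
                ],
            },
        }
        for doc_id, text in documents.items()
    ]


def to_jsonl(requests):
    """Serialize requests one per line (JSONL)."""
    return "\n".join(json.dumps(r) for r in requests)


# Hypothetical sample inputs keyed by the custom_id used to rejoin results.
tickets = {
    "task-001": "My invoice is wrong.",
    "task-002": "App crashes on login.",
}
requests = build_requests(
    tickets,
    instructions="Classify this support ticket:",
    model="claude-sonnet-4-20250514",
)
print(to_jsonl(requests))
```

Keying the inputs by `custom_id` from the start means the same dictionary can be used later to rejoin results with their source records.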
FAQ
Frequently asked questions
Each batch can contain up to 100,000 requests or 256 MB of request data, whichever limit is reached first. For larger workloads, split your data into multiple batches and submit them in parallel. There is no limit on how many batches you can have running simultaneously, though throughput depends on your API tier.
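Splitting a large workload can be as simple as slicing the request list. The sketch below assumes you pass in whatever per-batch limit applies to you; `split_into_batches` and `max_per_batch` are illustrative names, not SDK functions.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def split_into_batches(requests: Sequence[T], max_per_batch: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most max_per_batch requests."""
    if max_per_batch < 1:
        raise ValueError("max_per_batch must be at least 1")
    for start in range(0, len(requests), max_per_batch):
        yield requests[start:start + max_per_batch]


# 25 placeholder requests split with a limit of 10 per batch.
chunks = list(split_into_batches(list(range(25)), 10))
print([len(chunk) for chunk in chunks])  # [10, 10, 5]
```

Each chunk would then be submitted as its own batch, and the returned batch IDs stored so results can be collected per batch.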
Most batches complete within 24 hours, and many finish in under an hour depending on size and current system load. Batches are designed for workloads where minutes-to-hours latency is acceptable in exchange for the 50% cost reduction.
Each request in a batch is processed independently. If one request fails (e.g., due to content filtering or token limits), the others still complete. The results file includes a status field for each request so you can identify and retry failures without reprocessing the entire batch.
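One way to act on the per-request status is to separate successes from failures and collect the failed `custom_id` values for resubmission. The sketch below assumes each result entry exposes `custom_id` and `result.type`, with `"succeeded"` marking success, as in the Anthropic Python SDK at the time of writing; `partition_results` and the stand-in objects are hypothetical and exist only so the example runs without an API call.

```python
from types import SimpleNamespace


def partition_results(results):
    """Split result entries into succeeded messages and failed statuses."""
    succeeded, failed = {}, {}
    for entry in results:
        if entry.result.type == "succeeded":
            succeeded[entry.custom_id] = entry.result.message
        else:
            # Keep the status so retries can be decided per failure kind.
            failed[entry.custom_id] = entry.result.type
    return succeeded, failed


# Stand-ins for illustration; real entries would come from
# client.messages.batches.results(batch_id).
fake_results = [
    SimpleNamespace(
        custom_id="task-001",
        result=SimpleNamespace(type="succeeded", message="billing"),
    ),
    SimpleNamespace(custom_id="task-002", result=SimpleNamespace(type="errored")),
]
succeeded, failed = partition_results(fake_results)
retry_ids = sorted(failed)
print(retry_ids)
```

The `retry_ids` list can be used to rebuild only the failed requests for a follow-up batch.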
Batch API pricing is 50% of the standard per-token rate for the model you use. Input tokens and output tokens are both discounted. For example, if Claude Sonnet costs $3 per million input tokens in real-time mode, it costs $1.50 per million input tokens in batch mode. This makes batch processing one of the most cost-effective ways to use frontier AI models.
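The arithmetic can be written out as a small helper. This is a sketch only: `batch_cost` is an illustrative name, and the example uses the $3 per million input tokens figure from the paragraph above, not a quoted price list. Output tokens are set to zero in the example because no output rate is given here.

```python
def batch_cost(
    input_tokens: int,
    output_tokens: int,
    input_rate_per_million: float,
    output_rate_per_million: float,
    discount: float = 0.5,
) -> float:
    """Return the batch-mode cost in dollars after applying the discount."""
    standard = (
        input_tokens * input_rate_per_million
        + output_tokens * output_rate_per_million
    ) / 1_000_000
    return standard * (1 - discount)


# 10 million input tokens at the example rate of $3 per million:
# $30 at the standard rate, $15 in batch mode.
print(batch_cost(10_000_000, 0, 3.0, 0.0))  # 15.0
```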