Learn to GPT

API Reference

Claude Batch API

Thousands of prompts, 50% lower cost

The Claude Batch API lets you submit thousands of prompts at once for asynchronous processing at 50% lower cost. Instead of sending requests one at a time and waiting for each response, you package your entire workload into a single batch and retrieve results when processing completes. This is ideal for data labeling, content generation, document analysis, and any high-volume AI workflow where real-time responses aren't required.

How It Works

Building a batch pipeline

The batch workflow is straightforward: prepare your requests in JSONL format, submit them, wait for processing, and download structured results. Each request is independent, so failures in one don't affect others.

1. Prepare your JSONL file

Each line in the file contains one request: a custom_id and a params object holding the model, max_tokens, and messages. The custom_id must be unique within the batch, because it is how you match results back to your inputs when processing completes.

{"custom_id":"task-001","params":{"model":"claude-sonnet-4-20250514","max_tokens":1024,"messages":[{"role":"user","content":"Classify this support ticket: ..."}]}}
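
One way to produce lines like the one above is to generate them from your input records. The following is a minimal sketch, not a prescribed method: the helper names build_request and to_jsonl and the sample tickets are hypothetical, while the model name and max_tokens are copied from the example line.

```python
import json

MODEL = "claude-sonnet-4-20250514"  # model name taken from the example line above


def build_request(index: int, ticket_text: str) -> dict:
    """Build one batch request with a unique custom_id (hypothetical helper)."""
    return {
        "custom_id": f"task-{index:03d}",
        "params": {
            "model": MODEL,
            "max_tokens": 1024,
            "messages": [
                {"role": "user", "content": f"Classify this support ticket: {ticket_text}"}
            ],
        },
    }


def to_jsonl(requests: list[dict]) -> str:
    """Serialize requests as JSONL: one compact JSON object per line."""
    return "\n".join(json.dumps(r, separators=(",", ":")) for r in requests)


# Illustrative input data, not from a real dataset.
tickets = ["My invoice is wrong.", "The app crashes on login."]
requests = [build_request(i, text) for i, text in enumerate(tickets, start=1)]
jsonl = to_jsonl(requests)
```

Deriving custom_id from a stable key in your own data (a ticket ID, a row number) makes the later join back to your inputs trivial.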

2. Submit the batch

Pass your requests to the batch creation endpoint as a list; if you staged them in a JSONL file, parse each line into a dictionary first. The response includes a batch ID that you use to check status and retrieve results.

import anthropic

client = anthropic.Anthropic()
batch = client.messages.batches.create(
    requests=[...],  # Your list of batch requests
)
print(batch.id)  # Save this for polling

3. Poll for completion

Batches process asynchronously, so check the batch status periodically until processing_status is "ended". Processing time depends on batch size and current load, but most batches complete within 24 hours.

batch = client.messages.batches.retrieve(batch.id)
print(batch.processing_status)
# "in_progress" | "canceling" | "ended"
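
In practice the status check usually runs in a loop. The sketch below is one possible shape, assuming a hypothetical helper wait_until_ended that takes any zero-argument function returning the current status string; the delay values are illustrative defaults, not recommendations from the API.

```python
import time
from typing import Callable


def wait_until_ended(
    get_status: Callable[[], str],
    initial_delay: float = 5.0,
    max_delay: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll until get_status() returns "ended"; return the number of polls made."""
    delay = initial_delay
    polls = 0
    while True:
        polls += 1
        if get_status() == "ended":
            return polls
        sleep(delay)
        delay = min(delay * 2, max_delay)  # exponential backoff, capped at max_delay
```

With the client from the previous step, it could be called as wait_until_ended(lambda: client.messages.batches.retrieve(batch.id).processing_status). Passing the status function and the sleep function as arguments keeps the loop testable without network access.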

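4. Download results

Once the batch has ended, download the results. Each result contains the custom_id and an outcome whose type is succeeded, errored, canceled, or expired; successful results also carry the model's response, making it straightforward to rejoin with your original data.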

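for result in client.messages.batches.results(batch.id):
    # Only "succeeded" results carry a message; others are errored, canceled, or expired.
    ok = result.result.type == "succeeded"
    print(result.custom_id, result.result.message.content if ok else result.result.type)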
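
Rejoining results with the original inputs can be sketched as a lookup by custom_id, which also avoids relying on results arriving in input order. This is an illustrative sketch: join_results is a hypothetical helper, and the SimpleNamespace objects are stand-ins that only mimic the attribute shape used in the loop above.

```python
from types import SimpleNamespace


def join_results(inputs: dict[str, str], results) -> list[dict]:
    """Attach each result to its original input using custom_id."""
    rows = []
    for item in results:
        outcome = item.result
        text = None
        if outcome.type == "succeeded":
            # Concatenate the text blocks of the response message.
            text = "".join(b.text for b in outcome.message.content if b.type == "text")
        rows.append({
            "custom_id": item.custom_id,
            "input": inputs.get(item.custom_id),
            "status": outcome.type,
            "output": text,
        })
    return rows


# Stand-in objects for illustration only; real results come from the SDK.
fake_results = [
    SimpleNamespace(
        custom_id="task-001",
        result=SimpleNamespace(
            type="succeeded",
            message=SimpleNamespace(content=[SimpleNamespace(type="text", text="billing")]),
        ),
    ),
    SimpleNamespace(custom_id="task-002", result=SimpleNamespace(type="errored")),
]
rows = join_results({"task-001": "My invoice is wrong.", "task-002": "..."}, fake_results)
```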

Comparison

When to use batch vs. real-time

The Batch API is not a replacement for real-time requests — it's a complement. Use real-time for interactive applications where users are waiting for responses. Use batch for everything else.

Feature	Real-Time API	Batch API
Latency	Seconds	Up to 24 hours
Cost	Standard pricing	50% discount
Rate limits	Per-minute limits apply	Higher effective throughput
Best for	Interactive apps, chat, real-time UX	Offline processing, bulk analysis, data pipelines
Error handling	Retry individual requests	Per-request status in results file

Rule of thumb

If your workflow can tolerate latency measured in minutes or hours rather than seconds, use the Batch API. The 50% cost reduction compounds significantly at scale — a workflow that costs $1,000/month in real-time mode costs $500/month in batch mode with identical results.

Use Cases

Common batch processing use cases

Document classification

Sort thousands of support tickets, legal documents, or research papers into categories. Feed each document as a separate batch request with classification instructions. Accuracy depends on the task and on prompt quality, so validate the output against a labeled sample before relying on it.

Content moderation

Screen user-generated content at scale. Claude evaluates each piece of content against your moderation policy and returns a structured decision (approve, flag, reject) with reasoning. Process an entire day's content queue in one batch.
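
A structured decision is only useful if your pipeline validates it. The sketch below assumes, hypothetically, that your prompt instructs the model to reply with a JSON object containing "decision" and "reasoning" keys; that reply format and the names ModerationDecision and parse_decision are illustrative choices, not part of the API.

```python
import json
from dataclasses import dataclass

ALLOWED_DECISIONS = {"approve", "flag", "reject"}


@dataclass
class ModerationDecision:
    decision: str
    reasoning: str


def parse_decision(response_text: str) -> ModerationDecision:
    """Parse a model reply; fall back to "flag" so unusable replies get human review."""
    try:
        data = json.loads(response_text)
        decision = str(data["decision"]).lower()
        reasoning = str(data.get("reasoning", ""))
    except (json.JSONDecodeError, KeyError, TypeError, AttributeError):
        return ModerationDecision("flag", "unparseable model reply")
    if decision not in ALLOWED_DECISIONS:
        return ModerationDecision("flag", f"unexpected decision: {decision}")
    return ModerationDecision(decision, reasoning)
```

Defaulting to "flag" rather than "approve" or "reject" is a deliberate design choice here: a malformed reply is routed to a person instead of silently passing or blocking content.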

Data extraction and labeling

Extract structured fields from unstructured text — names, dates, amounts, sentiment, categories. Transform messy data into clean, typed records. One batch can process an entire dataset that would take a team of analysts weeks.

Content generation

Generate product descriptions, email variations, social media posts, or translations in bulk. Each request can include different product data and target audience parameters. Useful for e-commerce catalogs and marketing campaigns.

FAQ

Frequently asked questions

What is the maximum batch size?

Each batch can contain up to 100,000 requests or 256 MB of request data, whichever limit is reached first. For larger workloads, split your data into multiple batches and submit them in parallel. How many requests you can have queued at once, and how quickly they are processed, depends on your API tier.
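
Splitting a large workload can be sketched as simple slicing. In this sketch, chunk_requests is a hypothetical helper and the per-batch limit is passed in as a parameter rather than hard-coded, so you can set it to whatever limit applies to your account.

```python
from typing import Iterator, Sequence


def chunk_requests(requests: Sequence[dict], max_requests: int) -> Iterator[list[dict]]:
    """Yield consecutive slices of at most max_requests requests."""
    if max_requests < 1:
        raise ValueError("max_requests must be at least 1")
    for start in range(0, len(requests), max_requests):
        yield list(requests[start:start + max_requests])


# Illustrative: 7 placeholder requests split into batches of at most 3.
demo = [{"custom_id": f"task-{i:03d}"} for i in range(1, 8)]
chunks = list(chunk_requests(demo, max_requests=3))
```

Each chunk can then be submitted as its own batch; keeping custom_id values unique across chunks lets you merge all results into one table afterwards.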

How long does batch processing take?

Most batches complete within 24 hours, and many finish in under an hour depending on size and current system load. Batches are designed for workloads where minutes-to-hours latency is acceptable in exchange for the 50% cost reduction.

How does error handling work?

Each request in a batch is processed independently. If one request fails (e.g., due to content filtering or token limits), the others still complete. Each result includes a type (succeeded, errored, canceled, or expired), so you can identify and retry failures without reprocessing the entire batch.
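
Selecting only the failed requests for resubmission can be sketched as a filter on custom_id. The helper requests_to_retry is hypothetical, and the SimpleNamespace objects are stand-ins that mimic only the custom_id and result.type attributes of real results.

```python
from types import SimpleNamespace


def requests_to_retry(original: list[dict], results) -> list[dict]:
    """Return the original requests whose result type is not "succeeded"."""
    failed_ids = {r.custom_id for r in results if r.result.type != "succeeded"}
    return [req for req in original if req["custom_id"] in failed_ids]


# Stand-in data for illustration only.
original = [{"custom_id": "task-001"}, {"custom_id": "task-002"}, {"custom_id": "task-003"}]
fake_results = [
    SimpleNamespace(custom_id="task-001", result=SimpleNamespace(type="succeeded")),
    SimpleNamespace(custom_id="task-002", result=SimpleNamespace(type="errored")),
    SimpleNamespace(custom_id="task-003", result=SimpleNamespace(type="expired")),
]
retry = requests_to_retry(original, fake_results)
```

A request that errored because it was invalid will fail again if resubmitted unchanged, so it is worth inspecting the error details before retrying.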

How is batch pricing calculated?

Batch API pricing is 50% of the standard per-token rate for the model you use. Input tokens and output tokens are both discounted. For example, if Claude Sonnet costs $3 per million input tokens in real-time mode, it costs $1.50 per million input tokens in batch mode. This makes batch processing one of the most cost-effective ways to use frontier AI models.
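
The arithmetic can be sketched as a small estimator. The helper batch_cost_usd is hypothetical; the $3 input price comes from the example above, while the output price of $15 per million tokens is an assumed placeholder, so substitute the current published rates for your model.

```python
def batch_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
    discount: float = 0.5,
) -> float:
    """Estimate batch cost: standard per-token cost with the batch discount applied."""
    standard = (
        input_tokens / 1_000_000 * input_price_per_mtok
        + output_tokens / 1_000_000 * output_price_per_mtok
    )
    return standard * (1 - discount)


# 10M input tokens at $3/MTok and 2M output tokens at an assumed $15/MTok.
estimate = batch_cost_usd(10_000_000, 2_000_000, 3.0, 15.0)
```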
