What Is Prompt Caching TTL?

TTL stands for Time To Live: the amount of time a cache entry can survive without being used.

For prompt caching, TTL tells you how long a cached prefix can stay alive before the system discards it.

Start With a Concrete Example

A carton of milk has the same timing problem:

You buy milk with a shelf life of 5 days

Day 1  still fresh
Day 2  still usable
Day 5  last valid day
Day 6  expired and thrown away

Prompt cache TTL works the same way, except that every cache hit restarts the clock:

00:00  First request writes a prefix into cache
02:30  Another request hits the cache, TTL resets to 5 minutes
07:29  Another hit refreshes the timer again
12:29  No one has used it for 5 minutes since the 07:29 hit, cache expires
12:31  A new request arrives, misses cache, and must rewrite it at full cost
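
The timeline can be replayed with a small simulation. This is a minimal sketch, not any provider's implementation: the class name `SlidingTTLCache`, the helper `mmss`, and the key `"system-prompt"` are hypothetical, and it assumes a request arriving exactly at the expiry instant counts as a miss.

```python
class SlidingTTLCache:
    """Toy cache whose entries expire after `ttl_seconds` without a hit."""

    def __init__(self, ttl_seconds):
        self.ttl_seconds = ttl_seconds
        self._expires_at = {}  # key -> absolute expiry time, in seconds

    def request(self, key, now):
        """Return 'hit' or 'miss'; either way the entry now lives until now + ttl."""
        expires_at = self._expires_at.get(key)
        # Assumption: a request at exactly the expiry instant is a miss.
        hit = expires_at is not None and now < expires_at
        # A hit refreshes the timer; a miss rewrites the entry with a fresh timer.
        self._expires_at[key] = now + self.ttl_seconds
        return "hit" if hit else "miss"


def mmss(stamp):
    """Convert an 'MM:SS' timestamp into seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)


cache = SlidingTTLCache(ttl_seconds=300)  # 5-minute TTL, as in the timeline
timeline = ["00:00", "02:30", "07:29", "12:31"]
results = [(t, cache.request("system-prompt", mmss(t))) for t in timeline]
for stamp, outcome in results:
    print(stamp, outcome)
# 00:00 miss
# 02:30 hit
# 07:29 hit
# 12:31 miss
```

The 07:29 request is a hit only because it lands one second before the timer set at 02:30 runs out. The 12:31 request is a miss because the entry lapsed five minutes after that last hit.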

Typical TTL Differences Across Providers

Provider	TTL	Configurable	Refreshed on hit
Anthropic Claude	5 minutes by default, 1 hour optional	Yes, two fixed options	Yes
OpenAI	Minutes to hours	No, opaque	Opaque
Google Gemini	Developer-defined, 1 hour by default	Yes	Yes

If requests keep arriving, the cache stays alive. If usage pauses for too long, the entry expires and the next request has to write it again at full cost.
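
One way to check whether a traffic pattern keeps the cache warm is to count how often the prefix has to be written. The sketch below is illustrative: the function name `count_cache_writes` and the sample arrival times are made up, and it assumes a simple refresh-on-hit TTL with no other eviction.

```python
def count_cache_writes(arrival_times, ttl_seconds):
    """Count how many requests must write the prefix under a refresh-on-hit TTL.

    arrival_times: request times in seconds, in any order.
    """
    writes = 0
    last_seen = None
    for t in sorted(arrival_times):
        # The first request always writes; later ones write only if the gap
        # since the previous request reached the TTL.
        if last_seen is None or t - last_seen >= ttl_seconds:
            writes += 1
        last_seen = t  # every request (hit or write) restarts the timer
    return writes


steady = [0, 240, 480, 720, 960]   # one request every 4 minutes
bursty = [0, 60, 600, 660, 1500]   # short bursts separated by long pauses

print(count_cache_writes(steady, ttl_seconds=300))  # 1
print(count_cache_writes(bursty, ttl_seconds=300))  # 3
```

Both patterns contain five requests, but the steady one pays for a single write while the bursty one pays for three, because each pause of five minutes or more lets the entry expire.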
