Caching

sheaf.cache

In-process LRU response cache for Sheaf deployments.

Each deployment can opt in to caching via ModelSpec.cache. When enabled, the cache key is a SHA-256 digest of the canonical request JSON, excluding request_id and any fields listed in CacheConfig.exclude_fields. Cache hits bypass inference entirely: no batching, no backend call.

Usage:

from sheaf.spec import ModelSpec, CacheConfig

spec = ModelSpec(
    name="chronos-small",
    # ... (remaining ModelSpec fields elided)
    cache=CacheConfig(enabled=True, max_size=512, ttl_s=300.0),
)

Good candidates for caching:
  • Embedding models (same image/text → same vector)
  • Time-series forecasts with fixed history
  • Tabular models with fixed feature rows

Poor candidates (non-deterministic or privacy-sensitive):
  • Diffusion with random seeds (if you do cache, keep seed in the key: the same seed produces the same image)
  • Any model where the caller explicitly needs fresh output each time

SHEAF_CACHE_DISABLED=1 disables all caches regardless of config (useful in smoke tests and integration runs where you want to exercise the backend).
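The precedence described above (environment override first, per-deployment config second) can be sketched as follows. This is an illustration only: caching_active is a hypothetical helper, not part of sheaf, and it assumes that only the literal value "1" disables caching, which is the one value the documentation mentions.

```python
import os
from typing import Mapping, Optional


def caching_active(config_enabled: bool, env: Optional[Mapping[str, str]] = None) -> bool:
    """Hypothetical helper: decide whether a deployment's cache should be used."""
    env = os.environ if env is None else env
    # Assumption: only the literal value "1" switches caching off globally.
    if env.get("SHEAF_CACHE_DISABLED") == "1":
        return False
    # Otherwise the per-deployment opt-in (CacheConfig.enabled) decides.
    return config_enabled


# The override wins even when the deployment has opted in.
override_result = caching_active(True, {"SHEAF_CACHE_DISABLED": "1"})
# Without the override, the config value is used unchanged.
plain_result = caching_active(True, {})
```

Passing the environment as an argument keeps the check easy to exercise in tests without mutating os.environ.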

CacheConfig

Bases: BaseModel

Cache configuration for a single deployment.

Attributes:

Name Type Description
enabled bool

Enable the cache (default False — opt-in).

max_size int

Maximum number of LRU entries (default 1024).

ttl_s float | None

Time-to-live in seconds. None (default) means entries never expire. Set e.g. ttl_s=300 to expire cached forecasts after five minutes.

exclude_fields list[str]

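Request field names to omit from the cache key. request_id is always excluded automatically. Requests that differ only in an excluded field share one cache entry, so list only fields that do not affect the output. For diffusion models, keep "seed" in the key (do not list it here) when different seeds must produce distinct images; same-seed repeats still hit the cache.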

ResponseCache

ResponseCache(config: CacheConfig)

Thread-safe in-process LRU cache for predict responses.

Keys are SHA-256 hex digests of the canonical request JSON. Values are the serialised response dicts returned by model_dump.

Parameters:

Name Type Description Default
config CacheConfig

CacheConfig instance from the deployment's ModelSpec.

required

size property

size: int

Current number of entries in the cache.

make_key

make_key(deployment: str, request: Any) -> str

Return a SHA-256 cache key for (deployment, request).

request_id is always excluded (it is unique per call and must not affect the key). Fields listed in config.exclude_fields are also dropped before hashing.

Parameters:

Name Type Description Default
deployment str

Deployment name (ModelSpec.name).

required
request Any

Any Pydantic BaseRequest subclass.

required

Returns:

Type Description
str

64-character lowercase hex digest.
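The key derivation can be approximated with the standard library. The sketch below is not the sheaf implementation. It assumes that "canonical JSON" means sorted keys with compact separators, that the deployment name is hashed together with the request, and that a plain dict stands in for the dumped Pydantic request. sketch_make_key is a hypothetical name.

```python
import hashlib
import json
from typing import Any


def sketch_make_key(
    deployment: str,
    payload: dict[str, Any],
    exclude_fields: tuple[str, ...] = (),
) -> str:
    """Hypothetical stand-in for ResponseCache.make_key operating on a dict."""
    # request_id is always dropped; configured fields are dropped as well.
    dropped = {"request_id", *exclude_fields}
    body = {k: v for k, v in payload.items() if k not in dropped}
    # Assumption: canonical form = sorted keys, no insignificant whitespace.
    canonical = json.dumps(
        {"deployment": deployment, "request": body},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Same content, different request_id and field order: identical keys.
key_a = sketch_make_key(
    "chronos-small", {"request_id": "r-1", "history": [1, 2, 3], "horizon": 4}
)
key_b = sketch_make_key(
    "chronos-small", {"request_id": "r-2", "horizon": 4, "history": [1, 2, 3]}
)

# Excluding "seed" makes requests that differ only in seed share a key.
seed_1 = sketch_make_key("sd", {"prompt": "a cat", "seed": 1}, exclude_fields=("seed",))
seed_2 = sketch_make_key("sd", {"prompt": "a cat", "seed": 2}, exclude_fields=("seed",))
```

Including the deployment name in the hashed payload keeps identical requests to different models from colliding, which is presumably why make_key takes it as an argument.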

get

get(key: str) -> dict[str, Any] | None

Return the cached response dict, or None on miss / expiry.

Parameters:

Name Type Description Default
key str

Cache key from make_key.

required
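Because get returns None on a miss or an expired entry, a caller would typically wrap inference in a cache-aside check. The sketch below shows that flow with a plain dict standing in for ResponseCache. cached_predict and run_inference are hypothetical names, and the real serving path may be organised differently.

```python
from typing import Any, Callable


def cached_predict(
    store: dict[str, dict[str, Any]],
    key: str,
    run_inference: Callable[[], dict[str, Any]],
) -> tuple[dict[str, Any], bool]:
    """Return (response, was_hit); run inference only when the key is absent."""
    hit = store.get(key)
    if hit is not None:
        # Cache hit: skip batching and the backend call entirely.
        return hit, True
    response = run_inference()
    store[key] = response
    return response, False


calls = []


def run_inference() -> dict[str, Any]:
    # Stand-in for the real backend; records how often it is invoked.
    calls.append(1)
    return {"forecast": [0.1, 0.2]}


store: dict[str, dict[str, Any]] = {}
first, first_hit = cached_predict(store, "abc123", run_inference)
second, second_hit = cached_predict(store, "abc123", run_inference)
```

The second call is served from the store, so the backend stand-in runs only once.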

set

set(key: str, value: dict[str, Any]) -> None

Store a response dict under key.

If the cache is at capacity the least-recently-used entry is evicted.

Parameters:

Name Type Description Default
key str

Cache key from make_key.

required
value dict[str, Any]

Serialised response dict (response.model_dump(mode="json")).

required
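The documented behaviour of ResponseCache (LRU eviction at capacity, optional TTL, thread safety) can be illustrated with a small standalone class. This is a minimal sketch, not the sheaf source. SketchLRUCache is a hypothetical name, and the injectable clock parameter, the lazy expiry check in get, and the choice to refresh recency on reads are all assumptions.

```python
import threading
import time
from collections import OrderedDict
from typing import Any, Callable, Optional


class SketchLRUCache:
    """Hypothetical LRU cache with optional TTL, guarded by a single lock."""

    def __init__(
        self,
        max_size: int = 1024,
        ttl_s: Optional[float] = None,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max_size = max_size
        self._ttl_s = ttl_s
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._lock = threading.Lock()
        self._entries = OrderedDict()  # key -> (stored_at, value), oldest first

    @property
    def size(self) -> int:
        with self._lock:
            return len(self._entries)

    def get(self, key: str) -> Optional[dict[str, Any]]:
        with self._lock:
            entry = self._entries.get(key)
            if entry is None:
                return None
            stored_at, value = entry
            # Assumption: expired entries are removed lazily on lookup.
            if self._ttl_s is not None and self._clock() - stored_at > self._ttl_s:
                del self._entries[key]
                return None
            self._entries.move_to_end(key)  # mark as most recently used
            return value

    def set(self, key: str, value: dict[str, Any]) -> None:
        with self._lock:
            self._entries[key] = (self._clock(), value)
            self._entries.move_to_end(key)
            # Evict least-recently-used entries once over capacity.
            while len(self._entries) > self._max_size:
                self._entries.popitem(last=False)


cache = SketchLRUCache(max_size=2)
cache.set("a", {"v": 1})
cache.set("b", {"v": 2})
cache.get("a")            # touch "a", so "b" becomes least recently used
cache.set("c", {"v": 3})  # capacity exceeded: "b" is evicted
```

A single lock around every operation is the simplest way to keep the ordering and the size check consistent under concurrent requests. Finer-grained schemes are possible but harder to reason about.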