Caching

sheaf.cache

In-process LRU response cache for Sheaf deployments.

Each deployment can opt in to caching via ModelSpec.cache. When enabled, the cache key is a SHA-256 digest of the canonical request JSON, excluding request_id and any fields listed in CacheConfig.exclude_fields. Cache hits bypass inference entirely: no batching, no backend call.

Usage::

    from sheaf.spec import ModelSpec, CacheConfig

    spec = ModelSpec(
        name="chronos-small",
        ...
        cache=CacheConfig(enabled=True, max_size=512, ttl_s=300.0),
    )
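The hit path described above (a hit returns the stored response and never reaches the backend) can be illustrated with a minimal sketch. The helper `predict_with_cache`, the plain dict standing in for the cache, and `fake_backend` are all hypothetical names for illustration; they are not part of Sheaf.

```python
from typing import Any, Callable


def predict_with_cache(
    cache: dict[str, dict[str, Any]],
    key: str,
    run_inference: Callable[[], dict[str, Any]],
) -> dict[str, Any]:
    """Return the cached response if present; otherwise run inference and store it."""
    cached = cache.get(key)
    if cached is not None:
        return cached  # hit: the backend is never called
    response = run_inference()  # miss: fall through to the model
    cache[key] = response
    return response


calls: list[int] = []


def fake_backend() -> dict[str, Any]:
    # Hypothetical stand-in for a real backend; records each invocation.
    calls.append(1)
    return {"forecast": [1.0, 2.0]}


store: dict[str, dict[str, Any]] = {}
first = predict_with_cache(store, "abc", fake_backend)
second = predict_with_cache(store, "abc", fake_backend)  # served from the cache
```

After both calls, `calls` holds a single entry: the second request was answered without invoking the backend.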

Good candidates for caching:

- Embedding models (same image/text → same vector)
- Time-series forecasts with fixed history
- Tabular models with fixed feature rows

Poor candidates (non-deterministic or privacy-sensitive):

- Diffusion with random seeds (if you do cache, include the seed in the key: the same seed is the same image)
- Any model where the caller explicitly needs fresh output each time

SHEAF_CACHE_DISABLED=1 disables all caches regardless of config (useful in smoke tests and integration runs where you want to exercise the backend).
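As a rough illustration of how such a global switch can be honoured, the sketch below checks the variable before consulting any cache. The helper name `caches_disabled` is hypothetical, and the exact parsing Sheaf applies (for example, whether values other than `1` count) is an assumption here.

```python
import os
from typing import Mapping


def caches_disabled(environ: Mapping[str, str] = os.environ) -> bool:
    """Hypothetical helper: True when SHEAF_CACHE_DISABLED is set to "1"."""
    return environ.get("SHEAF_CACHE_DISABLED") == "1"


# In a smoke test you might set the variable before starting the server:
#     os.environ["SHEAF_CACHE_DISABLED"] = "1"
disabled = caches_disabled({"SHEAF_CACHE_DISABLED": "1"})
enabled_by_default = not caches_disabled({})
```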

CacheConfig

Bases: BaseModel

Cache configuration for a single deployment.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `enabled` | `bool` | Enable the cache (default `False`; caching is opt-in). |
| `max_size` | `int` | Maximum number of LRU entries (default 1024). |
| `ttl_s` | `float \| None` | Time-to-live in seconds. `None` (default) means entries never expire. Set e.g. `ttl_s=300` to expire cached forecasts after five minutes. |
| `exclude_fields` | `list[str]` | Request field names to omit from the cache key. `request_id` is always excluded automatically. Requests that differ only in an excluded field share one cache entry, so exclude only fields that do not affect the output. Do not exclude e.g. `"seed"` for diffusion models: keeping it in the key lets different seeds produce distinct images while same-seed repeats still hit the cache. |

ResponseCache

ResponseCache(config: CacheConfig)

Thread-safe in-process LRU cache for predict responses.

Keys are SHA-256 hex digests of the canonical request JSON. Values are the serialised response dicts returned by model_dump.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `config` | `CacheConfig` | `CacheConfig` instance from the deployment's `ModelSpec`. | required |

size `property`

size: int

Current number of entries in the cache.

make_key

make_key(deployment: str, request: Any) -> str

Return a SHA-256 cache key for (deployment, request).

request_id is always excluded (it is unique per call and must not affect the key). Fields listed in config.exclude_fields are also dropped before hashing.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `deployment` | `str` | Deployment name (`ModelSpec.name`). | required |
| `request` | `Any` | Any Pydantic `BaseRequest` subclass. | required |

Returns:

| Type | Description |
| --- | --- |
| `str` | 64-character lowercase hex digest. |
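To make the key derivation concrete, here is a minimal sketch of hashing a request after dropping `request_id` and any excluded fields. It is not Sheaf's implementation: `sketch_make_key` is a hypothetical name, the request is a plain dict rather than a Pydantic model, and both the canonical form (sorted keys, compact separators) and the way the deployment name is folded into the hash are assumptions.

```python
import hashlib
import json
from typing import Any


def sketch_make_key(
    deployment: str,
    request: dict[str, Any],
    exclude_fields: tuple[str, ...] = (),
) -> str:
    """Hash (deployment, request) after dropping request_id and excluded fields."""
    dropped = {"request_id", *exclude_fields}
    payload = {k: v for k, v in request.items() if k not in dropped}
    # Assumed canonical form: sorted keys and compact separators, so that
    # dict ordering and whitespace cannot change the digest.
    canonical = json.dumps(
        {"deployment": deployment, "request": payload},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = sketch_make_key("chronos-small", {"request_id": "r1", "history": [1, 2, 3]})
b = sketch_make_key("chronos-small", {"request_id": "r2", "history": [1, 2, 3]})
c = sketch_make_key("chronos-small", {"request_id": "r3", "history": [1, 2, 4]})
```

Here `a` and `b` are equal because the requests differ only in `request_id`, while `c` differs because the history changed.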

get

get(key: str) -> dict[str, Any] | None

Return the cached response dict, or None on miss / expiry.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `key` | `str` | Cache key from `make_key`. | required |

set

set(key: str, value: dict[str, Any]) -> None

Store a response dict under key.

If the cache is at capacity the least-recently-used entry is evicted.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `key` | `str` | Cache key from `make_key`. | required |
| `value` | `dict[str, Any]` | Serialised response dict (`response.model_dump(mode="json")`). | required |
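The documented behaviour of `get` and `set` (LRU eviction at capacity, optional TTL expiry, thread safety) can be approximated with the standard library. The class below is a sketch under assumptions, not the actual `ResponseCache`: the name `SketchLRUCache`, the injectable `clock` parameter, and the choice to refresh recency on every hit are all illustrative.

```python
import threading
import time
from collections import OrderedDict
from typing import Any, Callable, Optional


class SketchLRUCache:
    """Illustrative thread-safe LRU cache with an optional time-to-live."""

    def __init__(
        self,
        max_size: int,
        ttl_s: Optional[float] = None,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max_size = max_size
        self._ttl_s = ttl_s
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._lock = threading.Lock()
        self._entries = OrderedDict()  # key -> (stored_at, value), oldest first

    @property
    def size(self) -> int:
        with self._lock:
            return len(self._entries)

    def get(self, key: str) -> Optional[dict[str, Any]]:
        with self._lock:
            entry = self._entries.get(key)
            if entry is None:
                return None  # miss
            stored_at, value = entry
            if self._ttl_s is not None and self._clock() - stored_at > self._ttl_s:
                del self._entries[key]  # expired: drop it and report a miss
                return None
            self._entries.move_to_end(key)  # mark as most recently used
            return value

    def set(self, key: str, value: dict[str, Any]) -> None:
        with self._lock:
            self._entries[key] = (self._clock(), value)
            self._entries.move_to_end(key)
            while len(self._entries) > self._max_size:
                self._entries.popitem(last=False)  # evict least recently used


cache = SketchLRUCache(max_size=2)
cache.set("a", {"v": 1})
cache.set("b", {"v": 2})
cache.get("a")            # touch "a" so "b" becomes least recently used
cache.set("c", {"v": 3})  # at capacity: evicts "b"
```

A single lock around every operation keeps the sketch simple; it trades some concurrency for correctness, which is usually acceptable when each operation is a few dictionary updates.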
