Caching
Cache LLM Responses
Quick Start
Enable caching by adding the cache key to config.yaml.
Step 1: Add cache to the config.yaml
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: gpt-3.5-turbo
  - model_name: text-embedding-ada-002
    litellm_params:
      model: text-embedding-ada-002
litellm_settings:
  set_verbose: True
  cache: True          # enable response caching; litellm defaults to using a redis cache
Step 2: Add Redis Credentials to .env
Set either REDIS_URL or REDIS_HOST in your OS environment to enable caching.
REDIS_URL = ""        # REDIS_URL='redis://username:password@hostname:port/database'
## OR ## 
REDIS_HOST = ""       # REDIS_HOST='redis-18841.c274.us-east-1-3.ec2.cloud.redislabs.com'
REDIS_PORT = ""       # REDIS_PORT='18841'
REDIS_PASSWORD = ""   # REDIS_PASSWORD='liteLlmIsAmazing'
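To make the two options concrete, here is a minimal Python sketch of how connection settings could be resolved from these variables. The helper name redis_settings_from_env and the rule that REDIS_URL takes precedence over REDIS_HOST are assumptions made for illustration; this is not LiteLLM's actual loading code.

```python
import os
from typing import Mapping, Optional


def redis_settings_from_env(env: Optional[Mapping[str, str]] = None) -> dict:
    """Return Redis connection settings found in the environment.

    Hypothetical helper: uses REDIS_URL when it is set (assumed precedence),
    otherwise falls back to REDIS_HOST / REDIS_PORT / REDIS_PASSWORD.
    """
    env = os.environ if env is None else env

    url = env.get("REDIS_URL", "").strip()
    if url:
        return {"url": url}

    host = env.get("REDIS_HOST", "").strip()
    if not host:
        raise ValueError("Set REDIS_URL or REDIS_HOST to enable Redis caching")

    settings: dict = {"host": host}
    port = env.get("REDIS_PORT", "").strip()
    if port:
        settings["port"] = int(port)  # environment values are always strings
    password = env.get("REDIS_PASSWORD", "")
    if password:
        settings["password"] = password
    return settings
```

Calling redis_settings_from_env() with no arguments reads os.environ; passing a plain dict makes it easy to check a .env file's contents before starting the proxy.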
Additional kwargs
You can pass any additional redis.Redis argument by storing it as a variable in your OS environment, like this:
REDIS_<redis-kwarg-name> = ""
See how it's read from the environment
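As a rough illustration of the REDIS_<redis-kwarg-name> convention, the Python sketch below collects every REDIS_-prefixed variable into a keyword-argument dict. The function name redis_kwargs_from_env and the lowercasing of the suffix are assumptions for this example and may differ from how LiteLLM actually reads these values.

```python
import os
from typing import Mapping, Optional


def redis_kwargs_from_env(env: Optional[Mapping[str, str]] = None) -> dict:
    """Map REDIS_<NAME>=value environment variables to {"<name>": value}.

    Hypothetical helper: values are kept as strings, so any type
    conversion (for example int for a timeout) is left to the caller.
    """
    env = os.environ if env is None else env
    prefix = "REDIS_"
    return {
        name[len(prefix):].lower(): value
        for name, value in env.items()
        if name.startswith(prefix) and len(name) > len(prefix)
    }
```

With REDIS_SSL and REDIS_SOCKET_TIMEOUT set, for instance, the result would contain the keys ssl and socket_timeout, both of which are redis.Redis arguments.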
Step 3: Run proxy with config
$ litellm --config /path/to/config.yaml
Using Caching - /chat/completions
Send the same request twice:
curl http://0.0.0.0:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
     "model": "gpt-3.5-turbo",
     "messages": [{"role": "user", "content": "write a poem about litellm!"}],
     "temperature": 0.7
   }'
curl http://0.0.0.0:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
     "model": "gpt-3.5-turbo",
     "messages": [{"role": "user", "content": "write a poem about litellm!"}],
     "temperature": 0.7
   }'
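The same check can be scripted. The Python sketch below sends an identical request twice and reports how long each call took; it assumes the proxy is listening at the address used in the curl example, and the helper names (post_json, timed, send_twice) are made up for this illustration.

```python
import json
import time
import urllib.request
from typing import Callable, Tuple

# Same address and body as the curl example.
PROXY_URL = "http://0.0.0.0:8000/v1/chat/completions"
PAYLOAD = {
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "write a poem about litellm!"}],
    "temperature": 0.7,
}


def post_json(url: str, payload: dict) -> dict:
    """POST a JSON body and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def timed(send: Callable[[], dict]) -> Tuple[dict, float]:
    """Call send() and return (response, elapsed seconds)."""
    start = time.perf_counter()
    body = send()
    return body, time.perf_counter() - start


def send_twice(send: Callable[[], dict]) -> dict:
    """Issue the same request twice and summarize timings."""
    first, first_secs = timed(send)
    second, second_secs = timed(send)
    return {
        "first_secs": first_secs,
        "second_secs": second_secs,
        "same_body": first == second,  # plain equality of the two JSON bodies
    }
```

Against a running proxy, send_twice(lambda: post_json(PROXY_URL, PAYLOAD)) returns the two timings. If the second response was served from the cache, you would expect second_secs to be noticeably smaller than first_secs.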
Using Caching - /embeddings
Send the same request twice:
curl --location 'http://0.0.0.0:8000/embeddings' \
  --header 'Content-Type: application/json' \
  --data ' {
  "model": "text-embedding-ada-002",
  "input": ["write a litellm poem"]
  }'
curl --location 'http://0.0.0.0:8000/embeddings' \
  --header 'Content-Type: application/json' \
  --data ' {
  "model": "text-embedding-ada-002",
  "input": ["write a litellm poem"]
  }'
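Sending the same request twice matters because a response cache can only hit when the second request maps to the same key as the first. The Python sketch below shows one generic way to derive such a key by hashing a canonical form of the request body. It is purely illustrative: the function cache_key and the choice of SHA-256 are assumptions, and LiteLLM's real key scheme may differ.

```python
import hashlib
import json


def cache_key(request_body: dict) -> str:
    """Return a deterministic key for a JSON-serializable request body.

    Illustrative only: keys are sorted so that field order does not
    change the result, then the canonical JSON text is hashed.
    """
    canonical = json.dumps(request_body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


first = cache_key({"model": "text-embedding-ada-002", "input": ["write a litellm poem"]})
second = cache_key({"input": ["write a litellm poem"], "model": "text-embedding-ada-002"})
different = cache_key({"model": "text-embedding-ada-002", "input": ["another poem"]})
```

Here first and second are equal because only the field order differs, while different is a separate key because the input changed. Under a scheme like this, any change to the model, the input, or a parameter produces a cache miss.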
Advanced
Set Cache Params on config.yaml
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: gpt-3.5-turbo
  - model_name: text-embedding-ada-002
    litellm_params:
      model: text-embedding-ada-002
litellm_settings:
  set_verbose: True
  cache: True          # enable response caching; litellm defaults to using a redis cache
  cache_params:         # cache_params are optional
    type: "redis"  # The type of cache to initialize. Can be "local" or "redis". Defaults to "local".
    host: "localhost"  # The host address for the Redis cache. Required if type is "redis".
    port: 6379  # The port number for the Redis cache. Required if type is "redis".
    password: "your_password"  # The password for the Redis cache. Required if type is "redis".
    
    # Optional configurations
    supported_call_types: ["acompletion", "completion", "embedding", "aembedding"] # defaults to all litellm call types
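The effect of supported_call_types can be pictured as a simple allow-list check. The Python sketch below is an assumption-laden illustration, not LiteLLM's implementation: is_call_type_cached is a hypothetical helper, and ALL_CALL_TYPES lists only the four call types named in the config above, while the real default set may be larger.

```python
from typing import Iterable, Optional

# Only the call types shown in the config above; the real default may include more.
ALL_CALL_TYPES = ("acompletion", "completion", "embedding", "aembedding")


def is_call_type_cached(
    call_type: str,
    supported_call_types: Optional[Iterable[str]] = None,
) -> bool:
    """Return True if responses for call_type should go through the cache.

    Hypothetical helper: when supported_call_types is omitted, every
    known call type is treated as cacheable.
    """
    allowed = ALL_CALL_TYPES if supported_call_types is None else tuple(supported_call_types)
    return call_type in allowed
```

For example, with supported_call_types set to ["embedding", "aembedding"], a completion call would bypass the cache in this sketch while an embedding call would use it.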
Override caching per /chat/completions request
Caching can be switched on or off for an individual /chat/completions request:
- Caching on for an individual completion: pass "caching": true in the request body
curl http://0.0.0.0:8000/v1/chat/completions \
 -H "Content-Type: application/json" \
 -d '{
 "model": "gpt-3.5-turbo",
 "messages": [{"role": "user", "content": "write a poem about litellm!"}],
 "temperature": 0.7,
 "caching": true
 }'
- Caching off for an individual completion: pass "caching": false in the request body
curl http://0.0.0.0:8000/v1/chat/completions \
 -H "Content-Type: application/json" \
 -d '{
 "model": "gpt-3.5-turbo",
 "messages": [{"role": "user", "content": "write a poem about litellm!"}],
 "temperature": 0.7,
 "caching": false
 }'
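One way to think about the per-request flag is as an override of the proxy-wide setting. The Python sketch below captures that precedence under the assumption that a cache backend is already configured; effective_caching is a hypothetical helper written for this explanation, not part of LiteLLM.

```python
def effective_caching(global_cache_enabled: bool, request_body: dict) -> bool:
    """Decide whether a single request should use the cache.

    Assumed precedence: an explicit "caching" field in the request body
    wins; otherwise the proxy-wide setting from config.yaml applies.
    """
    flag = request_body.get("caching")
    if flag is None:
        return global_cache_enabled
    return bool(flag)


request_on = {"model": "gpt-3.5-turbo", "caching": True}
request_off = {"model": "gpt-3.5-turbo", "caching": False}
request_default = {"model": "gpt-3.5-turbo"}
```

In this sketch, request_off skips the cache even when the proxy has cache: True, and request_default simply follows the global setting.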
Override caching per /embeddings request
Caching can be switched on or off for an individual /embeddings request:
- Caching on for an individual embedding: pass "caching": true in the request body
curl --location 'http://0.0.0.0:8000/embeddings' \
 --header 'Content-Type: application/json' \
 --data ' {
 "model": "text-embedding-ada-002",
 "input": ["write a litellm poem"],
 "caching": true
 }'
- Caching off for an individual embedding: pass "caching": false in the request body
curl --location 'http://0.0.0.0:8000/embeddings' \
 --header 'Content-Type: application/json' \
 --data ' {
 "model": "text-embedding-ada-002",
 "input": ["write a litellm poem"],
 "caching": false
 }'