Language Translation#

This example shows how to translate English into other languages. The original example is on OpenAI Examples; the difference is that we will show you how to cache the responses for exact and similar matches with GPTCache. It is very simple: you only need to add an extra step to initialize the cache.

Before running the example, make sure the OPENAI_API_KEY environment variable is set by executing echo $OPENAI_API_KEY. If it is not already set, it can be set by using export OPENAI_API_KEY=YOUR_API_KEY on Unix/Linux/MacOS systems or set OPENAI_API_KEY=YOUR_API_KEY on Windows systems.
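
If you prefer to set the key from within Python rather than the shell, here is a minimal sketch (replace the placeholder with your real key; this affects only the current process):

import os

# Set the key for this process only, and only if the shell variable is absent.
os.environ.setdefault("OPENAI_API_KEY", "YOUR_API_KEY")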

Then we can see the usage and the acceleration effect of GPTCache through the following code, which consists of three parts: the original OpenAI way, the exact search, and the similar search.

OpenAI API original usage#

import time
import openai

def response_text(openai_resp):
    return openai_resp["choices"][0]["text"]

start_time = time.time()
response = openai.Completion.create(
  model="text-davinci-003",
  prompt="Translate this into 1. French, 2. Spanish and 3. Japanese:\n\nWhat rooms do you have available?\n\n1.",
  temperature=0.3,
  max_tokens=100,
  top_p=1.0,
  frequency_penalty=0.0,
  presence_penalty=0.0
)

print(f"\nAnswer: 1.{response_text(response)}")
print("Time consuming: {:.2f}s".format(time.time() - start_time))
Answer: 1. Quels salles avez-vous disponibles?
2. ¿Qué habitaciones tienen disponibles?
3. どの部屋が利用可能ですか?
Time consuming: 2.06s

OpenAI API + GPTCache, exact match cache#

Initialize the cache to run GPTCache and import openai from gptcache.adapter, which will automatically set the map data manager to match the exact cache. For more details, refer to build your cache.

import time

def response_text(openai_resp):
    return openai_resp["choices"][0]["text"]

print("Cache loading.....")

# To use GPTCache, that's all you need
# -------------------------------------------------
from gptcache import cache
from gptcache.adapter import openai
from gptcache.processor.pre import get_prompt

cache.init(pre_embedding_func=get_prompt)
cache.set_openai_key()
# -------------------------------------------------

questions = [
    "Translate this into 1. French, 2. Spanish and 3. Japanese:\n\nWhat rooms do you have available?\n\n1.",
    "Translate this into 1. French, 2. Spanish and 3. Japanese:\n\nWhich rooms do you have available?\n\n1.",
    "Translate this into 1. French, 2. Spanish and 3. Japanese:\n\nWhat kind of rooms do you have available?\n\n1.",
]

for question in questions:
    start_time = time.time()
    response = openai.Completion.create(
                  model="text-davinci-003",
                  prompt=question,
                  temperature=0.3,
                  max_tokens=100,
                  top_p=1.0,
                  frequency_penalty=0.0,
                  presence_penalty=0.0
                )
    print(f"\nAnswer: 1.{response_text(response)}")
    print("Time consuming: {:.2f}s".format(time.time() - start_time))
Cache loading.....

Answer: 1. Quels sont les chambres que vous avez disponibles ?
2. ¿Qué habitaciones tienes disponibles?
3. どの部屋が利用可能ですか?
Time consuming: 1.81s

Answer: 1. Quelles pièces avez-vous disponibles?
2. ¿Qué habitaciones tienen disponibles?
3. どの部屋が利用可能ですか?
Time consuming: 4.47s

Answer: 1. Quels types de chambres avez-vous disponibles ?
2. ¿Qué tipos de habitaciones tienen disponibles?
3. どんな部屋が利用可能ですか?
Time consuming: 1.40s
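
Note that the exact cache only matches identical prompts, so the three different questions above all go to the API. To see an exact hit, re-issue one of the prompts verbatim; the cached answer is returned without an API round trip, so the time drops to near zero. A minimal sketch (timings will vary):

# Re-run the first question unchanged; an identical prompt hits the exact cache.
start_time = time.time()
response = openai.Completion.create(
  model="text-davinci-003",
  prompt=questions[0],
  temperature=0.3,
  max_tokens=100,
  top_p=1.0,
  frequency_penalty=0.0,
  presence_penalty=0.0
)
print(f"\nAnswer: 1.{response_text(response)}")
print("Time consuming: {:.2f}s".format(time.time() - start_time))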

OpenAI API + GPTCache, similar search cache#

Set up the cache with pre_embedding_func to preprocess the input data, embedding_func to generate embeddings for the text, data_manager to manage the cache data, and similarity_evaluation to evaluate the similarity between queries. For more details, refer to build your cache.

import time


def response_text(openai_resp):
    return openai_resp["choices"][0]["text"]

from gptcache import cache
from gptcache.adapter import openai
from gptcache.embedding import Onnx
from gptcache.processor.pre import get_prompt
from gptcache.manager import CacheBase, VectorBase, get_data_manager
from gptcache.similarity_evaluation.distance import SearchDistanceEvaluation

print("Cache loading.....")

onnx = Onnx()
data_manager = get_data_manager(CacheBase("sqlite"), VectorBase("faiss", dimension=onnx.dimension))
cache.init(
    pre_embedding_func=get_prompt,
    embedding_func=onnx.to_embeddings,
    data_manager=data_manager,
    similarity_evaluation=SearchDistanceEvaluation(),
)
cache.set_openai_key()

questions = [
    "Translate this into 1. French, 2. Spanish and 3. Japanese:\n\nWhat rooms do you have available?\n\n1.",
    "Translate this into 1. French, 2. Spanish and 3. Japanese:\n\nWhich rooms do you have available?\n\n1.",
    "Translate this into 1. French, 2. Spanish and 3. Japanese:\n\nWhat kind of rooms do you have available?\n\n1.",
]

for question in questions:
    start_time = time.time()
    response = openai.Completion.create(
                  model="text-davinci-003",
                  prompt=question,
                  temperature=0.3,
                  max_tokens=100,
                  top_p=1.0,
                  frequency_penalty=0.0,
                  presence_penalty=0.0
                )
    print(f"\nAnswer: 1.{response_text(response)}")
    print("Time consuming: {:.2f}s".format(time.time() - start_time))
Cache loading.....

Answer: 1. Quels salles avez-vous disponibles?
2. ¿Qué habitaciones tienes disponibles?
3. どの部屋が利用可能ですか?
Time consuming: 4.40s

Answer: 1. Quels salles avez-vous disponibles?
2. ¿Qué habitaciones tienes disponibles?
3. どの部屋が利用可能ですか?
Time consuming: 0.19s

Answer: 1. Quels salles avez-vous disponibles?
2. ¿Qué habitaciones tienes disponibles?
3. どの部屋が利用可能ですか?
Time consuming: 0.21s

We can see the performance improvement brought by the similar search: because the three query statements are similar, the second and third questions hit the cache in GPTCache, and the cached result is returned directly instead of requesting the API again. You can also try running the queries a second time with the exact-match cache, which will speed up in the same way.
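
As a further check of the similar search, you can try a paraphrase that was never sent before; whether it hits the cache depends on how close its embedding is to the cached questions and on the similarity threshold. A minimal sketch with a hypothetical paraphrase:

# A question that was never asked before; if its ONNX embedding is close
# enough to a cached question, GPTCache answers from the cache directly.
new_question = "Translate this into 1. French, 2. Spanish and 3. Japanese:\n\nDo you have any rooms available?\n\n1."
start_time = time.time()
response = openai.Completion.create(
  model="text-davinci-003",
  prompt=new_question,
  temperature=0.3,
  max_tokens=100,
  top_p=1.0,
  frequency_penalty=0.0,
  presence_penalty=0.0
)
print(f"\nAnswer: 1.{response_text(response)}")
print("Time consuming: {:.2f}s".format(time.time() - start_time))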