Tweet Classifier#

This example shows how to determine the sentiment of tweets. It is based on the original OpenAI example; the difference is that we will also show how to cache the response for exact and similar matches with gptcache. It is very simple: you just need to add an extra step to initialize the cache.

Before running the example, make sure the OPENAI_API_KEY environment variable is set by executing echo $OPENAI_API_KEY. If it is not already set, you can set it with export OPENAI_API_KEY=YOUR_API_KEY on Unix/Linux/macOS systems or set OPENAI_API_KEY=YOUR_API_KEY on Windows systems.
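For example, on a Unix-like shell (YOUR_API_KEY below is a placeholder for your real key):

```shell
# Check whether the key is already set
echo $OPENAI_API_KEY

# Set it for the current shell session
export OPENAI_API_KEY=YOUR_API_KEY
```

Note that export only affects the current shell session; add the line to your shell profile to make it persistent.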

Then we can explore the usage and acceleration effect of gptcache through the following code, which consists of three parts: the original OpenAI usage, the exact-match cache, and the similar-match cache.
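All three parts use a small response_text helper to extract the answer from the API response. The snippet below shows the nested response shape it relies on, using a hand-written mock dict rather than a real API response:

```python
def response_text(openai_resp):
    # ChatCompletion responses carry the answer under choices[0].message.content
    return openai_resp['choices'][0]['message']['content']

# A mock dict with the same nested structure a ChatCompletion response has
mock_resp = {'choices': [{'message': {'role': 'assistant', 'content': 'Positive'}}]}
print(response_text(mock_resp))  # prints "Positive"
```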

OpenAI API original usage#

import time
import openai


def response_text(openai_resp):
    return openai_resp['choices'][0]['message']['content']

tweet = "I loved the new Batman movie!"

# OpenAI API original usage
start_time = time.time()
response = openai.ChatCompletion.create(
  model='gpt-3.5-turbo',
  messages=[
    {
        'role': 'user',
        'content': f"Decide whether a Tweet's sentiment is positive, neutral, or negative.\n\nTweet: \"{tweet}\"\nSentiment:",
    }
  ],
)
print(f'Tweet: {tweet}')
print("Time consuming: {:.2f}s".format(time.time() - start_time))
print(f'Sentiment: {response_text(response)}\n')
Tweet: I loved the new Batman movie!
Time consuming: 0.81s
Sentiment: Positive

OpenAI API + GPTCache, exact match cache#

Initialize the cache to run GPTCache and import openai from gptcache.adapter, which will automatically set the map data manager for exact-match caching. For more details, refer to build your cache.

If you send the exact same tweet twice, the answer to the second request will be retrieved from the cache without requesting ChatGPT again.

import time

def response_text(openai_resp):
    return openai_resp['choices'][0]['message']['content']

print("Cache loading.....")

# To use GPTCache, that's all you need
# -------------------------------------------------
from gptcache import cache
from gptcache.adapter import openai

cache.init()
cache.set_openai_key()
# -------------------------------------------------

tweet = "The weather today is neither good nor bad"
for _ in range(2):
    start_time = time.time()
    response = openai.ChatCompletion.create(
      model='gpt-3.5-turbo',
      messages=[
        {
            'role': 'user',
            'content': f"Decide whether a Tweet's sentiment is positive, neutral, or negative.\n\nTweet: \"{tweet}\"\nSentiment:",
        }
      ],
    )
    print(f'Tweet: {tweet}')
    print("Time consuming: {:.2f}s".format(time.time() - start_time))
    print(f'Sentiment: {response_text(response)}\n')
Cache loading.....
Tweet: The weather today is neither good nor bad
Time consuming: 0.62s
Sentiment: neutral

Tweet: The weather today is neither good nor bad
Time consuming: 0.00s
Sentiment: neutral
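Conceptually, an exact-match cache behaves like a dictionary keyed by the request content: the second identical prompt never reaches the model. The sketch below illustrates the idea only; it is not GPTCache's actual implementation, and call_model is a stand-in for the real ChatCompletion request:

```python
import time

cache_store = {}  # prompt -> cached answer

def call_model(prompt):
    # Stand-in for the real ChatCompletion request
    time.sleep(0.5)
    return 'neutral'

def cached_completion(prompt):
    # Exact match: the prompt string itself is the cache key
    if prompt in cache_store:
        return cache_store[prompt]
    answer = call_model(prompt)
    cache_store[prompt] = answer
    return answer

prompt = "The weather today is neither good nor bad"
first = cached_completion(prompt)   # slow: hits the model
second = cached_completion(prompt)  # fast: served from the dict
```

Because the key is the raw prompt string, any change in wording misses the cache, which is what the similar-match cache in the next section addresses.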

OpenAI API + GPTCache, similar search cache#

We are going to use DocArray’s in-memory index to perform similarity search.

Set up the cache with embedding_func to generate embeddings for the text, data_manager to manage the cache data, and similarity_evaluation to evaluate similarity. For more details, refer to build your cache.

After obtaining an answer from ChatGPT in response to several similar tweets, the answers to subsequent questions can be retrieved from the cache without the need to request ChatGPT again.

import time

def response_text(openai_resp):
    return openai_resp['choices'][0]['message']['content']

from gptcache import cache
from gptcache.adapter import openai
from gptcache.embedding import Onnx
from gptcache.manager import CacheBase, VectorBase, get_data_manager
from gptcache.similarity_evaluation.distance import SearchDistanceEvaluation

print("Cache loading.....")

onnx = Onnx()
data_manager = get_data_manager(CacheBase("sqlite"), VectorBase("docarray"))
cache.init(
    embedding_func=onnx.to_embeddings,
    data_manager=data_manager,
    similarity_evaluation=SearchDistanceEvaluation(),
    )
cache.set_openai_key()

tweets = [
    "The new restaurant in town exceeded my expectations with its delectable cuisine and impeccable service",
    "New restaurant in town exceeded my expectations with its delectable cuisine and impeccable service",
    "The new restaurant exceeded my expectations with its delectable cuisine and impeccable service",
]

for tweet in tweets:
    start_time = time.time()
    response = openai.ChatCompletion.create(
        model='gpt-3.5-turbo',
        messages=[
            {
                'role': 'user',
                'content': f"Decide whether a Tweet's sentiment is positive, neutral, or negative.\n\nTweet: \"{tweet}\"\nSentiment:",
            }
        ],
    )
    print(f'Tweet: {tweet}')
    print("Time consuming: {:.2f}s".format(time.time() - start_time))
    print(f'Sentiment: {response_text(response)}\n')
Cache loading.....
Tweet: The new restaurant in town exceeded my expectations with its delectable cuisine and impeccable service
Time consuming: 0.70s
Sentiment:  Positive

Tweet: New restaurant in town exceeded my expectations with its delectable cuisine and impeccable service
Time consuming: 0.59s
Sentiment:  Positive

Tweet: The new restaurant exceeded my expectations with its delectable cuisine and impeccable service
Time consuming: 0.74s
Sentiment:  Positive