Caching chat response#

This notebook shows how to use Vertex AI to answer questions, and how to cache the responses for exact and similar matches with GPTCache. It is relatively simple: you just need to add an extra step to initialize the cache.

! pip install google-cloud-aiplatform
! pip install -q gptcache langchain
Successfully installed google-cloud-aiplatform-1.27.0 google-cloud-resource-manager-1.10.2 shapely-1.8.5.post1

# Authenticating and testing the VertexAI model

from google.colab import auth as google_auth
google_auth.authenticate_user()

import vertexai
from vertexai.preview.language_models import TextGenerationModel

def predict_large_language_model_sample(
    project_id: str,
    model_name: str,
    temperature: float,
    max_decode_steps: int,
    top_p: float,
    top_k: int,
    content: str,
    location: str = "us-central1",
    tuned_model_name: str = "",
    ):
    """Predict using a Large Language Model."""
    vertexai.init(project=project_id, location=location)
    model = TextGenerationModel.from_pretrained(model_name)
    if tuned_model_name:
        model = model.get_tuned_model(tuned_model_name)
    response = model.predict(
        content,
        temperature=temperature,
        max_output_tokens=max_decode_steps,
        top_k=top_k,
        top_p=top_p,
    )
    print(f"Response from Model: {response.text}")
    return response
predict_large_language_model_sample("octo-t2sql", "text-bison@001", 0.2, 256, 0.8, 40, '''Give me ten interview questions for the role of software engineer''', "us-central1")
Response from Model: 1. What is your experience with project management?
2. What is your process for managing a project?
3. How do you handle unexpected challenges or roadblocks?
4. How do you communicate with stakeholders?
5. How do you measure the success of a project?
6. What are your strengths and weaknesses as a project manager?
7. What are your salary expectations?
8. What are your career goals?
9. What are your thoughts on the company's culture?
10. Why are you interested in this position?

Before running the example, make sure the first parameter of predict_large_language_model_sample corresponds to your project_id. You will be prompted to authenticate.

Then we can explore the usage and acceleration effect of GPTCache with the following code, which consists of three parts:

  1. Usual way

  2. Exact Search

  3. Similar Search
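All three parts below measure latency the same way: record a start time, run the call, and print the elapsed seconds. As a hypothetical convenience (not part of the original notebook), that pattern can be factored into a small helper:

```python
import time

def timed(fn, *args):
    # Run fn(*args), print the elapsed wall-clock time, and return the result.
    start = time.time()
    result = fn(*args)
    print("Time consuming: {:.2f}s".format(time.time() - start))
    return result

# Example with a stand-in function instead of a real model call:
print(timed(lambda x: x + 1, 41))  # → 42
```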

VertexAI API standard usage#

import time

question = "what's github?"

# VertexAI API original usage
start_time = time.time()
response = predict_large_language_model_sample("octo-t2sql", "text-bison@001", 0.2, 256, 0.8, 40, question, "us-central1")

print(f'Question: {question}')
print("Time consuming: {:.2f}s".format(time.time() - start_time))
Response from Model: GitHub is a web-based hosting service for software development projects that use the Git revision control system. It offers all of the distributed version control and source code management (SCM) functionality of Git, as well as a graphical user interface (GUI) and web interface that make it easy to manage projects with multiple collaborators.

GitHub is used by many open source projects, as well as by private companies for software development. It is also used by individuals for personal projects.

GitHub is a popular choice for software development because it is easy to use, reliable, and secure. It also offers a number of features that make it a good choice for collaboration, including issue tracking, pull requests, and wikis.

If you are interested in learning more about GitHub, there are a number of resources available online. The GitHub website has a comprehensive help section, and there are also a number of books and articles available on the subject.
Question: what's github?
Time consuming: 2.87s

VertexAI API + GPTCache using LangChain 🦜️🔗 (exact match cache)#

Initialize the cache to run GPTCache and import LangChainLLMs from gptcache.adapter.langchain_models, which will automatically set the map data manager to match the exact cache. For more details, refer to build your cache.

If you ask the exact same question twice, the answer to the second will be retrieved from the cache without requesting the model again.
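To make the exact-match behavior concrete, here is a minimal, hypothetical sketch of the idea (independent of gptcache's actual implementation): the raw prompt string is used as the key, so only an identical prompt produces a cache hit.

```python
class ExactMatchCache:
    def __init__(self, llm_fn):
        self.llm_fn = llm_fn
        self.store = {}   # prompt string -> cached answer

    def __call__(self, prompt):
        if prompt in self.store:       # exact hit: skip the model call
            return self.store[prompt]
        answer = self.llm_fn(prompt)   # miss: read through to the model
        self.store[prompt] = answer
        return answer

calls = []
def fake_llm(prompt):
    calls.append(prompt)
    return f"answer to: {prompt}"

cached = ExactMatchCache(fake_llm)
cached("what's github?")
cached("what's github?")   # identical prompt, served from the cache
print(len(calls))          # → 1: the model was only called once
```

gptcache's get_prompt pre-embedding function plays a similar role here: it extracts the prompt string that becomes the cache key.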

import time

from langchain.llms import VertexAI



# the following initialises the cache
# -------------------------------------------------
from gptcache.adapter.langchain_models import LangChainLLMs
from gptcache import Cache
from gptcache.processor.pre import get_prompt

llm = VertexAI()


llm_cache = Cache()
llm_cache.init(
    pre_embedding_func=get_prompt,
)

cached_llm = LangChainLLMs(llm=llm)
# -------------------------------------------------

question = "What NFL team won the Super Bowl in the year Justin Bieber was born?"

before = time.time()
answer = cached_llm(prompt=question, cache_obj=llm_cache)
print(answer)
print("Read through Time Spent =", time.time() - before)

before = time.time()
answer = cached_llm(prompt=question, cache_obj=llm_cache)
print(answer)
print("Cache Hit Time Spent =", time.time() - before)

The New England Patriots won Super Bowl XXXIX in 2005, the year Justin Bieber was born.
Read through Time Spent = 0.0011386871337890625
The New England Patriots won Super Bowl XXXIX in 2005, the year Justin Bieber was born.
Cache Hit Time Spent = 0.0007178783416748047

VertexAI API + GPTCache, similar search cache#

Set the cache with embedding_func to generate embeddings for the text, data_manager to manage the cache data, and similarity_evaluation to evaluate similarity. For more details, refer to build your cache.

After an answer has been obtained for one question, answers to subsequent similar questions can be retrieved from the cache without requesting the model again.

How similar search works:

  • The similarity evaluator collects data from the cache storage and the vector store to determine the similarity between the input request and the requests stored in the vector store

  • The request router returns the answer of the cached request that is most similar to the input request
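The two steps above can be sketched with a toy example (a hypothetical illustration, not gptcache's implementation): prompts are embedded, and a new prompt reuses a cached answer when its embedding is close enough to a stored one.

```python
import math

def embed(text):
    # Toy embedding: bag-of-letters counts (real systems use model embeddings).
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

class SimilarCache:
    def __init__(self, llm_fn, threshold=0.9):
        self.llm_fn = llm_fn
        self.threshold = threshold
        self.entries = []  # list of (embedding, answer) pairs

    def __call__(self, prompt):
        q = embed(prompt)
        # Similarity evaluation: compare the input against every stored request.
        best = max(self.entries, key=lambda e: cosine(q, e[0]), default=None)
        if best is not None and cosine(q, best[0]) >= self.threshold:
            return best[1]              # similar enough: serve from the cache
        answer = self.llm_fn(prompt)    # otherwise read through to the model
        self.entries.append((q, answer))
        return answer

calls = []
def fake_llm(prompt):
    calls.append(prompt)
    return f"answer to: {prompt}"

sim_cache = SimilarCache(fake_llm)
sim_cache("what's github")
sim_cache("whats github?")   # similar wording, served from the cache
print(len(calls))            # → 1: the model was only called once
```

Note that the loop in the cell below calls predict_large_language_model_sample directly, bypassing the initialized cache, which is why every question in the recorded output takes a full model round trip; to actually get hits, the calls would need to go through a gptcache adapter such as LangChainLLMs.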

import time


from gptcache import cache
from gptcache.embedding import Onnx
from gptcache.manager import CacheBase, VectorBase, get_data_manager
from gptcache.similarity_evaluation.distance import SearchDistanceEvaluation

print("Cache loading.....")

onnx = Onnx()
data_manager = get_data_manager(CacheBase("sqlite"), VectorBase("faiss", dimension=onnx.dimension))
cache.init(
    embedding_func=onnx.to_embeddings,
    data_manager=data_manager,
    similarity_evaluation=SearchDistanceEvaluation(),
)


questions = [
    "what's github",
    "can you explain what GitHub is",
    "can you tell me more about GitHub",
    "what is the purpose of GitHub"
]

for question in questions:
    start_time = time.time()
    response = predict_large_language_model_sample("octo-t2sql", "text-bison@001", 0.2, 256, 0.8, 40, question, "us-central1")
    print(f'Question: {question}')
    print("Time consuming: {:.2f}s".format(time.time() - start_time))

Cache loading.....
Response from Model: GitHub is a web-based hosting service for software development projects that use the Git revision control system. It offers all of the distributed version control and source code management (SCM) functionality of Git, as well as a graphical user interface (GUI) and web interface, making it easy for teams to collaborate on software projects.

GitHub is used by many large organizations, including Google, Facebook, Amazon, and Microsoft. It is also popular with open source projects, such as the Linux kernel and the Apache web server.

GitHub is free for open source projects, but there is a paid subscription option for private projects. The paid subscription offers additional features, such as unlimited private repositories, priority support, and the ability to host private wikis and blogs.
Question: what's github
Time consuming: 2.41s
Response from Model: GitHub is a web-based hosting service for software development projects that use the Git revision control system. It offers all of the distributed version control and source code management (SCM) functionality of Git, as well as a graphical user interface (GUI) and web interface, making it easy for teams to collaborate on software projects.

GitHub is used by many open source projects, as well as by private companies for software development. It is also used by many educational institutions for teaching software engineering.

GitHub is a popular choice for software development because it is easy to use, has a large community of users, and offers a variety of features that make it well-suited for collaboration.

Here are some of the benefits of using GitHub:

* It is easy to use. GitHub has a simple and intuitive interface that makes it easy for developers of all levels of experience to use.
* It has a large community of users. GitHub has a large community of users who are willing to help each other out. This can be a valuable resource for developers who are stuck on a problem.
* It offers a variety of features. GitHub offers a variety of features that make it well-suited for collaboration, including issue tracking, pull requests, and code review.

Question: can you explain what GitHub is
Time consuming: 2.95s
Response from Model: GitHub is a web-based hosting service for software development projects that use the Git revision control system. It offers all of the distributed version control and source code management (SCM) functionality of Git, as well as a graphical user interface (GUI) and web interface, making it easy for teams to collaborate on software projects.

GitHub is used by many open source projects, as well as by private companies for software development. It is also used by many educational institutions for teaching software engineering.

GitHub is free for open source projects, and has a paid subscription service for private projects. The paid service offers additional features, such as private repositories, unlimited collaborators, and support.

GitHub is a popular choice for software development because it is easy to use, has a large community of users, and offers a variety of features that make it well-suited for collaboration.
Question: can you tell me more about GitHub
Time consuming: 2.80s
Response from Model: GitHub is a web-based hosting service for software development projects that use the Git revision control system. It offers all of the distributed version control and source code management (SCM) functionality of Git, as well as a number of additional features such as issue tracking, project management, wikis, and code review.

GitHub is used by many open source projects, as well as by private companies for software development. It is also a popular platform for hosting personal projects.

GitHub is free for open source projects and for private projects with fewer than five users. For private projects with more than five users, GitHub offers a paid subscription plan.

GitHub is a powerful tool for software development. It can help teams to collaborate more effectively, track changes to code, and manage projects. It is also a great platform for hosting personal projects.
Question: what is the purpose of GitHub
Time consuming: 2.79s