# Caching chat responses

This notebook shows how to use Vertex AI to answer questions and how to cache the responses for exact and similar matches with **gptcache**. The change is small: you only need one extra step to initialize the cache.


In [None]:
!pip install google-cloud-aiplatform

In [None]:
! pip install -q gptcache langchain

Collecting google-cloud-aiplatform
  Downloading google_cloud_aiplatform-1.27.0-py2.py3-none-any.whl (2.6 MB)
Collecting google-cloud-resource-manager<3.0.0dev,>=1.3.3 (from google-cloud-aiplatform)
  Downloading google_cloud_resource_manager-1.10.2-py2.py3-none-any.whl (321 kB)
Collecting shapely<2.0.0 (from google-cloud-aiplatform)
  Downloading Shapely-1.8.5.post1-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (2.0 MB)

## Authenticating and testing the VertexAI model

In [None]:
from google.colab import auth as google_auth
google_auth.authenticate_user()

import vertexai
from vertexai.preview.language_models import TextGenerationModel

def predict_large_language_model_sample(
    project_id: str,
    model_name: str,
    temperature: float,
    max_decode_steps: int,
    top_p: float,
    top_k: int,
    content: str,
    location: str = "us-central1",
    tuned_model_name: str = "",
) -> str:
    """Predict using a Large Language Model and return the response text."""
    vertexai.init(project=project_id, location=location)
    model = TextGenerationModel.from_pretrained(model_name)
    if tuned_model_name:
        model = model.get_tuned_model(tuned_model_name)
    response = model.predict(
        content,
        temperature=temperature,
        max_output_tokens=max_decode_steps,
        top_k=top_k,
        top_p=top_p,
    )
    print(f"Response from Model: {response.text}")
    # Return the text so callers can reuse the answer instead of only seeing it printed.
    return response.text


# The first argument is the project ID; replace "octo-t2sql" with your own.
_ = predict_large_language_model_sample("octo-t2sql", "text-bison@001", 0.2, 256, 0.8, 40, '''Give me ten interview questions for the role of software engineer''', "us-central1")

Response from Model: 1. What is your experience with project management?
2. What is your process for managing a project?
3. How do you handle unexpected challenges or roadblocks?
4. How do you communicate with stakeholders?
5. How do you measure the success of a project?
6. What are your strengths and weaknesses as a project manager?
7. What are your salary expectations?
8. What are your career goals?
9. What are your thoughts on the company's culture?
10. Why are you interested in this position?


Before running the example, make sure the first argument of `predict_large_language_model_sample` is your own `project_id`. You will be prompted to authenticate.

The following code then shows how to use gptcache and how much it speeds up responses. It consists of three parts:

1.   Usual way
2.   Exact Search
3.   Similar Search
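
To compare the three parts fairly, it helps to time every call the same way. The following is a minimal sketch, assuming a hypothetical helper `timed` and a stand-in function `slow_echo` that simulates model latency; neither is part of gptcache or Vertex AI.

```python
import time
from typing import Any, Callable, Tuple


def timed(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def slow_echo(text: str) -> str:
    """Hypothetical stand-in for a model call; sleeps briefly to simulate latency."""
    time.sleep(0.05)
    return text.upper()


answer, seconds = timed(slow_echo, "what's github?")
print(f"Answer: {answer}")
print(f"Time consuming: {seconds:.2f}s")
```

The same wrapper can be applied to an uncached call and to a cached call, so that the two timings are measured identically.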




## VertexAI API standard usage

In [None]:
import time

# def response_text(vertexai_resp):
#     return vertexai_resp['choices'][0]['message']['content']


question = "what's github?"

# VertexAI API original usage
start_time = time.time()
response = predict_large_language_model_sample("octo-t2sql", "text-bison@001", 0.2, 256, 0.8, 40, question, "us-central1")

print(f'Question: {question}')
print("Time consuming: {:.2f}s".format(time.time() - start_time))
# print(f'Answer: {response_text(response)}\n')

Response from Model: GitHub is a web-based hosting service for software development projects that use the Git revision control system. It offers all of the distributed version control and source code management (SCM) functionality of Git, as well as a graphical user interface (GUI) and web interface that make it easy to manage projects with multiple collaborators.

GitHub is used by many open source projects, as well as by private companies for software development. It is also used by individuals for personal projects.

GitHub is a popular choice for software development because it is easy to use, reliable, and secure. It also offers a number of features that make it a good choice for collaboration, including issue tracking, pull requests, and wikis.

If you are interested in learning more about GitHub, there are a number of resources available online. The GitHub website has a comprehensive help section, and there are also a number of books and articles available on the subject.
Questi

## VertexAI API + GPTCache using LangChain 🦜️🔗 (exact match cache)

Initialize the cache to run GPTCache and import `LangChainLLMs` from `gptcache.adapter.langchain_models`. This automatically uses the map data manager, which matches the exact prompt. For more details, refer to [build your cache](https://gptcache.readthedocs.io/en/dev/usage.html#build-your-cache).

If you then ask exactly the same question twice, the answer to the second question is served from the cache without calling the model again.
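
The idea behind an exact-match cache can be sketched in a few lines of plain Python. This is an illustration only, not how GPTCache is implemented: the class `ExactMatchCache` and the stand-in `fake_llm` are hypothetical names, and the prompt string itself is used as the lookup key.

```python
from typing import Callable, Dict, List


class ExactMatchCache:
    """Toy exact-match cache: the prompt string itself is the lookup key."""

    def __init__(self, llm: Callable[[str], str]) -> None:
        self._llm = llm
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def __call__(self, prompt: str) -> str:
        if prompt in self._store:
            self.hits += 1
            return self._store[prompt]
        self.misses += 1
        answer = self._llm(prompt)  # only reached on a cache miss
        self._store[prompt] = answer
        return answer


calls: List[str] = []


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for the model; records every real call."""
    calls.append(prompt)
    return f"answer to: {prompt}"


cached = ExactMatchCache(fake_llm)
first = cached("what's github?")
second = cached("what's github?")  # identical string, served from the cache
third = cached("What's GitHub?")   # different string, so the model is called again
print(f"model calls: {len(calls)}, hits: {cached.hits}, misses: {cached.misses}")
```

Note that even a change in capitalization produces a miss here, which is the limitation the similar search cache later in this notebook addresses.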

In [None]:
import time

from langchain.llms import VertexAI

# The following initialises the cache
# -------------------------------------------------
from gptcache import Cache
from gptcache.adapter.langchain_models import LangChainLLMs
from gptcache.processor.pre import get_prompt

llm = VertexAI()

# No data manager is passed, so the default map data manager (exact match) is used.
llm_cache = Cache()
llm_cache.init(
    pre_embedding_func=get_prompt,
)

cached_llm = LangChainLLMs(llm=llm)
# -------------------------------------------------

question = "What NFL team won the Super Bowl in the year Justin Bieber was born?"

# First call: on a fresh cache this is a miss, so the request goes through to the model.
before = time.time()
answer = cached_llm(prompt=question, cache_obj=llm_cache)
print(answer)
print("Read through Time Spent =", time.time() - before)

# Second call with the identical prompt: answered from the cache.
before = time.time()
answer = cached_llm(prompt=question, cache_obj=llm_cache)
print(answer)
print("Cache Hit Time Spent =", time.time() - before)

# for _ in range(2):
#     start_time = time.time()
#     response = predict_large_language_model_sample("octo-t2sql", "text-bison@001", 0.2, 256, 0.8, 40, question, "us-central1")
#     print(f'Question: {question}')
#     print("Time consuming: {:.2f}s".format(time.time() - start_time))
    # print(f'Answer: {response_text(response)}\n')

The New England Patriots won Super Bowl XXXIX in 2005, the year Justin Bieber was born.
Read through Time Spent = 0.0011386871337890625
The New England Patriots won Super Bowl XXXIX in 2005, the year Justin Bieber was born.
Cache Hit Time Spent = 0.0007178783416748047


## VertexAI API + GPTCache, similar search cache

Initialize the cache with `embedding_func` to generate embeddings for the text, `data_manager` to manage the cache data, and `similarity_evaluation` to evaluate similarity. For more details, refer to [build your cache](https://gptcache.readthedocs.io/en/dev/usage.html#build-your-cache).

Once the first of several similar questions has been answered, the answers to the subsequent questions can be retrieved from the cache without calling the model again.

How similar search works:

*   The similarity evaluator uses data from **Cache Storage and Vector Store** to score how similar the input request is to the requests already in the Vector Store
*   The request router returns the cached request that is most similar to the input request
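
The lookup described above can be sketched with a toy example. This is a simplified illustration under stated assumptions, not GPTCache's implementation: the bag-of-words `embed` function stands in for the ONNX embedding model, a plain list stands in for the vector store, and `SimilarityCache`, `fake_llm`, and the `0.5` threshold are hypothetical choices.

```python
import math
import re
from collections import Counter
from typing import Callable, List, Optional, Tuple


def embed(text: str) -> Counter:
    """Toy embedding (assumption): bag of lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SimilarityCache:
    """Return the answer of the most similar stored prompt above a threshold."""

    def __init__(self, llm: Callable[[str], str], threshold: float = 0.5) -> None:
        self._llm = llm
        self._threshold = threshold
        self._entries: List[Tuple[Counter, str]] = []  # (embedding, answer)

    def lookup(self, prompt: str) -> Optional[str]:
        query = embed(prompt)
        best_score, best_answer = 0.0, None
        for vector, answer in self._entries:
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self._threshold else None

    def __call__(self, prompt: str) -> str:
        hit = self.lookup(prompt)
        if hit is not None:
            return hit
        answer = self._llm(prompt)  # cache miss: ask the model and store the result
        self._entries.append((embed(prompt), answer))
        return answer


calls: List[str] = []


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for the model; records every real call."""
    calls.append(prompt)
    return f"answer to: {prompt}"


cache = SimilarityCache(fake_llm)
a1 = cache("what is github")
a2 = cache("can you explain what github is")  # similar enough: served from the cache
a3 = cache("how do I bake bread")             # unrelated: the model is called again
print(f"model calls: {len(calls)}")
```

A real embedding model captures meaning rather than shared words, so it can match paraphrases that this word-count sketch would miss; the threshold controls the trade-off between more cache hits and the risk of returning an answer to a different question.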



In [None]:
import time

from gptcache import Cache
from gptcache.embedding import Onnx
from gptcache.manager import CacheBase, VectorBase, get_data_manager
from gptcache.processor.pre import get_prompt
from gptcache.similarity_evaluation.distance import SearchDistanceEvaluation

print("Cache loading.....")

onnx = Onnx()
data_manager = get_data_manager(CacheBase("sqlite"), VectorBase("faiss", dimension=onnx.dimension))

# Use a separate cache object so the exact-match cache from the previous section is left untouched.
similar_cache = Cache()
similar_cache.init(
    pre_embedding_func=get_prompt,  # extract the prompt text from the LangChain call
    embedding_func=onnx.to_embeddings,
    data_manager=data_manager,
    similarity_evaluation=SearchDistanceEvaluation(),
)

questions = [
    "what's github",
    "can you explain what GitHub is",
    "can you tell me more about GitHub",
    "what is the purpose of GitHub"
]

# Route every question through `cached_llm` from the previous section so the cache is consulted:
# the first question reaches the model, and similar follow-ups can be served from the cache.
for question in questions:
    start_time = time.time()
    answer = cached_llm(prompt=question, cache_obj=similar_cache)
    print(f"Response from Model: {answer}")
    print(f"Question: {question}")
    print("Time consuming: {:.2f}s".format(time.time() - start_time))



Cache loading.....
Response from Model: GitHub is a web-based hosting service for software development projects that use the Git revision control system. It offers all of the distributed version control and source code management (SCM) functionality of Git, as well as a graphical user interface (GUI) and web interface, making it easy for teams to collaborate on software projects.

GitHub is used by many large organizations, including Google, Facebook, Amazon, and Microsoft. It is also popular with open source projects, such as the Linux kernel and the Apache web server.

GitHub is free for open source projects, but there is a paid subscription option for private projects. The paid subscription offers additional features, such as unlimited private repositories, priority support, and the ability to host private wikis and blogs.
Question: what's github
Time consuming: 2.41s
Response from Model: GitHub is a web-based hosting service for software development projects that use the Git revisi