Visual Question Answering#

This example shows how to use GPTCache and Replicate to implement question answering about images. It uses the BLIP model to answer free-form questions about images in natural language: Replicate returns the answer, and GPTCache caches the generated answer so that the next time the same or a similar question is asked about the image, the answer can be returned directly from the cache, improving efficiency and reducing costs.

This bootcamp is divided into three parts: initializing gptcache, running the Replicate model to get the answer, and starting the service with gradio. You can also try this example on Google Colab.

Initialize the gptcache#

Please install gptcache first, then we can initialize the cache. There are two ways to initialize the cache: the first uses a map cache (exact match cache) and the second uses a database cache (similar search cache). The second is recommended, but you have to install the related requirements.
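Which requirements you need depends on the path you take; below is a minimal install sketch in notebook style. Every package beyond gptcache itself is an assumption based on the components used later in this example, so check the gptcache documentation for the exact list.

! pip install gptcache
# assumed extras: the replicate client and gradio are used in parts 2 and 3
! pip install replicate gradio
# assumed extras for the similar search cache: Timm embedding, ONNX evaluation, Milvus client
! pip install timm torch onnxruntime transformers pymilvus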

Before running the example, make sure the REPLICATE_API_TOKEN environment variable is set by executing echo $REPLICATE_API_TOKEN. If it is not already set, it can be set with export REPLICATE_API_TOKEN=YOUR_API_TOKEN on Unix/Linux/macOS systems or set REPLICATE_API_TOKEN=YOUR_API_TOKEN on Windows systems.
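Alternatively, the token can be set from within Python, which is convenient in a notebook. A minimal sketch, where YOUR_API_TOKEN is a placeholder for your real token:

import os

# Set the token only if it is not already present in the environment.
# "YOUR_API_TOKEN" is a placeholder; replace it with your real Replicate token.
if not os.environ.get("REPLICATE_API_TOKEN"):
    os.environ["REPLICATE_API_TOKEN"] = "YOUR_API_TOKEN"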

1. Init for exact match cache#

cache.init is used to initialize gptcache; by default it uses a map to search for cached data. pre_embedding_func pre-processes the data inserted into the cache, and here it uses the get_input_str method. For more configuration options, refer to initialize Cache.

# from gptcache import cache
# from gptcache.processor.pre import get_input_str
# # init gptcache
# cache.init(pre_embedding_func=get_input_str)

2. Init for similar match cache#

When initializing gptcache, the following four parameters are configured:

  • pre_embedding_func:

    Pre-processing before extracting feature vectors; here it uses the get_input_image_file_name method.

  • embedding_func:

    The method used to extract the image feature vector; refer to gptcache.embedding for the available image embedding methods.

  • data_manager:

    DataManager for cache management. In this example it stores the image feature vector, the question, and the response answer. It uses Milvus as the vector store (please make sure it is started); you can also configure other vector storage, refer to VectorBase API.

  • similarity_evaluation:

    The evaluation method after a cache hit. It evaluates the similarity between the current question and the questions of cache hits. Here you can select ExactMatchEvaluation, OnnxModelEvaluation, or NumpyNormEvaluation from gptcache.similarity_evaluation.

from gptcache import cache
from gptcache.processor.pre import get_input_image_file_name

from gptcache.embedding import Timm
from gptcache.similarity_evaluation.onnx import OnnxModelEvaluation
from gptcache.manager import get_data_manager, CacheBase, VectorBase


# image embedding model; its output dimension sizes the Milvus collection
timm = Timm()
# sqlite stores the questions and answers; Milvus stores the image feature vectors
cache_base = CacheBase('sqlite')
vector_base = VectorBase('milvus', host='localhost', port='19530', dimension=timm.dimension)
data_manager = get_data_manager(cache_base, vector_base)

cache.init(
    pre_embedding_func=get_input_image_file_name,
    embedding_func=timm.to_embeddings,
    data_manager=data_manager,
    similarity_evaluation=OnnxModelEvaluation(),
)
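Since the Milvus server must already be running, a quick connectivity check can save debugging time. A minimal sketch using pymilvus (the client library the Milvus vector store relies on), assuming the same localhost:19530 address as above:

from pymilvus import connections

# Raises an exception if no Milvus server is reachable at localhost:19530.
connections.connect(host="localhost", port="19530")
print("Milvus is reachable")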

Run replicate blip#

Then run replicate.run, which uses the BLIP model to answer free-form questions about images in natural language.

Note that replicate here is imported from gptcache.adapter.replicate, so that requests go through gptcache and their answers are cached. Please download merlion.png before running the following code.

from gptcache.adapter import replicate

output = replicate.run(
    "andreasjansson/blip-2:4b32258c42e9efd4288bb9910bc532a69727f9acd26aa08e175713a0a857a608",
    input={
        "image": open("./merlion.png", "rb"),
        "question": "Which city is this photo taken?",
    },
)
print(output)
singapore
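To see the cache at work, ask a semantically similar question about the same image. Assuming the call above has populated the cache, the answer should now come from GPTCache instead of a fresh model run; a minimal sketch that also times the call:

import time

start = time.time()
# A similar question should hit the cache and return almost instantly.
output = replicate.run(
    "andreasjansson/blip-2:4b32258c42e9efd4288bb9910bc532a69727f9acd26aa08e175713a0a857a608",
    input={
        "image": open("./merlion.png", "rb"),
        "question": "In which city was this photo taken?",
    },
)
print(output, f"(answered in {time.time() - start:.2f}s)")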

Start with gradio#

Finally, we can start a gradio application to answer questions about images. First define the vqa method, then start the service with gradio, as shown below:

import gradio


def vqa(img, question):
    # Answer a question about the uploaded image; responses are cached by gptcache.
    output = replicate.run(
        "andreasjansson/blip-2:4b32258c42e9efd4288bb9910bc532a69727f9acd26aa08e175713a0a857a608",
        input={"image": open(img, "rb"), "question": question},
    )
    return output


interface = gradio.Interface(
    vqa,
    [gradio.Image(source="upload", type="filepath"), gradio.Textbox(label="Question")],
    gradio.Textbox(label="Answer"),
)

interface.launch(inline=True)