😍 Contributing to GPTCache#

Before contributing to GPTCache, it is recommended to read the usage doc and the example doc. These two articles introduce how to use GPTCache and explain the parameters of the related functions.

While contributing, pay close attention to parameter types, because no type restrictions are currently enforced.

Note that development MUST be based on the dev branch.

First check which part you want to contribute:

  • Add a method to pre-process the LLM request

  • Add a cache storage type

  • Add a vector store type

  • Add a new data manager

  • Add an embedding function

  • Add a similarity evaluation function

  • Add a method to post-process the cache answer list

  • Add a new process in handling ChatGPT requests

Lazy import and automatic installation#

For newly added third-party dependencies, lazy import and automatic installation are required. Implementation consists of the following steps:

  1. Lazy import

# In the __init__.py of the directory that contains the new file
__all__ = ['Milvus']

from gptcache.utils.lazy_import import LazyImport

milvus = LazyImport('milvus', globals(), 'gptcache.cache.vector_data.milvus')


def Milvus(**kwargs):
    return milvus.Milvus(**kwargs)
  2. Automatic installation

# 2.1 Add the import method
# Add the new method to utils/__init__.py
__all__ = ['import_pymilvus']

from gptcache.utils.dependency_control import prompt_install


def import_pymilvus():
    try:
        # pylint: disable=unused-import
        import pymilvus
    except ModuleNotFoundError as e:  # pragma: no cover
        prompt_install('pymilvus')
        import pymilvus  # pylint: disable=ungrouped-imports

# 2.2 Use the import method in your file
from gptcache.utils import import_pymilvus
import_pymilvus()
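
Putting the two parts together: the lazy wrapper defers the module import until the class is first constructed, and the lazily imported module calls the import method at its top, so the dependency is installed on demand the first time it is needed. Below is a minimal sketch of the lazily imported module, following the Milvus example above (the module body is illustrative, not the real implementation):

# gptcache/cache/vector_data/milvus.py, loaded only when Milvus(...) is first constructed
from gptcache.utils import import_pymilvus

import_pymilvus()  # prompts installation of pymilvus if it is missing

# Now safe: pymilvus is guaranteed to be installed
from pymilvus import connections  # pylint: disable=wrong-import-position


class Milvus:
    def __init__(self, **kwargs):
        # A real implementation would connect to a Milvus instance here
        ...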

Add a method to pre-process the LLM request#

Refer to the implementation of Pre.

  1. Make sure you understand the input params: data represents the original request dictionary object

  2. Implement the pre-processing function

  3. Add a usage example to the example directory and add the corresponding content to example.md and README.md

# The original openai request
import openai

openai.ChatCompletion.create(
  model="gpt-3.5-turbo",
  messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"},
        {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
        {"role": "user", "content": "Where was it played?"}
    ]
)

# This is the pre-process function for the openai request; it gets the content of the last message
def last_content(data, **_):
    return data.get("messages")[-1]["content"]
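
To use a custom pre-processing function, register it when initializing the cache. A minimal sketch, assuming the pre_embedding_func parameter of cache.init (check the current init signature for your version):

from gptcache import cache

# Register the pre-processing function defined above
cache.init(pre_embedding_func=last_content)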

Add a cache storage type#

Refer to the implementation of SQLDataBase.

  1. Implement the CacheStorage interface (an illustrative skeleton follows this list)

  2. Make sure the newly added third-party libraries are lazily imported and automatically installed

  3. Add the new store to the CacheBase method

  4. Add a usage example to the example directory and add the corresponding content to example.md and README.md
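
For orientation, a new store roughly looks like the skeleton below. The method names and import path are assumptions based on the current code base; the authoritative list of methods is the CacheStorage base class itself, so copy the exact signatures from there:

from gptcache.manager.scalar_data.base import CacheStorage  # path may differ by version


class MyCacheStorage(CacheStorage):
    """Illustrative skeleton only; take the exact signatures from CacheStorage."""

    def create(self):
        ...  # create the underlying tables / collections

    def batch_insert(self, all_data):
        ...  # persist question-answer records

    def get_data_by_id(self, key):
        ...  # fetch a single cached record

    def count(self):
        ...  # number of cached entries

    def close(self):
        ...  # release the connection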

Add a vector store type#

Refer to the implementation of Milvus.

  1. Implement the VectorBase interface (an illustrative skeleton follows this list)

  2. Make sure the newly added third-party libraries are lazily imported and automatically installed

  3. Add the new store to the VectorBase method

  4. Add a usage example to the example directory and add the corresponding content to example.md and README.md
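
As with the cache storage, the skeleton below is illustrative only; the method names and import path are assumptions, so take the exact interface from the VectorBase base class:

from gptcache.manager.vector_data.base import VectorBase  # path may differ by version


class MyVectorStore(VectorBase):
    """Illustrative skeleton only; take the exact signatures from VectorBase."""

    def mul_add(self, datas):
        ...  # insert embedding vectors

    def search(self, data, top_k=1):
        ...  # return the ids/distances of the nearest vectors

    def delete(self, ids):
        ...  # remove vectors by id

    def close(self):
        ...  # release the index / connection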

Add a new data manager#

Refer to the implementation of MapDataManager or SSDataManager.

  1. Implement the DataManager interface

  2. Add the new data manager to the get_data_manager method (see the wiring sketch after this list)

  3. Add a usage example to the example directory and add the corresponding content to example.md and README.md
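
The typical wiring looks like the hedged sketch below, assuming the factory-style CacheBase / VectorBase / get_data_manager helpers (names may differ across versions):

from gptcache.manager import CacheBase, VectorBase, get_data_manager

# Compose a data manager from a cache storage and a vector store;
# 'sqlite' and 'milvus' stand in for the newly added store names.
data_manager = get_data_manager(CacheBase('sqlite'),
                                VectorBase('milvus', dimension=128))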

Add an embedding function#

Refer to the implementation of cohere or openai.

  1. Add a new Python file to the embedding directory

  2. Make sure the newly added third-party libraries are lazily imported and automatically installed

  3. Implement the embedding function and make sure its output dimension is exposed (an illustrative skeleton follows this list)

  4. Add a usage example to the example directory and add the corresponding content to example.md and README.md
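
The key contract is the embedding call plus an explicit output dimension, which the vector store needs to size its index. An illustrative skeleton (the to_embeddings / dimension layout mirrors existing embeddings such as cohere and openai; verify against them):

import numpy as np


class MyEmbedding:
    """Illustrative skeleton of an embedding class."""

    def __init__(self):
        self.__dimension = 128  # must equal the real model's output size

    def to_embeddings(self, data, **_):
        # Convert the input text into a fixed-size vector;
        # a random vector stands in for a real model here.
        return np.random.rand(self.__dimension).astype('float32')

    @property
    def dimension(self):
        return self.__dimension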

Add a similarity evaluation function#

Refer to the implementation of SearchDistanceEvaluation or OnnxModelEvaluation.

  1. Implement the SimilarityEvaluation interface (an illustrative skeleton follows this list)

  2. Make sure of the range of the return value; the range method returns the min and max possible values

  3. Make sure you understand the input params of evaluation; you can learn more in the user view model, for example:

rank = chat_cache.evaluation_func({
    "question": pre_embedding_data,
    "embedding": embedding_data,
}, {
    "question": cache_question,
    "answer": cache_answer,
    "search_result": cache_data,
}, extra_param=context.get('evaluation', None))
  4. Make sure the newly added third-party libraries are lazily imported and automatically installed

  5. Implement the similarity evaluation function

  6. Add a usage example to the example directory and add the corresponding content to example.md and README.md
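
An illustrative skeleton that matches the call above: evaluation receives the request dictionary and the cached dictionary and returns a score, while range declares the min and max possible scores (the import path is an assumption; check the source tree):

from gptcache.similarity_evaluation import SimilarityEvaluation  # path may differ by version


class MyEvaluation(SimilarityEvaluation):
    """Illustrative skeleton only; see SearchDistanceEvaluation for a real example."""

    def evaluation(self, src_dict, cache_dict, **kwargs):
        # Compare the request ("question"/"embedding") with the cached
        # data and return a score inside the declared range.
        return 1.0 if src_dict["question"] == cache_dict["question"] else 0.0

    def range(self):
        # Min and max possible scores, used to interpret the threshold.
        return 0.0, 1.0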

Add a method to post-process the cache answer list#

Refer to the implementation of first or random_one.

  1. Make sure you understand the input params; you can learn more in the adapter

  2. Make sure the newly added third-party libraries are lazily imported and automatically installed

  3. Implement the post-processing function

  4. Add a usage example to the example directory and add the corresponding content to example.md and README.md

import random


# Get the most similar one from multiple results
def first(messages):
    return messages[0]


# Randomly fetch one of many results
def random_one(messages):
    return random.choice(messages)
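
As with pre-processing, register the post-processing function when initializing the cache. A minimal sketch, assuming the post_process_messages_func parameter of cache.init (check the current init signature for your version):

from gptcache import cache

# Register the post-processing function defined above
cache.init(post_process_messages_func=first)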

Add a new process in handling ChatGPT requests#

  1. Gain a clear understanding of the current process; refer to the adapter (a simplified sketch follows this list)

  2. Add a new process

  3. Make sure all examples work properly
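
For orientation only, request handling roughly follows the simplified sketch below. This is a rough paraphrase of the adapter, not the real code, and every name in it is illustrative:

# Simplified, illustrative sketch of how the adapter handles a request
def adapt(llm_handler, chat_cache, threshold, *args, **kwargs):
    pre_data = chat_cache.pre_embedding_func(kwargs)           # 1. pre-process the request
    embedding = chat_cache.embedding_func(pre_data)            # 2. embed the question
    candidates = chat_cache.data_manager.search(embedding)     # 3. search similar cached data
    for cache_data in candidates:                              # 4. rank every candidate
        rank = chat_cache.evaluation_func(
            {"question": pre_data, "embedding": embedding},
            cache_data,
        )
        if rank >= threshold:                                  # 5. cache hit: answer from cache
            return chat_cache.post_process_messages_func([cache_data])
    answer = llm_handler(*args, **kwargs)                      # 6. cache miss: call the LLM
    chat_cache.data_manager.save(pre_data, answer, embedding)  # 7. store the new answer
    return answer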