Features

  • Support both normal and streaming OpenAI chat completion requests
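
For a streaming request, a minimal sketch through the GPTCache openai adapter; the chunk handling below assumes the adapter mirrors OpenAI's delta chunk format:

from gptcache.adapter import openai

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
)
for chunk in response:
    # each chunk is assumed to carry an OpenAI-style delta payload
    print(chunk["choices"][0].get("delta", {}).get("content", ""), end="")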

  • Get the top_k most similar search results; top_k can be set when creating the data manager, as shown below
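
For illustration, a data manager wired with a custom top_k; treating top_k as a keyword of the faiss VectorBase is an assumption based on common GPTCache usage:

from gptcache.manager import CacheBase, VectorBase, get_data_manager

data_manager = get_data_manager(
    CacheBase("sqlite"),
    # dimension must match the encoder's output; 128 is a placeholder
    VectorBase("faiss", dimension=128, top_k=5),
)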

  • Support the cache chain, see: Cache#next_cache

from gptcache import cache, Cache

bak_cache = Cache()  # fallback cache, consulted when the primary cache misses
bak_cache.init()
cache.init(next_cache=bak_cache)
  • Completely skip the current cache, i.e., neither search the cache nor save the ChatGPT results, see: Cache#cache_enable_func
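
A minimal sketch of a custom gate function; the length-based policy below is purely illustrative:

from gptcache import cache

def enable_cache_for_short_prompts(*args, **kwargs):
    # hypothetical policy: bypass the cache for long prompts
    messages = kwargs.get("messages") or []
    return sum(len(m.get("content", "")) for m in messages) < 500

cache.init(cache_enable_func=enable_cache_for_short_prompts)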

  • During the cache initialization (warm-up) phase, skip the cache search but still save the result returned by ChatGPT to the cache, see: cache_skip=True in the create request

from gptcache.adapter import openai

mock_messages = [{"role": "user", "content": "Hello, GPTCache"}]  # example payload

openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=mock_messages,
    cache_skip=True,  # skip the cache search, but still save the answer
)
  • Like Lego bricks, all modules can be custom-assembled (see the sketch after this list), including:

    • Adapter: The user interface that adapts different LLM requests to the GPTCache protocol

    • Pre-processor: Extracts the key information from the request and preprocesses it

    • Context Buffer: Maintains session context

    • Encoder: Embeds the text into a dense vector for similarity search

    • Cache manager: Searches, saves, and evicts data

    • Ranker: Evaluates similarity by judging the quality of cached answers

    • Post-processor: Determines which cached answers to return to the user and generates the response
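
Putting the bricks together, a sketch patterned on GPTCache's similar-search setup; the Onnx encoder, sqlite + faiss data manager, and distance-based ranker are one possible combination:

from gptcache import cache
from gptcache.embedding import Onnx
from gptcache.manager import CacheBase, VectorBase, get_data_manager
from gptcache.similarity_evaluation.distance import SearchDistanceEvaluation

onnx = Onnx()  # Encoder: embeds text into a dense vector
data_manager = get_data_manager(
    CacheBase("sqlite"),                            # scalar storage
    VectorBase("faiss", dimension=onnx.dimension),  # vector index
)
cache.init(
    embedding_func=onnx.to_embeddings,
    data_manager=data_manager,
    similarity_evaluation=SearchDistanceEvaluation(),  # Ranker
)
cache.set_openai_key()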