Manager#
Index
vector_data.base#
- class gptcache.manager.vector_data.base.VectorData(id: int, data: numpy.ndarray)[source]#
Bases:
object
- data: numpy.ndarray#
vector_data.docarray_index#
- class gptcache.manager.vector_data.docarray_index.DocarrayVectorData(*, id: int, data: docarray.typing.tensor.ndarray.NdArray)[source]#
Bases:
docarray.base_doc.doc.BaseDoc
Class representing a vector data element with an ID and associated data.
- data: docarray.typing.tensor.ndarray.NdArray#
- class gptcache.manager.vector_data.docarray_index.DocArrayIndex(index_file_path: str, top_k: int)[source]#
Bases:
gptcache.manager.vector_data.base.VectorBase
Class representing in-memory exact nearest neighbor index for vector search.
- Parameters
- mul_add(datas: List[gptcache.manager.vector_data.base.VectorData]) None [source]#
Add multiple vector data elements to the index.
- Parameters
datas – A list of vector data elements to be added.
- search(data: numpy.ndarray, top_k: int = - 1) Optional[List[Tuple[float, int]]] [source]#
Search for the nearest vector data elements in the index.
- Parameters
data – The query vector data.
top_k – The number of top matches to return.
- Returns
A list of tuples, each containing the match score and the ID of the matched vector data element.
- rebuild(ids: Optional[List[int]] = None) bool [source]#
In the case of DocArrayIndex, the rebuild operation is not needed.
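The exact nearest-neighbor search described above can be sketched in plain Python (a minimal illustration of the semantics, not the DocArrayIndex implementation; `l2` and `search` are hypothetical names):

```python
import math
from typing import Dict, List, Tuple

def l2(a: List[float], b: List[float]) -> float:
    # Euclidean (L2) distance between two vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def search(index: Dict[int, List[float]], query: List[float],
           top_k: int) -> List[Tuple[float, int]]:
    # Score every stored vector against the query, then keep the
    # top_k closest as (score, id) tuples -- the same shape as the
    # return value documented above.
    scored = sorted((l2(vec, query), vid) for vid, vec in index.items())
    return scored[:top_k]

index = {1: [0.0, 0.0], 2: [1.0, 1.0], 3: [5.0, 5.0]}
print(search(index, [0.9, 1.1], 2))  # id 2 is the closest match
```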
vector_data.manager#
- class gptcache.manager.vector_data.manager.VectorBase[source]#
Bases:
object
VectorBase to manage the vector base.
- Generate a specific VectorBase from the configuration. For example, settings for
Milvus (with host, port, user, password, secure, collection_name, index_params, search_params, local_mode, local_data params), Faiss (with index_path, dimension, top_k params), Chromadb (with top_k, client_settings, persist_directory, collection_name params), Hnswlib (with index_file_path, dimension, top_k, max_elements params), and pgvector (with url, collection_name, index_params, top_k, dimension params).
- Parameters
name (str) – the name of the vectorbase; it supports ‘milvus’, ‘faiss’, ‘chromadb’, ‘hnswlib’ and ‘pgvector’ now.
top_k (int) – the number of the vectors results to return, defaults to 1.
dimension (int) – the dimension of the vector, defaults to 0.
index_path (str) – the path to Faiss index, defaults to ‘faiss.index’.
host (str) – the host for Milvus vector database, defaults to ‘localhost’.
port (str) – the port for Milvus vector database, defaults to ‘19530’.
user (str) – the user for Zilliz Cloud, defaults to “”.
password (str) – the password for Zilliz Cloud, defaults to “”.
secure (bool) – whether to use https with Zilliz Cloud, defaults to False.
index_params (dict) – the index parameters for Milvus, defaults to the HNSW index: {‘metric_type’: ‘L2’, ‘index_type’: ‘HNSW’, ‘params’: {‘M’: 8, ‘efConstruction’: 64}}.
search_params (dict) – the search parameters for Milvus, defaults to None.
collection_name (str) – the name of the collection for Milvus vector database, defaults to ‘gptcache’.
local_mode (bool) – if true, will start a local milvus server.
local_data (str) – required when local_mode is True.
url (str) – the connection url for PostgreSQL database, defaults to ‘postgresql://postgres@localhost:5432/postgres’
index_params – the index parameters for pgvector.
collection_name – the prefix of the table for PostgreSQL pgvector, defaults to ‘gptcache’.
client_settings (Settings) – the setting for Chromadb.
persist_directory (str) – the directory to persist, defaults to ‘.chromadb/’ in the current directory.
index_path – the path to hnswlib index, defaults to ‘hnswlib_index.bin’.
max_elements (int) – the max_elements parameter of hnswlib, defaults to 100000.
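The dispatch performed by VectorBase can be pictured as a name-to-parameter-set lookup, mirroring the lists above (a stdlib sketch under assumed names, not the real factory; `SUPPORTED` and `check_params` are hypothetical):

```python
# Hypothetical registry mapping backend names to the keyword
# arguments each one accepts, taken from the parameter lists above.
SUPPORTED = {
    "milvus": {"host", "port", "user", "password", "secure", "collection_name",
               "index_params", "search_params", "local_mode", "local_data"},
    "faiss": {"index_path", "dimension", "top_k"},
    "chromadb": {"top_k", "client_settings", "persist_directory", "collection_name"},
    "hnswlib": {"index_file_path", "dimension", "top_k", "max_elements"},
}

def check_params(name: str, **kwargs) -> dict:
    # Reject unknown backends and unknown keyword arguments before
    # handing the configuration to the concrete store constructor.
    if name not in SUPPORTED:
        raise ValueError(f"unsupported vector store: {name}")
    unknown = set(kwargs) - SUPPORTED[name]
    if unknown:
        raise ValueError(f"unknown params for {name}: {sorted(unknown)}")
    return kwargs

print(check_params("faiss", index_path="faiss.index", dimension=128, top_k=1))
```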
vector_data.milvus#
- class gptcache.manager.vector_data.milvus.Milvus(host: str = 'localhost', port: str = '19530', user: str = '', password: str = '', secure: bool = False, collection_name: str = 'gptcache', dimension: int = 0, top_k: int = 1, index_params: Optional[dict] = None, search_params: Optional[dict] = None, local_mode: bool = False, local_data: str = './milvus_data')[source]#
Bases:
gptcache.manager.vector_data.base.VectorBase
vector store: Milvus
- Parameters
host (str) – the host for Milvus vector database, defaults to ‘localhost’.
port (str) – the port for Milvus vector database, defaults to ‘19530’.
user (str) – the user for Zilliz Cloud, defaults to “”.
password (str) – the password for Zilliz Cloud, defaults to “”.
secure (bool) – whether to use https with Zilliz Cloud, defaults to False.
collection_name (str) – the name of the collection for Milvus vector database, defaults to ‘gptcache’.
dimension (int) – the dimension of the vector, defaults to 0.
top_k (int) – the number of the vectors results to return, defaults to 1.
index_params (dict) – the index parameters for Milvus, defaults to the HNSW index: {‘metric_type’: ‘L2’, ‘index_type’: ‘HNSW’, ‘params’: {‘M’: 8, ‘efConstruction’: 64}}.
local_mode (bool) – if true, will start a local milvus server.
local_data (str) – required when local_mode is True.
- SEARCH_PARAM = {'ANNOY': {'metric_type': 'L2', 'params': {'search_k': 10}}, 'AUTOINDEX': {'metric_type': 'L2', 'params': {}}, 'HNSW': {'metric_type': 'L2', 'params': {'ef': 10}}, 'IVF_FLAT': {'metric_type': 'L2', 'params': {'nprobe': 10}}, 'IVF_HNSW': {'metric_type': 'L2', 'params': {'ef': 10, 'nprobe': 10}}, 'IVF_PQ': {'metric_type': 'L2', 'params': {'nprobe': 10}}, 'IVF_SQ8': {'metric_type': 'L2', 'params': {'nprobe': 10}}, 'RHNSW_FLAT': {'metric_type': 'L2', 'params': {'ef': 10}}, 'RHNSW_PQ': {'metric_type': 'L2', 'params': {'ef': 10}}, 'RHNSW_SQ': {'metric_type': 'L2', 'params': {'ef': 10}}}#
- mul_add(datas: List[gptcache.manager.vector_data.base.VectorData])[source]#
- search(data: numpy.ndarray, top_k: int = - 1)[source]#
- update_embeddings(data_id: int, emb: numpy.ndarray)[source]#
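When search_params is not given, a default matching the configured index type is presumably taken from the SEARCH_PARAM table above; the lookup itself is a plain dict access. A sketch of that assumed fallback (not the verified Milvus store logic; `default_search_params` is a hypothetical name):

```python
# SEARCH_PARAM as documented above (subset shown for brevity).
SEARCH_PARAM = {
    "HNSW": {"metric_type": "L2", "params": {"ef": 10}},
    "IVF_FLAT": {"metric_type": "L2", "params": {"nprobe": 10}},
    "AUTOINDEX": {"metric_type": "L2", "params": {}},
}

def default_search_params(index_params: dict, search_params=None) -> dict:
    # Assumed behaviour: fall back to the table above, keyed by the
    # index type, when no explicit search_params are supplied.
    if search_params is not None:
        return search_params
    return SEARCH_PARAM[index_params["index_type"]]

hnsw_index = {"metric_type": "L2", "index_type": "HNSW",
              "params": {"M": 8, "efConstruction": 64}}
print(default_search_params(hnsw_index))
```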
vector_data.qdrant#
- class gptcache.manager.vector_data.qdrant.QdrantVectorStore(url: Optional[str] = None, port: Optional[int] = 6333, grpc_port: int = 6334, prefer_grpc: bool = False, https: Optional[bool] = None, api_key: Optional[str] = None, prefix: Optional[str] = None, timeout: Optional[float] = None, host: Optional[str] = None, collection_name: Optional[str] = 'gptcache', location: Optional[str] = './qdrant', dimension: int = 0, top_k: int = 1, flush_interval_sec: int = 5, index_params: Optional[dict] = None)[source]#
Bases:
gptcache.manager.vector_data.base.VectorBase
Qdrant Vector Store
- mul_add(datas: List[gptcache.manager.vector_data.base.VectorData])[source]#
- search(data: numpy.ndarray, top_k: int = - 1)[source]#
vector_data.pgvector#
- class gptcache.manager.vector_data.pgvector.PGVector(url: str, index_params: dict, collection_name: str = 'gptcache', dimension: int = 0, top_k: int = 1)[source]#
Bases:
gptcache.manager.vector_data.base.VectorBase
vector store: pgvector
- Parameters
url (str) – the connection url for PostgreSQL database, defaults to ‘postgresql://postgres@localhost:5432/postgres’.
dimension (int) – the dimension of the vector, defaults to 0.
top_k (int) – the number of the vectors results to return, defaults to 1.
index_params (dict) – the index parameters for pgvector, defaults to the ‘vector_l2_ops’ index: {“index_type”: “L2”, “params”: {“lists”: 100, “probes”: 10}}.
- INDEX_PARAM = {'L2': {'name': 'vector_l2_ops', 'operator': '<->'}, 'cosine': {'name': 'vector_cosine_ops', 'operator': '<=>'}, 'inner_product': {'name': 'vector_ip_ops', 'operator': '<->'}}#
- mul_add(datas: List[gptcache.manager.vector_data.base.VectorData])[source]#
- search(data: numpy.ndarray, top_k: int = - 1)[source]#
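The INDEX_PARAM table above pairs each metric with the pgvector distance operator used when ordering search results. A hedged sketch of how such a query string might be assembled (the table and column names here are assumptions, not the store's actual schema):

```python
# INDEX_PARAM as documented above.
INDEX_PARAM = {
    "L2": {"name": "vector_l2_ops", "operator": "<->"},
    "cosine": {"name": "vector_cosine_ops", "operator": "<=>"},
    "inner_product": {"name": "vector_ip_ops", "operator": "<->"},
}

def build_search_sql(index_type: str, table: str, top_k: int) -> str:
    # Order rows by distance to the query vector, using the operator
    # that matches the configured index type.
    op = INDEX_PARAM[index_type]["operator"]
    return (f"SELECT id, embedding {op} %(query)s AS distance "
            f"FROM {table} ORDER BY distance LIMIT {top_k}")

print(build_search_sql("L2", "gptcache_collection", 1))
```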
vector_data.faiss#
- class gptcache.manager.vector_data.faiss.Faiss(index_file_path, dimension, top_k)[source]#
Bases:
gptcache.manager.vector_data.base.VectorBase
vector store: Faiss
- Parameters
- mul_add(datas: List[gptcache.manager.vector_data.base.VectorData])[source]#
- search(data: numpy.ndarray, top_k: int = - 1)[source]#
vector_data.redis_vectorstore#
- class gptcache.manager.vector_data.redis_vectorstore.RedisVectorStore(host: str = 'localhost', port: str = '6379', username: str = '', password: str = '', dimension: int = 0, collection_name: str = 'gptcache', top_k: int = 1, namespace: str = '')[source]#
Bases:
gptcache.manager.vector_data.base.VectorBase
vector store: Redis
- Parameters
host (str) – redis host, defaults to “localhost”.
port (str) – redis port, defaults to “6379”.
username (str) – redis username, defaults to “”.
password (str) – redis password, defaults to “”.
dimension (int) – the dimension of the vector, defaults to 0.
collection_name (str) – the name of the index for Redis, defaults to “gptcache”.
top_k (int) – the number of the vectors results to return, defaults to 1.
Example
from gptcache.manager import VectorBase

vector_base = VectorBase("redis", dimension=10)
- mul_add(datas: List[gptcache.manager.vector_data.base.VectorData])[source]#
- search(data: numpy.ndarray, top_k: int = - 1)[source]#
vector_data.chroma#
- class gptcache.manager.vector_data.chroma.Chromadb(client_settings=None, persist_directory=None, collection_name: str = 'gptcache', top_k: int = 1)[source]#
Bases:
gptcache.manager.vector_data.base.VectorBase
vector store: Chromadb
- Parameters
client_settings (Settings) – the setting for Chromadb.
persist_directory (str) – the directory to persist, defaults to .chromadb/ in the current directory.
collection_name (str) – the name of the collection in Chromadb, defaults to ‘gptcache’.
top_k (int) – the number of the vectors results to return, defaults to 1.
- mul_add(datas: List[gptcache.manager.vector_data.base.VectorData])[source]#
- update_embeddings(data_id: str, emb: numpy.ndarray)[source]#
vector_data.hnswlib_store#
- class gptcache.manager.vector_data.hnswlib_store.Hnswlib(index_file_path: str, dimension: int, top_k: int, max_elements: int)[source]#
Bases:
gptcache.manager.vector_data.base.VectorBase
vector store: hnswlib
- Parameters
- add(key: int, data: numpy.ndarray)[source]#
- mul_add(datas: List[gptcache.manager.vector_data.base.VectorData])[source]#
- search(data: numpy.ndarray, top_k: int = - 1)[source]#
vector_data.usearch#
vector_data.weaviate#
scalar_data.sql_storage#
- gptcache.manager.scalar_data.sql_storage.get_models(table_prefix, db_type, table_len_config)[source]#
- class gptcache.manager.scalar_data.sql_storage.SQLStorage(db_type: str = 'sqlite', url: str = 'sqlite:///./sqlite.db', table_name: str = 'gptcache', table_len_config=None)[source]#
Bases:
gptcache.manager.scalar_data.base.CacheStorage
Using sqlalchemy to manage SQLite, PostgreSQL, MySQL, MariaDB, SQL Server and Oracle.
- Parameters
name (str) – the name of the cache storage; it supports ‘sqlite’, ‘postgresql’, ‘mysql’, ‘mariadb’, ‘sqlserver’ and ‘oracle’ now.
sql_url (str) – the url of the sql database for cache, such as ‘<db_type>+<db_driver>://<username>:<password>@<host>:<port>/<database>’. The default value depends on the cache store:
’sqlite:///./sqlite.db’ for ‘sqlite’,
’duckdb:///./duck.db’ for ‘duckdb’,
’postgresql+psycopg2://postgres:123456@127.0.0.1:5432/postgres’ for ‘postgresql’,
’mysql+pymysql://root:123456@127.0.0.1:3306/mysql’ for ‘mysql’,
’mariadb+pymysql://root:123456@127.0.0.1:3307/mysql’ for ‘mariadb’,
’mssql+pyodbc://sa:Strongpsw_123@127.0.0.1:1434/msdb?driver=ODBC+Driver+17+for+SQL+Server’ for ‘sqlserver’,
’oracle+cx_oracle://oracle:123456@127.0.0.1:1521/?service_name=helowin&encoding=UTF-8&nencoding=UTF-8’ for ‘oracle’.
table_name (str) – the table name for sql database, defaults to ‘gptcache’.
- batch_insert(all_data: List[gptcache.manager.scalar_data.base.CacheData])[source]#
- get_data_by_id(key: int) Optional[gptcache.manager.scalar_data.base.CacheData] [source]#
scalar_data.base#
- class gptcache.manager.scalar_data.base.DataType(value)[source]#
Bases:
enum.IntEnum
An enumeration.
- STR = 0#
- IMAGE_BASE64 = 1#
- IMAGE_URL = 2#
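The enum values above can be reproduced with the standard library for reference (a standalone copy for illustration, not an import from gptcache):

```python
from enum import IntEnum

class DataType(IntEnum):
    # Mirrors gptcache.manager.scalar_data.base.DataType.
    STR = 0
    IMAGE_BASE64 = 1
    IMAGE_URL = 2

# IntEnum members compare equal to plain ints, which is why the
# Answer constructor below can default answer_type to DataType.STR.
print(DataType.STR == 0)  # True
```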
- class gptcache.manager.scalar_data.base.Answer(answer: Any, answer_type: int = DataType.STR)[source]#
Bases:
object
- data_type: 0: str, 1: base64 image
- answer: Any#
- class gptcache.manager.scalar_data.base.QuestionDep(name: str, data: str, dep_type: int = DataType.STR)[source]#
Bases:
object
- class gptcache.manager.scalar_data.base.Question(content: str, deps: Optional[List[gptcache.manager.scalar_data.base.QuestionDep]] = None)[source]#
Bases:
object
- deps: Optional[List[gptcache.manager.scalar_data.base.QuestionDep]] = None#
- class gptcache.manager.scalar_data.base.CacheData(question, answers, embedding_data=None, session_id=None, create_on=None, last_access=None)[source]#
Bases:
object
- question: Union[str, gptcache.manager.scalar_data.base.Question]#
- answers: List[gptcache.manager.scalar_data.base.Answer]#
- embedding_data: Optional[numpy.ndarray] = None#
- create_on: Optional[datetime.datetime] = None#
- last_access: Optional[datetime.datetime] = None#
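The fields listed above describe a simple record type; a minimal stand-in built with dataclasses shows the shape (a sketch of the structure, not the gptcache classes themselves):

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, List, Optional, Union

@dataclass
class Answer:
    # answer_type follows DataType: 0 = STR by default.
    answer: Any
    answer_type: int = 0

@dataclass
class CacheData:
    # Mirrors the documented fields: a question (plain string or a
    # Question object), a list of answers, and optional metadata.
    question: Union[str, Any]
    answers: List[Answer]
    embedding_data: Optional[Any] = None
    session_id: Optional[str] = None
    create_on: Optional[datetime] = None
    last_access: Optional[datetime] = None

record = CacheData(question="hello", answers=[Answer("hi")])
print(record.answers[0].answer)  # hi
```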
- class gptcache.manager.scalar_data.base.CacheStorage[source]#
Bases:
object
BaseStorage for scalar data.
- abstract batch_insert(all_data: List[gptcache.manager.scalar_data.base.CacheData])[source]#
scalar_data.manager#
- class gptcache.manager.scalar_data.manager.CacheBase[source]#
Bases:
object
CacheBase to manage the cache storage.
- Generate a specific CacheStorage from the configuration. For example, settings for
SQLDataBase (with name, sql_url and table_name params) to manage SQLite, PostgreSQL, MySQL, MariaDB, SQL Server and Oracle.
- Parameters
name (str) – the name of the cache storage; it supports ‘sqlite’, ‘postgresql’, ‘mysql’, ‘mariadb’, ‘sqlserver’ and ‘oracle’ now.
sql_url (str) –
the url of the sql database for cache, such as ‘<db_type>+<db_driver>://<username>:<password>@<host>:<port>/<database>’. The default value depends on the cache store:
’sqlite:///./sqlite.db’ for ‘sqlite’,
’duckdb:///./duck.db’ for ‘duckdb’,
’postgresql+psycopg2://postgres:123456@127.0.0.1:5432/postgres’ for ‘postgresql’,
’mysql+pymysql://root:123456@127.0.0.1:3306/mysql’ for ‘mysql’,
’mariadb+pymysql://root:123456@127.0.0.1:3307/mysql’ for ‘mariadb’,
’mssql+pyodbc://sa:Strongpsw_123@127.0.0.1:1434/msdb?driver=ODBC+Driver+17+for+SQL+Server’ for ‘sqlserver’,
’oracle+cx_oracle://oracle:123456@127.0.0.1:1521/?service_name=helowin&encoding=UTF-8&nencoding=UTF-8’ for ‘oracle’.
table_name (str) – the table name for sql database, defaults to ‘gptcache’.
table_len_config (dict) –
the table length config for sql database, defaults to {}. The keys include:
’question_question’: the question column size in the question table, defaults to 3000.
’answer_answer’: the answer column size in the answer table, defaults to 3000.
’session_id’: the session id column size in the session table, defaults to 1000.
’dep_name’: the name column size in the dep table, defaults to 1000.
’dep_data’: the data column size in the dep table, defaults to 3000.
- Returns
CacheStorage.
Example
from gptcache.manager import CacheBase

cache_base = CacheBase('sqlite')
scalar_data.redis_storage#
- gptcache.manager.scalar_data.redis_storage.get_models(global_key: str, redis_connection: redis.client.Redis)[source]#
Get all the models for the given global key and redis connection.
- Parameters
global_key (str) – global key that will be used as a prefix for all the keys.
redis_connection (Redis) – Redis connection to use for all the models. Note: this needs to be explicitly mentioned in the Meta class for each Object Model, otherwise it will use the default connection from the pool.
- class gptcache.manager.scalar_data.redis_storage.RedisCacheStorage(global_key_prefix='gptcache', host: str = 'localhost', port: int = 6379, **kwargs)[source]#
Bases:
gptcache.manager.scalar_data.base.CacheStorage
Using redis-om as the OM to store data in the redis cache storage.
- Parameters
host (str) – redis host, defaults to ‘localhost’.
port (int) – redis port, defaults to 6379.
global_key_prefix (str) – a global prefix for keys against which data is stored. For example, with global_key_prefix=’gptcache’, keys would look like: gptcache:questions:abc123.
kwargs – additional parameters to provide in order to create the redis-om connection.
Example
from gptcache.manager import CacheBase, manager_factory

cache_store = CacheBase('redis',
                        redis_host="localhost",
                        redis_port=6379,
                        global_key_prefix="gptcache")
# or
data_manager = manager_factory("redis,faiss",
                               data_dir="./workspace",
                               scalar_params={"redis_host": "localhost",
                                              "redis_port": 6379,
                                              "global_key_prefix": "gptcache"},
                               vector_params={"dimension": 128})
- batch_insert(all_data: List[gptcache.manager.scalar_data.base.CacheData])[source]#
scalar_data.mongo#
- class gptcache.manager.scalar_data.mongo.MongoStorage(host: str = 'localhost', port: int = 27017, dbname: str = 'gptcache', username: Optional[str] = None, password: Optional[str] = None, **kwargs)[source]#
Bases:
gptcache.manager.scalar_data.base.CacheStorage
Using mongoengine as the ORM to manage mongodb documents. By default, data is stored in the ‘gptcache’ database and the following collections are created to store the data:
‘sessions’
‘answers’
‘questions’
‘question_deps’
- Parameters
host (str) – mongodb host, defaults to ‘localhost’.
port (int) – mongodb port, defaults to 27017.
dbname (str) – database name, defaults to ‘gptcache’.
username (str) – username for authentication, defaults to None.
password (str) – password for authentication, defaults to None.
Example
from gptcache.manager import CacheBase, manager_factory

cache_store = CacheBase('mongo',
                        mongo_host="localhost",
                        mongo_port=27017,
                        dbname="gptcache",
                        username=None,
                        password=None)
# or
data_manager = manager_factory("mongo,faiss",
                               data_dir="./workspace",
                               scalar_params={"mongo_host": "localhost",
                                              "mongo_port": 27017,
                                              "dbname": "gptcache",
                                              "username": "",
                                              "password": ""},
                               vector_params={"dimension": 128})
- batch_insert(all_data: List[gptcache.manager.scalar_data.base.CacheData])[source]#
- get_data_by_id(key) Optional[gptcache.manager.scalar_data.base.CacheData] [source]#
eviction.base#
eviction.manager#
eviction.memory_cache#
data_manager#
- class gptcache.manager.data_manager.DataManager[source]#
Bases:
object
DataManager manages the cache data, including save and search.
- abstract import_data(questions: List[Any], answers: List[Any], embedding_datas: List[Any], session_ids: List[Optional[str]])[source]#
- abstract get_scalar_data(res_data, **kwargs) gptcache.manager.scalar_data.base.CacheData [source]#
- abstract search(embedding_data, **kwargs)[source]#
search the data in the cache store according to the embedding data
- Returns
a list of search result, [[score, id], [score, id], …]
- class gptcache.manager.data_manager.MapDataManager(data_path, max_size, get_data_container=None)[source]#
Bases:
gptcache.manager.data_manager.DataManager
MapDataManager stores all data in a map data structure.
- Parameters
Example
from gptcache.manager import get_data_manager

data_manager = get_data_manager("data_map.txt", 1000)
- import_data(questions: List[Any], answers: List[Any], embedding_datas: List[Any], session_ids: List[Optional[str]])[source]#
- get_scalar_data(res_data, **kwargs) gptcache.manager.scalar_data.base.CacheData [source]#
- class gptcache.manager.data_manager.SSDataManager(s: gptcache.manager.scalar_data.base.CacheStorage, v: gptcache.manager.vector_data.base.VectorBase, o: Optional[gptcache.manager.object_data.base.ObjectBase], max_size, clean_size, policy='LRU')[source]#
Bases:
gptcache.manager.data_manager.DataManager
Generate SSDataManager to manage the data.
- Parameters
s (CacheStorage) – CacheStorage to manage the scalar data, it can be generated with gptcache.manager.CacheBase().
v (VectorBase) – VectorBase to manage the vector data, it can be generated with gptcache.manager.VectorBase().
o (ObjectBase) – ObjectBase to manage the object data, it can be generated with gptcache.manager.ObjectBase().
max_size (int) – the max size for the cache, defaults to 1000.
clean_size (int) – the size to clean up, defaults to max_size * 0.2.
policy (str) – the eviction policy; it supports “LRU” and “FIFO” now, and defaults to “LRU”.
- save(question, answer, embedding_data, **kwargs)[source]#
Save the data and vectors to cache and vector storage.
- Parameters
Example
import numpy as np
from gptcache.manager import get_data_manager, CacheBase, VectorBase

data_manager = get_data_manager(CacheBase('sqlite'), VectorBase('faiss', dimension=128))
data_manager.save('hello', 'hi', np.random.random((128, )).astype('float32'))
- import_data(questions: List[Any], answers: List[gptcache.manager.scalar_data.base.Answer], embedding_datas: List[Any], session_ids: List[Optional[str]])[source]#
- get_scalar_data(res_data, **kwargs) Optional[gptcache.manager.scalar_data.base.CacheData] [source]#
- search(embedding_data, **kwargs)[source]#
search the data in the cache store according to the embedding data
- Returns
a list of search result, [[score, id], [score, id], …]
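Search results come back as a list of [score, id] pairs; for a distance metric such as L2, the smallest score is the closest match. A minimal stdlib sketch of picking the best hit (`best_hit` is a hypothetical helper, not part of gptcache):

```python
from typing import List, Optional, Tuple

def best_hit(results: List[Tuple[float, int]]) -> Optional[Tuple[float, int]]:
    # results is a list of (score, id) pairs as documented above.
    # For distance metrics such as L2, a smaller score is closer.
    if not results:
        return None
    return min(results, key=lambda pair: pair[0])

print(best_hit([(0.42, 7), (0.05, 3), (1.3, 9)]))  # (0.05, 3)
```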
object_data.base#
object_data.manager#
- class gptcache.manager.object_data.manager.ObjectBase[source]#
Bases:
object
ObjectBase to manage the object storage.
- Generate a specific ObjectStorage from the configuration. For example, settings for
ObjectBase (with name) to manage LocalObjectStorage or S3 object storage.
- Parameters
- Returns
ObjectStorage.
Example
from gptcache.manager import ObjectBase

obj_storage = ObjectBase('local', path='./')
object_data.s3_storage#
object_data.local_storage#
- class gptcache.manager.object_data.local_storage.LocalObjectStorage(local_root: str)[source]#
Bases:
gptcache.manager.object_data.base.ObjectBase
Local object storage
eviction.eviction_manager#
- class gptcache.manager.eviction_manager.EvictionManager(scalar_storage, vector_base)[source]#
Bases:
object
EvictionManager to manage the eviction policy.
- Parameters
scalar_storage (CacheStorage) – CacheStorage to manage the scalar data.
vector_base (VectorBase) – VectorBase to manage the vector data.
- MAX_MARK_COUNT = 5000#
- MAX_MARK_RATE = 0.1#
- BATCH_SIZE = 100000#
- REBUILD_CONDITION = 5#
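The class constants hint at when soft-deleted entries trigger cleanup: one plausible reading is that deletions are tolerated until they exceed either an absolute cap (MAX_MARK_COUNT) or a fraction of the total data (MAX_MARK_RATE). This is an assumption about the policy, sketched below, not the verified implementation (`check_evict` is a hypothetical name):

```python
# Constants as documented above.
MAX_MARK_COUNT = 5000
MAX_MARK_RATE = 0.1

def check_evict(total_count: int, mark_count: int) -> bool:
    # Assumed policy: trigger cleanup once soft-deleted ("marked")
    # entries exceed the absolute cap, or exceed MAX_MARK_RATE of
    # the total stored data.
    return (mark_count > MAX_MARK_COUNT
            or mark_count > total_count * MAX_MARK_RATE)

print(check_evict(100, 20))  # True: 20 marked out of 100 exceeds 10%
```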
factory#
- gptcache.manager.factory.manager_factory(manager='map', data_dir='./', max_size=1000, clean_size=None, eviction: str = 'LRU', get_data_container: Optional[Callable] = None, scalar_params=None, vector_params=None, object_params=None)[source]#
- Factory of DataManager.
By using this factory method, you only need to specify the root directory of the data, and it can automatically manage all the local files.
- Parameters
manager (str) – the type of DataManager. Supports: “map”, or “{scalar_name},{vector_name}”, or “{scalar_name},{vector_name},{object_name}”.
data_dir (str) – Root path for data storage.
max_size (int) – the max size for the cache, defaults to 1000.
clean_size (int) – the size to clean up, defaults to max_size * 0.2.
eviction (str) – the eviction policy; it supports “LRU” and “FIFO” now, and defaults to “LRU”.
get_data_container (Callable) – a Callable to get the data container, defaults to None.
scalar_params (dict) – Params of scalar storage.
vector_params (dict) – Params of vector storage.
object_params (dict) – Params of object storage.
- Returns
SSDataManager or MapDataManager.
Example
from gptcache.manager import manager_factory

data_manager = manager_factory("sqlite,faiss", data_dir="./workspace", vector_params={"dimension": 128})
- gptcache.manager.factory.get_data_manager(cache_base: Optional[Union[gptcache.manager.scalar_data.CacheBase, str]] = None, vector_base: Optional[Union[gptcache.manager.vector_data.VectorBase, str]] = None, object_base: Optional[Union[gptcache.manager.object_data.ObjectBase, str]] = None, max_size: int = 1000, clean_size: Optional[int] = None, eviction: str = 'LRU', data_path: str = 'data_map.txt', get_data_container: Optional[Callable] = None)[source]#
- Generate SSDataManager (with cache_base, vector_base, max_size, clean_size and eviction params),
or MapDataManager (with data_path, max_size and get_data_container params) to manage the data.
- Parameters
cache_base (CacheBase or str) – a CacheBase object, or the name of the cache storage; it supports ‘sqlite’, ‘duckdb’, ‘postgresql’, ‘mysql’, ‘mariadb’, ‘sqlserver’ and ‘oracle’ now.
vector_base (VectorBase or str) – a VectorBase object, or the name of the vector storage; it supports ‘milvus’, ‘faiss’ and ‘chromadb’ now.
object_base (ObjectBase or str) – an ObjectBase object, or the name of the object storage; it supports local path and s3.
max_size (int) – the max size for the cache, defaults to 1000.
clean_size (int) – the size to clean up, defaults to max_size * 0.2.
eviction (str) – the eviction policy; it supports “LRU” and “FIFO” now, and defaults to “LRU”.
data_path (str) – the path to save the map data, defaults to ‘data_map.txt’.
get_data_container (Callable) – a Callable to get the data container, defaults to None.
- Returns
SSDataManager or MapDataManager.
Example
from gptcache.manager import get_data_manager, CacheBase, VectorBase

data_manager = get_data_manager(CacheBase('sqlite'), VectorBase('faiss', dimension=128))