Embedding#
Index
openai#
- class gptcache.embedding.openai.OpenAI(model: str = 'text-embedding-ada-002', api_key: Optional[str] = None, api_base: Optional[str] = None)[source]#
Bases:
gptcache.embedding.base.BaseEmbedding
Generate text embedding for given text using OpenAI.
- Parameters
model (str) – model name, defaults to ‘text-embedding-ada-002’.
api_key (str) – OpenAI API key, defaults to None.
api_base (str) – OpenAI API base URL, defaults to None.
Example
from gptcache.embedding import OpenAI

test_sentence = 'Hello, world.'
encoder = OpenAI(api_key='your_openai_key')
embed = encoder.to_embeddings(test_sentence)
- to_embeddings(data, **_)[source]#
Generate embedding given text input
- Parameters
data (str) – text in string.
- Returns
a text embedding in shape of (dim,).
- property dimension#
Embedding dimension.
- Returns
embedding dimension
cohere#
- class gptcache.embedding.cohere.Cohere(model: str = 'large', api_key: Optional[str] = None)[source]#
Bases:
gptcache.embedding.base.BaseEmbedding
Generate text embedding for given text using Cohere.
Example
from gptcache.embedding import Cohere

test_sentence = 'Hello, world.'
encoder = Cohere(model='small', api_key='your_cohere_key')
embed = encoder.to_embeddings(test_sentence)
- to_embeddings(data, **_)[source]#
Generate embedding given text input
- Parameters
data (str) – text in string.
- Returns
a text embedding in shape of (dim,).
- property dimension#
Embedding dimension.
- Returns
embedding dimension
sbert#
- class gptcache.embedding.sbert.SBERT(model: str = 'all-MiniLM-L6-v2')[source]#
Bases:
gptcache.embedding.base.BaseEmbedding
Generate sentence embedding for given text using pretrained models of Sentence Transformers.
- Parameters
model (str) – model name, defaults to ‘all-MiniLM-L6-v2’.
Example
from gptcache.embedding import SBERT

test_sentence = 'Hello, world.'
encoder = SBERT('all-MiniLM-L6-v2')
embed = encoder.to_embeddings(test_sentence)
- to_embeddings(data, **_)[source]#
Generate embedding given text input
- Parameters
data (str) – text in string.
- Returns
a text embedding in shape of (dim,).
- property dimension#
Embedding dimension.
- Returns
embedding dimension
fasttext#
- class gptcache.embedding.fasttext.FastText(model: str = 'en', dim: Optional[int] = None)[source]#
Bases:
gptcache.embedding.base.BaseEmbedding
Generate sentence embedding for given text using pretrained models of different languages from fastText.
- Parameters
model (str) – model name (language code), defaults to ‘en’.
dim (int) – reduced dimension of the embedding; keeps the model’s original dimension if None. Defaults to None.
Example
from gptcache.embedding import FastText

test_sentence = 'Hello, world.'
encoder = FastText(model='en', dim=100)
embed = encoder.to_embeddings(test_sentence)
- to_embeddings(data, **_)[source]#
Generate embedding given text input
- Parameters
data (str) – text in string.
- Returns
a text embedding in shape of (dim,).
- property dimension#
Embedding dimension.
- Returns
embedding dimension
base#
string#
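Every class on this page follows the same interface from gptcache.embedding.base.BaseEmbedding: a to_embeddings(data) method that returns a vector of shape (dim,) and a dimension property reporting its size. The sketch below illustrates that contract with a self-contained toy class; the hashing scheme is a stand-in, not a real embedding model, and the class does not import gptcache.

```python
# Illustrative only: a toy class mirroring the BaseEmbedding interface
# (to_embeddings + dimension). The SHA-256 trick is a stand-in for a
# real embedding model.
import hashlib


class ToyEmbedding:
    """Maps text to a fixed-size vector of floats in [0, 1]."""

    def __init__(self, dim: int = 8):
        self.__dimension = dim

    def to_embeddings(self, data: str, **_):
        # Derive `dim` deterministic pseudo-random floats from a
        # SHA-256 digest of the input text.
        digest = hashlib.sha256(data.encode("utf-8")).digest()
        return [digest[i % len(digest)] / 255.0 for i in range(self.__dimension)]

    @property
    def dimension(self) -> int:
        return self.__dimension


encoder = ToyEmbedding(dim=8)
embed = encoder.to_embeddings("Hello, world.")
assert len(embed) == encoder.dimension
```

Any object with this shape — to_embeddings returning a (dim,) vector plus a dimension property — can plug into the same places the built-in encoders do.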
vit#
- class gptcache.embedding.vit.ViT(model: str = 'google/vit-base-patch16-384')[source]#
Bases:
gptcache.embedding.base.BaseEmbedding
Generate sentence embedding for given text using pretrained models from Huggingface transformers.
- Parameters
model (str) – model name, defaults to ‘google/vit-base-patch16-384’.
Example
import io
from PIL import Image
from gptcache.embedding import ViT

def prepare_image(image_data=None):
    if not image_data:
        image_data = io.BytesIO()
        Image.new('RGB', (244, 244), color=(255, 0, 0)).save(image_data, format='JPEG')
        image_data.seek(0)
    image = Image.open(image_data)
    return image

image = prepare_image()
encoder = ViT(model='google/vit-base-patch16-384')
embed = encoder.to_embeddings(image)
- to_embeddings(data, **__)[source]#
Generate embedding given image input
- Parameters
data (PIL.Image) – image data.
- Returns
an image embedding in shape of (dim,).
- property dimension#
Embedding dimension.
- Returns
embedding dimension
langchain#
- class gptcache.embedding.langchain.LangChain(embeddings: langchain.embeddings.base.Embeddings, dimension: int = 0)[source]#
Bases:
gptcache.embedding.base.BaseEmbedding
Generate text embedding for given text using LangChain.
- Parameters
embeddings (Embeddings) – the LangChain Embeddings object.
dimension (int) – the embedding dimension. By default it is determined by calling the embeddings object once; if you already know the dimension, pass it here to skip that extra request. Defaults to 0.
Example
from langchain.embeddings.openai import OpenAIEmbeddings
from gptcache.embedding import LangChain

test_sentence = 'Hello, world.'
embeddings = OpenAIEmbeddings(model="your-embeddings-deployment-name")
encoder = LangChain(embeddings=embeddings)
embed = encoder.to_embeddings(test_sentence)
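The dimension parameter above trades one probe request for a constructor argument: when it is left at 0, the wrapper must embed a short string once just to learn the vector size. A minimal sketch of that trade-off, using a hypothetical FakeEmbeddings stand-in rather than a real LangChain embeddings object (embed_query is LangChain's query-embedding method, but the counting logic here is illustrative only):

```python
# Sketch of the `dimension` trade-off. FakeEmbeddings is a hypothetical
# stand-in; the real wrapper delegates to a LangChain Embeddings object.
class FakeEmbeddings:
    calls = 0  # counts pretend remote embedding requests

    def embed_query(self, text):
        FakeEmbeddings.calls += 1
        return [0.0] * 1536


class LangChainLike:
    """Mimics the documented behavior: if dimension is 0, probe once."""

    def __init__(self, embeddings, dimension: int = 0):
        self.embeddings = embeddings
        # No dimension supplied -> embed a short string once to learn it.
        self.__dimension = dimension or len(embeddings.embed_query("probe"))

    @property
    def dimension(self) -> int:
        return self.__dimension


probed = LangChainLike(FakeEmbeddings())              # costs one extra call
known = LangChainLike(FakeEmbeddings(), dimension=1536)  # costs none
```

With a paid embedding API, passing the known dimension avoids one billable request per encoder constructed.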
huggingface#
- class gptcache.embedding.huggingface.Huggingface(model: str = 'distilbert-base-uncased')[source]#
Bases:
gptcache.embedding.base.BaseEmbedding
Generate sentence embedding for given text using pretrained models from Huggingface transformers.
- Parameters
model (str) – model name, defaults to ‘distilbert-base-uncased’.
Example
from gptcache.embedding import Huggingface

test_sentence = 'Hello, world.'
encoder = Huggingface(model='distilbert-base-uncased')
embed = encoder.to_embeddings(test_sentence)

test_sentence = '什么是Github'  # 'What is GitHub?'
huggingface = Huggingface(model='uer/albert-base-chinese-cluecorpussmall')
embed = huggingface.to_embeddings(test_sentence)
- to_embeddings(data, **_)[source]#
Generate embedding given text input
- Parameters
data (str) – text in string.
- Returns
a text embedding in shape of (dim,).
- property dimension#
Embedding dimension.
- Returns
embedding dimension
data2vec#
- class gptcache.embedding.data2vec.Data2VecAudio(model_name='facebook/data2vec-audio-base-960h')[source]#
Bases:
gptcache.embedding.base.BaseEmbedding
Generate audio embedding for given audio using pretrained models from Data2Vec.
- Parameters
model_name (str) – model name, defaults to ‘facebook/data2vec-audio-base-960h’.
Example
from gptcache.embedding import Data2VecAudio

audio_file = 'test.wav'
encoder = Data2VecAudio(model_name='facebook/data2vec-audio-base-960h')
embed = encoder.to_embeddings(audio_file)
- to_embeddings(data, **_)[source]#
Generate embedding given audio input
- Parameters
data (str) – path to the audio file.
- Returns
an audio embedding in shape of (dim,).
- property dimension#
Embedding dimension.
- Returns
embedding dimension
uform#
- class gptcache.embedding.uform.UForm(model: Union[str, uform.TritonClient] = 'unum-cloud/uform-vl-english', embedding_type: str = 'text')[source]#
Bases:
gptcache.embedding.base.BaseEmbedding
Generate multi-modal embeddings using pretrained models from UForm.
- Parameters
model (str) – model name, defaults to ‘unum-cloud/uform-vl-english’.
embedding_type (str) – type of embedding to generate, ‘text’ or ‘image’, defaults to ‘text’.
Example
from gptcache.embedding import UForm

test_sentence = 'Hello, world.'
encoder = UForm(model='unum-cloud/uform-vl-english')
embed = encoder.to_embeddings(test_sentence)

test_sentence = '什么是Github'  # 'What is GitHub?'
encoder = UForm(model='unum-cloud/uform-vl-multilingual')
embed = encoder.to_embeddings(test_sentence)
- to_embeddings(data: Any, **_)[source]#
Generate embedding given text input or a path to a file.
- Parameters
data (str) – text in string, or a path to an image file.
- Returns
an embedding in shape of (dim,).
- property dimension#
Embedding dimension.
- Returns
embedding dimension
onnx#
- class gptcache.embedding.onnx.Onnx(model='GPTCache/paraphrase-albert-onnx')[source]#
Bases:
gptcache.embedding.base.BaseEmbedding
Generate text embedding for given text using ONNX Model.
Example
from gptcache.embedding import Onnx

test_sentence = 'Hello, world.'
encoder = Onnx(model='GPTCache/paraphrase-albert-onnx')
embed = encoder.to_embeddings(test_sentence)
- to_embeddings(data, **_)[source]#
Generate embedding given text input.
- Parameters
data (str) – text in string.
- Returns
a text embedding in shape of (dim,).
- property dimension#
Embedding dimension.
- Returns
embedding dimension
rwkv#
- class gptcache.embedding.rwkv.Rwkv(model: str = 'sgugger/rwkv-430M-pile')[source]#
Bases:
gptcache.embedding.base.BaseEmbedding
Generate sentence embedding for given text using RWKV models.
- Parameters
model (str) – model name, defaults to ‘sgugger/rwkv-430M-pile’. Check https://huggingface.co/docs/transformers/model_doc/rwkv for more available models.
Example
from gptcache.embedding import Rwkv

test_sentence = 'Hello, world.'
encoder = Rwkv(model='sgugger/rwkv-430M-pile')
embed = encoder.to_embeddings(test_sentence)
- to_embeddings(data, **_)[source]#
Generate embedding given text input
- Parameters
data (str) – text in string.
- Returns
a text embedding in shape of (dim,).
- property dimension#
Embedding dimension.
- Returns
embedding dimension
timm#
- class gptcache.embedding.timm.Timm(model: str = 'resnet18', device: str = 'default')[source]#
Bases:
gptcache.embedding.base.BaseEmbedding
Generate image embedding for given image using pretrained models from Timm.
- Parameters
model (str) – model name, defaults to ‘resnet18’.
device (str) – device to run the model on, defaults to ‘default’.
Example
from gptcache.embedding import Timm

encoder = Timm(model='resnet50')
embed = encoder.to_embeddings('path/to/image')
- to_embeddings(data, skip_preprocess: bool = False, **_)[source]#
Generate embedding given image data
- Parameters
data – path to the image file, or a preprocessed image tensor when skip_preprocess is True.
skip_preprocess (bool) – whether data is already a preprocessed tensor, defaults to False.
- Returns
an image embedding in shape of (dim,).
- preprocess(image_path)[source]#
Load image from path and then transform image to torch.tensor with model transformations.
- Parameters
image_path (str) – image path.
- Returns
an image tensor (without batch size).
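Exposing preprocess separately lets a caller transform an image once and reuse the tensor across calls via skip_preprocess. A toy sketch of that split (illustrative only; the real Timm class loads the image, applies the model's transforms, and runs a timm network — the arithmetic below is a stand-in):

```python
# Toy sketch of the preprocess / skip_preprocess split. TimmLike is a
# hypothetical stand-in for the real image encoder.
class TimmLike:
    def preprocess(self, image_path: str):
        # Stand-in for "load the image and transform it to a tensor".
        return [float(len(image_path))]

    def to_embeddings(self, data, skip_preprocess: bool = False, **_):
        if not skip_preprocess:
            # Accept a raw path and preprocess it on the caller's behalf.
            data = self.preprocess(data)
        # Stand-in for the model forward pass.
        return [x * 2 for x in data]


encoder = TimmLike()
tensor = encoder.preprocess('path/to/image')
# Both calls produce the same embedding; the second skips reloading.
a = encoder.to_embeddings('path/to/image')
b = encoder.to_embeddings(tensor, skip_preprocess=True)
assert a == b
```

This pattern is useful when the same image is embedded repeatedly (e.g. while tuning a similarity threshold), since the expensive load-and-transform step runs only once.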
- property dimension#
Embedding dimension.
- Returns
embedding dimension
paddlenlp#
- class gptcache.embedding.paddlenlp.PaddleNLP(model: str = 'ernie-3.0-medium-zh')[source]#
Bases:
gptcache.embedding.base.BaseEmbedding
Generate sentence embedding for given text using pretrained models from PaddleNLP transformers.
- Parameters
model (str) – model name, defaults to ‘ernie-3.0-medium-zh’.
Example
from gptcache.embedding import PaddleNLP

test_sentence = 'Hello, world.'
encoder = PaddleNLP(model='ernie-3.0-medium-zh')
embed = encoder.to_embeddings(test_sentence)
- to_embeddings(data, **_)[source]#
Generate embedding given text input
- Parameters
data (str) – text in string.
- Returns
a text embedding in shape of (dim,).
- property dimension#
Embedding dimension.
- Returns
embedding dimension