Similarity Evaluation#
kreciprocal#
- gptcache.similarity_evaluation.kreciprocal.euclidean_distance_calculate(vec_l: numpy.array, vec_r: numpy.array)[source]#
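A minimal usage sketch for this helper, under the assumption that it returns a scalar distance between the two vectors (the exact metric, plain or squared Euclidean, is not stated here):
import numpy as np
from gptcache.similarity_evaluation.kreciprocal import euclidean_distance_calculate

vec_l = np.array([1.0, 0.0, 0.0])
vec_r = np.array([0.0, 1.0, 0.0])
# Assumption: the return value is a non-negative float, later bounded to
# the [0, max_distance] range described below.
dist = euclidean_distance_calculate(vec_l, vec_r)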
- class gptcache.similarity_evaluation.kreciprocal.KReciprocalEvaluation(vectordb: gptcache.manager.vector_data.base.VectorBase, top_k: int = 3, max_distance: float = 4.0, positive: bool = False)[source]#
Bases:
gptcache.similarity_evaluation.distance.SearchDistanceEvaluation
Using the K-reciprocal relation to evaluate sentence pair similarity.
This evaluator borrows the popular reranking method K-reciprocal reranking for similarity evaluation. The K-reciprocal relation refers to the mutual nearest-neighbor relationship between two embeddings, where each embedding is among the K nearest neighbors of the other under a given distance metric. This evaluator checks whether the query embedding is among the candidate cache embedding's top_k nearest neighbors. If it is not, the pair is considered dissimilar. Otherwise, their distance is kept and passed on to a SearchDistanceEvaluation check. max_distance is used to bound this distance to the range [0, max_distance]. positive indicates whether this distance is directly proportional to the similarity of the two entities. If positive is set to False, the distance is subtracted from max_distance to get the final score.
- Parameters
vectordb (gptcache.manager.vector_data.base.VectorBase) – the vector database used to retrieve embeddings for testing the k-reciprocal relationship.
top_k (int) – for each retrieved candidate, this method tests whether the query is among the candidate's top-k nearest neighbors.
max_distance (float) – the upper bound of the distance.
positive (bool) – True if a larger distance indicates higher similarity between the two entities; otherwise False.
Example
from gptcache.similarity_evaluation import KReciprocalEvaluation
from gptcache.manager.vector_data.faiss import Faiss
from gptcache.manager.vector_data.base import VectorData
import numpy as np

faiss = Faiss('./none', 3, 10)
cached_data = np.array([0.57735027, 0.57735027, 0.57735027])
faiss.mul_add([VectorData(id=0, data=cached_data)])
evaluation = KReciprocalEvaluation(vectordb=faiss, top_k=2, max_distance=4.0, positive=False)
query = np.array([0.61396013, 0.55814557, 0.55814557])
score = evaluation.evaluation(
    {'question': 'question1', 'embedding': query},
    {'question': 'question2', 'embedding': cached_data}
)
- static normalize(vec: numpy.ndarray)[source]#
Normalize the input vector.
- Parameters
vec (numpy.array) – the numpy vector to be normalized.
- Returns
normalized vector.
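A small sketch of the static normalize helper; the assumption here is that it scales the vector to unit length (i.e. divides by its L2 norm), consistent with the docstring above:
import numpy as np
from gptcache.similarity_evaluation import KReciprocalEvaluation

vec = np.array([3.0, 4.0])
unit = KReciprocalEvaluation.normalize(vec)
# Expected (under the unit-length assumption): approximately [0.6, 0.8].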
sequence_match#
- gptcache.similarity_evaluation.sequence_match.euclidean_distance_calculate(vec_l: numpy.array, vec_r: numpy.array)[source]#
- class gptcache.similarity_evaluation.sequence_match.SequenceMatchEvaluation(weights: List[float], embedding_extractor: str, embedding_config=None)[source]#
Bases:
gptcache.similarity_evaluation.similarity_evaluation.SimilarityEvaluation
Evaluate sentence pair similarity using SequenceMatchEvaluation.
- Parameters
weights (List[float]) – list of weights corresponding to each sequence element, used to calculate the weighted distance.
embedding_extractor (str) – name of the embedding extractor used to obtain embeddings from the text content (the example below passes 'onnx').
Example
from gptcache.similarity_evaluation import SequenceMatchEvaluation
from gptcache.embedding import Onnx

weights = [0.5, 0.3, 0.2]
evaluation = SequenceMatchEvaluation(weights, 'onnx')
query = {'question': 'USER: "foo2" USER: "foo4"'}
cache = {'question': 'USER: "foo6" USER: "foo8"'}
score = evaluation.evaluation(query, cache)
- static normalize(vec: numpy.ndarray)[source]#
Normalize the input vector.
- Parameters
vec (numpy.array) – the numpy vector to be normalized.
- Returns
normalized vector.
sbert_crossencoder#
- class gptcache.similarity_evaluation.sbert_crossencoder.SbertCrossencoderEvaluation(model: str = 'cross-encoder/quora-distilroberta-base')[source]#
Bases:
gptcache.similarity_evaluation.similarity_evaluation.SimilarityEvaluation
Using SBERT cross-encoders to evaluate sentence pair similarity.
This evaluator uses a cross-encoder model to evaluate the similarity of two sentences.
- Parameters
model (str) – model name of SbertCrossencoderEvaluation. Default is 'cross-encoder/quora-distilroberta-base'. For more details, refer to https://www.sbert.net/docs/pretrained_cross-encoders.html#quora-duplicate-questions.
Example
from gptcache.similarity_evaluation import SbertCrossencoderEvaluation

evaluation = SbertCrossencoderEvaluation()
score = evaluation.evaluation(
    {'question': 'What is the color of sky?'},
    {'question': 'hello'}
)
distance#
- class gptcache.similarity_evaluation.distance.SearchDistanceEvaluation(max_distance=4.0, positive=False)[source]#
Bases:
gptcache.similarity_evaluation.similarity_evaluation.SimilarityEvaluation
Using search distance to evaluate sentence pair similarity.
This evaluator compares two embeddings according to the distance computed in the embedding retrieval stage. In the retrieval stage, search_result is the distance used for approximate nearest neighbor search and has already been put into cache_dict. max_distance is used to bound this distance to the range [0, max_distance]. positive indicates whether this distance is directly proportional to the similarity of the two entities. If positive is set to False, the distance is subtracted from max_distance to get the final score.
- Parameters
max_distance (float) – the upper bound of the distance.
positive (bool) – True if a larger distance indicates higher similarity between the two entities; otherwise False.
Example
from gptcache.similarity_evaluation import SearchDistanceEvaluation

evaluation = SearchDistanceEvaluation()
score = evaluation.evaluation(
    {},
    {"search_result": (1, None)}
)
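Following the description above, with the default max_distance=4.0 and positive=False the bounded distance is subtracted from max_distance, so the search_result distance of 1 in this example should yield

score = max_distance - min(distance, max_distance) = 4.0 - 1.0 = 3.0

This is a reading of the description rather than an output documented here; it is consistent with the TimeEvaluation example further down, where a distance of 3.5 gives 0.5.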
exact_match#
- class gptcache.similarity_evaluation.exact_match.ExactMatchEvaluation[source]#
Bases:
gptcache.similarity_evaluation.similarity_evaluation.SimilarityEvaluation
Using exact match to evaluate sentence pair similarity.
This evaluator directly compares two questions as text. If every character in the two questions matches, the evaluator returns 1; otherwise it returns 0.
Example
from gptcache.similarity_evaluation import ExactMatchEvaluation

evaluation = ExactMatchEvaluation()
score = evaluation.evaluation(
    {"question": "What is the color of sky?"},
    {"question": "What is the color of sky?"}
)
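For contrast, a pair that differs by even a single character should score 0 under the character-for-character rule described above:
from gptcache.similarity_evaluation import ExactMatchEvaluation

evaluation = ExactMatchEvaluation()
score = evaluation.evaluation(
    {"question": "What is the color of sky?"},
    {"question": "What is the color of the sky?"}
)
# Expected: 0, since the two strings are not identical.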
cohere_rerank#
- class gptcache.similarity_evaluation.cohere_rerank.CohereRerank(model: str = 'rerank-english-v2.0', api_key: Optional[str] = None)[source]#
Bases:
gptcache.similarity_evaluation.similarity_evaluation.SimilarityEvaluation
Use the Cohere Rerank API to evaluate the relevance of a question and an answer.
Reference: https://docs.cohere.com/reference/rerank-1
- Parameters
model (str) – model name of CohereRerank. Default is 'rerank-english-v2.0'.
api_key (str) – the Cohere API key; optional.
Example
from gptcache.similarity_evaluation import CohereRerankEvaluation

evaluation = CohereRerankEvaluation()
score = evaluation.evaluation(
    {'question': 'What is the color of sky?'},
    {'answer': 'the color of sky is blue'}
)
np#
- class gptcache.similarity_evaluation.np.NumpyNormEvaluation(enable_normal: bool = True, question_embedding_function=None)[source]#
Bases:
gptcache.similarity_evaluation.similarity_evaluation.SimilarityEvaluation
Using numpy norm to evaluate sentence pair similarity.
This evaluator calculates the L2 distance between two embeddings for the similarity check. If enable_normal is True, both the query embedding and the cache embedding are normalized. Note that the normalized distance is subtracted from the maximum distance, so the score is positively correlated with the similarity.
- Parameters
enable_normal (bool) – whether to normalize the embedding, defaults to True.
question_embedding_function (function) – optional, a function to generate the question embedding.
Example
from gptcache.similarity_evaluation import NumpyNormEvaluation
import numpy as np

evaluation = NumpyNormEvaluation()
score = evaluation.evaluation(
    {'question': 'What is color of sky?', 'embedding': np.array([-0.5, -0.5])},
    {'question': 'What is the color of sky?', 'embedding': np.array([-0.49, -0.51])}
)
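A hand computation of what the description above implies for this pair, assuming the "maximum distance" used for the subtraction is 2.0 (the largest possible L2 distance between two unit-length vectors); this illustrates the scoring idea and is not an asserted library constant:
import numpy as np

# Normalize both embeddings (enable_normal is True by default).
u = np.array([-0.5, -0.5])
v = np.array([-0.49, -0.51])
u = u / np.linalg.norm(u)
v = v / np.linalg.norm(v)

# Assumed scoring: maximum distance minus the L2 distance of the normalized
# embeddings, so near-identical directions score close to 2.0.
score = 2.0 - np.linalg.norm(u - v)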
- static normalize(vec: numpy.ndarray)[source]#
Normalize the input vector.
- Parameters
vec (numpy.array) – the numpy vector to be normalized.
- Returns
normalized vector.
onnx#
- gptcache.similarity_evaluation.onnx.pad_sequence(input_ids_list: List[numpy.ndarray], padding_value: int = 0)[source]#
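A minimal sketch for this helper, under the assumption that it right-pads each token-id array with padding_value up to the length of the longest array in the batch:
import numpy as np
from gptcache.similarity_evaluation.onnx import pad_sequence

input_ids_list = [np.array([101, 2054, 102]), np.array([101, 102])]
# Assumption: returns a 2-D batch such as [[101, 2054, 102], [101, 102, 0]].
padded = pad_sequence(input_ids_list, padding_value=0)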
- class gptcache.similarity_evaluation.onnx.OnnxModelEvaluation(model: str = 'GPTCache/albert-duplicate-onnx')[source]#
Bases:
gptcache.similarity_evaluation.similarity_evaluation.SimilarityEvaluation
Using an ONNX model to evaluate sentence pair similarity.
This evaluator uses an ONNX model to evaluate the similarity of two sentences.
- Parameters
model (str) – model name of OnnxModelEvaluation. Default is 'GPTCache/albert-duplicate-onnx'.
Example
from gptcache.similarity_evaluation import OnnxModelEvaluation

evaluation = OnnxModelEvaluation()
score = evaluation.evaluation(
    {'question': 'What is the color of sky?'},
    {'question': 'hello'}
)
- evaluation(src_dict: Dict[str, Any], cache_dict: Dict[str, Any], **_) → float[source]#
Evaluate the similarity score of the pair.
- Parameters
src_dict (Dict) – the query dictionary to evaluate against the cache.
cache_dict (Dict) – the cache dictionary.
- Returns
evaluation score.
similarity_evaluation#
- class gptcache.similarity_evaluation.similarity_evaluation.SimilarityEvaluation[source]#
Bases:
object
Similarity Evaluation interface, which determines the similarity between the input request and the requests from the Vector Store. Based on this similarity, it determines whether a request matches the cache.
Example
from gptcache import cache
from gptcache.similarity_evaluation import SearchDistanceEvaluation

cache.init(
    similarity_evaluation=SearchDistanceEvaluation()
)
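A custom evaluator can be plugged into the same hook. Below is a minimal sketch of a subclass; the range() method returning (min, max) score bounds is an assumption about the interface based on how the concrete evaluators on this page are used, and KeywordOverlapEvaluation is a hypothetical name:
from typing import Any, Dict, Tuple

from gptcache.similarity_evaluation import SimilarityEvaluation


class KeywordOverlapEvaluation(SimilarityEvaluation):
    """Hypothetical evaluator: Jaccard overlap of the question words."""

    def evaluation(self, src_dict: Dict[str, Any], cache_dict: Dict[str, Any], **_) -> float:
        src_words = set(src_dict["question"].lower().split())
        cache_words = set(cache_dict["question"].lower().split())
        if not src_words or not cache_words:
            return 0.0
        return len(src_words & cache_words) / len(src_words | cache_words)

    def range(self) -> Tuple[float, float]:
        # Assumed contract: the (min, max) possible scores of this evaluator.
        return 0.0, 1.0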
time#
- class gptcache.similarity_evaluation.time.TimeEvaluation(evaluation: str, evaluation_config=None, time_range: float = 86400.0)[source]#
Bases:
gptcache.similarity_evaluation.similarity_evaluation.SimilarityEvaluation
Adds a time-dimension restriction on top of another evaluation: for example, only use cache entries created within one day of the current time and filter out older ones.
- Parameters
evaluation (str) – the underlying similarity evaluation, such as distance or onnx.
evaluation_config – configuration for the underlying similarity evaluation.
time_range (float) – time range, in seconds.
Example
import datetime

from gptcache.manager.scalar_data.base import CacheData
from gptcache.similarity_evaluation import TimeEvaluation

evaluation = TimeEvaluation(evaluation="distance", time_range=86400)
similarity = evaluation.evaluation(
    {},
    {
        "search_result": (3.5, None),
        "cache_data": CacheData("a", "b", create_on=datetime.datetime.now()),
    },
)
# 0.5
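A wiring sketch following the cache.init pattern shown earlier on this page; the arguments mirror the constructor above:
from gptcache import cache
from gptcache.similarity_evaluation import TimeEvaluation

# Only cache entries created within the last day (86400 s) are considered;
# older entries are filtered out, per the description above.
cache.init(
    similarity_evaluation=TimeEvaluation(evaluation="distance", time_range=86400)
)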