WebPage QA#
This example shows how to build a question-answering application over web pages, adapted from the corresponding llama_index example.
It works by first obtaining a list of URLs from the user, then extracting the text of the pages behind those URLs. Next, a vector index is built from that text, and finally the program answers questions using the indexed information.
Before running this example, set the OPENAI_API_KEY environment variable.
Init GPTCache#
import hashlib

from gptcache import Cache
from gptcache.adapter.api import init_similar_cache
from langchain.cache import GPTCache


def get_hashed_name(name):
    return hashlib.sha256(name.encode()).hexdigest()


def init_gptcache(cache_obj: Cache, llm: str):
    hashed_llm = get_hashed_name(llm)
    init_similar_cache(cache_obj=cache_obj, data_dir=f"similar_cache_{hashed_llm}")


gptcache_obj = GPTCache(init_gptcache)
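Each LLM gets its own on-disk cache directory, named with the SHA-256 digest of the model name so the directory name is filesystem-safe and stable across runs. A quick check of that naming scheme (the model name below is only an illustration, not something the code above requires):

```python
import hashlib


def get_hashed_name(name):
    # Same helper as above: stable SHA-256 hex digest of the model name.
    return hashlib.sha256(name.encode()).hexdigest()


# "gpt-3.5-turbo" is just an example model name.
hashed = get_hashed_name("gpt-3.5-turbo")
print(f"similar_cache_{hashed}")  # a 64-character hex suffix, identical on every run
```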
Load WebPage Data#
from llama_index import (
    GPTVectorStoreIndex,
    ServiceContext,
    LLMPredictor,
    SimpleWebPageReader,
)
loader = SimpleWebPageReader(html_to_text=True)
documents = loader.load_data(urls=["https://milvus.io/docs/overview.md"])
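With html_to_text=True, SimpleWebPageReader fetches each URL and converts the HTML into plain text before it is indexed. A minimal stdlib sketch of that conversion step (illustrative only, not the reader's actual implementation):

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of script/style tags."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.parts)


html = "<html><body><h1>Milvus</h1><p>A vector database.</p></body></html>"
print(html_to_text(html))  # Milvus A vector database.
```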
Build Index and Get Query Engine#
index = GPTVectorStoreIndex.from_documents(
    documents,
    service_context=ServiceContext.from_defaults(
        llm_predictor=LLMPredictor(cache=gptcache_obj)
    ),
)
query_engine = index.as_query_engine()
Query#
%%time
print(query_engine.query("What is milvus?"))
Milvus is an open source vector database for building and managing large-scale AI applications. It provides fast and accurate vector search capabilities, enabling users to quickly search and retrieve vectors from large datasets.
CPU times: user 1.21 s, sys: 206 ms, total: 1.42 s
Wall time: 9.69 s
%%time
print(query_engine.query("What's milvus?"))
Milvus is an open source vector database for building and managing large-scale AI applications. It provides fast and accurate vector search capabilities, enabling users to quickly search and retrieve vectors from large datasets.
CPU times: user 784 ms, sys: 17.8 ms, total: 801 ms
Wall time: 940 ms
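The second question is a paraphrase of the first, so GPTCache serves it from its similarity cache instead of calling the LLM again, which is why wall time drops from 9.69 s to 940 ms. The idea behind a similarity cache can be sketched with a toy bag-of-words cosine match (GPTCache itself uses learned embedding models and a vector store; everything below is illustrative):

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Toy embedding: bag of lowercase word counts, with a crude
    # normalization so "what's" matches "what is".
    return Counter(text.lower().replace("?", "").replace("'s", " is").split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class SimilarCache:
    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold
        self.entries = []  # (embedding, answer) pairs

    def get(self, question: str):
        q = embed(question)
        for emb, answer in self.entries:
            if cosine(q, emb) >= self.threshold:
                return answer  # cache hit: no LLM call needed
        return None

    def put(self, question: str, answer: str):
        self.entries.append((embed(question), answer))


cache = SimilarCache()
cache.put("What is milvus?", "Milvus is an open source vector database.")
print(cache.get("What's milvus?"))  # hit: the paraphrase maps to the cached answer
```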