This example shows how to combine the model with a vectordb and a retriever so that the model can answer questions based on contextual data.
There are two ways we can implement this:
Python
CLI
Python Implementation:
==> With default supported ETLs
    from genai_stack.model import OpenAIGpt35Model

    model = OpenAIGpt35Model.from_kwargs(
        fields={"openai_api_key": "Paste your Open AI key"}
    )

    # This does the ETL underneath but supports only the default 5 data types
    model.add_source("csv", "valid_csv_path_or_url")
    model.predict("<Some question whose answer can be found in the csv>")
For more context on default ETLs check the doc here.
==> With your own custom ETL, Retriever and Vectordb
    from genai_stack.model import OpenAIGpt35Model
    from genai_stack.etl import LangLoaderEtl
    from genai_stack.retriever import LangChainRetriever
    from genai_stack.vectordb.chromadb import ChromaDB

    config = {
        "source": {
            "name": "PyPDFLoader",
            "fields": {"file_path": "/your/pdf/path"},
        },
    }

    # Initialise vectordb
    vectordb = ChromaDB.from_kwargs(class_name="genai-stack")

    # ETL process: extract the PDF and load it into the vectordb
    etl = LangLoaderEtl.from_kwargs(vectordb=vectordb, **config)
    etl.run()

    # Set up the retriever and the model on top of the same vectordb
    retriever = LangChainRetriever.from_kwargs(vectordb=vectordb)
    model = OpenAIGpt35Model.from_kwargs(
        retriever=retriever,
        fields={"openai_api_key": "Paste your Open AI key"},
    )
    model.predict("<Some question whose answer can be found in the pdf>")
For more context, refer to each component's documentation.
CLI Implementation
You can write an etl.json for the ETL process and a model.json to perform inference on the extracted data.
Important Note: The vectordb section should be the same for the etl.json and model.json.
Explanation: During the ETL process, the data is extracted and stored in the vectordb as embeddings, on which we can perform semantic search. So when we use the model on top of contextual data, we need to specify where that contextual data comes from.
In our case, the source of contextual data is the vectordb into which the ETL contents were loaded. That is why the vectordb section should be the same in both model.json and etl.json.
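One way to guard against the two files drifting apart is to generate both from a single shared vectordb section and to check them before running the CLI. The sketch below is only an illustration and does not use genai_stack itself. The "source" block mirrors the Python example above, but the top-level "vectordb" and "model" keys, their field names, and the helpers write_configs and vectordb_sections_match are assumptions made for this example; check the component documentation for the actual JSON schema.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical shared vectordb section (key names are assumptions,
# not the confirmed genai_stack schema).
VECTORDB = {"name": "chromadb", "class_name": "genai-stack"}


def write_configs(directory, vectordb=VECTORDB):
    """Write etl.json and model.json that share one vectordb section."""
    directory = Path(directory)
    etl_config = {
        # Same "source" block as in the Python example above.
        "source": {
            "name": "PyPDFLoader",
            "fields": {"file_path": "/your/pdf/path"},
        },
        "vectordb": dict(vectordb),
    }
    model_config = {
        # Placeholder model section; the real keys may differ.
        "model": {"fields": {"openai_api_key": "Paste your Open AI key"}},
        "vectordb": dict(vectordb),
    }
    etl_path = directory / "etl.json"
    model_path = directory / "model.json"
    etl_path.write_text(json.dumps(etl_config, indent=2))
    model_path.write_text(json.dumps(model_config, indent=2))
    return etl_path, model_path


def vectordb_sections_match(etl_path, model_path):
    """Return True only if both files define the same vectordb section."""
    etl = json.loads(Path(etl_path).read_text())
    model = json.loads(Path(model_path).read_text())
    return "vectordb" in etl and etl["vectordb"] == model.get("vectordb")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        etl_file, model_file = write_configs(tmp)
        print(vectordb_sections_match(etl_file, model_file))
```

Building both files from one dictionary means the model always reads from the same store the ETL step wrote to, and the check catches a hand-edited file before inference runs against an empty or unrelated vectordb.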