Advanced Usage

Embedding functions are rarely used on their own. Within a stack they serve two purposes:

  • In ETL with a Vectordb, to convert all the raw data extracted by the ETL into embeddings that are stored in the Vectordb index.

  • In Retrieval, to convert the user query into an embedding that is searched against the data already stored in the Vectordb index.

Usage

Imports:

from genai_stack.etl.langchain import LangchainETL
from genai_stack.embedding.langchain import LangchainEmbedding
from genai_stack.stack.stack import Stack
from genai_stack.vectordb.chromadb import ChromaDB
from genai_stack.etl.utils import get_config_from_source_kwargs

Configuration:

config = {
    "name": "HuggingFaceEmbeddings",
    "fields": {
        "model_name": "sentence-transformers/all-mpnet-base-v2",
        "model_kwargs": {"device": "cpu"},
        "encode_kwargs": {"normalize_embeddings": False},
    }
}
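
For reference, these fields map one-to-one onto the constructor arguments of LangChain's HuggingFaceEmbeddings, so you can sanity-check a model choice outside the stack. A minimal sketch using LangChain directly (requires the sentence-transformers package; the query string is illustrative):

from langchain.embeddings import HuggingFaceEmbeddings

# Instantiate the same model the config above describes
hf = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-mpnet-base-v2",
    model_kwargs={"device": "cpu"},
    encode_kwargs={"normalize_embeddings": False},
)

vector = hf.embed_query("Where does John live?")  # a single query embedding
print(len(vector))  # all-mpnet-base-v2 produces 768-dimensional vectors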

Using with ETL

Once you have defined your configuration as a Python dictionary, you can use it with the LangchainEmbedding.from_kwargs() method:

embedding = LangchainEmbedding.from_kwargs(**config)

etl = LangchainETL.from_config(get_config_from_source_kwargs("pdf", "path/to/pdf"))
chromadb = ChromaDB.from_kwargs()

# Connect the ETL, Embedding and Vectordb components using Stack
stack = Stack(model=None, embedding=embedding, etl=etl, vectordb=chromadb)

etl.run()
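
Once etl.run() finishes, every document extracted from the PDF has been embedded with the configured model and persisted in ChromaDB. As a quick sanity check, you can run a similarity search directly against the vectordb component (a sketch assuming the search() helper covered in the Vector Database section; the query string is illustrative):

# Sketch: similarity search over the freshly populated index, using the
# same embedding function the stack configured above
results = chromadb.search("What is this PDF about?")
print(results)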

Using with Retriever

# Additional imports (module paths assume the package layout used above;
# adjust if your installed version differs)
from genai_stack.model.gpt3_5 import OpenAIGpt35Model
from genai_stack.prompt_engine.engine import PromptEngine
from genai_stack.retriever.langchain import LangChainRetriever
from genai_stack.memory.langchain import ConversationBufferMemory

# Initialise all your components
etl = LangchainETL.from_kwargs(name="CSVLoader", fields={"file_path": "addresses.csv"})
embedding = LangchainEmbedding.from_kwargs(**config)
chromadb = ChromaDB.from_kwargs()
llm = OpenAIGpt35Model.from_kwargs(parameters={"openai_api_key": "<OPENAI-API-KEY>"})
prompt_engine = PromptEngine.from_kwargs(should_validate=False)
retriever = LangChainRetriever.from_kwargs()
memory = ConversationBufferMemory.from_kwargs()

# Initialise your stack by connecting the components end-to-end
stack = Stack(
    etl=etl,
    embedding=embedding,
    vectordb=chromadb,
    model=llm,
    prompt_engine=prompt_engine,
    retriever=retriever,
    memory=memory
)

# Query to get RAG-based results
response = retriever.retrieve("Where does John live?")
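
Because a ConversationBufferMemory component is part of the stack, the exchange above is retained, so a follow-up question can refer back to it (a sketch; both questions are illustrative):

# Sketch: the attached ConversationBufferMemory keeps the chat history,
# letting "there" resolve against the previous answer
follow_up = retriever.retrieve("What is the zip code there?")
print(follow_up)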
