📖 Document Search

Requirements

  • Python environment with necessary packages installed.

  • GenAI Stack library and its dependencies.

  • Weaviate, an open-source vector search engine, installed and configured, if you use it as the underlying VectorDB.

  • A dataset or source documents for indexing and searching.

from genai_stack.embedding.langchain import LangchainEmbedding
from genai_stack.etl.langchain import LangchainETL
from genai_stack.stack.stack import Stack
from genai_stack.vectordb import ChromaDB
from genai_stack.vectordb.weaviate_db import Weaviate

Search single document

Search a single document using the ETL and vector database components.

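# Embedding component: a Hugging Face sentence-transformers model running on CPU.
embedding = LangchainEmbedding.from_kwargs(
    name="HuggingFaceEmbeddings",
    fields={
        "model_name": "sentence-transformers/all-mpnet-base-v2",
        "model_kwargs": {"device": "cpu"},
        "encode_kwargs": {"normalize_embeddings": False},
    }
)
# Vector database component: ChromaDB with its default settings.
chromadb = ChromaDB.from_kwargs()
# ETL component: load a single PDF with PyPDFLoader.
etl = LangchainETL.from_kwargs(
    name="PyPDFLoader", fields={
        "file_path": "<your_file>.pdf",
    }
)
# Wire the components together. No LLM is configured (model=None).
stack = Stack(
    model=None,
    embedding=embedding,
    vectordb=chromadb,
    etl=etl
)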
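
To make the flow above concrete, here is a minimal, library-free sketch of the load, embed, store, search cycle. It does not use the GenAI Stack API: `embed`, `cosine`, and `ToyVectorStore` are hypothetical stand-ins written for this illustration, and the bag-of-words "embedding" is only a toy substitute for a real model such as all-mpnet-base-v2.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words term-count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """In-memory stand-in for a vector database (hypothetical, not ChromaDB's API)."""

    def __init__(self):
        self._rows = []  # each row is (vector, text, metadata)

    def add(self, text, metadata):
        self._rows.append((embed(text), text, metadata))

    def search(self, query, k=1):
        q = embed(query)
        ranked = sorted(self._rows, key=lambda row: cosine(q, row[0]), reverse=True)
        return [{"content": text, "metadata": meta} for _, text, meta in ranked[:k]]


# "ETL" step: pretend these strings are the pages extracted from one PDF.
pages = [
    "Vector databases store embeddings for similarity search.",
    "The loader reads a PDF file and splits it into pages.",
    "Cosine similarity compares the angle between two vectors.",
]

store = ToyVectorStore()
for page_number, page_text in enumerate(pages):
    store.add(page_text, {"page": page_number})  # zero-based page index

hits = store.search("how does the loader read a pdf", k=1)
print(hits[0]["metadata"], hits[0]["content"])
```

In the real stack, the ETL component extracts the pages, the embedding component produces dense vectors, and the vector database handles storage and nearest-neighbour lookup. The sketch only mirrors that division of labour.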

Search multiple documents

Search a directory containing documents. The search returns a list of matching documents, each with its file path and page number.
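
As a rough illustration of the result shape described above (a path plus a page number per hit), the following standard-library sketch searches every text file in a directory. It is not the GenAI Stack implementation. `load_directory` and `search_directory` are hypothetical helpers, pages are assumed to be separated by a form feed character, page numbers are zero-based by choice, and ranking uses simple word overlap rather than embeddings.

```python
import re
import tempfile
from pathlib import Path


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def load_directory(directory):
    """Yield one record per page for every .txt file under `directory`.

    Assumption for this sketch: pages in a file are separated by a form feed.
    """
    for path in sorted(Path(directory).rglob("*.txt")):
        pages = path.read_text(encoding="utf-8").split("\f")
        for page_number, page in enumerate(pages):
            yield {"content": page.strip(), "source": str(path), "page": page_number}


def search_directory(directory, query, k=3):
    """Rank pages by word overlap with the query; return path and page number."""
    q = tokenize(query)
    scored = []
    for record in load_directory(directory):
        score = len(q & tokenize(record["content"]))
        if score:  # keep only pages that share at least one word with the query
            scored.append((score, record))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [{"source": r["source"], "page": r["page"]} for _, r in scored[:k]]


# Build a small throwaway directory so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.txt").write_text(
        "Intro to embeddings.\fChroma stores vectors on disk.", encoding="utf-8"
    )
    Path(tmp, "b.txt").write_text("Weaviate is a vector search engine.", encoding="utf-8")

    results = search_directory(tmp, "where does chroma store vectors")
    for hit in results:
        print(Path(hit["source"]).name, hit["page"])
```

Keeping the source path and page number as metadata on every stored chunk is what lets a directory-wide search point back to the exact file and page a match came from.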

Check out the doc-search.ipynb notebook for more details.
