430.10 a
Description: Implementing RAG with FAISS and LangChain
#400#ML_Engineer_Basic#430#ML_Development_Tools#430.10#Streamlit#430.10 a#LLM_RAG_Application_Using_Langchain_and_FAISS
LLM_RAG_Application_Using_Langchain_and_FAISS
RAG with OpenAI and FAISS via LangChain
pip install -q openai langchain tiktoken faiss-cpu pydantic docarray langchain-openai langchain-community boto3 langchain_experimental
import os
from langchain_openai import ChatOpenAI
from langchain_openai import OpenAIEmbeddings
from langchain.prompts import ChatPromptTemplate
from langchain_community.vectorstores import FAISS
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.document_loaders import TextLoader
from langchain.schema import HumanMessage, SystemMessage
def get_text_response(query):
    # Step 1: Load and split the source document
    loader = TextLoader("./abc.txt")
    documents = loader.load()
    text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
    docs = text_splitter.split_documents(documents)
    # Step 2: Generate embeddings and build the FAISS index
    embeddings = OpenAIEmbeddings()
    db = FAISS.from_documents(docs, embeddings)
    # Step 3: Query the FAISS index to retrieve relevant document content
    retriever = db.as_retriever()
    retrieved_docs = retriever.invoke(query)
    retrieved_context = retrieved_docs[0].page_content  # use the first retrieved doc's content
    # Step 4: Combine the retrieved document content with the original query
    combined_query = f"Context: {retrieved_context}\nQuery: {query}"
    # Step 5: Use the combined query as a prompt for the ChatOpenAI model
    chat = ChatOpenAI()  # requires OPENAI_API_KEY in the environment
    response = chat.invoke([
        SystemMessage(content="Please use the following information to assist."),
        HumanMessage(content=combined_query)
    ])
    return response.content
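Usage (a minimal sketch; the query string is illustrative and OPENAI_API_KEY must be set):

os.environ["OPENAI_API_KEY"] = "sk-..."  # or export it in the shell instead
print(get_text_response("What is this document about?"))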
Streamlit
pip install streamlit
streamlit run <file_name>.py [--server.port 8080]
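A minimal sketch of a Streamlit page wrapping get_text_response from above (the widget labels and layout are illustrative, not from the original app):

import streamlit as st

st.title("RAG with LangChain and FAISS")
query = st.text_input("Ask a question about the document")
if query:
    # calls the RAG pipeline defined above
    st.write(get_text_response(query))

Save this in the same file as get_text_response and start it with the streamlit run command above.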
Error
model_kwargs error (OpenAI)
...
Parameters {'temperature'} should be specified explicitly. Instead they were passed in as part of `model_kwargs` parameter. (type=value_error)
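The fix is to pass such parameters as explicit constructor arguments rather than inside model_kwargs (a sketch; the temperature value 0.7 is arbitrary):

# raises the ValueError above
# chat = ChatOpenAI(model_kwargs={"temperature": 0.7})

# pass temperature explicitly instead
chat = ChatOpenAI(temperature=0.7)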
Bedrock
ubuntu:~/environment $ python RAG.py
...
botocore.errorfactory.ValidationException: An error occurred (ValidationException) when calling the InvokeModel operation: Malformed input request: 4 schema violations found, please reformat your input and try again.
During handling of the above exception, another exception occurred:
...
ValueError: Error raised by bedrock service: An error occurred (ValidationException) when calling the InvokeModel operation: Malformed input request: 4 schema violations found, please reformat your input and try again.
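One plausible cause (an assumption; the original note does not pin it down): amazon.titan-text-express-v1 is a text model, so the langchain_community Bedrock LLM wrapper must be invoked with a plain string prompt rather than a list of chat messages. A sketch:

from langchain_community.llms import Bedrock

llm_b = Bedrock(model_id="amazon.titan-text-express-v1")
# invoke with a plain string prompt; the result is also a plain str
response = llm_b.invoke(f"Context: {retrieved_context}\nQuery: {query}")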
Bedrock fails with a further error because its return value has a different shape. Here get_text_response was extended to take the LLM as a second argument (llm_o for OpenAI, llm_b for Bedrock):
print(get_text_response("what is your hobby?" , llm_o))
print(get_text_response("what is youre hobby?" , llm_b))
OpenAI
ubuntu:~/environment $ python RAG.py
I'm here to help you with any questions or tasks you have related to programming, data visualization, or any other topic you need assistance with. How can I support you today?
Bedrock
Traceback (most recent call last):
File "/home/ubuntu/environment/RAG.py", line 86, in <module>
print(get_text_response("what is youre hobby?" , llm_b))
File "/home/ubuntu/environment/RAG.py", line 77, in get_text_response
return response.content
AttributeError: 'str' object has no attribute 'content'
The important difference between OpenAI and Bedrock is the response type.
Bedrock returns a plain string.
OpenAI (ChatOpenAI) returns an AIMessage object carrying extra fields such as additional_kwargs, type, name, and id.
So for OpenAI the text must be extracted from the object's content attribute, while Bedrock's string can be used as-is.
ubuntu:~/environment $ python RAG.py
Response type: <class 'str'>
response of Bedrock
Params: {}
Response type: <class 'langchain_core.messages.ai.AIMessage'>
Response attributes: dict_keys(['content', 'additional_kwargs', 'type', 'name', 'id', 'example'])
response of client=<openai.resources.chat.completions.Completions object at 0x7ff4ee8cb820> async_client=<openai.resources.chat.completions.AsyncCompletions object at 0x7ff4ee231240> openai_api_key=SecretStr('**********') openai_proxy=''
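Given those two response types, a small helper can normalize them (a sketch; the helper name extract_content is made up):

from langchain_core.messages import AIMessage

def extract_content(response):
    # Bedrock's LLM wrapper returns a plain str
    if isinstance(response, str):
        return response
    # ChatOpenAI returns an AIMessage; the text is in .content
    if isinstance(response, AIMessage):
        return response.content
    raise TypeError(f"Unexpected response type: {type(response)}")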
Different AWS Region
File "/opt/homebrew/lib/python3.11/site-packages/langchain_community/llms/bedrock.py", line 451, in _prepare_input_and_invoke
raise ValueError(f"Error raised by bedrock service: {e}")
ValueError: Error raised by bedrock service: Could not connect to the endpoint URL: "https://bedrock-runtime.ap-northeast-2.amazonaws.com/model/amazon.titan-text-express-v1/invoke"
It succeeded in the environment whose default region was us-east-1; the failing conda environment had ap-northeast-2 as its default region.
Fix: explicitly declare the region argument.
Before:
llm = Bedrock(model_id=model)
After:
llm = Bedrock(model_id=model, region_name="us-east-1")
pip install langchain_experimental
Needed for: from langchain_experimental.text_splitter import SemanticChunker
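A sketch of using SemanticChunker as a drop-in replacement for CharacterTextSplitter in Step 1 above (it splits on embedding-similarity breakpoints instead of fixed character counts):

from langchain_experimental.text_splitter import SemanticChunker
from langchain_openai import OpenAIEmbeddings

# splits where the semantic distance between adjacent sentences spikes
text_splitter = SemanticChunker(OpenAIEmbeddings())
with open("./abc.txt") as f:
    docs = text_splitter.create_documents([f.read()])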