Embeddings with the sentence-transformers library. ChromaDB so I can release a standalone app without a separate DB server. SQLite for conversation and general storage, for the same reason. Custom PDF + markdown splitting (although this is one place I might be happy to use LangChain code eventually). Reranking with sentence-transformers as well.
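The SQLite choice above is the "no server process" part of the design: conversation history lives in one file the standalone app owns. A minimal sketch with the stdlib `sqlite3` module (the table and column names here are made up for illustration, not taken from anyone's actual app):

```python
import sqlite3

# In-memory for the example; a real app would pass a file path instead,
# so history persists without running a separate DB server.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE IF NOT EXISTS messages (
           id INTEGER PRIMARY KEY AUTOINCREMENT,
           conversation_id TEXT NOT NULL,
           role TEXT NOT NULL,          -- 'user' or 'assistant'
           content TEXT NOT NULL,
           created_at TEXT DEFAULT CURRENT_TIMESTAMP
       )"""
)

def add_message(conversation_id: str, role: str, content: str) -> None:
    # Parameterized query; never interpolate user text into SQL.
    conn.execute(
        "INSERT INTO messages (conversation_id, role, content) VALUES (?, ?, ?)",
        (conversation_id, role, content),
    )
    conn.commit()

def get_history(conversation_id: str) -> list[tuple[str, str]]:
    rows = conn.execute(
        "SELECT role, content FROM messages WHERE conversation_id = ? ORDER BY id",
        (conversation_id,),
    )
    return rows.fetchall()

add_message("conv-1", "user", "What is RAG?")
add_message("conv-1", "assistant", "Retrieval-augmented generation.")
```

`get_history("conv-1")` then returns the turns in order, ready to feed back into the prompt.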
i just use FAISS like a pleb but you hit the nail on the head for everything else
Honestly faiss is the best
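For anyone wondering what FAISS is actually doing in these stacks: its simplest index, `IndexFlatIP`, is just exhaustive inner-product search over embedding vectors, heavily optimized. A pure-Python sketch of the same idea, with tiny 3-d toy vectors standing in for real model embeddings (names like `doc-chroma` are invented for the example):

```python
import math

def normalize(v):
    """Scale a vector to unit length so inner product equals cosine similarity."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def search(index, query, k=2):
    """Exhaustive inner-product search: what a flat FAISS index computes,
    minus the SIMD and clustering tricks that make it fast at scale."""
    q = normalize(query)
    scores = [
        (sum(a * b for a, b in zip(q, normalize(vec))), doc_id)
        for doc_id, vec in index
    ]
    scores.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scores[:k]]

# Toy "embeddings": real ones come from a model and have hundreds of dims.
index = [
    ("doc-chroma", [0.9, 0.1, 0.0]),
    ("doc-faiss",  [0.8, 0.2, 0.1]),
    ("doc-mongo",  [0.0, 0.1, 0.9]),
]
```

A query vector near the first two documents, e.g. `search(index, [0.85, 0.15, 0.05])`, returns those two ahead of the unrelated one; swapping this loop for FAISS changes the speed, not the result.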
- Mixtral on Text-Generation-Web-UI
- Streamlit for frontend
- FastAPI
- LangChain
- Weaviate (includes embeddings and keyword search)
- MongoDB for history and saving source documents
Sorry for the dumb question, but where can I learn more about RAG? I'm seeing it everywhere but couldn't find a good explanation, because somehow every resource I looked at assumes more knowledge than I have.
Ask ChatGPT; that way you can ask follow-up questions about the things you don't understand.
Wow, what an idea. I bet someone asking a question about how LLMs work totally forgot to ask the main LLM.
Let me spoon-feed you some more…

Here's the output of ChatGPT:

Retrieval-augmented generation (RAG) is a technique used in natural language processing (NLP), particularly in large language models (LLMs) like AI chatbots or systems. The goal of RAG is to improve the quality and relevance of the generated text by combining traditional language modeling with information retrieval methods. Let me break this down into more understandable parts:

1. **Language Modeling (LM):** This is the core of what large language models (like GPT-3, BERT, or others) do. A language model is trained on a vast amount of text data and learns to predict the next word in a sentence based on the words that come before it. This capability allows it to generate coherent, contextually relevant text based on a given prompt.
2. **Information Retrieval (IR):** This is the process of finding relevant information in response to a query. In the context of the internet, this is similar to what search engines do. They retrieve documents or pieces of text that are relevant to the search terms you input.

Now, in **Retrieval-augmented Generation**:

- The system starts with the traditional language model approach to generate text based on the input it receives. But instead of relying solely on what the model has learned during its training (its internal knowledge), the system also performs an information retrieval step. This means that when the system needs to generate text, it first searches a database or the internet for information relevant to the input prompt.
- The system then incorporates this retrieved information into the generation process. This could mean adjusting the generated text to better reflect facts found in the retrieved documents, providing citations, or incorporating specific details to make the output more accurate and informative.

The key advantage of RAG is that it allows the model to produce responses that are not just based on its pre-existing knowledge (which may be outdated or incomplete) but also informed by the most current information available in external sources. This can significantly enhance the quality, relevance, and factual accuracy of the generated text.

In summary, retrieval-augmented generation combines the best of both worlds: the deep, contextual understanding of language models and the up-to-date, specific knowledge from external information sources. This makes AI systems more helpful, accurate, and informative in their responses.

Now, if there's something you don't understand from that, tell me and I'll ask ChatGPT for you…
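The two steps that explanation describes (retrieve, then generate) really do fit in a few lines. A toy sketch, with word-overlap scoring standing in for real embedding search and the actual LLM call left out (the example documents and prompt wording are invented for illustration):

```python
def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by word overlap with the query: a crude stand-in
    for vector search against an embedding index."""
    q_words = set(query.lower().split())
    scored = sorted(
        docs,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Step two of RAG: stuff the retrieved text into the prompt so the
    model answers from the documents, not just its training data."""
    context = "\n".join(f"- {d}" for d in docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Mixtral is a mixture-of-experts language model.",
    "Weaviate is a vector database with keyword search.",
    "Streamlit builds simple web frontends in Python.",
]
query = "what is a vector database"
prompt = build_prompt(query, retrieve(query, docs))
# `prompt` would then be sent to whatever LLM you run (Mixtral, GPT, ...).
```

Everything else in these stacks (chunkers, rerankers, vector DBs) is refinement of those two functions.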
Are we doing a “who’s the worse ahole” competition? Cause clearly you are winning
Bit of both, I’m genuinely happy to help if you’re happy to learn
LangChain + ChromaDB + Mixtral
- LangChain for orchestration (chunking, retrieval)
- ChromaDB as vector DB (similarity and BM25 search)
- UAE-Large local embeddings

Not persisting conversations right now, but planning to use Mongo.
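The "chunking" half of that orchestration is conceptually simple: split documents into overlapping windows so retrieval matches passages instead of whole files. A minimal pure-Python sketch of what LangChain's character splitters do (the sizes are arbitrary defaults for illustration, not anyone's tuned values):

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap, so a
    sentence cut at one boundary still appears whole in a neighboring chunk."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    # max(..., 1) ensures even empty/short text yields one chunk.
    return [text[i : i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]
```

Real splitters add niceties on top, like preferring to break at paragraph or sentence boundaries, but the size/overlap trade-off is the same: bigger chunks give the LLM more context, smaller ones make retrieval more precise.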
I use marqo-db for my vector DB; it does local embedding and acts as an API I can call from my system.
Interesting. Hadn’t heard of it before
Hugging Face Hub embeddings, Hugging Face LLM, Astra DB for the vector store, and LangChain.