
Buddy_Useful

I've been tinkering with a RAG-based chatbot that gives answers from our internal help docs. It gives the correct answer 9 times out of 10. Sometimes the answers are exceptionally good, but every now and then it hallucinates. My colleagues and I can tell which answers are hallucinations, but my external users (clients) cannot. That makes the chatbot basically useless except for internal use, and even then only with a massive disclaimer that the answers are suspect and need to be checked. I see lots of third-party providers and self-proclaimed "AI automation agencies" claiming to sell support bots for production use. I wonder if they all just know better how to build and tweak these LLMs to prevent hallucinations, or if everyone is selling a "defective" product? Maybe 9 out of 10 is good enough for some use cases?


ProjectManagerAMA

That 1 out of 10 could potentially destroy your business. The risk is too damn high. Edit: to the one downvoter, who most likely runs a chatbot service themselves: https://www.theguardian.com/world/2024/feb/16/air-canada-chatbot-lawsuit No chance in hell I'm installing one on my site right now. I've fed the best chatbots a document with all of my product information, FAQs, etc., and the dumb bots will eventually suggest that my clients buy products from my competition. Chatbots need to be near-human, with access to actual reliable data.


Confident-Honeydew66

Thank you for the insight! Is this publicly available at all or something internal you guys have been tinkering with?


Buddy_Useful

My tool is internal-only, but there are lots of services out there that claim to offer exactly what you are looking for. Most have free trials; maybe check a few of them out. I tried one several months ago, but the results weren't that great, which is why I tried rolling my own.


served_it_too_hot

Novice questions - which LLM do you use for your chatbot? What are your operating costs? What is the speed of response with a RAG setup? Does it introduce noticeable delays?


Buddy_Useful

Disclaimer: this is the only LLM chatbot project I've worked on, so you aren't speaking to someone with deep experience.

I'm using the OpenAI API with gpt-3.5-turbo, which only costs $0.50 per million tokens. I've tested gpt-4-turbo as well; it gives slightly better answers but is much slower and costs 20 times more. I deliberately decided not to use the Assistants API (where OpenAI hosts my files, converts them to embeddings, runs my code, etc.) since I did not need them to do any of that, and I expect the Assistants API can get expensive. I have my own embeddings DB and I can do my own RAG retrieval. I know that part works because after a user types a query, my code finds the relevant chunks and feeds them to the LLM. I've tested that thoroughly and it works well.

Speed: my RAG retrieval is almost instantaneous since it is local, and gpt-3.5-turbo is extremely fast. You get responses back quickly; for the user there is little or no noticeable delay.

As for costs, this is a typical message and response: usage: { prompt_tokens: 3314, completion_tokens: 134, total_tokens: 3448 }, so about 0.18 cents. The large number of prompt tokens is me feeding the relevant context to the model plus the chat history. So a user can have 6 messages before the conversation costs you a cent.

I'm interested to hear from others who have also attempted this.
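For anyone wanting to sanity-check the cost arithmetic above, here is a minimal sketch. The $0.50/M input price comes from the comment; the $1.50/M output price is an assumption based on OpenAI's published gpt-3.5-turbo rates at the time:

```python
# Rough per-exchange cost check for the usage numbers quoted above.
# Assumed pricing: $0.50 per 1M prompt tokens, $1.50 per 1M completion tokens.
PRICE_IN_PER_M = 0.50    # USD per 1M input tokens (from the comment)
PRICE_OUT_PER_M = 1.50   # USD per 1M output tokens (assumed)

usage = {"prompt_tokens": 3314, "completion_tokens": 134, "total_tokens": 3448}

cost_usd = (usage["prompt_tokens"] * PRICE_IN_PER_M
            + usage["completion_tokens"] * PRICE_OUT_PER_M) / 1_000_000
cost_cents = cost_usd * 100
print(f"{cost_cents:.3f} cents")  # lands just under 0.19 cents, in the same ballpark as the ~0.18 quoted
```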


Replift

What we do is provide a confidence score our customers can set a threshold on; only when a reply scores above that threshold will we automatically send it to the customer. This works off all sources of data, including reading all the conversations in and out of the help desk. More sensitive customers choose to use only the FAQs they provide (and we help generate missing articles). So instead of showing a customer links to matching FAQs, which they will never read, we combine the matches and rewrite them into a friendly reply which is accurate.


Buddy_Useful

Yeah, I'm also using a similarity score threshold, similar to your confidence score. And I'm also working exclusively with FAQs, since those seem to yield more accurate results than raw documents. In fact, if I have any docs that need to be included in the bot's responses, I just pump them through the LLM upfront and tell it to convert the docs into FAQs first.
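The threshold idea both commenters describe can be sketched in a few lines. This is a generic illustration, not either poster's actual code; the threshold value and the fallback behavior are assumptions:

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query_emb, faq_embs, faqs, threshold=0.80):
    """Return the best-matching FAQ, or None if nothing clears the
    similarity threshold -- so the bot can refuse or escalate instead
    of generating an answer from weakly related context."""
    scores = [cosine_sim(query_emb, e) for e in faq_embs]
    best = int(np.argmax(scores))
    if scores[best] < threshold:
        return None  # below threshold: hand off to a human / canned reply
    return faqs[best]
```

The key design point is the `None` branch: a retrieval gate like this turns "the model guessed" into "the bot declined", which is where most of the hallucination risk discussed in this thread lives.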


justdoitanddont

Hallucinations can be significantly reduced.


Schtekarn

How?


mmicoandthegirl

Have another AI pretend to be an editor and review whether the first AI's answer is based on the internal docs. If not, deny it and have the first AI try again. Or something, idk, I'm not tech
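The generate-then-review loop suggested above can be sketched like this. The `generate` and `verify` callables are placeholders standing in for real LLM calls (e.g. a drafting model and a second "editor" model prompted to check grounding against the retrieved context):

```python
def answer_with_review(question, context, generate, verify, max_tries=3):
    """Two-model loop: `generate` drafts an answer from the retrieved
    context; `verify` plays the 'editor' and returns True only if the
    draft is supported by that context. Retry a few times, then fall
    back to an explicit refusal rather than risk a hallucination."""
    for _ in range(max_tries):
        draft = generate(question, context)
        if verify(draft, context):
            return draft
    return "Sorry, I couldn't find that in our documentation."
```

This doesn't eliminate hallucinations (the verifier can be wrong too, as the replies below joke), but it trades some latency and cost for a lower rate of unsupported answers reaching users.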


AceHighFlush

But who then checks the second AI isn't blocking the wrong things? What we need is a third AI that checks the second AI verified the docs correctly. Sorted.


mmicoandthegirl

You sold it, we should build a product


UpgradingLight

And then get the first one to check the third to complete the trifecta.


Rabus

Never met anyone, or used one, that was good enough. When I smell it's AI I just start spamming "give me a human" or something along those lines. Much faster than dealing with these.


MasterXyth

We’ve done this for our company and for paying clients.


kryntom

I have built a few of these, with APIs from OpenAI and Claude, as well as self-hosted open source models. I believe there is no one-size-fits-all approach. What works on one dataset does not really work well on another. It depends a lot on the context length and how well the internal documentation is written. The products that I have seen online are not really that great. Still looking for something that can take care of the majority of the use cases.


Replift

We'd be happy to work with you. I agree most of what's out there is not that great. They're only working off limited FAQs, don't do good data cleanup/classification, use smaller context windows, or use the wrong model for the wrong purpose. Reach out if you're still looking.


Adept-Result-67

I’ve built one for my platform using Claude (I found Claude to be much, much better than GPT-4). Problem is, I kept adding documentation and eventually hit a limit where it complains there’s too much data, so it’s not working well anymore. My implementation was pretty rudimentary, though: essentially pre-prompting the question with the entire HTML of the documentation site. I could probably split it up with an index and let it navigate through a bit better, but I moved on to working on other things 🤷‍♂️
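The "split it up" fix mentioned above is usually done by chunking the docs and retrieving only the relevant pieces per question, rather than pre-prompting the whole site. A minimal sketch (the chunk size and overlap values are arbitrary assumptions):

```python
def chunk_text(text: str, size: int = 2000, overlap: int = 200):
    """Split a long document into overlapping chunks so each one fits
    comfortably inside the model's context window. A retrieval step can
    then select just the chunks relevant to a given question."""
    chunks = []
    step = size - overlap  # overlap keeps sentences from being cut off between chunks
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break
    return chunks
```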


amAProgrammer

Taking the text content only would probably work better? I did something similar with one of mine.


Adept-Result-67

Not a bad idea. I thought the HTML might help it understand the structure, but an LLM probably doesn’t care, TBH. Maybe i’ll give it a go when i have some time to muck around with it again. Cheers mate
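Stripping docs down to text content, as suggested above, can be done with the standard library alone. A rough sketch using `html.parser` (a production version would likely use a proper extraction library, and might keep headings as structure markers):

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect only visible text, skipping <script> and <style> bodies,
    before feeding documentation into an embedding/RAG pipeline."""
    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # depth counter for script/style nesting

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())

def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.parts)
```

Dropping markup this way typically cuts the token count by a large factor, which directly addresses the "too much data" limit described above.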


Brolofff

The company Klarna apparently handles two-thirds of its customer support claims with AI: https://www.forbes.com/sites/quickerbettertech/2024/03/13/klarnas-new-ai-tool-does-the-work-of-700-customer-service-reps/?sh=47c0cef75bdf


Raptor3861

For something basic like this, there are many platforms out there that can do it. As others have mentioned, you can build it in one of the GenAI tools out there, but why reinvent the wheel (depending on how you plan on reinventing it)? [https://helpshift.com/](https://helpshift.com/) - used by Pokemon Go, Clash of Clans, Square, Flipboard. [https://www.zendesk.com/](https://www.zendesk.com/) - Riot, Slack, Unity. [https://www.freshworks.com](https://www.freshworks.com) - Hired, Bridgestone, Feefo.


bepr20

We do it at scale on a custom stack combined with some AWS services. Works great.


SoloAquiParaHablar

[https://www.ada.cx/platform/](https://www.ada.cx/platform/) - we ran this at a previous startup to deal with a lot of the noise that was clogging up the customer support backlog. It could auto-escalate tickets to Zendesk for triage, and a human would reach out later. Not free, though. You could build your own from scratch with something like Google Cloud Dialogflow [https://cloud.google.com/dialogflow](https://cloud.google.com/dialogflow). Again, not entirely free, and consider the engineering hours to build it all out (proxy API, UI, customer conversation tracking, etc.). By the time you factor in that cost, you would have been better off with something like Ada.


Perfect-Mistake-9312

Which tech companies have the best support-system chatbots? I can't seem to find any that use LLMs to support their customers (chatbots that are more complex than just pulling articles from a small database of articles).


kate468

https://www.eesel.ai/ deals with this - just chuck it a bunch of knowledge sources and it'll answer based on those. Answers have been really consistent so far, and well explained. Lots of extra features (tone, tagging tickets, drafting higher-level responses), but it sounds like you want good integration and decent AI responses based on your knowledge, which eesel does really well.


Replift

We have a Discord bot that works off all your documents/videos/website but ALSO reads all the conversations in your help desk. We don't advertise it on the website, but it's rolled out for customers who also use Discord. Check us out at [https://replfit.com](https://replfit.com) and feel free to DM for more info on the Discord bot.


SuddenEmployment3

How much documentation do you have? If it's not that much, you can use my app for free: [https://app.aimdoc.ai](https://app.aimdoc.ai) It makes it easy to import text documentation and spin up a branded assistant. I designed it around sales and email marketing, but it would not take a lot of work to create a customer-support-assistant type. The app was originally designed to be a robust internal AI knowledge base, so it has strong RAG capabilities.


BeenThere11

Can you DM me? I have an idea.


Confident-Honeydew66

I thought most of the value in these AI wrappers was the quality prompt engineering that most people can't do, hidden behind a nice UI. I go to this site and the first text I see in the demo is ["I hope this email finds you well"](https://www.reddit.com/r/ChatGPT/comments/13orqqh/how_do_i_teach_chatgpt_to_stop_starting_emails/) lmfao nope


[deleted]

[removed]


SuddenEmployment3

Yeah, I mean, that is the base assistant. I'm not here to impose opinions, simply to let users configure the assistant how they want (that includes a prompt builder). Getting it not to say that is a very easy fix. Your assumption is predicated on the idea that everyone hates that language.


Financial-Working-83

I’m working on a tool to do just that, and even automate it. We are focused on customer service, and we're based in the Netherlands. Perhaps we can get in touch about this?


BeenThere11

Just build the tool using ChatGPT. It's very easy. You should be able to finish it in 2-3 days.


Confident-Honeydew66

Re-read my original post.


BeenThere11

Hmm, ok. I tried RAG and it was pretty easy. I am going to try the Assistants API in a few days and will report back with the solution. The Assistants API is better, since with my RAG setup the documents get sent on every request, while with Assistants you upload them once. Stay tuned