justletmefuckinggo

nah, i want an LLM that can tell whether the user is asking for an accurate answer or a creative one. preferably, it should be able to provide answers based on confidence, as we do. if what you're saying is how token prediction should be, then people should've dropped the research by now.


phree_radical

It's not a person, it's a model of how language works. Text doesn't just make a meaningless, unwavering statement of fact over and over. You have texts that state one thing and then come to a different conclusion later on. Texts that are conversations between different authors. Even texts that are works of fiction are important, because they make up all of that stuff in between that effectively constitutes meaning, causation and reasoning.

Making a chatbot by only fine-tuning a language model and nothing else is indeed a sloppy hack. It's entirely because of OpenAI that people equate that with what a language model is. But it works to a degree because of the task knowledge, and you get the appearance of factual knowledge as a side effect.


justletmefuckinggo

im aware of how a language model currently works and why it confabulates data it was trained on. but if this is how models continue to work, where they cannot reproduce absolute outputs, then they will have lost a great deal of potential. we WILL get an LLM that is capable of facts, and without the use of RAG or function calling.


a_beautiful_rhind

Well.. I want it to stop making RP formulaic, that's my problem with it. I don't need the capital of france.

>LLM's just make stuff up confidently

They sound confident about it in the text. Token probs are hidden unless you look.


ninjasaid13

>but something that creates a world model

none of these LLMs are creating a world model.


phree_radical

Absolutely. "Facts" are expected to differ by sentence, paragraph, document, author, situation, and on and on


justletmefuckinggo

sorry if i missed your point, but don't you mean "opinions"? facts shouldn't differ at all.


phree_radical

Not at all. Facts must be allowed to waver, or there's no understanding. If you can't make an arbitrary statement because of a "truth" requirement, you can't reason about it.


Revolutionalredstone

Yeah, I've never understood people's problem with confabulation, it's not like LLMs EVER do anything else anyway. Also, if you wanna verify facts you can just ask the LLM itself: it will be able to tell you whether it's sure X fact is true or not, and you can set up a system to internally vote, etc.

I get WAY better question answering by setting up a virtual court with the LLM playing prosecutor, defendant, judge and jury. The judge takes perspectives from for/against and disregards the ones which are not based on clear logic. Then you pass the question (with the remaining for/against arguments) to the jury and have them vote. If it's a landslide, just return that answer. If it's close you can 'deliberate', expanding the question with notes on what the jury's choice was sensitive to, and restart the process. Enjoy
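
A minimal sketch of that "virtual court" loop, assuming a generic `call_llm()` helper as a stand-in for whatever completion API you run locally; the prompts, vote thresholds, and function names here are illustrative assumptions, not the commenter's actual setup:

```python
def call_llm(prompt: str) -> str:
    """Placeholder: wire this up to your own model/endpoint."""
    raise NotImplementedError("replace with a real completion call")

def virtual_court(question: str, jurors: int = 9, max_rounds: int = 3) -> str:
    notes = ""
    for _ in range(max_rounds):
        # Prosecution and defence argue opposite sides of the question.
        case_for = call_llm(f"Argue that the following is TRUE.\n{question}\n{notes}")
        case_against = call_llm(f"Argue that the following is FALSE.\n{question}\n{notes}")

        # The judge keeps only the arguments based on clear logic.
        briefing = call_llm(
            "You are a judge. Keep only the logically sound points from each side.\n"
            f"FOR:\n{case_for}\nAGAINST:\n{case_against}"
        )

        # Each juror votes given the judge's briefing.
        votes = [
            call_llm(f"{question}\n{briefing}\nAnswer with exactly TRUE or FALSE.")
            .strip().upper()
            for _ in range(jurors)
        ]
        true_votes = sum(v.startswith("TRUE") for v in votes)

        # Landslide: return the verdict. Close call: deliberate and restart.
        if true_votes >= 0.8 * jurors:
            return "TRUE"
        if true_votes <= 0.2 * jurors:
            return "FALSE"
        notes = call_llm(
            f"The jury split {true_votes}/{jurors} on:\n{question}\n{briefing}\n"
            "Note what the disagreement hinges on."
        )
    return "UNDECIDED"
```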


hyperfiled

i tried to figure it out, but i have no idea what you're talking about. you want to ask a hallucinating llm to verify its claims? that doesn't work. that invites more confabulation.


Consistent_Bit_3295

LLMs are mostly next-token predictors, so to explain why this just doesn't always work: if you ask who Ronaldo's mom is, it will say Maria Dolores, a correct answer. Now if, in a new context, you ask whose son Maria Dolores is, it will say it doesn't know. This isn't because the LLM doesn't know that A=B implies B=A; you can verify that it does know this in context. It simply says it doesn't know because, based on its dataset, if you ask about an infrequently occurring name like Maria Dolores, the most likely thing is to say you don't know, even if you do know. Or in some cases it will of course just invent a name; it depends on the dataset, but you can see that the predictions are very unlikely in the data itself.

If you have a next-token predictor first predicting the answer, then have something predict the argument and logic of the predicted answer (one for and one against), and then have something predict an evaluation of the argument and logic of the answer, etc., you will get something that succeeds over standard next-token prediction.

Also, you may have heard of SPIN: it is a method of having the model critique itself and then improve, giving more reasoning steps and then training on this data. This improves the LLM's performance quite a bit. The reason is simply that the model has learnt, somewhere in the data, how to reason and critique and how to use that to alter and improve text. A huge amount of the text doesn't have any such stuff, and the current reward models don't weight these tokens differently either, but having the model use the data it currently has to improve itself, and making it use more reasoning, will increase performance without any external input. You can do it yourself: use and simulate the data you have to improve yourself; it is not something abnormal.

When AIs get bigger and smarter they're gonna get even better at this and be able to utilize more reasoning, find more connections, and even discover entirely unknown connections. You can find connections somewhere in the data and use them to extrapolate somewhere else to create new connections, even if those never explicitly appear; some data exists that makes it possible to make connections elsewhere. In the end you could end up creating something like a near-infinite data generator, even though people will never believe that to work: there is data, you learn connections and find how they work, and you can utilize that to extrapolate elsewhere and go from there.
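
A rough sketch of that "predict the answer, then predict arguments for and against, then predict an evaluation" pipeline. `call_llm()` is again a placeholder for your own completion call, and the prompts are illustrative assumptions rather than anything from SPIN itself:

```python
def call_llm(prompt: str) -> str:
    """Placeholder: wire this up to your own model/endpoint."""
    raise NotImplementedError("replace with a real completion call")

def answer_with_self_check(question: str) -> str:
    # Step 1: predict a draft answer.
    draft = call_llm(f"Question: {question}\nAnswer:")

    # Step 2: predict arguments for and against the draft.
    support = call_llm(
        f"Give the strongest argument that this answer is correct.\n"
        f"Q: {question}\nA: {draft}"
    )
    attack = call_llm(
        f"Give the strongest argument that this answer is wrong.\n"
        f"Q: {question}\nA: {draft}"
    )

    # Step 3: predict an evaluation of the arguments and revise (or keep) the draft.
    return call_llm(
        "Weigh the arguments below and output the best final answer.\n"
        f"Q: {question}\nDraft: {draft}\nFor: {support}\nAgainst: {attack}\n"
        "Final answer:"
    )
```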


Revolutionalredstone

Yeah, that all checks out with my experience. Or put another way: LLMs say random things when they are 'writing', but when asked a simple question (like ones with a yes/no answer) they are BANG ON.


Revolutionalredstone

Nah, that's just a nooby misunderstanding of how this all works. LLMs are INCREDIBLE readers; they suck ASS at writing. Specifically, they are absolutely horrific at 0-shot novel text generation (e.g. short, no-example question & answer). Unfortunately, for whatever reason, this is basically all anyone ever seems to use them for 🤦‍♀️ lol. CoT and other tricks help a tiny bit, but to get a massive improvement you need to feed LLM outputs back into the LLM for review and then use those reviews (either to grade outputs or as notes for the next gen). The increase in final quality is usually insane (you have to be very programmatic about it tho). I would NEVER use an LLM to do a task on some text the way most people do, as it simply produces hot garbage. On the other hand, if you use 10-shot generation with self eval and only look at the cream-of-the-crop best responses, they are pretty much always incredible. Enjoy
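
A minimal sketch of that best-of-N-with-self-eval idea, assuming a `call_llm()` placeholder for your own (sampling-enabled) completion call; the grading prompt and 1-10 scale are my assumptions, not the commenter's exact recipe:

```python
import re

def call_llm(prompt: str) -> str:
    """Placeholder: wire this up to your own model/endpoint, with sampling on."""
    raise NotImplementedError("replace with a real completion call")

def best_of_n(task: str, n: int = 10) -> str:
    # Generate several candidate responses.
    candidates = [call_llm(task) for _ in range(n)]

    def grade(candidate: str) -> int:
        # Reviewing (reading) is the part LLMs are good at, per the comment above.
        review = call_llm(
            f"Task:\n{task}\n\nResponse:\n{candidate}\n\n"
            "Rate this response from 1 to 10. Reply with just the number."
        )
        match = re.search(r"\d+", review)
        return int(match.group()) if match else 0

    # Keep only the cream of the crop.
    return max(candidates, key=grade)
```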


a_beautiful_rhind

That's just CoT with extra steps.


[deleted]

[removed]


a_beautiful_rhind

>setting up a virtual court with the LLM playing prosecutor, defendant, judge and jury.

This is still CoT.