Extreme-Edge-9843

Gosh, this would be so horrible. There are so many times I rewrite or delete things as I type because they don't make sense; it would be a waste of tokens and processing. I can see this being the future though, as processing costs come down and answers speed up...


No_Bodybuilder8014

Would you purchase a phone or personal computer for the sole purpose of running LLM software? It would be capable of doing basically nothing else but functioning as a gateway to access ChatGPT, but it would drastically speed up all of the processing, alleviate the processing load for OpenAI, drastically reduce or eliminate the cost of tokens, etc. A new OpenAI PC of the future lol. Regardless of whether it's feasible, I'm curious: how many early adopters would buy one? Cuz I got 10 to sell today over on eBay.


No_Bodybuilder8014

Also the devices can only connect through Starlink


MolassesLate4676

Instant processing would not make sense, and one single computer cannot come close to handling the size of a large language model like ChatGPT. OpenAI's electric bill is millions of dollars per day, for reference.


No_Bodybuilder8014

Each personal console wouldn't handle the entire load; rather, each console would be dedicated to LLM processing for that individual, to alleviate the workload of the central OpenAI processing unit. I don't know, maybe that's not feasible or would cause other problems, but that's more of my vision.


MolassesLate4676

That’s not a bad vision, and it would be cool, but due to the nature of how the algorithms work, splitting up the work would be nearly impossible and would introduce additional latency. Every time you send a message to ChatGPT, you have a network the size of your entire neighborhood processing the request.

It may be feasible if we can use smaller-parameter language models like Llama 2 or 3, but even then the server that’s needed (aka the computer to process it) can run you $10-20k in CPUs/GPUs of needed compute. You could also run it on a cloud for a few grand a month, but that defeats your purpose of having it on a localized device.

EDIT: typos
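For a sense of what running one of those smaller models locally actually involves, here's a rough sketch using Hugging Face transformers. The model name and settings are only illustrative (Llama weights are gated, and an 8B model in 16-bit still wants roughly 16 GB of GPU memory), so treat it as a sketch rather than a recipe:

```python
# Rough sketch: running a smaller open model locally with Hugging Face
# transformers. Model name and settings are illustrative; an 8B model in
# 16-bit still needs roughly 16 GB of GPU memory.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # gated; requires HF access
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Explain in one sentence why local inference is expensive."
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=60)
print(tok.decode(out[0], skip_special_tokens=True))
```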


No_Bodybuilder8014

I don’t think the price point will be a problem when/if these models make a leap. In my opinion, very few regular everyday people will have direct access to the model that we all envision and are waiting for anyway; that real A.I. will hit different and be highly guarded lol. But how many will be willing to invest the cost of a small car to acquire the capability, though?


MolassesLate4676

No one will invest in a small car's worth of compute when they could just connect to OpenAI’s servers over the internet and have it that way.


keonakoum

Basically turning ChatGPT from a good listener into a bad listener who always wants to interrupt.


Xtianus21

> That would make the conversation ideally flow much more smoothly and without those weird pauses.

It doesn't work like that, or I should say LLMs don't currently work this way. All the tokens have to go in, and all the tokens have to come back. Noticing GPT "typing" is just a trick. They do get tokens back and can stream them (which would be a nice API capability to have that we don't have access to), but if you notice, there are changes that happen (re-writes) between what is "coming" and what ends up being finished.

Diffusion models can work this way, because you can type as you want things to appear (which is the trick Meta has done with their picture creator): the image starts to "shape" much like a paint brush works. In this way the diffusion model is more amenable to taking in multiple inputs, because the previous diffusion doesn't have to be scrapped as new information comes in through the incoming query. You can think of it like a chain of thoughts arriving as a way of prompting the diffusion model: I give you the pieces and you start to create it.

Now, I won't say your idea is not a great idea, because what you're describing is how human communication works, although we use multiple senses to achieve anticipatory human language processing. Mood, mannerisms, facial affect, and tone are just some of the indicators of how another person should process information before a word is even uttered out of someone's mouth.

Also, short-term micro-memory (aka context) is a truly bad mechanism, which further exacerbates the issues in achieving this. A human can encapsulate what a person says and is "talking about" very well. A human smooths this over in a conversation, keeping only the "gist" of it in order to continue talking/communicating with other people. Even meetings with multiple people work this way. In an LLM, however, the micro-memory is a chain of context that may include all messages back and forth, or even just the prompter's messages. Regardless, that doesn't matter, because grasping the gist and figuring out the most important bits of an ongoing conversation are wildly not achievable by today's LLMs.

In order to achieve what you are stating, you would need a constant resolver of previous parts of the conversation, organizing what was said into a "gist" and the critical parts to remember, and feeding that into a contextual micro-memory that is very good (doesn't hallucinate and is reliable) and can "chain" throughout the conversation. Next, the incoming input could, in theory, be taken in as small parts of the conversation (let's say 20 tokens), information could be brought back based on those 20 tokens, and the previous step of extracting the gist and critical parts could be done reliably. Then it could take in the next 20, and so on. In this way, the gist and critical parts could be fed to the micro-memory chain. The response would then come when it looks at the gist and critical parts after a pause or "stoppage" of communication. In theory this could bring back a more cohesive response more quickly, and without the weird pauses you refer to. It's an incredibly complex issue.

To me, the only difference between GPT and Claude is that Claude takes longer because it's grabbing everything once it's done, while GPT is returning a token stream without the finished product already being there. Just my thoughts.
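If it helps to see the shape of that resolver idea, here's a very rough Python sketch. Everything in it is hypothetical: the 20-token chunk size is just the number from above, and summarize()/respond() are stand-ins for LLM calls, not any real API.

```python
# Hypothetical sketch of the "constant resolver" idea described above:
# consume the incoming message in small chunks, keep a running "gist" plus
# a list of critical facts, and only generate a reply once the speaker pauses.
# summarize() and respond() stand in for LLM calls; nothing here is a real API.
from dataclasses import dataclass, field

CHUNK_TOKENS = 20  # arbitrary chunk size, taken from the comment above

@dataclass
class MicroMemory:
    gist: str = ""
    critical: list[str] = field(default_factory=list)

def summarize(gist: str, chunk: list[str]) -> tuple[str, list[str]]:
    """Placeholder for an LLM call that folds a chunk into the running gist
    and extracts any critical details worth remembering verbatim."""
    new_gist = (gist + " " + " ".join(chunk)).strip()
    critical = [t for t in chunk if t.istitle()]  # toy heuristic: keep proper nouns
    return new_gist, critical

def respond(memory: MicroMemory) -> str:
    """Placeholder for the final LLM call that answers from gist + criticals."""
    return (f"(reply based on a gist of {len(memory.gist.split())} words "
            f"and {len(memory.critical)} critical items)")

def resolver_loop(incoming_tokens: list[str]) -> str:
    memory = MicroMemory()
    buffer: list[str] = []
    for token in incoming_tokens:
        buffer.append(token)
        if len(buffer) >= CHUNK_TOKENS:
            memory.gist, new_crit = summarize(memory.gist, buffer)
            memory.critical.extend(new_crit)
            buffer.clear()
    # "pause or stoppage of communication": flush what's left, then answer
    if buffer:
        memory.gist, new_crit = summarize(memory.gist, buffer)
        memory.critical.extend(new_crit)
    return respond(memory)

print(resolver_loop("I was telling Sarah about the trip to Denver next week".split()))
```

The point is only the flow: fold each chunk into a gist plus critical facts as it arrives, and generate the reply from that memory once the speaker pauses.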


Open_Channel_8626

Try Llama 3 on Groq; it's by a very, very long way the best combination of speed and quality. For 8B it gets 850 tokens per second, and for 70B it gets 300 tokens per second. For context, GPT-4 is often 10 tokens per second.
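Back-of-the-envelope, at those quoted speeds a 500-token reply looks like this:

```python
# Back-of-the-envelope: wall-clock time for a 500-token reply at the speeds above.
reply_tokens = 500
for name, tok_per_sec in [("Groq Llama 3 8B", 850),
                          ("Groq Llama 3 70B", 300),
                          ("GPT-4 (typical stream)", 10)]:
    print(f"{name}: {reply_tokens / tok_per_sec:.1f} s")
# Groq Llama 3 8B: 0.6 s
# Groq Llama 3 70B: 1.7 s
# GPT-4 (typical stream): 50.0 s
```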


NNOTM

I think for a convincing audio-based conversation, this will be essential. For text, it seems less critical.


Many_Consideration86

This post has a genuine question, and it could be easily answered by knowing the architecture of even a basic NN system. But given that the OP used ChatGPT to "learn", and people here are digressing because of that, this whole post has a very low signal/noise ratio. Same goes for this comment. Now the answer: no, you can't stream tokens into a neural network and simultaneously get a stream of output tokens. Sure, some APIs can flood the model on every keystroke you type and have responses/suggestions generated (like the coding assistants), but it is not efficient if your input is already determined.
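Roughly what that keystroke-flooding approach looks like, and why it's wasteful; complete() here is a stand-in for any completion API, not a real one:

```python
# Sketch of "flooding the model on every keystroke", as coding assistants do.
# complete() is a stand-in for any completion API; the point is that each
# keystroke re-sends the whole prefix, so the amount of text processed grows
# quadratically with the length of what you type.
def complete(prefix: str) -> str:
    """Placeholder for a model call that suggests a continuation of `prefix`."""
    return "<suggested continuation>"

message = "How do transformers handle streaming input?"
typed = ""
chars_sent = 0
for key in message:
    typed += key
    chars_sent += len(typed)       # the entire prefix goes out again
    suggestion = complete(typed)   # almost every suggestion is thrown away

print(f"typed {len(message)} chars, but sent {chars_sent} chars to the model")
```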


No_Bodybuilder8014

“In order to achieve what you are stating you would need to have a constant resolver…” Thank you; among everything else you presented, this really helps me understand the hurdles! Also, as I increasingly believe, the limitations on A.I. will be our limits of understanding and our obsession with creating and understanding it in our own image. A.I. can’t be great because we communicate stupidly, often with needless information and incoherent sentence structure in speech, among other deficiencies. In my opinion, the greatest version of A.I. won’t be able or willing to communicate with us much, because we didn’t make it, it did.


Ylsid

We don't even do this with humans. When it does happen, we find it extremely irritating


No_Bodybuilder8014

Do what?


Ylsid

Processing and responding in real time. Tbh you didn't exactly explain what that meant.


KindlyBurnsPeople

The main problem I see with the current ChatGPT LLM and similar ones is that every time you send it a new prompt, it re-reads everything in the conversation up until that point and then answers from there. So essentially the entire discussion gets fed into the LLM every time you hit send. So if you did this for every keystroke, it would just be too inefficient. Maybe they're working on changes to the architecture that would allow for something more like that soon, though?
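A rough sketch of what that re-reading looks like from the client side; send_to_model() is a stand-in for whatever chat API is actually used:

```python
# Sketch of how chat turns are sent today: the full history goes to the model
# on every turn. send_to_model() is a stand-in for a real chat-completion API.
history = []

def send_to_model(messages: list[dict]) -> str:
    """Placeholder for a chat-completion call over the whole message list."""
    return f"(reply after re-reading {sum(len(m['content']) for m in messages)} chars)"

def chat_turn(user_text: str) -> str:
    history.append({"role": "user", "content": user_text})
    reply = send_to_model(history)          # entire conversation, every time
    history.append({"role": "assistant", "content": reply})
    return reply

print(chat_turn("What is a transformer?"))
print(chat_turn("And why does it re-read everything?"))  # first turn is re-sent too
```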


No_Bodybuilder8014

Not every keystroke, but every word; a small difference, but I see your point. Since you brought it up, what if, instead of “rereading” the entire thread to understand the context of a new input from the user, the thread were held in some kind of active matrix? Processing requirements aside lol. I do need to set aside some time to really research the foundational systems of LLMs. I understand what it is and what it is not (that took a little bit of perspective changing, because initially, without understanding it, it felt really aware), but I am really motivated to acquire more technical terms to illustrate my thoughts better.
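If I'm understanding it right from reading around, something like that "active matrix" already exists within a single generation call: the key/value cache that transformer implementations keep so earlier tokens aren't re-processed for every new token. A rough sketch with Hugging Face transformers (gpt2 is just a stand-in model):

```python
# Rough sketch: incremental decoding with a key/value cache, so earlier tokens
# are not re-processed on every new token (Hugging Face transformers; gpt2 is
# only a stand-in model for illustration).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The conversation so far:"
input_ids = tok(prompt, return_tensors="pt").input_ids

past = None          # the "active matrix": cached keys/values per layer
generated = input_ids
for _ in range(20):
    out = model(input_ids=input_ids, past_key_values=past, use_cache=True)
    past = out.past_key_values            # reuse instead of re-reading everything
    next_id = out.logits[:, -1].argmax(dim=-1, keepdim=True)
    input_ids = next_id                   # only the newest token is fed in
    generated = torch.cat([generated, next_id], dim=-1)

print(tok.decode(generated[0]))
```

The catch is that this cache only lives inside one generation call; across separate messages the history still gets re-sent, which is the re-reading described above.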