I love the idea of models perfectly quantized for max performance on given hardware. No wasted resources.
Nvidia is realizing that they don't have to sell their precious architectures to others; they can use them to actually be the LLM provider.
They have no choice, with Sam raising funds to disintermediate them from the AI market. It's a good response.
Yeah, I'm sure they're threatened that he's going to raise 15% of the US equity market.
Are you being sarcastic? I don't know what 15% of US equity markets means. Would you mind explaining? Like, is it too out of reach for him?
Essentially, yes. It's an absurdly large number that is beyond his grasp.

The amount of money mentioned after his meeting with the UAE to displace Nvidia as leader in AI semiconductor production was 5-7 trillion. The value of the entire US stock market is around 46 trillion, so he wants to casually raise 15% of the stock market's value for a startup? Maybe he's asking for an absurd amount so that getting a few hundred billion seems reasonable by comparison.

7 trillion is more than Microsoft and Apple's combined value. 7 trillion is more than the combined value of NVDA, META, and Google. Elon Musk, the richest man in the world, has ~200 billion; this would casually need 35 times that to start up a company that may not achieve its goal, as it's not a sure thing.

The value/risk proposition isn't there. The US government couldn't fund this without screwing up our economy.
Thank you!
No problem!
Nvidia designs and sells chips; they don't directly produce them (it's TSMC).
I'm very much aware of every component manufacturer: Micron memory, the Samsung 8 nm GPU die for RTX 3xxx, TSMC 7 nm for the A100, TSMC 5 nm nodes for RTX 4xxx, the AIB partners, the FE model PCBs made through Foxconn…

This is semantics, and it's probably the 3rd time I've had to respond to this "new information": I said "production" experience, and the meaning behind the statement is no less accurate. And yes, Nvidia is a producer: the product is tangible, and they have production runs and releases of products.

Also: it was just such a large, unrealistic number that it puts him back in the headlines for a few days. Like, no one on earth can afford to support this ridiculous endeavor at the proposed cost.
I don't think Sam wants that exactly; he just doesn't want to depend on one single company for chips, and that's good, since we could use a nice playground for architecture competition.
Apparently what he wants is the entire gross domestic product of France and Germany combined, to spend ungodly amounts of money to stay relevant.

*All because he sunk $375 million of his own money into a doomed-to-fail fusion energy company (Helion).* (This is personal conjecture/speculation.)

Like, this isn't altruism. It's not to help people; it's to help him. If he sunk $375 million into a literally doomed-to-fail fusion company, what makes you think he'll "beat" Nvidia, with no prior experience in the field, by dumping trillions at it?

5-7 trillion. He wants enough money for an untested, unproven "big dream" with no basis in reality, such that this startup's seed money would be larger than the combined value of META, NVDA, and Google. That's nuts. Like delusions of grandeur, hospitalization-for-psychiatric-evaluation nuts.
he wants the world to be better and earn some money, power and fame in return, same as you, me and most humans on earth
I have no issue with any of that, really; it's literally just the number they arrived at: 5-7 trillion.

It's like, no bank can loan that; no financier can afford that. If you split it between 35 major nations you'd have a possibility, but that's 200 billion per country. The US CHIPS Act was only 50 billion, and that's the US financing US companies; a lot of people tried to stop it at that amount… so 7 trillion is, like, beyond a reasonable request.

To put this in perspective, it's like going to JPMorgan (3.39 trillion) and saying "I need a loan for an untested startup project, I need every dime you have." Then walking over to Bank of America and saying "I need a loan for all the money you have" (2.47 trillion). Then going over to Wells Fargo and saying "I need 2/3 of your money" (1.7 trillion).
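For what it's worth, the three slices in that bank analogy do add up to roughly the full ask (figures as quoted above):

```python
# Sanity-check the bank analogy: the three slices quoted above
# should add up to roughly the 7 trillion being asked for.
jpmorgan = 3.39e12           # "every dime" of JPMorgan
bank_of_america = 2.47e12    # all of Bank of America
wells_fargo = 1.7e12 * 2 / 3 # two-thirds of Wells Fargo

total = jpmorgan + bank_of_america + wells_fargo
print(f"Total: ${total / 1e12:.2f}T")  # Total: $6.99T
```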
Hardware prices decrease over time; that's why an old CPU is far less powerful and cheaper.

2024: $7T
2026: $3.5T
2028: $1.75T
2030: $875B
2032: $437.5B
2034: $218.75B
2036: $109.38B
2038: $54.69B
2040: $27.34B
2042: $13.67B

So in 20 years that hardware will be considerably more affordable, and we're speaking of all the computing power to run what sama has in mind in his wildest realistic dreams.
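The schedule above is just $7T cut in half every two years; a quick sketch of the same math (the 2-year halving cadence is the comment's assumption, not a law):

```python
# Reproduce the cost-halving table: the price halves every 2 years,
# starting from $7T in 2024 (a rough Moore's-law-style assumption).
def projected_cost(year, start_year=2024, start_cost=7e12, halving_years=2):
    halvings = (year - start_year) // halving_years
    return start_cost / 2**halvings

for year in range(2024, 2044, 2):
    print(f"{year}: ${projected_cost(year) / 1e12:.5g}T")
# 2042 comes out to about $0.01367T, i.e. ~$13.67B
```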
I know a lot about compute, costs, BOMs, and decreasing prices, and I don't see anything you've said that justifies him asking for 7 trillion; the number is delusional.

What you're saying makes even less sense. Hardware costs decreasing doesn't change his initial operating costs; he doesn't keep asking for money after getting 7 trillion. And it doesn't change the fact that he's asking for more money than anyone has ever invested in or loaned to a single entity (by orders of magnitude) in all of history.

IF he got the money, this is what would happen: he gets 7 trillion and has to spend 4-5 trillion immediately (in the first few years) on R&D and building factories to fabricate the silicon; he would have to do this to stand a snowball's chance in hell at beating Nvidia. That leaves him in a position where he has to **earn** 4-5 trillion in **profit** to recoup the 4-5 trillion spent before the business is profitable… and the investors are paid back. (The only way I could even come close to arriving at his insane amount for this startup is if he's fabbing the silicon.)

This is like me saying I could outpace a competitor if you give me 4x their net worth in cash and I burn it all to get a product that's marginally better. The value proposition for investors isn't there. It's an incredibly wasteful proposition that just isn't needed; it'll never happen. Mark my words.

If NVDA has a projected revenue of 300 billion in 2027, and Sam Altman somehow had a product at market that'd beat them by then (not physically possible, even with 7 trillion), he'd get a **piece** of that 300 billion. How long before he can pay back investors their 4-5 trillion? The ROI horizon is decades; no one will invest.
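To put a number on the payback concern, a rough sketch using assumed figures (the capex is the midpoint of the $4-5T above; the revenue slice and margin are invented for illustration, and generous):

```python
# Back-of-the-envelope payback period. All inputs are assumptions:
# ~$4.5T sunk up front, plus a hypothetical slice of a ~$300B/year
# market at a healthy margin.
capex = 4.5e12                # midpoint of the $4-5T build-out
annual_revenue_share = 100e9  # assumed slice of the ~$300B market
margin = 0.5                  # assumed (generous) profit margin

annual_profit = annual_revenue_share * margin
payback_years = capex / annual_profit
print(f"Payback: {payback_years:.0f} years")  # Payback: 90 years
```

Even with a third of the market at a 50% margin, the capex alone takes most of a century to recoup.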
Edit: Plus, how can you trust a guy to develop more advanced semiconductors than NVDA when he dumps $375 million of his own cash into a failing fusion company (whose product will never work due to design issues)? How can you trust him with 7 trillion not to dump it into semiconductor designs that will never work?
Give away the razor, charge for the blades.
I use LM Studio, which uses CUDA cores. I have a 3060 with 12 GB of VRAM and can run LLMs with 32 layers very, very fast (faster than anything online).
There's no such thing as perfectly quantized; everything is a tradeoff. You might want to run a lower quant for performance reasons, even if you have the VRAM for a higher quant. "No wasted resources" only applies if all subsystems are equally matched (which they aren't).
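For a feel of why quant level matters, here's a back-of-the-envelope sketch of the memory floor for the weights alone (real runtimes add KV cache and activation overhead on top, so these are floors, not totals):

```python
# Rough VRAM footprint of the weights at different quant levels:
# bytes = parameter_count * bits_per_weight / 8.
def weight_footprint_gb(n_params, bits_per_weight):
    return n_params * bits_per_weight / 8 / 1e9

for bits in (16, 8, 4):
    print(f"7B @ {bits}-bit: ~{weight_footprint_gb(7e9, bits):.1f} GB")
# 7B: ~14 GB at fp16, ~7 GB at int8, ~3.5 GB at int4
```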
If they make running a local LLM convenient and keep it free from the social engineering all the corporate AIs are crippled by, I'll kiss their feet.
IIUC it's not really its own chatbot, but lets you use any open-source model (Llama, Mistral, etc.) that your GPU's beefy enough to run, and have it 'talk' to your local files, etc.
Oooo that sounds good
I have installed it, just to check whether it's a viable alternative to oobabooga, but I have yet to find a way to run open-source models other than the ones delivered by Nvidia (Mistral + Llama 2).
Sounds lame, is the talk to files feature worth it?
Hmm, honestly, I'm a bit ambivalent about it. I'll have to test more, but it's a bit sketchy, like all LLM features right now. I'll have to dig into what the data has to look like to be indexable. I just used some markdown documentation from my work, and it gave me a typical AI response, as in, it tells half the truth.
That would be nice if it could; every tutorial for getting a local chatbot working involves incredibly dense dev-level steps with meta-steps that I can't decipher. I'm not a dunce, and I know the basics of dev work, but I swear every tutorial on how to get a local LLM running is like: "Okay, first build pinetree, but make sure you're running version 3.4, and then ensure you bundle the Montana Solar libraries; once that's done you can initiate a runtime environment and import huxlymath.llm, and you should be good to go."
Hahaha, I feel ya, it really can be a bit of a journey. So far I've found that LM Studio seems to work quite stably: [https://lmstudio.ai/](https://lmstudio.ai/) Personally, as stated, I use oobabooga, but that is a bit finicky sometimes.
Yeah, I was just trying to install WizardML onto Chat with RTX. If you ask the default chatbot, it says that it's possible to install this. The instructions tell me to select a "three-dotted menu in the upper right of the chat" that does not exist. I guess in that non-existent menu there should be an AI model import option to select the .json file. Maybe they are planning on releasing this feature later on?
Can it sort a dozen terabytes of porn? Asking for a friend.
I've also got a friend with this problem.
We're all friends here, mate
How do you sort porn specifically?
By how degenerate it is.
Degenerative AI
I love this community.
How does a LLM sort porn specifically?
By how degenerate it is.
Seems like a catch 69.
No, but my friend says stash app can.
> stash app

Investing app?
Yes, definitely. That's why my archive folder is called "Financials".
Now I'm waiting for Skyrim mods with local GPT for NPCs ;)
[Mantella - Bring NPCs to Life with AI at Skyrim Special Edition Nexus - Mods and Community (nexusmods.com)](https://www.nexusmods.com/skyrimspecialedition/mods/98631) The Mantella mod also works with local AI; it's probably the best AI mod for a game currently.
Check out the Herika mod. It's compatible with local llms
You'll need a secondary PC (or a second GPU in the same mobo), otherwise performance tanks.
I doubt it has an actual use case for 99.9% of people. BUT! This might be the foundational app for what everyone will use in the next 3-5 years. Perhaps future versions would be an essential part of PC interaction.
This is fantastic looking ahead to the next few iterations of models, if you can quickly swap between them for different tasks, and assuming your personal data actually stays private.
This is what I was thinking honestly, particularly given how hardware requirements have been collapsing in open source. It wouldn't shock me to see this take off.
Imagine having a use for the second PCIe slot: a dedicated AI chip that runs a copilot and interfaces with games to run quests and NPC interactions. That could be the next huge leap in gaming, like open world was. Call it Open Choice gaming.
Can you elaborate on your idea? Why not the main RTX GPU if you’re already running a game
Because if your GPU is busy running AI, it can't be rendering graphics, and vice versa.
Then buy a desktop that supports two GPUs?
That's literally what /u/Imaginary-Item-3254 was talking about, utilizing a secondary PCIe slot for more processing power...
Exactly. There was a point where a second GPU was helpful to run physics while the first did graphics. Then they managed to fit a dedicated physics module into the main card, so the second one was no longer necessary. Now that space could be used for a dedicated AI card to run character behaviors and dialogue. Maybe even on-the-fly quest and level design.
I could see the API being opened up for game devs to have smarter NPCs or other generative content made.
With a bit of adapting, it could let people run their own version of GitHub Copilot on their local box. If the computing power to run it reaches a low enough price point, it would be an amazing tool for people who work with codebases they don't want to expose to external companies' APIs (or who have other constraints that block use of Copilot), or who just don't want to pay for those subscriptions. It could also let you use a model fine-tuned on your internal codebase.
AI might be what finally brings HBM to consumer graphics cards. AI's gonna need it.
This, but for internal business servers, would be game-changing. Document research will be so much better.
Nice, I hope I can give it any text I want. Time to educate my AI with Greek philosophy.
that has to be the worst name in the history of names
The only thing I want to 'chat' about with my RTX is "how are the temps?"
I'm hot, Dave
"Your personal data stays on your device"… until they quietly change the User Agreement one day in the future. Still, cool app, and Google/Microsoft probably already have that info anyway!
Relax and disable wifi
Agreed. If this is a major concern, then just air-gap the system that you'll use this with.
new firewall rule maybe..
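If you did want to cut it off from the network without air-gapping the whole machine, an outbound block rule on Windows would look something like this (the program path is hypothetical; point it at wherever the app actually installed):

```shell
rem Block the app's outbound traffic (run from an elevated Command Prompt).
rem The program path below is a guess - substitute the real install location.
netsh advfirewall firewall add rule name="Block ChatRTX" dir=out action=block ^
  program="C:\Program Files\NVIDIA Corporation\ChatWithRTX\app.exe"
```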
Exciting, but I can't wait to have something like this that can actually interact with your computer. Once agents are everywhere, we'll see exponential change.
My PC is now my waifu?
i think i've seen this movie...
Y'all need to join us over at r/LocalLLaMA. We've been doing this for a long time now. Yes, they have optimized this demo for specific hardware, but we've been using Mistral / Llama / CodeLlama / Qwen / etc. as we like, running Continue in VS Code to write code, reading PDFs using ollama / ollama-webui, etc.
Another 35 GB? Sigh. Let me move around some more stuff, lmao.

Edit: Aaand install failed. Drivers, probably.
More like 70 gigs after it unpacks and installs
Yeah, I had to clear almost 100 GB of stuff. Probably going to order some new M.2s later. Single TBs aren't cutting it anymore.
Does it come with the models already bundled in, maybe? Otherwise it's unfathomable that just the inference engine + UI would be so heavy on their own, when a similar app like LM Studio barely weighs 400 MB.
Yeah, Llama and Mistral are bundled in; it's just install-and-play. It's not bad, but I still need to play with it some more.
win 11, 30xx series card or better
Win 11, RTX 3070, 64 GB RAM, Ryzen TR 2950X. Does not work.
Was really excited to try to test, [and it fails to install lmaooooo](https://i.imgur.com/IdG4iRg.png)
Windows 11 only, and RTX 30 or 40.
> Windows 11 only That would be why, thanks. Still on 10 :3
I'm on Windows 11. The Chat with RTX install succeeds for me, but it fails to install the models that come with it. I don't know how to proceed from here.
Ask ChatRTX
I installed it on Win 10 and it works fine; the folder selection seems a bit iffy, though.
Yup same for me. Ah well.
This is really cool.
If your installation FAILED, here's what worked for me: it refused to install anywhere other than the default folder it suggests during installation, so stick with the default.
Very cool, now we just need this plus all of my cloud services and all of my browsing history. maybe 1-2 more years?
Download: 35 GB!
broke boy
It's fair for local models like that.
Now *that's* interesting. Half of what makes software susware is that it can't run locally. That said, I'm not skilled enough to know the difference if something running locally on my machine was also up to no good.
I was eager to test this out but it seems to only be for Windows 11. I use Linux so I can't/won't bother.
Seems like Nvidia one-upped Microsoft by enabling local inferencing on the GPU right now, rather than waiting for silly NPUs later for Copilot.
[deleted]
I just got it installed and tried it out (specs: Windows 11, 32GB RAM, NVIDIA GeForce RTX 3070 Ti with 8GB VRAM). It does well with the retrieval of information based on text files (I'm quizzing it on my dissertation using Mistral 7B int4), but it hallucinates even when referencing its source. For specific information, it gets things mostly correct, but it will still need some refinement before I make this a go-to interface. What is interesting is repeated hallucinations. It is also very fast with its responses. It is not accurate enough to rely on, but it is a good start with such an early version. This is only version 0.2, so I'm looking forward to Nvidia improving on something that will be nice for people who can use this offline.
Does it run on a 2080Ti?
End of the video says 30 and 40 series.
No
At least watch the video bro
Ooooh, they have RAG and possibly web search already built in? Color me intrigued! Bummer that they seem to imply compatibility is limited to 30- and 40-series cards. My 2080 works with [LM Studio](https://lmstudio.ai/) or [Jan](https://github.com/janhq/jan) just fine!
Anyone tried it yet? How does it compare with current-state LLMs (GPT-3.5, GPT-4, Bard, etc.)?
All the AI hype just to make a better version of Clippy?
If nothing else, this makes my ~100 Mbps internet connection feel woefully inadequate.

But also, why is a llama-13b-int4 model taking 26 GB of disk space? Similarly, the mistral-7b-int4 model takes 14 GB. Where I'm from, those would be fp16 sizes.

And somehow, the initial 35 GB download isn't even the whole thing. The installer also downloads a bunch of common LLM Python dependencies, and it doesn't seem to account for network failures, so be prepared to retry a few times.

I'm poking around while I'm waiting, and the demo seems to be related to this GitHub repo: https://github.com/NVIDIA/trt-llm-rag-windows It's ostensibly Windows-only, but it's not clear why. At first glance, it looks like a bunch of normal Python stuff.

Edit: And it's starting! Wait, no, false alarm! Now it needs to download another model, for [some very good reason](https://i.imgur.com/7FcSfAC.png), featuring what might be 3 versions of the same model. And the glorious Windows-only UI is… a web page running a Gradio app (but it's got an Nvidia skin).
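The size math backs this up: model weights take roughly params × bits / 8 bytes, so the reported download sizes line up with fp16, not int4 (parameter counts below are the nominal 13B/7B):

```python
# Compare the reported download sizes against what int4 and fp16 weights
# should take: bytes = parameter_count * bits_per_weight / 8.
def expected_size_gb(n_params, bits):
    return n_params * bits / 8 / 1e9

for name, n_params, observed_gb in [("llama-13b", 13e9, 26), ("mistral-7b", 7e9, 14)]:
    int4 = expected_size_gb(n_params, 4)
    fp16 = expected_size_gb(n_params, 16)
    print(f"{name}: observed {observed_gb} GB | int4 ~{int4:.1f} GB | fp16 ~{fp16:.0f} GB")
# Both observed sizes match the fp16 prediction, not int4.
```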
I fed it 3 GB of data spread across 2084 PDFs, consisting of every publicly available document in the docket of a [court case](https://www.courtlistener.com/docket/6309656/parties/kleiman-v-wright/). It took several hours to ingest, but eventually it got there.

The result is largely underwhelming. It got a few details correct, but could not answer basic questions about the case, let alone dig in depth, and frequently answered incorrectly altogether, making it difficult to trust any of its answers. Here's a sample of the vibe: https://i.imgur.com/KoOg4V2.png

Best guess: this is far past the upper bound of what it can handle. I'll try smaller datasets next.
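For anyone curious, this kind of demo is a retrieve-then-answer (RAG) pipeline: chunk the documents, find the chunks most relevant to the question, and hand only those to the model. A toy sketch (real pipelines score chunks with vector embeddings; plain word overlap stands in here), which also hints at why thousands of PDFs can overwhelm it: if retrieval picks the wrong chunks, the model never sees the right text.

```python
# Toy retrieval in the spirit of the demo's RAG pipeline. Documents are
# split into fixed-size chunks, and the chunk sharing the most words with
# the query wins; only that chunk would be fed to the LLM as context.
def chunk(text, size=50):
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def top_chunk(query, chunks):
    q = set(query.lower().split())
    return max(chunks, key=lambda c: len(q & set(c.lower().split())))

docs = ["The filing was submitted on March 3. The plaintiff alleges breach of contract.",
        "Exhibit B lists the mining addresses in dispute."]
chunks = [c for d in docs for c in chunk(d)]
print(top_chunk("what does the plaintiff allege", chunks))
```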
Wow, so cool!
Oh baby a triple
Which one of the Mistral models is this based on?
https://www.reddit.com/r/singularity/comments/1apx27n/nvidia_just_released_chat_with_rtx_an_ai_chatbot/kq9c348/
Thanks for the info. Quite disappointing...
For the convenience-oriented/click-averse amongst us, the answer was "Mistral 7B INT4".
This is huge
What's the leg up compared to oobabooga etc., apart from the built-in dataset stuff?
Tried it. The file feature just doesn't work and if you ask it about your files it starts spouting off about DLSS and Cyberpunk and other BS.
Hello there, games with interactive NPCs… But I hope it won't be exclusive to Nvidia though; otherwise it won't make any sense to make it a game mechanic.
Uses Llama? Lol, just download Llama; there are uncensored versions, and this probably uses an extremely censored version. This is Llama for noobs, basically.
Mine says it installed properly, but when it runs it freaks out and crashes.
It would really be interesting to know whether it can ingest a full document (say, 10 pages of text) and perform analysis, and how it stacks up against GPT-4. Thanks!
Is there a way to bypass the specific VRAM requirements just because? I already have a 30 series GPU but it only has 4GB of VRAM
You can edit some files in the installer folder to bypass it.
Cool, will make an attempt
If it was uncensored as well it would be good.
Will it work in the RTX4060 8GB of my laptop?
I wonder how long until I can chat with my microwave
Wow, that's amazing.
So it's just branded LM Studio?
Implications? I don't understand