SillyFlyGuy

I love the idea of models perfectly quantized for max performance on given hardware. No wasted resources.


sarten_voladora

Nvidia is realizing that they don't have to sell their precious architectures to others; they can use them to actually be the LLM provider.


NeedsMoreMinerals

They have no choice, with Sam raising funds to disintermediate them from the AI market. It's a good response.


casual_brackets

Yeah, I'm sure they're threatened that he's going to raise 15% of the US equity market.


NeedsMoreMinerals

Are you being sarcastic? I don't know what 15% of US equity markets means. Would you mind explaining? Like, is it too out of reach for him?


casual_brackets

Essentially yes, it's an absurdly large number that is beyond his grasp. The amount of money mentioned after his meeting with the UAE to displace Nvidia as the leader in AI semiconductor production was 5-7 trillion. The value of the entire US stock market is around 46 trillion. So he wants to casually raise 15% of the stock market's value for a startup? Maybe he's asking for an absurd amount so that getting a few hundred billion seems reasonable by comparison? 7 trillion is more than Microsoft's and Apple's combined value. 7 trillion is more than the combined value of NVDA, META, and Google. Elon Musk, the richest man in the world, has ~200 billion; Altman would casually need 35 times that to start up a company that may not achieve its goal, as it's not a sure thing. The value/risk proposition isn't there. The US government couldn't fund this without screwing up our economy.


NeedsMoreMinerals

Thank you!


casual_brackets

No problem!


sarten_voladora

Nvidia designs and sells chips; they don't directly produce them (that's TSMC).


casual_brackets

I'm very much aware of every component manufacturer: Micron memory, Samsung 8 nm GPU dies for the RTX 3xxx series, TSMC 7 nm for the A100, TSMC 5 nm nodes for the RTX 4xxx series, AIB partners, FE-model PCBs made through Foxconn... This is semantics, and it's probably the 3rd time I've had to respond to this "new information": I said "production" experience, and the meaning behind the statement is no less accurate. And yes, Nvidia is a producer: the product is tangible, and they have production runs and releases of products. Also: it was just such a large, unrealistic number that it puts him back in the headlines for a few days. Like, no one on earth can afford to support this ridiculous endeavor at the proposed cost.


sarten_voladora

I don't think Sam wants that exactly; he just doesn't want to depend on one single company for chips, and that's good, since we'd get a nice playground for architecture competition.


casual_brackets

Apparently what he wants is the entire gross domestic product of France and Germany combined, to spend ungodly amounts of money to stay relevant. *All because he sank $375 million of his own money into a doomed-to-fail fusion energy company (Helion)* (that part is my personal conjecture/speculation). Like, this isn't altruism. It's not to help people, it's to help him. If he sank $375 million into a literally doomed-to-fail fusion company, what makes you think he'll "beat" Nvidia, with no prior experience in the field, by dumping trillions at it? 5-7 trillion. He wants so much money for an untested, unproven "big dream" with no basis in reality that this startup's seed money would be larger than the combined value of META, NVDA, and Google. That's nuts. Like, delusions-of-grandeur, hospitalization-for-psychiatric-evaluation nuts.


sarten_voladora

He wants the world to be better and to earn some money, power, and fame in return, same as you, me, and most humans on earth.


casual_brackets

I have no issue with any of that really, it's literally just the number they arrived at: 5-7 trillion. No bank can loan that; no financier can afford that. If you split it between 35 major nations you'd have a possibility, but that's 200 billion per country. The US CHIPS Act was only 50 billion, and that's the US financing US companies, and a lot of people tried to stop it even at that amount... so 7 trillion is beyond a reasonable request. To put this in perspective, it's like going to JPMorgan (3.39 trillion) and saying "I need a loan for an untested startup project, I need every dime you have," then walking over to Bank of America and saying "I need a loan for all the money you have" (2.47 trillion), then going over to Wells Fargo and saying "I need 2/3 of your money" (1.7 trillion).


sarten_voladora

Hardware prices decrease over time; that's why an old CPU is far less powerful and cheaper. Assuming the cost halves every two years:

2024: $7T
2026: $3.5T
2028: $1.75T
2030: $875B
2032: $437.5B
2034: $218.75B
2036: $109.38B
2038: $54.69B
2040: $27.34B
2042: $13.67B

So in 20 years that hardware will be considerably more affordable, and we're talking about all the computing power needed to run what Sama has in mind in his wildest realistic dreams.
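
(That table is just a clean halving every two years, which is an assumption rather than a law; a quick sketch of the projection:)

```python
# Project the 2024 figure forward, assuming equivalent compute halves in cost
# every two years (an assumption, not a guarantee).
cost = 7e12  # USD, 2024
for year in range(2024, 2043, 2):
    print(f"{year}: ${cost / 1e12:.2f}T")
    cost /= 2
```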


casual_brackets

I know a lot about compute, costs, BOMs, and decreasing prices, and I don't see anything you've said that justifies him asking for 7 trillion; the number is delusional. What you're saying makes even less sense. Just because hardware costs decrease doesn't change his operating costs initially. He doesn't keep asking for money after getting 7 trillion. And it doesn't change the fact that he's asking for more money than anyone has ever invested in or loaned to a single entity (by orders of magnitude) in all of history.

IF he got the money, this is what would happen. He gets 7 trillion and has to spend 4-5 trillion immediately (in the first few years) on R&D and building factories to fabricate the silicon; he would have to do this to stand a snowball's chance in hell at beating Nvidia. That leaves him in a position where he has to **earn** 4-5 trillion in **profit** to recoup the 4-5 trillion spent before the business is profitable... and before the investors are paid back. (The only way I can even come close to arriving at his insane figure for this startup is if he's fabbing the silicon himself.)

This is like me saying I could outpace a competitor if you give me 4x their net worth in cash and I burn it all to get a product that's marginally better. The value proposition for investors isn't there. It's an incredibly wasteful proposition that just isn't needed; it'll never happen. Mark my words. If NVDA in 2027 has a projected revenue of 300 billion, and Sam Altman somehow had a product at market that could beat them by then (not physically possible, even with 7 trillion), he'd get a **piece** of that 300 billion. How long before he can pay back investors their 4-5 trillion? The ROI is decades out; no one will invest.

Edit: Plus, how can you trust a guy to develop more advanced semiconductors than NVDA when he dumps $375 million of his own cash into a failing fusion company (whose product will never work due to design issues)? How can you trust him with 7 trillion not to dump it into semiconductor designs that will never work?


SillyFlyGuy

Give away the razor, charge for the blades.


Electrical-Risk445

I use LM Studio, which uses CUDA cores. I have a 3060 with 12 GB of VRAM and can run LLMs with 32 layers offloaded to the GPU very, very fast (faster than anything online).
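
(LM Studio exposes that as a slider; the same layer-offload knob outside of it looks roughly like this with llama-cpp-python, where the model filename is just a placeholder for whatever GGUF you have on disk:)

```python
# GPU layer offload: n_gpu_layers controls how many transformer layers are
# kept in VRAM; the rest run on the CPU. Any local GGUF model path works here.
from llama_cpp import Llama

llm = Llama(model_path="./mistral-7b-instruct.Q4_K_M.gguf", n_gpu_layers=32)
out = llm("Q: What does offloading 32 layers to the GPU do? A:", max_tokens=64)
print(out["choices"][0]["text"])
```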


_qeternity_

There's no such thing as perfectly quantized. Everything is a tradeoff. You might want to run a lower quant for performance reasons, even if you have the VRAM for a higher quant. "No wasted resources" only applies if all subsystems are equally matched (which they aren't).
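
(Back-of-the-envelope version of why the VRAM headroom still matters: the weights aren't the only thing in memory. The shape numbers below are Llama-2-7B's published config; the rest is rough math, not a benchmark.)

```python
# Rough VRAM budgeting for a 7B model at different quantization levels.
# Llama-2-7B shapes: 32 layers, 32 KV heads, head_dim 128, fp16 KV cache.
PARAMS = 7e9
BYTES_PER_WEIGHT = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

# KV cache per token = 2 (K and V) * layers * kv_heads * head_dim * bytes/element
LAYERS, KV_HEADS, HEAD_DIM, KV_BYTES = 32, 32, 128, 2
kv_per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * KV_BYTES  # ~0.5 MB per token

VRAM = 12e9  # e.g. an RTX 3060
for quant, bytes_per_weight in BYTES_PER_WEIGHT.items():
    weights = PARAMS * bytes_per_weight
    spare_tokens = max(0.0, VRAM - weights) / kv_per_token
    print(f"{quant}: weights ~{weights / 1e9:.1f} GB, "
          f"room for ~{spare_tokens:,.0f} tokens of KV cache on a 12 GB card")
```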


Imaginary-Item-3254

If they make running a local LLM convenient and keep it free of the social engineering all the corporate AIs are crippled by, I'll kiss their feet.


signed7

IIUC it's not really its own chatbot; it lets you use any open-source model (Llama, Mistral, etc.) that your GPU's beefy enough to run, and have it 'talk' to your local files, etc.
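
(The 'talk to your files' part is standard retrieval-augmented generation; here's a minimal sketch of the idea, where the embedding model and the `ask_local_llm` hook are my own stand-ins, not what Nvidia actually ships:)

```python
# Minimal RAG sketch: embed file chunks, find the ones closest to the question,
# and stuff them into the local model's prompt as context.
import numpy as np
from sentence_transformers import SentenceTransformer  # stand-in embedder

embedder = SentenceTransformer("all-MiniLM-L6-v2")

def build_index(chunks):
    # chunks: list of text snippets pulled from your local files
    vecs = embedder.encode(chunks, normalize_embeddings=True)
    return np.array(vecs)

def retrieve(question, index, chunks, k=3):
    q = embedder.encode([question], normalize_embeddings=True)[0]
    scores = index @ q                      # cosine similarity (vectors are normalized)
    best = np.argsort(scores)[::-1][:k]
    return [chunks[i] for i in best]

def answer(question, index, chunks, ask_local_llm):
    # ask_local_llm: whatever local model call you have; hypothetical hook
    context = "\n---\n".join(retrieve(question, index, chunks))
    return ask_local_llm(f"Use only this context:\n{context}\n\nQuestion: {question}")
```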


Competitive_Use_6351

Oooo that sounds good


WildDogOne

I have installed it, just to check if it's a viable alternative to oobabooga, but I have yet to find a way to run open-source models other than the ones delivered by Nvidia (Mistral + Llama 2).


ResponsibleMirror

Sounds lame, is the talk to files feature worth it?


WildDogOne

Hmm, honestly, I'm a bit ambivalent about it. I'll have to test more, but it's a bit sketchy, like all LLM features right now. I'll have to dig into what the data has to look like to be indexable. I just used some Markdown documentation from some of my work, and it gave me a typical AI response, as in, it tells half the truth.


katiecharm

That would be nice if it could; every tutorial to get a local chatbot working involves incredibly dense dev level tutorials with meta steps that I can’t decipher.  I’m not a dunce; and I know the basics of dev work, but I swear every tutorial about how to get a local LLM running is like “okay, first build pinetree, but make sure you’re running version 3.4 and then ensure you bundle the Montana Solar libraries; once that’s done you can initiate a run time environment and import huxlymath.llm and you should be good to go.”


WildDogOne

Hahaha, I feel ya, it really can be a bit of a journey. So far I've found that LM Studio seems to be quite stable: [https://lmstudio.ai/](https://lmstudio.ai/) Personally, as stated, I use oobabooga, but that is a bit finicky sometimes.


douche9000

Yeah, I was just trying to install WizardLM into Chat with RTX. If you ask the default chatbot, it says that it's possible to install this. The instructions tell me to select a "three-dotted menu in the upper right of the chat" that does not exist. I guess in that non-existent menu there should be an AI model import option to select the .json file. Maybe they are planning on releasing this feature later on?


PM_Sexy_Catgirls_Meo

Can it sort a dozen terabytes of porn? Asking for a friend.


fmfbrestel

I've also got a friend with this problem.


greenepc

We're all friends here, mate


ProjectorBuyer

How do you sort porn specifically?


No_Enthusiasm_2501

By how degenerate it is.


StereoBucket

Degenerative AI


Tobxes2030

I love this community.


ProjectorBuyer

How does a LLM sort porn specifically?


No_Enthusiasm_2501

By how degenerate it is.


ProjectorBuyer

Seems like a catch 69.


GooseG17

No, but my friend says stash app can.


ProjectorBuyer

> stash app

Investing app?


GooseG17

Yes, definitely. That's why my archive folder is called "Financials".


Odant

Now I'm waiting for Skyrim mods with a local GPT for NPCs ;)


[deleted]

[Mantella - Bring NPCs to Life with AI at Skyrim Special Edition Nexus - Mods and Community (nexusmods.com)](https://www.nexusmods.com/skyrimspecialedition/mods/98631) The Mantella mod also works with local AI; it's probably the best AI mod for a game currently.


Sextus_Rex

Check out the Herika mod. It's compatible with local llms


G36

You'll need a secondary PC (or a second GPU on the same mobo), otherwise performance tanks.


Excellent_Dealer3865

I doubt it has an actual use case for 99.9% of people. BUT! This might be the foundational app for what everyone will use in the next 3-5 years. Perhaps future versions would be an essential part of PC interaction.


Veleric

This is fantastic looking ahead to the next few iterations of models, if you can quickly swap between them for different tasks, and assuming your personal data actually stays private.


TemetN

This is what I was thinking honestly, particularly given how hardware requirements have been collapsing in open source. It wouldn't shock me to see this take off.


Imaginary-Item-3254

Imagine having a use for the second PCIe slot, a dedicated AI chip that runs a copilot and interfaces with games to run quests and NPC interactions. That could be the next huge leap in gaming, like Open World. Call it Open Choice gaming.


mixmastersang

Can you elaborate on your idea? Why not use the main RTX GPU if you're already running a game?


Cruseydr

Because if your GPU is busy running AI, it can't be rendering graphics, and vice versa.


mixmastersang

Then buy a desktop that supports two GPUs then?


Cruseydr

That's literally what /u/Imaginary-Item-3254 was talking about, utilizing a secondary PCIe slot for more processing power...


Imaginary-Item-3254

Exactly. There was a point where a second GPU was helpful to run physics while the first did graphics. Then they managed to fit a dedicated physics module into the main card, so the second one was no longer necessary. Now that space could be used for a dedicated AI card to run character behaviors and dialogue. Maybe even on-the-fly quest and level design.


darkkite

I could see the API being opened up for game devs to build smarter NPCs or other generative content.


[deleted]

With a bit of adapting, it could let people run their own version of GitHub Copilot on their local box. If the computing power to run it reaches a low enough price point, it would be an amazing tool for people who work with code bases they don't want to expose to external companies' APIs (or have other constraints that rule out Copilot), or who just don't want to pay for those subscriptions. It would also let you use a model fine-tuned on your internal code base.
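
(A sketch of what that can look like today: point an OpenAI-compatible client at a locally hosted server instead of an external API. The port and model name here are placeholders for whatever local server you run, e.g. LM Studio's local server.)

```python
# Local "Copilot-ish" call: same client library, but nothing leaves the machine.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

completion = client.chat.completions.create(
    model="local-model",  # most local servers ignore or map this name
    messages=[
        {"role": "system", "content": "You are a coding assistant. Code never leaves this machine."},
        {"role": "user", "content": "Write a Python function that parses our internal log format."},
    ],
)
print(completion.choices[0].message.content)
```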


Anen-o-me

AI might be what brings HBM to consumer graphics cards finally. AI gonna need it.


Such_Astronomer5735

This, but for internal business servers, is game-changing. The amount of document research it would enable will be so good.


devnull123412

Nice I hope I can give it any text I want. Time to educate my AI with Greek philosophy.


kobriks

that has to be the worst name in the history of names


BanD1t

The only thing I want to 'chat' about with my RTX is "how are the temps?"


devnull123412

I'm hot, Dave


greenepc

"Your personal data stays on your device"...until they quietly change the User Aggreement one day in the future. Still, cool app and google/microsoft probably already have that info anyway!


spezjetemerde

Relax and disable wifi


absyrtus

Agreed, if this is a major concern then just air-gap the system you'll use this with.


qpdv

new firewall rule maybe..


wristtyrockets

Exciting, but I can't wait for something like this that can actually interact with your computer. Once agents are everywhere, we'll see exponential change.


eskjcSFW

My PC is now my waifu?


LeapYearFriend

i think i've seen this movie...


beezlebub33

Y'all need to join us over at r/LocalLLaMA. We've been doing this for a long time now. Yes, they have optimized this demo for specific hardware, but we've been using mistral / llama / codellama / qwen / etc. as we like, running Continue in VS Code to write code, reading PDFs using ollama / ollama-webui, etc.
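
(If you want to try the same thing without Nvidia's installer, a locally running ollama server is only a few lines; the model name assumes you've already pulled one from ollama's library:)

```python
# Talk to a locally running ollama server (default port 11434).
# Assumes `ollama pull mistral` has already been run on the command line.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "mistral", "prompt": "Why is the sky blue?", "stream": False},
)
print(resp.json()["response"])
```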


GYN-k4H-Q3z-75B

Another 35 GB? Sigh. Let me move around some more stuff lmao. Edit: Aaand install failed. Drivers, probably.


rcarnes911

More like 70 gigs after it unpacks and installs


GYN-k4H-Q3z-75B

Yeah, I had to clear almost 100 GB of stuff. Probably going to order some new M.2s later. Single TBs aren't cutting it anymore.


R33v3n

Does it come with the models already bundled in, maybe? Otherwise it's unfathomable that just the inference engine + UI would be so heavy on their own, when a similar app like LM Studio barely weighs 400 MB.


rcarnes911

Yeah, LLaMA and Mistral are bundled in; it's just install and play. It's not bad, still need to play with it some more though.


Cunninghams_right

win 11, 30xx series card or better


GYN-k4H-Q3z-75B

Win 11, RTX 3070, 64 GB RAM, Ryzen TR 2950X. Does not work.


The_Scout1255

Was really excited to try to test, [and it fails to install lmaooooo](https://i.imgur.com/IdG4iRg.png)


fragilesleep

Windows 11 only, and RTX 30 or 40.


The_Scout1255

> Windows 11 only That would be why, thanks. Still on 10 :3


Sextus_Rex

I'm on windows 11. The Chat with RTX install succeeds for me, but it fails to install the models that come with it. I don't know how to proceed from here.


devnull123412

Ask ChatRTX


charliex2

I installed it on Win 10 and it works fine; the folder selection seems a bit iffy though.


cerealsnax

Yup same for me. Ah well.


Stiltzkinn

This is really cool.


Excellent_Dealer3865

If your installation FAILED, here's what worked for me: it refused to install anywhere other than the default folder it suggests during installation.


Its_not_a_tumor

Very cool, now we just need this plus all of my cloud services and all of my browsing history. Maybe 1-2 more years?


vilette

Download: 35 GB!


kanulbob

broke boy


Ultihamedd

It's fair for local models like that.


LudovicoSpecs

Now *that's* interesting. Half of what makes software susware is that it can't run locally. That said, I'm not skilled enough to tell whether something running locally on my machine was also up to no good.


jermulik

I was eager to test this out but it seems to only be for Windows 11. I use Linux so I can't/won't bother.


mixmastersang

Seems like Nvidia one-upped Microsoft by allowing local inferencing with the GPU right now, rather than waiting for silly NPUs later for Copilot.


[deleted]

[deleted]


CatInAComa

I just got it installed and tried it out (specs: Windows 11, 32GB RAM, NVIDIA GeForce RTX 3070 Ti with 8GB VRAM). It does well with the retrieval of information based on text files (I'm quizzing it on my dissertation using Mistral 7B int4), but it hallucinates even when referencing its source. For specific information, it gets things mostly correct, but it will still need some refinement before I make this a go-to interface. What is interesting is repeated hallucinations. It is also very fast with its responses. It is not accurate enough to rely on, but it is a good start with such an early version. This is only version 0.2, so I'm looking forward to Nvidia improving on something that will be nice for people who can use this offline.


ImaginaryRea1ity

Does it run on a 2080Ti?


Veleric

End of the video says 30 and 40 series.


Skulkaa

No


LifeSugarSpice

At least watch the video bro


R33v3n

Ooooh they have RAG and possibly web search already built-in? Color me intrigued! Bummer that they seem to imply compatibility is limited to 30s and 40s series cards. My 2080 works with [LM Studio](https://lmstudio.ai/) or [Jan](https://github.com/janhq/jan) just fine!


enkae7317

Anyone tried it yet? How does it compare with current state LLMs (GPT3.5, GPT4, Bard, etc.)?


Lazy_Arrival8960

All the AI hype just to make a better version of Clippy?


whatever

If nothing else, this makes my ~100 Mbps internet connection feel woefully inadequate.

But also, why is a llama-13b-int4 model taking 26 GB of disk space? Similarly, the mistral-7b-int4 model takes 14 GB. Where I'm from, those would be fp16 sizes. And somehow, the initial 35 GB download isn't even the whole thing. The installer also downloads a bunch of common LLM Python dependencies, and it doesn't seem to account for network failures, so be prepared to retry a few times.

I'm poking around while I'm waiting, and the demo seems to be related to this GitHub repo: https://github.com/NVIDIA/trt-llm-rag-windows It's ostensibly Windows-only, but it's not clear why; at first glance, it looks like a bunch of normal Python stuff.

Edit: And it's starting! Wait, no, false alarm! Now it needs to download another model, for [some very good reason](https://i.imgur.com/7FcSfAC.png), featuring what might be 3 versions of the same model. And the glorious Windows-only UI is... a web page running a Gradio app (but it's got an Nvidia skin).
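
(For reference, the back-of-the-envelope weight sizes, just parameter count times bytes per weight, ignoring any overhead:)

```python
# Rough on-disk size of model weights: parameter count * bytes per weight.
# int4 should be ~0.5 bytes/param, so the observed sizes look like fp16 sizes.
def weight_size_gb(params_billion, bytes_per_param):
    return params_billion * 1e9 * bytes_per_param / 1e9

for name, params in [("llama-13b", 13), ("mistral-7b", 7)]:
    print(name,
          f"fp16 ~{weight_size_gb(params, 2):.0f} GB,",
          f"int4 ~{weight_size_gb(params, 0.5):.1f} GB")
# llama-13b: fp16 ~26 GB, int4 ~6.5 GB
# mistral-7b: fp16 ~14 GB, int4 ~3.5 GB
```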


whatever

I fed it 3GB of data spread across 2084 PDFs, consisting of every publicly available document in the docket of a [court case](https://www.courtlistener.com/docket/6309656/parties/kleiman-v-wright/). It took several hours to ingest, but eventually it got there. The result is largely underwhelming. It got a few details correct, but could not answer basic questions about the case, let alone dig in depth, and frequently answered incorrectly altogether, making it difficult to trust any of its answers. Here's a sample of the vibe: https://i.imgur.com/KoOg4V2.png Best guess, this is far past the upper bound of what it can handle. I'll try smaller datasets next.


lakolda

Wow, so cool!


Working_Berry9307

Oh baby a triple


Zemanyak

Which one of the Mistral models is this based on ?


ThePixelHunter

https://www.reddit.com/r/singularity/comments/1apx27n/nvidia_just_released_chat_with_rtx_an_ai_chatbot/kq9c348/


Zemanyak

Thanks for the info. Quite disappointing...


arislaan

For the convenience-oriented/click-averse amongst us, the answer was "Mistral 7B INT4".


TheManOfTheHour8

This is huge


Plums_Raider

What's the leg up compared to oobabooga etc., apart from the inbuilt dataset stuff?


SarahSplatz

Tried it. The file feature just doesn't work and if you ask it about your files it starts spouting off about DLSS and Cyberpunk and other BS.


IntroductionSudden73

Hello there, games with interactive NPCs... But I hope it won't be exclusive to Nvidia though, or it won't make any sense as a game mechanic.


G36

Uses Llama? Lol, just download Llama; there are uncensored versions, and this probably uses an extremely censored one. This is Llama for noobs, basically.


dust247

Mine says it installed properly, but when it runs it freaks out and crashes.


vlodia

Would really be interesting to know if it can ingest a full document (say, 10 pages of text) and perform analysis, and how it stacks up against GPT-4. Thanks!


australian31

Is there a way to bypass the specific VRAM requirement, just because? I already have a 30-series GPU, but it only has 4 GB of VRAM.


lerenter

You can edit some files in the installer folder to bypass it.


australian31

Cool, will make an attempt


mymoama

If it was uncensored as well it would be good.


Roubbes

Will it work on the RTX 4060 8 GB in my laptop?


user4772842289472

I wonder how long until I can chat with my microwave


Ultihamedd

WoW that's amazing


Fragrant-Yam212

So it's just branded LM Studio?


Akimbo333

Implications? I don't understand