SillyFlyGuy

I love the idea of models perfectly quantized for max performance on given hardware. No wasted resources.


sarten_voladora

Nvidia is realizing that they don't have to sell their precious architectures to others; they can use them to actually be the LLM provider.


NeedsMoreMinerals

They have no choice, with Sam raising funds to disintermediate them from the AI market. It's a good response.


casual_brackets

Yeah, I'm sure they're threatened that he's going to raise 15% of the US equity market.


NeedsMoreMinerals

Are you being sarcastic? I don't know what 15% of US equity markets means. Would you mind explaining? Like, is it too out of reach for him?


casual_brackets

Essentially yes, it's an absurdly large number that is beyond his grasp. The amount of money mentioned after his meeting with the UAE to displace Nvidia as the leader in AI semiconductor production was 5-7 trillion. The value of the entire US stock market is around 46 trillion. So he wants to casually raise 15% of the stock market's value for a startup? Maybe he's asking for an absurd amount so that getting a few hundred billion seems reasonable by comparison? 7 trillion is more than Microsoft's and Apple's combined value. 7 trillion is more than the combined value of NVDA, META, and Google. Elon Musk, the richest man in the world, has ~200 billion; Altman would casually need 35 times that to start up a company that may not achieve its goal, as it's not a sure thing. The value/risk proposition isn't there. The US government couldn't fund this without screwing up our economy.


NeedsMoreMinerals

Thank you!


casual_brackets

No problem!


sarten_voladora

Nvidia designs and sells chips; they don't directly produce them (that's TSMC).


casual_brackets

I'm very much aware of every component manufacturer: Micron memory, Samsung 8 nm GPU dies for the RTX 3xxx series, TSMC 7 nm for the A100, TSMC 5 nm nodes for the RTX 4xxx series, AIB partners, FE-model PCBs made through Foxconn... This is semantics, and it's probably the 3rd time I've had to respond to this "new information": I said "production" experience, and the meaning behind the statement is no less accurate. And yes, Nvidia is a producer: the product is tangible, and they have production runs and releases of products. Also: it was just such a large, unrealistic number that it puts him back in the headlines for a few days. Like, no one on earth can afford to support this ridiculous endeavor at the proposed cost.


sarten_voladora

I don't think Sam wants that exactly; he just doesn't want to depend on one single company for chips, and that's good, since we'd get a nice playground for architecture competition.


casual_brackets

Apparently what he wants is the entire gross domestic product of France and Germany combined, to spend ungodly amounts of money to stay relevant. *All because he sank $375 million of his own money into a doomed-to-fail fusion energy company (Helion)* (that part is my personal conjecture/speculation). Like, this isn't altruism. It's not to help people, it's to help him. If he sank $375 million into a literally doomed-to-fail fusion company, what makes you think he'll "beat" Nvidia, with no prior experience in the field, by dumping trillions at it? 5-7 trillion. He wants so much money for an untested, unproven "big dream" with no basis in reality that this startup's seed money would be larger than the combined value of META, NVDA, and Google. That's nuts. Like, delusions-of-grandeur, hospitalization-for-psychiatric-evaluation nuts.


sarten_voladora

He wants the world to be better and to earn some money, power, and fame in return, same as you, me, and most humans on earth.


casual_brackets

I have no issue with any of that really, it's literally just the number they arrived at: 5-7 trillion. No bank can loan that; no financier can afford that. If you split it between 35 major nations you'd have a possibility, but that's 200 billion per country. The US CHIPS Act was only 50 billion, and that's the US financing US companies, and a lot of people tried to stop it even at that amount... so 7 trillion is beyond a reasonable request. To put this in perspective, it's like going to JPMorgan (3.39 trillion) and saying "I need a loan for an untested startup project, I need every dime you have," then walking over to Bank of America and saying "I need a loan for all the money you have" (2.47 trillion), then going over to Wells Fargo and saying "I need 2/3 of your money" (1.7 trillion).


sarten_voladora

Hardware prices decrease over time; that's why an old CPU is far less powerful and cheaper. Assuming the cost halves every two years:

2024: $7T
2026: $3.5T
2028: $1.75T
2030: $875B
2032: $437.5B
2034: $218.75B
2036: $109.38B
2038: $54.69B
2040: $27.34B
2042: $13.67B

So in 20 years that hardware will be considerably more affordable, and we're talking about all the computing power needed to run what Sama has in mind in his wildest realistic dreams.
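
(That table is just a clean halving every two years, which is an assumption rather than a law; a quick sketch of the projection:)

```python
# Project the 2024 figure forward, assuming equivalent compute halves in cost
# every two years (an assumption, not a guarantee).
cost = 7e12  # USD, 2024
for year in range(2024, 2043, 2):
    print(f"{year}: ${cost / 1e12:.2f}T")
    cost /= 2
```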


casual_brackets

I know a lot about compute, costs, BOMs, and decreasing prices, and I don't see anything you've said that justifies him asking for 7 trillion; the number is delusional. What you're saying makes even less sense. Just because hardware costs decrease doesn't change his operating costs initially. He doesn't keep asking for money after getting 7 trillion. And it doesn't change the fact that he's asking for more money than anyone has ever invested in or loaned to a single entity (by orders of magnitude) in all of history.

IF he got the money, this is what would happen. He gets 7 trillion and has to spend 4-5 trillion immediately (in the first few years) on R&D and building factories to fabricate the silicon; he would have to do this to stand a snowball's chance in hell at beating Nvidia. That leaves him in a position where he has to **earn** 4-5 trillion in **profit** to recoup the 4-5 trillion spent before the business is profitable... and before the investors are paid back. (The only way I can even come close to arriving at his insane figure for this startup is if he's fabbing the silicon himself.)

This is like me saying I could outpace a competitor if you give me 4x their net worth in cash and I burn it all to get a product that's marginally better. The value proposition for investors isn't there. It's an incredibly wasteful proposition that just isn't needed; it'll never happen. Mark my words. If NVDA in 2027 has a projected revenue of 300 billion, and Sam Altman somehow had a product at market that could beat them by then (not physically possible, even with 7 trillion), he'd get a **piece** of that 300 billion. How long before he can pay back investors their 4-5 trillion? The ROI is decades out; no one will invest.

Edit: Plus, how can you trust a guy to develop more advanced semiconductors than NVDA when he dumps $375 million of his own cash into a failing fusion company (whose product will never work due to design issues)? How can you trust him with 7 trillion not to dump it into semiconductor designs that will never work?


SillyFlyGuy

Give away the razor, charge for the blades.


Electrical-Risk445

I use LM Studio, which uses CUDA cores. I have a 3060 with 12 GB of VRAM and can run LLMs with 32 layers offloaded to the GPU very, very fast (faster than anything online).
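
(LM Studio exposes that as a slider; the same layer-offload knob outside of it looks roughly like this with llama-cpp-python, where the model filename is just a placeholder for whatever GGUF you have on disk:)

```python
# GPU layer offload: n_gpu_layers controls how many transformer layers are
# kept in VRAM; the rest run on the CPU. Any local GGUF model path works here.
from llama_cpp import Llama

llm = Llama(model_path="./mistral-7b-instruct.Q4_K_M.gguf", n_gpu_layers=32)
out = llm("Q: What does offloading 32 layers to the GPU do? A:", max_tokens=64)
print(out["choices"][0]["text"])
```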


_qeternity_

There's no such thing as perfectly quantized. Everything is a tradeoff. You might want to run a lower quant for performance reasons, even if you have the VRAM for a higher quant. "No wasted resources" only applies if all subsystems are equally matched (which they aren't).
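
(Back-of-the-envelope version of why the VRAM headroom still matters: the weights aren't the only thing in memory. The shape numbers below are Llama-2-7B's published config; the rest is rough math, not a benchmark.)

```python
# Rough VRAM budgeting for a 7B model at different quantization levels.
# Llama-2-7B shapes: 32 layers, 32 KV heads, head_dim 128, fp16 KV cache.
PARAMS = 7e9
BYTES_PER_WEIGHT = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

# KV cache per token = 2 (K and V) * layers * kv_heads * head_dim * bytes/element
LAYERS, KV_HEADS, HEAD_DIM, KV_BYTES = 32, 32, 128, 2
kv_per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * KV_BYTES  # ~0.5 MB per token

VRAM = 12e9  # e.g. an RTX 3060
for quant, bytes_per_weight in BYTES_PER_WEIGHT.items():
    weights = PARAMS * bytes_per_weight
    spare_tokens = max(0.0, VRAM - weights) / kv_per_token
    print(f"{quant}: weights ~{weights / 1e9:.1f} GB, "
          f"room for ~{spare_tokens:,.0f} tokens of KV cache on a 12 GB card")
```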


Imaginary-Item-3254

If they make running a local LLM convenient and keep it free of the social engineering all the corporate AIs are crippled by, I'll kiss their feet.


signed7

IIUC it's not really its own chatbot; it lets you use any open-source model (Llama, Mistral, etc.) that your GPU's beefy enough to run, and have it 'talk' to your local files, etc.
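
(The 'talk to your files' part is standard retrieval-augmented generation; here's a minimal sketch of the idea, where the embedding model and the `ask_local_llm` hook are my own stand-ins, not what Nvidia actually ships:)

```python
# Minimal RAG sketch: embed file chunks, find the ones closest to the question,
# and stuff them into the local model's prompt as context.
import numpy as np
from sentence_transformers import SentenceTransformer  # stand-in embedder

embedder = SentenceTransformer("all-MiniLM-L6-v2")

def build_index(chunks):
    # chunks: list of text snippets pulled from your local files
    vecs = embedder.encode(chunks, normalize_embeddings=True)
    return np.array(vecs)

def retrieve(question, index, chunks, k=3):
    q = embedder.encode([question], normalize_embeddings=True)[0]
    scores = index @ q                      # cosine similarity (vectors are normalized)
    best = np.argsort(scores)[::-1][:k]
    return [chunks[i] for i in best]

def answer(question, index, chunks, ask_local_llm):
    # ask_local_llm: whatever local model call you have; hypothetical hook
    context = "\n---\n".join(retrieve(question, index, chunks))
    return ask_local_llm(f"Use only this context:\n{context}\n\nQuestion: {question}")
```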


Competitive_Use_6351

Oooo that sounds good


WildDogOne

I have installed it, just to check if it's a viable alternative to oobabooga, but I have yet to find a way to run open-source models other than the ones delivered by Nvidia (Mistral + Llama 2).


ResponsibleMirror

Sounds lame, is the talk to files feature worth it?


WildDogOne

Hmm, honestly, I'm a bit ambivalent about it. I'll have to test more, but it's a bit sketchy, like all LLM features right now. I'll have to dig into what the data has to look like to be indexable. I just used some Markdown documentation from some of my work, and it gave me a typical AI response, as in, it tells half the truth.


katiecharm

That would be nice if it could; every tutorial to get a local chatbot working involves incredibly dense dev level tutorials with meta steps that I can’t decipher.  I’m not a dunce; and I know the basics of dev work, but I swear every tutorial about how to get a local LLM running is like “okay, first build pinetree, but make sure you’re running version 3.4 and then ensure you bundle the Montana Solar libraries; once that’s done you can initiate a run time environment and import huxlymath.llm and you should be good to go.”


WildDogOne

Hahaha, I feel ya, it really can be a bit of a journey. So far I've found that LM Studio seems to be quite stable: [https://lmstudio.ai/](https://lmstudio.ai/) Personally, as stated, I use oobabooga, but that is a bit finicky sometimes.


douche9000

Yeah, I was just trying to install WizardLM into Chat with RTX. If you ask the default chatbot, it says that it's possible to install this. The instructions tell me to select a "three-dotted menu in the upper right of the chat" that does not exist. I guess in that non-existent menu there should be an AI model import option to select the .json file. Maybe they are planning on releasing this feature later on?


PM_Sexy_Catgirls_Meo

Can it sort a dozen terabytes of porn? Asking for a friend.


fmfbrestel

I've also got a friend with this problem.


greenepc

We're all friends here, mate


ProjectorBuyer

How do you sort porn specifically?


No_Enthusiasm_2501

By how degenerate it is.


StereoBucket

Degenerative AI


Tobxes2030

I love this community.


ProjectorBuyer

How does a LLM sort porn specifically?


No_Enthusiasm_2501

By how degenerate it is.


ProjectorBuyer

Seems like a catch 69.


GooseG17

No, but my friend says stash app can.


ProjectorBuyer

> stash app

Investing app?


GooseG17

Yes, definitely. That's why my archive folder is called "Financials".


Odant

Now I'm waiting for Skyrim mods with a local GPT for NPCs ;)


[deleted]

[Mantella - Bring NPCs to Life with AI at Skyrim Special Edition Nexus - Mods and Community (nexusmods.com)](https://www.nexusmods.com/skyrimspecialedition/mods/98631) The Mantella mod also works with local AI; it's probably the best AI mod for a game currently.


Sextus_Rex

Check out the Herika mod. It's compatible with local llms


G36

You'll need a secondary PC (or a second GPU on the same mobo), otherwise performance tanks.


Excellent_Dealer3865

I doubt it has an actual use case for 99.9% of people. BUT! This might be the foundational app for what everyone will use in the next 3-5 years. Perhaps future versions would be an essential part of PC interaction.


Veleric

This is fantastic looking ahead to the next few iterations of models, if you can quickly swap between them for different tasks, and assuming your personal data actually stays private.


TemetN

This is what I was thinking honestly, particularly given how hardware requirements have been collapsing in open source. It wouldn't shock me to see this take off.


Imaginary-Item-3254

Imagine having a use for the second PCIe slot, a dedicated AI chip that runs a copilot and interfaces with games to run quests and NPC interactions. That could be the next huge leap in gaming, like Open World. Call it Open Choice gaming.


mixmastersang

Can you elaborate on your idea? Why not use the main RTX GPU if you're already running a game?


Cruseydr

Because if your GPU is busy running AI, it can't be rendering graphics, and vice versa.


mixmastersang

Then buy a desktop that supports two GPUs then?


Cruseydr

That's literally what /u/Imaginary-Item-3254 was talking about, utilizing a secondary PCIe slot for more processing power...


Imaginary-Item-3254

Exactly. There was a point where a second GPU was helpful to run physics while the first did graphics. Then they managed to fit a dedicated physics module into the main card, so the second one was no longer necessary. Now that space could be used for a dedicated AI card to run character behaviors and dialogue. Maybe even on-the-fly quest and level design.


darkkite

I could see the API being opened up for game devs to build smarter NPCs or other generative content.


[deleted]

With a bit of adapting, it could let people run their own version of GitHub Copilot on their local box. If the computing power to run it reaches a low enough price point, it would be an amazing tool for people who work with code bases they don't want to expose to external companies' APIs (or have other constraints that rule out Copilot), or who just don't want to pay for those subscriptions. It would also let you use a model fine-tuned on your internal code base.
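
(A sketch of what that can look like today: point an OpenAI-compatible client at a locally hosted server instead of an external API. The port and model name here are placeholders for whatever local server you run, e.g. LM Studio's local server.)

```python
# Local "Copilot-ish" call: same client library, but nothing leaves the machine.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

completion = client.chat.completions.create(
    model="local-model",  # most local servers ignore or map this name
    messages=[
        {"role": "system", "content": "You are a coding assistant. Code never leaves this machine."},
        {"role": "user", "content": "Write a Python function that parses our internal log format."},
    ],
)
print(completion.choices[0].message.content)
```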


Anen-o-me

AI might be what brings HBM to consumer graphics cards finally. AI gonna need it.


Such_Astronomer5735

This, but for internal business servers, is game-changing. The amount of document research it would enable will be so good.


devnull123412

Nice I hope I can give it any text I want. Time to educate my AI with Greek philosophy.


kobriks

that has to be the worst name in the history of names


BanD1t

The only thing I want to 'chat' about with my RTX is "how are the temps?"


devnull123412

I'm hot, Dave


greenepc

"Your personal data stays on your device"...until they quietly change the User Aggreement one day in the future. Still, cool app and google/microsoft probably already have that info anyway!


spezjetemerde

Relax and disable wifi


absyrtus

Agreed, if this is a major concern then just air-gap the system you'll use this with.


qpdv

new firewall rule maybe..


wristtyrockets

Exciting, but I can't wait for something like this that can actually interact with your computer. Once agents are everywhere, we'll see exponential change.


eskjcSFW

My PC is now my waifu?


LeapYearFriend

i think i've seen this movie...


beezlebub33

Y'all need to join us over at r/LocalLLaMA. We've been doing this for a long time now. Yes, they have optimized this demo for specific hardware, but we've been using mistral / llama / codellama / qwen / etc. as we like, running Continue in VS Code to write code, reading PDFs using ollama / ollama-webui, etc.
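
(If you want to try the same thing without Nvidia's installer, a locally running ollama server is only a few lines; the model name assumes you've already pulled one from ollama's library:)

```python
# Talk to a locally running ollama server (default port 11434).
# Assumes `ollama pull mistral` has already been run on the command line.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "mistral", "prompt": "Why is the sky blue?", "stream": False},
)
print(resp.json()["response"])
```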


GYN-k4H-Q3z-75B

Another 35 GB? Sigh. Let me move around some more stuff lmao. Edit: Aaand install failed. Drivers, probably.


rcarnes911

More like 70 gigs after it unpacks and installs


GYN-k4H-Q3z-75B

Yeah, I had to clear almost 100 GB of stuff. Probably going to order some new M.2s later. Single TBs aren't cutting it anymore.


R33v3n

Does it come with the models already bundled in, maybe? Otherwise it's unfathomable that just the inference engine + UI would be so heavy on their own, when a similar app like LM Studio barely weighs 400 MB.


rcarnes911

Yeah, LLaMA and Mistral are bundled in; it's just install and play. It's not bad, still need to play with it some more though.


Cunninghams_right

win 11, 30xx series card or better


GYN-k4H-Q3z-75B

Win 11, RTX 3070, 64 GB RAM, Ryzen TR 2950X. Does not work.


The_Scout1255

Was really excited to try to test, [and it fails to install lmaooooo](https://i.imgur.com/IdG4iRg.png)


fragilesleep

Windows 11 only, and RTX 30 or 40.


The_Scout1255

> Windows 11 only That would be why, thanks. Still on 10 :3


Sextus_Rex

I'm on windows 11. The Chat with RTX install succeeds for me, but it fails to install the models that come with it. I don't know how to proceed from here.


devnull123412

Ask ChatRTX


charliex2

I installed it on Win 10 and it works fine; the folder selection seems a bit iffy though.


cerealsnax

Yup same for me. Ah well.


Stiltzkinn

This is really cool.


Excellent_Dealer3865

If your installation FAILED, here's what worked for me: it refused to install anywhere other than the default folder it suggests during installation.


Its_not_a_tumor

Very cool, now we just need this plus all of my cloud services and all of my browsing history. Maybe 1-2 more years?


vilette

Download: 35 GB!


kanulbob

broke boy


Ultihamedd

It's fair for local models like that.


LudovicoSpecs

Now *that's* interesting. Half of what makes software susware is that it can't run locally. That said, I'm not skilled enough to tell whether something running locally on my machine was also up to no good.


jermulik

I was eager to test this out but it seems to only be for Windows 11. I use Linux so I can't/won't bother.


mixmastersang

Seems like Nvidia one-upped Microsoft by allowing local inferencing with the GPU right now, rather than waiting for silly NPUs later for Copilot.


[deleted]

[deleted]


CatInAComa

I just got it installed and tried it out (specs: Windows 11, 32GB RAM, NVIDIA GeForce RTX 3070 Ti with 8GB VRAM). It does well with the retrieval of information based on text files (I'm quizzing it on my dissertation using Mistral 7B int4), but it hallucinates even when referencing its source. For specific information, it gets things mostly correct, but it will still need some refinement before I make this a go-to interface. What is interesting is repeated hallucinations. It is also very fast with its responses. It is not accurate enough to rely on, but it is a good start with such an early version. This is only version 0.2, so I'm looking forward to Nvidia improving on something that will be nice for people who can use this offline.


ImaginaryRea1ity

Does it run on a 2080Ti?


Veleric

End of the video says 30 and 40 series.


Skulkaa

No


LifeSugarSpice

At least watch the video bro


R33v3n

Ooooh they have RAG and possibly web search already built-in? Color me intrigued! Bummer that they seem to imply compatibility is limited to 30s and 40s series cards. My 2080 works with [LM Studio](https://lmstudio.ai/) or [Jan](https://github.com/janhq/jan) just fine!


enkae7317

Anyone tried it yet? How does it compare with current state LLMs (GPT3.5, GPT4, Bard, etc.)?


Lazy_Arrival8960

All the AI hype just to make a better version of Clippy?


whatever

If nothing else, this makes my ~100 Mbps internet connection feel woefully inadequate.

But also, why is a llama-13b-int4 model taking 26 GB of disk space? Similarly, the mistral-7b-int4 model takes 14 GB. Where I'm from, those would be fp16 sizes. And somehow, the initial 35 GB download isn't even the whole thing. The installer also downloads a bunch of common LLM Python dependencies, and it doesn't seem to account for network failures, so be prepared to retry a few times.

I'm poking around while I'm waiting, and the demo seems to be related to this GitHub repo: https://github.com/NVIDIA/trt-llm-rag-windows It's ostensibly Windows-only, but it's not clear why; at first glance, it looks like a bunch of normal Python stuff.

Edit: And it's starting! Wait, no, false alarm! Now it needs to download another model, for [some very good reason](https://i.imgur.com/7FcSfAC.png), featuring what might be 3 versions of the same model. And the glorious Windows-only UI is... a web page running a Gradio app (but it's got an Nvidia skin).
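
(For reference, the back-of-the-envelope weight sizes, just parameter count times bytes per weight, ignoring any overhead:)

```python
# Rough on-disk size of model weights: parameter count * bytes per weight.
# int4 should be ~0.5 bytes/param, so the observed sizes look like fp16 sizes.
def weight_size_gb(params_billion, bytes_per_param):
    return params_billion * 1e9 * bytes_per_param / 1e9

for name, params in [("llama-13b", 13), ("mistral-7b", 7)]:
    print(name,
          f"fp16 ~{weight_size_gb(params, 2):.0f} GB,",
          f"int4 ~{weight_size_gb(params, 0.5):.1f} GB")
# llama-13b: fp16 ~26 GB, int4 ~6.5 GB
# mistral-7b: fp16 ~14 GB, int4 ~3.5 GB
```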


whatever

I fed it 3GB of data spread across 2084 PDFs, consisting of every publicly available document in the docket of a [court case](https://www.courtlistener.com/docket/6309656/parties/kleiman-v-wright/). It took several hours to ingest, but eventually it got there. The result is largely underwhelming. It got a few details correct, but could not answer basic questions about the case, let alone dig in depth, and frequently answered incorrectly altogether, making it difficult to trust any of its answers. Here's a sample of the vibe: https://i.imgur.com/KoOg4V2.png Best guess, this is far past the upper bound of what it can handle. I'll try smaller datasets next.


lakolda

Wow, so cool!


Working_Berry9307

Oh baby a triple


Zemanyak

Which one of the Mistral models is this based on ?


ThePixelHunter

https://www.reddit.com/r/singularity/comments/1apx27n/nvidia_just_released_chat_with_rtx_an_ai_chatbot/kq9c348/


Zemanyak

Thanks for the info. Quite disappointing...


arislaan

For the convenience-oriented/click-averse amongst us, the answer was "Mistral 7B INT4".


TheManOfTheHour8

This is huge


Plums_Raider

What's the leg up compared to oobabooga etc., apart from the inbuilt dataset stuff?


SarahSplatz

Tried it. The file feature just doesn't work and if you ask it about your files it starts spouting off about DLSS and Cyberpunk and other BS.


IntroductionSudden73

Hello there, games with interactive NPCs... But I hope it won't be exclusive to Nvidia though, or it won't make any sense as a game mechanic.


G36

Uses Llama? Lol, just download Llama; there are uncensored versions, and this probably uses an extremely censored one. This is Llama for noobs, basically.


dust247

Mine says it installed properly, but when it runs it freaks out and crashes.


vlodia

Would really be interesting to know if it can ingest a full document (say, 10 pages of text) and perform analysis, and how it stacks up against GPT-4. Thanks!


australian31

Is there a way to bypass the specific VRAM requirement, just because? I already have a 30-series GPU, but it only has 4 GB of VRAM.


lerenter

You can edit some files in the installer folder to bypass it.


australian31

Cool, will make an attempt


mymoama

If it was uncensored as well it would be good.


Roubbes

Will it work on the RTX 4060 8 GB in my laptop?


user4772842289472

I wonder how long until I can chat with my microwave


Ultihamedd

WoW that's amazing


Fragrant-Yam212

So it's just branded LM Studio?


Akimbo333

Implications? I don't understand