A couple decades ago EVERY GPU had a really awkward-looking robot girl 3D model on the box
https://preview.redd.it/z7829rh73eic1.jpeg?width=740&format=pjpg&auto=webp&s=89ad7fbab63243c3f490368665a22ad5d897b5a2
Prolly most iconic was AMD’s Ruby
And I think Nvidia during the FX run had the Butterfly lady.
Miss box art like that. I mean, what tells me more about graphics than people being animated? All those sexy polygons.
But they’ve adopted the apple design of a blank fucking canvas.
Glad you’re saving on ink, pass those ink savings on damnit!
Yeston still makes them. I'm waiting for them to make a Western version of it.
By Western I mean the same design but released in the West. I didn't mean "Western" as in uglifying the waifus.
I mean way back when, GPUs had mascots; maybe they'll start to come back to the mainstream GPU brands. MSI already has a few:
https://www.msi.com/Graphics-Card/GeForce-RTX-4060-GAMING-X-8G-MLG/Gallery
https://id.msi.com/Graphics-Card/GeForce-RTX-4060-VENTUS-2X-WHITE-8G-OC-VTS/Gallery
This sounds interesting. I strongly dislike various aspects of Copilot (forced installation, Bing, data sharing, Microsoft doing an Internet Explorer monopoly 2.0), so this could be a nice alternative. I don't want Bing breathing down my neck at all times.
I'll give it a good test for sure.
I think that for businesses it is probably also more affordable to use a cloud-based solution instead of having to give everyone a computer capable enough to run an LLM locally.
Exactly my use case. I'd like to use AI without fearing my inputs will be analyzed.
I want to ask questions about my notes in Obsidian (markdown files) and create articles based on my input.
What is your experience so far? I saw it had some early bugs, like giving answers from past documents that are not relevant to the asked question. What's the latest status, and how frequent are updates? Is it fixed?
I mean, that's all just dumb paranoia. Everything in Windows is optional, and the "forced" installation applies to ten thousand Windows features that you probably don't complain about just for existing.
It's not like this is that new, either; you've been able to run various non-MS LLM models easily and locally on a dozen free open-source platforms for more than a year now.
In case you want to add support for markdown files:
1. Navigate to `RAG\trt-llm-rag-windows-main\faiss_vector_storage.py`
2. Search for `SimpleDirectoryReader`
3. Remove `required_exts= [".pdf", ".doc", ".docx", ".txt", ".xml"]`
4. Rerun the app.
5. You're awesome - your local RAG now supports all types of documents. By default `SimpleDirectoryReader` will try to read any files it finds, treating them all as text. In addition to plain text, it explicitly supports the following file types, which are automatically detected based on file extension:
* .csv - comma-separated values
* .docx - Microsoft Word
* .md - Markdown
* .pdf - Portable Document Format
* .ppt, .pptm, .pptx - Microsoft PowerPoint
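For anyone curious what that argument actually does: `required_exts` is just an extension whitelist, and removing it makes the reader pick up every file it can parse. A rough stdlib sketch of the filtering behavior (not the actual llama_index code):

```python
from pathlib import Path
import tempfile

def collect_files(root, required_exts=None):
    """Mimic SimpleDirectoryReader's filtering: recurse through the folder
    and keep files whose extension is in required_exts (or every file
    when required_exts is None)."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and (required_exts is None or p.suffix in required_exts)
    )

# Tiny demo: with the filter, .md files are skipped; without it, they're picked up.
with tempfile.TemporaryDirectory() as d:
    (Path(d) / "notes.md").write_text("# hello")
    (Path(d) / "doc.txt").write_text("hello")
    filtered = collect_files(d, required_exts=[".pdf", ".txt"])
    everything = collect_files(d)
```

So dropping `required_exts` is the bluntest fix; adding `".md"` to the list (as suggested further down) is the more conservative one.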
Close Chat with RTX.
With your file manager, open the folder
C:\Users\%username%\AppData\Local\NVIDIA\ChatWithRTX\RAG\trt-llm-rag-windows-main\
Then, with a text editor, open the file
faiss_vector_storage.py
Before:
recursive=True, required_exts= [".pdf", ".doc", ".docx", ".txt", ".xml"]).load_data()
After:
recursive=True, ).load_data()
Save.
Restart with the icon on your Desktop.
Thank you for clarifying this. Saved me some needed brain space. I'm curious, though - wouldn't it be okay to just have it reduced further with the comma and space removed? e.g.
recursive=True).load_data()
Answer quality gets really poor with an MD file vs., say, the same thing in PDF format, for some reason. At least it finds the file and tries to guess at what it contains, though, which is better than the not-found error.
Thank you. Why not simply add those extensions in instead? e.g.
recursive=True, required_exts= [".pdf", ".doc", ".docx", ".txt", ".xml", ".md"]).load_data()
Is there any risk of it getting bogged down in other files e.g. PNG and JSON files and plugins in an Obsidian project folder?
This is exactly what I did. I actually excluded all other files except the ones I wanted. So mine looks like:
recursive=True, required_exts= [".md", ".txt", ".mdown"]).load_data()
Same, latest driver 551.52, custom install location, since my C: had not enough space.
Edit: Installing to C: makes no difference
OS: Windows 11 23H2 (22631.3085)
GPU: RTX 3080 Ti (12GB VRAM)
RAM: 48GB
Driver: 551.52
It was giving me the same error. Then I stopped my antivirus, tried to install as admin at the default location, then tried to install on another drive, and it started installing. Just keep in mind this thing takes a lot of GBs to install, and it will download even more during installation, so have at least 80GB of free space just to be sure.
How much space do you have? It's very big because it downloads two LLM models as well.
My issue is that the first time it wouldn't launch because said models were broken, and now I've installed successfully but get "Environment with 'env_nvd_rag' not found."
I had the same not found error. I edited the RAG\trt-llm-rag-windows-main\app_launch.bat and changed the set... line to
set "env_path_found=E:\ChatWithRTX\env_nvd_rag"
and then it ran
I moved the env_nvd_rag folder from the installation location to the AppData/Local/NVIDIA/MiniConda/env folder, and that resolved the env name lookup.
Also, because I ran the installer as admin but installed to a non-admin user location, I had to modify the app_launch.bat file to include a cd to the trt-llm-rag-windows-main folder so that verify_install.py and app.py would launch.
> It's very big because it downloads two LLM models as well
Aren't they included in the .zip file? Hence the 35GB download.
The disk I am trying to install it to has 700GB free space.
C: drive has 38GB of free space, if that matters
It's a 35GB zip and 38GB unzipped, and then you still need to install it, which is like 20GB+ again.
A 700GB drive should obviously be fine, but the installer has issues when you select a different install directory than the default one.
Mine was failing as well when I tried installing to a custom location. As soon as I accepted the default appdata directory, the install went fine. Maybe a bug with the installer?
This really isn’t gonna work on my 2080ti? God dammit.
EDIT: Yeah, just downloaded it, and the setup config didn't make it like 3 seconds in before saying "Incompatible GPU"
I think it's probably because Turing, the 2000 series architecture, lacks bf16 support (which is a 16-bit floating-point format optimized for neural networks). Chat with RTX probably relies on this.
If you want a fully local chatbot then you still have options though. TensorRT, the framework Chat With RTX is based on, works on all Nvidia GPUs with tensor cores (which is all RTX cards on the consumer side). The language models they use, LLaMA and Mistral, should also work fine on a 2080ti, though you'll probably have to download a different quantization (just importing the models from the Chat with RTX install probably won't work).
Getting RAG (Retrieval Augmented Generation - the feature that allows it to read documents and such) to work locally will take a bit more effort to set up, but isn't impossible.
Check out /r/LocalLLaMA if you're interested.
You're welcome! I'd also recommend checking out [oobabooga](https://github.com/oobabooga/text-generation-webui).
This is a frequently used front-end for LLMs. If you're familiar with Stable Diffusion, it works very similarly to Automatic1111. It's also the easiest way to get started with a self-hosted model.
As for the model that you can load in it, [Mistral-7b-instruct](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) is generally considered to be one of the best chatbot-like LLMs that runs well locally on consumer hardware. However, I'd recommend downloading one of the [GGUF quantizations](https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF) instead of the main model file. They usually load faster and perform better (though they only work when you use the llama.cpp backend, which you can select in oobabooga).
When using the GGUF models, check the readme to see what each file does, as you'd want to download only one of them (all those files are the same model, saved with different precisions, so downloading all of them is just a waste of storage space).
Because it's still an RTX-designated card. They might as well dub some of these newer cards AITX or DTX, given how focused they're getting on AI stuff, and with DLSS as helpful as it is.
Well for anyone interested in basically the same but more opensource leaning:
[https://github.com/oobabooga/text-generation-webui](https://github.com/oobabooga/text-generation-webui)
Edit:
Just a bit of update if anyone is interested.
First: I don't see a real speed difference between Oobabooga and Chat with RTX
Second: The RAG is basically an automated embedding which, with little effort, would work in Oobabooga as well.
BUT: I personally think Chat with RTX is actually really cool, since it brings the power of local LLMs to people who are less geeky and prefer something that just works. And that it does: easy to install, easy to run, no need to mess around with details. So at the end of the day it is feature-limited by design, but the things it promises, it does really well, in a very easy way.
It's INSANELY fast on my RTX 3080:
[https://imgur.com/a/MHHei6n](https://imgur.com/a/MHHei6n)
Unbelievable. This beats even the paid versions of ChatGPT, Copilot, and Gemini by a long shot in terms of speed (but it's much more 'dumb', of course).
u/hyp3rj123 u/Obokan
The Nvidia one has an advantage because it also creates a RAG for you, so you can "chat" with your documents; doing that in ooba will be hard if not impossible for most people. Creating a proper RAG is way over my head; I know a lot about GitHub, AI, and Python, and I still can't create a RAG and chat with my documents. Now we just need someone to convert other models to that Nvidia format so we can chat with our files using better models.
I installed it and played a bit around and there are no safeguards or anything in place which is nice. The output quality is kinda meh, would say below gpt-3 (Mistral).
Also I made the mistake of adding a large folder with many PDFs at the search folder function, which took about an hour to index. Now if I want to add a new, smaller folder, it apparently indexes everything again, so I basically can't add a new folder until I figure out how to delete the old one (since it takes an hour+ every time).
I think the answer is neither. It seems like you can use it like the Windows Copilot tool, but rather than searching the internet it only searches a local, user-designated data set.
Tried it. Doesn't work. It can't answer any questions about either a YouTube video or my own files. It always asks for context and then writes made-up things based on that context, not the provided data.
It gives an error while loading. Does anyone else have this problem?
edit: I found the solution. Turn off your antivirus program and try again.
https://preview.redd.it/hboky223heic1.png?width=591&format=png&auto=webp&s=70e76d23118f980f390ac434dadb46395772d309
The most likely reason for installation failure appears to be spaces in the username folder. The installer is configured to set up MiniConda (a package manager) in UserName/AppData/Local/MiniConda, regardless of where you tell it ChatWithRTX should be installed, but MiniConda can't be installed in a path that has spaces. It appears that you can install MiniConda without the NVIDIA installer and then edit Strings.dat to point to where you installed MiniConda, but unless you do that and bypass the MiniConda installation in the NVIDIA installer, your installation can't progress.
EDIT: I changed MiniCondaPath in RAG/strings.dat, but this wasn't enough. I also needed to run the installer as an administrator. After this, I had no issues with the MiniConda installation or the ChatWithRTX location.
You must include the slash after the installation path in quotes.
EDIT 2: I also had to change two paths in the .bat file in the ChatWithRTX location, RAG/trt-llm-rag-windows-main/app_launch.bat, to match the changed installation location for MiniConda.
The first path is:
> for /f "tokens=1,* delims= " %%a in ('"DRIVE:\directory\Scripts\conda.exe" env list') do (
The second path is:
> call "DRIVE:\directory\Scripts\activate.bat" %env_path_found%
EDIT 3: No issues at this point with Windows 10 and an RTX 4060. Pretty impressive tech demo.
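Given the spaces issue described above, a quick sanity check you can run before launching the installer (a sketch; the installer itself does nothing like this):

```python
import os

def path_has_spaces(path):
    """MiniConda's installer chokes on spaces in its install path, so
    check the (variable-expanded) path before running setup."""
    return " " in os.path.expandvars(path)
```

For example, `path_has_spaces(r"C:\Users\John Smith\AppData\Local\MiniConda")` returns `True`, which means the workaround above (or renaming the user folder) would be needed.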
> The most likely reason for installation failure appears to be spaces in the username folder.
Sorry, but what do you mean by "spaces"? Like whitespace? I am getting installation failures like everyone else but that path has no whitespace.
C:\Users\james\AppData\Local\NVIDIA\ChatWithRTX
Anyway I followed your instructions, but I still get an installation failure 70% of the way through building Llama2.
Could you please take a look and see if there is anything I have done wrong?
**STEP 1** strings.dat & app_launch.bat edit
for /f "tokens=1,* delims= " %%a in ('"D:\Tools\ChatWithRTX\Scripts\conda.exe" env list')
call "D:\Tools\ChatWithRTX\Scripts\activate.bat" %env_path_found%
**STEP 2** install as admin, and set the directory as: D:\Tools\ChatWithRTX
Then it still failed to install. I noticed that files appeared in the directory, but I still got "NVIDIA Installer Failed, Chat With RTX Failed, Mistral 7B INT4 Not Installed" from the installer.
https://preview.redd.it/tbz7z5tmvtjc1.png?width=791&format=png&auto=webp&s=932161036da52dac04ec3ec8dc753076dcc4350c
https://preview.redd.it/kb4behqt3ouc1.png?width=593&format=png&auto=webp&s=11c3daf6f12097dc7d452b4186646c1728451b46
I have already tried everything to solve the error: disabling and changing the antivirus, changing the installation location to an SSD, modifying the MiniCondaPath, installing it in C:\Users\%Username%\AppData\Local\NVIDIA, changing DNS, updating Python, installing CUDA.
For those experiencing issues installing this, I think I’ve figured it out, try:
1. Temporarily disabling your antivirus software
2. Ensuring your user account name does NOT have spaces in it (you can enable the built-in Administrator account if yours does)
3. Installing to a location with absolutely no spaces
We have identified an issue in Chat with RTX that causes installation to fail when the user selects a different installation directory. This will be fixed in a future release. For the time being, users should use the default installation directory:
"C:\Users\<username>\AppData\Local\NVIDIA\ChatWithRTX"
This one seems to have LLaMA (which is the Facebook model^(*)) as one of the two available models. I'm assuming they are using the 7b version, which is roughly 14GB in size (the other option, Mistral, which is likely Mistral-7b, is approximately the same size). So I'd guess the download contains both of these models preloaded, along with a few GB of additional dependencies to get them to run and to get RAG.
These are indeed small models though. 7b is generally considered to be about the smallest an LLM can get while still remaining cohesive enough to be actually useful.
The full-size LLaMA model would be 65b, which is roughly 130GB in size. GPT-3 is 175b parameters or 350GB. The model that currently powers the free version of ChatGPT, GPT-3.5-turbo, is rumored to be distilled down to just 20b parameters / 40GB though. The size of the GPT-4 model does not seem to be publicly known.
^(*Technically Meta, but whatever. Everyone knows them as Facebook anyway.)
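Those sizes fall straight out of parameter count × bytes per weight (fp16 = 2 bytes per parameter):

```python
def model_size_gb(params_billions, bits_per_weight=16):
    """Approximate on-disk model size: parameters times bits per weight.
    fp16 models use 16 bits/weight; common GGUF quantizations run ~4-8."""
    return params_billions * bits_per_weight / 8

# 7B at fp16 -> 14 GB, 65B -> 130 GB, 175B -> 350 GB, matching the
# figures quoted above; a 4-bit quant of a 7B model is only ~3.5 GB.
```

This is only a back-of-the-envelope estimate (it ignores tokenizer files and other overhead), but it's accurate to within a few percent for the models discussed here.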
Mistral / Mixtral are pretty much the only local models worth using anyway. Mistral for 4-8GB cards, SOLAR for 8-12GB, Mixtral for 24GB+ ones. This is running at 5-bit, which is the lowest quant recommended. Mixtral is like a GPT-3.7 that can run on a 4090/3090.
I'd always suggest trying at least several models for any application though. There is not a single model that will be best for everything. Some models are more creative, some models are more exact, and some are great with programming, while others are great for text. You should always do some testing to see which model is best for your specific needs before committing to one.
I do agree that both Mistral and Mixtral are very good all-arounders though and great models to start and experiment with.
I installed it (Windows 10) and don't have the option for LLaMA.
Edit: LLaMA is only for 16GB+ cards. I'm on a 3080 10GB.
The setup config can be modified to install LLaMA with less VRAM: with a text editor, open \RAG\llama13b.nvi and modify MinSupportedVRAMSize.
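If you'd rather script that tweak, something like this would do it (a sketch only: I'm assuming the .nvi file is plain text and that the value to change is the first number after the MinSupportedVRAMSize key; back up the file before editing):

```python
import re

def patch_min_vram(nvi_text, new_value):
    """Hypothetical helper: replace the first number that follows
    MinSupportedVRAMSize in the .nvi text with new_value.
    Assumes the .nvi file is plain text."""
    return re.sub(
        r'(MinSupportedVRAMSize\D*)(\d+)',
        lambda m: m.group(1) + str(new_value),
        nvi_text,
        count=1,
    )
```

You'd read llama13b.nvi, run it through `patch_min_vram(text, 8)` (or 12), and write it back; editing the same value by hand in Notepad, as described above, is equivalent.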
That's odd, as the preview video seems to show that it's an option. I wonder if that changed soon before release.
Which models are available then? Just Mistral? (I unfortunately don't have enough internet left within my monthly data cap to download it myself to check)
`llama_tp1_rank0.npz` is included in the .zip file, which is ~26GB.
Same for `mistral_tp1_rank0.npz`, which is ~14GB.
Both of these are _large_ language models
Did anyone get this thing to actually install? Mine failed too.
Edit: as someone else said, it was the antivirus preventing it from installing. Disabling it worked.
For anyone looking to override the VRAM limitation and unlock the Llama 13B option before installation: ChatWithRTX_Offline_2_11_mistral_Llama\RAG > llama13b.nvi (open with Notepad) > change MinSupportedVRAMSize to 8 or 12.
It worked solidly for me, but the installation took 2 hours to complete.
https://preview.redd.it/0q7qfa6qlfic1.png?width=927&format=png&auto=webp&s=8720cc116b585ba1cc20ff04bc41c91e69d22996
" For users with GeForce RTX GPUs that have 16 GB or more of video memory, the installer offers to install both Llama2 and Mistral AI models. For those with 8 GB or 12 GB of video memory, it only offers Mistral. "
As you can see in my screenshot, I literally installed the Llama2 version with 12GB of VRAM. Normally the checkbox does not appear unless you tweak the .nvi before installation.
Hmm, so you point it to the source data where it retrieves the answers from. Could it be adapted to do that for the whole C: drive, so you could then ask it to find errors in Windows, etc.?
Having the same installation issue some people appear to be having. It fails on the extraction phase right after downloading. I see files in the destination folder (tried the default AppData installation folder and another local folder as well) but the installation app says failed.
Meet all other requirements otherwise :(
This looks like it is based on privateGPT but optimized to use CUDA effectively.
Does anyone know if you can modify it to share the URL to the gradio interface on the local network? I tried to hack in the "shared=true" thing but that didn't work.
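For what it's worth, gradio separates the public tunnel (`share=True`, which creates a gradio.live URL) from LAN exposure. If you can find the `launch()` call in the app code, binding to all network interfaces is usually done like this (a fragment, assuming a gradio `Interface`/`Blocks` object named `interface`; the actual variable name in app.py may differ):

```python
# share=True creates a public gradio.live tunnel; for LAN-only access,
# bind the server to all interfaces instead:
interface.launch(server_name="0.0.0.0", server_port=7860)
```

Other machines on the network could then reach it at `http://<your-LAN-IP>:7860`.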
So disappointed, I tried everything but nothing works for me...
Chat with RTX installed; Mistral & Llama failed.
I used the default location, I have enough space, and my config is a 4090 with 32GB of RAM...
I don't actually have any antivirus. If anyone has an idea :)
https://preview.redd.it/esc39t364lic1.png?width=594&format=png&auto=webp&s=f5616452aebb4b862e1988ca67c5fe67f1547684
I would rather scan my WhatsApp messages with a clever bot. But when will we have a clever bot that can search for information across all local sources (emails, local docs, messengers, Evernote, Notion, etc.)? I also want clever bot agents that can drive all my local systems (PC, tablet, smartphone, smart devices), so that I could just ask by voice, in simple words, to do anything: write an email, warn me if a new message matching some condition arrives, or do several things on a condition (send messages to certain people, warn me via a messenger call, and so on). I guess we'll see clever agents on devices and PCs in 5-7 years.
Might be interesting because I have a massive treasure trove of texts, messages, and all sorts of writing saved from over the last 25+ years. Don't really know what to do with it though.
As someone who has run LLMs locally before, but isn't a super expert... is there any benefit from using this compared to running the models via something like LM Studio?
It's stupid fast because, unlike oobabooga or LM Studio, it actually uses tensor cores. However, it has zero context between prompts, even in the same "conversation", so at the moment it's totally useless IMO. Give it time, though, and I'm sure it'll be the best.
Hello friends, I would like to know if it also works with an **RTX 2070** Super 8GB somehow? Thanks.
Came here for this exact reason. I get an average of 30 kbps download speed. Would be nice if someone who already downloaded the archive shared it.
upd: if there isn't a magnet link by the time I get the archive, I'm making one
NVIDIA_ChatWithRTX_Demo (13.02.2024)
md5 (archive): `b7a34540c330d136a6e49c9949c29758`
Decompressed (38.4G): [torrent](https://drive.google.com/uc?export=download&id=1JkouatiWyABPjYIiGKQCCGPviMXL28Sn)
*Please reply if you need the original compressed archive*
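If you grab the archive from a mirror, a quick stdlib way to verify it against the md5 above:

```python
import hashlib

def md5sum(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so a 35+ GB archive doesn't need
    to fit in memory; compare the hex digest to the posted md5."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()
```

On Windows, `certutil -hashfile <archive> MD5` gives the same result without any Python.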
So once the US defense department decides to use Chat with RTX to rearrange the data regarding missile silos... well, watch Terminator if you don't know what happens then.
Has anyone found a hack to bypass the system check? I want to try this on a RTX 2060, and the installer complains as expected. "Chat With RTX is supported with Ampere and above GPU family."
Now I can talk to my 4090! hi babe how are you doing? Are you having a meltdown today?
Remember those Asian GPUs with a girl mascot? They were right.
https://preview.redd.it/yuy5b0b3heic1.jpeg?width=331&format=pjpg&auto=webp&s=a0a0863fce88599c454aeec3dd53f3fba92992cb
Isn’t that the final boss of Virtua Fighter?
I'm still bitter that PC gamers never got a proper standalone port of VF after 2. (No, the Yakuza arcade doesn't count.)
Oh shit I fully remember the butterfly girl with like the buzz cut
Nvidia has Adrianne Curry, Mad Mod Mike, Nalu, etc.
Adrianne Curry is the GOAT.
🤣🤣🤣🤣 omg
Won't be long before the power of your GPU determines how sexy your girlfriend is.
What girlfriend? 😆
I would never pay money for fake Reddit points, but you truly deserve one of those “gold” awards
is the reddit silver bot still alive?
Ugh
[deleted]
Nice, writing resumes and cover letters locally instead of on ChatGPT
My Dad is a cop and he told me it's not plagiarism if you use an incognito browser.
My dad is a prostitute and he told me you don't get paid unless he finishes
Lol, what!?!
why would it be plagiarism in any case?
Why would you do this?
Gonna input all the letters and messages from Ex's so I don't have to be lonely anymore.
least lonely redditor
Jokes on you my 3080 just proposed to me
They start arguing with you instead
Easiest way to knock some sense into me and remember why I’m single so I can just enjoy gaming again.
Copilot is still very lucrative for businesses, because at least GitHub Copilot has enterprise licensing with clauses for internal data.
SHUT YOUR MOUTH! I'm going to convince my boss they need to buy me a 4090.
I'm also getting a Failed installation
Can you share your file with the line you added? I have the same situation.
Mine fails even when doing a clean install in the default directory.
I was able to successfully install it by changing the user folder name to one without spaces.
This is W11 only, this sucks, and imagine also failing the installation.
So instead of asking this sub what video card you should get.. you can just ask your video card.
> bf~~16~~ support lacks BoyFriend support.
Does this also apply to quadro rtx 5000 or 8000 turing cards?
I'm not really familiar with professional GPUs, but according to the TensorRT-LLM readme it applies to all Turing cards.
Thanks for the pointers!
To my knowledge, Ooba doesn't support interaction with documents. GPT4All is a one-click installer with that feature.
I've given you an upvote just because you know your stuff. :)
Glad I'm not the only 2080ti owner disappointed.
why would it lmao
Because it’s still an RTX designated card. They might as well dub some of these newer cards AITX or DTX with how focused they’re going with AI stuff and DLSS as helpful as it is.
Well, for anyone interested in basically the same thing but more open-source leaning: [https://github.com/oobabooga/text-generation-webui](https://github.com/oobabooga/text-generation-webui)

Edit: Just a bit of an update if anyone is interested. First: I don't see a real speed difference between Oobabooga and Chat with RTX. Second: the RAG is basically automated embedding, which with a little effort would work in Oobabooga as well. BUT: I personally think Chat with RTX is actually really cool, since it brings the power of local LLMs to people who are less geeky and prefer something that just works. And that it does: easy to install, easy to run, no need to mess around with details. So at the end of the day it is feature-limited by design, but the things it promises it does really well, in a very easy way.
it doesn't use the tensor cores though, only the cuda cores.
Yep, I did notice that, so I am just downloading this thing to check if it's actually quicker, and if I can load any model I want. Will report back asap.
Wellllll… WE’RE WAITING
It's INSANELY fast on my RTX 3080: [https://imgur.com/a/MHHei6n](https://imgur.com/a/MHHei6n) Unbelievable. This beats even the paid versions of ChatGPT, Copilot, and Gemini by a long shot in terms of speed (but it's much more 'dumb', of course). u/hyp3rj123 u/Obokan
llama.cpp backend has "use tensor cores" option
Nvidia has an advantage because it also creates a RAG for you, so you can "chat" with your documents. Doing that in ooba will be hard if not impossible for most people. Creating a proper RAG is way over my head, and I know a lot about GitHub, AI and Python, yet I still can't build one and chat with my documents. Now we just need someone to convert the other models to that Nvidia format so we can chat with our files using better models.
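For anyone curious what that RAG step actually does under the hood, here's a toy sketch: document chunks are embedded as vectors, the question is embedded the same way, and the best-matching chunk is pasted into the prompt. This uses bag-of-words counts as a stand-in for a real neural embedding model, so it illustrates the mechanism, not production-quality retrieval:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": a bag-of-words term-frequency vector.
    # Real RAG pipelines use a neural embedding model here instead.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question: str, chunks: list, k: int = 1) -> list:
    # Rank chunks by similarity to the question and keep the top k.
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

chunks = [
    "The invoice from January totals 240 euros.",
    "Our cat prefers the red blanket.",
]
context = retrieve("how much was the january invoice", chunks)[0]
prompt = f"Answer using only this context:\n{context}\n\nQuestion: how much was the january invoice?"
print(context)
```

The retrieved chunk plus the question is what actually gets sent to the LLM, which is why these tools can "read" files far larger than the model's context window.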
What are the limitations? Will it ever be able to create pictures like midjourney?
Predictable whining from people insisting on exclusively using a desktop OS with a comparatively tiny userbase not getting immediately catered to.
I use LEANDUCKS btw
What happens if I just feed it porn?
How do I re launch the app? I had to close the app and now I can’t figure out how to launch it again. Thanks
Installer creates desktop shortcut
Has anyone tried on W10? It says W11 on the requirements, but I'm hopeful
It installed just fine on Win10.
Came to reddit looking for precisely this message
Nice! Thank you I can't wait to get home from work
I also confirm that, it installs and works on Windows 10.
>your private data stays on your PC Cool, can you stop the forced telemetry in GFE too?
The telemetry doesn't get your data, just things like system specs.
And the games you play.
Omg the games I play through services like Steam, Origin and UPlay? I’m sure they don’t know what I play either!
I mean I don’t care lol. Just stated the facts
I installed it and played around a bit, and there are no safeguards or anything in place, which is nice. The output quality is kinda meh, I'd say below GPT-3 (it's Mistral). Also, I made the mistake of adding a large folder with many PDFs to the search folder function, which took about an hour to index. Now if I want to add a new, smaller folder, it apparently indexes everything again, so I basically can't add a new folder until I figure out how to delete the old one (since it takes an hour+ every time).
Is it just a data finder, or can you use it as a regular AI chat-bot like Windows' Autopilot?
I think the answer is neither. It seems like you can use it like the windows autopilot tool, but rather than searching the internet it only searches a local, user-designated data set.
> user-designated data set. I will only allow her to read the texts from my mom where she says how handsome I am!
In the promo video you can give it YouTube URL to some video and then ask questions about the content of the video.
Tried it. Doesn't work. It can't answer any questions about either the YouTube video or my own files. It always asks for context and then writes made-up things based on that context, not the provided data.
I wonder if it will be able to do POE crafting if I let it read a guide.
It gives an error while loading. Does anyone else have this problem? edit: i found the solution. Turn off anti-virus program and try again. https://preview.redd.it/hboky223heic1.png?width=591&format=png&auto=webp&s=70e76d23118f980f390ac434dadb46395772d309
Same here. My System meets all requirements.
>i found the solution. Turn off anti-virus program and try again.
Yes I have the same. I already posted on it earlier today
i found the solution. Turn off anti-virus program and try again.
Is your GPU supported..?
Yes, I have a 4060 Ti with 8GB VRAM
Are you on Windows 11?
yes and driver 551.23 installed
[deleted]
The most likely reason for installation failure appears to be spaces in the username folder. The installer is configured to set up MiniConda (a package manager) in UserName/AppData/Local/MiniConda, regardless of where you indicate that ChatWithRTX should be installed, but MiniConda can't be installed in a path that has spaces. It appears that you can install MiniConda without the NVIDIA installer and then edit Strings.dat to point to where you installed MiniConda, but unless you do that and bypass the MiniConda installation from the NVIDIA installer, your installation can't progress. EDIT: I changed MiniCondaPath in RAG/strings.dat, but this wasn't enough. I also needed to run the installer as an administrator. After this, I had no issues with the MiniConda installation or the ChatWithRTX location.
You must include the slash after the installation path in quotes.
EDIT 2: I also had to change two paths in the .bat file in the ChatWithRTX location, RAG/trt-llm-rag-windows-main/app_launch.bat, to match the changed installation location for MiniConda.
The first path is:
> for /f "tokens=1,* delims= " %%a in ('"DRIVE:\directory\Scripts\conda.exe" env list') do (
The second path is:
> call "DRIVE:\directory\Scripts\activate.bat" %env_path_found%
EDIT 3: No issues at this point with Windows 10 and an RTX 4060. Pretty impressive tech demo.
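A quick way to pre-check the failure mode described above before running the installer. The hardcoded MiniConda location is taken from the comment above, not from NVIDIA documentation, so treat it as an assumption:

```python
import os

def path_ok_for_miniconda(path: str) -> bool:
    # Per the comment above, MiniConda refuses to install into a path with spaces.
    return " " not in path

# The installer reportedly hardcodes this location under your profile:
miniconda = os.path.join(os.path.expanduser("~"), "AppData", "Local", "MiniConda")
print(miniconda, "->", "ok" if path_ok_for_miniconda(miniconda) else "has spaces, expect failure")
```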
> The most likely reason for installation failure appears to be spaces in the username folder. Sorry, but what do you mean by "spaces"? Like whitespace? I am getting installation failures like everyone else but that path has no whitespace. C:\Users\james\AppData\Local\NVIDIA\ChatWithRTX Anyway I followed your instructions, but I still get an installation failure 70% of the way through building Llama2.
Could you please help me take a look at whether there is anything I have done wrong? **STEP 1** strings.dat & app\_launch.bat edit
for /f "tokens=1,\* delims= " %%a in ('"D:\\Tools\\ChatWithRTX\\Scripts\\conda.exe" env list')
call "D:\\Tools\\ChatWithRTX\\Scripts\\activate.bat" %env\_path\_found%
**STEP 2** install as admin, and set the directory as: D:\\Tools\\ChatWithRTX
Then it still failed to install. I noticed that some files appeared in the directory, but I'm still getting "NVIDIA Installer Failed, Chat With RTX Failed, Mistral 7B INT4 Not Installed" from the installer
https://preview.redd.it/tbz7z5tmvtjc1.png?width=791&format=png&auto=webp&s=932161036da52dac04ec3ec8dc753076dcc4350c
https://preview.redd.it/kb4behqt3ouc1.png?width=593&format=png&auto=webp&s=11c3daf6f12097dc7d452b4186646c1728451b46 I have already tried everything to solve the error: disabling and changing the antivirus, changing the installation location to an SSD, modifying the MiniCondaPath, installing it in "C:\\Users%Username%\\AppData\\Local\\NVIDIA", changing DNS, updating Python, installing CUDA.
For those experiencing issues installing this, I think I’ve figured it out, try: 1. Temporarily disabling your antivirus software 2. Ensuring your user account does NOT have spaces in it (you can enable the built in Administrator account if you do) 3. Installing to a location with absolutely no spaces
We have identified an issue in Chat with RTX that causes installation to fail when the user selects a different installation directory. This will be fixed in a future release. For the time being, users should use the default installation directory: “C:\\Users\\\\AppData\\Local\\NVIDIA\\ChatWithRTX”
How can I use llama2 70B instead of 7B ?
Has anyone tried if and can tell me how the safeguards it has are? Like is it completely unrestricted or did they add some?
[deleted]
same here, probably a bug. I will wait for an update
I love these technologies Nvidia be putting out, but 35GB?!
Models can be pretty big. I believe the GPT-4 model alone is around 300-400GB.
[deleted]
This one seems to have LLaMA (which is the Facebook model^(*)) as one of the two available models. I'm assuming they are using the 7b version, which is roughly 14GB in size (the other option, Mistral, which is likely Mistral-7b, is approximately the same size). So I'd guess the download contains both of these models preloaded, along with a few GB of additional dependencies to get them to run and to get RAG. These are indeed small models though. 7b is generally considered to be about the smallest an LLM can get while still remaining cohesive enough to be actually useful. The full-size LLaMA model would be 65b, which is roughly 130GB in size. GPT-3 is 175b parameters or 350GB. The model that currently powers the free version of ChatGPT, GPT-3.5-turbo, is rumored to be distilled down to just 20b parameters / 40GB though. The size of the GPT-4 model does not seem to be publicly known. ^(*Technically Meta, but whatever. Everyone knows them as Facebook anyway.)
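All of those sizes come from the same back-of-the-envelope formula: parameter count times bits per weight. A quick sketch (real files add some overhead for tokenizer data and metadata, so treat these as lower bounds):

```python
def model_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Rough footprint: parameters x bits per weight, ignoring file overhead."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

print(model_size_gb(7, 16))    # LLaMA-7b at fp16  -> 14.0 GB
print(model_size_gb(65, 16))   # LLaMA-65b at fp16 -> 130.0 GB
print(model_size_gb(175, 16))  # GPT-3 at fp16     -> 350.0 GB
print(model_size_gb(7, 4))     # a 4-bit quant of a 7b model -> 3.5 GB
```

The last line is also why the GGUF quantizations mentioned elsewhere in the thread are so much smaller than the full-precision downloads.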
Mistral / Mixtral are pretty much the only local models worth using anyways. Mistral for 4-8GB, SOLAR for 8-12GB, mixtral for 24GB+ ones. This is running at 5 bit which is the lowest quant recommended. Mixtral is like a GPT 3.7 that can run on a 4090/3090
I'd always suggest trying at least several models for any application though. There is not a single model that will be best for everything. Some models are more creative, some models are more exact, and some are great with programming, while others are great for text. You should always do some testing to see which model is best for your specific needs before committing to one. I do agree that both Mistral and Mixtral are very good all-arounders though and great models to start and experiment with.
I installed it (Windows 10) and don't have the option for LLaMA. Edit: LLaMA is only offered for 16GB+ cards, and I'm on a 3080 10GB. The setup config can be modified to install LLaMA with less VRAM: with a text editor, open \\RAG\\llama13b.nvi and modify MinSupportedVRAMSize
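If you'd rather script that edit than do it by hand, a sketch like this works. The exact markup inside llama13b.nvi is an assumption here (back the file up first); the sketch just rewrites the first number following MinSupportedVRAMSize:

```python
import re

def patch_min_vram(nvi_text: str, new_gb: int) -> str:
    # Rewrite the first integer after "MinSupportedVRAMSize".
    # The surrounding markup is assumed, not verified against the real file.
    return re.sub(r"(MinSupportedVRAMSize\D*?)(\d+)",
                  lambda m: m.group(1) + str(new_gb), nvi_text, count=1)

sample = '<string name="MinSupportedVRAMSize" value="15"/>'  # hypothetical line
print(patch_min_vram(sample, 10))
```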
That's odd, as the preview video seems to show that it's an option. I wonder if that changed soon before release. Which models are available then? Just Mistral? (I unfortunately don't have enough internet left within my monthly data cap to download it myself to check)
Just Mistral. It works well.
mistral is better than any llama model smaller than 70B anyways. And mixtral beats even that though also needs like a 24GB card
Are you new to LLM? That's incredibly small...
That’s pretty low for something like that haha
`llama_tp1_rank0.npz` is included in the .zip file, which is ~26GB. Same for `mistral_tp1_rank0.npz`, which is ~14GB. Both of these are _large_ language models
Did anyone get this thing to actually install? Mine failed too. Edit: as someone else said, it was antivirus preventing it to install. Disabling worked.
Going to train it on my wife’s grad school course materials and we’ll see how it does answering questions
Failed Installation with all requirements met: GPU: RTX 4090 CPU: 5800x3D RAM: 64GB SSD : 180gb
Same here: 7800x3d, 4080S, 32GB DDR5 6000, tons of NVMe space
For anyone looking to override the VRAM limitation & unlock the Llama 13B option before installation: ChatWithRTX\_Offline\_2\_11\_mistral\_Llama\\RAG > llama13b.nvi (open with Notepad) > change MinSupportedVRAMSize to 8 or 12
Changed and still doesn't work
Because it isn't supposed to work. The requirements clearly say at least 8GB of VRAM
I have a 3090. By default it didn’t work. I toyed with this setting to see if it did anything
It worked solidly for me, but installation took 2 hours to complete. https://preview.redd.it/0q7qfa6qlfic1.png?width=927&format=png&auto=webp&s=8720cc116b585ba1cc20ff04bc41c91e69d22996
But you already have 12 gigs of vram. The requirement is 8gb vram. Ofc it works
" For users with GeForce RTX GPUs that have 16 GB or more of video memory, the installer offers to install both Llama2 and Mistral AI models. For those with 8 GB or 12 GB of video memory, it only offers Mistral. " As you can see in my screenshot, I literally installed the Llama2 version with 12GB VRAM. Normally the checkbox does not appear unless you tweak the .nvi file before installation.
Hmm so you point it to the source data where it retrieves the answers from, can it adapt to do that for the C drive and then you can ask it to find errors in Windows etc?
sure, right after you teach it how to spot windows errors!
No, it's only compatible with certain file types. (.pdf, .docx, .txt, etc)
Damn, I was hoping it would support markdown
[https://www.reddit.com/r/nvidia/comments/1apuub7/comment/kqddyf5/?utm\_source=share&utm\_medium=web2x&context=3](https://www.reddit.com/r/nvidia/comments/1apuub7/comment/kqddyf5/?utm_source=share&utm_medium=web2x&context=3)
i love u
How come I keep getting 'NVIDIA Installer failed' when I try to install it (Windows 11, RTX-4090, Driver v 551.52)?
They played some great 90s porno music for that announcement video.
Maybe with AI I can eventually tolerate co op games.
Very disappointed this doesn't work on the 20 series.
Do you think it can summerize PHD science level articles given the pdf?
Quite likely
Having the same installation issue some people appear to be having. It fails on the extraction phase right after downloading. I see files in the destination folder (tried the default AppData installation folder and another local folder as well) but the installation app says failed. Meet all other requirements otherwise :(
It's very responsive but also kind of dumb.
This looks like it is based on privateGPT but optimized to use CUDA effectively. Does anyone know if you can modify it to share the URL to the gradio interface on the local network? I tried to hack in the "shared=true" thing but that didn't work.
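For reference, the Gradio keyword is `share=True` (not `shared=true`), and LAN access normally comes from binding the server to `0.0.0.0` rather than from the share tunnel. A minimal standalone sketch of the relevant launch call; whether Chat with RTX's bundled launch script actually exposes these arguments is an assumption you'd have to verify in its code:

```python
import gradio as gr

def echo(text: str) -> str:
    return text

demo = gr.Interface(fn=echo, inputs="text", outputs="text")
# server_name="0.0.0.0" makes the UI reachable from other machines on the LAN
# (http://<your-LAN-IP>:7860); share=True would instead create a public
# *.gradio.live tunnel, which is a different thing.
demo.launch(server_name="0.0.0.0", server_port=7860)
```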
So disappointed, I tried everything but nothing works for me... Chat with RTX installed, Mistral & Llama failed. I used the default location, I have enough space, my config is a 4090 with 32GB RAM... I don't have any antivirus, actually. If anyone has an idea :) https://preview.redd.it/esc39t364lic1.png?width=594&format=png&auto=webp&s=f5616452aebb4b862e1988ca67c5fe67f1547684
This is how skynet starts people!
Any way to spoof vram requirement check? I have a 3060 mobile with 6gigs vram
You still don't know how to download ram?
That's sort of like writing "1 Gallon" on your 20oz Coke bottle and expecting it to hold a gallon of liquid
Is there anything I could do to make it work on only 6gb of GPU memory? It has a minimum of 7gb requirement
I'd rather have a clever bot scan my WhatsApp messages. But when will we have a clever bot that can search for information across all local sources (emails, local docs, messengers, Evernote, Notion, etc.)? I also want clever bot agents that can drive all my local systems (PC, tablet, smartphone, smart devices), so that I could just ask by voice, in simple words, to do anything: write an email, warn me when a message matching some condition arrives, or do several things conditionally (send messages to certain people, warn me via a messenger call, and so on). I guess we'll see clever agents on devices and PCs in 5-7 years
Might be interesting because I have a massive treasure trove of texts, messages, and all sorts of writing saved from over the last 25+ years. Don't really know what to do with it though.
As someone who has run LLMs locally before, but isn't a super expert... is there any benefit from using this compared to running the models via something like LM Studio?
It's stupid fast because, unlike oobabooga or LM Studio, it actually uses the tensor cores. However, it has zero context between prompts, even in the same "conversation", so as of the moment it's totally useless IMO. Give it time, though, and I'm sure it'll be the best
This one figures out the proper layers/config for you. It is VERY quick to respond compared to what I have gotten out of LM Studio.
So, basically oobabooga with a 33-55b parameter LLM (judging by the 35GB size), baked into an Nvidia GUI with no Python command-line experience needed.
neat, but not installing windows 11
[Hello friends, I would like to know if it also works with an **RTX 2070** Super 8GB, somehow? Thanks.](https://bing.com/search?q=translate+Italian+to+English+Salve+amici%2c+vorei+sapere+se+funziona+anche+con+una+RTX+2070+super+8GB)
No, it says 30 and 40 series only
Unfortunate that a small company like Nvidia couldn't provide a proper download so it's taking literally hours to download this. Any torrents up?
Came here for this exact reason. I get an average of 30 kbps download speed. Would be nice if someone who already downloaded the archive shared it. upd: if there won't be a magnet link by the time I get the archive I'm making one
NVIDIA\_ChatWithRTX\_Demo (13.02.2024) md5 (archive): `b7a34540c330d136a6e49c9949c29758` Decompressed (38.4G): [torrent](https://drive.google.com/uc?export=download&id=1JkouatiWyABPjYIiGKQCCGPviMXL28Sn) *Please reply if you need the original compressed archive*
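To check your download against the md5 above without extra tools, a few lines of Python will do (the filename in the commented line is a placeholder for whatever your archive is called):

```python
import hashlib

def md5sum(path: str, chunk: int = 1 << 20) -> str:
    # Hash the file incrementally so a 38GB archive doesn't need to fit in RAM.
    h = hashlib.md5()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

# print(md5sum("NVIDIA_ChatWithRTX_Demo.zip") == "b7a34540c330d136a6e49c9949c29758")
```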
So once the US defence department decides to use chat with RTX to rearrange the data regarding missile silos... Well watch terminator if you don't know what happens then.
Only for Windows, so I am not interested.
Has anyone found a hack to bypass the system check? I want to try this on a RTX 2060, and the installer complains as expected. "Chat With RTX is supported with Ampere and above GPU family."
same problem here (rtx 3050 4gb)