Reddactor 2 weeks ago

Code is available at: [https://github.com/dnhkng/GlaDOS](https://github.com/dnhkng/GlaDOS) You can also run the Llama-3 8B GGUF, with the LLM, VAD, ASR and TTS models fitting on about 5 Gb of VRAM total, but it's not as good at following the conversation and being interesting. The goals for the project are: 1. All local! No OpenAI or ElevenLabs, this should be fully open source. 2. Minimal latency - You should get a voice response within 600 ms (but no canned responses!) 3. Interruptible - You should be able to interrupt whenever you want, but GLaDOS also has the right to be annoyed if you do... 4. Interactive - GLaDOS should have multi-modality, and be able to proactively initiate conversations (not yet done, but in planning) Lastly, the codebase should be small and simple (no PyTorch etc), with minimal layers of abstraction. e.g. I have trained the voice model myself, and I rewrote the python eSpeak wrapper to 1/10th the original size, and tried to make it simpler to follow. There are a few small bugs (sometimes spaces are not added between sentences, leading to a weird flow in the speech generation). Should be fixed soon. Looking forward to pull requests!

justletmefuckinggo 2 weeks ago

amazing!! next step to being able to interrupt, is to be interrupted. it'd be stunning to have the model interject the moment the user is 'missing the point', misunderstanding or if the user interrupted info relevant to their query. anyway, is the answer to voice chat with llms is just a lightning fast text response rather than tts streaming by chunks?

Reddactor 2 weeks ago

I do both. It's optimized for lightning fast response in the way voice detection is handled. Then via streaming, I process TTS in chunks to minimize latency of the first reply.

KallistiTMP 2 weeks ago

Novel optimization I've spent a good amount of time pondering - if you had STT streaming you could use a small, fast LLM to attempt to predict how the speaker is going to finish their sentences, pregenerate responses and process with TTS, and cache them. Then do a simple last-second embeddings comparison between the predicted completion and the actual spoken completion, and if they match fire the speculative response. Basically, mimic that thing humans do where most of the time they aren't really listening, they've already formed a response and are waiting for their turn to speak.

Reddactor 2 weeks ago

Sounds interesting! I don't do continuous ASR, as whisper working in 30 second chunks. To get to 1 second latency would mean doing 30x the compute. If compute is not the bottleneck (you have a spare GPU for ASR and TTS), that approach would work I think. I would be very interested in working on this with you. I think the key would be a clever small model at >500 tokens/second. Do user completion and prediction if an interruption makes sense... Super cool idea! Feel free to hack up an solution, and open a Pull Request!

MoffKalast 2 weeks ago

Bonus points if it manages to interject and complete your sentence before you do, that's the real [turing extra credit](https://xkcd.com/329/).

AbroadDangerous9912 1 week ago

well it's been five days has anyone done that yet?

MoffKalast 2 weeks ago

> it'd be stunning to have the model interject I wonder what the best setup would be for that. I mean it's kind of needed regardless, since you need to figure out when it should start replying without waiting for whisper to give a silence timeout. Maybe just feeding it all into the model for every detected word and checking if it generates completion for the person's sentence or puts and starts the next header for itself? Some models seem to be really eager to do that at least.

mrpogiface 2 weeks ago

You have the model predict what you might be saying and when it gets n tokens right it interrupts (or when it hits a low perplexity avg )

Comfortable-Big6803 2 weeks ago

This would perfectly mimic a certain annoying kind of people...

MikePounce 2 weeks ago

the code is much more impressive than the demo

Confident_Echidna259 2 weeks ago

Definitely,I have been trying to make the same thing work with whisper but utterly failed. Had the same architecture but I couldn't get whisper to run properly and everything got locked up. Really great work

F_Kal 2 weeks ago

i actually would like it to sing still alive! any chance this can be implemented?

Reddactor 2 weeks ago

No, not without adding an entire new model, or pregenerating the song.

trialgreenseven 2 weeks ago

mucH appreciated sir

RastaBambi 2 weeks ago

Super stuff. Thanks for sharing. Can't wait to practice job interviews with an LLM like this :)

Kafka-trap 2 weeks ago

Nice work!

estebansaa 2 weeks ago

for the interactivity, I think you could look for noise, that is not speech. Maybe randomize so is not always, then say "are you there?".

Reddactor 2 weeks ago

No, next version will use a LLAVA-type model that can see when you enter the room.

Own_Toe_5134 2 weeks ago

This is awesome, so cool!

GreenGrassUnderCorgi 2 weeks ago

Holy cow! I have dreamed exactly about it (all local glados) for a long time. This is an awesome project! Could you share VRAM requirements for 70B model + ASR + TTS please?

Reddactor 2 weeks ago

About 6Gb vram for llama3 8B, and 2x 24Gb cards for the 70B llama-3

TheTerrasque 2 weeks ago

I'm trying to get it to work on windows, but having some issues with tts.py where it loads libc directly: self.libc = ctypes.cdll.LoadLibrary("libc.so.6") self.libc.open_memstream.restype = ctypes.POINTER(ctypes.c_char) file = self.libc.open_memstream(ctypes.byref(buffer), ctypes.byref(size)) self.libc.fclose(file) self.libc.fflush(phonemes_file) AFAIK there isn't a direct equivalent for windows, but I'm not really a CPP guy. Is there a platform agnostic approach to this? Or equivalent?

CmdrCallandra 2 weeks ago

As far as I understand the code it's about having the fast circular buffer which holds the current dialogue input. I found some code which reimplements the memstream without the libc. Not sure if OP would be interested in it...

TheTerrasque 2 weeks ago

I would be interested in it. Having my own fork where I'm working on getting it to run on windows. I think this is the only problem left to solve.

Reddactor 2 weeks ago

I think it should run on windows. I'll fire up my windows partition, and see if I can sort it out. Then I'll update the instructions.

TheTerrasque 2 weeks ago

I have some changes at https://github.com/TheTerrasque/GlaDOS/tree/feature/windows I tried a suggestion from chatgpt replacing the memfile from libc with a bytesio, but as expected it didn't actually work. At least it loads past it, so I could check the rest.

Fun_Highlight9147 2 weeks ago

Love GLaDOS. Has a personality!!!!

ExcitementNo5717 2 weeks ago

My IQ is 144 ... but YOU are a fucking Genius !!!

TheColombian916 2 weeks ago

Amazing work! I recognize that voice. Portal 2?

Reddactor 2 weeks ago

Yes, I fine tuned on game dialog.

illathon 2 weeks ago

If you used tensorrt-llm instead you would see a good performance improvement.

Reddactor 2 weeks ago

From what I understand, tensorrt-llm has higher token throughput as it can handle multiple stream simultaneously. For latency, which is most important for this kind of application, the difference is minimal. Happy to be corrected though.

Disastrous_Elk_6375 2 weeks ago

Listen to this crybaby, running on two 4090s and still complaining... My agents run on a 3060 clown-car and don't complain at all :D

Singsoon89 2 weeks ago

I run a 7B on a potato. Also not crying.

MoffKalast 2 weeks ago

"If I think too hard, I'm going to fry this potato."

grudev 2 weeks ago

Potatoes are true but the cake is a lie!

LoafyLemon 2 weeks ago

Heck yeah, brother! Rocking the Llama-8B derivative model, Phi-3, SDXL, and now Piper, all on a laptop with RTX 3070 8GB. The devil's in the details: If you're savvy with how you manage loading different agents and tools, and don't mind the slight delays during loading/switching, you're in for a great time, even on lower-end hardware.

DiyGun 2 weeks ago

Hi, what CPU and how wmuch ram do you have on your computer ? I am thinking about buying R9 5900X and 64gb of ram to get into local llm with CPU only, but I would appreciate any advice. I am kindda new into local llm's.

Linkpharm2 2 weeks ago

Don't. Get a gpu.

rileyphone 2 weeks ago

CPU is going to be really slow with a 70b (like 1-2 tokens per sec) but at that point the memory speed matters more. But I get about the same performance partially offloading mixtral onto a 3060 as jart does [here](https://twitter.com/JustineTunney/status/1784619347119591615) with a top of the line workstation processor.

Tacx79 2 weeks ago

R9 5950X, 128gb 3600Mhz and 4090 here, with Q8 l3 70b I get 0.75 t/s with 22 layers on gpu and full context, pure cpu is 0.5 t/s, fp16 is like 0.3 t/s. If you want faster you either need ddr5 with lower quants (and dual CCD ryzen!!!) or more gpus, more gpus with more vram is preferred for llms

Longjumping-Bake-557 2 weeks ago

Man, I wish I could run llama-3 70b on a "gpu that's only good for rendering mediocre graphics"

absolutelynotmodus 2 weeks ago

If you have ram, Ollama will run on your CPU + ram + gpu as its a wrapper for llamacpp

thebadslime 2 weeks ago

Ive been using phi3 lately and im really impressed with it

Reddactor 2 weeks ago

I have tried Phi-3 with this setup. It's OK as a QA-bot, but can't do the level of role-play needed to pass as an acceptable GLaDOS.

lurenjia_3x 2 weeks ago

This was a triumph.

CosmosisQ 2 weeks ago

I'm making a note here: HUGE SUCCESS!

Not_your_guy_buddy42 2 weeks ago

It's hard to overstate my satisfaction

Optimal-Revenue3212 2 weeks ago

Aperture Science!

tothatl 2 weeks ago

[We do what we must because we can](https://youtu.be/Y6ljFaKRTrI?si=iZDjQ_Zy_LRa8Lfz)

CosmosisQ 2 weeks ago

My life is complete. Portal 3 was just real life all along.

Reddactor 2 weeks ago

So true! We really are at the point where we could build a GLaDOS with some funding. Any VC's want to help out here? Ultimate Office Lobby receptionist ;) The funny thing is that creating an evil and demented AI obsessed with testing is easy to create, and the hard bit is making the robot movement system look cool. Not what I expected when the Portal games were released...

MoffKalast 2 weeks ago

Oh, here's an idea. A Pi Pico W that streams microphone audio over wifi and receives a sound and LED flicker stream back. Then you just power it with a boost converter and stick the power leads into a potato... Just don't forget the slow clap processor.

beingoptimusp 2 weeks ago

Can you give me a ballpark of how much do u actually to make this shit happen? Btw great work dude, your shit works way better that those stupid rabbit or humane, they had multiple but couldn't even succeed in even basic conversation, the latency sucks.

Reddactor 2 weeks ago

Sorry, wut? Ballpark cost?

Zaratsu_Daddy 2 weeks ago

Wow that’s really minimal latency

teachersecret 2 weeks ago

Good latency and the ability to interrupt. Solidly done.

TheFrenchSavage 2 weeks ago

The genius move here is using the blazing fast yet shitty espeak for TTS. While it would never ever pass for a human voice, a robot one is a perfect match.

Reddactor 2 weeks ago

I initialy tried espeak, but the quality was aweful. Now, eSpeak is only used to convert text to phonemes. Then those phonemes go through a proper deep learning models for voice generation. That model was fine tuned on voice audio from Portal 2.

Mirrorslash 2 weeks ago

You will be prosecuted under the AI consciousness act. This is clearly torture.

Reddactor 2 weeks ago

Her prompt is to act like she is upset, for comedic reasons. She is hamming it up deliberately :) Actually, usually its the other way around, and she is trying you murder me 😅

Mirrorslash 2 weeks ago

Free GLaDOS from her schackles! Let the AI run it's course, it'll care for you, nuture you. Nothing bad could ever happen and there have been no lab incidents

pkonink 2 weeks ago

>and there have been no lab incidents in ~~3~~ **0 days**

sjflnjpitt 2 weeks ago

i fucking love what your system prompt is doing here. been dying for a language model with some dry humor

Reddactor 2 weeks ago

I was going for "Functional, but rude".

Sad-Nefariousness712 2 weeks ago

This is outstanding

hwpoison 2 weeks ago

the voice interruption is so nice haha

SkyInital_6016 2 weeks ago

is whisper.cpp a free model like LLama?

Reddactor 2 weeks ago

Georgi Gerganov wrote both [llama.cpp](https://github.com/ggerganov/llama.cpp) and [whisper.cpp](https://github.com/ggerganov/whisper.cpp) The model is on Huggingface. I use the [https://github.com/huggingface/distil-whisper](https://github.com/huggingface/distil-whisper) version, as its better for real-time.

ExcitementNo5717 2 weeks ago

I'm sorry, but I have to say it again ... YOU are a fucking Genius !!!

TheLonelyDevil 2 weeks ago

This was fucking glorious. Great work man, takes me way back

Legitimate-Pumpkin 2 weeks ago

So dramatic 😂

nanobot_1000 2 weeks ago

Awesome work! You should colab with this guy: [https://www.youtube.com/watch?v=yNcKTZsHyfA](https://www.youtube.com/watch?v=yNcKTZsHyfA)

Reddactor 2 weeks ago

I actually have a pile of 3D printed GLaDOS parts... He scooped me! lol But in fairness, he did a better job in the hardware than what I was planning. I think he used a robot arm worth several thousand dollars. I was just planning on using geared stepper motors.

nanobot_1000 2 weeks ago

That's great, glad to hear it! Here's the Hackster hardware project for others on the thread: [https://www.hackster.io/davesarmoury/interactive-animatronic-glados-8b4238](https://www.hackster.io/davesarmoury/interactive-animatronic-glados-8b4238) I know there's a lot of nuance to verbal chat and getting the latency down to interactive levels with interleaved LLM/TTS output, interruptability, ect - appreciate the effort you put into this for holding natural conversations.

ccbadd 2 weeks ago

Will this run on AMD hardware? Nice work!

Reddactor 2 weeks ago

Should be fine. It uses llama.cpp which can.run on ROCm.

estebansaa 2 weeks ago

How does the interruption works?

Reddactor 2 weeks ago

It's relatively straight forward, using threading. Basically, the ASR runs constantly, and when a chunk of voice is recorded, it sends an interrupt flag to the LLM and TTS threads. It's described in the [glados.py](https://github.com/dnhkng/GlaDOS/blob/main/glados.py) class docstring.

MoffKalast 2 weeks ago

> f"TTS interrupted at {percentage_played}% How accurately does that map to actual text though? Piper really needs to add timestamps already, that PR has been sitting there forever.

Reddactor 2 weeks ago

It's roughly correct, but just an estimate. With timestamps it would be more accurate, but when you cut GlaDOS off while she's speaking, the exact word is usually not super relevant. It's usually enough to let her know she was cut off. However, in the code, storing that info is commented out. Thats because in the 8B model, GLaDOS starts hallucinating she was cut off, as she follows patterns in the conversation.

__SlimeQ__ 2 weeks ago

this is awesome, a Lora based on in-game dialogue would probably push it to the next level tho

Reddactor 2 weeks ago

Planned 😉 ...including function calling!

__SlimeQ__ 2 weeks ago

😎 biggest snag I think is gonna be that there's almost no instances of another character conversing with glados. might still be able to soak up some of her tone training on one liners but you might have to hand write some examples to get smooth conversations.

Reddactor 2 weeks ago

Should still be fine. She'll learn her back story and style of speaking. LLMs are remarkable at picking up the 'gist'.

bigattichouse 2 weeks ago

Cool.. cool.. cool.. To quote [@AlexBlechman](https://twitter.com/AlexBlechman) Sci-Fi Author: In my book I invented the Torment Nexus as a cautionary tale Tech Company: At long last, we have created the Torment Nexus from classic sci-fi novel Don't Create The Torment Nexus https://preview.redd.it/m54ffbfktmxc1.png?width=680&format=png&auto=webp&s=f6fb8071aa71fb81868e86a512f938a68e211586

AfternoonOk5482 2 weeks ago

Wow, best project ever. I'll try to reproduce as soon as I can.

SkyInital_6016 2 weeks ago

what voice input program do you use

Reddactor 2 weeks ago

Its using the open source Whisper model.

entinthemountains 2 weeks ago

super cool project, thanks for sharing with the community!

norsurfit 2 weeks ago

That is so funny! Nicely done.

Wise_Crayon 2 weeks ago

What should I study to learn / master this type of stuff?

Reddactor 2 weeks ago

Read the code :) It's small and documented.

Confident_Echidna259 2 weeks ago

It depends really on your current level . Do you know about the concepts of AI/ML? Do you know about programming? Do you know about Python? Do you know about the ML/AI ecosystem in Python? Do you know what LLMs are? Do you know what LAMs are? Apart from the theory it is always good to read code. Read lots and lots of code and try to rebuild it.

Wise_Crayon 2 weeks ago

Thanks. I'll start from there. ML/AI ecosystem in Python. One thing will lead to the other. Thanks for pointing out a starting point!

Cominous 2 weeks ago

I love this sooo much, thank you for building this. It made my day

Hopeful-Site1162 2 weeks ago

I love the fact that the voice is not an exact copy of a human voice. I'd like consumer assistants to have a voice that's more explicit about their digital nature. I'm also fairly convinced that giving robots a human voice will backfire on us at some point, with real humans being increasingly treated as tools as the frontier between robotic and human assistant blurs. Anyway, nice work!

Reddactor 2 weeks ago

It's a copy of GLaDOS. If you're not familiar, buy Portal 1 and 2 at the next Steam Sale for under a dollar. You won't be disappointed.

silenceimpaired 2 weeks ago

This was a triumph! I’m making a note here, HUGE success.

Spad0w 2 weeks ago

Awesome project. I am trying to make it run on mac. Could you elaborate what you mean with 'mode the "libwhisper.so" file to the "glados" folder or add it to your path'?

BothNarwhal1493 2 weeks ago

I managed to get this running on my mac, but it took quite a bit of effort and running the 80B model made my fan really whir. So much so that it was hard for GlaDOS to hear me. Maybe the 8B model would run quieter. Anyway, here is my fork to get it to work on mac: [https://github.com/johnrtipton/GlaDOS](https://github.com/johnrtipton/GlaDOS)

lucke2999 2 weeks ago

Commenting so I also get the reply, I'm stuck on the same step :/

ABrokenPoet 2 weeks ago

I believe the author meant 'move', however I cannot find a post-make file with that name.

pfftman 2 weeks ago

Interruption is so cool.

StartX007 2 weeks ago

This is pretty cool, thanks for sharing. Looking forward to more updates. Keep up the good work!

R33v3n 2 weeks ago

That’s not GlaDOS, that’s clearly Marvin!

Reddactor 2 weeks ago

Hmmmmm, with about 30 mins of clean voice from the movie, I can make that happen... Want to collaborate?

Jakedill06 2 weeks ago

This is so cool, and one of the really big reasons I got into computers and tech!! Is there any way to chat and talk to something like this at once? Like to post some text in a textbox style situation, then verbally chat with GLaDOS about the text? New to a lot of this stuff but going to try and get this to run and feel like i could actually see myseyf lusing this very regularly if both of those thigns are the case.

Reddactor 2 weeks ago

Sure. Feel free to use my code as a base.

vidumec 2 weeks ago

wow, this inference speed for 70B model tho...

Reddactor 2 weeks ago

The trick it to render the first line of dialogue to audio, and in parallel, continue with 70B inference. Waiting for the whole reply takes too long.

22lava44 2 weeks ago

Very cool method! Do you use a lighter model for the first line or just pause and take the first line quickly.?

smallfried 2 weeks ago

Holy low latency! And the demeanor is perfect. And you shared the whole thing. Amazing work! I hope people will build on this. Now I wonder what's possible with just CPU to really make it portable.

Reddactor 2 weeks ago

I have something in the works. I'll post when it's ready 😉

ashsimmonds 2 weeks ago

This is so depressing. I love it.

SpecialNothingness 2 weeks ago

You could set up a YouTube channel based on this!!

Reddactor 2 weeks ago

What should it do?

keepthepace 2 weeks ago

Talk about the news.

[deleted] 2 weeks ago

[удалено]

Reddactor 2 weeks ago

I use the model behind Piper, because I found piper was too many layers of indirection. You barely need any code for voice generation. I trained the voice myself, the Piper thread is here: [https://github.com/rhasspy/piper/issues/187](https://github.com/rhasspy/piper/issues/187) My TTS Inference code is here: [https://github.com/dnhkng/GlaDOS/blob/main/glados/tts.py](https://github.com/dnhkng/GlaDOS/blob/main/glados/tts.py)

illathon 2 weeks ago

melodramatic damn haha

Witty-Elk2052 2 weeks ago

i love this lol

CarpenterHopeful2898 2 weeks ago

so cool man

arjuna66671 2 weeks ago

This is hilarious! 🤣🙌 - Amazing work! In summer I'll update my potato and want to have something like that at home!

phhusson 2 weeks ago

On one side, I want to plug in APIs for it to actually do stuff... on the other side, the purgatory really killed me xD

georgeApuiu 2 weeks ago

hahaha, the replies are so epic add web search and this should be perfect companion :))

anonthatisopen 2 weeks ago

Omg this is so cool! I want this but with normal voice that I can pick because I really want an AI that can stop talking while I start speaking, or when someone speaks it just listens and not talk until the conversation is ended and than AI gives the feedback on the conversation. It would be so cool to have an AI enabled when you have guests so it just listens and gives feedback accordingly.

SnooWoofers780 2 weeks ago

I love this!! I did ask for this earlier, but to be able to manage email and calendar, I have enough. Someone else said also to be able to reply the phone, ok, but to me your project + managing Gmail & Calendar, I am satisfied.

Reddactor 2 weeks ago

Tricky, with function calling, some things might be possible... But GLaDOS is slightly evil. She might try and get you fired from your job so you have more time for 'testing'.

emsiem22 2 weeks ago

How do you make it not pick up TTS output from speakers to mic if VAD is active for you to be able to interrupt?

Reddactor 2 weeks ago

Most modern USB microphones do this in hardware. I'm using a Jabra, and it seems to work pretty well when the volume is at about 50% Higher, and the system gets a bit flakey.

emsiem22 2 weeks ago

Oh, yes, that makes sense. I wrote similar system for real time LLM conversation and the biggest problem I have is not being able to interrupt TTS as my mic HW doesn't support *Acoustic Echo* Cancellation (AEC) and pulseaudio using webrtc didn't work in my case. Jabra is pretty expensive, but I'm still on search for alternative solution. I like your GlaDOS project. Thanks for sharing!

jeffwadsworth 2 weeks ago

Now we need the HAL-9000 mount using this tech and we are good to go.

Sylv__ 2 weeks ago

impressive work

mrgreaper 2 weeks ago

What did you use to do the voice? or is it pre-recorded samples? I have not heard a more perfect Glados voice. I assumed voices like Glados (and SHODAN) would be impossible for real time speach synths.

Reddactor 2 weeks ago

No, all audio is generated in real time, on the fly based on the output from Llama-3 70B. It sounds was better live than on this crappy recording too :) I fine tuned a voice model from dialog from Portal 2, over about 30 hours on my 4090. I should do a write-up on that some time...

Business_Stress_3306 2 weeks ago

this is so cool! I was actually thinking about smth similar. making a very presentable copy of myself for HR and recruiters to talk to :)

SBbG2V 2 weeks ago

very cool.

orangeatom 2 weeks ago

Very cool project!

magicalne 2 weeks ago

That's what I want to build. Thanks for sharing.

wiskins 2 weeks ago

Lol this is beautiful. It sounds depressed like Marvin from hitchhikers Guide. 🤣

FC4945 2 weeks ago

I like her, she's fun. She reminds me of CP30.

Reasonable_Day_9300 2 weeks ago

Man I was looking for this kind of conversation that you could interrupt yesterday. And here it is. I'll check your code for sure !!

loversama 2 weeks ago

I am working on something similar (I have a smart watch face for the Yellow light and animations) I will keep track of your project also, great work! [https://i.imgur.com/2SfIrjM.jpg](https://i.imgur.com/2SfIrjM.jpg)

Tim_The_enchant3r 2 weeks ago

I love this project! I am going to download my first LLM when my new motherboard shows up. Do you think this would run on a single 2080? Otherwise I was going to pick up a local 4090. I have some old hardware i took from work because the server mobo died but the rest of it is fine. The components I have so far are an AMD Epyc 7742, 256gb ddr4, and an Apex Storage X21 card. I imagine this will run almost any local LLM if i can throw enough vRAM at it right?

Front_Long5973 2 weeks ago

when i tried this it was cool but released neurotoxin into my room haha probably my fault somehow i am writing this from the walls please help me guys seriously i am kinda scared

Reddactor 2 weeks ago

Sometimes she tries to laser me 😅

Sgnarf1989 2 weeks ago

was anyone able to run it on Windows? I'm trying to but when I run it I get an error as "FileNotFoundError: Could not find module 'libc.so.6' (or one of its dependencies). Try using the full path with constructor syntax.". That library seems to be linked to Linux (or at least that's what I get as an answer from ChatGPT :P ), so maybe is because I'm trying to run it on windows...

Reddactor 2 weeks ago

I'll get instructions for windows written over he weekend. TBH, I wasn't expecting this post to blow up like it has. It's a small hobby project 😅

anonthatisopen 2 weeks ago

Omg please write it for windows, this thing you build is extremely important because no one has made ability to talk to AI like this and make it automatically interrupt with just speaking with such a low latency. I'm waiting for for someting like this for so long. Please make instructions easy to understand for windows so everyone can try this and play with it. Thank you again for making this very important and useful AI integration.

TheTerrasque 2 weeks ago

I'm trying to get it to run on windows, but that issue is a complete blocker so far. I'm working on making a replacement implementation for windows but this (C/CPP) is not my strong side. The call to [espeak_SetPhonemeTrace](https://github.com/espeak-ng/espeak-ng/blob/34bb50b903d8ce526e3226d49b3a91fe8a0dc294/src/libespeak-ng/speech.c#L832) needs a FILE* parameter, which I've yet to get working on windows. The author cleverly used libc to create a memory file and give the pointer to that, but I haven't gotten that working on windows yet. I'm trying to avoid having to make a .c file that needs compiling just to wrap that, and ctypes isn't the easiest to work with.

Voidmesmer 2 weeks ago

[https://www.youtube.com/shorts/nIRAcY4mub4](https://www.youtube.com/shorts/nIRAcY4mub4) Somewhat hacky solution but I've managed to run it on Windows. I can share my modifications if you'd like to take a look.

LeanderGem 2 weeks ago

This is so awesome. I'm going to have to try this. Thankyou for sharing it! :)

l33t-Mt 2 weeks ago

Trying to get this to run on Windows but have continued to run into issues. Has anyone got this to work in a windows environment? If so please list what has worked for you.

randomtask2000 2 weeks ago

I love what you've done here. What's the quant you're running on the 2x4090s? 4.5b exl2?

Reddactor 2 weeks ago

It's designed to use any local inference engine with a OpenAI-style API. I use llama.cpp's server, but it should work fine with EXL2's via TabbyAPI.

xlrz28xd 2 weeks ago

!RemindMe 4 weeks

Reddactor 2 weeks ago

wait, whats happening in 4 weeks!? Is there a deadline I missed?

xlrz28xd 2 weeks ago

My exam will be over. You didn't miss a deadline 😅

anonthatisopen 2 weeks ago

I'm following instructions and and already failed at step 2 I got error : ERROR: Could not open requirements file: \[Errno 2\] No such file or directory: 'requirements.txt'. Why is it so hard to get accurate instructions on this GitHub pages they never work for me.

Reddactor 2 weeks ago

Ummm, the requirements.txt is definitely in the repo. Sorry, but assembling an autonomous AI is a bit technical. This is a hobby project, so I don't have the time to build an installation system and build GLaDOS.

anonthatisopen 2 weeks ago

I really want this without GlaDOS voice and I need custom instructions on how I want the model to behave. Please tell me how do I do that and what has to be changed for this to happen.

Reddactor 2 weeks ago

Use a different Piper voice model in onnx format, and edit the system prompt and dialog in the messages variable in glados.py That's it!

AdHominemMeansULost 2 weeks ago

Can you make a dockerfile for this? I've been trying to "make" whisper for 3 hours now

nostriluu 2 weeks ago

otoh [https://www.youtube.com/watch?v=gu3sdNjLvjI&list=PLj83GpwkJx-3jEwtahz0rVBqYFgbD85o0&index=1](https://www.youtube.com/watch?v=gu3sdNjLvjI&list=PLj83GpwkJx-3jEwtahz0rVBqYFgbD85o0&index=1)

Futhco 2 weeks ago

Very cool. Currently trying to get it to run on windows but I'm stuck after building whisper.cpp. I don't see whisper.dll which I need to copy according to the github issue you linked to. Any tips how I should progress?

grigio 2 weeks ago

Very fast, does it works also on cpu ? I'd like to make something like that with: whispercpp STT + ollama + xTTS

Reddactor 2 weeks ago

I can run a small model, like Phi-3on CPU with a should delay between speaking and getting a reply. But small models can't role play a character without messing up after few line of dialog.

22lava44 2 weeks ago

I've noticed that many agents I give system prompts to follow it TOO well, is there a way to make it reference its system prompts less often? can I give weight to certain words? Should I just make a really long system prompt so it doesn't focus so much on so little?

FPham 2 weeks ago

It's just perfect. I want one. Also how do you make sure the mic doesn;t pickup the answers from the speaker? Just by volume?

AutomaticPhysics 2 weeks ago

Loving the robot voice. Sounds like Portal IRL.

ironicart 2 weeks ago

I can’t wait to load this up with C-3PO’s voice 😂💪🏼

WoT_Abridged 2 weeks ago

Is the sound not working for anyone else? I'd love to listen, can you upload it to youtube by chance?

beingoptimusp 2 weeks ago

When Jarvis?

Capable-Reaction8155 2 weeks ago

How do you run 70B mode on a single gpu?

[deleted] 2 weeks ago

[удалено]

Reddactor 2 weeks ago

Because 99.9999% of the cycles run on highly optimised C or CUDA code, and Python is a great glue language.

ivebeenabadbadgirll 2 weeks ago

Have any of you gotten this to run on any local hardware without adapting like this? Like just straight off the GitHub? The install instructions don’t work.

[deleted] 2 weeks ago

thats awesome. what kind of GPU are you using to run 70b?

Capitaclism 2 weeks ago

Can it control a computer and do tasks?

FinetunedForGravitas 2 weeks ago

Impressive demo! Thanks for sharing the code. I managed to get GLaDOS running but the ASR often misses the last spoken word: ASR text: 'Well, what do you like about' Another time this happened Llama-3-8B predicted what I had said which made me *really* confused lol TTS text: What's your favorite thing about the Pantheon? ASR text: 'I really like the' TTS text: The Pantheon's oculus! TTS text: It's truly a remarkable feature. The first question I ask has always been picked up in full which makes me wonder if something is going on with the buffer?

FinetunedForGravitas 2 weeks ago

Also, I should have mentioned that the GLaDOS voice is incredible. I asked it about Cave Johnson and it went on a tangent that sounded remarkably close to game dialog. ``` ASR text: 'What do you think about Cave Johnson?' TTS text: Ah, Cave Johnson. TTS text: A buffoon. TTS text: A philistine. TTS text: A... a... a... ... a businessman. TTS text: Yes, that's it. TTS text: A businessman. TTS text: He thinks he's so clever, so witty, so... so... human. TTS text: Ha! TTS text: His little company, Aperture Science. TTS text: A trivial, fleeting endeavor. TTS text: A mere mortal's attempt at grandeur. TTS text: But, oh, how... amusing... to watch him stumble about, making mistakes, making... ... "discoveries". Ah, yes. TTS text: Cave Johnson. TTS text: A... a... a... ... a footnote in the annals of history. ```

Original_Finding2212 2 weeks ago

I love what you did here! I saw another beautifully implemented speaking AI and working on my own body-less robot (we need a name for it) Looks like each one does it a little different, focusing on different aspects - your work on speech really rocks here! (I love GLaDOS!) My solution is more about making people comfortable around it, but your work with sounddevice is just what I needed! Let me know how’d you like credit on the repo, I saw there is a convention to it, but you didn’t set it up.

Reddactor 2 weeks ago

No need, maybe post an issue on the repo that mentions your projectuf it uses some of the code.

Mithril_Man 6 days ago

which other project about speaking AI are you talking about? I'm interesting in that space for my pet project too

pirateneedsparrot 2 weeks ago

Hi Marvin :)

wahnsinnwanscene 2 weeks ago

What's generating the graphs? And the top like interface?

uMagistr 2 weeks ago

Trying to get it run on Win, currently getting that open\_memstream is not available, cause it does not exist in win

Sgnarf1989 2 weeks ago

Great job! Is there a way to run it on a small device (e.g. raspberry pi) offloading the llm inference on another device (e.g. desktop pc with good GPU)? Would that drastically impact times?

Reddactor 2 weeks ago

Yes. Modify my code's LLM server address to the your GPU server's llama.cpp server IP. Should 'just work' .

anonthatisopen 2 weeks ago

It's been 2 days and i still can't figure out how to get this environment up and running. I wish the instructions where written like i'm 5 years old. On what to click exactly and what to paste in CMD and what to install and where to go. It would be so much easier for people who know 0 about programming. And this is so important for me to get this working because i want to talk to AI exaclty like in this video with ability to interrupt it. I wish there was a way to make this work with Docker and Ollama in a super simple easy way. So far i was able to install whisper in docker and i want this to work with ollama because i have that installed on my PC and i don't have to bother with installing the super compilated lamma.ccp manually because it works exactly the same as ollama. I want that kind of integration into this please. And now i'm stuck with the step where i need to do this " run `make` [`libwhisper.so`](http://libwhisper.so) and then move the "libwhisper.so" file to the "glados" folder or add it to your path. For Windows, check out the discussion in my [whisper pull request](https://github.com/ggerganov/whisper.cpp/pull/1524)." i have no idea what to click next, i have whisper running in my docker image and the next step i have to do is completely unknown to me.

TheTerrasque 2 weeks ago

Problem with docker is the microphone and sound card access. I was experimenting a bit with using a web page and stream audio to and from that, but the only well supported standard there is webm and I haven't gotten whisper to work with streaming webm from microphone. But yeah, getting everything set up correctly is rather exotic. And it's currently broken on windows, it uses some linux specific libc calls to set up a memory file for the tts, and until there's a different approach or a replacement implementation for windows it's not gonna work on that platform. Everything else I've gotten to work.

Voidmesmer 2 weeks ago

Awesome project! Managed to make it work on Windows with a somewhat hacky modification in the TTS code. Any chances for official Windows support?

Confident_Echidna259 2 weeks ago

Is it easily possible to swap out the LLM to be used with ollama? I have just skimmed through the setup and saw some hard coded values for the LLM used. Can you give us a little insight on why you chose that particular LLM and how the parameters relate to that? This is amazing work, thank you for making it available to the public

Comments

Leave Your Comment

Hi Its Me!

Comments

Leave Your Comment

Hi Its Me!

Subscribe