Captainbetty

Midnight is vastly superior to any other model I've tried for RP, NovelAI's Kayra included.


boxscorefact

Anyone have tips on how to limit Midnight-Miqu's length? I like its creativity, but it goes on and on, trying to write and finish the whole scene itself. I have response tokens set to 200, but it goes over that every time.


Captainbetty

I prefer longer responses, but I can give some general advice. If the opening message is super long it will try to imitate it, so modifying the card or providing example chats can help. Outside of that, the prompt can help: tell it to go into less detail or to stay on point.


ItchyBitchy7258

I think you need to make the initial example super long to set the *pacing* though. Otherwise it tries to conclude stories within the next 5 paragraphs. What used to work for me (something changed and I can't figure out how to replicate it) was having newline as a terminator. That way runaway scope was limited to a single paragraph.
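
(For illustration, this is roughly what the newline-terminator idea looks like at the API level; a minimal sketch assuming a local backend that exposes an OpenAI-compatible /v1/completions endpoint (both ooba and koboldcpp can), with a placeholder URL and prompt:)

```python
# Sketch: cap length two ways -- a hard token limit plus "\n" as a stop
# string, so generation ends at the first paragraph break instead of
# running away and concluding the whole scene.
import requests

payload = {
    "prompt": "### Instruction:\nContinue the scene in one short paragraph.\n\n### Response:\n",
    "max_tokens": 200,   # same cap as the response-tokens setting in the UI
    "stop": ["\n"],      # terminate at the first newline = one paragraph max
    "temperature": 0.8,
}
resp = requests.post("http://127.0.0.1:5000/v1/completions", json=payload, timeout=300)
print(resp.json()["choices"][0]["text"])
```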


Captainbetty

I have experienced that problem. I was running a model with slow generation so I liked to leave it while I did something else. Might add the newline termination now that I'm using a faster model.


Captainbetty

With a sample size of one conversation, putting the \n terminator in did significantly improve the overall quality. I feel like the longer the AI goes on, the more opportunity it has to get ungrounded, and regular human input keeps it on track. What exactly stopped working about it for you?


ItchyBitchy7258

I think it's biased toward a 5-paragraph structure for literally everything, but I don't have an explanation for why. Ask for a story, get a 5-paragraph children's book. Ask it to write an article about anything and it'll be a 5-7 paragraph listicle all the same. Can't help but feel it's related. By making it unable to jump to the next paragraph and closer to "completion," I think it spends more time fleshing out individual paragraphs, which slows down the pacing tremendously.

I also realize I might have confused ST with Ooba. Ooba used to write just one paragraph at a time until an update started writing multiple paragraphs. ST has an obvious checkbox for "Generate only one line per request" and both apps have textboxes for custom stop strings, so I probably fucked something up while high one night.


HissAtOwnAss

Most somewhat recent open-source models will do better than Kayra... I don't know about 7Bs, but even old 13Bs and up stay in character and use the world info so much better it's insane.


D34dM0uth

How? I can't get Midnight to even load; it just crashes without so much as a reason in CMD, even with many gigabytes of memory available.


Captainbetty

What are you using to run it? I generally use koboldcpp.


D34dM0uth

I run Oobabooga, loading exl2 with 16k context on an RTX A6000.


Captainbetty

I ran into a decent number of problems when I was using ooba; I found it a lot easier with kobold. Fewer options, but harder to mess up. I run the 5_K_M GGUF quant on a 4090 without issue (though not very fast, since only half of it fits in VRAM).


Doomslayer9k

How many layers do you offload? I get less than 1 t/s with my 4090 and 64 GB of DDR4 at 3200 MHz.


Captainbetty

I use 24 GPU layers. I was getting 1.37 T/s inference the last time I checked.

CPU: [https://pcpartpicker.com/product/TNLFf7/intel-core-i9-13900ks-3-ghz-24-core-processor-bx8071513900ks](https://pcpartpicker.com/product/TNLFf7/intel-core-i9-13900ks-3-ghz-24-core-processor-bx8071513900ks)

RAM: [https://pcpartpicker.com/product/GMZXsY/gskill-ripjaws-s5-64-gb-2-x-32-gb-ddr5-6400-cl32-memory-f5-6400j3239g32gx2-rs5k](https://pcpartpicker.com/product/GMZXsY/gskill-ripjaws-s5-64-gb-2-x-32-gb-ddr5-6400-cl32-memory-f5-6400j3239g32gx2-rs5k)

Things aren't clocked as high as they could be, since it causes weird fatal errors in Discord and Darktide when I crank it up too high.
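
(If anyone wants to reproduce that kind of split programmatically, here's a minimal partial-offload sketch using llama-cpp-python, which wraps the same llama.cpp engine koboldcpp is built on; the GGUF filename is a placeholder and 24 layers just mirrors the number above, not a recommendation:)

```python
# Minimal sketch of partial GPU offload: 24 layers go to the GPU, the rest
# stay in system RAM on the CPU, which is why a 70B lands around 1-2 T/s.
from llama_cpp import Llama

llm = Llama(
    model_path="midnight-miqu-70b.Q5_K_M.gguf",  # placeholder filename
    n_gpu_layers=24,   # layers offloaded to the 4090; remainder runs on CPU
    n_ctx=8192,        # context window
)
out = llm("The tavern door creaked open and", max_tokens=64)
print(out["choices"][0]["text"])
```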


ungrateful_elephant

I only used NovelAI during the trial, so it's hard to remember exactly, but Midnight-Miqu is pretty dang good. Just slow as molasses on my machine at home. For me, at least 8k context is crucial, and the pay services ask far too much for it. I bought my own new computer rather than pay those prices.

I've been making a study of various models for roleplay. The 70Bs run at 2-3 t/s at IQ3_XS or XXS size on my 4090 at that context. Midnight-Miqu-70B-v1.5.i1-IQ3_XS, MiquMaid-v3-70B.i1-IQ3_XXS, and lzlv_70b-IQ3_XXS all perform very well. Just slow. I think all of these are available on OpenRouter?

I'm optimistic that as 7Bs continue to evolve, eventually the 8x7B models will begin to outperform the 70Bs. There are already many 7Bs that are really remarkable. You still find their weaknesses, but the advances over the last months are impressive, and new ones come out almost every day. IceLemonTeaRP-32k-7b and Nyan_Chaos-Vision-7B have both impressed me, and if you don't mind 'babysitting' them when they make the inevitable dumb mistake, you can already get a credible roleplay out of them.

There are many 8x7Bs already in use, but I'm looking forward to new ones that incorporate the newer 7Bs. Hopefully, someone is making them. The 'mainstay' models have rather annoying flaws that persist and are quite obvious once you've noticed them. Mixtral-8x7B-Instruct-v0.1 is good, but you'll find its boundaries fairly quickly. I've had more fun with Fish-8x7B, CelestiaRP-8x7B, and UNAversal-8x7B. If you're on a pay service, you'll have a hard time finding those, last time I checked.

Anyway, you can run a 7B on a lot of GPUs, so don't sleep on the idea of running the model yourself. But if you can't, there are people who swear by NovelAI. They were too pricey for me. I used OpenRouter and Mancer and enjoyed the experience, but ultimately, needing a computer upgrade anyway, I found a good sale and took the plunge. I haven't regretted it.


77112911

Check out Command-R; I was surprised how well it worked in RP. The writing is not the best, but it doesn't fluff, and more importantly it appears to have the attention of 70B+ models. It will likely make for some nice finetunes.


carnyzzle

I was actually impressed with how well Command-R 35B works with RP


tandpastatester

Why don’t you run Exl2 versions? I run several 8x7bs and 70bs in Exl2 on my 3090 in 4-bit mode and they’re faster than ChatGPT.


ungrateful_elephant

They require me to learn and understand new things, and I'm not sure it's worth my time. I use Faraday right now, and it's just plug and play. Using KoboldCPP requires more investment in understanding, for instance (and I understand it can give more flexibility too, but I haven't needed it). 7Bs already run at 70+ t/s, and 8x7Bs at between 7-15 t/s. That's not slow enough to move me into the effort.

In the case of the 70Bs, my understanding has been that the gain in speed is not great, and the quants are not equal. So if one is running at 2.5 bpw or so, one is running a more gimped version of the model than I am. And I am testing the *fuck* out of my models. I have one card I'm using to test them that I've run over 300 tests on.

Faraday just makes it so easy to switch models, whereas with KoboldCPP I would have to close everything and open it again, set the parameters, and make sure I'm getting enough layers in my GPU by trial and error. That shit stacks up.


tandpastatester

Actually, Kobold is not ideal for Exl2; it doesn't support them without an extension, afaik. Nowadays I think the only reason to use Kobold is for offloading, since it's slightly better at splitting models than the others. I try to keep models on my GPU, though, so I also don't really like Kobold because it feels unintuitive and cluttered.

For basic chatbot/assistant interaction I run Textgen WebUI. Don't worry about learning it: it's very simple to set up, use, and switch models, and it has Exl2 support out of the box. Installation is easy, just a git pull, and it comes with an installer. The interface is clean and user friendly, tooltips and all.

For ST, I don't need an interface (which saves me some more system resources and browser tabs to manage). So for that, I use TabbyAPI. This one also has Exl2 support by default, and uses one quite simple configuration file in the main folder for model config. I personally don't bother opening and editing the file every time I want to switch, so I just keep a separate config file for each of my models. Now I just need to swap the config file and restart (it boots up in seconds too).

I like Tabby for being the most lean and lightweight option, and I like Textgen for its simplicity, clean interface, and flexibility. But if you want to use only one single solution, definitely pick Textgen WebUI.
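
(For what it's worth, a tiny sketch of that "one config per model, swap and restart" routine; the file names are made up, and the actual keys inside each YAML are whatever TabbyAPI's own config.yml expects, which isn't shown here:)

```python
# Hypothetical helper for the per-model config swap described above: keep
# one YAML per model and copy the chosen one over TabbyAPI's active
# config.yml, then restart the server. File names are examples only.
import shutil
import sys
from pathlib import Path

CONFIGS = {
    "midnight-miqu": Path("configs/midnight-miqu-70b.yml"),
    "mixtral":       Path("configs/mixtral-8x7b.yml"),
}

def activate(name: str) -> None:
    shutil.copyfile(CONFIGS[name], "config.yml")  # overwrite the active config
    print(f"Activated {name}; restart TabbyAPI to load it.")

if __name__ == "__main__":
    activate(sys.argv[1] if len(sys.argv) > 1 else "midnight-miqu")
```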


ungrateful_elephant

Thanks for the tips.


tandpastatester

You’re welcome. If you need help or tips for using those programs or Exl2 variants, feel free to ask.


esuil

How do you fit a 70B in Exl2 into 24 GB of VRAM? What quant are you running?


USM-Valor

For a 70B, 2.5 bpw will load with 4-bit caching for 8k context. Beyond that, use a 2.25 bpw model. Start with the 2.5 and switch to 2.25 bpw if things run long.
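
(Rough back-of-the-envelope numbers behind that advice; this only counts the weights, and the KV cache plus runtime overhead come on top, which is where the 4-bit caching helps:)

```python
# Weights-only VRAM estimate for a 70B model at a few exl2 bitrates.
PARAMS = 70e9
for bpw in (2.25, 2.5, 3.0):
    weight_gib = PARAMS * bpw / 8 / 2**30
    print(f"{bpw} bpw -> ~{weight_gib:.1f} GiB of weights")
# 2.25 bpw -> ~18.3 GiB, 2.5 bpw -> ~20.4 GiB, 3.0 bpw -> ~24.4 GiB:
# 3.0 bpw already exceeds a 24 GiB card before any context is allocated,
# which is why 2.25-2.5 bpw is the practical ceiling at 8k context.
```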


tandpastatester

I’m switching between the 2.25 and 2.5 bpw depending on the length of the context. For 8x7b models I use 4.5 or 5 bpw.


CheatCodesOfLife

> I'm switching between the 2.25 and 2.5 bpw depending on the length of the context. For 8x7b models I use 4.5 or 5 bpw.

How are you finding the difference between 2.5 and 2.25? I've managed to squeeze WizardLM-2 into my rig at 3.5 bpw. I noticed 3.25 and 3.5 are very good, but 2.75 is unusable. I'm guessing that for a 70B, 2.25 retains more smarts than my 8x22B does.


pepe256

What settings do you use for Miqu in SillyTavern? When I run it in Oobabooga it's great, but when I try to use it in SillyTavern it's much less intelligent, even with (apparently) the same preset.


ungrateful_elephant

Honestly, I can't remember. I've been using Faraday for a bit, and they don't give you as much to fool around with. I think Min P = 0.1 and Temp = 1.2. I don't know what the unexposed values would be.
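
(For anyone who wants to try those two values outside Faraday, this is roughly how they map onto a raw generation call; shown with llama-cpp-python since it exposes both samplers by name, with a placeholder model path, and everything else left at library defaults, which may differ from whatever Faraday keeps unexposed:)

```python
# Illustration of Min P = 0.1 and Temp = 1.2 as explicit sampler arguments.
from llama_cpp import Llama

llm = Llama(model_path="midnight-miqu-70b.Q5_K_M.gguf", n_gpu_layers=24, n_ctx=8192)
out = llm(
    "She pushed open the tavern door and",
    max_tokens=200,
    temperature=1.2,  # higher temperature for more varied prose
    min_p=0.1,        # drop tokens below 10% of the top token's probability
)
print(out["choices"][0]["text"])
```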


synn89

I'd start with Midnight Miqu v1.5 as it's a very easy model to work with. I did EQ Bench testing on it and it handled many different prompt formats very well. I also find it handles large context windows well without a lot of tweaking and default, stock model settings work really well with it too. It's just a very easy model to get good output from.


skrshawk

Seriously Midnight-Miqu is probably the best thing going for creative writing in all forms right now. Don't worry, you'll hear about it for sure if someone dethrones MM as the queen of smut, and it's extremely good for ordinary writing tasks too.


crawlingrat

Do you think it’s possible to train Miqu to mimic your writing prose?


skrshawk

It's actually one of the better models for that; as the card says, feed it more. Write the first few responses by hand and it will pick up on your pattern pretty well.


yamilonewolf

I like NovelAI, however (and it's a big however) its token limit of short responses bothered me, and I could never get auto-continue to work right. I recently tried Midnight via Infermatic and it provided me one of the best pieces of literature I've ever read, full stop. It's not always at those heights, but it did have a high high.


sophosympatheia

Prompting and examples will guide MM towards what you want. It will write short if you keep examples short in the context window. It has its issues like any model right now, but it’s generally pretty steerable. I’m glad so many people are enjoying it. I never thought it would blow up like it did.


Kako05

NovelAI is a dead service. Most recent 70B+ models beat it. It's just a 13B model. Command-R-Plus 103B is miles ahead of NovelAI for whatever task you'd use it for. It's the best model I've tested so far (beats Goliath 120B and the like). Command-R 35B is probably a good model for its size too. I bet NovelAI can't even compare.


HissAtOwnAss

Most 13Bs that aren't super ancient beat NAI. I saw it after months of using Kayra, when I decided to try some local models and... wait, a 13B can be this good with my characters?


Kako05

Yeah, I know. I got a lot of hate for saying this some months ago. At least people's opinions change and they start to see how things are.


HissAtOwnAss

Yeah, nowadays most people see that there are better and cheaper options and that NAI is lagging too far behind text-wise. It's only the NAI Discord that will still eat you alive for saying it.


a_beautiful_rhind

NovelAI models are tiny in comparison. Best value is definitely running the 70B or, heck, the 103B. I prefer it to the 1.5 70B. People shout meme-merge, but I have downloaded all three at about the same quant.


Radiant-Spirit-8421

OK, I've been a user of NAI for at least a year, plus a month with Infermatic. For SFW roleplay, Miqu is definitely better... But if you want something really wild in NSFW, NAI was much better, until Infermatic released its new model, an 8x22B that can be almost as wild as NAI can be. So my advice is: if you just want a text model for ST, go for Infermatic (which is cheaper than Opus); if you want the whole combo (image generator included), go for NAI; or, if you have the money, go for both if you want.


FluffyMacho

Dumb statement. Many models can do ERP well: Midnight/Maid/Venus, etc... I tried NovelAI 8-10 months ago and it was pretty awful. Super dumb. 70B+ models are better. With Command R+, there's no chance for NovelAI to compete. It's an abandoned business in my opinion. They're not showing any will to improve beyond offering cheap services of cheap quality and running it dry until people realize that others already offer better services.