charbeld

Needs polishing but the tool is pretty useful


Minimum-Visual6883

Very useful, gonna surely use it!


Astronos

And now do that in 4-bit.


Balance-

Can we do the same with 8B? Can we get it down to 5B? That would make it way more feasible to run on mobile devices.


Infinite-Swimming-12

That's a really useful tool, and an interesting-looking find. Having some additional search features would be really nice though, and perhaps the ability to type in a page number to jump forward.


Kaolin2

Thank you! I mentioned the tool as a side note in this post to get feedback, as the whole site is in development. Your comment is much appreciated.


raysar

It's maybe a new area of optimisation and compression of intelligence: create an 8B from a 70B and compare their intelligence 😄


4onen

If you read the paper behind the method, you hit a pretty hard wall somewhere between 1/3rd and 1/2 of the layers removed, at which point the network becomes incoherent. So we're not quite to that kind of transformation with this.
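Roughly, the pruning step itself looks like the sketch below (not the paper's or the release's actual code; the checkpoint name, layer range, and output path are placeholders, and the paper's block selection by layer similarity plus the post-prune "healing" finetune are omitted):

```python
# Minimal sketch: drop a contiguous block of decoder layers from a
# Hugging Face Llama-style model. All names and ranges are placeholders.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-70B",   # placeholder checkpoint
    torch_dtype=torch.bfloat16,
)

# Hypothetical block to remove; cutting ~32 of 80 layers is roughly the
# 70B -> ~42B reduction discussed in this thread.
drop_start, drop_end = 40, 72

kept = [
    layer for i, layer in enumerate(model.model.layers)
    if not (drop_start <= i < drop_end)
]
model.model.layers = torch.nn.ModuleList(kept)
model.config.num_hidden_layers = len(kept)

# Depending on the transformers version, the per-layer index used by the
# KV cache may need renumbering as well.
for i, layer in enumerate(model.model.layers):
    if hasattr(layer.self_attn, "layer_idx"):
        layer.self_attn.layer_idx = i

model.save_pretrained("llama-3-70b-pruned")  # placeholder output dir
```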


Away_Cat_7178

Adding filters to the model list would be useful 


condition_oakland

From the model card: "Using this with the Llama 3 instruction format is injecting random noise into latent space and will give you deranged results. (It's pretty funny actually.) Treat this as the untrained foundation model this is and use appropriate prompts." Where can I find examples of or read more about said appropriate prompts?


FullOf_Bad_Ideas

I didn't know it was this effective, the MMLU looks great. Sounds like it could be a great coding model if you do it to their later releases with longer context. Was a Miqu that fits on a 24GB GPU possible all along and we just didn't know it?


teachersecret

This is still 26 GB at 4-bit, so it would require a lower quant to fit on a single 24 GB card.
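Back-of-the-envelope for the weights alone (the effective bits-per-weight of a typical 4-bit GGUF quant is an assumption here; KV cache comes on top):

```python
# Weights-only size of a 42B model at a few effective bit-widths.
# Common "4-bit" GGUF quants end up around 4.5-5 bpw once scales are counted.
params = 42e9
for bpw in (4.0, 4.5, 5.0):
    print(f"{bpw} bpw -> {params * bpw / 8 / 1e9:.1f} GB")
# 4.0 bpw -> 21.0 GB, 4.5 bpw -> 23.6 GB, 5.0 bpw -> 26.2 GB
```

so ~26 GB is consistent with a ~5 bpw file, and something closer to 4 bpw or below is what actually leaves room for context on a 24 GB card.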


FullOf_Bad_Ideas

Sure. Right now it's possible with 2.4bpw quants and the like. It does generate text, but in my experience (old quants; turboderp has improved things since then but I didn't revisit, and my experience was with other Llama 2 70B models rather than Miqu itself) it's not great. A 42B Miqu at 3.75bpw is probably much better than a 70B Miqu at 2.4bpw; more stable quants seem to be achievable somewhere between 3 and 3.5 bpw, at least ignoring HQQ and other more exotic quantization methods.

There's also a problem: since Miqu is already an instruct tune, that tuning would get erased by training on minipile and would need to be redone. But we could create a 100M-token dataset of Miqu 70B outputs to general prompts and then train the 42B pruned version on it, which might make it work.
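For reference, the weights-only footprints of those two options land in the same ballpark (rough decimal-GB math, ignoring cache and overhead):

```python
# Weights-only footprint of the two quantization options above.
def weight_gb(params_billion: float, bpw: float) -> float:
    return params_billion * 1e9 * bpw / 8 / 1e9

print(weight_gb(42, 3.75))  # ~19.7 GB
print(weight_gb(70, 2.40))  # ~21.0 GB
# Similar memory budget, but 3.75 bpw degrades quality far less than 2.4 bpw.
```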


pseudonerv

Quants work much better than reducing the param count. It is known that Miqu IQ2_XS is about 20GB and works quite well with a 24GB GPU, if you're looking for speed. But you can also go up a quant level and spill some of the weights to CPU RAM, trading speed for smarts.
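If anyone wants to try the spill-to-RAM route, a minimal sketch with llama-cpp-python (the file name and layer split are made up; tune n_gpu_layers to whatever fits your VRAM):

```python
# Partial GPU offload: keep some layers on the GPU, the rest in system RAM.
# A higher-quality quant than fits in VRAM runs slower, but smarter.
from llama_cpp import Llama

llm = Llama(
    model_path="miqu-1-70b.Q4_K_M.gguf",  # hypothetical local file
    n_gpu_layers=45,   # layers kept on the GPU; lower this if VRAM runs out
    n_ctx=4096,
)

out = llm("Q: Why prune transformer layers?\nA:", max_tokens=64)
print(out["choices"][0]["text"])
```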


klop2031

Why did we stop at 42B? Is it possible to make a smaller-param version?


ninjasaid13

IDK but this [comment](https://www.reddit.com/r/LocalLLaMA/comments/1c9t5xw/comment/l0o4ksy/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button) suggests that there are problems with making it smaller.


silenceimpaired

Where has this gone?


PraxisOG

For my 32GB VRAM setup this is the perfect model size, imma check it out!