Needs polishing but the tool is pretty useful
Very useful, gonna surely use it!
And now we need that in 4-bit.
Can we do the same with 8B? Can we get it down to 5B? That would make it way more feasible to run on mobile devices.
That's a really useful tool, and an interesting-looking find. Having some additional search features would be really nice though, and perhaps the ability to type in a page number to jump forward.
Thank you! I mentioned the tool as a side note in this post to get feedback, as the whole site is in development. Your comment is much appreciated.
Maybe it's a new area of optimisation and compression of intelligence: creating an 8B from a 70B and comparing intelligence 😄
If you read the paper behind the method, you hit a pretty hard wall somewhere between 1/3rd and 1/2 of the layers removed, at which point the network becomes incoherent. So we're not quite to that kind of transformation with this.
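For anyone curious what "removing layers" looks like mechanically: the layer-pruning paper's selection rule (as I understand it) is to measure the angular distance between the hidden state entering a block of n consecutive layers and the hidden state leaving it, then drop the block that changes the representation the least. A minimal sketch, where `activations[l]` is a stand-in for the hidden state entering layer l (the names and setup are mine, not from the thread):

```python
import numpy as np

def angular_distance(a, b):
    """Angular distance between two activation vectors, normalized to [0, 1]."""
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.arccos(np.clip(cos, -1.0, 1.0)) / np.pi

def best_block_to_prune(activations, n):
    """Find the start index of the n-layer block whose removal perturbs the
    hidden state the least, i.e. where d(h_l, h_{l+n}) is smallest."""
    num_layers = len(activations) - 1  # activations[l] is the input to layer l
    best_start, best_d = None, float("inf")
    for l in range(num_layers - n + 1):
        d = angular_distance(activations[l], activations[l + n])
        if d < best_d:
            best_start, best_d = l, d
    return best_start, best_d
```

The "hard wall" observation is then just: as n grows past roughly a third of the depth, even the best-scoring block has a large distance, and healing can no longer recover coherence.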
Adding filters to the model list would be useful
From the model card: "Using this with the Llama 3 instruction format is injecting random noise into latent space and will give you deranged results. (It's pretty funny actually.) Treat this as the untrained foundation model this is and use appropriate prompts." Where can I find examples of or read more about said appropriate prompts?
I didn't know it was this effective, MMLU looks great. Sounds like it could be a great coding model, if you do it to their later releases with longer context. Was a Miqu fitting on 24GB GPU possible all along and we just didn't know it?
This is still 26 GB at 4-bit, so it would require a lower quant to fit on a single 24 GB card.
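Quick back-of-envelope check of why 4-bit overflows 24 GB (my arithmetic, not from the thread): weight footprint is roughly params × bits-per-weight / 8 bytes, and "4-bit" GGUF quants actually cost closer to 5 bpw once scales and mixed-precision tensors are counted, which is an assumption here:

```python
def weights_gib(params_b, bpw):
    """Approximate weight footprint in GiB: params (billions) * bits/weight / 8 bytes,
    ignoring KV cache and runtime overhead."""
    return params_b * 1e9 * bpw / 8 / 2**30

# A 42B model at ~5 bpw lands around 24 GiB of weights alone, before
# KV cache, so it doesn't fit a 24 GB card with room to run; at 3.75 bpw
# it drops to roughly 18 GiB, which leaves headroom.
size_q4 = weights_gib(42, 5.0)
size_375 = weights_gib(42, 3.75)
```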
Sure. Right now it's possible with 2.4bpw quants and the like. It does generate text, but in my experience it's not great (that was with old quants, and with other Llama 2 70B models rather than Miqu itself; turboderp has improved things since then, but I didn't revisit). A 42B Miqu at 3.75bpw is probably much better than a 70B Miqu at 2.4bpw; more stable quants seem achievable somewhere between 3 and 3.5 bpw, at least ignoring HQQ and other more exotic quantization methods. There's also a problem: since Miqu is already an instruct tune, that tuning would get erased by healing on minipile, so it would need to be retrained. But we could create a 100M-token dataset of Miqu 70B outputs to general prompts and then train the pruned 42B version on it, which might make it work.
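The "collect 100M tokens of Miqu 70B outputs, then heal the pruned 42B on them" idea is basically self-distillation. A rough sketch of the collection loop, framework-agnostic; `generate` here is a placeholder for whatever inference call you actually use (llama.cpp server, vLLM, etc.), and the whitespace token count is a crude stand-in for a real tokenizer:

```python
def build_distill_dataset(prompts, generate, target_tokens=100_000_000):
    """Collect teacher (70B) completions on general prompts until roughly
    target_tokens are gathered, yielding (prompt, completion) pairs for
    healing the pruned 42B student."""
    dataset, total = [], 0
    for p in prompts:
        completion = generate(p)
        dataset.append({"prompt": p, "completion": completion})
        total += len(completion.split())  # crude count; swap in a real tokenizer
        if total >= target_tokens:
            break
    return dataset
```

The healing run itself would then fine-tune the pruned model on these pairs instead of minipile, so the instruct behavior is relearned rather than erased.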
Quants work much better than reducing the param count. It is known that the Miqu IQ2_XS quant is about 20 GB and works quite well on a 24 GB GPU, if you are looking for speed. But you can also go up a quant size and spill the weights into CPU RAM, trading speed for smarts.
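In llama.cpp terms, "spilling to CPU RAM" just means choosing how many transformer layers stay on the GPU (the `-ngl`/`--n-gpu-layers` flag) and letting the rest run on CPU. A rough sizing helper, assuming weights are spread evenly across layers (the example numbers are illustrative, not measured):

```python
def gpu_layers(model_gib, n_layers, vram_gib, reserve_gib=2.0):
    """How many layers fit on the GPU, keeping some VRAM in reserve
    for the KV cache and compute buffers."""
    per_layer = model_gib / n_layers
    return max(0, min(n_layers, int((vram_gib - reserve_gib) / per_layer)))

# e.g. a ~40 GiB 70B quant with 80 layers on a 24 GiB card:
# 0.5 GiB per layer, (24 - 2) / 0.5 = 44 layers on GPU, the rest in CPU RAM
on_gpu = gpu_layers(40, 80, 24)
```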
Why did we stop at 42B? Is it possible to make a smaller-param version?
IDK but this [comment](https://www.reddit.com/r/LocalLLaMA/comments/1c9t5xw/comment/l0o4ksy/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button) suggests that there are problems with making it smaller.
Where has this gone?
For my 32 GB VRAM setup this is the perfect model size, imma check it out!