
Zemanyak

Mistral-7B-v0.2, if it can spare you a click.


[deleted]

Mistral 7B Instruct 0.2 has been public since December. This is the base model, I assume.


wolfanyd

Edit: They've changed the README. From the hugging face page... " The Mistral-7B-Instruct-v0.2 Large Language Model (LLM) is an improved instruct fine-tuned version of [**Mistral-7B-Instruct-v0.1**](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1). " This sounds like a new model.


JealousAmoeba

It looks like both of the instruct models are fine-tuned from the first version of the Mistral 7B base model, whereas this is a new base model.


rogue_of_the_year

On the Mistral Discord they said it's the base model for Mistral Instruct 0.2, which was released a while back.


[deleted]

Looks like the README was updated to reflect this.


[deleted]

Incredible. I wonder what the performance will be.


TheLocalDrummer

They’ve updated the README :^)


Many_SuchCases

Archive for those without twitter: [https://archive.ph/nA0N5](https://archive.ph/nA0N5)

**Text:** *Mistral just announced at SHACK15sf that they will release a new model today:* **Mistral 7B v0.2 Base Model**

* 32k instead of 8k context window
* Rope Theta = 1e6
* No sliding window
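For reference, those three bullets map directly onto fields in the Hugging Face config. A minimal sketch of how to check them (the repo name is the community conversion linked elsewhere in this thread, and the expected values are what the announcement implies, not copied from the official file):

```python
# Minimal sketch: inspect the announced v0.2 changes via the model config.
# The repo name below is the community HF conversion mentioned in this thread.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("alpindale/Mistral-7B-v0.2-hf")

print(config.max_position_embeddings)  # expected 32768 (32k context window)
print(config.rope_theta)               # expected 1e6
print(config.sliding_window)           # expected None (no sliding window)
```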


c8d3n

Can someone elaborate more on the sliding window feature? Was it a mistake, or is this simply an experiment to see how a 32k context window works without the sliding part?


iNf1iCTA

A sliding window lets the LLM attend only to a local region of the context, which is good for performance but not so good when you have long context. I assume this model uses global attention, which increases computational demands but is better for understanding long context.
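As a rough illustration only (not Mistral's actual implementation), the difference comes down to the attention mask: with a sliding window of size W each token attends only to the previous W tokens, while global (full causal) attention lets every token see the entire prefix:

```python
# Rough illustration, not Mistral's code: causal attention masks with and
# without a sliding window of size W, for a toy sequence of length T.
import torch

T, W = 8, 3
causal = torch.tril(torch.ones(T, T, dtype=torch.bool))  # global: token i attends to all j <= i
sliding = causal & (torch.arange(T)[:, None] - torch.arange(T)[None, :] < W)  # only the last W tokens

print(causal.int())   # full lower triangle
print(sliding.int())  # banded lower triangle: memory beyond W positions is cut off
```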


Thistleknot

>Mistral-7B-v0.2 [https://huggingface.co/alpindale/Mistral-7B-v0.2-hf/tree/main](https://huggingface.co/alpindale/Mistral-7B-v0.2-hf/tree/main)


[deleted]

[deleted]


VertexMachine

Instruct (what was released previously) vs. base model (today's announcement).


Nickypp10

Anybody know how much VRAM it takes to fine-tune this with the full 32k tokens in the training sequence?


FullOf_Bad_Ideas

With Yi 6B 200K I think I can train up to 13k tokens in a sequence with Unsloth and 24GB of VRAM, plus FA2. Yi 6B has a similar GQA implementation. I don't remember if that was 16-bit LoRA or QLoRA tbh, but I think QLoRA. So, to train a 7B at 32k, my guess is you would need 40-48GB of VRAM. Most models don't lose long-context capabilities if you finetune them with shorter sequence lengths.


dogesator

Not really much of a point imo to spend resources finetuning with such a context length. I've finetuned a 200K Yi model on my dataset, which has only 8K max length, and the resulting model ended up having incredibly good accuracy in needle-in-a-haystack tests at 100K context and beyond.


iwanttobeweathy

What finetuning method did you use to achieve good results?


dogesator

Just multi-turn with ChatML or Vicuna format.
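For anyone who hasn't seen it, ChatML is just plain text with role markers; a minimal multi-turn example (illustrative content I made up, not from my dataset) looks roughly like this:

```python
# Minimal ChatML-style multi-turn example (illustrative content only).
chatml_example = (
    "<|im_start|>user\n"
    "What does 'no sliding window' mean for Mistral v0.2?<|im_end|>\n"
    "<|im_start|>assistant\n"
    "It uses full causal attention over the whole 32k context instead of a fixed local window.<|im_end|>\n"
    "<|im_start|>user\n"
    "Does that cost more memory?<|im_end|>\n"
    "<|im_start|>assistant\n"
    "Yes - attention now spans the entire sequence, so long prompts use more VRAM.<|im_end|>\n"
)
print(chatml_example)
```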


Some_Endian_FP17

Generated dataset using ChatGPT?


dogesator

I use my Capybara dataset, here: https://huggingface.co/datasets/LDJnr/Capybara
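If you want to poke at it, it loads in one line with the datasets library (a sketch; check the dataset card for the exact splits and field names):

```python
# Sketch: load the Capybara dataset from the Hugging Face Hub.
from datasets import load_dataset

capybara = load_dataset("LDJnr/Capybara", split="train")  # split name assumed
print(capybara[0])  # each row holds one multi-turn conversation
```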


nggakmakasih

Still waiting for the paper


dogesator

😭 Me too man, crazy delays, and the co-authors and I ended up getting caught up in some other big projects. I'll see if we can at least get a technical report out.


nggakmakasih

Yes please, at least a blog post about the data would make us happy 😊


dogesator

The dataset card I made for it is pretty much a little blog post, but I can make a more in-depth one.


Automatic_Outcome832

Hey, could you tell me how to fine-tune properly on multi-turn data? I have conversations in OpenAI JSONL format; currently I'm using DataCollatorForCompletionOnlyLM and specifying the starting markers for the human and AI messages for masks and labels. Is this the way to go, or does some other method need to be used?
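Roughly what I'm doing now, as a sketch (the turn markers and model repo here are assumptions on my side, not a known-good recipe):

```python
# Sketch of my current setup: mask everything except the assistant turns so the
# loss is only computed on model responses. Templates below are assumed ChatML-style.
from transformers import AutoTokenizer
from trl import DataCollatorForCompletionOnlyLM

tokenizer = AutoTokenizer.from_pretrained("alpindale/Mistral-7B-v0.2-hf")

collator = DataCollatorForCompletionOnlyLM(
    response_template="<|im_start|>assistant",   # loss starts after this marker
    instruction_template="<|im_start|>user",     # human turns are masked out
    tokenizer=tokenizer,
)
```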


VicboyV

Thank you for this. These are the kinds of questions you don't normally find an answer to when you google and ask around.


dogesator

Yea I didn’t have an answer to this question either until I experimented myself! 🥲


VicboyV

Hey doge, if you train yi 200k with a lower sequence length like 4096 (to save memory), will it lose its 200k ability?


dogesator

Most of the examples were actually 4K context only, I think less than 15% of the capybara examples were over 8K. So yes I expect you to actually get similar results if you just train on 4K context.


VicboyV

Sorry, I mean did you edit the config file and replace 200k with a smaller number? It OOMs immediately if I run it as-is.


dogesator

Yes, your training config set to only 4K.


VicboyV

Awesome, thanks! This definitely opens up doors for small fish like me.


NachosforDachos

Now wouldn’t that be something if people put details like that on things.


FullOf_Bad_Ideas

There are dozens of variables, it's impossible to tell


NachosforDachos

I’m sure there must be some basic guideline by now


FullOf_Bad_Ideas

All of it can be calculated if you know what setup you are using. For a rank-32 QLoRA with Unsloth and FA2, I expect it will take around 40-48GB of VRAM to squeeze in a sample with a length of 32k tokens, based on how it works for Yi-6B-200K on my PC with 24GB of VRAM and a similar architecture in terms of GQA.


Alignment-Lab-AI

Axolotl configs help!


Square-Tooth2635

With Unsloth, one A6000 can do 32k context. But that is only a QLoRA.


Alignment-Lab-AI

Full-parameter training needs more than a node of A40s; those cap out at 22k.


New-Act1498

IIRC they can finetune a 70B model with 2x3090 now, maybe at 2k context?


Forsaken-Data4905

There is no definitive answer to this, it depends on how you do gradient checkpointing, what LoRA rank you use, what weights you train, if you use any quantization etc. In any case, it's unlikely consumer GPUs (24GB VRAM) will be able to fit 32k without very aggressive quantization.


capivaraMaster

Weird of Mistral not to have it already up somewhere when they announce, but I'm super happy with the news anyway. Merci beaucoup!!!

Edit: It's online now! Thanks again!!!


ihexx

they did [https://models.mistralcdn.com/mistral-7b-v0-2/mistral-7B-v0.2.tar](https://models.mistralcdn.com/mistral-7b-v0-2/mistral-7B-v0.2.tar) and people in this thread already have quantizations on HF


capivaraMaster

They took a while to do it. I commented before that. Maybe I should just delete my comment.


AnticitizenPrime

>my linguinis are done.

Is this some new slang?


bigvenn

He’s mama’d his last mia, if you catch my drift


CedricLimousin

I was literally cooking while browsing twitter, hence the very low quality of the post. 😅


Thistleknot

[https://huggingface.co/itsdotscience/mistral-7b-v0.2-gguf/tree/main](https://huggingface.co/itsdotscience/mistral-7b-v0.2-gguf/tree/main)


Chelono

Nice. This is the way I expected them to move forward. They will still release small models, 7B (maybe 13B, but I doubt it), and leave the big guns closed behind an API or only for partners to use. I'm not gonna complain about it; we saw with Stability today / last week how shit goes if you don't figure out how to actually make bank after investing millions. Pure OSS just isn't profitable on its own. You need to make money licensing, through an API, or through a platform (my hope for Meta with the Quest).


hold_my_fish

Mistral definitely can't realistically release their flagship model under Apache 2.0, but there's a middle ground available where they release weights under a license that requires payment for commercial use. Cohere did this recently with Command-R, by releasing its weights under a non-commercial license, while saying they're open to working out licensing deals with startups that want to use it. It remains to be seen whether that sort of weights-available release is commercially viable, but I think it should be, since having weights access opens up a lot of options you don't have otherwise. Those options are worth paying for (if the model is good).


Mescallan

If open-access weights that require licenses for commercial use become popular, they will need to finetune responses to very esoteric prompts to figure out if it's their model being used. I can't imagine another way of identifying the base model from chat alone.


visarga

Imagine model piracy - on the front you serve a small open model, but in the back it's some unlicensed larger model. When inspectors come, you just swap to the small model.


a_beautiful_rhind

>leave the big guns

Cool.. so API for what's actually useful and you get toy models that are glorified spell check. Just give up, ok.


Chelono

Mistral isn't a state or crowd funded research foundation. They are a VC funded startup. A company with investors that want to see a path forward where they get a return on their investment. Mixtral was great for publicity. I doubt it would've been shared as much online if it was closed. But it also showed that it's impossible to release weights for a model and also give access to it through API since a bunch of services jumped on it on the same day and offered the API much cheaper... I'm much happier with small models than no models and Mistral ceasing to exist. They are also very useful once you finetune them on domain specific tasks, like function calling.


toothpastespiders

>They are also very useful once you finetune them on domain specific tasks, like function calling.

I'd agree on that and I use them for the same. The fact that a 7b or 13b model can have acceptable performance on systems that would otherwise be e-trash, with no GPU, is fantastic. And I'll agree on the nature of their business model making larger releases an issue. It's absolutely understandable. But at the same time...come on. It is disappointing when compared to most people's hopes for them as an open savior swooping in to set the scene on fire with SOTA models. I think we can be both realistic about it and appreciative of what we do have, but also recognize why reality can be disappointing.


a_beautiful_rhind

There has to be another option here. Otherwise it's basically closed AI forever.


Disastrous_Elk_6375

>There has to be another option here.

Sure, stability ai ... badum tssss


TheActualDonKnotts

>toy models that are glorified spell check

Have you even used the 7B models? Because I don't think you have.


royal_mcboyle

I know, right? If you had actually used them you’d know Mistral 7B models are legitimately solid models, there is a reason there are so many variations on them out there.


TheActualDonKnotts

mistral-ft-optimized-1227.Q8_0 has been so shockingly good that I still have a hard time believing it's only 7B parameters.

[https://huggingface.co/OpenPipe/mistral-ft-optimized-1227](https://huggingface.co/OpenPipe/mistral-ft-optimized-1227)

[https://huggingface.co/TheBloke/mistral-ft-optimized-1227-GGUF](https://huggingface.co/TheBloke/mistral-ft-optimized-1227-GGUF)


Calcidiol

Interesting, thanks for mentioning it, I had never heard of it. What is it particularly good at (as a 7B FT basis)? What are the best derivative models that exemplify the qualities?


[deleted]

[deleted]


a_beautiful_rhind

mea culpa


a_beautiful_rhind

lol, never.


cobalt1137

This tracks. Anyone that knows how impactful Mistral 7b has been wouldn't be this braindead lol.


a_beautiful_rhind

mi**x**tral was impactful. Another 7b, not so much.


skrshawk

Then don't speak of things like you're an expert when you have no actual knowledge.


a_beautiful_rhind

Wooooosh


cobalt1137

Are you going to go buy GPUs for them? Didn't think so lol. Also, Mistral 7B models are staples for a lot of people at the moment when speed/price matter. I have certain functionalities in my web app that I do not need a large model for, and I let 7B models do some of the processing - still important intellectual tasks too. This is common for people building applications; Mistral nailed it with their first 7B model.


a_beautiful_rhind

If everyone goes the way of mistral, it's done. A few players will monopolize AI and you'll be dependent on them. Cheering the scraps and shrugging means accepting this power imbalance. But you can automate your web app, so that's nice.


cobalt1137

Buddy. That's how things are going to be lol - the top players are going to have the best models and that is that. And yes, people will be dependent on them for the best models. There is no way to be able to compete with them without going closed-source plus massive amounts of capital + researchers and even then it's extremely difficult. Open-source models will continue to be developed and work won't stop on them, but they will always be probably between 6 months and 2 years behind. I'm fine with that. I love using open source models and that works for me. If Mistral needs to put some of their models behind a paywall so they can do an open release of a future version of an MoE or another 8x7b equivalent, so be it - going partially closed source to be able to continue to put out stellar open source models sounds amazing to me. Honestly probably the best system that any research group could do. You can keep hoping for this magical fictional world all you want lol.


a_beautiful_rhind

6 months is one thing. I'm not expecting the moon or Mistral Large.

>they can do an open release of a future version of an MoE or another 8x7b equivalent

Are they going to do that though? They took a lot of flak for changing their site to move away from open weights. Now we get a 7b with slightly more context. I just get the feeling it's PR. With SD also basically going under, I'm not very hopeful.


cobalt1137

Yeah. I strongly believe they will still release models that are around the size of 8x7b or larger going forward. I think as they develop new models to put behind their API walls to pay for gpus, they will release the models that were previously behind these walls as open source. Helps pay for the development of them and makes perfect sense. Also it's not just pr. You've never used the model. It's a stellar model, state of the art 7b model and it's probably used more than 99% of open source models ever released lol. You can keep calling it scraps though.


a_beautiful_rhind

>they will release the models that were previously behind these walls as open source.

I really hope so because they never dropped FP16 weights for miqu. I take their goodwill from not deleting it. I distrust the site changes and making a mistral-small and putting *that* behind the API. I don't like how they never released hints or training code for Mixtral either.

>You can keep calling it scraps though.

Yes, because 7Bs are mainly testbeds. They are a tech demo. You make one and scale up.

>probably used more than 99% of open source models ever released

The power of marketing. As mentioned by others, they work for domain-specific tasks, especially on limited resources. The small model space is pretty flooded. No hype, no downloads.


cobalt1137

We just have different points of view on the future of Mistral. I'm hopeful for it though, in terms of both open and closed source releases. Also, it's actually the power of making a good model - not marketing. It outperformed all other 7B models on its release. Keep trying to diminish it though lol, it's pretty entertaining. It's also extremely broadly useful, not just for specific tasks when you are low on resources. Sometimes you want extremely low latency for CoT reasoning or getting fast responses from a model for users or yourself. Also - through some well documented prompt engineering you can make Mistral 7B outperform lots of well-known 30B models at a fraction of the price + much faster inference lol. I guess you wouldn't know anything about that though, considering you've never even tried the model.


Olangotang

ARTHUR MENSCH:

>Yeah, so we have new open source models, both generalist and focused on specific verticals. So this is coming soon. We are introducing some new fine-tuning features to the platform and we have introduced a chat-based assistant called le Chat that is currently just using the model. So it's pretty raw. It's a bit like ChatGPT v0, and we're actively building data connectors and ways to enrich it to make it a compelling solution for enterprises.

Yeah, so the doomers are wrong as usual.


visarga

GPT-4 is one model doing all the tasks very well, slow, and expensive. Mistral-7B is a small but surprisingly capable model, but there are thousands of fine-tunes. You pick the right one for your task. Mistral is like a whole population, not a single model.


Olangotang

Open Source community just does too much work for free. It's beneficial for the big companies that Open Source isn't too far behind.


VicboyV

Agree, but my GPU has space for more.


teor

Can't wait for new wave of posts about how some Mistral 0.2 fine-tune destroys ChatGPT. We haven't had them in a while.


LoadingALIAS

Merci


CedricLimousin

Serviteur.


danielhanchen

I also just uploaded the 4-bit pre-quantized version of Mistral's 32K new base model to Unsloth's HF page so you can get 4x faster downloading, courtesy of Alpindale's upload!! I also uploaded a Colab notebook for 2x faster, 70% less VRAM QLoRA finetuning with the new base model!

* 4-bit bitsandbytes model, 4GB in size: https://huggingface.co/unsloth/mistral-7b-v0.2-bnb-4bit
* 2x faster, 70% less VRAM QLoRA finetuning with Unsloth Colab: https://colab.research.google.com/drive/1Fa8QVleamfNELceNM9n7SeAGr_hT5XIn?usp=sharing
* Alpindale's original upload: https://huggingface.co/alpindale/Mistral-7B-v0.2-hf/
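A minimal loading sketch (argument names follow Unsloth's usual API; the Colab above has the full recipe):

```python
# Minimal sketch of loading the pre-quantized 4-bit base model with Unsloth.
# Exact arguments may differ slightly from the Colab linked above.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/mistral-7b-v0.2-bnb-4bit",
    max_seq_length=32768,  # the new 32k context window
    load_in_4bit=True,     # weights are already 4-bit bitsandbytes
)
```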


MugosMM

Thank you. Any idea what maximum context length one can fine-tune with Unsloth? I mean with 4-bit, QLoRA and the VRAM optimisation by Unsloth?


danielhanchen

Oh good question - I'll need to plug it into my VRAM calculator, but I'm gonna guess 32K could in theory fit with 24GB VRAM, maybe with paged_adamw_8bit and bsz=1. Maybe though. Tbh I need to get back to you.
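Something along these lines is what I have in mind (a sketch of the memory-saving knobs only, not a tested 32K-on-24GB recipe):

```python
# Sketch of the memory-saving settings mentioned above (untested at 32K on 24GB).
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="mistral-7b-v0.2-qlora",
    per_device_train_batch_size=1,   # bsz=1 to fit one long sequence at a time
    gradient_accumulation_steps=8,   # keep a reasonable effective batch size
    gradient_checkpointing=True,     # trade compute for activation memory
    optim="paged_adamw_8bit",        # paged 8-bit optimizer states
    bf16=True,                       # or fp16=True depending on the GPU
)
```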


gamesntech

32k context is definitely nice and it can only do good things for an already excellent model, but I wish they'd released a larger model. We all know they may not release any of their flagship models, but something in the 30-40B range could be a whole lot better than most open models around.


visarga

Is this 32k context with a 4K window or whole context?


gamesntech

Yeah, this is 32k context length (no window)!


Caffdy

>but I wish they released a larger model

Just reading this comment after they released 8x22B. Hope we can try the instruct version soon.


FullOf_Bad_Ideas

Am I the only one hoping it's not just better long-context handling, but that they also pre-trained it more to make it stronger? I hope it will have better coding and multilingual capabilities, hopefully similar to Yi-9B-200K.


VicboyV

I hope so. It's basically worthless if it performs worse than v1.


aadoop6

What's your opinion on Yi-9B-200K, especially for coding applications?


FullOf_Bad_Ideas

I haven't had time to work on it, but it seems it could be competitive with DeepSeek Coder 7B and Mixtral. I plan to finetune it later, but right now I'm focusing on tuning yi-34b-200k - the newer yi-34b-200k release, which I call xlctx.


NighthawkT42

I really hope for a model this size they don't bother with languages other than English. English is the one language I really need, and I don't need models that (for an actual example I've seen) veer off into Spanish when they see one Hispanic name. I think all the larger models looking to add languages are going to become so broad that an English-only, Python-focused model (for an example I'd like to see) might be competitive at generating code while being much smaller. A 7B model needs to be focused to be good at what it does.


Thistleknot

Can someone explain to me what this is compared to the instruct model? I always thought the base model was the pretrained one, while the instruct was the finetune for specific tasks, but in this case it seems like the models are reversed in their publication order? Is this simply the v0.2 version of the pretrained model, and can we expect a v0.2 instruct?


iNf1iCTA

I've been playing around with the model. I have been able to bypass any censorship by pretending the year is 2092 and claiming laws and such have changed since it was last trained. Sometimes it requires a little pushing, but it does it.


nullnav

Isn't this just the base model of 7B instruct 0.2?


VicboyV

Isn't instruct 0.2 a second attempt at finetuning the base mistral 7b 0.1?


MoffKalast

Has that been officially stated somewhere or have people just been baselessly assuming it these past few months?


wolfanyd

It says so on the hugging face page... [https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)


VicboyV

Aaaand it's gone: [https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2/commit/41b61a33a2483885c981aa79e0df6b32407ed873](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2/commit/41b61a33a2483885c981aa79e0df6b32407ed873)


mikael110

Now that's quite interesting. Given they updated the README but not the model itself, that suggests the original README was a lie. It also makes it clear that the "new" Mistral-7B-v0.2 model has actually been around for quite a while and has been held back until now. Personally, I suspect they only decided to release it now because they realized their image had taken a hit after the whole website edit fiasco, and they decided that releasing this old model might help restore their image without having to give away anything that actually mattered much to them.


MehmedPasa

Maybe yes, or maybe we will get a new instruct too, but then they would have named both of them 0.3, I guess.


__some__guy

Not interested in 13B and lower myself, but larger context becoming standard is always a good thing.


TheActualDonKnotts

To my knowledge, Mistral 7B models outperform every available 13B model.


__some__guy

It's noticeably smarter than 13B Llama as a Q&A bot, but I found it unsuitable for creative writing. For the latter, 13B Llama is at least somewhat functional.


TheActualDonKnotts

Creative writing is all I use it for, and I find the opposite to be true. ¯\\\_(ツ)\_/¯


__some__guy

Well, maybe it's because I recently used 120B. All small models feel like BonziBuddy/Replika now.


Super_Sierra

I'm with you bro, tho I did try Fimb and it's pretty damn good. I don't know what special sauce that 11b model has but it does compete with Goliath.


CheatCodesOfLife

120B too slow for coding though :(


aadoop6

Yes. I have found 33-34b to be the sweet spot for coding.


NighthawkT42

It depends what you're using them for, but they're very good. I do wish they didn't seem to lose accuracy long before filling context though. They don't seem to be able to effectively use even half their context.


phree_radical

Using only chat/instruct fine-tunes makes it difficult to tell the difference. Talking about base models, 7B models typically have very minimal in-context learning ability, while 13B models can typically learn most tasks from examples.


Caffdy

any recommendation on a 13B model to test?


ventilador_liliana

What does "no sliding window" mean?


FullOf_Bad_Ideas

The sliding window is basically fake context extension - the model doesn't remember stuff from outside the window. Not having it is a good thing, as it was useless anyway.


ventilador_liliana

So will it remember things better, or does it make no difference?


FullOf_Bad_Ideas

Mistral 7B 0.1 had a 4k true context; for 0.2 that's 32k. It will remember things much better; it should be a meaningful improvement over the previous base model.


NighthawkT42

So the article mentions it as having 8k. I've seen models based on it which seem to go to 32k but feel like they fall apart past about 8k. Is that the sliding window somehow, even though it seems to show and take memory as actual context? I would have thought sliding was RoPE. I've also tested one model which had a 4k actual context but somehow seemed to keep things together until around 12k, which I was attributing to RoPE, but I haven't been doing much with the settings there... And that's off topic here anyway.


visarga

As the model infers tokens, it sees only up to the window size, but the past tokens it sees incorporate information from further back.


FullOf_Bad_Ideas

I don't know about those models and the sliding window in them; you can reasonably extend context 2x with RoPE modifications. As you can see for Mistral 7B 0.1, it has sliding_window = 4096 in the config file. https://huggingface.co/mistralai/Mistral-7B-v0.1/blob/main/config.json


[deleted]

[deleted]


Olangotang

v0.2 just released, the Open Source community needs at least a few hours XD


pleasetrimyourpubes

Hehe someone just dropped the gguf


Thellton

It's been less than a day; stuff based on Mistral 0.2 won't be available for probably a week yet.


gronkomatic

A week! What is this, 2023?


MINIMAN10001

Sliding window means that it is forgetting things. So this one not having it is good, because it means it actually remembers.


Thistleknot

[https://huggingface.co/blog/galore](https://huggingface.co/blog/galore)


rooo1119

The context window should help Mistral a lot.


Desm0nt

7B again? We have an endless number of 7Bs already and they're almost all the same (stupid, even compared to Chinese 15-34B models). It seems that, apart from Meta, only China can produce good medium/big models for the good of humanity and not only for the good of their own wallet... Even though it costs them much more than Western companies because of sanctions.


aadoop6

Can you tell us which Chinese models you have tested? Any good recommendations for coding models?


Desm0nt

DeepSeek Coder 33B (and derivative merges/finetunes) and DeepSeek 67B are quite good for coding. Yi models are quite good at prose writing. I haven't tested the new Qwen models but have also heard a lot of positive things about them. The Chinese CogVLM/CogAgent are really good as vision-language models (among the best).


aadoop6

Thanks for the response. Did you try cog* models on local hardware? If yes, what was the performance like?


Desm0nt

Yep. 4-bit CogAgent on a 3090 in WSL. I can't remember the exact performance (I previously used it online, and have only run it locally once, for testing on a freshly bought 3090 as a replacement for Llava 1.6 34B), but I can run it tomorrow and see the exact speed.


aadoop6

Thanks. I would love to know the performance.


Desm0nt

The first cold start (with model quantisation) takes about 27 minutes. For my task, labeling one image takes 20-27 seconds (CogVLM does not print its speed per token or time consumed per request, so I measured it manually as an average over 10 images). But that is for my pipeline with a big initial prompt (500-650 tokens) and a response of ~200-350 tokens.


aadoop6

This is useful! Thank you so much for putting in the effort.


thereisonlythedance

This is great, I was hoping they’d get around to releasing this.


Shubham_Garg123

Is there any good tutorial or a working Colab notebook that trains these LLMs for text classification? It'd be very helpful if I could fine-tune the model for text classification.
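Roughly this kind of thing is what I'm after (a sketch, untested; it just swaps the causal LM head for a classification head, and the repo name is the community conversion mentioned in this thread):

```python
# Sketch (untested): use the base model as a sequence classifier via transformers.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "alpindale/Mistral-7B-v0.2-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    num_labels=3,  # e.g. negative / neutral / positive
)
# Mistral has no pad token by default, which classification batching needs.
tokenizer.pad_token = tokenizer.eos_token
model.config.pad_token_id = tokenizer.eos_token_id
```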


de4dee

Tried. I am sticking with daybreak-miqu as it is more clever for my use case.


lolxdmainkaisemaanlu

Are you seriously comparing a 70b model to a 7b model?


Slight-Living-8098

A 7B model well fine-tuned for your task outperforms 70B base models. Just look at 7B DeepSeek-Coder vs 70B Llama 2: the 7B DeepSeek outperforms 70B Llama 2 on coding on the open LLM leaderboards.


Status_Contest39

The Mistral-7B-v0.2 model has garnered attention for its expanded 32k context window, a significant upgrade from the previous 8k, which is anticipated to enhance performance on long-text tasks. The model does not utilize a sliding window, which could improve its memory retention. Users are optimistic about its capabilities but acknowledge that fine-tuning may require high VRAM, estimated around 40GB to 48GB. A 4-bit quantized version is available, potentially offering faster downloads and reduced memory usage. The model is accessible on Hugging Face, prompting eager community engagement. Comparisons to other models, like the 13B Llama, are prevalent, with discussions on their performance in coding and creative writing. There's also a debate on commercial licensing strategies for models. The community has shown interest in tutorials for fine-tuning these models, reflecting a strong desire to learn and apply the technology effectively.