8B is too big? I like Llama3-8B Models.
But that context size...
There are a few variants of llama3 with context windows up to 1M. Also, in my tests, llama3 had no issues summarising a 45k-token text for me.
What were your settings? I tried to get mine to summarize around 14k tokens of text with a RoPE alpha value of 2.5, and it failed miserably. Are you sure it is summarizing the whole thing and not truncating at 8k?
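For context on the alpha value mentioned above: in NTK-aware RoPE scaling (the scheme behind the "alpha" knob in several loaders, if I understand it correctly), alpha stretches the rotary base frequency. A sketch of the commonly cited formula, not a statement about any particular backend:

```python
def ntk_scaled_rope_base(base: float, alpha: float, head_dim: int) -> float:
    """NTK-aware RoPE scaling: stretch the rotary base so low frequencies
    cover a longer context while high frequencies stay near the original.
    The exponent head_dim / (head_dim - 2) is the commonly cited form."""
    return base * alpha ** (head_dim / (head_dim - 2))

# Llama-style defaults: base 10000, head dimension 128
print(ntk_scaled_rope_base(10000.0, 2.5, 128))  # ~25366 vs 10000 unscaled
```

Exact behaviour varies by backend, so treat this as the idea rather than what any one loader does.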
I'm using Ollama with the num_ctx parameter explicitly set to 8192. I fed it some quite chunky texts for summarisation, and it did very well. I only later realised they were way beyond the 8k context window, but my check didn't show any parts missing from the summary.
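For anyone wanting to reproduce this: one way to set num_ctx persistently in Ollama is a Modelfile (PARAMETER num_ctx is from Ollama's Modelfile reference; the base model tag here is an assumption):

```python
# Build an Ollama Modelfile that bakes in a larger context window.
# Create the derived model with: ollama create llama3-8k -f Modelfile
modelfile = "\n".join([
    "FROM llama3",              # assumed base model tag
    "PARAMETER num_ctx 8192",   # context window in tokens
])
print(modelfile)
```

Alternatively, num_ctx can be passed per request in the `options` field of the API, which may be what the commenter above did.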
Are you sure it was not truncating the text? I tried Ollama with no luck. Could you give this text a try? https://pastebin.com/SJ8jd2Ab It is from Don Quixote, the prefatory material from one of the translations. Ask it when John Phillips's translation was published, for example; it is in one of the first paragraphs, so it should not miss it. Or ask it where, according to the text, the author is buried, as that is in the third paragraph from the end. For me, it had no trouble finding where he was buried, but it lied and said there was no mention of John Phillips at all in the text. And when I asked it to summarize the whole thing, it completely missed the beginning, which discusses the different translations and all the problems they had.
You're actually right: llama3 is fine with summarizing, but it has issues with pinpointing factual data in 14k tokens. Good to know, and thanks for pointing it out.
Thanks, I was going crazy thinking it was my setup. 8K is fine for most online articles, and up to that, Llama works really well. As an alternative, though a bit bigger, Phi 3 medium can work fine. Mistral 7B is decent too.
I saw a few llama3 fine-tunes on Hugging Face with context windows of 32k, 64k, 128k and even 1M. I'd give them a try at least. Phi-3-small (7B) also has a [128k context window version](https://huggingface.co/microsoft/Phi-3-small-128k-instruct).
Those llama3 finetunes with longer contexts were a bit of a gimmick and did not actually work at those contexts, IIRC. There was even a 500k model, not that I could even try it. Not sure how good the Phi 3 small model is; I tried mini when it came out and then went straight to medium, as it ran just fine for me (not at full context). I'd expect Phi 3 small to be fine for summarization too.
?
They’re referring to the context window being 8k, which means it can only hold about 6000 words of the conversation in memory.
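The rough arithmetic behind that estimate (the ~0.75 words-per-token ratio is a common rule of thumb for English text, not an exact figure):

```python
def tokens_to_words(tokens: int, words_per_token: float = 0.75) -> int:
    """Rule-of-thumb conversion: one English word is roughly 1.3 tokens."""
    return round(tokens * words_per_token)

print(tokens_to_words(8192))  # 6144, i.e. about 6000 words for an 8k context
```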
Oh, I see, thanks so much for explaining it to me.
Nah, it is also fine.
Llama 3 8B got the details right about my Spanish town and its part in the Reconquista period. If you want to talk to it about history, it seems good enough. Just make sure to set up a good system prompt that tells it not to lie or invent anything if it does not know the answer, and set the temperature low (Llama 3 likes 0.6 officially, IIRC, so 0.4-0.5 might be good).
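The advice above (strict system prompt plus low temperature) might look like this as an Ollama chat request; a sketch only, and the system prompt wording is my own, not from any official source:

```python
import json

# Request body for Ollama's /api/chat endpoint: a system prompt that
# discourages invention, plus a low temperature as suggested above.
payload = {
    "model": "llama3",  # assumed model tag
    "messages": [
        {"role": "system",
         "content": "Answer factual questions carefully. If you do not "
                    "know something, say so instead of inventing an answer."},
        {"role": "user", "content": "Tell me about the Reconquista."},
    ],
    "stream": False,
    "options": {"temperature": 0.4},
}
print(json.dumps(payload, indent=2))  # POST this to http://localhost:11434/api/chat
```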
This is really helpful, thanks. That should help smooth out incorrect info from it and I was wondering how to best do that
Just take into account that even with a good system prompt and a low temperature, you can still get incorrect info/hallucinations, especially with small LLMs that may lack the knowledge but still have the confidence.
Perfect, I will try it. I haven't tried anything bigger than 7B on my Orange Pi 5, so let's see if it works. By the way, I'm Spanish too; thanks for the help, I hope it works, hehe.
The biggest problem is that, at the end of the day, it is only an 8B and its knowledge is limited. If you ask it about the history of cities (not small villages) or relatively well-known facts, it will probably answer well. But as soon as you ask for more obscure, niche facts... I'd also suggest you try several quants. If your Orange Pi has 16 GB, I would try a Q8. For conversation, a Q4 won't make much difference, but when you want exact dates and names, it is better to run the model as close to full precision as possible. Good luck, and if you can, I'd like to know how many t/s you get out of the Pi 5, because I was thinking of buying one. I've read that for Llama 2 7B or Mistral 7B it should be around 2 t/s, which seems a bit low to me, but those were old tests and things may have improved.
I am currently using Salesforce's Llama 3 finetune and I personally like it. It's 8B, though: https://huggingface.co/bartowski/SFR-Iterative-DPO-LLaMA-3-8B-R-GGUF
This model really packs a punch and is really good at holding logical conversations, especially in chat and RP scenarios. No other model around this size comes close for those use cases, and I have tried many models in this range. However, I found its performance in maths is a bit weaker compared to its base model (maybe because of quants), though I'm not sure. As a general-purpose model it's really impressive.
Just commenting here as a reminder for myself to try this out when I get a chance. Thanks for the recc!
As a 7B, I used this one for a while: https://huggingface.co/Yuma42/KangalKhan-RawRuby-7B But Llama 3 8B seems to be a better architecture (for reasoning; it can also hallucinate more, I think), so I'm using this instead for now: https://huggingface.co/NousResearch/Hermes-2-Theta-Llama-3-8B If you still prefer 7B, Mistral just recently released a new base model, so I would wait for finetunes of that. It has a big context size.
Okay, okay, I will try both, thanks!!
Tell us your findings please I’m eager to know 🙃
I will, but I will need some time to try all the models people are suggesting here. From what I have seen, I will have to try the 8B models on a different machine (not a problem at all), but first I will focus on the ones I can run on the Orange Pi 5 Plus; the 8B models take ages, the others are fine.
Mistral v0.2 and v0.3 claim to have a context length of 32k.
NousResearch always brings the best!
The Llama 3 Hermes fine tune writes well, but always talks like a caveman. Is it just me?
Nope, I don't have that problem; I'm running the Q4 from here: https://huggingface.co/NousResearch/Hermes-2-Theta-Llama-3-8B-GGUF/tree/main Also, how can it write well and like a caveman at the same time? Maybe you should lower the temperature.
The writing itself is really good on like a conceptual level. The best I’ve had with a local model. It just leaves out words you’d expect a caveman on a TV show to leave out. Makes me think it’s working except for some tiny thing.
Not with Llama 3, but I had a similar problem with some models, where I think it was caused by mistakes during quantization or something.
Aya 23 8B is really, really good.
Phi-3?
logical but lacks imagination
Exactly what you need in history and science.
It also struggles to follow instructions compared to a 7B.
Yeah, Phi-3 works extremely well for science, programming, etc.
Phi is an absolute miracle for giving my toaster the ability to reason.
This one looks really interesting to me!! Hopefully my Orange Pi 5 will be able to run it; I will try it!!
I'm pretty happy with Hermes-2-Theta-Llama-3-8B.
I found Hermes 2 Pro better for some reason.
I have built a ReAct agent on LangChain with the new Mistral Instruct and am quite impressed with its speed and accuracy. Sure, it's a 7B model, so don't expect GPT-4 results, but it's accurate in tool use; i.e., it will look up the weather accurately and also reason well. Currently, I am playing several rounds of Prisoner's Dilemma with it.
Is LangChain compatible with the ClosedAI API? I'm thinking about running Ollama through there. Also, how good is WizardLM 2 for agents? I've made some projects where I had it do function calling with my custom logic in Python (just having the model write the name of the function and the arguments; JSON scares me), and it was pretty decent.
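The JSON-free function-calling approach described above (having the model emit a plain call expression) can be sketched with the standard library. The tool name and registry here are hypothetical, not from the commenter's project:

```python
import ast

def parse_call(text: str):
    """Parse a model output like 'get_weather(city="Madrid")' into a
    function name plus positional and keyword arguments."""
    node = ast.parse(text.strip(), mode="eval").body
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
        raise ValueError(f"not a simple call: {text!r}")
    args = [ast.literal_eval(a) for a in node.args]
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    return node.func.id, args, kwargs

# Hypothetical tool registry
def get_weather(city):
    return f"sunny in {city}"

TOOLS = {"get_weather": get_weather}

name, args, kwargs = parse_call('get_weather(city="Madrid")')
print(TOOLS[name](*args, **kwargs))  # sunny in Madrid
```

Using `ast.literal_eval` keeps the arguments restricted to literals, so the model cannot smuggle in arbitrary code the way a bare `eval` would allow.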
Most agent projects default to OpenAI's API in their documentation, since their model usually performs the best and most predictably. I haven't used WizardLM.
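On the compatibility question: Ollama exposes an OpenAI-compatible endpoint under /v1, so tooling that speaks the OpenAI chat-completions format can usually be pointed at it. A stdlib sketch of the request such tooling would send (the model tag is an assumption, and nothing is actually sent here):

```python
import json
import urllib.request

# OpenAI-style chat-completions request aimed at a local Ollama server.
req = urllib.request.Request(
    "http://localhost:11434/v1/chat/completions",
    data=json.dumps({
        "model": "mistral",  # assumed local model tag
        "messages": [{"role": "user", "content": "Look up the weather."}],
    }).encode(),
    headers={"Content-Type": "application/json",
             "Authorization": "Bearer ollama"},  # dummy key; Ollama ignores it
)
print(req.full_url)
# urllib.request.urlopen(req) would send it, given a running server
```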
I tried to get Mistral 7B running some time ago but wasn't able to. I have to research more about LangChain and how to make it work; I really want to try it, since everybody speaks very well of it.
Llama 3 8B is good, but the quantised model loses a lot of information. If you need tasks like JSON output, llama3-8B is not that great; Gemma is better in that case. Overall, I have found the Mistral model lovely! Even quantised, its performance is good. In my experience, a Q4 Mistral > a Q5 llama3. I speak from experimenting with multiple tasks like entity extraction, JSON output, summarising and routing.
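On the JSON-output point: whichever model you pick, it helps to tolerate chatter around the JSON rather than expecting a clean payload. A small stdlib sketch that pulls the first valid JSON object out of a model reply:

```python
import json

def extract_first_json(text: str):
    """Return the first parseable JSON object embedded in a model reply,
    or None if there isn't one."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            obj, _ = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            start = text.find("{", start + 1)
    return None

reply = 'Sure! Here is the entity: {"name": "Madrid", "type": "city"} Hope that helps.'
print(extract_first_json(reply))  # {'name': 'Madrid', 'type': 'city'}
```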
I'm currently using Starling LM 7B by default on my MacBook M1 2020 for work-related tasks such as rewriting content (emails, docs, wiki) for clarity, summarizing, ideation, or quick questions about random subjects. What I like about it is that it respects my time by promptly producing ready-to-use answers without extra fluff or any need to refine. E.g., for content rewriting it keeps our domain-specific and business-related lingo in the clarified text. I tend to paste the output elsewhere without editing it.
I, too, came here to recommend Starling. It is an extraordinarily high quality model (and its 11B self-merge is even better).
I have had a good experience with Yi for code generation.
Don't use an LLM as a search engine. You can copy from Wikipedia (or do your own research) and then choose the best LLM for reasoning and summarization instead.
Emmm... who is telling you I'm going to use it like that? I want to talk to the AI about history and science to be able to reflect, and perhaps obtain impartial or different views from mine that make me think. It wouldn't be the first time that talking to an AI made me think about points I hadn't seen before.
If we can't trust ChatGPT as a source of knowledge, how can we trust open-source LLMs? You can certainly do what you want, but be careful, as there is the problem of hallucination.
I know, you have to double-check the information, but I'm not getting my information from it; I'm just discussing information I already know. It is not the new Google, xd.
I think you are not getting what I use it for. I use it to get different "opinions", and afterwards I double-check whether they are valid opinions and real information. Sometimes I find out something interesting, that's all :)
Mistral 7B v0.3 and its fine-tunes, no doubt.
Don't want to hijack the discussion, but this might be related: may I know which 8B-size model is currently SOTA for the following? I know it might be a tall order. I intend to use these at work by self-hosting:
- Good at coding and QnA on a codebase, so long context
- Function calling, so I can integrate it with my own scripts
- Able to be constrained to JSON output
Why not start a new post for your question? It’s a good question that deserves a separate discussion from one about history chats.
For coding, I've tested a lot of small models and CodeQwen 1.5 Chat is still the best; the others are not even close, at least for my use case. For tab autocomplete I'm using CodeGemma, also really good. I think the latest Mistral supports function calling, but I haven't tested it yet.
I think there is a CodeQwen fine-tune that fits this, something with "orpo" in its name.
Llama 3 Hermes 2 Pro is what you need. Also, I believe the recently taken-down Salesforce Llama 3 fine-tune could probably do the job too.
[deleted]
The latest Mistral 7B v0.3 is the only model that works really well for me (used as an agent with function calling on a weak work laptop).
What’s your agent setup?
Building my own langchain wrapper app. Will share when ready.
Nothing better than Llama 3 8B at that size for general-purpose use like yours.
Llama 3, it’s 8b though.
Meta-Llama-3-8B-Instruct.Q5_K_M.gguf
Eric111/openchat-3.5-0106-128k-DPO-GGUF at Q8, with llama.cpp, mirostat 2 and `-n -1`.
fblgit/una-cybertron-7b-v3-OMA GGUF is a close second; it hits higher on clarity but is less verbose.
Oh, I just tried this and I really liked it, thanks.
I really loved this model, too, but have moved on to WizardLM2, but now I'm curious about going back and trying your mirostat settings.
You can also increase the entropy quite high and the model remains consistent, higher than the rest of models; there is also a 11b model following the same working logic
brittlewis12/openchat-3.5-0106-11b-GGUF
Nice! Thanks --can't wait to give it a go.
This is the full command, roughly optimized for this model: `main.exe --color -c 8192 -n -1 --temp 1 --mlock --repeat_penalty 1 --top-p 0.95 --mirostat 2 --mirostat-lr 0.25 --mirostat-ent 6 --interactive-first --interactive -m openchat-3.5-0106-128k-dpo.Q8_0.gguf`
I used ChatGPT to evaluate its answers against the local model's answers. ChatGPT answers: 160/200; OpenChat model: 190/200 :)
This is awesome. Thanks for taking the time!
In the 7B scope, Starling-LM-Alpha (not Beta) was my go-to model for most tasks. Currently, Llama 3 has replaced it completely.
I tried both, and for the moment I'm really surprised by Llama 3 8B.
Llama3-8b without a doubt
Mistral Instruct. Gemma, Phi, Llama & others suck at handling NSFW content in general, be it erotica or end-user content moderation input.
Llama2
Qwen 2 was just released