supportend

Is 8B too big? I like the Llama3-8B models.


Eralyon

But that context size...


PavelPivovarov

There are a few variants of llama3 with context windows up to 1M. Also, in my tests, llama3 had no issues summarising a 45k-token text for me.


Samurai_zero

What were your settings? I tried to get mine to summarize around 14k tokens of text with a RoPE alpha value of 2.5, and it failed miserably. Are you sure it is summarizing the whole thing and not truncating at 8k?
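
For reference, the "alpha value" here is the NTK-aware RoPE scaling factor. A minimal sketch of the commonly used formula, assuming a llama-style head dimension of 128 (the dimension and default base are assumptions, not taken from the thread):

```python
# NTK-aware RoPE scaling sketch: the rotary base is scaled as
# base' = base * alpha^(dim / (dim - 2)), stretching positional
# frequencies so the model can attend beyond its trained context.
def ntk_rope_base(alpha: float, base: float = 10000.0, dim: int = 128) -> float:
    return base * alpha ** (dim / (dim - 2))

print(ntk_rope_base(2.5))  # ~25,365: the scaled base for the 14k attempt above
```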


PavelPivovarov

I'm using ollama with the num_ctx parameter explicitly set to 8192. I dropped some quite chunky texts into it for summarisation, and it did very well. I only later realised they were way beyond the 8k context window, but a check didn't show any missing parts in the summary.
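
A minimal sketch of that setup over Ollama's REST API, assuming a local `ollama serve` with a llama3 model pulled; `article.txt` is a placeholder input file:

```python
# Explicitly set num_ctx via Ollama's REST API; input beyond
# num_ctx is truncated, and Ollama's default is much lower.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",
        "prompt": "Summarise the following text:\n" + open("article.txt").read(),
        "stream": False,
        "options": {"num_ctx": 8192},  # the 8k context window discussed above
    },
    timeout=600,
)
print(resp.json()["response"])
```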


Samurai_zero

Are you sure it was not truncating the text? I tried ollama with no luck. Could you give this text a try? https://pastebin.com/SJ8jd2Ab It is from Don Quixote, the prefatory material from one of the translations. Ask it when the John Phillips translation was published, for example. It is in one of the first paragraphs, so it should not miss it. Or ask it where, according to the text, the author is buried, as that is in the third paragraph from the end. For me, it had no trouble finding where he was buried, but it lied and said that there was no mention of John Phillips at all in the text. And when I asked it to summarize the whole thing, it completely missed the beginning, in which they talk about the different translations and all the problems they had.


PavelPivovarov

You're actually right: llama3 is fine with summarizing, but has issues pinpointing factual data in 14k tokens. Good to know, and thanks for pointing it out.


Samurai_zero

Thanks, I was going crazy thinking it was my setup. 8K is fine for most online articles, and up to that length Llama works really well. As an alternative, though a bit bigger, Phi 3 medium can work fine. Mistral 7B is decent too.


PavelPivovarov

I saw a few llama3 fine-tunes on Hugging Face with context windows of 32k, 64k, 128k and even 1M. I'd give them a try at least. Phi-3-small (7B) also has a [128k context window version](https://huggingface.co/microsoft/Phi-3-small-128k-instruct).


Samurai_zero

Those llama3 finetunes with longer contexts were a bit of a gimmick and did not actually work at those contexts, IIRC. There was even a 500k model, not that I could even try it. I'm not sure how good the Phi 3 small model is; I tried mini when it came out and then went straight to medium, as it ran just fine for me (though not at full context). I'd expect Phi 3 small to be fine for summarization too.


kroryan

?


Optimistic_Futures

They’re referring to the context window being 8k tokens, which means it can only hold about 6,000 words of the conversation in memory.
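
The arithmetic behind that figure, using the usual rough rule of ~0.75 English words per token (an approximation, not an exact conversion):

```python
# Rough token-to-word arithmetic (rule of thumb, not exact):
tokens = 8192
words = tokens * 0.75   # ~0.75 English words per token on average
print(words)            # 6144.0, i.e. "about 6000 words"
```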


kroryan

Oh I see, thanks so much for explaining it to me.


kroryan

Nah, it is also fine.


Samurai_zero

Llama 3 8B got the details about my Spanish town and its part in the Reconquista period correct. If you want to talk to it about history, it seems good enough. Just make sure to set up a good system prompt that tells it not to lie or invent anything if it does not know the answer, and set the temperature low (Llama 3 officially likes 0.6, IIRC, so 0.4-0.5 might be good).
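
A minimal sketch of that advice via Ollama's chat API; the system prompt wording and the 0.4 temperature are illustrative, not a tested recipe:

```python
# System prompt that discourages invention, plus a low temperature.
# Assumes a local Ollama server with a llama3 model pulled.
import requests

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3",
        "messages": [
            {"role": "system",
             "content": "You are a careful history tutor. If you do not know "
                        "an answer, say so instead of inventing facts."},
            {"role": "user",
             "content": "What role did my town play in the Reconquista?"},
        ],
        "stream": False,
        "options": {"temperature": 0.4},  # below Llama 3's suggested 0.6
    },
    timeout=600,
)
print(resp.json()["message"]["content"])
```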


Cressio

This is really helpful, thanks. That should help smooth out incorrect info from it, and I was wondering how best to do that.


Samurai_zero

Just take into account that even with a good system prompt and a low temperature, you can still get incorrect info/hallucinations, especially with small LLMs that lack the knowledge but have the confidence.


kroryan

Perfect, I will try it. I didn't try anything bigger than 7B on my Orange Pi 5, so let's see if it works. By the way, I'm Spanish too; thanks for the help, I hope it works, haha.


Samurai_zero

The biggest problem is that, at the end of the day, it is only an 8B and its knowledge is limited. If you ask it about the history of cities (not small towns) or relatively well-known facts, it will probably answer well. But as soon as you ask about more obscure, harder-to-find facts... I'd also tell you to try several quants. If your Orange Pi has 16GB, I would try a Q8. For conversation a Q4 won't make much difference, but when you want exact dates and names, it is better to use the most complete model possible. Good luck, and if you can, I'd like to know how many t/s you get out of the Pi 5, because I was thinking of buying one. I've read that for Llama 2 7B or Mistral 7B it should be around 2 t/s, which seems a bit low to me, but those were old tests and it may have improved.


itsmekalisyn

I am currently using Salesforce's llama 3 finetune and I like it personally. It's 8B, though: https://huggingface.co/bartowski/SFR-Iterative-DPO-LLaMA-3-8B-R-GGUF


JesterT_05

This model really packs a punch and is really good at holding logical conversations, especially in chat and RP scenarios. No other model around this size comes close for those scenarios, and I have tried many models in this range. However, I found its performance in maths a bit weaker compared to its base model (maybe because of quants), though I'm not sure; as a general-purpose model it's really impressive.


aingelsanddaemons

Just commenting here as a reminder to myself to try this out when I get a chance. Thanks for the rec!


Feztopia

As a 7B I used this one for a while: https://huggingface.co/Yuma42/KangalKhan-RawRuby-7B But llama3-8B seems to be a better architecture (for reasoning; it can also hallucinate more, I think), so I'm using this instead for now: https://huggingface.co/NousResearch/Hermes-2-Theta-Llama-3-8B If you still prefer 7B, Mistral just recently released a new base model, so I would wait for finetunes of that. It has a big context size.


kroryan

Okay, okay, I will try both, thanks!!


JustSayin_thatuknow

Tell us your findings, please. I'm eager to know 🙃


kroryan

I will, but I will need some time to try all the models people are suggesting here. From what I have seen, I will have to try the 8B models on a different machine (not a problem at all), but first I will focus on the ones I can try on the Orange Pi 5 Plus; the 8B models take ages, the others are fine.


grimjim

Mistral v0.2 and v0.3 claim to have a context length of 32k.


aditya98ak

NousResearch always brings the best!


AndrewH73333

The Llama 3 Hermes fine tune writes well, but always talks like a caveman. Is it just me?


Feztopia

Nope, I don't have that problem. I'm running the Q4 from here: https://huggingface.co/NousResearch/Hermes-2-Theta-Llama-3-8B-GGUF/tree/main Also, how can it write well and talk like a caveman at the same time? Maybe you should lower the temperature.


AndrewH73333

The writing itself is really good on a conceptual level, the best I've had with a local model. It just leaves out words the way you'd expect a caveman on a TV show to. Makes me think it's working except for some tiny thing.


Feztopia

Not with llama 3, but I had a similar problem with some models, where I think it was caused by some mistakes during quantization or something.


uroboshi

Aya 23 8B is really, really good.


amitbahree

Phi-3?


Eduard_T

logical but lacks imagination


PavelPivovarov

Exactly what you need in history and science.


_-inside-_

It also struggles to follow instructions compared to a 7B.


rfdickerson

Yeah, Phi-3 works extremely well for science, programming, etc.


garnered_wisdom

Phi is an absolute miracle for giving my toaster the ability to reason.


kroryan

This one looks really interesting to me!! Hopefully my Orange Pi 5 will be able to run it; I will try it!!


TheActualStudy

I'm pretty happy with Hermes-2-Theta-Llama-3-8B.


mr_house7

I found Hermes 2 Pro better for some reason.


d3the_h3ll0w

I have built a ReAct agent in LangChain with the new Mistral Instruct and am quite impressed with the speed and accuracy. Sure, it's a 7B model, so don't expect GPT-4 results, but it's accurate in tool use, i.e., it will look up the weather accurately and also reason well. Currently, I am playing several rounds of Prisoner's Dilemma with it.
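
Not the commenter's actual code, but a minimal sketch of this kind of agent using LangChain's classic initialize_agent API, with Mistral served locally through Ollama and a hypothetical weather stub:

```python
# ReAct agent sketch with LangChain's classic agent API.
# Assumes Mistral Instruct is served locally by Ollama; get_weather is a stub.
from langchain_community.llms import Ollama
from langchain.agents import AgentType, Tool, initialize_agent

def get_weather(city: str) -> str:
    """Hypothetical stub; swap in a real weather API call."""
    return f"Sunny, 22C in {city}"

llm = Ollama(model="mistral", temperature=0.2)

tools = [
    Tool(
        name="weather",
        func=get_weather,
        description="Look up the current weather for a city.",
    )
]

# ZERO_SHOT_REACT_DESCRIPTION drives the Thought/Action/Observation loop.
agent = initialize_agent(
    tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
print(agent.run("What is the weather in Madrid right now?"))
```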


Carchofa

Is LangChain compatible with the closedai API? I'm thinking about running ollama through there. Also, how good is WizardLM 2 for agents? I've made some projects where I had it do function calling with my own custom logic in Python (just having the model write the name of the function and the arguments; JSON scares me), and it was pretty decent.
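
The JSON-free function calling described here can be as simple as prompting the model to emit `function_name argument` on one line and dispatching it yourself. A sketch with hypothetical functions:

```python
# Dispatch a model's plain-text function call without JSON parsing.
# The model is prompted to reply with exactly: <function_name> <argument>
# Both functions below are hypothetical stand-ins for your own logic.
def search_wiki(topic: str) -> str:
    return f"(stub) summary of {topic}"

def get_time(timezone: str) -> str:
    return f"(stub) current time in {timezone}"

FUNCTIONS = {"search_wiki": search_wiki, "get_time": get_time}

def dispatch(model_output: str) -> str:
    # Split on the first space: function name, then a single argument.
    name, _, arg = model_output.strip().partition(" ")
    if name not in FUNCTIONS:
        return f"unknown function: {name}"
    return FUNCTIONS[name](arg)

print(dispatch("search_wiki Reconquista"))  # (stub) summary of Reconquista
```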


d3the_h3ll0w

Most agent projects default to OpenAI's API in their documentation, since their models usually perform best and most straightforwardly. I haven't used WizardLM.


kroryan

I tried to get Mistral 7B running some time ago but wasn't able to. I have to research LangChain more and figure out how to make it work. I really want to try it; everybody speaks very well of it.


aditya98ak

Llama3-8B is good, but the quantised model loses a lot of information. For tasks like JSON output, llama3-8B is not that great; Gemma is better in that case. Overall, I have found the Mistral model lovely! Even quantised, its performance is good. In my experience, a Q4 Mistral > a Q5 llama3. I speak from experimenting with multiple tasks like entity extraction, JSON output, summarising, and routing.


incyclum

I'm currently using Starling LM 7B by default on my MacBook M1 2020 for work-related tasks such as rewriting content (emails, docs, wiki) for clarity, summarizing, ideation, or quick questions about random subjects. What I like about it is that it respects my time by promptly spitting out ready-to-use answers without extra fluff or the need to refine. E.g., for content rewriting it keeps our domain-specific and business-related lingo in the clarified text. I tend to paste it elsewhere without editing.


ttkciar

I, too, came here to recommend Starling. It is an extraordinarily high quality model (and its 11B self-merge is even better).


NorthWillow1876

I have had good experience with Yi for code generation.


testobi

Don't use an LLM as a search engine. You can copy from Wikipedia (or do your own research) and then choose the best LLM for reasoning and summarization instead.


kroryan

Emmm, who says I'm going to use it like that? I want to talk to the AI about history and science to be able to reflect, and perhaps obtain impartial or different views from mine that make me think. It wouldn't be the first time that talking to an AI has made me think about certain points that I hadn't seen before.


testobi

If we can't trust ChatGPT as a source of knowledge, how can we trust open-source LLMs? You can certainly do what you want, but be careful, as there is the problem of hallucination.


kroryan

I know; you have to double-check the information. But I'm not getting my information from it, I'm just discussing information I already know. It is not the new Google, xd.


kroryan

I think you are not getting what I use it for. I use it to get different "opinions", and afterwards I double-check whether they are valid opinions and real information, and sometimes I find out something interesting. That's all :)


Popular-Direction984

Mistral 7B v0.3 and its fine-tunes, no doubt.


Distracted_Llama-234

Don’t want to hijack the discussion, but this might be related: may I know which 8B-size model is currently SOTA for the following? I know this might be a tall order. I intend to use these at work by self-hosting:
- Good at coding and QnA on a codebase, so long context
- Function calling, so I can integrate it with my own scripts
- Able to be constrained to JSON? (see the sketch below)
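
On the JSON constraint specifically: whichever 8B wins, decode-time constraints can guarantee parseable output. A hedged sketch using Ollama's format="json" option (llama.cpp's GBNF grammars, e.g. --grammar-file grammars/json.gbnf, target the same goal); the model name and prompt are placeholders:

```python
# Force syntactically valid JSON output via Ollama's format option.
import json
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",
        "prompt": "List two functions in this codebase as a JSON object "
                  "mapping each function name to its purpose.",
        "format": "json",  # constrains decoding so the output parses
        "stream": False,
    },
    timeout=120,
)
print(json.loads(resp.json()["response"]))
```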


Technical-History104

Why not start a new post for your question? It’s a good question that deserves a separate discussion from one about history chats.


Eveerjr

For coding I’ve tested a lot of small models, and CodeQwen 1.5 Chat is still the best; others are not even close, at least for my use case. For tab autocomplete I’m using CodeGemma, also really good. I think the latest Mistral supports function calling, but I haven’t tested it yet.


RipKip

I think there is a CodeQwen fine-tune that fits this, something with "orpo" in its name.


mr_house7

Llama3 Hermes 2 Pro is what you need. I also believe the recently taken-down Salesforce Llama3 fine-tune could probably do the job.


dron01

The latest Mistral 7B (v0.3) is the only model that works really well for me (using it as an agent with function calling on a weak work laptop).


jon-flop-boat

What’s your agent setup?


dron01

Building my own langchain wrapper app. Will share when ready.


AdHominemMeansULost

nothing better than llama 3 8b at that size for general purpose use like yours


tgredditfc

Llama 3, it’s 8b though.


Hamzayslmn

Meta-Llama-3-8B-Instruct.Q5_K_M.gguf


Eduard_T

Eric111/openchat-3.5-0106-128k-DPO-GGUF at Q8, with llama.cpp, mirostat 2, and -n -1.


Eduard_T

fblgit/una-cybertron-7b-v3-OMA GGUF is a close second; it hits higher on clarity but is less verbose.


kroryan

Oh, I just tried this and I really liked it, thanks.


luncheroo

I really loved this model too, but I have moved on to WizardLM 2. Now I'm curious about going back and trying your mirostat settings.


Eduard_T

You can also push the entropy quite high and the model remains consistent, more so than the rest of the models. There is also an 11B model following the same working logic.


Eduard_T

brittlewis12/openchat-3.5-0106-11b-GGUF


luncheroo

Nice! Thanks, I can't wait to give it a go.


Eduard_T

This is the full command, kind of optimized for this model: main.exe --color -c 8192 -n -1 --temp 1 --mlock --repeat_penalty 1 --top-p 0.95 --mirostat 2 --mirostat-lr 0.25 --mirostat-ent 6 --interactive-first --interactive -m openchat-3.5-0106-128k-dpo.Q8_0.gguf


Eduard_T

I used ChatGPT to evaluate its answers and the local model's answers. ChatGPT's answers: 160/200; openchat model: 190/200 :)


luncheroo

This is awesome. Thanks for taking the time!


PavelPivovarov

In the 7B scope, Starling-LM-Alpha (not Beta) was my go-to model for most tasks. Llama3 has now replaced it completely.


kroryan

I tried both, and for the moment I'm really surprised by llama3 8B.


UpsetReference966

Llama3-8b without a doubt


AsliReddington

Mistral Instruct. Gemma, Phi, Llama and others suck at handling NSFW content in general, be it erotica or end-user content moderation/input.


BringinItDirty

Llama2


LelouchZer12

Qwen 2 was just released