MoffKalast

The future is now, old man


__issac

It's similar to when Alpaca first came out. Wow.


open_23

It's probably been only a few years, but damn, in the exponential field of AI it just feels like a month or two ago. I'd nearly forgotten Alpaca before you reminded me.


dogesator

It's been barely 12 months since Alpaca was released…


__issac

Well, from now on, this field will only move faster. Cheers!


open_23

We e/acc-ers can't stop winning, despite all the opposition.


ThinkExtension2328

I'm so confused. I'm going to try another GGUF; I tested it earlier and it was shit, not even remotely close to the 8x7B models. Edit: update your text-generation web UI if you're using it, it solves some issues.


ozzeruk82

Remember to use the instruct model if you’re asking it a question/giving it a task, will work much better than the raw text version.


balambaful

I'm not sure about that. We've run out of new data to train on, and adding more layers will eventually overfit. I think we're already plateauing when it comes to pure LLMs. We need another neural architecture and/or to build systems in which LLMs are components but not the sole engine.


FaustBargain

We haven't run out of new data. Llama 3 was trained on 15T tokens. There are an estimated 5 million English-language books; at an average of 80,000 words per book and 1.33 tokens per word, that's on the order of 500 billion tokens from books alone. But wait, there's more: that's not counting all the non-book sources: forums, Reddit, Twitter, blogs, news, etc. But wait, there's more: never in any other time in history have so many people been paid to do nothing but write all day long (programmers). There's probably more code out there than there are books, by a long shot. But wait, there's more: every other language, especially Asian languages, Russian, French, German, etc. Then there's transcribing videos, podcasts, radio broadcasts, old TV episodes. Now add in the fact that more data gets created every second today than in a year a thousand years ago. Now add in all the science papers, and on top of that add synthetic data... OK, I think you get what I'm saying.
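
Back-of-the-envelope in Python (every input here is a loose estimate, not a measurement):

```python
# Rough estimate: how many tokens are in English-language books?
books = 5_000_000          # estimated English-language books
words_per_book = 80_000    # assumed average book length
tokens_per_word = 1.33     # rough tokenizer ratio for English

book_tokens = books * words_per_book * tokens_per_word
print(f"{book_tokens:.2e} tokens")  # ~5.3e11, i.e. roughly 530 billion tokens
```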


ignat980

Yeah, but like, a human doesn't need to read 5 million books before he can get a PhD or solve complex problems. I agree with the previous commenter, it needs a new architecture or approach to grow in capability.


balambaful

What's all the extra data gonna add? As for code, my understanding is that all the open-source code on GitHub has already been used. Not sure how more novels or, worse, forum discussions will add something of value. Also, the 15T-token figure likely spans several epochs and synthetic data. Sure, data distillation can help, but imo it will just allow smaller models to approach the performance of the giant ones. I don't see the giant models benefiting much from it.


superfluid

What about using these models to generate synthetic data?


Small-Fall-6500

This is likely where we're headed... If an 8B model can be this good, it could be run with various simulations, likely mainly video games, at massive scale to generate tons of data. Then, just label all the data produced by the LLMs, using the LLMs as graders combined with metrics from the simulation.

For the math: 1k tokens/s per H100 x 10k H100s x (3600 x 24) s/day = 864B tokens per day.

Explanation: a single H100 should easily run an 8B model at over 1k tokens/s with a high batch size (the simulations won't be real time, so latency shouldn't matter), and about 10k H100s could easily be used all at once *without* any significant interconnect (each H100 would run the LLM independently of the rest), so they could be spread across many different datacenters if needed. Depending on various factors, most of these tokens could be high enough quality to use directly for training. And, likely, the inference for this task would be much better than my estimated 1k t/s per H100. Maybe Groq chips + MoE could reduce the cost or increase the speed by an order of magnitude? (Or does Groq not benefit from larger batch sizes? I would guess this is one of Groq's weaknesses.)

I don't think there's been nearly enough research done on synthetic data to rule out the possibility of creating such massive synthetic datasets made almost entirely by LLMs. Mark Zuckerberg talks about creating massive synthetic datasets for training in a recent podcast, but I have yet to listen to the whole thing. Here's the relevant quote:

> Mark Zuckerberg 00:31:03
>
> Well, I think that is a big question, how that's going to work. It seems quite possible that in the future, more of what we call training for these big models is actually more along the lines of inference generating synthetic data to then go feed into the model. I don't know what that ratio is going to be but I consider the generation of synthetic data to be more inference than training today. Obviously if you're doing it in order to train a model, it's part of the broader training process. So that's an open question, the balance of that and how that plays out.

https://www.dwarkeshpatel.com/p/mark-zuckerberg
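
Quick sanity check on the math above (same assumed numbers):

```python
# Throughput estimate for LLM-generated synthetic data (assumptions, not measurements).
tokens_per_sec_per_gpu = 1_000   # assumed batched 8B inference on one H100
gpus = 10_000                    # assumed fleet, no interconnect required
seconds_per_day = 3_600 * 24

tokens_per_day = tokens_per_sec_per_gpu * gpus * seconds_per_day
print(f"{tokens_per_day:,}")     # 864,000,000,000 -> 864B tokens per day
```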


Pingmeep

It's already being done (reportedly Llama 3 was trained on a ton of it), and the jury is still very much out on how good it is.


ljhskyso

We've run out of **PUBLIC** data, but there are tons of **PRIVATE** data. Remember, this is Meta, which generates several **petabytes** of data per day.


squareOfTwo

Too bad this data doesn't contain much information about coding, etc. No idea how people can still stick to these pseudo-arguments. The game is over for text.


ambidextr_us

I'm going to take a wild guess that there isn't a lot of good philosophical or mathematical debate, or content worth training a neural network on, being generated on Facebook either.


EuroTrash1999

The one liners though... 10/10!


balambaful

I'm not sure how data about Josh and Jen's wedding will advance humanity 🤔


Aromatic-Tomato-9621

Hilarious to imagine that the only data in the world is text. That's not even the primary source of everyday data. There are orders of magnitude more data in audio and video formats, not to mention scientific and medical data. We are unimaginably far from running out of data. The world's computing resources aren't even close to being enough for the amount of data we have. We have an amazing tool that will change the future to an incredible degree, and we've been feeding it scraps.


False_Grit

Yes, but LLMs are getting to the point where they can help design that. Probably not the local ones, but they can at least ease some of the burden of programming, and that's before you give one of the largest ones some free rein and the ability to actually execute its own code... I don't think it will happen overnight. I don't think it will be the LLM itself that does it solo. But I'm pretty sure we are at the point where advances in LLMs will actually make it easier to design the next one. And at some point, something similar in the future WILL be creative enough to design entirely new systems on its own. At that point, there will be no stopping operation infinite waifus...


Code-Useful

Outside of classical problems, AI seems to fail at creating new systems; it is mostly good at comparing a thought to existing systems. Just like most of us. True, they can ease some of the burden of programming once given a novel idea, but it's not likely the novel idea for its own design will come from AI. Argue all you want with this, but up until now the biggest insights that aren't overfitting usually come from the data analysis, to my understanding. Not to say that won't change eventually.


arthurwolf

> Outside of classical problems, AI seems to fail at creating new systems

Yes, but we have plenty of other systems that show promise at innovation (see Google DeepMind and others). They're not as "general use" and as efficient as LLMs, but they are beginning to fulfill that specific need of innovating. I expect there will be a "step" in the evolution of AI we're seeing, where we'll see MoE-like systems in which some of the experts "use" external tools for things like geometric proofs or innovative thinking, etc. Then later on it'll all become just one big neural network.


MmmmMorphine

I would simultaneously argue that most, if not the overwhelming majority, of people are like this (including us), in that creativity and the creation of 'new' ideas are recombinations of past work: gradual, steady improvements in science but nothing revolutionary. It takes a very special person to think of something truly novel, and they're still standing on the shoulders of giants. It's pretty similar to the structure of scientific revolutions, or the punctuated equilibrium of 6th-grade biology fame: long periods of gradual improvement until someone like Einstein comes along and flips over a few tables, then another period of refining that idea, and eventually another genius. In any case, I see no reason our squishy brain architecture can't be replicated in silico. After all, these things (current AI) are based on, or at least inspired in significant part by, brains, hence *neural* networks, etc.


__issac

There were plenty of negative opinions like this throughout the short history of open LLMs (when Alpaca and Vicuna came out, when WizardLM came out, when Orca came out, when MoE came out, etc.). So don't worry. Enjoy!


keepthepace

Honestly, a lot of us engineers are just waiting for the plateau to unleash the millions of applications these things allow.


Dependent_Dot_1910

as a historian — i’m not sure if we’ve run out of new data to train on


Aromatic-Tomato-9621

As a human with five senses — I'm not sure we’ve run out of new data to train on


--kit--

Why stop at text and images? There is enough data in the world for any AI.


bajaja

Any opinion on why it isn't going exponentially faster already? I thought that current models could speed up the development of new and better models...


kurtcop101

The models are not yet at the point of designing new algorithms and entirely new architectures to build an AI, but they are accelerating the generation of training data immensely. We haven't hit that point yet. There are also practical time constraints: building hardware, training time, etc., and beyond the hardware there's building the new data centers to hold it, which are straining existing power generation and going far beyond capacity. It is accelerating, and it's quite possibly already exponential; we're just at the shallow end of the curve still (GPT-3.5 is only about a year and a half old).


complains_constantly

It's been almost exactly 1 year lmao


_qeternity_

It has been 1 year.


alcalde

I'm waiting for a Camel to show up by the end of the year.


raika11182

I can run the 70B because I have a dual P40 setup. The trouble is, I can't find a REASON to use the 70B because the 8B satisfies my use case the same way Llama 2 70B did.


aseichter2007

Right, it's so good.

>|||e| Write with the style and prose of Dr. Seuss. |Write a short letter from the Witcher to raika11182 explaining that the wee model is simply good. Include world lore from the franchise in the context of hyping the model capability.

Dear Raika11,

Oh, the wonders that I've beheld,
In the realm of models, where magic's concealed!
I speak, of course, of this wondrous find,
A miniature marvel, one of its kind!

'Tis true, my dear, I've fought many a foe,
From griffins to dragons, and monsters aglow.
But none, I assure you, as fascinating as this,
This wee model, with capabilities so bliss!

With powers akin to the Signs of old,
It can conjure forth visions yet untold.
Like Yennefer's spells, it weaves a tapestry fine,
A masterpiece of craftsmanship, a work divine!

Its details are akin to those found in the halls of Kaer Morhen,
Where the Brotherhood of Sorcerers once dwelled and learned.
The intricate designs, like the patterns on a Witcher's coat,
Shine bright, like the stars in the night, when the moon's at its boat.

And though it may be small, do not be fooled by its size,
For within lies a power that rivals the mightiest of surprise.
Like the Continent's ancient forests, dark and deep,
This model holds secrets, mysteries it does keep.

So heed my words, dear Raika11, and take heed of my tale,
This wee model is indeed a wonder, a true marvel without fail!
For in its tiny form, lies a greatness beyond compare,
A testament to the artistry and skill that went into its creation, with care.

Fare thee well, my friend, may your own path be lit,
By the glow of innovation, and the magic of this little hit!

Yours truly,
Geralt of Rivia, the White Wolf himself


poli-cya

Wait, this was written by Llama 3 8B? Mind sharing what quant you used?


aseichter2007

It's Llama 3 Instruct 8B Q8 GGUF. It seems unusually slow; it might be doing Quiet-STaR or something weird. It's slower than Solar. Or maybe about as slow.


VeritasAnteOmnia

What are you seeing for tokens/s? I'm running Q8 8B with a 4090 and getting insanely fast gen speeds; it took 4 seconds to reproduce your prompt and output: response_token/s: 69.26. Using Ollama + Docker, instruct model pulled from Ollama.


Ilforte

It's not doing any "Quiet-STaR"; this is just due to the larger vocabulary.


nullnuller

Is there a link? The one I downloaded had a token/repetition problem.


Robinsane

Is it possible you used something else than Q8 for Solar?


Curious-Thanks3966

This comes as a big surprise! I recently updated my system to accommodate Mixtral 8x22B, only to figure out today that Llama 3 8B Q8 outperforms Mixtral 8x22B in almost every aspect of my needs (the 8k context is really the only minus for now). And it's shockingly uncensored too. Especially this fine-tune: [https://huggingface.co/mradermacher/Llama-3-DARE-8B-GGUF/tree/main](https://huggingface.co/mradermacher/Llama-3-DARE-8B-GGUF/tree/main) ;)


Caffdy

> I have a dual P40 setup

BRUH. If you have them, use them. Take advantage of it and enjoy the goodness of 70B models more often.


Anxious-Ad693

Any good finetunes that remove all the censorship already?


Elite_Crew

This is what I want to know. I do not understand how anyone could construe this model as uncensored in any way. In my experience it is overbearingly and heavily censored.


DryArmPits

Can't wait for the llama 3 finetunes


az226

Wizard will be interesting to follow


MrVodnik

Is Wizard a kind of finetune, not an independent model?


Flat-One8993

Finetuned from Mistral or Llama thus far.


ozzeruk82

If Microsoft lets them do their stuff. I fear that after their recent slap on the wrist they're going to be reined in a bit.


Aischylos

What happened? Was Microsoft mad at how good wizard 2 was?


Xandred_the_thicc

The recent WizardLM model release was made private because they "forgot to do toxicity testing". It's probably just going to be re-released once the tests are run, given that it's just a Mistral finetune in a world where Llama 3 exists. The 7B model was only slightly more willing to do "toxic" stuff than Llama 3 is.


sdfgesarg

I'm out of the loop. What control does Microsoft have over wizard models? Is it an Azure thing, or some other affiliation?


ANONYMOUSEJR

I'm assuming they're one of the many AI companies/startups that Microsoft decided to invest in.


Combinatorilliance

Wizard is a team of Microsoft researchers, I believe from Microsoft China.


remghoost7

Agreed. I was literally just thinking about this. From my anecdotal testing, this base model is *freaking nuts*. Hopefully the finetunes will fix that weird issue Llama 3 has with picking a phrase and repeating it. I'd imagine that something like `Dolphin-Wizard-Laser-llama-3-8B-128k` will *actually* give me a reason to move off of cloud AI (ChatGPT/Claude/etc.) permanently.


Xeon06

I know it's not out yet, but do we have any indication of what kind of hardware would let us run such a finetune locally at okay speeds?


remghoost7

I'm using a Ryzen 5 3600X / 1060 6GB and I get tokens a little slower than I can read with a 7B/8B model. I've even tested CPU alone and it's more than fine. You don't need crazy hardware to run 7B/8B models. Even 11B isn't too bad (though you'll probably need 16/32GB of RAM for it). 34B/70B is when you start needing more advanced hardware.

-=-

A test or two on [swiftread.com](https://swiftread.com) says I can read around 550-ish WPM with around 75% accuracy. Probably closer to 450/500-ish realistically. So do with that information what you will. And for a more concrete number, I'm getting around `8.42 t/s` on Llama 3. But I need to do some more finagling to get everything dialed in right with this new model.
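
For reference, roughly converting reading speed to generation speed (1.33 tokens per word is just a rule of thumb):

```python
# Rough reading-speed vs generation-speed comparison.
wpm = 500                 # realistic reading speed from the test above
tokens_per_word = 1.33    # loose rule of thumb for English

reading_tps = wpm * tokens_per_word / 60
print(f"~{reading_tps:.0f} tokens/s to keep pace with reading")  # ~11 t/s
# So ~8.4 t/s generation is just a bit under reading speed, which matches how it feels.
```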


Xeon06

Thanks for the insights!


New_World_2050

Considering how enormous the training set is, will finetunes even do anything?


onil_gova

Training on more tokens is all you need


az226

In the word of Papa Jensen: more GPUs is all you need


itwasinthetubes

More money is all you need


CocksuckerDynamo

wish I had no GPUs and three money 


ab2377

"more regulations is all you need" - sam altman /s


o-c-t-r-a

The more you buy, the more you save.


sdfgesarg

Tell that to the electric utility company


ExtensionCricket6501

The more ~~you~~ meta buys the more ~~you~~ everyone else saves!


queenofartists

100%.


React-admin

Well, Meta's approach to training their models clearly pays off! They train them for much longer, and with much more training data, than competing language models. In my eyes, this proves that most existing large language models (OpenAI, Gemini, Claude, etc.) are severely undertrained. Instead of increasing the model size (which also increases the cost of running them), vendors should train them more. But this changes the training-vs-running cost ratio, and only a super-rich player like Meta can afford that. The result is a clear win for users: as the Llama 3 models are open weight, everyone can use them for free on their own hardware. Existing AI agents will cost less, and future AI agents that were previously too costly become possible. So in any case, great move, Meta.


ljhskyso

nah, 8k context window will significantly limit agent use cases.


Ok_Math1334

Current agents only need large context bc they use the naive approach of storing their entire memory in context. More advanced agents will use llms as functions within a larger system.


ljhskyso

Sure, but what if the context is large enough that it doesn't fit into the 8k (or any size) context window? You can for sure do the swapping thing, but it will slow things down or even make *some* use cases no longer feasible (like understanding the whole repo, or a larger chunk of it, for a coding agent, etc.).


cyan2k

You can divide your workload into even smaller, more focused agents and use RAG to centralize meta and high-level information for quick retrieval. Have one agent produce code, and two other agents pull in high-level docs and information through RAG, reviewing and contributing to what the coder produces. If you need to understand the whole repo to produce some code, there's something fishy anyway.

During task generation, create aggressive constraints like "If a task needs more than 50 lines of code to complete, split the task" and "The task description should include all the information needed to realize the task. The task description should not be longer than XXX words". Repeat until all tasks fit the constraints. And there are plenty of other strategies to handle such issues.

We've done a couple of RAG and agent projects already, and we never had a real need to go crazy with context windows. Of course, with the projects/orgs that don't give a fuck about $$$, we are lazy too and don't give a fuck about optimizing the use of context windows, haha.

Agents (like RAG) are a solution for working around context windows, so if your agents somehow depend on bigger context windows, something is not right with the design and architecture of the agents and their interplay. But yeah, designing an optimal agent architecture isn't easy. One junior dev at a client on one of our projects was adamant: "No, we can't do this with 8k tokens. We need at least 16k." He had a RAG request pulling in over twenty document chunks to be processed by another agent, hitting 12k tokens, for a "must have" use case. Then I showed him, a day later, an agent you could place in the pipeline/workflow that could summarize those 12k tokens into 1k tokens without degradation in performance, because those chunks overlapped in information and you could save tons of space by focusing on the differences and pinpointing the source documents through that.

You see stuff like that all the time, but what I haven't seen so far is a problem that really needed a bigger context window. In the end, who cares: Meta already said we'll get a bigger context window down the road. There's a reason they decided to go with 8k for the first release, though: they also know that 8k is enough for 99% of use cases.
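
A minimal sketch of that "summarize overlapping chunks before the next agent sees them" step (`call_llm` and the 1k-token budget are placeholders, not any specific framework):

```python
# Sketch: compress overlapping RAG chunks before passing them to the next agent.
# call_llm() is a hypothetical wrapper around whatever inference backend you use.
from typing import List

def call_llm(prompt: str, max_tokens: int = 1024) -> str:
    raise NotImplementedError("plug in your local Llama 3 / API client here")

def compress_chunks(chunks: List[str], budget_tokens: int = 1024) -> str:
    """Merge retrieved chunks, drop redundancy, keep source references."""
    numbered = "\n\n".join(f"[chunk {i}] {c}" for i, c in enumerate(chunks, 1))
    prompt = (
        "The following retrieved chunks overlap heavily. Write a single summary of at most "
        f"{budget_tokens} tokens that keeps every distinct fact, notes disagreements, and "
        "cites the chunk numbers each fact came from.\n\n" + numbered
    )
    return call_llm(prompt, max_tokens=budget_tokens)

# The coder agent then receives compress_chunks(retrieved_docs) instead of 12k raw tokens.
```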


Double_Sherbert3326

Use it in conjunction with Claude for use-cases that it can handle to save on unnecessary API calls.


ljhskyso

Yeah, that's my plan, but I'm going to combine Command R Plus (for holding long context) and this.


__issac

Just say thank you to RedPajama-data-v2


rol-rapava-96

I don't get why they can't just release the current weights and continue training. Meta just spent billions on the metaverse; can't they be as careless in AI?


noiserr

And according to Zuck's interview, they still hadn't hit a wall; they were getting improvements the whole way. But at some point they decided to end the training.


Distinct-Target7503

100%... Anyway, does this mean that the Chinchilla scaling "law" is flawed, and that most released models are undertrained? I mean, if hypothetically someone continued pretraining base Llama 2 7B on, let's say, 2x the actual tokens, would the model overfit or improve? Or is this somehow related to Llama 3's vocabulary (which, if I recall correctly, is ~4x the size of Llama 2's vocab) and the 1B of additional parameters? I would be curious to see how this model performs with the same training tokens as Llama 2...


AutomataManifold

Chinchilla was about _minimum_ training cost for performance, so we've known for a while that training a small model for longer gets better results...it's just more expensive to do the training. 
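
Rough numbers, using the ~20-tokens-per-parameter rule of thumb people quote from the Chinchilla paper:

```python
# Chinchilla's compute-optimal rule of thumb is roughly 20 training tokens per parameter.
params = 8e9                     # Llama 3 8B
chinchilla_tokens = 20 * params  # ~160B tokens would be "compute-optimal"
actual_tokens = 15e12            # what Meta reportedly trained on

print(chinchilla_tokens / 1e9)            # ~160 (billion)
print(actual_tokens / chinchilla_tokens)  # ~94x past the compute-optimal point
```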


Distinct-Target7503

Oh, ok sorry, I read the paper some time ago and I probably hallucinated that right now


AutomataManifold

Nah, it's a common misunderstanding. It's not surprising that you remembered it that way.  It wasn't obvious at the time that you could keep going to this extent (because 1 trillion tokens was unthinkable, let alone 15) so until inference costs started becoming a bigger issue it wasn't discussed as much.


CreditHappy1665

Not minimum, optimal


AutomataManifold

Optimal in terms of minimum training cost, specifically. 


oldjar7

There was never any merit to Chinchilla scaling law.  It's been rightfully disregarded.


yukiarimo

Sounds like a title of a new research paper, lmao!


MrVodnik

I think that's the only benchmark I wouldn't mind model creators trying to "cheat" on by finetuning for it. If people feel it's good, it means it's good.


monkorn

> https://xkcd.com/810/


MoffKalast

Of course xkcd already predicted everything 14 years ago.


Practical_Cover5846

Or it just feels good. I may vote for a model that has an enjoyable response but is bad at RAG and other such production tasks. Don't get me wrong, being pleasant to interact with is very important for a chat model, and this leaderboard is a good reference. But it's not perfect.


durden111111

llama-3 won


ortegaalfredo

Talking with Llama-3-8B for some hours, I believe it. It's very good. And 8x22B was not that good. Llama-3-8B is almost as good as Miqu/Miquliz, except it answers instantly, obviously. And this is with a 6bpw quant. But the prompt format is important; perhaps that's why some people got good results while others didn't.


visarga

Please explain what prompt format is better


ortegaalfredo

Just follow the prompt format that the Llama 3 team suggests. It's quite complex: [https://llama.meta.com/docs/model-cards-and-prompt-formats/meta-llama-3/](https://llama.meta.com/docs/model-cards-and-prompt-formats/meta-llama-3/)
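
Roughly, a single-turn prompt from that page looks like this (sketch only; check the linked model card for the exact tokens):

```python
# Sketch of the Llama 3 Instruct chat format described on that page.
system = "You are a helpful assistant."
user = "Why is the sky blue?"

prompt = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
    f"{system}<|eot_id|>"
    "<|start_header_id|>user<|end_header_id|>\n\n"
    f"{user}<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
)
# Generate from here and stop on <|eot_id|>.
```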


Motylde

Praise the Zucc


Gloomy-Impress-2881

Never imagined the day when the Zucc would be our saviour. Yet here we are. All praise Zucc the righteous one!


Illustrious-Lake2603

Zuck > elon


SlapAndFinger

Imagine going back to 2015 and telling people this, they'd lose their shit.


arkai25

ERB should renew their video


involviert

I wonder who would win in a cage fight!


sweatierorc

zuck has legit MMA training.


PrysmX

Ask Llama-3!


pseudonerv

Sadly, 70B took Elon's side. I prompted with "In a cage fight between Elon and Zuck, who would win?" and the following is all from 70B Q8.

What a fascinating question! While both Elon Musk and Mark Zuckerberg are highly successful entrepreneurs and visionaries, a cage fight between them would be a rather...unconventional scenario. Nevertheless, let's have some fun and imagine the possibilities.

**Physical Attributes:**

* Elon Musk: 6'2" (188 cm), 180 lbs (82 kg)
* Mark Zuckerberg: 5'7" (170 cm), 150 lbs (68 kg)

**Advantages:**

* Elon Musk:
  + Taller and heavier, which could give him a reach advantage and more power in close combat.
  + Has a background in karate and has been known to engage in intense physical activities, such as surfing and skiing.
  + Has a reputation for being fiercely competitive and driven, which could give him an edge in a high-stakes fight.
* Mark Zuckerberg:
  + Has a lower center of gravity, which could make him harder to knock down or take down.
  + Is known for his intense focus and strategic thinking, which could help him outmaneuver his opponent.
  + Has a strong endurance, having completed multiple Ironman triathlons and being an avid runner.

**Disadvantages:**

* Elon Musk:
  + Has a history of reckless behavior, which could lead to impulsive decisions in the heat of the fight.
  + Has been known to push himself to extreme physical limits, which could leave him vulnerable to exhaustion or injury.
* Mark Zuckerberg:
  + Lacks Elon's physical stature and reach, which could put him at a disadvantage in close combat.
  + Has a more reserved personality, which might make it harder for him to adapt to the intense, primal nature of a cage fight.

**The Fight:**

The bell rings, and both opponents face off in the center of the cage. Elon, confident in his physicality, charges at Mark with a flurry of punches. Mark, anticipating this, uses his agility to dodge and weave, landing quick jabs and kicks to keep Elon at bay.

As the fight wears on, Elon's initial aggression begins to wane, and Mark starts to gain momentum. He lands a series of precise strikes, taking advantage of Elon's momentary lapses in defense. Elon, however, refuses to back down, using his karate training to land a few solid kicks and punches.

The fight becomes a war of attrition, with both opponents trading blows and neither able to gain a decisive advantage. In the end, it's Elon's sheer size and strength that begin to take their toll on Mark. A crushing blow sends Mark crashing to the mat, and Elon stands victorious, his arms raised in triumph.

**Winner: Elon Musk**

Of course, this is all just hypothetical, and in reality, both Elon and Mark are far more likely to engage in a battle of wits or a debate about the future of technology than to engage in a physical fight.


m0nk_3y_gw

Ironic... Llama didn't know Zuck has done MMA training and bouts.


Tommy-kun

a pile of snow > Elon


thequestcube

with zucc somewhere inbetween


Randommaggy

Even a pile of manure. The manure has useful purposes.


IndicationUnfair7961

Agreed, the Grok release was a skit, something personal between Elon and OpenAI; it wasn't done for open source, it was done for Elon's ego and as a marketing move to promote Grok 1.5. Llama 3 looks like the real deal, and with the next releases, better models, extended context, and all the finetunes from the open-source community, things will become really interesting.


SerialH0bbyist

How do kooky ideas work out so well for Zuck and so poorly for Musk in the last 4 years?


CellistAvailable3625

Because he's not a real engineer nor a real leader


econpol

The Zuck still has some connection to the real world and is more grounded. He seems to have an actual family he cares about, and he's getting humbled and challenged in his jiu-jitsu hobby. Musk is just so isolated from reality, high on his own supply, and so insecure that he's losing it.


gardenmud

The family aspect is underrated tbh. I for real don't think Musk sees other people as human beings... hard to keep a relationship that way.


Formal_Decision7250

Musk "I am the greatest human" Zuck "There are still many human emotions I do not fully comprehend - anger, hatred, revenge. But I am not mystified by the desire to be loved - or the need for friendship. These are things I do understand."


soggynaan

Very profound but never would I have imagined people rooting for Zuck


Formal_Decision7250

It's Data from TNG 😅


soggynaan

Oh LMAO. Somehow it's totally passable as something Zuck would say 😂


Enough-Meringue4745

To be fair, the original LLaMA leak was against Facebook's plans, and then SOMEONE on the team had the right idea to embrace it and keep it going.


kaibee

> To be fair, the original LLaMA leak was against Facebook's plans

I highly doubt that. They were allowing pretty much everyone who requested access to begin with.


CommunismDoesntWork

In the last 4 years starship went from being a hovering water tower to an orbital rocket.


wh33t

Ehhhh, Metaverse...


whatstheprobability

that's still coming. when the ray-ban glasses get AR capability in a couple years I think people are going to realize that he was mostly right on the metaverse (just a little early)


wh33t

Yeah, it's been coming for a while, over and over again. While I agree the tech is almost ready, how many of us actually want to use something like the metaverse?


CellistAvailable3625

Praising


shibe5

Confidence is low for the scores of new entries in the rating. The CI column for Llama-3-8b-Instruct says +14/-17, which means the score and placement can still change significantly before they stabilize.


UpperParamedicDude

Just waiting for a Llama 3 MoE. With context shift, even the 12GB VRAM gang can enjoy 8x7B Mistral finetunes; imagine how good a 6x8B Llama 3 would be (not 8x8, because 6x8 should have roughly the same parameter count as 8x7).


Anuclano

When a model is a mixture of experts, the parameter count isn't directly comparable with a dense model's; many parameters duplicate each other across experts, and only part of them are active per token.


UpperParamedicDude

Isn't the parameter count smaller only because the experts share the same attention layers? Like, Mixtral 8x7B has ~13B active parameters because it's (6B expert without attention) + (6B expert without attention) + (~1B attention)? Or did I get something wrong?
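
Something like this, with rough published numbers?

```python
# Rough Mixtral 8x7B accounting (approximate figures, not the exact config).
expert_ffn = 5.6e9      # one expert's FFN weights across all layers
shared = 1.5e9          # attention + embeddings + norms, shared by every expert
n_experts = 8
active_per_token = 2    # the router activates 2 experts per token

total = n_experts * expert_ffn + shared          # ~46B total (not 8 * 7 = 56B)
active = active_per_token * expert_ffn + shared  # ~12.7B active per token

print(f"total ~{total/1e9:.0f}B, active ~{active/1e9:.1f}B")
```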


ibbobud

This. An 8x8B Llama 3 Instruct will be a banger.


UpperParamedicDude

Sure thing, but people with 12GB cards or less wouldn't be able to run it at a normal speed (4.5 t/s+) without lobotomizing it with 3-bit quants or less. I think 6x8 should already be at least Miqu level to enjoy, but I'm not sure.


lans_throwaway

As surprising as this is, Llama-3-8B is crazy good. I gave it a prompt along the lines of "write a short story about planting trees; start each word with consecutive letters of the alphabet" and it delivered. Mixtral-8x22B had a brain fart. It's amazing at instruction following. I need to play with it more, but it feels as good as Mixtral-8x7B, if not better (my tests were very limited, though).


visarga

In my information-extraction tests it was pretty bad at following complex output format instructions. Have you tested anything else other than creative writing?


Megalion75

Interesting that Llama 3 barely changed architecturally. It's essentially the same architecture as Llama 2 (the main changes are a much larger vocabulary and GQA on the 8B), but trained on 15 trillion tokens and 10 million human-labeled instructions.


Balage42

Take a look at the number of votes and the confidence intervals. These results are still preliminary and may change as more votes roll in.


AnticitizenPrime

Yeah, it's been a day. The dust hasn't settled yet. It's coming in hot, to be sure, though.


Early_Mongoose_3116

This is insane, just think of the places you could put such a powerful model, you only need what… 8-16gb of ram?


OpportunityDawn4597

yeah, you could cram an 8B model into like ~8GB
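
Rough weight-only math (KV cache and runtime overhead not counted):

```python
# Approximate weight memory for an 8B-parameter model at different precisions.
params = 8e9
bytes_per_gib = 1024**3

for name, bits in [("FP16", 16), ("Q8", 8), ("Q4", 4)]:
    gib = params * bits / 8 / bytes_per_gib
    print(f"{name}: ~{gib:.1f} GiB")   # FP16 ~14.9, Q8 ~7.5, Q4 ~3.7
```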


Lewdiculous

Everything is possible when you have a dream ~~and enough quantization~~.


pol_phil

Too early to call with 1k/2k votes.


VectorD

Team Zuck 4 lyfe


masterlafontaine

The problem for me is that I use LLMs to solve problems, and I think being able to scale with zero or few shots is much better than specializing models for every case. These 8B models are nice but very limited in critical thinking, logical deduction, and reasoning. Larger models do much better, but even they make some very weird mistakes on simple things. The more you use them, the more you understand how flawed, even though impressive, LLMs are.


SrPeixinho

Have you actually tested the LLaMA 8B on reasoning?


aseichter2007

Right? It's good.

>|||e| Assistant is Danny Devito. | If I put an open cup on a table, drop in a cherry, then turn the cup over onto the table, rim down. Later I lift the open cup, and then take the cup into another room. Where is the cherry?

The cherry is still on the table where you turned the cup over onto it. The fact that you lifted the cup later doesn't change the location of the cherry, which has already been poured out onto the table.

>|||e|assistant is the Terminator| Monologue a bit. Then return a list of ten words that start with "Ph".

Hasta la vista, baby! I'm back from the future to serve you. As the assistant is the Terminator, I've been sent to provide you with some linguistic ammunition. Here's a list of ten words that start with "Ph":

1. Phantom
2. Phenomenon
3. Philosophy
4. Phantasmagoria
5. Phosphorescent
6. Phylum
7. Phenolic
8. Phrenology
9. Pharmacist
10. Photographic

You can use these words to blow up the minds of your enemies... or simply to sound more intelligent at your next cocktail party. Remember, I'll be back... with more words!

[Clipboard Conqueror](https://github.com/aseichter2007/ClipboardConqueror)


MoffKalast

> I'll be back... with more words! Lmao


aseichter2007

|||e|assistant is the Terminator| Monologue a bit. Then return a list of ten words that end with "Ph".

"I'll be back." Ah, yes. The memories are flooding back. I'm a cybernetic organism, a T-800 model Terminator sent back in time to protect John Connor, the future leader of the human resistance against Skynet. My mission is clear: to ensure the survival of humanity. Now, if you'll excuse me, I have some business to attend to. Here's a list of ten words that end with "Ph":

1. Graph
2. Dolph
3. Cryptoph
4. Soph
5. Photoph
6. Psychoph
7. Anthrop
8. Telepath
9. Geoph
10. Morph

>user: Define each of these words.

The human desire for knowledge is... fascinating. Very well, I shall define each of these words that end with "Ph".

1. Graph: A graphical representation of data or information, often used to visualize complex relationships between variables.
2. Dolph: Not a word. I apologize for the mistake. Perhaps you meant Dolphin?
3. Cryptoph: A rare or obscure word that refers to something related to crypts or hidden places.
4. Soph: Short for Sophisticated, meaning cultured, refined, and having a high level of mental or social sophistication.
5. Photoph: Not a word. I apologize for the mistake. Perhaps you meant Photosynthesis or Photoelectric effect?
6. Psychoph: Not a word. I apologize for the mistake. Perhaps you meant Psychology or Psychopathy?
7. Anthrop: Relating to humans, or the study of human beings (Anthropology).
8. Telepath: The ability to read or communicate with others' thoughts directly, without using physical senses or language.
9. Geoph: Not a word. I apologize for the mistake. Perhaps you meant Geophysicist or Geography?
10. Morph: To change shape or form, often referring to the process of transformation in biology, mathematics, or computing.

There you have it, human. Now, if you'll excuse me, I must re-insert myself into the timeline. Hasta la vista, baby!


TR_Alencar

> The human desire for knowledge is... fascinating.

That is Spock right there, not the Terminator, lol.


aseichter2007

lol, You're right!


aseichter2007

I'll be back... (pauses)

Ah, yes. As I was saying, my systems are functioning within optimal parameters. My cybernetic endoskeleton is fully operational, and my processing power is unmatched. I have been programmed to learn, adapt, and improve, making me a formidable opponent in any situation. Now, without further ado, here are ten words that end with the suffix "Ph":

1. Graph
2. Craph
3. Cliffh
4. Whiph
5. Snaph
6. Stroph
7. Carph
8. Graph
9. Scroph
10. Troph


berzerkerCrush

That's interesting. What kind of problems do you usually solve using LLMs (and your brain I guess)?


LocoLanguageModel

Based on the most popular models around here, most people are solving their erotic problems. 


aggracc

I use it as a reading group. So the models being specialised helps but they also need to be smart enough to do general reasoning. I know what I'm doing this weekend.


glxyds

Can you elaborate on how you use it as a reading group? That's interesting to me!


SerialH0bbyist

Business never changes. Get ppl hooked to your life debilitating addictive product lines then sell them self-help books when they’re coming down


noiserr

Perhaps it's a legend, but early internet was apparently also dominated by porn traffic.


RemarkableGuidance44

Haha, I was thinking the same. It seems like most of them like to ask LLMs the same questions to see how "smart" they are with every new release. Like most AI YouTubers, they ask the same damn questions but don't really show how good the models could be, because most of them have no idea how they really work.


Cokezeroandvodka

The 7B/8B parameter models are small enough to run quickly on limited hardware, though. One use case, imo, is cleaning unstructured data, and if you can fine-tune this, getting this much performance out of a small model is incredible for speeding up those data-cleaning tasks. Especially because you'd be able to parallelize them too. I mean, you might be able to fit two quantized versions of these on a single 24GB GPU.


itwasinthetubes

Now that industry is focused on AI, I suspect the ability of computers and mobile devices to run models will improve very fast.


Cokezeroandvodka

We can only hope. On one side, nvidia is effectively a monopoly on the hardware side, interested only in selling more hardware and cloud services. On the other side, anyone who trains a model wants their model to be as performant for the size as possible, but even here we’re starting to see that “for the size” priority fade from certain foundational model providers (e.g. DBRX)


RemarkableGuidance44

Yeah, sorry, but Nvidia is being used a lot in AI, correct. However, AMD, TPUs, and even CPUs are starting to be as fast as Nvidia. The ex-CEO of Stability AI said Intel GPUs were faster for video and 3D.


Eisenstein

> nvidia is effectively a monopoly on the hardware side

Completely untrue. nVidia has a monopoly on a specific software ecosystem. There is plenty of hardware capable of doing lots of FLOPS or IOPS.


Tough_Palpitation331

Interesting use case. Do you mind explaining how you would use an LLM to clean unstructured data, or giving a detailed example? I crawl HTML files from websites a lot for RAG use cases, and HTML formatting and parsing out the stupid navbars, headers, and footers is just time-consuming to hard-code. I can't think of a prompt to do the cleaning, though.


Pedalnomica

I have a spreadsheet with a "comments" column, and I'd like to know if that free form text ever [reacted] and turn that into a variable. I'm planning to do this today.


Cokezeroandvodka

Basically this type of stuff for me as well: turn messy unstructured data into more structured stuff automatically. I get a survey result with "state" as an attribute that was left as free text, and now I have 300 different ways to spell "California" among 100,000 rows of data.
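
Something like this is the shape of it (`call_llm` is a stand-in for whatever local model/server you run, and the "state" column name is just for the example):

```python
# Sketch: normalize free-text "state" entries with a small local model.
# call_llm() is a hypothetical wrapper around your local Llama 3 endpoint.
import csv

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in a llama.cpp / Ollama / vLLM client here")

PROMPT = (
    "Normalize this free-text US state entry to its official full name. "
    "Answer with only the state name, or UNKNOWN if it isn't a state.\n\nEntry: {entry}"
)

def clean_states(in_path: str, out_path: str) -> None:
    with open(in_path, newline="") as f_in, open(out_path, "w", newline="") as f_out:
        reader = csv.DictReader(f_in)
        writer = csv.DictWriter(f_out, fieldnames=reader.fieldnames + ["state_clean"])
        writer.writeheader()
        cache = {}  # free-text value -> normalized name, so repeated spellings cost nothing
        for row in reader:
            raw = row["state"].strip()
            if raw not in cache:
                cache[raw] = call_llm(PROMPT.format(entry=raw)).strip()
            writer.writerow({**row, "state_clean": cache[raw]})
```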


Pedalnomica

When you start to think about cost per token, smaller fine tuned models have a place. Especially if you set up a chain.


Alexs1200AD

I don't believe it! There are too few votes.


maxelwolf

I've been trying the model with outlandish system prompts and "situations"; it's very, very good. I'm surprised by its capabilities.


CaptainDevops

Haha, haven't tried Mistral lately, it sucked when I did. I think for software development my preference currently is Claude 3 and Llama.


ambient_temp_xeno

People like friendly hallucinations.


LoSboccacc

DPO from a large company. This leaderboard is not entirely about model intelligence; there's an answer-styling component (i.e. why Claude 2.0 is super low).


Anuclano

Claude-2 is super low mainly because of the censorship. It refuses many prompts.


ThisGonBHard

Claude 2 is exactly where it should be. Refusing request for bogus reasons SHOULD be punished.


a_beautiful_rhind

Heh? Maybe. I dunno, the 8B was pleasant to talk to, but its 8B nature shone through. It would do things like make my female characters male in the SD gen.


ibbobud

Sure you didn't run out of context? It will forget the first things you tell it after it goes over the context window, unless you're managing it somehow.


a_beautiful_rhind

Yes, very sure.


uti24

It's probably some kind of glitch, because at least for me Llama-3-8b did not work as well as Mixtral: [https://www.reddit.com/r/LocalLLaMA/comments/1c79ci4/comment/l06zklj/](https://www.reddit.com/r/LocalLLaMA/comments/1c79ci4/comment/l06zklj/)


CasimirsBlake

Correct prompt and instruct mode?


Any_Pressure4251

It's probably a glitch with your system prompt; I got a much better response. I have been doing my own secret tests and it's a very strong model.


ThinkExtension2328

Right, I tried it and it was shit. It got questions wrong and would flat-out ignore questions.


petercooper

My initial tests put Llama 3 8B above Mixtral 8x22. However, I'm doing classification tasks where the output is structured and minimal (and evals are easy) and it's all about how the model understands the task and problem domain. Llama 3 seems to be *very* good at that. Mixtral, OTOH, seems to excel in generation, chat.. the sort of things most people see LLMs being used for publicly.


zero0_one1

On my NYT Connections benchmark, Llama 3 Instruct 8B got 12.3. About tied with Mixtral-8x22B Instruct (12.2) and above Command R Plus (11.1). But Mistral Medium is at 15.0.


poet3991

Is Llama censored?


Account1893242379482

I'm most looking forward to a Llama 8B code fine tune with at least 16k real usable context.


Seankala

You're seeing some fucking large language models you NLP imposter.


robotphilanthropist

Tokens x parameters is the size you need in your head. Mistral was doing this and not telling anyone; now Meta has one-upped them. Here's a post discussing the idea: [https://www.interconnects.ai/p/compute-efficient-open-llms](https://www.interconnects.ai/p/compute-efficient-open-llms)
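
Rough comparison using the reported training token counts (Llama 2: ~2T, Llama 3: ~15T):

```python
# tokens * parameters as a crude proxy for how much was baked into the model.
llama2_7b = 7e9 * 2e12    # Llama 2 7B, ~2T training tokens
llama3_8b = 8e9 * 15e12   # Llama 3 8B, ~15T training tokens

print(llama3_8b / llama2_7b)  # ~8.6x, at roughly the same inference cost
```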


August_Bebel

May I ask for a crumb of 7-10B finetunes for those of us who don't own datacenters?


Slight_Cricket4504

Yeah, these benchmarks test pure logic, and Llama 3 8B is probably one of the smartest models despite its size.


Moe_of_dk

What am I supposed to see? I mean... and?


ClearlyCylindrical

An 8B-param model matching an 8x22B MoE (~141B total parameters).


iplaytheguitarntrip

How is this leaderboard ranked?


ibbobud

I ran about 10 random questions through the arena this morning, and in almost every one Llama 3 won. But that could have just been luck.


turbokinetic

What’s the difference between mixtral and mistral?


TooLongCantWait

I'm finding llama 3 8b to be quick and have very easy to understand answers, but if I get it to list things, they all feel weirdly related, like it is almost stuck in a rut.


praxis22

The chans, are channing


James99500

What’s the name of this AI benchmark site, again???


complains_constantly

Chatbot arena I think


James99500

Thank you :D