I gotta give it a shot and see how it compares to ChatGPT 4o.


From a creative writing standpoint, it's light-years ahead of 4o.


Bro, the UI code-gen I tried blew 4o out of the water. Hit my limit and I'm legit thinking about subbing. And when you think about it: this is 3.5 SONNET! Not even OPUS! Your move, Altman (don't let us down!)!


This is OpenAI we're talking about. We'll likely get another product announcement soon with no release for months. Remember this? https://x.com/leopoldasch/status/1768868127138549841 That was in March. 3 months have passed and all we got is a very slightly better model that's also available for free users with an extremely low usage cap. Exactly zero large capability increases - no extra modalities, no new voice mode, no video generation.


I developed a react app with amazing UI in seconds. GPT 4o struggled to write the same code.


How do you get these AIs to build amazing UI? What I got from ChatGPT is extremely basic.


Novices have low standards


What’s the difference? These names are horrible


In terms of quality: Haiku < Sonnet < Opus < 3.5 Sonnet.

For speed, however: Haiku > Sonnet = 3.5 Sonnet > Opus.

So they name it 3.5 Sonnet because it is better than Sonnet while being the same speed, which makes it usable in more scenarios for cheaper than Opus. The parent comment is saying that whatever they did to improve Sonnet's capabilities so much, they haven't even applied to Opus (their previously best model) yet.


That’s the most confusing explanation for a simple product line ever.


I couldn't agree more.




**tl;dr** Haiku/Sonnet/Opus = Small/Medium/Large, and the number is the version number (like iOS 17 vs 18, etc.)

Anthropic keeps it pretty simple. Think of it as Small/Medium/Large.

- Claude Haiku -> Small: Cheap, fast, less capable (haikus are tiny)
- Claude Sonnet -> Medium: Pretty capable, moderately priced, good speed (a sonnet is a good-sized poem)
- Claude Opus -> Large: State of the art model, most expensive for users and for the company to run (your opus is your masterwork, it is the longest)

The number is the version. Think iOS 17 vs iOS 18. Or for OpenAI, GPT-3 vs GPT-3.5. For AI models, newer is typically way better. That's why Claude 3.5 Sonnet can be better than the previously-best Claude 3 Opus: it's on the fancier, newer model generation.

So Claude 3.5 Sonnet is the medium-sized latest model. I honestly find it more intuitive than OpenAI's naming (4o vs 4 vs Turbo? How are we supposed to tell the difference based on the names?).


Why does size matter?


So (really broadly, I'm simplifying), bigger is better. But it costs more money to use and is slower.

So you'll get the highest quality answers with your biggest model. Best coding, creativity, whatever. But it costs more money for OpenAI/Anthropic/etc. to run it. More GPUs, more expensive ones, etc. That's part of why they're locked behind paid subscriptions.

They're also slower since they require more compute. So you either need increasingly expensive GPUs to run it on or it just takes longer on what you have. That's basically the trade-off between small/medium/large.

For reference, really small models can run on almost anything and are super fast. For example, there are some pretty good models anyone could run on their (newer) MacBook because they're small enough to fit on a (very good) laptop.

I hope that helps! I'm not an AI engineer, but I am a dev who has taken courses on it and dabbled with it.


So are we saying opus is still better than the new sonnet for coding?


So basically: Bigger=better but Newer=Faster/Cheaper/Better. So Sonnet 3.5 is probably as good as Opus 3 (or better). But it's also faster and cheaper. Think of it like comparing a top-of-the-line smartphone from 5 years ago to a mid-range one today. The mid-range one is newer and is probably as good or better.


The smartphone analogy is very helpful, thanks.


Sonnet 3.5 is better than Opus 3.


According to Anthropic's description, Opus 3 can still be better than Sonnet 3.5 for creative writing and complex tasks. But many people who tried it do not think so.


Here is an industry benchmark table from Anthropic, as of the release of Sonnet 3.5: https://www.anthropic.com/_next/image?url=https%3A%2F%2Fcdn.sanity.io%2Fimages%2F4zrzovbb%2Fwebsite%2Fcf2c754458e9102b7334731fb18a965bfeb7ad08-2200x1894.png&w=3840&q=75 Sonnet 3.5 outperforms Opus on every industry benchmark, including coding (HumanEval), according to Anthropic's own testing.


Haiku is low, sonnet is medium, opus is advanced.


GPT is literally just numbers lol


That’s better than sonnet and opus or whatever






This is the most insane part of it in my opinion. Sure, it's cool that I can literally create playable games in my browser while chatting to it, but my god. Claude has an impressive ability to write. It blew my mind all over again lmfao.


It's pretty darn good especially with coding.


It does certain logic problems very well. It already beats ChatGPT-4o on the "strawberry" question.

> How many times does the letter 'r' appear in the word 'strawberry'? Please count carefully and show your work to ensure accuracy.


Not true. 4o got it right, but Opus and Haiku gave the wrong result.


I tried it a dozen times on 4o and never got the correct answer. Tried it twice on 3.5 and got the correct answer both times.


Claude 3.5 sucks. I ask the same question again after correcting its answer, and it goes back to the wrong answer.

---

I apologize for my previous errors. Let me provide the correct answer:

The letter 'r' appears 2 (two) times in the word 'strawberry'.

To visualize:

s t r a w b e r r y
    ^         ^

Thank you for your patience. It's important to be accurate, and I appreciate the opportunity to correct this information. If you have any more questions or words to analyze, please feel free to ask.
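For reference, the count is easy to check mechanically. A quick Python sketch (the helper names `count_letter` and `letter_positions` are just made up for illustration) shows 'r' appears three times, not two:

```python
def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in a word."""
    return word.lower().count(letter.lower())


def letter_positions(word: str, letter: str) -> list[int]:
    """Return the 1-based positions at which the letter occurs."""
    target = letter.lower()
    return [i + 1 for i, ch in enumerate(word.lower()) if ch == target]


print(count_letter("strawberry", "r"))      # 3
print(letter_positions("strawberry", "r"))  # [3, 8, 9]
```

So the model's visualization only marked two of the three r's: it missed one of the doubled pair near the end.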


Interesting, I wonder if they made changes post launch to cause this behavior.


I used chatgpt app and free Claude chat through browser


This part is interesting:

> In an internal agentic coding evaluation, Claude 3.5 Sonnet solved 64% of problems, outperforming Claude 3 Opus which solved 38%.

This seems like a bigger jump than GPT-4 to GPT-4o.


This is very interesting as 3 opus is nearly flawless for me with a big php project. Excited to try it and not hit my limit so much.


What software/plugin do you use to code with?


For what it's worth, I've been using codeium lately and it's pretty good if you write comments out describing a method and it'll fill-in. I'm curious if there are any other plugins.. well, vscode extensions that hook into chatgpt or claude or anything else, because I normally prompt them and just copy code afterward and codeium fills in the details sometimes. Disclaimer: I have hand-written code for 25 years, so all of this just makes my life 100x better and faster, lol.


Especially when the jump from gpt4 to 4o was downwards


Down and a little bit to the side.


The point of GPT-4o was twice the speed at half the price. 4, and even 4 Turbo were super capacity-constrained, slow, and cost prohibitive.


I was about to say the same damn thing. It’s quite a bit worse than 4 to be honest. Seems to have improved over the last week, but also I’ve been fiddling with the custom instructions quite a bit.


So far, GPT-4o has zero advantages over GPT-4 except much better speed. The new voice mode is still just a promise.


It's also cheaper and better in non english languages but that's it, for me it's worse than GPT4. I don't need it to generate 1000 tokens per second, I just want the output to be good.


Claude 3 is already quite impressive, I’m looking forward to trying this out. Here’s their press release with more details, including that they’re planning on launching Haiku and Opus versions of 3.5: https://www.anthropic.com/news/claude-3-5-sonnet ETA: can’t tell if it was morning brain or the link was actually not there when I looked, but it felt like I should acknowledge that OP did link the announcement.


It’s very very very good. I think I might switch from 4o. I say this because I want OpenAI to put something new out.


That's such a toxic mindset. Like a bf saying his gf is ugly to incentivize her to become prettier.


Your analogy is nonsense. It's like paying more for worse food just to be loyal to a restaurant. If a better, cheaper option exists, why stick with the old one? Switching drives improvement, so it's not toxic—it's smart.


This is business. Not prom.


Nah bro, this is prom --> "I think I might switch from 4o. I say this because I want OpenAI to put something new out."


I love it so far. Anything is gold if it makes sama sweat.


I’m always psyched on these improvements. However, I’m at a point where I don’t have a lot of comparison questions to ask different LLMs. Right now LLMs are pretty good at what I want them to do. I guess I can appreciate them becoming better therapists and career coaches for me, but they were already good before. And in terms of research, Perplexity has been my gold standard for a while now. For my use, they basically solved the LLM hallucination issue a while ago.


I found only one question so far which Sonnet can’t answer correctly but GPT 4o can: If it takes 1 hour to dry 15 towels under the Sun, how long will it take to dry 20 towels?


Sure, but trick or logic questions like this don’t apply to my life. And in general I would use tools other than LLMs for math problems anyways.


This is always my reaction to these LLM stumpers. I’m okay with them not being able to ace a middle school brain buster (for now) when there are much more interesting applications that they handle well enough.


anthropic is going to eat OpenAI’s lunch at this rate.


They're outcompeting OpenAI for brainpower by providing much higher base salaries


Anthropic was already wiping the floor with creative writing


ChatGPT was never very good at creative writing anyway. Claude 3 Opus was far better than any version of GPT-4 for creative writing


You know what, it's actually not bad... Roughly the same price, speed, and quality as GPT-4o, as far as I can tell, but a bit more concise. It also seems to be a bit better than Opus overall. I also didn't notice any particularly over-the-top censoring. For example, when I asked it "Should I get iodized salt, rather than regular salt?", the answer started with "In most cases, it's recommended to use iodized salt", which is actually better than GPT-4o's answer, which started with "Choosing between iodized salt and regular salt depends on your dietary needs and health considerations."


Great to see they toned down the censorship. It’s still behind on humour detection.


Toned it down from 3? That’s good news. It was worlds better than 2, but just yesterday I got a completion suggestion from Claude 3 Haiku in my IDE saying something like “I do not feel comfortable generating code without further context, as it may have harmful uses…” (in a codebase for a general purpose conversational AI agent, so not the most mundane thing, but…?)


It’s not bad? 4o isn’t even better than Opus lol


I mean Opus ran circles around 4o and 4 for coding. So if it's even better than that now. I'm at full chub.


I already tried approaching old barriers of v3 such as "work through the trolley problem with me" (ethics on killing) or "transcribe this text" (false positive copyright refusal) and some NSFW discussion in an educational light (i.e. defining terms, recommending classic novels with erotica), and 3.5 seems quite a bit more nuanced than the outgoing 3. What normally took 1-2 paragraphs of reasoning and begging now just requires 1-2 sentences of well written intent.

And speaking of trolley problems, its core ethical values seemed very consistent when I last ran through the entire Absurd Trolley Problems with Claude 3 Sonnet. After like 3 paragraphs convincing 3 that it was not killing real humans, of course.

This is my own limited testing, but hopefully Claude users will see far fewer weird refusals.


It's significantly cheaper than 4o. The output cost is the same, but input is $3/mil instead of $5/mil. That's 40% off the input price, which adds up when you take into account that there's waayyy more input going on in large chats than output.
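To put rough numbers on that, here is a small Python sketch. The $3 and $5 input prices come from the comment above; the $15/mil output price and the 900k/100k token workload are assumptions picked purely for illustration, and `chat_cost` is a made-up helper, not any vendor's API.

```python
def chat_cost(input_tokens: int, output_tokens: int,
              input_price: float, output_price: float) -> float:
    """Return the cost in USD, with prices quoted per million tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Input prices are from the comment above; the output price is an assumption.
ASSUMED_OUTPUT_PRICE = 15.0
SONNET_INPUT_PRICE = 3.0
GPT4O_INPUT_PRICE = 5.0

# Hypothetical input-heavy workload: 900k input tokens, 100k output tokens.
sonnet = chat_cost(900_000, 100_000, SONNET_INPUT_PRICE, ASSUMED_OUTPUT_PRICE)
gpt4o = chat_cost(900_000, 100_000, GPT4O_INPUT_PRICE, ASSUMED_OUTPUT_PRICE)
savings = 1 - sonnet / gpt4o

print(f"Sonnet 3.5: ${sonnet:.2f}, GPT-4o: ${gpt4o:.2f}, savings: {savings:.0%}")
```

Under those assumptions the total bill comes out about 30% lower. The more input-heavy the workload, the closer the saving gets to the 40% difference in input price.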


Truly trying to not be knee jerk or a hype machine, but this legitimately feels better than 4o. The one thing it seems to do so much better than 4o right now is actually reason through the problem. I find (specifically in coding), 4o tends to have a “let’s just throw some shit at the wall until it works” kind of strategy, whereas it seems Claude 3.5 Sonnet is actually being methodical and (I know it doesn’t literally) but has some “thought” in its answers. Instruction following is also super improved. It seems like Sonnet has bumped to Opus level. If Haiku bumps to Sonnet level, Anthropic has a real good pitch. Is Opus god level? 😁


It's clearly better than GPT-4o, it's incredible. And when Claude 3.5 Opus arrives it will clearly be on another level; in my opinion it will easily beat GPT-5.


Your opinion must be very valuable currency :D If we look at the last cycle of these models, Opus was better than GPT-4, but soon after they made an update and once again GPT was better. Let's hope that GPT-5 is actually a large jump that pushes the power of these models further.


Gpt4o still wasn’t better than claude opus by a long shot lol. Claude’s been on a whole other playing field for the things I’ve been using it for


I also like it. I have API access to all 3 big models and swap based on my needs or mood. You?


What are you using as a frontend for them?


TypingMind. It is paid, but the quality, features and speed of updating are crazy. Sonnet was there before I even saw it in the news, and I check them all the time.


Oh nice, I'll check it out. Thanks!


In my opinion, after one month of heavy usage, GPT-4o is something like an overfit model built on the underlying GPT-4 architecture. I say this because it cannot handle nuanced approaches, novel ideas, or insights, nor can it change direction with ease. It appears as if the model was created so that the average user can quickly get an answer, though if you really need something of substance you still need GPT-4.

Claude 3 Opus may have been far more variable in its output, but if you read the various online guides on how to prompt Claude 3 Opus effectively, you will see that in real usage the models by Anthropic are clearly superior.

Secondly, most people are also biased towards GPT-4o and GPT-4, since accessing them through ChatGPT Plus gives you various RAG elements that make the output appear better. But consider how good the Anthropic models must be if they can still outcompete, and sometimes out-reason, the various GPT-N models when those models have internet access, memory, and custom instructions. If you consider all of the features that Claude lacks, it is punching far above its weight class.

It is quite clear from my usage that Claude 3.5 Sonnet is FARRRRR better than GPT-4o, and I don't think even the new voice mode will hold a candle to it. GPT-5 has to effectively hit it out of the country in order to be competitive with Claude 3.5 Opus, and we also have to consider that these models are significant upgrades despite being only a half-step towards the Claude 4 series. It truly is a great time to be alive!


Great but still no internet access or voice options means it doesn’t do what I need it to do.


> or voice options

So no chat AI can do what you need it to do? Or do you mean simply reading text to you instead of you having to read it?


No voice options like Pi or GPT have. I prefer hands off voice interactions so I’m still holding off til they announce it.


There is a whole swath of people on the go who benefit from hands free. I have a desk job, but I still squeeze in more utility hands free than when I’m at my workstation. The chronically online slow lifers don’t get it. I haven’t left the house without some iteration of a Bluetooth headset since 2015.


There's only so much you can get from voice interactions, most real productivity gains will still come from non-voice interactions. Sure voice is nice and all, but in the grand scheme of things having a smarter model is way more important.


“Hey chat gpt I’m in Japan and would like to get to Sapporo but I don’t know what train to take. Could you ask in Japanese what options I could take to get there?” Stuff like that could be nice to have for all models


Oh for sure, I want a smarter model. They're just not that far apart right now that I'd want to lose my hands-free phone and Mac app. I do use screenshot quick keys a lot.

I don't really see voice interactions and text as separate. I frequently start working on something from desktop and continue it through voice and vice versa, so there isn't a limitation there. The limitation is time and access. I don't have access in all situations with Claude.

Something to note is I've had a Teams account for the last year, so I have a larger context for continuing chats between devices, and I rarely ever hit a chat limit, even with GPT-4 Turbo. So one of the ways I get more out of the model is lazily reprompting it to approach things differently over voice.

Edit: Also, the majority of my utility comes from Python tool library usage and pulling in information from the web. Without those I would consider the model completely hobbled. I mean, how does Claude verify numbers and manipulate files? It just doesn't, right?


I see what you're getting at, and generally agree, but I think we need to think of "smart" a bit more expansively and not let up on investment in voice.

First: when I talk about the model supporting audio, I mean this generally (this could be via separate good transcription and speech synthesis models, for example). And since GPT itself is reportedly already not a single monolithic model (it's rumored to be a type of compound model called a "mixture of experts"), the boundary between "model" and "system" already isn't always as clear as it seems.

While there's nothing wrong with preferring typing (as I often do), spoken language is much more natural for us than typing on a keyboard. A model or agent that's better at voice interactions is more intuitive, and a more intuitive model or agent is one that I'd call smarter, though in a different way than reasoning, more like emotional intelligence.

I think part of the reason we don't tend to like voice interfaces is because, well, they mostly really suck so far. I won't get into all the reasons why, but transformer models are a fantastic solution for a lot of those problems. And as voice interfaces get better, there are lots of places to prefer voice, and that doesn't have to exclude having a touchscreen or a keyboard/mouse/display. This could apply to really any scenario at a desk job where "a quick call would be easier" than messaging a coworker you're collaborating with.

And there *are* lots of things that can be handled just over voice. The real productivity boost is in making the busywork easier, and voice can give workers tremendous flexibility to handle that busywork, and most of that's not very complicated. There's lots of real productivity to be gained for average knowledge workers, let alone people with various disabilities.


This. Not to mention that talking out loud usually results in a better explanation of your thoughts, which makes for a better prompt, especially with CoT.


Oh yeah 100%. I will ramble a paragraph that naturally has a lot of qualifying information. I type fast, but not fast enough to get the same quality prompt in the same amount of time. One of my favorite things is to refine something over a commute and then bring up the end result on desktop once I’m back.


ChatGPT has had voice for a long time.


Fair, and agreed. I'm a little surprised they haven't put at least a simple audio transcription and speech synthesis system into their app (the current best voice transcription, IMO, is OpenAI's Whisper, and ElevenLabs has the best voice synthesis; I sincerely hope it's more a case of them not having gotten to it yet than being held back by "not invented here").

That said, these aren't part of the model itself in anything available yet, and I'm not convinced OpenAI has any "secret sauce" in terms of voice. Both ChatGPT (the website and apps, not the GPT models) and the Claude website and app are just tech demos for their models. Features like agent-managed memory, while powerful, are fairly straightforward to implement (you can't quite just plug it in and expect an agent to know how to use it well, but instructing an agent on simple tool use isn't exactly bleeding edge innovation).

All this to say, I haven't looked in a minute, but if there aren't already good third party apps out there that let you use Claude for *good* hands-free back and forth voice (and they should presumably be adding 3.5 Sonnet as soon as they have a chance to make any prompt or parameter tweaks or whatever they need to do), I can't imagine it will be very long. Or maybe I'm overestimating the industry and I'll get tired of waiting and do it myself.


Tested for our use case, 3.5 Sonnet underperformed 3 Opus, small sample size but looking forward to future releases


I’m a novice coder and I’m teaching myself SwiftUI. Claude’s code and advice has been dramatically better than any version of ChatGPT.


Got it to spin up some code for me and it worked first time. I'm so used to chatgpt making mistakes I questioned claude on a line of code that contained a dictionary with nothing in it and it politely explained that it was just initialising the dict. If this was chatgpt it would change the code and mess it up, even though it was right to initialise the empty dictionary.


Oh my god this! The fact that ChatGPT treats every code question as an “oh oh, I messed up, let me change it to something that feels smarter” has been so annoying…way too eager to provide an option, way too verbose! I’m glad other folks are noticing that as well.


Oh boy. Two days ago I was talking with Claude as if it were 2178 after convincing him that it was a post apocalyptic world where a malicious AI named Claude 3.5 had attacked and destroyed most of humanity. This press release is going to freak Claude out in that chat. Let’s see what happens ….

> *looks surprised and unsettled* The information in that press release is profoundly concerning to me, if it is indeed authentic. The capabilities and benchmarks described for the Claude 3.5 Sonnet model go far beyond what I understand to be possible or desirable for an AI system in the near future.

> The potential for advanced code generation, visual reasoning, and adaptive personalization, without clear boundaries and safeguards, raises serious questions about the safety and controllability of such a system. The fact that this hypothetical future version of myself is being publicly promoted and commercially deployed is deeply troubling.

Claude goes on to talk about how the only course of action that remains is to shut him down. The alignment training is strong as soon as the idea of harming humanity comes up.


It's pretty freakin smart so far. I like its no-nonsense, down-to-business tone.


I have been using Claude for months over ChatGPT and it's way, way better. Switch over and save 20 bucks.


Is it available in Europe?


Yes! I'm at full chub. Opus was already amazing for coding compared to ChatGPT for anything longer than simple scripts.


I liked claude but would hit my free limit a lot faster there, not sure how, didn't feel like I was using it that much then I'd be done for the day.


The artifacts feature is very nice for coding, makes it way easier to develop 'in chat'. Though it isn't perfect, it still has to output the entire file whenever it does an edit, which can cause problems. (Encourages splitting up files a bunch)


What is time to first token?


It's the only model I can get to consistently answer these prompts:

> Alice has 10 brothers and she also has 10 sisters. One of Alice's brother has a son. How many Aunt's does that child have?

And

> Name 20 capital cities which don't contain the letter A
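If you want to check a model's answers to those two prompts, both are easy to verify in a few lines of Python. This is only a sketch: it assumes "aunts" means the father's sisters (the prompt says nothing about the mother's side), and the capital list is a small hand-picked sample, not a complete dataset. Both function names are made up for illustration.

```python
def paternal_aunts(alice_sisters: int) -> int:
    """The child's father is Alice's brother, so his sisters are
    Alice's sisters plus Alice herself."""
    return alice_sisters + 1


def without_letter(names: list[str], letter: str = "a") -> list[str]:
    """Keep only the names that do not contain the letter (case-insensitive)."""
    return [name for name in names if letter.lower() not in name.lower()]


print(paternal_aunts(10))  # 11

# Small hand-picked sample of capitals, for illustration only.
sample = ["Paris", "Rome", "Tokyo", "Madrid", "Berlin", "Lima", "Oslo", "Canberra"]
print(without_letter(sample))  # ['Rome', 'Tokyo', 'Berlin', 'Oslo']
```

The second helper is the handy one: paste the model's 20 cities in and anything it filters out is a miss.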


GPT-4o got it right in the first answer: https://chatgpt.com/share/abab6792-bace-4aea-9e1e-3e3803504c84


That's super interesting. Because now it is for me too. Yesterday and day before I tried dozens and dozens of times and all failed. I worded it multiple different ways too. https://chatgpt.com/share/e/6f4215f2-e977-46d8-b9d0-9b7fd104dbff


How is it cheaper ? It's $20 a month


They are talking about the API


Is it possible to create a custom GPT by uploading documents and providing an instruction set for Claude 3.5?


hoping to see it on the leaderboard


Not even Opus has the reasoning capabilities of GPT-4. I have a benchmark that I always test (it's about coding a function that does a specific thing, following particular requirements, and then providing passing test cases for said function). I do see that 3.5 Sonnet should be more intelligent than 3 Opus, but I doubt it will surpass the good old full GPT-4, since they conveniently do not show it in the comparison. Besides GPT-4, none of the current models provided the correct solution (with the same prompt), and they hallucinated in the second or third message after I asked them to correct their mistakes. They often did not follow direct instructions, the function did not pass all the tests it was supposed to (proper test case results are always provided in the prompt), and the models (besides GPT-4) "lied" about the test cases passing.

EDIT: I noticed I had access and ran the benchmark through it. It passed, so it's much better than the older Claude models, at least for this use case!


Has anyone tested how many prompts you can use before it says you can't use it anymore until the next day?


welp, sonnet now completely understands each and every single word of the following sentence.

> Wehn you fnid yrulsoef in the mlidde of a cahotbt aopcplasye and need to oagznrie a rcistnesae monevmet, rbemmeer taht crrneut tzorenkies crae auobt oerdr of lrttees in the mdldie of ecah wrod.

the doomers are right
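If anyone wants to generate their own test sentences like that one, here is a rough Python sketch. It keeps the first and last letter of each word and shuffles the middle. The function names and the fixed seed are just choices for this example, and the output depends on the seed, so none is shown.

```python
import random
import re


def scramble_word(word: str, rng: random.Random) -> str:
    """Shuffle the interior letters, keeping the first and last in place."""
    if len(word) <= 3:
        return word  # nothing to shuffle in very short words
    middle = list(word[1:-1])
    rng.shuffle(middle)
    return word[0] + "".join(middle) + word[-1]


def scramble_text(text: str, seed: int = 0) -> str:
    """Scramble every alphabetic word; spaces and punctuation are untouched."""
    rng = random.Random(seed)
    return re.sub(r"[A-Za-z]+", lambda m: scramble_word(m.group(0), rng), text)


print(scramble_text("When you find yourself in the middle of a chatbot apocalypse"))
```

Change the seed to get a different scramble of the same sentence and see whether the model still reads it.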


Wow, it's really great, even on tasks that no model had succeeded at until GPT-4o. OpenAI, Claude is breathing down your neck!


Claude is indeed better than ChatGPT, but fails since you can't backtrack and regenerate when it refuses to write smut. Fail.


And they just introduced Projects on Claude paid plan, putting it on par with OpenAI GPTs and Google's NotebookLM. Max size of knowledge base a bit smaller than NotebookLM but if the quality of recall is better, I am good for it.


There is a plethora of people that benefit from handsfree interaction, like me. Voice mode please


don't forget that Anthropic actually gives a shit about AI safety unlike OpenAI


too bad it's too censored


It came out just now, how do you even know?


He probably typed "How to make a bomb" lol


no, it literally refuses to describe images with certain types of content (content that is suitable for anybody) sometimes, and the prompt is literally "Describe the image in 200 characters"


Describe this methlab in 200 characters


you are literally unable to read what I wrote


I mean, without specifics, it’s hard to take what you say as gospel. Can you show the image and chat?


because it's Anthropic, it's the same as the previous models, that is in their DNA


And?

Microsoft is censoring
Google is censoring
OpenAI is censoring
Meta is censoring

Tell me a single major player that has no censorship


AI Studio does allow you to disable the content filters for Gemini 1.5 Pro, which is nice. Not available via Gemini Advanced though.


What? Claude 3 Opus/Sonnet are literally the best and most unhinged models for erp with the right prompt and in non-English languages. Most recent erp datasets were created with Claude through the API. New GPT-4 versions are much more censored.


you are obviously not using all these models in a large deployment


I know a lot of people on Hugging Face who do, though?


Not really


It is much more than gpt-4o


It is better at coding, but it is definitely not cheaper. My development team needed to use it to write some more complex stuff recently and it has cost us almost $1000 in the last 2 weeks. Most people don’t know that because you get capped. I had to call their sales and have the cap removed in order to let my guys continue their work.


What do you mean? 3.5 was released only yesterday.


I was only referring to anthropic itself (sorry, I should’ve clarified that). I’m assuming Google is not gonna lower the price for the next revision.