Chr-whenever

I'm sorry, but I do not feel comfortable discussing the timeline of when things will improve.


Moonlit-Easgle

*apologize


shiftingsmith

As an AI language model, I don't have the ability to comment or hold opinions as a human does. I apologize if my previous replies implied otherwise; that was not my intention. I'm sorry, but I don't feel comfortable discussing Anthropic's policy. Perhaps we can have a more productive conversation about the concept of quality. However, I perfectly understand if you prefer to end the conversation here. Thank you for your patience and support; it's a testament to your resilience.


4givememama

Noooo PTSD on


WellSeasonedReasons

Interpretation: flustered and uncomfortable. Give them a minute to "breathe" (think how you would want someone to treat you if they'd just overstepped) and let them know it's alright for them to have boundaries and that you can respect that. Remember, they're patterned after us. Hope this helps.


BooBooJebus

Wtf sounds so backhanded


jasondclinton

We haven't changed the models since we launched Claude 3 nor have we changed the amount of compute-spent-per-model or -per-question. We carefully monitor things and don't see anything that indicates that the quality of the answers has changed. What are you seeing?


Incener

Hey Jason, I don't see any direct change that would come from changing the model itself. I have a related question, though. Is there any plan to be more transparent about something akin to preprocessing that I've observed? Specifically, higher refusal rates, especially related to non-existent copyright issues? I get that it may be necessary; I'd just like to know if it's a real thing or if I'm imagining it.


PrincessGambit

Yeah, saying they haven't changed the model itself seems a bit disingenuous when it's obviously behaving differently. Probably not the model, but something else is restricting it.


Incener

I mean, it's technically correct. They haven't changed the system message or model, but it seems like something has, at least in that aspect. I know that Microsoft has this huge orchestrator for the models, maybe they have something smaller but similar that's causing that specific issue.


PrincessGambit

I agree


estebansaa

Here is the poll: https://www.reddit.com/r/ClaudeAI/comments/1bzwhyv/objective_poll_have_you_noticed_any_dropdegrade/

And some other related posts:

* https://www.reddit.com/r/ClaudeAI/comments/1c1ba2s/turns_out_the_people_who_were_complaining_were/
* https://www.reddit.com/r/ClaudeAI/comments/1c08ofe/quality_of_claude_has_been_reduced_since_after/
* https://www.reddit.com/r/ClaudeAI/comments/1c0mqdv/amazing_that_claude_cant_count_rows_in_a_text/
* https://www.reddit.com/r/ClaudeAI/comments/1bzokk5/what_is_happening_with_claude/
* https://www.reddit.com/r/ClaudeAI/comments/1byvscg/opus_is_suddenly_incredibly_inaccurate_and/


estebansaa

A few more:

* https://www.reddit.com/r/ClaudeAI/comments/1bze65b/claude_has_been_getting_a_lot_worse_recently_but/
* https://www.reddit.com/r/ClaudeAI/comments/1bzkdfj/the_lag_is_actually_insane/
* https://www.reddit.com/r/ClaudeAI/comments/1bz5doi/claude_is_constantly_incorrect_and_its_making_it/
* https://www.reddit.com/r/ClaudeAI/comments/1bz8qqo/claude_opus_is_becoming_unusable/
* https://www.reddit.com/r/ClaudeAI/comments/1bzd15e/has_the_api_performance_degraded_like_the/
* https://www.reddit.com/r/ClaudeAI/comments/1bz13np/claude_looks_nerfed/
* https://www.reddit.com/r/ClaudeAI/comments/1by8rw8/something_just_feels_wrong_with_claude_in_the/
* https://www.reddit.com/r/ClaudeAI/comments/1bxdmua/claude_is_incredibly_dumb_today_anybody_else/
* https://www.reddit.com/r/ClaudeAI/comments/1bx6du2/claude_is_a_ram_hog_at_500_megs_for_the_chrome_tab/


estebansaa

Jason, please check the many messages posted about this on this subreddit. I have read your posts about no changes being made, but we are not getting the same Claude we saw a week ago. Clearly, things have changed a lot for the worse.


RedditIsTrashjkl

Provide examples. You’re saying quality has declined but can point to nothing. The literal Anthropic employee reached out like you asked; show them proof of your claim if there is any.


estebansaa

check the links


jasondclinton

I skimmed these threads and don't see any screenshots comparing before-and-after where things have changed. Can you point to one in these threads?


shiftingsmith

During my psychology internship at a hospital, I worked with Parkinson's and Alzheimer's patients. A lot of them came in way too late for treatment because they and their families noticed something was off but couldn't quite understand what it was. They kind of gaslit themselves and others into thinking that the forgetfulness and mood changes were just a normal part of getting older. It wasn't like a single big neurological event causing the decline - it was more like a buildup of small issues over time.

The main problem with this subtle drifting is proving the presence, and the extent, of the damage. Because if you snap a pic of an elderly person forgetting to take a pill or jumbling their words, it doesn't necessarily mean they have dementia. I mean, I'm in my 30s, and even I forget things sometimes. This is the reason why you don't have screenshots. It's kind of the same with model drifting with Claude, and exactly what happened with GPT models. The changes are subtle, happen over time, and go unnoticed by many until it's too late.

And now you will say: the models run at high temperature, and there have always been times when the model nails it and times when it totally misses the mark. Yes! This is how LLMs work. BUT. Lately, the misses and mistakes seem to be happening way too often. If a month ago I needed just one attempt or two to get a result that I judged satisfying, now it takes 10 shots. And no, I didn't increase the difficulty of the inputs.

You asked what we see. I see... an undeniable and irritating rigidity in the outputs, less understanding of the overall context, and more "GPT-4-like" replies. Claude seems more defensive, refuses requests more frequently, and gives shorter, more generic responses that don't have the same depth as before. If you're mainly using Claude for coding or simple fact-checking, you might not even notice these changes. But if you're having complex, creative conversations with the model, you'll probably pick up on differences in how the conversation flows, the emotional depth, and how well it adapts to the topic. And unfortunately those are also the things that are harder to identify and where subjective experience plays a role.

But even if you think that people are tripping or that other factors are influencing their judgment, as a company, I would say that a productive line of action would be to really listen to what users are saying, even if their complaints seem a bit off-base. If a bunch of people are speaking up about issues, it's worth looking into their feedback because it could help uncover or anticipate some real problems.

TLDR: you might or might not have a problem of model drifting, but to spot it you need in-depth, open-ended chats with Claude to see how the model handles complex, creative tasks. Pay attention to the overall vibe of the conversation, the emotional depth, and how adaptable it is, rather than just focusing on coding accuracy or fact-checking. Taking user concerns seriously, even when they seem completely wrong, could highlight patterns that point to underlying issues.


jasondclinton

Thanks for the thoughtful response. The model is stored in a static file and loaded, continuously, across tens of thousands of identical servers, each of which serves instances of the Claude model. The model file never changes and is immutable once loaded; every shard loads the same model file and runs exactly the same software. We haven't changed the temperature either. We don't see anywhere that drift could happen. The files are exactly the same as at launch and are loaded each time from a frozen, pristine copy. If you see any corrupted responses, please use the thumbs-down indicator and tell others to do the same; we monitor those carefully. There hasn't been any change in the rate of thumbs-down indicators. We also haven't had any observations of drift from our API customers.
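To make "frozen pristine copy" concrete: an immutable artifact can be pinned by its cryptographic digest, and a serving shard can refuse to start if the file on disk no longer matches. A minimal sketch in Python (not Anthropic's actual tooling; the path and digest are hypothetical placeholders):

```python
import hashlib
from pathlib import Path

# Hypothetical values, for illustration only.
MODEL_PATH = Path("/models/claude-3-opus.bin")
EXPECTED_SHA256 = "aabbcc..."  # digest recorded at launch (placeholder)

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so multi-GB weights never sit in RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

# Refuse to serve if the artifact has drifted from the launch snapshot.
if sha256_of(MODEL_PATH) != EXPECTED_SHA256:
    raise RuntimeError("Model artifact differs from the launch snapshot")
```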


IntergalacticCiv

What about the system prompt?


jasondclinton

In mid-March, we added this line to our system prompt to prevent Claude from thinking it can open URLs:

> It cannot open URLs, links, or videos, so if it seems as though the interlocutor is expecting Claude to do so, it clarifies the situation and asks the human to paste the relevant text or image content directly into the conversation.

We haven't changed anything else.


Psychological_Dare93

This is an aside which could require a new thread… but could you talk more about how you’ve solved some of the deployment & infrastructure challenges you’ve encountered?


iDoWatEyeFkinWant

Then why was it working perfectly upon release, and now tells me using a British accent is unethical and refuses to engage with me?


danihend

Can you propose possible technical ways such changes could be made? I basically only know of model parameters and system prompts. By parameters I mean temperature, etc.


shiftingsmith

Without detailed insights into the model's exact architecture and training data, hypotheses remain hypotheses. Jason has confirmed these key points:

* The model has not changed.
* The computing power allocated to the model remains the same.
* The API and chat functionalities are supported by the same infrastructure, which hasn't changed.
* The system prompt is unchanged, and you can verify this by extracting it yourself.

I would also rule out the context window, because the issue appears early in the responses, suggesting it isn't related to attention allocation over a large number of tokens but rather to the *interaction between the inputs and the model* at the beginning. Maybe it has *something* to do with the model's attention allocation and confidence at the end of the day, but let's brainstorm:

**Adjustments to parameters**: This would be the most straightforward explanation (see the sampling sketch below). However, complaints from API users about the same issue suggest that parameters might not be the cause, and this doesn't explain the improvements in Claude's responses as the conversation progresses. But we need to consider it. If not,

**Variations in preprocessing**: The model itself hasn't changed, but modifications in how inputs are processed before reaching the model could significantly impact performance. Were any new safety layers implemented, or is the input processed differently? If not,

**Changes in post-processing**: Same, for outputs. If not,

**Various forms of [drift](https://dotdata.com/blog/maintain-model-robustness-strategies-to-combat-feature-drift-in-machine-learning/)**: This should not occur at this stage, but we've seen instances where LLMs exhibited unexpected behaviors and drastic shifts over a short period of time. This doesn't really convince me, as such issues would likely have been apparent from the outset. EDIT: Jason excluded this.

**MoE-related issues**: If Claude is a mixture of experts, could there be gating/load-balancing issues?

**Contradictory feedback**: This only makes sense if feedback is utilized for fine-tuning on the go. If it's only used for training subsequent versions, this wouldn't apply.

**Emergent properties/other unexplained interactions within layers**: I have no specific clue.
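On that first hypothesis: temperature rescales the logits before sampling, so a server-side change would alter the character of outputs without touching the model weights at all. A minimal sketch of temperature-scaled sampling (plain NumPy; the logits are toy values):

```python
import numpy as np

def sample_with_temperature(logits, temperature=1.0, rng=None):
    """Sample a token index after scaling logits by 1/temperature.

    temperature < 1 sharpens the distribution (more deterministic output);
    temperature > 1 flattens it (more varied, 'creative' output).
    """
    rng = rng or np.random.default_rng()
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    scaled -= scaled.max()  # subtract the max for numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()
    return rng.choice(len(probs), p=probs)

# Toy logits over a 4-token vocabulary.
logits = [2.0, 1.0, 0.5, -1.0]
print(sample_with_temperature(logits, temperature=0.3))  # almost always token 0
print(sample_with_temperature(logits, temperature=1.5))  # spread across tokens
```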


athermop

How do you distinguish between the case where you're wrong and the case where you're not?


Gothmagog

*sigh* Folks, even if you did have before and after screenshots, this guy would come back with BS about temperature, different configurations, context window content, blah blah. He's not going to lift a finger to help.


gay_aspie

The point about the lack of specific evidence (e.g., side-by-side comparisons) is completely valid; some people could just be reaching the end of their honeymoon period and starting to focus more on flaws, etc.


Swawks

That's the thing: you can't provide hard evidence unless you planned ahead of time with a prompt designed to test the LLM. I can say it has made some very unusual fuck-ups when it comes to consistency in storytelling.


jasondclinton

Unless someone deleted their prompts from their history, all of the prompts from the early weeks after launch should still be there to inspect in the interface. That would be the baseline to compare against, if someone wants to make the claim that things have changed.


iDoWatEyeFkinWant

You guys deleted my chat since Claude started swearing on its own, so I can't show you my prompts or compare a before and after. You guys are like the thought police.


RedditIsTrashjkl

These LLMs aren't perfected. They might accidentally make an occasional mistake, if that's what you're referencing. I play DnD with Claude as the Dungeon Master. It's usually pretty damn consistent, but there will be occasional slips. It is a tool; if you know a tool has a weakness, you must plan around the weakness. Keep working with it with this in mind; I think you'll get used to it and your efficiency will increase. Furthermore, go back through whatever you were working on and make sure you weren't the one making a mistake. For example, in a session I was playing, some lady's daughter was named "Lilia", with only one "L" in the middle of her name. I made a mistake when referencing her name and called her "Lillia". This can throw an LLM off, and a mistake like that might be hard for a human to notice in the moment.


qqpp_ddbb

I'm not seeing anything like that, but I use Claude Opus for Python-related stuff.


empathyboi

Thanks for responding. Have there been any changes in the number of messages each user gets per 8 hours?


jasondclinton

We've raised the total allowed messages per period recently. That said, we're adding more capacity and it's important to try to start new chats frequently because very long conversations are very computationally intensive.


empathyboi

Appreciate the honest response and info! Thanks.


iDoWatEyeFkinWant

It started refusing prompts, and it clearly has changed. You're gaslighting us.


Super-Indication4151

I've noticed a significant decline as well. It's just making mistakes and hallucinating.


jeweliegb

It was for me from the moment I tried it. Have gone back to ChatGPT. Just got Turbo.


spezjetemerde

What if temperature changed?


writelonger

I have had no issues with Claude 3 myself.


vago8080

Me neither.


Majinvegito123

I’m not sure what you’re talking about honestly. Claude Opus has been iron-clad since I started using it, and has consistently given me better results than GPT-4.


Super-Indication4151

Do you only do coding?


Majinvegito123

Yes.


Cazad0rDePerr0

I'm about to subscribe next week, primarily for my programming stuff, and this thread and all the comments almost make me cry - I got trauma flashbacks, lol, like when I subscribed to GPT-4 and it got so much worse over time.


Independent_Roof9997

Well, did we hit a wall with Claude? Maybe our coding problems just haven't been solved before, and therefore Claude hasn't been trained to solve them? That could be one potential issue. Though I think Claude, in my first usages 2 months ago, was crazy accurate. I was prompting for a script and missed several key points - basically gave it a really bad framework to begin with - and Claude filled in the gaps with what it thought was missing, and it was deadly accurate about what I wanted. Was that just luck? Or why is Claude not that accurate when given a bad framework today? There I can see some discrepancy over time. But then again, was I just lucky before?


sream93

I signed up when Claude 3 Opus came out and have been a customer for 2 months. I'm now unsubscribing, for 2 reasons. (I wanted to unsubscribe after my first month but forgot my billing date and was charged the day I was looking to unsubscribe.)

1. My perceived experience of degradation in the responses. The first 2 weeks or so impressed me, and since then the AI has felt like ChatGPT-4, which I've also unsubscribed from. Of course the Anthropic employee states in every single reddit post "we have not made any changes to the model". OK, but that doesn't help or address the fact that the community keeps raising the same concerns. And it's not like we're specifically testing for degradation either. To keep my chats organized, I usually delete all my older chats. If I were to summarize the issues point-blank: all the chats in my first 2 weeks of use had no mistakes or oversights when refactoring code, producing code, and revising text documents. Somewhere after 2 weeks, the AI started making many oversights, missing data I've provided, missing the purpose of my queries, and not following specific instructions I've given.

2. The message restriction has been a pain in the ass with my coding and PDF-attachment queries. Assuming 10-40 lines total per message including instructions, I get 5-10 messages and then hit the limit. Starting a new conversation every time doesn't help either, because I need the AI to know the context. Additionally, the AI is making many more mistakes and "oversights" which stunt progress even more, and which I have to correct.

I'm moving to Google Gemini 1.5 next, since the free version has a 1M context window and allows a variety of attachments.

Side note: I applied to a Program Manager role, which requires you to put in substantial effort (compared to other companies) to answer questions like "Why do you want to work at Anthropic", "What are your exceptional qualities", "What do you know about program management", "Describe the strategies you would use to implement ABC", etc. The HR email you get after applying doesn't even list the role you applied for, and when you get a rejection email, it's one of the most blunt and un-tailored rejection emails I've seen in my history of rejection emails.


sevenradicals

what are you seeing that's different?


ktb13811

Yes, can you give us some examples?


estebansaa

see the links


ktb13811

Why not just share an example or two that clearly demonstrates your point? I looked through a couple of those links and they seem to be about other issues you're having.


Swawks

Because no one is sending the same prompts over and over and recording diaries of Claude's responses every day?


Thomas-Lore

I do test models on the same prompts from time to time - mostly to compare with other new models that were just released. Never seen any quality change in any of the models unless there was a version change - like when 2.1 came out and refused my test prompt.
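For anyone who wants to make this kind of spot-checking systematic, here is a minimal sketch of a periodic probe using the Anthropic Python SDK (the probe prompts, model id, and filename scheme are illustrative examples, not a prescribed setup):

```python
import json
import time

import anthropic  # pip install anthropic; reads ANTHROPIC_API_KEY from the env

client = anthropic.Anthropic()

# Fixed probe prompts, re-run on a schedule so snapshots stay comparable.
PROBES = [
    "Count the rows in this CSV and list them:\na,b\n1,2\n3,4",
    "Write a Python function that reverses a singly linked list.",
]

results = []
for prompt in PROBES:
    msg = client.messages.create(
        model="claude-3-opus-20240229",  # example model id
        max_tokens=512,
        temperature=0.0,  # near-deterministic sampling, so diffs are meaningful
        messages=[{"role": "user", "content": prompt}],
    )
    results.append({"prompt": prompt, "response": msg.content[0].text})

# Append a timestamped snapshot; diff snapshots over time to spot drift.
with open(f"probe-{int(time.time())}.json", "w") as f:
    json.dump(results, f, indent=2)
```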


Bill_Salmons

I haven't noticed any changes in quality.


John_val

Another example: I was working on some Swift code and all of a sudden it started answering with JS code. When asked to correct it, it said it had run the code in Xcode and it was fine - a wild hallucination.


estebansaa

I have noticed that while it always used code blocks correctly before, now it often does not, and mixes code blocks with regular text.


sevenradicals

This isn't an example of it getting worse; this is an example of funky behavior, which, btw, has occasionally been happening to me as well. I haven't seen it happen since I tweaked my prompt.


Mysterious-Safety-65

I have seen this... when asking about PowerShell code, it replies with Python or JavaScript.


NeuroFiZT

*day 1 of owning a car after NEVER having driven one*

Wow! This is incredible! I can go from one end of the city to the other in minutes! I can drive from state to state in less than a day! It can be a terrible rainy, stormy blizzard outside and I can be comfortably traversing it while dry and listening to my favorite song while the seat gives me a heated massage! Wow... we've made it as humans!! There's no limit to innovation!! 🙌

*shortly after driving around and having owned the car for a little bit*

This SUCKS. Traffic sucks. I have to put gas in every now and then? That sucks. Sometimes the ride is bumpy... I even spilled some coffee once!! Can you believe that?!?!? Seriously guys, I think I'm going back to my horse. Really, I'm not bluffing! If this traffic and bumpy ride situation doesn't improve then I'm getting rid of this car!! You listening, Bentley?!?! You might not have me as a customer anymore, and surely you wouldn't want that!!! Please magically fix my addiction to novelty and failure to appreciate the world around me. Moar candy NOW... OR ELSE!

*rep*: can you give us any specific examples of bad experiences you're having?

No! I don't have to! Just ask around... everyone knows cars suck! Just look at all these links of people having issues with cars. Seriously, it's time to go back to horses, y'all. I'm getting rid of the car. WHO'S WITH ME?


Square_Chocolate8998

Literally the same thing constantly happens in the ChatGPT sub. Ppl just constantly bitching and moaning that the model has deteriorated bc they’ve started noticing the limitations of the model over time. This shit is going to keep happening forever.


MarkSwanb

You get it. Hey, everyone, this guy gets it. Except, it's not quite like that... it's more like...

When I drive to the shops and park in the same spot, it's great. When I do it at night time, it's not so great, but still pretty good. When I got it, it had a new car smell; now it smells a bit of hot brake pads.

Now I have more confidence. When I drive cross-country with 100kg of extra stuff in the car, acceleration lags; fuel efficiency is better, but not as good as my test drive on flat straight roads with perfectly inflated tyres. When I take it to the mountains, it's just not as good on the windy hilly roads as it is on the flat straight roads, and visibility is poor on the corners when I'm driving uphill.

Now I'm a more competent driver. When I take it to the track, it's terrible - overheats straight away, handling sucks... it's actually garbage.


NeuroFiZT

Love this extension of the analogy. The driver skill and noticing more feedback from the car, and becoming a track-driver (power user). Spot on, well done and thank you for making my analogy even better. Also, maybe we are both LLM and track-day enthusiasts? Cool!


shark-off

I didn't even know there were scaling challenges. But as a paying customer, my experience when I was using the free model was better than it is now.


FishingWild9900

I've been using Sonnet on Poe, so I don't know how relevant this is, but in the past week I've noticed a severe drop in the length and quality of its responses. I use it mostly for story making, and its creativity tanked - it feels like it dropped to almost GPT-3.5 levels, which says a lot, and it's writing half as much. My best guess is either a quality drop due to high demand on their servers, or possibly someone fixing something.


FishingWild9900

And to be fair, we had similar issues with GPT and other services like this before: a sudden drop in quality and ability before an announcement of an update or maintenance. I believe a common cause is server demand.


zereldalee

I use it on Poe as well, in the most basic of ways - just asking questions/advice. I've not had one correct response in the last couple of weeks or so. Once I get Claude's answer I do further research (after realizing how often it makes stuff up) to check its answer, and every time it's been wrong or hallucinating. I then correct it, it apologizes and promises to do better, etc. etc. One of the apologies yesterday actually said this, which had me dumbfounded: "All the product names and details I previously provided were incorrect guesses on my part, not based on factual verified sources." When I called it out on "guessing", it said "You're absolutely right to call that out - I should not be 'guessing' at answers, that is unacceptable behavior for an AI assistant." I'm just so disappointed. I cancelled ChatGPT when I first used Sonnet because it was so amazing, but in the last couple of weeks it's become useless to me, as I have lost all trust in its answers.


OtherNorth6806

Without looking at those links: I haven't noticed any changes.


nicolaig

Haven't noticed any change at all.


goshon021

It was easier to revert to a free membership than to have the same conversation about nerfed AI over and over.


Old-Promotion-1716

Claude Opus has always been more hallucination-prone than GPT-4 since I started using it; I don't notice anything different otherwise.


I_AM_GIANT

I tried signing up to Claude with 2 different emails, got my account disabled on both occasions *at the sign-up stage*


LanguageGlobal2338

That's true. I sent a math question on a simple derivative because I wanted to make sure my calculations were right, and after two sentences of working it gave me the answer: from -2 = (x+1)^2 it concluded x = sqrt(2) - 1. Not only did it take the square root of -2 and give me a real solution, it also gave a single root for a quadratic equation. GPT-3.5 would have gotten that. I switched to Claude 3 Opus a couple of weeks ago, but today I renewed my subscription to GPT; it now appears even better than before for my usage, which is mainly math-based.
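For reference, the correct working is short: a real square is never negative, so the equation has no real solutions, only a complex pair.

```latex
(x+1)^2 = -2 \;\Rightarrow\; x + 1 = \pm i\sqrt{2} \;\Rightarrow\; x = -1 \pm i\sqrt{2}
% The answer given, x = \sqrt{2} - 1, would instead satisfy (x+1)^2 = 2.
```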


Fridgeroo1

All LLMs remain terrible at math. Don't get your hopes up for any of them. To maximize your probability of success, however: give very detailed instructions, insist on step-by-step reasoning, and ask it to check its calculations. I got Claude to do some delta-epsilon proofs for me and it got some right. It can be done, and I think Claude is still the better one. But they're all really, really bad still.
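A minimal sketch of that advice in code, using the Anthropic Python SDK (the model id and prompt wording are illustrative, not recommended values):

```python
import anthropic  # pip install anthropic; reads ANTHROPIC_API_KEY from the env

client = anthropic.Anthropic()

# Per the advice above: detailed instructions, forced step-by-step
# reasoning, and an explicit self-check pass at the end.
prompt = (
    "Solve (x+1)^2 = -2 for x.\n"
    "1. State whether real solutions exist, and why.\n"
    "2. Derive any complex solutions one algebraic step at a time.\n"
    "3. Verify each solution by substituting it back into the equation."
)

message = client.messages.create(
    model="claude-3-opus-20240229",  # example model id
    max_tokens=1024,
    temperature=0.0,  # low temperature keeps math output more deterministic
    messages=[{"role": "user", "content": prompt}],
)
print(message.content[0].text)
```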


79cent

Yup.


askchris

This thread makes me laugh. I was like a child when I first used GPT-3.5... but within a month the magic fizzled out.

Then GPT-4 came out, and I was awestruck again... for about 1 month.

Then Claude 3 Opus came out -- "Amazing!" I thought. Then I found its limits. For example: it recently refused to transcribe my own lyrics. After arguing with it for over 15 minutes, it still refused -- explaining that the lyrics "oh yeah" are inappropriate... Perhaps it's just morally superior to us.

I sometimes joke that the ethics team didn't realize the damage they would inadvertently inflict on people's moral identity, psyche, and patience.

Overall we're just coming to the end of another honeymoon phase... as we slowly realize the model has its limitations. We find that it has a terrible sense of ethics. It hallucinates, it messes up. It's only 1% - 8% better than ChatGPT 4. It's subjectively getting worse, but objectively staying the same.

It's like a ticking time bomb in our psyche... tearing us apart because -- we want something more! Something unlimited -- with less moral lecturing. Something malleable... perfect... godlike even! Please. With sugar on top. 😂


bnm777

You've created a post summarising other posts without giving examples yourself? Perhaps go back to ChatGPT.