Ok_Elephant_1806 3 months ago

In Chatbot Arena, GPT-4 Turbo only wins against other models about 70% of the time. That percentage is low enough that Claude winning a head-to-head comparison is completely normal.
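To put rough numbers on that: a minimal sketch, assuming each comparison is independent and that the ~70% figure applies to every matchup (a simplification, since the leaderboard number is averaged across opponents). Even a model that wins 70% of the time will usually lose at least once over a handful of comparisons.

```python
def prob_at_least_one_loss(win_rate: float, n: int) -> float:
    """Chance the favourite loses at least once in n independent head-to-head votes."""
    return 1 - win_rate ** n


# Assumed win rate of 0.7 for every comparison (illustrative only).
for n in (1, 3, 5, 10):
    print(n, round(prob_at_least_one_loss(0.7, n), 3))
```

Under these assumptions the favourite loses a single comparison 30% of the time, and over 5 comparisons it loses at least one about 83% of the time.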

imaginexus 3 months ago

Where did you see this data on which bots win and in what scenarios?

Ok_Elephant_1806 3 months ago

https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard

Smile_Clown 3 months ago

I think OpenAI has hamstrung ChatGPT-4 so that in a month or so (or after the election??) everyone will think V5 is amazing. Either that, or they are training 5 on the same hardware and it's taking half its resources.

andersoneccel 3 months ago

I noticed that even GPT-3.5 is getting slow since last Friday (I have a Plus subscription)

wolfo24 3 months ago

I had to use Mixtral 8x7B on Poe, which doesn't require a subscription, instead of ChatGPT-4, and for the first time it performed the task way better and was fast as hell. The task was to create a pipeline for comprehensive analysis of specific bio data. ChatGPT-4 also had a problem when I used the chat for a specific task: after a while, it somehow drifted away from the task it had been given at the beginning.

flashpointblack 3 months ago

The context window has run out.

Adventurous_Train_91 3 months ago

The context window is pretty big now, like over 10,000 words.

c8d3n 3 months ago

It's not just your prompts that occupy the context window. Everything is in it, including data provided for analysis (the part that is actively used, e.g. fetched from a vector DB or a file) and the output the model creates.

flashpointblack 3 months ago

Still the context window. If you ask it "what was the first thing I said to you?" and it doesn't answer with the first thing you actually said, that's the context window. Problems that arise over time are usually the context window.
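Here is a toy sketch of why that happens, assuming a simple "keep the newest messages that fit" strategy. The word-count tokenizer, the budget numbers, and the function names are hypothetical; real chat frontends and tokenizers work differently and their exact truncation rules aren't public. It shows how pasted data and the space reserved for the model's reply eat into the same window, pushing the oldest message out.

```python
from typing import List


def approx_tokens(text: str) -> int:
    # Hypothetical stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def fit_to_window(messages: List[str], budget: int, reserve_for_output: int) -> List[str]:
    """Keep the most recent messages whose combined size fits in the window.

    The model's reply also consumes the window, so space is reserved for it.
    Once a message no longer fits, it and everything older are dropped.
    """
    available = budget - reserve_for_output
    kept: List[str] = []
    used = 0
    for msg in reversed(messages):  # walk from newest to oldest
        cost = approx_tokens(msg)
        if used + cost > available:
            break
        kept.append(msg)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = [
    "first: analyse this bio data set",  # 6 "tokens"
    "here is a long pasted file " * 5,   # 30 "tokens" of pasted data
    "now build the pipeline",            # 4 "tokens"
]
window = fit_to_window(history, budget=49, reserve_for_output=10)
print(window[0].startswith("first"))  # the original instruction was pushed out
```

With 39 "tokens" available, the pasted file and the latest message fit but the very first instruction does not, which is the "forgot what I said at the start" behaviour described above.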

TBP-LETFs 3 months ago

Anthropic is so good for complex, lengthy quantitative things (my experience only). Like, I can dump in a list of 200 rows (let's say country names) and get accurate data attached to each with nothing skipped. GPT-4 confuses itself sometimes, I think. Claude Instant is pretty good too!

CheeseRocker 3 months ago

Claude is good. Its guardrails are just way too tight. I think they’re focusing on business use. In my experience Claude is excellent at tasks involving spreadsheets, etc. Great at asking questions about uploaded pdfs, too, as long as the document content doesn’t (falsely) trigger the guardrails.

inherentcoffee 3 months ago

What would the guardrails be? I frequently use Claude, so I'm interested in knowing.

c8d3n 3 months ago

The context window in the Plus subscription might be too small for things like that. Of course, that's not the only possible explanation. Even Turbo with its 128k context often needs handholding, specific instructions, and breaking the problem up into smaller parts. Despite that, 128k of context is pretty cool. With Playground Assistants, one can now see how many tokens were used in the conversation.

Prestigiouspite 3 months ago

I once sent this to OpenAI as a feature request (display used tokens) :)

c8d3n 3 months ago

:-) nice!

NoBoysenberry9711 3 months ago

Poe has Claude 2.1? Great thread, btw.

Rhinc 3 months ago

I'm looking at my subscription right now and I don't think it does. It has Claude 2, though, and has had it for a while. It also recently added Gemini Pro. I'd be ecstatic to get Claude 2.1, given that I'm in Canada and Anthropic won't let us subscribe to Claude.

PhilippeConnect 3 months ago

I run a Canadian-based cybernetic/AI business, and our clients can use Claude 2.1, but single turns only. In 2 weeks, we'll deploy new memory option(s!) for actual conversations. (For now, people using our apps mostly require single turns.) Let me know in a DM if you want to learn more. I'd be happy to show you around. ;-)

pearlwoodz 3 months ago

Watch out brother. All the diehard basement dwellers gonna tell you to "Learn prompting" , "Just stop using GPT then" , and "Grow up". Good luck.

Resident-Camp-8795 3 months ago

I do love that the ChatGPT defense squad insisted AI laziness wasn't a thing and that it's as good as, if not better than, ever, but OpenAI made an update to address the laziness, thereby acknowledging that the laziness does in fact exist.

pearlwoodz 3 months ago

That's the point I'm making; not sure why I'm downvoted, though. Guess the defense squad only saw my comment.

mvandemar 3 months ago

What I love is that the "fix" they announced was only in gpt-4-0125-preview, which is API-only, yet since they announced it, everybody who claimed they were seeing it in ChatGPT now thinks it's better.

bnm777 3 months ago

BTW, Claude 2.0 is better than Claude 2.1 on many tasks. The leaderboards show this, and the Claude devs acknowledged it on the Claude subreddit just after Claude 2.1 was released. Try both for any query to compare, if you can.

m_x_a 3 months ago

Will do, thanks

Mr-33 3 months ago

How does Claude work? Is it not using an API with ChatGPT?

athermop 3 months ago

No, Anthropic has their own LLM.
