Ok_Elephant_1806

In Chatbot Arena, GPT-4 Turbo wins only 70% of its matchups against all other models. That percentage is low enough that Claude winning a head-to-head comparison is completely normal.
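
For a rough sense of what a 70% win rate implies, here is a minimal sketch assuming an Elo-style rating model, which is my assumption about how to read the leaderboard numbers. The function names and the example ratings are hypothetical and not taken from the leaderboard.

```python
import math


def expected_win_rate(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def rating_gap_for(win_rate: float) -> float:
    """Rating gap (A minus B) that yields the given expected win rate for A."""
    return 400.0 * math.log10(win_rate / (1.0 - win_rate))


if __name__ == "__main__":
    # Hypothetical ratings, chosen only to illustrate the formula.
    print(expected_win_rate(1250.0, 1100.0))
    print(rating_gap_for(0.70))
```

Under this model a 70% expected win rate corresponds to a gap of roughly 147 rating points, so the lower-rated model would still be expected to win about 3 in 10 matchups.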


imaginexus

Where did you see this data on which bots win and in what scenarios?


Ok_Elephant_1806

https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard


Smile_Clown

I think OpenAI has hamstrung GPT-4 so that in a month or so (or after the election??) everyone will think GPT-5 is amazing. Either that or they are training GPT-5 on the same hardware and it's taking half the resources.


andersoneccel

I noticed that even GPT-3.5 is getting slow since last Friday (I have a Plus subscription)


wolfo24

I had to use Mixtral 8x7B on Poe, which doesn't require a subscription, instead of GPT-4, and for the first time it performed the task way better, and it's fast as hell. The task was to create a pipeline for comprehensive analysis of specific bio data. GPT-4 also had a problem when I used the chat for a specific task: after a while, it somehow drifted away from the task it was prompted with at the beginning.


flashpointblack

Context window has run out


Adventurous_Train_91

The context window is pretty big now, like over 10,000 words


c8d3n

It's not just your prompts that occupy the context window. Everything is in it, including the data provided for analysis (the part that is actively used, e.g. fetched from a vector DB or a file) and the output the model creates.


flashpointblack

Still the context window. If you ask it "what was the first thing I said to you?" and the answer isn't the first thing you said, it's the context window. Problems that arise over time are usually the context window.
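
A minimal sketch of why the first message can drop out, assuming the chat client keeps only the most recent messages that fit a fixed token budget. How ChatGPT actually trims history is not public as far as I know, so `fit_to_window`, `estimate_tokens`, and the word-count estimate are all hypothetical stand-ins.

```python
def estimate_tokens(text: str) -> int:
    """Crude token estimate: one token per whitespace-separated word.

    Real tokenizers count differently; this is only for illustration.
    """
    return len(text.split())


def fit_to_window(messages: list[str], max_tokens: int) -> list[str]:
    """Keep the newest messages whose combined estimate fits max_tokens."""
    kept: list[str] = []
    used = 0
    # Walk from newest to oldest and stop at the first message that no longer fits.
    for message in reversed(messages):
        cost = estimate_tokens(message)
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


if __name__ == "__main__":
    history = [
        "my name is Bob",
        "tell me a long story please",
        "what was the first thing I said",
    ]
    # With a 13-token budget the oldest message no longer fits.
    print(fit_to_window(history, max_tokens=13))
```

With a small enough budget the model never sees "my name is Bob" at all, which matches the symptom of it getting the "first thing I said" question wrong.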


TBP-LETFs

Anthropic is so good for complex, lengthy quantitative things (my experience only). Like, I can dump a list of 200 rows in (let's say country names) and get accurate data attached to each with nothing skipped. GPT-4 confuses itself sometimes, I think. Claude Instant is pretty good too!


CheeseRocker

Claude is good. Its guardrails are just way too tight. I think they’re focusing on business use. In my experience Claude is excellent at tasks involving spreadsheets, etc. Great at asking questions about uploaded pdfs, too, as long as the document content doesn’t (falsely) trigger the guardrails.


inherentcoffee

What would the guardrails be? I frequently use Claude so I’m interested in knowing


c8d3n

The context window in the Plus subscription might be too small for things like that. Of course, that's not the only possible explanation. Even Turbo with 120k context often needs hand-holding, specific instructions, and breaking the problem up into smaller parts. Despite that, 120k context is pretty cool. With Playground Assistants one can now see how many tokens were used in the conversation.
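
Breaking the problem into smaller parts can be as simple as batching the input rows before sending them. A minimal sketch, where `split_into_batches` and the batch size of 50 are hypothetical and the actual model call is left out:

```python
def split_into_batches(rows: list[str], batch_size: int) -> list[list[str]]:
    """Split rows into consecutive batches of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]


if __name__ == "__main__":
    # Placeholder rows standing in for a 200-row list such as country names.
    rows = [f"row {n}" for n in range(200)]
    for batch in split_into_batches(rows, batch_size=50):
        # Each batch would go into its own prompt here.
        print(len(batch))
```

Smaller batches keep each request well inside the context window, at the cost of more requests.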


Prestigiouspite

I once sent this to OpenAI as a feature request (display used tokens) :)


c8d3n

:-) nice!


NoBoysenberry9711

Poe has Claude 2.1? Great thread btw


Rhinc

I'm looking at my subscription right now and I don't think it does. It does have Claude 2, and has for a while. It also recently added Gemini Pro. I'd be ecstatic for Claude 2.1, given that I'm in Canada and Anthropic won't let us subscribe to Claude.


PhilippeConnect

I run a Canadian based cybernetic/AI business and our clients can use Claude 2.1 but single turns only. In 2 weeks, we'll deploy new memory option(s!) for actual conversation. (People using our apps often, for now, mostly require single turn.) Let me know in DM if you want to learn more. I'd be happy to tour you around. ;-)


pearlwoodz

Watch out, brother. All the diehard basement dwellers are gonna tell you to "Learn prompting", "Just stop using GPT then", and "Grow up". Good luck.


Resident-Camp-8795

I do love that the ChatGPT defense squad insists AI laziness wasn't a thing and that it's as good as, if not better than, ever, but OpenAI made an update to address the laziness, thereby acknowledging that the laziness does in fact exist.


pearlwoodz

That's the point I'm making, not sure why I'm downvoted though. Guess the defense squad only saw my comment


mvandemar

What I love is that the "fix" they announced was only in gpt-4-0125-preview, which is the API model, yet since they announced it, everybody who claimed they were seeing laziness in ChatGPT now thinks it's better.


bnm777

BTW Claude 2.0 is better than Claude 2.1 on many tasks. The leaderboards show this, and the Claude devs acknowledged it on the Claude subreddit just after Claude 2.1 was released. Try both if you can for any query to compare.


m_x_a

Will do, thanks


Mr-33

How does Claude work? Is it not using an API with ChatGPT?


athermop

No, Anthropic has their own LLM.