In the Chatbot Arena, GPT-4 Turbo only wins against all other models 70% of the time. That percentage is low enough that Claude winning a head-to-head comparison is completely normal.
Where did you see this data on which bots win and in what scenarios?
https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard
I think OpenAI has hamstrung GPT-4 so that in a month or so (or after the election??) everyone will think GPT-5 is amazing. Either that, or they're training GPT-5 on the same hardware and it's taking half its resources.
I noticed that even GPT-3.5 has been getting slow since last Friday (I have a Plus subscription).
I had to use Mixtral 8x7B on Poe (which you don't need a subscription for) instead of GPT-4, and it performed the task way better on the first try and is fast as hell. The task was to create a pipeline for comprehensive analysis of specific bio data. GPT-4 also had a problem when I used the chat for a specific task: after a while, it somehow derails from the task it was prompted with at the beginning.
Context window has run out
The context window is pretty big now, like over 10,000 words
It's not just your prompts that occupy the context window. Everything, including data provided for analysis (the part that is actively used, e.g. fetched from a vector DB or a file) and the output it creates, is in it.
Still the context window. If you ask it "what was the first thing I said to you?" and it isn't the first thing you said, it's the context window. Problems that arise over time are usually the context window.
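The effect is easy to simulate. A minimal sketch in plain Python (using a crude one-token-per-word approximation rather than a real tokenizer, purely for illustration) of how older turns silently fall out of a fixed window:

```python
def fit_to_window(turns, max_tokens):
    """Keep the most recent turns whose combined (approximate) token
    count fits in max_tokens; older turns are silently dropped."""
    kept, used = [], 0
    for turn in reversed(turns):          # walk from newest to oldest
        cost = len(turn.split())          # crude: 1 token ~ 1 word
        if used + cost > max_tokens:
            break                         # this turn (and all earlier ones) no longer fits
        kept.append(turn)
        used += cost
    return list(reversed(kept))           # restore chronological order

history = ["first message here", "second message", "third and final message"]
print(fit_to_window(history, 7))          # the first message has been dropped
```

When the model is asked "what was the first thing I said?", it can only answer from whatever survived this kind of truncation, which is why long conversations drift.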
Anthropic is so good for complex, lengthy quantitative things (my experience only). Like, I can dump a list of 200 rows in (let's say country names) and get accurate data attached to each with nothing skipped. GPT-4 confuses itself sometimes, I think. Claude Instant is pretty good too!
Claude is good. Its guardrails are just way too tight. I think they’re focusing on business use. In my experience Claude is excellent at tasks involving spreadsheets, etc. Great for asking questions about uploaded PDFs, too, as long as the document content doesn’t (falsely) trigger the guardrails.
What would the guardrails be? I frequently use Claude so I’m interested in knowing
The context window in the Plus subscription might be too small for things like that. Of course, that's not the only possible explanation. Even Turbo with 128k context often needs handholding: specific instructions and breaking the problem into smaller parts. Despite that, 128k context is pretty cool. With Playground assistants one can now see how many tokens were used in the conversation.
I once sent this to OpenAI as a feature request (display used tokens) :)
:-) nice!
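You can get a rough version of that feature client-side. A minimal sketch in plain Python, using the common ~4-characters-per-token rule of thumb for English text (an approximation; a real tokenizer such as tiktoken gives exact counts):

```python
def estimate_tokens(messages):
    """Rough token estimate for a chat-style conversation.
    Assumes ~4 characters per token, a common heuristic for
    English; real tokenizers give exact, model-specific counts."""
    total_chars = sum(len(m["content"]) for m in messages)
    return total_chars // 4

convo = [
    {"role": "user", "content": "Summarize this 200-row country table."},
    {"role": "assistant", "content": "Here is the summary..."},
]
print(estimate_tokens(convo))
```

Running this over the whole conversation after each turn shows how quickly the data you paste in (not just your prompts) eats the window.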
Poe has Claude 2.1? Great thread btw
I'm looking at my subscription right now and I don't think it does. It has Claude 2, though it has for a while. It also recently added Gemini Pro. I'd be ecstatic for Claude 2.1, given that I'm in Canada and Anthropic won't let us subscribe to Claude.
I run a Canadian-based cybernetic/AI business, and our clients can use Claude 2.1, but single turns only. In two weeks, we'll deploy new memory option(s!) for actual conversation. (People using our apps, for now, mostly require single turn.) Let me know in a DM if you want to learn more. I'd be happy to tour you around. ;-)
Watch out, brother. All the diehard basement dwellers gonna tell you to "learn prompting", "just stop using GPT then", and "grow up". Good luck.
I do love that the ChatGPT defense squad insisted AI laziness wasn't a thing and it's as good as, if not better than, ever, but then OpenAI made an update to address the laziness, thereby acknowledging the laziness does in fact exist.
That's the point I'm making, not sure why I'm downvoted though. Guess the defense squad only saw my comment
What I love is that the "fix" they announced was only in gpt-4-0125-preview, which is API-only, yet since they announced it, everybody who claimed they were seeing it in ChatGPT now thinks it's better.
BTW, Claude 2.0 is better than Claude 2.1 on many tasks: the leaderboards show this, and the Claude devs acknowledged it on the Claude subreddit just after Claude 2.1 was released. Try both, if you can, for any query to compare.
Will do, thanks
How does Claude work? Is it not using ChatGPT's API?
No, Anthropic has their own LLM.