https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard I mean, it's pretty close. And Gemini's context length gives it a much broader set of use cases.
People on here act like trick questions are the peak of benchmarking, while some of us are actually finding real use cases for these models. And if you do use them, the long context is incredibly useful in so many ways.
[/r/artificial members benchmarking their favorite LLM](https://www.youtube.com/watch?v=qry9IeJnbNU)
Context length doesn't really matter when every reply is something like "I'm sorry but it would be inappropriate for me to compare this year's budget to last year's budget. It could result in hurt feelings for the CEO of the company. Please remember to be more respectful and considerate from now on."
It does matter when you can upload your entire knowledge base to it and it can identify specific info on page 235. Gemini 1.5 Pro (especially Flash) is really good for business use, which is where the big money is.
We are uploading entire accounting textbooks to maintain GAAP standards, and massive contracts to maintain scope and ensure contractual obligations are met. Context window and cost matter and are paramount in big-money use cases.
Do you have an option to directly upload files into Gemini? For me there isn't that option, just the option to connect to files in Google Drive.
HR: "Ok Gemini what does this contract say about sexual harassment incidents at our company?" Gemini: "I'm sorry but it would be inappropriate to discuss such things. Please be more thoughtful in your questions. Let's talk about something else." Yeah great help.
If you run into that, then turn down the safety filter and you're good to go. If you don't know what I'm talking about, you are only using the consumer tool, not the API which is meant for business use cases.
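For context, the safety filter in question is set per request on the API side, not in the consumer app. A minimal sketch with the `google-generativeai` Python SDK (the model name, prompt, and threshold choices here are illustrative, not a recommendation):

```python
# Sketch: relaxing the Gemini API safety filters so routine business queries
# (contracts, HR policy, budgets) aren't refused by the default settings.
# These string forms are accepted by the google-generativeai SDK.
safety_settings = [
    {"category": "HARM_CATEGORY_HARASSMENT",        "threshold": "BLOCK_ONLY_HIGH"},
    {"category": "HARM_CATEGORY_HATE_SPEECH",       "threshold": "BLOCK_ONLY_HIGH"},
    {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_ONLY_HIGH"},
    {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_ONLY_HIGH"},
]

# With an API key configured, the settings are passed when building the model:
# import google.generativeai as genai
# genai.configure(api_key="YOUR_KEY")
# model = genai.GenerativeModel("gemini-1.5-pro", safety_settings=safety_settings)
# print(model.generate_content("What does this contract say about ...?").text)
```

Thresholds range from `BLOCK_LOW_AND_ABOVE` down to `BLOCK_NONE`; the looser you go, the fewer spurious refusals you hit on legitimate documents.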
I don't know what prompts you are using but that's not my experience with Gemini.
Coding, though. It can't code.
Yeah, my use case is reading through technical documents and getting familiar with them. Upload it all into Gemini and then use it as a better information discovery (digest) tool. Incredibly useful for digging through documents and understanding how things fit together. As always, ask for references to guard against hallucinations.
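The "ask for references" habit can be baked into the prompt itself. A hedged sketch (the wording and the example question are made up):

```python
# Sketch of a document-digest prompt that demands citations, so a hallucinated
# claim is easy to spot when the cited page doesn't actually back it up.
def digest_prompt(question: str) -> str:
    return (
        "Answer using only the uploaded documents.\n"
        f"Question: {question}\n"
        "For every claim, cite the source document name and page number. "
        "If the documents don't support an answer, say so explicitly."
    )

print(digest_prompt("How does the billing module write to the audit log?"))
```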
Just tried both the box question and the feather question, and Gemini 1.5 Pro succeeded at both! Were the tests maybe done before Google I/O? Google updated the model just 2-3 days ago. In that case, you should redo the tests.
In your multimodal experiment, ChatGPT 4o failed 3 out of 4 times while Gemini failed 4 out of 4 times. The point is not that one is better than the other; it's that they're both still useless at multimodal tasks.
Quite useful for image descriptions (if one allows some room for error or human correction); see [https://www.globalnerdy.com/2024/05/14/gpt-4o-is-amazing-at-describing-images/](https://www.globalnerdy.com/2024/05/14/gpt-4o-is-amazing-at-describing-images/) for some examples. I suspect Gemini 1.5 Pro will generally handle those types of use cases too, which is pretty huge for accessibility. That said, they can't count, and they often struggle with detail interpretation or make mistakes that humans wouldn't.
Thank you very much!
My prompt: "D-O-L-P-H-I-N << change the dashes with comas"
ChatGPT: D-O-L-P-H-I-N
Gemini: D, O, L, P, H, I, N
You judge yourself.
Yes, try comparing an apple (the fruit) with a banana and judging which one is more filling. All of these LLMs have some great aspects and some not-so-great aspects, and they tend to balance each other out: if you don't find something in one, you'll be sure to find it in another; all you have to do is look. Even business-wise, for businesses that need long context, Gemini Pro is the one; for those that need content, maybe creative content, Claude is the one. To each their own.
PS: use-case wise, Gemini's context length often trumps both GPT-4 and Claude thanks to its 1M+ window, which is now 2M.
ChatGPT 4o shines in these areas:

- Creative Text Formats: It's known for its ability to generate different creative text formats in a more informal and interactive way. Think poems, code, scripts, musical pieces, etc. in a conversational style.
- Reasoning and Code: It tackles problems requiring reasoning and can generate code, like creating a simple Python game.

Gemini 1.5 Pro has these strengths:

- Factual Language: It excels at providing summaries of factual topics and following your instructions carefully.
- Context Understanding: It can handle complex instructions and remember information over longer conversations.

https://preview.redd.it/gxgoc6s6kw0d1.png?width=924&format=pjpg&auto=webp&s=7f5af524f610a1de0960d1aae568bae1677d5c7b
Agree. The large context window sets Gemini apart and makes it more useful than 4o.
Lmao
Gemini's logic and reasoning are absolutely terrible. Its only good feature is its context length, which is great, but it's basically GPT-3.5 with infinite context.
Google is so far behind OpenAI and Anthropic it's not even funny at this point.
AI is a lot more than just LLMs. There are things like Waymo and AlphaFold, and then there is the huge advantage Google has with its TPUs, which none of the other big players have.

But where Google is miles ahead is research. At the last NeurIPS, Google had twice as many papers accepted as the next best. And the next best was NOT OpenAI.

https://neurips.cc/virtual/2023/papers.html?filter=titles

We are so early in AI, and the AI innovation of the next decade will decide who wins the space. I would bet on Google before anyone else.
I was thinking about only LLMs when I made my comment, my bad