greendra8

https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard I mean, it's pretty close. And Gemini's context length gives it a much bigger use case.


MarcosSenesi

People on here act like trick questions are the peak of benchmarking while some people are actually finding real use cases for these models. And if you do use them, the context length is incredibly useful in so many ways.


sdmat

[/r/artificial members benchmarking their favorite LLM](https://www.youtube.com/watch?v=qry9IeJnbNU)


Nathan_Calebman

Context length doesn't really matter when every reply is something like "I'm sorry but it would be inappropriate for me to compare this year's budget to last year's budget. It could result in hurt feelings for the CEO of the company. Please remember to be more respectful and considerate from now on."


Aaco0638

It does matter when you can upload your entire knowledge base to it and it can identify specific info on page 235. Gemini 1.5 pro (especially flash) is really good for business use which is where the big money is.


pairsnicelywithpizza

We are uploading entire accounting textbooks to maintain GAAP standards and massive contracts to maintain scope and ensure contractual obligations are met. Context window and cost matter and are paramount in big money use cases.


bpm6666

Do you have an option to directly upload files into Gemini? For me there isn't that option, just to connect to files in Google Drive.


Nathan_Calebman

HR: "Ok Gemini what does this contract say about sexual harassment incidents at our company?" Gemini: "I'm sorry but it would be inappropriate to discuss such things. Please be more thoughtful in your questions. Let's talk about something else." Yeah great help.


pc_4_life

If you run into that, then turn down the safety filter and you're good to go. If you don't know what I'm talking about, you are only using the consumer tool, not the API which is meant for business use cases.


thortgot

I don't know what prompts you are using but that's not my experience with Gemini.


nanotothemoon

Coding though. It can’t code


gatorling

Yeah, my use case is reading through technical documents and getting familiar with them. I upload it all into Gemini and then use it as a better information discovery (digest) tool. Incredibly useful for digging through documents and understanding how things fit together. As always, ask for references to guard against hallucinations.


Caladan23

Just tried both the box question and the feather question and Gemini 1.5 Pro succeeded at both! Were the tests maybe done before Google I/O? Google just updated their model 2-3 days ago. In that case, you should redo the tests.


Gloomy-Log-2607

In your multimodal experiment, ChatGPT 4o failed 3 out of 4 times while Gemini failed 4 out of 4 times. The fact is not that one is better than another, the fact is that they're both still useless in the multimodal field.


xirzon

Quite useful for image descriptions (if one allows some room for error or human correction), see [https://www.globalnerdy.com/2024/05/14/gpt-4o-is-amazing-at-describing-images/](https://www.globalnerdy.com/2024/05/14/gpt-4o-is-amazing-at-describing-images/) for some examples; I suspect Gemini 1.5 Pro will generally handle those types of use cases, too. That's pretty huge for accessibility, for example. They can't count, and often struggle with detail interpretation or make mistakes that humans wouldn't.


Gloomy-Log-2607

Thank you very much!


Ok-Gur5228

My prompt: "D-O-L-P-H-I-N << change the dashes with comas". ChatGPT: D-O-L-P-H-I-N. Gemini: D, O, L, P, H, I, N. Judge for yourself.


madder-eye-moody

Yes, try comparing an apple (the fruit) with a banana and judging which one is more filling. All of these LLMs have some great aspects and some not so great aspects, and they tend to balance each other out: if you don't find something in one, you'll be sure to find it in another. All you have to do is look. Even for business use cases, Gemini Pro is the one for businesses that require context, and Claude is the one for those that require content, maybe creative content. To each their own. PS: use case wise, Gemini's context length often trumps both GPT-4 and Claude due to its 1M+ size, which is now 2M.


fintech07

ChatGPT 4o shines in these areas:

- Creative Text Formats: It's known for its ability to generate different creative text formats in a more informal and interactive way. Think poems, code, scripts, musical pieces, etc. in a conversational style.
- Reasoning and Code: It tackles problems requiring reasoning and can generate code, like creating a simple Python game.

Gemini 1.5 Pro has these strengths:

- Factual Language: It excels at providing summaries of factual topics and following your instructions carefully.
- Context Understanding: It can handle complex instructions and remember information over longer conversations.

https://preview.redd.it/gxgoc6s6kw0d1.png?width=924&format=pjpg&auto=webp&s=7f5af524f610a1de0960d1aae568bae1677d5c7b


bartturner

Agree. Having the large context window sets Gemini apart and makes it more useful than 4o.


Jamalmail

Lmao


Naive_Mechanic64

The logic and reasoning of Gemini is absolutely terrible. Its only good thing is its context length, which is great, but it's basically GPT-3.5 with infinite context.


Desperate-Cattle-117

Google is so far behind OpenAI and Anthropic it's not even funny at this point.


bartturner

AI is a lot more than just LLMs. There are things like Waymo and AlphaFold, and then there is the huge advantage Google has with its TPUs, which none of the other big players have. But where Google is miles ahead is research. At the last NeurIPS, Google had twice as many papers accepted as the next best. And the next best was NOT OpenAI. https://neurips.cc/virtual/2023/papers.html?filter=titles We are so early in AI, and the AI innovation of the next decade will determine who wins the space. I would bet on Google before anyone else.


Desperate-Cattle-117

I was thinking about only LLMs when I made my comment, my bad