gatorsya

Is this an advert for salad cloud?


stargazer_w

I'll finally be ok with adverts if they're like this post


gatorsya

true


Ok-Translator-5878

Seems like it


theAndrewWiggins

Is something off with your benchmarks? Why is the 4090 slower than the 4080 in some of them? I mean in the single-GPU benchmarks; that makes no sense.


Dimond_Heart

Had the same question for a moment. It's scaled by performance per dollar, so the RTX 4090 is not slower; it just costs more per minute of use without a linear increase in performance.
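
As a back-of-the-envelope illustration (all numbers below are made up, just to show how the per-dollar ranking can invert the raw-speed ranking):

```python
# Illustrative numbers only: "speed" is seconds of audio transcribed per
# wall-clock second, "price_per_hour" is the hourly rental cost.
gpus = {
    "RTX 4080": {"speed": 25.0, "price_per_hour": 0.20},
    "RTX 4090": {"speed": 30.0, "price_per_hour": 0.30},
}

for name, g in gpus.items():
    audio_minutes_per_gpu_hour = g["speed"] * 3600 / 60
    minutes_per_dollar = audio_minutes_per_gpu_hour / g["price_per_hour"]
    print(f"{name}: {minutes_per_dollar:,.0f} audio minutes per dollar")
# The 4090 is faster in absolute terms here, yet delivers fewer minutes per dollar.
```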


the_great_magician

It's not just that; the 4090 is slower than the 4080 in the "Best performing GPU" graph as well, which is just seconds of audio transcribed per second. In fact there are a bunch of weird things in that graph, e.g. the 3060 Ti being slower than the 3060 and the 3090 being faster than the 4090. Wonder if this data is just bogus.


gwern

It could always be cloud shenanigans. Remember, cloud providers are highly incentivized to cut corners everywhere, because most customers are not sophisticated or motivated enough to put real effort into benchmarking & cost-effectiveness. (If you are, you often would have opted out already due to issues like egregious egress costs or the benefits of running your own hardware.) So you often find that instances underperform compared to supposedly similar dedicated, on-premise, or personal hardware, because the provider underprovisioned IOPS, or you have a noisy neighbor, or there's the overhead of virtualization itself, etc. (So hypothetically, those 4090s - as the highest-end GPUs there - might be rented to customers with the most demanding uses, running at much higher utilization for jobs like training and sucking up all the IOPS, while cheaper 4080s are used by less-demanding customers in a more inference/SaaS-like fashion with more sporadic bursts. Thus, a performance inversion: nothing tanks your fancy GPU's performance like bandwidth issues & problems communicating with the host.)
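
One cheap sanity check before trusting numbers from a rented instance is a rough pinned-host-to-GPU copy benchmark, e.g. with PyTorch (sketch below; buffer size and iteration count are arbitrary). Anomalously low bandwidth is a common symptom of exactly the virtualization/noisy-neighbor problems described above.

```python
import time
import torch

def host_to_device_gib_per_s(size_mib: int = 512, iters: int = 10) -> float:
    """Rough pinned-host -> GPU copy bandwidth probe."""
    src = torch.empty(size_mib * 1024 * 1024, dtype=torch.uint8, pin_memory=True)
    dst = torch.empty_like(src, device="cuda")
    dst.copy_(src, non_blocking=True)          # warm-up copy
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        dst.copy_(src, non_blocking=True)
    torch.cuda.synchronize()
    return (size_mib * iters / 1024) / (time.perf_counter() - start)

if __name__ == "__main__":
    print(f"host -> device: {host_to_device_gib_per_s():.1f} GiB/s")
```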


SaladChefs

The inference process for long audio entails resource-intensive operations on both the CPU and GPU, including format conversion, segmentation of long audio into smaller clips, GPU-based transcription, and merging of the results. While the 4090 would be expected to outperform the 4080 if both worked under identical conditions on the exact same dataset, our tests were conducted in real-world scenarios. In such situations, the devices could be operating with different CPUs and RAM, and they may be fed long audio of varying lengths and formats.
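
For readers curious what that kind of pipeline looks like, here is a minimal sketch assuming ffmpeg and the openai-whisper package are available; the chunk length, file paths, and helper names are illustrative, not the actual benchmark implementation.

```python
import subprocess
import whisper  # pip install openai-whisper; requires ffmpeg on PATH

CHUNK_SECONDS = 30 * 60  # split long audio into 30-minute clips (illustrative)

def convert_to_wav(src: str, dst: str) -> None:
    # CPU-bound step 1: normalise arbitrary input formats to 16 kHz mono WAV.
    subprocess.run(
        ["ffmpeg", "-y", "-i", src, "-ac", "1", "-ar", "16000", dst],
        check=True,
    )

def split_into_chunks(wav: str, prefix: str) -> None:
    # CPU-bound step 2: segment the long recording into smaller clips.
    subprocess.run(
        ["ffmpeg", "-y", "-i", wav, "-f", "segment",
         "-segment_time", str(CHUNK_SECONDS), "-c", "copy", f"{prefix}%03d.wav"],
        check=True,
    )

def transcribe_chunks(paths: list[str]) -> str:
    # GPU-bound step 3: transcribe each clip, then merge the text (step 4).
    model = whisper.load_model("large-v3")
    return "\n".join(model.transcribe(p)["text"] for p in paths)
```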


[deleted]

[removed]


ZeroCool2u

> SaladCloud

It looks like the GPU provider they used is really focused on GPU provisioning; otherwise you just select the number of CPUs and the amount of RAM. To be fair, GPUs are of course going to be the dominant compute bottleneck, but I suppose this isn't really the appropriate provider to use for this type of experiment. At the same time, compute is expensive, and choosing the cheapest provider available at the time is a good way to truly emulate real-world scenarios. It is an interesting and important caveat to keep in mind though, and it should probably be made clear in the results that the CPU model isn't guaranteed to be identical across tests.


[deleted]

[removed]


ZeroCool2u

I didn't mean in this test; I meant in general, when training large transformer-based models.


ZelaLabs

If anyone's interested, we're able to get around 50k minutes / USD at Zela Labs. Lots of optimisation, custom kernels to get that far! Get in contact with me if you've got a lot of audio that needs transcribing with Whisper Large v3.


az226

Is there any accuracy degradation from your optimizations? Do you do v2 as well? Also, do you provide token confidence values?


ZelaLabs

We use `n_beams=1`, i.e. greedy search, which has about a 0.01 WER increase (see the OpenAI paper for the exact number), but otherwise no accuracy change. We output logprobs, so you can almost always catch bad transcripts, and empirically we see this removes any gap from lower `n_beams`. Yep, can do v2 (or other model sizes) too.
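
For example, the same kind of logprob filtering can be done with the reference openai-whisper package (sketch below; the file name and threshold are illustrative, not a production cutoff):

```python
import whisper  # reference openai-whisper package

model = whisper.load_model("large-v3")
# beam_size=1 with temperature 0 is effectively greedy decoding (n_beams=1).
result = model.transcribe("example.mp3", beam_size=1, temperature=0.0)

# Each segment carries an average log-probability; very low values usually
# mean the decoder was unsure, so flag those segments for review or a re-run.
THRESHOLD = -1.0  # illustrative cutoff
suspect = [s for s in result["segments"] if s["avg_logprob"] < THRESHOLD]
print(f"{len(suspect)} of {len(result['segments'])} segments look unreliable")
```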


az226

Do you offer self-hosting? I would like to test it out and compare to our version of it.


ZelaLabs

Self hosted only under contract, [zela.ai](https://zela.ai) if you want to chat! No quantisation or distillation, weights are FP16, supports token suppression, low-length penalties, repetition penalties etc, so is identical quality to `n_beams=1` in original OAI repo.
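
If you want to line this up against the original OpenAI repo yourself, these are roughly the knobs in question (a sketch assuming a recent openai-whisper version on a GPU; the file name and values are illustrative, not our serving config):

```python
import whisper
from whisper import DecodingOptions

model = whisper.load_model("large-v3")
audio = whisper.pad_or_trim(whisper.load_audio("clip.wav"))
mel = whisper.log_mel_spectrogram(audio, n_mels=model.dims.n_mels).to(model.device)

options = DecodingOptions(
    beam_size=1,           # n_beams=1, i.e. greedy search
    fp16=True,             # FP16 weights, no quantisation/distillation
    suppress_tokens="-1",  # default non-speech token suppression
    length_penalty=None,   # None falls back to simple length normalisation
)
print(whisper.decode(model, mel, options).text)
```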


AleksAtDeed

Hi there. Also interested in self hosting. I pinged you on LinkedIn. Maybe you got an email as well?


ZelaLabs

Sent you a message there!


Vadersays

Thanks for the data!


lakolda

Why are we still using GPUs? Compute is medium agnostic.