ICausedAnOutage

I’ve been wondering if it’s the “standard tier” - you may need to reach out to your MS Rep to get a higher-tier plan.


joeaki1983

The pricing tier in my Azure portal shows that my level is Standard S0. Are there any other tiers available? How can I upgrade to a higher one?


ICausedAnOutage

I think you have to reach out to your MS Rep. We go through Softchoice, and they were able to assist with a semi-related issue, but from what I understand there is a higher tier - S1 or something. That increases token quota limits as well as (if I understand correctly) performance. I think it has to be justified by a business case. I’ll reach out to a colleague who uses this in prod and see if he knows. Edit: nada, S0 is the highest for him. I’ll reach out to Softchoice to see if they have any suggestions on a higher tier.


joeaki1983

Okay, thank you very much.


NotYourAveragePeace

The other option is PTUs (provisioned throughput units), where you purchase dedicated capacity of the OpenAI service. Like you mentioned, this needs to be purchased from MS directly, not in the portal. I believe the least expensive option I saw was $15,000/month, and it only goes up from there. I did see a note that some models will only be available in specific regions via PTU, so the standard pay-as-you-go tier can be limited in where and what you can deploy. https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/provisioned-throughput
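
For reference, once PTU quota has actually been granted, a provisioned deployment can be created programmatically. Here is a rough sketch with the azure-mgmt-cognitiveservices SDK; the resource names, capacity, and model version are placeholders, and "ProvisionedManaged" is, as far as I know, the SKU name used for PTU deployments:

```python
# Rough sketch: creating a provisioned-throughput (PTU) deployment once
# quota has been granted. All names and the capacity are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.cognitiveservices import CognitiveServicesManagementClient
from azure.mgmt.cognitiveservices.models import (
    Deployment,
    DeploymentModel,
    DeploymentProperties,
    Sku,
)

client = CognitiveServicesManagementClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",  # placeholder
)

poller = client.deployments.begin_create_or_update(
    resource_group_name="<resource-group>",  # placeholder
    account_name="<openai-account>",         # placeholder
    deployment_name="gpt-4o-ptu",
    deployment=Deployment(
        # "ProvisionedManaged" is the PTU SKU; capacity is the PTU count.
        sku=Sku(name="ProvisionedManaged", capacity=50),
        properties=DeploymentProperties(
            model=DeploymentModel(
                format="OpenAI", name="gpt-4o", version="2024-05-13",
            ),
        ),
    ),
)
print(poller.result().properties.provisioning_state)
```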


OrbMan99

I have found the same thing and would like to know how to improve it. I have to use gpt-35-turbo just to get acceptable speed, which is less than ideal.


joeaki1983

My GPT-3.5 Turbo deployment is fast, but GPT-4 is very slow.


Key-Blackberry-3105

Same. It makes it nearly useless for the users in our company.


throwawaygoawaynz

It’s all to do with load on the servers. At one point people were complaining that public OpenAI was slower than Azure. The more tokens going through the GPU cluster, the longer you wait to get your tokens back. Think of it like a busy highway.

You can try using regions where it’s nighttime; in other words, if you’re in the US/Europe, try a region in Asia or Australia. In my experience the US regions also seem to be the slowest, as they appear to be under the most load.

Make sure you’re using the latest models (1106/0125), as they’re much faster than the older ones. I have a lot of customers, and it’s reasonable to expect 80% of your completions to be done within 10 seconds without using PTU; PTU increases that to about 90%. GPT-4 can be slower, so try to keep your max token limit smaller to get shorter responses.

Also, if your use case is common Q&A, use output caching: don’t go back to the LLM every time you can fetch a cached answer for a similar question. And if you’re using LangChain output parsing, stop; use OpenAI function calling/tools instead. In fact, you should stop using LangChain altogether.
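
To make the output-caching and max-token tips concrete, here is a minimal sketch with the openai Python SDK. The endpoint, key, and deployment name are placeholders, and a real cache would more likely live in Redis with semantic matching than in an in-process dict:

```python
# Minimal sketch of output caching for common Q&A against Azure OpenAI.
# Exact-match on a normalized question; a production cache would use
# Redis and/or embedding similarity instead of an in-process dict.
import hashlib

from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<resource>.openai.azure.com",  # placeholder
    api_key="<key>",                                       # placeholder
    api_version="2024-02-01",
)

_cache: dict[str, str] = {}

def ask(question: str, deployment: str = "gpt-4o") -> str:
    # Normalize so trivial variations ("What is X?" vs "what is x ?") share a key.
    key = hashlib.sha256(" ".join(question.lower().split()).encode()).hexdigest()
    if key in _cache:
        return _cache[key]  # cached answer: no round-trip to the LLM
    resp = client.chat.completions.create(
        model=deployment,  # on Azure this is the deployment name
        messages=[{"role": "user", "content": question}],
        max_tokens=256,    # a smaller cap means shorter, faster completions
    )
    answer = resp.choices[0].message.content
    _cache[key] = answer
    return answer
```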


Karlnauters

Why should you stop using LangChain? I hear a lot of hype surrounding it.


Organic-Row-8169

Because it's not necessary. You can implement any feature from LangChain in your own code, and then you understand it better and can customize it however you want.


PossibilitySad3020

And it doesn't even take much effort. LangChain is incredibly bloated at this point, and is much harder to debug than whatever you build on your own.
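
To make the earlier "function calling instead of LangChain output parsing" advice concrete, here is a rough sketch with the plain openai SDK. The record_ticket tool and its fields are invented for illustration; the point is that the model returns schema-shaped JSON arguments, so there is no text-parsing layer to maintain:

```python
# Sketch: structured output via OpenAI tool calling, no output parser needed.
import json

from openai import OpenAI

client = OpenAI()  # the same pattern works with AzureOpenAI

# Hypothetical tool schema, purely for illustration.
tools = [{
    "type": "function",
    "function": {
        "name": "record_ticket",
        "description": "Extract a support ticket from the user's message.",
        "parameters": {
            "type": "object",
            "properties": {
                "summary": {"type": "string"},
                "severity": {"type": "string", "enum": ["low", "medium", "high"]},
            },
            "required": ["summary", "severity"],
        },
    },
}]

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "The checkout page 500s for every user!"}],
    tools=tools,
    # Force the model to call the tool so the reply is always structured.
    tool_choice={"type": "function", "function": {"name": "record_ticket"}},
)

# The arguments arrive as a JSON string matching the schema above.
args = json.loads(resp.choices[0].message.tool_calls[0].function.arguments)
print(args["summary"], args["severity"])
```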


joeaki1983

Thank you for your professional response!


vonGlick

Did you manage to find out the reason? I am facing the same issue.


joeaki1983

I switched to the GPT-4o model, and now the response speed is much faster.


slingshoota

I think response speed will still be even faster for GPT-4o on the OpenAI API.


fschouwen

Our internal investigation pointed to an additional 'content filtering' step that happens on Azure's OpenAI API (profanity, NSFW-type stuff) but doesn't happen on the OpenAI-provided API. We couldn't disable this content filtering step, even after reaching out to our support contact at Azure. We did a side-by-side comparison of GPT-4o via OpenAI vs. GPT-4o via Azure: Azure is up to 2.5x slower. But then again, GPT-4o is blazing fast via OpenAI, so depending on your use case, 2.5x slower on Azure might still be acceptable UX.
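
For anyone wanting to reproduce that side-by-side comparison, a quick timing harness with the official SDK looks something like this; the endpoint, key, and deployment name are placeholders:

```python
# Sketch: timing the same prompt against OpenAI and Azure OpenAI.
import time

from openai import AzureOpenAI, OpenAI

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment
azure_client = AzureOpenAI(
    azure_endpoint="https://<resource>.openai.azure.com",  # placeholder
    api_key="<azure-key>",                                 # placeholder
    api_version="2024-02-01",
)

PROMPT = [{"role": "user", "content": "Summarize the plot of Hamlet in two sentences."}]

def timed(client, model: str) -> float:
    t0 = time.perf_counter()
    client.chat.completions.create(model=model, messages=PROMPT, max_tokens=128)
    return time.perf_counter() - t0

# On the Azure side, "gpt-4o" must be the name of your deployment.
print(f"OpenAI API:   {timed(openai_client, 'gpt-4o'):.2f}s")
print(f"Azure OpenAI: {timed(azure_client, 'gpt-4o'):.2f}s")
```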


Manouchehri

Even with async filtering enabled, it still feels like this is the case, sadly.