OpenAI’s Dev Day introduces Realtime API and new features for AI app developers

It’s been a hectic week for OpenAI, filled with executive departures and major fundraising developments, but the startup is back in front of developers at its 2024 Dev Day, making the case for building tools with its AI models. On Tuesday, the company announced several new tools, including the public beta of its Realtime API, aimed at helping developers create apps with low-latency, AI-generated voice responses. It’s not quite ChatGPT’s Advanced Voice Mode, but it’s getting close.

During a pre-event briefing, OpenAI’s Chief Product Officer Kevin Weil said that the recent departures of Chief Technology Officer Mira Murati and Chief Research Officer Bob McGrew wouldn’t affect the company’s progress.

“I’ll start by saying that Bob and Mira were great leaders. I learned a lot from them, and they were a big part of getting us to where we are today,” he said. “Also, we are not going to slow down.”

As OpenAI navigates another C-suite shake-up reminiscent of the chaos following last year’s Dev Day, the company is trying to convince developers that it still offers the best platform for building AI apps. The startup claims more than 3 million developers are building with its AI models, but it operates in an increasingly competitive space.

OpenAI noted that it has cut the cost of developer access to its API by 99% over the past two years, likely under pressure from competitors like Meta and Google, which keep lowering their own prices.

One of OpenAI’s new features, the Realtime API, lets developers build nearly real-time, speech-to-speech experiences in their apps, with a choice of six voices provided by OpenAI. These voices are distinct from those offered for ChatGPT, and developers can’t bring in third-party voices, a restriction meant to head off copyright issues. (So, no Scarlett Johansson sound-alike is available.)

During the briefing, Romain Huet, OpenAI’s Head of Developer Experience, demonstrated a trip-planning app built with the Realtime API. Users could talk to an AI assistant about an upcoming trip to London and receive low-latency responses. The Realtime API also has access to tools, so the app could render maps with restaurant locations as part of its answer.
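For developers curious what wiring this up looks like, here is a minimal sketch of opening a Realtime API session over WebSocket. The endpoint, model name, and event shapes follow OpenAI’s beta documentation at launch and should be treated as assumptions that may change:

```python
# A minimal sketch of a Realtime API session over WebSocket (beta).
# Endpoint, model name, and event shapes are assumptions based on the
# docs at launch and may change.
import asyncio
import json
import os

import websockets  # pip install websockets


async def main():
    url = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"
    headers = {
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "OpenAI-Beta": "realtime=v1",
    }
    # Note: websockets >= 14 renames extra_headers to additional_headers.
    async with websockets.connect(url, extra_headers=headers) as ws:
        # Pick one of the six preset voices and ask for spoken output.
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {"voice": "alloy", "modalities": ["text", "audio"]},
        }))
        # Request a response; a real app would stream microphone audio
        # in via input_audio_buffer.append events instead.
        await ws.send(json.dumps({
            "type": "response.create",
            "response": {"instructions": "Suggest three things to do in London."},
        }))
        async for message in ws:
            event = json.loads(message)
            # Audio arrives incrementally as response.audio.delta events.
            if event["type"] == "response.done":
                break


asyncio.run(main())
```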

At another point, Huet showed how the Realtime API could hold a phone conversation with a human about placing a food order for an event. Unlike Google’s infamous Duplex, OpenAI’s API can’t call restaurants or shops directly, but it can integrate with calling APIs like Twilio to do so. Notably, OpenAI isn’t adding any disclosure that automatically identifies its AI models as the speaker on such calls, despite how realistic the generated voices sound. For now, it appears to be the developer’s responsibility to include that disclosure, which a new California law may require.

As part of its Dev Day announcements, OpenAI also introduced vision fine-tuning to its API, which lets developers use images as well as text to fine-tune GPT-4o. This should help developers improve GPT-4o’s performance on tasks that involve visual understanding. OpenAI’s head of API product, Olivier Godement, told TechCrunch that developers won’t be able to upload copyrighted images (like a picture of Donald Duck), images depicting violence, or other images that violate OpenAI’s safety policies.
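To give a sense of what training data for this looks like, here is a minimal sketch of a single example in the JSONL chat format OpenAI documented for GPT-4o fine-tuning; the image URL and labels are placeholders:

```python
# A minimal sketch of one vision fine-tuning training example in the
# JSONL chat format documented for GPT-4o; the image URL and labels
# here are placeholders.
import json

example = {
    "messages": [
        {"role": "system", "content": "You identify landmarks in photos."},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Which landmark is this?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/big-ben.jpg"},
                },
            ],
        },
        {"role": "assistant", "content": "That is Big Ben in London."},
    ]
}

# The training file contains one JSON object like this per line.
with open("train.jsonl", "w") as f:
    f.write(json.dumps(example) + "\n")
```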

OpenAI is racing to keep pace with rivals in the AI model licensing space, which is already crowded with offerings. Its new prompt caching feature mirrors one Anthropic launched months ago, letting developers cache frequently used context between API calls to cut costs and improve latency. OpenAI says developers can save 50% on cached input with the feature, while Anthropic promises up to a 90% discount.
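Because OpenAI applies the caching automatically to long, repeated prompt prefixes, taking advantage of it is mostly a matter of request structure. A minimal sketch, assuming the openai Python SDK and the cached-token usage field described in the docs at launch:

```python
# A minimal sketch of structuring requests to benefit from prompt
# caching, which OpenAI applies automatically to long, repeated prompt
# prefixes (roughly 1,024+ tokens). The cached-token usage field is an
# assumption based on the API docs at launch.
from openai import OpenAI  # pip install openai

client = OpenAI()

# Imagine a long, stable system prompt (tool definitions, style guides,
# few-shot examples); keeping it identical across calls is what makes
# the prefix cacheable.
STATIC_SYSTEM_PROMPT = "You are a travel-planning assistant. ..."

for question in ["Best time to visit London?", "Cheap eats near Soho?"]:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": STATIC_SYSTEM_PROMPT},
            # Variable content goes last so it doesn't break the prefix.
            {"role": "user", "content": question},
        ],
    )
    # Reports how many input tokens were served from the cache.
    print(response.usage.prompt_tokens_details.cached_tokens)
```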

Finally, OpenAI is introducing model distillation, which lets developers use larger AI models, like o1-preview and GPT-4o, to improve smaller ones, such as GPT-4o mini. Running smaller models is typically cheaper than running larger ones, and this feature should let developers close some of the performance gap. As part of the distillation rollout, OpenAI is launching a beta evals tool so developers can measure how their fine-tunes perform within OpenAI’s API.
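The flow OpenAI describes is to log a large model’s outputs and then fine-tune a smaller model on them. A minimal sketch, assuming the openai Python SDK’s store flag and a placeholder training-file ID:

```python
# A minimal sketch of the distillation flow as described: log a large
# model's outputs, then fine-tune a smaller model on them. The store
# flag follows OpenAI's stored-completions docs; the training-file ID
# is a placeholder.
from openai import OpenAI

client = OpenAI()

# 1) Persist the big model's answers so they can become training data.
completion = client.chat.completions.create(
    model="gpt-4o",
    store=True,  # saves the completion for later export
    metadata={"task": "trip-planning"},
    messages=[{"role": "user", "content": "Plan one day in London."}],
)

# 2) Fine-tune the smaller model on a file exported from those stored
# completions (exported via the dashboard or API).
job = client.fine_tuning.jobs.create(
    model="gpt-4o-mini-2024-07-18",
    training_file="file-abc123",  # placeholder file ID
)
print(job.id)
```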

Dev Day may make the biggest waves for what wasn’t announced. For example, there was no news about the GPT Store unveiled during last year’s Dev Day. The last we heard, OpenAI was running a revenue-sharing program with some of the most popular GPT creators, but the company hasn’t said much since.

Moreover, OpenAI says it isn’t releasing any new AI models at this year’s Dev Day. Developers waiting for o1 (not the preview or mini versions) or the startup’s video generation model, Sora, will have to wait a little longer.
