OpenAI this week unveiled its latest AI models, which it says are poised to bring more capable AI agents to businesses. The new reasoning models can use all of ChatGPT’s tools and can incorporate images into their reasoning, a first for the company.
The new o3 and o4-mini reasoning models are the “smartest” the company has released to date and represent a “step change” in ChatGPT’s capabilities, according to an OpenAI blog post. The models make ChatGPT more capable and bring it closer to the holistic way humans approach problems.
“There are some models that feel like a qualitative step into the future,” OpenAI President Greg Brockman said in an OpenAI video. He said GPT-4 was one of those models — and now o3 and o4-mini.
These models represent a big step toward more robust agentic AI systems that can independently execute tasks on behalf of users, the company says. With full access to ChatGPT’s built-in tools and custom tools, the models can autonomously coordinate multiple actions to solve complex problems.
OpenAI’s new models are the latest salvo in an increasingly crowded AI market.
Google DeepMind’s Project Astra, a multimodal AI assistant, comes the closest in capability and was first unveiled a year ago. Astra can see, hear and understand its surroundings. However, Astra is not as advanced in reasoning, is not agentic and hasn’t been released publicly.
In March, OpenAI CPO Kevin Weil said at a conference that while ChatGPT is currently at the top, “it doesn’t mean that we’re going to have a lead forever. I think those days of us having a 12-month lead are probably gone — there’s just too many smart people, too much going on in the ecosystem.”
Whether or not OpenAI stays in the lead, companies are all in on AI.
Nearly 90% of CFOs report that they are seeing a “very positive” ROI from generative AI, according to a February PYMNTS Intelligence CAIO Report. That’s three times as many as those who said so in March 2024.
Moreover, at least 91% of CFOs surveyed have “high” or “complete” trust in generative AI’s output in 10 key areas, partly because the AI’s responses are grounded in their companies’ own data.
However, 29% did say that the AI’s responses “might not be very insightful,” which was the top concern about generative AI’s outputs.
Read more: OpenAI Product Chief Says ChatGPT Will Become Agentic in 2025
How o3 and o4-mini Are Different
Here’s what makes these models different from OpenAI’s other models:
- They can use every tool within ChatGPT, including searching the internet, analyzing uploaded files and other data, reasoning about images and generating images.
- They incorporate images directly into their reasoning, which boosts their problem-solving skills. Images can be blurry, upside down or drawn by hand, and the models can zoom in on an image if needed. (A rough code sketch of sending an image to the models follows this list.)
- They merge the reasoning capabilities of OpenAI’s o-series AI models with the conversational abilities of the GPT series of large language models (LLMs).
- They reason through which tools to use — one task called for using 600 tools — to solve complex problems, usually within a minute. OpenAI said this translates to “significantly” better performance.
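To make the image-reasoning point concrete, here is a minimal sketch of sending a hand-drawn image to o4-mini. It assumes the official OpenAI Python SDK and an OPENAI_API_KEY environment variable; the image URL and prompt are placeholders, not an example from OpenAI’s announcement.

```python
# Minimal sketch: multimodal input via the Chat Completions API.
# Assumes: `pip install openai`, OPENAI_API_KEY set, and access to o4-mini.
# The image URL below is a placeholder.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="o4-mini",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What process does this hand-drawn diagram describe?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/whiteboard-sketch.png"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)
```

The same request pattern works with o3 as well; per the capability described above, the model itself decides whether to zoom in on the image as part of its reasoning.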
“The reason we’re so excited about tools is that it makes our reasoning models that much more useful and that much smarter,” Mark Chen, OpenAI’s chief research officer, said in the OpenAI video.
One example: A user asks, “How will summer energy usage in California compare to last year?” The model searches the internet for public utility data, writes Python code to build a forecast, generates a graph or image and explains key factors for the prediction, according to OpenAI.
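A developer-side version of that workflow might look roughly like the sketch below, which uses the Responses API with OpenAI’s hosted web search tool. This is an illustrative assumption rather than OpenAI’s published example; generating the Python forecast and chart would additionally require the hosted code interpreter tool, which is omitted here for brevity.

```python
# Rough sketch: letting a reasoning model search the web via the Responses API.
# Assumes the official OpenAI Python SDK and an OPENAI_API_KEY environment variable;
# the "web_search_preview" tool name reflects the hosted tool type at the time of writing.
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="o3",
    tools=[{"type": "web_search_preview"}],  # hosted web search tool
    input=(
        "How will summer energy usage in California compare to last year? "
        "Use public utility data and explain the key factors behind your forecast."
    ),
)

# output_text is the SDK's convenience accessor for the final text output.
print(response.output_text)
```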
As for performance, the startup said o3 makes 20% fewer major errors than predecessor o1 (o2 was skipped) on difficult, real-world tasks. It especially excelled in programming, consulting and coming up with creative ideas.
The o4-mini model, meanwhile, focuses on balancing performance with efficiency. This smaller model excels in mathematics, coding and visual analysis tasks. Its efficiency gains let o4-mini support higher usage volumes than o3-mini, making it well suited to high-volume workloads.
OpenAI said for most real-world uses, o3 and o4-mini will be cheaper than o1 and o3-mini while outperforming them on tasks.
The models are now available for ChatGPT Plus, Pro and Team users. ChatGPT Enterprise and Edu users will get them in a week. Free users can try o4-mini by selecting “Think” before entering a prompt. Developers can access the models via the Chat Completions API and Responses API.
OpenAI o3-pro should be out in a few weeks.
Source: https://www.pymnts.com/