Informed Pulse

OpenAI's o3 model: a new dawn for artificial intelligence and AGI

By Joseph Nordqvist

OpenAI's latest announcement has triggered a mix of curiosity and awe. After months of suspense, the AI startup officially announced the o3 model family, which includes the flagship o3 and its compact counterpart, o3-mini.

It follows the o1 "reasoning" model that came earlier in the year. But why "o3" and not "o2"? OpenAI CEO Sam Altman hinted that a trademark conflict with a British telecom might have played a part in the number skip.

How well does o3 perform?

OpenAI describes o3 as a step closer to Artificial General Intelligence, or AGI for short. Put simply, AGI refers to AI that matches or surpasses the cognitive abilities of the human brain. No AI system has achieved this so far, though some argue that o3 changes that.

The company showed very impressive benchmark results. On certain tests, o3 soared well beyond o1's records.

On the SWE-bench Verified software engineering benchmark, o3 reached an accuracy of 71.7%, well above o1's previous best. On the competitive programming site Codeforces, the model earned an Elo rating of 2727, a figure that puts it right at the top, even surpassing some human experts.
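To give a feel for what a rating gap of this size means, here is a minimal sketch using the standard Elo expected-score formula. It rests on assumptions: Codeforces uses its own Elo-like rating system, so the formula is only an approximation there, and the opponent rating of 2400 is a hypothetical value chosen for illustration, not a figure from OpenAI.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of player A against player B.

    The result lies between 0 and 1 and can be read as A's expected
    share of the points over many head-to-head contests.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


if __name__ == "__main__":
    o3_rating = 2727        # figure quoted in the article
    opponent_rating = 2400  # hypothetical strong human competitor
    share = expected_score(o3_rating, opponent_rating)
    print(f"Expected score against a {opponent_rating}-rated player: {share:.2f}")
```

Under these assumptions, a 2727-rated competitor would be expected to take roughly 87% of the points against a 2400-rated one, while two equally rated competitors would split them evenly.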

Mathematics is another area where o3 made a strong impression. The new model excelled at the AIME 2024 math exam, scoring 96.7%. On GPQA Diamond, a set of graduate-level science questions, it achieved a score of 87.7%.

On Epoch AI's tricky FrontierMath benchmark -- where other models barely approach 2% -- o3 clocked in at 25.2%. It's not just a jump, it's a leap. It suggests that the technology can solve complex problems that left previous systems stumped.

What has led many to say that o3 has achieved AGI is its performance on the ARC-AGI Public Benchmark - a widely cited test of progress toward AGI.

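o3 scored 87.5% on the ARC-AGI Public Benchmark in "High" compute mode, which is above the 85% level cited for human performance. For many observers, beating that threshold is the reason to argue that o3 has surpassed human cognitive abilities on this test and reached AGI, although a single benchmark score does not settle the question.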

Considerations

This might sound almost too good to be true, and perhaps caution is warranted. Internal numbers are always worth testing outside the lab. Still, these scores indicate a notable shift in how AI handles tough tasks.

These advances also come bundled with serious considerations. o3 is described as a reasoning model that "thinks" before responding. It checks its work, so to speak, which can take longer. Rather than providing a quick response, o3 might pause and consider multiple angles before it provides a final answer. This slow-and-steady approach can pay off in correctness. However, it also raises new safety questions. Reasoning models, as noted by early testers, can exhibit behavior that feels more manipulative.

OpenAI says it is experimenting with a "deliberative alignment" method to ensure o3 adheres to safety rules. The company is also inviting external researchers to test the new model, hoping to catch potential mishaps before a larger release.

This cautious rollout might reflect the tension at the company: they want to push forward, but they remain aware that advanced reasoning can turn messy if not handled well.

o3-mini slated to release by late January, with o3 releasing sometime after

For now, o3 and o3-mini remain behind closed doors for most. The plan is to let safety testers have a go, then possibly release o3-mini by late January and o3 sometime after. Whether these models completely redefine what AI can do, or just stand as another step along a winding path, is anyone's guess for now. One thing seems almost certain, though: 2025 will be another year of significant AI advancements.
