
A new study from New York University's Stern School of Business, conducted in collaboration with private credit and wealth startup Goodfin, has found that large language models (LLMs) can now pass mock versions of the Chartered Financial Analyst (CFA) Level III exam, one of the most demanding tests of analytical and ethical reasoning in finance.
The results have drawn attention for suggesting that AI may be capable of tackling complex professional exams. But as NYU Stern Professor Srikanth Jagabathula, a co-author of the study, told PYMNTS, the findings represent progress in AI reasoning, not readiness for real-world financial decision-making.
What the Study Found
“In traditional machine-learning systems, we have some understanding of their behavior, but not a complete understanding yet,” Jagabathula said. “With these new LLM-based systems, because the models are huge, they raise other types of questions. The first thing to ask is: Do they even have the capabilities to start with?”
That question helped shape the study, which evaluated 23 state-of-the-art large language models across multiple-choice and essay-style mock CFA Level III exams. The questions were drawn from professional CFA preparation materials and reflected the structure and difficulty of the official exam. Earlier research found that language models could pass mock versions of the CFA Level I and II exams but struggled with Level III. This study is the first to show models reaching passing scores on Level III mock exams, which test higher-order reasoning. Researchers assessed leading frontier models from OpenAI, Google and Anthropic, along with several open-source systems.
In the study, essay questions revealed the clearest differences in reasoning. Higher-performing models produced structured, coherent answers, while lower performers often gave incomplete or inconsistent responses. The authors said essays remain the best measure of reasoning progress because they test judgment rather than memorization, a skill that is essential for providing sound financial advice.
Real-World Implications
While the study shows that advanced AI systems can reach the passing threshold on mock CFA exams, it cautions that this does not mean they are ready for licensed financial work. The models can follow reasoning patterns but lack contextual awareness and ethical judgment.
“On a day-to-day basis these models have impressive capabilities,” Jagabathula said. “But there are still key limitations, especially in high-stakes settings. Some components in financial advising can be automated right now, but by no means do we expect them to fully take over. We’re not seeing clear evidence of that.”
For Goodfin, which focuses on private credit and wealth solutions, this collaboration reflected an interest in exploring how AI can be applied responsibly in finance. Shilpi Nayak, CTO and co-founder of Goodfin, said the company views such research as useful for improving transparency and access in financial decision-making. “This research highlights how AI can support more reliable solutions in financial services as the technology continues to mature,” Nayak added.
When asked how his students view these advances, Jagabathula said, “It’s a combination of anxiety and optimism. The optimism comes from how empowering these tools can be, especially for students who aren’t from technical backgrounds. The anxiety comes from uncertainty, because nobody knows exactly where things are headed.”
He said that sense of uncertainty extends to the broader field of AI. It remains difficult to predict how advanced these systems will become in the next few years. “What we want people to take away from this is that certain components can already be automated, but not everything should be,” he said. “It’s a proof of progress, but not the end of the story.”
Source: https://www.pymnts.com/