LLM Reasoning: Where it is today, where it's going, and its limits

March 4, 2025

Last week, we gathered at Storytell HQ in San Mateo for another Building the Future of AI meetup, co-hosted by Storytell, FounderCulture, and UpHonest Capital. This event focused on the evolving landscape of reasoning in large language models (LLMs). 

Tim Kellogg, principal architect at Icertis, led the discussion with an insightful breakdown of where LLM reasoning stands today, where it’s heading, and the challenges it still faces.

The state of LLM reasoning today

Tim opened the conversation by defining AI reasoning and how it has evolved beyond simple pattern recognition. At its core, LLM reasoning is the systematic application of a predefined set of rules to arrive at a logical conclusion, and modern reasoning models are trained to execute these processes so they can tackle complex tasks. While early models were primarily designed to retrieve and synthesize information, modern LLMs are beginning to follow structured problem-solving processes.

One of the examples that stuck with us was the strawberry meme—a reminder of how early LLMs struggled to count the number of ‘R’s in “strawberry” due to tokenization quirks. More recent models, like Claude 3.7, can now get this right, showing improvements in how AI processes text at a deeper level.
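
A minimal sketch of why this happens, using OpenAI's open-source tiktoken library as an illustrative stand-in (it is not Claude's tokenizer, and the exact split varies by model):

```python
# The model never sees characters, only sub-word token IDs.
# Requires `pip install tiktoken`.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # GPT-4-era encoding, for illustration
word = "strawberry"

# What we see: ten characters, trivially countable.
print(word.count("r"))  # 3

# What the model sees: a few opaque token IDs, not individual letters.
token_ids = enc.encode(word)
print(token_ids)
print([enc.decode([t]) for t in token_ids])  # sub-word pieces, not letters
```

Counting the 'R's means reasoning across token boundaries, which is exactly where earlier models fell down.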

Tim also introduced the concept of procedural knowledge acquisition, where AI doesn’t just memorize facts but learns structured problem-solving methods. He referenced a study that showed how LLMs solving math problems don’t just recall answers but apply strategies from previously encountered problems—evidence that models are beginning to generalize reasoning processes.

Where AI reasoning is making an impact

We explored several areas where AI reasoning is already showing its strengths:

  • Mathematics: LLMs are improving in step-by-step problem-solving instead of just outputting final answers.
  • Law & Contracts: AI is being used to assess legal risks, analyze contracts, and highlight potential issues in complex agreements.
  • Geospatial Mapping: While still developing, AI is making progress in processing spatial relationships for navigation and logistics.
  • Robotics: Reinforcement learning is allowing robots to simulate and optimize movement more effectively.
  • AI Self-Training in Games: OpenAI’s work on AI competing against itself in Dota 2 was highlighted as a major step in self-improving AI models.

Where LLM reasoning is heading

Tim walked us through some of the recent advancements in reasoning-focused AI models, such as Claude 3.7. These models are moving beyond simple pattern matching and starting to exhibit multi-step reasoning abilities.

One key area of development is chain-of-thought reasoning. Tim explained that newer models can now autonomously break problems into logical steps, improving their ability to handle more complex reasoning tasks. Unlike earlier models that required explicit prompting to think step by step, modern LLMs are beginning to do this natively.
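
To make the shift concrete, here is a minimal sketch of explicit chain-of-thought prompting, the technique earlier models needed; the question is invented for illustration, and no vendor API call is shown since those are model-specific:

```python
# Explicit chain-of-thought prompting: append a trigger phrase so the model
# writes out intermediate steps. Newer reasoning models decompose problems
# natively, making the trigger unnecessary.
question = (
    "A train leaves at 3:15 pm and the trip takes 2 hours and 50 minutes. "
    "What time does it arrive?"
)

cot_prompt = question + "\nLet's think step by step."  # old style: forced CoT
native_prompt = question  # new style: the model reasons step by step on its own

print(cot_prompt)
```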

That said, there’s still a long way to go before AI can generate entirely new reasoning frameworks. While these models are great at optimizing existing processes, they haven’t yet demonstrated the ability to create fundamentally new problem-solving approaches the way humans can.

Current limitations of AI reasoning

Even with these advancements, AI reasoning has notable shortcomings:

  • Lack of human-like intuition: AI can recognize patterns but doesn’t truly understand concepts in the way humans do.
  • Struggles with judgment-based reasoning: While LLMs can analyze contracts or legal arguments, they lack the depth of reasoning that human experts bring.
  • Inconsistent reasoning accuracy: AI is improving, but models still occasionally make critical reasoning errors, which makes their application in high-stakes fields challenging.

Key takeaways from the Q&A

The audience had some great questions, pushing the discussion even further:

  • How well do LLMs understand and simulate human reasoning? Tim explained that while LLMs can follow structured logic, they still lack the depth of understanding that humans bring to reasoning tasks.
  • What kind of pre-processing happens before an AI processes a prompt? Tim walked through the tokenization step: text is broken into tokens before attention mechanisms interpret context, which lets the model recognize patterns, correct misspellings, and refine input before generating a response.
  • Can improvements be made to tokenization and pre-processing in LLMs? Tim pointed to research on token-free models that interpret input at the byte level (see the sketch below), which could enhance multi-modal capabilities. While promising, this approach is still experimental, and it's unclear whether it will work at scale.
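
To ground that last point, here is a minimal sketch of what byte-level, token-free input looks like (the idea behind research models such as ByT5); the code is illustrative, not any specific architecture:

```python
# Token-free input: skip the learned sub-word vocabulary and feed the model
# raw UTF-8 bytes. Every string maps losslessly to integers in 0-255, at the
# cost of much longer sequences (the scaling question Tim flagged).
text = "strawberry 🍓"
byte_ids = list(text.encode("utf-8"))

print(byte_ids)       # one integer per byte; the emoji alone takes 4 bytes
print(len(byte_ids))  # longer than a sub-word tokenization of the same text
```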

Final thoughts

The discussion reinforced that while AI reasoning is making impressive strides, it remains an evolving field. Chain-of-thought reasoning and reinforcement learning are pushing AI’s capabilities further, but human oversight and expertise are still essential.

For those working in AI, this session was a reminder that there’s still a lot of opportunity in this space. Whether you’re a researcher, engineer, or founder, there’s plenty of room to build AI systems that enhance structured decision-making across industries.

Watch the full recording:
