
Researchers at MIT and Harvard have recently found that generative artificial intelligence, despite its ability to accomplish complex tasks such as navigating New York City's streets or generating computer code, may not possess a coherent understanding of the world. In a study highlighted by MIT News, the research team found that while these AI models (specifically large language models, or LLMs, based on the transformer architecture) may appear intelligent, their internal world models can be significantly flawed.
The investigation, led by Harvard postdoc Keyon Vafa with MIT's Ashesh Rambachan and other collaborators, revealed that when the environment changes, for example when certain streets are closed, the model's navigational accuracy deteriorates sharply. The finding challenges the assumption that strong task performance means an LLM has learned general truths about the world. Rambachan told MIT News, “One hope is that, because LLMs can accomplish all these amazing things in language, maybe we could use these same tools in other parts of science, as well. But the question of whether LLMs are learning coherent world models is very important if we want to use these techniques to make new discoveries.”
To test this, the team proposed new metrics for the coherence of an AI's world model. The metrics apply to problems that can be described as deterministic finite automata (DFAs), which reduce a complex setting to a set of states and rules for moving between them. The team applied them to two such problems: wayfinding in New York City and playing the board game Othello. The first metric, sequence distinction, checks whether a model that sees two different states, such as two different Othello boards, recognizes how they differ. The second, sequence compression, checks whether the model knows that two identical states have the same set of possible next steps. Together, the two metrics assess the coherence of the internal world model the AI may have formed.
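To make the two metrics concrete, here is a minimal sketch in Python. It is not the study's implementation: the three-intersection map, the `accepts(prefix, suffix)` model interface, the `amnesiac` model, and the bounded suffix length are all assumptions chosen for illustration, and the checks are simplified pass/fail versions of sequence compression and sequence distinction.

```python
from itertools import product

# Hypothetical toy DFA: three intersections and the legal moves between them.
# TRANSITIONS[state][action] -> next state; a missing action is an illegal move.
TRANSITIONS = {
    "A": {"east": "B", "south": "C"},
    "B": {"west": "A", "south": "C"},
    "C": {"north": "A"},
}
ACTIONS = ("north", "south", "east", "west")
START = "A"


def run(sequence, start=START):
    """Return the state reached by `sequence`, or None if any move is illegal."""
    state = start
    for action in sequence:
        state = TRANSITIONS[state].get(action)
        if state is None:
            return None
    return state


def true_accepts(prefix, suffix):
    """Ground truth: is `suffix` a legal continuation after `prefix`?"""
    return run(tuple(prefix) + tuple(suffix)) is not None


def suffixes(max_len):
    """All action sequences of length 1..max_len."""
    for n in range(1, max_len + 1):
        yield from product(ACTIONS, repeat=n)


def compression_ok(accepts, p1, p2, max_len=3):
    """For two prefixes that reach the SAME true state, the model should
    accept exactly the same continuations after both."""
    return all(accepts(p1, s) == accepts(p2, s) for s in suffixes(max_len))


def distinction_ok(accepts, p1, p2, max_len=3):
    """For two prefixes that reach DIFFERENT true states, the model should
    agree with the ground truth on every continuation that tells them apart."""
    for s in suffixes(max_len):
        t1, t2 = true_accepts(p1, s), true_accepts(p2, s)
        if t1 != t2 and (accepts(p1, s), accepts(p2, s)) != (t1, t2):
            return False
    return True


def amnesiac(prefix, suffix):
    """A deliberately flawed model: it ignores the prefix and always behaves
    as if it were still at the start intersection."""
    return run(suffix) is not None


same_a, same_b = ("east", "west"), ("south", "north")  # both end at A
at_b, at_c = ("east",), ("south",)                      # end at B and at C

for name, model in (("true", true_accepts), ("amnesiac", amnesiac)):
    print(name,
          compression_ok(model, same_a, same_b),
          distinction_ok(model, at_b, at_c))
```

In this toy setting the ground-truth model passes both checks, while the amnesiac model passes compression trivially (it treats every prefix alike) and fails distinction. This is why both metrics are needed: each one catches a kind of incoherence the other can miss.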
The results were surprising. Transformers trained on data generated by random choices tended to form more accurate world models than those trained on data generated by following strategies, possibly because strategic data covers a narrower range of possible moves. "I was surprised by how quickly the performance deteriorated as soon as we added a detour. If we close just 1 percent of the possible streets, accuracy immediately plummets from nearly 100 percent to just 67 percent," said Vafa. The failures caused by these detours indicate that a transformer's strong performance on a task does not mean it has learned the rules governing that task.
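The detour test can be sketched as a small evaluation loop. The Python example below is a toy stand-in, not the study's setup: the 10-by-10 grid, the 200 sampled trips, and the "navigator" that simply replays routes memorized on the original map are all assumptions, and the accuracy it prints is not meant to reproduce the reported figures. It only shows the shape of the measurement: close a small fraction of streets, then count how many proposed routes are still valid.

```python
import random
from collections import deque


def grid_streets(n):
    """Undirected streets between neighbouring intersections on an n x n grid."""
    streets = set()
    for x in range(n):
        for y in range(n):
            if x + 1 < n:
                streets.add(frozenset({(x, y), (x + 1, y)}))
            if y + 1 < n:
                streets.add(frozenset({(x, y), (x, y + 1)}))
    return streets


def shortest_route(streets, start, goal):
    """Breadth-first search; returns a list of intersections, or None."""
    neighbours = {}
    for street in sorted(streets, key=sorted):  # sorted for reproducibility
        a, b = sorted(street)
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            route = []
            while node is not None:
                route.append(node)
                node = parent[node]
            return route[::-1]
        for nxt in neighbours.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


def route_is_valid(route, open_streets):
    """A route is valid if every consecutive hop uses a street that is open."""
    return all(frozenset({a, b}) in open_streets for a, b in zip(route, route[1:]))


rng = random.Random(0)
streets = grid_streets(10)
nodes = sorted({p for s in streets for p in s})
trips = [tuple(rng.sample(nodes, 2)) for _ in range(200)]

# Hypothetical navigator: replays routes memorized on the ORIGINAL map.
memorized = {trip: shortest_route(streets, *trip) for trip in trips}


def accuracy(open_streets):
    """Fraction of trips whose memorized route is still drivable."""
    valid = sum(route_is_valid(memorized[t], open_streets) for t in trips)
    return valid / len(trips)


# Close roughly 1 percent of streets (at least one).
closed = set(rng.sample(sorted(streets, key=sorted), k=max(1, len(streets) // 100)))

print("accuracy on original map:", accuracy(streets))
print("accuracy after closures: ", accuracy(streets - closed))
```

A navigator that had actually learned the map could reroute around a closed street; one that only reproduces familiar routes cannot, and this is the gap a detour test is designed to expose.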
The research raises significant questions about deploying LLMs in real-world scenarios, particularly in areas where an accurate representation of the world is crucial. The research team, which presented its work at the Conference on Neural Information Processing Systems, aims to extend its approach to a wider variety of problems, including those in scientific fields where the rules are only partially known. Such work will be important for improving the trustworthiness and reliability of generative AI models, especially as they become more embedded in everyday applications.









