SAN DIEGO — Over the past week, academics, startup founders and researchers representing industrial titans from around the globe descended on sunny San Diego for the top gathering in the field of artificial intelligence.
The Neural Information Processing Systems, or NeurIPS, conference has been held for 39 years, but it drew a record-breaking 26,000 attendees this year, twice as many as just six years ago.
Since its founding in 1987, NeurIPS has been devoted to researching neural networks and the interplay among computation, neurobiology and physics. While neural networks, computational structures inspired by human and animal cognitive systems, were once an esoteric academic fixation, their role underpinning AI systems has transformed NeurIPS from a niche meeting in a Colorado hotel to an event filling the entire San Diego Convention Center — also home to the world-famous Comic-Con.
But even as the gathering boomed along with the AI industry and sessions on hyperspecific topics like AI-created music proliferated, one of the buzziest points of discussion was basic and foundational to the field of AI: the mystery around how frontier systems actually work.
Most — if not all — leading AI researchers and CEOs readily admit that they do not understand how today’s most advanced AI systems function. The pursuit of understanding models’ internal structure is called interpretability, reflecting the desire to “interpret” how the models function.
Shriyash Upadhyay, an AI researcher and co-founder of an interpretability-focused company called Martian, said the interpretability field is still in its infancy: “People don’t really understand fully what the field is about. There’s a lot of ferment in ideas, and people have different agendas.”
“In traditional incremental science, for example, where the ideas are mostly settled, scientists might attempt to add an additional decimal point of accuracy of measurement to particular properties of an electron,” Upadhyay said.
“With interpretability, we’re in the phase of asking: ‘What are electrons? Do electrons exist? Are they measurable?’ It’s the same question with interpretability: We’re asking, ‘What does it mean to have an interpretable AI system?’” Upadhyay and Martian used the NeurIPS occasion to launch a $1 million prize to boost interpretability efforts.
As the conference unfolded, leading AI companies’ interpretability teams signaled new and diverging approaches to understanding how their increasingly advanced systems work.
Early last week, Google’s interpretability team announced a significant pivot, shifting away from attempts to understand every part of a model toward more practical methods focused on real-world impact.
Neel Nanda, one of Google’s interpretability leaders, wrote in a statement that “grand goals like near-complete reverse-engineering still feel far out of reach” given that “we want our work to pay off within ~10 years.” Nanda highlighted AI’s rapid progress and lackluster advancements on the team’s previous, more “ambitious reverse-engineering” approach as reasons for the switch.
On the other hand, OpenAI’s head of interpretability, Leo Gao, announced Friday, and discussed at NeurIPS, that he was doubling down on a deeper, more ambitious form of interpretability “to fully understand how neural networks work.”
Adam Gleave, an AI researcher and co-founder of the FAR.AI research and education nonprofit organization, said he was skeptical of the ability to fully understand models’ behavior: “I suspect deep-learning models don’t have a simple explanation — so it’s simply not possible to fully reverse engineer a large-scale neural network in a way that is comprehensible to a person.”
Despite the barriers to fully understanding such complex systems, Gleave said he was hopeful that researchers would still make meaningful progress on many levels in understanding how models behave, which would help researchers and companies build more reliable and trustworthy systems.
“I’m excited by the growing interest in issues of safety and alignment in the machine-learning research community,” Gleave told NBC News, though he noted that NeurIPS meetings dedicated to increasing AI capabilities were so large that they “took place in rooms that could double as aircraft hangars.”
Beyond the uncertainty about how models behave, most researchers are also unimpressed with current methods for evaluating and measuring AI systems’ capabilities.
“We don’t have the measurement tools to measure more complicated concepts and bigger questions about models’ general behavior, things like intelligence and reasoning,” said Sanmi Koyejo, a professor of computer science and leader of the Trustworthy AI Research Lab at Stanford University.
“Lots of evaluations and benchmarks were built for a different time when researchers were measuring specific downstream tasks,” Koyejo said, emphasizing the need for more resources and attention to create new, reliable and meaningful tests for AI systems.