A small lab in China has shaken Silicon Valley.
The sudden appearance of an advanced AI assistant from DeepSeek, a previously little-known company in the Chinese city of Hangzhou, has sparked discussion and debate within the U.S. tech industry about what it says about the broader AI development race.
DeepSeek’s assistant hit No. 1 on the Apple App Store in recent days, and the AI models powering it are already outperforming top U.S. models on several benchmarks — while the company says they were built with a fraction of the resources.
DeepSeek released its latest large language model, R1, a week ago. Second only to OpenAI’s o1 model in the Artificial Analysis Quality Index, a well-followed independent AI analysis ranking, R1 is already beating a range of other models including Google’s Gemini 2.0 Flash, Anthropic’s Claude 3.5 Sonnet, Meta’s Llama 3.3-70B and OpenAI’s GPT-4o.
“DeepSeek R1 is AI’s Sputnik moment,” entrepreneur Marc Andreessen, known for cowriting Mosaic, one of the world’s first web browsers, wrote Sunday on X — likening it to the 1957 Soviet satellite launch that kicked off the space race and forced the U.S. to realize its technological abilities were not unassailable.
On Monday, DeepSeek released yet another AI model, Janus-Pro-7B, which is multimodal, meaning it can process various types of media, including images. The company says it “surpasses previous unified model and matches or exceeds the performance of task-specific models.”

Tech stocks dropped sharply Monday, with the Nasdaq Composite declining 3.4% just minutes into the trading day. Big U.S. tech companies are investing hundreds of billions of dollars into AI technology.
One of R1’s core capabilities is its ability to explain its thinking through chain-of-thought reasoning, which breaks complex tasks into smaller steps. This method enables the model to backtrack and revise earlier steps — mimicking human thinking — while letting users follow its rationale.
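In practice, chain-of-thought output reads like the model narrating intermediate steps before giving a final answer. As a rough illustration of the idea only — not DeepSeek’s actual implementation — a toy solver can record each step of a multi-step calculation so the result can be traced back through, and checked against, its reasoning:

```python
# Toy sketch of chain-of-thought-style reasoning (illustrative only):
# the "model" solves a multi-step arithmetic problem while recording a
# human-readable trace of each intermediate step.

def solve_with_trace(prices, quantities, discount):
    """Compute a discounted total, keeping a step-by-step reasoning trace."""
    trace = []
    subtotal = 0
    for price, qty in zip(prices, quantities):
        line = price * qty
        trace.append(f"{qty} x {price} = {line}")
        subtotal += line
    trace.append(f"subtotal = {subtotal}")
    total = subtotal * (1 - discount)
    trace.append(f"apply {discount:.0%} discount -> {total}")
    return total, trace

total, trace = solve_with_trace([3, 5], [2, 4], 0.10)
for step in trace:
    print(step)
# Any step in the trace can be revisited to verify — or revise — the answer.
```

The point of the exposed trace is the same one the article describes: an erroneous intermediate step is visible and can be corrected, rather than hidden inside a single opaque answer.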
At last week’s World Economic Forum in Switzerland, Microsoft CEO Satya Nadella — whose company is one of OpenAI’s biggest investors — called DeepSeek’s new model “super impressive,” adding that he believes “we should take the developments out of China very, very seriously.”
Both R1 and o1 are part of an emerging class of “reasoning” models meant to solve more complex problems than previous generations of AI models. But unlike OpenAI’s o1, DeepSeek’s R1 is free to use and open weight, meaning anyone can study and copy how it was made.
R1 was based on DeepSeek’s previous model V3, which had also outscored GPT-4o, Llama 3.3-70B and Alibaba’s Qwen2.5-72B, China’s previous leading AI model. Upon its release in late December, V3 was performing on par with Claude 3.5 Sonnet.
Part of what makes R1 so impressive is DeepSeek’s claims about its development.
V3 took only two months and less than $6 million to build, according to a DeepSeek technical report, even as leading tech companies in the United States continue to spend billions of dollars a year on AI. DeepSeek also had to navigate U.S. export restrictions that limited access to the best AI computing chips, forcing the company to build its models with less-powerful chips.

