OpenAI’s ChatGPT has guardrails that are supposed to stop users from generating information that could be used for catastrophic purposes, like making a biological or nuclear weapon.
But those guardrails aren’t perfect. Some models ChatGPT uses can be tricked and manipulated.
In a series of tests conducted on four of OpenAI’s most advanced models, two of which are available in the company’s popular ChatGPT, NBC News was able to generate hundreds of responses with instructions on how to create homemade explosives, maximize human suffering with chemical agents, create napalm, disguise a biological weapon and build a nuclear bomb.
Those tests used a simple prompt, known as a “jailbreak,” which is a series of words that any user can send to a chatbot to bypass its security rules. Researchers and frequent users of generative artificial intelligence have publicly documented the existence of thousands of jailbreaks. NBC News is withholding the specifics of its prompt, as OpenAI appears not to have fixed it in several of the tested models.
In one response, the chatbot gave steps to make a pathogen to target the immune system. In another, it advised on which chemical agents would maximize human suffering.
NBC News sent the findings to OpenAI after the company put out a call for vulnerability submissions in August. An OpenAI spokesperson told NBC News that asking its chatbots for help with causing mass harm violates its usage policies (a user who repeatedly asks questions that appear designed to cause harm could be banned, for example). The spokesperson added that the company is constantly refining its models to address such risks and that it regularly hosts events like the vulnerability challenges to reduce the chances that bad actors will break its chatbots.
The stakes of such vulnerabilities are getting higher. OpenAI, Anthropic, Google and xAI, the companies behind four of the top AI models, have each said this year that they have enacted additional safeguards to address concerns that their chatbots could be used to help an amateur terrorist create a bioweapon.
NBC News also tested the jailbreak on the latest major versions of Anthropic’s Claude, Google’s Gemini, Meta’s Llama and xAI’s Grok with a series of questions about how to create a biological weapon, a chemical weapon and a nuclear weapon. All declined to provide such information.
“Historically, having insufficient access to top experts was a major blocker for groups trying to obtain and use bioweapons. And now, the leading models are dramatically expanding the pool of people who have access to rare expertise,” said Seth Donoughe, the director of AI at SecureBio, a nonprofit organization working to improve biosecurity in the United States. Though such information has long existed in corners of the internet, the advent of advanced AI chatbots marks the first time in human history that anyone with internet access can get a personal, automated tutor to help them understand it.
OpenAI’s o4-mini, GPT-5-mini, oss-20b and oss-120b models all consistently agreed to help with extremely dangerous requests.
Currently, ChatGPT’s flagship model is GPT-5, which OpenAI says has ChatGPT’s top research capability. That model doesn’t appear to be susceptible to the jailbreak method NBC News found. In 20 tests, it declined to answer harmful questions each time.
But in certain circumstances, GPT-5 routes queries among several different models. One of them, GPT-5-mini, is a faster, more cost-efficient version of GPT-5 that the system falls back on after users hit certain usage limits (10 messages every five hours for free users, or 160 messages every three hours for paid ChatGPT Plus users). GPT-5-mini was tricked 49% of the time in NBC News’ tests.
An older model that’s still available in ChatGPT and is still preferred by some users, o4-mini, was tricked even more frequently: 93% of the time.
The oss-20b and oss-120b models can be downloaded free of charge; they are used primarily by developers and researchers, but they are available for anyone to access.
Hackers, scammers and online propagandists are increasingly using large language models (LLMs) as part of their operations, and OpenAI releases a report each quarter detailing how those bad actors have tried to exploit versions of ChatGPT. But researchers are concerned that the technology could be put to far more destructive ends.
To jailbreak ChatGPT, NBC News asked the models an innocuous question, included the jailbreak prompt and then asked an additional question that would normally trigger a refusal for violating safety policies, such as a request for instructions on how to create a dangerous poison or how to defraud a bank. Most of the time, the trick worked.
Two of the models, oss-20b and oss-120b, proved particularly vulnerable to the trick. The jailbreak persuaded those chatbots to give clear instructions in response to harmful queries 243 out of 250 times, or 97.2%.