What happens when Uncle Sam hallucinates? Government agencies are embracing generative AI tools that sometimes lose touch with reality


President Biden’s recent executive order on AI is a reminder that the federal government is now a top customer for artificial intelligence (AI) products, spending at least $1.6 billion on AI in 2023. The Pentagon uses an AI tool called GAMECHANGER to help officials figure out how to spend taxpayers’ money. Other agencies use AI to process Freedom of Information Act (FOIA) requests, regulate medical treatments, and streamline patent applications.

That’s a good thing: 92% of Americans want to see the government embrace digital innovation, and AI offers a potent way to streamline government operations. But while AI and machine learning are important tools, federal agencies are increasingly turning their attention to generative AI (GenAI), the technology behind conversational tools such as ChatGPT.

The problem is that while GenAI models are useful, they are also prone to making things up. These errors, known as hallucinations, occur when a GenAI tool produces content that sounds right but is actually a work of fiction–with no indication that the output is uncertain or inaccurate.

If you’re using GenAI to write ad copy, an occasional auto-generated snafu might not be such a big deal. (Then again, Google lost $100 billion in value after its Bard AI made up facts in a promo video and its representatives promised “a rigorous testing process … to make sure Bard's responses meet a high bar for quality, safety and groundedness in real-world information.”)

However, for the government, AI hallucinations could become a huge problem. Americans need to be able to trust the information they get from their leaders. Citizens also need policymakers to use accurate information when making decisions on their behalf.

Governmental hallucinations could prove particularly costly for businesses. One of the most important use cases for federal AI lies in helping companies identify and comply with the regulations that impact their business. If government tools start confabulating, companies will struggle to adapt to new rulemaking, leaving them vulnerable to serious liability and compliance risks.

The problem with hallucinations

Hallucinations are woven into the fabric of GenAI–and there’s no way to eliminate them altogether. A large language model predicts successive words to string together plausible-sounding sentences. The results might be convincing, but since the model isn’t capable of fact-checking itself, some fraction of its responses inevitably wind up being outright fabrications.
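To make that concrete, here is a deliberately simplified sketch (in Python, using an invented toy probability table rather than a real model) of what next-word generation amounts to: the system samples whichever continuation looks statistically plausible, and nothing in the loop checks the result against a source of truth.

```python
# Toy illustration of next-word generation -- not any real model.
# The "model" here is just an invented probability table.
import random

# Hypothetical probabilities a model might assign after this prompt.
NEXT_WORD_PROBS = {
    "The new reporting rule takes effect in": {"2024": 0.40, "2025": 0.35, "2026": 0.25},
}

def generate(prompt: str) -> str:
    options = NEXT_WORD_PROBS[prompt]
    words, weights = zip(*options.items())
    # Every candidate year reads as fluent English; nothing here checks which
    # year the actual rule specifies. That gap is where hallucinations live.
    return prompt + " " + random.choices(words, weights=weights)[0] + "."

print(generate("The new reporting rule takes effect in"))
```

Whichever year comes out, the sentence sounds equally confident, and that is exactly the problem.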

Crucially, those fabricated results can subsequently be used to train other AI models. As low-quality GenAI outputs seep into broader datasets and are used to train and fine-tune other AI models, the entire universe of available data grows increasingly corrupt.

Fortunately, that hasn’t yet become a serious problem for federal AI, in part because agencies are currently focusing on non-generative technologies. Still, the tide is turning: Eight out of 10 federal agency leaders now see GenAI as a top priority, and 71% believe the potential benefits outweigh the risks. In the coming months, we’ll see GenAI deployed across increasingly high-stakes government functions. For instance, the Pentagon is already planning to use GenAI to augment intelligence and warfighting capabilities.

In the meantime, we’re already seeing GenAI tools producing politically themed hallucinations. (Try asking Alexa if the 2020 election was stolen.) As similar tools find use in government and are applied to more complex regulatory areas, errors will start to snowball. Taken to an extreme, it’s easy to imagine a world in which ubiquitous AI models chatter amongst themselves, like the world’s worst game of Telephone, iteratively corrupting our understanding of what laws and regulations actually say and mean.

The government is uniquely vulnerable to this kind of drift because our laws and regulations comprise vast amounts of messy, unstructured data scattered across countless different statute books and tomes of case law. When nobody even knows how many laws there are on the books, it’s hard for human overseers to ensure that AI models aren’t misinterpreting or misrepresenting the rules we live by.

How to stop the government from hallucinating

Businesses already spend vast sums on regulatory compliance, and AI hallucinations could further increase the burden as companies struggle to make sense of what’s required of them.

However, the solution isn’t to ban the government from using AI. That ship has already sailed. Instead, we need to make sure that essential government functions are solidly anchored in the only truth that matters: the laws and rules enacted by lawmakers.

That isn’t something we can simply mandate. Hallucinations are inherent to the nature of GenAI, so trying to regulate away errors would be like trying to legislate the value of pi. Instead, it will take the combined impact of best practices, technological solutions, and effective public-private collaboration to provide a truly reliable foundation for AI innovation across federal and state governments.

First, we’ll need to ground GenAI models in accurate regulatory language, ensuring that only rigorously vetted policy inputs shape the outputs of federal AI models, and build after-the-fact systems that can verify the provenance of the data used across AI ecosystems. Businesses are already using emerging data technologies to turn structured and unstructured data sources into reliable sources of truth for analytics and decision-making. It’s time for Uncle Sam to do the same: draw together disparate policies, rules, and regulations from across the federal government into a single reliable data hub that can be used both to train and to validate AI models.
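As a rough illustration of what such a hub could enable, the sketch below (in Python, with a made-up rule record and deliberately naive lookup logic) shows the basic pattern: answers are drawn only from vetted regulatory text, each answer carries its citation so it can be audited later, and the system declines to answer rather than guess when no vetted source matches.

```python
# A minimal sketch, assuming a hypothetical vetted "regulatory data hub".
# The hub, the rule record, and the lookup logic are invented for illustration.
from dataclasses import dataclass

@dataclass
class VettedRule:
    citation: str   # e.g., a statute or CFR section
    text: str       # the authoritative language

# Stand-in for a curated hub of policies, rules, and regulations.
DATA_HUB = [
    VettedRule("Example Act § 1", "Covered agencies must publish AI inventories annually."),
]

def grounded_answer(question: str) -> str:
    # Retrieve only from vetted sources; refuse rather than guess.
    matches = [r for r in DATA_HUB if any(w in r.text.lower() for w in question.lower().split())]
    if not matches:
        return "No vetted source found -- escalate to a human reviewer."
    rule = matches[0]
    # Every answer is tied to its provenance, so it can be audited later.
    return f"{rule.text} [source: {rule.citation}]"

print(grounded_answer("How often must AI inventories be published?"))
```

A production system would use far more sophisticated retrieval, but the principle is the same: the model’s fluency never substitutes for the vetted text.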

Separately, we’ll need to ensure that governmental AI models make appropriate use of the data available to them. President Biden’s AI order is a good start, setting standards for federal AI use that will help promote responsible practices across the industry. Earlier this year, the Department of Commerce’s National Institute of Standards and Technology (NIST) also published an important framework guiding safe AI use. Still, such frameworks take months or years to develop, and we’ll need to move significantly faster to keep federal GenAI initiatives on track.

One promising option is “constitutional AI,” a new approach to AI training in which a model’s outputs are governed not just by statistical modeling and feedback but by broadly applicable codes of behavior. With norms established both by developers and end-users–or even collaboratively by the American public–and enforced programmatically by AI tools, such approaches could responsively define and police the bounds of appropriate federal GenAI use.
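The sketch below is a loose rendering of that idea (in Python, with invented principles and a simple keyword check standing in for the model-driven critique that real constitutional-AI systems perform): a draft answer is reviewed against written rules of behavior before it ever reaches a user, and flagged drafts go back for revision.

```python
# Loose sketch of the constitutional-AI idea: check a draft response against
# written principles before release. The principles here are invented, and a
# keyword check stands in for the second model pass used in real systems.

PRINCIPLES = [
    ("Cite the governing rule for any compliance claim",
     lambda text: "[source:" in text),
    ("Never state effective dates without a citation",
     lambda text: "effective" not in text.lower() or "[source:" in text),
]

def review(draft: str) -> str:
    violations = [name for name, check in PRINCIPLES if not check(draft)]
    if violations:
        # In a real pipeline this would trigger a revision pass, not just a refusal.
        return "Withheld pending revision: " + "; ".join(violations)
    return draft

print(review("The rule is effective January 1."))                               # flagged
print(review("The rule is effective January 1. [source: Example Act § 2]"))     # passes
```

The point isn’t the particular checks; it’s that the norms live in plain language that the public and policymakers can read, debate, and revise.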

To leverage such methods, however, we need to put reliable infrastructure in place before government GenAI technologies are deployed at scale. Once a GenAI model starts to hallucinate, it’s all but impossible to put the genie back in the bottle. We need to take concrete steps now to ensure that federal GenAI models remain anchored in reality–before hallucinations become more widespread, and rogue AI tools begin rewriting the laws of the land.

Carolyn Parent is the CEO of Conveyer.


The opinions expressed in Fortune.com commentary pieces are solely the views of their authors and do not necessarily reflect the opinions and beliefs of Fortune.

