AI Can Help States Catch Fraud. But Transparency is a Data Problem First.
Public frustration with government waste is not just about the fraud we can prove. It is also about the spending we still cannot trace cleanly from appropriation to contract to payment to result. At the federal level, GAO recently estimated annual fraud losses of between $233 billion and $521 billion, and agencies reported about $162 billion in improper payments in fiscal year 2024 alone.1,2 Improper payments are broader than fraud, but together these figures show the combined cost of weak controls and weak data.
When it comes to making government spending legible, AI matters, and data infrastructure matters just as much. GAO’s fraud-risk work identifies data matching, data mining, and predictive analytics as practical tools for preventing and detecting suspicious transactions.3 Modern systems also cope well with imperfect data: entity reconciliation, structured extraction from free text, and probabilistic vendor deduplication are problems they handle well at scale. NIST’s AI Risk Management Framework treats application context and data as core dimensions of trustworthy AI, not side issues.4
But public transparency demands something beyond raw technical performance. An agency can, internally, use AI to reconstruct a coherent picture of its spending from scattered sources. That does not, by itself, make the spending legible to citizens, journalists, or independent auditors. Democratic transparency requires that the inputs, or at minimum the structure, identifiers, and provenance of the data, be published and independently verifiable. Otherwise we trade bureaucratic opacity for algorithmic opacity, equally inaccessible to outsiders. The question for states, then, is not only whether AI can clean up their data internally; it is also whether they produce publishable records that third parties can audit independently.
On that front, even the federal government struggles. GAO found in 2025 that only 36 of 70 agencies reporting to the Federal Procurement Data System confirmed completing fiscal year 2023 procurement data-quality reports, and none of the 24 reviewed agencies’ reports met all relevant OMB reporting requirements, with 19 missing the submission deadline.5
A payment database and a contract database are not enough if they cannot talk to each other. Treasury’s DATA Act reporting rules require agencies to track award identifiers in their financial systems precisely so that spending data can be linked to award details.6 The Open Contracting Data Standard applies the same logic: a unique contracting-process identifier should be carried across systems, including those that record spending transactions.7 States should adopt common award IDs and common vendor IDs across procurement, accounts payable, grant, and transparency systems, and publish those identifiers in machine-readable downloads. These identifiers are the precondition for a researcher, a journalist, or a citizen to independently redo the analysis.
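To make the linking argument concrete, here is a minimal sketch of what an outside researcher could do once payments and contracts share a published award ID. The field names (award_id, vendor_id, award_amount, payment_id, amount) and the sample records are assumptions for illustration; they are not drawn from any actual state system or from the DATA Act schema.

```python
from collections import defaultdict

# Hypothetical published records; all field names and values are illustrative.
contracts = [
    {"award_id": "A-001", "vendor_id": "V-10", "award_amount": 50_000},
    {"award_id": "A-002", "vendor_id": "V-11", "award_amount": 20_000},
]
payments = [
    {"payment_id": "P-1", "award_id": "A-001", "amount": 30_000},
    {"payment_id": "P-2", "award_id": "A-001", "amount": 25_000},
    {"payment_id": "P-3", "award_id": "A-999", "amount": 4_000},
]

def link_payments(contracts, payments):
    """Total payments per award; list unlinked payments and overspent awards."""
    known = {c["award_id"]: c for c in contracts}
    paid = defaultdict(int)
    orphans = []
    for p in payments:
        if p["award_id"] in known:
            paid[p["award_id"]] += p["amount"]
        else:
            # No contract carries this award ID, so the payment cannot be traced.
            orphans.append(p["payment_id"])
    overspent = [
        award_id
        for award_id, total in paid.items()
        if total > known[award_id]["award_amount"]
    ]
    return dict(paid), orphans, overspent

paid, orphans, overspent = link_payments(contracts, payments)
print(paid)       # payments totaled by award
print(orphans)    # payments with no matching contract
print(overspent)  # awards paid beyond their recorded amount
```

Without the shared identifier, neither the untraceable payment nor the overspent award would be visible to anyone outside the agency.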
A contract line that says only “consulting services” or “vehicle purchase” is not transparent.8 Federal spending standards define an award description as a brief description of the purpose of the award. GAO has also found that agencies need controls to ensure descriptions are written in plain English and do not rely on acronyms, technical terminology, or agency-specific jargon that is unclear to the public.9 States should require every public contract record to answer four basic questions: What is being bought? Why is the agency buying it? How was it procured? What is the vendor expected to deliver?
The hardest contracts to scrutinize are often service contracts because the public can see the invoice without seeing a credible benchmark for success. Federal acquisition rules are explicit: performance standards should be measurable and structured so that contractor performance can be assessed.10 GAO has separately urged federal procurement leaders to use outcome-oriented metrics, including cost savings or avoidance, timeliness of deliveries, quality of deliverables, and end-user satisfaction.11 States should apply the same discipline to service contracts: define success before award, disclose the metrics, and report whether the contractor met them.
You cannot detect suspicious concentration of spending if the same vendor appears under five slightly different names. AI can reconcile most of those variants, but a reconciliation done downstream, opaquely, is no substitute for a canonical registry that anyone can inspect. The federal government now uses the Unique Entity ID in SAM.gov as the official identifier for doing business with the U.S. government, and interfacing systems are expected to use it.12 OCDS likewise emphasizes the use of structured organization identifiers drawn from authoritative registries, so that buyers and suppliers can be reliably identified across publishers and datasets.13 States do not need perfect master data on day one. But they do need a statewide vendor registry with unique identifiers, validation rules at data entry, and a governance process for resolving duplicates and preserving a single canonical record.
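A validation rule at data entry can be as simple as the sketch below, which groups vendor names that collapse to the same normalized key so that a registry steward can review them. This is plain deterministic normalization, not the probabilistic deduplication mentioned earlier, and the suffix list and sample names are assumptions chosen for illustration.

```python
import re
from collections import defaultdict

# Illustrative, incomplete list of corporate suffixes to ignore when comparing.
SUFFIXES = {"inc", "incorporated", "llc", "corp", "corporation", "co", "company", "ltd"}

def normalize(name):
    """Lowercase, strip punctuation, and drop common corporate suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in SUFFIXES)

def candidate_duplicates(names):
    """Group names that share a normalized key; keep groups with 2+ variants."""
    groups = defaultdict(list)
    for name in names:
        groups[normalize(name)].append(name)
    return {key: variants for key, variants in groups.items() if len(variants) > 1}

# Hypothetical vendor names as they might appear across agency systems.
names = [
    "Acme Consulting, Inc.",
    "ACME Consulting LLC",
    "Acme Consulting",
    "Beta Paving Co.",
]
dups = candidate_duplicates(names)
for key, variants in dups.items():
    print(key, "->", variants)
```

The output is a list of candidates, not a merge decision. Two distinct firms can share a normalized name, which is why the governance process and the canonical identifier still matter.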
Once the data are linkable, states should automate anomaly detection. GAO describes data matching, data mining, and predictive analytics as ways to spot suspicious patterns that may not be visible transaction by transaction.14 Procurement-risk guidance similarly treats split purchases, multiple direct awards just below competitive thresholds, and repeated below-threshold awards to the same supplier as classic red flags.15 These flags are not proof of misconduct. They are prompts for review. The right model is simple: let algorithms surface anomalies quickly, then let auditors determine whether the pattern reflects fraud, weak controls, or a legitimate operational explanation.
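One of these red flags, repeated awards to the same supplier just below a competitive threshold, can be sketched in a few lines. The threshold, the 10 percent margin, the minimum count, and the record fields are all assumptions for illustration; real values would come from each state's procurement rules, and a flag here is only a prompt for an auditor.

```python
from collections import defaultdict

def flag_below_threshold(awards, threshold, margin=0.10, min_count=3):
    """Return (agency, vendor_id) pairs with at least `min_count` awards
    priced within `margin` below the competitive threshold."""
    floor = threshold * (1 - margin)
    counts = defaultdict(int)
    for award in awards:
        if floor <= award["amount"] < threshold:
            counts[(award["agency"], award["vendor_id"])] += 1
    return sorted(pair for pair, n in counts.items() if n >= min_count)

# Hypothetical awards; a 25,000 threshold is assumed for the example.
awards = [
    {"agency": "DOT", "vendor_id": "V-10", "amount": 24_900},
    {"agency": "DOT", "vendor_id": "V-10", "amount": 24_500},
    {"agency": "DOT", "vendor_id": "V-10", "amount": 23_000},
    {"agency": "DOT", "vendor_id": "V-11", "amount": 24_000},
    {"agency": "DOT", "vendor_id": "V-11", "amount": 10_000},
]

flags = flag_below_threshold(awards, threshold=25_000)
print(flags)  # pairs to route to an auditor for review
```

The check depends on the vendor ID being canonical. If the same supplier appears under several identifiers, its awards are counted separately and the pattern goes unnoticed.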
The lesson here is that public transparency has its own requirements that AI alone cannot satisfy. When the inputs are fragmented, dispersed, and unpublished, no amount of downstream algorithmic sophistication restores the citizen’s ability to verify independently. Federal experience under the DATA Act shows that searchable standards, linked identifiers, and public traceability are achievable. Recent GAO work also shows what happens when data quality is treated as an afterthought: monitoring weakens, confidence erodes, and oversight suffers. States that invest first in publishable, linked records will get far more value from AI and will give citizens, not just their own analysts, the tools to catch waste before the money is gone.
