When Research Doesn’t Replicate: Risks for Evidence-Based Policy

What is the Replication Problem in the Social Sciences?

Lawmakers, advocates, and other stakeholders often defend policies on the basis of social science research. These policies determine how schools run, courts operate, and public health funds are spent. However, the uniform appearance of citations masks wide variation in the quality of the underlying evidence, and even seemingly rigorous studies may not hold up. The growing recognition that many landmark findings fail when other researchers attempt to reproduce them has sparked a crisis of confidence. For policymakers, the stakes are high: evidence that cannot be replicated exposes them to costly, ineffective, or even harmful decisions. Stanford’s John Ioannidis underscored the risk by arguing that “most published research findings are false.”[1]

This concern with reproducibility surfaced in the wake of large-scale replication efforts. The most influential of these, the Reproducibility Project: Psychology, was led by Brian Nosek and the Center for Open Science between 2011 and 2015.[2] Nosek and his team attempted to replicate 100 experimental and correlational studies published in 2008 in three leading psychology journals. While 97 percent of the original studies reported statistically significant results (p < 0.05), only 36 percent of the replications did, and the average effect size in the replications was about half that of the originals.
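
The shrinking effect sizes have a well-known statistical explanation: when small, noisy studies are published only if they clear p < 0.05, the published estimates are drawn disproportionately from the upper tail of the sampling distribution, so better-powered replications tend to come in lower. The short Python simulation below is a minimal sketch of that selection effect; the true effect, sample sizes, and number of simulated studies are illustrative assumptions, not figures from the Reproducibility Project.

```python
import numpy as np

rng = np.random.default_rng(0)

true_effect = 0.2     # assumed true standardized effect (Cohen's d); not the project's data
n_original = 20       # assumed participants per group in the small original studies
n_replication = 100   # assumed participants per group in the replications
n_studies = 100_000   # number of simulated studies

def simulate(n_per_group, effect, n_studies):
    """Simulate two-group studies; return effect estimates and significance flags."""
    se = np.sqrt(2.0 / n_per_group)                # std. error of a mean difference, unit variance
    estimates = rng.normal(loc=effect, scale=se, size=n_studies)
    significant = np.abs(estimates / se) > 1.96    # two-sided test at the 5% level
    return estimates, significant

orig_est, orig_sig = simulate(n_original, true_effect, n_studies)
rep_est, _ = simulate(n_replication, true_effect, n_studies)

print(f"True effect:                             {true_effect:.2f}")
print(f"Average 'significant' original estimate: {orig_est[orig_sig].mean():.2f}")
print(f"Average replication estimate:            {rep_est.mean():.2f}")
# Typical output: the significant originals average roughly 0.7 (more than
# three times the true effect), while replications center on the true 0.2.
```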

Reproducible research is crucial for public policy. Because policies apply across populations and persist over time, they demand evidence that is stable across settings. A study that cannot be replicated is not reliable enough to guide lawmaking. Replication ensures that findings are not artifacts of circumstance, bias, or random variation. For this reason, replication should be the threshold standard for any evidence used in legislative or regulatory decision-making. What follows is a review of notable replication failures and a discussion of what it would mean to embed replication into the evidence base for policy.

Replication Failures and Their Causes

Replication failures erode not only the credibility of individual studies but also public trust in the social sciences.[3] [4] The Center for Open Science’s Reproducibility Project identified the following mechanisms that hindered replication: weak methodological design, small sample sizes, statistical false positives and negatives, selective reporting of results, data manipulation or “p-hacking,” publication bias favoring striking findings, and flawed analytic practices.[5] These factors, alongside the publication incentives within academia, create the conditions for a widespread replication crisis and demand greater transparency and standardization across the sciences.[6]
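
The p-hacking mechanism in particular is easy to demonstrate. If a researcher measures several outcomes (or tries several specifications) and reports only the most favorable one, the chance of a “significant” result far exceeds the nominal 5 percent even when no true effect exists. The Python sketch below is purely illustrative; the ten candidate outcomes and the sample size are assumptions chosen for the example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

n_simulations = 10_000
n_per_group = 30    # assumed sample size per group
n_outcomes = 10     # assumed number of outcomes a researcher could choose among

false_positives = 0
for _ in range(n_simulations):
    # No true effect: both groups are drawn from the same distribution.
    treated = rng.normal(size=(n_outcomes, n_per_group))
    control = rng.normal(size=(n_outcomes, n_per_group))
    _, p_values = stats.ttest_ind(treated, control, axis=1)
    # "p-hacking": only the best-looking outcome is reported.
    if p_values.min() < 0.05:
        false_positives += 1

print("Nominal false-positive rate: 5%")
print(f"Rate when the best of {n_outcomes} outcomes is reported: "
      f"{false_positives / n_simulations:.0%}")
# With 10 independent outcomes the expected rate is 1 - 0.95**10, about 40%.
```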

One of the most famous examples is the “power posing” study conducted by Dana Carney, Amy Cuddy, and Andy Yap in 2010. The researchers reported that adopting expansive, open postures for just two minutes increased testosterone, lowered cortisol, and made participants feel more powerful and willing to take risks.[7] The findings captured wide public attention: Amy Cuddy’s subsequent TED talk became one of the most viewed of all time, and her book Presence sold more than half a million copies.[8] Policymakers, business leaders, and educators alike cited the study as evidence that simple behavioral interventions could meaningfully boost confidence.

Public sentiment on power posing shifted sharply in 2015, when Eva Ranehill and colleagues at the University of Zurich published a large-scale replication in Psychological Science.[9] Using a sample of about 200 participants, several times larger than the original, the team measured both hormone levels and self-reported feelings of power. They found that while participants did feel more powerful after adopting expansive poses, there were no significant changes in testosterone, cortisol, or risk-taking behavior. The study suggested that the original findings were likely the product of small samples, unreliable hormonal data, and selective reporting. Because “power posing” had become such a cultural phenomenon, its failure to replicate reverberated beyond a single study, fueling public skepticism about psychology more broadly and underscoring the risks of over-interpreting preliminary research.[10]

The fate of the Scared Straight program in the United States provides another striking example of the costs when early social science findings fail to replicate. Launched in the 1970s, the Scared Straight program aimed to deter at-risk youth from criminal behavior by exposing them to the harsh realities of prison life. Teenagers were brought into correctional facilities, where inmates shouted at them and described the hardships of incarceration. The program gained national attention after a documentary film portrayed its impact and reported that only 10 percent of the 10,000 juveniles who participated later had arrest records. The film also highlighted that 16 of 17 teens stayed “out of trouble” in the months following their visit. These early results created the impression that the program was a simple, cost-effective way to reduce juvenile crime, and led to widespread adoption across the country.[11]

Only a few years later, James Finckenauer of Rutgers University conducted a rigorous evaluation of Scared Straight and found that the celebrated early results failed to replicate. The original reports had relied on small samples and short-term outcomes; when Finckenauer tracked participants for six months, he found that those exposed to the program were, in fact, more likely to commit crimes than their peers.[12] Far from reducing delinquency, the program appeared to worsen it. Subsequent analyses reinforced these findings, with one estimate concluding that for every $1 invested in Scared Straight, taxpayers incurred nearly $100 in additional crime-related costs.[13] Beyond the fiscal burden, critics highlighted the psychological harm inflicted on participants.[14] By the late 1990s, the weight of evidence led the U.S. General Accounting Office to report to Congress that the program was ineffective.[15] Despite continued modifications, replications through the 2000s and into the 2010s showed consistently negative effects. These results cemented Scared Straight as a cautionary tale of how unreplicated findings can misguide national policy.

Addressing the Replication Crisis

Researchers alone cannot solve the replication crisis. Policymakers and advocates also shape how evidence is used and valued. Addressing the crisis, therefore, requires attention at both ends of the research pipeline: researchers must plan and execute studies with uncertainty in mind, while policymakers and advocates must weigh that uncertainty when citing and applying results. The recommendations below speak to both responsibilities.

For researchers, scholars recommend:

  1. Transparency: All published research should include a replication package with anonymized data, full code, and sufficient documentation to reproduce results. These materials should be deposited in a durable, public repository.[16]
  2. Pre-registration: To limit researcher bias and selective reporting, pre-analysis plans and trial registries should be the norm for policy-relevant research.[17]
  3. Credibility Checks: Studies should report placebo tests, robustness checks, and alternative specifications that actively attempt to falsify their findings (a minimal sketch of one such check follows this list). When results prove sensitive to model choice, that limitation should be stated clearly in the abstract.[18]
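
To make the third recommendation concrete, the sketch below shows one common credibility check: re-estimating the treatment effect after randomly permuting the treatment labels, so that any apparent “effect” can only be noise. The data, variable names, and effect size here are hypothetical, and the OLS-plus-permutation design stands in for whatever estimator a given study actually uses.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical evaluation data: a binary treatment with a modest true effect.
n = 500
treatment = rng.integers(0, 2, size=n)                  # 0 = control, 1 = treated
outcome = 1.0 + 0.3 * treatment + rng.normal(size=n)    # true effect of 0.3 plus noise

def ols_effect(y, t):
    """Return the treatment coefficient from an OLS regression of y on [1, t]."""
    X = np.column_stack([np.ones(len(t)), t])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs[1]

main_estimate = ols_effect(outcome, treatment)

# Placebo check: re-estimate after randomly permuting the treatment labels.
# These placebo "effects" can only reflect noise, so they should cluster
# around zero, with the main estimate lying well outside their distribution.
placebo_estimates = np.array(
    [ols_effect(outcome, rng.permutation(treatment)) for _ in range(1000)]
)
placebo_share = np.mean(np.abs(placebo_estimates) >= abs(main_estimate))

print(f"Main treatment estimate: {main_estimate:.3f}")
print(f"Placebo estimates: mean {placebo_estimates.mean():.3f}; "
      f"share at least as large as the main estimate: {placebo_share:.3f}")
```

In an actual study the same logic extends to placebo outcomes or pre-treatment periods: if the design is credible, estimated “effects” on things the intervention could not have influenced should be indistinguishable from zero.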

These practices allow other researchers to verify findings, enhance the credibility of results, and curb the tendency to overstate headline-grabbing effects. By embedding transparency and rigor into the research process, scholars can produce evidence more suitable for guiding policy.

For policymakers and advocates, scholars recommend:

  1. Piloting Programs: Policymakers should favor pilot programs over immediate large-scale implementation. Pilots allow governments to test interventions in controlled settings, measure outcomes rigorously, and adjust design before committing significant resources. Built-in evaluation and iterative review ensure that ineffective or harmful policies can be revised or abandoned early, while successful programs can be scaled with greater confidence.
  2. Policy Citation Standards: When research is cited in policymaking, citations should clearly distinguish between correlational and causal evidence, disclose study limitations, and note whether replication materials are publicly available.
  3. Evidence Weighting: Policymakers should privilege findings that have been replicated across multiple studies or contexts, ideally supported by systematic reviews or meta-analyses. A key question for any result is how consistently it has been shown across groups, settings, and time.
  4. Acknowledge Uncertainty: When citing evidence, policymakers should consider both the most conservative and most extreme plausible estimates rather than treating single-point estimates as certainties; a short sketch after this list illustrates this alongside evidence weighting. Reducing the false impression of certainty in uncertain settings fosters more fruitful debate about policy trade-offs.[19]
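
The following sketch illustrates how the third and fourth recommendations might look in practice: given several replications of the same intervention, an inverse-variance (fixed-effect) average weights precise studies more heavily, and reporting the resulting interval and the spread of individual estimates keeps uncertainty visible instead of collapsing it into a single number. The estimates and standard errors below are invented for illustration.

```python
import numpy as np

# Hypothetical effect estimates and standard errors from four studies
# of the same intervention (all values invented for illustration).
estimates = np.array([0.40, 0.10, 0.15, 0.05])
std_errors = np.array([0.20, 0.05, 0.08, 0.06])

# Inverse-variance (fixed-effect) pooling: precise studies receive more weight.
weights = 1.0 / std_errors**2
pooled = np.sum(weights * estimates) / np.sum(weights)
pooled_se = np.sqrt(1.0 / np.sum(weights))

low, high = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
print(f"Pooled estimate: {pooled:.2f} (SE {pooled_se:.2f})")
print(f"95% confidence interval: [{low:.2f}, {high:.2f}]")
# For policy debate, report the plausible range, not just a point estimate:
print(f"Range of individual study estimates: {estimates.min():.2f} to {estimates.max():.2f}")
```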

These practices promote a policy environment that is both evidence-informed and self-correcting, reducing the risk of costly missteps based on fragile or unreplicated findings.

The replication crisis in the social sciences reveals that many celebrated findings lack reproducibility, which undermines both academic credibility and evidence-based policy. Case studies such as “power posing” and the Scared Straight program show how programs built on fragile evidence can be ineffective, costly, and even harmful. Common failure modes (e.g., small samples, weak methodology, p-hacking, and selective reporting) underscore the urgency of stronger research standards. To safeguard the credibility of science and the effectiveness of policy, researchers must uphold rigorous methodological practices. At the same time, lawmakers must demand replicable evidence, pilot-test programs before scaling, and commit to continuous evaluation. Only by confronting the replication crisis directly can we rebuild trust in science, strengthen public policy, and ensure that social science serves the public good.


[1] Ioannidis, John P. A. “Why Most Published Research Findings Are False.” PLOS Medicine 2, no. 8 (2005): e124. https://doi.org/10.1371/journal.pmed.0020124.

[2] Open Science Collaboration. “Estimating the Reproducibility of Psychological Science.” Science 349, no. 6251 (2015). https://doi.org/10.1126/science.aac4716.

[3] Balafoutas, Loukas, Jeremy Celse, Alexandros Karakostas, and Nicholas Umashev. “Incentives and the Replication Crisis in Social Sciences: A Critical Review.” Journal of Behavioral and Experimental Economics 114 (2024): 102327. https://doi.org/10.1016/j.socec.2024.102327

[4] Lester, Patrick. “Addressing the Research ‘Replication Crisis’: Evidence-Based Policy’s Hidden Vulnerability.” Social Innovation Research Center, January 19, 2018. https://www.socialinnovationcenter.org/archives/2906

[5] Open Science Collaboration, “Estimating the Reproducibility of Psychological Science.”

[6] Balafoutas et al., “Incentives and the Replication Crisis in Social Sciences.”

[7] Carney, Dana R., Amy J.C. Cuddy, and Andy J. Yap. “Power Posing: Brief Nonverbal Displays Affect Neuroendocrine Levels and Risk Tolerance.” Psychological Science 21, no. 10 (2010): 1363–1368. https://doi.org/10.1177/0956797610383437.

[8] Cuddy, Amy J.C. “Amy J.C. Cuddy, PhD.” Accessed September 1, 2025. https://www.amycuddy.com.

[9] Ranehill, Eva, Anna Dreber, Magnus Johannesson, Susanne Leiberg, Sunhae Sul, and Roberto A. Weber. “Assessing the Robustness of Power Posing: No Effect on Hormones and Risk Tolerance in a Large Sample of Men and Women.” Psychological Science 26, no. 5 (2015): 653–656. https://doi.org/10.1177/0956797614553946.

[10] Loncar, Tom. “A Decade of ‘Power Posing’: Where Do We Stand?” The Psychologist, June 2021. https://www.bps.org.uk/psychologist/decade-power-posing-where-do-we-stand

[11] Office of Juvenile Justice and Delinquency Prevention. “Justice Department Discourages the Use of ‘Scared Straight’ Programs.” OJJDP News @ a Glance, March/April 2011. https://ojjdp.ojp.gov/sites/g/files/xyckuh176/files/pubs/news_at_glance/234084/topstory.html

[12] Finckenauer, James O. Scared Straight! and the Panacea Phenomenon. Englewood Cliffs, NJ: Prentice-Hall, 1982.

[13] Kohli, Jitinder. “Doing What Doesn’t Work: Why Scared Straight Programs Are a Waste of Taxpayer Dollars.” Center for American Progress, February 8, 2012. https://www.americanprogress.org/article/doing-what-doesnt-work/

[14] Petrosino, A., C. Turpin-Petrosino, M. E. Hollis-Peel, and J. G. Lavenberg. “‘Scared Straight’ and Other Juvenile Awareness Programs for Preventing Juvenile Delinquency.” Cochrane Database of Systematic Reviews 4 (2013): CD002796. https://doi.org/10.1002/14651858.CD002796.

[15] U.S. General Accounting Office. Juvenile Justice: Impact and Effectiveness of Federal Programs. 1997.

[16] Miguel, Edward, et al. “Promoting Transparency in Social Science Research.” Science 343 (2014): 30–31. https://doi.org/10.1126/science.1245317.

[17] Casey, Katherine, Rachel Glennerster, and Edward Miguel. “Reshaping Institutions: Evidence on Aid Impacts Using a Pre-Analysis Plan.” Quarterly Journal of Economics 127, no. 4 (2012): 1755-1812. https://doi.org/10.1093/qje/qje027

[18] Leamer, Edward E. “Let’s Take the Con out of Econometrics.” American Economic Review 73, no. 1 (1983): 31-43. https://www.jstor.org/stable/1803924

[19] Manski, Charles. Public Policy in an Uncertain World: Analysis and Decisions. Cambridge: Harvard University Press, 2013.
