‘P-Hacking’ Lets Scientists Massage Results. This Method Could Nix That Loophole

Photo credit: Jonathan Kitchen - Getty Images

The pursuit of science is designed to search for significance in a maze of data. At least, that’s how it’s supposed to work.

By some accounts, that facade began to shatter in 2010 when a social psychologist from Cornell University, Daryl Bem, published a 10-year analysis in the prestigious Journal of Personality and Social Psychology, demonstrating with widely accepted statistical methods that extrasensory perception (ESP), basically the “sixth sense,” was an observable phenomenon. Bem’s peers couldn’t replicate the paper’s results, quickly blaming what we now call “p-hacking,” a process of massaging and overanalyzing your data in search of statistically significant—and publishable—results.

To support or refute a hypothesis, researchers aim to establish statistical significance by recording a “p-value” of less than 0.05, explains Benjamin Baer, a postdoctoral researcher and statistician at the University of Rochester whose recent work addresses this issue. The “p” in p-value stands for probability: it measures how likely you would be to see a result at least as extreme as the one you observed if the null hypothesis were true, that is, if chance alone were at work.

For example, if you wanted to test whether all roses are red, you would count the red roses and the roses of other colors in a sample and run a hypothesis test comparing those counts. If the test spits out a p-value of less than 0.05, you have statistically significant grounds to claim that only red roses exist, even though evidence outside your sample of flowers suggests otherwise.
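To make that concrete, here is a minimal sketch of this kind of hypothesis test in Python, using scipy’s binomial test. The sample counts and the null proportion of 0.5 are hypothetical and are only meant to illustrate how the 0.05 cutoff works, not to reproduce any analysis from the article.

    # Toy illustration of a hypothesis test and its p-value (requires scipy).
    from scipy.stats import binomtest

    red, other = 19, 1               # hypothetical sample: 19 red roses, 1 of another color
    n = red + other

    # Null hypothesis: red and non-red roses are equally common (proportion 0.5).
    result = binomtest(red, n=n, p=0.5, alternative="greater")
    print(f"p-value = {result.pvalue:.6f}")

    # A p-value below 0.05 is called "statistically significant," but it only
    # says the sample is unlikely under the null hypothesis; it does not prove
    # that roses of other colors don't exist.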

Misusing p-values to support the idea that ESP exists may be relatively harmless, but when this practice is used in medical trials, it can have much deadlier results, says Baer. “I think the big risk is that the wrong decision can be made,” he explains. “There’s this big debate happening across science and statistics, trying to figure out how to make sure that this process can happen more smoothly and that decisions are actually based on what they should be.”

Baer was the first author of a paper published at the end of 2021 in the journal PNAS, written with his former Cornell mentor, professor of statistics Martin Wells, that looked into how new statistics could improve the use of p-values. The metric they examined, called the fragility index, is designed to supplement and improve p-values.

This measure captures how sensitive a data set is to some of its data points flipping from a positive to a negative result, for example, if a patient who had been recorded as positively impacted by a drug actually felt no impact. If changing only a few of these data points is enough to demote a result from statistically significant to not, the result is considered fragile.
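Conceptually, the original version of the index can be computed by flipping outcomes one at a time and re-running a significance test until the result stops being significant. Below is a rough Python sketch of that procedure for a 2x2 trial table, following the commonly described Fisher’s-exact-test approach; the trial counts are hypothetical, and this is not the generalized index from Baer and Wells’s paper.

    # Rough sketch of a Walsh-style fragility index for a 2x2 table (requires scipy).
    from scipy.stats import fisher_exact

    def fragility_index(events_a, total_a, events_b, total_b, alpha=0.05):
        """Minimum number of outcome flips (non-event -> event, in the group
        with fewer events) needed to push the p-value to alpha or above."""
        ea, eb = events_a, events_b
        flips = 0
        while True:
            table = [[ea, total_a - ea], [eb, total_b - eb]]
            _, p = fisher_exact(table)
            if p >= alpha:
                return flips             # result is no longer significant
            if ea <= eb and ea < total_a:
                ea += 1                  # flip one patient in the smaller-event group
            elif eb < total_b:
                eb += 1
            else:
                return None              # no outcomes left to flip
            flips += 1

    # Hypothetical trial: 10/100 events on treatment vs. 25/100 on control.
    print(fragility_index(10, 100, 25, 100))

A small fragility index means only a handful of changed outcomes would erase the statistical significance, which is what flags a result as fragile.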

Photo credit: PM

In 2014, physician Michael Walsh originally proposed the fragility index in the Journal of Clinical Epidemiology. In that paper, he and his colleagues applied the fragility index to just under 400 randomized controlled trials with statistically significant results and found that one in four had low fragility scores, meaning their findings may not actually be very reliable or robust.

However, the fragility index has yet to pick up much steam in medical trials. Some critics of the approach have emerged, like Rickey Carter from the Mayo Clinic, who says it’s too similar to p-values without offering enough improvement. “The irony is the fragility index was a p-hacking approach,” Carter says.

To answer previous criticism, Baer, Wells, and colleagues focused on improving two main elements of the fragility index: counting only sufficiently likely modifications to the data, and generalizing the approach to work beyond binary 2x2 tables (which record positive or negative results for the control and experimental groups).
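As a loose illustration of what moving beyond 2x2 tables could look like, the same flip-and-retest idea can be applied to a continuous outcome judged with a t-test. This is a toy example only, not the method from the PNAS paper; all numbers, and the choice to pull outcomes toward the control mean, are assumptions made for illustration.

    # Toy illustration only: a fragility-style count for a continuous outcome,
    # judged with a two-sample t-test. Requires numpy and scipy.
    import numpy as np
    from scipy.stats import ttest_ind

    rng = np.random.default_rng(0)
    treatment = rng.normal(loc=1.5, scale=2.0, size=40)   # hypothetical outcomes
    control = rng.normal(loc=0.0, scale=2.0, size=40)

    def fragility_continuous(treat, ctrl, alpha=0.05):
        """Count how many of the most favorable treatment outcomes must be
        replaced by the control mean before the t-test stops being significant."""
        treat = np.array(treat, dtype=float)
        order = np.argsort(treat)[::-1]          # largest outcomes first
        for flips, idx in enumerate(order):
            _, p = ttest_ind(treat, ctrl)
            if p >= alpha:
                return flips
            treat[idx] = ctrl.mean()             # nudge one outcome toward "no effect"
        return None                              # significance never lost

    print(fragility_continuous(treatment, control))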

Despite the uphill battle that the fragility index has fought thus far, Baer says he still believes it’s a useful metric for medical statisticians and hopes that improvements made in their recent work will help convince others of that, too.

“Talking to the victim’s family after a surgery fails is a very different [experience] than statisticians sitting at their desks doing math,” Baer says.
