For years, rare disease foundations have made the case for research investment using a core set of numbers: how many people are affected, how severe the disease, how urgent the need. Those numbers came from population genetics models — careful, peer-reviewed estimates built from variant frequencies in genomic databases. They were the best available tool. And according to a new study published in Genetics in Medicine, they were, in many cases, deeply wrong.
Researchers compared existing rare disease prevalence models against newborn screening data from 23 million infants across multiple countries. For some conditions, the models predicted only a fraction of the cases that screening actually identified. Not a small discrepancy. A tenfold gap. The implications ripple through every layer of how the rare disease field operates, from the grants we write to the trials we design to the arguments we make to regulators.
Why the Models Failed
Population genetics models estimate disease prevalence by tallying how often known disease-causing variants appear in a reference genome database, combining those allele frequencies under an inheritance model (for an autosomal recessive condition, prevalence scales with the square of the combined pathogenic allele frequency), and then applying an expected penetrance: the proportion of people who carry a disease-causing genotype and will actually develop the condition. It is a reasonable approach when direct patient counts are unavailable, which is most of the time in rare disease.
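The study does not reduce to a single formula, but the textbook version of that calculation is short. Here is a minimal sketch in Python for an autosomal recessive condition, assuming Hardy-Weinberg equilibrium; the allele frequency and penetrance values are hypothetical placeholders, not figures from the paper.

```python
# Textbook birth-prevalence estimate for an autosomal recessive condition:
# prevalence ~= (combined pathogenic allele frequency)^2 * penetrance,
# assuming Hardy-Weinberg equilibrium.

def estimated_birth_prevalence(pathogenic_allele_freq: float, penetrance: float) -> float:
    """Expected affected births per birth, from database allele frequencies."""
    return pathogenic_allele_freq ** 2 * penetrance

# Hypothetical inputs: pathogenic alleles summing to 1 in 250, penetrance of 80%.
estimate = estimated_birth_prevalence(1 / 250, 0.8)
print(f"Modeled prevalence: ~1 in {round(1 / estimate):,} births")  # ~1 in 78,125
```

Every input there comes from a database count or a published cohort, which is exactly where the problems described next creep in.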
The problem is the assumptions embedded in each step. Reference genome databases have historically overrepresented people of European ancestry, meaning variant frequencies in other populations are less reliable. Penetrance estimates are typically derived from highly selected patient cohorts, often the most severely affected individuals who reached clinical attention, which can inflate the apparent penetrance of any given variant. And many databases capture only well-characterized variants, missing the longer tail of rarer or newer ones that also cause disease.
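Because the recessive calculation squares the allele frequency, even a modest undercount of pathogenic alleles compounds quickly. The numbers below are hypothetical, chosen only to illustrate the compounding rather than taken from the study: a database that catalogues a third of the pathogenic alleles actually circulating produces roughly a ninefold underestimate before penetrance assumptions are even touched.

```python
# Hypothetical illustration of how an incomplete variant catalogue compounds.
# Recessive prevalence scales with the SQUARE of the pathogenic allele frequency,
# so undercounting alleles threefold undercounts affected births roughly ninefold.

true_allele_freq = 1 / 250      # alleles actually circulating (hypothetical)
observed_allele_freq = 1 / 750  # alleles a thin reference database catalogues
penetrance = 0.8                # held constant in both calculations

true_prevalence = true_allele_freq ** 2 * penetrance
modeled_prevalence = observed_allele_freq ** 2 * penetrance

print(f"Actual:  ~1 in {round(1 / true_prevalence):,} births")     # ~1 in 78,125
print(f"Modeled: ~1 in {round(1 / modeled_prevalence):,} births")  # ~1 in 703,125
print(f"Gap: {true_prevalence / modeled_prevalence:.0f}x underestimate")  # 9x
```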
What Newborn Screening Data Changes
Newborn screening is different in kind, not just degree. It does not model who might have a disease. It identifies, at the population level, who does. When 23 million infants are screened and the results compared to what our models predicted, we get a reality check that no genomic database can provide.
This is not an argument against population genetics as a methodology. It is an argument for grounding those models in empirical data wherever that data exists, and for expanding the conditions under which it can be collected. Newborn screening programs are among the most powerful pieces of rare disease data infrastructure we have.
For conditions like dysferlinopathy, ALS, and ARID1B-related disorder, where the patient population is small, geographically dispersed, and often underdiagnosed, this matters acutely. If prevalence estimates are off by a factor of ten, we may be systematically undersizing natural history studies, undervaluing orphan drug programs, and making a weaker case to regulators and payers than the actual patient burden warrants.
The Downstream Consequences for Foundations and PIs
Consider what prevalence estimates actually do in practice. They anchor orphan drug designation applications to the FDA and EMA. They set the expected enrollment ceiling for clinical trials. They inform payers' budget impact models. They appear in grant applications as justification for research investment. They shape the size and ambition of every program a foundation decides to fund.
If those anchoring numbers are systematically low, the field has been working against itself. Conditions that looked too rare to support a viable program may affect far more people than assumed. Trials designed around a narrow patient population may have access to more participants than their enrollment models allowed for. The regulatory case for accelerated pathways may be stronger than the numbers on the page suggested.
What This Means for You
If you lead a rare disease foundation or research program, this study gives you both a challenge and an opening.
The challenge: the numbers you have been using may need revisiting. That is uncomfortable work — updating estimates mid-program, recalibrating arguments made to funders and regulators. But it is far better to surface this now than to let flawed estimates quietly constrain programs that deserve more ambition.
The opening: a tenfold upward revision in prevalence, even for a subset of conditions, strengthens the case for every investment in patient identification, newborn screening advocacy, and registry expansion. It means there are more patients to find. More families who could benefit from a diagnosis. More participants who could enroll in a trial. That is not a small thing.
The rare disease community has always operated with incomplete information. That is the nature of the territory. What this research makes clear is that better information is within reach — and that expanding newborn screening and building more diverse genomic reference databases are not just scientific goals. They are the foundation on which accurate research, fair regulation, and real patient access must be built.
The numbers were wrong. Now we know. The next step is deciding what to do about it.