Sample Drift Threatens ADHD Assessment Accuracy
- Zachary Meehan


There is a quiet assumption built into most ADHD assessments: that the tools we are using were designed with patients like ours in mind.
In many cases, that assumption does not hold. The symptom criteria, rating scale cutoffs, and diagnostic thresholds that clinicians rely on were developed using specific samples, primarily school-age children, mostly male, predominantly white, and often clinic-referred. When the person sitting across from you looks different from that sample, the diagnostic accuracy you learned about in training starts to drift.
This is not a minor technical concern. It sits at the heart of what diagnostic validity actually means, and it has direct consequences for the patients most likely to be missed or misidentified.
Where Diagnostic Validity Comes From
Diagnostic validity statistics (sensitivity, specificity, positive and negative predictive values, and likelihood ratios) are not properties of a disorder. They are properties of a test applied to a sample.
When researchers report that a rating scale correctly identifies 85% of ADHD cases, they mean it did so in their particular sample, under the conditions of that study.
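To make that sample-dependence concrete, here is a minimal sketch of how the two core statistics are computed. The counts are hypothetical and chosen only for illustration; the point is that both values are simple ratios taken from one study sample's 2x2 classification table.

```python
# Sensitivity and specificity are ratios computed from a specific
# sample's 2x2 classification table, so they inherit that sample's
# composition. All counts below are hypothetical.

def sensitivity(true_pos: int, false_neg: int) -> float:
    """P(test positive | condition present)."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg: int, false_pos: int) -> float:
    """P(test negative | condition absent)."""
    return true_neg / (true_neg + false_pos)

# Hypothetical derivation sample: 100 ADHD cases, 100 controls.
print(sensitivity(true_pos=85, false_neg=15))  # 0.85
print(specificity(true_neg=80, false_pos=20))  # 0.80

# Administer the same instrument to a demographically different sample
# and these numbers can shift, even though the test itself is unchanged.
```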
The DSM-5 ADHD symptom criteria were developed and refined over decades of research that disproportionately enrolled younger children presenting to specialty clinics (Barkley, 2015). The items were chosen, in part, because they discriminated well within that population. Rating scale norms were derived from samples that varied across studies but were rarely designed with broad demographic representation in mind.
The practical implication is that the psychometric properties of our best tools (the numbers that tell us how much diagnostic weight to assign a given score) are most trustworthy when our patient resembles the derivation sample. The further our patient departs from it, the less we can rely on those numbers.
This is what is meant when researchers describe diagnostic accuracy as sample-dependent. It is not a criticism of the tools. It is a description of how validity evidence is built.
College Students: A Clear Case of Sample Drift
Lefler and colleagues (2021) provide one of the clearest accounts of this problem in their review of ADHD assessment in college students. The college population differs from the derivation samples for most ADHD tools in nearly every relevant way.
Students are older, presenting in a different developmental context, without access to the teacher informants that most child-normed instruments require. Self-report becomes the primary data source precisely when motivation to over-report symptoms may be elevated, whether due to accommodation-seeking, stimulant access, or genuine misattribution of stress and anxiety to ADHD.
The retrospective onset requirement adds further complexity. DSM-5 requires that symptoms were present before age 12, but for individuals presenting for a first evaluation in college, establishing this relies almost entirely on recall, which is vulnerable to bias in both directions.
Lefler and colleagues note that the distinction between late-onset ADHD, in which symptoms genuinely first appear in adulthood and are more commonly attributable to other conditions, and late-identified ADHD, in which longstanding symptoms were never previously recognized, is clinically important but frequently collapsed in practice.
Neuropsychological testing, often perceived as an objective anchor in the assessment battery, does not resolve the problem.
Lefler and colleagues (2021) note that such testing lacks adequate diagnostic utility for ADHD, a conclusion supported by a recent meta-analysis in the Journal of the American Academy of Child and Adolescent Psychiatry (Arrondo et al., 2024). That study pooled data from 19 studies of continuous performance tests and found overall diagnostic accuracy to be modest to moderate at best, with pooled sensitivity and specificity values ranging from approximately 0.59 to 0.75 across subscales.
The authors concluded that continuous performance tests as a standalone tool have limited ability to differentiate ADHD from non-ADHD samples and should only be used within a broader diagnostic process. High scores on these tasks are neither necessary nor sufficient for diagnosis. Their absence does not rule it out; their presence does not confirm it.
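The arithmetic behind that conclusion is worth making explicit. Converting sensitivity and specificity into likelihood ratios shows how little a result in that range can move a diagnostic probability. The sketch below draws values from the pooled range reported by Arrondo and colleagues; pairing them as a single test's operating point is an illustrative assumption, not a figure from the meta-analysis.

```python
# Likelihood ratios express how much a test result should shift the
# probability of a diagnosis. The values below come from the ~0.59-0.75
# pooled range in Arrondo et al. (2024); treating them as one test's
# operating point is an illustrative assumption.

def positive_lr(sens: float, spec: float) -> float:
    """How much a positive result raises the odds of the condition."""
    return sens / (1 - spec)

def negative_lr(sens: float, spec: float) -> float:
    """How much a negative result lowers the odds of the condition."""
    return (1 - sens) / spec

sens, spec = 0.75, 0.59
print(f"LR+ = {positive_lr(sens, spec):.2f}")  # ~1.83: weak evidence for
print(f"LR- = {negative_lr(sens, spec):.2f}")  # ~0.42: weak evidence against

# LRs near 1 carry little information; values this close to 1 shift a
# probability estimate only modestly, which is why a CPT result alone
# can neither confirm nor exclude ADHD.
```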
The core message is that standard ADHD assessment procedures, applied without modification to a college student, are not operating under the same diagnostic conditions they were designed for. Collateral information, historical records, and validated impairment scales become critical evidence precisely because the usual tools lose some of their discriminative power.
College Students Are Not the Only Example
The college case is instructive because the drift is relatively easy to see. But the same logic applies across other populations that differ from the original derivation samples.
Girls and women have historically been underrepresented in ADHD research. The behavioral presentation that anchored early symptom criteria (overt hyperactivity, disruptive classroom behavior, poor rule-following) maps more closely onto what is typically observed in boys.
Girls with ADHD more commonly present with inattentive symptoms, internalizing comorbidities, and social difficulties that are less visible to teachers and more likely to be attributed to anxiety or emotional dysregulation (Quinn & Madhoo, 2014).
Rating scale items that were sensitive discriminators in predominantly male samples may be less sensitive when applied to girls, not because the disorder is absent, but because the symptom expression differs.
Racially and ethnically diverse populations present similar concerns. Most ADHD normative samples have not included proportionate representation of Black, Latino, or other minority youth. Rating scale norms derived from non-representative samples produce differential false positive and false negative rates when applied to children outside those groups. Clinician bias in referral patterns compounds this, with research suggesting Black children are both more likely to be referred for disruptive behavior and less likely to receive ADHD diagnoses relative to white peers with comparable symptom profiles (Epstein et al., 2005).
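A toy simulation makes the mechanism visible. When a cutoff is normed on one group and another group's typical (non-ADHD) score distribution sits even slightly higher on the same items, the identical cutoff yields a higher false positive rate. The distribution parameters below are invented for illustration and do not come from any real normative dataset.

```python
# Hypothetical simulation: a fixed cutoff normed on one group produces
# a different false positive rate in a group whose (non-ADHD) score
# distribution is shifted. All parameters are invented.
import random

random.seed(0)

def false_positive_rate(mean: float, sd: float, cutoff: float,
                        n: int = 100_000) -> float:
    """Share of simulated non-ADHD scores exceeding the cutoff."""
    scores = (random.gauss(mean, sd) for _ in range(n))
    return sum(score > cutoff for score in scores) / n

CUTOFF = 65  # e.g., a T-score threshold normed on the derivation sample

# Non-ADHD youth in the derivation sample: mean T-score 50, SD 10.
print(false_positive_rate(50, 10, CUTOFF))  # ~0.07

# A group whose typical non-ADHD scores run five points higher on the
# same items: the identical cutoff now flags more than twice as many.
print(false_positive_rate(55, 10, CUTOFF))  # ~0.16
```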
Taken together, these patterns reflect a systematic feature of how diagnostic validity is established in the field, not exceptions to an otherwise adequate system. When samples are narrow, the resulting validity statistics are narrow. Clinicians applying those statistics to a broader population are extrapolating beyond the evidence base, sometimes appropriately, but always with some reduction in confidence.
What This Means in Practice
Recognizing sample-dependent validity does not require abandoning the tools we have. It requires using them with calibrated skepticism and supplementing them with evidence sources that are appropriate for the population being evaluated.
For college students, this means treating self-report as a starting point rather than a conclusion. Collateral from a parent, partner, or roommate, combined with historical records such as report cards, prior evaluations, or IEP documentation, provides the cross-setting, cross-informant corroboration that teacher ratings would otherwise supply.
Lefler and colleagues (2021) recommend that no diagnosis be supported on the basis of self-report alone, regardless of symptom severity.
For girls and women, it means attending to inattentive symptom clusters and internalizing presentations that may not trigger the same clinical concern as hyperactive-impulsive behavior, and recognizing that anxiety and ADHD co-occur frequently enough that ruling out one should not be treated as ruling out the other.
For racially and ethnically diverse patients, it means being explicit about the limitations of the norms being applied, supplementing standardized instruments with structured diagnostic interviews that assess symptoms and functioning rather than relying on cutoff scores alone, and remaining attentive to the ways that systemic factors, including access to prior evaluation and educational support, shape the clinical history a patient presents.
A Bayesian framing is useful here. The prior probability of ADHD is not fixed; it varies by population, referral context, and developmental stage. The diagnostic tools we use shift the probability estimate upward or downward based on their likelihood ratios, but those ratios are themselves uncertain when applied outside the derivation sample. Good clinical reasoning incorporates that uncertainty rather than suppressing it.
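A minimal sketch of that update, in odds form, shows why the prior matters as much as the test result. The prior probabilities and likelihood ratio below are illustrative placeholders, not estimates for any real instrument or setting.

```python
# Odds-form Bayesian update: the posterior depends on both the prior
# (which varies by population and referral context) and the likelihood
# ratio (which is itself uncertain outside the derivation sample).
# All numbers below are illustrative.

def update_probability(prior: float, lr: float) -> float:
    """Convert a prior probability to a posterior via a likelihood ratio."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * lr
    return posterior_odds / (1 + posterior_odds)

LR_POSITIVE = 5.0  # hypothetical LR+ for a positive rating-scale result

# The same test result means different things in different contexts:
print(update_probability(prior=0.05, lr=LR_POSITIVE))  # ~0.21 (low base rate)
print(update_probability(prior=0.40, lr=LR_POSITIVE))  # ~0.77 (specialty referral)

# If the LR is only trustworthy near the derivation sample, the posterior
# should be held with correspondingly wider uncertainty.
```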
Key Takeaways
Diagnostic validity statistics are sample-dependent. Sensitivity, specificity, and likelihood ratios describe how a test performed in a specific study sample and may not generalize to populations that differ from that sample.
The original ADHD derivation samples were disproportionately composed of younger, school-age, male, and clinic-referred children. Accuracy drifts when evaluating populations that differ from this profile.
College students represent a clear case of sample drift: self-report becomes the primary data source, teacher ratings are unavailable, retrospective onset is difficult to verify, and continuous performance tests have only modest to moderate diagnostic accuracy as standalone tools (Arrondo et al., 2024; Lefler et al., 2021).
Girls and women are frequently underdiagnosed due to a symptom profile that diverges from the predominantly male derivation samples. Inattentive presentation and internalizing comorbidities are easily misattributed.
Racial and ethnic minority populations are underrepresented in ADHD normative samples. Clinicians should be attentive to differential accuracy across groups and supplement standardized tools with structured interviews.
Sample drift does not require abandoning established tools. It requires pairing them with population-appropriate evidence sources, calibrated uncertainty, and explicit reasoning about what each piece of evidence contributes.
Glossary
collateral information: diagnostic data gathered from informants other than the patient, such as parents, partners, or teachers. Especially important in populations where self-report is subject to elevated bias or where the primary informant source from validation studies is unavailable.
derivation sample: the population from which diagnostic criteria, normative data, or validity statistics were originally generated. The characteristics of this sample set the boundaries of appropriate generalization.
diagnostic validity: the degree to which a diagnostic tool correctly identifies the presence or absence of a condition. Includes sensitivity, specificity, and likelihood ratios. All validity statistics are derived from specific samples and may not generalize universally.
late-identified ADHD: ADHD in which symptoms were present since childhood but not recognized until later in development. Distinct from late-onset ADHD, in which symptoms first emerge in adulthood and are more commonly attributable to other conditions.
likelihood ratio (LR): a statistic that expresses how much a test result changes the probability of a diagnosis. A positive LR greater than 1 raises the probability of the condition; a negative LR less than 1 lowers it. LRs are generally more clinically useful than sensitivity and specificity because they can be directly applied to update pre-test probability estimates.
prior probability: in clinical assessment, the estimated probability of a diagnosis before any test results are considered. Derived from base rates, referral context, and demographic factors. Forms the starting point in Bayesian diagnostic reasoning.
sample-dependent validity: the principle that accuracy statistics derived from a research sample apply most reliably when the patient under evaluation resembles that sample. Greater demographic or clinical differences between the patient and the derivation sample introduce uncertainty into the validity estimate.
sensitivity: the probability that a test will be positive given that the condition is present. A highly sensitive test misses few true cases.
specificity: the probability that a test will be negative given that the condition is absent. A highly specific test produces few false positives.
References
Arrondo, G., Mulraney, M., Iturmendi-Sabater, I., Musullulu, H., Gambra, L., Niculcea, T., Banaschewski, T., Simonoff, E., Döpfner, M., Hinshaw, S. P., Coghill, D., & Cortese, S. (2024). Systematic review and meta-analysis: Clinical utility of continuous performance tests for the identification of attention-deficit/hyperactivity disorder. Journal of the American Academy of Child and Adolescent Psychiatry, 63(2), 154–171. https://doi.org/10.1016/j.jaac.2023.03.011
Barkley, R. A. (Ed.). (2015). Attention-deficit hyperactivity disorder: A handbook for diagnosis and treatment (4th ed.). Guilford Press.
Epstein, J. N., Willoughby, M., Valencia, E. Y., Tonev, S. T., Abikoff, H. B., Arnold, L. E., & Hinshaw, S. P. (2005). The role of children's ethnicity in the relationship between teacher ratings of attention-deficit/hyperactivity disorder and observed classroom behavior. Journal of Consulting and Clinical Psychology, 73(3), 424–434. https://doi.org/10.1037/0022-006X.73.3.424
Lefler, E. K., Flory, K., Canu, W. H., Willcutt, E. G., & Hartung, C. M. (2021). Unique considerations in the assessment of ADHD in college students. Journal of Clinical and Experimental Neuropsychology, 43(4), 352–369. https://doi.org/10.1080/13803395.2021.1936462
Quinn, P. O., & Madhoo, M. (2014). A review of attention-deficit/hyperactivity disorder in women and girls: Uncovering this hidden diagnosis. Primary Care Companion for CNS Disorders, 16(3). https://doi.org/10.4088/PCC.13r01596
Zhou, X., Taber-Doughty, S., Jin, R., & Youngstrom, E. A. (2018). Efficiency and safety of a Bayesian approach to ADHD diagnosis using the CBCL. Psychological Assessment, 30(12), 1531–1541. https://doi.org/10.1037/pas0000605
About the Author
Zachary Meehan earned his PhD in Clinical Psychology from the University of Delaware and serves as the Clinic Director for the university's Institute for Community Mental Health (ICMH). His clinical research focuses on improving access to high-quality, evidence-based mental health services, bridging gaps between research and practice to benefit underserved communities. Zachary is actively engaged in professional networks, holding membership affiliations with the Association for Behavioral and Cognitive Therapies (ABCT) Dissemination and Implementation Science Special Interest Group (DIS-SIG), the BRIDGE Psychology Network, and the Delaware Project. Zachary joined the staff at Biosource Software to disseminate cutting-edge clinical research to mental health practitioners, furthering his commitment to the accessibility and application of psychological science.
