5-Second Science: Wearable Composite Health Scores Require Validation
- BioSource Faculty
- Apr 21

Marco Altini (HRV4Training) is one of our heroes. We look forward to his weekly blog posts (hrv4training@substack.com) and encourage you to subscribe.

His open-access composite health score article, written with Doherty and colleagues (2025), deserves your attention if you use wearables. Composite health scores (CHS) are now a central feature in consumer wearables, integrating multiple physiological metrics into simplified indices labeled “Readiness,” “Recovery,” or “Strain.” Although these scores are marketed as tools to guide physical activity and recovery, their validity remains uncertain.
The review by Doherty et al. (2025) systematically analyzed 14 CHS from 10 leading wearable manufacturers. It found that while the technology aims to simplify complex health data, the underlying algorithms remain proprietary, limiting clinical trust and interpretability. Although we summarize their main findings in this post, we invite you to read their open-access article for yourself.
What Composite Health Scores Measure
CHS generally incorporate a core set of biometric inputs: heart rate variability (HRV, 86% of CHS), resting heart rate (RHR, 79%), physical activity (71%), and sleep metrics (71%). Body temperature (29%), respiratory rate (14%), and blood oxygen saturation appear less frequently. HRV and RHR are typically extracted from photoplethysmography (PPG) signals during sleep. Despite the physiological relevance of these metrics, no manufacturer disclosed how they are algorithmically weighted, nor were any CHS validated against clinical outcomes.
Methodological Inconsistencies and Proprietary Algorithms
Each CHS varies in how it combines metrics over time. Some, like Oura’s Readiness Score or Garmin’s Training Readiness, integrate short-term (e.g., last night’s sleep) and long-term (e.g., 7–28-day trends) data. Others, such as WHOOP’s Recovery, emphasize daily recalibration. Most wearables present CHS on a normalized scale (0–100), often categorized into interpretive zones (e.g., “Optimal,” “Pay Attention”) to help guide behavior. However, no manufacturer publicly defines how inputs like HRV or sleep are weighted within these zones, which undermines clinical utility.
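To see why undisclosed weighting matters, consider a purely hypothetical sketch. The sub-score names, the weights, and the zone cut-offs below are invented for illustration and do not correspond to any manufacturer's algorithm, which, as noted, is not public. The point is only that the same night of data can land in different zones depending on how the inputs are weighted.

```python
def composite_score(sub_scores, weights):
    """Weighted average of sub-scores that are already scaled to 0-100.

    Both arguments are dicts keyed by metric name. The weights are
    hypothetical; no manufacturer discloses its actual weighting.
    """
    total_weight = sum(weights.values())
    weighted_sum = sum(sub_scores[name] * weights[name] for name in weights)
    return weighted_sum / total_weight


def zone(score):
    """Map a 0-100 score to an interpretive zone (cut-offs are assumptions)."""
    if score >= 80:
        return "Optimal"
    if score >= 65:
        return "Good"
    return "Pay Attention"


# One hypothetical night: low HRV sub-score, everything else high.
night = {"hrv": 40.0, "rhr": 90.0, "sleep": 90.0, "activity": 90.0}

# Two made-up weighting schemes (relative weights, not percentages).
hrv_heavy = {"hrv": 11, "rhr": 3, "sleep": 3, "activity": 3}
sleep_heavy = {"hrv": 1, "rhr": 2, "sleep": 5, "activity": 2}

for label, weights in [("HRV-heavy", hrv_heavy), ("sleep-heavy", sleep_heavy)]:
    score = composite_score(night, weights)
    print(f"{label}: {score:.1f} -> {zone(score)}")
```

Under these assumed weights, the HRV-heavy scheme scores the night 62.5 ("Pay Attention") while the sleep-heavy scheme scores it 85.0 ("Optimal"). Without knowing which kind of weighting a device applies, a clinician cannot tell what a given zone actually reflects.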
HRV and RHR: Central but Unevenly Measured
The dominance of HRV and RHR across CHS reflects their physiological importance. RHR indicates cardiovascular efficiency and baseline autonomic tone. HRV, particularly when measured as RMSSD (root mean square of successive differences), reflects parasympathetic modulation. Yet the method of measuring these metrics varies considerably across devices. For instance, WHOOP weights HRV readings during slow-wave sleep, while Fitbit uses the longest sleep period exceeding 3 hours. This inconsistency hampers cross-device comparison and interpretability for clinicians.
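For readers who want to see what RMSSD involves, here is a minimal sketch of the standard calculation: take the differences between successive inter-beat intervals, square them, average, and take the square root. The function name and the sample intervals are illustrative assumptions. Real devices estimate the intervals from PPG and apply artifact correction and sleep-stage selection that this sketch does not attempt.

```python
import math


def rmssd(ibi_ms):
    """Root mean square of successive differences.

    ibi_ms: sequence of inter-beat intervals in milliseconds,
    assumed to be already cleaned of artifacts and ectopic beats.
    """
    if len(ibi_ms) < 2:
        raise ValueError("need at least two inter-beat intervals")
    # Difference between each interval and the one before it.
    diffs = [later - earlier for earlier, later in zip(ibi_ms, ibi_ms[1:])]
    mean_squared = sum(d * d for d in diffs) / len(diffs)
    return math.sqrt(mean_squared)


# Made-up intervals for illustration only.
intervals = [800, 810, 790, 805, 795]
print(f"RMSSD: {rmssd(intervals):.1f} ms")
```

Because the result depends entirely on which intervals are fed in, two devices that sample different portions of the night can report different RMSSD values for the same person.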
Limited Transparency Undermines Clinical Confidence
The lack of algorithmic transparency presents a critical limitation. While some manufacturers (e.g., Polar, Oura) offer theoretical white papers, none provide full validation studies for their CHS. The opaque nature of these scores makes them difficult to evaluate scientifically or to use as dependable clinical tools. Without empirical validation, especially in diverse populations, the utility of CHS as decision-support tools remains speculative.
CHS Applications in Practice
Despite these limitations, wearables with CHS are widely used for self-monitoring. Clients and athletes often use scores to determine training intensity or recovery needs. Some CHS, like WHOOP’s Strain or Samsung’s Energy Score, even include day-to-day guidance. Yet, the lack of individualized thresholds and validated normative data undermines their potential for nuanced feedback. Clinicians should thus regard CHS as a motivational tool rather than a diagnostic instrument.
Implications for Remote Monitoring
CHS have potential in remote patient monitoring, especially in flagging trends in cardiovascular or sleep metrics. However, in the absence of validated thresholds and consistent sensor output, clinicians should interpret such data cautiously. Emphasizing raw metrics such as nightly RHR or RMSSD may be more reliable than composite interpretations.
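As one illustration of working with a raw metric rather than a composite, the sketch below compares each night's value with that person's own trailing baseline. The function name, the 7-night window, the 2-standard-deviation threshold, and the sample data are all assumptions chosen for demonstration. They are not validated clinical cut-offs and are not taken from Doherty et al. (2025).

```python
from statistics import mean, stdev


def flag_deviations(values, window=7, threshold=2.0):
    """Return indices of nights that deviate from the trailing baseline.

    A night is flagged when it lies more than `threshold` sample
    standard deviations from the mean of the previous `window` nights.
    Both parameters are illustrative, not clinically validated.
    """
    flagged = []
    for i in range(window, len(values)):
        baseline = values[i - window:i]
        mu = mean(baseline)
        sd = stdev(baseline)
        # Skip perfectly flat baselines to avoid dividing by zero.
        if sd > 0 and abs(values[i] - mu) / sd > threshold:
            flagged.append(i)
    return flagged


# Made-up nightly resting heart rate in beats per minute.
nightly_rhr = [52, 53, 51, 52, 54, 52, 53, 52, 60, 53]
print(flag_deviations(nightly_rhr))
```

With this sample data, only the night with an RHR of 60 (index 8) is flagged. Unlike a proprietary composite, every step here is visible, so a clinician can see exactly why a night was flagged and adjust the assumptions.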
Challenges to Integration in Clinical Settings
For clinical adoption, CHS must be more transparent, reproducible, and validated. Most are inaccessible to researchers due to paywalls (e.g., Oura Resilience, WHOOP Recovery) or lack of documentation. Integration into electronic health records or patient monitoring platforms is premature without standardization. The emphasis should shift to raw data streams, for which there is more empirical support.
Toward Evidence-Based Algorithms
Doherty et al. call for cross-disciplinary collaboration to improve CHS. This includes developing standardized sensor fusion frameworks, disclosing algorithmic logic, and testing indices across age, sex, and clinical groups. Only then can CHS evolve from commercial wellness features to clinically actionable indices.
Conclusion: Use with Caution, Not Certainty
CHS offer a promising bridge between continuous physiological monitoring and user guidance. However, their lack of transparency and validation demands caution. Clinicians should educate clients about what these scores mean—and do not mean. Until rigorous standards are adopted, the value of CHS lies more in behavioral motivation than medical decision-making.
Key Takeaways
CHS simplify complex physiological data but are built on proprietary, non-validated algorithms.
HRV and RHR are the most common contributors, but measurement methods vary across devices.
Lack of transparency limits CHS usefulness for clinical decision-making or research.
CHS can support behavioral change, but should not replace direct physiological interpretation.
Future standards must include open algorithms and population-level validation to increase clinical trust.
Glossary
accelerometry: measurement of movement and acceleration, often used to assess physical activity and sleep.
composite health score (CHS): an index derived from multiple physiological metrics to estimate states like readiness, recovery, or strain.
heart rate (HR): the number of heartbeats per minute, influenced by activity and autonomic tone.
heart rate variability (HRV): variability in time intervals between heartbeats, reflecting autonomic nervous system balance, especially vagal tone.
photoplethysmography (PPG): optical method of detecting blood volume changes, commonly used to estimate HR and HRV.
readiness score: a CHS intended to quantify how prepared a person is for exertion based on sleep, HRV, and other metrics.
recovery score: a CHS reflecting physiological restoration, often using sleep quality and HRV as inputs.
resting heart rate (RHR): HR measured at rest, typically during sleep, used as a marker of cardiovascular efficiency.
RMSSD: root mean square of successive differences between heartbeats; a standard HRV metric indicative of vagal tone.
strain score: a CHS intended to quantify cumulative exertion, incorporating cardiovascular and activity metrics.
Reference
Doherty, C., Baldwin, M., Lambe, R., Burke, D., & Altini, M. (2025). Readiness, recovery, and strain: An evaluation of composite health scores in consumer wearables. Translational Exercise and Biomedicine. https://doi.org/10.1515/teb-2025-0001