Last month I looked at the sleep data produced by Bodymedia and the Sleep Time app, statistically comparing their sleep quality scores with each other and with my own subjective sleep assessment. In October and the first weeks of November, I replicated the experiment, adding another device, Zeo. With over 30 nights of data, I was finally able to look at four different metrics, side by side, to see how comparable and interchangeable they are. The results will surprise you.
First, let’s take a look at what kind of data is actually available from each of the devices:
*While these numbers are not reported directly, they can be easily estimated by visually examining the dashboard graphics. Each white stripe on the dashboard represents a sleep interruption, and the length of each interruption can be seen by hovering the mouse over the stripe.
As you can see, each of the devices offers a “summary” score that quantitatively reflects the quality of sleep: sleep efficiency for Bodymedia and the Sleep Time app, and the ZQ score for Zeo. This is what all three scores look like when I put them in one table:
Of course, comparing ZQ against the other two metrics using descriptive statistics is like comparing apples and oranges: in calculating ZQ, Zeo takes into consideration total sleep, deep sleep, REM sleep, wake time and the number of times woken up. Sleep efficiency, on the other hand, is just the ratio of time slept to total time spent in bed. Still, if these three metrics measure the same construct (sleep quality), theoretically they should have a monotonic relationship and be at least moderately and positively correlated. In other words, if I slept better today than yesterday, then all three of my scores today should be higher than yesterday, and the change in sleep quality should be reflected to roughly the same degree in all three metrics. To test this theory, I computed Spearman’s rank-order correlations among the three scores:
Contrary to my theoretical expectations, none of the correlations was significant (the numbers in round brackets are 95% confidence intervals)! If you recall, I had similar results last time, comparing Bodymedia and the Sleep Time app, but I expected that bringing in Zeo would add more clarity as to which of those two metrics is more accurate. It is possible that the ZQ score represents a conceptually different aspect of sleep quality than sleep efficiency. But then why is there no concordance between the Bodymedia and Sleep Time scores? Perhaps it has to do with measurement error? The more variables are involved in the calculation, the more measurement error (“noise”) we potentially introduce. But what if we look at just one single variable, common to all three devices? How about total time slept? This is what all three time slept estimates look like when you compare them in terms of descriptive statistics:
Not bad at all! On average, all three total time slept scores are very close. As you can see, the Bodymedia estimates are notably lower, which is interesting. My theory is that Bodymedia’s accelerometer is more sensitive than the iPhone’s and Zeo’s, so it captures even slight movements during sleep. Now let’s take a look at the correlations:
In all three pairs, the associations are positive and of moderate strength. Considering that this is based on a small sample (only 35 nights), I think we can safely assume that when it comes to measuring total time slept, all three metrics are comparable.
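For anyone who wants to run the same kind of check on their own nightly numbers, here is a minimal sketch of Spearman's rank-order correlation with an approximate 95% confidence interval. It is not the script behind the results above: the function names and the sample values are made up for illustration, and the interval uses the Fisher z-transformation, which is only one common approximation and may differ from the method used for the intervals reported in this post.

```python
import math
from statistics import NormalDist, mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # average of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def fisher_ci(rho, n, level=0.95):
    """Approximate confidence interval via the Fisher z-transformation.

    Assumes -1 < rho < 1 and n > 3; a rough approximation for small samples.
    """
    z = math.atanh(rho)
    half_width = NormalDist().inv_cdf(0.5 + level / 2) / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)


# Illustrative hours slept for seven nights; NOT data from this experiment.
tracker_a = [6.1, 7.0, 5.4, 6.8, 7.5, 6.0, 6.6]
tracker_b = [6.5, 7.2, 5.9, 6.6, 7.9, 6.3, 7.0]

rho = spearman_rho(tracker_a, tracker_b)
low, high = fisher_ci(rho, len(tracker_a))
print(f"rho = {rho:.2f}, 95% CI ({low:.2f}, {high:.2f})")
```

A correlation is usually called non-significant at the 5% level when its 95% interval includes zero. With only a few dozen nights the intervals are wide, so moderate correlations can easily fall on either side of that line.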
What if we narrow it down even further and look at the deep and REM sleep estimates? Bodymedia is out, since it does not provide such detailed information. The Sleep Time app combines deep and REM time in one number, so I did the same for the Zeo numbers:
As you can see, the average and median estimates are considerably different, and the correlation is not statistically significant. In other words, I would not call the Zeo and Sleep Time app estimates of deep+REM sleep comparable. Finding out which of these estimates is more reliable and accurate would require additional experiments (I am working on it).
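The combining step is simple enough to sketch: add Zeo's deep and REM minutes night by night, then compare the result with the single number from the Sleep Time app. The values and variable names below are hypothetical and only show the shape of the calculation.

```python
from statistics import mean, median

# Hypothetical nightly values in minutes; NOT data from this experiment.
zeo_deep = [62, 55, 70, 48, 66]
zeo_rem = [95, 110, 88, 102, 120]
sleep_time_deep_rem = [210, 190, 240, 175, 230]  # the app reports one combined number

# Add Zeo's two stages night by night so both series describe the same quantity.
zeo_deep_rem = [deep + rem for deep, rem in zip(zeo_deep, zeo_rem)]

for label, series in [("Zeo", zeo_deep_rem), ("Sleep Time", sleep_time_deep_rem)]:
    print(f"{label}: mean {mean(series):.0f} min, median {median(series):.0f} min")
```

Comparing both the mean and the median helps show whether a gap between devices is consistent across nights or driven by a few outliers.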
Finally, I compared all three metrics to the fourth: my subjective assessment of sleep, computed as the arithmetic average of my responses to two questions, “was the sleep long enough” and “how well did you sleep”, each rated on a 10-point scale (1 = not at all, 10 = extremely). And, just for fun, I included four more variables that do not represent sleep quality directly but are nevertheless expected to be correlated with it: my subjective scores of physical and mental energy, stress and mood in the morning, each measured on a 10-point scale (* marks a statistically significant correlation):
In spite of the non-significant correlations, these results suggest that the total time slept estimates are more useful in terms of predictive validity than the summary scores. The Bodymedia statistics for total time slept were particularly interesting. I plan to continue this validation experiment in December and January, and add some cognitive performance data (reaction time, attention, memory) to the mix.
Overall, it looks like sleep efficiency and ZQ scores measure different “kinds” of sleep quality. Perhaps sleep efficiency is a good way to measure how fast I fall asleep and how uninterrupted my sleep is. ZQ, on the other hand, captures more granular aspects, like deep and REM sleep, and may potentially be a better predictor of cognitive performance, etc.
- the summary sleep quality statistics (ZQ score and sleep efficiency) produced by Zeo, Sleep Time app and Bodymedia tracker were found to be statistically incomparable (at least, for me). It is possible that these metrics reflect different aspects of sleep quality.
- the estimates of total time slept, on the other hand, were very close and correlated across all three methods. I can say with certainty that Zeo, the Sleep Time app and Bodymedia can replace each other for this measurement.
- the analysis of deep+REM sleep estimates by Zeo and the Sleep Time app for accuracy and reliability was inconclusive.
- based on preliminary results, the predictive value of summary statistics was very weak and inconsistent. Again, it looks like sleep efficiency and ZQ measure different “kinds” of sleep quality, and thus, are sensitive to different factors.
- the predictive value of total time slept estimates, on the other hand, was much better across all three tracking methods.