## Personal Analytics 101: Testing Differences in Your Data

Whether you are conducting a self-experiment or simply tracking some variables in your life out of curiosity, eventually you will want to look at the data and examine it for meaningful patterns. One of the most common research questions is **testing differences**: you would like to see whether a given variable differs with respect to a certain “grouping” aspect, and whether those differences are statistically significant. For instance, you may want to see if you sleep better on nights after a gym workout, or if a certain diet helps you lose more weight. In this post, I will provide simple step-by-step instructions for conducting a difference test that can easily be done in Excel. I promise to keep the demonstration basic and as non-technical as possible!

For illustrative purposes, I will be using actual data from my current personal tracking project, #fitsperiment. For the past 2 weeks, my fitness regimen included days on which I would bike to and from work and take a walk after lunch, and days on which I would have a quick morning workout and then go to the gym during lunch and after work. Based on what my BodyMedia dashboard has been showing so far, I suspect that I burn more calories on “bike & walk” days than on the “gym” days. This is how my data looks (click on the image to enlarge).

**The average number of calories burned on “gym” days is 3043, versus 3461 burned on average on “bike & walk” days**. It looks like I do burn a whopping 400 more calories on “bike & walk” days! But is this difference statistically significant? Now, if you remember your Intro Stats course, you will probably suggest the good old t-test, which is readily available in Excel. To which I will respond: bad idea. You see, Quantified-Self data are somewhat different from what you may have encountered in a Psychology 101 course:

- data comes from “single subject” experiments (i.e., there is only one participant – you!)
- data is not normally distributed, and in most cases not even symmetric
- number of data points is usually small (n < 30)

In order to check for statistical significance, you will need a more robust, non-parametric test appropriate for single-subject experimental data with small sample sizes. I recommend using a non-parametric version of *Hedges’ g*, applied to the ranks instead of the actual values (if you like stats and need more technical details, see the end of this post). To calculate *g* and check whether it is statistically significant, follow these steps:

**Preparing Essential Components of the Formula.**

**Step 1a.** Combine both groups into one and calculate ranks for the caloric expenditure values using Excel’s RANK function (make sure to rank values in ascending order). Then split the ranks back by group (click on the image to enlarge):
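As a sanity check on Step 1a, the same ascending ranking can be sketched in Python. The calorie values below are hypothetical, and this simple version assumes no ties (Excel’s RANK assigns equal values the same rank):

```python
# Hypothetical calorie values, for illustration only (not my actual data).
gym = [2900, 3100, 3000, 3150]
bike_walk = [3400, 3500, 3450, 3550]

# Combine both groups and rank ascending, like Excel's RANK(value, range, 1):
# the smallest value gets rank 1 (this simple version assumes no ties).
combined = sorted(gym + bike_walk)
rank_of = {value: i + 1 for i, value in enumerate(combined)}

# Split the ranks back by group.
gym_ranks = [rank_of[v] for v in gym]
bike_walk_ranks = [rank_of[v] for v in bike_walk]
print(gym_ranks)        # → [1, 3, 2, 4]
print(bike_walk_ranks)  # → [5, 7, 6, 8]
```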

**Step 1b**. Add the numbers of data points, the means (using the AVERAGE function in Excel), and the standard deviations SD (using the STDEV function):

**Computing Hedges’ g.**

**Step 2a.** Using the cells highlighted in light blue and the following formula, compute the pooled SD value:
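The formula screenshot did not survive here; the standard pooled standard deviation it presumably showed is:

```latex
SD_{pooled} = \sqrt{\frac{(n_E - 1)\,SD_E^2 \;+\; (n_C - 1)\,SD_C^2}{n_E + n_C - 2}}
```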

In my example, the standard deviations SD_E and SD_C are located in cells I16 and J16, respectively. The corresponding n_E and n_C are in cells I14 and J14. I ended up with a pooled SD of approximately 1.80.

**Step 2b**. Using the cells highlighted in light red, with the means M_E and M_C in cells I15 and J15 and the total N in cell G14, you can now calculate the value of g:
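This formula screenshot is also missing; the standard Hedges-corrected standardized mean difference (with the small-sample correction that uses the total N from cell G14) is presumably:

```latex
g = \frac{M_E - M_C}{SD_{pooled}} \times \left(1 - \frac{3}{4N - 9}\right)
```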

My g was approximately 1.76 (if you end up with a negative sign, ignore it; it only shows the direction of the difference, which we already know).

**Computing the confidence interval.**

**Step 3a.** Using the CONFIDENCE function in Excel, compute the following number: =CONFIDENCE(0.05, <insert your pooled SD here>, <insert your total N here>). In my case, this value was 1.18.
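Under the hood, Excel’s CONFIDENCE(alpha, sd, n) returns the half-width of a z-based interval, z(1−α/2) · sd / √n. A minimal Python sketch (the n = 9 tracked days below is an assumption for illustration; plug in your own total N):

```python
import math
from statistics import NormalDist

def excel_confidence(alpha, sd, n):
    """Half-width of a z-based confidence interval,
    equivalent to Excel's CONFIDENCE(alpha, sd, n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    return z * sd / math.sqrt(n)

# With my pooled SD of ~1.80 and an assumed total of n = 9 tracked days:
print(round(excel_confidence(0.05, 1.80, 9), 2))  # → 1.18
```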

**Step 3b.** Compute the *lower* and *upper bounds* of the confidence interval by subtracting the number above from g, and adding it to g:

**lower bound = 1.76 – 1.18 = 0.58**

**upper bound = 1.76 + 1.18 = 2.94**

The 95% confidence interval thus is (0.58; 2.94). It tells me that if I replicated my experiment again and again, in 95% of cases the difference in calories burned, when expressed as a g, would be between 0.58 and 2.94.

The good news is that zero (no difference) is not in this interval, which means that in 95% of cases I will burn more calories on “bike & walk” days than on “gym” days. In other words, the difference is statistically significant. Which means I will have to increase the intensity of my gym workouts!
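For readers who prefer code to spreadsheets, the whole procedure can be pulled together in one compact Python sketch. The calorie values are made up for illustration; AVERAGE and STDEV map to statistics.mean and statistics.stdev, and the simple ranking assumes no ties:

```python
import math
from statistics import mean, stdev, NormalDist

# Hypothetical calorie values for the two kinds of days (illustration only).
gym = [2950.0, 3020.0, 3080.0, 3110.0, 2990.0]
bike_walk = [3380.0, 3420.0, 3510.0, 3460.0]

# Step 1: rank the combined values ascending (no ties assumed), split back.
combined = sorted(gym + bike_walk)
rank_of = {v: i + 1 for i, v in enumerate(combined)}
r_gym = [rank_of[v] for v in gym]
r_bw = [rank_of[v] for v in bike_walk]

# Step 2: pooled SD and Hedges' g, computed on the ranks.
n_e, n_c = len(r_bw), len(r_gym)
sd_pooled = math.sqrt(((n_e - 1) * stdev(r_bw) ** 2 +
                       (n_c - 1) * stdev(r_gym) ** 2) / (n_e + n_c - 2))
N = n_e + n_c
g = (mean(r_bw) - mean(r_gym)) / sd_pooled * (1 - 3 / (4 * N - 9))

# Step 3: confidence interval, using the CONFIDENCE-style half-width.
half = NormalDist().inv_cdf(0.975) * sd_pooled / math.sqrt(N)
print(f"g = {g:.2f}, 95% CI = ({g - half:.2f}, {g + half:.2f})")
```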

**PS. If you are into stats, here are some references behind this post:**

The standardized mean difference as the best choice for reporting effect size in single-subject studies was suggested in Olive, M.L., & Smith, B.W., *Effect size calculations and single subject designs*. Educational Psychology, 2005; 25: 313-324.

Applying Cohen’s d formula to rank-transformed data for robustness was described in Schacht, A., Bogaerts, K., Bluhmki, E., & Lesaffre, E., *A New Nonparametric Approach for Baseline Covariate Adjustment for Two-Group Comparative Studies*. Volume 64, Issue 4, pages 1110-1116, December 2008.

Finally, the screenshots of the formulas for Hedges’ adjustment of Cohen’s d were shamelessly lifted from Durlak, J.A., *How to Select, Calculate, and Interpret Effect Sizes*. Journal of Pediatric Psychology, 34(9), pages 917-928.

I don’t think that QS data is much different from other data:

1. all experimental data has n = 1 in dozens of ways, such as n = 1 school or n = 1 company or n = 1 month, etc. Why you think n = 1 person is importantly different from n = 1 school or n = 1 whatever is unclear.

2. almost all data is non-normal and asymmetric.

3. A QS dataset may be large or small. Lots of non-QS data involves small datasets.

When faced with non-normal data, I usually transform the data to near-normality and then use parametric tests. There are many advantages to working with near-normal data (e.g., more informative graphs), not just ease of statistical testing.

You omit the most important data analysis step: plot the data.

Hello Dr. Roberts,

Thanks a lot for your comment! What I was trying to achieve with this post is to describe a test that would be appropriate in any conditions, regardless of the data; hence the choice of a non-parametric test, and the Hedges adjustment (which won’t affect results if the sample size is large, but will be helpful in the case of small n). I guess the choice of data for the example was a bit misleading; I should have chosen one of the attitudinal scales (e.g., RPE of workouts, or mood) to demonstrate the flexibility of a non-parametric test. But I completely agree, transformation to normality improves the visual representation of the data. I will definitely cover ways to transform distributions in future posts. I also love your use of plots in your papers, and will try to include some plots in the future!

This is an informative and well-written post. If I were teaching statistics I’d make my students read it, especially the description of the confidence interval – there’s a lot of misunderstanding around that.

It also shows the value of BodyMedia. I am planning on getting one and I will do so through your referral post.

I would be curious to see what type of exercise is most efficient. For example, look at the ratio of calories burned per day to time spent exercising that day.

Love the blog, keep up the good work, many people will benefit from reading it. I know I have already benefited from it. This blog will definitely influence the decisions I make.

Thank you, Jay, I am glad you liked the post. Yes, I am not a big fan of the old-fashioned p-value. The confidence interval is a much more informative way to both test and gauge the effect size.

I highly recommend BodyMedia, and I like your way of thinking. In fact, that is the kind of experiment I was conducting myself: trying to test the efficiency of exercises. You can’t really isolate the caloric expenditure of individual exercises, but if you organize them into routines (sets of exercises), then you can really see the differences on the BodyMedia dashboard. That’s how I ended up switching from weight lifting to plyometric exercises. And don’t forget subjective measures like RPE (rated perceived exertion); I am almost done collecting data and will soon be looking at the correlations between RPE and the actual number of calories burned. Your idea of looking at the ratio of calories burned to time spent exercising that day sounds intriguing, and you can get all that information from the BodyMedia dashboard, too.

The less statistics the better, no?