Whether you are conducting a self-experiment, or tracking some variables in your life simply out of curiosity, eventually you would want to look at the data and examine it for some meaningful patterns. One of the most common research questions is testing differences: you would like to see if a given variable differs with respect to a certain “grouping” aspect, and whether these differences are statistically significant. For instance, you may want to see if you sleep better on the nights after gym workout, or if a certain diet helps you to lose more weight. In this post, I will provide simple step-by-step instructions for conducting difference test that can be easily done in Excel. I promise to keep demonstration basic and as less technical as possible!
For illustrative purposes, I will be using actual data from my current personal tracking project, #fitsperiment. For the past 2 weeks, my fitness regimen included days on which I would bike to and from work, and take a walk after lunch, and days on which I would have a quick morning workout, and then go to the gym during lunch and after work. Based on what my BodyMedia dashboard has been showing so far, I suspect that I burn more calories on “bike & walk” days than on the “gym” days. This is how my data looks (click on image to enlarge).
The average number of calories burned on “gym” days is 3043, versus 3461 burned on average on “bike & walk” days. It looks like I do burn whopping 400 more calories on “bike & walk” days! But is this difference statistically significant? Now, if you remember the Intro Stats test, you will probably suggest the good old t-test, which is readily available in Excel. To which I will respond: bad idea. You see, Quantified-Self data are somewhat different from what you may have encountered in Psychology 101 course:
- data comes from “single subject” experiments (i.e., there is only one participant – you!)
- data is not normally distributed, and in most cases not even symmetric
- number of data points is usually small (n <30)
In order to check for statistical significance, you will need a more robust, non-parametric test, appropriate for single-subject experimental data with small sample sizes. I recommend using non-parametric version of Hedge’s g test, applied to the ranks instead of the actual values (if you like stats and need more technical details, see the end of this post). To calculate g and check if it is statistically significant, follow these steps:
Preparing Essential Components of the Formula.
Step 1a. Combine both groups into one, and calculate ranks for caloric expenditure values using Excel’s RANK function (make sure to rank values in the ascending order). Then split the ranks by groups (click on the image to enlarge):
Step 1b. Add numbers of data points, means (using AVERAGE function in Excel), and standard deviations SD (using STDEV function):
In my example, standard deviations (SDE)and (SDC) are located in cells I16 and J16, respectively. The corresponding (nE)and (nC) are in cells I14 and J14. I ended up with Pooled SD of approximately 1.80.
My g was approximately 1.76 (if you end up with negative sign, ignore it. It only shows direction of the difference, which we already know).
Computing confidence interval.
Step3a. Using the CONFIDENCE function in Excel, compute the following number: = CONFIDENCE(.05,<insert your Pooled SD here>, insert total N here). In my case, this value was 1.18.
Step 3b. Compute the lower and upper bounds of confidence interval, by subtracting and adding the g and the number above:
lower bound = 1.76 – 1.18 = 0.58
upper bound = 1.76 + 1.18 = 2.94
The 95% confidence thus is (0.58;2.94). It tells me that if I replicate my experiment again and again, in 95% cases the number of calories burned, when expressed as a g, will be between 0.58 and 2.94:
The good news is that the zero 0 (no differences) is not in this interval, which means that in 95% I will always burn more calories on “bike & walk” days than on “gym” days. In other words, the difference is statistically significant. Which means I will have to increase the intensity of my gym workouts
PS. If you are into stats, here are some references behind this post:
The standardized means difference as a best choice for reporting effect size in single subject studies was suggested by Olive M.L., Smith B.W. , Effect size calculations and single subject designs. Educational Psychology 2005; 25:313-324.
Applying Cohen’s d formula to rank-transformed data for robustness was described in Schacht, A., Bogaerts K., Bluhmki E, Lesaffre E, A New Nonparametric Approach for Baseline Covariate Adjustment for Two-Group Comparative Studies. Volume 64, Issue 4, pages 1110-1116, December 2008.
Finally, screenshots of formulas for Hedge’s adjustment of Cohen’s d formula were shamelessly lifted from Joseph A. Durlak, How to Select, Calculate, and Interpret Effect Sizes, Journal of Pediatric Psychology, 34,9, pages 917-928 (link here) .