Why “Quantified Self” and “Big Data” Do Not Pair Well


Once in a while (and recently more and more often) I see someone writing or talking about the Quantified Self movement and using the words “Big Data” in the same sentence. I have been expressing my discontent with this “pairing” for quite a while now, and finally decided to write this post. Today I would like to explain how Big Data and Quantified Self, just like chocolate and champagne, do not pair well together. In fact, these two lie on opposite ends of a conceptual spectrum, and should not be confused or mixed with each other. It is my understanding that this confusion comes from the fact that the Quantified Self movement is currently seen from three different perspectives. The first perspective, “n=1”, is that of IT, technology and data folks. The second perspective, “n=you”, is that of marketers and pharmaceutical and health insurance companies. And the third perspective, “n=me”, is that of self-trackers and self-experimenters. It is the “n=me” perspective that represents the very premise and nature of self-tracking and Quantified Self; that is how this movement started, and that is how it should continue. Let me now explain why.

In his novel “The Seven Minutes”, Irving Wallace describes the court trial of a book that was banned for purportedly erotic content. One of the strategies the prosecution pursues rests on the legal notion that if an average person deems the book offensive, it can then be legally recognized as such. So the prosecutors actually put on the witness stand a woman who, according to their scientific advisers, represents the “average” American reader, as defined by a Census-driven statistical profile:

“…Since we are concerned in this trial with a book, we have tested and found that the average reader of books among the average citizens in our communities is a female… She is Caucasian, she is Protestant, she has had at least twelve years of formal education – a decade ago the average woman had had only ten years of education. She is twenty-four years old. She is five feet four inches tall and weighs one hundred thirty pounds. She was married at the age of twenty to a man two years older than herself. She has two children. She and her husband share one car and the same religious faith. She attends church twice a month. Her husband has a manual or a service job, and he earns $17,114 a year. Our average woman resides in an urban area, a city under one hundred thousand in population, which qualifies Oakwood to supply this woman. She has a five-room home worth $11,900. Half of the house is mortgaged… The average woman spends seven hours a day performing her household chores, three of these hours in the kitchen. There you have her, sir. That is an actual profile.”

Needless to say, the strategy did not work. During the questioning it became clear that the reading habits, tastes and even knowledge of literature of this “average” woman do not necessarily reflect those of the general population.

Science has always been concerned with universal patterns: relationships and associations that exist and hold for all people. Thus, its operating principle has been to lump many people into one big group, in the hope that individual differences will cancel each other out and common patterns will emerge. Individual differences, psychological or biological, are treated as noise to be ignored or suppressed. The resulting data clump is then used as a proxy for an “average subject”, whose reactions to the experimental treatment, or observed characteristics, are then generalized to the entire population of interest. This approach seems to work most of the time, but not for everyone. The closer you resemble “Mr. Average” (can one even use gender when speaking of Average?), the better the chances that the treatment will work for you. The farther you are from the middle, the greater the chances that the treatment will be less effective or that you will react with side effects. The farthest 5-10% are left to chance. The long tail = fail, so to speak, no matter how many “control variables” (gender, age, race, income, personality traits, etc.), hierarchical levels or segments we keep adding to our statistical models.
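To make the point concrete, here is a toy sketch (all numbers made up) of how a favorable “average” treatment effect can coexist with a long tail of people the treatment actually harms:

```python
# Hypothetical treatment effects for ten subjects (positive = improvement).
# Eight people improve a little; two in the "long tail" get much worse.
effects = [2, 2, 2, 2, 2, 2, 2, 2, -5, -5]

mean_effect = sum(effects) / len(effects)
share_harmed = sum(1 for e in effects if e < 0) / len(effects)

print(mean_effect)   # 0.6 -> "the treatment works on average"
print(share_harmed)  # 0.2 -> yet one in five is worse off
```

The pooled mean reports a benefit, while the individuals at the far end of the distribution see the opposite; averaging is exactly what hides them.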

Where am I going with this? Well, from the “n=1” and “n=you” perspectives, self-trackers are just single data points, “small data” droplets destined to end up in a big data bucket, with the ultimate goal of generalization. The goal is to get data from many self-trackers, aggregate it, adjust for the “individuality noise”, and mine for “golden nuggets”: solutions to health problems, the best ways to market pills and convey advertising messages, and so on. The problem with this approach is that self-tracking data is not generalizable. People who track and self-experiment in order to address a particular health condition are not representative of the rest of the people with the same condition. By definition, self-trackers differ from other people in mentality, psychological traits, lifestyles, behaviors, etc. So even if we derive a certain pattern from the data of a hundred, a thousand or even five thousand self-trackers with diabetes, that pattern won’t necessarily hold for all other people with diabetes. The “average self-tracker” does not equal the “average person”. Moreover, chances are that this pattern won’t hold equally well (if it holds at all) for the individual self-trackers themselves. The “average self-tracker” does not equal any self-tracker. The “me-factor” that defined each individual, and all those me-related “nuances” of the relationship between X and Y, was removed by means of means (pun intended).

Mathematically speaking, the “n=me” perspective is based on the idea that the data of “n=me” can be used to find the relationship y_me = f_me(x_me). In plain words, I use MY personal data to find that uniquely MY relationship between MY X and MY Y. That link between X and Y may be different for another person, or may not exist at all. So the sole goal of the Quantified Self movement should be developing tools and methods that allow each “me” to collect and analyze “my” data in order to derive insights that are applicable to “me”. Any attempt to aggregate data from different “me”s will result in a pile of poor-quality data that, in turn, will lead to unreliable and non-generalizable results.
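A minimal sketch of this idea, with hypothetical sleep/mood numbers: fit y_me = f_me(x_me) separately for two self-trackers whose individual relationships run in opposite directions, and the pooled “average” fit finds no relationship at all.

```python
def slope(xs, ys):
    """Least-squares slope of y on x (simple linear fit)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var

# Person A: more sleep (x) -> better mood (y).
a_x, a_y = [5, 6, 7, 8, 9], [3, 4, 5, 6, 7]
# Person B: more sleep -> worse mood.
b_x, b_y = [5, 6, 7, 8, 9], [7, 6, 5, 4, 3]

print(slope(a_x, a_y))              # 1.0  (strong positive for A)
print(slope(b_x, b_y))              # -1.0 (strong negative for B)
print(slope(a_x + b_x, a_y + b_y))  # 0.0  (pooled fit erases both)
```

Each f_me is real and strong, yet aggregating the two “me”s produces a flat line: the very relationship each person could act on vanishes in the average.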

There, I said it 🙂

PS: here is an excellent response from Neal at the Urban Mining blog.


5 Responses to Why “Quantified Self” and “Big Data” Do Not Pair Well

  1. Doug says:

    Good point.

    The “basket” of self-trackers doesn’t fit the general population at all.

    I had heart bypass surgery years ago (genetic issues), and remember the surgeon saying two things to me:

    (1) The odds of anything happening here are <5%. But if you're in that 5%, you're screwed.

    (2) My profile was so atypical of the average heart patient that the odds didn't apply regardless. There was no group from which to extract odds. Young, healthy, athletic, no other conditions….

  2. Eric Jain says:

    There are always issues with aggregating data, even from reliable sources… But what’s wrong with using the knowledge that a certain diet tends to work better for people with a certain genotype (or other common characteristics) when deciding which diet to try next? Surely that beats choosing a diet based on what’s in the bestseller list, or based on anecdotal evidence from friends?

  3. I’m not so pessimistic about the potential for big data to provide n=1 insight. For example, how do you even form hypotheses to test in your self experiments? Personally, I look around for things that have worked on average groups in trials or things that other QS/biohackers have tried successfully. If big data allows us to improve our hypothesis set for n=1 experimentation, then this is a benefit.

    Secondly, I think we will have, if not “big”, at least “moderate” data for each of us. We will each be ecosystems of data. Imagine what will be collectible with monitors embedded in our skin and circulatory system. Continuous monitoring of BP, body temp, oxygenation, biomarkers, hormone levels, the sky’s the limit. To use this data well will require some serious firepower, and combining it with others’ will yield insight. We are all different, but we are all human. We have a lot of commonalities too. This type of data will actually help elucidate what those commonalities are and which features are highly variable.

    So I guess I’m optimistic about the potential.

  4. phillip dane says:

    Thanks for sharing an opinion and putting out a position. I, however, must respectfully disagree with the premise. Wallace’s “average citizen” is a much broader definition of a population than a marketer’s interest in the frequency with which the “average” 45-year-old affluent suburban female will replace a pair of running shoes, or an insurer’s interest in the activity levels of an insured population. In these instances, the term big data is certainly applicable. Big data, by most definitions, refers to the “V’s” (Volume, Velocity, Variety, and, increasingly, Veracity) and the combination of these factors, which produces data that cannot be properly analyzed by “traditional” tools and techniques. Since big data lacks a set of universally agreed specific thresholds, I would submit that big data is, at best, a concept, one which is useful in communicating the need for the specific analytic tools you mention. Indeed, the “n=me” group would probably agree with you that better tools are needed to analyze the data coming from each of their experiments and measurements, making even the relatively small volume of data belonging to one individual an example of big data in the aggregate.

  5. Dawn Nafus says:

    Thank you for this post – really interesting. Your distinction between n=me and n=you is especially helpful, I think. What I hear when I read this post isn’t that big data is useless per se, but that it’s not the same thing as n=me. There is a lot of contextualizing and interpretation work that happens in n=me: selecting which data matters, the mindfulness that happens as that data becomes apparent in the moment, etc. As social creatures, the numbers that surround us will always inform how we interpret our own data, but the leap between “personalization” as brought to us by big data and personalization as brought to us by us is a really big leap. One doesn’t magically become the other.

    There is some sociology of science research that suggests the possibility (and this may be a stretch) that “big data” research could also potentially be done in QS-like ways, but again, it isn’t automatic, and requires the researcher to have a certain perspective.
