Augmenting Daily Log with Weather Data

Sat, May 30, 2020

I have always been curious whether the weather has any effect on my everyday life. To discover any patterns, I first need to incorporate weather data into my daily logs. In this post, I show the easiest way to do that.

Weather has been shown to influence various aspects of human life, including physical and mental health, productivity, and social behavior. Some of these effects are obvious; others are subtle. Extreme temperature fluctuations have been shown to affect our immune systems. A lot of people (including myself) report sleeping better on nights when it rains or snows. I personally tend to experience mild depression on cloudy and rainy days, while plenty of sunshine usually lifts my mood.

To validate these patterns empirically, I first need to include weather data in my daily logs. Recording the weather by hand is not an option. A much easier solution is to augment my daily logs with historical weather data.

Meet “riem”, an R package that pulls weather data from Automated Surface Observing System (ASOS) stations all over the world by querying the Iowa Environmental Mesonet archive. To get data for my location, I first identify the country/state network and the station closest to me:

# I choose New York ASOS

# since I live in upper Manhattan, but work downtown, I choose NYC station

# reading weather data for NYC
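
A minimal sketch of these three steps, using riem's `riem_networks()`, `riem_stations()`, and `riem_measures()`. The network code `NY_ASOS`, the station id `NYC`, the date range, and the helper name `fetch_weather` are assumptions for illustration; only the name `rawlog` is taken from the pipeline later in this post. The calls need the riem package and an internet connection, so the interactive part is left commented out.

```r
# Sketch only: requires the riem package and an internet connection.
# The default station and dates are illustrative assumptions.
fetch_weather <- function(station = "NYC",
                          date_start = "2020-01-01",
                          date_end = "2020-05-30") {
  riem::riem_measures(
    station = station,
    date_start = date_start,
    date_end = date_end
  )
}

# Interactive use (not run here):
# networks <- riem::riem_networks()                     # all available networks
# stations <- riem::riem_stations(network = "NY_ASOS")  # stations in the New York ASOS network
# rawlog   <- fetch_weather()                           # raw observations for the NYC station
```

Browsing `networks` and `stations` first is the easiest way to confirm the exact codes for your own location before pulling any measurements.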

You can see the full list of variables along with the descriptions here. I am only interested in the following:

  • timestamp (valid)
  • air temperature (tmpf, in °F)
  • relative humidity (relh, in %)
  • pressure (alti, in inches)
  • visibility (vsby, in miles)
  • sky coverage (skyc1; I chose level 1, the lowest layer)
  • weather conditions (wxcodes)

The sky coverage levels capture cloud coverage in terms of “oktas” (1/8 of the sky): CLR (clear), FEW (1-2 oktas), SCT (scattered, 3-4 oktas), BKN (broken, 5-7 oktas), and OVC (overcast, full coverage). I collapse these further to “clear” (clear or few clouds), “cloudy” (scattered or broken), and “overcast”.
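
As a quick illustration of that collapse, here is a small base R sketch. The helper name `collapse_sky` and the sample codes are made up for this example; the pipeline later in the post does the same thing with `grepl()` indicator columns rather than a single label.

```r
# Map sky coverage codes to the three collapsed categories.
# Anything outside the five known codes (including NA) stays NA.
collapse_sky <- function(skyc) {
  out <- rep(NA_character_, length(skyc))
  out[skyc %in% c("CLR", "FEW")] <- "clear"
  out[skyc %in% c("SCT", "BKN")] <- "cloudy"
  out[skyc %in% "OVC"] <- "overcast"
  out
}

sky <- collapse_sky(c("CLR", "FEW", "SCT", "BKN", "OVC", NA))
print(sky)
```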

The weather codes include rain (RA, FZRA, etc.), snow (SN), fog (FG), and a number of other conditions that may not necessarily apply to the NYC area. For simplicity, I keep only rain (counting thunderstorms, TS, as rain) and snow.
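
To show how the codes turn into flags, here is a small sketch with made-up sample values of `wxcodes`. It uses the same `grepl()` patterns as the pipeline in this post; note that `grepl()` returns `FALSE` for missing values, so hours with no reported weather count as neither rain nor snow.

```r
# Illustrative weather-code strings (not real observations)
wxcodes <- c("-RA", "FZRA BR", "SN", "TSRA", "FG", NA)

# 1 if the hour reports rain (or a thunderstorm), 0 otherwise
rain <- as.numeric(grepl("RA|TS", wxcodes))
# 1 if the hour reports snow, 0 otherwise
snow <- as.numeric(grepl("SN", wxcodes))

flags <- data.frame(wxcodes, rain, snow)
print(flags)
```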

Finally, since weather data is reported hourly, I roll it up to the daypart level: AM (midnight-noon), DA (noon-5 pm), and EV (5 pm-midnight). Thus, all numeric metrics become averages, and all 0/1 indicators become the share of hours in which the condition was observed, which I read as probabilities. I also normalize and center all numeric values:

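library(dplyr)
library(lubridate)

# normalize() is not defined in this post; it is assumed to be available from elsewhere
weatherlog <- rawlog %>%
  select(valid, tmpf, relh, alti, vsby, skyc1, wxcodes) %>%
  # riem reports `valid` in UTC; convert to New York time before deriving date and hour
  mutate(valid = with_tz(as.POSIXct(valid, tz = "UTC"), "America/New_York")) %>%
  mutate(dmy = as.Date(valid, tz = "America/New_York"),
         hour = hour(valid)) %>%
  # dayparts: AM (midnight-noon), DA (noon-5 pm), EV (5 pm-midnight)
  mutate(daypart = factor(ifelse(hour < 12, "AM", ifelse(hour >= 17, "EV", "DA")))) %>%
  # collapse sky coverage into three 0/1 indicators
  mutate(clear = as.numeric(grepl("CLR|FEW", skyc1)),
         cloudy = as.numeric(grepl("SCT|BKN", skyc1)),
         overcast = as.numeric(grepl("OVC", skyc1))) %>%
  # precipitation indicators; thunderstorms (TS) count as rain
  mutate(rain = as.numeric(grepl("RA|TS", wxcodes)),
         snow = as.numeric(grepl("SN", wxcodes))) %>%
  # roll hourly rows up to one row per date and daypart
  group_by(dmy, daypart) %>%
  summarise(across(c(tmpf, relh, alti, vsby, clear, cloudy, overcast, rain, snow),
                   ~ mean(.x, na.rm = TRUE)),
            .groups = "drop") %>%
  mutate(across(where(is.numeric), normalize)) %>%
  mutate(across(where(is.numeric), ~ as.numeric(scale(.x, center = TRUE, scale = FALSE)))) %>%
  mutate(across(where(is.numeric), ~ round(.x, 2)))

# The original pipeline ended with filter(LOC_Other < 1) to remove moments outside
# New York City. LOC_Other is a daily-log column, not a weather column, so that
# filter only works once the weather data has been joined to the daily log.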
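
To make the roll-up concrete, here is a toy example in base R. The hourly readings are made up, and the data frame names `hourly` and `rollup` are only for illustration. It shows how averaging a 0/1 rain flag within a daypart yields the share of rainy hours, and what centering does to the temperature column.

```r
# Made-up hourly readings for a single day (hour is local time, 0-23)
hourly <- data.frame(
  hour = c(8, 9, 10, 11, 13, 14, 18, 19),
  tmpf = c(50, 52, 54, 56, 60, 62, 58, 54),
  rain = c(1, 1, 0, 0, 0, 0, 1, 0)
)

# AM: midnight-noon, DA: noon-5 pm, EV: 5 pm-midnight
hourly$daypart <- ifelse(hourly$hour < 12, "AM",
                         ifelse(hourly$hour >= 17, "EV", "DA"))

# Mean per daypart: temperature becomes an average, rain a share of hours
rollup <- aggregate(cbind(tmpf, rain) ~ daypart, data = hourly, FUN = mean)

# Centering subtracts the overall mean, so the column averages to zero
rollup$tmpf_centered <- rollup$tmpf - mean(rollup$tmpf)
print(rollup)
```

In this toy day, it rained in two of the four morning hours, so the AM rain value is 0.5.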

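With the weather rolled up to one row per date and daypart, it can be attached to the daily log by those two keys. The sketch below is an assumption about how that join might look: the `lifelog` table, its values, and the `weather_toy` stand-in are all hypothetical, and only the names `lifeweatherlog`, `happiness`, `dmy`, `daypart`, and `tmpf` appear elsewhere in this post.

```r
# Hypothetical daily log: one row per date and daypart (values are made up)
lifelog <- data.frame(
  dmy = as.Date(c("2020-05-01", "2020-05-01", "2020-05-02")),
  daypart = c("AM", "EV", "AM"),
  happiness = c(0.2, -0.1, 0.4)
)

# Toy stand-in for weatherlog (values are made up)
weather_toy <- data.frame(
  dmy = as.Date(c("2020-05-01", "2020-05-01", "2020-05-01", "2020-05-02")),
  daypart = c("AM", "DA", "EV", "AM"),
  tmpf = c(-0.12, 0.05, -0.03, 0.10)
)

# Left join: keep every daily-log row and attach the matching weather row
lifeweatherlog <- merge(lifelog, weather_toy,
                        by = c("dmy", "daypart"), all.x = TRUE)
print(lifeweatherlog)
```

A left join keeps log entries even when no weather row matches, which makes gaps in the weather data visible as missing values instead of silently dropping log entries.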

Just for fun, here is the chart of my happiness vs average temperature, by daypart:

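library(ggplot2)

# lifeweatherlog holds both happiness (from the daily log) and tmpf (from weatherlog)
ggplot(lifeweatherlog, aes(x = dmy)) +
  geom_line(aes(y = happiness), color = "blue") +   # happiness
  geom_line(aes(y = tmpf), color = "maroon") +      # air temperature
  ggtitle("Average Happiness vs Air Temperature")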

Figure: happiness vs. average temperature