Creating publication quality graphics using R


As part of a one-day workshop, I have developed an online tutorial on how to create publication quality graphics using R (from an academic point of view).

The tutorial can be found here

http://teachpress.environmentalinformatics-marburg.de/2013/07/creating-publication-quality-graphs-in-r-7/

As mentioned in the tutorial, feel free to send me any feedback, criticism, general comments or bug reports.

Enjoy,

Tim

Btw, the entire tutorial was created using R Markdown and knitr. The .Rmd file can be found here:

https://github.com/tim-salabim/metvurst/blob/master/markdown/20130617_data_vis_workshop.Rmd


metvurst now a package, repository moved to GitHub

Inspired by a post on PirateGrunt, I finally managed to pack metvurst up and turn it into a proper R package (the fact that I’m on holiday and have some time also helped). As a side effect of this, the repository has been moved from Google Code to GitHub. As I use RStudio for developing R code, this shift seemed inevitable, as the integration of git in the package development tools in RStudio is very handy.

In order to install metvurst you need to have devtools. Using the install_github() function from devtools, you can easily install and load metvurst:

library(devtools)
## current devtools expects the repository as "username/repo"
install_github("tim-salabim/metvurst")
library(metvurst)

I have tried it on Linux and Mac so far, so in case there are any problems on Windows, please let me know (a quick note if indeed it does work on Windows would be appreciated too).

For now, the two core functions strip() and windContours() along with some helper functions (mainly to convert wind speed and direction to u and v components and vice versa) are included.
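The conversion between speed/direction and u/v components mentioned above can be sketched as follows. This is only a minimal illustration under the usual meteorological convention (direction is where the wind comes from, in degrees clockwise from north). The function names wind_to_uv() and uv_to_wind() are made up for this example and are not necessarily the names or the implementation used in metvurst.

```r
## Hypothetical helpers for illustration only (not the metvurst API)

## speed/direction -> u (eastward) and v (northward) components
wind_to_uv <- function(ws, wd) {
  rad <- wd * pi / 180
  data.frame(u = -ws * sin(rad), v = -ws * cos(rad))
}

## u/v components -> speed and direction (0-360 degrees, "coming from")
uv_to_wind <- function(u, v) {
  data.frame(ws = sqrt(u^2 + v^2),
             wd = (atan2(-u, -v) * 180 / pi) %% 360)
}

## northerly, westerly and south-easterly winds
uv <- wind_to_uv(ws = c(5, 5, 3), wd = c(0, 270, 135))
back <- uv_to_wind(uv$u, uv$v)
print(round(cbind(uv, back), 2))
```

A northerly wind (wd = 0) blows towards the south, so its v component is negative. The round trip through uv_to_wind() recovers the original speed and direction up to floating point error.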

The package is fully functional but there is no documentation for now. I will progressively add and update documentation manuals over the next few weeks (maybe months, depending on how busy I am once I return to work).

Have fun using metvurst and in case you have any questions, suggestions or critique don’t hesitate to get in touch.

Cheers

TimSalabim


resizing plot panels to fit data distribution

I am a big fan of lattice/latticeExtra. In fact, nearly all visualisations I have produced so far make use of these great packages. The possibilities for customisation are endless and the amount of flexibility they provide is especially valuable for producing visualisations in batch mode/programmatically.

Today I needed to visualise some precipitation data for a poster presentation of climate observations at Mt. Kilimanjaro. I wanted to show monthly precipitation observations in relation to long term mean monthly precipitation in order to show which months have been particularly wet or dry.
The important point here is that when we combine two different visualisations of the same data, we need to make sure they are directly comparable. This means that the scales of the absolute rain amounts and the deviations need to be similar, so we can get an instant impression of the deviation in relation to the absolute amounts.

Here's what I've done with latticeExtra (using mock data):

First, we need some (semi-) random data.

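## LOAD PACKAGES
library(latticeExtra, quietly = TRUE)
## brewer.pal() lives in RColorBrewer; load it explicitly in case
## latticeExtra does not attach it
library(RColorBrewer)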

## CREATE MOCK DATA
# precipitation long term mean
pltmean <- 800
# precipitation long term standard deviation
pltsd <- 200
# precipitation observations
pobs <- rnorm(12, pltmean, pltsd)
# precipitation deviation from long term mean
pdev <- rnorm(12, 0, 150)
# months
dates <- 1:12
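
In the mock data above the deviations are drawn independently of the observations. With real data they would be derived from the observations and the long-term monthly means. A minimal sketch of that step, where the long-term means are made-up numbers and the names p_ltm, p_obs and p_dev are only used for this illustration:

```r
set.seed(42)  # fixed seed, only so the example is repeatable

## hypothetical long-term monthly means [mm] (made-up numbers)
p_ltm <- c(60, 80, 150, 350, 250, 40, 20, 20, 20, 40, 120, 90)

## mock observations for one year, scattered around those means
p_obs <- pmax(0, rnorm(12, mean = p_ltm, sd = 0.25 * p_ltm))

## deviation of each month from its long-term mean
## (positive = wetter than usual, negative = drier than usual)
p_dev <- p_obs - p_ltm
print(round(p_dev))
```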

Then we calculate the panel heights to be relative to the (precipitation) data distribution. This is crucial because we want the deviation data to be directly comparable to the observed values.

## CALCULATE RELATIVE PANEL HEIGHTS
y.abs <- max(abs(pobs))
y.dev <- range(pdev)[2] - range(pdev)[1]
yy.aspect <- y.dev/y.abs
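
To see why this ratio makes the two panels comparable, here is a small self-contained check (base R only, with a fixed seed so the numbers are repeatable). It ignores the small padding that lattice adds to the axis limits, so it is an approximation of what ends up on the page rather than an exact account.

```r
set.seed(1)
pobs <- rnorm(12, 800, 200)   # mock observations
pdev <- rnorm(12, 0, 150)     # mock deviations

y.abs <- max(abs(pobs))       # data extent of the "observed" panel (bars start at 0)
y.dev <- diff(range(pdev))    # data extent of the "deviation" panel
yy.aspect <- y.dev / y.abs

## relative panel heights as later passed to resizePanels()
heights <- c(1, yy.aspect)

## millimetres of rain represented by one unit of panel height
mm_per_height <- c(y.abs, y.dev) / heights
print(mm_per_height)
```

Both entries of mm_per_height are the same, which means a bar of a given length stands for the same amount of rain in either panel.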

Then, we create the bar charts as objects.

## COLOUR
clrs <- rev(brewer.pal(3, "RdBu"))

## CREATE THE PLOT OBJECTS
abs <- barchart(pobs ~ dates, horizontal = FALSE, strip = FALSE, origin = 0,
                between = list(y = 0.3),
                ylab = "Precipitation [mm]", xlab = "Months", col = clrs[1])

dev <- barchart(pdev ~ dates, horizontal = FALSE, origin = 0, 
                col = ifelse(pdev > 0, clrs[1], clrs[length(clrs)]))

Now, we combine the two plot objects into one and also create strips to be plotted at the top of each panel with labels providing some detail about the respective panel.

## COMBINE PLOT OBJECTS INTO ONE AND CREATE CUSTOM STRIPS FOR LABELLING
out <- c(abs, dev, x.same = TRUE, y.same = FALSE, layout = c(1,2))
out <- update(out, scales = list(y = list(rot = 0)), 
              strip = strip.custom(bg = "grey40", 
                                   par.strip.text = list(col = "white", 
                                                         font = 2),
                                   strip.names = FALSE, strip.levels = TRUE, 
                                   factor.levels = c("observed", 
                                                     "deviation from long term monthly mean")))

As a final step, we re-size the panels according to the panel heights calculated earlier.

## RESIZE PANELS RELATIVE TO DATA DISTRIBUTION
out <- resizePanels(out, h = c(1,yy.aspect), w = 1)

And this is what the final product looks like.

## PRINT PLOT
print(out)

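[Figure: two stacked bar-chart panels, "observed" and "deviation from long term monthly mean", with panel heights scaled to the data]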

Note: I suggest you rerun this example a few times to see how the relative panel sizes change with the data distribution (which is randomly created during each run). This highlights the usefulness of such an approach for batch visualisations.

sessionInfo()
## R version 2.15.3 (2013-03-01)
## Platform: x86_64-pc-linux-gnu (64-bit)
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=C                 LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] grid      parallel  stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] gridBase_0.4-6      abind_1.4-0         fields_6.7         
##  [4] spam_0.29-2         reshape_0.8.4       plyr_1.8           
##  [7] latticeExtra_0.6-19 lattice_0.20-13     RColorBrewer_1.0-5 
## [10] RWordPress_0.2-3    rgdal_0.8-5         raster_2.0-41      
## [13] sp_1.0-5            knitr_1.1          
## 
## loaded via a namespace (and not attached):
## [1] digest_0.6.3   evaluate_0.4.3 formatR_0.7    markdown_0.5.4
## [5] RCurl_1.95-3   stringr_0.6.2  tools_2.15.3   XML_3.95-0    
## [9] XMLRPC_0.2-5

visualising diurnal wind climatologies

In this post I want to highlight the second core function of the metvurst repository (https://github.com/tim-salabim/metvurst):

The windContours function

It is intended to provide a compact overview of the wind field climatology at a location and plots wind direction and speed as a function of the hour of day.

  • direction is plotted as frequencies of occurrences
  • speed is represented by a box plot
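
The two summaries in the list above can be sketched with base R and mock data. This is not the code inside windContours; in particular, the 45 degree sectors and the normalisation per hour are assumptions made for this illustration.

```r
set.seed(123)
n  <- 24 * 200
hr <- rep(sprintf("%02d", 0:23), times = 200)   # hour of day as "00" ... "23"
wd <- runif(n, 0, 360)                          # mock wind direction [degrees]
ws <- rgamma(n, shape = 2, rate = 0.5)          # mock wind speed [m/s]

## direction: frequency of occurrence per hour and 45 degree sector, in percent
sector <- cut(wd, breaks = seq(0, 360, by = 45),
              right = FALSE, include.lowest = TRUE)
counts <- table(hour = hr, sector = sector)
freq   <- 100 * prop.table(counts, margin = 1)  # each hour sums to 100 %

## speed: box plot statistics per hour (5 rows: whisker, hinge, median, hinge, whisker)
ws_box <- boxplot(ws ~ hr, plot = FALSE)

print(round(freq[1:3, ], 1))
print(round(ws_box$stats[, 1:3], 2))
```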

The classical approach for visualising wind direction and speed at a location is to plot a wind rose, where a stacked barplot of speeds is plotted in a circular coordinate system showing the direction. As an example of a wind rose we take the one from the Wikipedia page for “wind rose”.

[Figure: example wind rose from the Wikipedia page for “wind rose”]

This kind of representation is very useful for getting a general sense of the wind field climatology at a location.
You will be able to quickly get information along the lines of “the x most prevailing wind directions are from here, there and sometimes also over there”. Furthermore, you will also know which of these tends to produce windier conditions. So far, so good.

If you want to learn about possible diurnal flow climatologies, however, there is no way of obtaining this information straight from the wind rose. The common way to visualise such dynamics with wind roses is to plot individual roses for specific periods (e.g. hourly or 3-hourly roses).

Getting information on diurnal climate dynamics is especially important in regions of complex terrain or for coastal locations, where diurnally reversing wind flow patterns are a major climatic feature.

The windContours function is able to deliver such information at a glance.
Consider the case from the last post, where we used hourly meteorological information from a coastal station in Fiji.

First, we (again) read the data from the web at BOM (Australian Bureau of Meteorology).

## LOAD metvurst AND RColorBrewer (FOR COLOR CREATION)
library(metvurst)
library(RColorBrewer)

## SET URL FOR DATA DOWNLOAD
url <- "http://www.bom.gov.au/ntc/IDO70004/IDO70004_"

## YEARS TO BE DOWNLOADED
yr <- 1993:2012

## READ DATA FOR ALL YEARS FROM URL INTO LIST
fijilst <- lapply(seq(yr), function(i) {
 read.csv(paste(url, yr[i], ".csv", sep = ""), na.strings = c(-9999, 999))
})

## TURN LIST INTO COMPLETE DATAFRAME AND CONVERT NA STRINGS TO NAs
fiji <- do.call("rbind", fijilst)
fiji[fiji == -9999] <- NA
fiji[fiji == 999] <- NA

## CREATE POSIX DATETIME AND CONVERT UTC TO LOCAL FIJI TIME
dts <- as.POSIXct(strptime(fiji$Date...UTC.Time,
 format = "%d-%b-%Y %H:%M")) + 12 * 60 * 60

Next, we need to create a vector of the hours of the recordings in order to plot our hourly climatologies.

## CREATE CONDITIONING VARIABLE (IN THIS CASE HOUR)
## format() returns the two-digit hour even where as.character() would drop a midnight time
hr <- format(dts, "%H")
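
The time handling above can be tried out in isolation. The sketch below uses two made-up timestamps in the same layout as the BOM files, parses them explicitly as UTC and applies the fixed 12 hour offset from the post. It assumes an English locale for the month abbreviations (%b) and, like the post, ignores any daylight saving time.

```r
## two made-up timestamps in the layout of the BOM files (UTC)
utc_txt <- c("01-Jan-1993 13:00", "15-Jun-2005 00:30")

## parse explicitly as UTC so the result does not depend on the machine's time zone
utc <- as.POSIXct(utc_txt, format = "%d-%b-%Y %H:%M", tz = "UTC")

## fixed +12 h offset to local Fiji time (no daylight saving time considered)
local <- utc + 12 * 60 * 60

## hour of day as a two-character string
hr <- format(local, "%H", tz = "UTC")
print(hr)
```

Using format() with "%H" avoids counting character positions in the printed date-time, which can go wrong when the time part is dropped for midnight values.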

Now, we can use the windContours function straight away to plot the hourly wind direction frequencies and corresponding wind speeds for the observation period 1993 – 2012.
Note: for reproducibility I have posted the function on my personal uni web page, but it is recommended that you get the function from the metvurst repository on GitHub.

windContours(hour = hr,
             wd = fiji$Wind.Direction,
             ws = fiji$Wind.Speed,
             keytitle = "hourly wind frequencies [%]")

[Figure: windContours plot of hourly wind direction frequencies and wind speeds, 1993 – 2012]

Just as we would with a wind rose, we easily see that there are 3 main prevailing wind directions:

  1. north north-east
  2. south-east
  3. west north-west

However, we straight away realise that there is a distinct diurnal pattern in these directions with 1. and 2. being nocturnal and 3. denoting the daytime winds. This additional information is captured at a glance.

It is possible to refine the graph in a lot of ways.
In the next few figures I want to highlight a few of the flexibilities that windContours provides:

In order to adjust the x-axis of the speed plot, use the speedlim = parameter

windContours(hour = hr,
             wd = fiji$Wind.Direction,
             ws = fiji$Wind.Speed,
             speedlim = 12,
             keytitle = "hourly wind frequencies [%]")

[Figure: windContours plot with speedlim = 12]

In case you want to change the density of the contour lines you can use spacing =

windContours(hour = hr,
             wd = fiji$Wind.Direction,
             ws = fiji$Wind.Speed,
             speedlim = 12,
             keytitle = "hourly wind frequencies [%]",
             spacing = .5)

[Figure: windContours plot with spacing = .5]

You can also provide colors through the colour = parameter

windContours(hour = hr,
             wd = fiji$Wind.Direction,
             ws = fiji$Wind.Speed,
             speedlim = 12,
             keytitle = "hourly wind frequencies [%]",
             colour = rev(brewer.pal(11, "Spectral")))

[Figure: windContours plot with a reversed "Spectral" colour palette]

You can adjust the number of cuts to be made using ncuts =

windContours(hour = hr,
             wd = fiji$Wind.Direction,
             ws = fiji$Wind.Speed,
             speedlim = 12,
             keytitle = "hourly wind frequencies [%]",
             colour = rev(brewer.pal(11, "Spectral")),
             ncuts = .5)

[Figure: windContours plot with ncuts = .5]

Apart from the design, you can also adjust the position of the x-axis so that the plot is centered north (default is south) using the centre = parameter (allowed values are: “S”, “N”, “E”, “W”)

windContours(hour = hr,
             wd = fiji$Wind.Direction,
             ws = fiji$Wind.Speed,
             speedlim = 12,
             keytitle = "hourly wind frequencies [%]",
             colour = rev(brewer.pal(11, "Spectral")),
             centre = "N")

[Figure: windContours plot centred on north]

There’s also the possibility to provide an additional variable using add.var = in order to examine the relationship between winds and some other parameter of interest (e.g. concentrations of some pollutant or precipitation or anything really). In doing so, the contours will remain as before (representing wind direction frequencies) and the (colour-)filled part of the left panel will be used to represent the additional variable.
Here, we’re using air temperature (don’t forget to change the keytitle), where we can easily see that cold air mainly comes from the south (extra-tropical air masses) and the highest temperatures are observed when air flow is from northerly directions (tropical air masses).

windContours(hour = hr,
             wd = fiji$Wind.Direction,
             ws = fiji$Wind.Speed,
             speedlim = 12,
             keytitle = "Temperature [°C]",
             colour = rev(brewer.pal(11, "Spectral")),
             add.var = fiji$Air.Temperature)

[Figure: windContours plot with air temperature as the additional variable]

In case you want to change the smoothing of the contours you can use the smooth.contours = parameter.
The same applies to the (colour-)filled area using smooth.fill =

windContours(hour = hr,
             wd = fiji$Wind.Direction,
             ws = fiji$Wind.Speed,
             speedlim = 12,
             keytitle = "Temperature [°C]",
             colour = rev(brewer.pal(11, "Spectral")),
             add.var = fiji$Air.Temperature,
             smooth.contours = .8,
             smooth.fill = 1.5)

[Figure: windContours plot with adjusted contour and fill smoothing]

In the future I might present some analytical post making use of the windContours function to highlight its usefulness in climate analysis.
For now, I refer the interested reader to Appelhans et al. (2012), where I present the seasonal wind climatology of Christchurch, New Zealand, and also analyse general diurnal patterns of air pollution using windContours.

sessionInfo()
## R version 2.15.3 (2013-03-01)
## Platform: x86_64-pc-linux-gnu (64-bit)
##
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
##  [7] LC_PAPER=C                 LC_NAME=C
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] grid      parallel  stats     graphics  grDevices utils     datasets
## [8] methods   base
##
## other attached packages:
##  [1] gridBase_0.4-6      abind_1.4-0         fields_6.7
##  [4] spam_0.29-2         reshape_0.8.4       plyr_1.8
##  [7] latticeExtra_0.6-19 lattice_0.20-13     RColorBrewer_1.0-5
## [10] RWordPress_0.2-3    rgdal_0.8-5         raster_2.0-41
## [13] sp_1.0-5            knitr_1.1
##
## loaded via a namespace (and not attached):
## [1] digest_0.6.3   evaluate_0.4.3 formatR_0.7    markdown_0.5.4
## [5] RCurl_1.95-3   stringr_0.6.2  tools_2.15.3   XML_3.95-0
## [9] XMLRPC_0.2-5

visualising large amounts of hourly environmental data

It is Sunday, it's raining and I have a few hours to spend before I am invited for lunch at my parents' place. Hence, I thought I'd use the time to produce another post. It has been a while since the last post as I have been in Africa for about two months for yet another stint of fieldwork on the great Kilimanjaro mountain…

This time I want to introduce the strip() function, which is part of the METVURST suite for plotting meteorological/climatological data (available from https://github.com/tim-salabim/metvurst).

This function is intended to facilitate two things:

  1. to enable plotting of large (climatological) data sets at hourly resolution (the data need to be hourly observations) in a reasonably well defined space (meaning that you won't need pages and pages of paper to print the results).
  2. to display the data in a way that interpretation is possible from hourly to decadal time scales.

The function is much like the calendarHeat() function from Revolution Analytics (see here) though, as mentioned above, it produces plots of hourly data. In fact, calendarHeat() would be a good alternative to use when you intend to plot daily values.
strip() is implemented in lattice (needs latticeExtra to be more precise) and essentially plots a levelplot of hour of day (y-axis) vs. day of year (x-axis).
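
The data layout behind such a plot can be sketched in base R: one value per combination of hour of day and day of year. The sketch below uses one year of mock hourly temperatures and is only meant to show the reshaping idea; it is not the code used inside strip().

```r
## one (non-leap) year of hourly time stamps
dts  <- seq(as.POSIXct("2011-01-01 00:00", tz = "UTC"),
            by = "hour", length.out = 365 * 24)
hour <- as.integer(format(dts, "%H"))   # 0 ... 23
doy  <- as.integer(format(dts, "%j"))   # 1 ... 365

## mock temperature with a seasonal and a diurnal cycle
temp <- 25 + 3 * cos(2 * pi * (doy - 30) / 365) +
  4 * sin(2 * pi * (hour - 9) / 24)

## 24 x 365 matrix: rows = hour of day, columns = day of year
mat <- tapply(temp, list(hour = hour, doy = doy), mean)
print(dim(mat))

## with lattice installed, lattice::levelplot(t(mat)) would draw
## day of year on the x-axis and hour of day on the y-axis
```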

In detail the function takes the following parameters:

  • x (numeric): Object to be plotted (e.g. temperature).
  • date (character or POSIXct): Date(time) of the observations. Format must be 'YYYY-MM-DD hh:mm(:ss)'
  • fun: The function to be used for aggregation to hourly observations (if original is of higher frequency). Default is mean.
  • cond (factor): Conditioning variable e.g. years.
  • arrange (character): One of "wide" or "long". Defaults to "long" which provides a layout that is easier to interpret, however, "wide" is better for use in presentation slides.
  • colour (character): a vector of color names. Defaults to rev(brewer.pal(11, "Spectral"))
  • …: Further arguments to be passed to levelplot (see ?lattice::levelplot for options). This can be quite handy to set the legend title using main = "YOUR TEXT HERE" (the legend is plotted on top of the actual plot).
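
What the fun argument is described as doing, aggregating higher-frequency records to hourly values, can be illustrated with a small base R sketch. The helper name aggregate_hourly() is made up for this example and the approach (grouping by a "date plus hour" label) is an assumption, not necessarily how strip() does it.

```r
## hypothetical helper: aggregate a series to hourly values with a chosen function
aggregate_hourly <- function(x, dts, fun = mean) {
  hour_id <- format(dts, "%Y-%m-%d %H")   # one label per calendar hour
  tapply(x, hour_id, fun)
}

## two days of mock 10-minute data
dts <- seq(as.POSIXct("2012-03-01 00:00", tz = "UTC"),
           by = "10 min", length.out = 6 * 48)
x <- seq_along(dts)

hourly_mean <- aggregate_hourly(x, dts)             # default: mean
hourly_max  <- aggregate_hourly(x, dts, fun = max)  # any summary function works
print(head(hourly_mean, 3))
```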

Here's an example using hourly data from Fiji (Station is at Lautoka on the west coast). The data can be freely accessed through the Bureau of Meteorology in Australia (BOM).

First we need to download the data and do some reshaping

## LOAD RColorBrewer FOR COLOR CREATION
library(RColorBrewer)

## SET URL FOR DATA DOWNLOAD
url <- "http://www.bom.gov.au/ntc/IDO70004/IDO70004_"

## YEARS TO BE DOWNLOADED
yr <- 1993:2012

## READ DATA FOR ALL YEARS FROM URL INTO LIST
fijilst <- lapply(seq(yr), function(i) {
  read.csv(paste(url, yr[i], ".csv", sep = ""), na.strings = c(-9999, 999))
  })

## TURN LIST INTO COMPLETE DATAFRAME AND CONVERT NA STRINGS TO NAs
fiji <- do.call("rbind", fijilst)
fiji[fiji == -9999] <- NA
fiji[fiji == 999] <- NA

## CREATE POSIX DATETIME AND CONVERT UTC TO LOCAL FIJI TIME
dts <- as.POSIXct(strptime(fiji$Date...UTC.Time, 
                  format = "%d-%b-%Y %H:%M")) + 12 * 60 * 60

## CREATE CONDITIONING VARIABLE (IN THIS CASE YEAR)
year <- substr(as.character(dts), 1, 4)

Now, let's plot the temperatures

## SOURCE FUNCTION (ALSO AVAILABLE AT https://github.com/tim-salabim/metvurst)
source("http://www.staff.uni-marburg.de/~appelhat/r_stuff/strip.R")

## PLOT STRIP FOR TEMPERATURE
strip(x = fiji$Air.Temperature, 
      date = dts,
      cond = year,
      arrange = "long",
      main = "Temperature")
## 
##  Module   :  strip 
##  Author   :  Tim Appelhans <tim.appelhans@gmail.com>, Thomas Nauss 
##  Version  :  2012-01-06 
##  License  :  GNU GPLv3, see http://www.gnu.org/licenses/

[Figure: strip plot of hourly air temperature, one panel per year]

We can clearly see both the diurnal and the seasonal signal.
Looking at other parameters such as wind direction and speed should also give an interesting insight into the seasonal and diurnal dynamics of the location.

I leave it up to you to explore this further…

I am off to a nice Sunday roast now! yumyum

sessionInfo()
## R version 2.15.3 (2013-03-01)
## Platform: x86_64-pc-linux-gnu (64-bit)
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=C                 LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] grid      parallel  stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] reshape_0.8.4       plyr_1.8            latticeExtra_0.6-19
##  [4] lattice_0.20-13     RColorBrewer_1.0-5  RWordPress_0.2-3   
##  [7] rgdal_0.8-5         raster_2.0-41       sp_1.0-5           
## [10] knitr_1.1          
## 
## loaded via a namespace (and not attached):
## [1] digest_0.6.3   evaluate_0.4.3 formatR_0.7    markdown_0.5.4
## [5] RCurl_1.95-3   stringr_0.6.2  tools_2.15.3   XML_3.95-0    
## [9] XMLRPC_0.2-5

reading raster data using library(parallel)

Recently, I have been doing some analysis for a project I am involved in. In particular, I was interested in what role Pacific sea surface temperatures play with regard to rainfall in East Africa. I'll spare you the details as I am currently writing all this up into a paper, which you can have a look at once it is published.

For this analysis, however, I am processing quite an amount of raster files. This led me to investigate the possibilities of the parallel package to speed up the process.

Here's a quick example of how to read in raster data (in this case 460 global sea surface temperature files at 1° x 1° resolution) using parallel.

First, let's do it the conventional way and see how long that takes:

library(raster)
library(rgdal)

### Input preparation ########################################################
inputpath <- "/media/tims_ex/sst_kili_analysis"
ptrn <- "*sst_anom_pcadenoise_*_R2.rst"

### list files in directory ##################################################
fnames_sst_r2 <- list.files(inputpath, 
                            pattern = glob2rx(ptrn), 
                            recursive = TRUE)

### read into raster format ##################################################
system.time({
  sst.global <- lapply(seq(fnames_sst_r2), function(i) {
    raster(paste(inputpath, fnames_sst_r2[i], sep = "/"))
    }
                       )
  })
##    user  system elapsed 
##  61.584   0.412  68.104

Now using library(parallel)

library(parallel)

system.time({
  ### set up cluster call ######################################################
  cl <- makePSOCKcluster(4)

  clusterExport(cl, varlist = c("inputpath", "fnames_sst_r2"), 
                envir=environment())
  junk <- clusterEvalQ(cl, c(library(raster),
                             library(rgdal)))

  ### read into raster format using parallel version of lapply #################
  sst.global.p <- parLapply(cl, seq(fnames_sst_r2), function(i) {
    raster(paste(inputpath, fnames_sst_r2[i], sep = "/"))
    }
                          )

  ### stop the cluster #########################################################
  stopCluster(cl)
  })
##    user  system elapsed 
##   0.152   0.080  25.670

Not a crazy speed enhancement (about 26 instead of 68 seconds elapsed), but we need to keep in mind that raster() does not read the cell values into memory. Hence, the speed improvements should be a lot higher once we start the calculations or plotting.
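
For readers without the raster files at hand, the same cluster pattern can be tried with a toy function. The sketch below only needs the parallel package that ships with R. The names add_offset and offset are made up for this illustration. The point it shows is that the workers are separate R processes, so any object the function relies on has to be exported to them first.

```r
library(parallel)

offset <- 10                      # an object the worker function depends on
add_offset <- function(i) {
  Sys.sleep(0.05)                 # stand-in for some slow operation
  i^2 + offset
}
idx <- 1:8

## conventional version
res_seq <- lapply(idx, add_offset)

## parallel version on 2 worker processes
cl <- makePSOCKcluster(2)
clusterExport(cl, varlist = "offset", envir = environment())
res_par <- parLapply(cl, idx, add_offset)
stopCluster(cl)

print(identical(res_seq, res_par))
```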

Finally, let's test whether the two methods produce identical results.

identical(sst.global.p, sst.global)
## [1] TRUE

to be continued…

sessionInfo()
## R version 2.15.3 (2013-03-01)
## Platform: x86_64-pc-linux-gnu (64-bit)
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=C                 LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] parallel  stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] rgdal_0.8-5   raster_2.0-41 sp_1.0-5      knitr_1.1    
## 
## loaded via a namespace (and not attached):
## [1] digest_0.6.3    evaluate_0.4.3  formatR_0.7     grid_2.15.3    
## [5] lattice_0.20-13 stringr_0.6.2   tools_2.15.3

renaming data frame columns in lists

OK, so the scenario is as follows:

  • we have a list of 2 elements which in turn are again lists with 2 elements (each of which is a data frame).
  • None of the elements in question carry names (neither the list entries nor the data frames)
  • we want to only set the names of the data frames that are buried 2 levels down the main list

First we create some mock data that resembles the scenario (mimicking temperature and relative humidity observations during January and February 2010)

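## create 2 mock months
## (note: origin + 1:31 runs from 2010-01-02 to 2010-02-01; use seq(0, 30)
## and seq(0, 27) if the series should start on the 1st of each month)
date_jan <- as.Date(seq(1, 31, 1), origin = "2010-01-01")
date_feb <- as.Date(seq(1, 28, 1), origin = "2010-02-01")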

## create mock observations for the months
Ta_200_jan <- rnorm(31, 10, 3)
Ta_200_feb <- rnorm(28, 11, 3)
rH_200_jan <- rnorm(31, 75, 10)
rH_200_feb <- rnorm(28, 70, 10)


df1 <- data.frame(V1 = date_jan, V2 = Ta_200_jan)
df2 <- data.frame(V1 = date_jan, V2 = rH_200_jan)
df3 <- data.frame(V1 = date_feb, V2 = Ta_200_feb)
df4 <- data.frame(V1 = date_feb, V2 = rH_200_feb)

lst <- list(list(df1, df2), list(df3, df4))

So now we have a list of two elements, each of which is itself a list of two data frames.
None of the list elements are named, and the columns of the data frames carry the default names V1 and V2, which are not very informative.

This is what the list structure looks like:

str(lst)
## List of 2
##  $ :List of 2
##   ..$ :'data.frame': 31 obs. of  2 variables:
##   .. ..$ V1: Date[1:31], format: "2010-01-02" ...
##   .. ..$ V2: num [1:31] 9.95 15.49 9.45 12.16 8.84 ...
##   ..$ :'data.frame': 31 obs. of  2 variables:
##   .. ..$ V1: Date[1:31], format: "2010-01-02" ...
##   .. ..$ V2: num [1:31] 70.4 87.6 69.6 80.2 59 ...
##  $ :List of 2
##   ..$ :'data.frame': 28 obs. of  2 variables:
##   .. ..$ V1: Date[1:28], format: "2010-02-02" ...
##   .. ..$ V2: num [1:28] 11.95 8.42 13.06 9.55 10.76 ...
##   ..$ :'data.frame': 28 obs. of  2 variables:
##   .. ..$ V1: Date[1:28], format: "2010-02-02" ...
##   .. ..$ V2: num [1:28] 78.7 63.9 62.6 67.5 73.5 ...

Now we define the names to set

name.x <- c("Date")
name.y <- c("Ta_200", "rH_200")

And finally, we use nested lapply() calls to set the column names of the data frames within the list of lists.
The crux is to define a data frame (y) in the inner call, which is subsequently returned (and as lapply() always returns a list, we again get a list of lists).

lst <- lapply(seq(lst), function(i) {
    lapply(seq(name.y), function(j) {
        y <- data.frame(lst[[i]][[j]])
        names(y) <- c(name.x, name.y[j])
        return(y)
    })
})
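
The same renaming can also be written without index variables. The following is an alternative sketch (not the approach used above) that walks over the list elements directly and pairs each data frame with its parameter name via Map() and setNames(). It builds its own small mock list so it can be run on its own.

```r
## small stand-in for the list of lists used in this post
mock_df <- function(n) data.frame(V1 = seq_len(n), V2 = rnorm(n))
lst_demo <- list(list(mock_df(3), mock_df(3)),
                 list(mock_df(2), mock_df(2)))

name.x <- "Date"
name.y <- c("Ta_200", "rH_200")

## outer lapply(): one month at a time
## inner Map(): pair each data frame with the matching parameter name
renamed <- lapply(lst_demo, function(month) {
  Map(function(d, nm) setNames(d, c(name.x, nm)), month, name.y)
})

print(sapply(renamed[[1]], names))
```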

And this is what we end up with:

str(lst)
## List of 2
##  $ :List of 2
##   ..$ :'data.frame': 31 obs. of  2 variables:
##   .. ..$ Date  : Date[1:31], format: "2010-01-02" ...
##   .. ..$ Ta_200: num [1:31] 9.95 15.49 9.45 12.16 8.84 ...
##   ..$ :'data.frame': 31 obs. of  2 variables:
##   .. ..$ Date  : Date[1:31], format: "2010-01-02" ...
##   .. ..$ rH_200: num [1:31] 70.4 87.6 69.6 80.2 59 ...
##  $ :List of 2
##   ..$ :'data.frame': 28 obs. of  2 variables:
##   .. ..$ Date  : Date[1:28], format: "2010-02-02" ...
##   .. ..$ Ta_200: num [1:28] 11.95 8.42 13.06 9.55 10.76 ...
##   ..$ :'data.frame': 28 obs. of  2 variables:
##   .. ..$ Date  : Date[1:28], format: "2010-02-02" ...
##   .. ..$ rH_200: num [1:28] 78.7 63.9 62.6 67.5 73.5 ...

Problem solved!

We now have a list of lists with named columns for each data frame, with correct labels for date and parameter of the observations!

PS: if you wanted to name the first level entries of the list according to the month of observation, this would do the job:

names(lst) <- c("January", "February")

str(lst)
## List of 2
##  $ January :List of 2
##   ..$ :'data.frame': 31 obs. of  2 variables:
##   .. ..$ Date  : Date[1:31], format: "2010-01-02" ...
##   .. ..$ Ta_200: num [1:31] 9.95 15.49 9.45 12.16 8.84 ...
##   ..$ :'data.frame': 31 obs. of  2 variables:
##   .. ..$ Date  : Date[1:31], format: "2010-01-02" ...
##   .. ..$ rH_200: num [1:31] 70.4 87.6 69.6 80.2 59 ...
##  $ February:List of 2
##   ..$ :'data.frame': 28 obs. of  2 variables:
##   .. ..$ Date  : Date[1:28], format: "2010-02-02" ...
##   .. ..$ Ta_200: num [1:28] 11.95 8.42 13.06 9.55 10.76 ...
##   ..$ :'data.frame': 28 obs. of  2 variables:
##   .. ..$ Date  : Date[1:28], format: "2010-02-02" ...
##   .. ..$ rH_200: num [1:28] 78.7 63.9 62.6 67.5 73.5 ...

I leave it up to your imagination how to set the names of the second level list entries…
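
One possible way to do that, as a sketch on a small stand-in list (the element names Ta_200 and rH_200 are just one sensible choice, not something prescribed above):

```r
## small stand-in for the renamed list of lists
lst_demo <- list(January  = list(data.frame(Date = 1, Ta_200 = 10),
                                 data.frame(Date = 1, rH_200 = 75)),
                 February = list(data.frame(Date = 1, Ta_200 = 11),
                                 data.frame(Date = 1, rH_200 = 70)))

## set the names of the second-level entries in every month
lst_demo <- lapply(lst_demo, setNames, c("Ta_200", "rH_200"))

print(names(lst_demo$January))
```

After this, a single data frame can be addressed by name, e.g. lst_demo$February$rH_200.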

sessionInfo()
## R version 2.15.2 (2012-10-26)
## Platform: x86_64-pc-linux-gnu (64-bit)
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=C                 LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] knitr_1.1
## 
## loaded via a namespace (and not attached):
## [1] digest_0.6.3   evaluate_0.4.3 formatR_0.7    stringr_0.6.2 
## [5] tools_2.15.2