Introducing Rsenal – magic R functions for things various

Today, I would like to quickly introduce our package “Rsenal”. It is basically a collection of R functions that we use in our daily work at Environmental Informatics Marburg. The package is hosted on GitHub and can be installed using the install_github() function from the devtools package.
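A minimal install sketch (assuming devtools is installed; the repository owner below is our GitHub account, and the two-argument install_github() call matches the devtools version current at the time of writing):

## install Rsenal straight from GitHub
library(devtools)
install_github("Rsenal", "environmentalinformatics-marburg")
library(Rsenal)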

This package is the opposite of a 'general purpose' package, yet there are some functions that may be of interest to the wider R user community, for example latticeCombineGrid() and latticeCombineLayer(). The idea behind these functions is to provide tools to easily combine lattice plot objects that are stored in a list. These may, for example, result from iterative processes using lapply().

Below is an example of latticeCombineGrid() using the panel.2dsmoother() example from latticeExtra, modified so that we loop over four different standard deviation values with lapply().

library(Rsenal)
library(latticeExtra)
library(RColorBrewer) # brewer.pal() is used below

sds <- c(1, 2, 3, 4)

clrs <- colorRampPalette(brewer.pal(9, "RdBu"))

## example taken from 
## http://latticeextra.r-forge.r-project.org/#panel.2dsmoother&theme=default
## looping over 4 different standard deviations - sds
p_list <- lapply(seq(sds), function(i) {
  set.seed(1)
  xyz <- data.frame(x = rnorm(100), y = rnorm(100))
  xyz$z <- with(xyz, x * y + rnorm(100, sd = sds[i]))

  p <- levelplot(z ~ x * y, xyz, panel = panel.2dsmoother,
                 col.regions = clrs(1000), at = seq(-5, 5, 0.1),
                 aspect = 1)
  return(p)
  })

p_final <- latticeCombineGrid(p_list)
print(p_final)

[plot: the four smoothed surfaces, one panel per standard deviation, combined with latticeCombineGrid()]

By default, panels are separated slightly (which I find a little more visually pleasing) and plotting order runs from top left to bottom right (lattice's default is bottom left to top right). Under the hood, Reduce() is used to combine the plot objects stored in the list.
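The underlying idiom can be reproduced with latticeExtra alone – a rough sketch (latticeCombineGrid() additionally takes care of layout, panel order and theme settings):

## c() dispatches to latticeExtra's method for trellis objects,
## so folding it over the list merges all panels into one plot
p_manual <- Reduce(c, p_list)
print(p_manual)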

latticeCombineLayer() works in a similar way, but lets you plot the different objects on top of each other.
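Assuming the same list interface as latticeCombineGrid(), a call would look like this (a sketch – overlaying the four smoothed surfaces is not particularly meaningful, it merely illustrates the usage):

## overlay all plots from the list instead of arranging them in a grid
p_layered <- latticeCombineLayer(p_list)
print(p_layered)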

Another useful function for package developers is bumpVersion(), which lets you modify the Version: number and automatically updates the Date: entry in your package DESCRIPTION file. Note that it assumes you use semantic software versioning. I usually commit all changes I made first and then bumpVersion() in a separate commit.
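My usual workflow, sketched (the git lines are shell commands, shown here as comments):

# git commit -am "meaningful description of the changes"
bumpVersion()  # updates Version: and Date: in DESCRIPTION
# git commit -am "bump version"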

I hope that some people may find the function collection in Rsenal useful.

Please provide feedback, feature requests, suggestions and bug reports here.


Reot: Empirical Orthogonal Teleconnections in R

We are happy to introduce Reot, an R package designed for empirical orthogonal teleconnection (EOT) analysis of gridded geo-scientific space-time data based on the method by van den Dool et al. (2000). EOT denotes a regression-based approach to decompose spatio-temporal fields into a set of independent orthogonal patterns. In contrast to the classical approach of Empirical Orthogonal Functions (EOF), which are orthogonal in space and time, EOT analysis produces patterns that are orthogonal in either space or time (the current implementation of Reot provides the latter).

To identify these patterns, the time series of each pixel p_p of the predictor domain is regressed against the time series of all pixels p_r in the response domain (in case of a single field p_r = p_p − 1). The pixel with the highest sum of its coefficients of determination is defined as the ’base point’ of the first/leading mode. The time series of this point is the first/leading EOT. The next EOT is calculated on the residuals of the previous step, thus ensuring orthogonality of the modes. This procedure is repeated until a user-specified number of EOTs is identified.
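The core of one such iteration can be sketched in a few lines of plain R on mock matrices with time in rows and pixels in columns (a conceptual sketch only – Reot's actual implementation works on Raster* objects and does the heavy lifting in C++):

set.seed(42)
pred <- matrix(rnorm(50 * 20), nrow = 50)  # 50 time steps, 20 predictor pixels
resp <- matrix(rnorm(50 * 30), nrow = 50)  # 30 response pixels

## sum of coefficients of determination for every candidate base point
r2sum <- sapply(seq_len(ncol(pred)), function(p) sum(cor(pred[, p], resp)^2))

bp <- which.max(r2sum)  # base point of the leading mode
eot1 <- pred[, bp]      # its time series is the leading EOT

## regressing the leading EOT out of the response field yields the
## residuals on which the next mode is identified (hence orthogonality)
resp_res <- apply(resp, 2, function(y) residuals(lm(y ~ eot1)))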

Given that the number of regressions to be calculated can be extremely large, we implemented the core regression functions of Reot in C++ (via Rcpp) to ensure acceptable computation times. All input and output is based on Raster* classes to ensure seamless integration with existing analysis tools and workflows. Reot is available from CRAN. The development version is hosted on GitHub.

Examples

Example 1: Winter mean 700 mb height over the Northern Hemisphere

As a first example, we replicate one of the examples from van den Dool et al. (2000) (Example 3.d. – Figure 6). A spatio-temporal field of 700 mb geopotential heights from NCEP/NCAR reanalysis grids (Kalnay et al. 1996) is decomposed into its four leading modes, exhibiting the prominent patterns of the North Atlantic Oscillation (NAO) and the Pacific-North American pattern (PNA) as modes 1 and 2, respectively. The climatologically inclined reader is referred to the respective section in van den Dool et al. (2000) for a more detailed description of the atmospheric dynamics and processes associated with the identified patterns. Here, we merely want to highlight that the Reot implementation of the algorithm produces results similar to the original implementation by van den Dool et al. (2000).

library("rworldmap")
library("rgdal")
library("rgeos")
library("latticeExtra")
library("gridExtra")
library("Reot")

### load the necessary data
data("vdendool")
data("coastsCoarse")

### calculate the first four EOT modes
modes <- eot(pred = vdendool, resp = NULL, n = 4, 
             reduce.both = FALSE, standardised = FALSE, 
             print.console = TRUE)
## 
## Calculating linear model ... 
## Locating 1. EOT ...
## Location: -35 77.67 
## Cum. expl. variance (%): 21.53 
## 
## Calculating linear model ... 
## Locating 2. EOT ...
## Location: -165 43.15 
## Cum. expl. variance (%): 38.17 
## 
## Calculating linear model ... 
## Locating 3. EOT ...
## Location: 85 67.81 
## Cum. expl. variance (%): 46.08 
## 
## Calculating linear model ... 
## Locating 4. EOT ...
## Location: -25 53.01 
## Cum. expl. variance (%): 53.07
### set appropriate projection
ster <- CRS("+proj=stere +lat_0=90 +lon_0=-45")

xmin <- -180
xmax <- 180
ymin <- 20
ymax <- 90     # Coordinates for bounding box
bb <- cbind(x = c(xmin, xmin, xmax, xmax, xmin), 
            y = c(ymin, ymax, ymax, ymin, ymin))
SP <- SpatialPolygons(list(Polygons(list(Polygon(bb)), 
                                    "1")), 
                      proj4string = CRS(proj4string(coastsCoarse)))

gI <- gIntersects(coastsCoarse, SP, byid = TRUE) 
out <- vector(mode = "list", length = length(which(gI))) 
ii <- 1

for (i in seq(along = gI)) if (gI[i]) {
  out[[ii]] <- gIntersection(coastsCoarse[i, ], SP)
  row.names(out[[ii]]) <- row.names(coastsCoarse)[i]
  ii <- ii + 1
  }

nhem.coasts <- do.call("rbind", out)
nhem.coasts.ster <- spTransform(nhem.coasts, ster) 

lout <- list("sp.lines", nhem.coasts.ster, 
             col = "grey30", grid = TRUE)

### define colors
clrs <- colorRampPalette(rev(brewer.pal(9, "RdBu")))

### re-project modes
mode <- lapply(seq(modes), function(i) {
  projectRaster(modes[[i]]$r.predictor, crs = ster)
  })

### create title for each image
titles <- lapply(seq(mode), function(i) {
  paste("Mode ", i, " : ", "EV = ", 
        round(if (i > 1) {
          modes[[i]]$exp.var * 100 - 
            modes[[i - 1]]$exp.var * 100
          } else {
            modes[[i]]$exp.var * 100
            }, 1), " : ", "BP = ", 
        as.integer(modes[[i]]$loc.eot[, 1]), 
        ", ", as.integer(modes[[i]]$loc.eot[, 2]), 
        sep = "")
  })

### create plots
p <- lapply(seq(mode), function(i) {
  spplot(mode[[i]], sp.layout = lout, 
         main = list(titles[[i]], cex = 0.7),
         col.regions = clrs(1000), at = seq(-1, 1, 0.2),
         par.settings = list(axis.line = list(col = 0)),
         colorkey = list(height = 0.75, width = 1))
  })

### arrange and plot
f <- function(...) grid.arrange(..., heights = 1, ncol = 2)
do.call(f, p)

[plot: the four leading EOT modes of winter mean 700 mb geopotential height over the Northern Hemisphere]

Even though the locations of the identified base points (BP) are somewhat offset, and hence the explained variance (EV) figures differ slightly from van den Dool et al. (2000), it is obvious that the isolated patterns are very similar and represent the same signals. We can only speculate as to why the base point locations differ slightly, but potential reasons include different versions of the reanalysis data, rounding discrepancies between the programming languages used (especially when summing up the coefficients of determination) and slight differences in geographic projections.

Example 2: Identifying tropical Pacific SST drivers for Australian precipitation

The processes of precipitation development are complex and not yet completely understood. The physical state of the atmosphere, which determines whether or not rain occurs at any point in space and time, is the result of a multitude of constantly changing factors. Influences range from local to hemispheric boundary conditions in all four dimensions (including time).

Some areas of the global oceans exhibit low-frequency anomaly signals which can influence precipitation variability world-wide. The most prominent example of coupled ocean–atmosphere tropical SST variability is ENSO, which has received much attention in the scientific literature since the major 1982–83 El Niño. Here we investigate whether EOT analysis can be used to identify the ENSO signal as a driver of low-frequency Australian precipitation variability over the period 1982 to 2010. The data sets needed for this analysis are included in Reot. In order to reveal low-frequency signals such as ENSO, we need to prepare the raw data fields so that high-frequency variation is eliminated. We achieve this by creating seasonal anomalies using deseason() and by denoise()-ing the data to filter out some of the noise that is present in any spatio-temporal data field.

The first 3 leading modes of SSTs most influential for Australian rainfall variability can be calculated with:

data("australiaGPCP")
data("pacificSST")

sst.pred <- deseason(pacificSST, cycle.window = 12)
gpcp.resp <- deseason(australiaGPCP, cycle.window = 12)

sst.pred.dns <- denoise(sst.pred, expl.var = 0.9)
## 
## Using the first 19 components (of 348) to reconstruct series...
##  these account for 0.9 of variance in orig. series
gpcp.resp.dns <- denoise(gpcp.resp, expl.var = 0.9)
## 
## Using the first 37 components (of 348) to reconstruct series...
##  these account for 0.9 of variance in orig. series
modes <- eot(pred = sst.pred.dns, resp = gpcp.resp.dns, 
             n = 3, standardised = FALSE, 
             reduce.both = FALSE, print.console = FALSE)

As we can see, the principal components filter applied by denoise() is an especially important step: we need only 19 (37) of the original 348 components for the SST (GPCP) data to explain 90 % of the respective field's inherent variance.
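The idea behind denoise() can be sketched with prcomp() on a plain matrix (a conceptual sketch, not Reot's implementation): keep the leading components that explain the requested share of variance and reconstruct the series from them.

set.seed(1)
m <- matrix(rnorm(100 * 20), nrow = 100)

pc <- prcomp(m)
expl <- cumsum(pc$sdev^2) / sum(pc$sdev^2)
k <- which(expl >= 0.9)[1]  # components needed for 90 % of the variance

## reconstruct the series from the first k components, adding the means back
m_dns <- pc$x[, 1:k, drop = FALSE] %*% t(pc$rotation[, 1:k, drop = FALSE])
m_dns <- sweep(m_dns, 2, pc$center, "+")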
To get a visual impression, the results for the first leading mode can be plotted using the standard Reot plotting routine plotEot():

plotEot(modes, eot = 1, 
        show.eot.loc = TRUE, 
        arrange = "long")

[plot: leading EOT mode – tropical Pacific SST predictor and Australian precipitation response fields]

We see that we are indeed able to isolate the ENSO signal as the most important SST driver in the tropical Pacific (EOT 1) for Australian precipitation. This signal explains just over 4 % of the original variation found in rainfall over the analysed period. This may not seem much, but we need to keep in mind that precipitation is influenced by many factors, with local conditions playing a major role. Spatially, it is mainly the north-eastern part of the response domain that is explained, with some locations showing negative correlations of up to 0.4. With regard to mainland Australia, it becomes obvious that the identified ENSO signal is not able to explain any rainfall variation in the inner-continental parts of the land mass; it is mainly the coastal areas that are influenced by the ENSO phenomenon. Note that our analysis did not take any time lags between the SST anomalies and precipitation into account. Even though in this particular example lagging does not increase the explanatory power of the SST signal (not shown), it can be expected that in many cases the influence will not manifest instantaneously and that a certain lag time will explain a higher portion of the rainfall variance.
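A lagged analysis could be sketched by offsetting the layer indices of the two stacks before calling eot() – the lag k below is purely illustrative, and the subsetting assumes monthly layers as in the example above:

k <- 4  # hypothetical lag in months
sst.lag  <- sst.pred.dns[[1:(nlayers(sst.pred.dns) - k)]]
gpcp.lag <- gpcp.resp.dns[[(k + 1):nlayers(gpcp.resp.dns)]]

modes.lag <- eot(pred = sst.lag, resp = gpcp.lag, n = 1,
                 standardised = FALSE, print.console = FALSE)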

Final remarks

We have just submitted a manuscript to the Journal of Statistical Software that describes Reot in much more detail. I will announce more about Reot once the final version of the paper is out, including another example of spatially downscaling NDVI observations from 8 km to 250 m resolution.

For now, I would like to encourage people to try Reot in many different ways and applications and share their feedback with us. Feature requests and/or bug reports should be made on GitHub.

References

Kalnay E, Kanamitsu M, Kistler R, Collins W, Deaven D, Gandin L, Iredell M, Saha S, White G, Woollen J, Zhu Y, Chelliah M, Ebisuzaki W, Higgins W, Janowiak J, Mo K, Ropelewski C, Wang J, Leetmaa A, Reynolds R, Jenne R, Joseph D (1996). “The NCEP/NCAR 40-year reanalysis project.” Bulletin of the American Meteorological Society, 77(3), 437 – 471.

van den Dool HM, Saha S, Johansson A (2000). “Empirical orthogonal teleconnections.” Journal of Climate, 13(8), 1421 – 1435. URL http://journals.ametsoc.org/doi/abs/10.1175/1520-0442(2000)013%3C1421%3AEOT%3E2.0.CO%3B2.


Update to metvurst

Here's a quick update to metvurst in response to some issues encountered over the last few weeks.

The most important change is that the strip() function now returns the plot object rather than printing it, which means we can work with it afterwards. Given that all plotting is implemented using lattice/latticeExtra, this seems natural, and strip() now behaves accordingly. To highlight one advantage of this new behaviour, let's revisit the earlier post where I introduced the function and modify it slightly so that we end up with two graphs on one page to facilitate comparisons between variables.

First, we need some data (again from Fiji – this time only for 1993 and 1994).

## LOAD METVURST PACKAGE
library(metvurst)

## SET URL FOR DATA DOWNLOAD
url <- "http://www.bom.gov.au/ntc/IDO70004/IDO70004_"

## YEARS TO BE DOWNLOADED
yr <- 1993:1994

## READ DATA FOR ALL YEARS FROM URL INTO LIST
fijilst <- lapply(seq(yr), function(i) {
  read.csv(paste(url, yr[i], ".csv", sep = ""), na.strings = c(-9999, 999))
})

## TURN LIST INTO COMPLETE DATAFRAME AND CONVERT MISSING VALUE CODES TO NAs
fiji <- do.call("rbind", fijilst)
fiji[fiji == -9999] <- NA
fiji[fiji == 999] <- NA

## CREATE POSIX DATETIME AND CONVERT UTC TO LOCAL FIJI TIME
Sys.setenv(TZ = "UTC") # set environment to UTC before conversion
dts <- as.POSIXct(strptime(fiji$Date...UTC.Time, 
                           format = "%d-%b-%Y %H:%M")) + 12 * 60 * 60

## CREATE CONDITIONING VARIABLE (IN THIS CASE YEAR)
year <- substr(as.character(dts), 1, 4)

We are now able to create plot objects (i.e. store the visualisation in an object) rather than print them straight away and use them later…

## CREATE STRIP FOR WATER TEMPERATURE
plot.water.temp <- strip(x = fiji$Water.Temperature, 
                         date = dts,
                         cond = year,
                         arrange = "long",
                         main = "Water Temperature [°C]")

## CREATE STRIP FOR AIR TEMPERATURE
plot.air.temp <- strip(x = fiji$Air.Temperature, 
                       date = dts,
                       cond = year,
                       arrange = "long",
                       main = "Air Temperature [°C]")

Now we can use these two objects and plot them on one page using grid:

grid.newpage()

### define first plotting region (viewport)
vp1 <- viewport(x = 0, y = 1, 
                height = 0.5, width = 1,
                just = c("left", "top"),
                name = "top")

### enter vp1 
pushViewport(vp1)

### plot a plot - needs to be printed (and newpage set to FALSE)!!!
print(plot.water.temp, newpage = FALSE)

### leave vp1 - up one level (into root viewport)
upViewport(1)

### define second plot area
vp2 <- viewport(x = 0, y = 0, 
                height = 0.5, width = 1,
                just = c("left", "bottom"),
                name = "bottom")

### enter vp2
pushViewport(vp2)

### plot another plot
print(plot.air.temp, newpage = FALSE)

### destroy vp2 (as we're finished here)
popViewport()

[plot: water temperature (top) and air temperature (bottom) strips combined on one page]

This is rather nice, as it enables direct comparisons between the two variables. In this case we see that water temperature exhibits the same seasonal behaviour as air temperature, whereas its diurnal signal is virtually non-existent… I hope this will spark some imagination for your own usage (e.g. comparisons of two climate station records or the like).
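As an aside, for a simple vertical stack like this one, lattice can also place the two objects directly via the split argument of print() (split = c(column, row, ncolumns, nrows)) – a sketch:

print(plot.water.temp, split = c(1, 1, 1, 2), more = TRUE)
print(plot.air.temp, split = c(1, 2, 1, 2), more = FALSE)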

Note that the third year we see here is a result of the conversion from UTC to local Fiji time (we added 12 hours to UTC and hence end up with the first 12 hours of 1995).

sessionInfo()
## R version 3.0.1 (2013-05-16)
## Platform: x86_64-pc-linux-gnu (64-bit)
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=C                 LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] grid      stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] RWordPress_0.2-3    reshape_0.8.4       plyr_1.8           
## [4] latticeExtra_0.6-24 lattice_0.20-21     RColorBrewer_1.0-5 
## [7] metvurst_1.0        knitr_1.2          
## 
## loaded via a namespace (and not attached):
## [1] digest_0.6.3   evaluate_0.4.3 formatR_0.8    RCurl_1.95-4.1
## [5] stringr_0.6.2  tools_3.0.1    XML_3.98-1.1   XMLRPC_0.3-0

Creating publication quality graphics using R


As part of a one-day workshop, I have developed an online tutorial on how to create publication quality graphics using R (from an academic point of view).

The tutorial can be found here

http://teachpress.environmentalinformatics-marburg.de/2013/07/creating-publication-quality-graphs-in-r-7/

As mentioned in the tutorial, feel free to send me any feedback, criticism, general comments or bug reports.

Enjoy,

Tim

Btw, the entire tutorial was created using R Markdown and knitr. The .Rmd file can be found here

https://github.com/tim-salabim/metvurst/blob/master/markdown/20130617_data_vis_workshop.Rmd


metvurst now a package, repository moved to GitHub

Inspired by a post on PirateGrunt, I finally managed to pack metvurst up and turn it into a proper R package (the fact that I'm on holiday and have some time also helped). As a side effect, the repository has moved from Google Code to GitHub. As I use RStudio for developing R code, this shift seemed inevitable, since the integration of git in RStudio's package development tools is very handy.

In order to install metvurst you need devtools. Using devtools' install_github() function you can easily install and load metvurst:

library(devtools)
install_github('metvurst', 'tim-salabim')
library(metvurst)

I have tried it on Linux and Mac so far, so in case there are any problems on Windows, please let me know (a quick note if indeed it does work on Windows would be appreciated too).

For now, the two core functions strip() and windContours(), along with some helper functions (mainly to convert wind speed and direction to u and v components and vice versa – the underlying math is sketched below), are included.
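The conversion itself is standard meteorological practice. A sketch of the underlying math (the function names here are made up for illustration – the helpers in metvurst may be named and parameterised differently):

## wd is the direction the wind comes from (degrees);
## u is the eastward, v the northward component
ws2uv <- function(ws, wd) {
  u <- -ws * sin(wd * pi / 180)
  v <- -ws * cos(wd * pi / 180)
  cbind(u = u, v = v)
}

uv2ws <- function(u, v) sqrt(u^2 + v^2)
uv2wd <- function(u, v) (atan2(-u, -v) * 180 / pi) %% 360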

The package is fully functional but there is no documentation for now. I will progressively add and update documentation manuals over the next few weeks (maybe months, depending on how busy I am once I return to work).

Have fun using metvurst and in case you have any questions, suggestions or critique don’t hesitate to get in touch.

Cheers

TimSalabim


resizing plot panels to fit data distribution

I am a big fan of lattice/latticeExtra. In fact, nearly all visualisations I have produced so far make use of this great package. The possibilities for customisation are endless, and the amount of flexibility it provides is especially valuable for producing visualisations in batch mode/programmatically.

Today I needed to visualise some precipitation data for a poster presentation of climate observations at Mt. Kilimanjaro. I wanted to show monthly precipitation observations in relation to the long-term mean monthly precipitation in order to show which months have been particularly wet or dry.
The important point here is that when combining two different visualisations of the same data, we need to make them directly comparable. This means that the scales of the absolute rain amounts and of the deviations need to be similar, so we can get an instant impression of the deviation in relation to the absolute amounts.

Here's what I've done with latticeExtra (using mock data):

First, we need some (semi-) random data.

## LOAD PACKAGES
library(latticeExtra, quietly = TRUE)
library(RColorBrewer) # brewer.pal() is used below

## CREATE MOCK DATA
# precipitation long term mean
pltmean <- 800
# precipitation long term standard deviation
pltsd <- 200
# precipitation observations
pobs <- rnorm(12, pltmean, pltsd)
# precipitation deviation from long term mean
pdev <- rnorm(12, 0, 150)
# months
dates <- 1:12

Then we calculate the panel heights relative to the (precipitation) data distribution. This is crucial because we want the deviation data to be directly comparable to the observed values.

## CALCULATE RELATIVE PANEL HEIGHTS
y.abs <- max(abs(pobs))
y.dev <- diff(range(pdev))
yy.aspect <- y.dev/y.abs

Then, we create the bar charts as objects.

## COLOUR
clrs <- rev(brewer.pal(3, "RdBu"))

## CREATE THE PLOT OBJECTS
abs <- barchart(pobs ~ dates, horizontal = FALSE, strip = FALSE, origin = 0,
                between = list(y = 0.3),
                ylab = "Precipitation [mm]", xlab = "Months", col = clrs[1])

dev <- barchart(pdev ~ dates, horizontal = FALSE, origin = 0, 
                col = ifelse(pdev > 0, clrs[1], clrs[length(clrs)]))

Now, we combine the two plot objects into one and also create strips to be plotted at the top of each panel with labels providing some detail about the respective panel.

## COMBINE PLOT OBJECTS INTO ONE AND CREATE CUSTOM STRIPS FOR LABELLING
out <- c(abs, dev, x.same = TRUE, y.same = FALSE, layout = c(1,2))
out <- update(out, scales = list(y = list(rot = 0)), 
              strip = strip.custom(bg = "grey40", 
                                   par.strip.text = list(col = "white", 
                                                         font = 2),
                                   strip.names = FALSE, strip.levels = TRUE, 
                                   factor.levels = c("observed", 
                                                     "deviation from long term monthly mean")))

As a final step, we re-size the panels according to the panel heights calculated earlier.

## RESIZE PANELS RELATIVE TO DATA DISTRIBUTION
out <- resizePanels(out, h = c(1,yy.aspect), w = 1)

And this is what the final product looks like.

## PRINT PLOT
print(out)

[plot: observed monthly precipitation (top) and deviation from the long-term mean (bottom), panels resized to the data]

Note: I suggest you rerun this example a few times to see how the relative panel sizes change with the data distribution (which is randomly created on each run). This highlights the usefulness of such an approach for batch visualisations.

sessionInfo()
## R version 2.15.3 (2013-03-01)
## Platform: x86_64-pc-linux-gnu (64-bit)
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=C                 LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] grid      parallel  stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] gridBase_0.4-6      abind_1.4-0         fields_6.7         
##  [4] spam_0.29-2         reshape_0.8.4       plyr_1.8           
##  [7] latticeExtra_0.6-19 lattice_0.20-13     RColorBrewer_1.0-5 
## [10] RWordPress_0.2-3    rgdal_0.8-5         raster_2.0-41      
## [13] sp_1.0-5            knitr_1.1          
## 
## loaded via a namespace (and not attached):
## [1] digest_0.6.3   evaluate_0.4.3 formatR_0.7    markdown_0.5.4
## [5] RCurl_1.95-3   stringr_0.6.2  tools_2.15.3   XML_3.95-0    
## [9] XMLRPC_0.2-5

visualising diurnal wind climatologies

In this post I want to highlight the second core function of the metvurst repository (https://github.com/tim-salabim/metvurst):

The windContours function

It is intended to provide a compact overview of the wind field climatology at a location and plots wind direction and speed as a function of the hour of day.

  • direction is plotted as frequencies of occurrences
  • speed is represented by a box plot

The classical approach for visualising wind direction and speed at a location is a wind rose, where a stacked barplot of speeds is drawn in a circular coordinate system showing the direction. As an example, take the wind rose from the Wikipedia page on the topic.

[image: example wind rose from the Wikipedia page on "wind rose"]

This kind of representation is very useful for getting a general sense of the wind field climatology at a location.
You will quickly get information along the lines of “the x most prevailing wind directions are from here, there and sometimes also over there”. Furthermore, you will also know which of these tends to produce windier conditions. So far, so good.

If you want to learn about possible diurnal flow climatologies, however, there is no way of obtaining this information straight from the wind rose. The common way to visualise such dynamics with wind roses is to plot individual roses for specific periods (e.g. hourly or 3-hourly roses).

Getting information on diurnal climate dynamics is especially important in regions of complex terrain or for coastal locations, where diurnally reversing wind flow patterns are a major climatic feature.

The windContours function is able to deliver such information at a glance.
Consider the case from the last post, where we used hourly meteorological information from a coastal station in Fiji.

First, we (again) read the data from the web, provided by the Australian Bureau of Meteorology (BOM).

## LOAD metvurst AND RColorBrewer (FOR COLOR CREATION)
library(metvurst)
library(RColorBrewer)

## SET URL FOR DATA DOWNLOAD
url <- "http://www.bom.gov.au/ntc/IDO70004/IDO70004_"

## YEARS TO BE DOWNLOADED
yr <- 1993:2012

## READ DATA FOR ALL YEARS FROM URL INTO LIST
fijilst <- lapply(seq(yr), function(i) {
  read.csv(paste(url, yr[i], ".csv", sep = ""), na.strings = c(-9999, 999))
})

## TURN LIST INTO COMPLETE DATAFRAME AND CONVERT MISSING VALUE CODES TO NAs
fiji <- do.call("rbind", fijilst)
fiji[fiji == -9999] <- NA
fiji[fiji == 999] <- NA

## CREATE POSIX DATETIME AND CONVERT UTC TO LOCAL FIJI TIME
dts <- as.POSIXct(strptime(fiji$Date...UTC.Time,
                           format = "%d-%b-%Y %H:%M")) + 12 * 60 * 60

Next, we need to create a vector of the hours of the recordings in order to plot our hourly climatologies.

## CREATE CONDITIONING VARIABLE (IN THIS CASE HOUR)
hr <- substr(as.character(dts), 12, 13) 

Now, we can use the windContours function straight away to plot the hourly wind direction frequencies and corresponding wind speeds for the observation period 1993 – 2012.
Note: for reproducibility I have posted the function on my personal university web page, but it is recommended that you get it from the metvurst repository on GitHub.

windContours(hour = hr,
             wd = fiji$Wind.Direction,
             ws = fiji$Wind.Speed,
             keytitle = "hourly wind frequencies [%]")

[plot: hourly wind direction frequencies (contours) and wind speed box plots, Fiji 1993 – 2012]

Just as we would with a wind rose, we easily see that there are 3 main prevailing wind directions:

  1. north north-east
  2. south-east
  3. west north-west

However, we straight away realise that there is a distinct diurnal pattern in these directions with 1. and 2. being nocturnal and 3. denoting the daytime winds. This additional information is captured at a glance.

The graph can be refined in a lot of ways.
In the next few figures I want to highlight some of the flexibility that windContours provides:

In order to adjust the x-axis of the speed plot, use the speedlim = parameter

windContours(hour = hr,
             wd = fiji$Wind.Direction,
             ws = fiji$Wind.Speed,
             speedlim = 12,
             keytitle = "hourly wind frequencies [%]")

[plot: as above, with the speed axis limited via speedlim = 12]

In case you want to change the density of the contour lines you can use spacing =

windContours(hour = hr,
             wd = fiji$Wind.Direction,
             ws = fiji$Wind.Speed,
             speedlim = 12,
             keytitle = "hourly wind frequencies [%]",
             spacing = .5)

[plot: as above, with denser contour lines via spacing = 0.5]

You can also provide colors through the colour = parameter

windContours(hour = hr,
             wd = fiji$Wind.Direction,
             ws = fiji$Wind.Speed,
             speedlim = 12,
             keytitle = "hourly wind frequencies [%]",
             colour = rev(brewer.pal(11, "Spectral")))

[plot: as above, with a reversed Spectral colour palette]

You can adjust the class intervals of the cuts using ncuts =

windContours(hour = hr,
             wd = fiji$Wind.Direction,
             ws = fiji$Wind.Speed,
             speedlim = 12,
             keytitle = "hourly wind frequencies [%]",
             colour = rev(brewer.pal(11, "Spectral")),
             ncuts = .5)

[plot: as above, with adjusted cuts via ncuts = 0.5]

Apart from the design, you can also adjust the position of the x-axis so that the plot is centered north (default is south) using the centre = parameter (allowed values are: “S”, “N”, “E”, “W”)

windContours(hour = hr,
             wd = fiji$Wind.Direction,
             ws = fiji$Wind.Speed,
             speedlim = 12,
             keytitle = "hourly wind frequencies [%]",
             colour = rev(brewer.pal(11, "Spectral")),
             centre = "N")

[plot: as above, centred on north via centre = "N"]

There’s also the possibility to provide an additional variable using add.var = in order to examine the relationship between winds and some other parameter of interest (e.g. concentrations of some pollutant, or precipitation, or anything really). In this case, the contours remain as before (representing wind direction frequencies) and the (colour-)filled part of the left panel is used to represent the additional variable.
Here we use air temperature (don’t forget to change the keytitle), where we can easily see that cold air mainly comes from the south (extra-tropical air masses) and the highest temperatures are observed when the air flow is from northerly directions (tropical air masses).

windContours(hour = hr,
             wd = fiji$Wind.Direction,
             ws = fiji$Wind.Speed,
             speedlim = 12,
             keytitle = "Temperature [°C]",
             colour = rev(brewer.pal(11, "Spectral")),
             add.var = fiji$Air.Temperature)

[plot: wind direction frequency contours filled with air temperature via add.var]

In case you want to change the smoothing of the contours you can use the smooth.contours = parameter.
The same applies to the (colour-)filled area via the smooth.fill = parameter.

windContours(hour = hr,
             wd = fiji$Wind.Direction,
             ws = fiji$Wind.Speed,
             speedlim = 12,
             keytitle = "Temperature [°C]",
             colour = rev(brewer.pal(11, "Spectral")),
             add.var = fiji$Air.Temperature,
             smooth.contours = .8,
             smooth.fill = 1.5)

[plot: as above, with adjusted smoothing via smooth.contours and smooth.fill]

In the future I might write an analysis post making use of the windContours function to highlight its usefulness in climate analysis.
For now, I refer the interested reader to Appelhans et al. (2012), where I present the seasonal wind climatology of Christchurch, New Zealand, and also analyse general diurnal patterns of air pollution using windContours.

sessionInfo()
## R version 2.15.3 (2013-03-01)
## Platform: x86_64-pc-linux-gnu (64-bit)
##
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
##  [7] LC_PAPER=C                 LC_NAME=C
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] grid      parallel  stats     graphics  grDevices utils     datasets
## [8] methods   base
##
## other attached packages:
##  [1] gridBase_0.4-6      abind_1.4-0         fields_6.7
##  [4] spam_0.29-2         reshape_0.8.4       plyr_1.8
##  [7] latticeExtra_0.6-19 lattice_0.20-13     RColorBrewer_1.0-5
## [10] RWordPress_0.2-3    rgdal_0.8-5         raster_2.0-41
## [13] sp_1.0-5            knitr_1.1
##
## loaded via a namespace (and not attached):
## [1] digest_0.6.3   evaluate_0.4.3 formatR_0.7    markdown_0.5.4
## [5] RCurl_1.95-3   stringr_0.6.2  tools_2.15.3   XML_3.95-0
## [9] XMLRPC_0.2-5

visualising large amounts of hourly environmental data

It is Sunday, it's raining and I have a few hours to spend before I am invited for lunch at my parents' place. Hence, I thought I'd use the time to produce another post. It has been a while since the last one, as I have been in Africa for about two months for yet another stint of fieldwork on the great Kilimanjaro mountain…

This time I want to introduce the strip() function, which is part of the METVURST suite for plotting meteorological/climatological data (available from https://github.com/tim-salabim/metvurst).

This function is intended to facilitate two things:

  1. to enable plotting of large (climatological) data sets at hourly resolution (the data need to be hourly observations) in a reasonably well defined space (meaning that you won't need pages and pages of paper to print the results).
  2. to display the data in a way that interpretation is possible from hourly to decadal time scales.

The function is much like the calendarHeat() function from Revolution Analytics (see here), though, as mentioned above, it produces plots of hourly data. In fact, calendarHeat() is a good alternative when you intend to plot daily values.
strip() is implemented in lattice (latticeExtra, to be more precise) and essentially draws a levelplot of hour of day (y-axis) vs. day of year (x-axis).
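The underlying idiom can be sketched with plain lattice on mock data (a sketch only – strip() adds the hourly aggregation, conditioning and sensible defaults on top):

library(lattice)

## mock hourly data for one year: a seasonal plus a diurnal signal
d <- expand.grid(day = 1:365, hour = 0:23)
d$temp <- 15 + 8 * sin(2 * pi * d$day / 365) +
  4 * sin(2 * pi * d$hour / 24) + rnorm(nrow(d))

levelplot(temp ~ day * hour, data = d,
          xlab = "Day of year", ylab = "Hour of day")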

In detail the function takes the following parameters:

  • x (numeric): Object to be plotted (e.g. temperature).
  • date (character or POSIXct): Date(time) of the observations. Format must be 'YYYY-MM-DD hh:mm(:ss)'
  • fun: The function to be used for aggregation to hourly observations (if original is of higher frequency). Default is mean.
  • cond (factor): Conditioning variable e.g. years.
  • arrange (character): One of “wide” or “long”. Defaults to “long” which provides a layout that is easier to interpret, however, “wide” is better for use in presentation slides.
  • colour (character): a vector of color names. Defaults to rev(brewer.pal(11, “Spectral”))
  • …: Further arguments to be passed to levelplot (see ?lattice::levelplot for options). This can be quite handy to set the legend title using main = "YOUR TEXT HERE" (the legend is plotted on top of the actual plot).

Here's an example using hourly data from Fiji (the station is at Lautoka on the west coast). The data can be freely accessed through the Australian Bureau of Meteorology (BOM).

First we need to download the data and do some reshaping

## LOAD RColorBrewer FOR COLOR CREATION
library(RColorBrewer)

## SET URL FOR DATA DOWNLOAD
url <- "http://www.bom.gov.au/ntc/IDO70004/IDO70004_"

## YEARS TO BE DOWNLOADED
yr <- 1993:2012

## READ DATA FOR ALL YEARS FROM URL INTO LIST
fijilst <- lapply(seq(yr), function(i) {
  read.csv(paste(url, yr[i], ".csv", sep = ""), na.strings = c(-9999, 999))
  })

## TURN LIST INTO COMPLETE DATAFRAME AND CONVERT MISSING VALUE CODES TO NAs
fiji <- do.call("rbind", fijilst)
fiji[fiji == -9999] <- NA
fiji[fiji == 999] <- NA

## CREATE POSIX DATETIME AND CONVERT UTC TO LOCAL FIJI TIME
dts <- as.POSIXct(strptime(fiji$Date...UTC.Time, 
                  format = "%d-%b-%Y %H:%M")) + 12 * 60 * 60

## CREATE CONDITIONING VARIABLE (IN THIS CASE YEAR)
year <- substr(as.character(dts), 1, 4)

Now, let's plot the temperatures

## SOURCE FUNCTION (ALSO AVAILABLE AT https://github.com/tim-salabim/metvurst)
source("http://www.staff.uni-marburg.de/~appelhat/r_stuff/strip.R")

## PLOT STRIP FOR TEMPERATURE
strip(x = fiji$Air.Temperature, 
      date = dts,
      cond = year,
      arrange = "long",
      main = "Temperature")
## 
##  Module   :  strip 
##  Author   :  Tim Appelhans <tim.appelhans@gmail.com>, Thomas Nauss 
##  Version  :  2012-01-06 
##  License  :  GNU GPLv3, see http://www.gnu.org/licenses/

[plot: strip of hourly air temperature at Lautoka, conditioned by year, 1993 – 2012]

We can clearly see both the diurnal and the seasonal signal.
Looking at other parameters such as wind direction and speed should also give interesting insights into the seasonal and diurnal dynamics of the location.

I leave it up to you to explore this further…

I am off to a nice Sunday roast now! yumyum

sessionInfo()
## R version 2.15.3 (2013-03-01)
## Platform: x86_64-pc-linux-gnu (64-bit)
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=C                 LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] grid      parallel  stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] reshape_0.8.4       plyr_1.8            latticeExtra_0.6-19
##  [4] lattice_0.20-13     RColorBrewer_1.0-5  RWordPress_0.2-3   
##  [7] rgdal_0.8-5         raster_2.0-41       sp_1.0-5           
## [10] knitr_1.1          
## 
## loaded via a namespace (and not attached):
## [1] digest_0.6.3   evaluate_0.4.3 formatR_0.7    markdown_0.5.4
## [5] RCurl_1.95-3   stringr_0.6.2  tools_2.15.3   XML_3.95-0    
## [9] XMLRPC_0.2-5

reading raster data using library(parallel)

Recently, I have been doing some analysis for a project I am involved in. In particular, I was interested in what role Pacific sea surface temperatures play with regard to rainfall in East Africa. I'll spare you the details, as I am currently writing all this up into a paper which you can have a look at once it is published.

For this analysis, however, I am processing quite a number of raster files. This led me to investigate the possibilities of the parallel package to speed up the process.

Here's a quick example of how to read in raster data (in this case 460 global sea surface temperature files at 1° × 1° resolution) using parallel.

First, let's do it the conventional way and see how long that takes.

library(raster)
library(rgdal)

### Input preparation ########################################################
inputpath <- "/media/tims_ex/sst_kili_analysis"
ptrn <- "*sst_anom_pcadenoise_*_R2.rst"

### list files in directory ##################################################
fnames_sst_r2 <- list.files(inputpath, 
                            pattern = glob2rx(ptrn), 
                            recursive = TRUE)

### read into raster format ##################################################
system.time({
  sst.global <- lapply(seq(fnames_sst_r2), function(i) {
    raster(paste(inputpath, fnames_sst_r2[i], sep = "/"))
  })
})
##    user  system elapsed 
##  61.584   0.412  68.104

Now using library(parallel)

library(parallel)

system.time({
  ### set up cluster call ######################################################
  cl <- makePSOCKcluster(4)

  clusterExport(cl, varlist = c("inputpath", "fnames_sst_r2"), 
                envir=environment())
  junk <- clusterEvalQ(cl, c(library(raster),
                             library(rgdal)))

  ### read into raster format using parallel version of lapply ###############
  sst.global.p <- parLapply(cl, seq(fnames_sst_r2), function(i) {
    raster(paste(inputpath, fnames_sst_r2[i], sep = "/"))
  })

  ### stop the cluster ########################################################
  stopCluster(cl)
})
##    user  system elapsed 
##   0.152   0.080  25.670

Not a crazy speed enhancement, but we need to keep in mind that raster() does not read the cell values into memory. Hence, the speed improvements should be a lot higher once we start doing calculations or plotting.
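To illustrate the point, the same cluster pays off more once the cell values are actually pulled into memory, e.g. with getValues() (a sketch – it assumes the workers can see the same file paths as the master):

library(parallel)

cl <- makePSOCKcluster(4)
junk <- clusterEvalQ(cl, library(raster))

## reading the actual cell values is where parallelisation starts to shine
vals <- parLapply(cl, sst.global.p, getValues)

stopCluster(cl)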

Finally, let's test whether the two methods produce identical results.

identical(sst.global.p, sst.global)
## [1] TRUE

to be continued…

sessionInfo()
## R version 2.15.3 (2013-03-01)
## Platform: x86_64-pc-linux-gnu (64-bit)
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=C                 LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] parallel  stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] rgdal_0.8-5   raster_2.0-41 sp_1.0-5      knitr_1.1    
## 
## loaded via a namespace (and not attached):
## [1] digest_0.6.3    evaluate_0.4.3  formatR_0.7     grid_2.15.3    
## [5] lattice_0.20-13 stringr_0.6.2   tools_2.15.3

renaming data frame columns in lists

OK, so the scenario is as follows:

  • We have a list of 2 elements, which in turn are again lists with 2 elements (each of which is a data frame).
  • None of the elements in question carry names (neither the list entries nor the data frames).
  • We want to set only the names of the data frames that are buried 2 levels down the main list.

First we create some mock data that resembles the scenario (mimicking temperature and relative humidity observations during January and February 2010):

## create 2 mock months
date_jan <- as.Date(seq(1, 31, 1), origin = "2010-01-01")
date_feb <- as.Date(seq(1, 28, 1), origin = "2010-02-01")

## create mock observations for the months
Ta_200_jan <- rnorm(31, 10, 3)
Ta_200_feb <- rnorm(28, 11, 3)
rH_200_jan <- rnorm(31, 75, 10)
rH_200_feb <- rnorm(28, 70, 10)


df1 <- data.frame(V1 = date_jan, V2 = Ta_200_jan)
df2 <- data.frame(V1 = date_jan, V2 = rH_200_jan)
df3 <- data.frame(V1 = date_feb, V2 = Ta_200_feb)
df4 <- data.frame(V1 = date_feb, V2 = rH_200_feb)

lst <- list(list(df1, df2), list(df3, df4))

So now we have a list of two elements, each of which is again a list of 2 made up of 2 data frames.
None of these elements are named (actually, the columns of the data frames are named V1 and V2 – which is not very informative).

This is what the list structure looks like:

str(lst)
## List of 2
##  $ :List of 2
##   ..$ :'data.frame': 31 obs. of  2 variables:
##   .. ..$ V1: Date[1:31], format: "2010-01-02" ...
##   .. ..$ V2: num [1:31] 9.95 15.49 9.45 12.16 8.84 ...
##   ..$ :'data.frame': 31 obs. of  2 variables:
##   .. ..$ V1: Date[1:31], format: "2010-01-02" ...
##   .. ..$ V2: num [1:31] 70.4 87.6 69.6 80.2 59 ...
##  $ :List of 2
##   ..$ :'data.frame': 28 obs. of  2 variables:
##   .. ..$ V1: Date[1:28], format: "2010-02-02" ...
##   .. ..$ V2: num [1:28] 11.95 8.42 13.06 9.55 10.76 ...
##   ..$ :'data.frame': 28 obs. of  2 variables:
##   .. ..$ V1: Date[1:28], format: "2010-02-02" ...
##   .. ..$ V2: num [1:28] 78.7 63.9 62.6 67.5 73.5 ...

Now we define the names to set

name.x <- c("Date")
name.y <- c("Ta_200", "rH_200")

And finally, we use lapply() to recursively set the column names of the data frames within the list of lists.
The crux is to define a data frame (y) at the second iteration level, which is subsequently returned (and as lapply() always returns a list, we again get a list of lists).

lst <- lapply(seq(lst), function(i) {
    lapply(seq(name.y), function(j) {
        y <- data.frame(lst[[i]][[j]])
        names(y) <- c(name.x, name.y[j])
        return(y)
    })
})

And this is what we end up with:

str(lst)
## List of 2
##  $ :List of 2
##   ..$ :'data.frame': 31 obs. of  2 variables:
##   .. ..$ Date  : Date[1:31], format: "2010-01-02" ...
##   .. ..$ Ta_200: num [1:31] 9.95 15.49 9.45 12.16 8.84 ...
##   ..$ :'data.frame': 31 obs. of  2 variables:
##   .. ..$ Date  : Date[1:31], format: "2010-01-02" ...
##   .. ..$ rH_200: num [1:31] 70.4 87.6 69.6 80.2 59 ...
##  $ :List of 2
##   ..$ :'data.frame': 28 obs. of  2 variables:
##   .. ..$ Date  : Date[1:28], format: "2010-02-02" ...
##   .. ..$ Ta_200: num [1:28] 11.95 8.42 13.06 9.55 10.76 ...
##   ..$ :'data.frame': 28 obs. of  2 variables:
##   .. ..$ Date  : Date[1:28], format: "2010-02-02" ...
##   .. ..$ rH_200: num [1:28] 78.7 63.9 62.6 67.5 73.5 ...

Problem solved!

We now have a list of lists with named columns for each data frame, with correct labels for the date and the parameter of the observations!

PS: if you wanted to name the first level entries of the list according to the month of observation, this would do the job:

names(lst) <- c("January", "February")

str(lst)
## List of 2
##  $ January :List of 2
##   ..$ :'data.frame': 31 obs. of  2 variables:
##   .. ..$ Date  : Date[1:31], format: "2010-01-02" ...
##   .. ..$ Ta_200: num [1:31] 9.95 15.49 9.45 12.16 8.84 ...
##   ..$ :'data.frame': 31 obs. of  2 variables:
##   .. ..$ Date  : Date[1:31], format: "2010-01-02" ...
##   .. ..$ rH_200: num [1:31] 70.4 87.6 69.6 80.2 59 ...
##  $ February:List of 2
##   ..$ :'data.frame': 28 obs. of  2 variables:
##   .. ..$ Date  : Date[1:28], format: "2010-02-02" ...
##   .. ..$ Ta_200: num [1:28] 11.95 8.42 13.06 9.55 10.76 ...
##   ..$ :'data.frame': 28 obs. of  2 variables:
##   .. ..$ Date  : Date[1:28], format: "2010-02-02" ...
##   .. ..$ rH_200: num [1:28] 78.7 63.9 62.6 67.5 73.5 ...

I leave it up to your imagination how to set the names of the second-level list entries… though here is one possibility to get you started.
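A sketch using setNames() with the parameter names from above:

## name the second-level entries by the observed parameter
lst <- lapply(lst, function(x) setNames(x, name.y))
str(lst, max.level = 2)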

sessionInfo()
## R version 2.15.2 (2012-10-26)
## Platform: x86_64-pc-linux-gnu (64-bit)
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=C                 LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] knitr_1.1
## 
## loaded via a namespace (and not attached):
## [1] digest_0.6.3   evaluate_0.4.3 formatR_0.7    stringr_0.6.2 
## [5] tools_2.15.2