Introducing the ‘gimms’ package

This is a guest post by Florian Detsch

What it is all about

With the most recent update of the AVHRR GIMMS data collection to NDVI3g (Pinzon and Tucker, 2014), we decided to bundle all the functions we have written so far to download and process GIMMS binary files from the NASA ECOCAST server into a package. The package is called gimms and features a collection of fundamental work steps required to get the data into R:

  • updateInventory to list all GIMMS files available online,
  • rearrangeFiles to sort (online or local) files by date,
  • downloadGimms to download selected files,
  • rasterizeGimms to import the binary data as 'Raster*' objects into R and
  • monthlyComposite to aggregate the bi-monthly datasets to monthly values.

How to install

The gimms package (version 0.1.1) is now officially on CRAN and can be installed directly via

## install 'gimms' package
install.packages("gimms")

## load 'gimms' package
library(gimms)

In order to use the development version (no liability assumed), please refer to the 'develop' branch hosted at GitHub. There, you will also find the latest news and updates concerning the package.

library(devtools)
install_github("environmentalinformatics-marburg/gimms", ref = "develop")

List available files

updateInventory imports the latest version of the online file inventory as a 'character' vector into R. By setting sort = TRUE, it additionally acts as a wrapper around rearrangeFiles, so that the output vector is sorted by date rather than alphabetically. The latter feature proves particularly useful considering the GIMMS file naming convention, where e.g. 'geo13jul15a.n19-VI3g' refers to the first half of July 2013. In case no active internet connection is available, updateInventory automatically falls back to the latest offline version of the file inventory.

gimms_files <- updateInventory(sort = TRUE)
## Trying to update GIMMS inventory from server...
## Online update of the GIMMS file inventory successful!
## [1] ""
## [2] ""
## [3] ""
## [4] ""
## [5] ""

Download files

The next logical step of the gimms processing chain is to download selected (if not all) bi-monthly datasets. This can be achieved by running downloadGimms, which accepts various input types.

  • 'missing' input → download entire collection
    Specifying no particular input is possibly the most straightforward way of data acquisition. The function will automatically start to download the entire collection of files (currently July 1981 to December 2013) and write the data to dsn.
## download entire gimms collection
downloadGimms(dsn = paste0(getwd(), "/data"))
  • 'numeric' input → download temporal range
    It is also possible to specify a start year (x) and/or end year (y) to limit the temporal coverage of the datasets to be downloaded. In case x (or y) is missing, the download automatically starts from the first (or finishes with the last) available year.
## download gimms data from 1998-2000
downloadGimms(x = 1998, y = 2000, 
              dsn = paste0(getwd(), "/data"))
  • 'character' input → download particular files
    As a third and final option, downloadGimms also accepts a 'character' vector of valid online filepaths. These can easily be retrieved from updateInventory (as demonstrated above) and passed directly to the input argument x.
## download manually selected files
downloadGimms(x = gimms_files[769:780], 
              dsn = paste0(getwd(), "/data"))

Rasterize downloaded data

rasterizeGimms transforms the retrieved GIMMS data from the native binary format into the common 'GeoTiff' format and makes the single layers available in R as ordinary 'Raster*' objects. It is up to the user to decide whether or not to discard 'mask-water' values (-10,000) and 'mask-nodata' values (-5,000) (see also the official NDVI3g README) and to apply the scaling factor (1/10,000). Since rasterizing usually takes some time, we highly recommend making use of the filename argument, which automatically invokes raster::writeRaster.
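
For illustration, the discarding and scaling described above boils down to simple raster algebra. The following lines are merely a conceptual sketch of what rasterizeGimms handles internally, assuming an already imported layer rst:

## conceptual sketch only; rasterizeGimms does this internally
rst[rst == -10000] <- NA   # discard 'mask-water' values
rst[rst == -5000] <- NA    # discard 'mask-nodata' values
rst <- rst / 10000         # apply scaling factor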

## list available files
gimms_files <- rearrangeFiles(dsn = paste0(getwd(), "/data"), 
                              pattern = "^geo13", full.names = TRUE)

## rasterize files
gimms_raster <- rasterizeGimms(gimms_files, 
                               filename = paste0(gimms_files, ".tif"))

With a little bit of effort and the help of RColorBrewer and sp, here is what we have created so far.


Figure 1. Global bi-monthly GIMMS NDVI3g images from July to December 2013.

Generate monthly composites

Sometimes it is necessary to calculate monthly value composites from the bi-monthly GIMMS datasets, e.g. to ensure temporal overlap with some other remote sensing product. For that purpose, gimms also features a function called monthlyComposite which works on both vectors of filenames and entire 'RasterStack' objects (ideally returned by rasterizeGimms) and calculates monthly values based on a user-defined function (e.g. fun = max to create monthly maximum value composite (MVC) layers). The function is heavily based on stackApply from the raster package, and the required coding work is quite straightforward.

## 'GeoTiff' files created during the previous step
gimms_files_tif <- sapply(gimms_raster@layers, function(i) attr(i@file, "name"))

## create monthly maximum value composites
gimms_raster_mvc <- monthlyComposite(gimms_files_tif)
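
If a different aggregation statistic is desired, it can be supplied via the fun argument; the following call (a sketch based on the same files as above) simply makes the maximum value compositing explicit:

## explicit monthly maximum value composites
gimms_raster_mvc <- monthlyComposite(gimms_files_tif, fun = max)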


Figure 2. Global monthly composite GIMMS NDVI3g images from July to December 2013.

Final remark

A more comprehensive version of this short introduction to the gimms package, including a collection of use cases (particularly in conjunction with R's parallel capabilities), can be found online in the package's GitHub repository. Any comments on how to improve the package, bug reports etc. are highly appreciated!


mapview 0.5.0

I have put some more effort into mapview.
The current version 0.5.0 has some new features which make the whole experience much more user-friendly.

In a nutshell, changes/additions are as follows:

  • mapView() is now also defined for SpatialPixelsDataFrame
  • all Spatial * DataFrame methods have gained argument zcol to select specific columns from the attribute table
  • SpatialPointsDataFrame method has gained argument radius to scale the circles according to another variable
  • added viewRGB() to render true-/false-color images of RasterStacks/Bricks (like raster::plotRGB)
  • added viewExtent() to view the extent/bbox of Raster/Spatial objects. This is useful for large objects as only the four corners of the extent need to be reprojected (using raster::projectExtent)
  • defined '+'-method to easily add layers to a leaflet/mapview map or combine existing maps
  • defined class 'mapview' which has two slots
    • @object – a list with all objects that are shown on the map
    • @map – the map
  • Raster* methods have gained argument maxpixels to avoid long rendering times (by default set to 500000 pixels which produces acceptable times on my machine)
  • enhanced leaflet integration so that you can use leaflet maps and add features using mapview (e.g. +) or use mapview to create maps and add functionality provided by leaflet (using e.g. %>%)

As an example, this means that you can now do things like

mapView(meuse.grid, zcol = "soil") + viewExtent(meuse) + meuse

to view all points of meuse plus their extent on top of a raster layer of meuse.grid$soil.
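
To sketch the two-way leaflet integration mentioned above (the calls are illustrative and assume the meuse data from sp; rgdal is used to reproject the points for the plain leaflet call):

## minimal sketch of the enhanced leaflet integration
library(sp)
library(rgdal)     # spTransform()
library(leaflet)
library(mapview)

data(meuse)
coordinates(meuse) <- ~ x + y
proj4string(meuse) <- CRS("+init=epsg:28992")
meuse_ll <- spTransform(meuse, CRS("+proj=longlat +datum=WGS84"))

## start with a leaflet map, then add features via mapview's '+' method
m <- leaflet() %>% addTiles()
m + meuse

## start with mapview, then add leaflet functionality via '%>%' on the @map slot
mapView(meuse)@map %>% addCircleMarkers(data = meuse_ll)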

All new functionality is highlighted in detail in the demo at

(the .Rmd source of which is now also included as a vignette in the package)

The package repository can be found at

To install the package use
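
(a sketch of the usual devtools call; the repository path is assumed from the Environmental Informatics Marburg GitHub account)

library(devtools)
install_github("environmentalinformatics-marburg/mapview")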


I hope this update will prove useful for some of you.
Don't hesitate to send me feedback and/or suggestions.

Formal bug reports and feature requests should be filed at

Here's an example of viewRGB()




mapView: basic interactive viewing of spatial data in R

Working with spatial data in R I find myself quite often in the need to quickly visually check whether a certain analysis has produced reasonable results. There are two ways I usually do this. Either I:

  1. (sp)plot the data in R and then toggle back and forth between the static plots (I use RStudio) or
  2. save the data to the disk and then open in QGIS or similar to interactively examine the results.

Both of these approaches are semi-optimal. While option 1 is fine for a quick glance at coarse patterns, it lacks the possibility to take a closer look at the results via zooming and panning. Option 2 provides the interactivity, but the detour via the hard disk is annoying (at best), especially when fine-tuning and checking regularly.

I attended this year's useR! 2015 conference in Aalborg (which was marvelous!) and went to the session on interactive graphics in R, where Joe Cheng from RStudio presented the leaflet package. Leaflet is great, but it is rather geared towards manually setting up maps. What GIS-like functionality would need is some default behaviour for different objects from the spatial universe.

This got me thinking and sparked my enthusiasm to write some wrapper functions for leaflet to provide at least very basic GIS-like interactive graphing capabilities that are directly accessible within RStudio (or the web browser, if you're not using RStudio). So I sat down and wrote a function called mapView().

Unfortunately, it is not possible to present interactive leaflet content here at wordpress.

Therefore, the full article is published at my web presentations space on GitHub.

Here's a little sneak preview though.

Enjoy reading!



remote update, paper & poster

Just a few quick lines today to

  1. announce an update to the remote package (1.0.0 now on CRAN)
    We have uploaded a first major release of remote to CRAN a couple of weeks ago. Most relevant changes are:

    • bottleneck functions deseason() and denoise() now implemented in C++ (user can decide whether to use C++ implementation or original R version via use.cpp argument)
    • remote now has two vignettes highlighting comparison between old and new versions of deseason() and denoise() (see example below)
  • a CITATION file has been added with a reference to our paper in the Journal of Statistical Software (use citation("remote") to display it in R), which brings us to…
  2. … the release of our paper in J STAT SOFT
    Finally, our paper on remote has been published in the Journal of Statistical Software. The paper gives a comprehensive overview of the functionality provided by remote. There is also some additional data and code to reproduce all of the analyses and figures from the paper, including an exercise on how to use remote for statistical downscaling of gridded data. In case you want to know more and are at…
  3. … useR! 2015
    come visit us at our poster on Wednesday evening and have a chat over a beer or two. So long, and thanks for all the…

Check for equality between the old R-based deseason() and the new C++-based deseason(); see the package vignettes for details.


Unsupervised Google Maps image classification

This is a guest post by Florian Detsch


Required packages

First, we need to (install and) load some packages required for data processing and visualization. The below code is mainly based on the Rsenal package, which is a steadily developing, unofficial R library maintained by the Environmental Informatics working group at Philipps-Universität Marburg, Germany. It is hosted on GitHub and features a couple of functions to prepare true-color (satellite) imagery for unsupervised image classification.

# library(devtools)
# install_github("environmentalinformatics-marburg/Rsenal")

lib <- c("Rsenal", "cluster", "rasterVis", "RColorBrewer")
jnk <- sapply(lib, function(x) library(x, character.only = TRUE))

Focal matrices

## 3-by-3 focal matrix (incl. center)
mat_w3by3 <- matrix(c(1, 1, 1, 
                      1, 1, 1, 
                      1, 1, 1), ncol = 3)

## 5-by-5 focal matrix (excl. center)
mat_w5by5 <- matrix(c(1, 1, 1, 1, 1, 
                      1, 1, 1, 1, 1, 
                      1, 1, 0, 1, 1, 
                      1, 1, 1, 1, 1, 
                      1, 1, 1, 1, 1), ncol = 5)

Google Maps imagery

Rsenal features a built-in dataset (data(gmap_hel)) that shall serve as a basis for our unsupervised classification approach. The image originates from Google Maps and was downloaded via dismo::gmap. We had previously employed OpenStreetMap::openmap to retrieve BING satellite images of the area; however, massive cloud obscuration prevented any further processing. As seen from the Google Maps image shown below, the land surface is dominated by small to medium-sized shrubs (medium brown) with smaller proportions of tall bushes (dark brown) and bare soil (light brown). Also included are shadows (dark brown to black), which are typically located next to tall vegetation.

data(gmap_hel, package = "Rsenal")

plot of chunk gmap_hel

Additional input layers

To gather further information on the structural properties of the land surface, a number of artificial layers are calculated from the red, green and blue input bands, including

  • focal means and standard deviations,
  • a visible vegetation index and
  • a shadow mask.

5-by-5 focal mean per band

gmap_hel_fcmu <- lapply(1:nlayers(gmap_hel), function(i) {
  focal(gmap_hel[[i]], w = mat_w5by5, fun = mean, na.rm = TRUE, pad = TRUE)
})
gmap_hel_fcmu <- stack(gmap_hel_fcmu)

5-by-5 focal standard deviation per band

gmap_hel_fcsd <- lapply(1:nlayers(gmap_hel), function(i) {
  focal(gmap_hel[[i]], w = mat_w5by5, fun = sd, na.rm = TRUE, pad = TRUE)
})
gmap_hel_fcsd <- stack(gmap_hel_fcsd)

Visible vegetation index

In addition to focal means and standard deviations, we calculate a so-called visible vegetation index (VVI), thus taking advantage of the spectral properties of vegetation in the visible spectrum of light to distinguish between vegetated and non-vegetated surfaces. The VVI is included in Rsenal (vvi) and mainly originates from the red and green input bands.

gmap_hel_vvi <- vvi(gmap_hel)

plot of chunk vvi_vis

Shadow mask

We finally create a shadow mask to distinguish between shadow and non-shadow pixels during post-classification image processing. The algorithm is based on the YCbCr color space and is applied to each of the 3 visible bands in a slightly modified form. For further details, the reader is kindly referred to the original article by Deb and Suny (2014). Briefly, the algorithm incorporates a transformation step of the initial RGB raster stack to the YCbCr color space followed by an iterative threshold calculation to distinguish between shadow and non-shadow pixels. Both the color space transformation (rgb2YCbCr) and the subsequent shadow mask algorithm (rgbShadowMask) are included in Rsenal. To get rid of noise (i.e. isolated shadow pixels), we additionally apply a 3-by-3 modal value filter after the actual shadow detection algorithm.
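
For reference, one common (BT.601) formulation of the RGB to YCbCr transform reads as follows. This is merely a conceptual sketch; Rsenal's rgb2YCbCr operates on Raster* objects and follows Deb and Suny (2014):

## conceptual sketch of the colour space transform (BT.601 coefficients)
rgb2ycbcr <- function(r, g, b) {
  y  <-  0.299 * r + 0.587 * g + 0.114 * b  # luminance
  cb <- -0.169 * r - 0.331 * g + 0.500 * b  # blue-difference chroma
  cr <-  0.500 * r - 0.419 * g - 0.081 * b  # red-difference chroma
  list(y = y, cb = cb, cr = cr)
}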

## shadow detection
gmap_hel_shw <- rgbShadowMask(gmap_hel)

## modal filter
gmap_hel_shw <- focal(gmap_hel_shw, w = mat_w3by3, 
                      fun = modal, na.rm = TRUE, pad = TRUE)

plot of chunk shadow_mask_vis

Image classification via kmeans()

The unsupervised image classification is finally realized via kmeans clustering following a nice tutorial by Devries, Verbesselt and Dutrieux (2015). We focus on separating the 3 major land-cover types depicted above, namely

  • bare soil (class 'A'),
  • small to medium-sized vegetation (class 'B') and
  • tall vegetation (class 'C').

## assemble relevant raster data
gmap_hel_all <- stack(gmap_hel, gmap_hel_fcmu, gmap_hel_fcsd, gmap_hel_vvi)

## convert to matrix
mat_hel_all <- as.matrix(gmap_hel_all)

## k-means clustering with 3 target groups
kmn_hel_all <- kmeans(mat_hel_all, centers = 3, iter.max = 100, nstart = 10)

After inserting the classification results into a separate raster template (rst_tmp), an additional class for shadow (tagged '0') is created through multiplication with the initially created shadow mask.

## raster template
rst_tmp <- gmap_hel[[1]]

## insert values
rst_tmp[] <- kmn_hel_all$cluster

## apply shadow mask
rst_tmp <- rst_tmp * gmap_hel_shw

The clusters are initialized randomly, and hence each land-cover type will be assigned a different ID when running the code repeatedly (which renders automated image creation impossible). Consequently, visual inspection of rst_tmp is required to assign classes 'A', 'B' and 'C' to the respective feature IDs of the subsequently ratified raster. In our case, bare soil is represented by '1', small vegetation by '3', and tall vegetation by '2'. Note that shadow will always be associated with '0' due to multiplication.

rat_tmp <- ratify(rst_tmp)
rat <- rat_tmp@data@attributes[[1]]
rat$Class <- c("S", "A", "C", "B")  # 0 = shadow, 1 = bare soil, 2 = tall, 3 = small vegetation
rat <- rat[order(rat$Class), ]
levels(rat_tmp) <- rat


Finally, the classified raster can nicely be visualized using the 'YlGn' color palette from COLORBREWER 2.0 along with levelplot from the rasterVis package.

ylgn <- brewer.pal(3, "YlGn")
names(ylgn) <- c("A", "B", "C")
ylgnbl <- c(ylgn, c("S" = "black"))
levelplot(rat_tmp, col.regions = ylgnbl)

plot of chunk visualization


remote is the new Reot

If you have used Reot before and tried to install it from CRAN recently, you may have noticed the following message:

Warning in install.packages : package ‘Reot’ is not available (for R version 3.0.2)

This is because the Reot package was abandoned due to a name change suggested by Huug van den Dool, the author of the EOT algorithm and also the reviewer of our JSS paper introducing the package (about which I am very happy). Here's what he said:

I am in slight doubt whether Reot is a good acronym. It looks in that spelling like REOF which often means rotated EOF. In case there is EOF in R, and there must be, how would that be called? Not Reof!?

I totally agree with his reservation about the acronym and therefore decided to rename the package remote, which is short for R EMpirical Orthogonal TEleconnections. So the name merely gained two letters. I think this is quite clever, as teleconnections imply forcing from some place not in the immediate vicinity of the observed phenomenon, and therefore the term remote seems to fit quite well (think television remote control).

Apart from the name change, remote is really an enhanced version of Reot. Not in the sense that computational performance was improved, but in the sense that it is now coded in an object-oriented way. This means that R's native plot() function will now work on objects returned by eot(). Furthermore, the package has gained a predict() function to apply the identified linear models to new data not seen by the model before. This can be useful in many instances, e.g. to extend spatio-temporal data sets in time. In our paper (to which I will provide a link here as soon as it is published) we show one example using predict() to downscale NDVI images for the region of Mt. Kilimanjaro. Here I quickly want to show how easy it is to use this new functionality (taken from the example of ?remote::predict()). I would like to stress that this is solely to demonstrate the code; whether it is a valid application or not is another question.


### not very useful, but highlights the workflow
library(remote)

## train data using eot()
train <- eot(x = pacificSST[[1:10]],
             y = australiaGPCP[[1:10]],
             n = 1)
## Calculating linear model ... 
## Locating 1. EOT ...
## Location: 271.5 11.5 
## Cum. expl. variance (%): 79.82
## predict using identified model
pred <- predict(train,
                newdata = pacificSST[[11:20]],
                n = 1)

## compare results
opar <- par(mfrow = c(1, 2))
plot(australiaGPCP[[13]], main = "original", zlim = c(0, 10))
plot(pred[[3]], main = "predicted", zlim = c(0, 10))
par(opar)  # restore previous graphical parameters

plot of chunk unnamed-chunk-1

Another enhancement is that remote has gained classes EotMode for a single mode and EotStack for an object of multiple modes. The latter can be subset()ed and names() can be set and retrieved.
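
A quick sketch of these methods (assuming 'modes' is an EotStack returned by eot() with n = 2):

## sketch: working with an EotStack
names(modes) <- c("mode_1", "mode_2")  # set mode names
names(modes)                           # ... and retrieve them
mode_1 <- subset(modes, 1)             # extract the first mode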

All in all, the new functionalities introduced in remote should make the analyses much more pleasant and easier to code as the data handling functionalities should be more familiar, especially for people who have worked with the raster package before. As we say in our paper, we are hopeful that there will be manifold applications where remote may prove to be a useful tool.


Introducing Rsenal – magic R functions for things various

Today, I would like to quickly introduce our package “Rsenal”. It is basically a collection of R functions that we use in our daily lives at Environmental Informatics Marburg. The package is hosted at GitHub and can be installed using the install_github() function from package devtools.

This package is the opposite of a 'general purpose' package, yet there are some functions that may be of interest to the wider R user community, for example latticeCombineGrid() and latticeCombineLayer(). The idea behind these functions is to provide tools to easily combine lattice plot objects that are stored in a list. These may, for example, result from iterative processes using lapply().

Below is an example for latticeCombineGrid() using the panel.2dsmoother() example from latticeExtra modified such that we loop over four different standard deviation values in a lapply() loop.


library(Rsenal)
library(latticeExtra)   # levelplot(), panel.2dsmoother()
library(RColorBrewer)   # brewer.pal()

sds <- c(1, 2, 3, 4)

clrs <- colorRampPalette(brewer.pal(9, "RdBu"))

## example taken from 
## looping over 4 different standard deviations - sds
p_list <- lapply(seq(sds), function(i) {
  xyz <- data.frame(x = rnorm(100), y = rnorm(100))
  xyz$z <- with(xyz, x * y + rnorm(100, sd = sds[i]))

  p <- levelplot(z ~ x * y, xyz, panel = panel.2dsmoother,
                 col.regions = clrs(1000), at = seq(-5, 5, 0.1),
                 aspect = 1)
  return(p)
})

p_final <- latticeCombineGrid(p_list)

plot of chunk unnamed-chunk-1

By default, panels are separated slightly (which I find a little more visually pleasing) and plotting order is from top left to bottom right (for lattice the default is bottom left to top right). Under the hood, Reduce() is used to combine the plot objects stored in the list.
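
Conceptually, this is similar to combining the trellis objects via latticeExtra's c() method (a sketch of the idea, not Rsenal's actual internals):

## conceptual sketch: combining a list of trellis objects via Reduce()
library(latticeExtra)
p_combined <- Reduce(c, p_list)  # dispatches to the c.trellis method
print(p_combined)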

latticeCombineLayer() works in a similar way, but lets you plot the different objects on top of each other.

Another useful function for package developers is bumpVersion(), which lets you modify the Version: number and automatically changes the Date: entry in your package DESCRIPTION file. Note that it assumes you use semantic software versioning. I usually commit all changes first and then bumpVersion() in a separate commit.

I hope that some people may find the function collection in Rsenal useful.

Please provide feedback, feature requests, suggestions and bug reports here.


Reot: Empirical Orthogonal Teleconnections in R

We are happy to introduce Reot, an R package designed for empirical orthogonal teleconnection (EOT) analysis of gridded geo-scientific space-time data based on the method by van den Dool et al. (2000). EOT denotes a regression-based approach to decompose spatio-temporal fields into a set of independent orthogonal patterns. In contrast to the classical approach of Empirical Orthogonal Functions (EOF), which are orthogonal in space and time, EOT analysis produces patterns that are orthogonal in either space or time (the current implementation of Reot provides the latter).

To identify these patterns, the time series of each pixel of the predictor domain is regressed against the time series of all pixels of the response domain (in case of a single field, against all pixels except the predictor pixel itself). The pixel with the highest sum of its coefficients of determination is defined as the 'base point' of the first/leading mode. The time series of this point is the first/leading EOT. The next EOT is calculated on the residuals of the previous step, thus ensuring orthogonality of the modes. This procedure is repeated until a user-specified number of EOTs is identified.
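
To make the procedure explicit, a naive (and slow) R sketch of the algorithm might look as follows. Here, pred and resp are matrices with one row per time step and one column per pixel; the actual implementation in Reot is written in C++ and differs in detail:

## naive illustration of the EOT procedure described above
eotNaive <- function(pred, resp, n = 1) {
  modes <- vector("list", n)
  for (k in seq_len(n)) {
    ## sum of the coefficients of determination of each predictor pixel
    ## against all response pixels
    r2sum <- apply(pred, 2, function(x) sum(cor(x, resp)^2))
    bp <- which.max(r2sum)  # 'base point' of the current mode
    eot <- pred[, bp]       # its time series is the current EOT
    ## continue with the residuals of the response field, thus ensuring
    ## orthogonality of the next mode
    resp <- apply(resp, 2, function(y) residuals(lm(y ~ eot)))
    modes[[k]] <- list(base.point = bp, eot = eot)
  }
  modes
}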

Given that the amount of regressions to be calculated can be extremely large, we implemented the core regression functions of Reot in C++ (via Rcpp) to ensure acceptable computation times. All input and output is based on Raster* classes to ensure seamless integration with existing analysis tools and work-flows. Reot is available from CRAN. The development version is hosted on GitHub.


Example 1: Winter mean 700 mb height over the Northern Hemisphere

As a first example, we replicate one of the examples from van den Dool et al. (2000) (Example 3.d., Figure 6). A spatio-temporal field of 700 mb geopotential heights from the NCEP/NCAR reanalysis grids (Kalnay et al., 1996) is decomposed into its four leading modes, exhibiting the prominent patterns of the North Atlantic Oscillation (NAO) and the Pacific-North American pattern (PNA) as modes 1 and 2, respectively. The climatologically inclined reader is referred to the respective section in van den Dool et al. (2000) for a more detailed description of the atmospheric dynamics and processes associated with the identified patterns. Here, we merely want to highlight that the Reot implementation of the algorithm produces similar results to the original implementation by van den Dool et al. (2000).


### load the necessary data
library(Reot)
library(sp)
library(rworldmap)     # coastsCoarse
library(rgeos)         # gIntersects(), gIntersection()
library(gridExtra)     # grid.arrange()
library(RColorBrewer)

data(vdendool)         # 700 mb geopotential heights, shipped with Reot

### calculate the first four EOT modes
modes <- eot(pred = vdendool, resp = NULL, n = 4, 
             reduce.both = FALSE, standardised = FALSE, 
             print.console = TRUE)
## Calculating linear model ... 
## Locating 1. EOT ...
## Location: -35 77.67 
## Cum. expl. variance (%): 21.53 
## Calculating linear model ... 
## Locating 2. EOT ...
## Location: -165 43.15 
## Cum. expl. variance (%): 38.17 
## Calculating linear model ... 
## Locating 3. EOT ...
## Location: 85 67.81 
## Cum. expl. variance (%): 46.08 
## Calculating linear model ... 
## Locating 4. EOT ...
## Location: -25 53.01 
## Cum. expl. variance (%): 53.07
### set appropriate projection
ster <- CRS("+proj=stere +lat_0=90 +lon_0=-45")

xmin <- -180
xmax <- 180
ymin <- 20
ymax <- 90     # Coordinates for bounding box
bb <- cbind(x = c(xmin, xmin, xmax, xmax, xmin), 
            y = c(ymin, ymax, ymax, ymin, ymin))
SP <- SpatialPolygons(list(Polygons(list(Polygon(bb)), ID = "bb")), 
                      proj4string = CRS(proj4string(coastsCoarse)))

gI <- gIntersects(coastsCoarse, SP, byid = TRUE) 
out <- vector(mode = "list", length = length(which(gI))) 
ii <- 1

for (i in seq(along = gI)) if (gI[i]) {
  out[[ii]] <- gIntersection(coastsCoarse[i, ], SP)
  row.names(out[[ii]]) <- row.names(coastsCoarse)[i]
  ii <- ii + 1
}

nhem.coasts <-"rbind", out)
nhem.coasts.ster <- spTransform(nhem.coasts, ster) 

lout <- list("sp.lines", nhem.coasts.ster, 
             col = "grey30", grid = TRUE)

### define colors
clrs <- colorRampPalette(rev(brewer.pal(9, "RdBu")))

### re-project modes
mode <- lapply(seq(modes), function(i) {
  projectRaster(modes[[i]]$r.predictor, crs = ster)
})

### create title for each image
titles <- lapply(seq(mode), function(i) {
  paste("Mode ", i, " : ", "EV = ", 
        round(if (i > 1) {
          modes[[i]]$exp.var * 100 - 
            modes[[i - 1]]$exp.var * 100
          } else {
            modes[[i]]$exp.var * 100
            }, 1), " : ", "BP = ", 
        as.integer(modes[[i]]$loc.eot[, 1]), 
        ", ", as.integer(modes[[i]]$loc.eot[, 2]), 
        sep = "")

### create plots
p <- lapply(seq(mode), function(i) {
  spplot(mode[[i]], sp.layout = lout, 
         main = list(titles[[i]], cex = 0.7),
         col.regions = clrs(1000), at = seq(-1, 1, 0.2),
         par.settings = list(axis.line = list(col = 0)),
         colorkey = list(height = 0.75, width = 1))
})

### arrange and plot
f <- function(...) grid.arrange(..., heights = 1, ncol = 2), p)

plot of chunk unnamed-chunk-1

Even though the locations of the identified base points (BP) are somewhat offset, and hence the explained variance (EV) figures differ slightly compared to van den Dool et al. (2000), it is obvious that the isolated patterns are very similar and represent the same signals. We can only speculate as to why the base point locations differ slightly, but potential reasons may include different version numbers of the reanalysis data, rounding discrepancies between the utilised programming languages (especially when summing up the coefficients of determination) and slight differences in geographic projections.

Example 2: Identifying tropical Pacific SST drivers for Australian precipitation

The processes of precipitation development are complex and not yet understood completely. The physical state of the atmosphere, which determines whether rain occurs or not at any point in space and time, is the result of a multitude of constantly changing factors. Influences range from local to hemispheric boundary conditions in all 4 dimensions (incl. time).

Some areas of the global oceans exhibit low-frequency anomaly signals which can influence precipitation variability world-wide. The most prominent example of a coupled ocean-atmosphere tropical SST variability is ENSO. ENSO has received much attention in the scientific literature since the major 1982–83 El Niño. Here we investigate whether EOT analysis can be used to identify the ENSO signal as a driver for low-frequency Australian precipitation variability over the period 1982 to 2010. The data sets needed for this analysis are included in Reot. In order to reveal low-frequency signals such as ENSO, we need to prepare the raw data fields so that high-frequency variation is eliminated. We achieve this by creating seasonal anomalies using deseason() and by denoise()-ing the data to filter out some of the noise that is present in any spatio-temporal data field.

The first 3 leading modes of SSTs most influential for Australian rainfall variability can be calculated with:


sst.pred <- deseason(pacificSST, cycle.window = 12)
gpcp.resp <- deseason(australiaGPCP, cycle.window = 12)

sst.pred.dns <- denoise(sst.pred, expl.var = 0.9)
## Using the first 19 components (of 348) to reconstruct series...
##  these account for 0.9 of variance in orig. series
gpcp.resp.dns <- denoise(gpcp.resp, expl.var = 0.9)
## Using the first 37 components (of 348) to reconstruct series...
##  these account for 0.9 of variance in orig. series
modes <- eot(pred = sst.pred.dns, resp = gpcp.resp.dns, 
             n = 3, standardised = FALSE, 
             reduce.both = FALSE, print.console = FALSE)

As we can see, especially the principal components filter from denoise() is an important step, as we need only 19 (37) of the original 348 components for the SST (GPCP) data to explain 90 % of the respective inherent field variance.
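
In essence, denoise() acts as a PCA-based low-pass filter. A conceptual sketch of this idea (not the package code), for a space-time matrix mat with one column per pixel:

## conceptual sketch of the principal components filter in denoise()
denoiseNaive <- function(mat, expl.var = 0.9) {
  pca <- prcomp(mat)
  ## number of leading components needed to reach 'expl.var'
  k <- which(cumsum(pca$sdev^2) / sum(pca$sdev^2) >= expl.var)[1]
  ## reconstruct the series from those components only
  recon <- pca$x[, 1:k] %*% t(pca$rotation[, 1:k])
  sweep(recon, 2, pca$center, "+")  # add the column means back
}
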
To get a visual impression, the results for the first leading mode can be plotted using the standard Reot plotting routine plotEot():

plotEot(modes, eot = 1, 
        show.eot.loc = TRUE, 
        arrange = "long")

plot of chunk unnamed-chunk-3

We see that we are indeed able to isolate the ENSO signal as the most important SST driver in the tropical Pacific (EOT 1) for Australian precipitation. This signal is able to explain just above 4 % of the original variation found in rainfall over the analysed period. This may not seem much, but we need to keep in mind that precipitation is influenced by many factors, with local conditions playing a major role. Spatially, it is mainly the north-eastern part of the response domain that is being explained, with some locations showing negative correlations of up to 0.4. With regard to mainland Australia, it becomes obvious that the identified ENSO signal is not able to explain any rainfall variation in the inner-continental parts of the land mass. It is mainly the coastal areas that are influenced by the ENSO phenomenon. Note that our analysis did not take into account any time-lags between the SST anomalies and precipitation. Even though in this particular example lagging does not increase the explanatory power of the SST signal (not shown), it can be expected that in many cases the influence will not manifest instantaneously and that a certain lag time will explain a higher portion of the rainfall variance.

Final remarks

We have just submitted a manuscript to the Journal of Statistical Software that describes Reot in much more detail. I will announce more on Reot once the final version of the paper is out, including another example of spatially downscaling NDVI observations from 8 km resolution to 250 m resolution.

For now, I would like to encourage people to try Reot in many different ways and applications and share their feedback with us. Feature requests and/or bug reports should be made on GitHub.


Kalnay E, Kanamitsu M, Kistler R, Collins W, Deaven D, Gandin L, Iredell M, Saha S, White G, Woollen J, Zhu Y, Chelliah M, Ebisuzaki W, Higgins W, Janowiak J, Mo K, Ropelewski C, Wang J, Leetmaa A, Reynolds R, Jenne R, Joseph D (1996). "The NCEP/NCAR 40-year reanalysis project." Bulletin of the American Meteorological Society, 77(3), 437–471.

van den Dool HM, Saha S, Johansson A (2000). "Empirical orthogonal teleconnections." Journal of Climate, 13(8), 1421–1435.


Update to metvurst

Here's a quick update to metvurst in response to some issues encountered over the last weeks.

The most important change is that the strip() function now returns the plot object rather than printing it. This means that we can work with it afterwards. Given that all plotting is implemented using lattice/latticeExtra, this seems natural, and the function now behaves accordingly. To highlight one advantage of this new behaviour, let's revisit the earlier post where I introduced the function and modify it slightly, so that we end up with two graphs on one page to facilitate comparisons between variables.

First, we need some data (again from Fiji – this time only for 1993 and 1994)


url <- ""

yr <- 1993:1994

fijilst <- lapply(seq(yr), function(i) {
  read.csv(paste(url, yr[i], ".csv", sep = ""), na.strings = c(-9999, 999))
})

fiji <-"rbind", fijilst)
fiji[fiji == -9999.00] <- NA
fiji[fiji == -9999.0] <- NA
fiji[fiji == 999.0] <- NA

Sys.setenv(TZ = "UTC") # set environment to UTC before conversion
dts <- as.POSIXct(strptime(fiji$Date...UTC.Time, 
                           format = "%d-%b-%Y %H:%M")) + 12 * 60 * 60

year <- substr(as.character(dts), 1, 4)

We are now able to create plot objects (i.e. store the visualisation in an object) rather than print them straight away and use them later…

plot.water.temp <- strip(x = fiji$Water.Temperature, 
                         date = dts,
                         cond = year,
                         arrange = "long",
                         main = "Water Temperature [°C]")

plot.air.temp <- strip(x = fiji$Air.Temperature, 
                       date = dts,
                       cond = year,
                       arrange = "long",
                       main = "Air Temperature [°C]")

Now we can use these two objects and plot them on one page using grid


library(grid)

### open a fresh page
grid.newpage()

### define first plotting region (viewport)
vp1 <- viewport(x = 0, y = 1, 
                height = 0.5, width = 1,
                just = c("left", "top"),
                name = "top")

### enter vp1
pushViewport(vp1)

### plot a plot - needs to be printed (and newpage set to FALSE)!!!
print(plot.water.temp, newpage = FALSE)

### leave vp1 - up one level (into root viewport)
upViewport(1)

### define second plot area
vp2 <- viewport(x = 0, y = 0, 
                height = 0.5, width = 1,
                just = c("left", "bottom"),
                name = "bottom")

### enter vp2
pushViewport(vp2)

### plot another plot
print(plot.air.temp, newpage = FALSE)

### destroy vp2 (as we're finished here)
popViewport()

plot of chunk unnamed-chunk-3

This is rather nice, as it enables direct comparisons between two variables. In this case we see that water temperature does exhibit the same seasonal behaviour as air temperature, whereas the diurnal signal is virtually non-existent… I hope this will spark some imagination for your own usage (e.g. comparisons of two climate station records or the like).

Note that the 3rd year we see here is a result of the conversion of UTC time to local Fiji time (we added 12 hours to UTC and hence end up with the first 12 hours of 1995).

## R version 3.0.1 (2013-05-16)
## Platform: x86_64-pc-linux-gnu (64-bit)
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=C                 LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## attached base packages:
## [1] grid      stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## other attached packages:
## [1] RWordPress_0.2-3    reshape_0.8.4       plyr_1.8           
## [4] latticeExtra_0.6-24 lattice_0.20-21     RColorBrewer_1.0-5 
## [7] metvurst_1.0        knitr_1.2          
## loaded via a namespace (and not attached):
## [1] digest_0.6.3   evaluate_0.4.3 formatR_0.8    RCurl_1.95-4.1
## [5] stringr_0.6.2  tools_3.0.1    XML_3.98-1.1   XMLRPC_0.3-0

Creating publication quality graphics using R


As part of a one-day workshop, I have developed an online tutorial on how to create publication quality graphics using R (from an academic point of view).

The tutorial can be found here

As mentioned in the tutorial, feel free to send me any feedback, criticism, general comments or bug reports.



Btw, the entire tutorial was created using Rmarkdown and knitr. The .Rmd file can be found here
