mapView: basic interactive viewing of spatial data in R


When working with spatial data in R, I quite often find myself needing to quickly check visually whether a certain analysis has produced reasonable results. There are two ways I usually do this. Either I:

  1. (sp)plot the data in R and then toggle back and forth between the static plots (I use RStudio) or
  2. save the data to the disk and then open in QGIS or similar to interactively examine the results.

Both of these approaches are sub-optimal. While option 1 is fine for a quick glance at coarse patterns, it lacks the ability to inspect the results more closely via zooming and panning. Option 2 provides the interactivity, but the detour via the hard disk is annoying (at best), especially when fine-tuning and checking regularly.

At this year's useR! 2015 conference in Aalborg (which was marvelous!) I attended the session on interactive graphics in R, where Joe Cheng from RStudio presented the leaflet package. Leaflet is great, but it is rather geared towards manually setting up maps. What GIS-like functionality needs is some sensible default behaviour for the different objects of the spatial universe.

This got me thinking and sparked my enthusiasm to write some wrapper functions for leaflet that provide at least very basic GIS-like interactive graphing capabilities directly accessible within RStudio (or the web browser, if you're not using RStudio). So I sat down and wrote a function called mapView().

Unfortunately, it is not possible to present interactive leaflet content here on WordPress.

Therefore, the full article is published at our web presentation space on GitHub.

Here's a little sneak preview though.
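Below is a minimal sketch of the kind of one-liner mapView() is intended to enable. The meuse data set from sp merely serves as stand-in data here, and the exact arguments may differ from the released function:

## view a Spatial* object interactively with a single call
## (meuse from sp is stand-in data; exact arguments may differ)
library(sp)
data(meuse)
coordinates(meuse) <- ~x + y                  # promote to SpatialPointsDataFrame
proj4string(meuse) <- CRS("+init=epsg:28992") # Dutch national grid

mapView(meuse) # renders an interactive leaflet map in the RStudio viewer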

Enjoy reading!

Tim



remote update, paper & poster

Just a few quick lines today to

  1. announce an update to the remote package (1.0.0 now on CRAN)
    We uploaded the first major release of remote to CRAN a couple of weeks ago. The most relevant changes are:

    • the bottleneck functions deseason() and denoise() are now implemented in C++ (the user can decide whether to use the C++ implementation or the original R version via the use.cpp argument)
    • remote now has two vignettes highlighting the comparison between the old and new versions of deseason() and denoise() (see the example below)
    • a CITATION file has been added with the citation of our paper in the Journal of Statistical Software (use citation("remote") to display it in R), which brings us to…
  2. … the release of our paper in J STAT SOFT
    Our paper on remote has finally been published in the Journal of Statistical Software. The paper gives a comprehensive overview of the functionality provided by remote. There is also some additional data and code to reproduce all of the analyses and figures from the paper, including an exercise on how to use remote for statistical downscaling of gridded data. In case you want to know more and are at…
  3. … useR! 2015
    Come visit us at our poster on Wednesday evening and have a chat over a beer or two. So long, and thanks for all the…

Check for equality between the old R-based deseason() and the new C++-based deseason(). See https://github.com/environmentalinformatics-marburg/remote/blob/master/vignettes/deseason.md for details.
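A minimal sketch of such an equality check, using the australiaGPCP data shipped with the package (see the vignette for the full comparison):

library(remote)

data(australiaGPCP)

## the R-based and the C++-based implementation should yield identical results
dsn_r   <- deseason(australiaGPCP, use.cpp = FALSE)
dsn_cpp <- deseason(australiaGPCP, use.cpp = TRUE)
all.equal(dsn_r[], dsn_cpp[])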


Unsupervised Google Maps image classification

This is a guest post by Florian Detsch.

Prerequisites

Required packages

First, we need to (install and) load some packages required for data processing and visualization. The code below is mainly based on the Rsenal package, a steadily developing, unofficial R library maintained by the Environmental Informatics working group at Philipps-Universität Marburg, Germany. It is hosted on GitHub and features a couple of functions to prepare true-color (satellite) imagery for unsupervised image classification.

## install Rsenal from GitHub (uncomment if not yet installed)
# library(devtools)
# install_github("environmentalinformatics-marburg/Rsenal")

## load required packages
lib <- c("Rsenal", "cluster", "rasterVis", "RColorBrewer")
jnk <- sapply(lib, function(x) library(x, character.only = TRUE))

Focal matrices

## 3-by-3 focal matrix (incl. center)
mat_w3by3 <- matrix(c(1, 1, 1, 
                      1, 1, 1, 
                      1, 1, 1), nc = 3)

## 5-by-5 focal matrix (excl. center)
mat_w5by5 <- matrix(c(1, 1, 1, 1, 1, 
                      1, 1, 1, 1, 1, 
                      1, 1, 0, 1, 1, 
                      1, 1, 1, 1, 1, 
                      1, 1, 1, 1, 1), nc = 5)

Google Maps imagery

Rsenal features a built-in dataset (data(gmap_hel)) that serves as the basis for our unsupervised classification approach. The image originates from Google Maps and was downloaded via dismo::gmap. We had previously employed OpenStreetMap::openmap to retrieve BING satellite images of the area, but massive cloud obscuration prevented any further processing. As seen from the Google Maps image shown below, the land surface is dominated by small to medium-sized shrubs (medium brown), with smaller proportions of tall bushes (dark brown) and bare soil (light brown). Also included are shadows (dark brown to black), which are typically located next to tall vegetation.

data(gmap_hel, package = "Rsenal")
plotRGB(gmap_hel)

[Figure: true-color plot of gmap_hel]
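For reference, a comparable image could be retrieved directly via dismo::gmap along the lines of the following sketch (the bounding box is a hypothetical placeholder; gmap_hel itself ships with Rsenal, so this step is not required here):

library(dismo)

## hypothetical bounding box; gmap() returns the tile as a Raster* object
ext <- extent(c(37.0, 37.1, -3.1, -3.0))
img <- gmap(ext, type = "satellite", rgb = TRUE)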

Additional input layers

To gather further information on the structural properties of the land surface, a number of artificial layers are calculated from the red, green and blue input bands, including

  • focal means and standard deviations,
  • a visible vegetation index and
  • a shadow mask.

5-by-5 focal mean per band

gmap_hel_fcmu <- lapply(1:nlayers(gmap_hel), function(i) {
  focal(gmap_hel[[i]], w = mat_w5by5, fun = mean, na.rm = TRUE, pad = TRUE)
})
gmap_hel_fcmu <- stack(gmap_hel_fcmu)

5-by-5 focal standard deviation per band

gmap_hel_fcsd <- lapply(1:nlayers(gmap_hel), function(i) {
  focal(gmap_hel[[i]], w = mat_w5by5, fun = sd, na.rm = TRUE, pad = TRUE)
})
gmap_hel_fcsd <- stack(gmap_hel_fcsd)

Visible vegetation index

In addition to focal means and standard deviations, we calculate a so-called visible vegetation index (VVI), thus taking advantage of the spectral properties of vegetation in the visible spectrum of light to distinguish between vegetated and non-vegetated surfaces. The VVI is included in Rsenal (vvi) and mainly originates from the red and green input bands.

gmap_hel_vvi <- vvi(gmap_hel)

[Figure: visible vegetation index derived from gmap_hel]

Shadow mask

We finally create a shadow mask to distinguish between shadow and non-shadow pixels during post-classification image processing. The algorithm is based on the YCbCr color space and is applied to each of the 3 visible bands in a slightly modified form. For further details, the reader is kindly referred to the original article by Deb and Suny (2014). Briefly, the algorithm incorporates a transformation step of the initial RGB raster stack to the YCbCr color space followed by an iterative threshold calculation to distinguish between shadow and non-shadow pixels. Both the color space transformation (rgb2YCbCr) and the subsequent shadow mask algorithm (rgbShadowMask) are included in Rsenal. To get rid of noise (i.e. isolated shadow pixels), we additionally apply a 3-by-3 modal value filter after the actual shadow detection algorithm.
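For illustration, the color-space step rests on the standard RGB-to-YCbCr transform; below is a sketch using the ITU-R BT.601 coefficients (the constants used internally by rgb2YCbCr may differ):

## RGB -> YCbCr with ITU-R BT.601 coefficients (illustration only;
## the constants in Rsenal::rgb2YCbCr may differ)
rgb2ycbcr_sketch <- function(r, g, b) {
  y  <- 0.299 * r + 0.587 * g + 0.114 * b  # luma
  cb <- 0.564 * (b - y)                    # blue-difference chroma
  cr <- 0.713 * (r - y)                    # red-difference chroma
  data.frame(y = y, cb = cb, cr = cr)
}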

## shadow detection
gmap_hel_shw <- rgbShadowMask(gmap_hel)

## modal filter
gmap_hel_shw <- focal(gmap_hel_shw, w = mat_w3by3, 
                      fun = modal, na.rm = TRUE, pad = TRUE)

[Figure: shadow mask after modal filtering]

Image classification via kmeans()

The unsupervised image classification is finally realized via kmeans clustering following a nice tutorial by Devries, Verbesselt and Dutrieux (2015). We focus on separating the 3 major land-cover types depicted above, namely

  • bare soil (class 'A'),
  • small to medium-sized vegetation (class 'B') and
  • tall vegetation (class 'C').

## assemble relevant raster data
gmap_hel_all <- stack(gmap_hel, gmap_hel_fcmu, gmap_hel_fcsd, gmap_hel_vvi)

## convert to matrix
mat_hel_all <- as.matrix(gmap_hel_all)

## k-means clustering with 3 target groups
kmn_hel_all <- kmeans(mat_hel_all, centers = 3, iter.max = 100, nstart = 10)

After inserting the classification results into a separate raster template (rst_tmp), an additional class for shadow (tagged '0') is created through multiplication with the initially created shadow mask.

## raster template
rst_tmp <- gmap_hel[[1]]

## insert values
rst_tmp[] <- kmn_hel_all$cluster

## apply shadow mask
rst_tmp <- rst_tmp * gmap_hel_shw

The clusters are initialized randomly, and hence each land-cover type will be assigned a different ID whenever the code is run again (which renders automated image creation impossible). Consequently, visual inspection of rst_tmp is required to assign the classes 'A', 'B' and 'C' to the respective feature IDs of the subsequently ratified raster. In our case, bare soil is represented by '1', small vegetation by '3', and tall vegetation by '2'. Note that shadow will always be associated with '0' due to the multiplication.
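One way around the non-determinism (a sketch, not part of the original workflow) is to fix the RNG seed right before clustering, which keeps the cluster-to-class mapping stable across runs:

## hypothetical: fixing the seed makes the cluster IDs reproducible
set.seed(42)
kmn_hel_all <- kmeans(mat_hel_all, centers = 3, iter.max = 100, nstart = 10)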

rat_tmp <- ratify(rst_tmp)
rat <- rat_tmp@data@attributes[[1]]
rat$Class <- c("S", "A", "C", "B")
rat <- rat[order(rat$Class), ]
levels(rat_tmp) <- rat

Visualization

Finally, the classified raster can be nicely visualized using the 'YlGn' palette from ColorBrewer 2.0 along with levelplot() from the rasterVis package.

ylgn <- brewer.pal(3, "YlGn")
names(ylgn) <- c("A", "B", "C")
ylgnbl <- c(ylgn, c("S" = "black"))
levelplot(rat_tmp, col.regions = ylgnbl)

[Figure: classified raster visualized with levelplot()]


remote is the new Reot

If you have used Reot before and tried to install it from CRAN recently, you may have noticed the following message:

Warning in install.packages : package ‘Reot’ is not available (for R version 3.0.2)

This is because the Reot package was abandoned in favour of a name change suggested by Huug van den Dool, the author of the EOT algorithm and also the reviewer of our JSS paper introducing the package (a review I am very happy about). Here's what he said:

I am in slight doubt whether Reot is a good acronym. It looks in that spelling like REOF which often means rotated EOF. In case there is EOF in R, and there must be, how would that be called? Not Reof!?

I totally agree with his reservation about the acronym and therefore decided to rename the package remote, which is short for R EMpirical Orthogonal TEleconnections. So the name merely gained two letters. I think it is quite clever, as teleconnections imply forcing from some place not in the immediate vicinity of the observed phenomenon, and therefore the term remote seems to fit quite well (think television remote control).

Apart from the name change, remote really is an enhanced version of Reot. Not in the sense that computational performance was improved, but in the sense that it is now coded in an object-oriented way. This means that R's native plot() function will now work on objects returned by eot(). Furthermore, the package has gained a predict() function to apply the identified linear models to new data not previously seen by the model. This can be useful in many instances, e.g. to extend spatio-temporal data sets in time. In our paper (to which I will provide a link here as soon as it is published) we show one example using predict() to downscale NDVI images for the region of Mt. Kilimanjaro. Here I quickly want to show how easy it is to use this new functionality (taken from the example of ?remote::predict()). I would like to stress that this is solely to demonstrate the code; whether it is a valid application or not is another question.

library('remote')

### not very useful, but highlights the workflow
data(pacificSST)
data(australiaGPCP)

## train data using eot()
train <- eot(x = pacificSST[[1:10]],
             y = australiaGPCP[[1:10]],
             n = 1)
## 
## Calculating linear model ... 
## Locating 1. EOT ...
## Location: 271.5 11.5 
## Cum. expl. variance (%): 79.82
## predict using identified model
pred <- predict(train,
                newdata = pacificSST[[11:20]],
                n = 1)

## compare results
opar <- par(mfrow = c(1,2))
plot(australiaGPCP[[13]], main = "original", zlim = c(0, 10))
plot(pred[[3]], main = "predicted", zlim = c(0, 10))

[Figure: original vs. predicted GPCP rainfall image]

Another enhancement is that remote has gained the classes EotMode (a single mode) and EotStack (a collection of modes). The latter can be subset()ed, and names() can be set and retrieved.
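A quick sketch of what this object-oriented handling looks like (assuming modes is an EotStack with three modes; see the package documentation for the exact method signatures):

## name the modes, retrieve the names, and extract a single mode
names(modes) <- c("mode_1", "mode_2", "mode_3")
names(modes)                         # retrieve mode names
mode_1 <- subset(modes, subset = 1)  # extract the first mode as an EotMode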

All in all, the new functionalities introduced in remote should make analyses much more pleasant and easier to code, as the data handling will feel familiar, especially for people who have worked with the raster package before. As we say in our paper, we are hopeful that there will be manifold applications where remote may prove to be a useful tool.


Introducing Rsenal – magic R functions for things various

Today, I would like to quickly introduce our package Rsenal. It is basically a collection of R functions that we use in our daily lives at Environmental Informatics Marburg. The package is hosted on GitHub and can be installed using the install_github() function from package devtools.

This package is the opposite of a 'general purpose' package, yet there are some functions that may be of interest to the wider R user community, for example latticeCombineGrid() and latticeCombineLayer(). The idea behind these functions is to provide tools to easily combine lattice plot objects that are stored in a list. These may, for example, result from iterative processes using lapply().

Below is an example of latticeCombineGrid() using the panel.2dsmoother() example from latticeExtra, modified such that we loop over four different standard deviation values in a lapply() loop.

library(Rsenal)
library(latticeExtra)

sds <- c(1, 2, 3, 4)

clrs <- colorRampPalette(brewer.pal(9, "RdBu"))

## example taken from 
## http://latticeextra.r-forge.r-project.org/#panel.2dsmoother&theme=default
## looping over 4 different standard deviations - sds
p_list <- lapply(seq(sds), function(i) {
  set.seed(1)
  xyz <- data.frame(x = rnorm(100), y = rnorm(100))
  xyz$z <- with(xyz, x * y + rnorm(100, sd = sds[i]))

  p <- levelplot(z ~ x * y, xyz, panel = panel.2dsmoother,
                 col.regions = clrs(1000), at = seq(-5, 5, 0.1),
                 aspect = 1)
  return(p)
  })

p_final <- latticeCombineGrid(p_list)
print(p_final)

[Figure: four levelplots combined with latticeCombineGrid()]

By default, panels are separated slightly (which I find a little more visually pleasing) and the plotting order is from top left to bottom right (the lattice default is from bottom left to top right). Under the hood, Reduce() is used to combine the plot objects stored in the list.

latticeCombineLayer() works in a similar way, but lets you plot the different objects on top of each other.

Another useful function for package developers is bumpVersion(), which lets you modify the Version: number and automatically updates the Date: entry in your package DESCRIPTION file. Note that it assumes you use semantic software versioning. I usually commit all changes first and then bumpVersion() in a separate commit.
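The idea of semantic versioning boils down to incrementing one of the MAJOR.MINOR.PATCH components; the helper below is a hypothetical illustration of the PATCH case (see ?bumpVersion in Rsenal for the real interface):

## hypothetical sketch of a patch-level version bump
bump_patch <- function(version) {
  parts <- as.integer(strsplit(version, ".", fixed = TRUE)[[1]])
  parts[3] <- parts[3] + 1  # increment the PATCH component
  paste(parts, collapse = ".")
}
bump_patch("0.2.1")  # "0.2.2"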

I hope that some people may find the function collection in Rsenal useful.

Please provide feedback, feature requests, suggestions and bug reports here.


Reot: Empirical Orthogonal Teleconnections in R

We are happy to introduce Reot, an R package designed for empirical orthogonal teleconnection (EOT) analysis of gridded geo-scientific space-time data based on the method by van den Dool et al. (2000). EOT denotes a regression-based approach to decompose spatio-temporal fields into a set of independent orthogonal patterns. In contrast to the classical approach of Empirical Orthogonal Functions (EOF), which are orthogonal in space and time, EOT analysis produces patterns that are orthogonal in either space or time (the current implementation of Reot provides the latter).

To identify these patterns, the time series of each pixel of the predictor domain is regressed against the time series of all pixels of the response domain (in case of a single field, each pixel is regressed against all remaining pixels of that field). The pixel with the highest sum of its coefficients of determination is defined as the 'base point' of the first/leading mode. The time series of this point is the first/leading EOT. The next EOT is calculated on the residuals of the previous step, thus ensuring orthogonality of the modes. This procedure is repeated until a user-specified number of EOTs is identified.
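Schematically, one EOT iteration boils down to the following sketch (plain R for illustration only; as the next paragraph notes, the actual implementation is in C++, and pred/resp are hypothetical matrices holding one column per pixel and one row per time step):

## locate the base point: the predictor pixel whose time series
## explains the most variance summed over all response pixels
find_base_point <- function(pred, resp) {
  r2sum <- apply(pred, 2, function(p) sum(cor(p, resp)^2))
  which.max(r2sum)
}

## the time series of the base point is the leading EOT; the residuals
## of the per-pixel regressions form the response field for the next mode
next_resp <- function(pred, resp) {
  base <- pred[, find_base_point(pred, resp)]
  apply(resp, 2, function(y) residuals(lm(y ~ base)))
}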

Given that the number of regressions to be calculated can be extremely large, we implemented the core regression functions of Reot in C++ (via Rcpp) to ensure acceptable computation times. All input and output is based on Raster* classes to ensure seamless integration with existing analysis tools and workflows. Reot is available from CRAN. The development version is hosted on GitHub.

Examples

Example 1: Winter mean 700 mb height over the Northern Hemisphere

As a first example, we replicate one of the examples from van den Dool et al. (2000) (Example 3.d., Figure 6). A spatio-temporal field of 700 mb geopotential heights from the NCEP/NCAR reanalysis grids (Kalnay et al. 1996) is decomposed into its four leading modes, exhibiting the prominent patterns of the North Atlantic Oscillation (NAO) and the Pacific-North American pattern (PNA) as modes 1 and 2, respectively. The climatologically inclined reader is referred to the respective section in van den Dool et al. (2000) for a more detailed description of the atmospheric dynamics and processes associated with the identified patterns. Here, we merely want to highlight that the Reot implementation of the algorithm produces results similar to those of the original implementation by van den Dool et al. (2000).

library("rworldmap")
library("rgdal")
library("rgeos")
library("latticeExtra")
library("gridExtra")
library("Reot")

### load the necessary data
data("vdendool")
data("coastsCoarse")

### calculate the first four EOT modes
modes <- eot(pred = vdendool, resp = NULL, n = 4, 
             reduce.both = FALSE, standardised = FALSE, 
             print.console = TRUE)
## 
## Calculating linear model ... 
## Locating 1. EOT ...
## Location: -35 77.67 
## Cum. expl. variance (%): 21.53 
## 
## Calculating linear model ... 
## Locating 2. EOT ...
## Location: -165 43.15 
## Cum. expl. variance (%): 38.17 
## 
## Calculating linear model ... 
## Locating 3. EOT ...
## Location: 85 67.81 
## Cum. expl. variance (%): 46.08 
## 
## Calculating linear model ... 
## Locating 4. EOT ...
## Location: -25 53.01 
## Cum. expl. variance (%): 53.07
### set appropriate projection
ster <- CRS("+proj=stere +lat_0=90 +lon_0=-45")

xmin <- -180
xmax <- 180
ymin <- 20
ymax <- 90     # Coordinates for bounding box
bb <- cbind(x = c(xmin, xmin, xmax, xmax, xmin), 
            y = c(ymin, ymax, ymax, ymin, ymin))
SP <- SpatialPolygons(list(Polygons(list(Polygon(bb)), 
                                    "1")), 
                      proj4string = CRS(proj4string(coastsCoarse)))

gI <- gIntersects(coastsCoarse, SP, byid = TRUE) 
out <- vector(mode = "list", length = length(which(gI))) 
ii <- 1

for (i in seq(along = gI)) if (gI[i]) {
  out[[ii]] <- gIntersection(coastsCoarse[i, ], SP)
  row.names(out[[ii]]) <- row.names(coastsCoarse)[i]
  ii <- ii + 1
  }

nhem.coasts <- do.call("rbind", out)
nhem.coasts.ster <- spTransform(nhem.coasts, ster) 

lout <- list("sp.lines", nhem.coasts.ster, 
             col = "grey30", grid = TRUE)

### define colors
clrs <- colorRampPalette(rev(brewer.pal(9, "RdBu")))

### re-project modes
mode <- lapply(seq(modes), function(i) {
  projectRaster(modes[[i]]$r.predictor, crs = ster)
  })

### create title for each image
titles <- lapply(seq(mode), function(i) {
  paste("Mode ", i, " : ", "EV = ", 
        round(if (i > 1) {
          modes[[i]]$exp.var * 100 - 
            modes[[i - 1]]$exp.var * 100
          } else {
            modes[[i]]$exp.var * 100
            }, 1), " : ", "BP = ", 
        as.integer(modes[[i]]$loc.eot[, 1]), 
        ", ", as.integer(modes[[i]]$loc.eot[, 2]), 
        sep = "")
  })

### create plots
p <- lapply(seq(mode), function(i) {
  spplot(mode[[i]], sp.layout = lout, 
         main = list(titles[[i]], cex = 0.7),
         col.regions = clrs(1000), at = seq(-1, 1, 0.2),
         par.settings = list(axis.line = list(col = 0)),
         colorkey = list(height = 0.75, width = 1))
  })

### arrange and plot
f <- function(...) grid.arrange(..., heights = 1, ncol = 2)
do.call(f, p)

[Figure: the four leading EOT modes of winter mean 700 mb heights]

Even though the locations of the identified base points (BP) are somewhat offset, and hence the explained variance (EV) figures differ slightly from van den Dool et al. (2000), it is obvious that the isolated patterns are very similar and represent the same signals. We can only speculate as to why the base point locations differ slightly; potential reasons include different versions of the reanalysis data, rounding discrepancies between the programming languages used (especially when summing up the coefficients of determination) and slight differences in the geographic projections.

Example 2: Identifying tropical Pacific SST drivers for Australian precipitation

The processes of precipitation development are complex and not yet completely understood. The physical state of the atmosphere, which determines whether or not rain occurs at any point in space and time, is the result of a multitude of constantly changing factors. Influences range from local to hemispheric boundary conditions in all 4 dimensions (incl. time).

Some areas of the global oceans exhibit low-frequency anomaly signals which can influence precipitation variability world-wide. The most prominent example of coupled ocean-atmosphere tropical SST variability is ENSO, which has received much attention in the scientific literature since the major 1982 – 83 El Niño. Here we investigate whether EOT analysis can be used to identify the ENSO signal as a driver of low-frequency Australian precipitation variability over the period 1982 to 2010. The data sets needed for this analysis are included in Reot. In order to reveal low-frequency signals such as ENSO, we need to prepare the raw data fields so that high-frequency variation is eliminated. We achieve this by creating seasonal anomalies using deseason() and by denoise()-ing the data to filter out some of the noise that is present in any spatio-temporal data field.

The first 3 leading modes of SSTs most influential for Australian rainfall variability can be calculated with:

data("australiaGPCP")
data("pacificSST")

sst.pred <- deseason(pacificSST, cycle.window = 12)
gpcp.resp <- deseason(australiaGPCP, cycle.window = 12)

sst.pred.dns <- denoise(sst.pred, expl.var = 0.9)
## 
## Using the first 19 components (of 348) to reconstruct series...
##  these account for 0.9 of variance in orig. series
gpcp.resp.dns <- denoise(gpcp.resp, expl.var = 0.9)
## 
## Using the first 37 components (of 348) to reconstruct series...
##  these account for 0.9 of variance in orig. series
modes <- eot(pred = sst.pred.dns, resp = gpcp.resp.dns, 
             n = 3, standardised = FALSE, 
             reduce.both = FALSE, print.console = FALSE)

As we can see, especially the principal components filter from denoise() is an important step, as we need only 19 (37) of the original 348 components for the SST (GPCP) data to explain 90 % of the respective inherent field variance.
To get a visual impression, the results for the first leading mode can be plotted using the standard Reot plotting routine plotEot():

plotEot(modes, eot = 1, 
        show.eot.loc = TRUE, 
        arrange = "long")

[Figure: leading EOT mode plotted with plotEot()]

We see that we are indeed able to isolate the ENSO signal as the most important SST driver in the tropical Pacific (EOT 1) for Australian precipitation. This signal explains just above 4 % of the original variation found in rainfall over the analysed period. This may not seem much, but we need to keep in mind that precipitation is influenced by many factors, with local conditions playing a major role. Spatially, it is mainly the north-eastern part of the response domain that is being explained, with some locations showing negative correlations of up to 0.4. With regard to mainland Australia, it becomes obvious that the identified ENSO signal is not able to explain any rainfall variation in the inner-continental parts of the land mass. It is mainly the coastal areas that are influenced by the ENSO phenomenon. Note that our analysis did not take into account any time lags between the SST anomalies and precipitation. Even though in this particular example lagging does not increase the explanatory power of the SST signal (not shown), it can be expected that in many cases the influence will not manifest instantaneously and that a certain lag time will explain a higher portion of the rainfall variance.

Final remarks

We have just submitted a manuscript to the Journal of Statistical Software that describes Reot in much more detail. I will announce more on Reot once the final version of the paper is out, including another example of spatially downscaling NDVI observations from 8 km resolution to 250 m resolution.

For now, I would like to encourage people to try Reot in many different ways and applications and share their feedback with us. Feature requests and/or bug reports should be made on GitHub.

References

Kalnay E, Kanamitsu M, Kistler R, Collins W, Deaven D, Gandin L, Iredell M, Saha S, White G, Woollen J, Zhu Y, Chelliah M, Ebisuzaki W, Higgins W, Janowiak J, Mo K, Ropelewski C, Wang J, Leetmaa A, Reynolds R, Jenne R, Joseph D (1996). “The NCEP/NCAR 40-year reanalysis project.” Bulletin of the American Meteorological Society, 77(3), 437 – 471.

van den Dool HM, Saha S, Johansson A (2000). “Empirical orthogonal teleconnections.” Journal of Climate, 13(8), 1421 – 1435. URL http://journals.ametsoc.org/doi/abs/10.1175/1520-0442(2000)013%3C1421%3AEOT%3E2.0.CO%3B2.


Update to metvurst

Here's a quick update to metvurst in response to some issues encountered over the last weeks.

The most important change is that the strip() function now returns the plot object rather than printing it, which means we can work with it afterwards. Given that all plotting is implemented using lattice/latticeExtra, this seems natural, and strip() now behaves accordingly. To highlight one advantage of this new behaviour, let's revisit the earlier post where I introduced the function and modify the example slightly, so that we end up with two graphs on one page to facilitate comparisons between variables.

First, we need some data (again from Fiji; this time only for 1993 and 1994).

## LOAD METVURST PACKAGE
library(metvurst)

## SET URL FOR DATA DOWNLOAD
url <- "http://www.bom.gov.au/ntc/IDO70004/IDO70004_"

## YEARS TO BE DOWNLOADED
yr <- 1993:1994

## READ DATA FOR ALL YEARS FROM URL INTO LIST
fijilst <- lapply(seq(yr), function(i) {
  read.csv(paste(url, yr[i], ".csv", sep = ""), na.strings = c(-9999, 999))
})

## TURN LIST INTO COMPLETE DATAFRAME AND CONVERT MISSING VALUE CODES TO NAs
fiji <- do.call("rbind", fijilst)
fiji[fiji == -9999] <- NA
fiji[fiji == 999] <- NA

## CREATE POSIX DATETIME AND CONVERT UTC TO LOCAL FIJI TIME
Sys.setenv(TZ = "UTC") # set environment to UTC before conversion
dts <- as.POSIXct(strptime(fiji$Date...UTC.Time, 
                           format = "%d-%b-%Y %H:%M")) + 12 * 60 * 60

## CREATE CONDITIONING VARIABLE (IN THIS CASE YEAR)
year <- substr(as.character(dts), 1, 4)

We are now able to create plot objects (i.e. store the visualisations in objects) rather than printing them straight away, and use them later…

## CREATE STRIP FOR WATER TEMPERATURE
plot.water.temp <- strip(x = fiji$Water.Temperature, 
                         date = dts,
                         cond = year,
                         arrange = "long",
                         main = "Water Temperature [°C]")

## CREATE STRIP FOR AIR TEMPERATURE
plot.air.temp <- strip(x = fiji$Air.Temperature, 
                       date = dts,
                       cond = year,
                       arrange = "long",
                       main = "Air Temperature [°C]")

Now we can use these two objects and plot them on one page using grid:

library(grid)

grid.newpage()

### define first plotting region (viewport)
vp1 <- viewport(x = 0, y = 1, 
                height = 0.5, width = 1,
                just = c("left", "top"),
                name = "top")

### enter vp1 
pushViewport(vp1)

### plot a plot - needs to be printed (and newpage set to FALSE)!!!
print(plot.water.temp, newpage = FALSE)

### leave vp1 - up one level (into the root viewport)
upViewport(1)

### define second plot area
vp2 <- viewport(x = 0, y = 0, 
                height = 0.5, width = 1,
                just = c("left", "bottom"),
                name = "bottom")

### enter vp2
pushViewport(vp2)

### plot another plot
print(plot.air.temp, newpage = FALSE)

### destroy vp2 (as we're finished here)
popViewport()

[Figure: water and air temperature strips on one page]

This is rather nice, as it enables direct comparisons between two variables. In this case we see that water temperature exhibits the same seasonal behaviour as air temperature, whereas its diurnal signal is virtually non-existent… I hope this will spark some imagination for your own usage (e.g. comparisons of two climate station records or the like).

Note that the third year we see here is a result of the conversion from UTC to local Fiji time (we added 12 hours to UTC and hence end up with the first 12 hours of 1995).

sessionInfo()
## R version 3.0.1 (2013-05-16)
## Platform: x86_64-pc-linux-gnu (64-bit)
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=C                 LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] grid      stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] RWordPress_0.2-3    reshape_0.8.4       plyr_1.8           
## [4] latticeExtra_0.6-24 lattice_0.20-21     RColorBrewer_1.0-5 
## [7] metvurst_1.0        knitr_1.2          
## 
## loaded via a namespace (and not attached):
## [1] digest_0.6.3   evaluate_0.4.3 formatR_0.8    RCurl_1.95-4.1
## [5] stringr_0.6.2  tools_3.0.1    XML_3.98-1.1   XMLRPC_0.3-0

Creating publication quality graphics using R


As part of a one-day workshop, I have developed an online tutorial on how to create publication quality graphics using R (from an academic point of view).

The tutorial can be found here:

http://teachpress.environmentalinformatics-marburg.de/2013/07/creating-publication-quality-graphs-in-r-7/

As mentioned in the tutorial, feel free to send me any feedback, criticism, general comments or bug reports.

Enjoy,

Tim

By the way, the entire tutorial was created using R Markdown and knitr. The .Rmd file can be found here:

https://github.com/tim-salabim/metvurst/blob/master/markdown/20130617_data_vis_workshop.Rmd


metvurst now a package, repository moved to GitHub

Inspired by a post on PirateGrunt, I finally managed to pack metvurst up and turn it into a proper R package (the fact that I'm on holiday and have some time also helped). As a side effect, the repository has moved from Google Code to GitHub. As I use RStudio for developing R code, this shift seemed inevitable, as the integration of git into the package development tools in RStudio is very handy.

In order to install metvurst you need devtools. Using devtools' install_github() function, you can easily install and load metvurst:

library(devtools)
install_github('metvurst', 'tim-salabim')
library(metvurst)

I have tried it on Linux and Mac so far, so in case there are any problems on Windows, please let me know (a quick note if it does indeed work on Windows would be appreciated, too).

For now, the two core functions strip() and windContours(), along with some helper functions (mainly to convert wind speed and direction to u and v components and vice versa), are included.
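For reference, the standard meteorological conversion such helpers implement looks like the following sketch (the actual helper names and argument order in metvurst may differ):

## wind speed/direction to u and v components (meteorological convention:
## wd is the direction the wind comes from, in degrees clockwise from north)
wd2uv <- function(ws, wd) {
  rad <- wd * pi / 180
  u <- -ws * sin(rad)  # zonal component
  v <- -ws * cos(rad)  # meridional component
  cbind(u = u, v = v)
}

## ... and back again
uv2wd <- function(u, v) {
  ws <- sqrt(u^2 + v^2)
  wd <- (atan2(u, v) * 180 / pi + 180) %% 360
  cbind(ws = ws, wd = wd)
}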

The package is fully functional but there is no documentation for now. I will progressively add and update documentation manuals over the next few weeks (maybe months, depending on how busy I am once I return to work).

Have fun using metvurst and in case you have any questions, suggestions or critique don’t hesitate to get in touch.

Cheers

TimSalabim


resizing plot panels to fit data distribution

I am a big fan of lattice/latticeExtra. In fact, nearly all visualisations I have produced so far make use of this great package. The possibilities for customisation are endless, and the amount of flexibility it provides is especially valuable for producing visualisations in batch mode/programmatically.

Today I needed to visualise some precipitation data for a poster presentation of climate observations at Mt. Kilimanjaro. I wanted to show monthly precipitation observations in relation to the long-term mean monthly precipitation, in order to show which months have been particularly wet or dry.
The important point here is that when combining two different visualisations of the same data, we need to make them directly comparable. This means that the scales of the absolute rain amounts and of the deviations need to be similar, so we get an instant impression of the deviations in relation to the absolute amounts.

Here's what I've done with latticeExtra (using mock data):

First, we need some (semi-) random data.

## LOAD PACKAGE
library(latticeExtra, quietly = TRUE)

## CREATE MOCK DATA
# precipitation long term mean
pltmean <- 800
# precipitation long term standard deviation
pltsd <- 200
# precipitation observations
pobs <- rnorm(12, pltmean, pltsd)
# preceipitation deviation from long term mean
pdev <- rnorm(12, 0, 150)
# months
dates <- 1:12

Then we calculate the panel heights to be relative to the (precipitation) data distribution. This is crucial because we want the deviation data to be directly comparable to the observed values.

## CALCULATE RELATIVE PANEL HEIGHTS
y.abs <- max(abs(pobs))
y.dev <- range(pdev)[2] - range(pdev)[1]
yy.aspect <- y.dev/y.abs

Then, we create the bar charts as objects.

## COLOUR
clrs <- rev(brewer.pal(3, "RdBu"))

## CREATE THE PLOT OBJECTS
abs <- barchart(pobs ~ dates, horizontal = FALSE, strip = FALSE, origin = 0,
                between = list(y = 0.3),
                ylab = "Precipitation [mm]", xlab = "Months", col = clrs[1])

dev <- barchart(pdev ~ dates, horizontal = FALSE, origin = 0, 
                col = ifelse(pdev > 0, clrs[1], clrs[length(clrs)]))

Now, we combine the two plot objects into one and also create strips to be plotted at the top of each panel with labels providing some detail about the respective panel.

## COMBINE PLOT OBJECTS INTO ONE AND CREATE CUSTOM STRIPS FOR LABELLING
out <- c(abs, dev, x.same = TRUE, y.same = FALSE, layout = c(1,2))
out <- update(out, scales = list(y = list(rot = 0)), 
              strip = strip.custom(bg = "grey40", 
                                   par.strip.text = list(col = "white", 
                                                         font = 2),
                                   strip.names = FALSE, strip.levels = TRUE, 
                                   factor.levels = c("observed", 
                                                     "deviation from long term monthly mean")))

As a final step, we re-size the panels according to the panel heights calculated earlier.

## RESIZE PANELS RELATIVE TO DATA DISTRIBUTION
out <- resizePanels(out, h = c(1,yy.aspect), w = 1)

And this is what the final product looks like.

## PRINT PLOT
print(out)

[Figure: observed monthly precipitation and deviations, with resized panels]

Note: I suggest you rerun this example a few times to see how the relative panel sizes change with the data distribution (which is randomly created during each run). This highlights the usefulness of such an approach for batch visualisations.

sessionInfo()
## R version 2.15.3 (2013-03-01)
## Platform: x86_64-pc-linux-gnu (64-bit)
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=C                 LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] grid      parallel  stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] gridBase_0.4-6      abind_1.4-0         fields_6.7         
##  [4] spam_0.29-2         reshape_0.8.4       plyr_1.8           
##  [7] latticeExtra_0.6-19 lattice_0.20-13     RColorBrewer_1.0-5 
## [10] RWordPress_0.2-3    rgdal_0.8-5         raster_2.0-41      
## [13] sp_1.0-5            knitr_1.1          
## 
## loaded via a namespace (and not attached):
## [1] digest_0.6.3   evaluate_0.4.3 formatR_0.7    markdown_0.5.4
## [5] RCurl_1.95-3   stringr_0.6.2  tools_2.15.3   XML_3.95-0    
## [9] XMLRPC_0.2-5