codeRclub | bioCEED R coding club

CAT | graphics

ggplot2 is a very powerful plotting package available in R, but sometimes you just want more: maybe you want to want to make your plots more accessible to colour-blind audiences. Or maybe you just don’t like the included themes. Or maybe you just want more colour in your life like some of the students coming to the R-club and wondering how you can make pink plots.

Before we start, a good tip is to look at the syntax for the built-in themes such as theme_bw, theme_classic or just our good friend theme_grey (just type the name into the R console and press return). Another good tip is to use the help files, they can tell you more about all the possibilities of each theme option – see ?theme.

Some other things that are important to know to understand this guide:

  1. All the highlighted lines in this guide is actually read as one command into R. Still I would recommend you to copy this code into your editor of choice.
  2. You can use names of colours (see color()) rather than the hex codes (for example “#e600e6” – the first two digits indicate how much red is needed in hexidecimal so ff is full intensity red, the next two digits are for blue, and then green) to change the colours in a theme or when you are plotting, I have just used the hex codes here to get some more control over the different shades of pink.

And with that in mind let’s make a pink theme!

#Importing library

#Making a pink theme by modifying theme_bw 
theme_pink <- theme_bw()+                    
#Setting all text to size 20 and colour to purple/pink                                  
theme(text=element_text(size=20, colour="#b300b3"),    
#Changing the colour of the axis text to pink  
axis.text = element_text(colour="#e600e6"),
#..and the axis ticks..        
axis.ticks = element_line(colour="#B404AE"),  
#..and the panel border to purple/pink       
panel.border = element_rect(colour="#B404AE"),  
#Changing the colour of the major..
panel.grid.major = element_line(colour="#ffe6ff"), 
#..and minor grid lines to light pink     
panel.grid.minor = element_line((colour="#ffe6ff")),    
#Setting legend title to size 16..
legend.title = element_text(size=16),   
#..and the legend text to size 14                
legend.text = element_text(size=14),  
#Centering plot title text and making it bold                     
plot.title = element_text(hjust = 0.5, face="bold"),  
#Centering subtitle                  
plot.subtitle = element_text(hjust = 0.5))              

The next step is optional and that is to specify which colours we want to use for plotting.

#Pastel colours for each continent
melting_puppies <-c("#93DFB8","#FFC8BA","#E3AAD6",

And now lets use our new theme to make an actual plot using the dataset from the gapminder package.

#Importing library with dataset

#Subset with data from year 2007
year.2007.df <- subset(gapminder, year == "2007")

#Making plot

#Making a plot with the data from 2007, with gdpPercap on the x-axis..
#..and lifeExp on the y-axis, and using colours to identify them by..
#..continent, and want the point size scaled according to pop
pink.plot <- ggplot(year.2007.df, aes(x=gdpPercap, y=lifeExp, colour=continent, size = pop)) +              
#Telling R that that we want a scatterplot with semi-transparent points
geom_point(alpha = (5/8)) +              
#Telling R to use the pink theme we just made                                                     
theme_pink + 
#Changing the colours of the points into the colours saved in.. 
#..melting_puppies, and that I want the title of the points in legend.. be Continent
scale_colour_manual(values=melting_puppies, name="Continent") + 
#Scaling the size of the points so they fit within the range of..
#..2 to 22 and removing description of point size from legend
scale_size_continuous(range=c(2,22), guide = FALSE) + 
#Changing the labels of the x- and y-axis
labs(x="GDP per capita", y="Life expectancy") +
#Even though you have made a theme you can still edit it:
#Here we are changing the plot margins..
#..and that we want to change the position of the legend so it fits..
#..lower right corner of the plot
theme(plot.margin = unit(c(1,1,0.5,0.5), "cm"), legend.position = c(1, 0), legend.justification = c(1.05, -0.05)) +
#Adding a title and a subtitle
ggtitle(label = "Life expectancy in 2007", subtitle = "(size of circles indicate population size)") + 
#Increase the points size in the legend to 4, and making them opaque
guides(colour = guide_legend(override.aes = list(size=4, alpha = 1)))

#Looking at the plot

#Saving plot

This is what your example plot now should look like:

Congratulation, you now know the basics of how to make your own ggplot2 themes!

· ·

When data are imported into R it generally arrives as a data.frame, a type of object that contains named columns. We often want to access the contents of each column which can be done with the dollar or square-bracket notation.

attach() is used by some R-users to make the columns of a data.frame directly accessible rather than using dollar notation or data arguments.
Unfortunately although attach() saves a little typing and maybe makes the very first R experience a tiny bit easier, there is a large cost which I explore below.

The data.frame beaver1 holds body temperatures of a beaver (it is a found in the datasets package which is installed by default). These are the first few rows.

day time temp activ
346 840 36.33 0
346 850 36.34 0
346 900 36.35 0
346 910 36.42 0
346 920 36.55 0
346 930 36.69 0

If we attach() beaver1, we can access any of the columns of the data.frame directly.

##      0%     25%     50%     75%    100% 
  ## 36.3300 36.7600 36.8700 36.9575 37.5300

Boxplot of body temperature against activity

Simple, what could possibly go wrong.

Let’s attach second beaver dataset which has the same column names as the first.

  ## The following objects are masked from beaver1:
  ##     activ, day, temp, time

That generated a load of warnings that various objects from beaver1 were masked. Now if we run quantile or some other function, we get the data from beaver2

##      0%     25%     50%     75%    100% 
  ## 36.5800 37.1475 37.7350 37.9850 38.3500

If we want to use beaver1 again, we have to detach() beaver2 first.

##      0%     25%     50%     75%    100% 
  ## 36.3300 36.7600 36.8700 36.9575 37.5300

Attaching and detaching data.frames is obviously going to go horribly wrong very quickly unless you are very careful. Always.

One alternative would be to change the column names on one or both of the data.frames so that all column names are unique.

colnames(beaver1) <- paste("beaver1", colnames(beaver1), sep = "_")
colnames(beaver2) <- paste("beaver2", colnames(beaver2), sep = "_")
##      0%     25%     50%     75%    100% 
  ## 36.3300 36.7600 36.8700 36.9575 37.5300
##      0%     25%     50%     75%    100% 
  ## 36.5800 37.1475 37.7350 37.9850 38.3500
rm(beaver1, beaver2)#clean up

Possible, but hardly elegant and this now as much typing as the dollar notation shown below.

Another solution would be to combine the two datasets. We need to add a column identifying which beaver is which.

beavers <- rbind(cbind(id = "beaver1", beaver1), cbind(id = "beaver2", beaver2))
quantile(temp[id == "beaver1"])
##      0%     25%     50%     75%    100% 
  ## 36.3300 36.7600 36.8700 36.9575 37.5300
quantile(temp[id == "beaver2"])
##      0%     25%     50%     75%    100% 
  ## 36.5800 37.1475 37.7350 37.9850 38.3500

Having combined the data.frames, we now need to subset the objects to get the data for each beaver. This isn’t a simple solution (in some cases it will be horribly complicated), but combining the data.frames might be very useful for some analyses (for example if you wanted to run an ANOVA).

attach will also cause problems if there are objects with the same name as one of the columns of the data.frame.

temp <- 7 
  ##   0%  25%  50%  75% 100% 
  ##    7    7    7    7    7

If we attach the data.frames when object temp already exists, we get a warning, if we make temp afterwards, no warning is given and this object masks the column in the data.frame. Obviously, this could cause some nasty bugs. Good luck with them.

It is of course possible to avoid these problems with masking if we are very careful with naming objects. In practice, it is very easy to make mistakes which then cause difficult to interpret bugs.

Even if we manage to avoid errors when using attach() it makes the code difficult to read as it is not obvious which data.frame each object is coming from.

Avoiding attach()

Fortunately there are alternatives to attach().

Referencing the columns in the data.frame

We can reference the columns in the data.frame by using either dollar notation or square bracket notation. Dollar notation is generally neater looking and needs less typing.

##      0%     25%     50%     75%    100% 
  ## 36.3300 36.7600 36.8700 36.9575 37.5300
##      0%     25%     50%     75%    100% 
  ## 36.5800 37.1475 37.7350 37.9850 38.3500
plot(beaver2$time %/% 100 + 24*(beaver2$day - beaver2$day[1]) + (beaver2$time %% 100)/60, 
                            y = beaver2$temp, col = ifelse(beaver2$activ, 2, 1), xlab = "Time")
# %/% is integer division: 5%/%2 = 2
# %% gives the modulus: 5%%2 = 1

Beaver body temperature against time

When using the dollar notation, if there are spaces in the column names (a bad idea), the column name needs to be quoted.

with() and within()

If the command you are using makes many references to a data.frame, it, the code can become rather messy, as in the previous example where the day and time columns are combined to give hours.

In such cases the with() function can be useful. It is equivalent to attaching the data.frame for this block of code only (without problems with masking).

with(beaver1, {
    hours <- time %/% 100 + 24*(day - day[1]) + (time %% 100)/60
    plot(hours, temp, col = ifelse(activ, 2, 1))

with() can make code much easier to read.

within() is useful if we want to modify the contents of a data.frame. Here, I’m using the lubridate package to make the date and time of each measurement.

beavers <- within(beavers, {
  day <- dmy("01-01-1990") + days(x = day-1) + hours(time %/% 100) + minutes(time %% 100)
  rm(time)# remove unwanted column
  ##        id                 day  temp activ
  ## 1 beaver1 1990-12-12 08:40:00 36.33     0
  ## 2 beaver1 1990-12-12 08:50:00 36.34     0
  ## 3 beaver1 1990-12-12 09:00:00 36.35     0
  ## 4 beaver1 1990-12-12 09:10:00 36.42     0
  ## 5 beaver1 1990-12-12 09:20:00 36.55     0
  ## 6 beaver1 1990-12-12 09:30:00 36.69     0

The data argument

Any function that takes a formula argument (think y ~ x) has a data argument that can be given a data.frame.

mod <- gam(temp ~ s(unclass(day)), data  = beavers, subset = id == "beaver1")
#converting the date to seconds as gam doesn't like dates
plot(temp ~ day, data = beavers, subset = id == "beaver1", col = ifelse(activ, 2, 1), xlab = "Hours")
with(beavers, lines(day[id == "beaver1"], fitted(mod), col = 2))

Beaver body temperature against time

This is much better than attaching data as it makes it explicit which data are being used.

Do not be tempted to use the dollar notation in formula

## don't do this

This makes the code difficult to read, especially if there are multiple predictors, and will cause problems when making predictions.


ggplot(), an alternative system for plotting data from the ggplot2 package, wants to have the data in a data.frame. It does not need or want the data to be attached.

ggplot(data =  beavers, mapping = aes(x = day, y = temp)) + 
  geom_point() +
  geom_smooth(method = "gam", formula = y ~ s(x)) +
  facet_wrap(~id, dir = "v", scales = "free_x") + 

Beaver body temperature against time

Does attach() have any uses

attach() is not completely useless: attach() can also attach environments. This allows some nifty tricks for having utility function etc. available without cluttering up your global environment (but it would probably be better to make a package – this is not hard).

· · · ·



Expressions in R

expression() and related functions including bquote() are powerful tools for annotating figures with mathematical notation in R. This functionality is not obvious from their respective help files. demo(plotmath) nicely shows the huge potential of expression(), but does not help that much with getting the code need for many real cases.

I tend to get my expressions to work by trial and lots of errors (although having put this together, I now understand them at least temporarily). I’ve just searched through my code library and extracted and annotated some examples of expression() being used. I hope someone finds it useful.

I’m going to use expression() with title(), but the same expressions can be used with any of the functions (text(), title(), mtext(), legend(), etc) used for putting text on plots.

x11(width=4, height=5, point=14)
par(mar=rep(0,4), cex.main=.8)
plot(1, type="n", axes=FALSE, ann=FALSE)

The simplest use of expression is take a character or string of characters and it will be added to the plot. If the string contains spaces, it must be enclosed in quotes (alternatively, the space can be replaced by a tilde ~, which probably gives better code).

title(line=-1, main=expression(fish))

This use of expression is entirely pointless, but is a useful starting point. Some strings have special meanings, for example infinity will draw the infinity symbol. If for some reason you want to have “infinity” written on your plot, it must be in quotes. Greek letters can be used by giving their name in lower-case or with the first letter capitalised to get the lower or upper case character respectively.

title(line=-2, main=expression(infinity))
title(line=-3, main=expression(pi))
title(line=-4, main=expression(Delta))

Subscript or superscript can be added to a string using ^ and [] notation respectively.

title(line=-5, main=expression(r^2))
title(line=-6, main=expression(beta[1]))

If the string we want to have as sub- or superscript contains a space, the string must be in quotes. Braces can be used to force multiple elements to all be superscript.

Strings can be separated by mathematical operators.

title(line=-7, main=expression(N[high]-N[low]))
title(line=-8, main=expression(N[2]==5))

To make more complicated expressions, build them up from separate parts by either using * or paste to join them together (if you want a multiplication symbol, use %*%). The * notation gives nicer code.

title(line=-9, main=expression(Delta*"R yr"))
title(line=-10, main=expression(paste(Delta,"R yr")))
title(line=-11, main=expression(paste("Two Year Minimum ",O[2])))
#title(line=-11, main=expression(Two~Year~Minimum~O[2]))
title(line=-12, main=expression(paste("Coefficient ", beta[1])))
#title(line=-12, main=expression(Coefficient~beta[1]))
title(line=-13, main=expression(paste("TP ", mu,"g l"^-1)))
#title(line=-13, main=expression(TP~mu*g~l^-1))
title(line=-14, main=expression(paste(delta^18,"O")))
#title(line=-14, main=expression(delta^18*O))
title(line=-15, main=expression(paste("Foram ", exp(H*minute[bc]))))
#title(line=-15, main=expression(Foram~exp(H*minute[bc])))

To start an expression() with a superscript (or subscript), I use an empty string (you can also use phantom()).

title(line=-16, main= expression(""^14*C*" years BP"))
#title(line=-16, main= expression(phantom()^14*C~years~BP))

So far so good. But sometimes, you want to use the value of an R-object in plot annotation.

For example, if we wanted to label a point with its x value, this will not work.

title(line=-17, main= expression(x==x))

Instead of using expression(), we have to use bquote(), with the object we want written out inside .()

title(line=-18, main= bquote(x==.(x)))
title(line=-19, main= bquote(x==.(x)~mu*g~l^-1))
 Plot annotations with expression and bquote

Plot annotations with expression and bquote

If you understand these examples, you should be able to use the remainder of the functionality demonstrated by demo(plotmath) and at ?plotmath.

· · ·

Theme Design by