codeRclub | bioCEED R coding club

Archive for May 2015

May/15

21

Progress bars and for loops.

Today somebody asked me about building a progress bar into a for loop. This can be really useful if you are running lots of bootstrapping or Monte Carlo simulations and want some peace of mind that the loop is still running while the computer chugs away in the background. It’s good to know whether it’s worth hanging around for the code to finish, whether you’d be better off going climbing or skiing for the weekend, or whether there is just time for a cup of tea.

I’ve written a dummy script here to show how this can be done. Fortunately, the utils package that comes with R provides a text progress bar for the R console through the function txtProgressBar(). I thought it would be fun for the for loop to actually do something, so I made it write out a message (stored in the cdRclb object) in an empty plot window. Here is the script:

# Write a message to plot and make an empty plot window
cdRclb <- c("c", "o", "d", "e", "R", "c", "l", "u", "b")
plot(x= 1:9, y = 1:9, type="n", axes=FALSE, frame.plot = TRUE, ann=FALSE)

So that the progress bar takes at least some time to finish, I decided to run the for loop for 900000 iterations (100000 x the length() of the object cdRclb).

# Set the number of iterations for the loop
nIts <- length(cdRclb) *100000

Then make the progress bar and write the loop. Call txtProgressBar() once, before and outside the loop; everything within the {} is then repeated nIts times. Here, on each iteration we calculate a new value, k, the iteration number divided by 100000.

Still within the for loop, we use an if statement to decide whether to write out part of the message stored in cdRclb. When k is a whole number, we write the kth item of cdRclb using text(). At the end of each iteration we update the progress bar using setTxtProgressBar(), and the loop starts over. Watch what happens in the plot window as the loop progresses, and also check the R console. Fun and extremely satisfying!

# create the progress bar
progBar <- txtProgressBar(min = 0, max = nIts, style = 3)

# Start the for loop
for (i in 1:nIts) {

  # Find k. NB: lots of other functions/commands could go here.
  k <- i / 100000

  # Use an if statement to decide whether to plot part of the message. Only do this when k is a whole number
  if (floor(k) == k) text(x = k, y = 5, labels = cdRclb[k], col = k + 1, cex = 4)

  # Update the progress bar
  setTxtProgressBar(progBar, i)
}

# Close the progress bar
close(progBar)
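The same pattern fits the bootstrapping case mentioned at the start of this post. Here is a minimal sketch with made-up data (dat) and an arbitrary number of resamples (nBoot); both names are only for illustration. It stores the mean of each resample in a pre-allocated vector and, since each pass is quick, only updates the bar every 100 iterations. The test i %% 100 == 0 (remainder zero) is an integer-based way of doing the same job as the floor(k) == k whole-number check in the loop above.

```r
# Made-up sample to resample (illustration only)
set.seed(1)
dat <- rnorm(50, mean = 10)
nBoot <- 2000                    # number of bootstrap resamples (arbitrary)
bootMeans <- numeric(nBoot)      # pre-allocate the results vector

pb <- txtProgressBar(min = 0, max = nBoot, style = 3)
for (i in seq_len(nBoot)) {
  # Resample with replacement and store the mean
  bootMeans[i] <- mean(sample(dat, replace = TRUE))
  # Update the bar every 100 iterations, and always on the last one
  if (i %% 100 == 0 || i == nBoot) setTxtProgressBar(pb, i)
}
close(pb)

# Percentile 95% interval for the mean
quantile(bootMeans, c(0.025, 0.975))
```

The 2.5% and 97.5% quantiles of bootMeans give a simple percentile bootstrap interval for the mean of dat.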

· · ·

In Friday’s codeRclub, we had a problem that involved finding the row and column names of items in a matrix greater than a specified value (e.g. finding the names of the pairs of samples in a correlation matrix with a correlation coefficient greater than 0.5). The problem is that standard subsetting methods give you the locations or values of the cells within the matrix, but not the row or column names. We solved the problem using the arr.ind argument of which(). We wrote a function to do this, returning the row and column names and the correlation coefficients in a data.frame.
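To see why arr.ind matters, here is a small sketch on a made-up 2 x 2 matrix, m. Without arr.ind, which() returns positions counted down the columns; with arr.ind = TRUE, it returns a two-column matrix of row and column indices, which can be used to look up the dimnames and, as a matrix index, the values themselves.

```r
# A made-up matrix with row and column names
m <- matrix(c(0.1, 0.9, 0.6, 0.2), nrow = 2,
            dimnames = list(c("s1", "s2"), c("s3", "s4")))

which(m > 0.5)                       # 2 3: positions counted column by column
idx <- which(m > 0.5, arr.ind = TRUE)
idx                                  # one row per match, columns "row" and "col"

# Use the indices to pick out the names and the values
rownames(m)[idx[, "row"]]            # "s2" "s1"
colnames(m)[idx[, "col"]]            # "s3" "s4"
m[idx]                               # 0.9 0.6
```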

First simulate a correlation matrix and set our correlation cut-off value:

# Simulate a 3x3 correlation matrix, with the sample names as row and column names
x <- matrix(c(1, .8, .2,  .8, 1, .7,  .2, .7, 1), nrow = 3,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

Then make a function, which.names.matrix, to return the row and column names of interest. x is a correlation matrix and cutVal is the correlation cut-off value (0.5 by default).

which.names.matrix <- function(x, cutVal = 0.5){

  # Because it's a correlation matrix, we are only interested in one half of it, so set the lower triangle to NA
  x[lower.tri(x)] <- NA

  # Set the diagonal to NA
  diag(x) <- NA

  # Find the row/column locations of the cells > cutVal (which() ignores the NA cells)
  locs <- which(x > cutVal, arr.ind = TRUE)

  # Get the values of those cells, using locs as a two-column index so they line up with the names
  scores <- x[locs]

  # Return a data.frame with the row and column names, plus the scores
  data.frame(row = rownames(x)[locs[, 1]], col = colnames(x)[locs[, 2]], value = scores)
}
which.names.matrix(x)
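An alternative sketch, which does not need which() at all, reshapes the matrix into long format with as.table() and as.data.frame(), so every cell becomes a row with its row name and column name attached. The upper.tri() filter then plays the role of the lower.tri() and diag() steps in the function above. This is a different technique, not the one we used in the club.

```r
x <- matrix(c(1, .8, .2,  .8, 1, .7,  .2, .7, 1), nrow = 3,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))
cutVal <- 0.5

# One row per cell: Var1 = row name, Var2 = column name, value = the cell
long <- as.data.frame(as.table(x), responseName = "value",
                      stringsAsFactors = FALSE)

# Keep upper-triangle cells above the cut-off. as.vector(upper.tri(x)) runs
# down the columns, in the same order as the rows of long.
keep <- as.vector(upper.tri(x)) & long$value > cutVal
pairs <- long[keep, ]
names(pairs)[1:2] <- c("row", "col")
rownames(pairs) <- NULL
pairs   # two rows: a-b (0.8) and b-c (0.7)
```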

· · ·

May/15

7

Expressions in R

expression() and related functions, including bquote(), are powerful tools for annotating figures with mathematical notation in R. This functionality is not obvious from their help files. demo(plotmath) nicely shows the huge potential of expression(), but does not help much with writing the code needed for many real cases.

I tend to get my expressions to work by trial and lots of errors (although having put this together, I now understand them at least temporarily). I’ve just searched through my code library and extracted and annotated some examples of expression() being used. I hope someone finds it useful.

I’m going to use expression() with title(), but the same expressions can be used with any of the functions for putting text on plots (text(), title(), mtext(), legend(), etc.).
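As a quick sketch of that point (the plot and labels here are made up), the same expression objects can be passed to text(), mtext(), title() and legend(); legend() takes an expression vector with one element per legend entry.

```r
# Store an expression once and reuse it with different annotation functions
plot(1:10, type = "n", xlab = "", ylab = "")
lab <- expression(r^2)
text(5, 5, labels = lab, cex = 2)            # inside the plot region
mtext(lab, side = 3, line = 0.5)             # in the top margin
title(xlab = expression(Delta*"R yr"))       # as an axis label
legend("topleft", legend = expression(beta[1], beta[2]), lty = 1:2)
```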

x11(width=4, height=5, pointsize=14)
par(mar=rep(0,4), cex.main=.8)
plot(1, type="n", axes=FALSE, ann=FALSE)

The simplest use of expression() is to give it a name or a string of characters, which is then added to the plot. If the string contains spaces, it must be enclosed in quotes (alternatively, each space can be replaced by a tilde ~, which probably gives cleaner code).

title(line=-1, main=expression(fish))

This use of expression() is entirely pointless, but it is a useful starting point. Some names have special meanings: for example, infinity draws the infinity symbol. If for some reason you want the word “infinity” written on your plot, it must be in quotes. Greek letters can be used by giving their name in lower case, or with the first letter capitalised, to get the lower- or upper-case letter respectively.

title(line=-2, main=expression(infinity))
title(line=-3, main=expression(pi))
title(line=-4, main=expression(Delta))
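Going back to strings containing spaces, here is a minimal sketch comparing the two options. Writing expression(R coding club) without quotes or tildes is a syntax error, because R cannot parse three names separated only by spaces.

```r
plot.new()
title(line = -1, main = expression("R coding club"))  # quoted string, spaces kept
title(line = -2, main = expression(R~coding~club))    # names joined by ~, each ~ drawn as a space
```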

Superscripts and subscripts can be added using ^ and [] notation respectively.

title(line=-5, main=expression(r^2))
title(line=-6, main=expression(beta[1]))
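The two notations can be combined on the same symbol, or followed by ordinary text. A short sketch (the symbols are arbitrary examples):

```r
plot.new()
title(line = -1, main = expression(sigma[x]^2))       # subscript x, then superscript 2
title(line = -2, main = expression(CO[2]~"(ppm)"))    # subscript followed by ordinary text
```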

If the string we want as a sub- or superscript contains a space, it must be in quotes. Braces {} can be used to force multiple elements to all be superscript.
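A small sketch of both rules. Because ^ binds more tightly than *, e^i*pi only raises i; wrapping i*pi in braces raises the whole product, and the braces themselves are not drawn. The subscript text is a made-up example.

```r
plot.new()
title(line = -1, main = expression(e^i*pi))          # only i is superscript
title(line = -2, main = expression(e^{i*pi}))        # braces make i*pi all superscript
title(line = -3, main = expression(N["high flow"]))  # quoted subscript containing a space
```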

Strings can be separated by mathematical operators.

title(line=-7, main=expression(N[high]-N[low]))
title(line=-8, main=expression(N[2]==5))

To make more complicated expressions, build them up from separate parts, joining them with either * or paste() (if you want an actual multiplication sign, use %*%). The * notation gives nicer code, and ~ can be used wherever a space is needed, as in the commented-out alternatives.

title(line=-9, main=expression(Delta*"R yr"))
title(line=-10, main=expression(paste(Delta,"R yr")))
title(line=-11, main=expression(paste("Two Year Minimum ",O[2])))
#title(line=-11, main=expression(Two~Year~Minimum~O[2]))
title(line=-12, main=expression(paste("Coefficient ", beta[1])))
#title(line=-12, main=expression(Coefficient~beta[1]))
title(line=-13, main=expression(paste("TP ", mu,"g l"^-1)))
#title(line=-13, main=expression(TP~mu*g~l^-1))
title(line=-14, main=expression(paste(delta^18,"O")))
#title(line=-14, main=expression(delta^18*O))
title(line=-15, main=expression(paste("Foram ", exp(H*minute[bc]))))
#title(line=-15, main=expression(Foram~exp(H*minute[bc])))
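To see how the joining operators differ, here is a minimal sketch with placeholder names a and b: * places parts side by side with no gap, ~ leaves a space, %*% draws a multiplication sign, and paste() juxtaposes its arguments in the same way as *.

```r
plot.new()
title(line = -1, main = expression(a*b))                # "ab": no gap
title(line = -2, main = expression(a~b))                # "a b": ~ adds a space
title(line = -3, main = expression(a %*% b))            # a times sign between a and b
title(line = -4, main = expression(paste(a, " ", b)))   # paste() with an explicit space, similar to a~b
```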

To start an expression() with a superscript (or subscript), I use an empty string (you can also use phantom()).

title(line=-16, main= expression(""^14*C*" years BP"))
#title(line=-16, main= expression(phantom()^14*C~years~BP))

So far so good. But sometimes, you want to use the value of an R-object in plot annotation.

For example, if we wanted to label a point with its x value, this will not work: expression() does not evaluate its contents, so the label is drawn literally as x = x.

x<-5
title(line=-17, main= expression(x==x))

Instead of expression(), we have to use bquote(), wrapping the object whose value we want written out in .()

title(line=-18, main= bquote(x==.(x)))
title(line=-19, main= bquote(x==.(x)~mu*g~l^-1))
[Figure: Plot annotations with expression and bquote]
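A more realistic use of bquote() is putting a value calculated in R onto the plot, such as the r² of a regression. This sketch uses simulated data (xx and yy are made-up names) and also deparses expression(x == x) and bquote(x == .(x)) to show the difference between the two.

```r
# Made-up data for a simple regression
set.seed(42)
xx <- 1:20
yy <- 2 * xx + rnorm(20, sd = 5)
fit <- lm(yy ~ xx)
r2 <- summary(fit)$r.squared

plot(xx, yy)
abline(fit)
# .() inserts the evaluated value; ~~ adds some space between the two parts
lab <- bquote(r^2 == .(round(r2, 2)) ~~ n == .(length(xx)))
title(main = lab)

# For comparison: expression() never evaluates, bquote() evaluates inside .()
x <- 5
deparse(expression(x == x)[[1]])   # "x == x"
deparse(bquote(x == .(x)))         # "x == 5"
```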

If you understand these examples, you should be able to use the remainder of the functionality demonstrated by demo(plotmath) and at ?plotmath.

· · ·

May/15

7

Formatting R code in WordPress

To show formatted code within a paragraph, for example a function name, use <code>plot</code> which will appear as plot.

To show a block of code, use something like this, but with square brackets [] rather than braces {}:

{code language="r"}
x<-rnorm(100)
hist(x)
#don't forget comments
{/code}

Setting the language to R lets the WordPress plugin use appropriate syntax highlighting.

When formatted, the code will look like this

x<-rnorm(100)
hist(x)
#don't forget comments

There are more options to set line numbers and highlighting by adding extra parameters.

Tips:

  • Keep the lines of code short – long lines will force the user to scroll.
  • Use the text editor not the visual editor (which may garble your code)
  • Check the code works (it is very tempting to edit it and break it)

