codeRclub | bioCEED R coding club


In Friday’s codeRclub, we had a problem that involved finding the row and column names for items in a matrix greater than a specified value (e.g. finding the names of the pairs of samples in a correlation matrix with a correlation coefficient greater than 0.5). The problem is that standard subsetting methods give you the locations or values of the cells within the matrix, but not the row or column names. We solved the problem using the arr.ind argument of the which command in R, and wrote a function that returns the row and column names and the correlation coefficients in a data.frame.
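To illustrate the difference, here is a minimal sketch on a small made-up matrix (m and its dimnames are hypothetical, not the data from the club session). Plain which returns positions in the flattened matrix, while arr.ind = TRUE returns a row and column index for each hit, which can then be used to look up the names.

```r
# Hypothetical 2x2 matrix, for illustration only
m <- matrix(c(0.1, 0.9, 0.6, 0.3), nrow = 2,
            dimnames = list(c("r1", "r2"), c("c1", "c2")))

# Logical subsetting returns the values only, with no names attached
vals <- m[m > 0.5]

# which() alone gives positions in the column-major vector
pos <- which(m > 0.5)

# arr.ind = TRUE gives one row per hit, with columns "row" and "col"
idx <- which(m > 0.5, arr.ind = TRUE)

# The indices can then be mapped back to the dimnames
hit_rows <- rownames(m)[idx[, "row"]]
hit_cols <- colnames(m)[idx[, "col"]]
```

Here hit_rows and hit_cols line up element by element, so each pair names one cell that passed the threshold.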

First, simulate a correlation matrix. The cut-off value is supplied later through the cutVal argument of the function (default 0.5):

# Simulate the 3x3 matrix and give its rows and columns the names of the samples
x <- matrix(c(1, .8, .2,  .8, 1, .7,  .2, .7, 1), nrow = 3,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

Then make a function, which.names.matrix, to return the row and column names of interest. Here x is a correlation matrix and cutVal is the correlation cut-off value.

which.names.matrix <- function(x, cutVal = 0.5) {
  # A correlation matrix is symmetric, so only one half is of interest:
  # set the lower triangle to NA
  x[lower.tri(x)] <- NA
  # Set the diagonal (self-correlations) to NA
  diag(x) <- NA
  # Find the row/column indices of the cells in the matrix greater than cutVal
  locs <- which(x > cutVal, arr.ind = TRUE)
  # Get the scores of those cells, in the same order as locs
  scores <- x[locs]
  # Return the data.frame with the row and column names, plus the scores
  data.frame(row = rownames(x)[locs[, 1]], col = colnames(x)[locs[, 2]], value = scores)
}
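As a usage illustration, the sketch below applies the same idea to the example matrix. It is not the function written at the club: which.names.upper is a hypothetical, more compact variant that masks with upper.tri() instead of overwriting cells with NA, and it assumes x has both row and column names.

```r
# Same 3x3 example matrix as in the post
x <- matrix(c(1, .8, .2,  .8, 1, .7,  .2, .7, 1), nrow = 3,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

# Hypothetical compact variant (name chosen for this example only)
which.names.upper <- function(x, cutVal = 0.5) {
  # upper.tri() is TRUE strictly above the diagonal, so the diagonal and
  # the lower triangle are excluded in one step
  locs <- which(x > cutVal & upper.tri(x), arr.ind = TRUE)
  data.frame(row = rownames(x)[locs[, 1]],
             col = colnames(x)[locs[, 2]],
             value = x[locs])
}

res <- which.names.upper(x, cutVal = 0.5)
print(res)
```

For this matrix the result has two rows: the pair a and b with a value of 0.8, and the pair b and c with a value of 0.7. The pair a and c (0.2) falls below the cut-off and is dropped.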

