codeRclub | bioCEED R coding club

TAG | excel files

Nov/15

30

Importing excel data into R

Importing data is one of those things that is often much more effort that it should be. It is often convenient to enter data into excel, but this then needs to be imported into R. There are several ways to do this.

Probably the worst way to do this is to copy the data from excel and use read.table("clipboard"). It can lose precision (try importing 1/3 when the excel sheet has been set to display only one decimal place) and needs doing manually each time the data are imported. “clipboard” is OK for a quick peek at some data, but not as part of any serious analysis that needs to be repeatable.

I used to recommend that people save their data as a tab-delimited file and then import this with read.table(). The main problem with this is that it is easy to get the excel file and the tab-delimited file out of sync. Every time an edit is made to the excel file, the tab-delimited file needs to be re-created, and any edits that have been accidentally made to the tab-delimited file will be lost. Having two versions of the raw data is simply asking for trouble.

Importing the excel file directly is a much better strategy. There are several functions in R that will do this.

  • The RODBC package will read excel files, but only works with the 32-bit version of R which is a bit tedious.
  • The function read.excel() in the xlsx package. This needs java to be installed.
  • The function read_excel() in the readxl package. This is probably the fastest method.
  • Functions in the openxlsx package.
  • read.xls() in the gdata package, which is reported to be very slow
  • There are other solutions

See the stackoverflow discussion of the merits of different methods.

Most of these methods require that the excel worksheet(s) are clean. For example, only one table of data per sheet, columns are all of the same type (so no comments in an otherwise numeric field).

My current recommendation is to use the readxl package. It will only read data from excel, but this is what you want to do most of the time.

To install this package, run install.packages("readxl"). This only needs to be done once. Then run library(readxl) everytime you start R.

Theme Design by devolux.nh2.me