Importing excel data into R

Importing data is one of those things that is often much more effort that it should be. It is often convenient to enter data into excel, but this then needs to be imported into R. There are several ways to do this.

Probably the worst way to do this is to copy the data from excel and use read.table("clipboard"). It can lose precision (try importing 1/3 when the excel sheet has been set to display only one decimal place) and needs doing manually each time the data are imported. “clipboard” is OK for a quick peek at some data, but not as part of any serious analysis that needs to be repeatable.

I used to recommend that people save their data as a tab-delimited file and then import this with read.table(). The main problem with this is that it is easy to get the excel file and the tab-delimited file out of sync. Every time an edit is made to the excel file, the tab-delimited file needs to be re-created, and any edits that have been accidentally made to the tab-delimited file will be lost. Having two versions of the raw data is simply asking for trouble.

Importing the excel file directly is a much better strategy. There are several functions in R that will do this.

  • The RODBC package will read excel files, but only works with the 32-bit version of R which is a bit tedious.
  • The function read.excel() in the xlsx package. This needs java to be installed.
  • The function read_excel() in the readxl package. This is probably the fastest method.
  • Functions in the openxlsx package.
  • read.xls() in the gdata package, which is reported to be very slow
  • There are other solutions

See the stackoverflow discussion of the merits of different methods.

Most of these methods require that the excel worksheet(s) are clean. For example, only one table of data per sheet, columns are all of the same type (so no comments in an otherwise numeric field).

My current recommendation is to use the readxl package. It will only read data from excel, but this is what you want to do most of the time.

To install this package, run install.packages("readxl"). This only needs to be done once. Then run library(readxl) everytime you start R.

About Richard Telford

Institutt for biologi

6 Replies to “Importing excel data into R”

  1. One of the pitfalls of using readxl to read mixed data, it seems, is that you can get some very peculiar objects returned! And they look fine in ‘view’, so you only realise if you use ‘str’ or ‘class’ to inspect the data. Definitely be wary if using this for odd data formats, but it’s still better than the whole two-step csv thingy for datasets which may change.

    • Can you give an example?

      I’ve had fun with readxl escaping backslashes with a second backslash, so that a column of filenames had four backslashes instead of two. A case of trying to be helpful.

      • Sorry, it’s taken me forever to reply. What seems to happen is that it often imports a complex datastructure consisting of two tables and a dataframe sort of nested in one another:

        str(myexceldata)

        returns:

        Classes ‘tbl_df’, ‘tbl’ and ‘data.frame’

        The object ‘myexceldata’ behaves very strangely once you try to do anything with it (e.g. extract subsets, do sum functions). But the solution is easy:

        myexceldata<-as.data.frame(myexceldata)

        This has solved all the problems and it behaves like a 'proper' data frame again. One extra line, but worth it for an input excel file that gets updated rather frequently.

  2. Pingback: 2. How to import data into R – bioST@TS

  3. Pingback: Comparing two variables – Chi square test – bioST@TS

  4. Pingback: 2. How to import data into R -no – bioST@TS

Leave a Reply

Your email address will not be published. Required fields are marked *

*

This site uses Akismet to reduce spam. Learn how your comment data is processed.