codeRclub | bioCEED R coding club

Nov/15

30

Importing excel data into R

Importing data is one of those things that is often much more effort that it should be. It is often convenient to enter data into excel, but this then needs to be imported into R. There are several ways to do this.

Probably the worst way to do this is to copy the data from excel and use read.table("clipboard"). It can lose precision (try importing 1/3 when the excel sheet has been set to display only one decimal place) and needs doing manually each time the data are imported. “clipboard” is OK for a quick peek at some data, but not as part of any serious analysis that needs to be repeatable.

I used to recommend that people save their data as a tab-delimited file and then import this with read.table(). The main problem with this is that it is easy to get the excel file and the tab-delimited file out of sync. Every time an edit is made to the excel file, the tab-delimited file needs to be re-created, and any edits that have been accidentally made to the tab-delimited file will be lost. Having two versions of the raw data is simply asking for trouble.

Importing the excel file directly is a much better strategy. There are several functions in R that will do this.

  • The RODBC package will read excel files, but only works with the 32-bit version of R which is a bit tedious.
  • The function read.excel() in the xlsx package. This needs java to be installed.
  • The function read_excel() in the readxl package. This is probably the fastest method.
  • Functions in the openxlsx package.
  • read.xls() in the gdata package, which is reported to be very slow
  • There are other solutions

See the stackoverflow discussion of the merits of different methods.

Most of these methods require that the excel worksheet(s) are clean. For example, only one table of data per sheet, columns are all of the same type (so no comments in an otherwise numeric field).

My current recommendation is to use the readxl package. It will only read data from excel, but this is what you want to do most of the time.

To install this package, run install.packages("readxl"). This only needs to be done once. Then run library(readxl) everytime you start R.

5 comments

  • Amy · 31 December 2015 at 16:47

    One of the pitfalls of using readxl to read mixed data, it seems, is that you can get some very peculiar objects returned! And they look fine in ‘view’, so you only realise if you use ‘str’ or ‘class’ to inspect the data. Definitely be wary if using this for odd data formats, but it’s still better than the whole two-step csv thingy for datasets which may change.

    Reply

    • Author comment by Richard Telford · 6 January 2016 at 07:08

      Can you give an example?

      I’ve had fun with readxl escaping backslashes with a second backslash, so that a column of filenames had four backslashes instead of two. A case of trying to be helpful.

      Reply

      • Amy · 10 March 2016 at 10:29

        Sorry, it’s taken me forever to reply. What seems to happen is that it often imports a complex datastructure consisting of two tables and a dataframe sort of nested in one another:

        str(myexceldata)

        returns:

        Classes ‘tbl_df’, ‘tbl’ and ‘data.frame’

        The object ‘myexceldata’ behaves very strangely once you try to do anything with it (e.g. extract subsets, do sum functions). But the solution is easy:

        myexceldata<-as.data.frame(myexceldata)

        This has solved all the problems and it behaves like a 'proper' data frame again. One extra line, but worth it for an input excel file that gets updated rather frequently.

        Reply

  • 2. How to import data into R – bioST@TS · 13 April 2016 at 23:30

    […] also this useful article posted on the codeRclub blog for more info on how to import Excel data into […]

    Reply

  • Comparing two variables – Chi square test – bioST@TS · 4 July 2016 at 15:57

    […] Now: Let’s import the table into R (more info on strategies to import data from Excel here): […]

    Reply

Leave a Reply

<<

>>

Theme Design by devolux.nh2.me