The R programming language is designed for statistical data analysis. To do any analysis, however, the data must first come from somewhere.
R therefore offers a range of data-import functions and packages built for the task. If you use RStudio extensively, there is even a menu option for importing data. We can import data stored in many formats, such as text, CSV, Excel, SAS, and SPSS files.
In this article, we will see how to import data into R, cover the base functions that make the import possible, and look at the popular readr and readxl packages with plenty of examples. (If you are new to R programming, please visit our article Introduction to R Programming.)
One of the most basic kinds of data file we import into R is text data. Text data files are usually stored with the extension “.txt” or “.csv”, and CSV files are the more popular of the two. We can import text files using either:
Basic functions in R that allow importing text files.
The functions from the readr package that allow importing text files.
Remember, the functions mentioned above can import any delimited text file (i.e. files with an extension such as .txt, .csv, or .tsv).
There are two widely used built-in functions for importing text files in R: read.csv() and read.table(). The read.csv() function is a special case of read.table() whose defaults (a header row and a comma as the delimiter) suit CSV files. Both functions import a text file into the R environment as a data frame, so always keep in mind that the imported file will be a data frame in R.
Let us see an example for the read.csv() function and how it allows us to import data into R.
The read.csv() function to read a text file in R
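Since the original listing is not reproduced here, the following is a minimal sketch of such a call. The file name employees.csv and its columns are made up for illustration, and the file is written to a temporary folder first so that the example runs on its own.

```r
# Create a small CSV file so the example is self-contained.
# "employees.csv" and its contents are hypothetical.
csv_path <- file.path(tempdir(), "employees.csv")
writeLines(c("id,name,salary",
             "1,Asha,52000",
             "2,Ravi,48000",
             "3,Meera,61000"),
           csv_path)

# read.csv() assumes a header row and a comma delimiter by default
employees <- read.csv(csv_path)

class(employees)  # the result is a data frame
str(employees)    # inspect the column names and types
```

If the file already sits in the working directory, passing just the file name, as in read.csv("employees.csv"), is enough.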
As the code above shows, this is quite simple. All you need to supply is the file name with its extension (.csv in this case), enclosed in quotes, to import a text data file into R.
"The point to note here is that the file should be in R's working directory; otherwise, you need to supply the full path to the file."
We can always check the working directory using the getwd() function in R. If you want to set your working directory to a specific path, you can use the setwd() function. You need to specify the path, enclosed in quotes, as an argument to setwd(). See an example below:
The getwd() and the setwd() functions
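A short sketch of the two functions together might look like this. Here tempdir() stands in for a real project folder, and demo.csv is a made-up file name.

```r
# Remember the current working directory so it can be restored later
old_wd <- getwd()
print(old_wd)

# Switch to another folder; tempdir() is a stand-in for a real path
# such as "C:/Users/me/Documents/data" (hypothetical)
setwd(tempdir())
print(getwd())

# Files in the working directory can now be referenced by name alone
writeLines(c("x,y", "1,2"), "demo.csv")
demo <- read.csv("demo.csv")

# Go back to where we started
setwd(old_wd)
```

Restoring the original directory at the end is a matter of habit rather than a requirement, but it keeps later relative paths from breaking.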
Now, let us see how the read.table() function works to import a text file in R. The read.table() function is not quite as simple as read.csv(), because its defaults are different: it assumes the fields are separated by whitespace and does not assume a header row. If you use read.table() on a CSV file in the same way as read.csv(), you will often end up with an error (or with wrongly parsed data), as shown below:
The error we get if we use the filename as an argument under the read.table() function
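The exact file behind the original screenshot is not known, so the sketch below uses a made-up cities.csv whose values contain spaces. With the default settings, read.table() splits each line on whitespace, finds a different number of fields per line, and stops with an error, which we capture with tryCatch() so the script keeps running.

```r
# A hypothetical CSV file whose values contain spaces
csv_path <- file.path(tempdir(), "cities.csv")
writeLines(c("id,name,city",
             "1,Asha Rao,New Delhi",
             "2,Ravi,Pune"),
           csv_path)

# Only the file name is given, so read.table() splits on whitespace
# instead of commas and the lines no longer line up into columns
result <- tryCatch(read.table(csv_path), error = function(e) e)

# Show the error message if the import failed
if (inherits(result, "error")) {
  message("read.table() failed: ", conditionMessage(result))
}
```

Note that a CSV file with no spaces in it would not raise an error here; it would instead be read as a single column, which is just as unhelpful.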
To read the file correctly, this function needs a few more arguments. For example, we should specify whether the text file has a header row, using the header argument. For a comma-separated file we also need to set the delimiter through the sep argument, because read.table() splits on whitespace by default. Let’s see the example below:
The example code for read.table() function in R
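As an illustration, here is how the call could look for a small comma-separated file. The file cities.csv is again a made-up example.

```r
# A hypothetical CSV file with a header row
csv_path <- file.path(tempdir(), "cities.csv")
writeLines(c("id,name,city",
             "1,Asha Rao,New Delhi",
             "2,Ravi,Pune"),
           csv_path)

# header = TRUE : the first line holds the column names
# sep = ","     : the fields are separated by commas
cities <- read.table(csv_path, header = TRUE, sep = ",")

str(cities)
```

For a tab-separated file, the same call would use sep = "\t" instead.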
There is also an article about data types in R programming, which can be accessed through the link Data Types in R Programming.
The readr package in R offers counterparts to the base functions, including read_csv() for comma-separated files and read_table() for whitespace-separated files. The readr functions are popular because they are often much faster than the base R functions (speed-ups of up to around 10 times are commonly cited), which matters most when loading large files.
The read_csv() function from the readr package works much like read.csv(), except that it returns a tibble, readr's flavour of data frame. Besides reading and importing the text file, read_csv() also gives us some advanced options. For example, we can set the column types at the time of import, or specify the column names ourselves. (To learn how to install and load a package in R, read our earlier article Packages in R Programming.)
Let us see an example where we use the read_csv() function in combination with the col_types argument to set the column types manually.
The read_csv() function with col_types argument
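A minimal sketch of this, assuming the readr package is installed, could look as follows. The file and its columns are invented for the example.

```r
library(readr)

# A hypothetical CSV file written to a temporary folder
csv_path <- file.path(tempdir(), "employees.csv")
writeLines(c("id,name,salary,joined",
             "1,Asha,52000,2019-04-01",
             "2,Ravi,48000,2020-11-15"),
           csv_path)

# Declare the type of every column instead of letting readr guess
employees <- read_csv(
  csv_path,
  col_types = cols(
    id     = col_integer(),
    name   = col_character(),
    salary = col_double(),
    joined = col_date(format = "%Y-%m-%d")
  )
)

str(employees)
```

Declaring the types up front also means that values that do not fit a declared type are reported as parsing problems instead of silently changing the column type.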
The interesting thing here is that we control the column types ourselves at the time of import. We can also set the column names in the read_csv() function. See an example below:
Setting column names in read_csv() function
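One way this might look, assuming a file that has no header row (the file name and column names are hypothetical):

```r
library(readr)

# A hypothetical CSV file WITHOUT a header row
csv_path <- file.path(tempdir(), "no_header.csv")
writeLines(c("1,Asha,52000",
             "2,Ravi,48000"),
           csv_path)

# col_names supplies the column names; the first line is read as data
staff <- read_csv(csv_path,
                  col_names = c("emp_id", "emp_name", "salary"))

names(staff)
```

If the file does have a header row that you want to replace, adding skip = 1 to the call drops that first line before the new names are applied.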
The functions from the readr package do the same job as the base R functions. The main differences are the speed of execution and some advanced features that the readr functions provide. There is also a package called “data.table” that contains a function named fread(). This function is often faster than both read.csv() and read_csv().
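For completeness, here is a small sketch of fread(), assuming the data.table package is installed. The file is again a made-up example.

```r
library(data.table)

# A hypothetical CSV file written to a temporary folder
csv_path <- file.path(tempdir(), "employees.csv")
writeLines(c("id,name,salary",
             "1,Asha,52000",
             "2,Ravi,48000",
             "3,Meera,61000"),
           csv_path)

# fread() detects the delimiter and the header row on its own
employees_dt <- fread(csv_path)

class(employees_dt)  # a data.table, which is also a data frame
```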
To import Excel files into R, we have different packages such as gdata, RODBC, XLConnect, RExcel, etc. However, readxl is the most popular among users, and it covers the basic needs of a data analyst when it comes to importing Excel files.
The two main functions in the readxl package are read_excel() and excel_sheets(). The first one imports a sheet from an Excel file into R, whereas the second one lists the names of the sheets present in the Excel workbook.
"The benefit of the readxl package is that it can import both the older “.xls” files and the newer, XML-based “.xlsx” files. Moreover, its functions load date/time data in the POSIXct format."
Let us see how the excel_sheets() function works to read the sheet names from a given Excel workbook.
excel_sheets() function to read the excel sheets and return their names as an output
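The Locations workbook from the article is not available here, so the sketch below uses datasets.xlsx, an example workbook that ships with the readxl package, as a stand-in. It assumes readxl is installed.

```r
library(readxl)

# readxl_example() returns the path to a sample workbook bundled with
# the package; it stands in for your own file here
xlsx_path <- readxl_example("datasets.xlsx")

# excel_sheets() returns a character vector of sheet names
sheet_names <- excel_sheets(xlsx_path)

sheet_names
length(sheet_names)  # number of sheets in the workbook
```

With your own workbook in the working directory, you would pass its file name, extension included, in place of xlsx_path.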
As we can see, the function has accessed the Excel file named “Locations” from the working directory and read the tabs (sheets) it contains. The names of the sheets are then returned as the output.
Let us now see how the read_excel() function works to read a specific sheet from the excel file. See an example as shown below:
read_excel() function to read the first sheet from the excel file named Locations
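A comparable sketch, again using the sample workbook bundled with readxl in place of the Locations file, might be:

```r
library(readxl)

# Sample workbook shipped with readxl, used as a stand-in
xlsx_path <- readxl_example("datasets.xlsx")

# No sheet given: the first sheet is imported by default
first_sheet <- read_excel(xlsx_path)

# The same sheet, selected explicitly by position
same_sheet <- read_excel(xlsx_path, sheet = 1)

# A sheet can also be selected by name; here we take the second name
# reported by excel_sheets()
second_name  <- excel_sheets(xlsx_path)[2]
second_sheet <- read_excel(xlsx_path, sheet = second_name)

head(first_sheet)
```

Selecting by name is usually the safer choice, since it keeps working if someone reorders the tabs in the workbook.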
As the code above shows, the workbook name is the one mandatory argument. The “sheet =” argument specifies which sheet of the workbook to import, either by name or by position. If the “sheet =” argument is not specified, the first sheet of the workbook is imported by default.
We can summarize the article with the following points:
Importing data into R is a core task for data analysts.
Usually, the data we import is either in a text format (.txt, .csv, etc.) or in Excel format. Having said that, R can import data from other sources as well, such as SAS and SPSS.
The basic functions that we use to import text data files into R are read.csv() and read.table(). read.csv() is a special case of read.table().
The read.table() function splits on whitespace by default, so the “sep = ” argument must be set for files that use another delimiter, such as a comma. This allows the function to separate the data into columns correctly.
The readr package provides similar functions to read text files, namely read_csv() and read_table().
To import Excel files in R, we have multiple options with packages such as gdata, RODBC, XLConnect, RExcel. However, the most convenient is readxl.
This package allows you to import files with both old extensions “.xls” as well as the new files with the “.xlsx” format.
The readxl package contains two important functions namely excel_sheets() and read_excel().
The excel_sheets() function will return a vector with sheet names that are present in the given workbook.
The read_excel() function imports a specified sheet from the given workbook. The sheet can be specified through the “sheet = ” argument. If it is not specified, the first sheet is imported by default.