Data Structures in R : Part 1

  • Lalit Salunkhe
  • Jul 01, 2020
  • R Programming

When you work in any programming language, you quickly need collections of data rather than single objects. Many languages, such as C, C++, and Java, require you to declare the data type of a variable before assigning a value to it. R is less demanding.

 

It infers the data type at the moment you assign a value to a variable. I covered data types in my previous article, Data Types in R. In this article, we will walk through collections of data points, which are generally termed data structures.

 

You can think of a data structure as a tool that lets you store a collection of data so that it can be used for various purposes. Some data structures only store data of a single type, while others can store data of different types. In this article and the next one, we are going to discuss the data structures in R.

 

There are roughly six data structures in R that data analysts and data scientists commonly work with. They are listed below:

 

  • Vectors

  • Lists

  • Matrices

  • Data Frames

  • Arrays

  • Factors

In this article, we will discuss vectors, lists, and matrices in detail. In the next article of this series, we will discuss data frames, arrays, and factors.

 

Let’s start this exciting journey and I will make sure you know everything about the data structures throughout.

 

 

Vectors in R Programming

 

The most common data structure in R is a vector. It is a one-dimensional data structure that can only contain data of a homogeneous type. Vectors can be created by passing values as arguments to the combine function “c()”. See the illustration below, where we create a vector using the combine function.


Code Illustration 1


Here, we have created a vector named “x” with four elements: 1, 4, 5, and 6. The c() function combines multiple values into a single vector, as shown in the illustration above.
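
As a minimal sketch of the vector described above (the values 1, 4, 5, and 6 and the name x come from the text; the class() and length() checks are added here only for illustration):

```r
# Create a numeric vector named x with the combine function c()
x <- c(1, 4, 5, 6)

print(x)    # shows the four elements 1 4 5 6
class(x)    # "numeric"
length(x)   # 4
```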

We can also use R's built-in assign() function to create a vector in the R environment.

See the illustration below for a better understanding:

 


Code Illustration 2


Here, the assign() function takes the name of the vector as its first argument, given as a string in double or single quotes, followed by the values to bind to that name.
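
A small sketch of this usage might look like the following. The name y and the values are assumptions chosen for illustration, not taken from the original screenshot:

```r
# assign() takes the variable name as a string, then the value to bind to it
assign("y", c(2, 4, 6, 8))

print(y)    # the vector is now available under the name y

# The usual assignment arrow produces the same kind of object
z <- c(2, 4, 6, 8)
identical(y, z)   # TRUE
```

In everyday code the `<-` form is more common; assign() is mostly useful when the variable name is only known as a string.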

 

As mentioned at the start of this section, vectors are homogeneous data structures. Therefore, all the values we supply when creating a vector should be of the same data type.


Code Illustration 3


Having said that, R will still let you create a vector from values of different data types, but the result may not be what you expect. R silently coerces all the elements to a single common type, so it is better to avoid mixing types in a vector.
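
To make the coercion concrete, here is a short sketch. All variable names and values are illustrative assumptions:

```r
# Homogeneous vectors keep their type
nums  <- c(1, 2, 3)        # numeric
words <- c("a", "b", "c")  # character

# Mixing a number, a string, and a logical coerces everything to character
mixed <- c(1, "a", TRUE)
class(mixed)       # "character"; the elements become "1", "a", "TRUE"

# Mixing numbers and logicals coerces the logicals to numbers
mixed_num <- c(1, TRUE, FALSE)
class(mixed_num)   # "numeric"; the elements become 1, 1, 0
```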

 

 

Lists in R Programming

 

Lists are the most flexible data structures in R. They allow us to store data of multiple types without coercion, so each element keeps its own properties. This means that even if you add data of different types, you can still do calculations on the individual elements without losing their type. In this respect they contrast with vectors, which are designed to hold elements of a single data type.

 

In layman's terms, lists can hold data of multiple types without coercing it.

 

To create a list in R, we use the ‘list()’ function. It lets us combine values of different data types in a single list.

 

See an example below for a better understanding:


Code Illustration 4
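
A minimal sketch of such a list() call could look like this. The name list1 and the element values are assumptions made for illustration, not the contents of the original screenshot:

```r
# A list holding a number, a string, a logical, and another number
list1 <- list(25, "R Programming", TRUE, 3.14)

str(list1)           # compact view of each element and its type

# Each element keeps its own type; nothing is coerced
class(list1[[1]])    # "numeric"
class(list1[[2]])    # "character"
class(list1[[3]])    # "logical"

# Calculations still work on the numeric element
list1[[1]] + 5       # 30
```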


You can also create a list by passing other lists as arguments to list() (a list within a list). Because a list can contain another list, lists are often known as recursive vectors.


Code Illustration 5


In the example above, we created a list named “list2” by passing a series of lists to the list() function. Each element is printed as a list in its own right.
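
A sketch of a nested list along these lines is shown below. The name list2 comes from the text; the inner values are assumptions for illustration:

```r
# A list whose elements are themselves lists
list2 <- list(list(1, 2, 3), list("a", "b"), list(TRUE, FALSE))

print(list2)           # each element is printed as a list

length(list2)          # 3
is.list(list2[[1]])    # TRUE: the first element is itself a list
list2[[2]][[1]]        # "a": first item of the second inner list
is.recursive(list2)    # TRUE
```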

 

We can also store vectors in a list, either by calling the combine function inline or by passing the name of a vector we created earlier. The same caveat applies, though: each vector should contain values of a single data type. Otherwise, the elements of that vector will be coerced.


Code Illustration 6
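
As a hedged sketch of storing vectors in a list (the names v1, list3, and bad, and all values, are illustrative assumptions):

```r
# A vector created earlier and passed by name
v1 <- c(10, 20, 30)

# A list of three vectors; each vector keeps its own type
list3 <- list(v1, c("x", "y"), c(TRUE, FALSE))
sapply(list3, class)   # "numeric" "character" "logical"

# Mixing types inside a single vector still triggers coercion,
# even when that vector is stored in a list
bad <- list(c(1, "two", 3))
class(bad[[1]])        # "character"
```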


So far, we have seen one-dimensional data structures (vectors and lists). The next data structure we are going to look at is two-dimensional, with rows and columns as its two dimensions.

 

 

Matrices in R Programming

 

A matrix in R is a two-dimensional data structure with rows and columns as its dimensions. The thing to note here is that a matrix is homogeneous: all rows and columns must hold data of the same type (numeric, character, logical, complex, or integer). That said, the most commonly used data type when creating a matrix is numeric.

 

R provides the matrix() function to create a matrix. Below is its syntax.

 

matrix(data= , nrow= , ncol= , byrow= , dimnames= )

Where,

 

data - is an input vector of elements that can be used to create a matrix structure.

 

nrow - specifies the number of rows that the resultant matrix should contain.

 

ncol - specifies the number of columns that the resultant matrix should contain.

 

byrow - is a logical argument that specifies whether the matrix should be filled by row. If TRUE, the matrix is filled by row; if FALSE, it is filled by column. The default value for this argument is FALSE.

 

dimnames - this argument allows you to specify names for the rows and columns. In short, it lets you add row and column labels. This is an optional argument, so it is OK if you don’t specify it.
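
To see the arguments above working together, here is a small sketch that focuses on dimnames. The name m and the labels r1, r2, c1, c2, c3 are assumptions chosen for illustration:

```r
# dimnames takes a list of two character vectors:
# row names first, then column names
m <- matrix(data = 1:6, nrow = 2, ncol = 3, byrow = TRUE,
            dimnames = list(c("r1", "r2"), c("c1", "c2", "c3")))

print(m)

# Labels can then be used to pick out elements
m["r2", "c3"]   # 6
```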

 

Let us create a matrix using this function call. See the illustration below for your reference:


Code Illustration 7


Here, we are creating a matrix with three rows and three columns with elements getting filled row-wise.
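
A minimal sketch of such a call, assuming the values 1 to 9 and the name mat1 (neither is confirmed by the original screenshot):

```r
# 3 x 3 matrix filled row by row
mat1 <- matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE)

print(mat1)
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6
# [3,]    7    8    9
```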

 

If you don’t want to fill the matrix by row, you can either set the “byrow” option to FALSE or omit the argument altogether, since the default value of byrow is FALSE.


Code Illustration 8
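
As a sketch of the column-wise fill, again assuming the values 1 to 9 and the illustrative names mat2 and mat3:

```r
# Omitting byrow uses the default (FALSE), so values fill column by column
mat2 <- matrix(1:9, nrow = 3, ncol = 3)

# Setting byrow = FALSE explicitly gives the same matrix
mat3 <- matrix(1:9, nrow = 3, ncol = 3, byrow = FALSE)
identical(mat2, mat3)   # TRUE

print(mat2)
#      [,1] [,2] [,3]
# [1,]    1    4    7
# [2,]    2    5    8
# [3,]    3    6    9
```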


These are the three data structures we have discussed in this article. The next article will focus on the remaining data structures, i.e. data frames, arrays, and factors.

 

Conclusion

 

Data structures are an integral part of any programming language. In the R programming language, we have six data structures, namely vectors, lists, matrices, data frames, arrays, and factors. We have discussed the first three of these, how they can be created, and how they work.

 

Stay tuned for my next article where I will be discussing the data frames, arrays, and factors. Until then, stay safe! 😊
