Introduction to R Programming: Part 2
In my previous article, I introduced R programming and covered some basics: what R is, how R can be used as a calculator, how to assign a value to a variable, and the data types in R.
This article is all about the different data structures available in R. I will try to discuss them in detail, with examples, so that you can get some hands-on practice.
Data Structures
If you have ever worked with programming languages such as C, C++, or Java, you may be well aware that a variable must be declared with a data type before a value can be stored in it. Only then can you assign a value to that variable. In R, however, you don’t need to declare a variable’s data type before assigning a value to it. R detects the data type of the value at the time of the assignment itself.
Every time we create a variable, it occupies some space in computer memory. When it comes to large data, which typically contains thousands of values (often of different data types), it makes no sense to store each value in a separate variable. This is where the concept of a data structure comes in.
Data structures can be considered tools in R that allow you to store multiple values, and in some cases values of multiple types, under a single name. They are an efficient and organized way of storing data so that it can be operated on and manipulated easily. There are six widely used data structures in the R programming language, listed below:
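As a small sketch of this behaviour, the snippet below reuses one variable for three different kinds of values and asks R which type it detected each time. The variable names are my own and are only there for illustration.

```r
# The same name can hold values of different types over time;
# class() reports the type R detected at each assignment.
value <- 42
class_numeric <- class(value)     # a number

value <- "forty-two"
class_character <- class(value)   # a string

value <- TRUE
class_logical <- class(value)     # a logical value

print(c(class_numeric, class_character, class_logical))
```

No declaration was needed at any point; the type simply follows the value that was assigned last.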
- Vectors
- Lists
- Matrices
- Data Frames
- Arrays
- Factors
We will discuss these data structures one by one and in depth, starting in this article with vectors, lists, and matrices.
Vectors in R
A vector is the most basic and commonly used data structure in R. It consists of elements that all have the same data type, and you can think of it as a one-dimensional array. As I discussed in the previous article, to create a vector we use the “combine” function “c()” in R. See the example below:
> #Creating a vector using combine operator
> a <- c(2, 4, 6, 8)
> print(a)
[1] 2 4 6 8
In this example, we have used the “combine” function to create a vector with four elements (2, 4, 6, 8) and named it “a”. Take note that the elements are separated by commas (“,”) when creating a vector. It is possible to pass values of multiple data types to “c()”. However, R then coerces all of them to a single, most suitable data type, so the resulting vector still holds only one type.
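To see the coercion in action, here is a minimal sketch. The vector names are hypothetical and not part of the earlier examples.

```r
# Mixing a number, a string and a logical: everything becomes a string
mixed <- c(1, "two", TRUE)
print(mixed)
print(class(mixed))       # "character"

# Mixing numbers and logicals: TRUE/FALSE become 1/0
num_logical <- c(1, TRUE, FALSE)
print(num_logical)
print(class(num_logical)) # "numeric"
```

In both cases the result is still a vector of one single type, which is exactly what sets vectors apart from lists.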
To know more about the data types in R and coercion, please follow the previous article in this series at: Introduction to R Programming – Part 1
There are multiple ways of creating a vector in R. Specifying each element inside the “combine” function has already been covered above. The other ways of creating one are as below:
Using colon (“:”) operator
We can use the colon operator (“:”) in R to create a vector of elements between two numbers (both numbers will be included in the resultant vector). See the code below:
> #Creating vector using colon operator
> x <- 6:10
> print(x)
[1]  6  7  8  9 10
The colon operator in the above code generates a vector of numbers from 6 to 10 at an interval of one unit, meaning each number in the vector “x” is one unit away from the next. This is the default step for the colon operator.
Using “seq()” Function
To create a vector, we can also use the built-in function “seq()” in R. It allows us to create a sequence of numbers with a specified step value.
> #Creating vector using seq() function
> x <- seq(2, 10)
> print(x)
[1]  2  3  4  5  6  7  8  9 10
The “seq()” function has a default step value of 1 unit. However, we can specify the step using the “by” argument.
> #Specifying step using by argument under seq() function
> z <- seq(2, 4, by = 0.4)
> print(z)
[1] 2.0 2.4 2.8 3.2 3.6 4.0
The “by” argument allows us to set the step between consecutive elements of the vector, not its length. In our example, we have set the step to 0.4, which means each element is 0.4 units larger than the one before it.
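Two more quick cases may help to build intuition for the step value. This is only a sketch, and the names “steps” and “down” are my own.

```r
# The sequence stops at the last value that does not pass the end point
steps <- seq(2, 10, by = 3)
print(steps)   # 2 5 8 (the next value, 11, would exceed 10)

# A negative step counts downwards
down <- seq(10, 2, by = -2)
print(down)    # 10 8 6 4 2
```

So the end value is only included when the step happens to land on it exactly.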
How to Access Elements of a Vector
We can access the elements of a vector using vector indexing. Indices are the positions at which the elements of a vector are placed. In R, indices start at 1.
To access a specific element of a vector, we use square brackets [] with the positional value (an integer) inside them.
> #Accessing elements of a vector
> x <- c(2, 4, 6, 8, 10)
> print(x[1]) #Accessing first element of vector x
[1] 2
In the example above, we accessed the first element of vector “x” using indexing. Indexing also allows us to slice and dice a vector.
We can also use the colon (“:”) operator to slice a vector. This lets us specify a range of indices, and the elements at those indices are returned.
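A minimal sketch of slicing, using the same vector “x” as above. The names “middle” and “picked” are just for illustration.

```r
x <- c(2, 4, 6, 8, 10)

# A range of indices returns the elements at positions 2, 3 and 4
middle <- x[2:4]
print(middle)   # 4 6 8

# Indices do not have to be consecutive: pass any vector of positions
picked <- x[c(1, 5)]
print(picked)   # 2 10
```

The result of slicing is itself a vector, so it can be stored and indexed again.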
Lists in R
Lists are one of the most popular data structures in the R programming language. The reason is that, unlike a vector, a list is not restricted to a single data type. It allows you to store data of different types (such as numeric, string, logical, integer, etc.) within a single list while keeping the characteristics of each data type.
To create a list in R, we use the “list()” function. A list can be thought of as a more generalized vector that can contain different data types.
> vect1 <- c(2, 4, 6, 8) #Vector of numeric values
> vect2 <- c("R", "Python", "Sql", "Excel") #vector of strings
> vect3 <- c(TRUE, FALSE, FALSE, TRUE) # Vector of logical values
> #Creating a list that consists of vect1, vect2 and vect3
> list1 <- list(vect1, vect2, vect3)
> print(list1)
[[1]]
[1] 2 4 6 8
[[2]]
[1] "R" "Python" "Sql" "Excel"
[[3]]
[1] TRUE FALSE FALSE TRUE
We can also create a list by directly specifying the elements inside the “list()” function, using the “combine” function to create each vector in place. See the example below:
> #Creating list by directly specifying elements
> list2 <- list(c(1, 2, 3, 4), c("Hello", "World!"), c(TRUE, FALSE))
> print(list2)
[[1]]
[1] 1 2 3 4
[[2]]
[1] "Hello" "World!"
[[3]]
[1] TRUE FALSE
Now, as you may have noticed, there are three elements in the list we created and named “list2”. Each element is a one-dimensional vector. The values in double square brackets are the element indices for this list. [[1]] marks the first element of “list2”, [[2]] marks the second element, and so on.
Accessing elements of a list
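Accessing the elements of a list is similar to accessing the elements of a vector. All you need to do is specify the index number inside square brackets after the list name. Note that single square brackets return a list containing the selected element, which is why the output below still shows [[1]]. Double square brackets, as in list2[[1]], return the element itself.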
> #Accessing elements of a list
> print(list2[1])
[[1]]
[1] 1 2 3 4
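The difference between single and double square brackets on a list is easy to miss, so here is a short sketch. The names “as_sublist” and “as_vector” are hypothetical.

```r
list2 <- list(c(1, 2, 3, 4), c("Hello", "World!"), c(TRUE, FALSE))

# Single brackets keep the list wrapper: the result is a list of length 1
as_sublist <- list2[1]
print(class(as_sublist))   # "list"

# Double brackets unwrap the element: the result is the numeric vector itself
as_vector <- list2[[1]]
print(class(as_vector))    # "numeric"

# Once unwrapped, normal vector indexing works on the element
print(as_vector[2])        # 2
print(list2[[2]][1])       # "Hello"
```

In practice, double brackets are what you want whenever you need to work with the values stored inside a list element.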
What if you try to access an element that is not actually part of the list? For example, we know that “list2” has only three elements. What if we try to extract the element at index 4?
> print(list2[4]) #Accessing the element which is not in the list
[[1]]
NULL
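Well, this is something new. With single square brackets, R doesn’t give you an error message for accessing an out-of-range index. Instead, it gives you a “NULL”, which indicates that there is no value at index 4 in the given list. (With double square brackets, list2[[4]] does raise a “subscript out of bounds” error.)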
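If you want to guard against this, one possible approach is to check the length of the list first. The sketch below also shows that double square brackets behave differently for a missing index. The names “n”, “single” and “failed” are my own.

```r
list2 <- list(c(1, 2, 3, 4), c("Hello", "World!"), c(TRUE, FALSE))

# length() tells us how many elements the list really has
n <- length(list2)
print(n)   # 3

# Single brackets on a missing index: a list holding NULL, no error
single <- list2[4]
print(is.null(single[[1]]))   # TRUE

# Double brackets on a missing index raise an error, caught here with tryCatch()
failed <- inherits(tryCatch(list2[[4]], error = function(e) e), "error")
print(failed)   # TRUE
```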
Adding element at the last position in the list
It is perfectly acceptable to add an element at the end of a list in R. This brings up a point worth noting: lists in R can be modified after creation, meaning you can update existing elements and add new ones.
> list2[4] <- 1 + 2i #Adding an element at the fourth index inside list2
> print(list2)
[[1]]
[1] 1 2 3 4
[[2]]
[1] "Hello" "World!"
[[3]]
[1] TRUE FALSE
[[4]]
[1] 1+2i
Here, we have added an element of the complex data type at the end of the list (index 4). We can also store multiple values at that position. This can be achieved by wrapping a vector in “list()” and assigning it to the index. Since index 4 now already exists, the assignment below replaces the single value we just added.
> list2[4] <- list(c(1+2i, 4+7i, 3+5i)) #Storing a vector of values at the fourth index
> print(list2)
[[1]]
[1] 1 2 3 4
[[2]]
[1] "Hello" "World!"
[[3]]
[1] TRUE FALSE
[[4]]
[1] 1+2i 4+7i 3+5i
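Hard-coding the index 4 only works because we know the list has three elements. A more general sketch is shown below, assuming a fresh list that I have named “items” for illustration. It uses length() to find the next free position and the base R function append() as an alternative.

```r
items <- list(c(1, 2, 3, 4), c("Hello", "World!"), c(TRUE, FALSE))

# Assign to the position just past the current end of the list
items[[length(items) + 1]] <- c(1+2i, 4+7i, 3+5i)

# append() returns a new list with the extra element(s) added at the end
items <- append(items, list("last"))

print(length(items))   # 5
print(items[[4]])      # the complex vector, stored as a single element
```

Either way, the new value is stored as one element of the list, no matter how many values it contains.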
Update the existing elements of a list
We can update the elements of an existing list in the same way, using indexing. See the example below:
> list2[2] <- list(c("You're", "Welcome")) #Updating an existing element of a list
> print(list2)
[[1]]
[1] 1 2 3 4
[[2]]
[1] "You're" "Welcome"
[[3]]
[1] TRUE FALSE
[[4]]
[1] 1+2i 4+7i 3+5i
In this example, we have updated the second element of “list2”.
In this way, we can create and manipulate a list. The next data structure on our list is the matrix.
Matrices in R
A matrix is a two-dimensional rectangular structure in R that consists of rows and columns. A vector in R can loosely be considered a matrix with one dimension. Elements in a matrix are arranged in a fixed number of rows and columns, must all be of the same data type, and are usually numeric. To create a matrix in R, we use the “matrix()” function.
The syntax for the matrix function in R is shown below:
matrix(data, nrow = , ncol = , byrow = FALSE/TRUE, dimnames = ), where:
- data – specifies the input vector of elements using which matrix structure can be formed
- nrow – specifies the number of rows in the resultant matrix
- ncol – specifies the number of columns in the resultant matrix
- byrow – specifies whether the matrix is filled row by row. The default value is FALSE, which fills the matrix column-wise. If set to TRUE, the matrix is filled with the vector elements row-wise
- dimnames – allows you to specify the names for the rows and columns of the resultant matrix, as a list of two vectors
Let’s see how to create a matrix using only the “nrow” and “ncol” arguments.
> #Creating a matrix with 3 rows and two columns
> mat1 <- matrix(c(1, 3, 5, 7, 9, 11), nrow = 3, ncol = 2)
> print(mat1)
     [,1] [,2]
[1,]    1    7
[2,]    3    9
[3,]    5   11
Now, suppose we want the data inside “mat1” to be populated row-wise. We set the “byrow” argument to TRUE.
> #Creating a matrix where elements are populated row-wise
> mat1 <- matrix(c(1, 3, 5, 7, 9, 11), nrow = 3, ncol = 2, byrow = TRUE)
> print(mat1)
     [,1] [,2]
[1,]    1    3
[2,]    5    7
[3,]    9   11
Next, we would like to have names for the rows and columns of the resultant matrix. We specify them using the “dimnames” argument.
> row_names <- c("row1", "row2", "row3") # Vector of row names
> col_names <- c("col1", "col2") # Vector of column names
> # Adding row names and column names under mat1
> mat1 <- matrix(
    c(1, 3, 5, 7, 9, 11),
    nrow = 3,
    ncol = 2,
    byrow = TRUE,
    dimnames = list(row_names, col_names)
  )
> print(mat1)
     col1 col2
row1    1    3
row2    5    7
row3    9   11
Accessing elements from a matrix
Since a matrix is a data structure with two dimensions, rows and columns, we need to use both a row index and a column index to access a particular element. We specify the indices inside square brackets after the name of the matrix, separated by a comma.
For example, suppose “mat1” is a matrix and we want to access the element in the mth row and nth column. We can use mat1[m, n] to access that element.
> #Accessing element from mat1 which is at second row and second column
> print(mat1[2, 2])
[1] 7
> mat1[1, ] # Elements from first row for both of the columns
col1 col2
   1    3
> mat1[ ,2] # Elements from all rows with second column
row1 row2 row3
   3    7   11
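Since “mat1” has row and column names, they can also be used in place of the numeric positions. The sketch below rebuilds the same matrix so that it runs on its own. The names “by_name” and “block” are hypothetical.

```r
mat1 <- matrix(
  c(1, 3, 5, 7, 9, 11),
  nrow = 3,
  ncol = 2,
  byrow = TRUE,
  dimnames = list(c("row1", "row2", "row3"), c("col1", "col2"))
)

# Access a single element by its row and column names
by_name <- mat1["row2", "col2"]
print(by_name)      # 7

# A range of row indices with an empty column index keeps all columns
block <- mat1[2:3, ]
print(dim(block))   # 2 2
```

Selecting more than one row and column returns a smaller matrix rather than a plain vector.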
Modify the elements of a Matrix
As with lists, it is possible to modify the elements of a matrix as well.
> mat1[3 , 1] <- 12 # Modifying element from third row and first column
> print(mat1)
     col1 col2
row1    1    3
row2    5    7
row3   12   11
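The same indexing idea extends to whole rows and columns. The following is only a sketch on a fresh copy of the matrix, and the replacement values are made up for illustration.

```r
mat1 <- matrix(
  c(1, 3, 5, 7, 9, 11),
  nrow = 3,
  ncol = 2,
  byrow = TRUE,
  dimnames = list(c("row1", "row2", "row3"), c("col1", "col2"))
)

mat1[3, 1] <- 12                      # single element, as in the example above
mat1["row1", ] <- c(100, 200)         # replace an entire row
mat1[, "col2"] <- mat1[, "col2"] * 2  # update an entire column in one step

print(mat1)
```

When replacing a whole row or column, the replacement should have as many values as that row or column has elements.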
We will stop here for this article. We will discuss the remaining three data structures (data frames, arrays, and factors) in the same level of detail in the next part. Until then, ciao! 😉
Stay Home! Stay Safe!