Introduction to Functions in R Programming
Functions are among the most widely used parts of any programming language. A function is a block of code that can be reused again and again to perform a repeated task. Using functions not only reduces repetition but also helps reduce the complexity of a program. Imagine writing the same code hundreds of times: your program becomes harder to read, and repeating the same lines over and over makes it tedious to maintain. Instead, it is a good idea to create a function for the task and call it wherever it is needed. As a rule of thumb, fewer repeated lines of code usually make a program easier to read and maintain.
The R programming language is also rich in built-in functions. That said, we are free to write functions of our own (user-defined functions), which make day-to-day tasks easier and save programmers a lot of effort. In R, functions are objects, and the body of a function is usually enclosed in curly braces {}. In this article, we will go through functions in R and how they work.
Syntax of a function in R
Following is a general syntax of any function in R:
function_name <- function(arglist) {
  expr
  return(value_to_return)
}
Where:
- function_name: the name assigned to the function. We use this name whenever we want to call the function in our code.
- arglist: the list of arguments the function accepts. These arguments are the function's inputs and act as placeholders while the function is being defined. When we call the function, we generally need to supply a value for every argument that does not have a default value. A function may take no arguments, one argument, or several.
- expr: the set of expressions that form the function body and do the actual work. These are the instructions R executes when the function is called.
- value_to_return: the final value the function returns after executing the instructions in the expr section.
Note that in R, functions are objects. When called, a function receives its arguments, runs its body to complete a specific task, and then hands control back to the interpreter. Depending on what the user needs, it may return a useful value, or it may be called mainly for a side effect such as printing, in which case it typically returns NULL or returns its value invisibly.
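To make the idea that functions are objects concrete, here is a small sketch. The names square and square_copy are hypothetical, chosen only for illustration. It shows that a function can be assigned to another name and passed as an argument to another function, here the base R function sapply().

```r
# A hypothetical function used only to illustrate that functions are objects
square <- function(x) {
  return(x ^ 2)
}

class(square)          # "function"
is.function(square)    # TRUE

# Assign the same function object to a second name and call it
square_copy <- square
square_copy(4)         # 16

# Pass the function itself as an argument to sapply()
sapply(1:3, square)    # 1 4 9
```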
To call a function in our code, all we need is the function name followed, in parentheses, by the arguments it requires:
function_name(arglist)
Here, function_name is the name we gave the function when we defined it, and arglist is the argument or list of arguments we want the function to work on.
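A short sketch of calling a function with arguments, assuming a hypothetical function add_numbers() whose second argument y has a default value, plus a hypothetical zero-argument function greet(). It shows positional matching, named matching, and how a default value is used when an argument is omitted.

```r
# Hypothetical function: y has a default value, x does not
add_numbers <- function(x, y = 1) {
  total <- x + y
  return(total)
}

add_numbers(2, 3)            # 5: arguments matched by position
add_numbers(2)               # 3: y falls back to its default of 1
add_numbers(y = 10, x = 2)   # 12: arguments matched by name

# Hypothetical function that takes no arguments at all
greet <- function() {
  return("Hello from R")
}
greet()                      # "Hello from R"
```

Calling add_numbers() with no arguments would stop with an error, because x has no default value.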
The need for functions in R
As discussed earlier, a function is useful whenever we find ourselves repeating the same lines of code again and again. Consider the example below:
#Defining three vectors with six elements in each
vect1 <- c(76,52,55,66,50,73)
vect2 <- c(48,73,72,64,71,35)
vect3 <- c(70,78,74,85,90,80)
#Repetitive Task
vect1 <- vect1 ^ 3 - vect1
vect2 <- vect2 ^ 3 - vect2
vect3 <- vect3 ^ 3 - vect3
#Printing Values
print(vect1)
print(vect2)
print(vect3)
In the example code above, we have defined three vectors, namely vect1, vect2, and vect3, with six elements each. We then transform each vector by subtracting the vector from its cube (that is, computing x^3 - x for every element x) and storing the result back in the same vector. Finally, we use the print() function to print the values of each vector. If we run this code, we get the following output:
> print(vect1)
[1] 438900 140556 166320 287430 124950 388944
> print(vect2)
[1] 110544 388944 373176 262080 357840 42840
> print(vect3)
[1] 342930 474474 405150 614040 728910 511920
However, this is a repetitive task that takes up several lines of code every time we need it. Instead, why not create a function that takes a vector as an argument and rescales it the way we want? That way, we reduce the repetition in our code. This is how the need for a function arises. Let’s create a function that will do this task for us.
rescale <- function(vector_name){
vector_name <- vector_name ^ 3 - vector_name
return(vector_name)
}
In the code above, we create a function named rescale. It takes vector_name as an argument, and its body cubes each element of vector_name and subtracts the original value from the result. Finally, the function returns the newly transformed vector as its output.
We run this code to create the function and then call it on our three vectors to check whether it works as expected.
> rescale(vect1)
[1] 438900 140556 166320 287430 124950 388944
> rescale(vect2)
[1] 110544 388944 373176 262080 357840 42840
> rescale(vect3)
[1] 342930 474474 405150 614040 728910 511920
As you can see, the output is the same as from the previous code. This time, however, we avoided repeating the same code block by creating a function named rescale and calling it on each vector to get the desired output.
Note that the earlier code overwrote vect1, vect2, and vect3 with their transformed values. Before calling rescale, we therefore re-created the three vectors from their original values; otherwise, rescale would have been applied to already-transformed data and the results would have been different.
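The sketch below redefines rescale() and the three vectors so that it runs on its own. It illustrates why the re-creation step matters and how to keep the results: calling rescale() does not change the vector passed in, because R works on a copy, so the result must be assigned back explicitly. It also applies rescale() to all three vectors at once with base R's lapply(); the list name vectors and the variable rescaled1 are hypothetical.

```r
rescale <- function(vector_name) {
  vector_name <- vector_name ^ 3 - vector_name
  return(vector_name)
}

vect1 <- c(76, 52, 55, 66, 50, 73)
rescaled1 <- rescale(vect1)
print(vect1)              # still the original values: rescale() worked on a copy
vect1 <- rescale(vect1)   # assign the result back explicitly to keep it

# Hypothetical list holding the original vectors, rescaled in one call
vectors <- list(
  vect1 = c(76, 52, 55, 66, 50, 73),
  vect2 = c(48, 73, 72, 64, 71, 35),
  vect3 = c(70, 78, 74, 85, 90, 80)
)
rescaled_all <- lapply(vectors, rescale)   # a list with the same names
print(rescaled_all$vect2)
```

Because lapply() keeps the names of the list, each rescaled vector can still be accessed by its original name.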
Functions with one line
Sometimes the body of a function you define consists of a single expression. In such cases, R accepts that expression as the function body even if you do not wrap it in curly braces. So, a bit of trivia: if your function is a one-liner, you can omit the curly braces when defining it. This is, however, not recommended if you want to keep a clean code structure and avoid ambiguity. Let’s see an example below:
> vect3 <- c(2, 4, 6, 8, 10) #creating a vector with 5 elements
> print(vect3) #Printing the vector to see output
[1]  2  4  6  8 10
> #Defining a function named cube
> cube <- function(vector_name) vector_name ^ 3
> #applying cube function on vect3 and printing output; each value from vect3 is cubed
> print(cube(vect3))
[1]    8   64  216  512 1000
Here, we have created a function named cube, which raises each element of the vector passed to it (here, vect3) to the power of three and returns the result. Notice that we did not use curly braces around the function body of cube. Even though this looks simple, it is not recommended: this style can be ambiguous for the reader and does not look as clean.
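A minimal sketch comparing the braced and one-line styles; the names cube_braced, cube_one_line, and cube_assign are hypothetical. Both styles return the same values. The third version shows a subtle pitfall of one-liners: when the last expression of a function is an assignment such as vector_name <- vector_name ^ 3, the result is returned invisibly, so calling the function at the console prints nothing unless you wrap it in print(). Base R's withVisible() reports whether a returned value is visible.

```r
cube_braced <- function(vector_name) {
  return(vector_name ^ 3)
}
cube_one_line <- function(vector_name) vector_name ^ 3
# Pitfall: the last expression is an assignment, so the value is invisible
cube_assign <- function(vector_name) vector_name <- vector_name ^ 3

vect3 <- c(2, 4, 6, 8, 10)
identical(cube_braced(vect3), cube_one_line(vect3))   # TRUE

cube_assign(vect3)                        # prints nothing at the console
withVisible(cube_assign(vect3))$visible   # FALSE
print(cube_assign(vect3))                 # print() forces the value to be shown
vect3                                     # unchanged: the assignment was local to the function
```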
Importance of explicit return in a Function
It is OK if you forget to add an explicit return() while defining a function in R (although, as in the previous case, it is not recommended). When a function has no explicit return(), R automatically returns the value of the last expression evaluated in the function body.
Consider an example below where we are not using explicit return while defining a function.
#Creating a function named return_check
return_check <- function(value){
if (value > 0) {
return_value <- "Positive"
}
else if (value < 0) {
return_value <- "Negative"
}
else {
return_value <- "Zero"
}
return_value
}
Here, when return_check is executed, the variable return_value is set to “Positive”, “Negative”, or “Zero” depending on whether the input value is greater than zero, less than zero, or equal to zero. We do not use an explicit return in this code. Instead, we write return_value as the last expression of the function body, so its value becomes the function’s result when the function is executed on a numeric value. See some of the executions below:
> return_check(-50)
[1] "Negative"
> return_check(22)
[1] "Positive"
> return_check(0)
[1] "Zero"
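Because if/else in R is itself an expression with a value, the same implicit-return behavior lets us drop the intermediate variable entirely. A sketch, using the hypothetical name return_check_compact so it does not clash with return_check:

```r
return_check_compact <- function(value) {
  # The if/else chain is the last expression, so its value is the result
  if (value > 0) {
    "Positive"
  } else if (value < 0) {
    "Negative"
  } else {
    "Zero"
  }
}

sapply(c(-50, 22, 0), return_check_compact)   # "Negative" "Positive" "Zero"
```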
Without an explicit return(), the function always runs to the end of its body, and the value of the last evaluated expression (here, return_value) becomes the result. In this particular example that costs very little: the if/else chain evaluates only the conditions it needs, runs a single branch, and then reaches return_value.
The return() function, by contrast, ends the function immediately and hands back its value, so any code after it in the body is skipped. See the code below:
#Creating a function named return_check
return_check <- function(value){
if (value > 0) {
return("Positive")
}
else if (value < 0) {
return("Negative")
}
else {
return("Zero")
}
}
In the code above, each branch of the conditional calls return() explicitly. Now let’s execute the function on a value and see the output:
> return_check(20)
[1] "Positive"
There is no difference in the output: the function still follows the same conditional logic as the previous version. The difference lies in control flow. As soon as return() is called, R stops executing the function, so any remaining expressions in the body are skipped. In a short if/else chain like this one the time saved is negligible, because the previous version also ran only one branch. Early returns pay off in functions with many expressions, where a check near the top (for example, on invalid input) can settle the result before any expensive work is done.
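A minimal sketch of where an early return() is useful, assuming two hypothetical helpers: validate_and_sum_logs() exits as soon as an input check fails, before the log computation runs, and unreachable_demo() shows that code after return() is never executed (the stop() call would raise an error if it ran).

```r
validate_and_sum_logs <- function(x) {
  if (length(x) == 0) {
    return(NA_real_)   # nothing to compute, so exit immediately
  }
  if (any(x <= 0)) {
    return(NA_real_)   # log() of non-positive values is not finite; exit before computing
  }
  sum(log(x))          # only reached when both checks pass
}

unreachable_demo <- function() {
  return("returned early")
  stop("this line is never reached")   # would raise an error if it ran
}

validate_and_sum_logs(c(1, -2, 3))    # NA, and log() is never called
validate_and_sum_logs(c(1, exp(1)))   # about 1
unreachable_demo()                    # "returned early"
```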
Scoping of a function in R
Scoping is an important aspect of any programming language. When we start out, we often ignore it. Why? Because we are sure that whatever we code is going to stay with us, so it does not seem to matter whether the variables used in a function are defined globally or not. Gradually, however, we reach a phase where we need to share our code with colleagues or end users. This is when we start to worry (and we always should) about the quality of our code, and that starts with scoping.
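Each time a function is called, R creates a new local environment for that call, in which its arguments and any variables assigned inside the body live. When the body refers to a name, R first looks in this local environment; if the name is not found there, it looks in the environment where the function was defined (for a function created at the top level, the global environment), and then further outward. The local environment can therefore read variables from the outer layers, but an ordinary assignment with <- inside the function creates or changes a variable only in the local environment. These rules about where names are found and where assignments land are what we call scoping.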
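The lookup rules can be sketched as follows; the variable multiplier and the functions scale_by_global(), shadow_demo(), and make_adder() are hypothetical examples. The first function reads a global variable, the second creates a local variable with the same name without touching the global one, and the third shows that a function looks up names in the environment where it was defined, not where it is called.

```r
multiplier <- 10   # a global variable

scale_by_global <- function(x) {
  x * multiplier   # not found locally, so R finds it in the global environment
}

shadow_demo <- function() {
  multiplier <- 2  # local variable; the global multiplier is not changed
  multiplier
}

make_adder <- function(n) {
  function(x) {
    x + n          # n is found in make_adder's environment, where this function was defined
  }
}

scale_by_global(5)   # 50
shadow_demo()        # 2
multiplier           # still 10
add_three <- make_adder(3)
add_three(4)         # 7
```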
Some scoping rules
- A function defined at the top level (in the console or in a script) is created in the global environment, which becomes its enclosing environment.
- Arguments and variables created inside a function have local scope. They belong to the function call only and do not exist outside the function body; if you try to access them from outside, R will not find them.
- Each invocation of a function is independent of all previous ones, because every call gets a fresh local environment. As a result, variables declared within a function start over on each call instead of keeping their values from an earlier call.
- Modifying an argument does not change the caller’s object. For ordinary objects such as vectors and lists, R uses copy-on-modify semantics: when you change the value of an argument inside a function, R modifies a local copy, and the original object passed in by the caller is left untouched. This makes code easier to reason about and debug, since a function call cannot silently alter the variables it was given.
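A sketch illustrating the last three rules, with hypothetical function names counter_demo() and double_first(): local variables do not exist after the call, each call starts fresh, and modifying an argument leaves the caller’s vector unchanged. The exists() check assumes no global variable named local_total has been created elsewhere in your session.

```r
counter_demo <- function() {
  local_total <- 0                 # recreated from scratch on every call
  local_total <- local_total + 1
  local_total
}

counter_demo()          # 1
counter_demo()          # 1 again: nothing is remembered between calls
exists("local_total")   # FALSE: the local variable does not exist outside the function

double_first <- function(v) {
  v[1] <- v[1] * 2      # modifies the function's local copy of the argument
  v
}

original <- c(1, 2, 3)
changed <- double_first(original)
original                # 1 2 3: the caller's vector is untouched
changed                 # 2 2 3
```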
Conclusion
We can conclude from this article that functions are an integral part of the R programming language and can be considered one of its basic building blocks. They help us handle repeated tasks more smoothly and reduce the length and complexity of the code we write, which makes our tasks easier.
We will stop here and come back with a new article next time. Until then, stay safe! 🙂