# Functions in R – apply(), mapply(), tapply(), lapply()


**Functions in R**

We are going to introduce some basic functions that make day-to-day work in R easier.

**apply()**

The apply() function applies a function over the margins (rows, columns, or higher dimensions) of an array or matrix.

The syntax of the apply() function is:

apply(X, MARGIN, FUN, ...)

X = an array, including a matrix.

MARGIN = a vector giving the subscripts the function will be applied over: 1 for rows, 2 for columns, c(1, 2) for rows and columns.

FUN = the function to be applied.

... = optional arguments passed on to FUN.
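As a small illustration of how MARGIN selects the direction, the sketch below uses a hypothetical 2 x 3 matrix named `demo` (it is not part of the walkthrough that follows) whose values are easy to check by hand.

```r
# A small matrix filled column by column:
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
demo <- matrix(1:6, nrow = 2)

# MARGIN = 1: the function sees one row at a time
row_totals <- apply(demo, 1, sum)   # 9 12

# MARGIN = 2: the function sees one column at a time
col_totals <- apply(demo, 2, sum)   # 3 7 11

# Extra arguments after FUN are passed on to it (here: probs for quantile)
col_medians <- apply(demo, 2, quantile, probs = 0.5)   # 1.5 3.5 5.5
```

The last call shows the role of the optional arguments: `probs = 0.5` is not used by apply() itself but handed to quantile() on every call.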

We create a matrix of 200 elements with 20 rows and 10 columns. We use the rnorm() function to generate the random numbers.

*x <- matrix(rnorm(200), 20,10)*

To apply a function over rows, we use MARGIN = 1. We use apply() to calculate the row sums of matrix **x**.

*row_sums <- apply(x, 1, sum)*

*row_sums*

We can calculate the mean of each of the 20 rows by passing the mean function to apply().

*row_means <- apply(x, 1, mean)*

We can compute column-wise sums by using MARGIN = 2 with the sum function.

*col_sums <- apply(x, 2, sum)*

*col_sums*

We can also find the mean of every column by using the mean function.

*col_means <- apply(x, 2, mean)*

*col_means*
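Base R also ships the dedicated functions rowSums(), rowMeans(), colSums() and colMeans() for these common cases. The sketch below checks that they agree with the apply() versions; the seed and the name `mat` are assumptions made only so the example is reproducible and self-contained.

```r
set.seed(42)                          # assumption: fixed seed for reproducibility
mat <- matrix(rnorm(200), 20, 10)

# apply() versions, as in the text
by_row <- apply(mat, 1, sum)
by_col <- apply(mat, 2, mean)

# Built-in shortcuts for the same computations
all.equal(by_row, rowSums(mat))       # TRUE
all.equal(by_col, colMeans(mat))      # TRUE
```

apply() remains the more general tool, since it accepts any function, while the dedicated functions cover only sums and means.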

We create a three-dimensional array **a** with dimensions 2 x 2 x 2:

*a <- array(1:8, c(2,2,2))*

*a*

We can find the mean of each row (taken across all columns and both layers) by using the following code:

*apply(a, 1, mean)*

We can find the mean of each column in the same way by using MARGIN = 2:

*apply(a, 2, mean)*

We can also find the mean for each combination of row and column by using MARGIN = c(1,2):

*apply(a, c(1,2), mean)*

We can also sum over the third dimension, which calculates the sum of each of the two 2 x 2 matrices separately.

*apply(a,3,sum)*

It shows the sum of the first matrix as 10 and the sum of the second matrix as 26.
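To make the results for the different margins concrete, here is a worked sketch on a 2 x 2 x 2 array holding the values 1 to 8. The name `arr` is used only for this illustration.

```r
arr <- array(1:8, c(2, 2, 2))   # two 2 x 2 layers: values 1..4 and 5..8

apply(arr, 1, mean)        # 4 5      (rows, across columns and layers)
apply(arr, 2, mean)        # 3.5 5.5  (columns, across rows and layers)
apply(arr, c(1, 2), mean)  # 2 x 2 matrix: each cell averaged over the two layers
apply(arr, 3, sum)         # 10 26    (one total per layer)
```

With MARGIN = c(1, 2) the result keeps the row and column layout, so the output is itself a matrix rather than a vector.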

**lapply()**

lapply() returns a list of the same length as X, each element of which is the result of applying FUN to the corresponding element of X. It takes a list (or vector) X, a function FUN, and any further arguments to pass on to FUN.

We create a list **x** that stores an integer vector “a” and a vector “b” of 10 random numbers.

*x <- list(a = 1:5, b = rnorm(10))*

We want to find the mean of each element of list x:

*lapply(x, mean)*

Next we apply runif() to a vector **x**. lapply() coerces **x** to a list and then calls **runif()** on every element of **x**.

*x <- 1:4*

*lapply(x,runif)*

runif(n) generates n random numbers from a uniform distribution, so each element of **x** is used as the count: the result is a list holding 1, 2, 3 and 4 random numbers.

We can also pass the min and max parameters through lapply() to generate random numbers between these bounds.

*lapply(x, runif, min=0, max=10)*
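A quick way to confirm what lapply() produced here is to inspect the length of each list element and the overall range of the values. This is a minimal sketch; the seed and the names `counts` and `draws` are assumptions for the example.

```r
set.seed(1)                      # assumption: fixed seed so runs are repeatable
counts <- 1:4

# min and max are passed through lapply() to runif()
draws <- lapply(counts, runif, min = 0, max = 10)

lengths(draws)                   # 1 2 3 4: element i holds i random numbers
range(unlist(draws))             # smallest and largest value, both within 0 and 10
```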

We create a list object “m”:

*m <- list(a= matrix(1:4, 2,2), b = matrix(1:6, 3,2))*

We define an anonymous function, function(x) x[,1], to apply to the list elements, which are matrices. It extracts the first column of each matrix.

*lapply(m,function(x) x[,1])*

We create a list “x” of four elements, three of which contain random numbers. The element “b” contains 10 standard normal random numbers, “c” contains 20 random numbers with mean 1, and “d” contains 100 random numbers with mean 5.

*x <- list(a= 1:4, b = rnorm(10), c=rnorm(20,1), d= rnorm(100,5))*

We find the mean value of each element of the list.

*lapply(x, mean)*

**unlist()**

unlist() flattens a list into a vector.

*unlist(lapply(x, mean))*

**sapply()**

sapply() works like lapply() but simplifies the output to a vector or matrix where possible.

*sapply(x, mean)*
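The relationship between lapply(), unlist() and sapply() can be sketched on a small list with known means. The list `scores` is hypothetical; vapply() is not covered in the text above but is a base R relative of sapply() that is included here for comparison.

```r
scores <- list(a = 1:4, b = c(2, 4, 6))   # hypothetical data with known means

as_list   <- lapply(scores, mean)         # list: $a is 2.5, $b is 4
flattened <- unlist(as_list)              # named numeric vector: a = 2.5, b = 4
direct    <- sapply(scores, mean)         # same named vector in one step

# vapply() is a stricter relative of sapply(): the expected type and
# length of each result are declared up front via FUN.VALUE
checked <- vapply(scores, mean, FUN.VALUE = numeric(1))
```

In this case all three routes give the same named vector; vapply() additionally raises an error if a result does not match the declared shape.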

We create a matrix “m” of 30 rows and 3 columns. We use cbind(rnorm(30,0), rnorm(30,2), rnorm(30,5)) to combine 30 random numbers with mean 0, 30 random numbers with mean 2 and 30 random numbers with mean 5.

*m <- matrix(cbind(rnorm(30, 0), rnorm(30, 2), rnorm(30, 5)), nrow=30, ncol=3)*

We find the mean of the first column as:

*mean(m[,1])*

We find the mean of the second column as:

*mean(m[,2])*

We find the mean of the third column as:

*mean(m[,3])*

We can find all the column means in one call by passing the column indices 1:3 to sapply() with function(x) mean(m[,x]), which returns the mean of column x of matrix “m”.

*sapply(1:3, function(x) mean(m[,x]))*

We use the length() function to count the number of elements.

With function(x) length(x[x<0]), we count the number of elements in each column of the matrix whose value is less than 0.

*apply(m, 2, function(x) length(x[x<0]))*

We use function(x) mean(x[x>0]) to find the mean of the values greater than 0 in each column of matrix “m”.

*apply(m, 2, function(x) mean(x[x>0]))*
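Because “m” holds random numbers, its counts change from run to run. The sketch below repeats the two conditional summaries on a hypothetical matrix `vals` with fixed values so the results can be checked by hand. It uses sum() on a logical vector as an alternative way of counting.

```r
# Hypothetical 3 x 3 matrix with a known mix of signs
vals <- cbind(c(-1, 2, -3), c(4, 5, 6), c(-7, -8, 9))

# Count negatives per column; sum() over a logical vector counts the TRUEs
neg_counts <- apply(vals, 2, function(column) sum(column < 0))            # 2 0 2

# Mean of the positive entries in each column
pos_means <- apply(vals, 2, function(column) mean(column[column > 0]))    # 2 5 9
```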

We want to find the squares of the numbers 1 to 3. The simplify parameter controls whether the output is returned as a list or as a vector. If simplify = TRUE (the default), the output is returned as a vector or matrix where possible. If simplify = FALSE, the output is returned as a list.

Here we use simplify = FALSE, so the output is shown in list form.

*sapply(1:3, function(x) x^2, simplify = FALSE)*
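The effect of simplify is easiest to see side by side. The sketch below also shows the matrix case, which occurs when every call returns a vector of the same length; the variable names are chosen only for this example.

```r
squares_vec  <- sapply(1:3, function(x) x^2)                    # vector: 1 4 9
squares_list <- sapply(1:3, function(x) x^2, simplify = FALSE)  # list(1, 4, 9)

# When every call returns a vector of the same length (> 1),
# sapply() simplifies to a matrix with one column per input
both <- sapply(1:3, function(x) c(x, x^2))
dim(both)                                                        # 2 3
```

In `both`, the first row holds the inputs 1, 2, 3 and the second row holds their squares.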

**tapply()**

The **tapply()** function applies a function to each group of values defined by a factor.

We check the arguments of the tapply function with:

*str(tapply)*

INDEX = a list of one or more factors, each the same length as X

We use the mtcars dataset to demonstrate tapply().

We check the details of the mtcars dataset.

*?mtcars*

We want to calculate the average car weight for each number of cylinders.

*tapply(mtcars$wt,mtcars$cyl,mean)*

We create a vector of random numbers:

*x <- c(rnorm(10), runif(10), rnorm(10,1))*

We create a factor variable by using the **gl()** function. Its first argument is the number of levels and its second argument is the number of replications of each level. We create a factor with three levels, each repeated 10 times, so every block of 10 values in **x** gets its own level.

*f <- gl(3,10)*

We calculate the average of each group.

*tapply(x, f, mean)*
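The same pattern can be followed on a smaller scale, where the grouping produced by gl() and the group means are easy to verify. The values below are hypothetical and chosen so the means are obvious.

```r
groups <- gl(3, 2)            # 3 levels, each repeated twice
as.integer(groups)            # 1 1 2 2 3 3

values <- c(1, 3, 10, 20, 100, 300)   # hypothetical values, two per group
group_means <- tapply(values, groups, mean)
group_means                   # level 1: 2, level 2: 15, level 3: 200
```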

We create a data frame “**a**” as a combination of the “**x**” and “**y**” vectors.

*x <- 1:20*

*y<- factor(rep(letters[1:5], each= 4))*

*a<-data.frame(x,y)*

We calculate the sum of the “x” values for each level of the factor “y”.

*tapply(x,y, sum)*

We now use the **iris** dataset. We attach it so that its columns can be referenced by name:

*attach(iris)*

We can view the **iris** dataset as:

*View(iris)*

We check the structure of the **iris** dataset.

*str(iris)*

We calculate the average Petal.Length for each Species. Here, Species is a factor variable.

*tapply(iris$Petal.Length, Species, mean)*
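attach() keeps the columns on the search path until detach() is called, which can mask other objects that share the same names. One alternative, sketched here as a suggestion rather than as the approach used in the text, is with(), which makes the columns visible only for a single expression.

```r
# with() evaluates the expression using the columns of iris,
# without changing the search path
petal_means <- with(iris, tapply(Petal.Length, Species, mean))
petal_means
```

The result has one entry per level of Species, just like the tapply() call above.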

**by()**

by() is the data frame counterpart of tapply(): it splits a data frame by one or more factors and applies a function to each subset.

The syntax of by() is:

by(data, INDICES, FUN, ..., simplify = TRUE)

We calculate the average of the four numeric columns of the iris data for each Species.

*by(iris[,1:4], iris$Species,colMeans)*
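To see what by() hands to the function, it can help to try it on a tiny data frame. The data frame `df` and its column names are hypothetical, and stacking the per-group results with do.call(rbind, ...) is one common idiom rather than something by() requires.

```r
# Hypothetical data frame: two numeric columns and a grouping column
df <- data.frame(
  height = c(1, 3, 10, 30),
  weight = c(2, 4, 20, 40),
  group  = factor(c("a", "a", "b", "b"))
)

# colMeans() is called once per group, on the rows belonging to that group
res <- by(df[, c("height", "weight")], df$group, colMeans)

# Stack the per-group results into a matrix, one row per group
means_matrix <- do.call(rbind, res)
means_matrix
```

Here group “a” has mean height 2 and mean weight 3, and group “b” has mean height 20 and mean weight 30.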

**mapply()**

mapply() stands for multivariate apply: it applies a function to the first elements of each argument, then to the second elements, and so on.

We apply the rep function to replicate values. The first argument is the function to apply. The second argument is the vector of values to replicate, and the third argument gives the number of times to replicate each value. So the values 1 to 4 are replicated 4, 3, 2 and 1 times respectively.

*mapply(rep, 1:4, 4:1)*
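The pairing of arguments can be spelled out by writing the individual rep() calls by hand. This is a small sketch for comparison; the names `reps` and `manual` are used only here.

```r
reps <- mapply(rep, 1:4, 4:1)

# The call above pairs the arguments position by position,
# which corresponds to these four separate calls:
manual <- list(rep(1, 4), rep(2, 3), rep(3, 2), rep(4, 1))

lengths(reps)      # 4 3 2 1
unlist(reps)       # 1 1 1 1 2 2 2 3 3 4
```

Because the four results have different lengths, mapply() cannot simplify them and returns a list.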

We create two list objects:

*blue<- list(a = c(1:10), b = c(11:20))*

*red <- list(c = c(21:30), d = c(31:40))*

We pass the vectors “a” and “b” from list “blue” and “c” and “d” from list “red” to mapply(). The sum is taken position by position, so the i-th result is the sum of the i-th elements of all four vectors.

*mapply(sum, blue$a,blue$b,red$c, red$d)*

We can also sum the two lists element-wise. This sums “a” with “c” and “b” with “d”. In the output, “a” and “b” label the total of “a” with “c” and of “b” with “d” respectively.

*mapply(sum,blue,red)*
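The two calls differ in what counts as “one element”, which the following sketch makes explicit by showing the values each call returns. The result names `per_position` and `per_pair` are chosen for this illustration.

```r
blue <- list(a = 1:10, b = 11:20)
red  <- list(c = 21:30, d = 31:40)

# Four vectors: sum() is called once per position, a[i] + b[i] + c[i] + d[i]
per_position <- mapply(sum, blue$a, blue$b, red$c, red$d)
per_position        # 64 68 72 ... 100 (ten values, in steps of 4)

# Two lists: sum() is called once per pair of list elements
per_pair <- mapply(sum, blue, red)
per_pair            # a = 310, b = 510
```

The first result has ten entries because each vector has ten elements. The second has two entries because each list has two elements, and it takes its names from the first list.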

**split()**

The syntax of split() is:

split(x, f, drop = FALSE, ...)

It divides the data in the vector **x** into groups defined by **f**.

x – vector or data frame containing values to be divided into groups

f – a factor variable

drop – whether levels that do not occur should be dropped

*x <- c(rnorm(10), runif(10), rnorm(10,1) )*

*f <- gl(3,10)*

We split vector “x” by the factor “f”. The result is a list with one element per level of “f”, each holding the values of “x” that belong to that level.

*split(x, f)*
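split() ties the earlier functions together: splitting and then applying a function to each piece gives the same group summaries as tapply(). The sketch below shows this on hypothetical data and also illustrates the drop argument with a factor that has an unused level.

```r
nums <- c(1, 3, 10, 20, 100, 300)     # hypothetical values
grp  <- factor(c("a", "a", "b", "b", "c", "c"))

pieces <- split(nums, grp)            # list with elements $a, $b, $c

# split() followed by sapply() gives the same group means as tapply()
via_split  <- sapply(pieces, mean)    # a = 2, b = 15, c = 200
via_tapply <- tapply(nums, grp, mean)

# drop = TRUE removes levels that have no data
sparse <- factor(c("a", "a", "b"), levels = c("a", "b", "unused"))
length(split(1:3, sparse))                # 3 (includes an empty "unused" group)
length(split(1:3, sparse, drop = TRUE))   # 2
```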

We are going to use the airquality dataset, so we first check its description.

*?airquality*

*head(airquality)*

We split the dataset by the Month variable, which gives one data frame per month.

*s <- split(airquality, airquality$Month)*

We calculate the average of each column for every month in the list “s”. The na.rm = TRUE argument is passed on to colMeans() so that missing values are ignored.

*lapply(s, colMeans, na.rm = TRUE)*
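Since every month yields a result of the same length, sapply() can be used in place of lapply() to collect the monthly means into a single matrix. This is a sketch of that variation; the names `by_month` and `monthly_means` are chosen for the example.

```r
by_month <- split(airquality, airquality$Month)

# sapply() simplifies the per-month results into one matrix:
# one row per column of airquality, one column per month
monthly_means <- sapply(by_month, colMeans, na.rm = TRUE)

dim(monthly_means)        # 6 5
colnames(monthly_means)   # "5" "6" "7" "8" "9"
```

The matrix form makes it easier to compare a variable such as Temp across months than the list returned by lapply().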