Functions in R – apply(), mapply(), tapply(), lapply()
Functions in R
This post introduces a few basic R functions that make day-to-day data work easier: apply(), lapply(), sapply(), tapply(), by(), mapply() and split().
The apply() function applies a function over the margins (rows, columns or higher dimensions) of an array or matrix.
The syntax of the apply() function is:
apply(X, MARGIN, FUN, ...)
X = an array, including a matrix.
MARGIN = a vector giving the subscripts over which the function will be applied (1 for rows, 2 for columns).
FUN = the function to be applied.
... = optional arguments passed on to FUN.
We create a matrix of 200 random numbers with the rnorm() function. It has 20 rows and 10 columns.
x <- matrix(rnorm(200), 20, 10)
To apply a function to each row, we use MARGIN = 1. Here apply() calculates the sum of each row of matrix x. The result is stored as row_sums so that it does not mask the base rowSums() function.
row_sums <- apply(x, 1, sum)
We can calculate the mean of each of the 20 rows by passing the mean function to apply() with MARGIN = 1.
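The row means can be sketched as follows. The call to set.seed() and the name row_means are assumptions added here so that the example is reproducible; they are not part of the original walkthrough.

```r
set.seed(42)                        # assumed seed, for reproducibility only
x <- matrix(rnorm(200), 20, 10)     # same shape as the matrix above
row_means <- apply(x, 1, mean)      # MARGIN = 1: one mean per row
length(row_means)                   # 20, one value per row
all.equal(row_means, rowMeans(x))   # TRUE: agrees with the built-in rowMeans()
```

For plain sums and means, rowMeans() and rowSums() give the same result and are usually faster; apply() is the more general tool because it accepts any function.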
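We can find the sum of each column by using MARGIN = 2 with the sum function. The result is stored as col_sums so that it does not mask the base colSums() function.
col_sums <- apply(x, 2, sum)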
We can also find the mean of every column by using the mean function.
colMeans1 <- apply(x, 2, mean)
We create a 2 x 2 x 2 array a. The array holds 8 values, so 1:8 is enough to fill it:
a <- array(1:8, c(2, 2, 2))
We can find the mean for each row index with apply(a, 1, mean).
We can find the mean for each column index with apply(a, 2, mean).
We can also find the mean for each combination of row and column, taken across the third dimension, by using MARGIN = c(1, 2).
We can also take the sum over the third dimension with MARGIN = 3. This calculates the sum of each of the two 2 x 2 matrices separately.
The sum of the first matrix is 10.
The sum of the second matrix is 26.
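The four calls described above can be put together in a short sketch. The result names (by_row, by_col, by_cell, by_layer) are hypothetical and chosen only for this illustration.

```r
# Same 2 x 2 x 2 array as above: the first layer holds 1 to 4, the second 5 to 8
a <- array(1:8, c(2, 2, 2))

by_row   <- apply(a, 1, mean)        # 4 5
by_col   <- apply(a, 2, mean)        # 3.5 5.5
by_cell  <- apply(a, c(1, 2), mean)  # 2 x 2 matrix with rows (3, 5) and (4, 6)
by_layer <- apply(a, 3, sum)         # 10 26
```

Each MARGIN value names the dimension that is kept; the function is applied over everything else.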
lapply() returns a list of the same length as X, each element of which is the result of applying FUN to the corresponding element of X. It takes a list (or vector), a function, and any further arguments to pass to that function.
We create a list x that stores an integer vector “a” and a vector “b” of random numbers.
x <- list(a = 1:5, b = rnorm(10))
We find the mean of each element of list x with lapply(x, mean).
Next we apply the runif() function to a vector x. lapply() first coerces x to a list and then calls runif() on every element.
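A minimal sketch of this step is shown below. The seed and the name means are assumptions made for the example.

```r
set.seed(42)                        # assumed seed, for reproducibility only
x <- list(a = 1:5, b = rnorm(10))
means <- lapply(x, mean)            # one mean per list element
means$a                             # 3, the mean of 1:5
class(means)                        # "list": lapply() always returns a list
```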
x <- 1:4
runif() draws random numbers from a uniform distribution, and each element of x is passed to it as the first argument n. So the result holds 1, 2, 3 and 4 random numbers respectively.
We can also pass the min and max arguments to generate random numbers between those bounds.
lapply(x, runif, min = 0, max = 10)
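The sketch below checks both points: the number of values drawn for each element, and the effect of min and max. The names draws and bounded, and the seed, are assumptions for illustration.

```r
set.seed(42)                                    # assumed seed
x <- 1:4
draws <- lapply(x, runif)                       # runif(1), runif(2), runif(3), runif(4)
lengths(draws)                                  # 1 2 3 4

bounded <- lapply(x, runif, min = 0, max = 10)  # extra arguments go to runif()
range(unlist(bounded))                          # both ends lie between 0 and 10
```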
We create a list object “m”:
m <- list(a = matrix(1:4, 2, 2), b = matrix(1:6, 3, 2))
We define the anonymous function function(x) x[, 1] and apply it to each list element with lapply(). The list elements are matrices.
The function returns the first column of each matrix.
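Put together, the call might look like the sketch below. The name first_cols is hypothetical.

```r
m <- list(a = matrix(1:4, 2, 2), b = matrix(1:6, 3, 2))
first_cols <- lapply(m, function(x) x[, 1])  # first column of each matrix
first_cols$a                                 # 1 2
first_cols$b                                 # 1 2 3
```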
We create a list “x” of four elements. Element “a” holds the integers 1 to 4, “b” holds 10 standard normal random numbers, “c” holds 20 random numbers with mean 1, and “d” holds 100 random numbers with mean 5.
x <- list(a = 1:4, b = rnorm(10), c = rnorm(20, 1), d = rnorm(100, 5))
We find the mean of each element of the list with lapply(x, mean).
sapply() works like lapply() but simplifies the list output where possible.
The result is then returned as a vector or matrix instead of a list.
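The difference between the two functions can be seen side by side. This is a sketch; the seed and the names as_list and as_vector are assumptions.

```r
set.seed(42)                         # assumed seed
x <- list(a = 1:4, b = rnorm(10), c = rnorm(20, 1), d = rnorm(100, 5))

as_list   <- lapply(x, mean)         # list with elements a, b, c, d
as_vector <- sapply(x, mean)         # named numeric vector with the same values

is.list(as_list)                     # TRUE
is.numeric(as_vector)                # TRUE
as_vector["a"]                       # 2.5, the mean of 1:4
```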
We create a matrix “m” of 30 rows and 3 columns. We use cbind() to combine 30 random numbers with mean 0, 30 with mean 2 and 30 with mean 5 as columns. cbind() already returns a matrix, so no extra call to matrix() is needed.
m <- cbind(rnorm(30, 0), rnorm(30, 2), rnorm(30, 5))
We find the mean of the first column with mean(m[, 1]).
We find the mean of the second column with mean(m[, 2]).
We find the mean of the third column with mean(m[, 3]).
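As a sketch, the three column means can be computed one at a time and compared with the built-in colMeans(). The seed and the name one_by_one are assumptions.

```r
set.seed(42)                                          # assumed seed
m <- cbind(rnorm(30, 0), rnorm(30, 2), rnorm(30, 5))  # 30 x 3 matrix

one_by_one <- c(mean(m[, 1]), mean(m[, 2]), mean(m[, 3]))
one_by_one                                            # roughly 0, 2 and 5
all.equal(one_by_one, colMeans(m))                    # TRUE
```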
We can find all the column means in one call by passing function(x) mean(m[, x]) to sapply() together with the column indices 1:3.
For each index x, the function returns the mean of column x of matrix “m”.
sapply(1:3, function(x) mean(m[, x]))
We use the length() function to count elements.
The function function(x) length(x[x < 0]) counts the values in each column of the matrix that are less than 0.
apply(m, 2, function(x) length(x[x < 0]))
We use function(x) mean(x[x > 0]) to find the mean of the values greater than 0 in each column of matrix “m”.
apply(m, 2, function(x) mean(x[x > 0]))
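We want to find the squares of the numbers 1 to 3. The simplify argument of sapply() controls the form of the output. If simplify = TRUE (the default), the output is a vector or matrix. If simplify = FALSE, the output is a list.
Note that the argument is written in lower case for sapply(); the upper-case SIMPLIFY belongs to mapply(). Writing TRUE and FALSE in full is safer than T and F, which are ordinary variables that can be overwritten.
Here we use simplify = FALSE, so the output is shown in list form.
sapply(1:3, function(x) x^2, simplify = FALSE)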
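Both forms of the call are compared in the sketch below. The names squares_vec and squares_list are hypothetical.

```r
squares_vec  <- sapply(1:3, function(x) x^2)                    # vector: 1 4 9
squares_list <- sapply(1:3, function(x) x^2, simplify = FALSE)  # list of 1, 4 and 9

is.list(squares_vec)    # FALSE
is.list(squares_list)   # TRUE
```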
The tapply() function applies a function to each group of values defined by a factor.
We can check the structure of the tapply() function with str(tapply). Its grouping argument is:
INDEX = a list of one or more factors, each of the same length as X
We use the built-in mtcars dataset to demonstrate tapply().
We can check the details of the mtcars dataset with ?mtcars or str(mtcars).
We want to calculate the average weight of the cars for each number of cylinders.
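A sketch of this calculation, assuming the wt column (weight in 1000 lbs) and the cyl column (number of cylinders) of mtcars are the ones intended. The name avg_wt is hypothetical.

```r
str(mtcars)                                    # 32 cars, 11 numeric variables
avg_wt <- tapply(mtcars$wt, mtcars$cyl, mean)  # mean weight within each cyl group
avg_wt                                         # one value each for 4, 6 and 8 cylinders
```

Because cyl is numeric rather than a factor, tapply() converts it to a factor internally, which is why the result is labelled "4", "6" and "8".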
We create a vector of random numbers:
x <- c(rnorm(10), runif(10), rnorm(10, 1))
We create a factor with the gl() function. Its first argument is the number of levels and its second is the number of replications of each level. Here we create three levels of 10 values each, so that each block of 10 numbers in x belongs to one level.
f <- gl(3, 10)
We calculate the average within each level of the factor.
tapply(x, f, mean)
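We build a data frame “a” from two vectors: “x”, the integers 1 to 20, and “y”, a factor with five levels of four values each.
x <- 1:20
y <- factor(rep(letters[1:5], each = 4))
After combining them with a <- data.frame(x, y), we calculate the sum of the “x” values within each level of “y”.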
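The data frame and the grouped sum can be sketched as below. The exact calls are not shown in the text, so data.frame(x, y) and the name sums are assumptions.

```r
x <- 1:20
y <- factor(rep(letters[1:5], each = 4))
a <- data.frame(x, y)            # assumed construction of the data frame

sums <- tapply(a$x, a$y, sum)    # sum of x within each level of y
sums                             # a = 10, b = 26, c = 42, d = 58, e = 74
```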
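We use the built-in iris dataset. Calling attach(iris) makes its columns available by name, but referring to a column as iris$Species is safer and works whether or not the dataset is attached.
We can view the iris dataset with View(iris) or head(iris).
We can check the structure of the iris dataset with str(iris).
We calculate the average Petal.Length for each Species. Here, Species is a factor variable.
tapply(iris$Petal.Length, iris$Species, mean)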
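A sketch of the inspection and the grouped mean, without attaching the dataset. The name petal_means is hypothetical.

```r
str(iris)      # 150 observations: four numeric columns plus the factor Species
head(iris)     # first six rows

petal_means <- tapply(iris$Petal.Length, iris$Species, mean)
petal_means    # setosa 1.462, versicolor 4.260, virginica 5.552
```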
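by() is the counterpart of tapply() for data frames: it splits a data frame by one or more factors and applies a function to each subset.
The syntax of by() is:
by(data, INDICES, FUN, ..., simplify = TRUE)
We calculate the average of the four numeric columns of the iris data for each Species.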
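One way to write this is sketched below. The choice of colMeans as FUN and the name species_means are assumptions, since the original call is not shown.

```r
# Columns 1 to 4 are the numeric measurements; column 5 is Species
species_means <- by(iris[, 1:4], iris$Species, colMeans)
species_means                                # one vector of four means per species
species_means[["setosa"]]["Petal.Length"]    # 1.462
```

colMeans is used rather than mean because each subset passed to FUN is a data frame, and mean() does not work on a data frame as a whole.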
The mapply() function is a multivariate version of sapply(): it applies a function to the first elements of each argument, then to the second elements, and so on.
We use the rep function to replicate values. The first argument of mapply() is the function to apply. The second argument is the vector of values passed to rep, and the third gives the number of times to replicate each value. So the values 1 to 4 are replicated 4, 3, 2 and 1 times respectively.
mapply(rep, 1:4, 4:1)
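The result can be inspected as in the sketch below; the name reps is hypothetical. The call is equivalent to writing list(rep(1, 4), rep(2, 3), rep(3, 2), rep(4, 1)) by hand.

```r
reps <- mapply(rep, 1:4, 4:1)   # calls rep(1, 4), rep(2, 3), rep(3, 2), rep(4, 1)
is.list(reps)                   # TRUE: the pieces differ in length, so no simplification
lengths(reps)                   # 4 3 2 1
unlist(reps)                    # 1 1 1 1 2 2 2 3 3 4
```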
We create two list objects as:
blue <- list(a = 1:10, b = 11:20)
red <- list(c = 21:30, d = 31:40)
We sum the vectors “a” and “b” from list “blue” and “c” and “d” from list “red” element by element. mapply() passes the i-th element of each vector to sum(), so the result has 10 values, the first being 1 + 11 + 21 + 31 = 64.
mapply(sum, blue$a, blue$b, red$c, red$d)
We can also sum the two lists element-wise, pairing “a” with “c” and “b” with “d”. In the output, “a” and “b” hold the total of “a” with “c” and of “b” with “d” respectively.
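Both sums are sketched below. The call for the list-wise sum is not shown in the text; mapply(sum, blue, red) is one reading that matches the description and should be treated as an assumption, as should the names elementwise and totals.

```r
blue <- list(a = 1:10, b = 11:20)
red  <- list(c = 21:30, d = 31:40)

# Element by element across the four vectors: 64, 68, ..., 100
elementwise <- mapply(sum, blue$a, blue$b, red$c, red$d)

# List by list (assumed call): sum(a, c) and sum(b, d), named after the first list
totals <- mapply(sum, blue, red)
totals                           # a = 310, b = 510
```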
The syntax of split() is:
split(x, f, drop = FALSE, ...)
It divides the data in x into the groups defined by f.
x – vector or data frame containing the values to be divided into groups
f – a factor variable that defines the grouping
drop – whether levels that do not occur should be dropped
x <- c(rnorm(10), runif(10), rnorm(10, 1))
f <- gl(3, 10)
We split the vector “x” by the factor “f” with split(x, f). The result is a list with one element per level of “f”, each holding the values of “x” that belong to that level.
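A sketch of the split, together with a check that split() followed by sapply() reproduces what tapply() does in a single step. The seed and the name groups are assumptions.

```r
set.seed(42)                                 # assumed seed
x <- c(rnorm(10), runif(10), rnorm(10, 1))
f <- gl(3, 10)

groups <- split(x, f)                        # list with elements "1", "2", "3"
names(groups)                                # "1" "2" "3"
lengths(groups)                              # 10 10 10

sapply(groups, mean)                         # same values as tapply(x, f, mean)
```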
We use the built-in airquality dataset. Its description is available with ?airquality.
We split the dataset by the Month variable, which gives one data frame per month.
s <- split(airquality, airquality$Month)
We calculate the average of each column for every element of the list “s”. Passing na.rm = TRUE drops the missing values before averaging.
lapply(s, colMeans, na.rm = TRUE)
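Since every element of the result has the same length, sapply() can be swapped in for lapply() to get a single matrix. This is a sketch of that variation, not the approach used above; the name monthly is hypothetical.

```r
s <- split(airquality, airquality$Month)       # one data frame per month (5 to 9)
monthly <- sapply(s, colMeans, na.rm = TRUE)   # rows are variables, columns are months
dim(monthly)                                   # 6 5
colnames(monthly)                              # "5" "6" "7" "8" "9"
```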