Functions in R
We introduce some basic functions that make day-to-day work in R easier.
apply()
The apply() function applies a function to the margins (rows, columns or higher dimensions) of an array or matrix.
The syntax of the apply() function is:
apply(X, MARGIN, FUN, ...)
X = an array, including a matrix.
MARGIN = a vector giving the subscripts over which the function will be applied (1 = rows, 2 = columns).
FUN = the function to be applied.
... = optional arguments passed on to FUN.
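As a small deterministic sketch of how the optional ... arguments reach FUN, the following uses a hypothetical 2 x 3 matrix m_demo with one missing value. The object names are made up for illustration only.

```r
# Hypothetical 2 x 3 matrix with one missing value (filled column by column)
m_demo <- matrix(c(1, 2, 3, 4, NA, 6), nrow = 2, ncol = 3)

# MARGIN = 1 applies sum() to each row;
# na.rm = TRUE travels through `...` to sum()
row_totals <- apply(m_demo, 1, sum, na.rm = TRUE)

# MARGIN = 2 applies sum() to each column
col_totals <- apply(m_demo, 2, sum, na.rm = TRUE)

print(row_totals)
print(col_totals)
```

Without na.rm = TRUE, any row or column containing the NA would return NA.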
We create a matrix of 200 elements with 20 rows and 10 columns. We use the rnorm() function to generate the random numbers.
x <- matrix(rnorm(200), 20, 10)
To apply a function to rows, we use MARGIN = 1. We use apply() to calculate the sum of each row of matrix x.
rowSums <- apply(x, 1, sum)
rowSums
We can calculate the mean of each of the 20 rows of the matrix by using the mean function in apply().
rowMeans <- apply(x, 1, mean)
We can find the column sums by using MARGIN = 2 and the sum function.
colSums <- apply(x, 2, sum)
colSums
We can also find the mean of each column by using the mean function.
colMeans1 <- apply(x, 2, mean)
colMeans1
We create a 2 x 2 x 2 array a. Only the first 8 values of 1:20 are used to fill it:
a <- array(1:20, c(2, 2, 2))
a
We can find the mean of each row (taken across all columns and both matrices) by using the following code:
apply(a, 1, mean)
We can find the mean of each column in the same way with MARGIN = 2, that is, apply(a, 2, mean).
We can also find the mean for each combination of row and column by using MARGIN = c(1, 2), that is, apply(a, c(1, 2), mean).
We can also apply sum over the third dimension. It calculates the sum of each of the two matrices separately.
apply(a, 3, sum)
It shows the sum of the first matrix as 10.
It shows the sum of the second matrix as 26.
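The different margins of a three-dimensional array can be compared side by side. This is a minimal sketch on a hypothetical array a_demo holding the values 1 to 8; the names are illustrative and not part of the original example.

```r
# 2 x 2 x 2 array holding the values 1 to 8
a_demo <- array(1:8, dim = c(2, 2, 2))

# MARGIN = 2: one mean per column, taken over rows and both slices
col_means <- apply(a_demo, 2, mean)

# MARGIN = c(1, 2): one mean per (row, column) cell,
# taken over the third dimension
cell_means <- apply(a_demo, c(1, 2), mean)

# MARGIN = 3: one sum per 2 x 2 slice
slice_sums <- apply(a_demo, 3, sum)

print(col_means)
print(cell_means)
print(slice_sums)
```

The result has one entry per value of the chosen margin, so MARGIN = c(1, 2) returns a 2 x 2 matrix while MARGIN = 3 returns a vector of length 2.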
lapply()
lapply() returns a list of the same length as X, each element of which is the result of applying FUN to the corresponding element of X. It takes three arguments: a list, a function and any further arguments to pass to that function.
We create a list x that stores an integer vector "a" and a vector "b" of random numbers.
x <- list(a = 1:5, b = rnorm(10))
We want to find the mean of each element of list x:
lapply(x, mean)
We apply the function runif() to the vector "x". lapply() first coerces x to a list and then applies runif() to every element of x.
x <- 1:4
lapply(x,runif)
runif(n) generates n random numbers from a uniform distribution. So the result is a list whose elements contain 1, 2, 3 and 4 random numbers respectively.
We can also specify the min and max parameters to generate random numbers between these limits.
lapply(x, runif, min=0, max=10)
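To check what lapply() produced here without depending on the random values themselves, one can inspect the length and range of each element. This is a sketch only; the seed and the name draws are arbitrary choices for illustration.

```r
set.seed(1)  # arbitrary seed, only so that repeated runs give the same draws

# Each element n of 1:4 becomes the first argument of runif(),
# so element n of the result holds n draws between 0 and 10
draws <- lapply(1:4, runif, min = 0, max = 10)

# lengths() reports the length of every list element
print(lengths(draws))

# All draws should fall inside the requested interval
print(all(unlist(draws) >= 0 & unlist(draws) <= 10))
```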
We create a list object "m":
m <- list(a = matrix(1:4, 2, 2), b = matrix(1:6, 3, 2))
We define an anonymous function, function(x) x[,1], to apply to the list elements. The list elements are matrices.
The function extracts the first column of each matrix.
lapply(m,function(x) x[,1])
We create a list "x" of four elements, three of which contain random numbers. The element "b" contains 10 random numbers with mean 0 (the default). The element "c" contains 20 random numbers with mean 1. The element "d" contains 100 random numbers with mean 5.
x <- list(a = 1:4, b = rnorm(10), c = rnorm(20, 1), d = rnorm(100, 5))
We find the mean value of each element of the list.
lapply(x, mean)
unlist()
unlist() simplifies list output to a vector.
unlist(lapply(x, mean))
sapply()
sapply() works like lapply() but simplifies the output to a vector or matrix where possible.
sapply(x, mean)
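The relationship between lapply(), unlist() and sapply() can be checked on fixed values. The following is a sketch with a hypothetical list vals, chosen so that the means are easy to verify by hand.

```r
# Hypothetical list with known means: 2.5, 4 and 10
vals <- list(a = 1:4, b = c(2, 4, 6), c = 10)

as_list <- lapply(vals, mean)  # list with elements a, b, c
as_vec  <- sapply(vals, mean)  # named numeric vector

# For scalar results, sapply() gives the same answer as unlist(lapply())
print(as_vec)
print(identical(as_vec, unlist(as_list)))
```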
We create a matrix "m" of 30 rows and 3 columns. We use cbind(rnorm(30, 0), rnorm(30, 2), rnorm(30, 5)) to combine 30 random numbers with mean 0, 30 random numbers with mean 2 and 30 random numbers with mean 5.
m <- matrix(cbind(rnorm(30, 0), rnorm(30, 2), rnorm(30, 5)), nrow = 30, ncol = 3)
We find the mean of the first column as:
mean(m[,1])
We find the mean of the second column as:
mean(m[,2])
We find the mean of the third column as:
mean(m[,3])
We can find all three column means in one call by using function(x) mean(m[,x]).
The function computes the mean of column x of matrix "m", and sapply() calls it for x = 1, 2 and 3.
sapply(1:3, function(x) mean(m[,x]))
We use the length() function to count the number of elements.
The function function(x) length(x[x<0]) counts the number of elements in each column of the matrix whose value is less than 0.
apply(m, 2, function(x) length(x[x<0]))
We use function(x) mean(x[x>0]) to find the mean of the values greater than 0 in each column of matrix "m".
apply(m, 2, function(x) mean(x[x>0]))
We want to find the squares of the numbers 1 to 3. The simplify parameter controls whether the output is a list or a vector. If simplify = TRUE (the default), the output is simplified to a vector or matrix where possible. If simplify = FALSE, the output is a list.
Here we use simplify = FALSE, so the output is shown in list form.
sapply(1:3, function(x) x^2, simplify = FALSE)
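The effect of simplify can be seen by comparing the types of the results. This sketch also includes a case where sapply() returns a matrix; the object names are illustrative assumptions.

```r
# Default simplify = TRUE: scalar results collapse to a vector
squares_vec <- sapply(1:3, function(x) x^2)

# simplify = FALSE: results stay in a list, as with lapply()
squares_list <- sapply(1:3, function(x) x^2, simplify = FALSE)

# When every call returns a vector of the same length (here 2),
# sapply() arranges the results as the columns of a matrix
ranges <- sapply(list(1:3, 4:9), range)

print(is.vector(squares_vec) && !is.list(squares_vec))
print(is.list(squares_list))
print(ranges)
```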
tapply()
The tapply() function applies a function to each group of values defined by a factor.
We check the arguments of the tapply function as:
str(tapply)
INDEX = a list of one or more factors, each of the same length as X
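Before turning to real data, here is a minimal sketch of tapply() on hand-made values, with INDEX given first as a single factor and then as a list of two factors. The vectors weights, group and half are hypothetical.

```r
# Six values, each belonging to one of three groups
weights <- c(2, 4, 6, 8, 10, 12)
group   <- factor(c("a", "a", "b", "b", "c", "c"))

# One factor: one mean per level of `group`
group_means <- tapply(weights, group, mean)

# Two factors in a list: one sum per combination of levels,
# returned as a matrix (rows = group, columns = half)
half    <- factor(c("x", "y", "x", "y", "x", "y"))
by_both <- tapply(weights, list(group, half), sum)

print(group_means)
print(by_both)
```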
We use the mtcars dataset to demonstrate tapply().
We check the details of the mtcars dataset.
?mtcars
We want to calculate the average weight of the cars for each number of cylinders.
tapply(mtcars$wt,mtcars$cyl,mean)
We create a vector of 30 random numbers as:
x <- c(rnorm(10), runif(10), rnorm(10, 1))
We create a factor variable by using the gl() function. Its first argument is the number of levels and its second argument is the number of replications of each level. We create a factor with three levels, each repeated 10 times, so that each block of 10 numbers in x gets its own level.
f <- gl(3, 10)
We calculate the average of x within each level of the factor.
tapply(x, f, mean)
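The factor produced by gl() can be inspected directly to confirm how the levels line up with the data. A short sketch, using the illustrative name f_demo:

```r
# Three levels, each repeated 10 times in consecutive blocks
f_demo <- gl(3, 10)

print(nlevels(f_demo))  # number of levels
print(length(f_demo))   # total length, matching the 30 values to be grouped
print(table(f_demo))    # how many observations fall in each level
```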
We create a data frame "a" by combining the vectors "x" and "y".
x <- 1:20
y <- factor(rep(letters[1:5], each = 4))
a <- data.frame(x, y)
We calculate the sum of the "x" values for each level of "y".
tapply(x,y, sum)
We use the iris dataset. We attach the iris dataset, so that its columns can be referred to by name, by using this code:
attach(iris)
We can view the iris dataset as:
View(iris)
We check the structure of the iris dataset.
str(iris)
We calculate the average Petal.Length for each Species. Here, Species is a factor variable.
tapply(iris$Petal.Length, Species, mean)
by()
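by() is the counterpart of tapply() for data frames: it splits a data frame by one or more factors and applies a function to each subset.
The syntax of by() is:
by(data, INDICES, FUN, ..., simplify = TRUE)
We calculate the average of the four numeric columns of the iris data for each Species.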
by(iris[,1:4], iris$Species,colMeans)
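The same pattern can be traced on a tiny data frame where the group means are easy to verify by hand. This is a sketch under the assumption of a hypothetical data frame df_demo with two numeric columns and one grouping factor.

```r
# Hypothetical data frame: two numeric columns and a grouping factor
df_demo <- data.frame(
  len = c(1, 3, 5, 7),
  wid = c(2, 4, 6, 8),
  grp = factor(c("g1", "g1", "g2", "g2"))
)

# colMeans() is applied to the numeric columns of each group's rows
res <- by(df_demo[, c("len", "wid")], df_demo$grp, colMeans)

# Individual groups can be extracted by level name
print(res[["g1"]])
print(res[["g2"]])
```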
mapply()
The mapply() function stands for multivariate apply. It applies a function to the first elements of each argument, then to the second elements, and so on.
We use the rep function to replicate values. The first argument of mapply() is the function to apply. The second argument is the vector of values passed to rep(). The third argument is the number of times to replicate each value. So the values 1 to 4 are replicated 4, 3, 2 and 1 times respectively.
mapply(rep, 1:4, 4:1)
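To make the pairing of arguments explicit, the call can be compared with the equivalent list written out by hand. A sketch, with reps and by_hand as illustrative names:

```r
# mapply() pairs the i-th element of 1:4 with the i-th element of 4:1
reps <- mapply(rep, 1:4, 4:1)

# The same result spelled out call by call
by_hand <- list(rep(1L, 4), rep(2L, 3), rep(3L, 2), rep(4L, 1))

print(reps)
print(identical(reps, by_hand))
```

Because the four results have different lengths, mapply() cannot simplify them and returns a list.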
We create two list objects as:
blue <- list(a = c(1:10), b = c(11:20))
red <- list(c = c(21:30), d = c(31:40))
We sum the vectors "a" and "b" from list "blue" and "c" and "d" from list "red" element by element. The result is a vector of 10 values; the first is 1 + 11 + 21 + 31 = 64.
mapply(sum, blue$a, blue$b, red$c, red$d)
We can also sum the two lists element-wise. We sum "a" with "c" and "b" with "d". In the output, "a" and "b" hold the total sums of "a" with "c" and "b" with "d" respectively.
mapply(sum,blue,red)
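The two calls iterate over different things: the first walks along the vectors, the second walks along the list elements. The following sketch rebuilds the lists under the hypothetical names blue_demo and red_demo to show the difference in the shape of the result.

```r
blue_demo <- list(a = 1:10, b = 11:20)
red_demo  <- list(c = 21:30, d = 31:40)

# Four vectors of length 10: sum() is called 10 times,
# each time on one element from every vector
elementwise <- mapply(sum, blue_demo$a, blue_demo$b, red_demo$c, red_demo$d)

# Two lists of length 2: sum() is called twice,
# each time on one whole vector from every list
pairwise <- mapply(sum, blue_demo, red_demo)

print(elementwise)
print(pairwise)
```

The names in the second result come from the first list passed to mapply(), which is why they read "a" and "b".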
split()
The syntax of split() is:
split(x, f, drop = FALSE, ...)
It divides the data in the vector x into groups defined by f.
x = vector or data frame containing values to be divided into groups
f = a factor variable
drop = logical indicating whether levels that do not occur should be dropped
x <- c(rnorm(10), runif(10), rnorm(10, 1))
f <- gl(3, 10)
We split the vector "x" by the factor "f". The result is a list with one element per level, each holding the values of "x" that belong to that level.
split(x, f)
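A deterministic sketch makes the grouping and the drop argument easier to follow. The vectors below (vals, grp, grp_unused) are hypothetical and chosen only for illustration.

```r
vals <- c(5, 1, 4, 2, 3, 6)
grp  <- factor(c("lo", "hi", "lo", "hi", "lo", "hi"))

# One list element per level; levels are ordered alphabetically ("hi", "lo")
parts <- split(vals, grp)
print(parts)

# A factor with a level ("z") that never occurs in the data
grp_unused <- factor(c("a", "a", "b"), levels = c("a", "b", "z"))

keep_all <- split(1:3, grp_unused)               # keeps an empty element for "z"
dropped  <- split(1:3, grp_unused, drop = TRUE)  # omits the unused level

print(names(keep_all))
print(names(dropped))
```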
We are going to use the airquality dataset. So, we first check the description of the dataset.
?airquality
head(airquality)
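We split the dataset by the Month variable. The result is a list with one data frame per month.
s <- split(airquality, airquality$Month)
We calculate the average of each column for every element of the list "s". Missing values are ignored with na.rm = TRUE.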
lapply(s, colMeans, na.rm = TRUE)
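Because every month yields a result of the same length, sapply() can collect the column means into a single matrix instead of a list. This is a sketch of that variation, not part of the original walkthrough; s_demo and month_means are illustrative names.

```r
# airquality ships with base R (datasets package)
s_demo <- split(airquality, airquality$Month)

# sapply() binds the per-month column means into one matrix:
# one row per column of airquality, one column per month
month_means <- sapply(s_demo, colMeans, na.rm = TRUE)

print(dim(month_means))
print(colnames(month_means))
print(round(month_means, 2))
```

The matrix form is convenient when the monthly values need to be compared side by side.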

