Using ggplot2 for Data Analytics in R On Diamond Data Set

We use the diamonds dataset to explore ggplot() and qplot().

Load the ggplot2 package:

library(ggplot2)

View the diamonds dataset:

View(diamonds)

We inspect the structure of the diamonds dataset.

str(diamonds)

We check the first 6 observations of the diamonds dataset.

head(diamonds)

We check the summary of the variables in the diamonds dataset. It shows basic descriptive statistics for each variable.

summary(diamonds)

We check the dimensions of diamonds. It has 53940 rows and 10 columns.

dim(diamonds)
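
The inspection steps above can also be run together in a non-interactive script. This is a minimal sketch: it skips View(), which opens a viewer window, and adds a stopifnot() check (not part of the original walkthrough) to confirm the dimensions quoted in the text.

```r
library(ggplot2)

# Structure: column names, types, and the first few values
str(diamonds)

# First 6 rows and per-column summary statistics
print(head(diamonds))
print(summary(diamonds))

# Dimensions as c(rows, columns)
d <- dim(diamonds)
stopifnot(d[1] == 53940, d[2] == 10)
```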

We plot a histogram in ggplot2 by passing the diamonds dataset to ggplot() and adding geom_histogram(). Aesthetic mappings, set with aes(), describe how variables in the data are mapped to visual properties (aesthetics) of geoms; inside aes() we refer to a column by its name (price) rather than as diamonds$price. We use binwidth to set the width of the bins.

ggplot(data = diamonds) + geom_histogram(binwidth = 500, aes(x = price))
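
To check what the histogram is actually counting, the plot can be stored in a variable and its computed bins pulled out with layer_data(). This is a sketch; the variable names p and bins are illustrative.

```r
library(ggplot2)

p <- ggplot(data = diamonds) +
  geom_histogram(binwidth = 500, aes(x = price))

# One row per bar: count is the bar height, xmin/xmax are the bin edges
bins <- layer_data(p)
head(bins[, c("xmin", "xmax", "count")])

# Every diamond falls into exactly one bin
sum(bins$count) == nrow(diamonds)
```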

We label the x-axis and y-axis with xlab() and ylab(), and add a title to the graph with ggtitle().

ggplot(data = diamonds) + geom_histogram(binwidth = 500, aes(x = price)) + ggtitle("Diamond Price Distribution") + xlab("Diamond Price U$") + ylab("Frequency")

We use theme_minimal() to draw the graph with a minimal theme on a white background.

ggplot(data = diamonds) + geom_histogram(binwidth = 500, aes(x = price)) + ggtitle("Diamond Price Distribution") + xlab("Diamond Price U$") + ylab("Frequency") + theme_minimal()

The graph shows that most diamonds are priced below $5,000.

We can get the average diamond price.

mean(diamonds$price)

We can get the median diamond price.

median(diamonds$price)
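
Comparing the two values gives a quick check on the shape of the distribution: a mean well above the median suggests a right-skewed distribution, with a few expensive diamonds pulling the mean upward. A small sketch, with illustrative variable names:

```r
library(ggplot2)

price_mean   <- mean(diamonds$price)
price_median <- median(diamonds$price)

# A positive gap means the mean sits above the median
price_mean - price_median

# Share of diamonds priced below the mean
mean(diamonds$price < price_mean)
```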

xlim() sets the limits of the x-axis.

ggplot(data = diamonds) + geom_histogram(binwidth = 500, aes(x = price)) + ggtitle("Diamond Price Distribution") + xlab("Diamond Price U$ – Binwidth 500") + ylab("Frequency") + theme_minimal() + xlim(0, 2500)
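
Note that xlim() sets the limits of the x scale, so observations outside the range are dropped (ggplot2 warns about removed rows) before the histogram is computed. If the intent is only to zoom in, coord_cartesian() is an alternative that keeps all the data. A minimal sketch comparing the two; the names base, p_lim and p_zoom are illustrative:

```r
library(ggplot2)

base <- ggplot(data = diamonds) +
  geom_histogram(binwidth = 500, aes(x = price)) +
  theme_minimal()

# Scale limits: prices above 2500 are discarded before binning
p_lim <- base + xlim(0, 2500)

# Coordinate limits: all prices are binned, then the view is cropped
p_zoom <- base + coord_cartesian(xlim = c(0, 2500))

# Total number of diamonds counted by each version
n_lim  <- sum(suppressWarnings(layer_data(p_lim))$count, na.rm = TRUE)
n_zoom <- sum(layer_data(p_zoom)$count)
c(n_lim = n_lim, n_zoom = n_zoom)
```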

We change the binwidth to 100 to see how the graph changes.

ggplot(data = diamonds) + geom_histogram(binwidth = 100, aes(x = price)) + ggtitle("Diamond Price Distribution") + xlab("Diamond Price U$ – Binwidth 100") + ylab("Frequency") + theme_minimal() + xlim(0, 2500)

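After changing binwidth from 500 to 100, the frequency shown for diamonds between $500 and $1,000 dropped from about 10,000 to about 2,000 per bar, because each bar now covers a narrower price range.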
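
The effect of narrower bins can be checked with base R. In this sketch the bin edges are chosen for illustration and need not coincide with the edges ggplot2 picks by default; the variable names are illustrative too.

```r
library(ggplot2)

# Diamonds priced from $500 up to (but not including) $1,000
in_range <- diamonds$price >= 500 & diamonds$price < 1000
total_in_range <- sum(in_range)

# Split the same range into five $100-wide bins
bins_100 <- table(cut(diamonds$price[in_range],
                      breaks = seq(500, 1000, by = 100),
                      right = FALSE))

total_in_range
bins_100

# The five narrow bins add up to the single wide bin
sum(bins_100) == total_in_range
```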

We change the binwidth again, to 50, to see how the distribution changes.

ggplot(data = diamonds) + geom_histogram(binwidth = 50, aes(x = price)) + ggtitle("Diamond Price Distribution") + xlab("Diamond Price U$ – Binwidth 50") + ylab("Frequency") + theme_minimal() + xlim(0, 2500)

We can facet the histogram with facet_wrap() to compare the price distribution for each cut of diamond.

ggplot(data = diamonds) + geom_histogram(binwidth = 100, aes(x = price)) + ggtitle("Diamond Price Distribution by Cut") + xlab("Diamond Price U$") + ylab("Frequency") + theme_minimal() + facet_wrap(~cut)
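
When reading the faceted plot, it helps to know how many diamonds each cut contains, since panels with few diamonds look flat on a shared y-axis. A short sketch that tabulates the counts and, as one option, gives each panel its own y-axis through the scales argument of facet_wrap(); the names cut_counts and p_free are illustrative:

```r
library(ggplot2)

# Number of diamonds in each cut category
cut_counts <- table(diamonds$cut)
cut_counts

# Same faceted histogram, but with an independent y-axis per panel
p_free <- ggplot(data = diamonds) +
  geom_histogram(binwidth = 100, aes(x = price)) +
  facet_wrap(~cut, scales = "free_y") +
  theme_minimal()
p_free
```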

The frequencies differ widely between the cuts.

We can draw a scatter plot of carat against price.

qplot(carat, price, data = diamonds)

Next, we take a sample so that plots of the diamonds dataset are easier to read.

To create a sample of the dataset, we use the sample() function.

First, we open the help page for sample() to read its description.

?sample

Sampling is random, so each run would otherwise select different rows. To make the sample reproducible, we call set.seed() first.

set.seed(2)

With the same seed, the sample is the same every time we run the code.

We use sample(nrow(diamonds), 1000). Here nrow(diamonds) returns 53940, so sample() picks 1000 row numbers at random from 1 to 53940.

Indexing with diamonds[sample(nrow(diamonds), 1000), ] selects the sampled rows and keeps all columns, so it returns 1000 observations of the diamonds dataset. We store these 1000 observations in a new dataset, dsmall.

dsmall <- diamonds[sample(nrow(diamonds), 1000),]

dsmall
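
The effect of set.seed() can be checked directly. In this sketch (the names idx1 and idx2 are illustrative) the same seed is set before two separate calls to sample(), and the resulting row numbers are compared:

```r
library(ggplot2)

set.seed(2)
idx1 <- sample(nrow(diamonds), 1000)

set.seed(2)
idx2 <- sample(nrow(diamonds), 1000)

# Same seed, same row numbers
identical(idx1, idx2)

# Row indexing keeps all 10 columns for the 1000 sampled rows
dsmall <- diamonds[idx1, ]
dim(dsmall)
```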

We create a scatter plot of carat against price for the dsmall dataset. We map the colour of each point to the color of the diamond and pass a point size of 4. Because size is not wrapped in I(), qplot() treats it as a mapped aesthetic and adds a legend for it.

qplot(carat, price, data = dsmall, colour = color, size = 4)

We map the shape of each point to the cut of the diamond.
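
The code for this step is missing from the original. A minimal sketch, assuming it mirrors the previous call with the shape aesthetic mapped to cut, written here with ggplot() (a roughly equivalent qplot() call is given in a comment). The name p_shape is illustrative.

```r
library(ggplot2)

set.seed(2)
dsmall <- diamonds[sample(nrow(diamonds), 1000), ]

# Assumed reconstruction; roughly qplot(carat, price, data = dsmall, shape = cut)
p_shape <- ggplot(dsmall, aes(x = carat, y = price, shape = cut)) +
  geom_point()
p_shape
```

ggplot2 may warn that shapes are not advised for an ordered variable such as cut; the plot is still drawn.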

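We can set the size of the points directly by wrapping the value in I(). Note that colour = "red" is not wrapped in I() here, so qplot() treats it as a mapped value and draws the points in a default palette colour, with a legend entry labelled red.

qplot(carat, price, data = dsmall, colour = "red", size = I(2))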

Wrapping the colour in I() draws the points in literal red. We also add the alpha parameter, wrapped in I() as well, to make the points semi-transparent so that we can see where the bulk of the points lie when many observations overlap.

qplot(carat, price, data = dsmall, colour = I("red"), size = I(2), alpha = I(1 / 10))
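
The difference between mapping a value and setting it is easier to see in ggplot() syntax, where values inside aes() are mapped through a scale and values outside it are used literally (which is what I() achieves in qplot()). A sketch with illustrative names:

```r
library(ggplot2)

set.seed(2)
dsmall <- diamonds[sample(nrow(diamonds), 1000), ]

# Mapped: "red" is treated as a data value, so the scale picks the colour
p_mapped <- ggplot(dsmall, aes(carat, price)) +
  geom_point(aes(colour = "red"))

# Set: points are literally red, size 2, 10% opaque
p_set <- ggplot(dsmall, aes(carat, price)) +
  geom_point(colour = "red", size = 2, alpha = 1 / 10)

# Colours actually used by each plot
unique(layer_data(p_mapped)$colour)
unique(layer_data(p_set)$colour)
```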

We draw a scatter plot of carat against price for the dsmall dataset. We also add "smooth" to the geom parameter to overlay a smoothed line, which shows the trend in price as carat increases.

qplot(carat, price, data = dsmall, geom = c("point", "smooth"))

We draw the same scatter plot of carat against price for the full diamonds dataset, again adding "smooth" to the geom parameter to overlay a smoothed line.

qplot(carat, price, data = diamonds, geom = c("point", "smooth"))
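
For reference, a sketch of the smoothed scatter plot on dsmall in ggplot() syntax. The smoothing method is set to "loess" explicitly here as an assumption, to avoid relying on the method geom_smooth() picks automatically; the shaded band around the line is its confidence interval. The names p_smooth and fit are illustrative.

```r
library(ggplot2)

set.seed(2)
dsmall <- diamonds[sample(nrow(diamonds), 1000), ]

p_smooth <- ggplot(dsmall, aes(x = carat, y = price)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x)  # method chosen for illustration

# Layer 2 holds the fitted curve (y) and its band (ymin, ymax)
fit <- layer_data(p_smooth, 2)
head(fit[, c("x", "y", "ymin", "ymax")])
```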

We use boxplots to compare the price per carat across the different colors of diamonds.

qplot(color, price / carat, data = diamonds, geom = "boxplot")
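
The centre line of each box is the median, which can also be computed numerically. A small sketch using tapply(); ppc is an illustrative name for price per carat:

```r
library(ggplot2)

# Price per carat for every diamond
ppc <- diamonds$price / diamonds$carat

# Median price per carat within each colour grade
median_ppc <- tapply(ppc, diamonds$color, median)
round(median_ppc)
```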

We draw jittered points, using the geom parameter, to explore how the distribution of price per carat varies with the colour of the diamonds. The alpha parameter makes the points semi-transparent, so regions with more observations appear darker.

qplot(color, price / carat, data = diamonds, geom = "jitter", alpha = I(1 / 5))

As we decrease the alpha value, only the regions where many observations overlap still appear dark.

qplot(color, price / carat, data = diamonds, geom = "jitter", alpha = I(1 / 50))
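
Why a lower alpha reveals density can be sketched with a little arithmetic. Assuming standard alpha compositing, where each new point covers a fraction alpha of whatever still shows through, n overlapping points have a combined opacity of 1 - (1 - alpha)^n. The helper name stacked_opacity is hypothetical.

```r
# Combined opacity of n points drawn on top of each other
stacked_opacity <- function(alpha, n) 1 - (1 - alpha)^n

# With alpha = 1/5, five overlapping points are already fairly dark
stacked_opacity(1 / 5, 5)

# With alpha = 1/50 the same five points are still faint,
# so only heavily overplotted regions look dark
stacked_opacity(1 / 50, 5)
stacked_opacity(1 / 50, 50)
```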

We create a histogram of carat for the diamonds dataset. We use the fill parameter to fill the bars according to the color of the diamonds.

qplot(carat, data = diamonds, geom = "histogram", fill = color)

We create a density plot of carat, with one curve for each color of diamond.

qplot(carat, data = diamonds, geom = "density", colour = color)

We set binwidth to 0.01 to use very narrow bins, and limit carat to the range 0 to 3. Each facet shows the count of diamonds by carat for one color.

qplot(carat, data = diamonds, facets = color ~ ., geom = "histogram", binwidth = 0.01, xlim = c(0, 3))

We draw a scatter plot of price per carat against carat for the dsmall dataset, and add a smoothed curve.

qplot(carat, price / carat, data = dsmall,
      ylab = expression(frac(price, carat)),
      xlab = "Weight (carats)",
      main = "Small diamonds",
      xlim = c(.2, 1)
) + geom_smooth()
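
Since recent ggplot2 releases mark qplot() as deprecated, here is a sketch of the same plot written with ggplot(). The smoothing method is an assumption made for illustration, and the names p_small and fit_small are illustrative.

```r
library(ggplot2)

set.seed(2)
dsmall <- diamonds[sample(nrow(diamonds), 1000), ]

p_small <- ggplot(dsmall, aes(x = carat, y = price / carat)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x) +  # assumed method
  xlim(0.2, 1) +
  labs(
    x = "Weight (carats)",
    y = expression(frac(price, carat)),
    title = "Small diamonds"
  )

# Building the plot warns about rows that fall outside xlim;
# the smoothed curve is fitted only to the rows inside the limits
fit_small <- suppressWarnings(layer_data(p_small, 2))
range(fit_small$x)
```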
