Using ggplot2 for Data Analytics in R on the Diamonds Data Set

The diamonds data set is part of the ggplot2 package. Install the ggplot2 package if you do not already have it.

install.packages("ggplot2")

Load the ggplot2 package.

library(ggplot2)

View the diamonds dataset.

View(diamonds)

Contact at TJT@TechnicalJockey.com , if you are looking for an Instructor Based Online Training !

 

We inspect the structure of the diamonds dataset.

str(diamonds)

We check the first 6 observations of the diamonds dataset.

head(diamonds)

We check the summary of the variables in the diamonds dataset. It shows basic descriptive statistics for each column.

summary(diamonds)

We check the dimensions of diamonds. It has 53940 rows and 10 columns.

dim(diamonds)

We plot a histogram in ggplot2 by passing the diamonds dataset to the ggplot() function and adding geom_histogram(). Aesthetic mappings, written with aes(), describe how variables in the data are mapped to visual properties (aesthetics) of geoms. Inside aes() we refer to a column by its bare name (price), not as diamonds$price, because ggplot2 looks the name up in the data for us. We use binwidth to set the width of the bins.

ggplot(data = diamonds) + geom_histogram(binwidth = 500, aes(x = price))
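To see what geom_histogram() computes from the mapped variable, the binned data behind the plot can be inspected. The sketch below assumes ggplot2's layer_data() helper; the object names p and bins are only illustrative.

```r
library(ggplot2)

# Map price to the x aesthetic using the bare column name.
p <- ggplot(data = diamonds, aes(x = price)) +
  geom_histogram(binwidth = 500)

# layer_data() returns the data frame computed for a layer:
# one row per bin, with its boundaries and the number of diamonds in it.
bins <- layer_data(p, 1)
head(bins[, c("xmin", "xmax", "count")])
```

Each row is one bar of the histogram, so the counts add up to the number of rows in diamonds.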

We add labels for the x-axis and y-axis by using the xlab() and ylab() functions. We add a title to the graph by using ggtitle().

ggplot(data = diamonds) + geom_histogram(binwidth = 500, aes(x = price)) + ggtitle("Diamond Price Distribution") + xlab("Diamond Price U$") + ylab("Frequency")

We use theme_minimal() to draw the graph on a plain white background instead of the default grey panel.

ggplot(data = diamonds) + geom_histogram(binwidth = 500, aes(x = price)) + ggtitle("Diamond Price Distribution") + xlab("Diamond Price U$") + ylab("Frequency") + theme_minimal()

 

We can see from the graph that most diamonds are priced below $5,000.

We can get the average diamond price.

mean(diamonds$price)

We can get the median diamond price.

median(diamonds$price)
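Comparing the two statistics gives a quick check of the skew seen in the histogram. This is a minimal sketch; the variable names avg_price and mid_price are only illustrative.

```r
library(ggplot2)

avg_price <- mean(diamonds$price)
mid_price <- median(diamonds$price)

# A mean above the median indicates a right-skewed distribution:
# a small number of expensive diamonds pull the average up.
avg_price > mid_price

# Share of diamonds priced below the mean.
mean(diamonds$price < avg_price)
```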

xlim() sets the limits of the x-axis. Here we restrict the plot to prices between 0 and 2,500.

ggplot(data = diamonds) + geom_histogram(binwidth = 500, aes(x = price)) + ggtitle("Diamond Price Distribution") + xlab("Diamond Price U$ - Binwidth 500") + ylab("Frequency") + theme_minimal() + xlim(0, 2500)
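Note that xlim() drops observations outside the limits before the bins are computed, which is why R prints a warning about removed rows. If the goal is only to zoom in, coord_cartesian() is an alternative that keeps all the data. The sketch below compares the two; the object names are only illustrative.

```r
library(ggplot2)

base <- ggplot(data = diamonds, aes(x = price)) +
  geom_histogram(binwidth = 500)

# xlim() changes the scale: prices above 2500 are discarded before binning.
p_xlim <- base + xlim(0, 2500)

# coord_cartesian() only changes the visible window: every price is binned.
p_zoom <- base + coord_cartesian(xlim = c(0, 2500))

# Number of diamonds that contribute to the bars in each version.
sum(layer_data(p_xlim, 1)$count, na.rm = TRUE)
sum(layer_data(p_zoom, 1)$count)
```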

We change the binwidth to 100 to see how the graph changes.

ggplot(data = diamonds) + geom_histogram(binwidth = 100, aes(x = price)) + ggtitle("Diamond Price Distribution") + xlab("Diamond Price U$ - Binwidth 100") + ylab("Frequency") + theme_minimal() + xlim(0, 2500)

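With the narrower bins, the tallest bar between $500 and $1,000 drops from about 10,000 to about 2,000. The data has not changed: each bin now covers a smaller price interval, so it holds fewer diamonds.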
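The effect of the bin width on the bar heights can be checked by counting directly. This is a sketch using a hypothetical helper, count_in(), that counts prices in a half-open interval; the bin edges ggplot2 chooses may differ slightly from these.

```r
library(ggplot2)

price <- diamonds$price

# Hypothetical helper: number of prices in [lower, lower + width).
count_in <- function(lower, width) {
  sum(price >= lower & price < lower + width)
}

# One bin of width 500 covering $500 up to $1,000.
wide <- count_in(500, 500)

# Five bins of width 100 covering the same range.
narrow <- sapply(seq(500, 900, by = 100), count_in, width = 100)

wide
narrow
sum(narrow) == wide
```

The five narrow bins split the same diamonds between them, so each one is necessarily no taller than the single wide bin.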

We change the binwidth again, to 50, to see further changes in the distribution.

ggplot(data = diamonds) + geom_histogram(binwidth = 50, aes(x = price)) + ggtitle("Diamond Price Distribution") + xlab("Diamond Price U$ - Binwidth 50") + ylab("Frequency") + theme_minimal() + xlim(0, 2500)

 

We can compare the price distribution for each cut of diamond by adding facet_wrap(~cut), which draws one histogram per cut. For faceting to split the data correctly, price must be referenced by its bare name inside aes().

ggplot(data = diamonds) + geom_histogram(binwidth = 100, aes(x = price)) + ggtitle("Diamond Price Distribution by Cut") + xlab("Diamond Price U$") + ylab("Frequency") + theme_minimal() + facet_wrap(~cut)

We can see that the frequencies differ widely between the cuts.
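The number of diamonds in each cut can also be tabulated directly, which helps when reading the panels. This is a minimal sketch; cut_counts is only an illustrative name.

```r
library(ggplot2)

# Number of diamonds per cut.
cut_counts <- table(diamonds$cut)
cut_counts

# The same counts as proportions of the whole dataset.
round(prop.table(cut_counts), 3)
```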

We can draw a scatter plot of carat against price using qplot(), the quick-plot shortcut in ggplot2.

qplot(carat, price, data= diamonds)
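Recent ggplot2 releases mark qplot() as deprecated, so it may print a warning. An equivalent plot can be written with ggplot() and geom_point(), as in this sketch (the object name p is only illustrative):

```r
library(ggplot2)

# Same scatter plot as qplot(carat, price, data = diamonds),
# written with the full ggplot() syntax.
p <- ggplot(data = diamonds, aes(x = carat, y = price)) +
  geom_point()
p
```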

Now we take a sample of the diamonds dataset, since a smaller set of points is easier to read in a scatter plot.

To create a sample of the dataset, we use the sample() function.

First, we open the help page for sample() to read its description.

?sample

Sampling is random, so each call to sample() normally returns different values. To make the sample reproducible, we call set.seed() first.

set.seed(2)

With the same seed set immediately before sampling, the sample is the same every time we run the code.

We use sample(nrow(diamonds), 1000). In this call,

nrow(diamonds) returns 53940, so we are choosing 1000 row numbers at random from 1 to 53940.

We then use diamonds[sample(...), ] to select the rows of diamonds with those row numbers, keeping all columns. This returns 1000 observations from the diamonds dataset, which we store in a new dataset called dsmall.

dsmall <- diamonds[sample(nrow(diamonds), 1000),]

dsmall
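The effect of set.seed() can be verified by drawing the sample twice with the same seed. This is a minimal sketch; the names first and second are only illustrative.

```r
library(ggplot2)

# Draw the same 1000 row numbers twice, resetting the seed each time.
set.seed(2)
first <- sample(nrow(diamonds), 1000)
set.seed(2)
second <- sample(nrow(diamonds), 1000)

# TRUE: the same seed gives the same sample.
identical(first, second)

# Use the row numbers to build the smaller dataset.
dsmall <- diamonds[first, ]
dim(dsmall)
```

By default sample() draws without replacement, so no row of diamonds appears twice in dsmall.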

We create a scatter plot of carat against price for the dsmall dataset. We use a different point colour for each value of the color variable, and we also pass a size for the points.

qplot(carat, price, data= dsmall, colour= color, size=4)

We can also choose the shape of each point based on the cut of the diamond.
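The original code for this step is not shown, so the following is only a sketch of one way to do it, written with ggplot() and mapping cut to the shape aesthetic. ggplot2 may warn that shapes are not advised for an ordered variable such as cut.

```r
library(ggplot2)

# Recreate the 1000-row sample so the example runs on its own.
set.seed(2)
dsmall <- diamonds[sample(nrow(diamonds), 1000), ]

# One point shape per cut, one colour per diamond color.
p <- ggplot(data = dsmall, aes(x = carat, y = price,
                               colour = color, shape = cut)) +
  geom_point()
p
```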

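We can set the size of the points to a fixed value by wrapping it in I(). Without I(), qplot() treats the value as data to be mapped to an aesthetic: in the call below, colour = "red" gives the points a default palette colour and a legend entry labelled "red", not literally red points.

qplot(carat, price, data= dsmall, colour= "red", size= I(2))

Wrapping the colour in I() as well makes the points literally red. When there are many observations, we also add the alpha parameter, again wrapped in I(), to make the points semi-transparent so that we can see where the bulk of the points lie.

qplot(carat, price, data= dsmall, colour= I("red"), size= I(2), alpha= I(1/10))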
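In the full ggplot() syntax the same distinction is made by where a value is placed: inside aes() it is mapped, outside aes() it is set. The sketch below shows both and inspects the colours that are actually drawn; the object names are only illustrative.

```r
library(ggplot2)

# Recreate the 1000-row sample so the example runs on its own.
set.seed(2)
dsmall <- diamonds[sample(nrow(diamonds), 1000), ]

# Mapping: "red" inside aes() is treated as a data value, so ggplot2
# assigns it a palette colour and draws a legend.
p_mapped <- ggplot(dsmall, aes(carat, price, colour = "red")) +
  geom_point()

# Setting: values outside aes() are used literally.
p_set <- ggplot(dsmall, aes(carat, price)) +
  geom_point(colour = "red", size = 2, alpha = 1 / 10)

# Colours used for the points in each plot.
unique(layer_data(p_mapped, 1)$colour)
unique(layer_data(p_set, 1)$colour)
```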

We draw a scatter plot of carat against price for the dsmall dataset. We also add "smooth" to the geom parameter, which adds a smoothed trend line with a shaded confidence band to show how price changes with carat.

qplot(carat, price, data = dsmall, geom = c("point", "smooth"))

We draw the same scatter plot for the full diamonds dataset, again adding "smooth" to the geom parameter to draw the trend line.

qplot(carat, price, data = diamonds, geom = c("point", "smooth"))

We compare the price per carat across the different diamond colors using boxplots.

qplot(color, price / carat, data = diamonds, geom = "boxplot")
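The thick line in each box is the group median, which can also be computed numerically. This is a minimal sketch using base R's tapply(); the name median_ppc is only illustrative.

```r
library(ggplot2)

# Median price per carat for each diamond color.
median_ppc <- tapply(diamonds$price / diamonds$carat,
                     diamonds$color,
                     median)
round(median_ppc)
```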

We draw jittered points, using the geom parameter, to explore how the distribution of price per carat varies with the colour of the diamonds. The alpha parameter makes the points semi-transparent, so areas with many overlapping observations appear darker.

qplot(color, price / carat, data = diamonds, geom = "jitter", alpha = I(1 / 5))

As we decrease the alpha value, only the regions with many overlapping observations remain dark, which shows where the data is concentrated.

qplot(color, price / carat, data = diamonds, geom = "jitter",alpha = I(1 / 50))

We create a histogram of carat for the diamonds dataset. We use the fill argument to fill the bars according to the color variable in the dataset.

qplot(carat,data = diamonds, geom = "histogram", fill= color)

We create a density plot of carat, with one curve for each diamond color.

qplot(carat, data = diamonds, geom = "density", colour = color)

We use a binwidth of 0.01 to show the distribution at a very fine level of detail, and we limit carat to the range 0 to 3. The facets argument draws one histogram of carat for each diamond color.

qplot(carat, data = diamonds, facets = color~., geom = "histogram", binwidth = 0.01, xlim = c(0, 3))

 

We draw a scatter plot of price per carat against carat for the dsmall dataset, limited to weights between 0.2 and 1 carat, and add a smoothed trend line.

qplot(carat, price / carat, data = dsmall,
      ylab = expression(frac(price, carat)),
      xlab = "Weight (carats)",
      main = "Small diamonds",
      xlim = c(.2, 1)
) + geom_smooth()

 
