"ggpubr" package in R for Data Visualization

We are going to use the "ggpubr" package for data visualization.

ggpubr

It provides easy-to-use functions for creating and customizing "ggplot2"-based, publication-ready plots.

We install the "ggpubr" package as:

install.packages("ggpubr")

We load the "ggpubr" package as:

library(ggpubr)

We set the seed of the random number generator so that the random values we create can be reproduced.

set.seed(1234)
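To see what reproducibility means here, the following small sketch draws random numbers twice after setting the same seed. The object names first_draw, second_draw and third_draw are illustrative only.

```r
# Draw five random numbers after setting the seed
set.seed(1234)
first_draw <- rnorm(5)

# Reset the same seed and draw again: the values repeat
set.seed(1234)
second_draw <- rnorm(5)

# Drawing again without resetting the seed continues the random sequence
third_draw <- rnorm(5)

print(identical(first_draw, second_draw))  # TRUE
print(identical(first_draw, third_draw))
```

Resetting the seed restarts the generator at the same point, which is why the first two draws match.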

We create a data frame containing the variables 'sex' and 'weight'. We use the rnorm() function to generate random numbers from a normal distribution: the first 300 values have mean 45 and the next 300 have mean 49 (the standard deviation is left at its default of 1).

wdata = data.frame(

 sex = factor(rep(c("F", "M"), each=300)),

 weight = c(rnorm(300, 45), rnorm(300, 49)))

We check the first four observations of the data frame wdata as:

head(wdata, 4)
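As a quick sanity check on the simulated data, the sketch below rebuilds wdata and computes the number of rows and the mean weight per sex. The sample means should be close to the values 45 and 49 passed to rnorm(). The name group_means is illustrative.

```r
# Rebuild the simulated data frame so this block runs on its own
set.seed(1234)
wdata <- data.frame(
  sex = factor(rep(c("F", "M"), each = 300)),
  weight = c(rnorm(300, 45), rnorm(300, 49))
)

# Number of observations in each group
print(table(wdata$sex))

# Sample mean of weight in each group
group_means <- tapply(wdata$weight, wdata$sex, mean)
print(round(group_means, 2))
```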

We create a density plot by using the ggdensity() function.

The first argument specifies the dataset and x specifies the variable to be drawn. The add argument adds a mean line for each group. Setting rug = TRUE adds a rug, that is, small tick marks along the x-axis showing the individual observations. The color and fill arguments set the outline and fill colors according to the value of sex, and palette gives the colors used for the groups.

ggdensity(wdata, x = "weight",

         add = "mean", rug = TRUE,

         color = "sex", fill = "sex",

         palette = c("#00AFBB", "#E7B800"))

 

We plot a histogram with the same options by using the gghistogram() function.

gghistogram(wdata, x = "weight",

           add = "mean", rug = TRUE,

           color = "sex", fill = "sex",

           palette = c("#00AFBB", "#E7B800"))

By default, the histogram is drawn with 30 bins.

 

 

gghistogram(wdata, x = "weight",

           add = "mean", rug = TRUE,

           color = "sex", fill = "sex",bins = 50,

           palette = c("#00AFBB", "#E7B800" ))


 

We changed bins to 50 to see the difference in the histogram. With more bins, each bar covers a narrower range of weights, so the shape of the distribution is shown in finer detail.
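The effect of the number of bins can also be checked numerically. The sketch below uses base R's hist() with equal-width breaks over the data range. This is an illustration of binning in general and not necessarily how gghistogram() places its bin edges. The helper count_bins is a hypothetical name.

```r
# Recreate the simulated weights
set.seed(1234)
weight <- c(rnorm(300, 45), rnorm(300, 49))

# Count observations in `bins` equal-width bins spanning the data range
count_bins <- function(x, bins) {
  breaks <- seq(min(x), max(x), length.out = bins + 1)
  hist(x, breaks = breaks, plot = FALSE)
}

h30 <- count_bins(weight, 30)
h50 <- count_bins(weight, 50)

# Width of a single bin in each case
width30 <- diff(h30$breaks)[1]
width50 <- diff(h50$breaks)[1]
cat("Bin width with 30 bins:", round(width30, 3), "\n")
cat("Bin width with 50 bins:", round(width50, 3), "\n")
```

Every observation is still counted once in both cases. Only the width of the bins changes.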

 

We now work with the ToothGrowth dataset. We load it by using the following code:

data("ToothGrowth")

We check the description of the ToothGrowth dataset as:

?ToothGrowth

 

df <- ToothGrowth

We look at the first four observations of df, our copy of the ToothGrowth dataset.

head(df, 4)

 

We create a box plot by using the ggboxplot() function. The arguments of the function are:

data - a data frame

x - character string containing the name of the x variable

y - character string containing the name of one or more variables to plot

color - outline color

palette - the color palette to be used for coloring or filling by groups

add - character vector for adding another plot element; here we add "jitter" to show the individual points

shape - the point shape; here it varies by dose

 

We want to draw box plots of len (tooth length) for each dose.

We can check the dose values by using the following code:

unique(df$dose)
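Beyond listing the distinct dose values, it can help to see how many observations fall in each group before plotting. A minimal sketch using base R's table():

```r
# Load the dataset and keep a working copy
data("ToothGrowth")
df <- ToothGrowth

# Distinct dose levels, sorted
dose_levels <- sort(unique(df$dose))
print(dose_levels)

# Number of observations per dose, and per supplement within each dose
dose_counts <- table(df$dose)
print(dose_counts)
print(table(df$supp, df$dose))
```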

 

p <- ggboxplot(df, x = "dose", y = "len",

              color = "dose", palette =c("#00AFBB", "#E7B800", "#FC4E07"),

              add = "jitter", shape = "dose")

p


 

 

We use the stat_compare_means() function to add mean-comparison p-values to a ggplot, such as box plots, dot plots and strip charts.

 

The arguments of stat_compare_means() used here are:

comparisons - a list of length-2 vectors. The entries of each vector are either the names of two values on the x-axis or two integers giving the indices of the groups to be compared.

label.y - the y coordinate for absolute positioning of the label. We set it to 50.

 

my_comparisons <- list( c("0.5", "1"), c("1", "2"), c("0.5", "2") )

p + stat_compare_means(comparisons = my_comparisons)+ # Add pairwise comparisons p-value

 stat_compare_means(label.y = 50)                   
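The p-values shown on the plot can be approximated directly in base R. The sketch below assumes that stat_compare_means() is left at its defaults, which to the best of our knowledge are a Wilcoxon test for pairwise comparisons and a Kruskal-Wallis test for the global comparison; see ?stat_compare_means to confirm. The helper pairwise_p is a hypothetical name, and exact = FALSE is set only to avoid warnings about ties.

```r
data("ToothGrowth")
df <- ToothGrowth

# Global comparison across all three doses (Kruskal-Wallis test)
global_test <- kruskal.test(len ~ dose, data = df)
print(global_test$p.value)

# Pairwise comparison of two doses (Wilcoxon rank-sum test)
pairwise_p <- function(a, b) {
  pair_df <- df[df$dose %in% c(a, b), ]
  wilcox.test(len ~ dose, data = pair_df, exact = FALSE)$p.value
}

# Same three comparisons as in my_comparisons
dose_pairs <- list(c(0.5, 1), c(1, 2), c(0.5, 2))
p_values <- sapply(dose_pairs, function(g) pairwise_p(g[1], g[2]))
names(p_values) <- sapply(dose_pairs, paste, collapse = " vs ")
print(p_values)
```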

We create violin plots with box plots inside. We use the add.params argument to set parameters (such as color, fill, shape and size) for the added element; here the box plots are filled in white.

ggviolin(df, x = "dose", y = "len", fill = "dose",

        palette = c("#00AFBB", "#E7B800", "#FC4E07"),

        add = "boxplot", add.params = list(fill = "white"))+

 stat_compare_means(comparisons = my_comparisons, label = "p.signif")+ # Add significance levels

 stat_compare_means(label.y = 50)                                      # Add the global p-value
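With label = "p.signif", p-values are shown as significance symbols instead of numbers. The sketch below shows one way such a mapping can be written, assuming the cutoffs 0.0001, 0.001, 0.01 and 0.05 that the ggpubr documentation describes as defaults. The function p_to_signif is a hypothetical helper, not part of ggpubr.

```r
# Map p-values to significance symbols (cutoffs assumed from the ggpubr docs)
p_to_signif <- function(p) {
  as.character(cut(
    p,
    breaks = c(0, 1e-04, 0.001, 0.01, 0.05, Inf),
    labels = c("****", "***", "**", "*", "ns"),
    include.lowest = TRUE
  ))
}

example_p <- c(1e-09, 0.0005, 0.005, 0.03, 0.2)
print(data.frame(p = example_p, symbol = p_to_signif(example_p)))
```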

We can also create dot plots and add a mean and standard deviation line to the plot.

ggdotplot(df, x = "dose", y = "len", color = "dose", fill = "dose",

         palette = c("#00AFBB", "#E7B800", "#FC4E07"),

         add = "mean_sd", add.params = list(color = "black"))
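The values summarized by add = "mean_sd" can be computed by hand. The sketch below calculates the mean and standard deviation of len for each dose in base R. The name summary_by_dose and the lower and upper columns are illustrative.

```r
data("ToothGrowth")
df <- ToothGrowth

# Mean and standard deviation of tooth length for each dose
means <- tapply(df$len, df$dose, mean)
sds <- tapply(df$len, df$dose, sd)

summary_by_dose <- data.frame(
  dose = names(means),
  mean = as.numeric(means),
  sd = as.numeric(sds)
)

# End points of a mean +/- one standard deviation interval
summary_by_dose$lower <- summary_by_dose$mean - summary_by_dose$sd
summary_by_dose$upper <- summary_by_dose$mean + summary_by_dose$sd
print(summary_by_dose)
```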

 

We create a new data frame as:

df3 <- data.frame(supp=rep(c("AB", "SK"), each=3),

                 dose=rep(c("D0.5", "D1", "D2"),2),

                 len=c(7.2, 12, 34, 5, 8, 34.2))

We print df3 as:

print(df3)

We create a bar plot with fill color based on the "supp" group. We use lab.col to set the label color to white and lab.pos to specify the position of the labels; lab.pos = "in" places them inside the bars.

ggbarplot(df3, x = "dose", y = "len",

         fill = "supp", color = "supp", palette = c("#00AFBB", "#E7B800"),

         label = TRUE, lab.col = "white", lab.pos = "in")

 

 

We draw line plots with multiple groups. Here we plot len against dose, with line type, point shape and color grouped by supp.

ggline(df3, x = "dose", y = "len",

      linetype = "supp", shape = "supp",

      color = "supp",  palette = c("#00AFBB", "#E7B800"))

 

We can create a pie chart by using the ggpie() function.

We create a data frame df4 as:

df4 <- data.frame(

 group = c("Male", "Female", "Child"),

 value = c(22, 19, 45))

We check the dataset df4 as:

df4

We create a new variable labs that combines each group name with its value.

labs <- paste0(df4$group, " (", df4$value, "%)")

ggpie(df4, x = "value", fill = "group", color = "white",

     palette = c("#00AFBB", "#E7B800", "#FC4E07"),

     label = labs, lab.pos = "in", lab.font = "white")
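The labels above append "%" to the raw values, which add up to 86 rather than 100. If the values are meant as counts (an assumption, since the text does not say), percentage labels could be computed as in the sketch below. The names pct and labs_pct are illustrative.

```r
df4 <- data.frame(
  group = c("Male", "Female", "Child"),
  value = c(22, 19, 45)
)

# Share of each group in the total, rounded to whole percentages
pct <- round(100 * df4$value / sum(df4$value))

# Labels combining group name and percentage
labs_pct <- paste0(df4$group, " (", pct, "%)")
print(labs_pct)
```

These labels could then be passed to ggpie() through the label argument in place of labs.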

 

We now work with the "mtcars" dataset. We load it as:

data("mtcars")

We create a new object to store mtcars dataset.

dfm <- mtcars

We convert the cyl variable to a factor.

dfm$cyl <- as.factor(dfm$cyl)

We add a new column, name, to store the names of the cars.

dfm$name <- rownames(dfm)

We check the first observations of the dfm dataset:

head(dfm[, c("wt", "mpg", "cyl")])

We create a scatter plot with concentration ellipses and labels. We use repel = TRUE to avoid overlapping text labels.

ggscatter(dfm, x = "wt", y = "mpg",

         color = "cyl", shape = "cyl",

         palette = c("#00AFBB", "#E7B800", "#FC4E07"),

         ellipse = TRUE, mean.point = TRUE,

         rug = TRUE, label = "name", font.label = 10, repel = TRUE)

 


 

We create a bar plot and sort the data in descending order by using sort.val = "desc". We fill the bars by cyl value and set the bar borders to white. We set sort.by.groups = FALSE so that the data are not sorted within each group. We use x.text.angle = 90 to rotate the x-axis text by 90°.

ggbarplot(dfm, x = "name", y = "mpg",

         fill = "cyl",               # change fill color by cyl

         color = "white",            # Set bar border colors to white

         palette = "jco",            # jco journal color palette, see ?ggpar

         sort.val = "desc",          # Sort the values in descending order

         sort.by.groups = FALSE,     # Don't sort inside each group

         x.text.angle = 90           # Rotate vertically x axis texts

)

 

We now set sort.by.groups = TRUE and sort.val = "asc", so the data are sorted in ascending order within each group.

ggbarplot(dfm, x = "name", y = "mpg",

         fill = "cyl",               # change fill color by cyl

         color = "white",            # Set bar border colors to white

         palette = "jco",            # jco journal color palette, see ?ggpar

         sort.val = "asc",           # Sort the values in ascending order

         sort.by.groups = TRUE,      # Sort inside each group

         x.text.angle = 90           # Rotate vertically x axis texts

)
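The two sort orders used in the bar plots above can be reproduced on the data frame itself with base R's order(). This is a sketch of the resulting row order only, not of how ggbarplot() sorts internally. The names desc_order and grouped_order are illustrative.

```r
data("mtcars")
dfm <- mtcars
dfm$cyl <- as.factor(dfm$cyl)
dfm$name <- rownames(dfm)

# Descending by mpg, ignoring groups (sort.val = "desc", sort.by.groups = FALSE)
desc_order <- dfm[order(dfm$mpg, decreasing = TRUE), ]
print(head(desc_order[, c("name", "mpg", "cyl")], 3))

# Ascending by mpg within each cyl group (sort.val = "asc", sort.by.groups = TRUE)
grouped_order <- dfm[order(dfm$cyl, dfm$mpg), ]
print(head(grouped_order[, c("name", "mpg", "cyl")], 3))
```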

 

We create a dot chart by using the following code:

ggdotchart(dfm, x = "name", y = "mpg",

          color = "cyl",                                

          palette = c("#00AFBB", "#E7B800", "#FC4E07"),

          sorting = "ascending",                

          add = "segments",                       

          ggtheme = theme_pubr()           

)

The ggtheme = theme_pubr() argument applies the ggpubr theme to the plot.

 

We create a more customized dot chart of mpg against name from the mtcars dataset. You can see the arguments of the ggdotchart() function as:

?ggdotchart

ggdotchart(dfm, x = "name", y = "mpg",

          color = "cyl",                                # Color by groups

          palette = c("#00AFBB", "#E7B800", "#FC4E07"), # Custom color palette

          sorting = "descending",                       # Sort value in descending order

          add = "segments",                             # Add segments from y = 0 to dots

          rotate = TRUE,                                # Rotate vertically

          group = "cyl",                                # Order by groups

          dot.size = 6,                                 # Large dot size

          label = round(dfm$mpg),                        # Add mpg values as dot labels

          font.label = list(color = "white", size = 9,

                            vjust = 0.5),               # Adjust label parameters

          ggtheme = theme_pubr()                        # ggplot2 theme

)

 





 

 
