Histogram in R using ggplot2
A histogram is an approximate representation of the distribution of numerical data. In a histogram, each bar groups numbers into ranges. Taller bars show that more data falls in that range. It is used to display the shape and spread of continuous sample data.
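To make the idea of binning concrete, here is a minimal sketch that counts values per range by hand with base R's cut() and table(). The vector values and the break points are made up for illustration and are not part of the dataset used later in this article.

```r
# Hypothetical sample values, chosen only for illustration
values <- c(2, 3, 5, 7, 8, 8, 9, 12, 14, 15)

# Split the range 0 to 15 into three equal-width bins: (0,5], (5,10], (10,15]
bins <- cut(values, breaks = c(0, 5, 10, 15))

# Count how many values fall into each bin; these counts are the bar heights
bin_counts <- table(bins)
print(bin_counts)
```

The middle bin holds the most values, so it would be drawn as the tallest bar.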
Plotting Histogram using ggplot2 in R
We can use the ggplot2 library in R to plot a histogram. The geom_histogram() function, provided by the ggplot2 package, draws the bars.
1. Creating Sample Data
We’re setting a seed for reproducibility and creating a data frame with simulated income data for two groups: Average Female income and Average Male income. Each group has 20,000 values generated from a normal distribution.
set.seed(123)
df <- data.frame(
  gender = factor(rep(c("Average Female income", "Average Male income"), each = 20000)),
  Average_income = round(c(rnorm(20000, mean = 15500, sd = 500),
                           rnorm(20000, mean = 17500, sd = 600)))
)
head(df)
Output :

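Before plotting, it can help to confirm that the simulated groups look the way we intended. The sketch below recreates the data frame so that it runs on its own, then summarizes each group with tapply(). The names group_means and group_sds are introduced here for illustration only.

```r
set.seed(123)
# Recreate the simulated data so this block runs on its own
df <- data.frame(
  gender = factor(rep(c("Average Female income", "Average Male income"), each = 20000)),
  Average_income = round(c(rnorm(20000, mean = 15500, sd = 500),
                           rnorm(20000, mean = 17500, sd = 600)))
)

# Mean and standard deviation of income within each group
group_means <- tapply(df$Average_income, df$gender, mean)
group_sds <- tapply(df$Average_income, df$gender, sd)
print(round(group_means))
print(round(group_sds))
```

The group means should land close to 15500 and 17500, and the standard deviations close to 500 and 600, matching the arguments passed to rnorm().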
2. Plotting a Histogram
We’re loading the ggplot2 package and creating a histogram of the Average_income variable from the data frame using ggplot(). This helps visualize the distribution of income values across both groups.
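# Install ggplot2 once if it is not already available
if (!requireNamespace("ggplot2", quietly = TRUE)) install.packages("ggplot2")
library(ggplot2)
ggplot(df, aes(x = Average_income)) +
  geom_histogram()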
Output:

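When neither bins nor binwidth is given, geom_histogram() falls back to 30 bins and prints a message suggesting that we pick a value ourselves. The sketch below sets bins explicitly and then uses layer_data() to look at the bin boundaries and counts that ggplot2 computed. The data frame demo and the choice of 20 bins are assumptions made for this example.

```r
library(ggplot2)

set.seed(123)
# Hypothetical demo data, separate from the df used in this article
demo <- data.frame(income = rnorm(1000, mean = 15500, sd = 500))

# Setting bins explicitly avoids the default-of-30 message
p <- ggplot(demo, aes(x = income)) +
  geom_histogram(bins = 20)

# layer_data() returns the values ggplot2 computed for the first layer,
# including each bin's boundaries (xmin, xmax) and its count
bin_table <- layer_data(p, 1)
head(bin_table[, c("xmin", "xmax", "count")])
```

Inspecting the computed table is a quick way to check that every observation landed in a bin and that the bin edges fall where we expect.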
Customize the Histogram
A histogram can be customized in several ways to suit the data and the audience.
1. Changing the border color of the Histogram
The color argument of geom_histogram() is set to "black" to draw the borders of the histogram bars, while fill sets the color of the bars themselves.
ggplot(df, aes(x = Average_income)) +
  geom_histogram(color = "black", fill = "steelblue") +
  labs(x = "Average Income", y = "Frequency") +
  ggtitle("Histogram of Average Income") +
  theme_minimal()
Output:

2. Changing the Bin Width of Histogram
We’re using ggplot() to plot a histogram of Average_income, setting binwidth = 1 so that each bar covers a single income unit. Narrow bins show fine detail in how the income values are distributed, at the cost of noisier bar heights.
ggplot(df, aes(x = Average_income)) +
  geom_histogram(binwidth = 1)
Output:

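If we would rather derive a bin width from the data than pick one by eye, one common rule of thumb is the Freedman-Diaconis rule: width = 2 * IQR / n^(1/3). This is a swapped-in heuristic, not something the article's code uses, and the income vector below is a hypothetical sample on the same scale as our data.

```r
set.seed(123)
# Hypothetical income-like sample on the same scale as the article's data
income <- round(rnorm(20000, mean = 15500, sd = 500))

# Freedman-Diaconis rule: width = 2 * IQR / n^(1/3)
fd_width <- 2 * IQR(income) / length(income)^(1 / 3)
print(fd_width)
```

The result can be passed straight to geom_histogram(binwidth = fd_width). For this sample it comes out far wider than 1, which gives a smoother picture than single-unit bins.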
3. Changing colors of the Histogram
We’re creating a histogram of Average_income with white borders and a red fill using ggplot(). This enhances the visual contrast and makes the distribution easier to interpret.
plot <- ggplot(df, aes(x = Average_income)) +
  geom_histogram(color = "white", fill = "red")
plot
Output:

4. Add Descriptive Statistics to Histogram Using geom_vline()
We are creating a histogram of Average_income by gender with overlapping bars, customizing the bin width and transparency. We add vertical dashed and dotted lines for the mean and median using geom_vline(), and customize colors with scale_fill_manual() and scale_color_manual(). The plot is simplified with theme_minimal(), and the title, labels, and legend position are adjusted for clarity.
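# Compute the mean and median per gender so that each group gets its own lines
stats <- data.frame(
  gender = levels(df$gender),
  mean_income = as.numeric(tapply(df$Average_income, df$gender, mean, na.rm = TRUE)),
  median_income = as.numeric(tapply(df$Average_income, df$gender, median, na.rm = TRUE))
)
histogram_plot <- ggplot(df, aes(x = Average_income, fill = gender)) +
  geom_histogram(binwidth = 500, position = "identity", alpha = 0.7) +
  # linewidth needs ggplot2 >= 3.4.0; use size on older versions
  geom_vline(data = stats, aes(xintercept = mean_income, color = gender),
             linetype = "dashed", linewidth = 1) +
  geom_vline(data = stats, aes(xintercept = median_income, color = gender),
             linetype = "dotted", linewidth = 1) +
  scale_fill_manual(values = c("blue", "green")) +
  scale_color_manual(values = c("red", "black")) +
  theme_minimal() +
  ggtitle("Distribution of Average Income by Gender") +
  xlab("Average Income") +
  ylab("Frequency") +
  theme(legend.position = "top")
print(histogram_plot)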
Output:

5. Plotting Probability Densities of Histogram
We are creating a histogram with a density plot overlay to visualize the distribution of Average_income. We use geom_histogram() to create the bars, with density values on the y-axis, and add a vertical dashed line for the mean using geom_vline(). A density curve is added with geom_density() to highlight the overall distribution shape. We customize the plot with a title, axis labels, and apply a minimal theme.
ggplot(df, aes(x = Average_income)) +
  # Scale bar heights to densities so they share a y-axis with the curve
  geom_histogram(aes(y = after_stat(density)), bins = 30, fill = "lightblue",
                 color = "black", alpha = 0.7) +
  # A single line at the overall mean (linewidth needs ggplot2 >= 3.4.0)
  geom_vline(xintercept = mean(df$Average_income, na.rm = TRUE), color = "red",
             linetype = "dashed", linewidth = 1.5) +
  geom_density(color = "black", linewidth = 1.5) +
  ggtitle("Distribution of Average Income") +
  xlab("Average Income") +
  ylab("Density") +
  theme_minimal()
Output:

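On a density scale, each bar's height is its count divided by the total count times the bin width, so the bar areas add up to 1. The sketch below checks this arithmetic with base R's hist(), which applies the same scaling. The income vector is a hypothetical sample, not the article's data frame.

```r
set.seed(123)
# Hypothetical sample used only to illustrate the density scaling
income <- rnorm(1000, mean = 15500, sd = 500)

# plot = FALSE returns the binning without drawing anything
h <- hist(income, breaks = 30, plot = FALSE)

# Density of a bin = count / (total count * bin width)
bin_widths <- diff(h$breaks)
manual_density <- h$counts / (sum(h$counts) * bin_widths)

# The manual values match what hist() reports, and the areas sum to 1
print(all.equal(manual_density, h$density))
total_area <- sum(h$density * bin_widths)
print(total_area)
```

This is why a density curve can be drawn over the bars: both are scaled so that the total area equals 1.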
6. Plotting Histogram Based on Groups
We are creating a histogram of Sepal.Length from the iris dataset, with colors based on the Species column. Because geom_histogram() stacks groups by default, the bars for each species sit on top of one another within each bin. The bars are outlined in black and drawn with an alpha of 0.7, and we use scale_fill_manual() to customize the color palette for each species. The plot includes a title, axis labels, and uses a minimal theme.
ggplot(iris, aes(x = Sepal.Length, fill = Species)) +
  geom_histogram(bins = 30, color = "black", alpha = 0.7) +
  ggtitle("Distribution of Sepal Length by Species") +
  xlab("Sepal Length") +
  ylab("Frequency") +
  scale_fill_manual(values = c("blue", "pink", "red")) +
  theme_minimal()
Output:

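How the groups are positioned changes how the plot reads. As a rough sketch, the code below builds the same iris histogram twice: once with the default stacking and once with position = "identity", which starts every species' bars at zero so they overlap. The object names p_stack and p_identity are introduced here for illustration.

```r
library(ggplot2)

# Default: bars for each species are stacked within a bin
p_stack <- ggplot(iris, aes(x = Sepal.Length, fill = Species)) +
  geom_histogram(bins = 30, alpha = 0.7)

# position = "identity": every species' bars start at zero and overlap
p_identity <- ggplot(iris, aes(x = Sepal.Length, fill = Species)) +
  geom_histogram(bins = 30, alpha = 0.7, position = "identity")

# ymin shows where each bar starts after the position adjustment
stack_ymin <- layer_data(p_stack, 1)$ymin
identity_ymin <- layer_data(p_identity, 1)$ymin
print(range(stack_ymin))
print(range(identity_ymin))
```

Stacking makes the total height per bin easy to read, while overlapping bars make it easier to compare the shape of each species' distribution.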
Alternative: Plotting Histogram of Sepal Length (Faceted by Species)
We are creating a histogram of Sepal.Length from the iris dataset, with colors based on the Species column. The plot is faceted by Species, allowing each species to have its own histogram with free scales, so each panel gets its own x- and y-axis ranges. We customize the labels and apply a minimal theme.
ggplot(iris, aes(x = Sepal.Length, fill = Species)) +
  geom_histogram(bins = 30, color = "black", alpha = 0.7) +
  facet_wrap(~Species, scales = "free") +
  ggtitle("Histogram of Sepal Length by Species") +
  xlab("Sepal Length") +
  ylab("Frequency") +
  theme_minimal()
Output:

In this article, we explored how to create histograms in R using the ggplot2 package, covering basic plotting, customization, and enhancements to effectively visualize data distributions.