Plotting histogram with percentages in ggplot2

Question

I am trying to plot a histogram using ggplot2 with percentage on the y-axis and numerical values on the x-axis.

A sample of my data and my script is shown below; the full data set has about 100,000 rows or more.

A      B
0.2    x
1      y
0.995  x
0.5    x
0.5    x
0.2    y

ggplot(data, aes(A, colour = B)) + geom_bar() + stat_bin(breaks = seq(0, 1, by = 0.05)) + scale_y_continuous(labels = percent)

I want the percentage of B values that fall in each bin of A, rather than the count of B values per value of A.

The code as it is now gives me a y-axis with ymax as 15000. The y-axis is supposed to be in percentages (0-100).

Henrik · Accepted Answer · 2013-09-19 18:08:12Z

Is this what you want? I assume your data frame is called df:

# proportion of all rows in each combination of A and B (the Freq values sum to 1)
df2 <- as.data.frame(with(df, prop.table(table(A, B))))
df2
#       A B      Freq
# 1   0.2 x 0.1666667
# 2   0.5 x 0.3333333
# 3 0.995 x 0.1666667
# 4     1 x 0.0000000
# 5   0.2 y 0.1666667
# 6   0.5 y 0.0000000
# 7 0.995 y 0.0000000
# 8     1 y 0.1666667

ggplot(data = df2, aes(x = A, y = Freq, fill = B)) +
  geom_bar(stat = "identity", position = position_dodge())
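
The table above gives each A/B combination as a share of all rows. If the goal is instead the share of each B value within each 0.05-wide bin of A, as the question describes, one possible sketch (not part of the original answer) is to bin A with cut() and normalise per bin with prop.table(..., margin = 1). The names df3, bin and share are made up for illustration, and the toy data is just the six rows from the question.

```r
library(ggplot2)

# Toy data copied from the question; the real data frame is assumed to have the same columns.
df <- data.frame(
  A = c(0.2, 1, 0.995, 0.5, 0.5, 0.2),
  B = c("x", "y", "x", "x", "x", "y")
)

# Assign each A value to one of the 0.05-wide bins the question's stat_bin() call asked for.
# include.lowest = TRUE keeps A == 0 in the first bin.
df$bin <- cut(df$A, breaks = seq(0, 1, by = 0.05), include.lowest = TRUE)

# margin = 1 divides each row of the table (each bin) by its own total,
# so the shares of x and y within a bin sum to 1.
tab <- prop.table(table(bin = df$bin, B = df$B), margin = 1)
df3 <- as.data.frame(tab, responseName = "share")

# Bins with no observations give NaN (0 / 0); drop them before plotting.
df3 <- df3[!is.na(df3$share), ]

p <- ggplot(df3, aes(x = bin, y = share, fill = B)) +
  geom_bar(stat = "identity", position = position_dodge()) +
  scale_y_continuous(labels = scales::percent)
p
```

Whether to normalise per bin (margin = 1), per B value (margin = 2), or over all rows (no margin) depends on which percentage is wanted; the sketch assumes the per-bin reading.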

3 Comments

Mengll Over a year ago

Yes! However, when I try to add a frequency column using the first line, my data gets shortened and some values of B are missing.

Henrik Over a year ago

@Mengll, sorry, but I don't quite understand what you mean. The table of frequencies, which is converted to a data frame, is an aggregated version of your original data frame, so yes, your data will be "shortened". Say you have 500 rows with B = y and A = 0.5. These will boil down to a single row giving the proportion of y in 'bin' 0.5.
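
As an illustration of this aggregation (made-up data, not from the thread), a minimal sketch: 1,000 input rows collapse to one row per A/B combination. The object names big and agg are hypothetical.

```r
# Hypothetical data: 500 rows with A = 0.5, B = "y" and 500 rows with A = 0.2, B = "x"
big <- data.frame(
  A = rep(c(0.5, 0.2), each = 500),
  B = rep(c("y", "x"), each = 500)
)

# Same aggregation as in the answer: a table of proportions turned into a data frame
agg <- as.data.frame(with(big, prop.table(table(A, B))))

nrow(big)  # 1000 rows in the raw data
nrow(agg)  # 4 rows: 2 levels of A times 2 levels of B
agg
```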

Mengll Over a year ago

I did not understand that, but it makes sense now. My resulting plot looks strange, but that's probably because of my own dataset. Thank you!
