matplotlib - Create a histogram/bar chart for ratings/whole numbers
In my continued work with matplotlib, I wanted to plot a histogram (or bar chart) of a set of star ratings to see how they were distributed.
Before we do anything, let’s import pandas and matplotlib, along with Python’s random module:
import random
import pandas as pd
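import matplotlib
# Choose the backend before pyplot is imported. TkAgg opens windows and needs a
# display; 'Agg' is enough if you only want to save charts to files.
matplotlib.use('TkAgg')
import matplotlib.pyplot as plt
# Optional: style the charts like FiveThirtyEight's graphics
plt.style.use('fivethirtyeight')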
Next we’ll create a pandas Series of 100 randomly chosen star ratings between 1 and 5:
# random.randint includes both end points, so this yields ratings from 1 to 5
stars = pd.Series([random.randint(1, 5) for _ in range(100)])
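If you want the same ratings every time you re-run the script (handy when comparing charts), one option is a seeded NumPy generator instead of the random module. This is a minimal sketch; the seed value 42 is an arbitrary choice. Note that NumPy's `Generator.integers` excludes the upper bound by default, unlike `random.randint`:

```python
import numpy as np
import pandas as pd

# A seeded generator returns the same sequence on every run (42 is arbitrary)
rng = np.random.default_rng(42)

# integers() excludes the upper bound, so (1, 6) produces ratings from 1 to 5
stars = pd.Series(rng.integers(1, 6, size=100))

# Show how many times each rating was drawn, ordered by rating
print(stars.value_counts().sort_index())
```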
We want to plot a histogram showing how many times each rating occurs. The following code plots one and saves it to an SVG file:
_, ax1 = plt.subplots()
# Split the ratings into 5 equal-width bins between the smallest and largest value
ax1.hist(stars, bins=5)
plt.tight_layout()
plt.savefig("/tmp/hist.svg")
plt.close()
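To see where `hist` actually puts the bars, here's a small sketch that inspects the values it returns. It uses a hypothetical fixed sample (two ratings of each star value) instead of the random ratings so the result is predictable, and the non-interactive Agg backend because it only looks at numbers:

```python
import matplotlib
matplotlib.use("Agg")  # no window needed; we only inspect hist()'s return values
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical fixed sample: two ratings of each star value
stars = pd.Series([1, 1, 2, 2, 3, 3, 4, 4, 5, 5])

_, ax = plt.subplots()
# hist() returns the bar heights, the bin edges and the drawn bar patches
counts, edges, _ = ax.hist(stars, 5)
plt.close()

# Each bar spans one edge to the next, so its centre is the midpoint of the two
centres = (edges[:-1] + edges[1:]) / 2

print(edges)    # six edges, 0.8 apart, from the smallest to the largest rating
print(centres)  # only the middle bar is centred on a whole-number rating
```

The five bins share the range 1 to 5 equally, so the bar for 1-star ratings is centred at 1.4, the 2-star bar at 2.2, and so on; only the 3-star bar lines up with its tick.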
This is what the chart looks like:
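This is OK, but the labels on the x axis are a bit off: the value for each rating doesn’t line up with the corresponding bar. With 5 bins spread over the range 1 to 5, each bin is 0.8 wide, so the bars are centred at 1.4, 2.2, 3.0, 3.8 and 4.6 rather than on the whole-number ratings. I came across this StackOverflow post, which shows how to solve this problem by using a bar chart instead. I ended up with this code: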
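_, ax2 = plt.subplots()
# Count how often each rating occurs, ordered by rating rather than by frequency
stars_histogram = stars.value_counts().sort_index()
# Turn the counts into percentages of all ratings
stars_histogram /= float(stars_histogram.sum())
stars_histogram *= 100
# One bar per rating, drawn on ax2; width=1.0 makes adjacent bars touch like a histogram
stars_histogram.plot(kind="bar", width=1.0, ax=ax2)
plt.tight_layout()
plt.savefig("/tmp/bar.svg")
plt.close()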
This is what the chart looks like now:
Much better!
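The three lines that build `stars_histogram` can also be written with `value_counts(normalize=True)`, which returns proportions directly. One thing to watch for: `value_counts` only includes ratings that actually occur, so if nobody gave a 2-star rating there would be no bar for it at all. Here's a minimal sketch on a hypothetical sample with no 2-star ratings that uses `reindex` to add an empty bar for every missing rating between 1 and 5 (the axis labels and output file name are my own choices):

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # file output only, no display needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample in which nobody gave a 2-star rating
stars = pd.Series([1, 1, 3, 3, 3, 4, 5, 5, 5, 5])

# normalize=True gives proportions instead of counts; * 100 turns them into percentages.
# reindex adds a 0% entry for any rating from 1 to 5 that never occurs.
stars_histogram = (
    stars.value_counts(normalize=True)
    .reindex(range(1, 6), fill_value=0)
    * 100
)

_, ax = plt.subplots()
stars_histogram.plot(kind="bar", width=1.0, ax=ax)
ax.set_xlabel("Stars")
ax.set_ylabel("% of ratings")
plt.tight_layout()
plt.savefig(os.path.join(tempfile.gettempdir(), "bar_all_ratings.svg"))
plt.close()

print(stars_histogram.tolist())
```

Because the index is fixed to 1 through 5, the x axis always shows all five ratings in order, which keeps charts for different samples directly comparable.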
About the author
I'm currently working on short form content at ClickHouse. I publish short 5 minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.