11 Apr 2021

Pandas - Format DataFrame numbers with commas and control decimal places

I’m still playing around with the UK’s COVID-19 vaccination data and in this blog post we’ll learn how to format a DataFrame that contains a mix of string and numeric values.

Note	On 10th November 2022 I created a video that covers the same content as this blog post. Let me know if it’s helpful 😊

We’ll be using Pandas' styling functionality, which generates CSS and HTML, so if you want to follow along you’ll need to install Pandas and Jupyter:

BASH pip install pandas jupyter

Next, launch Jupyter and create a notebook:

BASH jupyter notebook

Let’s start by importing Pandas:

PYTHON import pandas as pd

And now we’ll create a DataFrame containing the data that we want to format:

PYTHON df = pd.DataFrame({
    "LTLA Name": ["Amber Valley", "Ashfield", "Bassetlaw"],
    "Population": [72179, 77988, 70832],
    "PercentageVaccinated": [0.228571, 0.223342, 0.184493]
})

Table 1. Results
	LTLA Name	Population	PercentageVaccinated
0	Amber Valley	72179	0.228571
1	Ashfield	77988	0.223342
2	Bassetlaw	70832	0.184493

One way to do this is to format the values in place, as shown below:

PYTHON df.loc[:, "Population"] = df["Population"].map('{:,d}'.format)
df.loc[:, "PercentageVaccinated"] = df["PercentageVaccinated"].map('{:.2f}'.format)

After this transformation, the DataFrame looks like this:

Table 2. Results
	LTLA Name	Population	PercentageVaccinated
0	Amber Valley	72,179	0.23
1	Ashfield	77,988	0.22
2	Bassetlaw	70,832	0.18

This works, but it changes the underlying values in the DataFrame to be objects, which we can see by calling the dtypes function:

PYTHON df.dtypes

Output

LTLA Name               object
Population              object
PercentageVaccinated    object
dtype: object

This means that we can’t do any number based computations anymore. For example, if we try to multiple the Population and PercentageVaccinated columns:

PYTHON df.Population * df.PercentageVaccinated

Output

Traceback (most recent call last):
  File "/home/markhneedham/.local/share/virtualenvs/covid-vaccines-xEbcGJTy/lib/python3.8/site-packages/pandas/core/ops/array_ops.py", line 142, in _na_arithmetic_op
    result = expressions.evaluate(op, left, right)
  File "/home/markhneedham/.local/share/virtualenvs/covid-vaccines-xEbcGJTy/lib/python3.8/site-packages/pandas/core/computation/expressions.py", line 235, in evaluate
    return _evaluate(op, op_str, a, b)  # type: ignore[misc]
  File "/home/markhneedham/.local/share/virtualenvs/covid-vaccines-xEbcGJTy/lib/python3.8/site-packages/pandas/core/computation/expressions.py", line 69, in _evaluate_standard
    return op(a, b)
TypeError: can't multiply sequence by non-int of type 'str'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/markhneedham/.local/share/virtualenvs/covid-vaccines-xEbcGJTy/lib/python3.8/site-packages/pandas/core/ops/common.py", line 65, in new_method
    return method(self, other)
  File "/home/markhneedham/.local/share/virtualenvs/covid-vaccines-xEbcGJTy/lib/python3.8/site-packages/pandas/core/arraylike.py", line 105, in __mul__
    return self._arith_method(other, operator.mul)
  File "/home/markhneedham/.local/share/virtualenvs/covid-vaccines-xEbcGJTy/lib/python3.8/site-packages/pandas/core/series.py", line 4998, in _arith_method
    result = ops.arithmetic_op(lvalues, rvalues, op)
  File "/home/markhneedham/.local/share/virtualenvs/covid-vaccines-xEbcGJTy/lib/python3.8/site-packages/pandas/core/ops/array_ops.py", line 189, in arithmetic_op
    res_values = _na_arithmetic_op(lvalues, rvalues, op)
  File "/home/markhneedham/.local/share/virtualenvs/covid-vaccines-xEbcGJTy/lib/python3.8/site-packages/pandas/core/ops/array_ops.py", line 149, in _na_arithmetic_op
    result = _masked_arith_op(left, right, op)
  File "/home/markhneedham/.local/share/virtualenvs/covid-vaccines-xEbcGJTy/lib/python3.8/site-packages/pandas/core/ops/array_ops.py", line 91, in _masked_arith_op
    result[mask] = op(xrav[mask], yrav[mask])
TypeError: can't multiply sequence by non-int of type 'str'

So if we want to use the underlying data for anything else, formatting like this isn’t a good solution.

Instead we can use Pandas styling functionality. The documentation is a bit overwhelming, but Chris Moffitt has a great introductory article on the styling API.

If we want to apply the same formatting to every column, we can pass a style to style.format. e.g. we could restrict every column to 2 decimal places, as shown below:

PYTHON df.style.format("{.2f")

Note	For a description of valid format values, see the Format Specification Mini-Language documentation or Python String Format Cookbook.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~/.local/share/virtualenvs/covid-vaccines-xEbcGJTy/lib/python3.8/site-packages/IPython/core/formatters.py in __call__(self, obj)
    343             method = get_real_method(obj, self.print_method)
    344             if method is not None:
--> 345                 return method()
    346             return None
    347         else:

~/.local/share/virtualenvs/covid-vaccines-xEbcGJTy/lib/python3.8/site-packages/pandas/io/formats/style.py in _repr_html_(self)
    203         Hooks into Jupyter notebook rich display system.
    204         """
--> 205         return self.render()
    206
    207     @doc(

~/.local/share/virtualenvs/covid-vaccines-xEbcGJTy/lib/python3.8/site-packages/pandas/io/formats/style.py in render(self, **kwargs)
    619         self._compute()
    620         # TODO: namespace all the pandas keys
--> 621         d = self._translate()
    622         # filter out empty styles, every cell will have a class
    623         # but the list of props may just be [['', '']].

~/.local/share/virtualenvs/covid-vaccines-xEbcGJTy/lib/python3.8/site-packages/pandas/io/formats/style.py in _translate(self)
    403                     "value": value,
    404                     "class": " ".join(cs),
--> 405                     "display_value": formatter(value),
    406                     "is_visible": (c not in hidden_columns),
    407                 }

~/.local/share/virtualenvs/covid-vaccines-xEbcGJTy/lib/python3.8/site-packages/pandas/io/formats/style.py in <lambda>(x)
   1715 ) -> Callable:
   1716     if isinstance(formatter, str):
-> 1717         formatter_func = lambda x: formatter.format(x)
   1718     elif callable(formatter):
   1719         formatter_func = formatter

ValueError: Unknown format code 'f' for object of type 'str'

That doesn’t work because the LTLA Name column contains string values, which can’t be formatted as a number. We can work around that problem by dropping the LTLA Name column:

df.drop(["LTLA Name"], axis=1).style.format("{:.2f}")

Table 3. Results
	Population	PercentageVaccinated
0	72179.00	0.23
1	77988.00	0.22
2	70832.00	0.18

This works, but we’ve lost the LTLA Name column and the Population column isn’t formatted how we’d like. Instead of passing a single style to style.format, we can instead pass a dictionary of {"column: "style"}. So to style Population with a comma as thousands separator and PercentageVaccinated with two decimal places, we can do the following:

df.style.format({
    "Population": "{:,d}",
    "PercentageVaccinated": "{:.2f}"
})

Table 4. Results
	LTLA Name	Population	PercentageVaccinated
0	Amber Valley	72,179	0.23
1	Ashfield	77,988	0.22
2	Bassetlaw	70,832	0.18

And if we go one step further, we can also use the hide_index function to get rid of the index column:

df.style.format({
    "Population": "{:,d}",
    "PercentageVaccinated": "{:.2f}"
}).hide_index()

Table 5. Results
LTLA Name	Population	PercentageVaccinated
Amber Valley	72,179	0.23
Ashfield	77,988	0.22
Bassetlaw	70,832	0.18

About the author

I'm currently working on short form content at ClickHouse. I publish short 5 minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.