Python/pandas: Column value in list (ValueError: The truth value of a Series is ambiguous.)
I’ve been using Python’s pandas library while exploring some CSV files and although for the most part I’ve found it intuitive to use, I had trouble filtering a data frame based on checking whether a column value was in a list.
A subset of one of the CSV files I’ve been working with looks like this:
$ cat foo.csv
"Foo"
1
2
3
4
5
6
7
8
9
10
Loading it into a pandas data frame is reasonably simple:
import pandas as pd
df = pd.read_csv('foo.csv', index_col=False, header=0)
>>> df
Foo
0 1
1 2
2 3
3 4
4 5
5 6
6 7
7 8
8 9
9 10
If we want to find the rows which have a value of 1 we’d write the following:
>>> df[df["Foo"] == 1]
Foo
0 1
Finding the rows with a value less than 7 is as you’d expect too:
>>> df[df["Foo"] < 7]
Foo
0 1
1 2
2 3
3 4
4 5
5 6
Next I wanted to filter out the rows containing odd numbers which I initially tried to do like this:
odds = [i for i in range(1,10) if i % 2 <> 0]
>>> odds
[1, 3, 5, 7, 9]
>>> df[df["Foo"] in odds]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/markneedham/projects/neo4j-himym/himym/lib/python2.7/site-packages/pandas/core/generic.py", line 698, in __nonzero__
.format(self.__class__.__name__))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Unfortunately that doesn’t work and I couldn’t get any of the suggestions from the error message to work either. Luckily pandas has a special http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.isin.html function for this use case which we can call like this:
>>> df[df["Foo"].isin(odds)]
Foo
0 1
2 3
4 5
6 7
8 9
Much better!
About the author
I'm currently working on short form content at ClickHouse. I publish short 5 minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.