Pandas: Find rows where column/field is null
In my continued playing around with the Kaggle house prices dataset I wanted to find any columns/fields that have null values in.
If we want to get a count of the number of null fields by column we can use the following code, adapted from Poonam Ligade’s kernel:
Prerequisites
import pandas as pd
Count the null columns
train = pd.read_csv("train.csv")
null_columns=train.columns[train.isnull().any()]
train[null_columns].isnull().sum()
LotFrontage 259
Alley 1369
MasVnrType 8
MasVnrArea 8
BsmtQual 37
BsmtCond 37
BsmtExposure 38
BsmtFinType1 37
BsmtFinType2 38
Electrical 1
FireplaceQu 690
GarageType 81
GarageYrBlt 81
GarageFinish 81
GarageQual 81
GarageCond 81
PoolQC 1453
Fence 1179
MiscFeature 1406
dtype: int64
So there are lots of different columns containing null values. What if we want to find the solitary row which has 'Electrical' as null?
Single column is null
print(train[train["Electrical"].isnull()][null_columns])
LotFrontage Alley MasVnrType MasVnrArea BsmtQual BsmtCond BsmtExposure \
1379 73.0 NaN None 0.0 Gd TA No
BsmtFinType1 BsmtFinType2 Electrical FireplaceQu GarageType GarageYrBlt \
1379 Unf Unf NaN NaN BuiltIn 2007.0
GarageFinish GarageQual GarageCond PoolQC Fence MiscFeature
1379 Fin TA TA NaN NaN NaN
And what if we want to return every row that contains at least one null value? That’s not too difficult - it’s just a combination of the code in the previous two sections:
All null columns
print(train[train.isnull().any(axis=1)][null_columns].head())
LotFrontage Alley MasVnrType MasVnrArea BsmtQual BsmtCond BsmtExposure \
0 65.0 NaN BrkFace 196.0 Gd TA No
1 80.0 NaN None 0.0 Gd TA Gd
2 68.0 NaN BrkFace 162.0 Gd TA Mn
3 60.0 NaN None 0.0 TA Gd No
4 84.0 NaN BrkFace 350.0 Gd TA Av
BsmtFinType1 BsmtFinType2 Electrical FireplaceQu GarageType GarageYrBlt \
0 GLQ Unf SBrkr NaN Attchd 2003.0
1 ALQ Unf SBrkr TA Attchd 1976.0
2 GLQ Unf SBrkr TA Attchd 2001.0
3 ALQ Unf SBrkr Gd Detchd 1998.0
4 GLQ Unf SBrkr TA Attchd 2000.0
GarageFinish GarageQual GarageCond PoolQC Fence MiscFeature
0 RFn TA TA NaN NaN NaN
1 RFn TA TA NaN NaN NaN
2 RFn TA TA NaN NaN NaN
3 Unf TA TA NaN NaN NaN
4 RFn TA TA NaN NaN NaN
Hope that helps future Mark!
About the author
I'm currently working on short form content at ClickHouse. I publish short 5 minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.