· r-2

# R: Filter a data frame based on values in two columns

In the most recent assignment of the Computing for Data Analysis course we had to filter a data frame which contained N/A values in two columns to only return rows which had no N/A's.

I started with a data frame that looked like this:

``````
> # we'll just use a few rows to make it easier to see what's going on
> data[2494:2500,]
Date sulfate nitrate ID
2494 2007-10-30    3.25   0.902  2
2495 2007-10-31      NA      NA  2
2496 2007-11-01      NA      NA  2
2497 2007-11-02    6.56   1.270  2
2498 2007-11-03      NA      NA  2
2499 2007-11-04      NA      NA  2
2500 2007-11-05    6.10   0.772  2
``````

We want to only return the rows which have a value in both the 'sulfate' and the 'nitrate' columns.

I initially tried to use the Filter function but wasn't very successful:

``````
> smallData <- data[2494:2500,]
> Filter(function(x) !is.na(x\$sulfate), smallData)
Error in x\$sulfate : \$ operator is invalid for atomic vectors
``````

I'm not sure that Filter is designed to filter data frames - it seems more appropriate for lists or vectors - so I ended up filtering the data frame using what I think is called an extract operation:

``````
> smallData[!is.na(smallData\$sulfate) & !is.na(smallData\$nitrate),]
Date sulfate nitrate ID
2494 2007-10-30    3.25   0.902  2
2497 2007-11-02    6.56   1.270  2
2500 2007-11-05    6.10   0.772  2
``````

The code inside the square brackets returns a collection indicating whether or not we should return each row:

``````
> !is.na(smallData\$sulfate) & !is.na(smallData\$nitrate)
[1]  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE
``````

which is equivalent to doing this:

``````
> smallData[c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE),]
Date sulfate nitrate ID
2494 2007-10-30    3.25   0.902  2
2497 2007-11-02    6.56   1.270  2
2500 2007-11-05    6.10   0.772  2
``````

We put a comma after the list of true/false values to indicate that we want to return all the columns otherwise we'd get this error:

``````
> smallData[c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE)]
Error in `[.data.frame`(smallData, c(TRUE, FALSE, FALSE, TRUE, FALSE,  :
undefined columns selected
``````

We could filter the columns as well if we wanted to:

``````
> smallData[c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE), c(1,2)]
Date sulfate
2494 2007-10-30    3.25
2497 2007-11-02    6.56
2500 2007-11-05    6.10
``````

As is no doubt obvious, I don't know much R so if there's a better way to do anything please let me know.

The full code is on github if you're interested in seeing it in context.