# R: Error in approxfun(x.values.1, y.values.1, method = "constant", f = 1, : zero non-NA points

I've been following Michy Alice's logistic regression tutorial to create an attendance model for London dev meetups and ran into an interesting problem while doing so.

Our dataset has a class imbalance, i.e. most people RSVP 'no' to events. This can make the accuracy score misleading, because a model that predicts 'no' every time would still appear to be highly accurate.

```
Source: local data frame [2 x 2]
attended n
(dbl) (int)
1 0 1541
2 1 53
```
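To make the imbalance concrete, the accuracy of always predicting 'no' can be worked out directly from the counts above. This is just a small sketch; the variable names are mine and not part of the original analysis.

```r
# Class counts taken from the table above
counts <- c(no = 1541, yes = 53)

# Accuracy of a "model" that always predicts the majority class
baseline_accuracy <- max(counts) / sum(counts)
print(round(baseline_accuracy, 3))
```

That works out to roughly 97% accuracy without the model learning anything about who actually attends.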

I upsampled the minority class using caret's upSample function to avoid this:

```
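library(caret)
library(dplyr)
# upSample() expects the class labels as a factor and returns them in a 'Class' column
attended = as.factor((df %>% dplyr::select(attended))$attended)
upSampledDf = upSample(df %>% dplyr::select(-attended), attended)
upSampledDf$attended = as.numeric(as.character(upSampledDf$Class))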
```

I then trained a logistic regression model, but when I tried to plot the ROC curve with the ROCR package I ran into trouble:

```
p <- predict(model, newdata=test, type="response")
pr <- prediction(p, test$attended)
prf <- performance(pr, measure = "tpr", x.measure = "fpr")
Error in approxfun(x.values.1, y.values.1, method = "constant", f = 1, :
zero non-NA points
```

I don't have any NA values in my data frame, so this message was a bit confusing to start with. As usual, Stack Overflow came to the rescue with the suggestion that I was probably missing positive or negative values for the dependent variable, i.e. 'attended' in my case.

A quick count on the test data frame using dplyr confirmed my mistake:

```
> test %>% count(attended)
Source: local data frame [1 x 2]
attended n
(dbl) (int)
1 1 582
```
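A check like this could be run before computing any ROC metrics, so the problem surfaces with a clearer message. The following is a minimal base R sketch; `check_both_classes` is a hypothetical helper of my own, not something provided by ROCR or caret.

```r
# Hypothetical helper: stop early if the labels do not contain both classes
check_both_classes <- function(labels) {
  classes <- unique(labels[!is.na(labels)])
  if (length(classes) < 2) {
    stop("labels contain only one class: ", paste(classes, collapse = ", "))
  }
  invisible(TRUE)
}

# Example with labels shaped like the test set above (582 rows, all 1s)
labels <- rep(1, 582)
result <- tryCatch(check_both_classes(labels),
                   error = function(e) conditionMessage(e))
print(result)
```

In the real analysis the call would be something like `check_both_classes(test$attended)` just before `prediction()`.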

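Presumably the upsampled data frame ended up grouped by class, so splitting it by row position left only one class in the test set. I'll have to randomly shuffle the data frame and then reassign my training and test data frames to work around it.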
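The shuffle-and-split step might look something like the sketch below. It uses a toy data frame in place of the real one, and the seed, the 70/30 split, and the names `toy`, `shuffled` and `n_train` are all assumptions for illustration.

```r
set.seed(42)  # arbitrary seed, only for reproducibility

# Toy stand-in for the upsampled data frame: rows grouped by class
toy <- data.frame(x = rnorm(200), attended = rep(c(0, 1), each = 100))

# Shuffle the rows, then take the first 70% for training
shuffled <- toy[sample(nrow(toy)), ]
n_train <- floor(0.7 * nrow(shuffled))
train <- shuffled[seq_len(n_train), ]
test <- shuffled[(n_train + 1):nrow(shuffled), ]

# Both classes should now appear in the test set
print(table(test$attended))
```

Because the rows are shuffled before splitting, the test set no longer depends on the order in which the classes were stacked.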