r - Error in model.frame.default for Predict(): "Factor has new levels" for a char variable


I have a dataset that I split into test and train datasets. Following the split, I produced a logistic model with:

logmodel1 <- glm(y ~ . - var1 - var2 - var3, data = train, family = binomial)

If I use the model to make predictions on the same train set, there is no error (though of course that is not a super-useful test of the model). I used the code below to predict on the test set:

predictlog1 <- predict(logmodel1, type = "response", newdata = test)

But I get the following error:

Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : factor mycharvar has new levels observation of mycharvar, another...
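The error can be reproduced with a small made-up dataset. This is a sketch under assumptions: the data frames, the single numeric predictor `x` and the values "a", "b", "c" are hypothetical stand-ins for the real train/test sets, and mycharvar is excluded directly with `- mycharvar` rather than through var1 to var3.

```r
# Hypothetical toy data: y is the binary response, x a numeric predictor,
# mycharvar a character column we try to exclude from the model.
train <- data.frame(
  y = c(0, 1, 0, 1, 1, 0, 1, 0, 1, 0),
  x = c(1.2, 3.4, 2.8, 0.5, 3.1, 2.2, 1.0, 0.7, 2.5, 1.9),
  mycharvar = rep(c("a", "b"), 5),
  stringsAsFactors = FALSE
)
# The test set contains the value "c", which never appears in train.
test <- data.frame(
  x = c(1.5, 2.0),
  mycharvar = c("a", "c"),
  stringsAsFactors = FALSE
)

fit <- glm(y ~ . - mycharvar, data = train, family = binomial)

# Predicting on the training data works: every mycharvar value was seen.
train_pred <- predict(fit, type = "response", newdata = train)

# glm still records the training levels of mycharvar ...
print(fit$xlevels)

# ... so predicting on the test data stops with a "new level(s)" error.
test_err <- tryCatch(
  predict(fit, type = "response", newdata = test),
  error = function(e) conditionMessage(e)
)
print(test_err)
```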

Here's what has me particularly confused:

  • mycharvar is a character variable in both the train and test sets. I've confirmed this with str(test$mycharvar) and str(train$mycharvar).
  • My model does not use mycharvar as part of the prediction.

I found an explanation for bullet 2 at this link: "Factor has new levels" error for variable I'm not using

The suggestion there, to remove the character variables altogether from the train and test sets, has given me a workaround, so at least I'm not held up. But it seems pretty inelegant compared with removing them from the model with "-mycharvar". If anyone understands why a character variable in the test set would throw a "factor has new levels" error, I'd be interested.
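One alternative to deleting the columns from the data (not taken from the linked thread) is to build the formula from an explicit list of predictors, so that mycharvar never appears in it at all. A minimal sketch, assuming hypothetical toy data with columns y, x and mycharvar in place of the real train and test sets; `reformulate()` is the base R stats function that turns a character vector of term labels into a formula.

```r
train <- data.frame(
  y = c(0, 1, 0, 1, 1, 0, 1, 0, 1, 0),
  x = c(1.2, 3.4, 2.8, 0.5, 3.1, 2.2, 1.0, 0.7, 2.5, 1.9),
  mycharvar = rep(c("a", "b"), 5),
  stringsAsFactors = FALSE
)
test <- data.frame(
  x = c(1.5, 2.0),
  mycharvar = c("a", "c"),  # "c" never appears in train
  stringsAsFactors = FALSE
)

# Columns to keep out of the model; in the question this would also
# include var1, var2 and var3.
excluded <- c("mycharvar")
predictors <- setdiff(names(train), c("y", excluded))

# Builds y ~ x, so mycharvar is not one of the formula's variables.
f <- reformulate(predictors, response = "y")
fit <- glm(f, data = train, family = binomial)

# No levels are recorded for mycharvar, so the unseen value "c" is ignored.
pred <- predict(fit, type = "response", newdata = test)
print(pred)
```

The data frames themselves stay untouched; only the formula changes.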

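The person who answered the question in the post you linked gave an indication of why mycharvar is still considered in the model: when you use z ~ . - y, the formula expands to z ~ (x + y) - y. The term y is dropped from the model, but y remains one of the formula's variables, so it is still part of the model frame that predict() rebuilds from newdata.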
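The expansion can be checked by inspecting the terms object. A sketch, assuming a hypothetical data frame with columns y, x and mycharvar: the subtracted variable disappears from the "term.labels" attribute but is still listed in the "variables" attribute, which is what model.frame() evaluates.

```r
df <- data.frame(
  y = c(0, 1, 0, 1),
  x = c(1.5, 2.1, 0.3, 1.8),
  mycharvar = c("a", "b", "a", "b"),
  stringsAsFactors = FALSE
)

tt <- terms(y ~ . - mycharvar, data = df)

# Variables the model frame will contain (drop the leading `list` symbol).
vars <- vapply(as.list(attr(tt, "variables"))[-1], deparse, character(1))
# Terms that actually enter the model matrix.
labels <- attr(tt, "term.labels")

print(vars)    # still includes "mycharvar"
print(labels)  # only "x"

# The model frame built from these terms still carries the mycharvar column.
mf <- model.frame(tt, data = df)
print(names(mf))
```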

Now, to answer your other question, consider the following quote from the predict() documentation: "For factor variables having numeric levels, you can specify the numeric values in newdata without first converting the variables to factors. These numeric values are checked to make sure they match a level, then the variable is converted internally to a factor."

I think you can assume the same kind of behaviour occurs for mycharvar: its values are first checked against the corresponding existing levels in the model, and that is where it goes wrong, because the test set contains values of mycharvar that were never encountered while training the model (note that the glm function performs the factor conversion itself and throws a warning when a conversion needs to take place). In summary, the error means the model is unable to make predictions for unknown levels in the test data that it never encountered during training.
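Before calling predict(), the offending values can be listed by comparing newdata against the training levels that glm stores in `fit$xlevels`. A sketch under assumptions: `find_new_levels()` is a hypothetical helper (not part of base R) and the data are toy stand-ins for the real train and test sets.

```r
# Hypothetical helper: for each variable whose training levels are stored in
# fit$xlevels, return the values in newdata that were never seen in training.
find_new_levels <- function(fit, newdata) {
  vars <- names(fit$xlevels)
  out <- lapply(vars, function(v) {
    vals <- unique(as.character(newdata[[v]]))
    vals <- vals[!is.na(vals)]        # NA is not treated as a new level
    setdiff(vals, fit$xlevels[[v]])
  })
  names(out) <- vars
  out[lengths(out) > 0]               # keep only variables with unseen values
}

train <- data.frame(
  y = c(0, 1, 0, 1, 1, 0, 1, 0, 1, 0),
  x = c(1.2, 3.4, 2.8, 0.5, 3.1, 2.2, 1.0, 0.7, 2.5, 1.9),
  mycharvar = rep(c("a", "b"), 5),
  stringsAsFactors = FALSE
)
test <- data.frame(
  x = c(1.5, 2.0, 0.9),
  mycharvar = c("a", "c", "d"),
  stringsAsFactors = FALSE
)

fit <- glm(y ~ . - mycharvar, data = train, family = binomial)

new_lv <- find_new_levels(fit, test)
print(new_lv)  # lists "c" and "d" under mycharvar
```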

This post gives some further clarification on the issue.

