Machine learning - dealing with missing values when using the C4.5 technique
I'm trying to build a classifier "model" using classification techniques. Beginning with the C4.5 technique, I faced the problem of missing values, so:
How do I deal with missing values that exist in the data set?
Should the "?" stay in place of a missing attribute value?
There are several ways of dealing with missing values:
- Get the missing data: if possible, try to acquire the missing values.
- Discard the missing data: reduce the available data to a data set with no missing values by discarding the instances or features that contain missing values.
- Imputation: a better strategy is to impute the missing values, i.e., to infer them from the known part of the data. A common approach is to use the mean, median, or most frequent value of the row or column in which the missing value is located. Using multiple imputations is recommended.
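As a rough illustration of the discard and imputation options above, here is a minimal sketch in plain Python. The toy data set, the column meanings, and the helper names `discard_incomplete` and `impute` are all made up for this example; it assumes missing values are marked with "?" as in the question, and it uses the column mean for numeric attributes and the most frequent value for categorical ones. It does a single imputation only, not multiple imputation.

```python
from statistics import mean, mode

MISSING = "?"  # assumed missing-value marker, as in the question

# Hypothetical toy data set: each row is [age, outlook, label]
rows = [
    [25, "sunny", "yes"],
    ["?", "rainy", "no"],
    [35, "?", "yes"],
    [30, "sunny", "yes"],
]

def discard_incomplete(rows):
    """Keep only the instances that contain no missing value."""
    return [r for r in rows if MISSING not in r]

def impute(rows):
    """Fill each missing cell from the known values of its column:
    the mean for numeric columns, the most frequent value otherwise."""
    fills = []
    for col in zip(*rows):
        known = [v for v in col if v != MISSING]
        if all(isinstance(v, (int, float)) for v in known):
            fills.append(mean(known))
        else:
            fills.append(mode(known))
    return [
        [fills[i] if v == MISSING else v for i, v in enumerate(r)]
        for r in rows
    ]

complete = discard_incomplete(rows)  # drops the two rows that contain "?"
filled = impute(rows)                # keeps all four rows, with "?" replaced
```

Discarding is simple but throws away half of this tiny data set, while imputation keeps every instance at the cost of introducing estimated values.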
This paper might help: http://jmlr.csail.mit.edu/papers/volume8/saar-tsechansky07a/saar-tsechansky07a.pdf