Handling of missing data in the predictor variables when using tree-based techniques for training and generating predictions

Authors
Cevallos Valdiviezo, Holger
Format
MasterThesis
Status
publishedVersion
Description

Missing data are a common problem in real datasets. In a prediction context, handling them incorrectly can lead to biased error estimates and poor predictions. This study focuses on missing values in the predictor (feature) variables. We compare ten techniques that handle missing values, either directly or by means of an imputation method, and that then use Classification and Regression Trees (CART) or Random Forest (RF) to generate predictions.

Publication Year
2012
Language
eng
Topic
ALGORITHMS
BAGGING
MULTIVARIATE IMPUTATION
REGRESSION TREES
Repository
Repositorio SENESCYT
Get full text
http://repositorio.educacionsuperior.gob.ec/handle/28000/287
Rights
openAccess
License
openAccess