Handling of missing data in the predictor variables when using Tree-based techniques for training and generating predictions.
- Authors
- Cevallos Valdiviezo, Holger
- Format
- MasterThesis
- Status
- publishedVersion
- Description
Missing data are a common issue in real datasets. In a prediction context, handling them incorrectly can lead to biased error estimates and poor predictions. In our study, we focus on missing data in the feature (predictor) variables. We compare ten techniques that deal with missing values, either natively or through an imputation method, and that ultimately use Classification and Regression Trees (CART) or Random Forest (RF) to generate predictions.
- Publication Year
- 2012
- Language
- eng
- Topic
- ALGORITHMS
- BAGGING
- MULTIVARIATE IMPUTATION
- REGRESSION TREES
- Repository
- Repositorio SENESCYT
- Rights
- openAccess
- License
- openAccess