A common, but incorrect, method of handling missing data is to exclude cases with missing values; this is both inefficient and risks introducing bias into the analysis. The experimental results show that the proposed method produces better classification accuracy, and its complexity is comparable to that of the reduced-error pruning and minimum-error pruning approaches.
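To make the cost of complete-case exclusion concrete, here is a toy, hypothetical illustration (not drawn from the paper's experiments): when missingness is related to the variables under study, dropping incomplete cases both shrinks the sample and shifts the resulting estimates.

```python
# Toy data: (x, y) pairs where y is missing exactly for the largest x values,
# so missingness is systematically related to x (illustrative values only).
rows = [
    (1.0, 10.0),
    (2.0, 20.0),
    (3.0, 30.0),
    (4.0, None),   # missing outcome
    (5.0, None),   # missing outcome
]

# Complete-case (listwise) deletion: discard every row with a missing value.
complete = [(x, y) for x, y in rows if y is not None]

mean_x_all = sum(x for x, _ in rows) / len(rows)               # 3.0
mean_x_complete = sum(x for x, _ in complete) / len(complete)  # 2.0

print(len(rows), len(complete))          # 5 rows, only 3 survive deletion
print(mean_x_all, mean_x_complete)       # the surviving sample is shifted
```

Because the discarded rows all have large `x`, the complete-case mean of `x` drops from 3.0 to 2.0: the analysis is run on fewer cases and on a sample that no longer represents the original data.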
This study investigated the influence of pruning on accuracy and tree size. The study revealed that when the class probabilities at the leaves were estimated with the Laplace correction, all pruning methods improved [16].
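The Laplace correction mentioned above replaces the raw relative frequency at a leaf with a smoothed estimate, (n_c + 1) / (n + k) for k classes; a minimal sketch (the function name and data are illustrative, not the paper's code):

```python
def laplace_probability(class_counts, target_class):
    """Laplace-corrected class probability at a leaf.

    Instead of the raw relative frequency n_c / n, each class count is
    incremented by 1, giving (n_c + 1) / (n + k) for k classes. This
    avoids extreme 0 or 1 probability estimates at small leaves.
    """
    n = sum(class_counts.values())
    k = len(class_counts)
    return (class_counts.get(target_class, 0) + 1) / (n + k)

# A leaf covering 3 positive and 0 negative training examples:
counts = {"pos": 3, "neg": 0}
print(laplace_probability(counts, "pos"))  # 4/5 = 0.8 rather than 1.0
print(laplace_probability(counts, "neg"))  # 1/5 = 0.2 rather than 0.0
```

The smoothing matters most at exactly the small leaves that pruning decisions depend on, which is one plausible reason the correction helped every pruning method in the cited comparison.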
Like stepwise variable selection in regression analysis, decision tree methods can be used to select the most relevant input variables for forming decision tree models, which can in turn be used to formulate clinical hypotheses and inform further research.
During the data classification process, some branches of the decision tree may reflect noise or outliers in the training data; this results in a complex tree that is difficult to understand.
Both discrete and continuous variables can be used either as target variables or as independent variables. Several lines of research, discussed in detail in the Related Works section, have been conducted to store and manipulate this valuable data for further decision making.
Whereas post-pruning algorithms estimate the misclassification error at each decision node, the PBMR method instead estimates the risk rate of a node and its leaf and propagates this risk up the tree.
In pre-pruning, pruning is carried out during tree construction, stopping the building process when over-fitting is encountered. Pruning is applied to combat the over-fitting problem: the tree is pruned back with the goal of identifying the decision tree with the lowest error rate on previously unobserved instances, breaking ties in favour of smaller trees with high accuracy.
The proposed algorithm adopts a bottom-up post-pruning method for C4.5. The research in [14] performed an empirical comparison of five pruning methods; the results showed that critical-value pruning, error-complexity pruning and reduced-error pruning outperformed pessimistic-error pruning and minimum-error pruning in terms of tree size and accuracy.
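As a rough sketch of how bottom-up reduced-error pruning operates (the `Node` class and helper functions below are illustrative assumptions, not the implementation evaluated in [14]): each subtree is replaced by a majority-class leaf whenever the replacement does not increase the error on a held-out validation set, with ties resolved in favour of the smaller tree.

```python
from collections import Counter

class Node:
    def __init__(self, feature=None, threshold=None,
                 left=None, right=None, prediction=None):
        self.feature = feature        # index of the split feature
        self.threshold = threshold    # x[feature] <= threshold goes left
        self.left, self.right = left, right
        self.prediction = prediction  # class label if this is a leaf

    def is_leaf(self):
        return self.prediction is not None

def predict(node, x):
    while not node.is_leaf():
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.prediction

def errors(node, X, y):
    return sum(predict(node, x) != t for x, t in zip(X, y))

def reduced_error_prune(node, X, y):
    """Bottom-up: replace a subtree with a majority-class leaf whenever
    that does not increase error on the validation set (X, y)."""
    if node.is_leaf() or not y:
        return node  # nothing to prune, or no validation data reaches here
    # Route the validation data through the split and prune children first.
    left_X, left_y, right_X, right_y = [], [], [], []
    for x, t in zip(X, y):
        if x[node.feature] <= node.threshold:
            left_X.append(x); left_y.append(t)
        else:
            right_X.append(x); right_y.append(t)
    node.left = reduced_error_prune(node.left, left_X, left_y)
    node.right = reduced_error_prune(node.right, right_X, right_y)
    # Candidate leaf predicts the majority validation class at this node.
    leaf = Node(prediction=Counter(y).most_common(1)[0][0])
    if errors(leaf, X, y) <= errors(node, X, y):
        return leaf  # ties broken in favour of the smaller tree
    return node

# A split whose right branch is wrong on all validation data is pruned away:
tree = Node(feature=0, threshold=0.5,
            left=Node(prediction=0), right=Node(prediction=1))
pruned = reduced_error_prune(tree, [[0.0], [1.0], [1.0]], [0, 0, 0])
print(pruned.is_leaf(), pruned.prediction)  # True 0
```

Pre-pruning would instead have blocked that split while the tree was being grown; the bottom-up order used here ensures each node is judged only after its subtrees have already been simplified.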