Wart treatment method selection using AdaBoost with random forests as a weak learner

Selecting a wart treatment method using machine learning has become a concern for researchers. Machine learning is expected to assign wart treatments such as cryotherapy and immunotherapy to patients appropriately. In this study, the data used were the cryotherapy and immunotherapy datasets. This study aims to improve the accuracy of wart treatment selection with machine learning. Several algorithms have previously been proposed that provide good accuracy for this task. However, the existing results still need improvement to achieve a better level of accuracy so that treatment selection can satisfy patients. The purpose of this study is to increase accuracy by improving the performance of the weak learner in an ensemble machine learning algorithm. AdaBoost is used in this study as the strong learner and Random Forest (RF) as the weak learner. Furthermore, stratified 10-fold cross validation is used to evaluate the proposed algorithm. The experimental results show accuracies of 96.6% and 91.1% on cryotherapy and immunotherapy, respectively.


Introduction
Warts are caused by the human papillomavirus, which produces growths on areas of the body. Commonly, the virus is transmitted in just a few areas, such as the hands, feet, and genitals. The types most often found in patients are common and plantar warts, although some people also suffer from the genital type. The virus can be transmitted to other people through contact with patients, through a vulnerable immune system, and during childhood or adolescence [1,2]. Some warts are not felt and do not cause symptoms. However, plantar warts often make the patient feel pain because they are frequently found on the soles of the feet [1,3]. Treatment of wart disease has been studied by many researchers in the medical field. Sometimes warts can be cured with simple treatment, but some require more in-depth treatment [4].
Treatment of warts can give benefits but also side effects to patients; both can occur in terms of cost and duration of treatment. Cryotherapy is a wart treatment that uses salicylic acid or liquid nitrogen. This treatment is most often recommended for plantar warts because it is more effective, although it is also applied to common warts [1,5,6]. On the other hand, immunotherapy can treat warts such as common and plantar warts, but this treatment is based on activating the patient's immune system to deal with the virus and suppress its activity [3,7]. Thus, medical experts assume that there is no definite method for selecting a therapy for healing [4].
In this case, researchers assumed that the choice of therapy (cryotherapy or immunotherapy) for the treatment of warts (common, plantar, and both) can be supported by machine learning. References [8][9][10] proposed a rule-based fuzzy expert system to process data from 180 patients collected at the Ghaem dermatology clinic. The rule-based fuzzy algorithm provides fairly good accuracy. However, the authors suggested developing other machine learning algorithms to improve the success rate, because they obtained accuracies of only 80% on cryotherapy and 83.33% on immunotherapy. Reference [11] proposed random forest for feature weighting and C4.5 as the classifier, obtaining accuracies of 93.3% on cryotherapy and 84.4% on immunotherapy. Furthermore, reference [12] aimed to increase the accuracy on the cryotherapy and immunotherapy datasets by proposing a decision tree algorithm. The results showed accuracies of 94.4% and 90% for cryotherapy and immunotherapy, respectively, which is better than the previous methods. However, the accuracy can still be improved by other methods.
The purpose of this study is to improve accuracy by using a boosting algorithm, which is designed to boost a weak learner classifier [13]. A weak learner, or weak hypothesis, is a classifier whose accuracy on labeled data is better than random guessing but which may fail if used directly for classification, due to the simplicity and limitations of a single-classifier system. In contrast, a strong learner is well correlated with the true labels [14]. The weak learner produces only a weak hypothesis, and the boosting algorithm improves that weak hypothesis to provide a good result. The final hypothesis generated by the boosting algorithm is influenced by the results of the weak learner classifier: if the weak learner provides a good result, then the boosting algorithm can provide a good result too, or even better. Therefore, to obtain a good result from the weak learner, random forest is chosen in this study. Random Forest [15] is a well-known strong learner because it combines two techniques, bagging and decision trees. The random forest algorithm, which is able to overcome both bias and variance, is thus very useful for providing a good result as the weak learner classifier. The RF algorithm is treated as the weak learner in this case.
This study improves the level of accuracy beyond previous research by using the AdaBoost algorithm. First, AdaBoost is used as the strong learner, whose role is to improve the performance of the weak learner, and RF is the weak learner combined with AdaBoost. Second, this study also used an algorithm proposed by previous researchers [12], namely boosted trees; however, the boosted-trees algorithm was implemented only to compare against the algorithm proposed in this study. In addition, a decision tree was also used for comparison, namely the Classification and Regression Tree (CART), which is similar to ID3.

Datasets
This study used two datasets that are available in the UCI Machine Learning Repository [16][17]. The datasets contain 180 samples of patients with common, plantar, and both types of warts, collected from the Ghaem hospital dermatology clinic [8][9][10]. The data are divided into two datasets, namely cryotherapy and immunotherapy.
The cryotherapy data contain 90 patients treated with cryotherapy using liquid nitrogen. These data consist of six input features and a single output to be classified; details are given in Table 1. The immunotherapy data also contain 90 patients, treated with immunotherapy using candida antigen. These data have the same features as cryotherapy, with one additional feature (induration diameter); details are given in Table 2.

Mutual Information
Mutual information is calculated between two random variables $X$ and $Y$. It measures how much the uncertainty of one variable is reduced by knowing the other. Eq. 1 [18][19] defines mutual information as follows:

$$I(X;Y) = \sum_{x \in X} \sum_{y \in Y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)} \quad (1)$$
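As an illustration (not the authors' code), Eq. 1 can be computed directly from empirical joint and marginal probabilities for discrete features:

```python
import numpy as np

def mutual_information(x, y):
    """Empirical mutual information I(X;Y) between two discrete arrays (Eq. 1)."""
    x, y = np.asarray(x), np.asarray(y)
    mi = 0.0
    for xv in np.unique(x):
        for yv in np.unique(y):
            p_xy = np.mean((x == xv) & (y == yv))  # joint probability p(x, y)
            p_x = np.mean(x == xv)                 # marginal p(x)
            p_y = np.mean(y == yv)                 # marginal p(y)
            if p_xy > 0:
                mi += p_xy * np.log2(p_xy / (p_x * p_y))
    return mi

# A perfectly informative binary feature yields I(X;Y) = H(Y) = 1 bit
print(mutual_information([0, 0, 1, 1], [0, 0, 1, 1]))  # 1.0
# An irrelevant feature carries no information about the target
print(mutual_information([0, 1, 0, 1], [0, 0, 1, 1]))  # 0.0
```

In practice `sklearn.feature_selection.mutual_info_classif` offers the same ranking for dataframe columns; the toy function above only makes the definition concrete.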

AdaBoost
Adaptive boosting (AdaBoost) is an algorithm introduced by Freund and Schapire [13] in 1995. It strengthens the performance of a weak learner algorithm so that a relatively strong classifier can be created. Each training example $i$ carries a weight $w_i^t$, and the normalized distribution vector is shown in Eq. 3:

$$p_i^t = \frac{w_i^t}{\sum_{j=1}^{N} w_j^t} \quad (3)$$

where $t = 1, \ldots, T$ and $T$ is the number of iterations. The algorithm conducts several rounds of training to generate hypotheses by maintaining the probability distribution $p^t$ (the weights). In each round the weak learner is applied to produce a hypothesis $h_t : X \to [0,1]$, so the weak learner produces a sequence of weak hypotheses. The error of hypothesis $h_t$ is calculated with Eq. 4 [13]:

$$\epsilon_t = \sum_{i=1}^{N} p_i^t \, |h_t(x_i) - y_i| \quad (4)$$

Each weak hypothesis $h_t$ is required to have a prediction error less than ½. The parameter $\beta_t$ (Eq. 5) is then used to update the weights (Eq. 6) [13,20]:

$$\beta_t = \frac{\epsilon_t}{1 - \epsilon_t} \quad (5)$$

$$w_i^{t+1} = w_i^t \, \beta_t^{\,1 - |h_t(x_i) - y_i|} \quad (6)$$

Eq. 6 calculates the new weight from the previously calculated weight and $\beta_t$ from Eq. 5. Finally, Eq. 7 [20] gives the final hypothesis:

$$h_f(x) = \begin{cases} 1 & \text{if } \sum_{t=1}^{T} \left(\log \tfrac{1}{\beta_t}\right) h_t(x) \ge \tfrac{1}{2} \sum_{t=1}^{T} \log \tfrac{1}{\beta_t} \\ 0 & \text{otherwise} \end{cases} \quad (7)$$

The final hypothesis $h_f$ is generated from a combination of the weak hypotheses' outputs using a weighted majority vote.
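A minimal sketch of one boosting round (Eqs. 3–6) on toy data may make the update concrete; the labels and the weak hypothesis output here are invented for illustration, assuming binary values in {0, 1}:

```python
import numpy as np

# One round of the AdaBoost weight update (Eqs. 3-6) on toy data.
y = np.array([1, 1, 0, 0, 1])      # true labels
h = np.array([1, 0, 0, 0, 1])      # weak hypothesis output (one mistake)
w = np.ones(len(y))                # initial (unnormalized) weights

p = w / w.sum()                    # Eq. 3: normalized distribution
loss = np.abs(h - y)               # per-example loss |h(x_i) - y_i|
eps = np.sum(p * loss)             # Eq. 4: weighted error (must be < 1/2)
beta = eps / (1.0 - eps)           # Eq. 5
w_new = w * beta ** (1.0 - loss)   # Eq. 6: correct examples are down-weighted

print(eps)    # 0.2 -- one of five examples is misclassified
print(w_new)  # misclassified example keeps weight 1.0, the rest shrink to 0.25
```

The effect is that the next round's distribution concentrates on the examples the current weak hypothesis got wrong.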

Random Forest
Random Forest is an algorithm consisting of a collection of random trees. Each tree is built from the training set and a random vector $\Theta_k$. The trees form a set of classifiers $\{h(x, \Theta_k),\ k = 1, \ldots\}$, where the $\Theta_k$ are independent, identically distributed random vectors, and each tree casts a vote for the best class. The margin function is given in Eq. 8 [15]:

$$mg(X,Y) = \mathrm{av}_k\, I(h_k(X) = Y) - \max_{j \ne Y} \mathrm{av}_k\, I(h_k(X) = j) \quad (8)$$

where $I(\cdot)$ is the indicator function and $\mathrm{av}_k$ denotes the average over the trees. The margin function measures the number of votes at $X, Y$ for the right class relative to any other class. If $mg(X,Y) > 0$, the voting classifier has a high level of accuracy; conversely, if $mg(X,Y) < 0$, the voting classifier has a high error rate. To quantify the error level, RF uses the generalization error (Eq. 9) [15]:

$$PE^{*} = P_{X,Y}\big(mg(X,Y) < 0\big) \quad (9)$$

where the subscript $X, Y$ indicates that the probability is over the $X, Y$ space.
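The voting and margin mechanics can be sketched with scikit-learn; this is an illustration on synthetic stand-in data (not the paper's experiment), restricted to the binary case where the strongest wrong class is simply the other class:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Each tree in the forest votes; the margin mg(X, Y) of Eq. 8 compares the
# fraction of votes for the true class against the wrong class.
X, y = make_classification(n_samples=200, n_features=7, random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

votes = np.stack([tree.predict(X) for tree in rf.estimators_])  # (trees, samples)
p_true = np.mean(votes == y, axis=0)   # av_k I(h_k(X) = Y)
p_wrong = np.mean(votes != y, axis=0)  # binary case: the only other class
margin = p_true - p_wrong              # mg(X, Y), in [-1, 1]

# Eq. 9 estimated as the fraction of examples with a negative margin
print(np.mean(margin < 0))
```

Note that evaluating the margin on the training set, as here, is optimistic; Breiman's $PE^{*}$ refers to the probability over the joint $X, Y$ distribution.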

Classification and Regression Trees
CART is a classification algorithm that is similar to C4.5 in how it builds decision trees. The decision tree produces predictions based on historical data [21]. Classification trees measure prediction error by misclassification, whereas regression trees measure prediction error on numerical predicted values [22].
Initially, CART evaluates all possible splits on all input features and chooses the best one. The best split of a feature is selected and used as the root node. The same process is then applied to determine the nodes below the root until all nodes are in place without overlapping. Usually, the split in CART is chosen based on the Gini index (Eq. 10) [21]:

$$Gini = 1 - \sum_{i=1}^{c} p_i^2 \quad (10)$$

where $c$ is the number of classes and $p_i$ is the probability of class $i$.
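Eq. 10 is simple enough to state as a few lines of code; the function below is an illustrative sketch, not part of the paper's pipeline:

```python
import numpy as np

def gini_index(labels):
    """Gini impurity of a set of class labels: 1 - sum_i p_i^2 (Eq. 10)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

print(gini_index([0, 0, 1, 1]))  # 0.5 -- maximally impure binary node
print(gini_index([1, 1, 1, 1]))  # 0.0 -- pure node, no further split needed
```

CART picks the split whose weighted child impurities reduce this value the most.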

Proposed Method
The purpose of this study is to improve classification performance on two datasets, namely cryotherapy and immunotherapy. We propose several steps to determine the features that affect accuracy. Features are selected using mutual information, which represents how closely a feature X relates to the target Y. The mutual information values obtained on both the cryotherapy and immunotherapy datasets are then sorted from highest to lowest; the details can be seen in Table 3 and Table 4.
The feature selection process uses sequential forward selection (SFS) [23], also known as bottom-up search. Initially, the subset of features is empty. The candidate feature with the highest mutual information value is then added one at a time to the subset of selected features [24]. The selected features are used for machine learning training, and the number of features is determined based on the results obtained. To ensure the performance of the algorithm, stratified 10-fold cross validation is applied; cross validation is very helpful in overcoming the problems of bias and variance [25]. After that, the RF algorithm is combined with the boosting algorithm. The steps are given in Fig. 1.
Performance measures are applied to evaluate the algorithm: accuracy (Eq. 11), sensitivity (Eq. 12), specificity (Eq. 13), and informedness (Eq. 14). Accuracy describes the general performance of the classifier:

$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN} \quad (11)$$

Sensitivity measures performance on positive cases, and specificity measures performance on negative cases:

$$Sensitivity = \frac{TP}{TP + FN} \quad (12)$$

$$Specificity = \frac{TN}{TN + FP} \quad (13)$$

Informedness combines both sensitivity and specificity (Eq. 14) [11,26].
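The four measures follow directly from confusion-matrix counts; as a check, the counts below are the paper's own reported cryotherapy results for AdaBoost (with RF):

```python
# Eqs. 11-14 computed from the confusion-matrix counts reported in the paper
# for AdaBoost (with RF) on the cryotherapy dataset.
TP, TN, FP, FN = 45, 42, 0, 3

accuracy = (TP + TN) / (TP + TN + FP + FN)    # Eq. 11
sensitivity = TP / (TP + FN)                  # Eq. 12
specificity = TN / (TN + FP)                  # Eq. 13
informedness = sensitivity + specificity - 1  # Eq. 14

print(round(accuracy, 3))      # 0.967 (the paper reports 96.6%)
print(round(informedness, 3))  # 0.938
```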

Results and Discussion
In this study, the data used are the cryotherapy and immunotherapy datasets. There are six attributes for cryotherapy and seven attributes for immunotherapy, with a single target in both datasets. Scikit-learn (sklearn) [27] is a Python library that implements machine learning algorithms; both datasets are processed using this library.
$$Informedness = (Sensitivity + Specificity) - 1 \quad (14)$$

Two steps are conducted. First, mutual information is used to select the features that contribute to the classification; feature selection is implemented using the SFS method. Second, AdaBoost is used to improve the performance of the weak learner algorithm; in this case, RF is used as the weak learner.
The features are selected based on experiments. To find the number of features k, experiments were conducted for k = 1 to k = 4, because only four features have a nonzero mutual information value. Based on these experiments, the best number of features is k = 4 for cryotherapy and k = 3 for immunotherapy. The selected features are the first four features in Table 3 (cryotherapy) and the first three features in Table 4 (immunotherapy).
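The k-selection experiment can be sketched as follows, again on synthetic stand-in data (the real inputs are the UCI files) and with a plain RF scorer and assumed hyperparameters for brevity:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Rank features by mutual information, then score the top-k subsets
# (k = 1 .. 4, as in the paper) and keep the best k.
X, y = make_classification(n_samples=90, n_features=6, n_informative=4,
                           random_state=0)

mi = mutual_info_classif(X, y, random_state=0)
ranked = np.argsort(mi)[::-1]  # feature indices, highest MI first
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

best_k, best_score = None, -1.0
for k in range(1, 5):
    cols = ranked[:k]
    score = cross_val_score(
        RandomForestClassifier(n_estimators=50, random_state=0),
        X[:, cols], y, cv=cv, scoring="accuracy",
    ).mean()
    if score > best_score:
        best_k, best_score = k, score
print(best_k, round(best_score, 3))
```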
In this study, stratified 10-fold cross validation is used. The results on the cryotherapy dataset are TP = 45, TN = 42, FP = 0, FN = 3 for AdaBoost (with RF) and TP = 46, TN = 41, FP = 1, FN = 2 for AdaBoost (with CART). On immunotherapy, the results are TP = 69, TN = 13, FP = 6, FN = 2 for AdaBoost (with RF) and TP = 68, TN = 13, FP = 6, FN = 3 for AdaBoost (with CART). The accuracy, sensitivity, and specificity can be seen in Table 5. Both AdaBoost (with RF) and AdaBoost (with CART) provide a good accuracy of 96.6% on cryotherapy. On immunotherapy, AdaBoost (with RF) provides a better result than AdaBoost (with CART), namely 91.1%. Since the cryotherapy accuracies are the same, informedness is used to determine which algorithm is better (Table 6). Table 6 shows that for cryotherapy classification AdaBoost (with RF) gives better informedness, with a difference of 0.003 from AdaBoost (with CART). For immunotherapy, AdaBoost (with RF) also gives a better result, with a difference of 0.14 over AdaBoost (with CART). The informedness results thus show that AdaBoost (with RF) is better than AdaBoost (with CART) on both the cryotherapy and immunotherapy datasets.

Table 7 shows the comparison between previous studies and our proposed method for cryotherapy. On cryotherapy, AdaBoost (with RF) is better in accuracy and specificity, while AdaBoost (with CART) is better in accuracy and sensitivity. Thus, both AdaBoost (with CART) and AdaBoost (with RF) outperform previous studies. Because they achieve the same accuracy, informedness is again used, and it can be concluded that AdaBoost (with RF) is better than AdaBoost (with CART) and ID3 on cryotherapy. The class imbalance problem makes it difficult to achieve high classification accuracy; therefore, RF was used as the weak learner to overcome the class imbalance problem. The comparison results on immunotherapy show that AdaBoost (with RF) has the best performance, slightly outperforming ID3 and AdaBoost (with CART). It can be concluded that RF as the weak learner can overcome the class imbalance problem on immunotherapy, and that AdaBoost can boost the weak learner to achieve better performance on both cryotherapy and immunotherapy classification. This classification can be used to select the proper treatment for wart disease.

Conclusion
The experiments conducted on the cryotherapy and immunotherapy datasets show that AdaBoost (with RF) and AdaBoost (with CART) are able to provide higher performance than previous studies. For future work, the classification performance on the cryotherapy and immunotherapy datasets can still be improved. In addition, the class imbalance problem on immunotherapy can be addressed with sampling methods; future work can try sampling methods such as RUS, ROS, SMOTE, and others.

Table 3 .
Mutual information for cryotherapy dataset

Table 4 .
Mutual information for immunotherapy dataset

Table 5 .
The result of classifier performance on two datasets

Table 7 .
Comparison results of cryotherapy

Table 8 .
Comparison results of immunotherapy