          Review for Ensemble Methods in Machine Learning, Thomas G. Dietterich 
          Summary 
          Ensemble learning combines the decisions of a set of classifiers, typically by voting, in
          order to reach a more accurate decision. An ensemble works better than any of its individual
          members only if each individual hypothesis is accurate, that is, more than 50% accurate, and
          the hypotheses are diverse, so that an error made by one classifier is seldom shared by the
          others and the majority vote is able to correct it.
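
          The effect of the two conditions can be checked with a short calculation. The sketch below
          assumes the classifiers err independently and all share the same error rate, which is an
          idealization; the function name and the ensemble size of 21 are chosen here for illustration.

```python
from math import comb


def majority_vote_error(n_classifiers: int, error_rate: float) -> float:
    """Probability that a strict majority of the classifiers is wrong.

    Assumes every classifier errs independently with the same error rate
    (an idealization used only for this illustration).
    """
    majority = n_classifiers // 2 + 1
    return sum(
        comb(n_classifiers, k)
        * error_rate ** k
        * (1 - error_rate) ** (n_classifiers - k)
        for k in range(majority, n_classifiers + 1)
    )


if __name__ == "__main__":
    # Error rates below 0.5 shrink under voting; above 0.5 they grow.
    for p in (0.3, 0.45, 0.6):
        print(p, round(majority_vote_error(21, p), 4))
```

          With an individual error rate below 0.5 the voted error falls below the individual error,
          and with an error rate above 0.5 voting makes things worse, which is why the accuracy
          condition matters.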
           
          Three fundamental reasons explain why ensemble methods work: 1. Statistical: with a finite
          number of samples, the learning algorithm cannot resolve every uncertainty and settle on a
          single hypothesis; instead it finds a number of equally good candidate hypotheses, and voting
          over a combination of them avoids the risk of selecting a single bad one. 2. Computational:
          many learning algorithms get stuck in local optima, and following multiple search paths
          through the hypothesis space increases the chance of finding the global optimum.
          3. Representational: a hypothesis is limited by the knowledge representation of the learning
          algorithm, and a weighted combination of hypotheses may extend its representational power.
           
          The author describes several methods for constructing ensembles: Enumerating the
          Hypotheses: combining the votes of the possible hypotheses to make a final decision;
          Manipulating the Training Examples: resampling the training set to generate multiple
          hypotheses; Manipulating the Input Features: training on different subsets of the features;
          Manipulating the Output Targets: creating multiple hypotheses, each trained on a different
          grouping of the targets; Injecting Randomness: adding randomness to the learning algorithm.
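
          As an illustration of manipulating the training examples, the sketch below shows bagging:
          each model is trained on a bootstrap replicate of the training set and the models vote. The
          decision-stump base learner, the toy one-dimensional data, and all function names are
          assumptions made for this example, not details taken from the paper.

```python
import random
from collections import Counter


def train_stump(samples):
    """Fit a one-feature threshold classifier (decision stump) on (x, label) pairs."""
    best = None
    for threshold in sorted({x for x, _ in samples}):
        for above in (0, 1):  # label predicted when x >= threshold
            errors = sum(
                (above if x >= threshold else 1 - above) != y for x, y in samples
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, above)
    _, threshold, above = best
    return lambda x: above if x >= threshold else 1 - above


def bagging(samples, n_models, rng):
    """Train each model on a bootstrap replicate (draws with replacement)."""
    models = []
    for _ in range(n_models):
        replicate = [rng.choice(samples) for _ in samples]
        models.append(train_stump(replicate))
    return models


def vote(models, x):
    """Unweighted majority vote over the ensemble."""
    return Counter(model(x) for model in models).most_common(1)[0][0]


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy data: the label is 1 exactly when x >= 5.0.
    data = [(x / 10, int(x >= 50)) for x in range(100)]
    ensemble = bagging(data, n_models=11, rng=rng)
    print(vote(ensemble, 2.0), vote(ensemble, 8.0))
```

          An odd number of models is used so that a two-class vote cannot tie.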
           
          The paper compares the performance of C4.5, AdaBoost, bagging, and randomized tree
          ensembles. The results indicate that when the problem is not very complex, the three reasons
          for using an ensemble are absent and a single classifier handles it well; otherwise an
          ensemble can provide better results. In general, AdaBoost performs best when the training
          set contains little noise; otherwise it overfits the noise. The author argues that because
          AdaBoost aggressively enlarges the margins on the training examples, it should overfit
          easily, but its stage-wise nature keeps this from happening more often.
          	
  
          Critique
          This article is more a survey than a research paper, although it reports some experimental
          results on the performance of different ensemble methods. A survey gives newcomers helpful
          information about a particular topic. The article is presented in a reasonable layout that
          makes it readable and informative. The introduction explains what ensemble methods in
          machine learning are and how they may work. The three fundamental reasons then motivate
          ensembles by pointing to problems shared by most machine learning algorithms, which
          underlines the importance of ensemble methods and draws the audience into reading further.
          Next, the methods of constructing ensembles are described, which informs the reader about
          the research achievements in ensemble methods and is of practical use. Finally, the
          comparison of ensemble methods shows the limitations and advantages of the different kinds
          of ensemble methods. Thanks to this well-formed structure, readers can gain a more concrete
          understanding of ensemble learning.
          	
  
          Review for Designing Efficient Cascaded Classifiers: Tradeoff between Accuracy 
          and Cost, Vikas C. Raykar, Balaji Krishnapuram, Shipeng Yu 
          Summary 
          This research paper proposes a new method for training cascades of classifiers, called soft
          cascades in contrast to traditional (hard) cascades. It states that the conventional method
          has three problems that the proposed method can solve. Joint training of all stages: a
          cascade is generally trained sequentially, but a soft cascade can be trained in one pass, and
          the thresholds for each classifier can be set in a post-processing step. Tradeoff between
          accuracy and cost: traditional cascades do not explicitly address the tradeoff between
          accuracy and cost, whereas this method can be tuned to stress either need. Computational
          cost of training: adjusting the thresholds in a post-processing step reduces the
          computational cost, whereas a hard cascade has to be retrained for every new set of
          thresholds.
           
          Section 2 of the paper gives basic background on cascades of classifiers, and then the key
          ideas of the soft cascade are presented: a soft cascade rejects instances based on the
          posterior class probability given by the classifier at each stage, and an instance can be
          classified as positive only after passing through all the stages. Because a soft cascade is
          trained only once, optimizing all stages at the same time requires each stage to emphasise
          different types of false positives in order to optimize the accuracy of the whole cascade.
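
          The stage-wise rejection rule can be sketched as follows. This is a minimal illustration of
          cascade inference only, assuming linear stages with a sigmoid posterior and a fixed
          per-stage feature cost; the Stage fields, the weights, the thresholds, and the costs are all
          hypothetical, and the joint training procedure of the paper is not reproduced here.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Stage:
    weights: Sequence[float]  # hypothetical linear weights for this stage's features
    bias: float
    threshold: float          # reject when the stage posterior falls below this
    cost: float               # hypothetical cost of acquiring this stage's features


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def cascade_predict(stages, features_per_stage):
    """Return (label, total_cost); the label is 1 only if every stage accepts."""
    total_cost = 0.0
    for stage, features in zip(stages, features_per_stage):
        total_cost += stage.cost  # features are paid for only when the stage is reached
        score = sum(w * x for w, x in zip(stage.weights, features)) + stage.bias
        if sigmoid(score) < stage.threshold:
            return 0, total_cost  # early rejection: later, costlier stages are skipped
    return 1, total_cost


if __name__ == "__main__":
    stages = [
        Stage(weights=[1.0], bias=0.0, threshold=0.5, cost=1.0),
        Stage(weights=[2.0, -1.0], bias=0.0, threshold=0.5, cost=10.0),
    ]
    print(cascade_predict(stages, [[-2.0], [1.0, 0.0]]))  # rejected at stage 1
    print(cascade_predict(stages, [[2.0], [1.0, 0.5]]))   # accepted by both stages
```

          An instance rejected at the first stage never pays for the second stage's features, which
          is where the saving in feature acquisition cost comes from.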
           
          The authors then present the method for training the cascade. Training mainly consists of
          finding the maximum likelihood estimate of the parameters of the linear classifiers. To
          obtain a better estimate, the maximum a posteriori (MAP) estimate is used. To account for
          cost, a term for the expected cost is added to the MAP objective. Similarly, a parameter
          for the accuracy is also introduced.
           
          To show that their novel method is more efficient, the authors conduct several experiments
          on medical datasets, which typically have a high feature acquisition cost. The results show
          that the accuracy of the soft cascade is generally slightly lower than that of the best
          method, but that it dramatically reduces the feature acquisition cost, by a factor of
          hundreds.
          	
  
          Critique
          Several issues reduce the readability and comprehensibility of the paper: 1. The term soft
          cascade is not explained where it first occurs, so the reader has to go back and forth
          several times, which makes the paper harder to understand. 2. The authors claim that the
          proposed method solves the computational cost problem of hard cascades, which may not
          necessarily be true. The proposed method optimizes all stages simultaneously, which could
          require more complex computation, and the post-processing step for computing the thresholds
          does not occur in a hard cascade, so the sum of the two could exceed the computational cost
          of a hard cascade.
           
          I assume that the datasets contain little noise, because AdaBoost is very sensitive to
          noise, yet the results show that it achieves high performance on these datasets. It would be
          good to include a noisy dataset to show that the accuracy and cost tradeoff mechanism copes
          well in such situations, because in many cases the accuracy is heavily affected by noise, so
          trading even a little accuracy for cost could result in a large decrease.
           
          Although some issues exist, this paper is informative. Choosing the experimental datasets
          from a field where cost matters makes the paper's case for its importance more persuasive.
          Tradeoff between Machine Learning and Pattern Recognition
          Before discussing the tradeoff, the difference between machine learning and pattern
          recognition has to be identified. “Pattern recognition has its origins in engineering,
          whereas machine learning grew out of computer science. However, these activities can be
          viewed as two facets of the same field”, Christopher M. Bishop [textbook].

          Figure 1. Artificial Intelligence

          From the figure, we can see that pattern recognition is a subfield of AI that applies
          machine learning and statistical methodology to the problem of finding hidden patterns in
          the targets. It generally has broader applications than machine learning.

          Wikipedia describes pattern recognition as based on probability theory; therefore most
          pattern recognition algorithms are probabilistic in nature. Other algorithms, from machine
          learning, produce deterministic outcomes.

          Probabilistic pattern recognition algorithms can output a result with an associated
          confidence value that is mathematically grounded in probability theory, and this value can
          also be used by other probability-based algorithms. Sometimes, when the confidence value
          falls below some threshold, the algorithm can decline to provide a valid output. In
          contrast, a general machine learning algorithm would still provide the “best” decision,
          even though it may be only a little better than the worst assumption. Because it is
          probabilistic, pattern recognition can naturally handle the problem of uncertainty
          propagation better, especially for large tasks that contain many uncertainties. But as this
          probability is generated out of some distribution function, the searching