Review for Ensemble Methods in Machine Learning, Thomas G. Dietterich
Summary
Ensemble learning combines the decisions of a set of classifiers in order to reach a more
accurate prediction. For an ensemble to be more accurate than any of its individual members,
each individual hypothesis has to be accurate (better than random guessing, i.e. more than
50% accurate on a two-class problem) and the hypotheses have to be diverse. Diversity means
that the classifiers make different errors, so an error made by one classifier is rarely
shared by most of the others and the majority vote can correct it.
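As a hedged illustration of the majority-vote argument, the sketch below computes the
probability that a majority of classifiers are wrong at the same time. The ensemble size of
21 and the error rates are assumed values chosen for illustration, and treating the errors as
fully independent is an idealisation that real classifiers only approximate.

```python
from math import comb

def majority_error(n_classifiers, error_rate):
    """Probability that more than half of n independent classifiers err."""
    first_majority = n_classifiers // 2 + 1
    return sum(
        comb(n_classifiers, k)
        * error_rate**k
        * (1 - error_rate) ** (n_classifiers - k)
        for k in range(first_majority, n_classifiers + 1)
    )

# Members better than chance: the vote is far more reliable than one member.
print(majority_error(21, 0.3))
# Members worse than chance: the vote is even less reliable than one member.
print(majority_error(21, 0.55))
```

The second call shows why the accuracy condition matters: once the members are worse than
random guessing, voting amplifies their errors instead of correcting them.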
Three reasons explain why ensemble methods can work: 1. Statistical- with a finite number of
training samples, the learning algorithm cannot single out one best hypothesis but instead
finds a number of hypotheses that look equally good on the training data; voting over a
combination of them reduces the risk of selecting a single bad hypothesis. 2.
Computational- many learning algorithms get stuck in local optima; searching the hypothesis
space along multiple paths may increase the chance of finding the global optimum.
3. Representational- a hypothesis is limited by the knowledge representation of the learning
algorithm, and a weighted combination of hypotheses may extend the representational power.
The author describes different methods for constructing ensembles, including: Enumerating the
Hypotheses- combining the votes of the possible hypotheses to make a final decision;
Manipulating Training Examples- resampling the training set to generate multiple hypotheses;
Manipulating Input Features- selecting different feature subsets for multiple training runs;
Manipulating the Output Targets- creating multiple hypotheses, each trained on a different
grouping of the target classes; Injecting Randomness- adding randomness to the learning
algorithm.
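Of the methods listed above, manipulating the training examples is the easiest to sketch.
The example below is a minimal bagging-style illustration, not the procedure from the
article: the threshold "stump" base learner, the toy 1-D data, and all function names are
hypothetical choices made only to keep the code self-contained.

```python
import random
from collections import Counter

def bootstrap(data, rng):
    """Draw len(data) examples with replacement (one bootstrap replicate)."""
    return [rng.choice(data) for _ in data]

def train_stump(data):
    """Hypothetical base learner: the best single threshold on a 1-D feature."""
    best = None
    for threshold, _ in data:
        for sign in (1, -1):
            errors = sum(
                1 for x, y in data
                if (1 if sign * (x - threshold) >= 0 else 0) != y
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, sign)
    _, threshold, sign = best
    return lambda x: 1 if sign * (x - threshold) >= 0 else 0

def bagging(data, n_models=25, seed=0):
    """Train one stump per bootstrap replicate and combine them by majority vote."""
    rng = random.Random(seed)
    models = [train_stump(bootstrap(data, rng)) for _ in range(n_models)]

    def predict(x):
        votes = Counter(model(x) for model in models)
        return votes.most_common(1)[0][0]

    return predict

# Toy data (assumed): the label is 1 when the feature is at least 0.5.
data = [(x / 10, int(x >= 5)) for x in range(10)]
predict = bagging(data)
print(predict(0.9), predict(0.0))
```

Each replicate sees a slightly different training set, so the stumps place their thresholds
differently; the vote then smooths over those differences.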
The performance of C4.5, AdaBoost, bagging, and a randomized tree ensemble method is
compared. The results suggest that when the problem is not very complex, the three reasons
for using an ensemble are absent and a single classifier can handle it well; otherwise the
ensemble methods can provide better results. In general, AdaBoost performs best when the
training set contains little noise; otherwise it overfits the noise. The author discusses
that AdaBoost, which aggressively enlarges the margin on the training examples, should
overfit easily, but that its stage-wise nature keeps this from happening more often.
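One way to see why AdaBoost is sensitive to noise is to look at its reweighting step. The
sketch below uses the standard binary AdaBoost update with labels in {-1, +1}; it is not
taken from the article, and the five-example dataset with one mislabeled point is a made-up
illustration.

```python
import math

def adaboost_reweight(weights, labels, predictions):
    """One AdaBoost round for labels and predictions in {-1, +1}.

    Returns the hypothesis weight alpha and the renormalised example weights.
    """
    error = sum(w for w, y, h in zip(weights, labels, predictions) if y != h)
    alpha = 0.5 * math.log((1 - error) / error)  # assumes 0 < error < 1
    updated = [
        w * math.exp(-alpha * y * h)
        for w, y, h in zip(weights, labels, predictions)
    ]
    total = sum(updated)
    return alpha, [w / total for w in updated]

labels = [+1, +1, -1, -1, +1]       # suppose the last label is noise
predictions = [+1, +1, -1, -1, -1]  # the weak hypothesis misses only that example
weights = [0.2] * 5
alpha, weights = adaboost_reweight(weights, labels, predictions)
print(alpha, weights)
```

After one round the single misclassified example carries half of the total weight, so the
next weak hypothesis is pushed hard towards fitting it, whether or not its label is correct.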
Critique
This article is a survey more than a research paper, although it reports some experimental
results on the performance of different ensemble methods. A survey is a paper that gives
newcomers helpful information on a particular topic. The article is presented in a
reasonable layout that makes it readable and informative. The introduction explains what
ensemble methods in machine learning are and how they may work. The three fundamental
reasons then motivate ensembles by pointing out problems shared by most machine learning
algorithms, which raises the importance of ensemble methods and draws the audience into
further reading. Next, the methods for constructing ensembles are described, which informs
the reader about the research achievements in ensemble methods; this information is
practically useful. Finally, the comparison of ensemble methods indicates the limitations
and advantages of the different kinds of ensemble methods. Thanks to this well-formed
structure, readers can gain a more concrete understanding of ensemble learning.
Review for Designing Efficient Cascaded Classifiers: Tradeoff between Accuracy
and Cost, Vikas C. Raykar, Balaji Krishnapuram, Shipeng Yu
Summary
This research paper proposes a new method for training cascades of classifiers, called soft
cascades in contrast to traditional (hard) cascades. It states that the conventional method
has 3 problems that can be solved by the proposed method: Joint training of all stages- a
cascade is generally trained sequentially, one stage at a time, but a soft cascade can be
trained once, with all stages together, and the thresholds for each classifier can be set as
a post-processing step; Tradeoff between accuracy and cost- traditional cascade classifiers
do not explicitly trade off accuracy against cost, but this method can be tuned to stress
either need; Computational cost of training- the post-processing step for adjusting
thresholds can reduce the computation, whereas a hard cascade has to be retrained for every
new set of thresholds.
Section 2 of the paper gives basic information about a cascade of classifiers, and then the
key ideas of the soft cascade are presented: a soft cascade rejects instances based on the
posterior class probability given by the classifier for that stage, and an instance can only
be classified as positive after passing through all the stages. As a soft cascade is trained
only once, optimizing all stages at the same time requires each stage to emphasise different
types of false positives in order to optimize the accuracy of the whole cascade.
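The stage-by-stage rejection described above can be sketched as follows. This is a minimal
illustration of cascade inference as summarised in this review, not the authors'
implementation: the dictionary layout, the weights, thresholds, and per-stage costs are all
hypothetical values.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cascade_predict(features_by_stage, stages):
    """Return (label, cost_paid) for one instance.

    Each stage is a linear classifier; the instance is rejected as soon as a
    stage's posterior probability falls below that stage's threshold.
    """
    cost = 0.0
    for features, stage in zip(features_by_stage, stages):
        cost += stage["cost"]  # pay to acquire this stage's features
        score = sum(w * f for w, f in zip(stage["weights"], features)) + stage["bias"]
        if sigmoid(score) < stage["threshold"]:
            return 0, cost  # rejected early; later features are never acquired
    return 1, cost  # positive only after passing every stage

# Hypothetical two-stage cascade: a cheap first stage, an expensive second one.
stages = [
    {"weights": [1.0], "bias": 0.0, "threshold": 0.5, "cost": 1.0},
    {"weights": [2.0], "bias": -1.0, "threshold": 0.5, "cost": 100.0},
]
print(cascade_predict([[-3.0], [0.0]], stages))  # easy negative, stops at stage 1
print(cascade_predict([[2.0], [1.5]], stages))   # passes both stages
```

The cost saving comes from the early return: an instance rejected by the cheap stage never
pays for the expensive features.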
The authors then present the method for training the cascade. The training process mainly
involves finding the maximum likelihood estimate for the parameters of the linear
classifiers. To provide a better estimate, the maximum a-posteriori (MAP) estimate is used.
To address the cost, a parameter weighting the expected cost is added to the maximum
a-posteriori equation. Similarly, a parameter for the accuracy is also introduced.
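A hedged sketch of the kind of objective this paragraph describes, in generic notation that
is assumed here rather than taken from the paper: $w$ stands for the parameters of all
stages, $D$ for the training data, $C(w)$ for the expected feature acquisition cost, and
$\beta \ge 0$ for a tradeoff parameter. The paper's exact formulation may differ.

```latex
\begin{align}
\hat{w}_{\mathrm{ML}}  &= \arg\max_{w} \; \log p(D \mid w) \\
\hat{w}_{\mathrm{MAP}} &= \arg\max_{w} \; \bigl[ \log p(D \mid w) + \log p(w) \bigr] \\
\hat{w}_{\beta}        &= \arg\max_{w} \; \bigl[ \log p(D \mid w) + \log p(w) - \beta \, C(w) \bigr]
\end{align}
```

With $\beta = 0$ the last line reduces to the plain MAP estimate; larger values of $\beta$
favour cheaper cascades at the expense of fit to the training data.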
To show that their method is more efficient, the authors conduct several experiments on
medical datasets, which typically have a high feature acquisition cost. The results show
that the accuracy of the soft cascade is generally a little lower than that of the best
method, but that it can dramatically reduce the feature acquisition cost, by a factor of
hundreds.
Critique
Several issues decrease the readability and comprehensibility of the paper: 1. The term soft
cascade is not explained where it first occurs, so the reader has to read back and forth
several times, which makes the paper harder to understand. 2. The authors claim that the
computational cost problem of the hard cascade is solved by the proposed method, which is
not necessarily true. The proposed method optimizes all stages simultaneously, which could
require more complex computation, and the post-processing step for computing the thresholds
does not occur in a hard cascade, so the sum of these could exceed the computational cost of
a hard cascade.
I assume that the datasets contain little noise, because AdaBoost is very sensitive to
noise, yet the results show that it achieves high performance on these datasets. It would be
good to include a noisy dataset to show that the accuracy and cost tradeoff mechanism copes
well in such situations, because in many cases the accuracy is heavily affected by noise, so
a small tradeoff from accuracy to cost could cause a large drop in accuracy.
Although some issues exist, this paper is informative. Choosing the experimental datasets
from a field where cost matters makes the paper more persuasive about the importance of the
method.
Tradeoff between Machine Learning and Pattern Recognition
Before discussing the tradeoff, the difference between machine learning and pattern
recognition has to be identified. “Pattern recognition has its origins in engineering,
whereas machine learning grew out of computer science. However, these activities can be
viewed as two facets of the same field”, Christopher M. Bishop [textbook].
Figure 1. Artificial Intelligence
From the figure, we can see that pattern recognition is a subfield of AI that applies
machine learning and statistical methodology to solve the problem of finding hidden patterns
in the targets. It generally has broader applications than machine learning.
Wikipedia describes pattern recognition as based on probability theory; therefore most
pattern recognition algorithms are probabilistic in nature. The outcomes of other algorithms
from machine learning are deterministic. Probabilistic pattern recognition algorithms can
output a result with an associated confidence value that is mathematically grounded in
probability theory, and this value can also be used by other algorithms based on probability
theory. Sometimes, when the confidence value is under some threshold, the algorithm can
decline to provide a valid output. In contrast, a general machine learning algorithm would
still provide the “best” decision, even if that decision is only a little better than the
worst assumption.
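The idea of declining to answer when confidence is low can be sketched in a few lines. The
function name, the threshold of 0.8, and the example probabilities are hypothetical; the
posteriors are assumed to come from some probabilistic classifier.

```python
def classify_with_reject(posteriors, threshold=0.8):
    """Return (label, confidence); label is None when confidence is too low."""
    label, confidence = max(posteriors.items(), key=lambda item: item[1])
    if confidence < threshold:
        return None, confidence  # decline to provide a valid output
    return label, confidence

print(classify_with_reject({"cat": 0.9, "dog": 0.1}))
print(classify_with_reject({"cat": 0.55, "dog": 0.45}))
```

A classifier without a confidence value would return "cat" in both cases, even though the
second decision is barely better than a coin flip.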
Because pattern recognition is probabilistic, it can naturally tackle the problem of
uncertainty propagation better, especially for large tasks that contain lots of
uncertainties. But as this probability is generated out of some distribution function, the
searching