254x Filetype PDF File size 0.31 MB Source: projet.liris.cnrs.fr
21st International Conference on Pattern Recognition (ICPR 2012)
November 11-15, 2012. Tsukuba, Japan
Orthogonal Isometric Projection
1 1,2 1 1
Yali Zheng , Yuan Yan Tang , Bin Fang , Taiping Zhang
1. Chongqing University, Chongqing, China 2. University of Macau, Macau, China
zhengyl, fb, tpzhang@cqu.edu.cn, yytang@umac.mo
Abstract with the Graph Laplacian theory. Cai et al. [5] pro-
posedavariationofLaplacianface(LPP)—Orthogonal
In contrast with Isomap, which learns the low- Laplacianface (OLPP), which iteratively computed the
dimension embedding, and solves problem under the orthogonal eigenvectors to compose the projection ma-
classic Multi-dimension Scaling (MDS) framework, we trix. Kokiopoulou and Saad [6] analyzed and compared
propose a dimensionality reduction technique, called LPPandOLPP,andproposedanOrthogonalNeighbor-
Orthogonal Isometric Projection (OIP), in this paper. hood Preserving Projections (ONPP). It can be thought
We consider an explicit orthogonal linear projection of as an orthogonal version of LLE, but projections are
by capturing the geodesic distance, which is able to learned explicitly as a standard eigenvalue problem.
handle new data straightforward, and leads to a stan- Dimensionalityreductiontechniquesiseithertoseek
dard eigenvalue problem. And we extend our method to for a representation of data in low dimension to benefit
Sparse Orthogonal Isometric Projection (SOIP), which the data analysis or to map data from high-dimensional
can be solved efficiently using LARS. Numerical exper- space to low-dimensional space through an explicit
iments are reported to demonstrate the performance of linear or nonlinear projection learned from a training
OIPbycomparingwithafewcompetingmethods. dataset. Multidimensional Scaling (MDS) [1] is a clas-
sic data embedding technique, and considers preserv-
ing the pairwise distance to obtain the low dimension
1. Introduction configuration. Principal component analysis (PCA) can
be used as a projection method, which learns a lin-
Researchers are interested in reducing dimensions of ear projection by maximizing the variance of data in
digital data is because of the existence of digital infor- low dimension. PCA is identical to Classical Multidi-
mation redundancy. It is well known that extracting ef- mensional Scaling if euclidean distance is used [1], but
ficient features from data can improve object classifi- PCA learns the projection. Linear Discriminant Anal-
cation and recognition, and simplify the visualization ysis (LDA) maximizes the ratio of between-class vari-
of data. Manifold learning theory [4, 3, 11, 12] was ance to the within-class variance to determine an ex-
introduced into dimensionality reduction field in early plicit projection as well.
20 century, which assumed that a low-dimension mani- In this paper, we are motivated by Isomap [4] and
fold is embedded in high-dimension data. Researchers Isometric projection [8], and propose a linear projec-
pay lots of attention to discovery, and prove that a low- tion method, called Orthogonal Isometric Projection,
dimension manifold exists. Isomap was proposed by which is a variation of Isometric Projection. Cai pro-
Tenenbaum et al. [4], in which geodesic distance was posed Isometric Projection [8] addressed the same pur-
used to capture the global structure in high dimension, pose as ours. However, in this paper, we constrain the
and can be solved under MDS framework. Roweis and projection is orthogonal which differs from Cai’s, and
Saul [3] proposed a nonlinear dimensionality reduc- solve a standard eigenvalue problem. The main differ-
tion method, Locally Linear Embedding (LLE), which ence between our method and Cai’s orthogonal version
aimedatpreservingthesamelocalconfigurationofeach of Isometric Projection is that: we build a reasonable
neighborhood in low dimensional space as in high di- objective function, and solve the optimization in a stan-
mensional space. He and Niyogi [2] found an optimal dard eigenvalue problem, while Cai solved the problem
linear approximations to eigenfunctions of the nonlin- in a generalized eigenvalue problem. We extend our
ear Laplacian Eigenmap, called Local Preserving Pro- method to the sparse orthogonal isometric projection.
jection (LPP), and gave the justification in the paper Wetest our method on USPS data set. In the following
978-4-9906441-1-6 ©2012 IAPR 405
section wewillbrieflyintroduceCai’sIsometricProjec- Tosolvetheproblemefficientlyincomputationcost,
tion to demonstrate the advantages of our algorithm. Cai also applied the regression in [8, 13] over Y and
Xcalled spectral regression (SR). Y is computed first,
2. A brief review of Isomap and Isometric which is the eigenvector of τ(DG), then
Projection m
X T 2 2
a=argmin (a x −y ) +αkak (4)
a i i
2.1 Isomap i=1
Isomap was proposed by Tenenbaum and et al. [4], TheconditionVTXXTV =I constrainedthatlow-
and is one of most popular manifold learning tech- dimension embedding of points is orthogonal, namely,
niques. It aims to obtain an euclidean embedding of YTY =I.
points such that the geodesic distance in high dimen-
sional space gets close to the euclidean distance be- 3. Orthogonal Isometric Projection
tween each pair of points. The mathematical formu-
lation is The main idea of orthogonal isometric projection is
X 2 to seek an orthogonal mapping over the training data set
min (d (x ,x )−d (y ,y )) (1)
G i j E i j so as to best preserve the geodesic distance on a neigh-
i,j borhood graph, and learn a linear projection under the
d is the geodesic distance, which is defined locally to general Isomap framework, but has a different and rea-
G sonable constraint that projections are orthogonal.
betheshortestpathonthemanifold,d istheeuclidean
E
distance, and in a matrix form:
minkτ(D )−τ(D )k (2) 3.1 TheObjectiveFunctionofOIP
G E 2
D isthegeodesicdistancematrix,D istheeuclidean Under the Isomap framework, it minimizes the ob-
G E jective function in Equation 2. In math τ(D ) =
distance matrix, τ is an operation which converts the G
euclidean distance into an inner product form. The −CWC/2, and C is the centering matrix defined by
T T
C = I −1/N ·e e , where e = [1,...,1] , W
problem is solved under the MDS framework. Isomap n N N N N
makes an assumption that a manifold existing in high is a Dijkstra distance matrix based on K nearest neigh-
dimension space, and applies the geodesic distance to bor graph over all data points. Let f(V ) = kτ(DG) −
XTVVTXk ,weareseekingforalinearprojection:
measure the similarity of each point pair. However, if 2
insufficient samples are given or the data is noised, the minf(V) (5)
intrinsic geometry of the data is difficult to be captured V
byconstructing the neighborhood graph. Let S = τ(D ),whichisaknownneighborhoodgraph
G
2.2 Isometric Projection constructed from the given data set. We have
f(V) = tr((S−XTVVTX)T(S−XTVVTX))
Cai et al. [8] extended Isomap algorithm to learn a = tr(STS)+tr((XTVVTX)T(XTVVTX)
linear projection by solving a spectral graph optimiza-
tion problem. Suppose that Y = V TX, they minimized −2ST(XTVVTX))
the objective function, = tr((XTVVTX)T −2ST)(XTVVTX)
minkτ(D )−XTVVTXk (3) +tr(STS)
G 2
To make the problem tractable, they imposed a con- Sotheobjective function equation 5 is equivalent to
straint V TXXTV = I, and rewrote the minimization
problem as min tr(VTX((XTVVTX)T −2ST)XTV)
s.t. VTV =I
argmax tr(VTXτ(D )XTV)
V G
Let M = X(XTX −2ST)XT, then the problem be-
s.t. VTXXTV =I comesto
whichisequivalenttoageneralizedeigenvalueproblem min tr(VTMV)
Xτ(D )XTV =λXXTV s.t. VTV =I
G
406
Algorithm 1 Orthogonal Isometric Projection v is the solution of the regression system:
p
1: Construct a neighborhood graph G over the data
n
points. Compute the Dijkstra distances matrix X T i i 2
v =argmin (v x −y ) ,
W(i,j) = dG(i,j) over the graph between every p v p
point pair. d (i,j) is the shortest path of i and j, i=1
G
otherwise dG(i,j) = inf. where yi is the ith element of y . Similar to Zou et
2: Compute τ(D ) = −CWC/2, C is the centering p p
G al. [10], we can get the sparse solutions by adding L
T 1
matrix, and C = I −1/N ·e e .
n N N regularization:
3: Compute the eigenvectors of M = X(XTX −
2τ(DG)T)XT. V is determined by eigenvectors n m
of M associated with p smallest eigenvalues. X T i i 2 X j
v =argmin (v x −y ) +α kv k
p v p p
i=1 j=1
which leads to a standard eigenvalue problem, where vj is the jth element of v . The regression can
p p
MV =λV (6) be solved efficiently using LARS algorithm.
V is determined by eigenvectors of M corresponding to 5. Experiments
psmallest eigenvalues.
3.2 TheAlgorithmofOIP USPS is a well known handwritten digits corpus
from US postal service. It contains normalized gray
First we need to construct the neighborhood graph scale images of size 16 × 16, and totally 9298 sam-
G. As all we know, there are two options: ples with 256 features. Fig. 1 shows some examples
of USPS dataset. We evaluate our algorithm on USPS,
1
• if x ∈ N(x ), N(x ) is the k nearest neighbor of which are downloaded from public website , and com-
j i i pare with PCA, LDA, LPP, IP, and IP+SR, and demon-
x ,
i strate the average accuracy and average error rates in
• if kx −x k2 ≤ ǫ, ǫ is a small number as a thresh- the section. A human error rate estimated to be 2.37%
i j
old. shows that it is a hard task over USPS dataset [9]. We
the edge between i and j weighted by gaussian kernel randomlysample25timesfromthedatasetsasthetrain-
−kx −x k2 ing sets with varying rates from 20% to 80%, and the
is defined as e i j . rest are used for testing. We map points in test sets
WesummarizethealgorithminAlg1. by projections learned from training sets, and apply the
nearest neighbor to determine categories labels. We
4 SparseOrthogonalIsometricProjection also demonstrate the dimensions versus average error
rate by half training and half testing. Assume that l is
i
Our problem essentially solves a standard optimiza- the ground truth, b is the label assigned after dimen-
i
tion in Equation 6, which also can be incorporated into sionality reduction by methods, Ntest is the number of
a regression framework in the way of Sparse PCA [10]. test samples,
The optimal V is the eigenvectors with respect to the
N P
maximum eigenvalues of Equation 6. Since M is a 1 X Ntestδ(l ,proj(b ))
real symmetric matrix, so M can be decomposed into Acc = i=1 i i ,
N N
˜ ˜T ˜ j=1 test
XX .SupposetherankofX isr,andSVD(X)is
˜ ˜˜˜T and N = 25 in our experiment.
X=UΣV , Wecompare our method with IP, IP+SR, LDA, and
˜ LPPinTable1. Ourmethodoutperformsallothermeth-
it is easy to verify that the column vectors in U are the
˜ ˜T ods. The bolds are from our method, ± shows the
eigenvectors of XX . Let Y = [y ,y ,··· ,y ] ,
1 2 r n×r square root of the variance. With the number of training
each row vector is a sample vector in r-dimensional
subspace, and V = [v ,v ,··· ,v ] . Therefore, the samples, the classification precision increases as what
1 2 r m×r weexpect. Fig. 2 shows with the number of dimensions
projective functions of OIP are solved by the linear sys-
tems: increases, the average classification error decreases.
XTv =y ,p=1,2,··· ,r. 1http://www.zjucadcg.cn/dengcai/Data/TextData.html
p p
407
Table 1. Comparison on USPS
Train ratio LDA LPP IP IP+SR OIP
20% 89.13 ± 0.34 92.95 ± 0.36 92.11 ± 0.39 93.90 ± 0.50 95.10 ± 0.20
30% 90.39 ± 0.32 94.47 ± 0.30 93.61 ± 0.33 94.96 ± 0.36 95.93 ± 0.17
40% 91.18 ± 0.36 95.20 ± 0.27 94.48 ± 0.28 95.69 ± 0.36 96.40 ± 0.20
50% 91.54 ± 0.32 95.75 ± 0.29 94.85 ± 0.42 95.91 ± 0.34 96.65 ± 0.26
60% 91.97 ± 0.32 96.14 ± 0.24 95.21 ± 0.31 96.14 ± 0.35 97.01 ± 0.25
70% 92.02 ± 0.49 96.43 ± 0.33 95.61 ± 0.35 96.59 ± 0.32 97.17 ± 0.23
80% 92.19 ± 0.58 96.71 ± 0.42 95.97 ± 0.40 96.74 ± 0.40 97.35 ± 0.34
[4] J. B. Tenenbaum, V. Silva and J. C. Lang-
ford, A global geometric framework for nonlin-
ear dimensionality reduction, Science, 290(2000),
pp. 2319–2323.
[5] D. Cai, X. He, J. Han and H.J. Zhang, Orthog-
onal laplacianfaces for face recognition, IEEE
Transactions on Image Processing, 15(11)(2006),
pp. 3608–3614.
Figure 1. USPS examples [6] E. Kokiopoulou and Y. Saad, Orthogonal neigh-
borhood preserving projections: a projection-
0.09 based dimensionality reduction technique, IEEE
IP transactions on Pattern Analysis and Machine In-
IP+SR
0.08 LDA telligence, 29(12)(2007), pp. 2143–2156.
OIP
LPP
0.07 [7] X. He, S. Yan, Y. Hu, P. Niyogi and H.J. Zhang,
IEEE Transactions on Pattern Analysis and Ma-
0.06 chine Intelligence, 27(3)(2005), pp. 328–340.
Average error rate0.05 [8] D. Cai, X. He and J. Han, Isometric projection,
Proceedings of the National Conference on Artifi-
0.04 cial Intelligence, 2007, pp. 528–533.
0.03 [9] I. Chaaban and M. Scheessele, Human perfor-
20 30 40 50 60 70 80 90 100
Dimensions manceontheUSPSdatabase,2007,Techniquere-
port, Indiana University South Bend.
Figure 2. Dimensions vs. average classi- [10] H.Zou,T.HastieandR.Tibshirani,Sparseprinci-
fication error on USPS dataset. pal component analysis, Journal of computational
andgraphicalstatistics, 15(2)(2006), pp. 265–286.
References [11] J.A. Lee and M. Verleysen, Nonlinear dimension-
ality reduction of data manifolds with essential
[1] W. S. Torgerson, Multidimensional scaling: I. loops, Neurocomputing, 67(2005), pp. 29–53.
Theory and method, Psychometrika, 17(4)(1952), [12] H. Choi and S. Choi, Robust kernel isomap, Pat-
pp. 401–419. tern Recognition, 40(3)(2007), pp. 853–862.
[2] X. He and P. Niyogi, Locality preserving projec- [13] D. Cai, X. He and J. Han, SRDA: An efficient
tions, Proceeding of Advances in Neural Informa- algorithm for large-scale discriminant analysis,
tion Processing Systems, 2003, pp. 153–160. IEEE Transactions on Knowledge and Data En-
[3] S.T.RoweisandL.K.Saul,Nonlineardimension- gineering, 20(1)(2008), pp. 1–12.
ality reduction by locally linear embedding, Sci-
ence, 290(2000), pp. 2323–2326.
408
no reviews yet
Please Login to review.