Analysis Grammar of Japanese in the Mu-Project
- A Procedural Approach to Analysis Grammar -
Jun-ichi TSUJII, Jun-ichi NAKAMURA and Makoto NAGAO
Department of Electrical Engineering
Kyoto University
Kyoto, JAPAN
Abstract

Analysis grammar of Japanese in the Mu-project is presented. It is
emphasized that rules expressing constraints on single linguistic
structures and rules for selecting the most preferable readings are
completely different in nature, and that rules for selecting preferable
readings should be utilized in analysis grammars of practical MT
systems. It is also claimed that procedural control is essential in
integrating such rules into a unified grammar. Some sample rules are
given to make the points of discussion clear and concrete.

1. Introduction

The Mu-Project is a Japanese national project supported by grants from
the Special Coordination Funds for Promoting Science & Technology of
STA (Science and Technology Agency), which aims to develop
Japanese-English and English-Japanese machine translation systems. We
currently restrict the domain of translation to abstracts of scientific
and technological papers. The systems are based on the transfer
approach[1], and consist of three phases: analysis, transfer and
generation. In this paper, we focus on the analysis grammar of Japanese
in the Japanese-English system. The grammar has been developed by using
GRADE, which is a programming language specially designed for this
project[2]. The grammar now consists of about 900 GRADE rules. The
experiments so far show that the grammar works very well and is
comprehensive enough to treat various linguistic phenomena in
abstracts. In this paper we will discuss some of the basic design
principles of the grammar together with its detailed construction. Some
examples of grammar rules and analysis results will be shown to make
the points of our discussion clear and concrete.

2. Procedural Grammar

There has been a prominent tendency in recent computational linguistics
to re-evaluate CFG and use it directly or augment it to analyze
sentences[3,4,5]. In these systems (frameworks), CFG rules
independently describe constraints on single linguistic structures, and
a universal rule application mechanism automatically produces a set of
possible structures which satisfy the given constraints. It is
well-known, however, that such sets of possible structures often become
unmanageably large.

Because two separate rules such as

    NP --> NP PREP-P
    VP --> VP PREP-P

are usually prepared in CFG grammars in order to analyze noun and verb
phrases modified by prepositional phrases, CFG grammars provide two
syntactic analyses for

    She was given flowers by her uncle.

Furthermore, the ambiguity of the sentence is doubled by the lexical
ambiguity of "by", which can be read as either a locative or an
agentive preposition. Since the two syntactic structures are recognized
by completely independent rules and the semantic interpretations of
"by" are given by independent processes in the later stages, it is
difficult to compare these four readings during the analysis to give a
preference to one of these four readings.

A rule such as

    "If a sentence is passive and there is a "by"-prepositional
    phrase, it is often the case that the prepositional phrase fills
    the deep agentive case. (try this analysis first)"

seems reasonable and quite useful for choosing the most preferable
interpretation, but it cannot be expressed by refining the ordinary CFG
rules. This kind of rule is quite different in nature from a CFG rule.
It is not a rule of constraint on a single linguistic structure (in
fact, the above four readings are all linguistically possible), but it
is a "heuristic" rule concerned with preference of readings, which
compares several alternative analysis paths and chooses the most
feasible one.
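
The contrast between constraint rules and preference rules can be
made concrete with a small sketch. The following Python fragment is
not GRADE and is not taken from the Mu-Project grammar; the names
Reading, enumerate_readings, passive_by_preference and rank_readings
are hypothetical. It assumes that the agentive reading attaches the
"by"-phrase to the verb phrase, which the text implies but does not
state. All four readings are kept as linguistically possible, and the
heuristic only decides which one is tried first.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Reading:
    attachment: str  # "VP" or "NP": the phrase the by-phrase modifies
    by_role: str     # "agentive" or "locative": lexical reading of "by"


def enumerate_readings():
    """Return the four readings that CFG-style rules all accept."""
    return [Reading(a, r)
            for a, r in product(("VP", "NP"), ("agentive", "locative"))]


def passive_by_preference(reading, sentence_is_passive):
    """Score 1 for the reading the heuristic says to try first, else 0.

    Assumption: filling the deep agentive case means the by-phrase
    modifies the verb phrase.
    """
    if (sentence_is_passive
            and reading.attachment == "VP"
            and reading.by_role == "agentive"):
        return 1
    return 0


def rank_readings(sentence_is_passive):
    """Order the readings by preference; no reading is discarded."""
    readings = enumerate_readings()
    # sorted() is stable, so equally scored readings keep their order.
    return sorted(readings,
                  key=lambda r: passive_by_preference(r, sentence_is_passive),
                  reverse=True)


# "She was given flowers by her uncle." is passive.
ranked = rank_readings(sentence_is_passive=True)
print(ranked[0])
```

The point of the sketch is that the preference function compares
alternative readings as a group, which a rule describing one
structure at a time has no means of doing.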
Human translators (or humans in general) have many such preference
rules based on various sorts of cues such as morphological forms of
words, collocations of words, text styles, word semantics, etc. These
heuristic rules are quite useful not only for increasing efficiency but
also for preventing proliferation of analysis results. As Wilks[6]
pointed out, we cannot use semantic information as constraints on
single linguistic structures, but just as preference cues to choose the
most feasible interpretations among linguistically possible
interpretations. We claim that many sorts of preference cues other than
semantic ones exist in real texts which cannot be captured by CFG
rules. We will show in this paper that, by utilizing various sorts of
preference cues, our analysis grammar of Japanese can work almost
deterministically to give the most preferable interpretation as the
first output, without any extensive semantic processing (note that even
"semantic" processing cannot disambiguate the above sentence. The four
readings are semantically possible. It requires deep understanding of
contexts or situations, which we cannot expect in a practical MT
system).

In order to integrate heuristic rules based on various levels of cues
into a unified analysis grammar, we have developed a programming
language, GRADE. GRADE provides us with the following facilities.

- Explicit Control of Rule Applications : Heuristic rules can be
ordered according to their strength (See 4-2).

- Multiple Relation Representation : Various levels of information
including morphological, syntactic, semantic, logical etc. are
expressed in a single annotated tree and can be manipulated at any time
during the analysis. This is required not only because many heuristic
rules are based on heterogeneous levels of cues, but also because the
analysis grammar should perform semantic/logical interpretation of
sentences at the same time and the rules for these phases should be
written in the same framework as syntactic analysis rules (See 4-2,
4-4).

- Lexicon Driven Processing : We can write heuristic rules specific to
a single or a limited number of words such as rules concerned with
collocations among words. These rules are strong in the sense that they
almost always succeed. They are stored in the lexicon and invoked at
appropriate times during the analysis without decreasing efficiency
(See 4-1).

- Explicit Definition of Analysis Strategies : The whole analysis phase
can be divided into steps. This makes the whole grammar efficient,
natural and easy to read. Furthermore, strategic consideration plays an
essential role in preventing undesirable interpretations from being
generated (See 4-3).

3. Organization of Grammar

In this section, we will give the organization of the grammar necessary
for understanding the discussion in the following sections. The main
components of the grammar are as follows.

(1) Post-Morphological Analysis
(2) Determination of Scopes
(3) Analysis of Simple Noun Phrases
(4) Analysis of Simple Sentences
(5) Analysis of Embedded Sentences (Relative Clauses)
(6) Analysis of Relationships of Sentences
(7) Analysis of Outer Cases
(8) Contextual Processing (Processing of Omitted case elements,
    Interpretation of 'Ha', etc.)
(9) Reduction of Structures for Transfer Phase

Each component consists of from 60 to 120 GRADE rules.

47 morpho-syntactic categories are provided for Japanese analysis, each
of which has its own lexical description format. 12,000 lexical entries
have already been prepared according to the formats. In this
classification, Japanese nouns are categorized into 8 sub-classes
according to their morpho-syntactic behaviour, and 53 semantic markers
are used to characterize their semantic behaviour. Each verb has a set
of case frame descriptions (CFD) which correspond to different usages
of the verb. A CFD gives mapping rules between surface case markers
(SCM - postpositional case particles are used as SCM's in Japanese) and
their deep case interpretations (DCI - 33 deep cases are used). DCI of
an SCM often depends on verbs so that the mapping rules are given to
CFD's of individual verbs. A CFD also gives a normal collocation
between the verb and SCM's (postpositional case particles). Detailed
lexical descriptions are given and discussed in another paper[7].

The analysis results are dependency trees which show the semantic
relationships among input words.

4. Typical Steps of Analysis Grammar

In the following, we will take some sample rules to illustrate our
points of discussion.

4-1 Relative Clauses

Relative clause constructions in Japanese express several different
relationships between modifying clauses (relative clauses) and their
antecedents. Some relative clause constructions cannot be translated as
relative clauses in English. We classified Japanese relative clauses
into the following four types, according to the relationships between
clauses and their antecedents.

(1) Type 1 : Gaps in Cases

One of the case elements of the relative clause is deleted and the
antecedent fills the gap.

(2) Type 2 : Gaps in Case Elements

The antecedent modifies a case element in the clause. That is, a gap
exists in a noun phrase in the clause.

(3) Type 3 : Apposition

The clause describes the content of the antecedent as the English
"that"-clause in 'the idea that the earth is round'.

(4) Type 4 : Partial Apposition

The antecedent and the clause are related by certain semantic/pragmatic
relationships. The relative clause of this type doesn't have any gaps.
This type cannot be translated directly into English relative clauses.
We have to interpolate in English appropriate phrases or clauses which
are implicit in Japanese, in order to express the semantic/pragmatic
relationships between the antecedents and relative clauses explicitly.
In other words, gaps exist in the interpolated phrases or clauses.

Because the above four types of relative clauses have the same surface
forms in Japanese

    ......... (verb)    (noun).
    Relative Clause     Antecedent

careful processing is required to distinguish them (note that the
"antecedents" -modified nouns- are located after the relative clauses
in Japanese). A sophisticated analysis procedure has already been
developed, which fully utilizes various levels of heuristic cues as
follows.

(Rule 1) There are a limited number of nouns which are often used as
antecedents of Type 3 clauses.

(Rule 2) When nouns with certain semantic markers appear in the
relative clauses and those nouns are followed by one of specific
postpositional case particles, there is a high possibility that the
relative clauses are Type 2. In the following example, the word
"SHORISOKUDO" (processing speed) has the semantic marker AO
(attribute).

[ex-1] [Type 2]

  "SHORISOKUDO"       "GA"              "HAYAI"   "KEISANKI"
  (processing speed)  (case particle:   (high)    (computer)
                       subject case)
  |------------ Relative Clause -------------|    Antecedent

--> (English Translation)
A computer whose processing speed is high

(Rule 3) Nouns such as "MOKUTEKI" (purpose), "GEN-IN" (reason),
"SHUDAN" (method) etc. express deep case relationships by themselves,
and, when these nouns appear as antecedents, it is often the case that
they fill the gaps of the corresponding deep cases in the relative
clauses.

[ex-2] [Type 1]

  "KONO"  "SOUCHI"  "O"              "TSUKAT"  "TA"               "MOKUTEKI"
  (this)  (device)  (case particle:  (to use)  (tense formative:  (purpose)
                     object case)               past)
  |---------------------- Relative Clause ---------------------|  Antecedent

--> (English Translation)
The purpose for which (someone) used this device
The purpose of using this device

(Rule 4) There is a limited number of nouns which are often used as
antecedents in Type 4 relative clauses. Each of such nouns requires a
specific phrase or clause to be interpolated in English.

[ex-3] [Type 4]

  "KONO"  "SOUCHI"  "O"              "TSUKAT"  "TA"               "KEKKA"
  (this)  (device)  (case particle:  (to use)  (tense formative:  (result)
                     object case)               past)
  |---------------------- Relative Clause ---------------------|  Antecedent

--> (English Translation)
The result which was obtained by using this device

In the above example, the clause "the result which someone obtained
(the result : gap)" is omitted in Japanese, which relates the
antecedent "KEKKA" (result) and the relative clause "KONO SOUCHI O
TSUKAT_TA" (someone used this device).
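
As a rough illustration of how the cues in Rules 1 to 4 might be
consulted together, the following Python sketch collects the clause
types that each rule suggests for a given relative clause and
antecedent. It is not GRADE code and does not reproduce the actual
Mu-Project rules. The word lists are placeholders: only "MOKUTEKI",
"GEN-IN", "SHUDAN", "KEKKA" and the marker AO come from the text,
while "KANGAE" as a Type 3 noun, "GA" as the triggering particle for
Rule 2, the order in which the rules are checked, and the Type 1
fallback are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Placeholder lexicon fragments (see the assumptions stated above).
TYPE3_NOUNS = {"KANGAE"}                         # hypothetical Rule 1 noun
TYPE2_MARKERS = {"AO"}                           # semantic marker in [ex-1]
TYPE2_PARTICLES = {"GA"}                         # assumed Rule 2 particle
DEEP_CASE_NOUNS = {"MOKUTEKI", "GEN-IN", "SHUDAN"}  # Rule 3 nouns
TYPE4_NOUNS = {"KEKKA"}                          # Rule 4 noun from [ex-3]


@dataclass(frozen=True)
class Word:
    surface: str
    markers: frozenset = frozenset()   # semantic markers of the word
    particle: Optional[str] = None     # case particle following the word


def suggest_types(clause: List[Word],
                  antecedent: str) -> List[Tuple[int, str]]:
    """Return (clause type, triggering rule) pairs to try, in order."""
    suggestions = []
    if antecedent in TYPE3_NOUNS:
        suggestions.append((3, "Rule 1"))
    if any(w.markers & TYPE2_MARKERS and w.particle in TYPE2_PARTICLES
           for w in clause):
        suggestions.append((2, "Rule 2"))
    if antecedent in DEEP_CASE_NOUNS:
        suggestions.append((1, "Rule 3"))
    if antecedent in TYPE4_NOUNS:
        suggestions.append((4, "Rule 4"))
    # Assumed fallback: an ordinary gap in a case (Type 1).
    if not any(t == 1 for t, _ in suggestions):
        suggestions.append((1, "default"))
    return suggestions


ex1 = [Word("SHORISOKUDO", frozenset({"AO"}), "GA"), Word("HAYAI")]
ex2 = [Word("KONO"), Word("SOUCHI", particle="O"), Word("TSUKAT")]

print(suggest_types(ex1, "KEISANKI"))
print(suggest_types(ex2, "MOKUTEKI"))
print(suggest_types(ex2, "KEKKA"))
```

The function returns suggestions rather than a final decision,
because each cue only makes one type likely; a later step still has
to confirm the suggested analysis against the clause itself.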
A set of lexical rules is defined for "KEKKA" (result), which basically
works as follows : it examines first whether the deep object case has
already been filled by a noun phrase in the relative clause. If so, the
relative clause is taken as type 4 and an appropriate phrase is
interpolated as in [ex-3]. If not, the relative clause is taken as type
1 as in the following example where the noun "KEKKA" (result) fills the
gap of object case in the relative clause.

[ex-4] [Type 1]

  "KONO"  "JIKKEN"      "GA"             "TSUKAT"  "TA"               "KEKKA"
  (this)  (experiment)  (case particle:  (to use)  (tense formative:  (result)
                         subject case)              past)
  |------------------------ Relative Clause -----------------------|  Antecedent

--> (English Translation)
The result which this experiment used

Such lexical rules are invoked at the beginning of the relative clause
analysis by a rule in the main flow of processing. The noun "KEKKA"
(result) is given a mark as a lexical property which indicates the noun
has special rules to be invoked when it appears as an antecedent of a
relative clause. All the nouns which require special treatments in the
relative clause analysis are given the same marker. The rule in the
main flow only checks this mark and invokes the lexical rules defined
in the lexicon.

(Rule 5) Only the cases marked by postpositional case particles 'GA',
'WO' and 'NI' can be deleted in Type 1 relative clauses, when the
antecedents are ordinary nouns. Gaps in Type 1 relative clauses can
have other surface case marks, only when the antecedents are special
nouns such as described in Rule (3).

4-2 Conjuncted Noun Phrases

Conjuncted noun phrases often appear in abstracts of scientific and
technological papers. It is important to analyze them correctly,
especially to determine scopes of conjunctions correctly, because they
often lead to proliferation of analysis results. The particle "TO"
plays almost the same role as the English "and" to conjunct noun
phrases. There are several heuristic rules based on various levels of
information to determine the scopes.

(Rule 1) Since particle "TO" is also used as a case particle, if it
appears in the position:

    Noun 'TO' verb Noun,
    Noun 'TO' adjective Noun,

there are two possible interpretations, one in which "TO" is a case
particle and "noun TO adjective(verb)" forms a relative clause that
modifies the second noun, and the other one in which "TO" is a
conjunctive particle to form a conjuncted noun phrase. However, it is
very likely that the particle 'TO' is not a conjunctive particle but a
post-positional case particle, if the adjective (verb) is one of
adjectives (verbs) which require case elements with surface case mark
'TO' and there are no extra words between 'TO' and the adjective
(verb). In the following example, "KOTONARU (to be different)" is an
adjective which is often collocated with a noun phrase followed by case
particle "TO".

[ex-5]

  YOSOKU-CHI         "TO"  KOTONARU           ATAI
  (predicted value)        (to be different)  (value)

[dominant interpretation]

  YOSOKU-CHI "TO" KOTONARU    ATAI
  |--- relative clause ---|   antecedent

  = the value which is different from the predicted value

[less dominant interpretation]

  YOSOKU-CHI   "TO"   KOTONARU ATAI
      NP                   NP
  |---- conjuncted noun phrase ----|

  = the predicted value and the different value

(Rule 2) If two "TO" particles appear in the position:

    Noun-1 'TO' .......... Noun-2 'TO' 'NO' NOUN-3

the right boundary of the scope of the conjunction is almost always
Noun-2. The second 'TO' plays a role of a delimiter which delimits the
right boundary of the conjunction. This 'TO' is optional, but in real
texts one often places it to make the scope unambiguous, especially
when the second conjunct is a long noun phrase and the scope is highly
ambiguous without it. Because the second
a delimiter of the conjunction) and 'NO' following a case particle
turns the preceding phrase to a