A Rule Based Approach for Connective in Malayalam Language
Kumari Sheeja S (1, 2), Lakshmi S (1), Sobha Lalitha Devi (1)
(1) AU-KBC Research Centre, Anna University, Chrompet, Chennai
(2) KCG College of Technology, Karapakkam, Chennai
sheeja@kcgcollege.com, slakshmi@au-kbc.org, sobha@au-kbc.org
Abstract. Discourse connectives signal the relationship between two coherent spans of text, and the
arguments of a connective are the text spans it relates. Discourse relations link clauses in a text and
compose its overall structure. Discourse connectives are therefore an important part of modeling
Malayalam discourse structure. We present a rule based approach to identifying discourse connectives in
Malayalam. Discourse connectives may or may not be explicitly present in a relation. In this work we
focus on the rule based identification of a particular connective in Malayalam text and report
encouraging results.
Keywords: Discourse connectives, rule based approach, Malayalam discourse, connective arguments
1 Introduction
Discourse relations connect clauses and sentences in the text and compose the overall text
structure. Discourse analysis is concerned with analyzing how clause or sentence level units of text
are related to each other within a larger unit of text. The two basic units of discourse relations are
discourse markers and their arguments. The discourse markers are the words or phrases which
connect two clauses or sentences and establish a relation between two discourse units.
Kamala went to the hospital but the doctor was not there.
In this example the connective “but” establishes a relation between the two clauses and makes
the text coherent. Discourse relations are used in NLP applications and are important for
discourse analysis. Identifying discourse relations is a challenging task in natural language
processing. Discourse connectives, besides their common function of connecting the contents
of two different clauses, can also act as ordinary conjunctions [11], so it is difficult to distinguish
discourse markers from non-discourse markers. Identifying argument boundaries is even more
difficult in large texts. Malayalam is a South Indian (Dravidian) language with free word order,
although the verb stays in final position. Discourse connectives are important for producing and
interpreting text in Malayalam. The rest of the paper is organized as follows. Section 2
describes related work. Section 3 gives an overview of discourse relations and Section 4 explains
the rule based approach. The paper ends with the conclusion of the work.
2 Related Work
Work on the annotation of discourse connectives and their arguments has been carried out
in various languages such as Turkish [12], Arabic [2] and English [7]. The Penn Discourse
Treebank (PDTB) is the first to follow the lexically grounded approach to annotation of discourse
relations, and it is unique in adopting a theory-neutral approach to annotation. PDTB provides the
argument structure of discourse relations and a sense label for each relation in the text, following a
hierarchical classification scheme. Elwell et al. [9] used maximum entropy rankers and achieved a
3.6% improvement over the state of the art in identifying arguments of discourse connectives.
Versley [11] worked on tagging German discourse connectives and arguments using English
training data and a German-English parallel corpus. Versley's approach was to transfer a tagger for
English discourse connectives by annotation projection, using a freely accessible list of
connectives. He achieved an F-score of 68.7% for the identification of discourse connectives.
Ghosh [5] used a data driven approach to identify arguments of explicit discourse connectives in
the PDTB corpus. Al Saif [1] used machine learning algorithms to automatically identify explicit
discourse connectives and their arguments in Arabic. Wang et al. [12] used sub-trees as features and
achieved a significant improvement in identifying arguments and explicit and implicit discourse
relations. Published work on discourse relation annotation in Indian languages is available for
Hindi, Malayalam and Tamil by Sobha et al. [3]. They have also worked on automatic identification
of discourse relations in these three Indian languages [10] using CRFs. Other published work on
Indian languages covers Hindi [6], [7] and Tamil [8]. In this paper we explore various discourse
connectives and a rule based approach for a particular connective in Malayalam.
3 Discourse Connectives In Malayalam
Malayalam is a free word order language and its words are agglutinated, hence most of the
connectives also appear in agglutinated form. A discourse relation in Malayalam can be marked
syntactically (by a suffix) or lexically [10]. It can hold within a clause, between clauses or between
sentences. Discourse connectives are an important part of modeling discourse structure. In this
paper we describe the various connectives present in Malayalam and a rule based approach to
identify the connective “pakshe” (but).
3.1 Discourse Relation categorization
Discourse markers can be realized in any of the following ways. There are two major categories,
explicit and implicit relations. We also observed other types of relations.
3.2 Explicit connectives
Explicit connectives are morphemes or free words that trigger discourse relations in
Malayalam. They signal the presence of a discourse relation between sentences or clauses.
A connective can occur in initial, medial or final position in an argument [12]. Below is an
example of an explicit connective in Malayalam.
[prameham oru nishabdha
diabetes one silent
kolayaaLiyaaN.]/arg1
killer
ennaal [niyanthrichu nirthiyaal
but control kept if
kuzhappamilla]/arg2
no problem
(Diabetes is a silent killer. But when kept in control it is not a problem.)
In the above example, the connective “ennaal” occurs inter-sententially, connecting the two
sentences. The connective occurs in initial position in the second argument. Here the connective
explicitly realizes the relation between the two arguments. Four types of explicit
connectives have been observed.
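The bracketed notation used in the examples, [ ... ]/arg1 connective [ ... ]/arg2, can be read mechanically. The following is a minimal sketch, assuming the annotation is written on a single line; the names `Relation` and `parse_relation` are hypothetical and are not part of the annotation scheme described here.

```python
import re
from typing import NamedTuple, Optional


class Relation(NamedTuple):
    arg1: str
    connective: Optional[str]  # None when no free word stands between the arguments
    arg2: str


# Assumed layout: "[...]/arg1 <connective> [...]/arg2" on one line.
_RELATION = re.compile(
    r"\[(?P<arg1>[^\]]+)\]/arg1\s*(?P<conn>[^\[\]]*?)\s*\[(?P<arg2>[^\]]+)\]/arg2"
)


def parse_relation(annotated: str) -> Optional[Relation]:
    """Split one annotated example into arg1, connective and arg2."""
    match = _RELATION.search(annotated)
    if match is None:
        return None
    connective = match.group("conn").strip() or None
    return Relation(match.group("arg1").strip(), connective, match.group("arg2").strip())


example = (
    "[prameham oru nishabdha kolayaaLiyaaN.]/arg1 "
    "ennaal [niyanthrichu nirthiyaal kuzhappamilla]/arg2"
)
relation = parse_relation(example)
print(relation.connective)  # ennaal
```

When the connective is a suffix inside an argument rather than a free word, nothing stands between the two brackets and `connective` comes back as `None`.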
3.3 Explicit connective Types
Subordinate Conjunctions. Subordinate conjunctions connect the main clause with an
adverbial, noun or adjectival clause. The most commonly observed subordinate
conjunctions in all three languages are since, because and when. The following example
shows a subordinate conjunction in Malayalam.
[pachakkarikaL vevichu
Vegetables boil
kazhikkumpoL]/arg1
when eat
[athiluLLa poshakam nashtamaakum]/arg2
In that nutrients loss
(When vegetables are boiled and consumed, the nutrients in it are lost)
As the examples above show, both free lexical items and morphemes can serve as connectives.
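The distinction between lexical and suffixal connectives can be illustrated with a small lookup. This is only a sketch under stated assumptions: the lexical set holds two free-word connectives named in this paper, and the suffix string "umpoL" is read off the surface form "kazhikkumpoL" in the example above, not taken from a morphological analysis. A real system would use a morphological analyser instead of string endings.

```python
# Free-word connectives mentioned in this paper (illustrative, not exhaustive).
LEXICAL_CONNECTIVES = {"ennaal", "pakshe"}

# Assumed suffix, taken from the surface form "kazhikkumpoL" (glossed "when eat").
SUFFIX_CONNECTIVES = ("umpoL",)


def connective_type(word):
    """Return "lexical", "suffix" or None for a single transliterated word."""
    token = word.strip(".,;:!?")
    if token in LEXICAL_CONNECTIVES:
        return "lexical"
    if token.endswith(SUFFIX_CONNECTIVES):
        return "suffix"
    return None


print(connective_type("ennaal"))        # lexical
print(connective_type("kazhikkumpoL"))  # suffix
```

The lexical check runs first so that a free-word connective is never misread as a suffixed form.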
Co-ordinate Conjunctions. Co-ordinate conjunctions give equal emphasis to the units they join.
They connect words, phrases and clauses. The most commonly observed co-ordinate conjunctions
in the corpus are “but” and “and”. In Malayalam, “pakshe” (but) is a co-ordinate conjunction.
An intra-sentential co-ordinate conjunction occurs between the clauses it connects.
Conjunct Adverbs. These modify the clauses or sentences in which they occur and join
independent clauses together. They are a special type of conjunction, as they behave partly as
adverbs and partly as conjunctions. An example of such a relation is given below.
[kazhuth, mukham, kaiviralukal ennivitangalil
Neck, face, fingers all+these+places
karuthaniramuNtaakaan kozhuppu
black+color+come fat
kaaraNamaakum.]/arg1 athinaal [eNNayil
reason+will+be Therefore oil
varutha aahaaram, kozhuppulla Bakshanam
fried food fatty food
enniva ozhivaakkaNam.]/arg2
all+these avoid
(Fat can turn the neck, face and fingers black. Therefore we have to avoid oil-fried and
fatty foods.)
In the above example “athinaal” is the conjunct adverb, which signals a cause and
effect relationship where arg1 is the cause and arg2 is the effect.
Correlative conjunction. Correlative conjunctions are pairs of conjunctions used within a
sentence to join words or groups of words. They are not used to connect sentences themselves,
but link two or more words or clauses of equal importance within a single sentence. They
always occur within a sentence.
[indyayennaal innu sachin
india means today sachin
maathramalla,]/arg1 [pakshe innum
not only but also today
Sachinillaathe indyaye
sachin without india
sankalppikkaan prayaasam.]/arg2
think cannot
(Today India means not only Sachin, but one also cannot think of an India without Sachin.)
Here “maathramalla-pakshe” is the correlative connective. However, “pakshe” may be
dropped in certain cases.
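The difference between plain “pakshe” and the correlative pair “maathramalla-pakshe”, including the case where “pakshe” is dropped, can be expressed as a token-level check. The sketch below is an illustration only and is not the rule set of this paper. It assumes whitespace tokenisation of transliterated text, and it assumes that “maathramalla” without “pakshe” always signals the dropped-pakshe correlative. The function name `classify_pakshe` and the label strings are hypothetical.

```python
from typing import List, Optional

PUNCTUATION = ".,;:!?"


def classify_pakshe(sentence: str) -> Optional[str]:
    """Label the use of "pakshe" / "maathramalla" in one transliterated sentence."""
    tokens: List[str] = [word.strip(PUNCTUATION) for word in sentence.split()]
    has_pakshe = "pakshe" in tokens
    has_maathramalla = "maathramalla" in tokens

    if has_maathramalla and has_pakshe:
        # The pair counts as correlative only when "maathramalla" comes first.
        if tokens.index("maathramalla") < tokens.index("pakshe"):
            return "correlative"
        return "coordinate"
    if has_maathramalla:
        # Assumption: a lone "maathramalla" is the correlative with "pakshe" dropped.
        return "correlative (pakshe dropped)"
    if has_pakshe:
        return "coordinate"
    return None


sentence = (
    "indyayennaal innu sachin maathramalla, pakshe innum "
    "Sachinillaathe indyaye sankalppikkaan prayaasam."
)
print(classify_pakshe(sentence))  # correlative
```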
Complementizer clause. The complementizer is considered a special type of connective. It is a
conjunction that marks a complement clause.
[avare vila kalppikkunnilla]/arg1 ennu [nethaakkal
they value not given that leaders
abhinayichu]/arg2
pretend
(The leaders pretended that they were not given a value.)
3.4 Implicit Connectives
An implicit relation can be inferred when a relationship exists between an adjacent pair of
sentences and no explicit connective is present in the text. We use the label “IMPLICIT”
wherever an implicit relation was inferred [12].
[pilkaalath niravadhi svadeshikal bekkarute
later many people bekkar's
paatha pinthutarnnu.]/arg1 IMPLICIT [mattu
way followed some
chilaraakatte kaayalil svadesheeyamaaya
People backwater traditional
Reethiyil kayal nikathi krishi bhoomi
style backwater filled farm land
uNdaakkiyetuthu.]/arg2
made
(Later many people followed Bekkar's path. Some people filled up the backwaters in their
traditional style and made farm land.)
In the above example the two sentences are not explicitly connected, but a relationship between
them can be inferred.
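Pairs that might carry the “IMPLICIT” label can be shortlisted automatically before an annotator decides whether a relation really holds. The following is a rough sketch. It assumes a small list of explicit connectives taken from the examples in this paper, and it treats the absence of those words in the second sentence as a necessary but not sufficient condition. The names `KNOWN_CONNECTIVES` and `implicit_candidates` are hypothetical.

```python
from typing import List, Tuple

# Free-word connectives seen in the examples of this paper (not exhaustive).
KNOWN_CONNECTIVES = {"ennaal", "pakshe", "athinaal"}


def implicit_candidates(sentences: List[str]) -> List[Tuple[str, str, str]]:
    """Return adjacent sentence pairs in which no known explicit connective occurs
    in the second sentence. These are candidates only; whether a relation exists
    is still a matter of annotator judgement."""
    candidates = []
    for first, second in zip(sentences, sentences[1:]):
        tokens = {word.strip(".,;:!?") for word in second.split()}
        if not KNOWN_CONNECTIVES & tokens:
            candidates.append((first, "IMPLICIT", second))
    return candidates


pair = [
    "pilkaalath niravadhi svadeshikal bekkarute paatha pinthutarnnu.",
    "mattu chilaraakatte kaayalil svadesheeyamaaya Reethiyil kayal nikathi "
    "krishi bhoomi uNdaakkiyetuthu.",
]
print(len(implicit_candidates(pair)))  # 1
```

The check does not lowercase the text, since the transliteration used here distinguishes upper and lower case letters.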
4 Rule Based Approach
Malayalam is a language of the Dravidian family with agglutinated word forms. For this work,
we collected Malayalam sentences from websites; the resulting document consists of 3000 sentences.