AN ENGLISH TO ASSAMESE, BENGALI AND HINDI
MULTILINGUAL E-DICTIONARY
Md. Saiful Islam
Department of Computer Science
Assam University, Silchar, Assam, India
E-mail: sislam.mca@gmail.com
INTERNATIONAL JOURNAL OF CURRENT ENGINEERING AND SCIENTIFIC RESEARCH (IJCESR)
ISSN (PRINT): 2393-8374, (ONLINE): 2394-0697, VOLUME-3, ISSUE-9, 2016

Abstract
Dictionaries are a much-needed component of Natural Language Processing (NLP) systems today. A dictionary is one of the important tools for learning new languages. A word is basically an association of linguistic sound and meaning, and the spelling of a word does not always correlate easily with its sound. A dictionary helps us with both the spelling and the pronunciation of such words. Electronic dictionaries are very popular nowadays because many users can access them online simultaneously. The main objective of this paper is to develop a user-friendly English to Assamese, Bengali and Hindi (E-ABH) multilingual electronic dictionary in which users can easily look up the meaning of an English word in Assamese, Bengali and Hindi, together with related information such as its word Id, part of speech (POS), synonyms and examples. This dictionary is intended to help people, mainly in North-East India, improve their knowledge of the Assamese, Bengali, English and Hindi languages.

Keywords: Electronic Dictionary, Languages, Natural Language Processing, Sequential Search Technique

I. INTRODUCTION
A. Electronic Dictionary
A dictionary is a book of words in one or more specific languages, in which the words are listed alphabetically with their meanings, synonyms, phonetics, POS and examples [5][6]. It is one of the important tools for helping students understand text and improve their reading skills. There are two types of dictionary: the paper dictionary, also known as a hard-copy or printed dictionary, and the electronic dictionary, also known as a digital or Internet dictionary.

An Electronic Dictionary (E-Dictionary) is a dictionary whose data exists in digital form and can be accessed through a number of different media. The E-Dictionary is an important and powerful tool for anyone learning a new language on a computer, whether online or offline. It gives the user access to a much larger database than a single book can hold. Its most important advantage is that it is very convenient to use. In their modern electronic form, dictionaries have tremendous potential.

According to the languages involved, dictionaries fall into three categories:

1. Monolingual Dictionary: Here, the user can search the meaning of a word and other related information about the word from one language to the same language. English-English and Bengali-Bengali dictionaries are examples of monolingual dictionaries.
2. Bilingual Dictionary: Here, the user can search the meaning of a word and other related
information about the word from one language to another language. Assamese-English and English-Bengali dictionaries are examples of bilingual dictionaries.
3. Multilingual Dictionary: Here, the user can search the meanings of words and other related information about the words from one language to several languages. An English to Assamese, Bengali and Hindi dictionary is an example of a multilingual dictionary.

According to Al-Rabi’i, E-Dictionaries can be divided into two types [5]:

1. Online E-Dictionary: This dictionary is used directly in digital form over the Internet through a web browser, from anywhere in the world. It is also known as an Internet dictionary. Many users can access it online simultaneously.
2. Offline E-Dictionary: This dictionary can be used on a computer, a PDA (Personal Digital Assistant) or a mobile phone. It is also known as a portable digital dictionary. An Offline E-Dictionary can be carried and backed up on CD, DVD, hard disk or pen drive. This type of dictionary can also be downloaded from the Internet and installed on one's own computer or other devices.

B. Natural Language Processing
Natural languages are the languages that humans use naturally to communicate. Natural Language Processing (NLP) is a field of computer science and linguistics concerned with the interactions between computers and natural languages [4]. NLP deals with computer programs that understand human languages in both written and spoken form. The major goal of NLP is to design and build software that analyzes, understands and generates the languages that humans use naturally. NLP is an area of research and application that explores how computers can be used to understand and manipulate natural language text or speech to do useful things. Some of the most common research tasks in NLP are Machine Translation, Electronic Dictionary, Morphological Segmentation, Natural Language Generation, Optical Character Recognition, Part of Speech (POS) Tagging, Question Answering, Speech Recognition, Information Retrieval (IR) and Speech Segmentation [6].

C. Languages
In this section, we briefly discuss the Assamese, Bengali, English and Hindi languages.

1. Assamese Language: Assamese is an Eastern Indo-Aryan language used mainly in the state of Assam, where it is the state language and the official language. Assamese is also known as Asamiya (Axomiya). It is the mother tongue of the Assamese people and is spoken mainly by the people of Assam and by some people in the other North-Eastern states. Nearly 15 to 20 million people speak Assamese. Assamese is one of the recognized languages of India [6][7]. It evolved around the 7th century AD, with its roots in Sanskrit. However, its vocabulary, phonology and grammar have been substantially influenced by the original inhabitants of Assam, such as the Boros and the Kacharis. The Assamese script is derived from the Brahmi script. Assamese is written in the Assamese script, which developed from the Gupta alphabets around 1200 AD and closely resembles the Mithilakshar and Bengali alphabets.

2. Bengali Language: Bengali is an Indo-Aryan language spoken mostly in the eastern Indian subcontinent. It is also known as Bangla. It evolved from Magadhi Prakrit and Sanskrit. Bengali is one of the recognised languages of India. It is the official language of West Bengal and Tripura, and it is also a major language in the Union Territory of the Andaman and Nicobar Islands. Within India, Bengali is mainly spoken by the people of West Bengal, Tripura and Assam. It is the seventh most spoken language in the world and the second most spoken language in India. Bengali is written in the Bengali script, the 6th most widely used writing system in the world. With minor variations, the script is shared by Assamese and is the basis for other languages such as Manipuri and Bishnupriya Manipuri [6].

3. English Language: English is a West Germanic language that was first spoken in early medieval England. English is spoken mainly by the people of Canada, Australia, the United Kingdom, the United States, Ireland and New
Zealand. It is an official language of almost sixty sovereign states and the third most common native language in the world, and it has become the leading language of international discourse [6]. English was introduced in India in 1830 during the rule of the East India Company. At the time of India's Independence in 1947, English was the only functional lingua franca in the country. The Constitution of India, which came into force in 1950, retained English for official purposes, and English remains an associate official language of India. English has various dialects in India due to the influence of local languages.

4. Hindi Language: Hindi is the fourth most widely spoken language in the world. It is widely spoken by the people of Indian states and territories such as Delhi, Madhya Pradesh, Bihar, Uttar Pradesh, Chhattisgarh, Haryana, Himachal Pradesh, Chandigarh and Rajasthan. It is the primary spoken language of Madhya Pradesh and Uttar Pradesh [6]. In the 2001 census of India, 258 million people reported Hindi as their native language. Hindi is also spoken in neighbouring countries of India such as Bangladesh, Bhutan and Nepal. Hindi derives its vocabulary from several major sources, including Sanskrit, Persian and Arabic.

II. REVIEW OF RELATED LITERATURE
Many English paper dictionaries have been compiled by lexicographers over the centuries. The first English dictionary was compiled by Robert Cawdrey in 1604 [17]; it contains about 2,543 words. The first electronic version of the Oxford English Dictionary (OED) was made available in 1988 [14]. The digital OED was developed by Tony Smith and published by Oxford University Press in 1999. The online version of the OED has been available since 2000. At present, many English-Assamese [1], English-Bengali [2], English-Hindi [3] and English-English paper dictionaries are available on the market. There are also a few English-Assamese, English-Bengali [8][19], English-Hindi and English-English electronic dictionaries available both online and offline.

Some examples of English dictionaries and their lexicographers are listed below:
a. A Dictionary of the English Language, compiled by Samuel Johnson in 1755 [14].
b. The Oxford English Dictionary, published by Oxford University Press in 1989.
c. The Compact Oxford English Dictionary, edited by J. A. Simpson and E. S. C. Weiner in 1991 [15].
d. The Oxford Dictionary of Current English, compiled by Catherine Soanes in 2006.
e. The Concise Oxford English Dictionary, edited by Angus Stevenson and Maurice Waite in 2011 [16].

III. DATAFLOW DIAGRAM OF E-ABH DICTIONARY
A Data Flow Diagram (DFD) is a pictorial representation of the flow of information in a system. The DFD is often used as a preliminary step to create an overview of the system [12]. It is an attractive technique because it shows what users do rather than what computers do. The DFD technique is very popular because it is simple to understand and use. We have used two levels of DFD to model the E-ABH dictionary:

A. Level 0 DFD
The Level 0 DFD is also known as the Context Diagram (CD). A CD is the most basic form of the DFD; it aims to show how the entire system works at a glance. The CD shows the interactions between the process and the external entities. The CD of the E-ABH dictionary is shown in Fig. 1.

Fig. 1: Context Diagram of E-ABH dictionary

In the CD, the Administrator and the User are the two external entities. The Administrator can enter data into the database of the system, whereas the User can search data in the database of the system.

B. Level 1 DFD
The Level 1 DFD is the next level below the CD; it gives an overview of the full E-ABH dictionary system and describes in more detail
how the data are processed and what type of data is needed in the system. The Level 1 DFD of the E-ABH dictionary is shown in Fig. 2.

Fig. 2: Level 1 DFD of E-ABH dictionary

In the Level 1 DFD, the Administrator and the End-user are the two external entities. The Administrator needs to log in first; if the login is successful, the Administrator can enter data into the E-ABH dictionary. The End-user can search for the meaning of a word. In addition, the End-user can give feedback to the Administrator about the performance of the E-ABH dictionary.

IV. IMPLEMENTATION
The implementation of the E-ABH dictionary consists of three phases:

A. Necessary Software
We have used PHP, HTML, CSS and JavaScript for the Front-End and MySQL for the Back-End in the development of the E-ABH dictionary [10][11][20].

B. Data (or word) Entry
In the E-ABH dictionary, only the Administrator can enter data (words). The Administrator first logs in with a valid username and password. If the login is successful, he/she can enter words into the dictionary using the following word entry algorithm:

1. Enter word Id
   If (found)
   {
       Print "word Id already exists in the dictionary";
       Stop
   }
   Else
       Go to next step;
2. Search headword with its POS
   If (found)
   {
       Print "headword already exists in the dictionary";
       Stop
   }
   Else
       Go to next step;
3. Enter the new word Id, the headword and other related information of the headword (POS, synonyms and examples) in the Assamese, Bengali, English and Hindi languages.
4. Submit.

According to this algorithm, suppose an Administrator wants to enter a word (headword) into the dictionary. The Administrator first checks the desired word Id for the headword. If the word Id is not already present in the dictionary, the Administrator then checks the headword together with its POS. If the headword with that POS is also not present, the Administrator can enter the desired word Id, the headword and other related information about the word, such as its meaning, POS, synonyms and examples, into the dictionary.

C. Word Search (or look up)
There are many word search techniques available for E-Dictionaries. We have used the Sequential Search Technique to look up (or search for) the meaning of a word quickly and easily in the E-ABH dictionary.

The Sequential Search Technique (SST) is the simplest and most popular word search technique for electronic dictionaries, and it is a very useful technique for looking up words easily. To search for a particular word in a database table, SST compares the desired word with each word in the table one by one, starting from the beginning of the table, until the desired word is found. In SST, the database table need not be sorted. For a successful search in which each row is equally likely to hold the desired word, the average number of comparisons in SST is (N+1)/2, where N is the number of rows in the table. Its worst-case cost is proportional to the number of rows, so the search time for SST is O(N) [9][13].
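The (N+1)/2 figure for SST can be derived in one line. The assumptions here are the standard ones for this estimate and are stated by us rather than by the original text: the desired word is present in the table, and each of the N rows is equally likely to hold it. Finding the word in row i costs exactly i comparisons, while an unsuccessful search always compares all N rows.

```latex
\bar{C}_{\mathrm{succ}} = \sum_{i=1}^{N} \frac{1}{N}\, i
  = \frac{1}{N} \cdot \frac{N(N+1)}{2}
  = \frac{N+1}{2},
\qquad
C_{\mathrm{worst}} = N \;\Rightarrow\; T(N) \in O(N).
```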
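To make the Sequential Search Technique concrete, here is a minimal Python sketch of sequential search over an unsorted table. It is not the E-ABH code, which the paper describes as PHP with a MySQL back end. Here the table is modelled as an in-memory list, and the names `Entry`, `sequential_search` and `average_comparisons`, the field names and the sample rows are all hypothetical. The search also counts comparisons so that the (N+1)/2 average can be checked directly.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entry:
    """One row of a hypothetical E-ABH dictionary table (field names are illustrative)."""
    word_id: int
    headword: str
    pos: str
    meanings: dict[str, str] = field(default_factory=dict)  # language name -> meaning


def sequential_search(table: list[Entry], headword: str) -> tuple[Optional[Entry], int]:
    """Compare the headword with each row in order, starting from the first row.

    Returns the first matching row (or None) and the number of comparisons made.
    The table does not need to be sorted.
    """
    target = headword.strip().lower()
    comparisons = 0
    for row in table:
        comparisons += 1
        if row.headword.lower() == target:
            return row, comparisons
    return None, comparisons  # unsuccessful search: all N rows were compared


def average_comparisons(table: list[Entry]) -> float:
    """Average cost of a successful search, each row equally likely to be the target.

    Assumes a non-empty table with unique headwords.
    """
    total = sum(sequential_search(table, row.headword)[1] for row in table)
    return total / len(table)


# Hypothetical sample rows, deliberately unsorted, since SST does not need sorting.
table = [
    Entry(2, "water", "noun", {"Assamese": "পানী", "Bengali": "জল", "Hindi": "पानी"}),
    Entry(1, "book", "noun", {"Bengali": "বই", "Hindi": "किताब"}),
    Entry(3, "read", "verb", {"Hindi": "पढ़ना"}),
]

row, n = sequential_search(table, "book")
print(row.word_id, row.meanings["Hindi"], n)  # 1 किताब 2
print(sequential_search(table, "pen"))        # (None, 3)
print(average_comparisons(table))             # 2.0
```

With N = 3, the average over the three headwords is (1 + 2 + 3)/3 = 2, which matches (N+1)/2. In a MySQL back end, a lookup on a headword column that has no index would likewise have to examine rows one by one, whereas an index on that column would avoid the linear scan. The paper does not say whether E-ABH uses such an index.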
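The word entry algorithm of Section IV-B can also be sketched in Python. This is a minimal illustration under stated assumptions, not the E-ABH implementation. Login is not modelled, and an in-memory list stands in for the MySQL table. The names `Entry`, `EDictionary` and `add_entry` and the returned status strings are hypothetical. Headword and POS are compared case-insensitively, which the paper does not specify.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """A hypothetical multilingual entry; per-language dicts are keyed by language name."""
    word_id: int
    headword: str
    pos: str
    meanings: dict[str, str] = field(default_factory=dict)
    synonyms: dict[str, list[str]] = field(default_factory=dict)
    examples: dict[str, list[str]] = field(default_factory=dict)


class EDictionary:
    """In-memory stand-in for the dictionary table (the paper itself uses MySQL)."""

    def __init__(self) -> None:
        self.entries: list[Entry] = []

    def add_entry(self, entry: Entry) -> str:
        # Step 1: stop if the word Id already exists.
        if any(e.word_id == entry.word_id for e in self.entries):
            return "word Id already exists in the dictionary"
        # Step 2: stop if the same headword already exists with the same POS.
        key = (entry.headword.strip().lower(), entry.pos.strip().lower())
        if any((e.headword.strip().lower(), e.pos.strip().lower()) == key
               for e in self.entries):
            return "headword already exists in the dictionary"
        # Steps 3-4: store the new word Id, headword and related information ("Submit").
        self.entries.append(entry)
        return "submitted"


d = EDictionary()
print(d.add_entry(Entry(1, "water", "noun", meanings={"Hindi": "पानी"})))  # submitted
print(d.add_entry(Entry(1, "book", "noun")))   # word Id already exists in the dictionary
print(d.add_entry(Entry(2, "Water", "noun")))  # headword already exists in the dictionary
print(d.add_entry(Entry(3, "water", "verb")))  # submitted
```

Keying the duplicate check on (headword, POS) rather than on the headword alone lets "water" be stored once as a noun and once as a verb, as in the last call. In a MySQL back end, the same two checks could instead be enforced by declaring the word Id column as a PRIMARY KEY and adding a UNIQUE constraint on the (headword, POS) columns. That is a design option, not something the paper states.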