8.3 The Perso-Arabic Standard for Information Interchange
The standard proposed by C-DAC GIST is an extension to the standard 8-bit ASCII. It complements the symbol set of the Latin script by adding the symbols of the Perso-Arabic scripts. The standard supports storage for Perso-Arabic languages such as Urdu, Persian, Sindhi, Kashmiri, and Arabic.

Characteristics
i. It is an 8-bit standard.
ii. Supports letters for Urdu, Arabic, Sindhi, Kashmiri.
iii. Defines Perso-Arabic alphabets in the upper ASCII range. (This leaves the lower ASCII range free; it can be used for the English alphabet, e.g. to give bilingual font support.)
iv. Defines numerals other than the ASCII numerals (48 to 57). (This may help in supporting both the Arabic numerals 0-9 and language-specific numerals.)
v. Maintains the order of alphabets for Perso-Arabic languages.
vi. Alphabets / letters are placed in ascending order. Letters like “bhey” are not provided for Urdu but are kept for languages like Sindhi. Urdu may use the digraph “be” and “choTi-he” for that.
vii. Minimal erabs are provided. Tanveen, for example do-zabar, can be formed with the help of a double zabar.
viii. Unicode compatibility can be achieved by having a PASCII to Unicode and vice versa converter.

Superscripts
i. Place for superscripts like khaRa-alif is provided.
ii. Place for superscripts for Arabic is provided.
iii. Place for superscripts like “re-ze”, “ain”, etc. is provided.
iv. Numerals are placed after erabs and superscripts. (These are provided only to support display of language-specific numerals; the standard numerals, i.e. the ASCII numerals, remain available.)

Standardization of Perso-Arabic Fonts

Characteristics of Perso-Arabic languages :
Perso-Arabic languages are written in the Naskh & Nastaliq scripts. Urdu & Kashmiri are traditionally written in the Nastaliq script, while Sindhi is written in the Naskh script. Although the script employs the basic letters of the language, the rendering of these letters in a word is extremely complex. The reason for this complexity is that the text has traditionally been composed through calligraphy, a medium whose precepts are based on the aesthetic sense of the calligrapher rather than on any formula. So great is the variation in calligraphy that it is often difficult to recognize the letters in a constituent word. This is because, in their calligraphed form, the individual letters are partially or completely fused into each other, thereby losing their identity. A degree of fusion is purposely introduced to make the resulting fused glyph visually appealing.
Another characteristic of the Perso-Arabic languages is the use of diacritics. Diacritics, although sparingly used, help in the proper pronunciation of the constituent word. The diacritics appear above or below a character to specify a vowel or emphasize a particular sound. They are essential for the removal of ambiguities, for natural language processing and for speech synthesis.

Standardization of Glyph Set
The following was taken into consideration while designing fonts for the Perso-Arabic languages. Considering the complexities of the script, it was not possible to accommodate all the glyphs / ligatures in an 8-bit code space. Hence a 16-bit font code space was considered.
1. Alphabet
58 October 2002
2. Numerals
3. Special characters
4. Diacritics
5. Religious and linguistic symbols
6. Control characters

The 16-bit Nastaliq font for Urdu & Kashmiri
Fonts developed by C-DAC for Urdu & Kashmiri are 16-bit. The glyphs are defined in the User Area of the Unicode range. The ASCII range is not used and can be used for different purposes (for example, to support English).
• Includes all the basic shapes
• Includes all the starting shapes and variations
• Includes all the middle shapes and variations
• Includes all the ending shapes and variations
• Includes levels for erabs (short vowels)
• Includes Complete ligatures
• Includes Beginning ligatures
• Includes Middle ligatures
• Includes Ending ligatures
• Includes dotted circle glyph

The 16-bit Naskh font for Sindhi, Urdu & Kashmiri
Fonts developed by C-DAC for Sindhi, Urdu & Kashmiri are 16-bit. The glyphs are defined in the User Area of the Unicode range. The ASCII range is not used and can be used for different purposes (for example, to support English).
• Includes all the basic shapes
• Includes all the starting shapes
• Includes all the middle shapes
• Includes all the ending shapes
• Includes levels for erabs (short vowels)
• Includes Complete ligatures
• Includes Beginning ligatures
• Includes Middle ligatures
• Includes Ending ligatures
• Includes dotted circle glyph

India
India is a paradise at the foot of the great Himalayas at its northern end and lies cocooned by huge oceans on the other three sides. While the Arabian Sea borders the southwest side, the southeast is lulled by the Bay of Bengal, and the southern tip, Kanya Kumari (Cape Comorin), is washed by the Indian Ocean. Protected by such natural barriers as mountains and water, it is separated from the rest of Asia. For geographers, it lies to the north of the equator between 8.4 and 37.6 degrees north latitude and 68.7 and 97.25 degrees east longitude.
India measures 3214 km from north to south and 2933 km from east to west. It has a land frontier of 15,200 km and a coastline of 7516.5 km.
India shares its political borders with Pakistan and Afghanistan on the west; Bangladesh and Myanmar in the east; Nepal, China, Tibet and Bhutan in the north. The capital of India is New Delhi.

Languages
India has 18 officially recognized languages among about 200 languages as enumerated in the census.

Names of Languages
The following languages are listed in the 8th Schedule of the Constitution (given in Devanagari order):
• Assamese
• Urdu
• Oriya
• Kannada
• Kashmiri
• Konkani
• Gujarati
• Tamil
• Telugu
• Nepali
• Punjabi
• Bengali
• Manipuri
• Marathi
• Malayalam
• Sanskrit
• Sindhi
• Hindi

Urdu Design Guide : General Information

Introduction
This document provides general information about the Urdu language and some conventions of its usage in India.
The information presented in this document is intended to assist in understanding the nature and problems of Urdu implementation in the digital medium. It contains a generic description of Urdu.
Urdu is one of the official languages of India. It is the official language of Pakistan, and is spoken in various countries around the world.

Language Description
Urdu belongs to the Indo-Aryan subgroup of the Indo-European family of languages. It has developed under the heavy influence of the Arabic, Persian and Turkish languages. The Urdu writing system is a superset of the Arabic and Persian ones and contains 39 characters. Urdu is written from right to left. Unlike English, the characters do not have upper and lower cases. Further, the shape assumed by a character in a word is context-sensitive, i.e. the shape differs depending on whether the character is at the beginning, in the middle or at the end of the constituent word.
Urdu is traditionally written in Nastaliq, a script rich in calligraphic content. Owing to the complexities of rendering, a number of alternate shapes are possible for a single letter, depending on its position in the word and the letter next to it. This property of Nastaliq increases the glyph set for the language.
The characters of Urdu also need diacritics to help in the proper pronunciation of the constituent word. There are a number of diacritics, the common ones being Zabar, Zer, and Pesh.

History of Urdu language
The word Urdu means ‘Lashkar’; it is derived from the Turkish language, where it means 'armies'. In the south of India it flourished under the name Dakhani and in the southwest as Gurjari, while in Delhi its name changed from Hindi to Hindavi and Hindustani.
Alternate names of Urdu are DAKHINI (DAKANI, DECCAN, DESIA, MIRGAN), PINJARI, REKHTA (REKHTI).

Population using the Urdu Language
48,062,000 in India (1997 IMA);
10,719,000 in Pakistan (1993), or 7.57% of the population;
600,000 in Bangladesh;
64,000 in Mauritius (1993 Johnstone);
170,000 in South Africa (1987);
18,500 in Bahrain (1979 WA);
17,800 in Oman (1980 WA);
15,400 in Qatar;
382,000 in Saudi Arabia;
3,562 in Fiji (1980 WA);
23,000 in Germany;
14,000 in Norway.
Totals :
60,290,000 or more in all countries
104,000,000 including second language users (1999 WA).
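
The context sensitivity described under Language Description (a letter takes a different shape at the beginning, in the middle, or at the end of a word) can be illustrated with a small sketch. It is not the C-DAC rendering method: the function name `positional_forms` is hypothetical, the set of letters that never join to the following letter is taken from general Arabic-script joining behaviour rather than from this document, and the input is assumed to contain base letters only (no diacritics). Nastaliq shaping additionally depends on the neighbouring letters, which this sketch ignores.

```python
# Letters assumed to join only to the preceding letter, never to the next one.
# This set is an assumption based on general Arabic-script joining behaviour.
RIGHT_JOINING_ONLY = {
    "\u0627",  # alef
    "\u062f",  # dal
    "\u0688",  # ddal
    "\u0630",  # zal
    "\u0631",  # re
    "\u0691",  # rre
    "\u0632",  # ze
    "\u0698",  # zhe
    "\u0648",  # vao
    "\u06d2",  # baRi ye
}


def positional_forms(word):
    """Return 'isolated', 'initial', 'medial' or 'final' for each letter.

    Hypothetical helper: assumes `word` holds base letters only, in logical
    (reading) order.
    """
    forms = []
    last = len(word) - 1
    for i, letter in enumerate(word):
        # A letter connects backwards only if the previous letter joins forward.
        joins_prev = i > 0 and word[i - 1] not in RIGHT_JOINING_ONLY
        # A letter connects forwards only if it is not the last one and is
        # itself able to join to the following letter.
        joins_next = i < last and letter not in RIGHT_JOINING_ONLY
        if joins_prev and joins_next:
            forms.append("medial")
        elif joins_prev:
            forms.append("final")
        elif joins_next:
            forms.append("initial")
        else:
            forms.append("isolated")
    return forms


# The word "kitab" (kaf, te, alef, be), written with escapes to avoid
# right-to-left display issues in source code.
print(positional_forms("\u06a9\u062a\u0627\u0628"))
```

A font with separate starting, middle and ending shapes would use a result like this to pick one glyph per letter; a Nastaliq font would go further and choose among several variations of each shape.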
PASCII (Perso-Arabic Standard for Information Interchange) Version 1.0
[Code chart. Columns: 128, 144, 160, 176, 192, 208, 224, 240 (high nibble 8 to F). Rows: low nibble 0 to F. The glyph cells were not recoverable from the extracted text; the legible labels are "Kasheeda" (row 1) and "Reserved" (rows A, B, E and F).]
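
The characteristics of the standard mention that Unicode compatibility can be achieved with a PASCII to Unicode and vice versa converter. A minimal sketch of such a converter is given here under stated assumptions: the function names are hypothetical, and the two table entries are placeholders, not the real PASCII assignments, which must be read from the code chart. The sketch relies only on what the standard states: bytes below 128 are plain ASCII, and Perso-Arabic symbols occupy the upper range.

```python
# Hypothetical excerpt of a PASCII-to-Unicode table. The byte values below are
# placeholders for illustration only; they are NOT taken from the PASCII chart.
PASCII_TO_UNICODE = {
    0x81: "\u0640",  # assumed slot for Kasheeda (ARABIC TATWEEL)
    0xC0: "\u0628",  # assumed slot for the letter "be" (ARABIC LETTER BEH)
}

# Reverse table for the Unicode-to-PASCII direction.
UNICODE_TO_PASCII = {char: code for code, char in PASCII_TO_UNICODE.items()}


def pascii_to_unicode(data: bytes) -> str:
    """Decode PASCII bytes: lower half is ASCII, upper half uses the table."""
    chars = []
    for byte in data:
        if byte < 0x80:
            chars.append(chr(byte))  # lower ASCII passes through unchanged
        else:
            # Unmapped upper-range bytes become U+FFFD (replacement character).
            chars.append(PASCII_TO_UNICODE.get(byte, "\ufffd"))
    return "".join(chars)


def unicode_to_pascii(text: str) -> bytes:
    """Encode text to PASCII bytes; raise ValueError on unmappable characters."""
    out = bytearray()
    for char in text:
        if ord(char) < 0x80:
            out.append(ord(char))
        elif char in UNICODE_TO_PASCII:
            out.append(UNICODE_TO_PASCII[char])
        else:
            raise ValueError(f"no PASCII code for U+{ord(char):04X}")
    return bytes(out)


# Round trip of a mixed English / Perso-Arabic byte string.
sample = b"A\x81\xc0"
assert unicode_to_pascii(pascii_to_unicode(sample)) == sample
```

Keeping the lower 128 codes identical to ASCII is what makes bilingual text possible in a single 8-bit stream, as the standard intends; only the upper-range table needs to be filled in from the chart.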