Command: wototo | Section: 5 | Source: Digital UNIX | File: wototo.5.gz
Wototo(5) File Formats Manual Wototo(5)
NAME
Wototo, wototo - Introduction to the Thai language standard
DESCRIPTION
Wototo is the Thai language software standard. It describes Thai char-
acters and their classifications. This standard also describes the
methods used to input and output Thai characters.
Thai Character Sets
The following two character sets are defined for the Thai language:
    Basic character set
    Auxiliary character set
In the basic character set, characters are 8-bit coded and have values
from 0 to 255. Character values (given here in hexadecimal) correspond
to the characters defined in standards as follows:
    Values 00 to 7F correspond to characters from the ISO 646-1983
    standard.
    Values A1 to FB (except for DB, DD, and DE) correspond to characters
    from the TIS 620-2533 standard.
    Remaining values are reserved for future use.
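The value ranges above can be written as a small lookup. The following Python sketch is illustrative only: the function name `basic_charset_source` and the returned labels are hypothetical and are not defined by the standard.

```python
def basic_charset_source(value):
    """Return which standard defines a basic character set value.

    Hypothetical helper; the ranges follow the description above.
    """
    if not 0x00 <= value <= 0xFF:
        raise ValueError("basic character set values are 8-bit (0 to 255)")
    if value <= 0x7F:
        return "ISO 646-1983"
    if 0xA1 <= value <= 0xFB and value not in (0xDB, 0xDD, 0xDE):
        return "TIS 620-2533"
    # Everything else is described as reserved for future use.
    return "reserved"


if __name__ == "__main__":
    for sample in (0x41, 0xA1, 0xDB, 0xFF):
        print(f"{sample:02X}: {basic_charset_source(sample)}")
```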
The encoded form of the basic character set is called the TACTIS
codeset, which is discussed in the TACTIS(5) reference page.
Characters in the auxiliary character set use the code values 32 to 126
and 161 to 254 only. The Wototo standard specifies that implementations
provide at least one auxiliary character set.
Character Classification
In the TACTIS codeset, characters are organized into different classes.
This classification is done only to facilitate processing and is not
related to Thai linguistic or grammatical rules. The codeset contains
the following character classes:
CTRL
    Nondisplayable characters that are used for controlling output or
    data communication. The sixty-six control character values are:
    00 to 1F, 7F, 80 to 9F, and FF.
CONS
    The Thai consonants as defined in TIS 620-2533.
LV
    The five leading vowels as defined in TIS 620-2533.
FV
    The six following vowels as defined in TIS 620-2533.
BV
    The two below vowels as defined in TIS 620-2533.
AV
    The five above vowels as defined in TIS 620-2533.
TONE
    The four tone marks as defined in TIS 620-2533.
AD
    The four above diacritics as defined in TIS 620-2533.
BD
    The below diacritic as defined in TIS 620-2533.
NON
    Those characters that do not fit into any of the preceding classes.
    This group includes 119 characters that users cannot compose with
    above vowels, below vowels, tone marks, and above and below
    diacritics. Noncomposable characters are divided into the following
    seven groups:
    Graphic characters
        The 94 graphic characters defined in ISO 646-1983. These
        include:
            52 English alphabetic characters
            10 digits
            32 special characters whose values are 21 to 2F, 3A to 40,
            5B to 60, and 7B to 7E
    Space
        Character code value is 20.
    Nobreak space
        Character code value is A0.
    Thai digits
        The 10 Thai digits as defined in TIS 620-2533.
    Thai special characters
        The 6 Thai special characters as defined in TIS 620-2533.
    Word separator
        The word separator as defined in TIS 620-2533.
    Reserved code points
        6 code points reserved for future use.
To better describe Thai input and output methods, characters in the
classes FV, BV, AV, and AD are further divided into subclasses. The
following list describes character classes and subclasses by the number
of characters in the class and their encoded values:
CTRL
    Number: 66
    Values: 00 to 1F, 7F, 80 to 9F, and FF
NON
    Number: 119
    Values:
    20 to 7E (ISO 646-1983 character codes)
    A0, CF, DC, DF, E6, EF, F0 to F9, FA, and FB (TIS 620-2533
    character codes)
    DB, DD, DE, FC, FD, and FE (Reserved code points)
CONS
    Number: 44
    Values: A1 to C3, C5, and C7 to CE
LV
    Number: 5
    Values: E0, E1, E2, E3, and E4
FV1
    Number: 3
    Values: D0, D2, and D3
FV2
    Number: 1
    Value: E5
FV3
    Number: 2
    Values: C4 and C6
    These two characters also behave as leading vowels (LV) in the
    character sequence LV+CONS.
BV1
    Number: 1
    Value: D8
BV2
    Number: 1
    Value: D9
BD
    Number: 1
    Value: DA
TONE
    Number: 4
    Values: E8, E9, EA, and EB
AD1
    Number: 2
    Values: EC and ED
AD2
    Number: 1
    Value: E7
AD3
    Number: 1
    Value: EE
AV1
    Number: 1
    Value: D4
AV2
    Number: 2
    Values: D1 and D6
AV3
    Number: 2
    Values: D5 and D7
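The counts and values in this list can be cross-checked mechanically. The Python sketch below builds a value-to-class table and confirms that the classes are disjoint and together cover all 256 code values. The pairing of class names with value ranges is inferred here from the counts and class descriptions in this page (an assumption), and the names `CLASS_VALUES` and `CHAR_CLASS` are hypothetical.

```python
def _span(first, last):
    """Inclusive range of code values."""
    return range(first, last + 1)


# Class name -> code values. The name-to-range pairing is an assumption
# inferred from the counts and descriptions in this reference page.
CLASS_VALUES = {
    "CTRL": [*_span(0x00, 0x1F), 0x7F, *_span(0x80, 0x9F), 0xFF],
    "NON": [
        *_span(0x20, 0x7E),                      # ISO 646-1983 codes
        0xA0, 0xCF, 0xDC, 0xDF, 0xE6, 0xEF,      # TIS 620-2533 codes
        *_span(0xF0, 0xF9), 0xFA, 0xFB,
        0xDB, 0xDD, 0xDE, 0xFC, 0xFD, 0xFE,      # reserved code points
    ],
    "CONS": [*_span(0xA1, 0xC3), 0xC5, *_span(0xC7, 0xCE)],
    "LV": [*_span(0xE0, 0xE4)],
    "FV1": [0xD0, 0xD2, 0xD3],
    "FV2": [0xE5],
    "FV3": [0xC4, 0xC6],
    "BV1": [0xD8],
    "BV2": [0xD9],
    "BD": [0xDA],
    "TONE": [*_span(0xE8, 0xEB)],
    "AD1": [0xEC, 0xED],
    "AD2": [0xE7],
    "AD3": [0xEE],
    "AV1": [0xD4],
    "AV2": [0xD1, 0xD6],
    "AV3": [0xD5, 0xD7],
}

# Invert into a per-value lookup table.
CHAR_CLASS = {
    value: name for name, values in CLASS_VALUES.items() for value in values
}

if __name__ == "__main__":
    total = sum(len(values) for values in CLASS_VALUES.values())
    # Equal numbers mean no value is listed twice and none is missing.
    print(total, len(CHAR_CLASS))
    for name, values in CLASS_VALUES.items():
        print(f"{name:5} {len(values):3}")
```

A flat dictionary keeps the lookup trivial; an implementation in C would more likely use a 256-entry array indexed by the code value.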
Character Levels
Thai characters are classified according to different display levels
(relative to the baseline, or nondisplayable). Classification by display
levels facilitates the character input procedures. There are five
character classification levels. Four levels include displayable
characters and one level includes nondisplayable characters, as follows:
Nondisplayable level
    Includes all control characters in the CTRL class.
Base level
    Includes all characters in the NON, CONS, FV, and LV classes.
    Characters at this level are drawn on the baseline.
Above level
    Includes all characters in the AD3, AV1, AV2, and AV3 classes.
    Characters at this level are drawn immediately above final
    consonants.
Below level
    Includes all characters in the BV1, BV2, and BD classes.
    Characters at this level are drawn immediately below final
    consonants.
Top level
    Includes all characters in the TONE, AD1, and AD2 classes.
    Characters at this level are drawn on top of the characters at the
    above level. If above level characters do not exist, top level
    characters are drawn at the above level. Characters at this level
    also indicate the end of character cells.
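The level assignments, and the rule that a top level character drops to the above level when its cell has no above level character, can be sketched as follows. This is a minimal Python illustration, not part of the standard: the names `LEVEL_CLASSES`, `LEVEL_OF_CLASS`, and `drawing_positions` are hypothetical, and FV is assumed to stand for the subclasses FV1, FV2, and FV3.

```python
# Level name -> classes at that level, as listed above.
LEVEL_CLASSES = {
    "nondisplayable": {"CTRL"},
    # Assumption: "FV" covers the subclasses FV1, FV2, and FV3.
    "base": {"NON", "CONS", "LV", "FV1", "FV2", "FV3"},
    "above": {"AD3", "AV1", "AV2", "AV3"},
    "below": {"BV1", "BV2", "BD"},
    "top": {"TONE", "AD1", "AD2"},
}

LEVEL_OF_CLASS = {
    cls: level for level, classes in LEVEL_CLASSES.items() for cls in classes
}


def drawing_positions(cell):
    """Map the classes in one display cell to where each is drawn.

    A top level character is drawn at the above level when the cell
    contains no above level character.
    """
    levels = [LEVEL_OF_CLASS[cls] for cls in cell]
    has_above = "above" in levels
    return [
        "above" if level == "top" and not has_above else level
        for level in levels
    ]


if __name__ == "__main__":
    print(drawing_positions(["CONS", "TONE"]))
    print(drawing_positions(["CONS", "AV1", "TONE"]))
```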
The standard specifies that the properties of Thai characters can be
tested by using the following functions.
Note
These functions are not implemented in DIGITAL UNIX.
    Determines the character level class that the character belongs to
      and returns the numeric value 0, 1, 2, 3, or 4. These return
      values can be represented by the constants NONDISP, TOP, ABOVE,
      BASE, or BELOW, respectively.
    Returns TRUE if a character is alphabetic.
    Returns TRUE if a character is either alphabetic or a digit.
    Returns TRUE if a character belongs to the CTRL class.
    Returns TRUE if the character is a digit.
    Returns TRUE if the character is not in the NONDISP level class.
    Returns TRUE if the character is an English lowercase letter (a to
      z).
    Returns TRUE if the character is an English uppercase letter (A to
      Z).
    Returns TRUE if a character is not in the NONDISP level class.
    Returns TRUE if the character is a space, formfeed, newline, return,
      tab, vertical tab, or wordbreak character.
    Returns TRUE if the character is a hexadecimal digit 0 to 9, A to F,
      or a to f. (Thai digits are excluded.)
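Because these functions are not implemented in DIGITAL UNIX, the following Python sketch shows how a few of the unambiguous tests could be written over TACTIS code values. All names here (`Level`, `is_ctrl`, `is_displayable`, `is_lower`, `is_upper`, `is_xdigit`) are hypothetical stand-ins, not the names used by the standard; only the numeric values of the level constants come from the description above.

```python
from enum import IntEnum


class Level(IntEnum):
    """Return values of the level test, as described above."""
    NONDISP = 0
    TOP = 1
    ABOVE = 2
    BASE = 3
    BELOW = 4


def is_ctrl(value):
    """True for the 66 CTRL values: 00-1F, 7F, 80-9F, and FF."""
    return value <= 0x1F or value == 0x7F or 0x80 <= value <= 0x9F or value == 0xFF


def is_displayable(value):
    """True if the value is not at the NONDISP level (not in CTRL)."""
    return not is_ctrl(value)


def is_lower(value):
    """True for English lowercase letters a to z."""
    return 0x61 <= value <= 0x7A


def is_upper(value):
    """True for English uppercase letters A to Z."""
    return 0x41 <= value <= 0x5A


def is_xdigit(value):
    """True for 0-9, A-F, or a-f. Thai digits (F0-F9) are excluded."""
    return (0x30 <= value <= 0x39
            or 0x41 <= value <= 0x46
            or 0x61 <= value <= 0x66)


if __name__ == "__main__":
    print(sum(is_ctrl(v) for v in range(256)))
    print(is_xdigit(ord("f")), is_xdigit(0xF1))
```

Each function assumes its argument is an integer from 0 to 255. The alphabetic, digit, and space tests are left out because this page does not state which Thai characters they accept.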
Thai Input Methods
The input method for Thai characters directly maps characters to keys,
as for English. Thai character sequences are entered character by
character and display from left to right, regardless of whether the
sequence includes forward characters (characters in the NON, CONS, LV,
FV1, FV2, and FV3 classes) or dead characters (characters in all other
classes). However, the following basic rules apply to the character
input sequence:
    Every display cell must begin with a character on the baseline (in
    the BASE level class).
    A character in the BASE level class that is also in the CONS class
    may be followed by an above vowel, a below vowel, a tone mark, a
    below diacritic, or an above diacritic.
For more detailed input sequence rules, refer to the Draft Industrial
Standard - Thai Language Software Standard WTT2.0 (Part 2: Thai Input
and Output Methods).
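The two basic rules can be illustrated with a small checker that groups a sequence of character classes into display cells. This Python sketch covers only the rules stated here, not the detailed rules of WTT2.0 Part 2. The function name `split_cells` is hypothetical, and treating a CTRL character as ending the current cell is an assumption.

```python
# Forward characters start a new display cell; dead characters attach
# to the cell that precedes them.
FORWARD = {"NON", "CONS", "LV", "FV1", "FV2", "FV3"}
DEAD = {"BV1", "BV2", "BD", "TONE", "AD1", "AD2", "AD3", "AV1", "AV2", "AV3"}


def split_cells(classes):
    """Group a sequence of class names into display cells.

    Raises ValueError if a dead character has no base character or if
    its base character is not a consonant (CONS).
    """
    cells = []
    current = None
    for position, cls in enumerate(classes):
        if cls in FORWARD:
            current = [cls]
            cells.append(current)
        elif cls in DEAD:
            if current is None:
                raise ValueError(f"position {position}: {cls} has no base character")
            if current[0] != "CONS":
                raise ValueError(f"position {position}: {cls} cannot follow {current[0]}")
            current.append(cls)
        elif cls == "CTRL":
            # Assumption: a control character closes the current cell.
            current = None
        else:
            raise ValueError(f"position {position}: unknown class {cls!r}")
    return cells


if __name__ == "__main__":
    print(split_cells(["LV", "CONS", "AV1", "TONE", "FV1"]))
```

For example, the class sequence LV, CONS, AV1, TONE, FV1 yields three cells, with the above vowel and tone mark attached to the consonant.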
SEE ALSO
Commands: locale(1)
Others: i18n_intro(5), i18n_printing(5), l10n_intro(5), TACTIS(5),
Thai(5)
Wototo(5)