Wototo, wototo - Introduction to the Thai language standard
Wototo is the Thai language software standard. It
describes Thai characters and their classifications. This
standard also describes the methods used to input and output
Thai characters.
Thai Character Sets
The following two character sets are defined for the Thai
language:
     o  Basic character set
     o  Auxiliary character set
In the basic character set, characters are 8-bit coded and
have values from 0 to 255. Character values, given here in
hexadecimal, correspond to the characters defined in
standards as follows:
     o  Values 00 to 7F correspond to characters from the ISO
        646-1983 standard.
     o  Values A1 to FB (except for DB, DD, and DE) correspond
        to characters from the TIS 620-2533 standard.
     o  Remaining values are reserved for future use.
The encoded form of the basic character set is called the
TACTIS codeset, which is discussed in the TACTIS(5)
reference page.
Characters in the auxiliary character set use the code
values 32 to 126 and 161 to 254 only. The Wototo standard
specifies that implementations provide at least one auxiliary
character set.
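
The value ranges described above can be expressed as a small lookup. The following Python sketch is an illustration only and is not part of the Wototo standard; the function names basic_set_region and fits_auxiliary_set are hypothetical, and the ranges are transcribed from the text of this section.

```python
def basic_set_region(value: int) -> str:
    """Name the standard that defines a code value in the basic character set."""
    if not 0 <= value <= 0xFF:
        raise ValueError("basic character set values are 8-bit (0 to 255)")
    if value <= 0x7F:
        return "ISO 646-1983"
    if 0xA1 <= value <= 0xFB and value not in (0xDB, 0xDD, 0xDE):
        return "TIS 620-2533"
    # Everything else is described as reserved for future use.
    return "reserved"


def fits_auxiliary_set(value: int) -> bool:
    """Check whether a code value is usable by an auxiliary character set."""
    # The auxiliary ranges are given in decimal: 32 to 126 and 161 to 254.
    return 32 <= value <= 126 or 161 <= value <= 254


print(basic_set_region(0x41), basic_set_region(0xA1), basic_set_region(0xDB))
```

Keeping the two checks separate mirrors the text, which defines the basic and auxiliary sets independently.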
Character Classification
In the TACTIS codeset, characters are organized into different
classes. This classification is done only to
facilitate processing and is not related to Thai linguistic or
grammatical rules. The codeset contains the following
character classes:
CTRL
     Nondisplayable characters that are used for controlling
     output or data communication. The 66 control character
     values are: 00 to 1F, 7F, 80 to 9F, and FF.
CONS
     The Thai consonants as defined in TIS 620-2533.
LV
     The five leading vowels as defined in TIS 620-2533.
FV
     The six following vowels as defined in TIS 620-2533.
BV
     The two below vowels as defined in TIS 620-2533.
AV
     The five above vowels as defined in TIS 620-2533.
TONE
     The four tone marks as defined in TIS 620-2533.
AD
     The four above diacritics as defined in TIS 620-2533.
BD
     The below diacritic as defined in TIS 620-2533.
NON
     Those characters that do not fit into the preceding five
     character classes. This group includes 119
     characters that users cannot compose with above vowels,
     below vowels, tone marks, and above and below diacritics.
Non-composable characters are divided into the following
seven groups:
Graphic characters
     The 94 graphic characters defined in ISO 646-1983. These
     include:
          o  52 English alphabetic characters
          o  10 digits
          o  32 special characters whose values are 21 to 2F,
             3A to 40, 5B to 60, and 7B to 7E
Space
     Character code value is 20.
Nobreak space
     Character code value is A0.
Thai digits
     The 10 Thai digits as defined in TIS 620-2533.
Thai special characters
     The 6 Thai special characters as defined in TIS
     620-2533.
Reserved code points
     6 code points reserved for future use.
To better describe Thai input and output methods, characters
in the classes FV, BV, AV, and AD are further divided
into subclasses. The following list describes character
classes and subclasses by the number of characters in the
class and their encoded values:
CTRL
     Number: 66
     Values: 00 to 1F, 7F, 80 to 9F, and FF
NON
     Number: 119
     Values:
     20 to 7E (ISO 646-1983 character codes)
     A0, CF, DC, DF, E6, EF, F0 to F9, FA, and FB (TIS
     620-2533 character codes)
     DB, DD, DE, FC, FD, and FE (Reserved code points)
CONS
     Number: 44
     Values: A1 to C3, C5, and C7 to CE
LV
     Number: 5
     Values: E0, E1, E2, E3, and E4
FV1
     Number: 3
     Values: D0, D2, and D3
FV2
     Number: 1
     Value: E5
FV3
     Number: 2
     Values: C4 and C6
     These two characters also behave as leading vowels
     (LV) in the character sequence LV+CONS.
BV1
     Number: 1
     Value: D8
BV2
     Number: 1
     Value: D9
BD
     Number: 1
     Value: DA
TONE
     Number: 4
     Values: E8, E9, EA, and EB
AD1
     Number: 2
     Values: ED and EC
AD2
     Number: 1
     Value: E7
AD3
     Number: 1
     Value: EE
AV1
     Number: 1
     Value: D4
AV2
     Number: 2
     Values: D1 and D6
AV3
     Number: 2
     Values: D5 and D7
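
The class and subclass values above lend themselves to a table-driven lookup. The following Python sketch is an illustration under stated assumptions, not an implementation defined by the standard: the names CLASSES, CLASS_OF, and classify are hypothetical, and the class labels are the ones used elsewhere in this reference page (CTRL, NON, CONS, LV, FV1 to FV3, BV1, BV2, BD, TONE, AD1 to AD3, AV1 to AV3).

```python
# Class and subclass values transcribed from the list above.
# range() excludes its end value, so range(0x00, 0x20) covers 00 to 1F.
CLASSES = {
    "CTRL": [*range(0x00, 0x20), 0x7F, *range(0x80, 0xA0), 0xFF],
    "NON": [*range(0x20, 0x7F), 0xA0, 0xCF, *range(0xDB, 0xE0),
            0xE6, *range(0xEF, 0xFF)],
    "CONS": [*range(0xA1, 0xC4), 0xC5, *range(0xC7, 0xCF)],
    "LV": [*range(0xE0, 0xE5)],
    "FV1": [0xD0, 0xD2, 0xD3],
    "FV2": [0xE5],
    "FV3": [0xC4, 0xC6],
    "BV1": [0xD8],
    "BV2": [0xD9],
    "BD": [0xDA],
    "TONE": [*range(0xE8, 0xEC)],
    "AD1": [0xEC, 0xED],
    "AD2": [0xE7],
    "AD3": [0xEE],
    "AV1": [0xD4],
    "AV2": [0xD1, 0xD6],
    "AV3": [0xD5, 0xD7],
}

# Invert the table so that each code value maps to its class name.
CLASS_OF = {value: name for name, values in CLASSES.items() for value in values}


def classify(value: int) -> str:
    """Return the class or subclass name for an 8-bit code value."""
    return CLASS_OF[value]


# Print the number of characters in each class for comparison with the list.
for name, values in CLASSES.items():
    print(name, len(values))
```

Because the per-class counts add up to 256 with no value listed twice, every 8-bit code value belongs to exactly one class in this table.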
Character Levels
Thai characters are classified according to different display
levels (relative to baseline and nondisplayable).
Classification by display levels facilitates the character
input procedures. There are five character classification
levels. Four levels include displayable characters and one
level includes nondisplayable characters, as follows:
Nondisplayable level
     Includes all control characters in the CTRL class.
Base level
     Includes all characters in the NON, CONS, FV, and
     LV classes. Characters at this level are drawn on
     the baseline.
Above level
     Includes all characters in the AD3, AV1, AV2, and
     AV3 classes. Characters at this level are drawn
     immediately above final consonants.
Below level
     Includes all characters in the BV1, BV2, and BD
     classes. Characters at this level are drawn immediately
     below final consonants.
Top level
     Includes all characters in the TONE, AD1, and AD2
     classes. Characters at this level are drawn on top
     of the characters at the above level. If above
     level characters do not exist, top level characters
     are drawn at the above level. Characters at this
     level also indicate the end of character cells.
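
The level assignments, including the rule that a top level character drops to the above level when the cell has no above level character, can be sketched as follows. This Python fragment is illustrative only; LEVEL_CLASSES, level_of, and drawn_level are hypothetical names, the short level names are borrowed from the constants mentioned in this reference page, and FV is expanded into its subclasses.

```python
# Classes that belong to each display level, as listed above.
# FV is expanded into its subclasses FV1, FV2, and FV3.
LEVEL_CLASSES = {
    "NONDISP": {"CTRL"},
    "BASE": {"NON", "CONS", "FV1", "FV2", "FV3", "LV"},
    "ABOVE": {"AD3", "AV1", "AV2", "AV3"},
    "BELOW": {"BV1", "BV2", "BD"},
    "TOP": {"TONE", "AD1", "AD2"},
}


def level_of(char_class: str) -> str:
    """Return the display level that a character class belongs to."""
    for level, classes in LEVEL_CLASSES.items():
        if char_class in classes:
            return level
    raise ValueError(f"unknown character class: {char_class}")


def drawn_level(char_class: str, cell_has_above: bool) -> str:
    """Return the level at which a character is actually drawn."""
    level = level_of(char_class)
    # A top level character is drawn at the above level when the cell
    # contains no above level character.
    if level == "TOP" and not cell_has_above:
        return "ABOVE"
    return level


print(drawn_level("TONE", cell_has_above=False))
print(drawn_level("TONE", cell_has_above=True))
```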
The standard specifies that the properties of Thai characters
can be tested by using the following functions.
Note
These functions are not implemented in Tru64 UNIX.
     o  Determines the character level class that the character
        belongs to and returns the numeric value 0, 1, 2, 3, or 4.
        These return values can be represented by the constants
        NONDISP, TOP, ABOVE, BASE, or BELOW, respectively.
     o  Returns TRUE if a character is alphabetic.
     o  Returns TRUE if a character is either alphabetic or a
        digit.
     o  Returns TRUE if a character belongs to the CTRL class.
     o  Returns TRUE if the character is a digit.
     o  Returns TRUE if the character is not in the NONDISP
        level class.
     o  Returns TRUE if the character is an English lowercase
        letter (a to z).
     o  Returns TRUE if the character is an English uppercase
        letter (A to Z).
     o  Returns TRUE if a character is not in the NONDISP
        level class.
     o  Returns TRUE if the character is a space, formfeed,
        newline, return, tab, or vertical tab.
     o  Returns TRUE if the character is a hexadecimal digit 0 to
        9, A to F, or a to f. (Thai digits are excluded.)
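
A few of these tests can be sketched directly from the class values and level definitions given earlier. The following Python fragment is a minimal illustration, not the interface defined by the standard: the function names char_level, is_ctrl, is_space, and is_xdigit are hypothetical, and the code assumes its argument is an integer from 0 to 255.

```python
# Level constants with the return values described above.
NONDISP, TOP, ABOVE, BASE, BELOW = range(5)

_ABOVE = {0xD1, 0xD4, 0xD5, 0xD6, 0xD7, 0xEE}  # AV1, AV2, AV3, AD3
_BELOW = {0xD8, 0xD9, 0xDA}                    # BV1, BV2, BD
_TOP = set(range(0xE7, 0xEE))                  # AD2, TONE, AD1 (E7 to ED)


def is_ctrl(value: int) -> bool:
    """True for the 66 values in the CTRL class."""
    return value <= 0x1F or value == 0x7F or 0x80 <= value <= 0x9F or value == 0xFF


def char_level(value: int) -> int:
    """Return the level constant for an 8-bit code value."""
    if is_ctrl(value):
        return NONDISP
    if value in _TOP:
        return TOP
    if value in _ABOVE:
        return ABOVE
    if value in _BELOW:
        return BELOW
    # NON, CONS, LV, and FV characters are all drawn on the baseline.
    return BASE


def is_space(value: int) -> bool:
    """True for space, formfeed, newline, return, tab, or vertical tab."""
    return value in b" \f\n\r\t\v"


def is_xdigit(value: int) -> bool:
    """True for 0 to 9, A to F, or a to f; Thai digits are excluded."""
    return value in b"0123456789ABCDEFabcdef"


print(char_level(0xA1), char_level(0xD4), char_level(0xE8))
```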
Thai Input Methods
The input method for Thai characters directly maps characters
to keys, as for English. Thai character sequences are
entered character by character and display from left to
right, regardless of whether the sequence includes forward
characters (characters in the NON, CONS, LV, FV1, FV2, FV3
classes) or dead characters (characters in all other
classes). However, the following basic rules apply to the
character input sequence:
     o  Every display cell must begin with a character on the
        baseline (in the BASE level class).
     o  A character in the BASE level class that is also in the
        CONS class may be followed by an above vowel, a below
        vowel, a tone mark, a below diacritic, or an above
        diacritic.
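
The two basic rules can be illustrated with a simplified cell splitter. This Python sketch is an assumption-laden illustration, not the input method defined in WTT2.0: the names DEAD, is_cons, and split_cells are hypothetical, dead characters are taken to be the AV, BV, BD, TONE, and AD values, control characters are treated like forward characters for simplicity, and the ordering of dead characters within a cell is not checked.

```python
# Dead characters: D1, D4 to DA, and E7 to EE (AV, BV, BD, TONE, AD).
DEAD = {0xD1, *range(0xD4, 0xDB), *range(0xE7, 0xEF)}


def is_cons(value: int) -> bool:
    """True for values in the CONS class: A1 to C3, C5, and C7 to CE."""
    return 0xA1 <= value <= 0xC3 or value == 0xC5 or 0xC7 <= value <= 0xCE


def split_cells(data: bytes) -> list:
    """Split TACTIS-encoded input into display cells.

    Raises ValueError when a dead character does not follow a cell
    that begins with a consonant.
    """
    cells = []
    for value in data:
        if value not in DEAD:
            # Any other character starts a new display cell.
            cells.append(bytes([value]))
        elif cells and is_cons(cells[-1][0]):
            # A dead character joins the cell of the preceding consonant.
            cells[-1] += bytes([value])
        else:
            raise ValueError(f"dead character 0x{value:02X} has no consonant base")
    return cells


# Consonant A1 with above vowel D4 and tone mark E8, then consonant A2.
print(split_cells(b"\xa1\xd4\xe8\xa2"))
```

Rejecting a dead character that follows a non-consonant is a strict reading of the second rule; the detailed rules in the standard should be consulted before relying on it.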
For more detailed input sequence rules, refer to the
Draft Industrial Standard - Thai Language Software
Standard WTT2.0 (Part 2: Thai Input and Output Methods).
See Also
Commands: locale(1)
Others: i18n_intro(5), i18n_printing(5), l10n_intro(5),
TACTIS(5), Thai(5)
Wototo(5)