*nix Documentation Project

eucTW(5)



NAME

       eucTW - A character encoding system (codeset) for Traditional
       Chinese

DESCRIPTION

       The Taiwanese EUC (Extended UNIX Code), or eucTW, codeset
       consists of the following character sets:

         +  ASCII

         +  CNS 11643 (Plane 1 to Plane 16)

       Taiwanese EUC uses a combination of single-byte data and
       2-byte data to represent ASCII characters, symbols, and
       ideographic characters. Because the codeset spans many
       character planes, Taiwanese EUC uses different leading
       codes to designate the different planes. Characters outside
       the first plane are preceded by a 2-byte leading code.
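       The byte-length rule can be sketched in Python as follows.
       This is a minimal illustration, not the operating system's
       implementation. It assumes the byte ranges in the encoding
       table on this page and uses the value 0x8E for SS2, which
       comes from the standard C1 control set. The function name
       char_length is hypothetical.

```python
SS2 = 0x8E  # Single-Shift 2 control character (assumed C1 value)


def char_length(lead: int) -> int:
    """Return the length in bytes of the eucTW character starting with `lead`."""
    if lead < 0x80:
        return 1  # ASCII: single byte with the MSB off
    if lead == SS2:
        return 4  # SS2 + plane byte + row byte + column byte
    if 0xA1 <= lead <= 0xFE:
        return 2  # plane 1: row byte + column byte, no leading code
    raise ValueError(f"not a valid eucTW lead byte: {lead:#04x}")


if __name__ == "__main__":
    print([char_length(b) for b in (0x41, 0xC4, 0x8E)])  # [1, 2, 4]
```

       For example, the bytes 0x41, 0xC4, and 0x8E start characters
       of 1, 2, and 4 bytes respectively.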

       ASCII characters are represented in the form of single-byte
       7-bit data in Taiwanese EUC; that is, the most significant
       bit (MSB) of the byte that represents an ASCII character
       is always off (0). For more information, refer to ascii(5).

       Although the standard Taiwanese EUC codeset includes all
       characters defined by the CNS 11643-1992 standard, the
       operating system's eucTW implementation currently supports
       only the following:

         +  Characters defined in the first and second planes of
            CNS 11643

         +  The EDPC Recommended Character Set (refer to
            dechanyu(5) for more information)

         +  CNS 11643-1986 and DTSCS characters that have been
            remapped into the third and fourth character planes
            by the CNS 11643-1992 standard

       Characters that were added to CNS 11643-1986  by  the  CNS
       11643-1992 standard are not supported.

       The  characters that are defined in plane 1 and plane 2 of
       CNS 11643-1992 and that are the same as those  defined  in
       CNS 11643-1986 are as follows:

       ------------------------------------------------------------------------
       Character Plane   Character Type                    Number of Characters
       ------------------------------------------------------------------------
       1                 Special characters                651
                         Control characters                33
                         Frequently-used characters        5401
       2                 Less frequently-used characters   7650
       ------------------------------------------------------------------------
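       Adding up the table gives the number of characters in each
       plane. The per-plane capacity assumes that both position bytes
       range over A1 to FE, as in the encoding table on this page,
       which allows 94 values for each byte:

```latex
\begin{aligned}
N_1 &= 651 + 33 + 5401 = 6085 \\
N_2 &= 7650 \\
N_1 + N_2 &= 6085 + 7650 = 13735 \\
\text{positions per plane} &= 94 \times 94 = 8836
\end{aligned}
```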

       The characters defined in plane  3  and  plane  4  of  CNS
       11643-1992 are as follows:

       -------------------------------------------------------------------------------
       Character Plane   Character Type                           Number of Characters
       -------------------------------------------------------------------------------
       3                 Rarely-used characters (EDPC Part I)     6148
       4                 Used for residency system, ISO 2nd       7298
                         edition DIS 10646 Han characters,
                         and 171 EDPC Part II characters
       -------------------------------------------------------------------------------

       The characters that have been remapped into the third  and
       fourth  character planes of CNS 11643-1992 as specified by
       the EDPC are as follows:

       ---------------------------------------------------------
       EDPC Characters   Character Plane   Number of Characters
       ---------------------------------------------------------
       Part I            Plane 3           6148
       Part II           Plane 4           171
       ---------------------------------------------------------


   Taiwanese EUC Encoding
       Except for ASCII characters and characters in the first
       plane of CNS 11643-1986, Taiwanese EUC uses a leading code
       to designate the character plane of each character. The
       leading code consists of the 8-bit Single-Shift 2 control
       character (SS2, 0x8E) followed by an additional byte that
       identifies the plane.

       The position of a character on a plane is specified by two
       bytes. The first byte determines the character's row number
       and the second byte determines the character's column
       number. The MSB of both bytes is on (1).
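       A minimal Python sketch of the row and column arithmetic
       follows. It assumes that rows and columns are numbered from
       1 to 94, so that byte 0xA1 denotes row or column 1. The
       function names are hypothetical.

```python
from __future__ import annotations


def to_row_col(first: int, second: int) -> tuple[int, int]:
    """Map the two position bytes (MSB on, range A1-FE) to (row, column)."""
    for b in (first, second):
        if not 0xA1 <= b <= 0xFE:
            raise ValueError(f"position byte out of range A1-FE: {b:#04x}")
    # 0xA0 = 0x80 (the MSB) + 0x20 (offset of a 94-position set)
    return first - 0xA0, second - 0xA0


def from_row_col(row: int, col: int) -> bytes:
    """Inverse of to_row_col: turn a 1-based (row, column) into two bytes."""
    if not (1 <= row <= 94 and 1 <= col <= 94):
        raise ValueError("row and column must each be in 1-94")
    return bytes((row + 0xA0, col + 0xA0))


if __name__ == "__main__":
    print(to_row_col(0xC4, 0xA1))      # (36, 1)
    print(from_row_col(94, 94).hex())  # fefe
```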

       The  following  table  shows the encoding of Taiwanese EUC
       characters:

       -------------------------------------------------------
       CNS 11643-1986 Code Plane   Leading Code   Code Range
       -------------------------------------------------------
       1                           [nil]          A1A1 - FEFE
       2                           SS2 A2         A1A1 - FEFE
       3                           SS2 A3         A1A1 - FEFE
       4                           SS2 A4         A1A1 - FEFE
       5                           SS2 A5         A1A1 - FEFE
       6                           SS2 A6         A1A1 - FEFE
       7                           SS2 A7         A1A1 - FEFE
       8                           SS2 A8         A1A1 - FEFE
       9                           SS2 A9         A1A1 - FEFE
       10                          SS2 AA         A1A1 - FEFE
       11                          SS2 AB         A1A1 - FEFE
       12                          SS2 AC         A1A1 - FEFE
       13                          SS2 AD         A1A1 - FEFE
       14                          SS2 AE         A1A1 - FEFE
       15                          SS2 AF         A1A1 - FEFE
       16                          SS2 B0         A1A1 - FEFE
       -------------------------------------------------------
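       The table can be turned into a small encoder and decoder. The
       Python sketch below is illustrative only. It follows the table
       literally: plane 1 has no leading code, and planes 2 to 16 use
       SS2 followed by 0xA0 plus the plane number. It takes 0x8E as
       the value of SS2 and reports ASCII bytes as plane 0. The names
       encode and decode are hypothetical.

```python
from __future__ import annotations

SS2 = 0x8E  # Single-Shift 2 control character (assumed C1 value)


def encode(plane: int, code: bytes) -> bytes:
    """Prefix a 2-byte code (each byte in A1-FE) with its plane's leading code."""
    if len(code) != 2 or not all(0xA1 <= b <= 0xFE for b in code):
        raise ValueError("code must be two bytes in A1-FE")
    if plane == 1:
        return bytes(code)  # plane 1: no leading code
    if 2 <= plane <= 16:
        return bytes((SS2, 0xA0 + plane)) + bytes(code)  # SS2 A2 ... SS2 B0
    raise ValueError("plane must be 1-16")


def decode(data: bytes) -> list[tuple[int, bytes]]:
    """Split eucTW bytes into (plane, code) pairs; plane 0 marks ASCII."""
    out: list[tuple[int, bytes]] = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:  # ASCII, single byte
            out.append((0, data[i:i + 1]))
            i += 1
        elif b == SS2:  # SS2 + plane byte + 2-byte code
            plane = data[i + 1] - 0xA0 if i + 1 < len(data) else -1
            if not 2 <= plane <= 16 or i + 4 > len(data):
                raise ValueError(f"bad SS2 sequence at offset {i}")
            out.append((plane, data[i + 2:i + 4]))
            i += 4
        elif 0xA1 <= b <= 0xFE:  # plane 1, no leading code
            if i + 2 > len(data):
                raise ValueError(f"truncated character at offset {i}")
            out.append((1, data[i:i + 2]))
            i += 2
        else:
            raise ValueError(f"invalid byte {b:#04x} at offset {i}")
    return out


if __name__ == "__main__":
    print(encode(2, b"\xa1\xa1").hex())  # 8ea2a1a1
    print(decode(b"A\xc4\xa1\x8e\xa2\xa1\xa1"))
```

       For the mixed input in the usage line, decode returns one
       ASCII entry, one plane 1 entry, and one plane 2 entry, in
       that order.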


   Codeset Conversion
       The following codeset converter pairs are available for
       converting Traditional Chinese characters between eucTW
       and other encoding formats. Refer to iconv_intro(5) for
       an introduction to codeset conversion. For more information
       about the other codeset for which eucTW is the input or
       output, see the reference page specified in the list item.

       big5_eucTW, eucTW_big5
              Converting from and to the Big-5 codeset: big5(5).

              Note that Big-5 encoding is equivalent to the
              Microsoft code-page format used on PCs for
              Traditional Chinese. You can therefore use this set
              of converters to convert Traditional Chinese text
              between the eucTW and PC code-page formats. For
              information about how the operating system supports
              PC code pages, see code_page(5).

       dechanyu_eucTW, eucTW_dechanyu
              Converting from and to the DEC Hanyu codeset:
              dechanyu(5).

       dechanzi_eucTW, eucTW_dechanzi
              Converting from and to the DEC Hanzi codeset:
              dechanzi(5).

       sbig5_eucTW, eucTW_sbig5
              Converting from and to the Shift Big-5 codeset:
              sbig5(5).

       telecode_eucTW, eucTW_telecode
              Converting from and to the Telecode codeset:
              telecode(5).

       UTF-16_eucTW, eucTW_UTF-16
              Converting from and to UTF-16 format: Unicode(5).

       UCS-4_eucTW, eucTW_UCS-4
              Converting from and to UCS-4 format: Unicode(5).

       UTF-8_eucTW, eucTW_UTF-8
              Converting from and to UTF-8 format: Unicode(5).
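       The converter names in the list above appear to follow a
       fromcode_tocode pattern. The Python sketch below assumes that
       pattern, which is inferred from the pairs on this page rather
       than stated in it. The helper converter_name is hypothetical.
       The actual conversion would go through the interfaces described
       in iconv_intro(5).

```python
from __future__ import annotations

# Converter pairs listed on this page.
# Assumption: each name is "<fromcode>_<tocode>".
CONVERTERS = {
    "big5_eucTW", "eucTW_big5",
    "dechanyu_eucTW", "eucTW_dechanyu",
    "dechanzi_eucTW", "eucTW_dechanzi",
    "sbig5_eucTW", "eucTW_sbig5",
    "telecode_eucTW", "eucTW_telecode",
    "UTF-16_eucTW", "eucTW_UTF-16",
    "UCS-4_eucTW", "eucTW_UCS-4",
    "UTF-8_eucTW", "eucTW_UTF-8",
}


def converter_name(from_code: str, to_code: str) -> str | None:
    """Return the converter for from_code -> to_code, or None if it is not listed."""
    name = f"{from_code}_{to_code}"
    return name if name in CONVERTERS else None


if __name__ == "__main__":
    print(converter_name("eucTW", "big5"))  # eucTW_big5
    print(converter_name("eucTW", "GBK"))   # None
```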

   Fonts for Taiwanese EUC
       For both display devices and printers, the operating system
       supports Taiwanese EUC through internal conversion to DEC
       Hanyu code and use of DEC Hanyu fonts (see dechanyu(5)).

       For  general  information  on  printing  non-English text,
       refer to i18n_printing(5).

SEE ALSO

       Commands: locale(1)

       Others: ascii(5), big5(5), Chinese(5), code_page(5),
       dechanyu(5), dechanzi(5), GBK(5), iconv_intro(5),
       i18n_intro(5), i18n_printing(5), l10n_intro(5), sbig5(5),
       telecode(5), Unicode(5)



 Similar pages
Name OS Title
gbk Tru64 A character encoding system (codeset) for Simplified Chinese
GBK Tru64 A character encoding system (codeset) for Simplified Chinese
dechanzi Tru64 A character encoding system (codeset) for Simplified Chinese
big5 FreeBSD ``Big Five'' encoding for Traditional Chinese text
iso8859-1 Tru64 A character encoding system (codeset)
ISO8859-1 Tru64 A character encoding system (codeset)
ISO8859-7 Tru64 A character encoding system (codeset) for Greek
iso-2022-jp Tru64 A character encoding system (codeset) for Japanese
ISO-2022-JP Tru64 A character encoding system (codeset) for Japanese
jiskanji Tru64 A character encoding system (codeset) for Japanese
Copyright © 2004-2005 DeniX Solutions SRL