dechanzi(5)

NAME

       dechanzi - A character encoding system (codeset) for Simplified
       Chinese

DESCRIPTION

       The DEC Hanzi (dechanzi) codeset consists of the following
       character sets:

       ASCII
       GB2312-80
       Extended GB

       DEC Hanzi uses a 2-byte data representation for symbols
       and ideographic characters that are defined in GB2312-80.

   ASCII Characters
       All ASCII characters are represented as single-byte, 7-bit
       data in the DEC Hanzi codeset; that is, the most significant
       bit (MSB) of the byte that represents an ASCII character is
       always off (0). For more information on ASCII characters,
       refer to ascii(5).

   GB2312-80 Characters
       The code table for GB2312-80 characters is divided into 94
       rows (Qu), numbered from 1 to 94. Each row has 94 columns
       (Wei), also numbered from 1 to 94. The code table defines a
       total of 7445 characters, of which 6763 are Chinese
       characters. The characters are grouped as follows:

       Graphic symbols
              There are 682 graphic symbols, which occupy rows 1
              to 9 in the code table.

       Frequently used (Level 1) characters
              There are 3755 frequently used characters, which
              occupy rows 16 to 55 in the code table.

       Less frequently used (Level 2) characters
              There are 3008 less frequently used characters,
              which occupy rows 56 to 87 in the code table.
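
       The grouping above can be sketched as a small lookup table.
       This is an illustrative Python sketch, not part of the
       operating system; the names GB2312_GROUPS and gb2312_group
       are hypothetical. Rows that the text assigns to no group
       yield None.

```python
# Row ranges and character counts exactly as stated in the text above.
# GB2312_GROUPS and gb2312_group are hypothetical names for illustration.
GB2312_GROUPS = [
    ("graphic symbols", range(1, 10), 682),   # rows 1 to 9
    ("level 1", range(16, 56), 3755),         # rows 16 to 55
    ("level 2", range(56, 88), 3008),         # rows 56 to 87
]


def gb2312_group(row):
    """Return the group name for a row (Qu), or None if the text assigns none."""
    if not 1 <= row <= 94:
        raise ValueError("row must be in 1..94")
    for name, rows, _count in GB2312_GROUPS:
        if row in rows:
            return name
    return None


# The three group counts add up to the stated total of 7445 characters.
total = sum(count for _name, _rows, count in GB2312_GROUPS)
print(total)
print(gb2312_group(16))
```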

       To differentiate GB2312-80 character codes from ASCII and
       Extended GB character codes, the most significant bit
       (MSB) of both the first byte and the second byte is set
       on. The following formulas show how to calculate the value
       for a GB2312-80 character from its row and column numbers
       (A0 is hexadecimal; row and column numbers are decimal):

       1st byte = A0(hex) + Row number
       2nd byte = A0(hex) + Column number

       For example, if a GB2312-80 character is in the first
       column of the 16th row, the character's value is B0A1,
       which is calculated as follows:

       1st byte = A0(hex) + 16 = B0(hex)
       2nd byte = A0(hex) + 01 = A1(hex)
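
       The formulas above translate directly into code. The
       following Python sketch is illustrative only; the function
       names gb2312_bytes and gb2312_position are hypothetical.

```python
def gb2312_bytes(row, col):
    """Encode a GB2312-80 row/column pair (each 1..94) as two bytes."""
    if not (1 <= row <= 94 and 1 <= col <= 94):
        raise ValueError("row and column must be in 1..94")
    # Both bytes are offset by A0 (hex), so both have the MSB set.
    return bytes([0xA0 + row, 0xA0 + col])


def gb2312_position(code):
    """Inverse of gb2312_bytes: recover (row, col) from a 2-byte code."""
    if len(code) != 2 or not all(0xA1 <= b <= 0xFE for b in code):
        raise ValueError("not a GB2312-80 code")
    return code[0] - 0xA0, code[1] - 0xA0


print(gb2312_bytes(16, 1).hex().upper())  # B0A1
```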


   Extended GB Characters
       The Extended GB code table is similar to the GB2312-80 code
       table and is divided into 94 rows and 94 columns (8836
       code points). However, the Extended GB code table provides
       code points for user-defined characters (UDC). The 8836
       code points in this table are divided into two areas:

       User-defined area
              This area spans rows 1 to 87 and provides 8178 code
              points.

       User-defined (reserved) area
              This area spans rows 88 to 94 and provides 658 code
              points. This area is where users can define special
              and long-lasting user-defined characters.
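
       The sizes of the two areas follow from the 94 columns in
       each row. The Python sketch below checks that arithmetic;
       the names COLUMNS and area_size are hypothetical.

```python
COLUMNS = 94  # columns per row in the Extended GB code table


def area_size(first_row, last_row):
    """Number of code points in an inclusive range of rows."""
    return (last_row - first_row + 1) * COLUMNS


user_defined = area_size(1, 87)    # user-defined area
reserved = area_size(88, 94)       # user-defined (reserved) area
print(user_defined, reserved, user_defined + reserved)
```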

       To differentiate Extended GB codes from ASCII codes and
       GB2312-80 codes, the most significant bit (MSB) of the
       first byte is set on while that of the second byte is set
       off. The following formulas show how the code value of an
       Extended GB character is calculated from its row and
       column numbers (A0 and 20 are hexadecimal; row and column
       numbers are decimal):

       1st byte = A0(hex) + Row number
       2nd byte = 20(hex) + Column number

       For example, if a character is positioned at the first
       column of the 16th row on the Extended GB code plane, the
       character's value is B021, which is calculated as follows:

       1st byte = A0(hex) + 16 = B0(hex)
       2nd byte = 20(hex) + 01 = 21(hex)
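
       Taken together, the MSB rules let a reader of a DEC Hanzi
       byte stream tell the three character sets apart. The Python
       sketch below is illustrative only, not the operating system's
       implementation; extended_gb_bytes and classify are
       hypothetical names, and rejecting bytes outside the ranges
       described on this page is an assumption of the sketch.

```python
def extended_gb_bytes(row, col):
    """Encode an Extended GB row/column pair (each 1..94) as two bytes."""
    if not (1 <= row <= 94 and 1 <= col <= 94):
        raise ValueError("row and column must be in 1..94")
    # First byte has the MSB set; second byte (21..7E hex) has it off.
    return bytes([0xA0 + row, 0x20 + col])


def classify(data):
    """Split a byte string into ASCII, GB2312-80, and Extended GB items."""
    items = []
    i = 0
    while i < len(data):
        b1 = data[i]
        if b1 < 0x80:                       # MSB off: single-byte ASCII
            items.append(("ascii", chr(b1)))
            i += 1
            continue
        if not 0xA1 <= b1 <= 0xFE or i + 1 >= len(data):
            raise ValueError("invalid or truncated 2-byte character")
        b2 = data[i + 1]
        if 0xA1 <= b2 <= 0xFE:              # MSB on in both bytes
            items.append(("gb2312-80", b1 - 0xA0, b2 - 0xA0))
        elif 0x21 <= b2 <= 0x7E:            # MSB on, then MSB off
            items.append(("extended-gb", b1 - 0xA0, b2 - 0x20))
        else:
            raise ValueError("invalid second byte")
        i += 2
    return items


print(extended_gb_bytes(16, 1).hex().upper())  # B021
print(classify(b"A\xb0\xa1\xb0\x21"))
```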


   Codeset Conversion
       The following codeset converter pairs are available for
       converting Simplified Chinese characters between dechanzi
       and other encoding formats. Refer to iconv_intro(5) for an
       introduction to codeset conversion. For more information
       about the other codeset for which dechanzi is the input or
       output, see the reference page specified in the list item.

       big5_dechanzi, dechanzi_big5
              Converting from and to the Big-5 codeset: big5(5)

       dechanyu_dechanzi, dechanzi_dechanyu
              Converting from and to the DEC Hanyu codeset:
              dechanyu(5)

       eucTW_dechanzi, dechanzi_eucTW
              Converting from and to Taiwanese Extended UNIX
              Code: eucTW(5)

       UTF-16_dechanzi, dechanzi_UTF-16
              Converting from and to UTF-16 format: Unicode(5)

       UCS-4_dechanzi, dechanzi_UCS-4
              Converting from and to UCS-4 format: Unicode(5)

       UTF-8_dechanzi, dechanzi_UTF-8
              Converting from and to UTF-8 format: Unicode(5)

       DEC Hanzi encoding is identical to the Microsoft code-page
       format (cp936) used for Simplified Chinese characters on
       PC systems. However, DEC Hanzi supports fewer characters
       than the code page does. Therefore, using converters with
       dechanzi in the converter name to convert between cp936
       and other formats can result in some data loss. Refer to
       code_page(5) for more information about PC code pages.
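
       On systems without these converters, the GB2312-80 portion
       of DEC Hanzi data can be approximated with other tools. The
       Python sketch below is an illustration under stated
       assumptions: Python has no codec named dechanzi, so the
       sketch assumes that Python's built-in gb2312 codec maps the
       2-byte GB2312-80 codes described on this page. It does not
       handle Extended GB (user-defined) characters, and the
       function name to_utf8 is hypothetical.

```python
def to_utf8(raw):
    """Convert ASCII and GB2312-80 bytes to UTF-8.

    Assumption: Python's 'gb2312' codec stands in for the GB2312-80
    portion of DEC Hanzi; Extended GB codes are not supported here.
    """
    return raw.decode("gb2312").encode("utf-8")


# B0A1 is row 16, column 1 of the GB2312-80 code table.
text = b"\xb0\xa1".decode("gb2312")
print(hex(ord(text)))
print(to_utf8(b"\xb0\xa1").hex())
```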








   DEC Hanzi Fonts
       The operating system provides both screen and printer
       fonts for DEC Hanzi characters. The operating system also
       provides bitmap fonts in addition to the TrueType fonts
       described in this section. For a complete description of
       DEC Hanzi fonts, see the document Technical Reference for
       Using Chinese Features.

       The following set of Simplified Chinese TrueType fonts is
       installed as the operating system default fonts for DEC
       Hanzi:

       -css_dongwen-fangsong-medium-r-normal--0-0-0-0-c-0-gb2312.1980-0
       -css_dongwen-fangsong-medium-r-normal--0-0-0-0-c-0-gb2312.1980-1
       -css_dongwen-fangsong-medium-r-normal--0-0-0-0-c-0-iso8859-1

       -css_dongwen-heiti-medium-r-normal--0-0-0-0-c-0-gb2312.1980-0
       -css_dongwen-heiti-medium-r-normal--0-0-0-0-c-0-gb2312.1980-1
       -css_dongwen-heiti-medium-r-normal--0-0-0-0-c-0-iso8859-1

       -css_dongwen-kaiti-medium-r-normal--0-0-0-0-c-0-gb2312.1980-0
       -css_dongwen-kaiti-medium-r-normal--0-0-0-0-c-0-gb2312.1980-1
       -css_dongwen-kaiti-medium-r-normal--0-0-0-0-c-0-iso8859-1

       -css_dongwen-songti-medium-r-normal--0-0-0-0-c-0-gb2312.1980-0
       -css_dongwen-songti-medium-r-normal--0-0-0-0-c-0-gb2312.1980-1
       -css_dongwen-songti-medium-r-normal--0-0-0-0-c-0-iso8859-1

       The following set of Simplified Chinese TrueType fonts is
       available as an installation option:

       -huatian-fangsong-medium-r-normal--0-0-0-0-c-0-gb2312.1980-0
       -huatian-fangsong-medium-r-normal--0-0-0-0-c-0-gb2312.1980-1
       -huatian-fangsong-medium-r-normal--0-0-0-0-m-0-iso8859-1

       -huatian-heiti-medium-r-normal--0-0-0-0-c-0-gb2312.1980-0
       -huatian-heiti-medium-r-normal--0-0-0-0-c-0-gb2312.1980-1
       -huatian-heiti-medium-r-normal--0-0-0-0-m-0-iso8859-1

       -huatian-kaiti-medium-r-normal--0-0-0-0-c-0-gb2312.1980-0
       -huatian-kaiti-medium-r-normal--0-0-0-0-c-0-gb2312.1980-1
       -huatian-kaiti-medium-r-normal--0-0-0-0-m-0-iso8859-1

       -huatian-songti-medium-r-normal--0-0-0-0-c-0-gb2312.1980-0
       -huatian-songti-medium-r-normal--0-0-0-0-c-0-gb2312.1980-1
       -huatian-songti-medium-r-normal--0-0-0-0-m-0-iso8859-1


       With  either  the default or optional font sets installed,
       the SongTi fonts are the default screen fonts for the  DEC
       Hanzi codeset.

       The operating system provides the following PostScript
       printer fonts for DEC Hanzi characters:

       Hei-GB2312-80
       XiSong-GB2312-80

       For  general  information on printing Asian language text,
       refer to i18n_printing(5).




SEE ALSO

       Commands: locale(1)

       Others:  ascii(5),  big5(5),   Chinese(5),   code_page(5),
       dechanyu(5),  eucTW(5), GB18030(5), GBK(5), i18n_intro(5),
       i18n_printing(5), iconv_intro(5), l10n_intro(5), sbig5(5),
       telecode(5), Unicode(5)


