
iconv_KEIS(5)

NAME

       iconv_KEIS - Specification for controlling conversion
       between Hitachi KEIS and Tru64 UNIX Japanese codesets

DESCRIPTION

       The iconv utility can convert the encoding of characters
       between Hitachi KEIS (Kanji processing Extended
       Information System) code and one of the following Tru64
       UNIX codesets: DEC Kanji, Super DEC Kanji, Japanese EUC,
       or Shift JIS. You choose the type of conversion by
       specifying the appropriate values for the utility's
       from-code and to-code parameters, as follows:

       ------------------------------------------------
       Type of Code Conversion   from-code   to-code
       ------------------------------------------------
       KEIS to DEC Kanji         KEIS        deckanji
       KEIS to Super DEC Kanji   KEIS        sdeckanji
       KEIS to Japanese EUC      KEIS        eucJP
       KEIS to Shift JIS         KEIS        SJIS
       DEC Kanji to KEIS         deckanji    KEIS
       Super DEC Kanji to KEIS   sdeckanji   KEIS
       Japanese EUC to KEIS      eucJP       KEIS
       Shift JIS to KEIS         SJIS        KEIS
       ------------------------------------------------

       Conversion behavior for the following items is affected by
       the definition of environment variables or profile entries
       in the user's environment. For more information, see the
       "Environment Variables" and "Profile File" sections.

       The UDC (User-Defined Character) mapping table that is used
       for UDC conversion

              This table must be an ASCII text file that contains
              UDC mapping information. The table affects
              conversion of user-defined characters between the
              codesets.

       The EBCDIC to/from ISO code (ASCII, JIS Roman characters)
       mapping table that is used for conversion

              This table must be an ASCII text file that contains
              information on how to map characters between EBCDIC
              and ISO code.

       The K-shift code

              This is a one- or two-byte hexadecimal code that
              marks the beginning of Kanji mode.

       The A-shift code

              This is a one- or two-byte hexadecimal code that
              marks the beginning of EBCDIC mode.

       The status of the initial mode (Kanji or EBCDIC) at the
       time the iconv command starts, or the first time the
       iconv() function is called after the iconv_open() function
       initializes the converter in a program

              The status keywords are either kanji_mode or
              ebcdic_mode.

       How to treat undefined characters when these are detected
       in Kanji mode

              Specify this action by using a keyword that selects
              one of the following behaviors:

                     Stop codeset conversion.

                     Output the undefined characters without any
                     processing and continue codeset conversion.

                     Output padding characters instead of the
                     undefined characters and continue codeset
                     conversion.

                     Ignore the undefined characters and continue
                     codeset conversion.

       The two-byte padding character used in Kanji mode

              This value is meaningful when replace is chosen for
              the processing of undefined characters in Kanji
              mode. Specify the padding character by its
              hexadecimal value.

       How to treat undefined characters when these are detected
       in EBCDIC mode

              Specify this action by using a keyword that selects
              one of the following behaviors:

                     Stop codeset conversion.

                     Output the undefined characters without any
                     processing and continue codeset conversion.

                     Output padding characters instead of the
                     undefined characters and continue codeset
                     conversion.

                     Ignore the undefined characters and continue
                     codeset conversion.

       The one-byte padding character used in EBCDIC mode

              This value is meaningful when replace is chosen for
              the processing of undefined characters in EBCDIC
              mode. Specify the padding character by its
              hexadecimal value.
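
       To illustrate how the shift codes and the initial mode
       interact, the following Python sketch splits a KEIS byte
       string into runs of EBCDIC and Kanji bytes. It is not the
       converter's actual algorithm. It assumes the two-byte
       default shift codes listed in this page (0x0a42 and
       0x0a41), one byte per EBCDIC character, and two bytes per
       Kanji character. The name split_by_mode is hypothetical.

```python
K_SHIFT = bytes.fromhex("0a42")  # default K-shift code: switch to Kanji mode
A_SHIFT = bytes.fromhex("0a41")  # default A-shift code: switch to EBCDIC mode
SHIFTS = ((K_SHIFT, "kanji_mode"), (A_SHIFT, "ebcdic_mode"))


def split_by_mode(data, initial_state="ebcdic_mode"):
    """Split a KEIS byte string into (mode, bytes) runs (hypothetical helper)."""
    mode = initial_state
    runs = []
    current = bytearray()
    i = 0
    while i < len(data):
        # Look for a shift code at the current character boundary.
        shift = next((s for s in SHIFTS if data.startswith(s[0], i)), None)
        if shift is not None:
            if current:
                runs.append((mode, bytes(current)))
                current = bytearray()
            mode = shift[1]
            i += len(shift[0])
            continue
        # Assumption: Kanji characters are two bytes, EBCDIC characters one.
        step = 2 if mode == "kanji_mode" else 1
        current += data[i:i + step]
        i += step
    if current:
        runs.append((mode, bytes(current)))
    return runs
```

       Because the shift codes are only looked for at character
       boundaries, a Kanji character whose second byte happens to
       be 0x0a is not mistaken for the start of a shift code.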

       When the to-code parameter for the conversion is KEIS, you
       can also specify the following items for conversion
       behavior:

       Whether the initial shift code is output at the start of
       conversion if the status of the initial mode (Kanji or
       EBCDIC) is different from the mode of the first input
       character

              The start of conversion is the time the iconv
              utility starts processing, or when the iconv()
              function is called just after opening the converter
              with iconv_open(). Keyword values for this item are
              yes or no.

       Whether the utility outputs the last shift code when
       iconv() is called with a zero-length input string and the
       current mode (Kanji or EBCDIC) is different from the mode
       specified by the last shift state

              Keyword values for this item are yes or no.

       The last status (Kanji mode or EBCDIC mode)

              Specify kanji_mode or ebcdic_mode for this value.
              It is meaningful only when yes is the setting for
              whether the utility outputs the last shift code.
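
       The three items above can be pictured with the following
       Python sketch, which frames already-converted runs of
       bytes with shift codes. It is one reading of the
       descriptions above, not the utility's implementation. The
       function name frame_keis_output and its parameter names
       are hypothetical, and the shift codes are the defaults
       listed in this page.

```python
SHIFT_CODES = {
    "kanji_mode": bytes.fromhex("0a42"),   # default K-shift code
    "ebcdic_mode": bytes.fromhex("0a41"),  # default A-shift code
}


def frame_keis_output(runs, initial_state="ebcdic_mode",
                      initial_shift_code="yes", trailer_shift_code="yes",
                      last_state="ebcdic_mode"):
    """Join (mode, bytes) runs into KEIS output (hypothetical helper)."""
    out = bytearray()
    mode = initial_state
    for index, (run_mode, payload) in enumerate(runs):
        if run_mode != mode:
            # Only the very first switch is optional; later mode changes
            # always need a shift code (assumption of this sketch).
            if index > 0 or initial_shift_code == "yes":
                out += SHIFT_CODES[run_mode]
            mode = run_mode
        out += payload
    # End of input: return to the requested last state if asked to.
    if trailer_shift_code == "yes" and mode != last_state:
        out += SHIFT_CODES[last_state]
    return bytes(out)
```

       With the defaults, a single Kanji run is wrapped in a
       K-shift code and an A-shift code. With both yes/no items
       set to no, the run is emitted unchanged.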

       If the items that control conversion behavior are
       specified by both environment variables and the profile
       file, values set by environment variables override values
       set by comparable entries in the profile. Note that values
       for all conversion control items are case-sensitive,
       whether they are set by environment variables or in the
       profile. The following table contains the default values
       for each conversion control item:

       ----------------------------------------------------
       Conversion Control Item               Default Value
       ----------------------------------------------------
       UDC mapping table                     None
       K shift code                          0x0a42
       A shift code                          0x0a41
       Initial state                         ebcdic_mode
       Processing for undefined characters
       in Kanji mode                         abort
       Processing for undefined characters
       in EBCDIC mode                        pass
       ----------------------------------------------------
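
       The precedence rule (environment variable, then profile
       entry, then default) can be sketched in Python as follows.
       This is an illustration only. The function name
       resolve_settings is hypothetical; the variable suffixes
       and entry names are the ones listed in the "Environment
       Variables" and "Profile File" sections, and only five
       control items are covered.

```python
ITEMS = {
    # profile entry_name: (environment variable suffix, default value)
    "k_shift_code": ("K_SHIFT_CODE", "0x0a42"),
    "a_shift_code": ("A_SHIFT_CODE", "0x0a41"),
    "initial_state": ("INITIAL_STATE", "ebcdic_mode"),
    "kanji_except_proc": ("KANJI_EXCEPT_PROC", "abort"),
    "ebcdic_except_proc": ("EBCDIC_EXCEPT_PROC", "pass"),
}


def resolve_settings(fromcode, tocode, profile, environ):
    """Pick each value from the environment, the profile, or the defaults.

    fromcode and tocode are the uppercase name segments (for example
    "EUCJP" and "KEIS"); profile and environ are plain dictionaries.
    """
    resolved = {}
    for entry_name, (suffix, default) in ITEMS.items():
        env_name = f"{fromcode}_{tocode}_{suffix}"
        if env_name in environ:
            resolved[entry_name] = environ[env_name]      # highest priority
        elif entry_name in profile:
            resolved[entry_name] = profile[entry_name]    # next: profile entry
        else:
            resolved[entry_name] = default                # fall back to default
    return resolved
```

       Values are passed through unchanged because they are
       case-sensitive.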

       The default padding characters  are  white  spaces,  whose
       code  values for each destination codeset are noted in the
       following table. These padding characters are output  when
       you specify replace for processing of undefined characters
       and do not explicitly specify the padding character.

       ---------------------------------------------------
       Mode          Default Value   Destination Codeset
       ---------------------------------------------------
       Kanji mode    0xa1a1          KEIS, deckanji,
                                     sdeckanji, or eucJP
                     0x8140          SJIS
       EBCDIC mode   0x40            KEIS
                     0x20            deckanji, sdeckanji,
                                     eucJP, or SJIS
       ---------------------------------------------------
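
       As a rough illustration of how the processing keywords and
       the default padding characters fit together, consider the
       following Python sketch. It covers only the keywords that
       this page names (abort, pass, and replace). The function
       names are hypothetical, and the codeset names are the
       to-code values used with the iconv utility.

```python
def default_padding(mode, tocode):
    """Return the default padding bytes from the table above."""
    if mode == "kanji_mode":
        return bytes.fromhex("8140") if tocode == "SJIS" else bytes.fromhex("a1a1")
    return bytes.fromhex("40") if tocode == "KEIS" else bytes.fromhex("20")


def handle_undefined(raw, keyword, mode, tocode, padding=None):
    """Decide what to output for one undefined character (hypothetical)."""
    if keyword == "abort":
        # Stop codeset conversion.
        raise ValueError(f"undefined character 0x{raw.hex()} in {mode}")
    if keyword == "pass":
        # Output the character without any processing.
        return raw
    if keyword == "replace":
        # Use the explicit padding character if one was configured.
        return default_padding(mode, tocode) if padding is None else padding
    # Any other keyword is outside the scope of this sketch.
    raise ValueError(f"keyword not covered by this sketch: {keyword!r}")
```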

       The default EBCDIC-ISO mapping table is as follows:

       For conversion from KEIS to other codesets:
              /usr/lib/nls/loc/iconv/data/ebcdic_kana.tbl

       For conversion from other codesets to KEIS:
              /usr/lib/nls/loc/iconv/data/kana_ebcdic.tbl

       These mapping tables map both EBCDIC and ISO code, which
       includes JIS Roman characters. The kana_ebcdic.tbl mapping
       table also maps ISO lowercase characters to EBCDIC
       uppercase characters.

       The  following default values for conversion control items
       are meaningful when the iconv utility's to-code conversion
       parameter is KEIS:

       ---------------------------------------------
       Conversion Control Item          Default
       ---------------------------------------------
       Output the initial shift code?   yes
       Output the last shift code?      yes
       Output the last status?          ebcdic_mode
       ---------------------------------------------


   Environment Variables
       This section discusses the environment variables that you
       can set to control conversion behavior. The names for
       these variables adhere to the following format:

       fromcode_tocode_controlitem

       The name segments for fromcode or tocode can be one of the
       following keywords:

       ----------------------------
       For Codeset:      Use:
       ----------------------------
       Hitachi KEIS      KEIS
       DEC Kanji         DECKANJI
       Super DEC Kanji   SDECKANJI
       Japanese EUC      EUCJP
       Shift JIS         SJIS
       ----------------------------

       The name segment for controlitem can be one of the
       following keywords:

       --------------------------------------------------------
       For Control Item:                    Use:
       --------------------------------------------------------
       UDC mapping table                    UDC_TABLE
       EBCDIC-ISO mapping table             EBCDIC_TABLE
       K shift code                         K_SHIFT_CODE
       A shift code                         A_SHIFT_CODE
       Initial state                        INITIAL_STATE
       Processing of undefined characters
       in Kanji mode                        KANJI_EXCEPT_PROC
       Processing of undefined characters
       in EBCDIC mode                       EBCDIC_EXCEPT_PROC
       Padding characters
       in Kanji mode                        PADDING_2BYTE_CHAR
       Padding characters
       in EBCDIC mode                       PADDING_1BYTE_CHAR
       Output initial
       shift code                           INITIAL_SHIFT_CODE
       Output last
       shift code                           TRAILER_SHIFT_CODE
       Last status                          LAST_STATE
       File path of the profile             PROFILE
       --------------------------------------------------------

       Following are examples of using the setenv C shell command
       to define environment variables to control conversion
       behavior. In these examples, the fromcode name segment
       indicates Japanese EUC and the tocode name segment
       indicates KEIS:

       setenv EUCJP_KEIS_UDC_TABLE eucjp_keis_udc.tbl
       setenv EUCJP_KEIS_EBCDIC_TABLE ebcdic_kana.tbl
       setenv EUCJP_KEIS_K_SHIFT_CODE 0x0a42
       setenv EUCJP_KEIS_A_SHIFT_CODE 0x0a41
       setenv EUCJP_KEIS_INITIAL_STATE ebcdic_mode
       setenv EUCJP_KEIS_KANJI_EXCEPT_PROC replace
       setenv EUCJP_KEIS_EBCDIC_EXCEPT_PROC replace
       setenv EUCJP_KEIS_PADDING_2BYTE_CHAR 0xa1a1
       setenv EUCJP_KEIS_PADDING_1BYTE_CHAR 0x40
       setenv EUCJP_KEIS_INITIAL_SHIFT_CODE yes
       setenv EUCJP_KEIS_TRAILER_SHIFT_CODE yes
       setenv EUCJP_KEIS_LAST_STATE ebcdic_mode
       setenv EUCJP_KEIS_PROFILE .eucjp_keis_profile

   Directory Search Path
       When you specify a file name without a directory, the
       iconv utility searches the following directories in order
       and uses the first file found:

       1.  Current directory

       2.  Home directory

       3.  The subdirectory iconv/data of the directory specified
           by the environment variable LOCPATH

       4.  /usr/lib/nls/loc/iconv/data

       5.  /usr/i18n/lib/nls/loc/iconv/data

       If  you  specify a relative directory path for a file, the
       utility searches these same directories in the same  order
       and uses the first file found.
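
       The search order described above can be sketched in Python
       as follows. This is an illustration, not the utility's
       code. The function names are hypothetical, and the sketch
       assumes that LOCPATH names a single directory and that the
       home directory is taken from the HOME variable.

```python
import os


def candidate_directories(environ=None, cwd=None):
    """List the directories to search, in the documented order."""
    environ = os.environ if environ is None else environ
    dirs = [
        cwd or os.getcwd(),                                # current directory
        environ.get("HOME") or os.path.expanduser("~"),    # home directory
    ]
    locpath = environ.get("LOCPATH")
    if locpath:
        # Assumption: LOCPATH holds one directory, not a list.
        dirs.append(os.path.join(locpath, "iconv", "data"))
    dirs.append("/usr/lib/nls/loc/iconv/data")
    dirs.append("/usr/i18n/lib/nls/loc/iconv/data")
    return dirs


def find_file(name, environ=None, cwd=None):
    """Return the first existing path for name, or None if none exists."""
    for directory in candidate_directories(environ, cwd):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            return path
    return None
```

       Because os.path.join() also accepts a relative path such
       as tables/my_udc.tbl, the same loop covers the relative
       directory case.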

   Profile File
       Entry lines in the profile file adhere to the following
       format:

       entry_name        string_value

       The entry_name and string_value fields are separated by
       spaces or tabs. Do not append a colon (:) after
       entry_name. The file can also include blank lines and
       comment entries, which begin with the # character.

       Following are the entry_name values for different
       conversion control items:

       ------------------------------------------------------------
       Conversion Control Item           entry_name
       ------------------------------------------------------------
       UDC mapping table                 udc_mapping_table
       EBCDIC-ISO mapping table          ebcdic_mapping_table
       K shift code                      k_shift_code
       A shift code                      a_shift_code
       Initial state                     initial_state
       Processing undefined characters
       in Kanji mode                     kanji_except_proc
       Processing undefined characters
       in EBCDIC mode                    ebcdic_except_proc
       Padding character
       in Kanji mode                     padding_2byte_char
       Padding character
       in EBCDIC mode                    padding_1byte_char
       Output initial
       shift code                        output_initial_shift_code
       Output last
       shift code                        output_trailer_shift_code
       Last state                        last_state
       ------------------------------------------------------------

       Following is a sample profile for converting from Japanese
       EUC to Hitachi KEIS:

       #
       #   sample profile for eucJP_KEIS
       #
       udc_mapping_table            eucjp_keis_udc.tbl
       ebcdic_mapping_table         kana_ebcdic.tbl
       k_shift_code                 0x0a42    # ebcdic -> kanji
       a_shift_code                 0x0a41    # kanji -> ebcdic
       initial_state                ebcdic_mode
       kanji_except_proc            replace
       ebcdic_except_proc           replace
       padding_2byte_char           0xa1a1    # kanji mode
       padding_1byte_char           0x40      # ebcdic mode
       output_initial_shift_code    yes
       output_trailer_shift_code    yes
       last_state                   ebcdic_mode
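
       A profile in this format is simple to read
       programmatically. The following Python sketch shows one
       way to do so. It is not the parser that the iconv utility
       uses. The function name parse_profile is hypothetical, and
       the sketch assumes that everything after a # character on
       a line is a comment, as in the sample profile.

```python
def parse_profile(text):
    """Return a dict of entry_name -> string_value from profile text."""
    entries = {}
    for line in text.splitlines():
        # Drop comments (whole-line or trailing) and surrounding blanks.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        # Fields are separated by spaces or tabs.
        fields = line.split(None, 1)
        if len(fields) != 2:
            raise ValueError(f"missing string_value in entry: {line!r}")
        entry_name, string_value = fields
        # Values are case-sensitive, so they are stored unchanged.
        entries[entry_name] = string_value.strip()
    return entries
```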

       The default file names for the profile are as follows:

       --------------------------------------------------
       Code Conversion           Default Profile Name
       --------------------------------------------------
       KEIS to DEC Kanji         .keis_deckanji_profile
       KEIS to Super DEC Kanji   .keis_sdeckanji_profile
       KEIS to Shift JIS         .keis_sjis_profile
       KEIS to Japanese EUC      .keis_eucjp_profile
       DEC Kanji to KEIS         .deckanji_keis_profile
       Super DEC Kanji to KEIS   .sdeckanji_keis_profile
       Shift JIS to KEIS         .sjis_keis_profile
       Japanese EUC to KEIS      .eucjp_keis_profile
       --------------------------------------------------

       By  default, the iconv utility checks the directory search
       path mentioned in the "Directory Search Path" section  and
       uses  the  first  profile  it finds. However, you can also
       specify an arbitrary file path for your profile instead of
       the  default  names  by defining the following environment
       variables:

       ------------------------------------------------------------
       Code Conversion           Profile Path Environment Variable
       ------------------------------------------------------------
       KEIS to DEC Kanji         KEIS_DECKANJI_PROFILE
       KEIS to Super DEC Kanji   KEIS_SDECKANJI_PROFILE
       KEIS to Shift JIS         KEIS_SJIS_PROFILE
       KEIS to Japanese EUC      KEIS_EUCJP_PROFILE
       DEC Kanji to KEIS         DECKANJI_KEIS_PROFILE
       Super DEC Kanji to KEIS   SDECKANJI_KEIS_PROFILE
       Shift JIS to KEIS         SJIS_KEIS_PROFILE
       Japanese EUC to KEIS      EUCJP_KEIS_PROFILE
       ------------------------------------------------------------
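
       The two tables above follow a regular naming pattern,
       which the following Python sketch reproduces. The function
       names are hypothetical, and the lookup shown is only an
       illustration of the override rule, not the utility's code.

```python
def profile_env_name(fromcode, tocode):
    """Name of the environment variable that overrides the profile path."""
    return f"{fromcode.upper()}_{tocode.upper()}_PROFILE"


def default_profile_name(fromcode, tocode):
    """Default profile file name for a conversion."""
    return f".{fromcode.lower()}_{tocode.lower()}_profile"


def profile_path(fromcode, tocode, environ):
    """Return the configured profile path, or the default file name."""
    return environ.get(profile_env_name(fromcode, tocode),
                       default_profile_name(fromcode, tocode))
```

       For example, a conversion from eucJP to KEIS uses the
       value of EUCJP_KEIS_PROFILE if that variable is set and
       the file name .eucjp_keis_profile otherwise.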


   UDC Mapping Table
       Entries in a UDC mapping table  adhere  to  the  following
       format:

       fromcode      tocode

       Each  of these values is a two-byte hexadecimal number. In
       the case of Super DEC Kanji and Japanese  EUC,  three-byte
       hexadecimal  values  that  begin  with SS3 (0x8f), such as
       0x8fxxxx, are also valid.

       You can specify ranges of UDC from and to  values  in  the
       same  file  entry  by using a hyphen to separate the codes
       that start and end each range:

       start_fromcode-end_fromcode   start_tocode-end_tocode

       When specifying entries that include ranges of values, the
       number  of  codes  in the from range must always equal the
       number of codes in the to range. A UDC mapping  table  can
       also  include  blank  lines and comment lines, which begin
       with the # character. Following is an  example  of  a  UDC
       mapping table:

       # KEIS                  eucJP

       0x81a1-0x8afe           0xf5a1-0xfefe           # udc
       0x8ba1-0x94fe           0x8ff5a1-0x8ffefe       # udc
       0x95a1-0x9afe           0x8feea1-0x8ff3fe       # udc
       0x9ba1-0x9bfe           0x8ff4a1-0x8ff4fe       # udc

       The first entry in this file specifies a range of KEIS
       values from 0x81a1 to 0x8afe that are mapped to Japanese
       EUC code values in the range 0xf5a1 to 0xfefe. You can
       find  additional  sample  UDC  mapping  table files in the
       /usr/i18n/examples/iconv/data directory.
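
       The following Python sketch shows one way to read a UDC
       mapping table and look up a code. It is an illustration
       under stated assumptions, not the utility's algorithm: it
       compares range sizes by numeric difference and maps a code
       inside a range by a constant offset from the start of the
       range. The function names are hypothetical.

```python
def parse_code_range(field):
    """Parse "0x81a1" or "0x81a1-0x8afe" into (first, last) integers."""
    start, _, end = field.partition("-")
    first = int(start, 16)
    last = int(end, 16) if end else first
    if last < first:
        raise ValueError(f"range ends before it starts: {field!r}")
    return first, last


def parse_udc_table(text):
    """Return a list of (from_first, from_last, to_first) tuples."""
    ranges = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and blanks
        if not line:
            continue
        from_field, to_field = line.split()
        from_first, from_last = parse_code_range(from_field)
        to_first, to_last = parse_code_range(to_field)
        # Both ranges must contain the same number of codes.
        if from_last - from_first != to_last - to_first:
            raise ValueError(f"range sizes differ: {line!r}")
        ranges.append((from_first, from_last, to_first))
    return ranges


def map_udc(code, ranges):
    """Map one code by constant offset (assumption), or return None."""
    for from_first, from_last, to_first in ranges:
        if from_first <= code <= from_last:
            return to_first + (code - from_first)
    return None
```

       With the example table above, the KEIS code 0x81a1 maps
       to 0xf5a1, and 0x8ba2 maps to the three-byte value
       0x8ff5a2.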

   EBCDIC-ISO Mapping Table
       Entries in an EBCDIC-ISO mapping table adhere to the
       following format:

       fromcode       tocode

       Each code is a one-byte hexadecimal number. You can
       specify a range of character codes as follows:

       start_fromcode-end_fromcode     start_tocode-end_tocode

       When using the range format, the number of hex values in
       the from range must be the same as the number of hex
       values in the to range.

       The EBCDIC-ISO mapping table can also include blank lines
       and comment entries, which begin with the # character.

       Following is an example of an EBCDIC-ISO code mapping
       table:

       # EBCDIC                Kana

       0x40                    0x20            # space
       0x4f                    0x21            # '!'
       0x7f                    0x22            # '"'
         .                       .
         .                       .
         .                       .
       0xc1-0xc9               0x41-0x49       # 'A' - 'I'
       0xd1-0xd9               0x4a-0x52       # 'J' - 'R'
       0xe2-0xe9               0x53-0x5a       # 'S' - 'Z'
         .                       .
         .                       .
         .                       .

       In this example, the first column contains the from codes
       and the second column contains the to codes. The first
       three entry lines specify mappings for single characters,
       whereas the last three entry lines specify mappings for
       ranges of characters. You can find additional sample
       EBCDIC-ISO mapping tables in the
       /usr/i18n/lib/nls/loc/iconv/data directory.
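
       Because every code in this table is a single byte, a table
       in this format can be turned into a 256-entry translation
       table. The following Python sketch does so and applies the
       result with bytes.translate(). It is an illustration only.
       The names are hypothetical, and leaving unmapped bytes
       unchanged is an assumption of the sketch, not documented
       behavior.

```python
SAMPLE_TABLE = """\
# EBCDIC    Kana
0x40        0x20        # space
0xc1-0xc9   0x41-0x49   # 'A' - 'I'
0xd1-0xd9   0x4a-0x52   # 'J' - 'R'
0xe2-0xe9   0x53-0x5a   # 'S' - 'Z'
"""


def parse_byte_range(field):
    """Parse "0x40" or "0xc1-0xc9" into (first, last) byte values."""
    start, _, end = field.partition("-")
    first = int(start, 16)
    last = int(end, 16) if end else first
    if not 0 <= first <= last <= 0xFF:
        raise ValueError(f"not a valid one-byte range: {field!r}")
    return first, last


def build_translation(text):
    """Build a 256-byte table usable with bytes.translate()."""
    table = bytearray(range(256))  # assumption: unmapped bytes pass through
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and blanks
        if not line:
            continue
        from_field, to_field = line.split()
        from_first, from_last = parse_byte_range(from_field)
        to_first, to_last = parse_byte_range(to_field)
        if from_last - from_first != to_last - to_first:
            raise ValueError(f"range sizes differ: {line!r}")
        for offset in range(from_last - from_first + 1):
            table[from_first + offset] = to_first + offset
    return bytes(table)


TABLE = build_translation(SAMPLE_TABLE)
converted = bytes.fromhex("c8c940e6").translate(TABLE)
```

       Here the EBCDIC bytes 0xc8, 0xc9, 0x40, and 0xe6 are
       translated to the ISO codes for H, I, space, and W.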

NOTES

       This reference page contains code conversion
       specifications that apply only to conversion between
       Hitachi KEIS code and the DEC Kanji, Super DEC Kanji,
       Japanese EUC, and Shift JIS codesets. Refer to
       iconv_ibmkanji(5) for code conversion specifications
       between IBM Kanji System characters and the DEC Kanji,
       Super DEC Kanji, Japanese EUC, and Shift JIS codesets.
       Refer to iconv_JEF(5) for code conversion specifications
       between Fujitsu JEF characters and the DEC Kanji, Super DEC
       Kanji, Japanese EUC, and Shift JIS codesets. Refer to
       iconv_intro(5) for information about conversion between
       DEC Kanji, Super DEC Kanji, Japanese EUC, Shift JIS, and
       other Tru64 UNIX codesets.
