*nix Documentation Project
iconv_intro(5)

NAME

       iconv_intro, iconv - Introduction to codeset conversion

DESCRIPTION

       Conversion of character encoding from one coded character set
       (codeset) to another is an operation that often has to be
       performed by the operating system and some applications. For
       example, the man command supports codeset conversion to allow
       one set of reference page files to meet the needs of locales
       that support the same language and territory but different
       codesets (see man(1)).

       The following commands and library interfaces give users and
       application developers direct access to codeset conversion
       operations:

       -   The iconv command converts characters in a data file from
           one codeset to another (see iconv(1)).

       -   The iconv(), iconv_open(), and iconv_close() functions
           convert a string of characters from one codeset to another
           (see iconv(3), iconv_open(3), and iconv_close(3)). The iconv
           command uses these interfaces to convert characters.
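
       The open/convert/close sequence behind these interfaces can be
       sketched in C as follows. This is a minimal example assuming the
       POSIX prototypes (iconv() takes char ** for the input buffer);
       the helper name convert_codeset() is hypothetical, and the
       codeset names that iconv_open() accepts vary by system (on Tru64
       UNIX they presumably correspond to the codeset parts of the
       converter names described below).

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: convert inlen bytes at `in` from codeset `from`
 * to codeset `to`, writing at most outsize bytes to `out`.
 * On return, *outlen holds the number of bytes written.
 * Returns 0 on success, or -1 with errno set (for example EILSEQ for an
 * invalid character, E2BIG if `out` is too small, EINVAL if no converter
 * exists or the input ends in a partial character). */
int convert_codeset(const char *to, const char *from,
                    const char *in, size_t inlen,
                    char *out, size_t outsize, size_t *outlen)
{
    assert(to != NULL && from != NULL && outlen != NULL);

    /* iconv_open() takes the destination codeset first. */
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return -1;

    char *inp = (char *)in;     /* iconv() advances this pointer but never writes through it */
    size_t inleft = inlen;
    char *outp = out;
    size_t outleft = outsize;
    int result = 0;

    /* Convert the data, then flush any pending shift state. */
    if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1 ||
        iconv(cd, NULL, NULL, &outp, &outleft) == (size_t)-1)
        result = -1;

    int saved_errno = errno;    /* keep iconv()'s error across iconv_close() */
    iconv_close(cd);
    errno = saved_errno;

    *outlen = outsize - outleft;
    return result;
}
```

       For example, on a system that recognizes these names,
       convert_codeset("UTF-8", "ISO-8859-1", "caf\xe9", 4, buf,
       sizeof buf, &n) should store the five UTF-8 bytes of "café" in
       buf and set n to 5.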

       There are two types of codeset converters: algorithmic and
       table. Algorithmic converters, which reside in the
       /usr/lib/nls/loc/iconv directory, are shared libraries with a
       predefined entry point for invocation by functions in the
       libiconv.so library. Algorithmic converters are needed for the
       conversion of multibyte codesets, in part because table
       converters cannot handle the required number of character
       values and also because some of these codesets require complex
       handling (see NOTES). Algorithmic converters are supplied as
       part of the operating system product; the internal interfaces
       that they require are not published for external use.

       Table converters, which reside in the /usr/lib/nls/loc/iconvTable
       directory, can be created by using the genxlt command (see
       genxlt(1)). These converters can support single-byte codesets of
       up to 256 encoded character values.
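
       Because a single-byte codeset has at most 256 code values, a
       table converter can be pictured as a 256-entry array indexed by
       the input byte. The C sketch below illustrates only that idea; it
       does not reflect the file format produced by genxlt(1), and the
       names table_convert() and NO_MAPPING are hypothetical.

```c
#include <stddef.h>

#define NO_MAPPING (-1)   /* hypothetical marker for an unmapped code value */

/* Convert n bytes through a 256-entry table. Each entry holds the
 * destination byte (0-255) or NO_MAPPING. Returns the number of bytes
 * converted; a result smaller than n means that in[result] has no
 * mapping and conversion stopped there (an invalid character error). */
size_t table_convert(const short table[256],
                     const unsigned char *in, unsigned char *out, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        short v = table[in[i]];     /* the input byte is the table index */
        if (v == NO_MAPPING)
            break;
        out[i] = (unsigned char)v;
    }
    return i;
}
```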

       Names of codeset converters are in the following form:

       from-codeset_to-codeset

       For example, the following converter converts values  from
       Super DEC Kanji to Japanese Extended UNIX Code:

       sdeckanji_eucJP
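
       The naming rule amounts to joining the source and destination
       codeset names with an underscore, as in the hypothetical C helper
       converter_name() below. Note the ordering difference: the
       converter name lists the source codeset first, whereas
       iconv_open(3) takes the destination codeset as its first
       argument. The helper only formats the string; it does not check
       whether such a converter is installed.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: write "from-codeset_to-codeset" into buf.
 * Returns 0 on success, or -1 if the name does not fit in size bytes. */
int converter_name(char *buf, size_t size, const char *from, const char *to)
{
    size_t flen = strlen(from);
    size_t tlen = strlen(to);

    if (flen + 1 + tlen + 1 > size)          /* room for '_' and the final NUL */
        return -1;
    memcpy(buf, from, flen);
    buf[flen] = '_';
    memcpy(buf + flen + 1, to, tlen + 1);    /* also copies the NUL terminator */
    return 0;
}
```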

       The codeset converters produce an invalid character error in
       response to characters that cannot be converted from the source
       codeset to the destination codeset. This error is always
       produced for character codes that are invalid in the source
       codeset. However, if the error results from characters that are
       valid in the source codeset but have no counterparts in the
       destination codeset, you can eliminate the error by defining the
       ICONV_DEFSTR environment variable to specify a substitute output
       string. See the ENVIRONMENT VARIABLES section for more
       information about using the ICONV_DEFSTR variable.
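
       At the iconv(3) level, an invalid character error appears as a
       return value of (size_t)-1 with errno set to EILSEQ, and the
       input pointer is left at the first byte that could not be
       converted. The C sketch below uses this to report the byte offset
       of the first rejected character; the function name
       first_invalid_offset() is hypothetical. POSIX leaves the handling
       of valid but unconvertible characters implementation-defined, so
       whether those also yield EILSEQ depends on the converter.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: run a from->to conversion over `in` and report
 * where it stops. Returns the byte offset of the first sequence that
 * iconv() rejects with EILSEQ, -1 if the whole input converts, or -2
 * on any other failure (no converter, truncated input, and so on). */
long first_invalid_offset(const char *to, const char *from,
                          const char *in, size_t inlen)
{
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return -2;

    char scratch[256];               /* converted output is discarded */
    char *inp = (char *)in;
    size_t inleft = inlen;
    long result = -1;

    while (inleft > 0) {
        char *outp = scratch;
        size_t outleft = sizeof scratch;
        if (iconv(cd, &inp, &inleft, &outp, &outleft) != (size_t)-1)
            break;                   /* all remaining input was converted */
        if (errno == E2BIG)
            continue;                /* scratch buffer full: reuse it */
        result = (errno == EILSEQ) ? (long)(inp - in) : -2;
        break;
    }
    iconv_close(cd);
    return result;
}
```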

       It is possible to convert data directly between two codesets or
       by way of an intermediate codeset, such as UTF-16, UCS-4, or
       UTF-8. For conversion of Chinese characters, be aware that the
       results of converting a Traditional Chinese codeset directly to
       a Simplified Chinese codeset may not be the same as the results
       of converting Traditional Chinese first to UTF-16, UCS-4, or
       UTF-8 and then to Simplified Chinese.
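
       A conversion by way of an intermediate codeset can be sketched as
       two converters chained through a scratch buffer. In the C example
       below, the intermediate codeset is a parameter (for instance
       UTF-8); the helper names convert_once() and pivot_convert() are
       hypothetical, and the codeset names in the usage note follow
       common spellings that a particular system may not accept.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* One complete conversion pass; returns 0, or -1 with errno set. */
static int convert_once(const char *to, const char *from,
                        const char *in, size_t inlen,
                        char *out, size_t outsize, size_t *outlen)
{
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return -1;

    char *inp = (char *)in, *outp = out;
    size_t inleft = inlen, outleft = outsize;
    /* Convert, then flush any shift state into the output. */
    int rc = (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1 ||
              iconv(cd, NULL, NULL, &outp, &outleft) == (size_t)-1) ? -1 : 0;

    int saved_errno = errno;
    iconv_close(cd);
    errno = saved_errno;
    *outlen = outsize - outleft;
    return rc;
}

/* Hypothetical helper: convert from -> via -> to. `tmp` is caller-supplied
 * scratch space that holds the data in the intermediate codeset. */
int pivot_convert(const char *to, const char *via, const char *from,
                  const char *in, size_t inlen,
                  char *tmp, size_t tmpsize,
                  char *out, size_t outsize, size_t *outlen)
{
    size_t mid = 0;
    if (convert_once(via, from, in, inlen, tmp, tmpsize, &mid) != 0)
        return -1;
    return convert_once(to, via, tmp, mid, out, outsize, outlen);
}
```

       For example, pivot_convert("UTF-16BE", "UTF-8", "ISO-8859-1",
       ...) converts ISO 8859-1 data to big-endian UTF-16 by way of
       UTF-8. Whether such a two-step result matches a direct conversion
       depends on the converters involved, as noted above for Chinese
       codesets.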

ENVIRONMENT VARIABLES

       Some codeset converters require more complex algorithms than can
       be provided through tables. The following environment variables
       provide control over conversion behavior for different kinds of
       codeset converters:

       ICONV_ACTION
              Controls the behavior for the many-to-one value
              conversions for conversion of Traditional Chinese (except
              for Traditional Chinese encoded in Telecode) to Simplified
              Chinese. The valid settings for this environment variable
              are as follows:

              batch  Specifies that the preferred mapping value (the
                     first one in the one-to-many mapping list) is
                     always taken. The batch setting is the ICONV_ACTION
                     default.

              conv_all
                     Specifies that all the possible values are printed
                     to the standard output, enclosed by braces ({ }),
                     so that the user can later manually edit the
                     converted file and select the one to use.

              conv_all_nosym
                     Specifies that all the possible values are printed
                     to the standard output except for punctuation
                     symbols, for which only the preferred mapping value
                     is printed. As is true for conv_all, the
                     conv_all_nosym setting prints value choices
                     enclosed by braces so that the converted file can
                     later be edited.

       ICONV_BYTEORDER
              Sets byte ordering for UTF-16 or UCS-4 (UTF-32) converters
              only. Valid values are little-endian or big-endian.
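
              To make the effect of byte ordering concrete, the C
              sketch below serializes one Unicode code point as UTF-16
              in either byte order, optionally preceded by the
              byte-order mark U+FEFF. It models only the output format;
              the function name put_utf16() is hypothetical and does not
              read ICONV_BYTEORDER or ICONV_NOBOM.

```c
#include <stddef.h>
#include <stdint.h>

/* Store one 16-bit code unit in the requested byte order. */
static void put_unit(unsigned char *p, uint16_t u, int big_endian)
{
    if (big_endian) {
        p[0] = (unsigned char)(u >> 8);
        p[1] = (unsigned char)(u & 0xFF);
    } else {
        p[0] = (unsigned char)(u & 0xFF);
        p[1] = (unsigned char)(u >> 8);
    }
}

/* Hypothetical helper: encode code point cp (at most 0x10FFFF and not a
 * surrogate) as UTF-16. If with_bom is nonzero, the byte-order mark
 * U+FEFF is written first. `out` needs room for 6 bytes.
 * Returns the number of bytes written. */
size_t put_utf16(uint32_t cp, int big_endian, int with_bom, unsigned char *out)
{
    size_t n = 0;

    if (with_bom) {
        put_unit(out, 0xFEFF, big_endian);
        n += 2;
    }
    if (cp < 0x10000) {
        put_unit(out + n, (uint16_t)cp, big_endian);
        n += 2;
    } else {
        cp -= 0x10000;                                   /* 20 bits remain */
        put_unit(out + n, (uint16_t)(0xD800 | (cp >> 10)), big_endian);       /* high surrogate */
        put_unit(out + n + 2, (uint16_t)(0xDC00 | (cp & 0x3FF)), big_endian); /* low surrogate */
        n += 4;
    }
    return n;
}
```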

              If ICONV_NOBOM is set to a non-null value, the default
              byte ordering is big-endian. If ICONV_NOBOM is not set,
              the default byte ordering is little-endian. Setting the
              ICONV_BYTEORDER and ICONV_NOBOM environment variables may
              be necessary when producing UTF-16 or UCS-4 output that
              will be processed by codeset converters on platforms other
              than Tru64 UNIX.

       ICONV_DEFSTR, ICONV_DEFSTR_from-codeset_to-codeset
              Defines the default string to be substituted in output for
              valid input characters that cannot be converted from the
              source codeset to the destination codeset. The variable
              value can be an arbitrary string or a code number. If the
              value is a code number (for example, 10, 07, 0x10, or, for
              Unicode converters, U+1234), the corresponding character
              in the output codeset (to-codeset) is printed.

              For a given type of codeset conversion, a matching
              ICONV_DEFSTR_from-codeset_to-codeset variable has
              precedence over the ICONV_DEFSTR variable without the
              from-codeset_to-codeset suffix. When defining the variable
              with the suffix, replace from-codeset_to-codeset with the
              name of the codeset converter to which the variable
              applies. The ICONV_DEFSTR variable (defined without the
              suffix) is used by a converter when no
              ICONV_DEFSTR_from-codeset_to-codeset variable has been
              defined specifically for the type of conversion being
              done.

              If these variables are not defined or are set to the null
              string, the characters that cannot be converted are
              skipped and have no representation in converted output.
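
              The lookup order described above can be modeled with a
              short C function. This is a hypothetical sketch, not the
              converters' own code: defstr_for() is an invented name,
              it assumes that an empty suffixed variable falls back to
              ICONV_DEFSTR, and it returns the raw value without
              interpreting code numbers such as U+1234.

```c
/* Exposes setenv()/unsetenv() for callers that set the variables. */
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical model of the ICONV_DEFSTR* precedence rules.
 * Returns the substitute string to use for the from->to converter, or
 * NULL when unconvertible characters are simply skipped. An unset or
 * empty variable is treated as "not defined" (an assumption for the
 * suffixed variable). */
const char *defstr_for(const char *from, const char *to)
{
    char name[256];
    int len = snprintf(name, sizeof name, "ICONV_DEFSTR_%s_%s", from, to);

    /* 1. A converter-specific variable takes precedence. */
    if (len > 0 && (size_t)len < sizeof name) {
        const char *specific = getenv(name);
        if (specific != NULL && *specific != '\0')
            return specific;
    }

    /* 2. Otherwise fall back to the general ICONV_DEFSTR variable. */
    const char *general = getenv("ICONV_DEFSTR");
    return (general != NULL && *general != '\0') ? general : NULL;
}
```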

              The following converter-specific restrictions apply to
              ICONV_DEFSTR* variables:

              -   ICONV_DEFSTR* environment variables do not work for
                  converters that convert between Japanese codesets or
                  between Korean codesets.

              -   For converters that handle UTF-16, UCS-4, or UTF-8
                  format, the only valid variable value is a code
                  number (such as U+1234 or 0x10) or a string whose
                  value is a single ASCII character (such as ?). For
                  these converters, any string value other than a
                  single ASCII character is ignored and any characters
                  that cannot be converted have no representation in
                  output.

              -   For converters that handle output in UTF-16, UCS-4,
                  or UTF-8 format, characters that cannot be converted
                  and for which no valid ICONV_DEFSTR* value has been
                  defined produce an error condition that aborts the
                  conversion process.

       ICONV_NOBOM
              Disables generation of the byte-order mark at the
              beginning of UTF-16 or UCS-4 (UTF-32) output. A valid
              setting is any value other than a null string. If
              ICONV_NOBOM is set, big-endian is established as the
              default byte ordering and BOM generation is disabled. If
              ICONV_NOBOM is not set, little-endian is established as
              the default byte ordering and BOM generation is enabled.
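
              A program that reads UTF-16 data typically determines its
              byte order from the byte-order mark. The C sketch below
              shows that check for UTF-16 only (a UCS-4 mark is four
              bytes long); the names utf16_bom() and enum byte_order are
              hypothetical. Output produced with ICONV_NOBOM set carries
              no mark, so the reading side must learn its byte order
              some other way (big-endian by default, as described
              above).

```c
#include <stddef.h>

enum byte_order { ORDER_UNKNOWN, ORDER_BIG_ENDIAN, ORDER_LITTLE_ENDIAN };

/* Hypothetical helper: inspect the start of UTF-16 data for a BOM.
 * FE FF marks big-endian data and FF FE marks little-endian data.
 * *bom_len receives the number of mark bytes to skip (2 or 0).
 * Without a mark, ORDER_UNKNOWN is returned. */
enum byte_order utf16_bom(const unsigned char *p, size_t n, size_t *bom_len)
{
    *bom_len = 0;
    if (n >= 2) {
        if (p[0] == 0xFE && p[1] == 0xFF) {
            *bom_len = 2;
            return ORDER_BIG_ENDIAN;
        }
        if (p[0] == 0xFF && p[1] == 0xFE) {
            *bom_len = 2;
            return ORDER_LITTLE_ENDIAN;
        }
    }
    return ORDER_UNKNOWN;
}
```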

              Codeset converters that process UTF-16 or UCS-4 data on
              platforms other than Tru64 UNIX usually require the
              byte-order mark. The ICONV_NOBOM and ICONV_BYTEORDER
              environment variables let you control the generation of a
              byte-order mark and the byte ordering. Thus, you can set
              up codeset conversion that meets the requirements of other
              platforms or is compatible with output produced by codeset
              converters that were included in versions of Tru64 UNIX
              prior to Version 4.0D.

       ICONV_PHRCONV
              Activates phrase conversion for converters that convert
              from a Traditional Chinese codeset (except for Traditional
              Chinese encoded in Telecode) to a Simplified Chinese
              codeset or the reverse. When phrase conversion is
              activated, a whole phrase in Traditional Chinese is
              converted to a different phrase in Simplified Chinese or
              the reverse.

              If ICONV_PHRCONV is set to mark, the converted phrases are
              bracketed by [ and ] to highlight the conversion result
              for visual checking.

              The phrase conversion databases in the /usr/share/phrdb
              directory are normal text files with the same file names
              as those of the algorithmic converters in
              /usr/lib/nls/loc/iconv/*. These phrase conversion
              databases contain entries for phrase conversion pairs.

FILES

       /usr/lib/nls/loc/iconv/*
              Algorithmic converters

       /usr/lib/nls/loc/iconvTable/*
              Table converters

       /usr/share/phrdb/*
              Phrase conversion databases

SEE ALSO

       Commands: genxlt(1), iconv(1), phrase(1)

       Functions: iconv(3), iconv_close(3), iconv_open(3)

       Others: i18n_intro(5), l10n_intro(5)



Copyright © 2004-2005 DeniX Solutions SRL