iconv_JEF - Specification for controlling conversion
between Fujitsu JEF and Tru64 UNIX Japanese codesets
The iconv utility can convert the encoding of characters
between Fujitsu JEF (Japanese processing Extended Feature)
code and one of the following Tru64 UNIX codesets: DEC
Kanji, Super DEC Kanji, Japanese EUC, or Shift JIS. You
choose the type of conversion by specifying the appropriate
values for the utility's from-code and to-code parameters,
as follows:
------------------------------------------------
Type of Code Conversion    from-code   to-code
------------------------------------------------
JEF to DEC Kanji           JEF         deckanji
JEF to Super DEC Kanji     JEF         sdeckanji
JEF to Japanese EUC        JEF         eucJP
JEF to Shift JIS           JEF         SJIS
DEC Kanji to JEF           deckanji    JEF
Super DEC Kanji to JEF     sdeckanji   JEF
Japanese EUC to JEF        eucJP       JEF
Shift JIS to JEF           SJIS        JEF
------------------------------------------------
Conversion behavior for the following items is affected by
the definition of environment variables or profile entries
in the user's environment. For more information, see the
"Environment Variables" and "Profile File" sections.

The UDC (User-Defined Character) mapping table that is used
for UDC conversion
    This table must be an ASCII text file that contains
    UDC mapping information. The table affects conversion
    of user-defined characters between the codesets.

The EBCDIC to/from ISO code (ASCII, JIS Roman characters)
mapping table that is used for conversion
    This table must be an ASCII text file that contains
    information on how to map characters between EBCDIC
    and ISO code.

The K-shift code
    This is a one- or two-byte hexadecimal code that
    marks the beginning of Kanji mode.

The A-shift code
    This is a one- or two-byte hexadecimal code that
    marks the beginning of EBCDIC mode.

The status of the initial mode (Kanji or EBCDIC) at the time
the iconv command starts, or the first time the iconv()
function is called after a program calls the iconv_open()
function to initialize the converter
    The status keywords are either kanji_mode or
    ebcdic_mode.

How to treat undefined characters when these are detected
in Kanji mode
    Specify one of the following actions by keyword:
        Stop codeset conversion (abort).
        Output the undefined characters without any
        processing and continue codeset conversion (pass).
        Output padding characters instead of the undefined
        characters and continue codeset conversion
        (replace).
        Ignore the undefined characters and continue
        codeset conversion.

The two-byte padding character used in Kanji mode
    This value is meaningful when replace is chosen for
    the processing of undefined characters in Kanji
    mode. Specify the padding character by its
    hexadecimal value.

How to treat undefined characters when these are detected
in EBCDIC mode
    Specify one of the following actions by keyword:
        Stop codeset conversion (abort).
        Output the undefined characters without any
        processing and continue codeset conversion (pass).
        Output padding characters instead of the undefined
        characters and continue codeset conversion
        (replace).
        Ignore the undefined characters and continue
        codeset conversion.

The one-byte padding character used in EBCDIC mode
    This value is meaningful when replace is chosen for
    the processing of undefined characters in EBCDIC
    mode. Specify the padding character by its
    hexadecimal value.

When the to-code parameter for the conversion is JEF, you
can also specify the following items for conversion behavior:

Whether the initial shift code is output at the start of
conversion if the status of the initial mode (Kanji or
EBCDIC) is different from the mode of the first input
character
    The start of conversion is the time the iconv utility
    starts processing, or when the iconv() function
    is called just after opening the converter with
    iconv_open(). Keyword values for this item are yes
    or no.

Whether or not the utility outputs the last shift code when
iconv() is called with a zero-length input string and the
current mode (Kanji or EBCDIC) is different from the mode
specified by the last shift state
    Keyword values for this item are yes or no.

The last status (Kanji mode or EBCDIC mode)
    Specify kanji_mode or ebcdic_mode for this value.
    It is meaningful only when yes is the setting for
    whether the utility outputs the last shift code.

If the items that control conversion behavior are specified
by both environment variables and the profile file,
values set by environment variables override values set by
comparable entries in the profile. Note that values for
all conversion control items are case-sensitive, whether
they are set by environment variables or in the profile.
The following table contains the default values for each
conversion control item:
----------------------------------------------------
Conversion Control Item                Default Value
----------------------------------------------------
UDC mapping table                      None
K shift code                           0x28
A shift code                           0x29
Initial state                          ebcdic_mode
Processing for undefined characters
  in Kanji mode                        abort
Processing for undefined characters
  in EBCDIC mode                       pass
----------------------------------------------------
The default padding characters are white spaces, whose
code values for each destination codeset are noted in the
following table. These padding characters are output when
you specify replace for processing of undefined characters
and do not explicitly specify the padding character.
---------------------------------------------------
Mode          Default Value   Destination Codeset
---------------------------------------------------
Kanji mode    0x4040          JEF
              0xa1a1          deckanji, sdeckanji,
                              or eucJP
              0x8140          SJIS
EBCDIC mode   0x40            JEF
              0x20            deckanji, sdeckanji,
                              eucJP, or SJIS
---------------------------------------------------
The default EBCDIC-ISO mapping tables are as follows:
For conversion from JEF to other codesets:
    /usr/lib/nls/loc/iconv/data/kana_ebcdic.tbl
For conversion from other codesets to JEF:
    /usr/lib/nls/loc/iconv/data/kana_ebcdic.tbl
These mapping tables map both EBCDIC and ISO code, which
includes JIS Roman characters. The kana_ebcdic.tbl mapping
table also maps ISO lowercase characters to EBCDIC uppercase
characters.
The following default values for conversion control items
are meaningful when the iconv utility's to-code conversion
parameter is JEF:
---------------------------------------------
Conversion Control Item           Default
---------------------------------------------
Output the initial shift code?    yes
Output the last shift code?       yes
Output the last status?           ebcdic_mode
---------------------------------------------
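
The precedence rule and the defaults described above can be
summarized in a short sketch. The function name and the
dictionary keys below are illustrative only (they are not
part of iconv); the default values are copied from the
tables above. The sketch assumes that the values found in
the environment and in the profile have already been
collected into two dictionaries keyed by control item.

```python
# Defaults copied from the tables in this reference page.
# The dictionary keys are labels chosen for this sketch only.
DEFAULTS = {
    "UDC mapping table": None,
    "K shift code": "0x28",
    "A shift code": "0x29",
    "Initial state": "ebcdic_mode",
    "Undefined characters in Kanji mode": "abort",
    "Undefined characters in EBCDIC mode": "pass",
    "Output the initial shift code?": "yes",
    "Output the last shift code?": "yes",
    "Output the last status?": "ebcdic_mode",
}


def effective_value(item, from_environment, from_profile):
    """Return the value in effect for one conversion control item.

    A value set through an environment variable wins over a
    profile entry, which in turn wins over the default.
    Values are compared and returned as-is because they are
    case-sensitive.
    """
    if item in from_environment:
        return from_environment[item]
    if item in from_profile:
        return from_profile[item]
    return DEFAULTS.get(item)


if __name__ == "__main__":
    profile = {"K shift code": "0x38", "Initial state": "kanji_mode"}
    environment = {"K shift code": "0x0e"}
    for name in ("K shift code", "Initial state", "A shift code"):
        print(name, "->", effective_value(name, environment, profile))
```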
Environment Variables
This section discusses the environment variables that you
can set to control conversion behavior. The names for
these variables adhere to the following format:
fromcode_tocode_controlitem
The fromcode and tocode name segments can each be one of
the following keywords:
----------------------------
For Codeset:       Use:
----------------------------
Fujitsu JEF        JEF
DEC Kanji          DECKANJI
Super DEC Kanji    SDECKANJI
Japanese EUC       EUCJP
Shift JIS          SJIS
----------------------------
The controlitem name segment can be one of the following
keywords:
--------------------------------------------------------
For Control Item:                     Use:
--------------------------------------------------------
UDC mapping table                     UDC_TABLE
EBCDIC-ISO mapping table              EBCDIC_TABLE
K shift code                          K_SHIFT_CODE
A shift code                          A_SHIFT_CODE
Initial state                         INITIAL_STATE
Processing of undefined characters
  in Kanji mode                       KANJI_EXCEPT_PROC
Processing of undefined characters
  in EBCDIC mode                      EBCDIC_EXCEPT_PROC
Padding characters in Kanji mode      PADDING_2BYTE_CHAR
Padding characters in EBCDIC mode     PADDING_1BYTE_CHAR
Output initial shift code             INITIAL_SHIFT_CODE
Output last shift code                TRAILER_SHIFT_CODE
Last status                           LAST_STATE
File path of the profile              PROFILE
--------------------------------------------------------
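
As an illustration of the naming format, the following
sketch assembles a variable name from the keywords in the
two tables above. The function name is hypothetical. The
check that exactly one side of the conversion is JEF is an
assumption based on the conversion types listed at the
start of this reference page.

```python
# Keywords copied from the codeset and control item tables.
CODESET_KEYWORDS = ("JEF", "DECKANJI", "SDECKANJI", "EUCJP", "SJIS")
CONTROL_ITEM_KEYWORDS = (
    "UDC_TABLE", "EBCDIC_TABLE", "K_SHIFT_CODE", "A_SHIFT_CODE",
    "INITIAL_STATE", "KANJI_EXCEPT_PROC", "EBCDIC_EXCEPT_PROC",
    "PADDING_2BYTE_CHAR", "PADDING_1BYTE_CHAR",
    "INITIAL_SHIFT_CODE", "TRAILER_SHIFT_CODE", "LAST_STATE",
    "PROFILE",
)


def control_variable_name(fromcode, tocode, controlitem):
    """Build a name of the form fromcode_tocode_controlitem."""
    for segment in (fromcode, tocode):
        if segment not in CODESET_KEYWORDS:
            raise ValueError("unknown codeset keyword: %r" % segment)
    # Assumption: every supported conversion has JEF on exactly
    # one side, as in the table of conversion types.
    if (fromcode == "JEF") == (tocode == "JEF"):
        raise ValueError("exactly one of fromcode and tocode must be JEF")
    if controlitem not in CONTROL_ITEM_KEYWORDS:
        raise ValueError("unknown control item keyword: %r" % controlitem)
    return "%s_%s_%s" % (fromcode, tocode, controlitem)


if __name__ == "__main__":
    print(control_variable_name("EUCJP", "JEF", "UDC_TABLE"))
    print(control_variable_name("JEF", "SJIS", "KANJI_EXCEPT_PROC"))
```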
Following are examples of using the setenv C shell command
to define environment variables to control conversion
behavior. In these examples, the fromcode name segment
indicates Japanese EUC and the tocode name segment indicates
JEF:
setenv EUCJP_JEF_UDC_TABLE eucjp_jef_udc.tbl
setenv EUCJP_JEF_EBCDIC_TABLE ebcdic_kana.tbl
setenv EUCJP_JEF_K_SHIFT_CODE 0x28
setenv EUCJP_JEF_A_SHIFT_CODE 0x29
setenv EUCJP_JEF_INITIAL_STATE ebcdic_mode
setenv EUCJP_JEF_KANJI_EXCEPT_PROC replace
setenv EUCJP_JEF_EBCDIC_EXCEPT_PROC replace
setenv EUCJP_JEF_PADDING_2BYTE_CHAR 0x4040
setenv EUCJP_JEF_PADDING_1BYTE_CHAR 0x40
setenv EUCJP_JEF_INITIAL_SHIFT_CODE yes
setenv EUCJP_JEF_TRAILER_SHIFT_CODE yes
setenv EUCJP_JEF_LAST_STATE ebcdic_mode
setenv EUCJP_JEF_PROFILE .eucjp_jef_profile
Directory Search Path
When you specify a file name without a directory, the
iconv utility searches the following directories in order
and uses the first file found:
    Current directory
    Home directory
    The iconv/data subdirectory of the directory specified
    by the environment variable LOCPATH
    /usr/lib/nls/loc/iconv/data
    /usr/i18n/lib/nls/loc/iconv/data
If you specify a relative directory path for a file, the
utility searches these same directories in the same order
and uses the first file found.
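
The search order can be pictured with the following
sketch. It is not the utility's own code. It assumes that
the home directory is taken from the HOME environment
variable and that LOCPATH names a single directory; the
function names are hypothetical.

```python
import os
from pathlib import Path


def candidate_directories(environ=None):
    """List the directories in the order described above."""
    env = os.environ if environ is None else environ
    directories = [Path.cwd()]
    home = env.get("HOME")          # assumption: HOME is the home directory
    if home:
        directories.append(Path(home))
    locpath = env.get("LOCPATH")    # assumption: a single directory
    if locpath:
        directories.append(Path(locpath) / "iconv" / "data")
    directories.append(Path("/usr/lib/nls/loc/iconv/data"))
    directories.append(Path("/usr/i18n/lib/nls/loc/iconv/data"))
    return directories


def find_file(name, environ=None):
    """Return the first existing file, or None if there is none.

    name may be a bare file name or a relative path; both are
    looked up under each directory in turn.
    """
    for directory in candidate_directories(environ):
        candidate = directory / name
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    print(find_file("eucjp_jef_udc.tbl"))
```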
Profile File
Entry lines in the profile file adhere to the following
format:
entry_name string_value
The entry_name and string_value fields are separated by
spaces or tabs. Do not append a colon (:) after
entry_name. The file can also include blank lines and comment
entries, which begin with the # character.
Following are the entry_name values for different conversion
control items:
------------------------------------------------------------
Conversion Control Item               entry_name
------------------------------------------------------------
UDC mapping table                     udc_mapping_table
EBCDIC-ISO mapping table              ebcdic_mapping_table
K shift code                          k_shift_code
A shift code                          a_shift_code
Initial state                         initial_state
Processing undefined characters
  in Kanji mode                       kanji_except_proc
Processing undefined characters
  in EBCDIC mode                      ebcdic_except_proc
Padding character in Kanji mode       padding_2byte_char
Padding character in EBCDIC mode      padding_1byte_char
Output initial shift code             output_initial_shift_code
Output last shift code                output_trailer_shift_code
Last state                            last_state
------------------------------------------------------------
Following is a sample profile for converting from Japanese
EUC to Fujitsu JEF:
#
# sample profile for eucJP_JEF
#
udc_mapping_table           eucjp_jef_udc.tbl
ebcdic_mapping_table        kana_ebcdic.tbl
k_shift_code                0x28         # ebcdic -> kanji
a_shift_code                0x29         # kanji -> ebcdic
initial_state               ebcdic_mode
kanji_except_proc           replace
ebcdic_except_proc          replace
padding_2byte_char          0x4040       # kanji mode
padding_1byte_char          0x40         # ebcdic mode
output_initial_shift_code   yes
output_trailer_shift_code   yes
last_state                  ebcdic_mode
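
To make the entry format concrete, the following sketch
reads profile text into a dictionary. It is an
illustration, not the parser that iconv uses. It assumes
that a # character always starts a comment, including
after a value as in the sample profile, so a value that
itself contains # is not handled. The function name is
hypothetical.

```python
def parse_profile(text):
    """Parse profile text into {entry_name: string_value}."""
    settings = {}
    for number, raw in enumerate(text.splitlines(), start=1):
        # Assumption: everything from '#' to end of line is a comment.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue                      # blank or comment-only line
        fields = line.split(None, 1)      # split on spaces or tabs
        if len(fields) != 2:
            raise ValueError("line %d: expected 'entry_name string_value'"
                             % number)
        name, value = fields
        if name.endswith(":"):
            raise ValueError("line %d: do not append ':' to entry_name"
                             % number)
        settings[name] = value.strip()    # values stay case-sensitive
    return settings


SAMPLE = """\
#
# sample profile for eucJP_JEF
#
udc_mapping_table     eucjp_jef_udc.tbl
k_shift_code          0x28     # ebcdic -> kanji
a_shift_code          0x29     # kanji -> ebcdic
initial_state         ebcdic_mode
kanji_except_proc     replace
padding_2byte_char    0x4040   # kanji mode
"""

if __name__ == "__main__":
    for entry, value in parse_profile(SAMPLE).items():
        print(entry, "=", value)
```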
The default file names for the profile are as follows:
------------------------------------------------
Code Conversion            Default Profile Name
------------------------------------------------
JEF to DEC Kanji           .jef_deckanji_profile
JEF to Super DEC Kanji     .jef_sdeckanji_profile
JEF to Shift JIS           .jef_sjis_profile
JEF to Japanese EUC        .jef_eucjp_profile
DEC Kanji to JEF           .deckanji_jef_profile
Super DEC Kanji to JEF     .sdeckanji_jef_profile
Shift JIS to JEF           .sjis_jef_profile
Japanese EUC to JEF        .eucjp_jef_profile
------------------------------------------------
By default, the iconv utility checks the directory search
path mentioned in the "Directory Search Path" section and
uses the first profile it finds. However, you can also
specify an arbitrary file path for your profile instead of
the default names by defining the following environment
variables:
-----------------------------------------------------------
Code Conversion            Profile Path Environment Variable
-----------------------------------------------------------
JEF to DEC Kanji           JEF_DECKANJI_PROFILE
JEF to Super DEC Kanji     JEF_SDECKANJI_PROFILE
JEF to Shift JIS           JEF_SJIS_PROFILE
JEF to Japanese EUC        JEF_EUCJP_PROFILE
DEC Kanji to JEF           DECKANJI_JEF_PROFILE
Super DEC Kanji to JEF     SDECKANJI_JEF_PROFILE
Shift JIS to JEF           SJIS_JEF_PROFILE
Japanese EUC to JEF        EUCJP_JEF_PROFILE
-----------------------------------------------------------
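
The two tables above follow a regular pattern, which the
following sketch uses to choose a profile name. The
function name is hypothetical, and treating an empty
variable as unset is an assumption. The name returned is
still subject to the directory search described in the
"Directory Search Path" section.

```python
def profile_file_name(fromcode, tocode, environ):
    """Pick the profile name for one conversion.

    fromcode and tocode are codeset names such as "eucJP" or
    "JEF"; environ is a mapping of environment variables.
    """
    variable = "%s_%s_PROFILE" % (fromcode.upper(), tocode.upper())
    override = environ.get(variable)
    if override:                 # assumption: empty means unset
        return override
    # Default names are the lowercase codeset names joined by
    # underscores, with a leading dot and a _profile suffix.
    return ".%s_%s_profile" % (fromcode.lower(), tocode.lower())


if __name__ == "__main__":
    print(profile_file_name("eucJP", "JEF", {}))
    print(profile_file_name(
        "eucJP", "JEF", {"EUCJP_JEF_PROFILE": "my_profile"}))
```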
UDC Mapping Table
Entries in a UDC mapping table adhere to the following
format:
fromcode tocode
Each of these values is a two-byte hexadecimal number. In
the case of Super DEC Kanji and Japanese EUC, three-byte
hexadecimal values that begin with SS3 (0x8f), such as
0x8fxxxx, are also valid.
You can specify ranges of UDC from and to values in the
same file entry by using a hyphen to separate the codes
that start and end each range:
start_fromcode-end_fromcode start_tocode-end_tocode
When specifying entries that include ranges of values, the
number of codes in the from range must always equal the
number of codes in the to range. A UDC mapping table can
also include blank lines and comment lines, which begin
with the # character. Following is an example of a UDC
mapping table:
# JEF            eucJP
0x80a1-0x89fe    0xf5a1-0xfefe        # udc
0x8aa1-0x93fe    0x8ff5a1-0x8ffefe    # udc
0x94a1-0x99fe    0x8feea1-0x8ff3fe    # udc
0x9aa1-0x9afe    0x8ff4a1-0x8ff4fe    # udc
The first entry in this file specifies a range of JEF values
from 0x80a1 to 0x89fe that are mapped to Japanese EUC
code values in the range 0xf5a1 to 0xfefe. You can find
additional sample UDC mapping table files in the
/usr/i18n/examples/iconv/data directory.
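
The entry format, including ranges, can be explored with
the following sketch. It is not the converter's own
implementation. It treats each code as a plain integer and
maps a code by its offset from the start of its range. That
is an assumption: the converter may instead count only
valid byte combinations within a range. The function names
are hypothetical.

```python
def parse_range(field):
    """Parse 'code' or 'start-end' into an inclusive (low, high) pair."""
    start, separator, end = field.partition("-")
    low = int(start, 16)                 # accepts the 0x prefix
    high = int(end, 16) if separator else low
    if high < low:
        raise ValueError("range end precedes range start: %r" % field)
    return low, high


def parse_udc_table(text):
    """Return a list of (from_low, from_high, to_low) entries."""
    entries = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()   # drop comments
        if not line:
            continue
        fields = line.split()
        if len(fields) != 2:
            raise ValueError("line %d: expected 'fromcode tocode'" % number)
        from_low, from_high = parse_range(fields[0])
        to_low, to_high = parse_range(fields[1])
        # Assumption: range sizes are compared as integer spans.
        if from_high - from_low != to_high - to_low:
            raise ValueError("line %d: ranges differ in size" % number)
        entries.append((from_low, from_high, to_low))
    return entries


def map_code(entries, code):
    """Map one code by its offset within the first matching range."""
    for from_low, from_high, to_low in entries:
        if from_low <= code <= from_high:
            return to_low + (code - from_low)
    return None


SAMPLE = """\
# JEF            eucJP
0x80a1-0x89fe    0xf5a1-0xfefe        # udc
0x8aa1-0x93fe    0x8ff5a1-0x8ffefe    # udc
0x94a1-0x99fe    0x8feea1-0x8ff3fe    # udc
0x9aa1-0x9afe    0x8ff4a1-0x8ff4fe    # udc
"""

if __name__ == "__main__":
    table = parse_udc_table(SAMPLE)
    for jef_code in (0x80a1, 0x89fe, 0x8aa1):
        print(hex(jef_code), "->", hex(map_code(table, jef_code)))
```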
EBCDIC-ISO Mapping Table
Entries in an EBCDIC-ISO mapping table adhere to the following
format:
fromcode tocode
Each code is a one-byte hexadecimal number. You can specify
a range of character codes as follows:
start_fromcode-end_fromcode start_tocode-end_tocode
When using the range format, the number of hex values in
the from range must be the same as the number of hex values
in the to range.
The EBCDIC-ISO mapping table can also include blank lines
and comment entries, which begin with the # character.
Following is an example of an EBCDIC-ISO code mapping table:
# EBCDIC     Kana
0x40         0x20         # space
0x4f         0x21         # '!'
0x7f         0x22         # '"'
  .            .
  .            .
  .            .
0xc1-0xc9    0x41-0x49    # 'A' - 'I'
0xd1-0xd9    0x4a-0x52    # 'J' - 'R'
0xe2-0xe9    0x53-0x5a    # 'S' - 'Z'
  .            .
  .            .
  .            .
In this example, the first column contains from codes and
the second column contains to codes. The first
three value entry lines specify mapping for single characters,
whereas the last three value entry lines specify
mapping for ranges of characters. You can find additional
sample EBCDIC-ISO mapping tables in the
/usr/i18n/lib/nls/loc/iconv/data directory.
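
Because every code in this table is one byte, its effect
can be sketched as a 256-entry translation table. The
following example uses only the six entries shown in the
excerpt above. Bytes without an entry are passed through
unchanged, which is an assumption chosen to mirror the
pass keyword; the function name is hypothetical and this
is not the converter's own code.

```python
def build_translation(text):
    """Build a 256-byte table usable with bytes.translate()."""
    table = bytearray(range(256))        # unmapped bytes pass through
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()   # drop comments
        if not line:
            continue
        fields = line.split()
        if len(fields) != 2:
            raise ValueError("line %d: expected 'fromcode tocode'" % number)
        bounds = []
        for field in fields:
            start, separator, end = field.partition("-")
            low = int(start, 16)
            high = int(end, 16) if separator else low
            if not 0 <= low <= high <= 0xFF:
                raise ValueError("line %d: bad one-byte range" % number)
            bounds.append((low, high))
        (from_low, from_high), (to_low, to_high) = bounds
        if from_high - from_low != to_high - to_low:
            raise ValueError("line %d: ranges differ in size" % number)
        for offset in range(from_high - from_low + 1):
            table[from_low + offset] = to_low + offset
    return bytes(table)


EXCERPT = """\
# EBCDIC     Kana
0x40         0x20         # space
0x4f         0x21         # '!'
0x7f         0x22         # '"'
0xc1-0xc9    0x41-0x49    # 'A' - 'I'
0xd1-0xd9    0x4a-0x52    # 'J' - 'R'
0xe2-0xe9    0x53-0x5a    # 'S' - 'Z'
"""

if __name__ == "__main__":
    translation = build_translation(EXCERPT)
    ebcdic = bytes([0xC8, 0xC9, 0x4F])   # EBCDIC codes for H, I, !
    print(ebcdic.translate(translation))
```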
This reference page contains code conversion specifications
that apply only to conversion between Fujitsu JEF
code and the DEC Kanji, Super DEC Kanji, Japanese EUC, and
Shift JIS codesets. Refer to iconv_ibmkanji(5) for code
conversion specifications between IBM Kanji System characters
and the DEC Kanji, Super DEC Kanji, Japanese EUC, and
Shift JIS codesets. Refer to iconv_KEIS(5) for code conversion
specifications between Hitachi KEIS characters and
the DEC Kanji, Super DEC Kanji, Japanese EUC, and Shift
JIS codesets. Refer to iconv_intro(5) for information
about conversion between DEC Kanji, Super DEC Kanji,
Japanese EUC, Shift JIS, and other Tru64 UNIX codesets.
Commands: iconv(1)
Functions: iconv(3), iconv_close(3), iconv_open(3)
Others: deckanji(5), eucJP(5), iconv_ibmkanji(5),
iconv_intro(5), iconv_KEIS(5), Japanese(5), sdeckanji(5),
SJIS(5)
iconv_JEF(5)