utf2 -- Universal character set Transformation Format encoding of wide
characters
ENCODING "UTF2"
The UTF2 encoding has been deprecated in favour of UTF-8. New applications
should not use UTF2.
The UTF2 encoding is based on a proposed X/Open multibyte FSS-UCS-TF
(File System Safe Universal Character Set Transformation Format) encoding
as used in Plan 9 from Bell Labs. Although the format is capable of
representing values wider than 16 bits, the current implementation is
limited to the 16 bits defined by the Unicode Standard.
The UTF2 representation is backwards compatible with ASCII: the bytes
0x00-0x7f represent the ASCII character set. The multibyte encodings of
wide characters between 0x0080 and 0xffff consist entirely of bytes whose
high-order bit is set. The encoding is given by the following table:
[0x0000 - 0x007f] [00000000.0bbbbbbb] -> 0bbbbbbb
[0x0080 - 0x07ff] [00000bbb.bbbbbbbb] -> 110bbbbb, 10bbbbbb
[0x0800 - 0xffff] [bbbbbbbb.bbbbbbbb] -> 1110bbbb, 10bbbbbb, 10bbbbbb
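The table above can be read as a small encoding routine. The following is a minimal sketch, not the libc implementation: the function name utf2_encode and its calling convention are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of libc): encode one 16-bit wide
 * character into buf according to the table above.  buf must have
 * room for at least 3 bytes.  Returns the number of bytes written.
 */
size_t
utf2_encode(unsigned int wc, unsigned char *buf)
{
	assert(buf != NULL);
	assert(wc <= 0xffff);	/* this sketch covers only the 16-bit range */

	if (wc <= 0x007f) {
		/* [0x0000 - 0x007f] -> 0bbbbbbb */
		buf[0] = (unsigned char)wc;
		return 1;
	}
	if (wc <= 0x07ff) {
		/* [0x0080 - 0x07ff] -> 110bbbbb, 10bbbbbb */
		buf[0] = (unsigned char)(0xc0 | (wc >> 6));
		buf[1] = (unsigned char)(0x80 | (wc & 0x3f));
		return 2;
	}
	/* [0x0800 - 0xffff] -> 1110bbbb, 10bbbbbb, 10bbbbbb */
	buf[0] = (unsigned char)(0xe0 | (wc >> 12));
	buf[1] = (unsigned char)(0x80 | ((wc >> 6) & 0x3f));
	buf[2] = (unsigned char)(0x80 | (wc & 0x3f));
	return 3;
}
```

Because the ranges are tested in increasing order, this sketch always produces the shortest form for a given value.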
If more than one representation of a value exists (for example, 0x00 can
be written as 0x00; 0xC0 0x80; or 0xE0 0x80 0x80), the shortest
representation is always used when encoding, but the longer ones are
still decoded correctly.
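The decoding side of this rule can be sketched as follows. This is an illustrative sketch under stated assumptions, not the libc code: the name utf2_decode, its signature, and the convention of returning 0 for malformed or truncated input are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of libc): decode one character from
 * the n bytes at s.  Stores the value in *wc and returns the number
 * of bytes consumed, or 0 if the input is malformed or truncated.
 * Longer-than-necessary forms are accepted, as described above.
 */
size_t
utf2_decode(const unsigned char *s, size_t n, unsigned int *wc)
{
	unsigned int v;
	size_t len, i;

	assert(s != NULL && wc != NULL);
	if (n == 0)
		return 0;

	/* The lead byte determines the sequence length. */
	if ((s[0] & 0x80) == 0x00) {
		v = s[0];
		len = 1;
	} else if ((s[0] & 0xe0) == 0xc0) {
		v = s[0] & 0x1f;
		len = 2;
	} else if ((s[0] & 0xf0) == 0xe0) {
		v = s[0] & 0x0f;
		len = 3;
	} else {
		return 0;	/* stray trail byte or unsupported lead byte */
	}
	if (n < len)
		return 0;

	/* Each trail byte has the form 10bbbbbb and adds 6 bits. */
	for (i = 1; i < len; i++) {
		if ((s[i] & 0xc0) != 0x80)
			return 0;
		v = (v << 6) | (s[i] & 0x3f);
	}
	*wc = v;
	return len;
}
```

With this sketch, the three byte sequences given in the example above all decode to the value 0, consuming 1, 2, and 3 bytes respectively.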
The final three encodings provided by X/Open:
[00000000.000bbbbb.bbbbbbbb.bbbbbbbb] ->
11110bbb, 10bbbbbb, 10bbbbbb, 10bbbbbb
[000000bb.bbbbbbbb.bbbbbbbb.bbbbbbbb] ->
111110bb, 10bbbbbb, 10bbbbbb, 10bbbbbb, 10bbbbbb
[0bbbbbbb.bbbbbbbb.bbbbbbbb.bbbbbbbb] ->
1111110b, 10bbbbbb, 10bbbbbb, 10bbbbbb, 10bbbbbb, 10bbbbbb
which provide for the entire proposed 31-bit ISO-10646 standard, are
currently not implemented.
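For illustration only, the lead-byte patterns of all six forms, including the three unimplemented ones, can be classified as in the sketch below. The function name fss_utf_seq_len is hypothetical and does not correspond to anything in the UTF2 locale code.

```c
#include <stddef.h>

/*
 * Hypothetical helper: return the total sequence length implied by a
 * lead byte under the full FSS-UCS-TF scheme, or 0 if the byte cannot
 * start a sequence.
 */
size_t
fss_utf_seq_len(unsigned char lead)
{
	if ((lead & 0x80) == 0x00)
		return 1;	/* 0bbbbbbb */
	if ((lead & 0xe0) == 0xc0)
		return 2;	/* 110bbbbb */
	if ((lead & 0xf0) == 0xe0)
		return 3;	/* 1110bbbb */
	if ((lead & 0xf8) == 0xf0)
		return 4;	/* 11110bbb */
	if ((lead & 0xfc) == 0xf8)
		return 5;	/* 111110bb */
	if ((lead & 0xfe) == 0xfc)
		return 6;	/* 1111110b */
	return 0;		/* trail byte (10bbbbbb), 0xfe, or 0xff */
}
```

A decoder limited to 16 bits, as described here, would treat any result greater than 3 as an unsupported sequence.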
mklocale(1), setlocale(3), utf8(5)
FreeBSD 5.2.1 October 11, 2002 FreeBSD 5.2.1