euc - FreeBSD

EUC(5)

NAME

     euc -- EUC encoding of wide characters

SYNOPSIS

     ENCODING "EUC"

     VARIABLE len1 mask1 len2 mask2 len3 mask3 len4 mask4 mask

DESCRIPTION

     EUC implements a system of 4 multibyte codesets.  A multibyte character
     in the first codeset consists of len1 bytes starting with a byte in the
     range of 0x00 to 0x7f.  To allow use of ASCII, len1 is always 1.  A
     multibyte character in the second codeset consists of len2 bytes starting
     with a byte in the range of 0x80 to 0xff, excluding 0x8e and 0x8f.  A
     multibyte character in the third codeset consists of len3 bytes starting
     with the byte 0x8e.  A multibyte character in the fourth codeset consists
     of len4 bytes starting with the byte 0x8f.

     The wchar_t encoding of EUC multibyte characters is dependent on the len
     and mask arguments.  First, the bytes are moved into a wchar_t as follows:


     byte[0] << ((lenN-1) * 8) | byte[1] << ((lenN-2) * 8) | ... | byte[lenN-1]

     The result is then ANDed with ~mask and ORed with maskN.  Codesets 3 and
     4 are special in that the leading byte (0x8e or 0x8f) is first removed
     and the lenN argument is reduced by 1.
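
     The conversion described above can be sketched in a few lines of Python.
     This is only an illustration of the rules on this page, not the libc
     implementation: the names euc_to_wide and JA_JP_EUCJP are hypothetical,
     and the parameter values are the ones from the ja_JP.eucJP VARIABLE line
     used as the example on this page.

```python
# Illustrative sketch only: euc_to_wide and JA_JP_EUCJP are hypothetical
# names, not part of libc or mklocale(1).

# (lenN, maskN) for each codeset, plus the global mask, as given by the
# ja_JP.eucJP VARIABLE line: 1 0x0000 2 0x8080 2 0x0080 3 0x8000 0x8080
JA_JP_EUCJP = {
    "codesets": [(1, 0x0000), (2, 0x8080), (2, 0x0080), (3, 0x8000)],
    "mask": 0x8080,
}


def euc_to_wide(data, params=JA_JP_EUCJP):
    """Return (wide value, bytes consumed) for the first character in data."""
    if not data:
        raise ValueError("empty input")
    lead = data[0]
    # Select the codeset (0-based index) from the leading byte.
    if lead < 0x80:
        index = 0
    elif lead == 0x8E:
        index = 2
    elif lead == 0x8F:
        index = 3
    else:
        index = 1
    length, set_mask = params["codesets"][index]
    if len(data) < length:
        raise ValueError("truncated multibyte character")
    body = data[:length]
    # The 0x8e / 0x8f lead byte is dropped before the bytes are packed.
    if index >= 2:
        body = body[1:]
    # Pack the remaining bytes, most significant byte first.
    value = 0
    for byte in body:
        value = (value << 8) | byte
    # Clear the global mask bits, then set this codeset's own mask bits.
    return (value & ~params["mask"]) | set_mask, length


for sample in (b"A", b"\xa4\xa2", b"\x8e\xb1", b"\x8f\xb0\xa1"):
    wide, used = euc_to_wide(sample)
    print(f"{sample.hex()} -> 0x{wide:04x} ({used} bytes)")
```

     The four sample inputs exercise one character from each codeset; the
     function returns the number of bytes consumed so that a caller could
     step through a longer byte string.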

     For example, the ja_JP.eucJP locale has the following VARIABLE line:

     VARIABLE 1 0x0000 2 0x8080 2 0x0080 3 0x8000 0x8080

     Codeset 1 consists of the values 0x0000 - 0x007f.

     Codeset 2 consists of the values that have both bits in 0x8080 set.

     Codeset 3 consists of the values 0x0080 - 0x00ff.

     Codeset 4 consists of the values 0x8000 - 0xff7f excluding the values
     which have the 0x0080 bit set.

     Notice that the global mask is set to 0x8080; this implies that the
     codeset can be determined from those 2 bits.
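
     As a rough illustration of that last point, the sketch below recovers
     the codeset number from a wide character value by looking only at the
     bits in the global mask.  The helper name codeset_of is hypothetical,
     and the approach assumes, as is the case for ja_JP.eucJP, that the four
     maskN values are all distinct.

```python
# Illustrative sketch only: codeset_of is a hypothetical helper.
SET_MASKS = (0x0000, 0x8080, 0x0080, 0x8000)  # mask1..mask4 for ja_JP.eucJP
GLOBAL_MASK = 0x8080


def codeset_of(wide):
    """Return the 1-based codeset number of a wide character value."""
    # Only the bits covered by the global mask identify the codeset.
    return SET_MASKS.index(wide & GLOBAL_MASK) + 1


for wide in (0x0041, 0xA4A2, 0x00B1, 0xB021):
    print(f"0x{wide:04x} -> codeset {codeset_of(wide)}")
```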

SEE ALSO

     mklocale(1), setlocale(3)


FreeBSD 5.2.1		       November 8, 2003 		 FreeBSD 5.2.1
