.\" Automatically generated by Pod::Man 2.28 (Pod::Simple 3.28)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" Set up some character translations and predefined strings. \*(-- will
.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
.\" double quote, and \*(R" will give a right double quote. \*(C+ will
.\" give a nicer C++. Capital omega is used to do unbreakable dashes and
.\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff,
.\" nothing in troff, for use with C<>.
.tr \(*W-
.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
.ie n \{\
. ds -- \(*W-
. ds PI pi
. if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
. if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch
. ds L" ""
. ds R" ""
. ds C` ""
. ds C' ""
'br\}
.el\{\
. ds -- \|\(em\|
. ds PI \(*p
. ds L" ``
. ds R" ''
. ds C`
. ds C'
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.\"
.\" If the F register is turned on, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD. Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.\"
.\" Avoid warning from groff about undefined register 'F'.
.de IX
..
.nr rF 0
.if \n(.g .if rF .nr rF 1
.if (\n(rF:(\n(.g==0)) \{
. if \nF \{
. de IX
. tm Index:\\$1\t\\n%\t"\\$2"
..
. if !\nF==2 \{
. nr % 0
. nr F 2
. \}
. \}
.\}
.rr rF
.\"
.\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2).
.\" Fear. Run. Save yourself. No user-serviceable parts.
. \" fudge factors for nroff and troff
.if n \{\
. ds #H 0
. ds #V .8m
. ds #F .3m
. ds #[ \f1
. ds #] \fP
.\}
.if t \{\
. ds #H ((1u-(\\\\n(.fu%2u))*.13m)
. ds #V .6m
. ds #F 0
. ds #[ \&
. ds #] \&
.\}
. \" simple accents for nroff and troff
.if n \{\
. ds ' \&
. ds ` \&
. ds ^ \&
. ds , \&
. ds ~ ~
. ds /
.\}
.if t \{\
. ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u"
. ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u'
. ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u'
. ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u'
. ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u'
. ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u'
.\}
. \" troff and (daisy-wheel) nroff accents
.ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V'
.ds 8 \h'\*(#H'\(*b\h'-\*(#H'
.ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#]
.ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H'
.ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u'
.ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#]
.ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#]
.ds ae a\h'-(\w'a'u*4/10)'e
.ds Ae A\h'-(\w'A'u*4/10)'E
. \" corrections for vroff
.if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u'
.if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u'
. \" for low resolution devices (crt and lpr)
.if \n(.H>23 .if \n(.V>19 \
\{\
. ds : e
. ds 8 ss
. ds o a
. ds d- d\h'-1'\(ga
. ds D- D\h'-1'\(hy
. ds th \o'bp'
. ds Th \o'LP'
. ds ae ae
. ds Ae AE
.\}
.rm #[ #] #H #V #F C
.\" ========================================================================
.\"
.IX Title "Unicode::String 3"
.TH Unicode::String 3 "2005-10-26" "perl v5.20.0" "User Contributed Perl Documentation"
.\" For nroff, turn off justification. Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH "NAME"
Unicode::String \- String of Unicode characters (UTF\-16BE)
.SH "SYNOPSIS"
.IX Header "SYNOPSIS"
.Vb 1
\& use Unicode::String qw(utf8 latin1 utf16be);
\&
\& $u = utf8("string");
\& $u = latin1("string");
\& $u = utf16be("\e0s\e0t\e0r\e0i\e0n\e0g");
\&
\& print $u\->utf32be; # 4 byte characters
\& print $u\->utf16le; # 2 byte characters + surrogates
\& print $u\->utf8; # 1\-4 byte characters
.Ve
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
A \f(CW\*(C`Unicode::String\*(C'\fR object represents a sequence of Unicode
characters. Methods are provided to convert between various external
formats (encodings) and \f(CW\*(C`Unicode::String\*(C'\fR objects, and methods are
provided for common string manipulations.
.PP
The functions \fIutf32be()\fR, \fIutf32le()\fR, \fIutf16be()\fR, \fIutf16le()\fR, \fIutf8()\fR,
\&\fIutf7()\fR, \fIlatin1()\fR, \fIuhex()\fR, \fIuchr()\fR can be imported from the
\&\f(CW\*(C`Unicode::String\*(C'\fR module and will work as constructors initializing
strings of the corresponding encoding.
.PP
The \f(CW\*(C`Unicode::String\*(C'\fR objects overload various operators, which means
that in most cases they can be treated like plain strings.
.PP
Internally a \f(CW\*(C`Unicode::String\*(C'\fR object is represented by a string of 2
byte numbers in network byte order (big-endian). This representation
is not exposed through the provided \s-1API,\s0 but it is useful to know when
predicting the efficiency of the provided methods.
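.PP
As a minimal sketch of the points above (variable names are illustrative
only, and the default \fIstringify_as()\fR encoding is assumed), the following
builds an object from \s-1ISO\-8859\-1\s0 bytes, uses the overloaded
concatenation operator, and reads the result back in several formats:
.PP
.Vb 7
\& use Unicode::String qw(latin1);
\&
\& my $u = latin1("\exB5m");    # U+00B5 followed by "m"
\& my $v = $u . " scale";      # overloaded "."; the plain string is decoded as UTF\-8
\& print $v\->length, "\en";     # length in 16\-bit units
\& print $v\->utf8, "\en";       # value as UTF\-8 bytes
\& print $v\->hex, "\en";        # value in "U+XXXX" notation
.Ve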
.SS "\s-1METHODS\s0"
.IX Subsection "METHODS"
.SS "Class methods"
.IX Subsection "Class methods"
The following class methods are available:
.IP "Unicode::String\->stringify_as" 4
.IX Item "Unicode::String->stringify_as"
.PD 0
.ie n .IP "Unicode::String\->stringify_as( $enc )" 4
.el .IP "Unicode::String\->stringify_as( \f(CW$enc\fR )" 4
.IX Item "Unicode::String->stringify_as( $enc )"
.PD
This method is used to specify which encoding will be used when
\&\f(CW\*(C`Unicode::String\*(C'\fR objects are implicitly converted to and from plain
strings.
.Sp
If an argument is provided it sets the current encoding. The argument
must be one of the following: \*(L"ucs4\*(R", \*(L"utf32\*(R", \*(L"utf32be\*(R",
\&\*(L"utf32le\*(R", \*(L"ucs2\*(R", \*(L"utf16\*(R", \*(L"utf16be\*(R", \*(L"utf16le\*(R", \*(L"utf8\*(R", \*(L"utf7\*(R",
\&\*(L"latin1\*(R" or \*(L"hex\*(R". The default is \*(L"utf8\*(R".
.Sp
The \fIstringify_as()\fR method returns a reference to the current encoding
function.
.ie n .IP "$us = Unicode::String\->new" 4
.el .IP "\f(CW$us\fR = Unicode::String\->new" 4
.IX Item "$us = Unicode::String->new"
.PD 0
.ie n .IP "$us = Unicode::String\->new( $initial_value )" 4
.el .IP "\f(CW$us\fR = Unicode::String\->new( \f(CW$initial_value\fR )" 4
.IX Item "$us = Unicode::String->new( $initial_value )"
.PD
This is the object constructor. Without an argument, it creates an empty
\&\f(CW\*(C`Unicode::String\*(C'\fR object. If an \f(CW$initial_value\fR argument is given, it
is decoded according to the encoding selected with \fIstringify_as()\fR, \s-1UTF\-8\s0
by default.
.Sp
In general it is recommended to import and use one of the encoding
specific constructor functions instead of invoking this method.
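.PP
The interaction between \fIstringify_as()\fR and \fInew()\fR can be sketched as
follows. This is an illustration only: it assumes the setting is shared
by all objects, as the class method interface suggests, and the variable
names are hypothetical.
.PP
.Vb 7
\& use Unicode::String qw(utf8);
\&
\& Unicode::String\->stringify_as("latin1");  # change the implicit encoding
\& my $u = utf8("\exC2\exB5m");
\& my $plain = "$u";                         # stringified as ISO\-8859\-1
\& my $w = Unicode::String\->new("\exB5m");    # new() decodes ISO\-8859\-1 too
\& Unicode::String\->stringify_as("utf8");    # back to the documented default
.Ve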
.SS "Encoding methods"
.IX Subsection "Encoding methods"
These methods get or set the value of the \f(CW\*(C`Unicode::String\*(C'\fR object by
passing strings in the corresponding encoding. If a new value is
passed as argument it will set the value of the \f(CW\*(C`Unicode::String\*(C'\fR,
and the previous value is returned. If no argument is passed then the
current value is returned.
.PP
To illustrate the encodings we show how the 2 character sample string
\*(L"\(mcm\*(R" (micrometer: U+00B5 followed by \*(L"m\*(R") is encoded in each one.
.ie n .IP "$us\->utf32be" 4
.el .IP "\f(CW$us\fR\->utf32be" 4
.IX Item "$us->utf32be"
.PD 0
.ie n .IP "$us\->utf32be( $newval )" 4
.el .IP "\f(CW$us\fR\->utf32be( \f(CW$newval\fR )" 4
.IX Item "$us->utf32be( $newval )"
.PD
The string passed should be in the \s-1UTF\-32\s0 encoding with bytes in big
endian order. The sample \*(L"\(mcm\*(R" is \*(L"\e0\e0\e0\exB5\e0\e0\e0m\*(R" in this encoding.
.Sp
Alternative names for this method are \fIutf32()\fR and \fIucs4()\fR.
.ie n .IP "$us\->utf32le" 4
.el .IP "\f(CW$us\fR\->utf32le" 4
.IX Item "$us->utf32le"
.PD 0
.ie n .IP "$us\->utf32le( $newval )" 4
.el .IP "\f(CW$us\fR\->utf32le( \f(CW$newval\fR )" 4
.IX Item "$us->utf32le( $newval )"
.PD
The string passed should be in the \s-1UTF\-32\s0 encoding with bytes in little
endian order. The sample \*(L"\(mcm\*(R" is \*(L"\exB5\e0\e0\e0m\e0\e0\e0\*(R" in this encoding.
.ie n .IP "$us\->utf16be" 4
.el .IP "\f(CW$us\fR\->utf16be" 4
.IX Item "$us->utf16be"
.PD 0
.ie n .IP "$us\->utf16be( $newval )" 4
.el .IP "\f(CW$us\fR\->utf16be( \f(CW$newval\fR )" 4
.IX Item "$us->utf16be( $newval )"
.PD
The string passed should be in the \s-1UTF\-16\s0 encoding with bytes in big
endian order. The sample \*(L"\(mcm\*(R" is \*(L"\e0\exB5\e0m\*(R" in this encoding.
.Sp
Alternative names for this method are \fIutf16()\fR and \fIucs2()\fR.
.Sp
If the string passed to \fIutf16be()\fR starts with the Unicode byte order
mark in little endian order, the result is as if \fIutf16le()\fR was called
instead.
.ie n .IP "$us\->utf16le" 4
.el .IP "\f(CW$us\fR\->utf16le" 4
.IX Item "$us->utf16le"
.PD 0
.ie n .IP "$us\->utf16le( $newval )" 4
.el .IP "\f(CW$us\fR\->utf16le( \f(CW$newval\fR )" 4
.IX Item "$us->utf16le( $newval )"
.PD
The string passed should be in the \s-1UTF\-16\s0 encoding with bytes in
little endian order. The sample \*(L"\(mcm\*(R" is \*(L"\exB5\e0m\e0\*(R" in this
encoding. This is the encoding used by the Microsoft Windows \s-1API.\s0
.Sp
If the string passed to \fIutf16le()\fR starts with the Unicode byte order
mark in big endian order, the result is as if \fIutf16be()\fR was called
instead.
.ie n .IP "$us\->utf8" 4
.el .IP "\f(CW$us\fR\->utf8" 4
.IX Item "$us->utf8"
.PD 0
.ie n .IP "$us\->utf8( $newval )" 4
.el .IP "\f(CW$us\fR\->utf8( \f(CW$newval\fR )" 4
.IX Item "$us->utf8( $newval )"
.PD
The string passed should be in the \s-1UTF\-8\s0 encoding. The sample \*(L"\(mcm\*(R" is
\&\*(L"\exC2\exB5m\*(R" in this encoding.
.ie n .IP "$us\->utf7" 4
.el .IP "\f(CW$us\fR\->utf7" 4
.IX Item "$us->utf7"
.PD 0
.ie n .IP "$us\->utf7( $newval )" 4
.el .IP "\f(CW$us\fR\->utf7( \f(CW$newval\fR )" 4
.IX Item "$us->utf7( $newval )"
.PD
The string passed should be in the \s-1UTF\-7\s0 encoding. The sample \*(L"\(mcm\*(R" is
\&\*(L"+ALU\-m\*(R" in this encoding.
.Sp
The \s-1UTF\-7\s0 encoding uses only plain US-ASCII characters for the
encoding. This makes it safe for transport through protocols that strip
the eighth bit. Characters outside the US-ASCII range are base64\-encoded
and '+' is used as an escape character. The \s-1UTF\-7\s0 encoding is
described in \s-1RFC 1642.\s0
.Sp
If the (global) variable \f(CW$Unicode::String::UTF7_OPTIONAL_DIRECT_CHARS\fR
is \s-1TRUE,\s0 then a wider range of characters is encoded as themselves.
It is \s-1TRUE\s0 by default. The characters affected by this are:
.Sp
.Vb 1
\& ! " # $ % & * ; < = > @ [ ] ^ _ \` { | }
.Ve
.ie n .IP "$us\->latin1" 4
.el .IP "\f(CW$us\fR\->latin1" 4
.IX Item "$us->latin1"
.PD 0
.ie n .IP "$us\->latin1( $newval )" 4
.el .IP "\f(CW$us\fR\->latin1( \f(CW$newval\fR )" 4
.IX Item "$us->latin1( $newval )"
.PD
The string passed should be in the \s-1ISO\-8859\-1\s0 encoding. The sample \*(L"\(mcm\*(R" is
\&\*(L"\exB5m\*(R" in this encoding.
.Sp
Characters outside the \*(L"\ex00\*(R" .. \*(L"\exFF\*(R" range are simply removed from
the return value of the \fIlatin1()\fR method. If you want more control
over the mapping from Unicode to \s-1ISO\-8859\-1,\s0 use the \f(CW\*(C`Unicode::Map8\*(C'\fR
class. This is also the way to deal with other 8\-bit character sets.
.ie n .IP "$us\->hex" 4
.el .IP "\f(CW$us\fR\->hex" 4
.IX Item "$us->hex"
.PD 0
.ie n .IP "$us\->hex( $newval )" 4
.el .IP "\f(CW$us\fR\->hex( \f(CW$newval\fR )" 4
.IX Item "$us->hex( $newval )"
.PD
The string passed should be plain \s-1ASCII\s0 where each Unicode character
is represented by the \*(L"U+XXXX\*(R" string and separated by a single space
character. The \*(L"U+\*(R" prefix is optional when setting the value. The
sample \*(L"\(mcm\*(R" is \*(L"U+00b5 U+006d\*(R" in this encoding.
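.PP
To make the get/set behaviour and the byte layouts above concrete, here is
a small sketch (names are illustrative) that dumps the sample string as
hexadecimal bytes in several encodings and then replaces the value:
.PP
.Vb 10
\& use Unicode::String qw(utf8);
\&
\& my $u = utf8("\exC2\exB5m");
\& for my $enc (qw(utf32be utf16be utf16le utf8 utf7 latin1)) {
\&     my $bytes = $u\->$enc;              # getter: no argument passed
\&     printf "%\-8s %s\en", $enc,
\&         join " ", map { sprintf "%02X", ord } split //, $bytes;
\& }
\&
\& my $old = $u\->latin1("m");             # setter: returns the previous value
.Ve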
.SS "String Operations"
.IX Subsection "String Operations"
The following methods are available:
.ie n .IP "$us\->as_string" 4
.el .IP "\f(CW$us\fR\->as_string" 4
.IX Item "$us->as_string"
Converts a \f(CW\*(C`Unicode::String\*(C'\fR to a plain string according to the
setting of \fIstringify_as()\fR. The default \fIstringify_as()\fR encoding is
\&\*(L"utf8\*(R".
.ie n .IP "$us\->as_num" 4
.el .IP "\f(CW$us\fR\->as_num" 4
.IX Item "$us->as_num"
Converts a \f(CW\*(C`Unicode::String\*(C'\fR to a number. Currently only the digits
in the range 0x30 .. 0x39 are recognized. The plan is to eventually
support all Unicode digit characters.
.ie n .IP "$us\->as_bool" 4
.el .IP "\f(CW$us\fR\->as_bool" 4
.IX Item "$us->as_bool"
Converts a \f(CW\*(C`Unicode::String\*(C'\fR to a boolean value. Only the empty
string is \s-1FALSE. A\s0 string consisting of only the character U+0030 is
considered \s-1TRUE,\s0 even though Perl considers \*(L"0\*(R" to be \s-1FALSE.\s0
.ie n .IP "$us\->repeat( $count )" 4
.el .IP "\f(CW$us\fR\->repeat( \f(CW$count\fR )" 4
.IX Item "$us->repeat( $count )"
Returns a new \f(CW\*(C`Unicode::String\*(C'\fR where the content of \f(CW$us\fR is repeated
\&\f(CW$count\fR times. This operation is also overloaded as:
.Sp
.Vb 1
\& $us x $count
.Ve
.ie n .IP "$us\->concat( $other_string )" 4
.el .IP "\f(CW$us\fR\->concat( \f(CW$other_string\fR )" 4
.IX Item "$us->concat( $other_string )"
Concatenates the string \f(CW$us\fR and the string \f(CW$other_string\fR. If
\&\f(CW$other_string\fR is not a \f(CW\*(C`Unicode::String\*(C'\fR object, then it is first
passed to the Unicode::String\->new constructor function. This
operation is also overloaded as:
.Sp
.Vb 1
\& $us . $other_string
.Ve
.ie n .IP "$us\->append( $other_string )" 4
.el .IP "\f(CW$us\fR\->append( \f(CW$other_string\fR )" 4
.IX Item "$us->append( $other_string )"
Appends the string \f(CW$other_string\fR to the value of \f(CW$us\fR. If
\&\f(CW$other_string\fR is not a \f(CW\*(C`Unicode::String\*(C'\fR object, then it is first
passed to the Unicode::String\->new constructor function. This
operation is also overloaded as:
.Sp
.Vb 1
\& $us .= $other_string
.Ve
.ie n .IP "$us\->copy" 4
.el .IP "\f(CW$us\fR\->copy" 4
.IX Item "$us->copy"
Returns a copy of the current \f(CW\*(C`Unicode::String\*(C'\fR object. This
operation is overloaded as the assignment operator.
.ie n .IP "$us\->length" 4
.el .IP "\f(CW$us\fR\->length" 4
.IX Item "$us->length"
Returns the length of the \f(CW\*(C`Unicode::String\*(C'\fR in 16\-bit units. A
surrogate pair is therefore still counted as 2.
.ie n .IP "$us\->byteswap" 4
.el .IP "\f(CW$us\fR\->byteswap" 4
.IX Item "$us->byteswap"
This method will swap the bytes in the internal representation of the
\&\f(CW\*(C`Unicode::String\*(C'\fR object.
.Sp
Unicode reserves the character U+FEFF as a byte order mark.
This works because the swapped character, U+FFFE, is reserved and will
never be valid. For strings that have the byte order mark as the first
character, we can guarantee to get the byte order right with the
following code:
.Sp
.Vb 1
\& $ustr\->byteswap if $ustr\->ord == 0xFFFE;
.Ve
.ie n .IP "$us\->unpack" 4
.el .IP "\f(CW$us\fR\->unpack" 4
.IX Item "$us->unpack"
Returns a list of integers, each representing a \s-1UCS\-2\s0 character code.
.ie n .IP "$us\->pack( @uchr )" 4
.el .IP "\f(CW$us\fR\->pack( \f(CW@uchr\fR )" 4
.IX Item "$us->pack( @uchr )"
Sets the value of \f(CW$us\fR to a sequence of \s-1UCS\-2\s0 characters with the
character codes given as parameters.
.ie n .IP "$us\->ord" 4
.el .IP "\f(CW$us\fR\->ord" 4
.IX Item "$us->ord"
Returns the character code of the first character in \f(CW$us\fR. The \fIord()\fR
method deals with surrogate pairs, which gives us a result-range of
0x0 .. 0x10FFFF. If the \f(CW$us\fR string is empty, undef is returned.
.ie n .IP "$us\->chr( $code )" 4
.el .IP "\f(CW$us\fR\->chr( \f(CW$code\fR )" 4
.IX Item "$us->chr( $code )"
Sets the value of \f(CW$us\fR to be a string containing the character assigned
code \f(CW$code\fR. The argument \f(CW$code\fR must be an integer in the range 0x0
\&.. 0x10FFFF. If the code is greater than 0xFFFF then a surrogate pair
is created.
.ie n .IP "$us\->name" 4
.el .IP "\f(CW$us\fR\->name" 4
.IX Item "$us->name"
In scalar context returns the official Unicode name of the first
character in \f(CW$us\fR. In list context returns the names of all characters
in \f(CW$us\fR. Also see Unicode::CharName.
.ie n .IP "$us\->substr( $offset )" 4
.el .IP "\f(CW$us\fR\->substr( \f(CW$offset\fR )" 4
.IX Item "$us->substr( $offset )"
.PD 0
.ie n .IP "$us\->substr( $offset, $length )" 4
.el .IP "\f(CW$us\fR\->substr( \f(CW$offset\fR, \f(CW$length\fR )" 4
.IX Item "$us->substr( $offset, $length )"
.ie n .IP "$us\->substr( $offset, $length, $subst )" 4
.el .IP "\f(CW$us\fR\->substr( \f(CW$offset\fR, \f(CW$length\fR, \f(CW$subst\fR )" 4
.IX Item "$us->substr( $offset, $length, $subst )"
.PD
Returns a sub-string of \f(CW$us\fR. Works similarly to the builtin \fIsubstr()\fR
function.
.ie n .IP "$us\->index( $other )" 4
.el .IP "\f(CW$us\fR\->index( \f(CW$other\fR )" 4
.IX Item "$us->index( $other )"
.PD 0
.ie n .IP "$us\->index( $other, $pos )" 4
.el .IP "\f(CW$us\fR\->index( \f(CW$other\fR, \f(CW$pos\fR )" 4
.IX Item "$us->index( $other, $pos )"
.PD
Locates the position of \f(CW$other\fR within \f(CW$us\fR, possibly starting the
search at position \f(CW$pos\fR.
.ie n .IP "$us\->chop" 4
.el .IP "\f(CW$us\fR\->chop" 4
.IX Item "$us->chop"
Chops off the last character of \f(CW$us\fR and returns it (as a
\&\f(CW\*(C`Unicode::String\*(C'\fR object).
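.PP
A short sketch of several of the operations above; it assumes the default
\fIstringify_as()\fR encoding (\s-1UTF\-8\s0) and uses illustrative variable names:
.PP
.Vb 11
\& use Unicode::String qw(utf8);
\&
\& my $u = Unicode::String\->new;
\& $u\->chr(0x1F600);               # above 0xFFFF: stored as a surrogate pair
\& printf "%d units, U+%X\en", $u\->length, scalar $u\->ord;
\&
\& my $s    = utf8("Unicode");
\& my $sub  = $s\->substr(0, 3);    # new object holding "Uni"
\& my $pos  = $s\->index("code");   # position of "code" within $s
\& my $last = $s\->chop;            # removes and returns the final "e"
\& print $sub\->utf8, " $pos ", $last\->utf8, "\en";
.Ve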
.SH "FUNCTIONS"
.IX Header "FUNCTIONS"
The following functions are provided. None of these are exported by default.
.ie n .IP "byteswap2( $str, ... )" 4
.el .IP "byteswap2( \f(CW$str\fR, ... )" 4
.IX Item "byteswap2( $str, ... )"
This function swaps each pair of adjacent bytes in the strings passed as
arguments. If this function is called in void context,
then it will modify its arguments in-place. Otherwise, the swapped
strings are returned.
.ie n .IP "byteswap4( $str, ... )" 4
.el .IP "byteswap4( \f(CW$str\fR, ... )" 4
.IX Item "byteswap4( $str, ... )"
The byteswap4 function works similarly to byteswap2, but reverses
the byte order within each group of 4 bytes.
.ie n .IP "latin1( $str )" 4
.el .IP "latin1( \f(CW$str\fR )" 4
.IX Item "latin1( $str )"
.PD 0
.ie n .IP "utf7( $str )" 4
.el .IP "utf7( \f(CW$str\fR )" 4
.IX Item "utf7( $str )"
.ie n .IP "utf8( $str )" 4
.el .IP "utf8( \f(CW$str\fR )" 4
.IX Item "utf8( $str )"
.ie n .IP "utf16le( $str )" 4
.el .IP "utf16le( \f(CW$str\fR )" 4
.IX Item "utf16le( $str )"
.ie n .IP "utf16be( $str )" 4
.el .IP "utf16be( \f(CW$str\fR )" 4
.IX Item "utf16be( $str )"
.ie n .IP "utf32le( $str )" 4
.el .IP "utf32le( \f(CW$str\fR )" 4
.IX Item "utf32le( $str )"
.ie n .IP "utf32be( $str )" 4
.el .IP "utf32be( \f(CW$str\fR )" 4
.IX Item "utf32be( $str )"
.PD
Constructor functions for the various Unicode encodings. These return
new \f(CW\*(C`Unicode::String\*(C'\fR objects. The provided argument should be
encoded correspondingly.
.ie n .IP "uhex( $str )" 4
.el .IP "uhex( \f(CW$str\fR )" 4
.IX Item "uhex( $str )"
Constructs a new \f(CW\*(C`Unicode::String\*(C'\fR object from a string of hex
values. See \fIhex()\fR method above for description of the format.
.ie n .IP "uchr( $num )" 4
.el .IP "uchr( \f(CW$num\fR )" 4
.IX Item "uchr( $num )"
Constructs a new one character \f(CW\*(C`Unicode::String\*(C'\fR object from a
Unicode character code. This works similarly to perl's builtin \fIchr()\fR
function.
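.PP
The following sketch (illustrative names only) contrasts the two calling
conventions of the byte swapping functions and shows \fIuhex()\fR used as a
constructor:
.PP
.Vb 7
\& use Unicode::String qw(byteswap2 byteswap4 uhex);
\&
\& my $le = byteswap2("\e0U\e0n");   # value is used: a swapped copy is returned
\& my $buf = "\e0\e0\e0\exB5";
\& byteswap4($buf);                 # void context: $buf is modified in place
\&
\& my $u = uhex("U+00b5 U+006d");   # same format as the hex() method
.Ve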
.SH "SEE ALSO"
.IX Header "SEE ALSO"
Unicode::CharName,
Unicode::Map8
.PP
<http://www.unicode.org/>
.PP
perlunicode
.SH "COPYRIGHT"
.IX Header "COPYRIGHT"
Copyright 1997\-2000,2005 Gisle Aas.
.PP
This library is free software; you can redistribute it and/or
modify it under the same terms as Perl itself.