Command: mbrtoc16 | Section: 3 | Source: OpenBSD | File: mbrtoc16.3
MBRTOC16(3) FreeBSD Library Functions Manual MBRTOC16(3)
NAME
mbrtoc16 - convert one UTF-8 encoded character to UTF-16
SYNOPSIS
#include <uchar.h>
size_t
mbrtoc16(char16_t * restrict pc16, const char * restrict s, size_t n,
mbstate_t * restrict mbs);
DESCRIPTION
The mbrtoc16() function examines at most n bytes of the multibyte
character byte string pointed to by s, converts those bytes to a wide
character, and encodes the wide character using UTF-16. In some cases,
it is necessary to call this function twice to convert a single
character.
Conversion happens in accordance with the conversion state *mbs, which
must be initialized to zero before the application's first call to
mbrtoc16(). For this function, *mbs stores information about both the
state of the UTF-8 input encoding and the state of the UTF-16 output
encoding. If the previous call did not return (size_t)-1, mbs can safely
be reused without reinitialization.
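
The initialization requirement can be sketched as follows. This is a minimal illustration, not part of the library: the helper name first_unit() is hypothetical, and the input is assumed to be a NUL-terminated string that is valid in the current locale.

```c
#include <string.h>
#include <uchar.h>

/*
 * Hypothetical helper: convert the first character of the
 * NUL-terminated string s using a caller-owned conversion state.
 * Returns whatever mbrtoc16() returns.
 */
size_t
first_unit(const char *s, char16_t *out)
{
    mbstate_t mbs;

    /* The state must be all zeroes before the first call. */
    memset(&mbs, 0, sizeof(mbs));

    /* Pass strlen(s) + 1 so that the terminating NUL can be seen. */
    return mbrtoc16(out, s, strlen(s) + 1, &mbs);
}
```

Keeping the mbstate_t object in the caller makes it possible to zero it again after an error, which is not possible with the internal state used when mbs is NULL.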
The input encoding that mbrtoc16() uses for s is determined by the
LC_CTYPE category of the current locale. If the locale is changed
without reinitialization of *mbs, the behaviour is undefined.
Unlike mbtowc(3), mbrtoc16() accepts an incomplete byte sequence pointed
to by s which does not form a complete character but is potentially part
of a valid character. In this case, the function consumes all such
bytes. The conversion state saved in *mbs will be used to restart the
suspended conversion during the next call.
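
A restartable conversion might look like the sketch below, which feeds the input one byte at a time, as if the bytes arrived separately. The helper name decode_bytewise() and its pending counter are assumptions made for illustration only.

```c
#include <string.h>
#include <uchar.h>

/*
 * Hypothetical helper: pass buf to mbrtoc16() one byte per call.
 * *pending counts the calls that returned (size_t)-2, that is, the
 * bytes that were consumed without completing a character.
 * Returns the result of the last call made, or (size_t)-2 if len
 * is zero or the input ended in the middle of a character.
 */
size_t
decode_bytewise(const char *buf, size_t len, char16_t *out, int *pending)
{
    mbstate_t mbs;
    size_t i, rv;

    memset(&mbs, 0, sizeof(mbs));
    *pending = 0;
    rv = (size_t)-2;
    for (i = 0; i < len; i++) {
        rv = mbrtoc16(out, buf + i, 1, &mbs);
        if (rv != (size_t)-2)
            break;          /* complete character, NUL, or error */
        (*pending)++;       /* byte consumed, state saved in mbs */
    }
    return rv;
}
```

In a UTF-8 locale, a two-byte character would be expected to produce one pending call followed by a call that completes the character; for plain ASCII input no call is pending.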
On systems other than OpenBSD that support state-dependent encodings, s
may point to a special sequence of bytes called a "shift sequence"; see
mbrtowc(3) for details.
The following arguments cause special processing:
pc16 == NULL  The conversion from a multibyte character to a wide
              character is performed and the conversion state may be
              affected, but the resulting wide character is discarded.
s == NULL     The arguments pc16 and n are ignored and starting or
              continuing the conversion with an empty string is
              attempted, discarding the conversion result.
mbs == NULL   An internal mbstate_t object specific to the mbrtoc16()
              function is used instead of the mbs argument. This
              internal object is automatically initialized at program
              startup and never changed by any libc function except
              mbrtoc16().
              If mbrtoc16() is called with a NULL mbs argument and that
              call returns (size_t)-1, the internal conversion state of
              mbrtoc16() becomes permanently undefined and there is no
              way to reset it to any defined state. Consequently, after
              such a mishap, it is not safe to call mbrtoc16() with a
              NULL mbs argument ever again until the program is
              terminated.
RETURN VALUES
0           The bytes pointed to by s form a terminating NUL character.
            If pc16 is not NULL, a NUL wide character has been stored
            in *pc16.
positive    s points to a valid character, and the value returned is
            the number of bytes completing the character. If pc16 is
            not NULL, the first UTF-16 code unit of the corresponding
            wide character has been stored in *pc16. If it is a
            UTF-16 high surrogate, the function needs to be called
            again to retrieve a second UTF-16 code unit, the low
            surrogate. On OpenBSD, this happens if and only if the
            return value is 4, but this equivalence does not hold on
            other operating systems that support input encodings other
            than UTF-8.
(size_t)-1  s points to an illegal byte sequence which does not form a
            valid multibyte character in the current locale, or mbs
            points to an invalid or uninitialized object. errno is set
            to EILSEQ or EINVAL, respectively. The conversion state
            object pointed to by mbs is left in an undefined state and
            must be reinitialized before being used again.
(size_t)-2  s points to an incomplete byte sequence of length n which
            has been consumed and contains part of a valid multibyte
            character. The character may be completed by calling the
            same function again with s pointing to one or more
            subsequent bytes of the multibyte character and mbs
            pointing to the conversion state object used during
            conversion of the incomplete byte sequence.
(size_t)-3  The second 16-bit code unit resulting from a previous call
            has been stored into *pc16, without consuming any
            additional bytes from s.
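
The return values above suggest a conversion loop along the following lines. This is a sketch under stated assumptions rather than a reference implementation: the helper name mbs_to_utf16() is hypothetical, the input is assumed to be NUL-terminated, and an incomplete trailing sequence is simply treated as an error.

```c
#include <string.h>
#include <uchar.h>

/*
 * Hypothetical helper: convert the NUL-terminated multibyte string s
 * to UTF-16, storing at most max code units in dst.  No terminator is
 * stored.  Returns the number of code units stored, or (size_t)-1 if
 * the input is invalid or incomplete.
 */
size_t
mbs_to_utf16(char16_t *dst, size_t max, const char *s)
{
    mbstate_t mbs;
    size_t len, n, rv;
    char16_t c;

    memset(&mbs, 0, sizeof(mbs));
    len = strlen(s) + 1;        /* include the terminating NUL */
    n = 0;
    while (n < max) {
        rv = mbrtoc16(&c, s, len, &mbs);
        if (rv == 0)
            break;              /* terminating NUL reached */
        if (rv == (size_t)-1 || rv == (size_t)-2)
            return (size_t)-1;  /* invalid or incomplete input */
        dst[n++] = c;
        if (rv == (size_t)-3)
            continue;           /* low surrogate, no input consumed */
        s += rv;                /* positive: skip the consumed bytes */
        len -= rv;
    }
    return n;
}
```

Note that if max is reached right after a high surrogate has been stored, the output ends with an unpaired surrogate; a caller that cannot tolerate this would need to reserve room for two code units per iteration.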
ERRORS
mbrtoc16() causes an error in the following cases:
[EILSEQ]  s points to an invalid multibyte character.
[EINVAL]  mbs points to an invalid or uninitialized mbstate_t
          object.
SEE ALSO
c16rtomb(3), mbrtowc(3), setlocale(3)
STANDARDS
mbrtoc16() conforms to ISO/IEC 9899:2011 ("ISO C11").
HISTORY
mbrtoc16() has been available since OpenBSD 7.4.
CAVEATS
On operating systems other than OpenBSD that support input encodings
other than UTF-8, inspecting the return value is insufficient to tell
whether the function needs to be called again. If the return value is
positive, inspecting *pc16 is also required to make that decision.
Consequently, passing a NULL pointer for the pc16 argument is discouraged
because it can result in a well-defined but unknown output encoding
state. The simplest way to recover from such an unknown state is to
reinitialize the object pointed to by mbs.
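
Inspecting *pc16 as described above amounts to a range check on the code unit. The helpers below are hypothetical names used for illustration; the ranges D800-DBFF for high surrogates and DC00-DFFF for low surrogates are those defined by UTF-16.

```c
#include <uchar.h>

/*
 * Hypothetical helper: nonzero if c is a UTF-16 high surrogate,
 * meaning that mbrtoc16() has to be called again to fetch the low
 * surrogate.
 */
int
is_high_surrogate(char16_t c)
{
    return c >= 0xD800 && c <= 0xDBFF;
}

/*
 * Hypothetical helper: combine a high and a low surrogate into the
 * code point they encode.  The caller is assumed to pass a valid pair.
 */
char32_t
combine_surrogates(char16_t hi, char16_t lo)
{
    return 0x10000 +
        (((char32_t)(hi - 0xD800) << 10) | (char32_t)(lo - 0xDC00));
}
```

A portable caller would test is_high_surrogate(*pc16) after every positive return value instead of comparing the return value with 4.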
The C11 standard only requires the pc16 argument to be encoded according
to UTF-16 if the predefined environment macro __STDC_UTF_16__ is defined
with a value of 1. On OpenBSD, <uchar.h> provides this definition.
Other operating systems which do not define __STDC_UTF_16__ could
theoretically use a different, implementation-defined output encoding for
pc16 instead of UTF-16. Writing portable code for an arbitrary output
encoding is impossible because the rules for when and how often the
function needs to be called again depend on the output encoding; the
rules explained above are specific to UTF-16. Using UTF-16 as the output
encoding of mbrtoc16() becomes mandatory in C23.
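
Code that depends on the UTF-16 rules can check the macro at compile time. The following is a minimal sketch; the function name char16_is_utf16() is hypothetical, and a stricter program might prefer an #error directive over a run-time answer.

```c
#include <uchar.h>

/*
 * Hypothetical helper: returns 1 if the implementation promises that
 * char16_t values are encoded in UTF-16, and 0 otherwise.
 */
int
char16_is_utf16(void)
{
#if defined(__STDC_UTF_16__) && __STDC_UTF_16__ == 1
    return 1;
#else
    return 0;
#endif
}
```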
FreeBSD 14.1-RELEASE-p8 August 20, 2023 FreeBSD 14.1-RELEASE-p8