Command: mbrtowc | Section: 3 | Source: OpenBSD | File: mbrtowc.3
MBRTOWC(3) FreeBSD Library Functions Manual MBRTOWC(3)
NAME
mbrtowc, mbrtoc32 - convert a multibyte character to a wide character
(restartable)
SYNOPSIS
#include <wchar.h>
size_t
mbrtowc(wchar_t * restrict wc, const char * restrict s, size_t n,
mbstate_t * restrict mbs);
#include <uchar.h>
size_t
mbrtoc32(char32_t * restrict wc, const char * restrict s, size_t n,
mbstate_t * restrict mbs);
DESCRIPTION
The mbrtowc() and mbrtoc32() functions examine at most n bytes of the
multibyte character byte string pointed to by s, convert those bytes to a
wide character, and store the wide character into *wc if wc is not NULL
and s points to a valid character.
Conversion happens in accordance with the conversion state *mbs, which
must be initialized to zero before the application's first call to
mbrtowc() or mbrtoc32(). If the previous call did not return (size_t)-1,
mbs can safely be reused without reinitialization.
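As a minimal sketch of the initialization rule above, the following helper zeroes a local mbstate_t with memset(3) before its first use. The name first_char() is hypothetical and not part of any library; the sketch assumes s is a NUL-terminated string in the encoding of the current locale.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: decode the first character of the NUL-terminated
 * string s, starting from a freshly zeroed conversion state.
 * Returns whatever mbrtowc() returns.
 */
static size_t
first_char(const char *s, wchar_t *out)
{
	mbstate_t mbs;

	assert(s != NULL);
	memset(&mbs, 0, sizeof(mbs));	/* initial conversion state */
	/* strlen(s) + 1 lets mbrtowc() see the terminating NUL as well. */
	return mbrtowc(out, s, strlen(s) + 1, &mbs);
}
```

A caller that keeps converting the same string would normally hold on to one mbstate_t and pass it to every call rather than zeroing it each time.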
The input encoding that mbrtowc() and mbrtoc32() use for s is determined
by the LC_CTYPE category of the current locale. If the locale is changed
without reinitialization of *mbs, the behaviour is undefined.
Unlike mbtowc(3), mbrtowc() and mbrtoc32() accept an incomplete byte
sequence pointed to by s which does not form a complete character but is
potentially part of a valid character. In this case, both functions
consume all such bytes. The conversion state saved in *mbs will be used
to restart the suspended conversion during the next call.
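The restart behaviour described above can be sketched by handing mbrtowc() one byte per call, as if the bytes arrived separately. The helper name decode_bytewise() is hypothetical; the sketch assumes the caller has already selected a suitable locale with setlocale(3).

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: pass the bytes in buf to mbrtowc() one at a time.
 * Returns the number of bytes used once a character is complete,
 * (size_t)-2 if buf ends inside a character, or (size_t)-1 on error.
 */
static size_t
decode_bytewise(const char *buf, size_t len, wchar_t *out)
{
	mbstate_t mbs;
	size_t i, r;

	assert(buf != NULL);
	memset(&mbs, 0, sizeof(mbs));
	for (i = 0; i < len; i++) {
		r = mbrtowc(out, buf + i, 1, &mbs);
		if (r == (size_t)-2)
			continue;	/* byte consumed, progress saved in mbs */
		if (r == (size_t)-1)
			return r;	/* mbs is now undefined */
		return i + 1;		/* character complete (possibly NUL) */
	}
	return (size_t)-2;		/* ran out of bytes mid-character */
}
```

In a UTF-8 locale, a two-byte character fed this way would be expected to yield (size_t)-2 for the first byte and complete on the second; the same mbs object has to be passed to both calls.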
On systems other than OpenBSD that support state-dependent encodings, s
may point to a special sequence of bytes called a "shift sequence".
Shift sequences switch between character code sets available within an
encoding scheme. One encoding scheme using shift sequences is ISO/IEC
2022-JP, which can switch e.g. from ASCII (which uses one byte per
character) to JIS X 0208 (which uses two bytes per character). Shift
sequence bytes correspond to no individual wide character, so mbrtowc()
and mbrtoc32() treat them as if they were part of the subsequent
multibyte character. Therefore they do contribute to the number of bytes
in the multibyte character.
The following arguments cause special processing:
wc == NULL The conversion from a multibyte character to a wide
character is performed and the conversion state may be
affected, but the resulting wide character is discarded.
This can be used to find out how many bytes are contained
in the multibyte character pointed to by s.
s == NULL The arguments wc and n are ignored and starting or
continuing the conversion with an empty string is
attempted, discarding the conversion result. If conversion
succeeds, this call always returns zero. Unlike mbtowc(3),
the value returned does not indicate whether the current
encoding of the locale is state-dependent, i.e. uses shift
sequences.
mbs == NULL mbrtowc() and mbrtoc32() each use their own internal state
object instead of the mbs argument. Both internal state
objects are initialized at startup time of the program, and
no other libc function ever changes either of them.
If mbrtowc() or mbrtoc32() is called with a NULL mbs
argument and that call returns (size_t)-1, the internal
conversion state of the respective function becomes
permanently undefined and there is no way to reset it to
any defined state. Consequently, after such a mishap, it
is not safe to call the same function with a NULL mbs
argument ever again until the program is terminated.
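To illustrate the wc == NULL case, the sketch below counts the characters in a string by asking mbrtowc() only for byte lengths. The function name count_chars() is hypothetical, and an explicit state object is passed so that the internal state used for mbs == NULL is never involved.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: count the multibyte characters in the
 * NUL-terminated string s.  Returns (size_t)-1 if s contains an
 * invalid sequence or ends inside a character.
 */
static size_t
count_chars(const char *s)
{
	mbstate_t mbs;
	size_t left, n, count = 0;

	assert(s != NULL);
	memset(&mbs, 0, sizeof(mbs));
	left = strlen(s);		/* the NUL itself is not examined */
	while (left > 0) {
		/* wc == NULL: convert, but discard the wide character. */
		n = mbrtowc(NULL, s, left, &mbs);
		if (n == (size_t)-1 || n == (size_t)-2)
			return (size_t)-1;
		s += n;
		left -= n;
		count++;
	}
	return count;
}
```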
RETURN VALUES
0 The bytes pointed to by s form a terminating NUL character.
If wc is not NULL, a NUL wide character has been stored in
the object pointed to by wc.
positive s points to a valid character, and the value returned is
the number of bytes consumed from s by this call to complete
the character. If wc is not NULL, the corresponding wide
character has been stored in the object pointed to by wc.
(size_t)-1 s points to an illegal byte sequence which does not form a
valid multibyte character in the current locale, or mbs
points to an invalid or uninitialized object. errno is set
to EILSEQ or EINVAL, respectively. The conversion state
object pointed to by mbs is left in an undefined state and
must be reinitialized before being used again.
Because applications using mbrtowc() or mbrtoc32() are
shielded from the specifics of the multibyte character
encoding scheme, it is impossible to repair byte sequences
containing encoding errors. Such byte sequences must be
treated as invalid and potentially malicious input.
Applications must stop processing the byte string pointed
to by s and either discard any wide characters already
converted, or cope with truncated input.
(size_t)-2 s points to an incomplete byte sequence of length n which
has been consumed and contains part of a valid multibyte
character. The character may be completed by calling the
same function again with s pointing to one or more
subsequent bytes of the multibyte character and mbs
pointing to the conversion state object used during
conversion of the incomplete byte sequence.
(size_t)-3 The next character resulting from a previous call has been
stored into wc, without consuming any additional bytes from
s. This never happens for mbrtowc(), and on OpenBSD, it
never happens for mbrtoc32() either.
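The return values listed above can be handled in a conversion loop along the following lines. This is only a sketch: the name to_wide() and its truncation policy are assumptions made for illustration, not an interface provided by libc.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: convert the NUL-terminated string src into dst,
 * which has room for dstlen wide characters including the terminator.
 * Returns the number of wide characters stored, not counting the
 * terminator, or (size_t)-1 if src is invalid or incomplete.
 */
static size_t
to_wide(wchar_t *dst, size_t dstlen, const char *src)
{
	mbstate_t mbs;
	size_t left, n, out = 0;

	assert(dst != NULL && src != NULL && dstlen > 0);
	memset(&mbs, 0, sizeof(mbs));
	left = strlen(src) + 1;		/* include the terminating NUL */
	while (out < dstlen) {
		n = mbrtowc(&dst[out], src, left, &mbs);
		if (n == (size_t)-1)
			return (size_t)-1;	/* EILSEQ or EINVAL: stop here */
		if (n == (size_t)-2)
			return (size_t)-1;	/* src ends inside a character */
		if (n == 0)
			return out;		/* NUL stored in dst[out] */
		/* (size_t)-3 is not expected from mbrtowc(). */
		src += n;
		left -= n;
		out++;
	}
	dst[dstlen - 1] = L'\0';	/* out of space: truncate */
	return dstlen - 1;
}
```

On failure the contents of dst are best treated as unusable, in line with the advice for (size_t)-1 to discard already converted characters or cope with truncated input.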
ERRORS
mbrtowc() and mbrtoc32() cause an error in the following cases:
[EILSEQ] s points to an invalid multibyte character.
[EINVAL] mbs points to an invalid or uninitialized mbstate_t
object.
SEE ALSO
mbrlen(3), mbtowc(3), setlocale(3), wcrtomb(3)
STANDARDS
mbrtowc() conforms to ISO/IEC 9899/AMD1:1995 ("ISO C90, Amendment 1").
The restrict qualifier was added in ISO/IEC 9899:1999 ("ISO C99").
mbrtoc32() conforms to ISO/IEC 9899:2011 ("ISO C11").
HISTORY
mbrtowc() has been available since OpenBSD 3.8 and has provided support
for UTF-8 since OpenBSD 4.8.
mbrtoc32() has been available since OpenBSD 7.4.
CAVEATS
mbrtowc() and mbrtoc32() are not suitable for programs that care about
internals of the character encoding scheme used by the byte string
pointed to by s.
It is possible that these functions fail because of locale configuration
errors. An "invalid" character sequence may simply be encoded in a
different encoding than that of the current locale.
The special cases for s == NULL and mbs == NULL do not make any sense.
Instead of passing NULL for mbs, mbtowc(3) can be used.
Earlier versions of this man page implied that calling mbrtowc() with a
NULL s argument would always set mbs to the initial conversion state.
But this is true only if the previous call to mbrtowc() using mbs did not
return (size_t)-1 or (size_t)-2. It is recommended to zero the mbstate_t
object instead.
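A minimal sketch of that recommendation, using the hypothetical name reset_state(): the state object is overwritten with zero bytes instead of being passed to mbrtowc() together with a NULL s argument.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: put *mbs back into the initial conversion state,
 * regardless of what the previous call using it returned.
 */
static void
reset_state(mbstate_t *mbs)
{
	assert(mbs != NULL);
	memset(mbs, 0, sizeof(*mbs));
}
```

After such a reset, conversion has to start again at a character boundary, typically with a different input string.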
FreeBSD 14.1-RELEASE-p8 September 12, 2023 FreeBSD 14.1-RELEASE-p8