Command: mbtowc | Section: 3 | Source: OpenBSD | File: mbtowc.3
MBTOWC(3) FreeBSD Library Functions Manual MBTOWC(3)
NAME
mbtowc - converts a multibyte character to a wide character
SYNOPSIS
#include <stdlib.h>
int
mbtowc(wchar_t * restrict pwc, const char * restrict s, size_t n);
DESCRIPTION
The mbtowc() function converts the multibyte character pointed to by s to
a wide character, and stores it in the wchar_t object pointed to by pwc.
This function may inspect at most n bytes of the array pointed to by s.
Unlike mbrtowc(3), the first n bytes pointed to by s need to form an
entire multibyte character. Otherwise, this function returns an error
and the internal state will be undefined.
If a call to mbtowc() results in an undefined internal state, parsing of
the string starting at s cannot continue, not even at a later byte, and
mbtowc() must be called with s set to NULL to reset the internal state
before it can safely be used again on a different string.
The behaviour of mbtowc() is affected by the LC_CTYPE category of the
current locale. No other libc function changes the internal state of
mbtowc(), with one exception: calling setlocale(3) to set the LC_CTYPE
category to a different locale leaves the internal state of this function
undefined.
In state-dependent encodings such as ISO/IEC 2022-JP, s may point to a
special sequence of bytes that changes the shift state. Because such
byte sequences do not correspond to any individual wide character,
mbtowc() treats them as if they were part of the subsequent multibyte
character.
The following special cases apply to the arguments:

    s == NULL    mbtowc() initializes its own internal state to the initial
                 state, and determines whether the current encoding is
                 state-dependent. mbtowc() returns 0 if the encoding is
                 state-independent, otherwise non-zero. pwc is ignored.

    pwc == NULL  mbtowc() behaves just as if pwc were not NULL, including
                 modifications to internal state, except that the result of
                 the conversion is discarded. This can be used to count
                 the wide characters needed to represent a multibyte
                 string. Another use case is a check for illegal or
                 incomplete multibyte sequences.

    n == 0       In this case, the first n bytes of the array pointed to by
                 s never form a complete character and mbtowc() always
                 fails.
RETURN VALUES
Normally, mbtowc() returns:

    0         s points to a null byte (`\0').

    positive  Number of bytes for the valid multibyte character pointed
              to by s. There are no cases where the value returned is
              greater than the value of the MB_CUR_MAX macro.

    -1        s points to an invalid or an incomplete multibyte
              character. errno is set to indicate the error.

When s is NULL, mbtowc() returns:

    0         The current encoding is state-independent.

    non-zero  The current encoding is state-dependent.
EXAMPLES
The following program parses a UTF-8 string and reports encoding errors:

    #include <limits.h>
    #include <locale.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <wchar.h>

    int
    main(void)
    {
            char s[LINE_MAX];
            wchar_t wc;
            int i, len;

            setlocale(LC_CTYPE, "C.UTF-8");
            if (fgets(s, sizeof(s), stdin) == NULL)
                    *s = '\0';
            /* Convert one character per iteration until the NUL byte. */
            for (i = 0, len = 1; len != 0; i += len) {
                    switch (len = mbtowc(&wc, s + i, MB_CUR_MAX)) {
                    case 0:
                            printf("byte %d end of string 0x00\n", i);
                            break;
                    case -1:
                            printf("byte %d invalid 0x%0.2hhx\n", i,
                                (unsigned char)s[i]);
                            len = 1;        /* skip the offending byte */
                            break;
                    default:
                            /* Casts match the %X and %lc argument types. */
                            printf("byte %d U+%0.4X %lc\n", i,
                                (unsigned int)wc, (wint_t)wc);
                            break;
                    }
            }
            return 0;
    }

Recovering from encoding errors and continuing to parse the rest of the
string as shown above is only possible for state-independent character
encodings. For full generality, the error handling can be modified to
reset the internal state. In that case, the rest of the string has to be
skipped if the encoding is state-dependent:

    case -1:
            printf("byte %d invalid 0x%0.2hhx\n", i, s[i]);
            len = !mbtowc(NULL, NULL, MB_CUR_MAX);
            break;

ERRORS
mbtowc() will set errno in the following cases:

    [EILSEQ]  s points to an invalid or incomplete multibyte character.
SEE ALSO
mblen(3), mbrtowc(3), setlocale(3)
STANDARDS
The mbtowc() function conforms to ANSI X3.159-1989 ("ANSI C89"). The
restrict qualifier was added in ISO/IEC 9899:1999 ("ISO C99"). Setting
errno is an IEEE Std 1003.1-2008 ("POSIX.1") extension.
CAVEATS
On error, callers of mbtowc() cannot tell whether the multibyte character
was invalid or incomplete. To treat incomplete data differently from
invalid data, use the mbrtowc(3) function instead.
FreeBSD 14.1-RELEASE-p8 November 11, 2023 FreeBSD 14.1-RELEASE-p8