Function bstr::decode_utf8
pub fn decode_utf8<B: AsRef<[u8]>>(slice: B) -> (Option<char>, usize)
UTF-8 decode a single Unicode scalar value from the beginning of a slice.
When successful, the corresponding Unicode scalar value is returned along with the number of bytes it was encoded with. The number of bytes consumed for a successful decode is always between 1 and 4, inclusive.
When unsuccessful, None is returned along with the number of bytes that make up a maximal prefix of a valid UTF-8 code unit sequence. In this case, the number of bytes consumed is always between 0 and 3, inclusive, where 0 is only returned when slice is empty.
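The return contract described above can be illustrated with a standard-library-only sketch. The function name decode_first is hypothetical and this is not how bstr implements decode_utf8; it only assumes that std::str::Utf8Error reports invalid sequences in the same "maximal prefix" fashion, which is how the standard library documents its lossy decoding.

```rust
use std::str;

/// Hypothetical std-only stand-in that mirrors the documented contract:
/// (Some(ch), 1..=4) on success, (None, 0..=3) on failure,
/// with 0 returned only for an empty slice.
fn decode_first(slice: &[u8]) -> (Option<char>, usize) {
    if slice.is_empty() {
        return (None, 0);
    }
    // A scalar value occupies at most 4 bytes, so a 4-byte window suffices.
    let window = &slice[..slice.len().min(4)];
    match str::from_utf8(window) {
        // The whole window is valid: report its first scalar value.
        Ok(s) => {
            let ch = s.chars().next().unwrap();
            (Some(ch), ch.len_utf8())
        }
        // The window starts with valid UTF-8 and breaks later on.
        Err(e) if e.valid_up_to() > 0 => {
            let s = str::from_utf8(&window[..e.valid_up_to()]).unwrap();
            let ch = s.chars().next().unwrap();
            (Some(ch), ch.len_utf8())
        }
        // The very first sequence is invalid or truncated.
        // error_len() is None when the input ends mid-sequence; in that
        // case the whole (short) window is the incomplete prefix.
        Err(e) => (None, e.error_len().unwrap_or(window.len())),
    }
}
```

With this sketch, a complete three-byte sequence yields (Some('☃'), 3), the truncated input b"\xE2\x98" yields (None, 2), and an empty slice yields (None, 0), matching the ranges stated above.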
Examples
Basic usage:
    use bstr::decode_utf8;

    // Decoding a valid codepoint.
    let (ch, size) = decode_utf8(b"\xE2\x98\x83");
    assert_eq!(Some('☃'), ch);
    assert_eq!(3, size);

    // Decoding an incomplete codepoint.
    let (ch, size) = decode_utf8(b"\xE2\x98");
    assert_eq!(None, ch);
    assert_eq!(2, size);
This example shows how to iterate over all codepoints in UTF-8 encoded bytes, while replacing invalid UTF-8 sequences with the replacement codepoint:
    use bstr::{B, decode_utf8};

    let mut bytes = B(b"\xE2\x98\x83\xFF\xF0\x9D\x9E\x83\xE2\x98\x61");
    let mut chars = vec![];
    while !bytes.is_empty() {
        let (ch, size) = decode_utf8(bytes);
        bytes = &bytes[size..];
        chars.push(ch.unwrap_or('\u{FFFD}'));
    }
    assert_eq!(vec!['☃', '\u{FFFD}', '𝞃', '\u{FFFD}', 'a'], chars);
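As a cross-check on the loop above, the standard library's String::from_utf8_lossy also substitutes U+FFFD for each invalid sequence. The helper name lossy_chars below is hypothetical, and the sketch only shows that the two approaches agree for this particular input; it does not claim they agree for every input.

```rust
/// Hypothetical helper: lossily decode bytes with the standard library
/// and collect the resulting scalar values.
fn lossy_chars(bytes: &[u8]) -> Vec<char> {
    // from_utf8_lossy replaces each invalid sequence with U+FFFD.
    String::from_utf8_lossy(bytes).chars().collect()
}
```

Calling lossy_chars on the same byte string used above produces the same five scalar values, with one replacement codepoint for the lone \xFF byte and one for the truncated \xE2\x98 prefix.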