UTF-8


 

UTF-8 (8-bit Unicode Transformation Format) is a lossless, variable-length character encoding for Unicode, created by Ken Thompson and Rob Pike. It represents each Unicode code point as a sequence of one or more 8-bit bytes, and so can encode the scripts of most of the world's languages. Because it is byte-oriented, UTF-8 is especially useful for transmission over 8-bit Electronic Mail systems.

Related Topics:
Bit - Unicode Transformation Format - Variable-length - Character encoding - Unicode - Ken Thompson - Rob Pike - Electronic Mail

~ ~ ~ ~ ~ ~ ~ ~ ~ ~

UTF-8 uses one to four bytes per character, depending on the character's Unicode code point. For example, only one byte is needed to encode each of the 128 US-ASCII characters, which occupy the Unicode range U+0000 to U+007F.

Related Topics:
Byte - US-ASCII
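
The one-to-four-byte rule described above can be sketched in Python. This is an illustrative sketch rather than a reference implementation: the function name encode_code_point is hypothetical, and in practice the built-in str.encode("utf-8") does the same job.

```python
def encode_code_point(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (hypothetical helper, for illustration)."""
    # Reject values outside Unicode and the surrogate range, which UTF-8 does not encode.
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not an encodable code point: {cp:#x}")
    if cp <= 0x7F:
        # U+0000..U+007F: one byte, identical to US-ASCII.
        return bytes([cp])
    if cp <= 0x7FF:
        # U+0080..U+07FF: two bytes, 110xxxxx 10xxxxxx.
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:
        # U+0800..U+FFFF: three bytes, 1110xxxx 10xxxxxx 10xxxxxx.
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # U+10000..U+10FFFF: four bytes, 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx.
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


if __name__ == "__main__":
    for ch in "Aé€😀":
        encoded = encode_code_point(ord(ch))
        # The sketch should agree with Python's built-in encoder.
        assert encoded == ch.encode("utf-8")
        print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s): {encoded.hex(' ')}")
```

The four sample characters were chosen so that each one falls into a different length class, from one byte for "A" up to four bytes for the emoji.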

~ ~ ~ ~ ~ ~ ~ ~ ~ ~

Four bytes may seem like a lot for one character (code point); however, four bytes are required only for code points outside the Basic Multilingual Plane, which are comparatively rare in most text. Furthermore, UTF-16 (the main alternative to UTF-8) also needs four bytes for these code points. Whether UTF-8 or UTF-16 is more efficient depends on the range of code points being used. However, the size differences between encoding schemes can become negligible once the text is compressed with a general-purpose algorithm such as DEFLATE. For short items of text, where such algorithms do not perform well and size is important, the Standard Compression Scheme for Unicode could be considered instead.

Related Topics:
Basic Multilingual Plane - UTF-16 - DEFLATE - Standard Compression Scheme for Unicode
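
The size comparison above can be checked with a small Python sketch. The helper name encoded_sizes is hypothetical, "utf-16-le" is used so that no byte order mark is counted, and zlib.compress is assumed as a stand-in for DEFLATE (it produces a DEFLATE stream inside a small zlib wrapper). The sample strings are arbitrary, and the compressed sizes will vary with the input.

```python
import zlib


def encoded_sizes(text: str) -> dict:
    """Return {encoding: (raw_size, compressed_size)} in bytes (hypothetical helper)."""
    sizes = {}
    for encoding in ("utf-8", "utf-16-le"):
        raw = text.encode(encoding)
        # zlib.compress wraps a DEFLATE stream in a short header and checksum.
        sizes[encoding] = (len(raw), len(zlib.compress(raw)))
    return sizes


if __name__ == "__main__":
    samples = {
        "ASCII": "hello",
        "CJK (inside the BMP)": "日本語",
        "emoji (outside the BMP)": "😀",
    }
    for label, text in samples.items():
        print(label, encoded_sizes(text))
```

For these samples, UTF-8 is smaller for the ASCII text, UTF-16 is smaller for the CJK text, and both need four bytes for the emoji. Strings this short also tend not to shrink under compression, which is the case the last sentence of the paragraph has in mind.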

~ ~ ~ ~ ~ ~ ~ ~ ~ ~

The IETF (Internet Engineering Task Force) requires all Internet protocols to identify the encoding used for character data, and to support UTF-8 as at least one of the available encodings.

Related Topics:
IETF - Internet - Encoding
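
As one illustration of a protocol identifying its character encoding, the sketch below builds an e-mail message with Python's standard email package and labels the body as UTF-8. The function name build_message is hypothetical, and e-mail is used here only as a convenient example of such labelling, not as the specific mechanism the IETF requirement describes.

```python
from email.message import EmailMessage


def build_message(body: str) -> EmailMessage:
    """Build a plain-text message whose body is labelled as UTF-8 (hypothetical helper)."""
    msg = EmailMessage()
    # set_content records the charset in the Content-Type header of the message.
    msg.set_content(body, charset="utf-8")
    return msg


if __name__ == "__main__":
    msg = build_message("Grüße, 世界")
    # The receiver can read the declared encoding back from the header.
    print(msg["Content-Type"])
    print(msg.get_content_charset())
```

A receiver that reads the declared charset knows how to turn the transmitted bytes back into characters without guessing.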

~ ~ ~ ~ ~ ~ ~ ~ ~ ~