UTF-8
UTF-8 (8-bit Unicode Transformation Format) is a lossless, variable-length character encoding for Unicode created by Ken Thompson and Rob Pike. It represents each Unicode code point as a sequence of one or more bytes, covering the scripts of many of the world's languages. UTF-8 is especially useful for transmission over 8-bit electronic mail systems.
UTF-8 uses one to four bytes per character, depending on the Unicode code point. For example, only one byte is needed to encode each of the 128 US-ASCII characters, which occupy the Unicode range U+0000 to U+007F.

Four bytes may seem like a lot for one character (code point); however, they are required only for code points outside the Basic Multilingual Plane, which are generally very rare. Furthermore, UTF-16 (the main alternative to UTF-8) also needs four bytes for these code points. Whether UTF-8 or UTF-16 is more efficient depends on the range of code points being used. However, the differences between encoding schemes can become negligible once a general-purpose compression algorithm such as DEFLATE is applied. For short items of text, where such algorithms do not perform well and size is important, the Standard Compression Scheme for Unicode could be considered instead.

The IETF (Internet Engineering Task Force) requires all Internet protocols to identify the encoding used for character data and to support UTF-8 as at least one of the available encodings.
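The byte counts and the UTF-8 versus UTF-16 comparison described above can be checked with a short sketch. This is only an illustration using Python's standard library; the helper names `utf8_len` and `utf16_len`, the sample characters, and the sample text are assumptions chosen for the example and are not part of the original article. The `utf-16-be` codec is used so that no byte order mark is added to the count.

```python
import zlib


def utf8_len(ch: str) -> int:
    """Number of bytes UTF-8 needs for a single code point (hypothetical helper)."""
    return len(ch.encode("utf-8"))


def utf16_len(ch: str) -> int:
    """Number of bytes UTF-16 needs for a single code point (hypothetical helper).

    Big-endian is requested explicitly so no byte order mark is prepended.
    """
    return len(ch.encode("utf-16-be"))


# Sample code points, one from each UTF-8 length class (illustrative choice).
samples = {
    "A": "U+0041, US-ASCII",
    "\u00e9": "U+00E9, Latin small letter e with acute",
    "\u20ac": "U+20AC, euro sign, inside the Basic Multilingual Plane",
    "\U0001d11e": "U+1D11E, musical G clef, outside the Basic Multilingual Plane",
}

for ch, note in samples.items():
    print(f"{note}: UTF-8 {utf8_len(ch)} byte(s), UTF-16 {utf16_len(ch)} byte(s)")

# Made-up mixed-script text, repeated so that DEFLATE has redundancy to exploit.
text = "Unicode text: \u65e5\u672c\u8a9e and English. " * 200
raw8 = text.encode("utf-8")
raw16 = text.encode("utf-16-be")

# zlib.compress applies DEFLATE inside a zlib wrapper.
packed8 = zlib.compress(raw8)
packed16 = zlib.compress(raw16)

print("uncompressed:", len(raw8), "bytes (UTF-8),", len(raw16), "bytes (UTF-16)")
print("compressed:  ", len(packed8), "bytes (UTF-8),", len(packed16), "bytes (UTF-16)")
```

The loop shows the one, two, three, and four byte cases for UTF-8 next to the two or four bytes that UTF-16 uses. The final two lines print sizes before and after compression, so the effect of DEFLATE on the gap between the two encodings can be observed for whatever text is substituted in.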
Spiritus-Temporis.com ©2005.