Character encoding


 

A character encoding is a code that pairs each character in a set (a representation of a grapheme or grapheme-like unit, such as might appear in an alphabet or syllabary used to write a natural language) with something else, such as a number or a sequence of electrical pulses, in order to facilitate the storage of text in computers and the transmission of text through telecommunication networks. Common examples include Morse code, which encodes letters of the Latin alphabet as series of long and short depressions of a telegraph key, and ASCII, which encodes letters, numerals, and other symbols both as integers and as 7-bit binary versions of those integers.
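As a minimal sketch of the ASCII case, the following Python snippet lists the integer code and the 7-bit binary form of each character in a short string. The helper name ascii_codes is hypothetical and chosen only for this illustration; it relies on the fact that Python's built-in ord() returns the ASCII integer for characters in the ASCII range.

```python
def ascii_codes(text):
    """Return (character, integer code, 7-bit binary string) for each character.

    ascii_codes is a hypothetical helper written for this illustration.
    """
    rows = []
    for ch in text:
        code = ord(ch)  # integer code; equals the ASCII code for ASCII characters
        if code > 127:
            raise ValueError(f"{ch!r} is not an ASCII character")
        rows.append((ch, code, format(code, "07b")))  # zero-padded to 7 bits
    return rows


for ch, code, bits in ascii_codes("Hi!"):
    print(ch, code, bits)
# H 72 1001000
# i 105 1101001
# ! 33 0100001
```

The same character is thus available in two forms: as an integer (72 for "H") and as the 7-bit pattern that represents that integer (1001000).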

Character repertoire

In some contexts, especially computer storage and communication, it makes sense to distinguish a character repertoire (the full set of abstract characters that a system supports) from a coded character set or character encoding (which specifies how to represent the characters of that set using integer codes).
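The distinction can be sketched in Python by taking one small repertoire and looking up its codes under two different coded character sets. The three-character repertoire and the helper code_table are assumptions made for this illustration, and Python's cp037 codec is used as one representative EBCDIC variant rather than as the definitive EBCDIC.

```python
# A tiny repertoire: abstract characters with no codes attached yet.
repertoire = ["A", "a", "0"]


def code_table(chars, codec):
    """Map each character to its single-byte integer code under a codec.

    code_table is a hypothetical helper; it assumes a single-byte codec.
    """
    return {ch: ch.encode(codec)[0] for ch in chars}


ascii_table = code_table(repertoire, "ascii")
ebcdic_table = code_table(repertoire, "cp037")  # one EBCDIC code page

for ch in repertoire:
    print(ch, ascii_table[ch], ebcdic_table[ch])
# A 65 193
# a 97 129
# 0 48 240
```

The characters are the same in both tables; only the integers assigned to them differ, which is what separates a repertoire from a coded character set.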

~ ~ ~ ~ ~ ~ ~ ~ ~ ~

In the early days of computing, the introduction of coded character sets such as ASCII (1963) and EBCDIC (1964) began the process of standardisation. The limitations of such sets soon became apparent, and a number of ad hoc methods were developed to extend them. The need to support multiple writing systems, including the CJK family of East Asian scripts, required a far larger number of characters and demanded a systematic approach to character encoding in place of those ad hoc extensions.

Related Topics:
EBCDIC - Writing system - CJK

~ ~ ~ ~ ~ ~ ~ ~ ~ ~

The full repertoire of Unicode, for example, encompasses over 100,000 characters. Each of these characters has a unique integer code in the range 0 to hexadecimal 10FFFF. That range holds a little over 1.1 million values, far more than the number of characters, so not all integers in it represent coded characters. Other common repertoires include ASCII and ISO 8859-1, whose characters correspond exactly to the first 128 and the first 256 coded characters of Unicode respectively.
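These figures can be checked with a short Python sketch. It assumes Python's standard ascii and latin-1 codecs (the latter standing in for ISO 8859-1) and the unicodedata module; the variable names are illustrative only.

```python
import sys
import unicodedata

# The highest code point is hexadecimal 10FFFF, so the range holds
# 0x10FFFF + 1 = 1,114,112 integers.
code_space_size = sys.maxunicode + 1
print(hex(sys.maxunicode), code_space_size)  # 0x10ffff 1114112

# Decoding each ASCII byte yields the Unicode character with the same code.
ascii_matches = all(bytes([i]).decode("ascii") == chr(i) for i in range(128))

# The same holds for all 256 byte values under Latin-1.
latin1_matches = all(bytes([i]).decode("latin-1") == chr(i) for i in range(256))

print(ascii_matches, latin1_matches)  # True True

# Not every integer in the range is a coded character: 0x10FFFF itself
# has general category "Cn" (not assigned).
print(unicodedata.category(chr(0x10FFFF)))  # Cn
```

Because the first 256 codes agree, text limited to those characters has the same integer codes whether it is described in terms of ISO 8859-1 or Unicode.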

Related Topics:
Unicode - Hexadecimal - ISO 8859-1

~ ~ ~ ~ ~ ~ ~ ~ ~ ~