
Data compression


 

In computer science, data compression, or source coding, is the process of encoding information using fewer bits (or other information-bearing units) than an unencoded representation would use, by means of specific encoding schemes. For example, this article could be encoded with fewer bits if writer and reader agreed on the convention that the word "compression" is written as "comp".
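
The "comp" convention is a simple substitution code. Because "comp" also begins ordinary words such as "computer", a literal "comp" would make decoding ambiguous, so the sketch below instead substitutes a short code built from a reserved marker character. The names `MARKER`, `CODEBOOK`, `encode` and `decode` are hypothetical, and the sketch assumes the marker never occurs in the input.

```python
# Toy word-substitution code in the spirit of "compression" -> "comp".
# Assumption: MARKER never appears in ordinary input, so every occurrence
# of MARKER + "c" in the encoded text is unambiguously a codeword.
MARKER = "\x01"  # hypothetical reserved symbol
CODEBOOK = {"compression": MARKER + "c"}


def encode(text: str) -> str:
    """Replace each dictionary word with its shorter codeword."""
    if MARKER in text:
        raise ValueError("input contains the reserved marker")
    for word, code in CODEBOOK.items():
        text = text.replace(word, code)
    return text


def decode(text: str) -> str:
    """Expand codewords back into the original words."""
    for word, code in CODEBOOK.items():
        text = text.replace(code, word)
    return text


sample = "Data compression is useful; lossless compression is reversible."
packed = encode(sample)
print(len(sample), "characters ->", len(packed), "characters")
```

Each of the two occurrences of "compression" (11 characters) shrinks to a 2-character codeword, and `decode(packed)` restores the sample exactly; the saving exists only because both sides share the same codebook.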


~ ~ ~ ~ ~ ~ ~ ~ ~ ~

A widely used example of compression is the ZIP file format, which, in addition to compressing data, acts as an archiver that stores many files in a single output file.
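
Python's standard `zipfile` module illustrates both roles of ZIP at once. This sketch builds an archive in memory rather than on disk (a real program would usually pass a file path); the file names and contents are made up for illustration.

```python
import io
import zipfile

# Two hypothetical files to be packed into a single archive.
files = {
    "notes.txt": b"compression " * 200,  # highly repetitive, compresses well
    "readme.txt": b"Two files stored in one archive.\n",
}

buffer = io.BytesIO()
# ZIP_DEFLATED selects DEFLATE compression for every member written below.
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    for name, data in files.items():
        archive.writestr(name, data)

# Reopen the archive, report original vs. stored sizes, and extract members.
with zipfile.ZipFile(io.BytesIO(buffer.getvalue())) as archive:
    sizes = {info.filename: (info.file_size, info.compress_size)
             for info in archive.infolist()}
    restored = {name: archive.read(name) for name in archive.namelist()}

for name, (original, stored) in sizes.items():
    print(f"{name}: {original} bytes -> {stored} bytes")
```

The archive holds both files in one output, and the repetitive `notes.txt` is stored in far fewer bytes than its original size.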


~ ~ ~ ~ ~ ~ ~ ~ ~ ~

As with any form of communication, compressed data communication works only when both the sender and the receiver of the information understand the encoding scheme. For example, this text makes sense only if the receiver understands that it is meant to be interpreted as characters representing the English language. Similarly, compressed data can be understood only if the receiver knows the decoding method.
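
A small sketch of this point using two standard-library codecs: data compressed with `zlib` (DEFLATE) is recovered by a receiver that uses the matching decoder, while a receiver that assumes a different scheme (here `bz2`, chosen only as an example of a mismatched decoder) rejects the same bytes. The message content is arbitrary.

```python
import bz2
import zlib

message = b"Both ends must agree on the encoding scheme. " * 20
packed = zlib.compress(message)

# A receiver that knows the scheme (zlib/DEFLATE) recovers the original bytes.
recovered = zlib.decompress(packed)
print("matching decoder:", recovered == message)

# A receiver that assumes a different scheme (bzip2) cannot interpret them:
# the bzip2 decoder does not find its expected stream header and raises.
try:
    bz2.decompress(packed)
    wrong_decoder_failed = False
except OSError as exc:
    wrong_decoder_failed = True
    print("bzip2 decoder rejected the data:", exc)
```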


~ ~ ~ ~ ~ ~ ~ ~ ~ ~

Compression is possible because most real-world data are highly statistically redundant. When represented in a human-interpretable form (or, for text to be printed on a computer screen, a simple machine-interpretable form such as ASCII), the data are encoded verbosely. For example, the letter 'e' is much more common in English text than the letter 'z', and the probability that the letter 'q' is followed by the letter 'z' is very small. Exploiting these statistical regularities allows the same information to be represented much more concisely.
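
Huffman coding is one classic way to exploit unequal symbol frequencies: common symbols receive short bit strings and rare ones receive long strings, while no code is a prefix of another. The following is a minimal sketch on an arbitrary sample sentence; `huffman_code` is a hypothetical helper, not a library function, and it models only single-character frequencies, not context such as which letter follows 'q'.

```python
import heapq
from collections import Counter


def huffman_code(text: str) -> dict[str, str]:
    """Build a prefix code where frequent symbols get shorter bit strings."""
    freq = Counter(text)
    if len(freq) == 1:  # degenerate input with a single distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: partial code}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees, prefixing their codes with 0 and 1.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


text = "the quick brown fox jumps over the lazy dog and then sleeps"
code = huffman_code(text)
encoded = "".join(code[ch] for ch in text)
print(len(text) * 8, "bits as 8-bit ASCII vs", len(encoded), "bits with Huffman")
print("e ->", code["e"], "  z ->", code["z"])
```

Because 'e' occurs six times in the sample and 'z' only once, the code assigned to 'e' is never longer than the one assigned to 'z'.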

~ ~ ~ ~ ~ ~ ~ ~ ~ ~

Further compression is possible if some loss of fidelity is acceptable. For example, a person viewing a picture or television video scene might not notice if some of its finest details are removed or not represented perfectly. Similarly, two sequences of audio samples may sound identical to a listener yet differ when compared sample by sample. Specialized signal processing techniques exploit this tolerance for minor differences to represent the picture, video, or audio using fewer bits.
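
Quantization is perhaps the simplest lossy technique: each sample is rounded to one of a small number of levels, so it can be stored in a few bits at the cost of a bounded error. This sketch assumes samples normalized to [-1, 1]; the sine-wave test signal, the 6-bit depth and the helper names `quantize`/`dequantize` are illustrative choices, not a specific codec.

```python
import math


def quantize(samples, bits):
    """Map each sample in [-1, 1] to one of 2**bits evenly spaced levels."""
    levels = 2 ** bits
    step = 2.0 / (levels - 1)
    return [round((s + 1.0) / step) for s in samples], step


def dequantize(indices, step):
    """Reconstruct approximate sample values from level indices."""
    return [i * step - 1.0 for i in indices]


# Hypothetical test signal: one cycle of a sine wave, 64 samples.
signal = [math.sin(2 * math.pi * n / 64) for n in range(64)]
indices, step = quantize(signal, bits=6)  # each index fits in 6 bits
restored = dequantize(indices, step)

max_error = max(abs(a - b) for a, b in zip(signal, restored))
print(f"step {step:.4f}, max reconstruction error {max_error:.4f}")
```

The reconstruction is not identical to the input, but every sample is off by at most half a quantization step; choosing fewer bits shrinks the data further and enlarges that error.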

~ ~ ~ ~ ~ ~ ~ ~ ~ ~

Compression is important because it helps reduce the consumption of expensive resources, such as disk space or connection bandwidth. However, compressing and decompressing data consumes processing power, which can also be expensive. The design of a data compression scheme therefore involves trade-offs among several factors, including the degree of compression, the amount of distortion introduced, the computational resources required, and often other considerations as well.
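
The trade-off between compression ratio and processing cost can be observed directly with `zlib`, whose `level` argument ranges from fast, lighter compression (1) to slower, more thorough compression (9). The input below is made-up, repetitive log-style text of roughly 1 MB; actual sizes and timings depend on the data and the machine.

```python
import time
import zlib

# Hypothetical input: repetitive log-style records, roughly 1 MB in total.
data = b"".join(
    b"record %06d: status=ok, value=%d\n" % (i, i % 97) for i in range(30000)
)

sizes = {}
for level in (1, 6, 9):
    start = time.perf_counter()
    packed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    # Every level is lossless: decompression restores the input exactly.
    assert zlib.decompress(packed) == data
    sizes[level] = len(packed)
    print(f"level {level}: {len(packed):>8} bytes in {elapsed * 1000:.1f} ms")
```

Higher levels typically spend more CPU time searching for matches in exchange for somewhat smaller output; which level is appropriate depends on whether storage, bandwidth, or processing time is the scarcer resource.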

~ ~ ~ ~ ~ ~ ~ ~ ~ ~

Some schemes are reversible, so that the original data can be reconstructed exactly (lossless data compression), while others accept some loss of data in order to achieve higher compression (lossy data compression).
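
Run-length encoding is a minimal example of a reversible (lossless) scheme: each run of identical symbols is replaced by the symbol and a count, and decoding rebuilds the input exactly. The sketch below uses a made-up row of black (B) and white (W) pixels; `rle_encode` and `rle_decode` are hypothetical helper names.

```python
from itertools import groupby


def rle_encode(data: str) -> list[tuple[str, int]]:
    """Replace each run of identical characters with a (char, count) pair."""
    return [(ch, len(list(run))) for ch, run in groupby(data)]


def rle_decode(pairs: list[tuple[str, int]]) -> str:
    """Expand (char, count) pairs back into the original string."""
    return "".join(ch * count for ch, count in pairs)


# Hypothetical scan line of mostly white pixels with a few black ones.
row = "W" * 12 + "B" + "W" * 12 + "BBB" + "W" * 24 + "B" + "W" * 14
pairs = rle_encode(row)
print(len(row), "characters ->", len(pairs), "runs:", pairs)
```

`rle_decode(pairs)` reproduces `row` character for character, which is what distinguishes a lossless scheme from the lossy quantization approach.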


~ ~ ~ ~ ~ ~ ~ ~ ~ ~