Data Compression and Decompression Algorithms

Category: Computer Science, Data
Last Updated: 27 Jan 2021
Pages: 6 Views: 205
Table of contents

INTRODUCTION

Data compression is a common requirement for most of the computerized applications. There are a number of data compression algorithms, which are dedicated to compressing different data formats. Even for a single data type, there are a number of different compression algorithms, which use different approaches. This paper examines lossless data compression algorithms.

1. DATA COMPRESSION: In computer science, data compression involves encoding information using fewer bits than the original representation. Compression is useful because it helps reduce the consumption of resources such as data space or transmission capacity. Because compressed data must be decompressed to be used, this extra processing imposes computational or other costs through decompression.

Order custom essay Data Compression and Decompression Algorithms with free plagiarism report

feat icon 450+ experts on 30 subjects feat icon Starting from 3 hours delivery
Get Essay Help

1. 1 Classification of Compression:

  • a) Static/non-adaptive compression.
  • b) Dynamic/adaptive compression.
  • c) Static/Non-adaptive. Compression: A static method is one in which the mapping from the set of messages to the set of codewords is fixed before transmission begins so that a given message is represented by the same codeword every time it appears in the message ensemble. The classic static defined-word scheme is Huffman coding.
  • d) Dynamic/adaptive compression: A code is dynamic if the mapping from the set of messages to the set of codewords changes over time.

2. 2 Data Compression Methods:

  • a) Lossless Compression: Lossless compression reduces bits by identifying and eliminating statistical redundancy. No information is lost in Lossless compression is possible because most real-world data has statistical redundancy. For example, an image may have areas of color that do not change over several pixels; instead of coding "red pixel, red pixel, ... the data may be encoded as "279 red pixels". Lossless compression is used in cases where it is important that the original and the decompressed data be identical, or where deviations from the original data could be deleterious. Typical examples are executable programs, text documents, and source code. Some image file formats, like PNG or GIF, use only lossless compression
  • b) Lossy Compression: In information technology, lossy compression is a data encoding method that compresses data by discarding (losing) some of it. The procedure aims to minimize the amount of data that needs to be held, handled, and/or transmitted by a computer. Lossy compression is most commonly used to compress multimedia data (audio, video, and still images), especially in applications such as streaming media and internet telephony. If we take a photo of a sunset over the sea, for example, there are going to be groups of pixels with the same color value, which can be reduced. Lossy algorithms tend to be more complex, as a result, they achieve better results for bitmaps and can accommodate for the loss of data. The compressed file is an estimation of the original data. One of the disadvantages of lossy compression is that if the compressed file keeps being compressed, then the quality will be degraded drastically.

3. Lossless Compression Algorithms: Run-Length Encoding(RLE): RLE stands for Run Length Encoding. It is a lossless algorithm that only offers decent compression ratios in specific types of data. How RLE works: RLE is probably the easiest compression algorithm. It replaces sequences of the same data values within a file by a count number and a single value. It is important to know that there are many different run-length encoding schemes. The above example has just been used to demonstrate the basic principle of RLE encoding. Sometimes the implementation of RLE is adapted to the type of data that is being compressed.

4. Complexity and Data Compression: We’re used to talking about the complexity of an algorithm measuring time and we usually try to find the fastest implementation, like in search algorithms. Here it is not so important to compress data quickly but to compress as much as possible so the output is as small as possible without losing data. A great feature of run-length encoding is that this algorithm is easy to implement.

5. Advantages and disadvantages: This algorithm is very easy to implement and does not require much CPU horsepower. RLE compression is only efficient with files that contain lots of repetitive data. These can be text files if they contain lots of spaces for indenting but line-art images that contain large white or black areas are far more suitable. Computer-generated color images (e. g. architectural drawings) can also give fair compression ratios. Where is RLE compression used? RLE compression can be used in the following file formats: PDF files

6. HUFFMAN CODING: Huffman coding is a popular method for compressing data with variable-length codes. Given a set of data symbols (an alphabet) and their frequencies of occurrence (or, equivalently, their probabilities), the method constructs a set of variable-length codewords with the shortest average length and assigns them to the symbols. Huffman coding serves as the basis for several applications implemented on popular platforms. Some programs use just the Huffman method, while others use it as one step in a multistep compression process.

7. Huffman Encoding: The Huffman encoding algorithm starts by constructing a list of all the alphabet symbols in descending order of their probabilities. It then constructs, from the bottom up, a binary tree with a symbol at every leaf. This is done in steps, where at each step two symbols with the smallest probabilities are selected, added to the top of the partial tree, deleted from the list, and replaced with an auxiliary symbol representing the two original symbols. When the list is reduced to just one auxiliary symbol (representing the entire alphabet), the tree is complete. The tree is then traversed to determine the codewords of the symbols.  BCA is in the Dictionary. BCAA is not in the Dictionary; insert it.

8. B is in the Dictionary. BC is in the Dictionary. BCA is in the Dictionary. BCAA is in the Dictionary. BCAAB is not in the Dictionary; insert it. LZ78 Compression : No of bits transmitted: Uncompressed String: ABBCBCABABCAABCAAB

Number of bits = Total number of characters * 8 = 18 * 8 = 144 bits

Suppose the codewords are indexed starting from 1:

Compressed string( codewords): (0, A) (0, B) (2, C) (3, A) (2, A) (4, A) (6, B)

Codeword index 1 2 3 4 5 6 7.

Each code word consists of an integer and a character:

The character is represented by 8 bits. The number of bits n required to represent the integer part of the codeword with index i is given by:

Codeword (0, A) (0, B) (2, C) (3, A) (2, A) (4, A) (6, B) index 1 2 3 4 5 6 7

Bits: (1 + 8) + (1 + 8) + (2 + 8) + (2 + 8) + (3 + 8) + (3 + 8) + (3 + 8) = 71 bits

The actual compressed message is: 0A0B10C11A010A100A110B

9. Decompression Algorithm: Dictionary empty

Published by Terry Welch in 1984it basically applies the LZSS principle of not explicitly transmitting the next nonmatching symbol to the LZ78 algorithm. The only remaining output of this improved algorithm is fixed-length references to the dictionary (indexes). If the message to be encoded consists of only one character, LZW outputs the code for this character; otherwise, it inserts two- or multi-character, overlapping, distinct patterns of the message to be encoded in a Dictionary. Overlapping: The last character of a pattern is the first character of the next pattern.

10. Algorithm:

Initialize Dictionary with 256 single character strings and their corresponding ASCII codes; Prefix first input character; CodeWord 256; while(not end of character stream){ Char next input character; if(Prefix + Char exists in the Dictionary) Prefix Prefix + Char; else{ Output: the code for Prefix; insertInDictionary( (CodeWord , Prefix + Char) ) ; CodeWord++; Prefix Char; } } Output: the code for Prefix; Example : Compression using LZW Encode the string BABAABAAA by the LZW encoding algorithm. 1. BA is not in the Dictionary; insert BA, output the code for its prefix: code(B) 2.

AB is not in the Dictionary; insert AB, output the code for its prefix: code(A) 3. BA is in the Dictionary. BAA is not in Dictionary; insert BAA, output the code for its prefix: code(BA) 4. AB is in the Dictionary. ABA is not in the Dictionary; insert ABA, output the code for its prefix: code(AB) 5. AA is not in the Dictionary; insert AA, output the code for its prefix: code(A) 6. AA is in the Dictionary and it is the last pattern; output its code: code(AA) Compressed message: The compressed message is: <66><65><256><257><65><260> LZW: Number of bits transmitted

11. Decoding algorithm: Initialize Dictionary with 256 ASCII codes and corresponding single character strings as their translations; PreviousCodeWord first input code; Output: string(PreviousCodeWord) ;

Char character(first input code); CodeWord 256; while(not end of code stream){ CurrentCodeWord next input code ; if(CurrentCodeWord exists in the Dictionary) String string(CurrentCodeWord) ; else String string(PreviousCodeWord) + Char ; Output: String; Char first character of String ; insertInDictionary( (CodeWord , string(PreviousCodeWord) + Char ) ); PreviousCodeWord CurrentCodeWord ; CodeWord++ ; } Summary of LZW decoding algorithm: output: string(first CodeWord); while(there are more CodeWords){ if(CurrentCodeWord is in the Dictionary) output: string(CurrentCodeWord); else utput: PreviousOutput + PreviousOutput first character; insert in the Dictionary: PreviousOutput + CurrentOutput first character; } Example : LZW Decompression Use LZW to decompress the output sequence <66> <65> <256> <257> <65> <260> 1. 66 is in Dictionary; output string(66) i. e. B 2. 65 is in Dictionary; output string(65) i. e. A, insert BA 3. 256 is in Dictionary; output string(256) i. e. BA, insert AB 4. 257 is in Dictionary; output string(257) i. e. AB, insert BAA 5. 65 is in Dictionary; output string(65) i. e. A, insert ABA 6. 60 is not in Dictionary; output previous output + previous output first character: AA, insert AA

Reference

  1. http://www.sqa.org.uk/e-learning/BitVect01CD/page_86.htm
  2. http://www.gukewen.sdu.edu.cn/panrj/courses/mm08.pdf
  3. http://www.cs.cmu.edu/~guyb/realworld/compression.pdf
  4. http://www.stoimen.com/blog/2012/01/09/computer-algorithms-data-compression-with-run-length-encoding/
  5. http://www.ics.uci.edu/~dan/pubs/DC-Sec1.html#Sec_1
  6. http://www.prepressure.com/library/compression_algorithms/flatedeflate
  7. http://en.wikipedia.org/wiki/Data_compression

Cite this Page

Data Compression and Decompression Algorithms. (2016, Dec 21). Retrieved from https://phdessay.com/data-compression-and-decompression-algorithms/

Don't let plagiarism ruin your grade

Run a free check or have your essay done for you

plagiarism ruin image

We use cookies to give you the best experience possible. By continuing we’ll assume you’re on board with our cookie policy

Save time and let our verified experts help you.

Hire writer