Friday, 16 November 2012

Huffman Coding in Python

Here is a simple explanation for the code to encode and decode the string which you have entered by using Huffman data compression. Huffman coding is a lossless data compression based on variable-length code table for encoding a source symbol where the variable-length code table has been derived in a particular way based on the estimated probability of occurrence for each possible value of the source symbol.
                          Huffman coding uses a specific method for choosing the representation for each symbol, resulting in a prefix code that expresses the most common source symbols using shorter strings of bits that are used for less common source symbols.
    The main part of code is the tree where on each instance creates a leaf node or the internal node which depends on the parameters you are passing.
  Here I have created a tree data structure which as a value named "cargo" then a "left" and a "right" which will be "None" when the node is a leaf node.

class Tree:
  def __init__(self, cargo, left=None, right=None):
    self.cargo = cargo
    self.left  = left
    self.right = right

The creation of tree is a bottom up procedure by starting with leaf nodes
left=Tree(1)
right=Tree(2) 
Then we can assign an intermediate node by using left and right as the second and third argument for Class Tree
Tree(1,left,right)
This Tree structure can be called recursively to get a multilevel binary tree
 Tree(1,Tree(1),Tree(2))
Here is an step by step description for the huffman code:

Step  1: First create a Tree data structure as explained earlier

Step  2: Read the string character by character and create a List of tuples which   contains the character and its number of occurrence which is its weight.

Step  3: The tuples can be made the leaf node by creating instance for each tuple without specifying the left and right parameters.

Step  4: After creating leaf node sort list of nodes with weights as the key. This can be done by using sorted method passing key as second element of the tuple

Step  5: Pop least two elements (tree) and add the value and the concatenate the characters.

Step  6: Again sort to get the least two elements.

Step  7: Repeat the process from 4 to 6 until all the characters without repetition are received

By previous 7 steps we have created the Tree for the given string. Now by using the Tree we can traverse to encode the string.

Algorithm for function encode

Step  1: Pass the tree, empty string and each characters to the function
Step  2: Check weather the code is in the root if "yes" then check the left or right node traverse the tree as per the condition gets true until reaches the leaf node.

Step  3: For the left node traversal concatenate the empty string with "0" and for the right concatenate "1"

Step  4: The functions returns the code when reaches the leaf node.

Algorithm for function decode:

Step  1: Pass the tree, the code and the empty string to the function.

Step  2: The copy for the tree is created so that after finding each characters then the tree can be reassigned so as to traverse for the next symbol.

Step  3: The statements are similar as that of decoding but the condition checks for each 1's and 0's in the string.

Step  4: For finding character of each code the empty string is joined.

Step  5: returns the string for the corresponding code.  

This simple code can be tested for the huffman code for a string or a sentence. The checking for decode can be found from the tree for which it has been used for encoding.For each character a unique code is assigned and that code can be used to decode to get the corresponding characters.



                                                   Thank you

No comments:

Post a Comment