Huffman code.

First of all, consult a good book on discrete mathematics or algorithms for a detailed description of Huffman codes!

We suppose a set of symbols with their frequencies, given as a list of fr(S,F) terms. Example: [fr(a,45),fr(b,13),fr(c,16),fr(e,9),fr(f,5)]. Our objective is to construct a list hc(S,C) terms, where C is the Huffman code word for the symbol S. In our example, the result could be Hs = [hc(a,'0'),hc(b,'101'),hc(c,'100'),hc(e,'1101'),hc(f,'1100')][hc(a,'10'),...etc.]. The task shall be performed by the predicate huffman/2 defined as follows:

% huffman(Fs,Hs) :- Hs is the Huffman code table for the frequency table Fs

Huffman code

Huffman coding uses a specific method for choosing the representation for each symbol, resulting in a prefix code (sometimes called "prefix-free codes", that is, the bit string representing some particular symbol is never a prefix of the bit string representing any other symbol). Huffman coding is such a widespread method for creating prefix codes that the term "Huffman code" is widely used as a synonym for "prefix code" even when sich a code is not produced by Huffman's algorithm.

Informal description

Given A set of symbols and their weights (usually proportional to probabilities). Find A prefix-free binary code (a set of codewords) with minimum expected codeword length (equivalently, a tree with minimum weighted path length from the root).

Formalized description

Input Alphabet , which is the symbol alphabet of size . Tuple , which is the tuple of the (positive) symbol weights (usually proportional to probabilities), i.e. .

Output. Code , which is the tuple of (binary) codewords, where is the codeword for .

Goal. Let be the weighted path length of code . Condition: for any code .

单元测试

def test_huffman(): assert huffman([ ('a', 45), ('b', 13), ('c', 12), ('d', 16), ('e', 9), ('f', 5) ]) == [ ('a', '11'), ('b', '011'), ('c', '010'), ('d', '10'), ('e', '001'), ('f', '000') ]

解题思路

首先,将符号按权重排序。然后,构造一个二叉树,使权重小的符号处于权重大的符号的下层。最后,从树的根开始遍历,每遍历一层就在前缀一位代码。

举个例子,假设有符号a,b,c,d,e,f,其权重分别为45,13,12,16,9,5,即符号权重对列表[(a,45),(b,13),(c,12),(d,16),(e,9),(f,5)]

首先,按权重从小到大排序,得到[(f,5),(e,9),(c,12),(b,13),(d,16),(a,45)]

然后,构造二叉树。当使用多层嵌套的列表存储二叉树时,二叉树的构建就等于递归合併列表中相邻的两个元素为二叉树,直至仅剩一个元素即根。

先将f和e、c和b、d和a分别合併成二叉树,得到了三棵二叉树;

然后,将头两毎棵二叉树合併,第三棵则落单。结果得到两棵二叉树;

最后,将两个二叉树合併,得到最终的一棵二叉树。

最后,后序遍历二叉树。每向下一层就加一位代码,向左子树遍历加0,向右子树遍历则加1。以(d,16)为例,从根开始到达(d,16)需:

  1. 先访问根的左子树,所以第一位代码为0
  2. 再访问左子树,代码再加一位0,为00
  3. 到达(d,16),所以最终代码为00

代码实现

# Huffman code def huffman(fs): sortedFs = sorted(fs,key=lambda f: f[1]) tree = constructTree(sortedFs) return sorted(traversal(tree[0]),key=lambda x:x[0]) def constructTree(sortedList): if len(sortedList) ==1: return sortedList result =[] for i in range(0, len(sortedList),2): e = sortedList[i] if i+1 >= len(sortedList): result.append(e) else: result.append([e,sortedList[i+1]]) return constructTree(result) def traversal(tree): if type(tree) is tuple: # leaf return [(tree[0],'')] leftHs = [] rightHs = [] if len(tree) > 0: # has left child leftHs = [(lambda e: (e[0],'0'+e[1]))(e) for e in traversal(tree[0])] if len(tree) > 1: # has right child rightHs = [(lambda e: (e[0],'1'+e[1]))(e) for e in traversal(tree[1])] return leftHs + rightHs

参考文献

results matching ""

    No results matching ""