Sort and Swap

Sort and Swap technique is used to make smaller ASCIIs value occur more often then bigger ASCIIs value.

This technique is also supposedly making file content more compressible like Flattening, the compression methods which take advantage of this technique are Variable Bit Count (VBC) and Grouping. The other methods, such as RLE, Huffman, Sliding Window, and LZW do not take any advantage of the this technique, because they do not take advantage of smaller ASCII value.

This technique is simple to encode and decode, but it has a small dictionary or bytes to be stored in the resulted target, this is used for later in decoding.

The encoding and decoding algorithm is very simple, just a few steps:

Encoding algorithm:
Example (ASCII): 2 - 1 - 2 - 3 - 0 - 2 - 3 - 1 - 1 - 2 = 10 bytes

  1. Get the Frequency of all ASCII.
  2. Original ASCII Frequency (occurrence)
    0 1
    1 3
    2 4
    3 2

  3. Sort the Frequency and the ASCII according to the frequency, highest on top.
  4. Sorted ASCII Sorted Frequency
    2 4
    1 3
    3 2
    0 1

  5. Use the sorted ASCII to make a swap ASCII table.
  6. Sorted ASCII Swap ASCII table
    2 0
    1 1
    3 2
    0 3

  7. Loop all the way through content, replace all occurrence of the SortedASCII with Swap ASCII table
    Original ASCII New ASCII Value (replaced)
    2 0
    1 1
    2 0
    3 2
    0 3
    2 0
    3 2
    1 1
    1 1
    2 0

  8. Now the new ASCII value (new updated content) frequency looks like this:
  9. Original ASCII Frequency (occurrence)
    0 4
    1 3
    2 3
    3 1

  10. Store the Sorted ASCII as dictionary for decoding later, so our new file content is =
    Dictionary + New ASCII Value =
    2 - 1 - 3 - 0 + 0 - 1 -0 - 2 - 3 - 0 - 2 -1 - 1 - 0 = 4 + 10 = 14 bytes

Decoding algorithm:

  1. Get the dictionary (in this example is 4 bytes, put a dictionary length if needed).
  2. Get the content bytes (in this example is 10 bytes, put a content length if needed).
  3. Direct replace the content according dictionary.
    Every 0 replace by 2,
    Every 1 replace by 1,
    Every 2 replace by 3,
    Every 3 replace by 0.

That's it, very easy really.

You got any question about this technique ? or maybe suggestion ? please contact me.

 


Author Site Map Disclaimer
HMaxF Ultimate Recursive Lossless Compression Research
2001 - 2003 (c) All Rights Reserved.