Network Systems DesignLine | Compression/decompression tradeoffs for data networking and storage

Get the latest news, products and how-to information on network systems. Sign up for the Network Systems DesignLine newsletter, a weekly e-mail guide dedicated to the needs of engineers developing networking equipment and components. Here is our RSS feed.








 Network Systems DesignLine » How-To » Enterprise Networking

 
 HOW-TO : Enterprise Networking

Compression/decompression tradeoffs for data networking and storage

Several design trade-offs to exist when building a high performance lossless data compression engine. Each of these trade-offs can vary the gate area of the end design greatly and have significant impact on the overall efficiency of the compressor (compression ratio).
Print This Story Send As Email Discuss This Story Reprints

Page 1 of 2

Network Systems Designline

Rate this article
WORSE | BETTER
1 2 3 4 5
Hardware vs. software compression
Compression, as the name implies, squeezes or "compresses" the size of a file or data set. Compression techniques are used for voice, video, audio, text, and program data in hundreds of different applications and product types. Much of the compression and subsequent decompression processing performed in products today is accomplished by a CPU running any number of different algorithms that have been specifically designed to efficiently reduce the size of a file or data set.

Unfortunately, all compression algorithms require the entire file to be examined and processed. Scanning through a file, especially a large file, is something that CPUs are generally not that good at doing. For applications that do not require compression and decompression to be accomplished particularly fast, a CPU is sufficient. When it comes to higher performance applications such as those involved with Internet file transfers or large data backups, CPU-based compression is not sufficient.

Hardware compression offers a significant improvement in the rate at which data can be compressed and decompressed. For example, benchmarks of the popular GZIP data compression routine on an 3Ghz Pentium class CPU result in maximum data rates of approximately 200Mb/s. Comparatively, hardware compression can achieve data rates of 2Gb/s or greater. With a 10X or more speedup in compression processing, it's clear that dedicating a hardware engine or co-processor to perform this function will result in much greater performance.

There is typically an overhead associated with deploying dedicated processing hardware, however. This overhead is most often observed in the form of additional power and area. There are a number of trade-offs that designers can make when deploying dedicated compression hardware that will help minimize the overhead while achieving the desired improvements in compression processing performance. This paper examines some of the specific trade-offs that a designer can make when deploying hardware for lossless data compression.

Lossless data compression
Lossless data compression is a form or class of compression algorithms that allows a stream of data bits to be compressed and then later uncompressed such that the original data stream is fully recovered. Lossless data compression is frequently used in data networking and storage applications where exact reconstruction of the original data is critical to the application. There are several examples of lossless data compression utilities with one of the most popular examples being GNU ZIP (GZIP). The GZIP utility and many other similar utilities such as PKZIP, WINZIP, and PNG are based on the "deflate" algorithm with the format defined in Internet Engineering Task Force RFC1951: DEFLATE Compressed Data Format Specification version 1.3 . The standard references the use of the LZ77 (Lempil-Ziv, 1977) compression algorithm combined with Huffman encoding.

The deflate standard is in contrast to lossy compression techniques that are often used in audio and video decoders where human perception of potential data loss is leveraged to improve the compression efficiency. Popular compression algorithms such as MPEG, JPEG, and MP3 are examples of lossy compression.

How "deflate" works
Compression using deflate is performed on a block-by-block basis. Uncompressed blocks can be an arbitrary length up to 64K bytes long. The algorithm uses a sliding 32K byte window to examine the data stream for repeating patterns. The sliding window is often referred to as the history window, or history table, and may cross block boundaries. When a character string matches a pattern previously detected in the 32KB window, the string is replaced with a pointer representing the backward distance of the matching string in the history table, and the length of the matching string. The matched strings are limited to a maximum of 258 bytes in length and substitution only occurs for strings of length greater than three bytes. When a matching string is not found, the original byte, called a literal, is left in the compressed data stream. The resulting output of this compression step is a series of literals and length/distance pairs (See Figure 1).


Figure 1. LZ77 history-based compression

Huffman trees are a very bit-efficient mechanism for encoding this information. Huffman encoding is used to encode the literals as well as the lengths and distances and uses the rate of occurrence of a particular symbol to define a unique variable-length bit code. Two Huffman tables are used in the deflate algorithm: one to encode the lengths and literals and the other to encode the distance values. The two Huffman tables themselves are encoded (compressed) using Huffman encoding and the resulting Huffman tables are pre-pended to the compressed data stream to form the compressed block output of a deflate- compliant compressor. The Huffman tables generated for any one compressed block are independent of the previous block's Huffman tables.

Where "deflate" is used While the deflate algorithm's compression efficiency is variable based on the uncompressed data stream, common benchmarks using text-based files show compression ratios in the 2.5:1 to 3.5:1 range. Compression of this magnitude has clear operational advantages in both storage and data networking.

In storage, the reduction of the stored data file size corresponds to a reduction in the media sizes required to store the file. This is especially useful in data storage backup systems such as tape backup, VTL (virtual tape libraries), disk backup systems and Flash-based disk drives.

In data networking, the reduction of transmitted file size corresponds directly to a reduction in the network bandwidth required to transmit the file. This is useful in Web servers, Web server appliances, and even routers where HTTP (hypertext transfer protocol) compression is employed. HTTP compression provides for the dynamic negotiation of compression to be used between a Web client and a Web server. Enterprise networks with remote offices using low bandwidth connections can benefit greatly from using HTTP compression.



Page 2: next page Print This Story Send As Email Discuss This Story Reprints

Page 1 | 2


 
eSearch  

 Top 5 Most Read
 How-To Stories
1. 2. 3. 4. 5.

 Top 5 Most Read
 News Stories
1. 2. 3. 4. 5.

  • Introduction to Optical Transmission Systems

  • Optimizing Embedded Systems for Broadband 10 Gigabit Ethernet Connectivity

  • Interfacing a DS3231 with an 8051-Type Microcontroller

  • The entire library >>  

     
     Top 5 Most Read
     Product Stories
    1. 2. 3. 4. 5.

     Sponsor

    EE Times TechCareers
    Search Jobs

    Enter Keyword(s):


    Function:


    State:
      

    Post Your Resume
    -----------------
    Employers Area
    Most Recent Posts More career-related news, resources and job postings for technology professionals

     Tech Library
    ¤ Looking for the appropriate Industry Association? This comprehensive, up-to-date list will take you to the right Web site for the help you need.

    ¤ Got a question about a standard? Here are direct links to resources detailing the industry's most important communications standards.

    ¤ Freshen up on technology, new and old, with these links to interesting and informative tutorials.

    More from TechLibrary

    Welcome to our DesignLine network of web communities. On these sites, we provide practical how-to technical information for engineers and engineering managers involved in Automotive,audio, DSP, DTV, EDA, Industrial Control, Mobile Handset, Power Management, Programmable Logic,RF,Video, and Wireless networking design. Check out the sites and let us know your thoughts.
     



    Career Center | CommsDesign.com | Embedded.com | EE Times | TechOnline
    Planet Analog | DeepChip | eeProductCenter | Electronic Supply & Manufacturing | Webinars