JPEG baseline compression

and a DAKX comparison

By far the most common type of JPEG compression in current use is the 1992 CCITT (Consultative Committee International on Telephone and Telegraphy, now called ITU-T) T.81 standard "Baseline Sequential" method. The T.81 standard, available in engineering libraries or by mail from any of the member-country standards organizations (ANSI in the US) provides for variable-quality compression of 8 or 12 bit color or black and white images. Practically all continuous-tone images on the World Wide Web are compressed using this method. A lossless compression method is also specified in the T.81 standard, which has not proved to particularly popular. Recently a new "low complexity" lossless method has been specified, based on patented arithmetic techniques.

In the baseline sequential method, an image is broken into tiles, 8 pixels wide and tall. Each 64 pixel tile is fourier transformed to obtain 64 fourier coefficients. If these coefficients were transmitted and reverse-transformed at the receiving end, the original tile would be reconstructed exactly. In most natural images, however, many of the coefficients will be nearly zero. Replacement of these by zero will have little noticeable effect on the reconstructed tile. Also, some of the coefficients are more important than others. For example, the brightness (the DC coefficient) or shading across the tile (the 3 lowest AC coefficients) must be reproduced fairly accurately, whereas the texture (middle frequency AC coefficients) can be represented less accurately and the very fine detail (highest frequency AC coefficients) is often invisible to the casual observer of the image.

The coefficients are made more equally important by dividing each by a specified "quantizer". The quantizers, reflecting both the "psychovisual sensitivity curve" and the desired image quality, can be the defaults established for each color component of an image, or custom quantizers can be included with each image. Larger quantizers will result in greater compression but poorer reconstructed image quality. The T.81 standard makes no recommendations regarding quality, and hence does not specify any default quantizer tables.

The coefficients are then ordered in increasing frequency, the so-called "zig-zag" order. This is also somewhat in order of decreasing importance, even after the quantizing stage, so that transmission or storage of only the first few coefficients can result in a recognizable image. Provision is made for "progressive update" of images by first transmitting only the lower coefficients for all tiles in the image, then successively sending middle and upper "frequency bands" of coefficients, each of which progressively improves the image quality.

The T.81 standard specifies that these coefficients will be "entropy encoded" through either Huffman or "Arithmetic" encoding. The intent here is to indicate that the coding is to be lossless, and not, for example, lossy DAPCM. 3.1.56 of the standard defines entropy encoding as "A lossless procedure which converts a sequence of input symbols into a sequence of bits such that the average number of bits per symbol approaches the entropy of the input symbols." Entropy is not further defined, and must be presumed to mean the entropy calculated from the symbol probabilities within each single image, independently of their ordering. Greater compression may be possible through lossless symbol-order-dependent "non-entropy" encoding, e.g. LZW, external dictionaries, prediction-correction, or additional transformation (DAKX second-differencing). However, the standard was designed to allow low-cost hardware implementations, precluding most of these.

The DC coefficient for each tile is coded separately from the AC coefficients, in the following manner. The DC coefficient from the previous tile is first subtracted from it (a "backward difference"). Since the DC coefficients are usually highly correlated, this reduces the entropy of the DC data stream (within the class of images usually seen, in any pixel ordering). The result is a series of numbers, e.g., 1,2,-1,0,2,3,-3,..., one number for each tile in the image, which is to be compressed with a lossless entropy encoding method. Consider the following table from the T.81 standard:

                                   TABLE F.1

       ...-8  -7  -6  -5  -4  -3 -2 -1 0 1  2  3  4  5  6  7 ...
    --+----------------------------------------------------------
    0 |                                0
    1 |                             -1 1
    2 |                          -3 -2 2 3
    3 |                 -7  -6 -5 -4 4 5  6  7
    4 |  -15 -14 -13 -12 -11 -10 -9 -8 8 9 10 11 12 13 14 15
    5 |                         ...
    ...                     ...
whose rows are to be extended up to the number 11. Any positive or negative integer in the range -2047 to +2047 can be specified using its row and column index in this table. For example, -5 is located at position (3,-2). The coefficients to be encoded are mapped through this table. The above example gives
     1  is at  (1, 0) in binary (0001,00000000000)   or   (0001,0)
     2    "    (2, 0)      "    (0010,00000000000)    "   (0010,00)
    -1    "    (1,-1)      "    (0001,11111111111)    "   (0001,1)
     0    "    (0, 0)      "    (0000,00000000000)    "   (0000,)
     2    "    (2, 0)      "    (0010,00000000000)    "   (0010,00)
     3    "    (2, 1)      "    (0010,00000000001)    "   (0010,01)
    -3    "    (2,-2)      "    (0010,11111111110)    "   (0010,10)
In the third column of numbers, the first unsigned binary number can be seen to be a 4 bit PCM code for the minimum number of bits required to transmit all potential values of the second signed 12 bit binary number. The last column reflects this fact by retaining only the necessary number of bits in the second number. Under the baseline method, the first number is additionally compressed using either a default Huffman table, or optionally one provided with the image. The default table (for luminance DC coefficients) has the mapping:
    0000 ->  00
    0001 ->  010
    0010 ->  011
    0011 ->  100
    0100 ->  101
    0101 ->  110
    0110 ->  1110
    0111 ->  11110
    1000 ->  111110
    1001 ->  1111110
    1010 ->  11111110
    1011 ->  111111110
It can be seen that the default table gives additional compression, relative to the 4 bit PCM code, only when the numbers involved are most often contained in the first six rows of the table. The amount of compression is small, 2 bits for a row index of zero and 1 bit for rows 1 through 5. If very many coefficients are located above row 6 there will be no net compression and indeed a sizeable increase can occur.

The Huffman codes for the table row are now paired with the stripped binary number representing the table column index to generate the final compressed DC bitstream, in this case:

      010 0 011 00 010 1 00 011 00 011 01 011 10
     (...1. ...2.. ..-1. .0 ...2.. ...3.. ..-3..)
or 30 bits to represent these seven coefficients.

Each coefficient is thus represented by a pair of codes in the data stream, the first being a Huffman code translateable to a 4 bit PCM code, and the second an n-bit PCM code, where n is specified by the first 4 bit PCM code. The T.81 standard calls the first in each pair the "Huffman code" and the second the "extra bits". They are sometimes called, rather confusingly, the "variable length code" and the "variable length integer".

Note that most of the compression occurs in the transition to the fourth column above, through the removal of the excess sign bits of the column index. The Huffman table mapping in this example has saved an additional eight bits compared to a simple pairing of the 4 bit row index with the "extra bits" column index. In practice most JPEG images are compressed with medium-to-low qualities and rarely require indexing above row 7. In these cases the row index could be contained in 3 bit PCM and Huffman encoding would save very little indeed. On the other hand, if all the row indices are required with roughly equal probabilities, such as would occur with higher qualities or images composed of 12 bit pixels, Huffman encoding will again provide no benefit and probably increase the number of bits required. The T.81 standard itself warns that Huffman coding may not provide compression for 12 bit images.

Under DAKX compression with typical rules, the bitstream encoding these same DC coefficients might be

(Encode 1,2,-1,0,2,3,-3 with an initial field width of 2, minimum width 2)

    01  (1, keep width 2, the minimum)
    10  (expand code, increment width to 3)
    010 (2, keep width 3, since 2 can not be represented in 2 bits)
    111 (-1, width drops to 2, since -1 could be represented in 2 bits)
    00  (0, width stays at 2, the minimum width)
    10  (expand code, width now 3)
    010 (2, width stays at 3, since 2 can not be represented in 2 bits)
    011 (3, width stays at 3, since 3 can not be represented in 2 bits)
    101 (-3, width stays at 3, since -3 can not be represented in 2 bits)
which requires only 23 bits to represent the same information.

The AC coefficients are processed in a similar fashion through F.1 but with a "two-dimensional" Huffman code specifying, besides the row index, a run length of preceding zero coefficients. With average-quality images this gives very efficient compression, as many of the AC coefficients will zero. When higher quality is required, e.g. the ability to view magnified areas without detecting the block structure, this rapidly becomes less efficient, often becoming worse than no compression at all. For each image there is usually a quality above which the lossless methods are favorable.