Next generation image compression standards: JPEG XR and AIC

1
Multimedia Signal Processing Group
Swiss Federal Institute of Technology
Next generation image compression standards: JPEG XR and AIC
Touradj Ebrahimi
Touradj.Ebrahimi@epfl.ch
Invited paper at: Mobile Multimedia/Image Processing, Security, and Applications 2009, SPIE Defense, Security and Sensing Symposium, Orlando, FL, April 13-17, 2009.

2
Outline of the presentation
• Introduction
• Overview of JPEG XR
• Codec evaluations and discussions
• Overview of AIC

3
Introduction
• JPEG image compression has been among the most successful standards ever:
– Used in digital photography, Internet imaging, …
– Based on a 25-year-old technology (DCT, Huffman coding, …)
• JPEG 2000 image compression has been proven to be the most efficient image compression standard ever:
– Used in digital cinema, remote sensing, …
– Based on a more recent technology (Wavelets, Arithmetic coding, …)
• Why yet another image compression standard?

4
JPEG XR overview
• Under standardization by JPEG committee
(currently at FDIS level)
• Starting point is Microsoft’s proprietary
compression scheme HD Photo
• Mainly aimed at digital photography applications
where JPEG 2000 penetration has been limited
due to complexity reasons
• Emphasis on High Dynamic Range (HDR)
imaging

5
JPEG XR positioning vs JPEG and JPEG 2000
[Figure: complexity versus performance chart positioning JPEG, JPEG 2000 and JPEG XR]

6
JPEG XR algorithm vs JPEG and JPEG 2000

7
JPEG XR compression algorithm
• JPEG XR is built around 2 central innovations
1. Reversible lapped biorthogonal transform (LBT)
2. Advanced coefficient coding
• Otherwise, it uses rather “traditional” components
– Color conversion
– Transform
– Quantization
– Coefficient prediction
– Coefficient scanning
– Entropy coding

8
JPEG XR transform
• JPEG XR transform consists of 2 building blocks
– Core transform (PCT) which is similar to a 4x4 DCT
– Overlap transform (POT) which is shifted to reduce blockiness
[Figure: diagram of adjacent regions, each labelled "4x4 block"]

9
JPEG XR transform
• Hierarchical transform
– Two transform stages – second stage operates on DC of 4x4 blocks of first stage
• 3 overlap modes supported
– Mode 0 or no overlap, which reduces to 2 stages of 4x4 core (block) transform
▪ Lowest complexity mode
– Mode 1 – overlap operator applied only to full resolution data, and not to second stage
▪ Best R-D performance
– Mode 2 – overlap operator applied at both resolutions
▪ Best visual quality at very high compression
[Figure: two-stage transform diagram; a transform T followed by a second transform T of the DC coefficients]

10
JPEG XR quantization
• “Harmonic” quantization scale
– Defined by the set $\{Q\} \cup \{(Q+16) \cdot 2^k\}$, for Q = 0…15
• Transform coefficients are equal norm by design
– No additional scaling required during (de)quantization to compensate for unequal norms
[Figure: QP remapping curve; logarithmic axis from 1 to 10000, linear axis from 0 to 180]
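The harmonic scale can be sketched in a few lines of Python. The function name `quant_scale` and the way an index is mapped onto the set (indices below 16 map to themselves, every further group of 16 doubles the step) are assumptions chosen to be consistent with the set given on the slide; the normative remapping is the one in the JPEG XR specification.

```python
def quant_scale(qp_index: int) -> int:
    """Map a QP index to a quantization scale on a harmonic grid.

    Assumption (not taken from the slides): indices below 16 map to
    themselves, and each further group of 16 indices doubles the step
    between consecutive scale values.
    """
    if qp_index < 0:
        raise ValueError("qp_index must be non-negative")
    if qp_index < 16:
        return qp_index              # the {Q} part, Q = 0..15
    q = qp_index % 16                # Q = 0..15
    k = qp_index // 16 - 1           # exponent k = 0, 1, 2, ...
    return (q + 16) << k             # the {(Q+16) * 2^k} part


if __name__ == "__main__":
    for index in (0, 15, 16, 31, 32, 47, 48):
        print(index, quant_scale(index))
```

Under this assumed mapping the scale grows linearly inside each group of 16 indices and doubles from group to group, which gives a roughly straight line on a logarithmic axis.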

11
JPEG XR coefficient prediction
• JPEG XR uses DCAC prediction
– Three levels of prediction
1. Prediction of DC values of second stage transform (DC subband)
2. Prediction of DCAC values of second stage transform (lowpass subband)
3. Prediction of DCAC values of first stage transform (highpass subband)

12
JPEG XR coefficient prediction
• Coefficient prediction rules
– DC prediction
▪ Null, top, left and mixed (mean of top and left) allowed
▪ Only within-tile prediction
– Lowpass prediction
▪ Null, top and left allowed
▪ Top and left need dominant edge signatures to be picked
– Highpass prediction
▪ Null, top and left allowed
▪ Dominant direction indicated by lowpass values
▪ Only within-macroblock prediction
[Figure: left prediction of highpass DCAC, showing within-macroblock prediction]
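As an illustration of the four DC prediction options, the sketch below applies a predictor once a mode has been chosen. The names `predict_dc` and `dc_residual` are hypothetical, the mode decision itself is not modelled, and the floor rounding of the mean is an assumption rather than something stated on the slide.

```python
def predict_dc(mode: str, top: int, left: int) -> int:
    """Return the DC predictor for one block.

    mode is one of "null", "top", "left", "mixed" (the four options
    listed for DC prediction). How the encoder picks the mode is not
    covered here, and the rounding of the mean is an assumption.
    """
    if mode == "null":
        return 0
    if mode == "top":
        return top
    if mode == "left":
        return left
    if mode == "mixed":
        return (top + left) >> 1   # mean of top and left, floor rounding (assumed)
    raise ValueError(f"unknown prediction mode: {mode}")


def dc_residual(dc: int, mode: str, top: int, left: int) -> int:
    """Value left to code after prediction: the DC minus its predictor."""
    return dc - predict_dc(mode, top, left)
```

For example, with neighbours `top=10` and `left=20`, the mixed predictor is 15, so a DC value of 17 leaves a residual of 2.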

13
JPEG XR coefficient scanning
• The process of converting the 2D transform into a linear encodable list
– Also referred to as zigzag scan
• JPEG XR scan order is adaptive
– Changes as data is traversed
– Around 3% savings in bits over fixed scan orders
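One simple way to make a scan order adapt to the data is to count how often each scan slot carries a nonzero coefficient and to move a slot one step forward once it overtakes its predecessor. The sketch below shows that idea only; `make_scanner` and its update rule are illustrative assumptions and are not claimed to be the exact JPEG XR procedure.

```python
def make_scanner(initial_order):
    """Return (scan, order, counts) for a simple adaptive scan.

    order  : current list of coefficient positions, most likely first
    counts : how often each scan slot held a nonzero coefficient
    """
    order = list(initial_order)
    counts = [0] * len(order)

    def scan(block):
        """Linearize `block` (indexable by position) in the current order."""
        out = [block[pos] for pos in order]
        for k, value in enumerate(out):
            if value != 0:
                counts[k] += 1
                # Promote a slot once it is nonzero more often than
                # its predecessor (move it one step forward).
                if k > 0 and counts[k] > counts[k - 1]:
                    order[k - 1], order[k] = order[k], order[k - 1]
                    counts[k - 1], counts[k] = counts[k], counts[k - 1]
        return out

    return scan, order, counts
```

Because the decoder sees the same coefficients in the same order, it can apply the same update and stay synchronized without any side information.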

14
JPEG XR entropy coding
• Adaptive coefficient normalization
– Handles high-variance transform coefficient data
– Key observation:
▪ X = Exponential(λ): H = 4.4393 bits
▪ X/8: H = 1.4970 bits
▪ X mod 8: H = 2.9423 bits
– Why not use fixed length codes for X mod 8?
▪ Alphabet of 8 ⇒ H = 3 bits
▪ Overall entropy = 1.4970 + 3 = 4.4970 bits
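The observation can be checked numerically with a short Python sketch. The source parameter is not given on the slide, so `ratio`, `size` and `base` below are illustrative assumptions, and the printed values will differ from the slide's figures unless the same parameter is used.

```python
import math


def entropy(pmf):
    """Shannon entropy in bits of a probability mass function."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)


def split_entropies(ratio=0.95, size=4096, base=8):
    """Entropies of X, X // base and X % base for a geometric X.

    `ratio`, `size` and `base` are illustrative assumptions; the slide
    does not give the parameter of its exponential source.
    """
    weights = [ratio ** k for k in range(size)]
    total = sum(weights)
    pmf = [w / total for w in weights]      # truncated geometric pmf

    quotient = [0.0] * (size // base)       # pmf of X // base
    remainder = [0.0] * base                # pmf of X % base
    for k, p in enumerate(pmf):
        quotient[k // base] += p
        remainder[k % base] += p
    return entropy(pmf), entropy(quotient), entropy(remainder)


if __name__ == "__main__":
    h_x, h_q, h_r = split_entropies()
    print(f"H(X) = {h_x:.4f}, H(X // 8) = {h_q:.4f}, H(X mod 8) = {h_r:.4f}")
    print(f"entropy-coded quotient + 3 raw bits = {h_q + 3:.4f}")
```

The remainder is close to uniform, so sending it as 3 raw bits costs only a small fraction of a bit more than entropy coding it, while the entropy coder only has to handle the much smaller quotient alphabet.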

15
• Adaptive coefficient normalization
– Triggered when nonzero transform coefficients happen frequently
– When triggered, additional bit stream layer “Flexbits” is generated
▪ Flexbits is sent “raw”, i.e. uncoded
▪ For lossless 8 bit compression, Flexbits may account for more than 50% of the total bits
▪ Flexbits forms an enhancement layer which may be omitted or truncated
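The split into an entropy-coded part and raw Flexbits can be pictured as separating the low-order bits of each coefficient magnitude. The sketch below is a simplified illustration: `split_coefficient`, `merge_coefficient`, the parameter `flex_bits` and the sign handling are assumptions, not the normative JPEG XR procedure.

```python
def split_coefficient(value: int, flex_bits: int):
    """Split |value| into an entropy-coded part and raw low-order bits.

    Returns (sign, high, low): `high` would go to the entropy coder,
    `low` (flex_bits wide) would be written uncoded.
    """
    sign = -1 if value < 0 else 1
    magnitude = abs(value)
    high = magnitude >> flex_bits
    low = magnitude & ((1 << flex_bits) - 1)
    return sign, high, low


def merge_coefficient(sign: int, high: int, low: int, flex_bits: int) -> int:
    """Inverse of split_coefficient; pass low=0 if the raw bits were dropped."""
    return sign * ((high << flex_bits) | low)
```

Calling `merge_coefficient` with `low=0` mimics a decoder that skips the enhancement layer: the coefficient is still reconstructed, only with coarser precision.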

16
SYMBOL Code 0 Code 1 Code 2 Code 3 Code 4
0 0000 1 0010 11 001 010
1 00 0001 0 0010 001 11 1
2 000 0000 00 0000 000 0000 000 0000 000 0001
3 000 0001 00 0001 000 0001 0 0001 0001
4 0 0100 0011 0 0001 0 0010 000 0010
5 010 010 010 010 011
6 0 0101 0 0011 000 0010 000 0001 0000 0000
7 1 11 011 011 0010
8 0 0110 011 100 0 0011 000 0011
9 0001 100 101 100 0011
10 0 0111 0 0001 000 0011 00 0001 0000 0001
11 011 101 0001 101 0 0001
SYMBOL Code 0 Code 1
0 010 1
1 0 0000 001
2 0010 010
3 0 0001 0001
4 0 0010 00 0001
5 1 011
6 011 0 0001
7 0 0011 000 0000
8 0011 000 0001
SYMBOL Code 0
0 10
1 001
2 0 0001
3 0001
4 11
5 010
6 0 0000
7 011
0 01 1
1 10 01
2 11 001
3 001 0001
4 0001 0 0001
5 0 0000 00 0000
6 0 0001 00 0001
SYMBOL Code 0 Code 1 Code 2 Code 3
0 1 01 0000 0 0000
1 0 0000 0000 0001 0 0001
2 001 10 01 01
3 0 0001 0001 10 1
4 01 11 11 0001
5 0001 001 001 001
0 1 1
1 01 000
2 001 001
3 0000 010
4 0001 011
SYMBOL Code 0
0 1
1 01
2 001
3 000
All code tables used in JPEG XR for coefficient and coded block pattern coding, enumerated in binary

17
JPEG XR bitstream layout
• Bitstream consists of header, index table and tile payloads
– Index table points to start of tile payloads
– Index entries can be up to 64 bits
• Bitstream laid out in spatial or frequency mode
– All tile payloads in an image are in the same mode
– In spatial mode, macroblock data is serialized left to right, top to bottom
– In frequency mode, tile payload is separated into four bands – DC, lowpass,
highpass and flexbits

18
Codec performance evaluation
• Codec performance evaluation in terms of:
– Compression efficiency.
– Computational requirements.
– Additional functionalities.
• Rate-Distortion (RD) curves = quality measure vs bit per pixel
[Figure: evaluation pipeline. The original picture is coded with JPEG, JPEG 2000 or JPEG XR; the output picture is assessed against the original by a human subject (subjective QA) or by a full-reference (FR) metric (objective QA)]

19
Status
There are not yet reliable and standard objective methods for image quality assessment (QA):
• Image and video systems complexity
• Human Visual System (HVS) complexity
• Lack of standardization
Objective QA can be performed to provide a first comparison of a wide range of conditions.
Subjective QA needs to be performed as a benchmark, to validate the results of the objective metrics.

20
Objective QA
• Test material
• Codecs and configuration parameters
• Quality metrics
• Selected results

21
Test Material – 24 bpp pictures
• Sample pictures from Thomas Richter dataset, 2 different spatial resolutions: 3888x2592, 2592x3888
• Sample pictures from Microsoft dataset, 6 different spatial resolutions: 4064x2704, 2268x1512, 2592x1944, 2128x2832, 2704x3499, 4288x2848

22
Codecs and configuration parameters
JPEG XR vs JPEG2000 vs JPEG:
• JPEG XR (DPK version 1.0):
▪ one level overlapping and two level overlapping.
▪ 4:4:4 and 4:2:0 chroma subsampling.
• JPEG 2000 (Kakadu version 6.0):
▪ default settings (64x64 code-block size, 1 quality layer, no precincts, 1 tile, 9x7 wavelet, 5 decomposition levels).
▪ rate control.
▪ no visual frequency weighting and visual frequency weighting.
• JPEG (IJG version 6b):
▪ default settings (Huffman coding).
▪ default visually optimized quantization tables.

23
Codecs and configuration parameters
Different JPEG XR implementations:
• JPEG XR DPK version 1.0:
▪ different quantization steps for different color channels (default).
▪ same quantization steps for different frequency bands (default).
• JPEG XR Reference Software version 1.0:
▪ same quantization steps for different color channels (default).
▪ same quantization steps for different frequency bands (default).
• JPEG XR Reference Software version 1.2, i.e. Thomas Richter’s version:
▪ different quantization steps for different color channels (same as DPK).
▪ different quantization steps for different frequency bands (default).
▪ new POT (leakage fix described in wg1n4660) (default).
• JPEG XR Microsoft implementation described in HDPn21 / wg1n4549:
▪ different quantization steps for different color channels (enhanced encoding techniques described in HDPn21 / wg1n4549) (default).
▪ different quantization steps for different frequency bands (enhanced encoding techniques of HDPn21 / wg1n4549) (default).
▪ new POT (leakage fix described in wg1n4660) (default).

24
Metric 1: Maximum Pixel Deviation (Linf)
• Considering RGB color space:
$$L_\infty^R = \max_{x,y} \left| Im_a^R(x,y) - Im_b^R(x,y) \right|$$
$$L_\infty^G = \max_{x,y} \left| Im_a^G(x,y) - Im_b^G(x,y) \right|$$
$$L_\infty^B = \max_{x,y} \left| Im_a^B(x,y) - Im_b^B(x,y) \right|$$
where:
Ima, Imb = pictures to compare
(Linf ∈ [0,1])
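A minimal Python sketch of this metric for one colour channel follows. The function name `linf` and the representation of a channel as a list of rows of floats normalized to [0, 1] are assumptions made for illustration.

```python
def linf(channel_a, channel_b):
    """Maximum absolute pixel deviation between two equally sized channels.

    Each channel is a list of rows of floats normalized to [0, 1].
    """
    if len(channel_a) != len(channel_b):
        raise ValueError("channels must have the same height")
    worst = 0.0
    for row_a, row_b in zip(channel_a, channel_b):
        if len(row_a) != len(row_b):
            raise ValueError("channels must have the same width")
        for pa, pb in zip(row_a, row_b):
            worst = max(worst, abs(pa - pb))
    return worst
```

The function would be called once per channel (R, G and B) to obtain the three values defined above.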

25
Metric 2: single channel PSNR
• PSNR evaluation considering:
– R, G and B components
– Y’, Cb and Cr components (ITU-R Rec. BT.601)
$$\mathrm{PSNR} = 10 \log_{10} \frac{(2^B - 1)^2}{\mathrm{MSE}}$$
$$\mathrm{MSE} = \frac{1}{MN} \sum_{y=1}^{M} \sum_{x=1}^{N} \left[ Im_a(x,y) - Im_b(x,y) \right]^2$$
where:
M, N = image dimensions
B = bit depth
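The single channel PSNR can be sketched in plain Python as follows. The names `mse` and `psnr`, the list-of-rows image representation and the convention of returning infinity for identical images are assumptions for illustration.

```python
import math


def mse(img_a, img_b):
    """Mean squared error between two equally sized single-channel images."""
    diffs = [
        (pa - pb) ** 2
        for row_a, row_b in zip(img_a, img_b)
        for pa, pb in zip(row_a, row_b)
    ]
    return sum(diffs) / len(diffs)


def psnr(img_a, img_b, bit_depth=8):
    """PSNR in dB with peak value 2^B - 1; infinite for identical images."""
    error = mse(img_a, img_b)
    if error == 0:
        return math.inf
    peak = (1 << bit_depth) - 1
    return 10.0 * math.log10(peak * peak / error)
```

For 8-bit images that differ by exactly one grey level at every pixel, the MSE is 1 and the PSNR is 20·log10(255), about 48.13 dB.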

26
Metric 3: PSNR weighted average (WPSNR)
• PSNR considering weighted summation of the PSNRs evaluated on R, G and B components or Y’, Cb and Cr components (ITU-R Rec. BT.601):
$$\mathrm{WPSNR} = w_1 \mathrm{PSNR}_1 + w_2 \mathrm{PSNR}_2 + w_3 \mathrm{PSNR}_3$$
where:
$w_1 = 1/3,\ w_2 = 1/3,\ w_3 = 1/3$, considering R, G and B components.
$w_1 = 0.8,\ w_2 = 0.1,\ w_3 = 0.1$, considering Y’, Cb and Cr components.
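As a small sketch, the weighted combination takes three per-component PSNR values that have already been computed. The names `wpsnr`, `RGB_WEIGHTS` and `YCBCR_WEIGHTS` are hypothetical; the weight values are the ones listed above.

```python
RGB_WEIGHTS = (1 / 3, 1 / 3, 1 / 3)    # R, G, B
YCBCR_WEIGHTS = (0.8, 0.1, 0.1)        # Y', Cb, Cr


def wpsnr(psnrs, weights=RGB_WEIGHTS):
    """Weighted sum of three per-component PSNR values (in dB)."""
    if len(psnrs) != len(weights):
        raise ValueError("need one weight per component")
    return sum(w * p for w, p in zip(weights, psnrs))
```

With component PSNRs of 40, 30 and 20 dB, the Y’CbCr weights give 0.8·40 + 0.1·30 + 0.1·20 = 37 dB, whereas equal weights give 30 dB.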

27
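Metric 3: PSNR weighted average (WPSNR_MSE)
• PSNR considering weighted summation of the MSEs evaluated on R, G and B components or Y’, Cb and Cr components (ITU-R Rec. BT.601):
$$\mathrm{WPSNR\_MSE} = 10 \log_{10} \frac{(2^B - 1)^2}{w_1 \mathrm{MSE}_1 + w_2 \mathrm{MSE}_2 + w_3 \mathrm{MSE}_3}$$
where:
$w_1 = 1/3,\ w_2 = 1/3,\ w_3 = 1/3$, considering R, G and B components.
$w_1 = 0.8,\ w_2 = 0.1,\ w_3 = 0.1$, considering Y’, Cb and Cr components.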
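In this variant the weighting happens on the MSEs, before the logarithm. A minimal sketch, assuming the three per-component MSE values are already available and using the hypothetical name `wpsnr_mse`:

```python
import math


def wpsnr_mse(mses, weights=(1 / 3, 1 / 3, 1 / 3), bit_depth=8):
    """PSNR in dB computed from a weighted sum of per-component MSEs."""
    if len(mses) != len(weights):
        raise ValueError("need one weight per component")
    weighted = sum(w * m for w, m in zip(weights, mses))
    if weighted == 0:
        return math.inf
    peak = (1 << bit_depth) - 1
    return 10.0 * math.log10(peak * peak / weighted)
```

Because the logarithm is applied after the weighting, a single badly coded component pulls this value down more strongly than it would pull down a weighted sum of per-component PSNRs.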

28
Metric 3: PSNR weighted average (WPSNR_PIX)
• PSNR considering MSE evaluated on weighted summation of the image R, G and B components:
$$\mathrm{WPSNR\_PIX} = 10 \log_{10} \frac{(2^B - 1)^2}{\frac{1}{MN} \sum_{y=1}^{M} \sum_{x=1}^{N} \left[ \left( w_1 Im_{a,1}(x,y) + w_2 Im_{a,2}(x,y) + w_3 Im_{a,3}(x,y) \right) - \left( w_1 Im_{b,1}(x,y) + w_2 Im_{b,2}(x,y) + w_3 Im_{b,3}(x,y) \right) \right]^2}$$
where:
B = bit depth
$w_1 = 1/3,\ w_2 = 1/3,\ w_3 = 1/3$
$w_1 = 0.8,\ w_2 = 0.1,\ w_3 = 0.1$
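Here the components are first combined pixel by pixel and a single MSE is computed on the combined images. The sketch below assumes each image is a tuple of three channels, each a list of rows; the name `wpsnr_pix` is hypothetical.

```python
import math


def wpsnr_pix(img_a, img_b, weights=(1 / 3, 1 / 3, 1 / 3), bit_depth=8):
    """PSNR of the weighted sum of three components.

    img_a, img_b: tuples of three channels; each channel is a list of
    rows of pixel values. All channels must share the same size.
    """
    def combine(img):
        # Per-pixel weighted sum w1*c1 + w2*c2 + w3*c3, flattened.
        return [
            sum(w * p for w, p in zip(weights, pixels))
            for rows in zip(*img)          # same row of each channel
            for pixels in zip(*rows)       # same pixel of each channel
        ]

    mixed_a, mixed_b = combine(img_a), combine(img_b)
    mse = sum((a - b) ** 2 for a, b in zip(mixed_a, mixed_b)) / len(mixed_a)
    if mse == 0:
        return math.inf
    peak = (1 << bit_depth) - 1
    return 10.0 * math.log10(peak * peak / mse)
```

Note that errors of opposite sign in different components can cancel in the weighted sum, so this variant can report a higher value than a metric that weights per-component MSEs.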

29
Metric 4: Mean SSIM (MSSIM) (I)
• Structural information = “attributes that represent the structure of objects in the scene, independent of the average luminance and contrast”.
▪ Estimate of luminance = mean intensity:
$$\mu = \frac{1}{MN} \sum_{y=1}^{M} \sum_{x=1}^{N} Im(x,y)$$
▪ Estimate of contrast = standard deviation:
$$\sigma = \left( \frac{1}{MN-1} \sum_{y=1}^{M} \sum_{x=1}^{N} \left( Im(x,y) - \mu \right)^2 \right)^{1/2}$$
▪ Estimate of picture structure:
$$\frac{Im - \mu}{\sigma}$$
[1] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image Quality Assessment: From Error Measurement to Structural Similarity” (2004).

30
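Metric 4: Mean SSIM (MSSIM) (II)
▪ Luminance comparison function:
$$l(Im_1, Im_2) = \frac{2 \mu_1 \mu_2 + C_1}{\mu_1^2 + \mu_2^2 + C_1}$$
($C_1$ = constant)
▪ Contrast comparison function:
$$c(Im_1, Im_2) = \frac{2 \sigma_1 \sigma_2 + C_2}{\sigma_1^2 + \sigma_2^2 + C_2}$$
($C_2$ = constant)
▪ Measure of structural similarity = correlation between $\frac{Im_1 - \mu_1}{\sigma_1}$ and $\frac{Im_2 - \mu_2}{\sigma_2}$
Structure comparison function:
$$s(Im_1, Im_2) = \frac{\sigma_{1,2} + C_3}{\sigma_1 \sigma_2 + C_3}$$
where
$$\sigma_{1,2} = \frac{1}{MN-1} \sum_{y=1}^{M} \sum_{x=1}^{N} \left( Im_1(x,y) - \mu_1 \right) \left( Im_2(x,y) - \mu_2 \right)$$
($C_3$ = constant)
$$\mathrm{SSIM}(Im_1, Im_2) = \left[ l(Im_1, Im_2) \right]^{\alpha} \left[ c(Im_1, Im_2) \right]^{\beta} \left[ s(Im_1, Im_2) \right]^{\gamma} \qquad (\alpha > 0,\ \beta > 0,\ \gamma > 0)$$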
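The three comparison functions can be combined as in the Python sketch below, which evaluates SSIM over a single window covering the whole image. The function name `ssim_global` is hypothetical, and the constants are assumptions: the slides only say "constant", so the sketch uses the commonly quoted defaults from reference [1], C1 = (K1·L)², C2 = (K2·L)², C3 = C2/2 with K1 = 0.01, K2 = 0.03 and L = 2^B − 1.

```python
import math


def ssim_global(img1, img2, bit_depth=8, k1=0.01, k2=0.03,
                alpha=1.0, beta=1.0, gamma=1.0):
    """SSIM over one window spanning the whole image (list of rows)."""
    x = [p for row in img1 for p in row]
    y = [p for row in img2 for p in row]
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("images must have the same size, with 2+ pixels")

    mu1, mu2 = sum(x) / n, sum(y) / n
    var1 = sum((p - mu1) ** 2 for p in x) / (n - 1)
    var2 = sum((q - mu2) ** 2 for q in y) / (n - 1)
    cov = sum((p - mu1) * (q - mu2) for p, q in zip(x, y)) / (n - 1)
    s1, s2 = math.sqrt(var1), math.sqrt(var2)

    peak = (1 << bit_depth) - 1
    c1 = (k1 * peak) ** 2          # assumed default from [1]
    c2 = (k2 * peak) ** 2          # assumed default from [1]
    c3 = c2 / 2                    # assumed default from [1]

    lum = (2 * mu1 * mu2 + c1) / (mu1 ** 2 + mu2 ** 2 + c1)
    con = (2 * s1 * s2 + c2) / (var1 + var2 + c2)
    struct = (cov + c3) / (s1 * s2 + c3)
    return (lum ** alpha) * (con ** beta) * (struct ** gamma)
```

With the default exponents of 1 the result is 1 for identical images and decreases as luminance, contrast or structure diverge. A windowed version would apply the same computation to each local window and average the resulting map.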

31
Metric 4: Mean SSIM (MSSIM) (III)
• The SSIM indexing algorithm is applied using a sliding window approach, which results in an SSIM index quality map of the image.
• The average of the quality map is called the Mean SSIM index (MSSIM).
• Weighted summation of MSSIM indexes evaluated on Y’, Cb and Cr components (Y’CbCr color space - Rec. ITU-R BT.601):
$$\mathrm{MSSIM} = w_Y \mathrm{MSSIM}_Y + w_{Cb} \mathrm{MSSIM}_{Cb} + w_{Cr} \mathrm{MSSIM}_{Cr}$$
where: $w_Y = 0.8,\ w_{Cb} = 0.1,\ w_{Cr} = 0.1$.
(MSSIM ∈ [0,1])

32
Metric 5: Visual Information Fidelity – Pixel (VIF-P) (I)
• “Image information measure that quantifies the information that is present in the reference image and how much this reference information can be extracted from the distorted image” using a statistical approach.
[Figure: block diagram with the blocks Natural image (source), Channel (distortion) and two HVS blocks; signals labelled C, E and F]
▪ Reference image (E) = output of a stochastic natural source that passes through HVS channel and is processed by the brain
▪ Test image (F) = output of an image distortion channel that distorts the output of the natural source before it passes through the HVS channel
[2] H. R. Sheikh, A. C. Bovik, “Image Information And Visual Quality” (2004).

33
Metric 5: Visual Information Fidelity – Pixel (VIF-P) (II)
▪ Natural image modeling in wavelet domain using Gaussian scale mixtures (GSMs)
▪ Information that the brain could ideally extract from reference image = mutual information between C and E: $I(C;E \mid z)$
▪ Corresponding information that could be extracted from test image = mutual information between C and F: $I(C;F \mid z)$
$$\mathrm{VIF} = \frac{I(C;F \mid z)}{I(C;E \mid z)}$$
where:
z = source model parameters.
(VIF ∈ [0,1], and VIF > 1 if the test image is an enhanced version of the original)
▪ VIF-P is a new implementation in a multi-scale pixel domain:
• computationally simpler than Wavelet domain version.
• performance slightly worse than Wavelet domain version.

34
Metric 6: PSNR-HVS-M (I)
[Figure: processing flow. 8x8 block of the original image and 8x8 block of the distorted image → DCT of the difference between pixel values → reduction by the value of contrast masking → MSEH calculation of the block]
• DCT coefficients of 8x8 pixel blocks X and Y are visually indistinguishable if:
Ew(X-Y) < max (Em(X), Em(Y))
where Ew(block) is the energy of DCT coefficients of the block weighted according to CSF and Em(block) is the masking effect of DCT coefficients of the block, which depends upon Ew(block) and upon the local variances.
[3] N. Ponomarenko, F. Silvestri, K. Egiazarian, M. Carli, J. Astola, and V. Lukin, “On between-coefficient contrast masking of DCT basis functions” (2007).

35
Metric 6: PSNR-HVS-M (II)
$$\mathrm{PSNR\text{-}HVS\text{-}M} = 10 \log_{10} \frac{(2^B - 1)^2}{\mathrm{MSE}_H}$$
$$\mathrm{MSE}_H = K \sum_{i=1}^{M-7} \sum_{j=1}^{N-7} \sum_{m=1}^{8} \sum_{n=1}^{8} \left[ \Delta X_{ij}(m,n)\, T_c(m,n) \right]^2$$
where:
K = constant
$\Delta X_{ij}(m,n)$ = visible difference between DCT coefficients of the original image and distorted image 8x8 blocks, depending upon contrast masking
Tc = matrix of correcting factors based on standard visually optimized JPEG quantization tables
B = bit depth

36
Metric 7: DCTune
[4] A. B. Watson, A. P. Gale, J. A. Solomon, and A. J. Ahumada Jr., “DCTune: A Technique For Visual Optimization Of DCT Quantization Matrices For Individual Images” (1994).
▪ developed as a method for optimizing JPEG image compression by computing the JPEG quantization matrices that yield a designated perceptual error
▪ model of perceptual error based upon DCT coefficients analysis, taking into account:
• luminance masking.
• contrast masking.
• spatial error pooling.
• frequency error pooling.

37
Selected results 4:4:4 – JPEG XR vs JPEG2000 vs JPEG
Average over image dataset of PSNR values
[Figure: three plots versus bpp (bits/pixel), on R, G and B components]

38
[Figure: three plots, on Y’, Cb and Cr components]

39
Average over image dataset of WPSNR values
[Figure: two plots versus bpp (bits/pixel), on RGB components and on Y’CbCr components]

40
Average over image dataset of WPSNR-MSE values

41
Average over image dataset of WPSNR-PIX values

42
Average over image dataset of MSSIM values
[Figure: three plots versus bpp (bits/pixel), on Y’, Cb and Cr components]

43
Average over image dataset of VIF-P values
[Figure: plot versus bpp (bits/pixel), on Y’ component only]

44
Selected results 4:4:4 – different JPEG XR implementations
[Figure: three plots (one level POT), on R, G and B components]

45
[Figure: three plots (two levels POT), on R, G and B components]

46
[Figure: three plots (one level POT), on Y, Cb and Cr components]

47
[Figure: plots (two levels POT)]

48
Average over image dataset of WPSNR_MSE values
[Figure: two plots, one level POT and two levels POT]

49
Average over image dataset of MSSIM values (one level POT)
[Figure: plot versus bpp (bits/pixel)]

50
Average over image dataset of MSSIM values (two levels POT)
[Figure: plot versus bpp (bits/pixel)]

51
Subjective quality assessment
• Subjective quality assessment is the ultimate
proof of efficiency
• JPEG committee is currently carrying out
various subjective quality assessments to
evaluate JPEG XR performance
• This is an essential part of JPEG XR
development which still remains to be
completed

52
Overview of Advanced Image Coding
• JPEG 2000 has been an International Standard for almost a decade
• A natural question to ask is whether, after more than a decade, there are any new technologies that may bring a significant edge over JPEG 2000 and be useful for applications
• JPEG committee adopted a two-phase approach towards AIC standardization.
53
Overview of Advanced Image Coding (AIC)
• Phase 1:
– Define a dataset of typical images
– Define anchors (the best JPEG and JPEG 2000 compressed images)
– Define objective and subjective evaluation criteria to assess the performance of any new codec when compared to JPEG and JPEG 2000
– Identify other useful features needed in applications and define
criteria to evaluate their efficiency.
– The result of Phase 1 will be put in a Technical Report and can
be used by the scientific and technical communities to allow
them to assess their codecs.

54
Overview of Advanced Image Coding
• Phase 2:
– Standardization of an AIC compression algorithm
based on a call for proposals using the evaluation
approach in the technical report of Phase 1.
– Call for evidence has already been issued to make sure that the evaluation methodologies of phase 1 can cope with the assessment of new technologies.
– Several inputs already received as potential
technologies (X-lets, non-linear transform, moving
pictures compression, …)

55
AIC positioning vs JPEG, JPEG 2000 and JPEG XR
[Figure: complexity versus performance chart positioning JPEG, JPEG 2000, JPEG XR and AIC]

56
Acknowledgements:
Francesca De Simone (EPFL)
Vittorio Baroncini (FUB)
Thanos Skodras (EAP)
Some of the slides on JPEG XR overview have been adapted from a Microsoft
presentation describing HD Photo to JPEG committee
