compression bo f 2009

27
Astronomical Data Compression: Algorithms & Architectures Rob Seaman – Na#onal Op#cal Astronomy Observatory William Pence – NASA / Goddard Space Flight Center Richard White – Space Telescope Science Ins#tute Séverin Gaudet – Na#onal Research Council Canada Astronomical Data Analysis SoAware & Systems XIX – Sapporo, Japan See also poster 57, “Op#mal Compression Methods for Floa#ngpoint Format Images”, Pence, et al.

Upload: rob13seaman

Post on 02-Jul-2015

118 views

Category:

Documents


2 download

DESCRIPTION

Slides from 2009 Astronomical Data Compression BoF at Sapporo ADASS."Only entropy comes easy" - Chekhov

TRANSCRIPT

Page 1: Compression Bo F 2009

Astronomical  Data  Compression:  Algorithms  &  Architectures  

Rob  Seaman  –  Na#onal  Op#cal  Astronomy  Observatory  William  Pence  –  NASA  /  Goddard  Space  Flight  Center  Richard  White  –  Space  Telescope  Science  Ins#tute  Séverin  Gaudet  –  Na#onal  Research  Council  Canada  

Astronomical  Data  Analysis  SoAware  &  Systems  XIX  –  Sapporo,  Japan  

See  also  poster  57,  “Op#mal  Compression  Methods  for  Floa#ng-­‐point  Format  Images”,  Pence,  et  al.  

Page 2: Compression Bo F 2009

Agenda  

•  Overview  –  Rob  •  Tile  compression  and  CFITSIO  –  Bill  

•  Experiences  with  FITS  compression  in  a  large  astronomical  archive  –  Séverin  

•  Lossy  compression  –  Rick  

•  Open  discussion  •  Door  prize!  

Thanks  to  Pete  Marenfeld  &  Koji  Mukai  

5  October  2009   2  

Page 3: Compression Bo F 2009

Overview  

•  FITS  Ule  compression  •  Rice  algorithm  

•  CFITSIO  /  FPACK  •  IRAF  and  community  soAware  

•  The  ubiquity  of  noise:    opUmal  DN  encoding  

•  The  role  of  sparsity:    compressive  sensing  

•  An  informaUon  theory  example  

5  October  2009   3  

Page 4: Compression Bo F 2009

Overview  

•  FITS  Ule  compression  •  Rice  algorithm  

•  CFITSIO  /  FPACK  •  IRAF  and  community  soAware  

•  The  ubiquity  of  noise:    opUmal  DN  encoding  

•  The  role  of  sparsity:    compressive  sensing  

•  An  informaUon  theory  example  

5  October  2009   4  

Page 5: Compression Bo F 2009

References  

•  Too  many  ADASS  presentaUons  to  list  

•  See  references  within:    “Lossless  Astronomical  Image  Compression  and  the  Effects  of  Noise”,  Pence,  Seaman  &  White,  PASP  v121  n878  2009,  hPp://arxiv.org/abs/0903.2140v1  

5  October  2009   5  

Page 6: Compression Bo F 2009

FITS  Ule  compression  

•  ADASS  1999  (Pence,  White,  Greenfield,  Tody)  •  FITS  ConvenUon  v2.1,  2009  •  Images  mapped  onto  FITS  binary  tables  

•  Headers  remain  readable  

•  Tiling  permits  rapid  RW  access  

•  Supports  mulUple  compression  algorithms  

•  First  &  every  copy  can  be  compressed  5  October  2009   6  

Page 7: Compression Bo F 2009

Rice  algorithm  

•  Fast  (difference  coding)  – near  opUmum  compression  raUo  

–  throughput  is  key,  not  just  storage  volume  

•  Numerical,  not  character-­‐based  like  gzip  

•  Depends  on  pixel  value  so  BITPIX  =  32  compresses  to  same  size  as  BITPIX  =  16  

5  October  2009   7  

Page 8: Compression Bo F 2009

CFITSIO  /  FPACK  

•  fpack  can  be  swapped  in  for  gzip    &  funpack  for  gunzip  

•  Library  support  (eg,  CFITSIO)  allows  jpeg-­‐like  access  –  compression  built  into  the  format  

•  More  opUons  means  more  parameters  –  

 sejng  appropriate  defaults  is  key  

5  October  2009   8  

Page 9: Compression Bo F 2009

IRAF  and  community  soAware  

•  Tile  compression  can  &  should  be  supported  by  all  soAware  that  reads  FITS  

•  Instrument  and  pipeline  soAware  may  benefit  strongly  from  wri#ng  compressed  FITS  

•  Transport  &  storage  always  benefit  •  IRAF  fitsuUl  package  in  beta  tesUng  •  Work  on  a  new  IRAF  FITS  kernel  pending  

•  VO  applicaUons  and  services  5  October  2009   9  

Page 10: Compression Bo F 2009

The  ubiquity  of  noise  

•  Noise  is  incompressible  •  Signals  are  correlated  – physically  –  instrumentally  

•  Shannon  entropy:    H  =  –  Σ  p  log  p  – depends  only  on  the  probabiliUes  of  the  states  – measures  “irreducible  complexity”  of  the  data  

5  October  2009   10  

Page 11: Compression Bo F 2009

OpUmal  DN  encoding  

•  CCD  “square-­‐rooUng”  •  Variance  stabilizaUon,  more  generally  – many  staUsUcal  methods  assume  homoscedasUcity  

– generalized  Anscombe  transform  

•  FoundaUons  of  the  empirical  world  view:  – ergodicity  (staUsUcal  homogeneity)  

– Markov  processes  (memoryless  systems)  •  hPp://www.aspbooks.org/publica#ons/411/101.pdf  5  October  2009   11  

Page 12: Compression Bo F 2009

The  role  of  sparsity  

•  For  most  astronomical  data,  compression  raUo  depends  only  on  the  background  noise  – Sparse  signals  are  negligible  (in  whatever  axes)  – Noise  is  incompressible  

 R  =  BITPIX  /  (Nbits  +  K)    K  is  about  1.2  for  Rice  

5  October  2009   12  

Page 13: Compression Bo F 2009

Compression  raUo  

16  Dec  2008  

R  

Noise  bits  

Mosaic  II   Compression  correlates  closely  with  noise  

DisUncUve  funcUonal  behavior  

For  three  very  different  comp.  algorithms  

For  flat-­‐field  and  bias  exposures  as  well              as  for  science  data  

That  is,  for  pictures  of:  the  sky                                                        a  lamp  in  the  dome    no  exposure  at  all  

Signal  doesn’t  maser!  

Page 14: Compression Bo F 2009

A  beser  compression  diagram  

16  Dec  2008  

– 8 –

0 5 10 15

0

5

10

15

Com

pres

sed

bits

per p

ixel

Nbits Noise

01234K

R1.2

1.5

2.0

4.0

Fig. 1.— Plot of compressed bits per pixel versus the number of noise bits in 16-bit synthetic

images. The solid lines are generated from the images that have Nbits of pure noise, and the

circles or triangles are generated from the images that have Gaussian distributed noise, where

Nbits is calculated from equation 4. The solid circles are for Rice compression, the open circle for

Hcompress, and the triangle are for GZIP. The dashed lines show the compression ratio, R, and

the dotted lines have a slope = 1.0, with K o!sets ranging from 0 to 4.

 R  =  BITPIX  /  (Nbits  +  K)  

 EffecUve  BITPIX:  

   BITEFF  =  BITPIX  /  R  

   BITEFF  =  Nbits  +  K  

 Line  with:    Slope  =  1  

 Intercept  =  K  

GZIP  

Rice  Hcompress  

Lossy  algorithms  

shi^  le^  to  move  down  

Margin  for  improvement  

Page 15: Compression Bo F 2009

Compressive  sensing  •  Real  world  data  are  oAen  sparse  (correlated)  •  Nyquist/Shannon  sampling  applies  broadly  

•  But  we  can  do  even  beser  if  we  sample  against  purpose-­‐specific  axes:  

hPp://www.dsp.ece.rice.edu/cs  hPp://nuit-­‐blanche.blogspot.com  

•  Herschel  proof  of  concept,  Starck,  et  al.  •  CS  is  about  the  sampling  theorem  

•  OpUmal  encoding  is  about  quanUzaUon  5  October  2009   15  

Page 16: Compression Bo F 2009

An  informaUon  theory  example  

5  October  2009   16  example of objective

About the Coins

If you like this game send feedbackemail: Joe

hPp://www.m

apsofcon

sciousness.com

/12coins  

Page 17: Compression Bo F 2009

Compression  =  opUmal  representaUon  

A. 11  coins  all  the  same    +  1  coin,  idenUcal  except  for  weight  

B.  Scale  to  weigh  groups  of  coins  C.  In  only  3  steps,  must  idenUfy:  

 the  coin  that  is  different  and    whether  it  is  light  or  heavy  

“The  12-­‐balls  Problem  as  an  IllustraUon  of  the  ApplicaUon  of  InformaUon  Theory”  –  R.H.  Thouless,  1970,  Math.  GazePe,  v54n389.  

5  October  2009   17  

Page 18: Compression Bo F 2009

How  to  solve  a  problem  

•  First,  define  the  problem  –  second,  entertain  soluUons  –  third,  iterate  (don’t  give  up)  

•  More  basic  yet,  what  is  the  goal?  –  to  solve  the  problem?  

–  or  to  understand  how  to  solve  it?  

•  StaUng  a  problem  constrains  its  soluUons  5  October  2009   18  

Page 19: Compression Bo F 2009

What  do  we  know?  

•  One  bit  discriminates  two  equally  likely  alternaUves    To  select  between  N  equal  choices,  Nbits  =  log2  N  

•  For  12-­‐coin  problem,  Nbits  =  log2  (12)  +  1  =  log2  24    (must  also  disUnguish  light  vs.  heavy)  

•  InformaUon  provided  in  each  measurement  is  log2  3    (3  posiUons  for  scale:  le^,  right,  balanced)  

•  For  three  weighings,  Wbits  =  log2  33  =  log2  27    Meets  necessary  condiUon  that  Wbits  >=  Nbits  

5  October  2009   19  

Page 20: Compression Bo F 2009

Necessary,  but  not  sufficient  

•  A  strategy  is  also  necessary  such  that        Wbits  >=  Nbits    (remaining)  

 is  saUsfied  at  each  step  to  the  soluUon    

•  Nbits  is  the  same  thing  as  the  entropy  H        H  =  –  Σ  p  log  p        where      p  =  1/N              =  –  Σ  (1/N)  log  (1/N)  =  (Σ  (1/N))  log  N  =  log  N      H  =  log2  N      (in  bits)  

5  October  2009   20  

Page 21: Compression Bo F 2009

What  else  do  we  know?  

•  Physical  priors!  – only  one  coin  is  fake  – astronomical  data  occupy  sparse  phase  space  

•  FITS  arrays  =  images  (physical  priors)  – of  astrophysical  sources  –  taken  through  physical  opUcs  –  recorded  by  physical  electronics  – digiUzaUon  is    restricted  by  informaUon  theory  – possessing  a  disUncUve  noise  model  

5  October  2009   21  

Page 22: Compression Bo F 2009

5  October  2009   22  

example of objective

About the Coins

If you like this game send feedbackemail: Joe

Page 23: Compression Bo F 2009

5  October  2009   23  

example of objective

About the Coins

If you like this game send feedbackemail: Joe

Page 24: Compression Bo F 2009

5  October  2009   24  

example of objective

About the Coins

If you like this game send feedbackemail: Joe

Page 25: Compression Bo F 2009

5  October  2009   25  

example of objective

About the Coins

If you like this game send feedbackemail: Joe

Page 26: Compression Bo F 2009

ObservaUons  about  observaUons  

•  The  sequence  of  three  measurements  can  occur  in  any  order  

•  The  systemaUzaUon  of  the  soluUon  occurs  during  its  definiUon,  not  at  run  Ume  

5  October  2009   26  

Page 27: Compression Bo F 2009

Try  it  yourself  

hPp://heasarc.gsfc.nasa.gov/fitsio/fpack  

hPp://www.mapsofconsciousness.com/12coins  

5  October  2009   27