Astroinformatics 2015: Large Sky Surveys: Entering the Era of Software-Bound Astronomy



Large Sky Surveys: Entering the Era of Software-Bound Astronomy

Mario Juric
WRF Data Science Chair in Astronomy, University of Washington
LSST Data Management Project Scientist

Astroinformatics 2015, Dubrovnik, Croatia, October 7th, 2015


• Large Surveys and Why They're Different

Hipparchus of Rhodes (180-125 BC)

Discovered the precession of the equinoxes.

Measured the length of the year to ~6 minutes.

In 129 BC, constructed one of the first star catalogs, containing about 850 stars.

n.b.: also the one to blame for the magnitude system …

Galileo Galilei (1564-1642)

Researched a variety of topics in physics, but called out here for the introduction of the Galilean telescope.

Galileo’s telescope allowed us for the first time to zoom in on the cosmos, and study the individual objects in great detail.

The Astrophysics Two-Step

• Surveys – Construct catalogs and maps of objects in the sky. Focus on coarse classification and discovering targets for further follow-up.

• Large telescopes – Acquire detailed observations of a few representative objects. Understand the details of the astrophysical processes that govern them, and extrapolate that understanding to the entire class.

Analogy: Google Search

Google’s index is a catalog of the Web. We use it to “zoom in” on individual entries to find out more.

But it's more than just a catalog of pointers – more and more, Google itself collects, processes, indexes, visualizes, and serves the actual information we need. More and more often, our "research" begins and ends with Google!

Entering the Era of Massive Sky Surveys

• There's a close parallel with large surveys in astronomy, in the scale, quality, and richness of the collected information:
– Scale: We're entering the era when we can image and catalog the entire sky.
– Quality: Those catalogs will be as precise as measurements taken with "pointed" observations (they used to be ~5-10x worse).
– Richness: Those catalogs contain not only positions and magnitudes, but also shapes, profiles, and the temporal behavior of the objects.

• Quite often, the research can begin and end with the survey.
• This is what makes the large surveys of today not just bigger, but better; different. They're much more than just "finding charts".

Sloan Digital Sky Survey
• 2.5m telescope
• >10,000 deg²
• 0.1" astrometry
• r < 22.5 flux limit
• 5-band, 2% photometry for >50M stars
• >300k R=2000 stellar spectra
• 10 years of ops: ~10 TB of imaging

[Figure: SDSS DR6 imaging sky coverage (Adelman-McCarthy et al. 2008)]

Panoramic Survey Telescope and Rapid Response System (Pan-STARRS)
• 1.8m telescope
• 30,000 deg²
• 50 mas astrometry
• r < 23 flux limit
• 5-band, better than 1% photometry (goal)
• ~700 GB/night


LSST: A Deep, Wide, Fast, Optical Sky Survey
• 8.4m telescope
• 18,000+ deg²
• 10 mas astrometry
• r < 24.5 single visit (< 27.5 at 10 years)
• ugrizy, 0.5-1% photometry
• 3.2 Gpix camera, 30 sec exposure / 4 sec readout
• 15 TB/night
• 37 billion objects

Imaging the visible sky, once every 3 days, for 10 years (825 revisits).


Turning the Sky into a Database

− A wide (half the sky), deep (24.5/27.5 mag), fast (the sky imaged once every 3 days) survey telescope. Beginning in 2022, it will repeatedly image the sky for 10 years.

− The LSST will be an automated survey system. At night, the observatory and the data system will operate with minimal human intervention.

− The ultimate deliverable of LSST is not the telescope, nor the instruments; it is the fully reduced data.
• All science will come from survey catalogs and images.

[Diagram: Telescope → Images → Catalogs]


History

1996-2000: "Dark Matter Telescope". This project began as a quest to understand cosmology and the Solar System.

2000-present: "LSST". Emphasizes a broad range of science from the same multi-wavelength survey data, including unique time-domain exploration.

We've reached the scale where a single telescope, a single general-purpose data set, can serve to answer a wide swath of science questions.

[Figure: LSST: Evolution of Design and Purpose]


Frontiers of Survey Astronomy (1/4)

− Time-domain science
• Novae, supernovae, GRBs
• Source characterization
• Instantaneous discovery

− Census of the Solar System
• NEOs, MBAs, comets
• KBOs, Oort Cloud

− Mapping the Milky Way
• Tidal streams
• Galactic structure

− Dark energy and dark matter
• Strong lensing
• Weak lensing
• Constraining the nature of dark energy


Frontiers of Survey Astronomy (2/4)

(The same four frontiers as above; this slide highlights time-domain science.)

[Figure: image differencing – Exposure 1, Exposure 2, and the difference Exposure 1 - Exposure 2]


Frontiers of Survey Astronomy (3/4)

(The same four frontiers; this slide highlights mapping the Milky Way.)

[Figure: Milky Way map showing the RR Lyrae limit, the main-sequence (MS) star limit, and dwarf galaxies]


Frontiers of Survey Astronomy (4/4)

(The same four frontiers; this slide highlights dark energy and dark matter: strong lensing, weak lensing, and constraining the nature of dark energy.)


Location: Cerro Pachón, Chile

Leveling of El Peñón (the summit of Cerro Pachón)


LSST Site (April 14th, 2015)


LSST Observatory (ca. late 2018)


Done!


LSST Camera

Parameter      Value
Diameter       1.65 m
Length         3.7 m
Weight         3000 kg
F.P. diameter  634 mm

– 3.2 gigapixels
– 0.2 arcsec pixels
– 9.6 square degree FOV
– 2 second readout
– 6 filters


LSST's #1 Challenge: Processing the Data in a Way That Effectively Enables Science


From Data to Knowledge

[Diagram: two routes from data (and metadata!) to knowledge.

Route 1: Data → Model inference. Computationally (and cognitively) expensive; science-case specific. Done by scientists.

Route 2: Data → Data Processing → Catalog → Model inference. Inference from the catalog is computationally cheaper, easier to understand, and science-case specific; done by scientists. The Data Processing step is done by the project, and is:
• Computationally expensive, but general
• A reprojection; may or may not involve compression
• Almost always introduces some information loss
• Data Processing == Instrumental Calibration + Measurement]


Guiding Principles for LSST Data Products

− There are virtually infinite options for what quantities (features) one can measure on images. But if catalog generation is understood as a (generalized) cost-reduction tool, the guiding principles become easier to define:

1. Maximize the science enabled by the catalogs.
- Working with images takes time and resources; a large fraction of LSST science cases should be enabled by the catalog alone.
- Be considerate to the user: provide even sub-optimal measurements if they will enable leveraging of existing experience and tools.

2. Minimize information loss.
- Provide (as much as possible) estimates of likelihood surfaces, not just single point estimates.

3. Provide and document the transformation (the software).
- Measurements are becoming increasingly complex and systematics-limited; we need to be maximally transparent about how they're done.


What LSST will Deliver: A Data Stream, a Database, and a (small) Cloud

− A stream of ~10 million time-domain events per night, detected and transmitted to event distribution networks within 60 seconds of observation.

− A catalog of orbits for ~6 million bodies in the Solar System.

− A catalog of ~37 billion objects (20B galaxies, 17B stars), ~7 trillion single-epoch detections (“sources”), and ~30 trillion forced sources, produced annually, accessible through online databases.

− Deep co-added images.

− Services and computing resources at the Data Access Centers to enable user-specified custom processing and analysis.

− Software and APIs enabling development of analysis codes.

[Diagram: the deliverables above grouped into Level 1, Level 2, and Level 3 data products]


Level 1: Transient Alerts

− LSST will generate ~10k time-domain events per observation (on average).
• Includes anything that changed with respect to the deep template (variables, explosive transients, asteroids, etc.)

− We plan to measure and transmit with each alert (a sketch of such a packet follows below):
• position
• PSF flux, aperture flux(es), trailed model fits
• shape (adaptive moments)
• light curves in all bands (up to a ~year; stretch goal: all)
• variability characterization (e.g., low-order light-curve moments, probability that the object is variable; Richards et al. 2011)
• cut-outs centered on the object (template, difference image)
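To make the packet contents concrete, here is a minimal, hypothetical sketch of an alert record in Python. The field names are illustrative only; they are not the actual LSST alert schema (which the project defines in the DPDD):

from dataclasses import dataclass
from typing import List, Optional

@dataclass
class LightCurvePoint:
    mjd: float           # epoch of the measurement (Modified Julian Date)
    band: str            # one of 'ugrizy'
    psf_flux: float      # PSF flux measured on the difference image
    psf_flux_err: float

@dataclass
class Alert:
    alert_id: int
    ra: float                      # position (degrees)
    dec: float
    psf_flux: float                # point-source model flux
    aperture_fluxes: List[float]   # fluxes in a set of apertures
    trailed_flux: Optional[float]  # trailed (moving-source) model fit
    ixx: float                     # adaptive moments (shape)
    iyy: float
    ixy: float
    light_curve: List[LightCurvePoint]  # previous epochs, up to ~1 year
    var_probability: float         # probability that the object is variable
    cutout_template: bytes         # postage stamps: template and
    cutout_difference: bytes       # difference image, centered on the object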


Level 2: Data Release Catalogs

− Object characterization (models):
• Moving point source model
• Double Sérsic model (bulge + disk)
- Maximum likelihood peak
- Samples of the posterior (hundreds)

− Object characterization (non-parametric):
• Centroid: (α, δ), per band
• Adaptive moments and ellipticity measures (per band)
• Aperture fluxes, and Petrosian and Kron fluxes and radii (per band)

− Colors:
• Seeing-independent measure of object color

− Variability statistics:
• Period, low-order light-curve moments, etc.

[Figure: LSST Science Book, Fig. 9.3]
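For reference, the Sérsic profile underlying the bulge+disk model is the standard surface-brightness law (the exact parameterization LSST adopts is specified in the DPDD):

I(r) = I_e \exp\left\{ -b_n \left[ \left( \frac{r}{r_e} \right)^{1/n} - 1 \right] \right\}

Here r_e is the half-light radius, I_e the surface brightness at r_e, n the Sérsic index, and b_n a constant chosen so that r_e encloses half the total light. The double Sérsic model is the sum of two such components; a common convention is a high-index component for the bulge and n = 1 (an exponential) for the disk.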


The DPDD Captures the Details of this Thinking

LSST Data Products Definition Document

A document giving a high-level description of LSST data products.

http://ls.st/dpdd

Level 1 Data Products: Section 4.

Level 2 Data Products: Section 5.

Level 3 Data Products: Section 6.

Special Programs DPs: Section 7.

Questions: http://community.lsst.org
Change requests: https://github.com/lsst/data_products


Looking Ahead: Astronomical Data Processing in the 2020s

(or why software is important)


Astronomy was hardware-limited

− Astronomical data processing software is a lesson in (intelligent) corner-cutting. We (generally) haven't needed particularly sophisticated codes so far.

− We were lucky, because typically:
• a) there was no data,
• b) the data was poor, or
• c) there wasn't enough computing power available to process it right.

− Very frequently, all of the above.

− Simple algorithms were sufficient. The challenge was in the hardware.


Astronomy is becoming software-bound

− Things are changing at a rapid pace.

− We can collect amazing quantities of data.
− Telescopes and cameras built today are extremely well-characterized and calibrated instruments.
− Computing has reached a level where more complex algorithms are feasible. It's also getting cheap.
• Also, lagging memory bandwidths are forcing us towards more complex algorithms.

− The science requires it. The alternatives ("build a bigger telescope!") are incredibly costly.

− The time to develop software is becoming >= the time to build (astronomical) hardware.


LSST Science Pipelines

− 02C.01.02.01/02. Data Quality Assessment Pipelines
− 02C.01.[02.01.04, 04.01, 04.02]. Calibration Pipelines
− 02C.03.01. Single-Frame Processing Pipeline
− 02C.03.02. Association Pipeline
− 02C.03.03. Alert Generation Pipeline
− 02C.03.04. Image Differencing Pipeline
− 02C.03.06. Moving Object Pipeline
− 02C.04.03. PSF Estimation Pipeline
− 02C.04.04. Image Coaddition Pipeline
− 02C.04.05. Deep Detection Pipeline
− 02C.04.06. Object Characterization Pipeline
− 02C.01.02.03. Level 3 Toolkit
− 02C.03.05/04.07. Core Libraries (afw)

(In the original figure, the pipelines are grouped into Level 1, Level 2, and Level 3.)

Data Management Applications Design (LDM-151)

+ middleware, databases, user interfaces, computing and storage hardware, networking, etc.


Example: Multi-Epoch Measurement via Coadds

[Diagram: Exposures 1-3 are warped and convolved into a single coadd; galaxy/star models are then fit to the coadd.
• Building the coadd: hard, but we only have to do it once.
• Coadd measurement: easy; relatively few data points.]

A sketch of this approach follows below.
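As a rough illustration (toy 2-D images, not the LSST pipeline; warp_to_common_grid and build_coadd are names invented for this sketch):

import numpy as np
from scipy.ndimage import map_coordinates

def warp_to_common_grid(image, transform, shape):
    # Resample `image` onto the coadd grid. `transform` maps coadd pixel
    # coordinates (y, x) to input-image coordinates, shape (2, H, W).
    yy, xx = np.mgrid[0:shape[0], 0:shape[1]]
    coords = transform(yy, xx)
    return map_coordinates(image, coords, order=3, mode="constant", cval=np.nan)

def build_coadd(images, variances, transforms, shape):
    # Inverse-variance weighted mean of all warped exposures.
    num, den = np.zeros(shape), np.zeros(shape)
    for img, var, t in zip(images, variances, transforms):
        w_img = warp_to_common_grid(img, t, shape)
        w_var = warp_to_common_grid(var, t, shape)
        w = np.where(np.isfinite(w_img) & (w_var > 0), 1.0 / w_var, 0.0)
        num += np.nan_to_num(w_img * w)
        den += w
    return num / np.where(den > 0, den, np.inf)

# Galaxy/star models are then fit once, to the coadd: few data points,
# a single fit per object ("hard to build, easy to measure").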


Except that you should not do that…

− It is impossible to combine multi-epoch data taken in different seeing/exposure times without information loss.
• Suboptimal S/N

− Warping and resampling correlate pixel values and noise; the correlation matrices are (practically) impossible to carry forward.
• A source of systematic error (demonstrated in the sketch below)

− Detector effects have to be taken out at the pixel level.
• This further correlates the noise.

− The effective bandpass changes from exposure to exposure.
• Coadding different sorts of apples…

− Stars move!
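A small numerical demonstration of the resampling problem (a toy example, not LSST code): sub-pixel shifting a white-noise image with cubic interpolation introduces correlations between neighboring pixels that were previously independent.

import numpy as np
from scipy.ndimage import shift

rng = np.random.default_rng(42)
noise = rng.standard_normal((512, 512))      # independent pixels
warped = shift(noise, (0.5, 0.5), order=3)   # a half-pixel shift = resampling

def neighbor_corr(img):
    # Correlation between horizontally adjacent pixels.
    a, b = img[:, :-1].ravel(), img[:, 1:].ravel()
    return np.corrcoef(a, b)[0, 1]

print(f"neighbor correlation before: {neighbor_corr(noise):+.3f}")   # ~ 0
print(f"neighbor correlation after:  {neighbor_corr(warped):+.3f}")  # clearly nonzero

# The warped noise is correlated; propagating the full pixel covariance
# through a coadd of many such warps is practically infeasible.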


Multi-Epoch Measurement with Forward Modeling

[Diagram: MultiFit (simultaneous multi-epoch fitting). The galaxy/star models are warped and convolved into Transformed Models 1, 2, and 3, one per exposure, and fit against Exposures 1-3 simultaneously.]

Transforming the model is easier (it depends on the model), but we have to do it at every iteration! Same number of parameters, but with orders of magnitude more data points. A sketch follows below.
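A conceptual sketch of the simultaneous fit (illustrative only; a toy Gaussian "galaxy" stands in for the real Sérsic models, and render_model/multifit are invented names):

import numpy as np
from scipy.optimize import least_squares
from scipy.signal import fftconvolve

def render_model(params, shape):
    # Toy circular-Gaussian "galaxy": params = (flux, x0, y0, sigma).
    flux, x0, y0, sigma = params
    yy, xx = np.mgrid[0:shape[0], 0:shape[1]]
    r2 = (xx - x0) ** 2 + (yy - y0) ** 2
    return flux * np.exp(-0.5 * r2 / sigma**2) / (2 * np.pi * sigma**2)

def residuals(params, exposures, psfs, sigmas):
    # Noise-weighted residuals concatenated over ALL epochs: one parameter
    # vector, orders of magnitude more data points than a single coadd.
    res = []
    for img, psf, sig in zip(exposures, psfs, sigmas):
        model = render_model(params, img.shape)       # render in epoch frame
        model = fftconvolve(model, psf, mode="same")  # convolve w/ epoch PSF
        res.append(((img - model) / sig).ravel())
    return np.concatenate(res)

def multifit(exposures, psfs, sigmas, p0):
    # Every optimizer iteration re-renders the model in every epoch:
    # cheap per call, but repeated many times (unlike the one-off coadd).
    return least_squares(residuals, p0, args=(exposures, psfs, sigmas)).x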


Detecting and Estimating Proper Motions Below the Single-Epoch Flux Limit

Optimal measurement of the properties of objects imaged in multiple epochs. Left: extraction of a moving point source (Lang 2009).
• On individual exposures, the objects are undetected or only marginally detected.
• Moving point-source and galaxy models are indistinguishable on the coadd.
A sketch of the moving-source model follows below.
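In the spirit of that technique (not Lang's actual code), the multi-epoch fit above can be extended with proper-motion parameters; render_psf is a hypothetical helper that draws a PSF at a sub-pixel position:

import numpy as np

def moving_source_model(params, t, shape, render_psf):
    # params = (flux, x0, y0, pm_x, pm_y); t = epoch time (e.g., in years).
    # render_psf(x, y, flux, shape) draws the PSF at a sub-pixel position.
    flux, x0, y0, pm_x, pm_y = params
    x_t = x0 + pm_x * t   # proper motion shifts the source between epochs
    y_t = y0 + pm_y * t
    return render_psf(x_t, y_t, flux, shape)

# Fit jointly across all epochs: every exposure constrains the same five
# parameters, so a source too faint to detect in any single epoch can still
# be extracted, and a moving point source is no longer degenerate with a
# static galaxy, as it is on the coadd.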


Example: Sampling and Retaining the Likelihoods

We will perform importance sampling from a proposal distribution determined on the coadd. We plan to characterize (and keep!) the full posterior for each object, with (as yet unexplored) possibilities for compression. A schematic sketch follows below.
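A schematic of the idea (assumptions: a Gaussian proposal taken from the coadd fit, and a user-supplied log_likelihood() that evaluates the full multi-epoch likelihood):

import numpy as np

def importance_sample(coadd_mean, coadd_cov, log_likelihood, n_samples=500):
    # Draw from a Gaussian proposal fit on the coadd, reweight each sample
    # by the full multi-epoch likelihood, and keep (samples, weights).
    rng = np.random.default_rng()
    samples = rng.multivariate_normal(coadd_mean, coadd_cov, size=n_samples)

    # Log of the proposal density (up to an additive constant).
    d = samples - coadd_mean
    inv_cov = np.linalg.inv(coadd_cov)
    log_q = -0.5 * np.einsum("ij,jk,ik->i", d, inv_cov, d)

    log_w = np.array([log_likelihood(s) for s in samples]) - log_q
    w = np.exp(log_w - log_w.max())   # subtract max for numerical stability
    return samples, w / w.sum()       # normalized importance weights

# These (sample, weight) pairs are what a catalog could retain to represent
# the full posterior for each object, instead of a single point estimate.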


Looking Ahead: Astrophysical Inference in the 2020s

(or why software is even more important than you think)


Pushing the Boundaries of Optimal Inference

− As our measurements become more and more systematics-limited, what occurs in the "Data Processing" box above becomes incredibly important.

[Diagram repeated: Data → Model inference; Data → Data Processing → Catalog → Model inference]

− Sometimes, an assumption or an algorithmic choice that has been made there may introduce a systematic that drowns out the signal (or eliminates it).

− For optimal inference, one wants to design measurements that directly probe the relevant aspects of the original (imaging) data, and not the (lossy-compressed) catalog.
• Or derive more appropriate catalogs/feature sets/etc.


Pushing the Boundaries of Optimal Inference (cont.)

− Reasons we don't do this today:
1. It is computationally (and I/O) intensive.
2. It is conceptually difficult.
- Expertise in statistics, applied math, and software engineering is often not there.
- Catalogs are too often taken as "God-given", fundamental results of a survey.

− Things are changing:
• Big-data problems are becoming increasingly computationally tractable.
• The average astronomer in the 2020s will grow up with an expectation of being well versed in statistics, software engineering, and applied math.
• A concerted effort is under way, primarily driven by people in large survey and telescope projects, to create the software necessary to make this possible.


Astronomy 2025: Personalized Medicine

− The LSST "Level 3" concept is our first step in that direction: enabling the community to create new products using LSST's software, services, or computing resources. This means:
• Providing the software primitives to construct custom measurement/inference codes
• Enabling the users to run those codes at the LSST data center, leveraging the investment in I/O (piggybacking onto LSST's data trains)

− Looking ahead: right now, we see the data releases as the key product of a survey. By the end of LSST, I wouldn't be surprised if we saw the software as the key product, with hundreds of specialized (and likely ephemeral) catalogs being generated with it.

− LSST "data releases" will just be some of those catalogs, designed to be more broadly useful than the others, and retained for a longer period of time.

− LSST software and hardware are being engineered to make this possible.


Astronomy in the Age of Large Surveys

− Traditionally, astronomy was a data-starved science. Our methods and approach to research were shaped by this environment. Surveys are altering it; data is becoming abundant, well characterized, and rich.

− LSST is a poster-child of this transformation: it will deliver the positions, magnitudes and variability information for virtually everything in the southern sky to 24th-27th magnitude.

− In that environment, success in research will depend on the ability to mine knowledge from that existing data. A bigger instrument is no longer a cost-effective solution. A Stage V DETF experiment may be a software project.

− This will re-ignite the drive behind VO-like ideas, though emphasis will shift to algorithms, tools, and reusable, collaborative, organically grown software frameworks. Projects like AstroPy and LSST are already pushing on this.

− Need is the mother of invention, and she’s here.

