Databases Unplugged: Challenges in Ubiquitous Data Management

Page 1: Databases Unplugged: Challenges in Ubiquitous Data Management

Databases Unplugged: Challenges in Ubiquitous Data Management
Michael Franklin
UC Berkeley

Page 2: Databases Unplugged: Challenges in Ubiquitous Data Management

M. Franklin, 12/17/99 2

“Gazillions of Gizmos”
“In ten years, billions of people will be using the Web, but a trillion "gizmos" will also be connected to the Web.” Asilomar Rep. on DB Research, Dec. 1998
You’ve heard it before…
  Smartphones, PDAs, Smartcards, badges, wearables, lightswitches, toasters, …
Worldwide sales of Internet-enabled appliances projected to grow from 5.9M units in 1998 to 55.7M units in 2002. IDC via H&Q report

Page 3: Databases Unplugged: Challenges in Ubiquitous Data Management

An Explosion in Scale
[Figure: computing eras plotted on two axes, Distribution and Personalization, each running from Less to More. Eras shown: Batch RJE, Time Sharing, WS/Server, PC + Network, Information Appliances. Annotations: many people per computer; one person per computer; many computers per person; scaled-down PCs, desktop metaphor.]
(Picture is by way of Randy Katz)

Page 4: Databases Unplugged: Challenges in Ubiquitous Data Management

Technical Challenges
Disconnection/Weak Connection
  Standard distributed database techniques break down.
Limited resources
  Memory, CPU, Power, User Interface, Bandwidth
Movement/Location
  Killer Mobile apps use current and future locations.
Scale
  Number and diversity of devices.
Reliability - Palm Pilots don’t bounce.

Page 5: Databases Unplugged: Challenges in Ubiquitous Data Management

But, is Mobile Data Mgmt Needed?
“Fundamentally, the ability to access all information from anywhere and have ONE unified and synchronized information repository is critical to making appliances useful.” Hambrecht and Quist, iWord, March 1999
“All these information appliances have internal data that "docks" with other data stores. Each gizmo is a candidate for database system technology, because most will store and manage some information.” Asilomar Report

Page 6: Databases Unplugged: Challenges in Ubiquitous Data Management

Road Map
Motivation
Alternative scenarios for mobile Databases
Technical/Research challenges
Some solutions
  Consistency
  Data Dissemination
  Data Recharging
Conclusions

Page 7: Databases Unplugged: Challenges in Ubiquitous Data Management

How Will it Happen?
Alternatives:
  SQL engine on the device (largely standalone)
  Extension of enterprise infrastructure
  Data Collection (device to infrastructure)
  Data Dissemination (infrastructure to device)
  PIM-driven information assistant

Page 8: Databases Unplugged: Challenges in Ubiquitous Data Management

SQL Engine on the Device
Reasonable for Palmtop, but probably not the toaster or light-switch…
Stand-alone with occasional synchronization.
Footprint versus functionality
  Engine can be made surprisingly small (10s-100s of KB).
  Sybase uses “take what you need” library approach
All major vendors are playing in this space:
  Oracle Lite, Sybase SQL Anywhere, Informix/Cloudscape, DB2 for the Workpad, SQL Server for Windows CE
But, what is the killer app???

Page 9: Databases Unplugged: Challenges in Ubiquitous Data Management

Extension of Enterprise
Logical Progression?
  Mainframe -> Desktop -> Palm
  ERP -> Palm
Device becomes the endpoint of the enterprise infrastructure (queries and updates).
This is happening but must take into account fundamental limitations of the mobile platforms.
Again, examples exist, but the killer app has not yet emerged here.

Page 10: Databases Unplugged: Challenges in Ubiquitous Data Management

Data Collection Devices
Inventory Management/Tracking/Sensors/Census
Examples: Symbol Technologies (a Palm with a bar code scanner); more futuristic: smart dust.
Asymmetric (device to server) data flow/usage dictates system architecture.
Many applications exist, but no clear need for full function DBMS on the device.
Server-side DB must handle data streams

Page 11: Databases Unplugged: Challenges in Ubiquitous Data Management

Data Dissemination
Many Potential Apps
  stock and sports tickers
  traffic information systems
  software distribution
  news and/or entertainment delivery
Asymmetric (server to devices) data flow/usage dictates system architecture.
No clear need for full function DBMS on the device, but intelligent caching and filtering on device is crucial.

Page 12: Databases Unplugged: Challenges in Ubiquitous Data Management

Personal Information Management
PIM is the killer app for mobile devices.
So, use PIM to drive the data management architecture.
Example: IBM’s Active Calendar
  Calendar provides semantic information on what information will be needed when (and where).
  Use this information to pre-stage information from the fixed infrastructure.
This seems to be the most promising approach for driving device DB functionality.

Page 13: Databases Unplugged: Challenges in Ubiquitous Data Management

Research Issues
Transactions (not likely) and Consistency.
Distribution of function
  how to split query functionality?
  adaptive??
New Querying and Access Models
  info filtering and dissemination
  location centric/movement triggers/pervasive (invasive?) computing
  Evidence Accrual - killer app: dating game
Availability and Recovery

Page 14: Databases Unplugged: Challenges in Ubiquitous Data Management

Data Caching and Consistency
How to keep distributed data consistent?
Centralized algorithms require connectivity at specific times.
Alternative: Epidemic Algorithms (Peer-to-peer)
  Conflict detection: timestamps, version vectors, …
  Conflict Handling (update commitment):
    Optimistic (resolution) - Manual except in limited domains,
    Pessimistic (avoidance) - primary copy, write-all or voting-based.
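The slide names version vectors as one conflict-detection option. A minimal sketch of how such a check could work, assuming each replica keeps a per-replica update counter (the replica names and the `compare`/`merge` helpers are illustrative, not taken from the talk):

```python
from typing import Dict

# Hypothetical representation: replica id -> number of updates seen from it.
VersionVector = Dict[str, int]


def compare(a: VersionVector, b: VersionVector) -> str:
    """Return 'equal', 'a_dominates', 'b_dominates', or 'conflict'."""
    replicas = set(a) | set(b)
    a_ahead = any(a.get(r, 0) > b.get(r, 0) for r in replicas)
    b_ahead = any(b.get(r, 0) > a.get(r, 0) for r in replicas)
    if a_ahead and b_ahead:
        return "conflict"  # concurrent updates: neither copy has seen the other
    if a_ahead:
        return "a_dominates"
    if b_ahead:
        return "b_dominates"
    return "equal"


def merge(a: VersionVector, b: VersionVector) -> VersionVector:
    """Element-wise maximum, recorded once the two copies are reconciled."""
    return {r: max(a.get(r, 0), b.get(r, 0)) for r in set(a) | set(b)}


# A PDA and a server that both updated while disconnected.
pda = {"pda": 2, "server": 1}
server = {"pda": 1, "server": 2}
status = compare(pda, server)  # "conflict"
```

Detection only flags the conflict; whether it is then resolved (optimistic) or was avoided up front (pessimistic) is the separate handling choice listed on the slide.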

Page 15: Databases Unplugged: Challenges in Ubiquitous Data Management

Epidemic Protocol Illustration
(Picture is by way of Ugur Cetintemel)

Page 16: Databases Unplugged: Challenges in Ubiquitous Data Management

Deno - Cetintemel and Keleher
Pessimistic, Asynchronous (epidemic), voting-based
“Bounded” weighted-voting:
  Each replica is assigned a currency $c_i$ s.t. $0 \le c_i \le 1.0$
  Total currency in the system is bounded, i.e., $\sum_i c_i = 1.0$
  Currency can be re-distributed for optimization or planned disconnection.
An update’s life:
  Sites issue tentative updates
  Updates and votes are propagated in a pair-wise fashion
  Updates gather votes as they pass through sites
  An update commits when it gathers a plurality of votes

Page 17: Databases Unplugged: Challenges in Ubiquitous Data Management

Decentralized Update Commitment
An update u wins an election with plurality
A site s maintains:
  votes(u): the sum of votes u gained so far
  unknown: the sum of votes unknown to s (i.e., $1.0 - \sum_{u} \mathrm{votes}(u)$, summed over all updates u that s has seen)
u commits iff for all u' <> u,
  votes(u) > votes(u') + unknown and
  votes(u) > unknown
Issues: time to commit; abort rates

Example 1 (s1, Oi); each triple appears to be (voting site, currency, update voted for):
  (s1, 0.20, u1): votes(u1) = 0.20, unknown = 0.80
  (s1, 0.20, u1) (s5, 0.20, u1): votes(u1) = 0.40, unknown = 0.60
  (s1, 0.20, u1) (s5, 0.20, u1) (s6, 0.15, u2): votes(u1) = 0.40, votes(u2) = 0.15, unknown = 0.45
  (s1, 0.20, u1) (s5, 0.20, u1) (s6, 0.15, u2) (s2, 0.15, u1): votes(u1) = 0.55, votes(u2) = 0.15, unknown = 0.30
  u1 commits!

Example 2 (s1, Oi):
  (s1, 0.20, u1): votes(u1) = 0.20, unknown = 0.80
  (s1, 0.20, u1) (s4, 0.20, u2): votes(u1) = 0.20, votes(u2) = 0.20, unknown = 0.60
  (s1, 0.20, u1) (s4, 0.20, u2) (s6, 0.25, u3): votes(u1) = 0.20, votes(u2) = 0.20, votes(u3) = 0.25, unknown = 0.35
  (s1, 0.20, u1) (s4, 0.20, u2) (s6, 0.25, u3) (s2, 0.25, u2): votes(u1) = 0.20, votes(u2) = 0.45, votes(u3) = 0.25, unknown = 0.10
  u2 commits!
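The commit rule on this slide can be checked mechanically against the two vote traces. The sketch below assumes each vote is a `(site, currency, update)` triple and that total currency is 1.0, as stated for Deno. The function names are illustrative and this is not the Deno implementation:

```python
from typing import Dict, List, Optional, Tuple

Vote = Tuple[str, float, str]  # (voting site, currency, update voted for)


def tally(votes_seen: List[Vote]) -> Tuple[Dict[str, float], float]:
    """Sum the currency per update and compute the still-unknown currency."""
    votes: Dict[str, float] = {}
    for _site, currency, update in votes_seen:
        votes[update] = votes.get(update, 0.0) + currency
    unknown = 1.0 - sum(votes.values())
    return votes, unknown


def committed_update(votes_seen: List[Vote]) -> Optional[str]:
    """Return the update that can commit at this site, or None if undecided."""
    votes, unknown = tally(votes_seen)
    for u, v in votes.items():
        # u must beat every rival even if all unknown votes went to that rival,
        # and must also beat an update this site has not heard of yet.
        beats_rivals = all(v > w + unknown for u2, w in votes.items() if u2 != u)
        if beats_rivals and v > unknown:
            return u
    return None


# First trace from the slide: u1 commits only after the fourth vote arrives.
trace = [("s1", 0.20, "u1"), ("s5", 0.20, "u1"), ("s6", 0.15, "u2"), ("s2", 0.15, "u1")]
decisions = [committed_update(trace[:k]) for k in range(1, len(trace) + 1)]
# decisions is [None, None, None, "u1"]
```

The second condition, `votes(u) > unknown`, matters in the very first step: with a single vote of 0.20 there are no known rivals, yet 0.80 of the currency could still back an update the site has not seen.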

Page 18: Databases Unplugged: Challenges in Ubiquitous Data Management

Semantic Caching - Dar et al.
Idea: Maintain description of cache contents as a set of logical predicates rather than a list of items.
Potential advantages:
  Less overhead with no need for static clustering (reduces bandwidth requirements).
  Describe missing items with logical remainder query.
  Application/Environment specific replacement functions, e.g. considering direction and velocity.
Issues:
  controlling complexity of cache descriptions
  interacting with real database systems
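To make the "remainder query" idea concrete, here is a deliberately simplified sketch. It assumes the cache description is a list of one-dimensional half-open ranges over a single attribute. Real semantic caches handle richer predicates, and the `split_query` name is hypothetical:

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # half-open range [low, high) on one attribute


def split_query(query: Interval, cached: List[Interval]) -> Tuple[List[Interval], List[Interval]]:
    """Split a range query into a probe part (answerable from the cache)
    and a remainder part (must be requested from the server)."""
    lo, hi = query
    probe: List[Interval] = []
    remainder: List[Interval] = []
    cursor = lo  # everything in [lo, cursor) has already been classified
    for c_lo, c_hi in sorted(cached):
        if c_hi <= cursor or c_lo >= hi:
            continue  # this cached range does not touch the unclassified part
        if c_lo > cursor:
            remainder.append((cursor, c_lo))  # gap before the cached range
        end = min(c_hi, hi)
        probe.append((max(c_lo, cursor), end))
        cursor = end
        if cursor >= hi:
            break
    if cursor < hi:
        remainder.append((cursor, hi))  # tail not covered by any cached range
    return probe, remainder


# Cache holds attribute values in [0, 20) and [30, 40); the query asks for [10, 50).
probe, remainder = split_query((10, 50), [(0, 20), (30, 40)])
# probe is [(10, 20), (30, 40)]; remainder is [(20, 30), (40, 50)]
```

Only the remainder ranges need to cross the wireless link, which is where the bandwidth saving claimed on the slide would come from.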

Page 19: Databases Unplugged: Challenges in Ubiquitous Data Management

Dissemination-Based Info Sys (DBIS)
1) Push vs. Pull is just one dimension along which to compare data delivery mechanisms.
   - We’ve identified three.
2) Different mechanisms for data delivery can (and should) be applied at different points in the system.
   - Select components from toolkit.
Franklin and Zdonik - Framework in OOPSLA 97, Toolkit description and demo in SIGMOD 99.

Page 20: Databases Unplugged: Challenges in Ubiquitous Data Management

DBIS Framework
An architecture that combines data delivery techniques for responsive client access.
3 types of nodes:
  Data sources
  Clients
  Information brokers (can add value)
Any data delivery mode can be used.
Network transparency
Possibly dynamic.

Page 21: Databases Unplugged: Challenges in Ubiquitous Data Management

Delivery Options
Pull
  Aperiodic
    Unicast: request/response
    1-to-n: request/response w/snoop
  Periodic
    Unicast: polling
    1-to-n: polling w/snoop
Push
  Aperiodic
    Unicast: Email lists
    1-to-n: publish/subscribe
  Periodic
    Unicast: Email list digests
    1-to-n: Broadcast disks, publish/subscribe
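The three dimensions in the tree above can be written down as a small data structure. The sketch below is one possible encoding. The type names `Initiation`, `Schedule`, and `Audience` are labels chosen here for the three dimensions, not terms from the talk:

```python
from enum import Enum
from typing import Dict, NamedTuple


class Initiation(Enum):
    PULL = "pull"
    PUSH = "push"


class Schedule(Enum):
    APERIODIC = "aperiodic"
    PERIODIC = "periodic"


class Audience(Enum):
    UNICAST = "unicast"
    ONE_TO_N = "1-to-n"


class DeliveryMode(NamedTuple):
    initiation: Initiation
    schedule: Schedule
    audience: Audience


# The eight leaves of the Delivery Options tree, keyed by their three coordinates.
EXAMPLES: Dict[DeliveryMode, str] = {
    DeliveryMode(Initiation.PULL, Schedule.APERIODIC, Audience.UNICAST): "request/response",
    DeliveryMode(Initiation.PULL, Schedule.APERIODIC, Audience.ONE_TO_N): "request/response w/snoop",
    DeliveryMode(Initiation.PULL, Schedule.PERIODIC, Audience.UNICAST): "polling",
    DeliveryMode(Initiation.PULL, Schedule.PERIODIC, Audience.ONE_TO_N): "polling w/snoop",
    DeliveryMode(Initiation.PUSH, Schedule.APERIODIC, Audience.UNICAST): "email lists",
    DeliveryMode(Initiation.PUSH, Schedule.APERIODIC, Audience.ONE_TO_N): "publish/subscribe",
    DeliveryMode(Initiation.PUSH, Schedule.PERIODIC, Audience.UNICAST): "email list digests",
    DeliveryMode(Initiation.PUSH, Schedule.PERIODIC, Audience.ONE_TO_N): "broadcast disks, publish/subscribe",
}


def describe(mode: DeliveryMode) -> str:
    """Render one point of the design space with its example mechanism."""
    coords = f"{mode.initiation.value}/{mode.schedule.value}/{mode.audience.value}"
    return f"{coords}: {EXAMPLES[mode]}"
```

Treating a mode as a value like this fits the framework's claim that a different mode can be chosen for each link in the system.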

Page 22: Databases Unplugged: Challenges in Ubiquitous Data Management

Network Transparency
[Figure: Clients, Brokers, Sources]
The type of a link matters only to nodes on each end

Page 23: Databases Unplugged: Challenges in Ubiquitous Data Management

DBIS Example
[Figure: an example with a Server DB, three proxy caches, one 1-to-n push link, and three unicast pull links. Can vary dynamically.]

Page 24: Databases Unplugged: Challenges in Ubiquitous Data Management

DBIS Research Issues
Each data delivery mechanism has unique aspects
  Broadcast Disks - scheduling, caching, prefetching, updates
  On-demand Broadcast - scheduling, data staging
  Publish/Subscribe - large-scale filtering, channelization
Security/Fault-tolerance/Reliability
End-to-End network design and control
Fundamental performance tradeoffs
Exploiting existing and emerging technologies

Page 25: Databases Unplugged: Challenges in Ubiquitous Data Management

“Data Recharging”
Mobile devices require 2 resources: power and data
  It is impractical to be continuously connected to fixed sources of these.
Devices cope with disconnection using caching:
  Power cached in rechargeable batteries
  Data cached in hot-synched memory
Ideal: make recharging data as simple as power:
  Anywhere (with adapters), anytime, flexible connection duration
Joint work w/ Mitch Cherniack and Stan Zdonik getting underway

Page 26: Databases Unplugged: Challenges in Ubiquitous Data Management

Data Recharging - Research Agenda
Profile Definition and Maintenance
Update Storage and Preparation
Efficient integration of "recharge" updates with existing cached data.
  Recharge, Trickle Charge, Jump Start...
Consistency Guarantees
Global Data Staging
Approaches will be driven by (mostly PIM) applications.

Page 27: Databases Unplugged: Challenges in Ubiquitous Data Management

Conclusions
Lots of plausible/useful Mobile data architectures.
  For many, the applications exist today
  Each has its own set of fascinating research opportunities.
PIM is the killer app for mobile data access.
  It can be used to drive the integration with enterprise and Internet data sources.
Successful MDA work lies at the intersection of communications and data management rather than exclusively in either camp.

Page 28: Databases Unplugged: Challenges in Ubiquitous Data Management

The Data Flood is Real
[Chart: Petabytes by Year, 1988 to 2000 (vertical axis 0 to 3500); series: Sales, Moore's Law.]
Source: J. Porter, Disk/Trend, Inc. http://www.disktrend.com/pdf/portrpkg.pdf

Page 29: Databases Unplugged: Challenges in Ubiquitous Data Management

Disk Appetite, cont.
Greg Papadopoulos, CTO Sun:
  Disk sales doubling every 9 months
  Note: only counts the data we’re saving!
Translate:
  Time to process all your data doubles every 18 months
  MOORE’S LAW INVERTED!
  (and Moore’s Law may run out in the next couple decades?)
Big challenge (opportunity?) for SW systems research
  Traditional scalability research won’t help
  “Ideal” linear scaleup is NOT NEARLY ENOUGH!
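The "doubles every 18 months" figure follows from the two growth rates. A short derivation, assuming disk sales are a proxy for data volume D(t) and reading Moore's Law as processing capacity P(t) doubling every 18 months, with t in months (both readings are assumptions of this sketch):

```latex
\begin{align*}
D(t) &= D_0 \, 2^{t/9}, \qquad P(t) = P_0 \, 2^{t/18} \\
T(t) &= \frac{D(t)}{P(t)}
      = \frac{D_0}{P_0} \, 2^{\,t/9 - t/18}
      = T_0 \, 2^{t/18}
\end{align*}
```

So the time T(t) to process all stored data doubles every 18 months. That is the same period over which Moore's Law doubles capacity, which is the sense in which the law is "inverted".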

Page 30: Databases Unplugged: Challenges in Ubiquitous Data Management

Data Volume: Prognostications
Today
  SwipeStream
    E.g. Wal-Mart 24 Tb Data Warehouse
  ClickStream
  Web
    Internet Archive: ?? Tb
  Replicated OS/Apps
Tomorrow
  Sensors Galore
    DARPA/Berkeley “Smart Dust”
    Temperature, light, humidity, pressure, accelerometer, magnetics
Note: the privacy issues only get more complex!
  Both technically and ethically

Page 31: Databases Unplugged: Challenges in Ubiquitous Data Management

Explaining Disk Appetite
Areal density increases 60%/yr
Yet Mb/$ rises much faster!
[Chart: Mb/$ by Year, 1988 to 2000 (vertical axis 0 to 100); series: Mb/$, Moore's Law.]
Source: J. Porter, Disk/Trend, Inc. http://www.disktrend.com/pdf/portrpkg.pdf

Page 32: Databases Unplugged: Challenges in Ubiquitous Data Management

Scenarios
Ubiquitous computing: more than clients
  sensors and their data feeds are key
    smart dust, biomedical (MEMS sensors)
    each consumer good records (mis)use
      • disposable computing
    video from surveillance cameras, broadcasts, etc.
Global Data Federation
  all the data is online - what are we waiting for?
  The plumbing is coming
    XML/HTTP, etc. give LCD communication
    but how do you flow, summarize, query and analyze data robustly over many sources in the wide area?

Page 33: Databases Unplugged: Challenges in Ubiquitous Data Management

Dataflow in Volatile Environments
Federated query processors a reality
  Cohera, IBM DataJoiner
  No control over stats, performance, administration
Large Cluster Systems “Scaling Out”
  No control over “system balance”
User “CONTROL” of running dataflows
  Long-running dataflow apps are interactive
  No control over user interaction
Sensor Nets: the next killer app
  E.g. “Smart Dust”
  No control over anything!
Telegraph
  Dataflow Engine for these environments

Page 34: Databases Unplugged: Challenges in Ubiquitous Data Management

Data Flood: Main Features
What does it look like?
  Never ends: interactivity required
    Online, controllable algorithms for all tasks!
  Big: data reduction/aggregation is key
  Volatile: this scale of devices and nets will not behave nicely

Page 35: Databases Unplugged: Challenges in Ubiquitous Data Management

The Telegraph Dataflow Engine
Key technologies
  Interactive Control
    interactivity with early answers and examples
    online aggregation for data reduction
  Dataflow programming via paths/iterators
    Elevate query processing frameworks out of DBMSs
    Long tradition of static optimization here
      • Suggestive, but not sufficient for volatile environments
  Continuously adaptive flow optimization
    massively parallel, adaptive dataflow via Rivers and Eddies
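"Early answers" through online aggregation can be illustrated with a running average that reports an error bar while data is still streaming in. This is only a sketch of the general idea, not Telegraph code. It assumes rows arrive in random order and uses a normal-approximation interval. The function name and the `report_every` parameter are hypothetical:

```python
import math
from typing import Iterable, Iterator, Tuple


def online_average(
    values: Iterable[float], z: float = 1.96, report_every: int = 1000
) -> Iterator[Tuple[int, float, float]]:
    """Yield (rows_seen, running_mean, half_width) as rows stream in.

    Uses Welford's one-pass update for mean and variance; the interval
    mean +/- half_width is a normal approximation (z = 1.96 for about 95%).
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n > 1 and n % report_every == 0:
            variance = m2 / (n - 1)
            yield n, mean, z * math.sqrt(variance / n)


# A user watching the estimates could stop the scan once the interval is tight enough.
for rows, estimate, half_width in online_average([1.0, 2.0, 3.0, 4.0], report_every=2):
    if half_width < 0.5:
        break
```

The consumer decides when to stop, which matches the slide's emphasis on interactive control of long-running dataflows.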

Page 36: Databases Unplugged: Challenges in Ubiquitous Data Management

OceanStore Context: Ubiquitous Computing
Computing everywhere:
  Desktop, Laptop, Palmtop
  Cars, Cellphones
  Shoes? Clothing? Walls?
Connectivity everywhere:
  Rapid growth of bandwidth in the interior of the net
  Broadband to the home and office
  Wireless technologies such as CDMA, Satellite, laser
Rise of the thin-client metaphor:
  Services provided by interior of network
  Incredibly thin clients on the leaves
    MEMS devices -- sensors + CPU + wireless net in 1 mm^3
Mobile society: people move and devices are disposable

Page 37: Databases Unplugged: Challenges in Ubiquitous Data Management

Questions about information:
Where is persistent information stored?
  20th-century tie between location and content outdated (we all survived the Feb 29th bug -- let’s move on!)
  In world-scale system, locality is key
How is it protected?
  Can disgruntled employee of ISP sell your secrets?
  Can’t trust anyone (how paranoid are you?)
Can we make it indestructible?
  Want our data to survive “the big one”!
  Highly resistant to hackers (denial of service)
  Wide-scale disaster recovery
Is it hard to manage?
  Worst failures are human-related
  Want automatic (introspective) diagnosis and repair

Page 38: Databases Unplugged: Challenges in Ubiquitous Data Management

First Observation: Want Utility Infrastructure
Mark Weiser from Xerox: Transparent computing is the ultimate goal
  Computers should disappear into the background
In storage context:
  Don’t want to worry about backup
  Don’t want to worry about obsolescence
  Need lots of resources to make data secure and highly available, BUT don’t want to own them
  Outsourcing of storage already becoming popular
Pay monthly fee and your “data is out there”
  Simple payment interface: one bill from one company

Page 39: Databases Unplugged: Challenges in Ubiquitous Data Management

Second Observation: Need wide-scale deployment
Many components with geographic separation
  System not disabled by natural disasters
  Can adapt to changes in demand and regional outages
  Gain in stability through statistics
    Difference between thermodynamics and mechanics: surprising stability of temperature and pressure given 10^30 molecules with highly variable behavior!
Wide-scale use and sharing also requires wide-scale deployment
  Bandwidth increasing rapidly, but latency bounded by speed of light
Handling many people with same system leads to economies of scale

Page 40: Databases Unplugged: Challenges in Ubiquitous Data Management

OceanStore: Everyone’s data, One big Utility
“The data is just out there”
Separate information from location
  Locality is only an optimization (an important one!)
  Wide-scale coding and replication for durability
All information is globally identified
  Unique identifiers are hashes over names & keys
  Uniform location mechanism:
    replaces: DNS, server location, data location
  No centralized namespace required (e.g. like SDSI)
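"Unique identifiers are hashes over names & keys" can be sketched as follows. The slide does not say which hash function or encoding OceanStore uses, so SHA-256, the length prefix, and the `object_id` helper are assumptions made purely for illustration:

```python
import hashlib


def object_id(owner_public_key: bytes, name: str) -> str:
    """Derive a location-independent identifier from an owner's key and a name.

    Assumed scheme: SHA-256 over a length-prefixed key followed by the UTF-8
    name. The length prefix keeps (key, name) pairs from colliding by accident.
    """
    h = hashlib.sha256()
    h.update(len(owner_public_key).to_bytes(4, "big"))
    h.update(owner_public_key)
    h.update(name.encode("utf-8"))
    return h.hexdigest()


# Two owners can use the same human-readable name without coordinating.
alice_id = object_id(b"alice-public-key-bytes", "/home/notes.txt")
bob_id = object_id(b"bob-public-key-bytes", "/home/notes.txt")
```

Because each owner's key is folded into the hash, anyone can mint identifiers locally, which is one way to read the slide's point that no centralized namespace is required.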

Page 41: Databases Unplugged: Challenges in Ubiquitous Data Management

Amusing back of the envelope calculation
(courtesy Bill Bolotsky, Microsoft)
How many files in the OceanStore?
  Assume 10^10 people in world
  Say 10,000 files/person (very conservative?)
  So 10^14 files in OceanStore!
  If 1 gig files (not likely), get 1 mole of bytes!
Truly impressive number of elements… but small relative to physical constants
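The arithmetic behind the slide, written out. The only value added here is Avogadro's constant, about 6.022 × 10^23 per mole, and "1 gig" is taken as 10^9 bytes:

```latex
\begin{align*}
\text{files} &= 10^{10}\ \text{people} \times 10^{4}\ \tfrac{\text{files}}{\text{person}} = 10^{14}\ \text{files} \\
\text{bytes} &= 10^{14}\ \text{files} \times 10^{9}\ \tfrac{\text{bytes}}{\text{file}} = 10^{23}\ \text{bytes} \\
\frac{10^{23}}{N_A} &\approx \frac{10^{23}}{6.022 \times 10^{23}} \approx 0.17\ \text{mol}
\end{align*}
```

So "1 mole of bytes" holds to within an order of magnitude, which is all a back-of-the-envelope estimate claims.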

Page 42: Databases Unplugged: Challenges in Ubiquitous Data Management

Basic Structure: Irregular Mesh of “Pools”

Page 43: Databases Unplugged: Challenges in Ubiquitous Data Management

Utility-based Infrastructure
Service provided by confederation of companies
  Monthly fee paid to one service provider
  Companies buy and sell capacity from each other
[Figure labels: Pac Bell, Sprint, IBM, AT&T, Canadian OceanStore, IBM]

Page 44: Databases Unplugged: Challenges in Ubiquitous Data Management

Outline
Motivation
Properties of the OceanStore and Assumptions
Specific Technologies and approaches:
  Conflict resolution on encrypted data
  Replication and Deep archival storage
  Naming and Data Location
  Introspective computing for optimization and repair
  Economic models
Conclusion

Page 45: Databases Unplugged: Challenges in Ubiquitous Data Management

Ubiquitous Devices ⇒ Ubiquitous Storage
Consumers of data move, change from one device to another, work in cafes, cars, airplanes, the office, etc.
Properties REQUIRED for OceanStore storage substrate:
  Strong Security: data encrypted in the infrastructure; resistance to monitoring and denial of service attacks
  Coherence: too much data for naïve users to keep coherent “by hand”
  Automatic replica management and optimization: huge quantities of data cannot be managed manually
  Simple and automatic recovery from disasters: probability of failure increases with size of system
  Utility model: world-scale system requires cooperation across administrative boundaries

Page 46: Databases Unplugged: Challenges in Ubiquitous Data Management

State of the Art?
Widely deployed systems: NFS, AFS (/DFS)
  Single “regions” of failure, caching only at endpoints
  ClearText exposed at various levels of system
  Compromised server ⇒ all data on server compromised
Mobile computing community: Coda, Ficus, Bayou
  Small scale, fixed coherence mechanism
  Not optimized to take advantage of high-bandwidth connections between server components
  ClearText also exposed at various levels of system
Web caching community: Inktomi, Akamai
  Specialized, incremental solutions
  Caching along client/server path, various bottlenecks
Database Community:
  Interfaces not usable by legacy applications
  ACID update semantics not always appropriate