Page 1

Web Computing for Information Island Crisis in the Era of Big Data

Gang Huang

Peking University

2016.10.20, Taiyuan, China

Opening Up Data Islands through the Web

Page 2

Agenda

• Information Island Crisis in the Era of Big Data

• Web Computing Paradigm as a Silver Bullet

• 10 Years Research on Web Computing

• Future of Web Computing for Big Data

Web Computing for Big Data - Gang Huang

Page 3

Data as a Resource

Data sources: Enterprise Information System, Mobile App, Desktop/Web App, Embedded System

“Surface” data from the World Wide Web

• Data can be retrieved by standard web crawlers or search engines such as Google, Baidu, and Bing.

• As of June 2016: 4.5+ million web sites with 200+ billion pages.

“Deep” data from the service-oriented Web

• Source: enterprise/organization information systems, business systems such as Amazon, Ctrip, CRM, and SCM, and zillions of desktop/mobile apps.

• Such data is generated dynamically during service interaction and CANNOT be accessed by a crawler!

• Volume: 10x-100x that of surface data (excluding video/audio).

• Value: considerably higher than that of surface data.

Big Data is generated by billions of information systems.

Page 4

Deep Data Collection

Data sources: Enterprise Information System, Mobile App, Desktop/Web App, Embedded System

In 2012, Google announced “In-App Search” for deep data exploration.

In 2015, Apple iOS 9 added deep data search for Apple apps and cached-data search for other apps.

Surface data collection is the core competence of WWW

Deep data collection is the core competence of Big Data

Page 5

Information Island Crisis in the Era of Big Data

In-App Search supports only 1000+ apps.

iOS supports only the local cache of third-party apps.

China Big Data Industry Summit (May 25, 2016)

Collecting data from 100,000 e-Gov systems:

50,000,000 man-months

100,000,000,000 RMB

* from Digital China, Neusoft, Taiji, CS&S, etc.
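The totals above are easier to judge per system. A back-of-the-envelope sketch, assuming (the slide does not say this) that effort and cost are spread evenly over all 100,000 systems:

```python
# Totals quoted on the slide.
TOTAL_MAN_MONTHS = 50_000_000
TOTAL_COST_RMB = 100_000_000_000
NUM_SYSTEMS = 100_000

# Assumption: effort and cost are spread evenly across the systems.
man_months_per_system = TOTAL_MAN_MONTHS // NUM_SYSTEMS
cost_per_system_rmb = TOTAL_COST_RMB // NUM_SYSTEMS

print(f"{man_months_per_system} man-months per system")
print(f"{cost_per_system_rmb:,} RMB per system")
```

Under that assumption, each system would need roughly 500 man-months and 1,000,000 RMB of integration work.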

Page 6

Silver Bullet to Information Islands

[Figure: existing approaches by layer and architecture]

• Data (DB): export from closed DB

• Network: packet interception on HTTPS

• Application logic: crawler on C/S, crawler on A/S

• Architectures: B/S, C/S, A/S

• Specific or ad-hoc solutions for different levels and scenarios.

• Typically include DB exporter/importer, crawler, and refactoring.

• Depend heavily on the application infrastructure, e.g., hardware, OS, security policies.

• Difficult, risky, costly, labor-intensive, and error-prone.

Refactoring without source code and developers?

Approaches: ETL, DB, refactoring, crawler, interception

Page 7

Agenda

• Information Island Crisis in the Era of Big Data

• Web Computing Paradigm as a Silver Bullet

• 10 Years Research on Web Computing

• Future of Web Computing for Big Data

Page 8

YanCloud for Data as a Service


Desktop/Web/Mobile Application Systems

Data API Learning and Construction Platform

Client API Cloud

Data API Runtime and Management Platform

• Data Catalog

• API composition

• Online deployment and evolution

• Data accounting

Data API Store

• Domain-Specific API

• API production, consumption

• General at memory level

• Read, but can write back

• Real-time data manipulation

• WYSIWYG data visualization

• The only PRODUCT that supports deep data collection from Web/PC/App

Page 9

YanCloud Applications on Smart City

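Data API

1. Ma'anshan
2. Beijing Local News
3. Beijing Morning Post
4. Beijing News
5. Benxi Portal
6. Benxi Tong
7. Benxi Net
8. Guangzhou-Foshan Metro Net
9. Sichuan News Net
10. Sichuan Online
11. Weihai Net
12. Weihai Information Port
13. Xi'an News Net
14. Zhangjiagang Online
15. Zhangjiagang Online
16. China Benxi
17. China Capital Net
18. Beijing Government Conduct Hotline
19. Chengdu Mayor's Mailbox
20. Zhangjiagang Citizen Service Net
21. Beijing 12345 Weibo
22. Beijing Release Weibo
23. Beijing Traffic Police Weibo
24. Benxi Release Hall Weibo
25. Safe Benxi Weibo
26. Weihai Broadcast Weibo
27. Weihai Release Weibo
28. Weihai Police Online
29. Beijing Traffic Violations
30. Beijing Driver's License Penalty Points
31. Fuzhou Driver's License Penalty Points
32. Shandong Weihai Traffic Violations
33. Weihai Water Bill Management System
34. Weihai Electricity Bill System
35. Xi'an Housing Provident Fund System
36. Xi'an Driver's License Penalty Points
37. Xi'an Social Security
38. Beijing Transit Card System
39. Chengdu Motor Vehicle Violation System
40. Chengdu Local Tax System
41. Chengdu Property Sales System
42. Chongqing Motor Vehicle Violation Query
43. Chongqing Driver Violation Query
44. Chongqing Traffic Management Information Net
45. Chongqing Driver's License Points System
46. Chongqing Library System
47. Chengdu Water Bill System
48. Beijing License Plate Lottery System
49. Citizen Convenience Query Net System
50. Beijing Traffic Violation System
51. Beijing Housing Provident Fund System
52. Beijing Social Security System
53. Yangzhou Driver Violation System
54. Yangzhou Electricity Usage System
55. Wuhan Water Usage System
56. Wuhan Electricity Usage System
57. Zhuhai, China Southern Power Grid
58. National Traffic Violation Query System
59. China Yangzhou System
60. Yangzhou Gas System
61. Xuzhou Social Security, Provident Fund, and Water Bill Query
62. Nantong Provident Fund and Water Bill Query
63. Benxi Traffic and Water Bill Query
64. Yangzhou Price Cloud Management System
65. Guiyang Electricity Bill System
66. Shijiazhuang Traffic Violation System
67. Shijiazhuang Driver Penalty Points System
68. Zhuhai Driver System …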

From 60 man-months to 1 man-day

315 data APIs for 121 systems from 43 cities

Page 10

YanCloud Applications on Data Collection

Tax Management Systems

HR Management System

From impossible to 5 man-days using YanCloud

From unsolvable to 3 man-days using YanCloud

Page 11

YanCloud Applications on Data Collection

500+ systems in 20+ provinces and ministries across China in 2016

Engineering efficiency ⬆100X; labor cost ⬇90%

Sharing and crowdsourcing of data, algorithms, applications, and stakeholders

Page 12

YanCloud Applications on Mobilization

Generate a mobile app from a legacy Visa Application System

Generate a WeChat Public Account from a legacy Journal Portal

Page 13

DaaS Applications on Mobile Intelligence

Example apps: Tencent News (腾讯新闻), Maoyan Movies (猫眼电影), Meituan (美团), Didi Chuxing (滴滴出行)

For each app: Deep Sensing, Deep Searching, Deep Linking

Page 14

Agenda

• Information Island Crisis in the Era of Big Data

• Web Computing Paradigm as a Silver Bullet

• 10 Years Research on Web Computing

• Future of Web Computing for Big Data

Page 15

Our Vision on Internet Computing

Technical trend: Pervasive Computing, Internet of Things, Service Computing, Semantic Web, Social Computing, System of Systems, Grid/Cloud Computing

Big trend: Internet as a Computer

Business trend: Digital Economy, E-government, Modern Service, Smarter Planet, Internet Culture, Social Network, Virtual World

• Grid/Cloud computing proposes a new model of networked applications from the perspective of resource sharing and management.

• Pervasive computing discusses a new situation of networked applications from the perspective of human-computer interaction.

• Service-oriented computing focuses on a new form of software, with emphasis on collaboration and dynamism, from the philosophy of software as a service.

• …

Page 16

Internetware for Internet Computer

“Internet Computer” requires substantial improvements in software characteristics for implementing new business naturally with new technology.

Internetware: A New Software Paradigm for Internet Computing, IEEE Computer 2012

IBM GTO (Global Technology Outlook) 2012

Page 17

Web Pages as Web Services

[Figure: Internetware Rich Client, a browser-based middleware]

• Web browser and Web technology mechanisms (HTML, JavaScript, CSS)

• Service types: SOAP service, RESTful service, JavaScript API, RSS/Atom

• Intra-browser communication mechanisms: event bus, UI composition

• Browser-server communication mechanisms: service data cache, business process integration

• Component container

• Advanced features: on-the-fly composition, model checking for quality, cross-domain OAuth

• Application Programming Interface

• MaaS, iMashup (mashup environment), service-oriented rich client

CyberC 2009 Best Paper

IEEE Transactions on Services Computing 2009

A service mashup is a data flow integrating multiple interactive web services.

Q1: Very few service mashup components?

A: Any web page can become a web mashup component if we break the security mechanism of standard web pages, i.e., the sandbox.

Silver Bullet Part 1: We control the web pages to open up information islands!
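To make the idea of a mashup as a data flow concrete, here is a minimal sketch in Python. It is not the iMashup implementation. The three components (`locate_city`, `fetch_weather`, `render_card`) are hypothetical stand-ins for web pages or services wrapped as components, and they return canned data instead of calling real services.

```python
from functools import reduce
from typing import Callable, Dict, List

Record = Dict[str, object]
Component = Callable[[Record], Record]


def locate_city(record: Record) -> Record:
    """Hypothetical stand-in for a map service: adds coordinates."""
    coords = {"Beijing": (39.9, 116.4), "Taiyuan": (37.9, 112.5)}
    return {**record, "coords": coords[record["city"]]}


def fetch_weather(record: Record) -> Record:
    """Hypothetical stand-in for a weather service: adds a canned forecast."""
    return {**record, "forecast": "sunny", "temp_c": 25}


def render_card(record: Record) -> Record:
    """Hypothetical UI component: formats the merged data for display."""
    card = f"{record['city']}: {record['forecast']}, {record['temp_c']}C"
    return {**record, "card": card}


def run_mashup(components: List[Component], request: Record) -> Record:
    """Push one request through the components in order (a linear data flow)."""
    return reduce(lambda data, step: step(data), components, request)


result = run_mashup([locate_city, fetch_weather, render_card], {"city": "Beijing"})
print(result["card"])  # prints "Beijing: sunny, 25C"
```

Each component only reads and extends the record it receives, so new components can be added to the flow without changing the existing ones.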

Page 18

In-Depth Analysis on Services Mashups

Model-checking workflow:

① Generating the behavior model: the behavior of the application and the behavior of the environment (from a generation template for the runtime environment and a behavior meta-model) are combined into a synthesized behavior model (UML sequence diagram).

② Specification of constraints and refinements.

③ Verification of the behavior model: a verification model (in Promela) is checked by the model checker SPIN; the results are trace sets and violations.

ICSS 2010 Best Paper Award

Q2: The web browser controls the behavior of web pages?

A: We analyze the source code of the web browser and model-check its runtime behavior to understand browser-based service mashups as a whole.

Performance evaluation

Silver Bullet Part 2: We control the web browser to open up information islands!
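The verification step, checking a behavior model against a constraint and reporting a violating trace, can be illustrated with a tiny explicit-state checker. This Python sketch is a stand-in for SPIN, not the authors' tool chain. The toy model (a page that is either idle or waiting, with a count of pending requests) and the constraint (at most one pending request) are invented for illustration.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List, Optional, Tuple

State = Tuple[str, int]  # (phase, number of pending requests)


def check_invariant(initial: State,
                    successors: Callable[[State], Iterable[State]],
                    invariant: Callable[[State], bool]) -> Optional[List[State]]:
    """Breadth-first search of the state space.

    Returns a shortest trace to a violating state, or None if the invariant holds.
    """
    parent: Dict[State, Optional[State]] = {initial: None}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if not invariant(state):
            trace: List[State] = []
            cursor: Optional[State] = state
            while cursor is not None:       # walk back to the initial state
                trace.append(cursor)
                cursor = parent[cursor]
            return trace[::-1]
        for nxt in successors(state):
            if nxt not in parent:           # visit each state once
                parent[nxt] = state
                queue.append(nxt)
    return None


def successors(state: State) -> Iterable[State]:
    """Hypothetical behavior model of a mashup page."""
    phase, pending = state
    if phase == "idle":
        yield ("waiting", pending + 1)      # user clicks, request is sent
    else:
        yield ("idle", pending - 1)         # response arrives
        if pending < 2:                     # bound the state space
            yield ("waiting", pending + 1)  # second click while still waiting


violation = check_invariant(("idle", 0), successors, lambda s: s[1] <= 1)
print(violation)  # [('idle', 0), ('waiting', 1), ('waiting', 2)]
```

The returned trace plays the role of the "trace sets and violation" output: it shows the shortest sequence of steps that breaks the constraint.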

Page 19

Data Cache for Services Mashup

Cache workflow:

1. Intercept user requests
2. Query
3. Invoke
4. Respond
5. Cache
6. Respond
7. Validate

Building blocks: Logic, Instance Repository, Application Programming Interface, Cache Strategy, Data Model

Google Weather's cache strategy: the cache expires immediately.

Component A
Context: A uses the weather data in a real-time application.
Desired cache strategy: the same as Google's.

Component B
Context: B uses the weather data to feed other services, which care less about accuracy.
Desired cache strategy (frequency): cached data does not expire within five minutes of the last response.

Component C
Context: C needs only today's weather from the responses, which varies less frequently.
Desired cache strategy (granularity): caching should be done on fine-grained structures within the responses.

Q3: Standard browser/server interactions do not fit service mashups?

A: We control the cache strategies of HTTP and HTML.

SOCA 2010 Best Paper Nomination, WWW 2015

Silver Bullet Part 3: We control the interaction between the web browser and the web server to open up information islands!
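The three per-component strategies for the weather service (expire immediately, keep for five minutes, cache only a fine-grained part of the response) can be sketched as one small cache class. This is a Python illustration, not the mechanism from the cited papers. `StrategyCache`, `fake_weather_service`, the response fields, and the one-hour lifetime chosen for component C are all assumptions made for the example.

```python
import time
from typing import Any, Callable, Dict, List, Optional, Tuple


class StrategyCache:
    """Caches service responses under one consumer's strategy (illustrative only)."""

    def __init__(self, fetch: Callable[[str], Dict[str, Any]], ttl_seconds: float,
                 field: Optional[str] = None,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch        # the wrapped service call
        self._ttl = ttl_seconds    # frequency: 0 means "expires immediately"
        self._field = field        # granularity: keep only this part of the response
        self._clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]                    # still fresh: answer from the cache
        response = self._fetch(key)            # miss or expired: invoke the service
        value = response if self._field is None else response[self._field]
        self._store[key] = (now, value)        # remember the time of the last response
        return value


calls: List[str] = []


def fake_weather_service(city: str) -> Dict[str, Any]:
    """Hypothetical stand-in for the remote weather service."""
    calls.append(city)
    return {"today": "sunny", "week": ["sunny", "rain", "cloudy"]}


now = [0.0]                 # fake clock, so the demo needs no sleeping
clock = lambda: now[0]

cache_a = StrategyCache(fake_weather_service, ttl_seconds=0, clock=clock)
cache_b = StrategyCache(fake_weather_service, ttl_seconds=300, clock=clock)
cache_c = StrategyCache(fake_weather_service, ttl_seconds=3600, field="today", clock=clock)

cache_a.get("Beijing")
cache_a.get("Beijing")          # A: every request reaches the service (2 calls)
cache_b.get("Beijing")          # B: first request at t=0 reaches the service
now[0] = 200.0
cache_b.get("Beijing")          # B: within five minutes, served from the cache
now[0] = 400.0
cache_b.get("Beijing")          # B: expired, reaches the service again
today = cache_c.get("Beijing")  # C: stores only the "today" field
print(len(calls), today)        # prints "5 sunny"
```

Keeping the strategy per consumer rather than per service lets A, B, and C share one service while each gets the freshness it needs.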

Page 20

Offloading JavaScript Programs

• Rich Web mashups cannot work well on mobile devices, e.g., chess games, 3D graphics, RPGs

• Mobile Web can leverage cloud-side resources

49x page load time improvement

92% energy saving

Generally applicable to major browsers: Chrome, Safari, and Firefox

SPLASH 2012, WWW 2016, IEEE Transactions on Mobile Computing 2016

Q4: JavaScript makes web pages much more complex to understand and control?

A: We offload the JavaScript programs from the mobile browser to the cloud.

Silver Bullet Part 4: We control the JavaScript programs to open up information islands!

Page 21

All-in-One by SM@RT


Science China 2013 & IEEE Transactions on Services Computing 2016

SM@RT SMVC Model

SM@RT Client-Cloud-Convergence Platform

Page 22

SM@RT for Java-based Information Islands

[Figure: refactoring pipeline, from the Java bytecode of an Android app to deployable files; app classes are labeled a to i]

1. Detect which classes are movable. Result: classified app classes (location-anchored app classes and movable app classes).

2. Make movable classes able to offload. Result: transformed app classes with proxy classes.

3. Detect which classes should be offloaded as a whole. Result: clustered app classes (a location-anchored app class cluster and movable app class clusters).

4. Package deployable files. Result: a deployable Android app plus the movable app classes packed in an executable Jar file.

OOPSLA 2012

Re-implement the silver bullet to open up Java-based information islands

Runtime model of an offloaded Android app

97% execution time saving and 83% energy saving

SM@RT SMVC Programming Abstraction

• Java bytecode

• Java VM

• Java invocation

• VM in cloud
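Steps 1 and 3 of the pipeline on this slide (detect movable classes, then decide which to offload as a whole) can be sketched in Python. This is an illustration only, not the algorithm of the OOPSLA 2012 tool. The call graph, the choice of anchored classes, and the clustering rule (connected components among movable classes) are assumptions. A real tool would presumably also weigh invocation frequency and transfer cost.

```python
from typing import Dict, List, Set


def movable_classes(all_classes: Set[str], anchored: Set[str]) -> Set[str]:
    """Step 1: every class that is not tied to local resources is movable."""
    return all_classes - anchored


def cluster_movable(calls: Dict[str, Set[str]], movable: Set[str]) -> List[List[str]]:
    """Step 3 (simplified): group movable classes that invoke each other."""
    neighbours: Dict[str, Set[str]] = {cls: set() for cls in movable}
    for caller, callees in calls.items():
        for callee in callees:
            if caller in movable and callee in movable:
                neighbours[caller].add(callee)
                neighbours[callee].add(caller)

    clusters: List[List[str]] = []
    seen: Set[str] = set()
    for start in sorted(movable):            # sorted for a deterministic result
        if start in seen:
            continue
        stack, cluster = [start], set()
        while stack:                         # depth-first walk of one component
            cls = stack.pop()
            if cls in seen:
                continue
            seen.add(cls)
            cluster.add(cls)
            stack.extend(neighbours[cls] - seen)
        clusters.append(sorted(cluster))
    return clusters


# Hypothetical app with classes a..i; a, e, and i use device-local resources.
classes = set("abcdefghi")
anchored = {"a", "e", "i"}
call_graph = {
    "a": {"b"}, "b": {"c"}, "d": {"e"}, "e": {"f"},
    "f": {"g"}, "g": {"h"}, "h": {"i"},
}

movable = movable_classes(classes, anchored)
clusters = cluster_movable(call_graph, movable)
print(clusters)  # [['b', 'c'], ['d'], ['f', 'g', 'h']]
```

Offloading a whole cluster keeps the calls among its classes on one side of the network, so only calls that cross between anchored and movable classes would need proxies.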

Page 23

Our Silver Bullet to Information Island Crisis

[Figure: SM@RT across the networked software stack]

• Target architectures: Apps/Server, Client/Server, Browser/Server

• Stack layers: Networked Software Architecture, Dev Framework, Middleware, Host OS, Network

• Artifacts analyzed: HTML/CSS, JavaScript, Java bytecode, Assembly; Browser, JDK/JVM, GUI Widget, HTTP Stack; Android/Linux

• Activities: Code/Data Analytics, Recovery and Refactoring, Micro Service

• Properties: Self-Organize, Self-Optimize, Self-Evolve, Self-Configure, Self-Healing, Self-Protect

• Outcomes: Service Oriented Software Architecture, Data and Service Innovation

• SM@RT: Model, View, Controller; Presentation, Data, Business

Page 24

Summary of Our Web Computing for Big Data

• The ONLY Silver Bullet for Web/Desktop/Mobile Information Islands

• 500+ Government and Enterprise Applications

• 100X Engineering Efficiency Improvement

• 90% Labor Cost Saving

• 80,000,000 RMB Patent Royalties

• 10 years of research and practice

Page 25

Agenda

• Information Island Crisis in the Era of Big Data

• Web Computing Paradigm as a Silver Bullet

• 10 Years Research on Web Computing

• Future of Web Computing for Big Data

Page 26

Intra-Organization Deep Data Sharing

Palantir: real-time inspection of tens of government systems; 20 billion USD valuation

DOMO: real-time collection from hundreds of EIS, with BI support; 2 billion USD valuation

Page 27

API Economy for Big Data

Intra-Organization Deep Data

• Palantir: real-time inspection of tens of government systems; 20 billion USD valuation

• DOMO: real-time collection from hundreds of EIS, with BI support; 2000M USD valuation

Inter-Organization Deep Data

• API-based data trading (10+ billion RMB market)

Situational Deep Data

• API economy for situational applications

• 5 billion API requests for Google and Facebook

• 3 billion API requests for Twitter (75% of total traffic)

• 25+ billion USD market (Gartner)

Page 28

API Economy by Web Computing

API Economy: API Specification (Data), API Management (Data), API Invocation (Data), API Consumption (Data)

Market figures: 625M $, 2000M $, 2800M $, 600M $

Web 1.0 (HTML+HTTP), Web 2.0 (REST+XML), Web 3.0 (Semantics)

Web 1.0 (HTML+HTTP), Web 2.0 (REST+API), Web 3.0 (Big Data)

• HTML vs. API/Data Spec

• Web Search vs. API/Data Search

• HTTP/SSL for Web Pages vs. HTTP/Blockchain for Data

• RESTful vs. Micro-Services

1 ZB of annual Internet traffic

Page 29

Thanks

Web Computing for Information Island Crisis in the Era of Big Data

Gang Huang, Peking University, [email protected]

Opening Up Data Islands through the Web