Oracle Big Data Analytics Based on Hadoop and RDBMS

Corey Wei, Technical Consultant, Oracle Corporation

Copyright © 2014, Oracle and/or its affiliates. All rights reserved.

Agenda

Big Data Solution Overview

Big Data Appliance

Oracle NoSQL Database

Big Data SQL

Big Data Connectors

Oracle Advanced Analytics

Case Study

Big Data Solution Overview

Big Data Definition

“Big Data: Techniques and Technologies that Make Handling Data at Extreme Scale Economical.”

Brian Hopkins and Boris Evelson, Forrester Research, “Expand Your Digital Horizons with Big Data” (September 2011)

Oracle Big Data Approach - Functional View

[Diagram: stages Stream; Acquire – Organize – Analyze; Decide. Components: Event / Stream Data Capture, Log / File Data Capture, Applications, NoSQL, Hadoop, Predictive Analytics, Bridge Unstructured/Structured, ETL, Data Warehouse, Data Marts / ODS, In-Database Analytics, Dashboards, Reporting & Query, Real-Time Information Discovery]

Oracle Big Data Approach - Product View

[Diagram: stages Stream; Acquire – Organize – Analyze; Decide. Products: Oracle Event Processing, Apache Flume, Applications, Oracle NoSQL Database, Cloudera Hadoop, Oracle R Distribution, Oracle Big Data Connectors, Oracle Data Integrator, Oracle Industry Data Model(s), Data Warehouse, Oracle Database, In-Database Analytics, Oracle Advanced Analytics, Oracle BI Enterprise Edition, Oracle Real-Time Decisions, Endeca Information Discovery]

Oracle Big Data Approach – Engineered Systems
• Complete
• Integrated
• Scalable

[Diagram: stages Stream; Acquire – Organize – Analyze; Decide, with products grouped into engineered systems: Oracle Event Processing, Apache Flume, Applications, Oracle NoSQL Database, Cloudera Hadoop, Oracle R Distribution, Oracle Big Data Connectors, Oracle Data Integrator, Data Warehouse, Oracle Database, In-Database Analytics, Oracle Advanced Analytics, Oracle BI Enterprise Edition, Oracle Real-Time Decisions, Endeca Information Discovery]

Big Data Appliance

Big Data Appliance X4-2

Sun Oracle X4-2L servers, each with:
• 2 × 8-core Intel Xeon E5 processors
• 64 GB memory
• 48 TB disk space

Integrated software:
• Oracle Linux
• Oracle Java VM
• Cloudera Distribution of Apache Hadoop (CDH)
• Cloudera Manager
• All Cloudera options
• Oracle R Distribution
• Oracle NoSQL Database

All integrated software (except Oracle NoSQL Database Community Edition) is supported as part of Premier Support for Systems and Premier Support for Operating Systems.

Big Data Appliance Product Family

• Starter Rack: fully cabled and configured for growth, with 6 servers
• In-Rack Expansion: a 6-server modular expansion block
• Full Rack: optimal blend of capacity and expansion options
• Grow by adding racks: up to 18 racks without additional switches

Engineered Systems Benefits

Lower TCO than DIY Hadoop Clusters

Faster Time to Value

Higher Performance out-of-box

Lower Management Overhead

Integrated and Comprehensive Security

Tight Integration with your Infrastructure

Big Data Appliance

Engineered Systems Benefits

List Price Comparisons

TCO data points:
• 18 servers (HP DL380 vs. Oracle X4-2L)
• 864 TB raw storage
• 288 cores
• 1,152 GB total memory
• Cloudera Enterprise subscription with all options
• Subscription vs. perpetual licensing
• Equivalent installation cost

Not calculated:
• Soft costs (people and time to value)
• Data integration licenses

[Chart: cumulative cost and savings (USD, $0 to $1,400,000) over Years 1 to 5 for Oracle BDA vs. HP + Cloudera, with the savings shown]
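The TCO totals follow directly from the per-server configuration on the Big Data Appliance X4-2 slide (2 × 8 cores, 64 GB memory, 48 TB disk per server). A quick arithmetic check, sketched in Python; `rack_totals` is just an illustrative helper name:

```python
# Per-server figures from the Big Data Appliance X4-2 slide.
CORES_PER_SERVER = 2 * 8        # two 8-core Intel Xeon E5 processors
MEMORY_GB_PER_SERVER = 64
DISK_TB_PER_SERVER = 48


def rack_totals(servers: int) -> dict:
    """Aggregate cores, memory (GB) and raw disk (TB) for a number of servers."""
    return {
        "cores": servers * CORES_PER_SERVER,
        "memory_gb": servers * MEMORY_GB_PER_SERVER,
        "raw_storage_tb": servers * DISK_TB_PER_SERVER,
    }


if __name__ == "__main__":
    # Full rack used in the TCO comparison:
    print(rack_totals(18))  # {'cores': 288, 'memory_gb': 1152, 'raw_storage_tb': 864}
    # Starter rack (6 servers):
    print(rack_totals(6))   # {'cores': 96, 'memory_gb': 384, 'raw_storage_tb': 288}
```

The 18-server result matches the 288 cores, 1,152 GB and 864 TB quoted in the comparison.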

Engineered Systems Benefits

Feature comparison: BDA 3.0 vs. DIY CDH 5.0
• Management console
• Single-command patching and upgrade
• Full-stack patching and upgrading
• Automatic cluster reconfiguration
• Security (AAA) out of the box
• Encryption out of the box (network and at rest)
• InfiniBand + optimizations
• Stack tuning (OS, Java, Hadoop)

BDA Security Overview
• Authentication through Kerberos
• Authorization through Apache Sentry
• Auditing through Oracle Audit Vault
• Encryption for data at rest
• Network encryption

Integrated Management Framework
• Management infrastructure combines Oracle Enterprise Manager (EM) and Cloudera Manager (CM)
• Quick view of hardware and software status in Oracle Enterprise Manager

Oracle NoSQL Database

Oracle NoSQL Database: Features

Scalable, highly available key-value database
• Key-value, JSON & RDF data
• Large Object API
• BASE & ACID transactions
• Data center support
• Online rolling upgrade
• Online cluster management
• Table data model
• Secondary indices
• Secondary zones (data centers)
• Security

[Diagram: applications using the NoSQL DB Driver to access storage nodes in Datacenter A and Datacenter B]

Features - Failover

Automatic failover
• Automatic election of a new master
• Rejoining nodes automatically synchronize with the master
• Isolated nodes can still service reads
• All nodes are symmetric

[Diagram: one shard with replication factor = 5, one Rep Node acting as master and four Rep Nodes as replicas; when the master fails, one replica becomes the new master]
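The slide does not say how the new master is chosen. Purely as an illustration, the sketch below assumes a common rule for replicated logs: among the reachable replicas, promote the one that has applied the most recent log entry, and only elect if a simple majority of the replication group (3 of 5 here) is reachable. `RepNode`, `last_applied_seq` and `elect_master` are hypothetical names, not the Oracle NoSQL Database API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RepNode:
    name: str
    last_applied_seq: int   # hypothetical: highest replicated log entry applied
    reachable: bool = True


def elect_master(nodes: List[RepNode], replication_factor: int) -> Optional[RepNode]:
    """Pick the most up-to-date reachable node, or None without a majority."""
    candidates = [n for n in nodes if n.reachable]
    majority = replication_factor // 2 + 1
    if len(candidates) < majority:
        # No quorum: isolated nodes may still serve reads, but no master is elected.
        return None
    # Highest sequence number wins; ties go to the lexicographically larger name.
    return max(candidates, key=lambda n: (n.last_applied_seq, n.name))


if __name__ == "__main__":
    group = [
        RepNode("rn1", 100, reachable=False),  # the failed master
        RepNode("rn2", 98),
        RepNode("rn3", 100),
        RepNode("rn4", 99),
        RepNode("rn5", 97),
    ]
    print(elect_master(group, 5).name)  # rn3
```

Requiring a majority prevents two partitioned halves of a shard from each electing their own master.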

Features – Flexible Data Model

Key-value pairs
• Simple data model: key-value pairs (major + minor key paradigm)
• Simple operations: read/insert/update/delete, with read-modify-write (RMW) support
• Major key: hashed to a shard (partition); minor key: B-tree within a shard
• Raw key/value and JSON schema APIs supported

[Diagram: major key "userid"; minor key components such as address, subscriptions, email id, phone #, expiration date and picture; keys are strings, values are byte arrays (e.g., a picture .jpg)]
Value options: key-value, JSON, RDF triples, tables/rows
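A minimal sketch of the major/minor key paradigm, assuming a fixed number of shards and SHA-256 as the hash (the slide does not describe the actual hashing or partition mapping): only the major key selects the shard, so all records for one user land together, and keys are kept sorted within each shard as a stand-in for the B-tree. `ToyStore` and its methods are hypothetical.

```python
import bisect
import hashlib
from typing import Dict, List, Optional, Tuple

Path = Tuple[str, ...]
Key = Tuple[Path, Path]  # (major path, minor path)


class ToyStore:
    """Illustrative store: major key -> shard, keys kept sorted per shard."""

    def __init__(self, num_shards: int) -> None:
        self.num_shards = num_shards
        # Per shard: a sorted key list (B-tree stand-in) plus a value dict.
        self.sorted_keys: List[List[Key]] = [[] for _ in range(num_shards)]
        self.values: List[Dict[Key, bytes]] = [{} for _ in range(num_shards)]

    def shard_for(self, major: Path) -> int:
        """Hash only the major key, so all its minor keys share one shard."""
        digest = hashlib.sha256("/".join(major).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.num_shards

    def put(self, major: Path, minor: Path, value: bytes) -> None:
        shard = self.shard_for(major)
        key = (major, minor)
        if key not in self.values[shard]:
            bisect.insort(self.sorted_keys[shard], key)
        self.values[shard][key] = value

    def get(self, major: Path, minor: Path) -> Optional[bytes]:
        return self.values[self.shard_for(major)].get((major, minor))

    def multi_get(self, major: Path) -> List[Tuple[Path, bytes]]:
        """All minor keys under one major key, in sorted order, from one shard."""
        shard = self.shard_for(major)
        return [(minor, self.values[shard][(maj, minor)])
                for (maj, minor) in self.sorted_keys[shard] if maj == major]


if __name__ == "__main__":
    store = ToyStore(num_shards=4)
    store.put(("userid-42",), ("subscriptions", "expiration date"), b"2015-01-01")
    store.put(("userid-42",), ("address",), b"...")
    store.put(("userid-42",), ("picture",), b"\xff\xd8")  # start of a .jpg
    print([m for m, _ in store.multi_get(("userid-42",))])
    # [('address',), ('picture',), ('subscriptions', 'expiration date')]
```

Because every record for `userid-42` lives on one shard, a read of all of that user's data touches a single shard.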

• Benefits

– Lower barrier to adoption, shorter time to market

– Simplified application modeling

– Uses familiar table concepts

• Features

– Layered on top of distributed key-value model

– Compatible with Release 2.0 JSON schemas

– Supports table evolution, retains flexible client access

• Sets the foundation for future capabilities

NoSQL DB Table Model

Features – Flexible Data Model

Features – Configurable Transactions

Greater flexibility
• Configurable durability per operation
• Configurable consistency per operation
• ACID by default
• Transaction scope is a single API call
• All records in a transaction share the same major key
• Multiple operations per transaction supported
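The rule that one transaction is a single API call over records sharing one major key can be sketched with a hypothetical in-memory model (not the Oracle NoSQL Database API): the batch is rejected unless every operation uses the same major key, and either all operations take effect or none do.

```python
from typing import Dict, List, Optional, Tuple

Path = Tuple[str, ...]
Op = Tuple[str, Path, Path, Optional[bytes]]  # (kind, major, minor, value)


class TxnError(Exception):
    pass


def execute_atomically(store: Dict[Tuple[Path, Path], bytes], ops: List[Op]) -> None:
    """Apply 'put'/'delete' ops all-or-nothing; all ops must share one major key."""
    if not ops:
        return
    if len({major for _, major, _, _ in ops}) != 1:
        raise TxnError("all operations in one transaction must share the same major key")
    staged = dict(store)  # work on a copy so any failure leaves `store` untouched
    for kind, major, minor, value in ops:
        key = (major, minor)
        if kind == "put":
            staged[key] = value
        elif kind == "delete":
            if key not in staged:
                raise TxnError(f"delete of missing key {key}")
            del staged[key]
        else:
            raise TxnError(f"unknown operation {kind!r}")
    store.clear()
    store.update(staged)  # commit: every change becomes visible together
```

Copying the whole store keeps the sketch short; a real store would stage only the touched records. Restricting a transaction to one major key is what lets it stay on a single shard, so no cross-shard coordination is needed.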

Features – Elasticity

On-demand cluster expansion
• Increase data capacity
  – Add more storage nodes
  – New shards are created automatically
• Increase data throughput
  – More shards = better write throughput
  – More replicas per shard = better read throughput

[Diagram: an application with the NoSQL DB Driver; Shard-1 and Shard-2, each with one master and two replicas spread across three storage nodes]
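A back-of-the-envelope model of the two scaling knobs, under simplifying assumptions: each storage node hosts a fixed number of replication nodes (its capacity), each shard has exactly `replication_factor` replicas, writes go only to a shard's master, and reads can go to any replica. `topology` and its parameters are hypothetical names.

```python
def topology(storage_nodes: int, capacity_per_node: int, replication_factor: int) -> dict:
    """Shards and per-operation parallelism implied by a simple cluster layout."""
    replication_nodes = storage_nodes * capacity_per_node
    shards = replication_nodes // replication_factor
    return {
        "shards": shards,
        # One master per shard, so write parallelism grows with the shard count.
        "write_servers": shards,
        # Any replica can serve a read, so read parallelism grows with all replicas.
        "read_servers": shards * replication_factor,
    }


if __name__ == "__main__":
    print(topology(3, 2, 3))  # {'shards': 2, 'write_servers': 2, 'read_servers': 6}
    print(topology(6, 2, 3))  # {'shards': 4, 'write_servers': 4, 'read_servers': 12}
```

Three storage nodes of capacity 2 with replication factor 3 give the two-shard layout in the diagram; doubling the storage nodes doubles the shards, and with them the write parallelism.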

Features – Automatic Rebalancing

• Supports heterogeneous storage topology
• Replicas move from over-utilized to under-utilized storage nodes
• Number of shards and replication factor remain unchanged

[Diagram: partitions redistributed across Storage Node 1, 2 and 3 to improve performance]
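A greedy sketch of rebalancing on a heterogeneous topology, assuming utilization means replicas hosted divided by the node's capacity (the real placement rules are not given on the slide). It repeatedly moves one replica from the most utilized node to the least utilized one, never puts two replicas of the same shard on one node, and stops once a move would leave the destination busier than the source, which also guarantees the loop terminates. Only placement changes; shards and replication factor stay the same. All names are hypothetical.

```python
from typing import Dict, List, Tuple


def rebalance(placement: Dict[str, List[str]],
              capacity: Dict[str, int]) -> List[Tuple[str, str, str]]:
    """Move replicas (ids like 's1-r2') between storage nodes; return the moves.

    placement maps storage node -> replica ids and is modified in place.
    """
    def util(node: str) -> float:
        return len(placement[node]) / capacity[node]

    def shard_of(replica: str) -> str:
        return replica.split("-")[0]

    moves: List[Tuple[str, str, str]] = []
    while True:
        src = max(placement, key=util)
        dst = min(placement, key=util)
        after_src = (len(placement[src]) - 1) / capacity[src]
        after_dst = (len(placement[dst]) + 1) / capacity[dst]
        if after_dst > after_src:
            break  # moving would just turn dst into the new hot spot
        dst_shards = {shard_of(r) for r in placement[dst]}
        movable = [r for r in placement[src] if shard_of(r) not in dst_shards]
        if not movable:
            break  # greedy: give up rather than co-locate replicas of one shard
        replica = movable[0]
        placement[src].remove(replica)
        placement[dst].append(replica)
        moves.append((replica, src, dst))
    return moves


if __name__ == "__main__":
    # Three shards with two replicas each; sn3 is a newly added, empty node.
    placement = {"sn1": ["s1-r1", "s2-r1", "s3-r1"],
                 "sn2": ["s1-r2", "s2-r2", "s3-r2"],
                 "sn3": []}
    print(rebalance(placement, {"sn1": 3, "sn2": 3, "sn3": 3}))
    # [('s1-r1', 'sn1', 'sn3'), ('s2-r2', 'sn2', 'sn3')]
```

After the two moves each node hosts two replicas, none of them from the same shard.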

Features – Online Rolling Upgrades

What's the big deal? Ever tried to upgrade a 200-node system while it's active?
• We did it!
• Admin commands describe the safe upgrade order
• Scripts provide a hands-free upgrade experience
• Read/write availability throughout the upgrade process

[Chart: Online Rolling Upgrade. Time to upgrade (min, 0 to 17.5) vs. total nodes (shards x replication factor): 72 (24x3), 144 (48x3), 216 (72x3)]
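The slide says admin commands describe a safe upgrade order but does not give the rule. A plausible rule, assumed here only for illustration, is to upgrade storage nodes in batches where no batch contains two nodes hosting replicas of the same shard; with replication factor 3, each shard then keeps 2 of its 3 replicas (a majority) online at every step. The greedy batching below is a hypothetical sketch, not the output of the actual admin commands.

```python
from typing import Dict, List, Set


def upgrade_batches(hosts: Dict[str, Set[str]]) -> List[List[str]]:
    """Group storage nodes into batches whose nodes share no shard.

    hosts maps storage node -> set of shards it holds a replica of.
    Batches are meant to be upgraded one after another.
    """
    batches: List[List[str]] = []
    batch_shards: List[Set[str]] = []
    # Largest-first greedy: place nodes hosting the most shards first.
    for node in sorted(hosts, key=lambda n: (-len(hosts[n]), n)):
        for i, used in enumerate(batch_shards):
            if not (used & hosts[node]):
                batches[i].append(node)
                used |= hosts[node]  # in-place update of this batch's shard set
                break
        else:
            batches.append([node])
            batch_shards.append(set(hosts[node]))
    return batches


if __name__ == "__main__":
    # Two shards, replication factor 3, one replica per storage node.
    hosts = {"sn1": {"s1"}, "sn2": {"s1"}, "sn3": {"s1"},
             "sn4": {"s2"}, "sn5": {"s2"}, "sn6": {"s2"}}
    print(upgrade_batches(hosts))  # [['sn1', 'sn4'], ['sn2', 'sn5'], ['sn3', 'sn6']]
```

Each batch takes down at most one replica per shard, which is why reads and writes can continue during the upgrade.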

Query NoSQL data from Oracle Database

Access NoSQL data from Hadoop for DW and analytics

Share data with Oracle Coherence for an extensible in-memory cache grid

Persist history & event streams for processing with Oracle Event Processing (OEP)

Store & query RDF data using Oracle RDF for NoSQL

Oracle NoSQL Database: Integrated out of the box

Features – Integration

Benchmark Results - YCSB

(Yahoo! Cloud Serving Benchmark)
• 1.25M ops/sec
• 2 billion records
• 2 TB of data
• 95% read, 5% update
• Low latency
• High scalability

[Chart: Mixed Throughput. Throughput (ops/sec, 0 to 1,400,000) and average read and write latency (ms, 0 to 4) vs. cluster size: 6 (2x3), 12 (4x3), 24 (8x3), 30 (10x3)]
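The headline figures imply a few derived numbers. The sketch below works them out, assuming the 95/5 mix applies to the full 1.25M ops/sec and the 2 TB (decimal) is spread evenly over the 2 billion records:

```python
TOTAL_OPS_PER_SEC = 1_250_000
READ_PERCENT = 95
RECORDS = 2_000_000_000
DATA_BYTES = 2 * 10**12  # 2 TB, decimal units

# Integer arithmetic avoids floating-point rounding in the split.
reads_per_sec = TOTAL_OPS_PER_SEC * READ_PERCENT // 100   # 1,187,500 reads/sec
updates_per_sec = TOTAL_OPS_PER_SEC - reads_per_sec       # 62,500 updates/sec
bytes_per_record = DATA_BYTES // RECORDS                  # 1,000 bytes per record

if __name__ == "__main__":
    print(reads_per_sec, updates_per_sec, bytes_per_record)  # 1187500 62500 1000
```

So the benchmark amounts to roughly 62,500 updates per second against records of about 1 KB each.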

Big Data SQL

• Hadoop is good at some things

• Databases are good at others

• SQL is very important

Strengths of Both Systems

Embrace Innovation and Integrate

Exadata + Oracle Database and Big Data Appliance + Hadoop & NoSQL
Unify:
• Development languages
• Security
• Administration
• Support
• Workload management
• Lifecycle management
• Availability

Publish Hadoop Metadata to Oracle Catalog

[Diagram: Hive metadata from the Big Data Appliance + Hadoop cluster (HDFS Name Node and Data Nodes) is published to the Oracle Catalog on Exadata + Oracle Database as an external table]

create table customer_address
( ca_customer_id   number(10,0)
, ca_street_number char(10)
, ca_state         char(2)
, ca_zip           char(10)
)
organization external (
  TYPE ORACLE_HIVE
  DEFAULT DIRECTORY DEFAULT_DIR
  ACCESS PARAMETERS (com.oracle.bigdata.cluster=hadoop_cl_1)
  LOCATION ('hive://customer_address')
)

Executing Queries on Hadoop

Select c_customer_id
     , c_customer_last_name
     , ca_county
From customers
   , customer_address
where c_customer_id = ca_customer_id
and ca_state = 'CA'

[Diagram: the external table in the Oracle Catalog and its Hive metadata point to the HDFS Name Node and Data Nodes]
• Determine: data locations, data structure, parallelism
• Send to specific data nodes: data request, context

Executing Queries on Hadoop

[Diagram: the same query running against "tables" on the HDFS Data Nodes, located through the external table in the Oracle Catalog and the Hive metadata on the HDFS Name Node]
• Do I/O and Smart Scan on the data nodes: filter rows, project columns
• Move only relevant data: relevant rows, relevant columns
• Apply the join with database data
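The division of labor described here (filter and project on the data nodes, ship only the relevant rows and columns, join in the database) can be mimicked in a few lines. This is a conceptual Python sketch, not how Big Data SQL is implemented; `smart_scan`, `hash_join` and the sample rows are hypothetical, while the column names come from the example query on these slides.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


def smart_scan(block: Iterable[Row], predicate: Callable[[Row], bool],
               columns: List[str]) -> List[Row]:
    """Runs 'on the data node': filter rows, then keep only the needed columns."""
    return [{c: row[c] for c in columns} for row in block if predicate(row)]


def hash_join(left: List[Row], right: List[Row], lkey: str, rkey: str) -> List[Row]:
    """Runs 'in the database': join database rows with the reduced HDFS rows."""
    index: Dict[object, List[Row]] = {}
    for r in right:
        index.setdefault(r[rkey], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[lkey], [])]


if __name__ == "__main__":
    # Hypothetical customer_address blocks stored on two data nodes.
    hdfs_blocks = [
        [{"ca_customer_id": 1, "ca_state": "CA", "ca_county": "Alameda", "ca_zip": "94501"},
         {"ca_customer_id": 2, "ca_state": "NY", "ca_county": "Kings", "ca_zip": "11201"}],
        [{"ca_customer_id": 3, "ca_state": "CA", "ca_county": "Marin", "ca_zip": "94901"}],
    ]
    customers = [{"c_customer_id": 1, "c_customer_last_name": "Smith"},
                 {"c_customer_id": 2, "c_customer_last_name": "Jones"},
                 {"c_customer_id": 3, "c_customer_last_name": "Lee"}]
    shipped: List[Row] = []
    for block in hdfs_blocks:  # each block would be scanned where it lives
        shipped += smart_scan(block, lambda r: r["ca_state"] == "CA",
                              ["ca_customer_id", "ca_county"])
    for row in hash_join(customers, shipped, "c_customer_id", "ca_customer_id"):
        print(row["c_customer_id"], row["c_customer_last_name"], row["ca_county"])
```

Only the two CA rows, each reduced to the join key and `ca_county`, ever leave the data nodes; the join key must be kept in the projection so the database can perform the join.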

Optimizing Scans on Hadoop

Storage Indexes
• Automatically collect and store the minimum and maximum values within each storage unit
• Before scanning a storage unit, check whether the requested data falls within its min-max range
• If not, skip scanning the block, reducing scan time

[Diagram: HDFS Name Node with Hive metadata; HDFS Data Nodes holding "blocks", each tagged with its Min/Max values]
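A minimal sketch of min/max storage-index pruning, assuming each block's minimum and maximum for one numeric column are recorded once and the query is a range predicate on that column. `Block`, `build_index` and `scan` are hypothetical names; `scan` also reports how many blocks were actually read.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    values: List[int]  # one column's values; assumed non-empty


def build_index(blocks: List[Block]) -> List[Tuple[int, int]]:
    """Record (min, max) of the column for every block."""
    return [(min(b.values), max(b.values)) for b in blocks]


def scan(blocks: List[Block], index: List[Tuple[int, int]],
         lo: int, hi: int) -> Tuple[List[int], int]:
    """Return values in [lo, hi] and the number of blocks actually read."""
    hits: List[int] = []
    blocks_read = 0
    for block, (bmin, bmax) in zip(blocks, index):
        if bmax < lo or bmin > hi:
            continue  # the whole block lies outside the range: skip it
        blocks_read += 1
        hits.extend(v for v in block.values if lo <= v <= hi)
    return hits, blocks_read


if __name__ == "__main__":
    blocks = [Block([1, 5, 3]), Block([10, 12, 11]), Block([20, 25, 22])]
    index = build_index(blocks)
    print(scan(blocks, index, 11, 12))  # ([12, 11], 1)
```

Only one of the three blocks is read; pruning pays off most when data is clustered so that block ranges overlap little.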

Oracle Big Data SQL

One query spanning Oracle Database, Hadoop & NoSQL
• Query data in RDBMS, Hadoop & NoSQL with Oracle SQL
• Fast: massive parallelism, storage indexes, filtered locally, minimized data movement

[Diagram: Oracle SQL running across Oracle Database storage servers, HDFS Data Nodes and Oracle NoSQL DB nodes]

Big Data Connectors

Oracle Loader for Hadoop

Offloads data pre-processing from the database server to Hadoop
• Works with a range of input data formats
• Automatic balancing in case of skew in input data
• Online and offline modes
• Kerberos authentication
• Partition, sort, and convert into Oracle data types on Hadoop
• Connect to the database from reducer nodes and load into database partitions in parallel (JDBC or direct path)

[Diagram: MAP tasks → SHUFFLE/SORT → REDUCE tasks]
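The "partition, sort, and convert on Hadoop" step can be pictured as the map and shuffle phases of a MapReduce job keyed by the target table's partition. The sketch assumes a hypothetical target table range-partitioned by order year and simulates mappers parsing text lines into typed values (Python types standing in for Oracle data types), then the shuffle grouping and sorting rows so each reducer receives one partition's rows, ready to load. It does not reproduce OLH's sampling or skew balancing; all names and partition bounds are hypothetical.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Tuple

Row = Tuple[int, date, float]  # (order id, order date, amount)

# Hypothetical range partitions: (name, exclusive upper-bound year).
PARTITIONS = [("p2012", 2013), ("p2013", 2014), ("pmax", 10_000)]


def convert(line: str) -> Row:
    """Map step, part 1: parse a CSV line into typed values."""
    order_id, order_date, amount = line.split(",")
    return int(order_id), date.fromisoformat(order_date), float(amount)


def partition_of(row: Row) -> str:
    """Map step, part 2: choose the target partition from the order date."""
    for name, upper in PARTITIONS:
        if row[1].year < upper:
            return name
    raise ValueError("row falls outside all partitions")


def shuffle_and_sort(lines: List[str]) -> Dict[str, List[Row]]:
    """Group rows by partition (one reducer each) and sort each group.

    In OLH, each reducer would then load its group into the database
    (JDBC or direct path); that step is not shown.
    """
    groups: Dict[str, List[Row]] = defaultdict(list)
    for line in lines:
        row = convert(line)
        groups[partition_of(row)].append(row)
    return {p: sorted(rows, key=lambda r: (r[1], r[0])) for p, rows in groups.items()}


if __name__ == "__main__":
    lines = ["3,2013-05-01,9.50", "1,2012-11-30,20.00",
             "2,2013-01-15,5.25", "4,2014-02-02,7.00"]
    for part, rows in sorted(shuffle_and_sort(lines).items()):
        print(part, [r[0] for r in rows])
```

Doing the parsing and sorting on Hadoop means the database only receives rows that are already typed and grouped by partition.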

Oracle SQL Connector for HDFS

Use Oracle SQL to access data on HDFS
• Generate an external table in the database pointing to HDFS data
• Load into the database or query data in place on HDFS
• Fine-grained control over data type mapping
• Parallel load with automatic load balancing
• Kerberos authentication

[Diagram: a SQL query against an external table in Oracle Database; multiple OSCH instances use the HDFS client to read data on Hadoop, accessing it or loading it into the database in parallel through the external table mechanism]
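The slide does not detail how OSCH balances the load. As an assumption for illustration, suppose the HDFS files behind the external table are divided among N parallel readers and the goal is to give each reader roughly the same number of bytes. A classic longest-processing-time greedy assignment does this: take files largest first and always give the next file to the currently lightest reader. `assign_files` and the file names and sizes are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def assign_files(file_sizes: Dict[str, int], readers: int) -> List[List[str]]:
    """Greedy LPT assignment of HDFS files to parallel readers, balancing bytes."""
    heap: List[Tuple[int, int]] = [(0, i) for i in range(readers)]  # (bytes, reader)
    assignment: List[List[str]] = [[] for _ in range(readers)]
    for name, size in sorted(file_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        load, idx = heapq.heappop(heap)  # currently lightest reader
        assignment[idx].append(name)
        heapq.heappush(heap, (load + size, idx))
    return assignment


if __name__ == "__main__":
    sizes = {"part-0": 700, "part-1": 500, "part-2": 400, "part-3": 300, "part-4": 100}
    print(assign_files(sizes, 2))
    # [['part-0', 'part-3'], ['part-1', 'part-2', 'part-4']]
```

Both readers end up with 1,000 bytes here; in general LPT is not always optimal, but it stays close to an even split.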

Oracle R Advanced Analytics for Hadoop

R analytics leveraging Hadoop and HDFS
• Linearly scale a robust set of R algorithms
• Leverage MapReduce for R calculations
• Compute-intensive parallelism for simulations

[Diagram: an Oracle R Client submitting MAP and REDUCE tasks to Hadoop over HDFS]

Oracle Data Integrator Application Adapters for Hadoop

Improving productivity and efficiency for Big Data
[Diagram: Oracle Data Integrator activates transforms via MapReduce (Hive), then loads the results into Oracle Database with Oracle Loader for Hadoop]

Benefits
• Consistent tooling across BI/DW, SOA, integration and Big Data
• Reduces the complexity of processing on Hadoop through graphical tooling
• Improves productivity when processing Big Data (structured + unstructured)

Oracle XQuery for Hadoop

Part of Oracle Big Data Connectors (Acquire – Organize – Analyze), alongside Oracle Data Integrator and Oracle Loader for Hadoop
• OXH is a transformation engine for Big Data
• XQuery language executed on Hadoop

XQuery:
for $ln in text:collection()
let $f := tokenize($ln)
where $f[1] = 'x'
return text:put($f[2])

[Diagram: the OXH engine turns the XQuery into a Map/Reduce execution plan (a chain of M/R jobs) that runs on Map/Reduce worker nodes over HDFS]
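For readers less familiar with XQuery, the example query reads lines of text, splits each line into tokens, keeps the lines whose first token is 'x', and writes out the second token. Each line is handled independently, so this logic can run in a map step. The Python function below mirrors that per-line logic, assuming whitespace tokenization (what one-argument `tokenize` does in XQuery 3.1); it says nothing about how OXH compiles the query into its execution plan. `map_lines` is a hypothetical name.

```python
from typing import Iterable, Iterator


def map_lines(lines: Iterable[str]) -> Iterator[str]:
    """Per-line equivalent of: where $f[1] = 'x' return text:put($f[2])."""
    for line in lines:
        fields = line.split()  # whitespace tokenization
        # XQuery sequences are 1-based: $f[1] is fields[0], $f[2] is fields[1].
        if len(fields) >= 2 and fields[0] == "x":
            yield fields[1]


if __name__ == "__main__":
    print(list(map_lines(["x alpha beta", "y gamma", "  x  delta", "x"])))
    # ['alpha', 'delta']
```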

Oracle Data Integrator

Simplify MapReduce
• Automatically generates MapReduce code
• High-performance loads into the data warehouse, leveraging both OLH and OSCH
• Manages the process across platforms

[Diagram: Oracle Data Integrator driving OLH & OSCH]

Oracle Advanced Analytics

Oracle Advanced Analytics: In-Database Data Mining Algorithms

Function | Algorithms | Applicability
Classification | Logistic Regression (GLM); Decision Trees; Naïve Bayes; Support Vector Machines (SVM) | Classical statistical technique; popular / rules / transparency; embedded app; wide / narrow data / text
Regression | Linear Regression (GLM); Support Vector Machine (SVM) | Classical statistical technique; wide / narrow data / text
Anomaly Detection | One-Class SVM | Unknown fraud cases or anomalies
Attribute Importance | Minimum Description Length (MDL); Principal Components Analysis (PCA) | Attribute reduction; reduce data noise
Association Rules | Apriori | Market basket analysis / next best offer
Clustering | Hierarchical k-Means; Hierarchical O-Cluster; Expectation-Maximization (EM) Clustering | Product grouping / text mining; gene and protein analysis
Feature Extraction | Nonnegative Matrix Factorization (NMF); Singular Value Decomposition (SVD) | Text analysis / feature reduction
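Market basket analysis with Apriori rests on two measures, support and confidence. The short sketch below computes them for a rule A → B over a toy set of baskets (hypothetical data). A full Apriori implementation additionally generates candidate itemsets level by level and prunes those with an infrequent subset; that part is not shown.

```python
from typing import FrozenSet, List, Set


def support(baskets: List[Set[str]], items: FrozenSet[str]) -> float:
    """Fraction of baskets that contain every item in `items`."""
    return sum(1 for b in baskets if items <= b) / len(baskets)


def confidence(baskets: List[Set[str]], lhs: FrozenSet[str], rhs: FrozenSet[str]) -> float:
    """Estimated P(rhs | lhs) = support(lhs ∪ rhs) / support(lhs)."""
    return support(baskets, lhs | rhs) / support(baskets, lhs)


if __name__ == "__main__":
    baskets = [{"bread", "milk"}, {"bread", "butter"},
               {"bread", "milk", "butter"}, {"milk"}]
    print(support(baskets, frozenset({"bread", "milk"})))                    # 0.5
    print(confidence(baskets, frozenset({"bread"}), frozenset({"milk"})))
```

Here bread and milk appear together in half the baskets, and two of the three bread baskets also contain milk, so the rule bread → milk has confidence 2/3.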

Oracle Advanced Analytics: Oracle R Enterprise Compute Engines

1. User R engine on desktop (open-source R, Oracle R Enterprise packages and other R packages)
• R-SQL Transparency Framework intercepts R functions for scalable in-database execution
• Function intercept for data transforms, statistical functions and advanced analytics
• Interactive display of graphical results and flow control as in standard R
• Submit entire R scripts for execution by the database

2. Oracle Database (database compute engine over user tables; SQL in, results out)
• Scale to large datasets
• Access tables, views, and external tables, as well as data through DB links
• Leverage database SQL parallelism
• Leverage new and existing in-database statistical and data mining capabilities

3. R engine(s) spawned by Oracle Database (with Oracle R Enterprise packages and other R packages)
• Database can spawn multiple R engines for database-managed parallelism
• Efficient data transfer to spawned R engines
• Emulate map-reduce style algorithms and applications
• Enables “lights-out” execution of R scripts
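Having the database spawn several R engines, hand each one a partition of the data, and combine their results is a split-apply-combine (map-reduce style) pattern. The sketch below shows only that pattern, with Python threads standing in for the spawned R engines and a mean as a hypothetical per-group function. It is not the Oracle R Enterprise API; in ORE this role is played by embedded R execution functions such as `ore.groupApply`, which are not shown here.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from statistics import mean
from typing import Dict, List, Tuple


def split(rows: List[Tuple[str, float]]) -> Dict[str, List[float]]:
    """Partition rows by group key, as the database would before dispatch."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return groups


def per_group(item: Tuple[str, List[float]]) -> Tuple[str, float]:
    """Work done by one 'engine' on one partition (here: a mean)."""
    key, values = item
    return key, mean(values)


def group_apply(rows: List[Tuple[str, float]], engines: int = 4) -> Dict[str, float]:
    """Run per_group on every partition in parallel, then combine the results."""
    with ThreadPoolExecutor(max_workers=engines) as pool:
        return dict(pool.map(per_group, split(rows).items()))


if __name__ == "__main__":
    rows = [("east", 10.0), ("west", 4.0), ("east", 20.0), ("west", 6.0)]
    print(group_apply(rows))  # {'east': 15.0, 'west': 5.0}
```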

Oracle Enabling Technologies
• Unified Analytics API: SQL, R, MR
• Unified Analytics Processing Platform: Hadoop, RDBMS, InfiniBand (IB)
• Management Framework and Tools
• Unified access model supporting all analysis capabilities: SQL, R & MapReduce

Case Study

Current (“As-Is”) Architecture

The current (“as-is”) architecture is based on a 1990s design: siloed data marts with complex and expensive provisioning systems, unable to respond to new business requirements with agility.

[Diagram: siloed operational systems with complex, heavy and slow data transformations and flows to data marts]

1. Provisioning systems are complex, expensive and inefficient
2. Lack of business agility and very long time-to-market and time-to-value (6-12 months)
3. Business users are bypassing IT corporate systems
4. Data marts are strongly siloed with no interoperability
5. Complex operations with very limited backup/recovery and no HA capabilities
6. Unstructured information is not managed
7. Lack of advanced analytic capabilities

Data Pool Logical Architecture

[Diagram: a Source Data Layer (SAP RRHH (HR), Diario Electronico (electronic journal), Mainframe, Streaming, Sensors, Social/Text, other sources) feeds Information Management through Data Integration (Data Factory Engine & ODI + Metadata, with GG, MQ and FTE interfaces carrying high-density, low-density and transformed data). Information Management comprises a Staging & Raw Data Layer, Foundation Layer (Level 2 + 3), Access & Performance Layer (Level 4), Embedded Data Marts, Data Marts, Data Quality, and a Knowledge Discovery Area (Rapid Development Sandbox, Analytical Discovery Sandbox). BI Abstraction & Query Federation and a BI Server serve Information Access: Alerts, Dashboards, Reporting; Services; Performance Management; Advanced Analysis & Data Science (Discovery). Other labels: Security and Metadata; Level 0 + 1.]

Data Pool Hardware Architecture

[Diagram: two data centers (DC1, DC2). PRD and UAT Data Pool systems connected by InfiniBand (IB), with Oracle Data Guard and BDR replication between the sites (PRD' and UAT' at the second site); ZFS replication between ZS-3 backup appliances; Oracle RMAN and TSM backups to VTLs over FC and 10GbE; snapshots; SAN storage]
