-
1 Cloudera, Inc. All rights reserved.
MapReduce Spark Hadoop Spark The One Platform Initiative
Doug Cutting | Cloudera | @cutting
-
Apache Spark
Spark
MapReduce Spark
One Platform Initiative
Hadoop
-
MapReduce ...
/
MapReduce
Hive, Pig, Mahout, Solr, Crunch
-
...
Giraph/GraphLab, Impala (SQL)
MapReduce
Hama, Dryad (arbitrary DAG)
-
Apache Spark
MapReduce
(Full Directed Graph expressions)
:
-
Apache SparkHadoop
API
Scala, Java, Python API
API
-
API Scala, Java, Python
2~5
Python
lines = sc.textFile(...)
lines.filter(lambda s: "ERROR" in s).count()
Scala
val lines = sc.textFile(...)
lines.filter(s => s.contains("ERROR")).count()
Java
JavaRDD<String> lines = sc.textFile(...);
lines.filter(new Function<String, Boolean>() {
  public Boolean call(String s) { return s.contains("ERROR"); }
}).count();
-
percolateur:spark srowen$ ./bin/spark-shell --master local[*]
...
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.5.0-SNAPSHOT
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_51)
Type in expressions to have them evaluated.
Type :help for more information.
...
scala> val words = sc.textFile("file:/usr/share/dict/words")
...
words: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[1] at textFile at <console>:21
scala> words.count
...
res0: Long = 235886
scala>
-
Spark
RDD (Resilient Distributed Dataset)
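As a rough illustration of the RDD idea, the following is a toy, single-process sketch in plain Python. It is not Spark's implementation: the class name ToyRDD and all of its methods are hypothetical and exist only for this example. It assumes just three properties of an RDD: the data is split into partitions, transformations such as filter and map are recorded lazily as lineage, and an action such as count replays that lineage over each partition.

```python
class ToyRDD:
    """Toy stand-in for an RDD (hypothetical; not Spark's implementation)."""

    def __init__(self, partitions, lineage=()):
        # Source data split into partitions; it is never mutated.
        self._partitions = [list(p) for p in partitions]
        # Recorded transformations, applied only when an action runs.
        self._lineage = tuple(lineage)

    def filter(self, predicate):
        # Transformation: returns a new ToyRDD with a longer lineage.
        return ToyRDD(self._partitions, self._lineage + (("filter", predicate),))

    def map(self, fn):
        # Transformation: also lazy, nothing is computed here.
        return ToyRDD(self._partitions, self._lineage + (("map", fn),))

    def _compute(self, partition):
        # Replay the lineage over one partition. A lost partition could be
        # rebuilt the same way from its source data.
        rows = iter(partition)
        for kind, fn in self._lineage:
            rows = filter(fn, rows) if kind == "filter" else map(fn, rows)
        return rows

    def count(self):
        # Action: triggers the actual computation, partition by partition.
        return sum(sum(1 for _ in self._compute(p)) for p in self._partitions)

    def collect(self):
        # Action: gathers all computed rows into one list.
        return [row for p in self._partitions for row in self._compute(p)]


# Same shape as the log-filtering example: count the lines containing "ERROR".
log = ToyRDD([["INFO start", "ERROR disk full"], ["ERROR timeout", "INFO done"]])
errors = log.filter(lambda s: "ERROR" in s)
print(errors.count())  # prints 2
```

In real Spark the partitions live on different machines and the lineage is what allows a lost partition to be recomputed; the sketch keeps only the lazy, lineage-based evaluation.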
-
Spark Hadoop
Spark Streaming, MLlib, Spark SQL, GraphX, DataFrames, SparkR
HDFS, HBase
YARN
Spark, Impala, MR, Search, Others
-
Cloudera Spark
2013 2014 2015 2016
Spark
CDH 4.4 Spark
YARN Spark
Spark
Spark
Cloudera O'Reilly Spark
-
Cloudera Spark ClouderaSpark Hadoop SparkCloudera
Cloudera Spark Hadoop
Cloudera 25
Spark
-
Cloudera Spark
Cloudera: 67%
Intel: 17%
Hortonworks: 17%
Hadoop Spark *
IBM MapR
Hadoop
Cloudera: 370
Hortonworks: 4
IBM: 12
MapR: 1
Intel: 400
-
Cloudera
Spark 150 800 Spark
-
Cloudera Core Spark Spark Streaming
ETL 20
Jaccard
ERP
(LDA)
1010
-
Spark MapReduce Hadoop
-
Spark MapReduce
1
Crunch on Spark, Search on Spark
2
Hive on Spark (beta), Spark on HBase (beta)
3
Pig on Spark (alpha), Sqoop on Spark
Cloudera Spark
-
Spark Hadoop One Platform Initiative
Hadoop
Hadoop
1
80%
-
Hadoop Spark
Spark
Impala
Low-Latency
Solr
MapReduce I/O
:
-
Cloudera
Hadoop 1
Cloudera
-
Spark Spark
O'Reilly "Advanced Analytics with Spark" eBook (Cloudera)
Cloudera Developer Blog
cloudera.com/spark
Cloudera Spark Training
Cloudera Live Spark Tutorial
-
@cutting