Spark Hands-On: [e2-spk-s01]
TRANSCRIPT
-
1
-
2
-
3 . 1
-
3 . 2
-
3 . 3
-
3 . 4
-
3 . 5
http://go.databricks.com/hubfs/DataBricks_Surveys_-_Content/Spark-Survey-2015-Infographic.pdf
-
3 . 6
-
3 . 7
-
3 . 8
-
3 . 9
-
3 . 10
-
3 . 11
-
3 . 12
-
3 . 13
-
3 . 14
-
4 . 1
-
4 . 2
-
4 . 3
-
4 . 4
-
5 . 1
-
5 . 2
http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
-
5 . 3
-
5 . 4
http://public-repo-1.hortonworks.com/hdp-win-alpha/winutils.exe
-
5 . 5
-
5 . 6
https://maven.apache.org/
http://apache.stu.edu.tw/maven/maven-3/3.3.9/binaries/apache-maven-3.3.9-bin.zip
-
5 . 7
-
5 . 8
http://www.eclipse.org/downloads/
http://www.eclipse.org/downloads/download.php?file=/technology/epp/downloads/release/mars/2/eclipse-jee-mars-2-win32-x86_64.zip
-
5 . 9
-
5 . 10
-
5 . 11
http://192.168.0.2/apps/e2-spk-v01/present/e2-spk-s01/assets/files/e2-spk-s01_java.zip
-
5 . 12
-
5 . 13
-
6 . 1
-
6 . 2
http://scala-ide.org/index.html
http://scala-ide.org/download/sdk.html
-
6 . 3
-
6 . 4
-
6 . 5
http://192.168.0.2/apps/e2-spk-v01/present/e2-spk-s01/assets/files/e2-spk-s01_scala.zip
-
6 . 6
-
6 . 7
-
6 . 8
-
7 . 1
-
7 . 2
http://spark.apache.org/
http://archive.apache.org/dist/spark/spark-1.6.0/spark-1.6.0-bin-hadoop2.6.tgz
-
7 . 3
-
7 . 4
-
7 . 5
-
7 . 6
-
val distData = sc.parallelize(Seq("eighty20", "spark", "training", "hello", "world"))
val result_count = distData.count()
println("Count result is: " + result_count)
7 . 7
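In spark-shell the SparkContext `sc` is already created for you, so the lines above can be pasted in directly. As a rough sanity check that needs no Spark at all, the same logic on a plain Scala Seq (the names `data` and `result_count` here are illustrative, not from the course code):

```scala
// Plain-collection analogue of the spark-shell example above:
// Seq stands in for the RDD, and .size plays the role of count().
val data = Seq("eighty20", "spark", "training", "hello", "world")
val result_count = data.size
println("Count result is: " + result_count)  // prints 5
```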
-
8 . 1
-
8 . 2
-
val distData = sc.parallelize(Seq("eighty20", "spark", "training", "hello", "world"))
val result_count = distData.count()
println("Count result is: " + result_count)
8 . 3
-
9 . 1
-
9 . 2
-
9 . 3
-
10 . 1
-
10 . 2
-
10 . 3
-
10 . 4
http://spark.apache.org/docs/latest/spark-standalone.html
-
10 . 5
http://spark.apache.org/docs/latest/running-on-mesos.html
http://mesos.apache.org/
-
10 . 6
http://spark.apache.org/docs/latest/running-on-yarn.html
-
11 . 1
-
11 . 2
-
12 . 1
-
12 . 2
http://192.168.0.2/apps/e2-spk-v01/present/e2-spk-s01/index.html#/4
-
12 . 3
-
12 . 4
-
12 . 5
-
12 . 6
-
12 . 7
-
spark-submit \
  --class cc.eighty20.spark.s01.sc_00_helloworld \
  --master local \
  e2spks01-0.0.1.jar
12 . 8
http://spark.apache.org/docs/latest/submitting-applications.html
-
spark-submit \
  --class cc.eighty20.spark.s01.sc_00_helloworld \
  --master spark://192.168.0.2:7077 \
  e2spks01-0.0.1.jar
12 . 9
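Generalising the two invocations above, the submitting-applications page linked on the previous slide gives `spark-submit` this overall shape; the angle-bracket placeholders are to be filled in per application:

```shell
# General shape of spark-submit (see the submitting-applications docs).
spark-submit \
  --class <main-class> \
  --master <master-url> \
  --deploy-mode <client | cluster> \
  <application-jar> \
  [application-arguments]
```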
-
13 . 1
-
13 . 2
http://192.168.0.2/apps/e2-spk-v01/present/e2-spk-s01/index.html#/5
-
13 . 3
-
13 . 4
-
13 . 5
-
13 . 6
-
13 . 7
-
spark-submit \
  --class cc.eighty20.spark.s01.sc_00_helloworld \
  --master local \
  e2spks01-0.0.1.jar
13 . 8
http://spark.apache.org/docs/latest/submitting-applications.html
-
spark-submit \
  --class cc.eighty20.spark.s01.sc_00_helloworld \
  --master spark://192.168.0.2:7077 \
  e2spks01-0.0.1.jar
13 . 9
-
14 . 1
-
14 . 2
-
14 . 3
-
14 . 4
-
14 . 5
-
14 . 6
-
14 . 7
-
14 . 8
-
14 . 9
-
14 . 10
-
14 . 11
-
14 . 12
-
14 . 13
-
14 . 14
-
14 . 15
-
14 . 16
-
14 . 17
-
14 . 18
-
14 . 19
-
14 . 20
-
14 . 21
-
14 . 22
-
14 . 23
-
14 . 24
-
14 . 25
-
14 . 26
http://www.csie.ntnu.edu.tw/~u91029/DirectedAcyclicGraph.html
-
val r00 = sc.parallelize(0 to 9)
val r01 = sc.parallelize(0 to 90 by 10)
val r10 = r00.cartesian(r01)
val r11 = r00.map(n => (n, n))
val r12 = r00.zip(r01)
val r13 = r01.keyBy(_ / 20)
val r20 = Seq(r11, r12, r13).foldLeft(r10)(_ union _)
14 . 27
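To see concretely what the lineage above produces, here is a plain-Scala model of the same graph, with collections standing in for RDDs (an illustrative assumption, not Spark code): `cartesian` becomes a for-comprehension, `keyBy` a `map`, and `union` a `++`.

```scala
// Plain-collection model of the RDD lineage above (not Spark code).
val r00 = (0 to 9).toSeq                         // like sc.parallelize(0 to 9)
val r01 = (0 to 90 by 10).toSeq                  // 0, 10, ..., 90
val r10 = for (a <- r00; b <- r01) yield (a, b)  // cartesian: 10 x 10 = 100 pairs
val r11 = r00.map(n => (n, n))
val r12 = r00.zip(r01)
val r13 = r01.map(v => (v / 20, v))              // keyBy(_ / 20)
val r20 = Seq(r11, r12, r13).foldLeft(r10)(_ ++ _)  // union: 100 + 10 + 10 + 10
println(r20.size)  // 130
```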
-
14 . 28
-
15
-
16 . 1
-
package cc.eighty20.spark.s01;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class sc_01_anatomy_driver {
    public static void main(String[] args) {
        String masterURL = "local[*]";  // (1)

        SparkConf conf = new SparkConf()  // (2)
            .setAppName("sc_01_anatomy_driver")
            .setMaster(masterURL);

        JavaSparkContext sc = new JavaSparkContext(conf);  // (3)

        String fileName = "";
        if (args.length > 0 && args[0] != null && !args[0].isEmpty())  // (4)
            fileName = args[0];
        else
            fileName = "pom.xml";

        JavaRDD<String> lines_rdd = sc.textFile(fileName);  // (5)

        long lines_count = lines_rdd.count();  // (6)
        System.out.printf("There are %s lines in %s\n", lines_count, fileName);

        sc.close();
    }
}
16 . 2
-
String masterURL = "local[*]";  // (1)
16 . 3
-
SparkConf conf = new SparkConf()  // (2)
    .setAppName("sc_01_anatomy_driver")
    .setMaster(masterURL);
16 . 4
-
JavaSparkContext sc = new JavaSparkContext(conf);  // (3)
16 . 5
-
String fileName = "";
if (args.length > 0 && args[0] != null && !args[0].isEmpty())  // (4)
    fileName = args[0];
else
    fileName = "pom.xml";

JavaRDD<String> lines_rdd = sc.textFile(fileName);  // (5)
long lines_count = lines_rdd.count();  // (6)
System.out.printf("There are %s lines in %s\n", lines_count, fileName);
16 . 6
-
16 . 7
-
17 . 1
-
package cc.eighty20.spark.s01

import org.apache.spark.{SparkConf, SparkContext}

object sc_01_anatomy_driver {
  def main(args: Array[String]) {
    val masterURL = "local[*]"  // (1)

    val conf = new SparkConf()  // (2)
      .setAppName("sc_01_anatomy_driver")
      .setMaster(masterURL)

    val sc = new SparkContext(conf)  // (3)

    val fileName = util.Try(args(0)).getOrElse("pom.xml")  // (4)

    val lines_rdd = sc.textFile(fileName).cache()  // (5)

    val lines_count = lines_rdd.count()  // (6)
    println(s"\nThere are $lines_count lines in $fileName")
  }
}
17 . 2
-
val masterURL = "local[*]"  // (1)
17 . 3
-
val conf = new SparkConf()  // (2)
  .setAppName("sc_01_anatomy_driver")
  .setMaster(masterURL)
17 . 4
-
val sc = new SparkContext(conf)  // (3)
17 . 5
-
val fileName = util.Try(args(0)).getOrElse("pom.xml")  // (4)
val lines_rdd = sc.textFile(fileName).cache()  // (5)
val lines_count = lines_rdd.count()  // (6)
println(s"\nThere are $lines_count lines in $fileName")
17 . 6
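The `util.Try(args(0)).getOrElse(...)` idiom at step (4) can be exercised on its own, without Spark. Here `pickFile` is a hypothetical helper name, not part of the course code:

```scala
import scala.util.Try

// Step (4) in isolation: fall back to "pom.xml" when no argument is given.
// Try(args(0)) turns the ArrayIndexOutOfBoundsException into a Failure,
// and getOrElse supplies the default.
def pickFile(args: Array[String]): String =
  Try(args(0)).getOrElse("pom.xml")

println(pickFile(Array.empty[String]))  // pom.xml
println(pickFile(Array("data.txt")))    // data.txt
```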
-
17 . 7
-
18
-
19 . 1
-
ERROR	php: dying for unknown reasons
WARN	dave, are you angry at me?
ERROR	did mysql just barf?
WARN	xylons approaching
ERROR	mysql cluster: replace with spark cluster
19 . 2
-
// base RDD
val lines = sc.textFile("hdfs://sample_log_file_path/log.txt")

// transformed RDDs
val errors = lines.filter(_.startsWith("ERROR"))
val messages = errors.map(_.split("\t")).map(r => r(1)).cache()

// action 1
val mysql_errors = messages.filter(_.contains("mysql")).count()

// action 2
val php_errors = messages.filter(_.contains("php")).count()
19 . 3
-
// base RDD
val lines = sc.textFile("hdfs://sample_log_file_path/log.txt")
19 . 4
-
19 . 5
-
19 . 6
-
19 . 7
-
// transformed RDDs
val errors = lines.filter(_.startsWith("ERROR"))
val messages = errors.map(_.split("\t")).map(r => r(1)).cache()
19 . 8
-
// action 1
val mysql_errors = messages.filter(_.contains("mysql")).count()
19 . 9
-
// action 2
val php_errors = messages.filter(_.contains("php")).count()
19 . 10
-
19 . 11
-
19 . 12
-
// base RDD
val lines = sc.textFile("hdfs://sample_log_file_path/log.txt")

// transformed RDDs
val errors = lines.filter(_.startsWith("ERROR"))
val messages = errors.map(_.split("\t")).map(r => r(1)).cache()

// action 1
val mysql_errors = messages.filter(_.contains("mysql")).count()

// action 2
val php_errors = messages.filter(_.contains("php")).count()
19 . 13
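The build-up across these slides hinges on lazy evaluation: transformations such as `filter` and `map` only describe work, and nothing runs until an action such as `count()`. A plain Scala `Iterator` shows the same deferred behaviour (an illustrative analogue, not Spark; the sample lines mirror the log data above):

```scala
// Lazy-evaluation sketch: Iterator transformations defer work,
// just as RDD transformations do.
var touched = 0
val lines = Iterator(
  "ERROR\tphp: dying for unknown reasons",
  "WARN\tdave, are you angry at me?",
  "ERROR\tdid mysql just barf?")
val errors = lines.filter { l => touched += 1; l.startsWith("ERROR") }
// Defining `errors` evaluated nothing: touched is still 0 here.
val n = errors.size  // the "action": forces a full pass over the data
// Now touched == 3 (every line was inspected) and n == 2 (two ERROR lines).
```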
-
19 . 14
-
19 . 15
-
19 . 16
-
19 . 17
-
19 . 18
-
20 . 1
-
# Apache Spark

Spark is a fast and general cluster computing system for Big Data. It provides high-level APIs in Scala, Java, Python, and R, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools including Spark SQL for SQL and DataFrames, MLlib for machine learning, GraphX for graph processing, and Spark Streaming for stream processing.

## Online Documentation

You can find the latest Spark documentation, including a programming guide, on the [project web page](http://spark.apache.org/documentation.html) and [project wiki](https://cwiki.apache.org/confluence/display/SPARK). This README file only contains basic setup instructions.

## Building Spark
...
20 . 2
-
val topN = 10
val fileName = "hdfs://log_file_path/README.md"

// RDD creation from external data source
val docs = sc.textFile(fileName)

// Split lines into words
val lower = docs.map(line => line.toLowerCase())
val words = lower.flatMap(line => line.split("\\s+"))
val counts = words.map(word => (word, 1))

// Count all words (automatic combination)
val freq = counts.reduceByKey(_ + _)

// Swap tuples and get top results
val top = freq.map(_.swap).top(topN)
top.foreach(println)
20 . 3
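To convince yourself of the word-count pipeline's logic without a cluster, the same steps can be run over a plain `Seq`, with `groupBy` standing in for `reduceByKey` (the two-line sample input here is made up for illustration):

```scala
// Local model of the word-count pipeline above (plain collections, not Spark).
val docs = Seq("Apache Spark", "Spark is a fast and general engine")
val words = docs.map(_.toLowerCase).flatMap(_.split("\\s+"))
val freq = words.groupBy(identity).map { case (w, ws) => (w, ws.size) }  // ~ reduceByKey(_ + _)
val top = freq.toSeq.map(_.swap).sortBy(-_._1).take(3)                   // ~ map(_.swap).top(topN)
println(freq("spark"))  // 2
```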
-
20 . 4
-
20 . 5
-
20 . 6
-
20 . 7
-
20 . 8
-
20 . 9
-
20 . 10
-
20 . 11
-
20 . 12
-
20 . 13
-
20 . 14
-
20 . 15
-
20 . 16
-
20 . 17
-
20 . 18
-
20 . 19
-
20 . 20
-
20 . 21
-
20 . 22
-
20 . 23
-
21