01 course overview - cs.sjtu.edu.cnyshen/courses/bigdata/01 course overview.pdf · 3...
Spring 2017
§ Instructor: Yao SHEN (沈耀)
§ Email: yshen AT cs.sjtu.edu.cn
§ Office: SEIEE Building 3-535
§ Course website: http://www.cs.sjtu.edu.cn/~yshen/courses/BigData/
§ Teaching Assistants:
  § 沈国栋 [email protected]
  § 陈健 [email protected]
§ Anand Rajaraman and Jeffrey D. Ullman. Mining of Massive Datasets. Cambridge University Press, 2011.
You can download it from the book website (http://www.mmds.org/).
§ Jiawei Han and Micheline Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann, Second Edition, 2006.
§ Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.
§ Chuck Lam. Hadoop in Action. Manning Publications, First Edition, 2010.
§ Holden Karau, Andy Konwinski, Patrick Wendell, and Matei Zaharia. Learning Spark. O'Reilly, 2015.
§ Nick Pentreath. Machine Learning with Spark. Packt Publishing, 2015.
§ Introduction: Data-Intensive Scalable Computing (DISC) & Data Mining
§ Parallel & Distributed Computing (esp. Cloud Computing)
  § OpenMP, Pthreads, MPI
  § MapReduce (Hadoop) and Spark
§ Data Mining and Machine Learning
  § Association rules, latent semantic indexing, dimensionality reduction
  § Clustering, supervised learning
§ Data-Intensive Applications
  § Search, link analysis, recommender systems, advertising on the Web
§ Extra Topics
  § From data processing companies
§ Data structures
§ Design and analysis of algorithms
§ Linear algebra
§ Probability theory
§ Programming languages: Java, C++
§ Homework (40%)
§ Final exam (60%)
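
As a small illustration of the weighting above, the course grade can be sketched as a weighted sum. The function name `final_grade` and the 0-100 score scale are assumptions made for this example; the slides give only the two percentages.

```python
def final_grade(homework: float, exam: float) -> float:
    """Combine two scores using the 40% / 60% weights from the slide.

    Assumption: both inputs are on the same 0-100 scale.
    """
    homework_weight = 0.40  # Homework (40%)
    exam_weight = 0.60      # Final exam (60%)
    return homework_weight * homework + exam_weight * exam


if __name__ == "__main__":
    # Hypothetical scores, chosen only to show the call.
    print(final_grade(homework=90, exam=80))
```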
§ Homework is due on the assigned date.
§ Late submissions of homework or projects will receive partial or no credit.
  § 20% penalty per day late.
  § NOT accepted more than 72 hours after the due date.
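
The late policy above can be sketched as a short function. This is one possible reading, not the instructor's official rule: it assumes that each started 24-hour period counts as a full day, that the penalty is taken as a fraction of the earned score, and that a submission at exactly 72 hours is still accepted. The name `late_score` is hypothetical.

```python
import math


def late_score(raw_score: float, hours_late: float) -> float:
    """Return the score after applying the late penalty."""
    if hours_late <= 0:
        return raw_score  # on time: no penalty
    if hours_late > 72:
        return 0.0  # past the 72-hour cutoff: not accepted
    # Assumption: any started day counts as a whole day late.
    days_late = math.ceil(hours_late / 24)
    penalty = 0.20 * days_late  # 20% per day, from the slide
    # Assumption: the penalty is a fraction of the earned score.
    return raw_score * (1.0 - penalty)


if __name__ == "__main__":
    # Hypothetical submissions: on time, 10 hours late, 80 hours late.
    for hours in (0, 10, 80):
        print(hours, late_score(100.0, hours))
```

Under these assumptions, a submission 10 hours late keeps 80% of its score, and one 80 hours late receives none.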