il cdlm in data sciencedatascience.disco.unimib.it/wp-content/...in-data-science@bicocca... · in...
TRANSCRIPT
Master Degree in Data Science
University of Milano Bicocca Start: October 2017
1
We are at the dawn of a new revolution in the Information Age: data
Every animate and inanimate object on Earth will soon be generating data”
(Smolan and Erwitt, 2013)
2
Relevance of Digital Data in Research, Economy, eGovernment and Society
•Data Big Data: change of paradigm
•Fast growth of jobs in «informative services», services based on digital data (30% of overall jobs in 2020, 65% in 2050)
•Digital economy
•Social and ethical issues in Data Science
3
Estimated world (pre-1800) and then U.S. Labor Percentages by Sector
4
0
20
40
60
80
100
120
20000
00 Y
A
20000
YA
10000
YA
2000
YA180
0185
0190
0195
0200
0205
0
Services (Info)
Services (Other)
Industry (Goods)
Agriculture
Hunter-Gatherer
Estimations based on Porat, M. (1977) Info Economy: Definitions and Measurement
Traditional Analysis Life cycle vs new Analysis life cycle of digital (big) data
5
Source Selection & Extraction
S E M A N T I C S
Q U A L I T Y
L E A R N I N G
V A L U E
Storage
Integration
Analysis
Visualization
Extract
Transform
Load
Life cycle
Life cycle Cross cutting activities
Four V’s of Big Data
•Volume
•Velocity
•Variety
•Value
6
Change of Paradigm…in Data Management Systems
7
SQL + Traditional DBMS
Volume
Velocity
Small Data
Big Data
Streaming data
Long-term changing data
NoSQL + Hadoop + MapReduce
(plus: distributed file system)
Spark (plus: in-memory processing)
Hadoop & Spark
Change of Paradigm… in Machine Learning Techniques
8
Non parametric models, model/variable selection and
dimension reduction
Handcrafted time series models based
on linear filters
Dynamic factor models, dimension reduction, automated modelling
Volume
Velocity
Small Data
Big Data
Streaming data
Long-term changing data
Generative models and parametric
inference
Relevance of Data Science at University of Milano Bicocca
Many disciplines that make intensive usage of data and apply data management & analysis techniques and technologies • Informatics and Statistics
• Sciences • Life sciences • Physics • Environment • Materials • Social
• Economics • Macro and Micro Economics • Finance • Marketing
9
10
What do Data Scientists do?
11
What do data scientist do?
Make discoveries while swimming in data. It’s their preferred method of navigating the world around them.
At ease in the digital realm, they are able to bring structure to large quantities of formless data and make analysis possible.
They identify rich data sources, join them with other, potentially incomplete data sources, and clean the resulting set.
In a competitive landscape where challenges keep changing and data never stop flowing, data scientists help decision makers shift from ad hoc analysis to an ongoing conversation with data.
12
Professional profiles
•Technological Data Scientist: •Applies, adapts and extends statistical techniques and
computer science technologies providing effective analyses for decision, operational or research problems.
•Performs high level architectural design of services based on digital data.
•Business Data Scientist: •Finds solutions based on statistical techniques and
computer science technologies to enhance value of decisions and value of business processes in companies and public administrations
•Conceives new services based on digital data, which optimize value in use for customers and value in exchange for service providers.
13
Data Scientist in the World
14
Source: Glassdoor 25 Best Jobs in America
15
Industry…
16 Source: O’Really Data Science Survey 2016
Job Opportunities in Italy – Research of CRISP-Bicocca about 2.500 job offers for Data Scientist – Big Data
Steep growth in the last year
17
50% of job offers in Lombardy Region
18
Course organization and required skills
Analytical Track
Business Track
Degreein..
Degreein..
Degreein..
Degreein..
Degreein..
At least 30 credits in informatics and/or statistics and/or mathematics and/or physics
Basics in Informatics* Basics in Statistics*
Advanced techniques & technologies
Lab course in generic domain
Lab in vertical domains
...
First year
Second year
* If needed
CYB – Cybersecurity for data science
IP – Signal and image processing
TIDS – Technological infra-structures for data science
1 among 3
First year
DM&DV - Data management
and visualization
STDA – Statistical modelling
MLDM – Machine Learning &
Decision Models
JSI – Juridical & Social Issues in
Information Society
DSL1 – Data Science Lab 1
Second year
BDBF – - Big data in Business and Finance
BDBP - Big data in Behavioural Psycology
BDPHe – - Big Data in Public Health
BDPS - Big Data in Public and Social Services
BDGIS – Big Data in Geo-graphical Information Systems
BDPhis - Big data management and analysis in physics research
BDB&B – Big data in biotechnology & biosciences
MSBD - Making sense of biological data
BDM1 – - Big Data in Health Care
BDM2 - Medical imaging & big data
IL – Industry Lab
WM&CM – Web marketing &
Communication Management
BS – Basics
in Statistics
1 among 2
Data Science Lab in Environment & Physics
Data Science Lab in biosciences
Data Science Lab in Medicine
SMA – Social Media Analytics
BI - Business Intelligence
1 among 3
SS –Service Science
1 among 3
DS – Data Semantics
BS – Basics in Informatics
1 among 3
IS – Information Systems
Busi- ness Track
TMS – Text mining
and search
ASM – Advanced statistical methodologies for Big Data
SDM – Streaming data management and
time series analysis
1 among 3
EDS – Economics for Data Science
Ana lytical track
Data Science Lab in Business & Marketing
Data Science Lab in Public Policies & Services
1 among 4
Courses
Data Science Lab
CYB – Cybersecurity for data science
IP – Signal and image processing
TIDS – Technological infra-structures for data science
1 among 3
First year
DM&DV - Data management
and visualization
STDA – Statistical modelling
MLDM – Machine Learning &
Decision Models
JSI – Juridical & Social Issues in
Information Society
DSL1 – Data Science Lab 1
Second year
BDBF – - Big data in Business and Finance
BDBP - Big data in Behavioural Psycology
BDPHe – - Big Data in Public Health
BDPS - Big Data in Public and Social Services
BDGIS – Big Data in Geo-graphical Information Systems
BDPhis - Big data management and analysis in physics research
BDB&B – Big data in biotechnology & biosciences
MSBD - Making sense of biological data
BDM1 – - Big Data in Health Care
BDM2 - Medical imaging & big data
IL – Industry Lab
WM&CM – Web marketing &
Communication Management
BS – Basics
in Statistics
1 among 2
Data Science Lab in Environment & Physics
Data Science Lab in biosciences
Data Science Lab in Medicine
SMA – Social Media Analytics
BI - Business Intelligence
1 among 3
SS –Service Science
1 among 3
DS – Data Semantics
BS – Basics in Informatics
1 among 3
IS – Information Systems
Busi- ness Track
TMS – Text mining
and search
ASM – Advanced statistical methodologies for Big Data
SDM – Streaming data management and
time series analysis
1 among 3
EDS – Economics for Data Science
Ana lytical track
Data Science Lab in Business & Marketing
Data Science Lab in Public Policies & Services
1 among 4 The four Vs: 1. VOLume 2. VELocity 3. VARiety 4. VALue
Data Science Lab
VAL
VOL
VAL
VAL
VOL
VOL
VEL
VEL
VAR
VEL
VOL
VOL
VEL
VAL
VAL
VAL
VOL
VAL
VAL
VEL