The Master's Degree Programme (CdLM) in Data Science

datascience.disco.unimib.it/wp-content/...in-data-science@bicocca...


Master's Degree in Data Science

University of Milano-Bicocca

Start: October 2017


We are at the dawn of a new revolution in the Information Age: data.

“Every animate and inanimate object on Earth will soon be generating data”

(Smolan and Erwitt, 2013)


Relevance of Digital Data in Research, Economy, eGovernment and Society

• From Data to Big Data: a change of paradigm

• Fast growth of jobs in “information services”, i.e. services based on digital data (30% of overall jobs in 2020, 65% in 2050)

• Digital economy

• Social and ethical issues in Data Science


Estimated world (pre-1800) and then U.S. Labor Percentages by Sector

Figure: chart of labor percentages by sector. Vertical axis: 0 to 120. Horizontal axis: 2,000,000 YA, 20,000 YA, 10,000 YA, 2,000 YA, 1800, 1850, 1900, 1950, 2000, 2050. Series: Services (Info), Services (Other), Industry (Goods), Agriculture, Hunter-Gatherer.

Estimations based on Porat, M. (1977) Info Economy: Definitions and Measurement


Traditional Analysis Life cycle vs new Analysis life cycle of digital (big) data

Traditional life cycle: Extract, Transform, Load

New life cycle: Source Selection & Extraction, Storage, Integration, Analysis, Visualization

Cross-cutting activities: Semantics, Quality, Learning, Value
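As an illustration of the traditional Extract, Transform, Load life cycle named on this slide, a minimal Python sketch follows. The sample data, the `readings` table and the function names are assumptions made for the example and do not come from the course material; only the standard library is used.

```python
import csv
import io
import sqlite3

# Hypothetical raw input standing in for an extracted source file.
RAW_CSV = """city,temperature_c
Milano,21.5
Monza,
Bergamo,19.0
"""

def extract(text):
    """Extract: read raw rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop rows with missing values and convert types."""
    cleaned = []
    for row in rows:
        if not row["temperature_c"]:
            continue  # skip incomplete records
        cleaned.append((row["city"], float(row["temperature_c"])))
    return cleaned

def load(records, connection):
    """Load: write the cleaned records into a relational table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS readings (city TEXT, temperature_c REAL)"
    )
    connection.executemany("INSERT INTO readings VALUES (?, ?)", records)
    connection.commit()

# Run the three steps in order against an in-memory database.
connection = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), connection)
stored = connection.execute(
    "SELECT city, temperature_c FROM readings ORDER BY city"
).fetchall()
print(stored)
```

The three functions are kept separate so that each step can be replaced independently, for example by swapping the CSV source for another extractor while the transform and load steps stay unchanged.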


Four V’s of Big Data

•Volume

•Velocity

•Variety

•Value


Change of Paradigm… in Data Management Systems

Diagram: data management systems positioned along two axes, Volume (Small Data to Big Data) and Velocity (Long-term changing data to Streaming data):
• SQL + Traditional DBMS
• NoSQL + Hadoop + MapReduce (plus: distributed file system)
• Spark (plus: in-memory processing)
• Hadoop & Spark


Change of Paradigm… in Machine Learning Techniques

Diagram: modelling techniques positioned along two axes, Volume (Small Data to Big Data) and Velocity (Long-term changing data to Streaming data):
• Generative models and parametric inference
• Non-parametric models, model/variable selection and dimension reduction
• Handcrafted time series models based on linear filters
• Dynamic factor models, dimension reduction, automated modelling


Relevance of Data Science at University of Milano Bicocca

Many disciplines make intensive use of data and apply data management and analysis techniques and technologies:

• Informatics and Statistics

• Sciences: Life sciences, Physics, Environment, Materials, Social

• Economics: Macro and Micro Economics, Finance, Marketing


What do Data Scientists do?


Make discoveries while swimming in data. It’s their preferred method of navigating the world around them.

At ease in the digital realm, they are able to bring structure to large quantities of formless data and make analysis possible.

They identify rich data sources, join them with other, potentially incomplete data sources, and clean the resulting set.

In a competitive landscape where challenges keep changing and data never stop flowing, data scientists help decision makers shift from ad hoc analysis to an ongoing conversation with data.


Professional profiles

• Technological Data Scientist:

  • Applies, adapts and extends statistical techniques and computer science technologies to provide effective analyses for decision, operational or research problems.

  • Performs high-level architectural design of services based on digital data.

• Business Data Scientist:

  • Finds solutions based on statistical techniques and computer science technologies to enhance the value of decisions and of business processes in companies and public administrations.

  • Conceives new services based on digital data that optimize value in use for customers and value in exchange for service providers.


Data Scientist in the World


Source: Glassdoor 25 Best Jobs in America


Industry…

Source: O’Reilly Data Science Survey 2016


Job Opportunities in Italy: CRISP-Bicocca research on about 2,500 job offers for Data Scientist and Big Data roles

Steep growth in the last year


50% of job offers in Lombardy Region


Course organization and required skills

Tracks: Analytical Track, Business Track

Entry: Degree in .. (five source degrees shown), with at least 30 credits in informatics and/or statistics and/or mathematics and/or physics

Course sequence across the first and second year:
• Basics in Informatics*, Basics in Statistics*
• Advanced techniques & technologies
• Lab course in generic domain
• Lab in vertical domains
• ...

* If needed


CYB – Cybersecurity for data science
IP – Signal and image processing
TIDS – Technological infrastructures for data science
1 among 3

First year

DM&DV – Data management and visualization
STDA – Statistical modelling
MLDM – Machine Learning & Decision Models
JSI – Juridical & Social Issues in Information Society
DSL1 – Data Science Lab 1

Second year

BDBF – Big Data in Business and Finance
BDBP – Big Data in Behavioural Psychology
BDPHe – Big Data in Public Health
BDPS – Big Data in Public and Social Services
BDGIS – Big Data in Geographical Information Systems
BDPhis – Big Data management and analysis in physics research
BDB&B – Big Data in biotechnology & biosciences
MSBD – Making sense of biological data
BDM1 – Big Data in Health Care
BDM2 – Medical imaging & Big Data
IL – Industry Lab
WM&CM – Web marketing & Communication Management
BS – Basics in Statistics
1 among 2
Data Science Lab in Environment & Physics
Data Science Lab in biosciences
Data Science Lab in Medicine
SMA – Social Media Analytics
BI – Business Intelligence
1 among 3
SS – Service Science
1 among 3
DS – Data Semantics
BS – Basics in Informatics
1 among 3
IS – Information Systems

Business Track

TMS – Text mining and search
ASM – Advanced statistical methodologies for Big Data
SDM – Streaming data management and time series analysis
1 among 3
EDS – Economics for Data Science

Analytical track

Data Science Lab in Business & Marketing
Data Science Lab in Public Policies & Services
1 among 4

Legend: Courses, Data Science Lab


Course map as on the previous slide, annotated with the four Vs: 1. VOLume 2. VELocity 3. VARiety 4. VALue

Each course carries one of the tags VOL, VEL, VAR or VAL; the tag-to-course assignment is not recoverable from this transcript.