combining various data sources

15
COMBINING VARIOUS DATA SOURCES Case: The weather and net blotch 1 FACULTY OF TECHNOLOGY/ Control Engineering/ Outi Mäyrä 22.9.2015

Upload: cleenltd

Post on 08-Jan-2017

353 views

Category:

Environment


0 download

TRANSCRIPT

Page 1: Combining Various Data Sources

COMBINING VARIOUS DATA SOURCES

Case: The weather and net blotch

1

FACULTY OF TECHNOLOGY/ Control Engineering/ Outi Mäyrä 22.9.2015

Page 2: Combining Various Data Sources

INTRODUCTION

• Combining the open and closed data• Challenges

• Case from agriculture• Correlation between the weather conditions and existing of net

blotch, one plant disease (barley)

• Data collection and analysis

• Results

• Potential usage of the results

22.9.2015FACULTY OF TECHNOLOGY/ Control Engineering/ Outi Mäyrä

2

Page 3: Combining Various Data Sources

DATA, WHERE?

22.9.2015FACULTY OF TECHNOLOGY/ Control Engineering/ Outi Mäyrä

3

• A lot of data collected every day

• Weather conditions

• Trafic

• Consumer habits

• Accidents

• …

• Measurements from industry, agriculture, health care and so on

• Availability of data differs• Open and close data

Page 4: Combining Various Data Sources

CASE; NET BLOTCH AND WEATHER

• Aim to find out weather conditions which predicts the

appearance of the net blotch

• Finnish Meteorological Institute, open data• Data collection

• Natural Resources Institute Finland, measurements, close data• Data ready to use in xls-format

22.9.2015FACULTY OF TECHNOLOGY/ Control Engineering/ Outi Mäyrä

4

Data from

Natural

Resources

Institute Finland

Data analysis

Data from

Finnish

Meteorological

Institute

New information

Page 5: Combining Various Data Sources

NATURAL RESOURCES INSTITUTE FINLAND, NET

BLOTCH DATA

• Measurements from different places in different years

• Information:• No net blotch = 0

• A lot of net blotch = 5

• The rest = something between

22.9.2015FACULTY OF TECHNOLOGY/ Control Engineering/ Outi Mäyrä

5

1980 2000 20200

1

2Jokioinen

1990 2000 20100

2

4Mikkeli

1980 2000 20200

2

4Ruukki

1980 2000 20200

2

4Ylistaro

Page 6: Combining Various Data Sources

FMI, OPEN DATA

• Data available for everyone

• Requires registration

• User can select the measurement station and listed variables

• Some challenges with usability• Queries in the internet, data in xml-format

• Data includes mixed up letters and numbers

• Time limit: Data can be loaded for only one week maximum• Vs. the earliest net blotch data is from 1991

• Maybe easier to use with R-code?

22.9.2015FACULTY OF TECHNOLOGY/ Control Engineering/ Outi Mäyrä

6

Page 7: Combining Various Data Sources

EXAMPLE, FMI DATA, ONE QUERY

22.9.2015FACULTY OF TECHNOLOGY/ Control Engineering/ Outi Mäyrä

7

Page 8: Combining Various Data Sources

EXAMPLE, FMI DATA OPENED IN EXCEL

22.9.2015FACULTY OF TECHNOLOGY/ Control Engineering/ Outi Mäyrä

8

Page 9: Combining Various Data Sources

FMI DOWNLOADER

• Simple user interface for downloading FMI data

• Available from internet:• http://tumetsu.github.io/FMI-weather-downloader

• Developed by Tuomas Salmi• [email protected]

• Fast and easy to use

• Data in csv-format => usable

• Can’t load every variables• A place and date, rainfall per day,

temperature; minimum, maximum and average per day

22.9.2015FACULTY OF TECHNOLOGY/ Control Engineering/ Outi Mäyrä

9

Page 10: Combining Various Data Sources

DATA ANALYSIS

22.9.2015FACULTY OF TECHNOLOGY/ Control Engineering/ Outi Mäyrä

10

• The weather data were groupped according the place and net

blotch• Ruukki:

• First group – no net blotch

• Second group – a small amount of net blotch

• Third group – a big amount of net blotch

• Data timing according to growing season => The first point of

the weather data is the beginning of the growing season – in

every data set

• Analysis of the variables from the weather data: What is

different between the years groupped according to net blotch

Page 11: Combining Various Data Sources

DATA ANALYSIS

• Example, Ruukki

• Rainfall per day

• No net blotch

A little amount of

net blotch

22.9.2015FACULTY OF TECHNOLOGY/ Control Engineering/ Outi Mäyrä

11

0 20 40 60 80 100 120 1400

5

10

15

20

25

30

35Ruukki, no net blotch, 2011

Time [day]

Rain

fall

per

day

0 20 40 60 80 100 120 1400

5

10

15

20

25

30

35Ruukki, a little amount of net blotch, 2014

Time [day]

Rain

fall

per

day

Page 12: Combining Various Data Sources

DATA ANALYSIS

22.9.2015FACULTY OF TECHNOLOGY/ Control Engineering/ Outi Mäyrä

12

• Feature; cumulative sum of the rainfall per day• Green = years with no net blotch, red = years with net blotch

• Feature generation• Combining variables with different mathematical operations

• Ranking the generated features and selecting the suitable one

1 2 3 4 5 6 7 8 90

50

100

150

200

250

300

Ruukki

Time [day]

Rain

fall

per

day,

cum

ula

tive s

um

Page 13: Combining Various Data Sources

1 2 3 4 5 6 7 8 90

50

100

150

200

Ruukki

Time [day]

Cum

ula

tive

su

m o

f g

en

era

ted

fe

atu

re

DATA ANALYSIS

• Feature; (ln(z)+ln(x))*ln(z);• z = minimum daily temperature

• x = rainfall per day• Used algorithm from Ruusunen M.: Signal Correlations in Biomass Combustion - an

Information Theoretic Analysis. Acta Univ Oulu C 459, 2013. PhD Thesis

22.9.2015FACULTY OF TECHNOLOGY/ Control Engineering/ Outi Mäyrä

13

Page 14: Combining Various Data Sources

THE USE OF RESULTS

• Predict the appearance of the net blotch

• Optimize the use of chemical plant protectants

• Avoid the unnecessary spraying of the fields

• Positive effects to crop

• Less chemical => save money and nature

22.9.2015FACULTY OF TECHNOLOGY/ Control Engineering/ Outi Mäyrä

14

Page 15: Combining Various Data Sources

THANK YOU!

22.9.2015FACULTY OF TECHNOLOGY/ Control Engineering/ Outi Mäyrä

15

• Picture from Bachelor’s thesis of Ville Ruohonen (2015) Disease

resistance of the new spring cereal varieties