
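Attribution-NonCommercial-NoDerivs 2.0 Korea

You are free to copy, distribute, transmit, display, perform, and broadcast this work, provided that you follow the conditions below:

Attribution. You must attribute the work to the original author.

Noncommercial. You may not use this work for commercial purposes.

No Derivative Works. You may not alter, transform, or build upon this work.

For any reuse or distribution, you must make clear the license terms applied to this work.

These conditions do not apply if you obtain separate permission from the copyright holder.

Your rights under copyright law are not affected by the above.

This is a human-readable summary of the Legal Code.

http://creativecommons.org/licenses/by-nc-nd/2.0/kr/legalcode

http://creativecommons.org/licenses/by-nc-nd/2.0/kr/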

PhD Dissertation of Engineering

Prediction Model of Soil Drought Distribution Using Machine Learning Algorithms and Geospatial Data

February 2020

Graduate School of Seoul National University Interdisciplinary Program in Landscape Architecture

HaeKyung Park

ABSTRACT

Prediction Model of Soil Drought Distribution Using Machine Learning Algorithms and Geospatial Data

HaeKyung Park

Interdisciplinary Program in Landscape Architecture,

Graduate School, Seoul National University

Supervised by Professor DongKun Lee

This study presents the entire process of establishing a water

management policy based on scientific methods through drought

prediction. Accordingly, this thesis includes the development of the

severe drought area prediction (SDAP) model, verification of the

algorithms used in the proposed model, and an application of the

proposed model to policymaking. The core technology of the SDAP

model is the convergence of machine learning and geospatial science,

which makes it possible to use geospatial data instead of tabular data to

visualize prediction results.

The SDAP model was developed by considering the importance

and difficulty of forecasting short-term droughts and the allocation of

priorities for rapid water supply in terms of the related policy. The model was motivated by the fact that, despite advances in science and technology, it has become more difficult to predict the probability of precipitation, and to prepare for droughts based on this probability, because of the increasingly abnormal climate associated with global warming. In fact, during

2015–17, Korea suffered from unpredictable, severe spring droughts.

Such droughts are predicted to increase due to the fluctuation of

precipitation as warming increases according to “Global Warming

1.5 °C”, a special report by the Intergovernmental Panel on Climate

Change (IPCC). Thus, water management policy has become

increasingly important over recent decades. In particular, rural areas are

directly adversely affected during drought periods; hence, if soil

droughts are not quickly resolved, crop damage could affect economic

inflation and human life. The US National Drought Mitigation Center

has recommended that water supply priorities be set before the

occurrence of droughts and that they be implemented immediately when

a drought commences in order to minimize damage. However, accurate

drought predictions that are based on probabilistic precipitation are

difficult because short-term droughts (i.e., lasting from several weeks to

months) are at the boundary between weather and climate.

The SDAP model estimates the spatial distribution of a soil drought in advance by assuming a subsequent lack of rainfall over the short term, rather than making a “yes/no” prediction of a drought. The model predicts future droughts by learning from actual past droughts through machine learning, using satellite imagery and topographic data without precipitation data. Prediction results by the SDAP model therefore assist in the selection of water

supply priority areas through the provision of visualized maps of the

relative severity of a soil drought. In addition, overlaying these maps

with water resources (reservoirs and groundwater) or land use maps can

also help to rearrange priorities in consideration of local conditions.

The study area in this research is Gyeonggi Province, a southern metropolitan area in Korea, which has experienced droughts that are understood to be related to climate change. Python was used as the programming language to develop the SDAP model. Each chapter of this thesis is a stand-alone paper, with subtitles as follows.

Chapter 1: “Prediction of Severe Drought Area Based on

Random Forest: Using Satellite Image and Topography Data”. This

chapter covers the details of the SDAP model design, the consideration

of training areas, and the model’s coverage, advantages, and limitations.

The distribution of a soil drought is expressed as the soil moisture index (SMI), a floating-point value between 0 and 1. The model development began with the idea that machine learning might be able to learn the relationships between soil moisture and surface environments (e.g., vegetation, topography, water, and temperature) during a drought. Fifteen input variables corresponding to the surface environment were generated using Landsat-8 imagery and a digital elevation model (DEM). The training method is supervised learning because it uses the SMI after a period of 3 months without rainfall as the output variable. As a result, the trained drought function (training R² = 0.91) predicted the SMI distribution after 3 months of no rain with a performance of R² = 0.58, using current Landsat-8 images and a DEM of the study area. The prediction performance was somewhat lower than the training performance, but the predicted spatial patterns were similar to the actual SMI after the actual droughts. Thus, the SDAP model could predict the areas likely to be more severely affected in the event of a drought.

Chapter 2: “Tree vs. Network: Which Is Better Machine Learning

Algorithm for Regression Prediction When Using Remote Sensing Data?”

In this chapter, in order to verify the random forest algorithm in the SDAP model, the method is compared with the multi-layer perceptron, which is well known among artificial neural network algorithms as a non-linear regression method for prediction. Furthermore, an attempt is made to explain why random forest is the mainstream algorithm in the remote sensing field, unlike in other fields. To this end, the 15 training variables were divided into groups according to data type, and the training performance of each group was compared. The analysis showed that the multi-layer perceptron performed worse on data groups with either a very narrow range (-1 to 1) or a very wide range (0 to 10,000), such as map indexes or reflectance values from satellite images, because neural networks are sensitive to data ranges. In contrast, the random forest algorithm performed better because it is based on decision trees, which are independent of data range and units. Therefore, the random forest algorithm was verified to be more suitable for the SDAP model, which combines satellite imagery and machine learning.

Chapter 3: “Is Water Pricing Policy Adequate to Reduce Water

Demand for Drought Mitigation in Korea?”. In this chapter, the process of planning policy by applying the SDAP model is presented in detail. The two models were run independently: the SDAP model predicted the drought severity, and a system dynamics model simulated water-use savings by assuming a water price increase that targeted the severe drought area. Three policy scenarios were simulated using data that included individual daily water usage, population, the water source location, tap water prices, and the residential water price elasticity index. The results showed that a visible effect appeared only after 3 months of implementation, which means that little water was saved during the drought, when water supply was needed, and that the policy was not effective. Where a price-based policy is not effective, as in Korea, the results suggest that the water control policy needs to be supplemented by extending the system dynamics model to include non-price factors.

In conclusion, appropriate analytical methods should be used for

the given purpose of a study. Traditional statistics such as linear

regression are useful for understanding causality, and machine learning

is powerful for predictive purposes. Therefore, the black box problem of machine learning is not an important consideration here, because the SDAP model aims to establish a policy based on prediction rather than on understanding the causes of droughts. Also, system dynamics is a time-efficient method for policies whose effects take a long time to confirm. Although there is a dispute over the validation of simulation results from system dynamics, no means exist to completely verify the real world. Thus, the simulated trend is worth reporting even if the exact resulting values are set aside. The results in Chapter 2 are meaningful in that they show the random forest method can be considered first for regression prediction using satellite imagery and machine learning. In Chapter 3, the application of the model is meaningful because it shows the entire process from the use of the model to the establishment of the water management policy. While existing models or analyses have stated only simple policy applicability, this thesis is significant because it shows the whole policy-making process based on scientific methods.

Keywords: machine learning, random forest, artificial neural networks,

prediction, system dynamics, policy simulation, drought mitigation

Student number: 2016-32140

Publications

Please note that Chapters 1-3 of this dissertation proposal were written as stand-alone papers (see below), and therefore there is some repetition in the methods and results.

Chapter 1

Park, Haekyung, Kyungmin Kim, and Dong Kun Lee. “Prediction of

Severe Drought Area Based on Random Forest: Using Satellite Image

and Topography Data.” Water 11, no. 4 (2019). https://doi.org/10.3390/w11040705.

(Published, Patent No. 10-2019-0053212)

Chapter 2

Park, Haekyung and Dong Kun Lee. “Comparison of Prediction

Performance for Soil Drought Distribution Using Satellite Image:

Random Forest (RF) and Multi-layer Perceptron (MLP)”

(Submission in progress)

Chapter 3

Park, Haekyung, and Dong Kun Lee. “Is Water Pricing Policy Adequate

to Reduce Water Demand for Drought Mitigation in Korea?” Water 11,

no. 6 (2019): 1256. https://doi.org/10.3390/w11061256.

(Published)

Contents

INTRODUCTION

Chapter 1. MODEL DESIGN

Subtitle: Prediction of Severe Drought Area Based on Random Forest

using Satellite Image and Topography Data

1. Introduction

2. Methodology

3. Results

4. Discussion

5. Conclusion

Chapter 2. MODEL ALGORITHM

Subtitle: Tree vs. Network: Which is better Machine Learning Algorithm

for Regression Prediction when using Remote Sensing Data?

1. Introduction

2. Methodology

3. Results

4. Discussion

5. Conclusion

Chapter 3. APPLICATION to POLICY

Subtitle: Is Water Pricing Policy Adequate to Reduce Water Demand for

Drought Mitigation in Korea?

1. Introduction

2. Methodology

3. Results

4. Discussion

5. Conclusion

CONCLUSION

REFERENCES

List of Tables

Chapter 1

Table 1. Feature categories and their input variables.

Table 2. Evaluation of the training performance of the drought function.

Table 3. The importance of land surface factor on drought function.

Chapter 2

Table 1. Data description

Table 2. Hyperparameter ranges for RF and MLP regression.

Table 3. Training groups according to data characteristics.

Table 4. Optimal hyperparameters of machine learning using RF and MLP

Chapter 3

Table 1. Variables of the SDAP model for prediction severe drought area.

Table 2. Variables of the SD model for policy simulation.

Table 3. Quantity of available water for drought mitigation.

List of Figures

Chapter 1

Figure 1. Study and training areas for short-term agricultural drought prediction.

Figure 2. Structure of the severe drought area prediction (SDAP) model. xti: input variable of the training area; y: actual soil moisture index (SMI); xpi: input variable of the study area; and ŷp: predicted SMI of the study area.

Figure 3. Verification of the training performance and error distribution.

Figure 4. Feature importance of drought function.

Figure 5. Final predicted SMI map and actual SMI.

Figure 6. Scatter plot between the actual and the predicted SMI of the study area.

Figure 7. (a) Relation between the actual and predicted SMI over an area 150 times (8249 km²) larger than the training area; (b) Distance from the training area and distribution of erroneous samples.

Chapter 2

Figure 1. SDAP model structure and analysis sequence in this study.

Figure 2. Study and training area to estimate soil drought distribution.

Figure 3. Structure of RF tree for regression.

Figure 4. Structure of MLP for regression.

Figure 5. Training performance of regression model using RF and MLP

Figure 6. Drought prediction performance using RF and MLP.

Figure 7. Actual drought and predictions using RF and MLP

Figure 8. Comparison of training performance according to groups with different data.

Chapter 3

Figure 1. Location of the study area.

Figure 2. Framework of linked the severe drought area prediction (SDAP) model and the system dynamics (SD) model.

Figure 3. Structure of the SDAP Model.

Figure 4. Structure of the SD model

Figure 5. Predicted agricultural drought map (excluding for the impervious areas) after non-rainfall during the three-month period from 22 March 2018 to 26 June 2018.

Figure 6. Changes in the amount of the water demand by increased water rate.

Figure 7. The SD model included price and non-price factors for saving residential water.

INTRODUCTION

According to “Global Warming 1.5 °C”, the special report of the

Intergovernmental Panel on Climate Change (IPCC), as warming

progresses, abnormal weather and precipitation fluctuations occur [1]; thus, floods and droughts will increase. Such events are already occurring

worldwide. For example, during 2012–2016, California suffered an

unprecedented drought [2], and during 2015–2017, Korea also

experienced damage from an unexpected drought [3].

During drought conditions, rural areas face considerable

problems that are not experienced by urban areas [4]. Hence, if soil

droughts are not quickly resolved, crop damage could affect economic

inflation and human life [5]. The US National Drought Mitigation Center

has recommended that water supply priorities be set before the

occurrence of droughts and that they be implemented immediately when

a drought commences in order to minimize damage [6]. However,

accurate drought predictions that are based on probabilistic precipitation

are difficult because short-term droughts (i.e., lasting from several weeks

to months) are at the boundary between weather and climate [7]. The

SDAP model was developed by considering the importance and

difficulty of forecasting short-term droughts and the allocation of

priorities for rapid water supply in terms of the related policy [8].



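The SDAP model can estimate the spatial distribution of a soil drought in advance by assuming a subsequent lack of rainfall over the short term, as opposed to a “yes/no” prediction of a drought [8]. The characteristics of the SDAP model enable it to predict future droughts by learning from actual past droughts through machine learning, using satellite imagery and topographic data without precipitation data [8]. Prediction results by the SDAP model therefore assist in the selection of water supply priority areas through the provision of visualized maps of the relative severity of a soil drought. In addition, overlaying these maps with water resources (reservoirs and groundwater) or land use maps can also help to rearrange priorities in consideration of local conditions.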






The core technology of the SDAP model is the convergence of

machine learning and geospatial science, which makes it possible to use

geospatial data instead of tabular data to visualize prediction results.

Python was used as the programming language to develop the model.

The selected study area is southern Gyeonggi Province in Korea, which has experienced severe drought due to climate change even though it is not historically a drought-prone area. Each chapter of this thesis is a stand-alone paper, with subtitles as follows.

• Chapter 1 is the design section of the SDAP model. It covers the random forest algorithm, model learning of drought, model coverage, and the advantages and limitations of the model.

• Chapter 2 is the verification section of the algorithm used in the SDAP model, achieved by comparing the performance of the random forest and the multi-layer perceptron. In addition, the data characteristics were analyzed to determine why random forest is better suited to remote sensing data.

• Chapter 3 is the SDAP model application section, where the SDAP model is linked to the system dynamics (SD) model. The SDAP model predicts the severe drought area, and the SD model then simulates the effectiveness of a policy targeting this area.

MODEL DESIGN

Subtitle: Prediction of Severe Drought Area Based on Random

Forest using Satellite Image and Topography Data

1. Introduction

The ultimate objective of drought prediction is to prepare a

mitigation plan in advance, rather than resolve intellectual curiosity

about nature. Drought forecasting plays an important role in mitigating

the negative effects of drought [9]; hence, various approaches for

predicting droughts have constantly been attempted, such as stochastic

methods, combined statistical and dynamical models, categorical

prediction, machine learning approaches, and hybrid models [10–14].

However, drought prediction is still challenging because, in addition to

precipitation deficit, complicated interactions among other variables,

such as temperature, evapotranspiration, land surface processes, and

human activities, also contribute to droughts [6,9,15]. Further, meteorological anomalies due to climate change have made it increasingly difficult to predict precipitation, which forms the basis for forecasting drought.


Agricultural droughts determined by soil moisture must be

predicted several months ahead for proper and rapid resource allocation

[9], because this allocation can mitigate the effects of upcoming droughts

by supplying timely water and guaranteeing suitable crop growth and

availability of food resources. Droughts are generally classified into four

categories, namely, meteorological, hydrological, agricultural, and

socio-economic droughts [16].

In recent years, the prevalence of machine learning methodologies and the frequency of droughts and floods around the world have led to more attempts to predict agricultural drought. However, the uncertainties of prediction caused by the combination of meteorological factors [17–20] still remain a problem. According to the

Intergovernmental Panel on Climate Change (IPCC) AR5 guidance note,

the complex use of different models, complexity of models, and

inclusion of additional processes in the analysis are the main reasons for

the increase in uncertainties [21]. Thus, the complex models used for the

integrated analysis of meteorological and agricultural drought have

increased the uncertainty in the results [21]. Moreover, agricultural drought prediction methods that do not include spatial information and are based only on precipitation, such as those put forth in [19,22], cannot predict the spatial distribution of agricultural drought. Another difficulty in predicting agricultural drought stems from predictions based on meteorological patterns, such as patterns of past precipitation. Even well-designed agricultural drought models based on meteorological factors cannot make accurate predictions because climate change shifts the existing patterns. According to a study of agricultural forecasting models from 2007 to 2017 (see Table 1 in [23]), precipitation was used as an input variable in almost all the models.

Previous studies have attempted to develop drought prediction

models by using remote sensing data. For example, Sheffield et al. [24]

applied a downscaling technique to predict seasonal droughts by

combining hydrological and agricultural models. However, such

prediction models are difficult to use because they require decades of

weather data. Hao et al. [25] designed a system that predicts soil moisture

and severe drought affected areas by focusing on seasonal droughts,

which is similar to the approach used in our model. However, because of

its global scale and large system, it is difficult to use for developing

regional drought mitigation plans.

Therefore, alternative agricultural drought-forecasting methods

that are easy to use and scalable are required to realize practical drought

mitigation plans. Moreover, these methods must consider the uncertainty

of the precipitation forecasts and spatial information. To achieve this

objective herein, we designed a model, called the severe drought area

prediction (SDAP) model, to estimate soil moisture index (SMI) maps

after several months using surface factors. All variables were composed

of surface factors derived from remote sensing data, which are related to

soil moisture, to obtain spatial information of agricultural drought.

The SDAP model predicts the area of severe agricultural drought as preparatory information to help mitigate agricultural drought when a meteorological drought is possible. The model was proposed to help plan prioritized and rapid water allocation to the areas at greatest risk when a drought occurs.

Therefore, prediction, in the SDAP model, does not imply a prediction

of the occurrence of drought. Rather, it provides information on the

future development and evolution of agricultural drought, assuming that

rainfall-deficit conditions continue to prevail, in preparation for drought

mitigation plans. We believe this is a realistic approach that avoids the

uncertainty problem associated with meteorological drought forecasting.

In order to design the SDAP model, we classified the land surface factors that affect soil moisture during the non-rainfall period into four categories: vegetation, topographic, thermal, and water factors. Thermal increase is one of the core factors that increases

the risk of agricultural drought by causing loss of soil moisture due to

increased evaporation [26], while the vegetation factor delays the loss of

soil moisture by slowing the increase in surface heat [27]. The

topography factor is another important determinant of soil moisture [28].

The water content status is also a factor related to the soil moisture

remaining after the drought period [9]. These environmental factors,

which are used as the initial land conditions as a predictor for drought

forecasting [9], are regressed on the soil moisture after the non-rainfall

period and can then predict soil moisture during short-term drought [29].

In this regard, our model is designed to regress 15 input variables

(derived from four surface factors) to the soil moisture index using the

random forest (RF) algorithm.

The RF algorithm, proposed by Breiman [30], is an ensemble technique that averages the predictions of multiple decision trees to reduce overfitting and achieve sound regression performance. Furthermore, the

RF algorithm is advantageous for the analysis of large datasets; thus, it

is suitable for analysis involving satellite images. It generates a tree-

based regression function; hence, the scale adjustment between different

features is not necessary, which is a considerable advantage in cases

involving multiple features with different scales, as is the case in the

current study.
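
The scale independence of tree-based regression noted above can be illustrated with a minimal sketch. The code fits a single regression stump (one split of one decision tree, not a full random forest) and shows that multiplying a feature by a constant changes the split threshold but not the predictions. The function names `fit_stump` and `predict_stump`, the toy values, and the scaling factor are hypothetical and are not taken from the SDAP implementation.

```python
def fit_stump(xs, ys):
    """Find the single split on one feature that minimizes squared error.

    Returns (threshold, left_mean, right_mean).
    """
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal feature values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1:]


def predict_stump(x, stump):
    """Predict the mean target of the side of the split that x falls on."""
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


# Toy data: an index-like feature in [-1, 1] and a soil-moisture-like target.
index_values = [-0.2, 0.1, 0.3, 0.6, 0.8]
targets = [0.9, 0.8, 0.75, 0.3, 0.2]

# The same feature expressed on a reflectance-like scale (hypothetical factor).
scaled_values = [v * 10000 for v in index_values]

stump_raw = fit_stump(index_values, targets)
stump_scaled = fit_stump(scaled_values, targets)

pred_raw = [predict_stump(v, stump_raw) for v in index_values]
pred_scaled = [predict_stump(v, stump_scaled) for v in scaled_values]
print(pred_raw == pred_scaled)
```

Because a split only depends on the ordering of the feature values, rescaling moves the threshold along with the data and leaves each sample on the same side of the split.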

The novelty of this study is as follows. First, we used simple data: we derived the drought function from the selected training area in the years when drought occurred. In contrast, other studies (see [10,12,31–36]) trained on the entire study area or used multiple years of data. Second, once created, the drought function can be used repeatedly for the same season and local conditions. This drought function is based on the regression of soil moisture on the land surface and climatic conditions of the area during a non-rainfall period. Third, precipitation data were not used because a rainfall deficit was assumed. Fourth, we used topographical data, an important factor for soil moisture at the local scale, as input variables. Previously developed agricultural drought forecasting models [13,37–40] were mostly analyzed at the global scale and mainly used satellite imagery. In contrast, our SDAP model adds terrain data to satellite imagery so that the model fits the local scale.

To design the SDAP model, the following approach was used.

We first defined four surface factors that affect the maintenance and

decrease of the SMI during non-rainfall periods. We then selected input

variables corresponding to these categories of surface factors. We generated the drought function with the RF algorithm, using the input variables and the SMI three months later to train on agricultural drought. Additionally, we identified the order of feature importance among the input variables (features) in the drought training (regression). The drought function f(x) was then used to predict the SMI of the study area. We verified the

training and prediction performance of the drought function using the

actual SMI value.
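
The workflow described above (train a drought function with RF, rank feature importance, then predict the SMI of the study area) can be sketched with scikit-learn. This is a minimal illustration on synthetic data, not the SDAP implementation: the array sizes, the made-up relation between features and SMI, the 75/25 split, and the hyperparameters are all assumptions chosen for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for the training area: pixels x 15 surface variables.
n_pixels, n_features = 2000, 15
X_t = rng.uniform(0.0, 1.0, size=(n_pixels, n_features))

# Hypothetical "SMI three months later": driven by two features plus noise,
# clipped to the 0-1 range of the SMI.
noise = rng.normal(0.0, 0.02, size=n_pixels)
y = np.clip(0.6 * X_t[:, 0] + 0.3 * X_t[:, 1] + noise, 0.0, 1.0)

# Shuffle and split the samples into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X_t, y, test_size=0.25, shuffle=True, random_state=0
)

# The "drought function" f(x) is the fitted random forest regressor.
drought_function = RandomForestRegressor(n_estimators=50, random_state=0)
drought_function.fit(X_train, y_train)

# Verify training and test performance with R^2.
r2_train = r2_score(y_train, drought_function.predict(X_train))
r2_test = r2_score(y_test, drought_function.predict(X_test))
print(f"R2 train = {r2_train:.2f}, R2 test = {r2_test:.2f}")

# Order of feature importance, most important first.
ranking = np.argsort(drought_function.feature_importances_)[::-1]

# Apply the drought function to (synthetic) study-area pixels.
X_p = rng.uniform(0.0, 1.0, size=(500, n_features))
y_p = drought_function.predict(X_p)
```

In a real application, `X_t` and `X_p` would hold the 15 input variables of the training and study areas, flattened from raster grids to one row per pixel.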

2. Methodology

2.1. Study and Training Area

Our study area covered 12 administrative districts in the western

part of Korea, where irregular droughts have occurred in spring since

mid-2010. This area, located between 36°50’ N and 37°35’ N latitudes and 126°30’ E and 127°35’ E longitudes (Figure 1), was severely affected by droughts in the springs of 2014–2017.

Figure 1. Study and training areas for short-term agricultural drought

prediction.

Within the study area, the training area for learning drought was selected based on the following criteria. First, the area with the least precipitation should be selected to minimize the influence of meteorological factors. To that end, we reviewed all of the precipitation data [41] from the weather-observation points in the study area. Second, the area needs to include various types of land cover. In addition, its extent should include enough samples to train the drought function.

In this study, we used only a part of the study area as the training sample because, over the entire study area, the influence of meteorological effects cannot be controlled for and the representativeness of the drought function cannot be guaranteed. In addition, using the entire study area increases the number of training samples, which significantly increases the learning time and makes training inefficient.

From among the candidate areas for training drought, we extracted a square region of 7.5 km × 7.5 km (56 km², Figure 1) as the final training area. The study area (approximately 3575 km²) is 20 times larger than the training area.

2.2. Data and Trained Drought Period

We used Landsat-8 (Level 1) and the Shuttle Radar Topography

Mission (SRTM) 30 m DEM (ARC 1) data downloaded from the United

States Geological Survey (USGS) EarthExplorer [42]. In addition, the

land cover data for selecting the training area were retrieved from Korea’s Environmental Information Service [43]. The software for the analysis was ArcGIS Pro. The programming language was Python 3.6.1 on a 64-bit Windows 10 platform.

The drought used for training in our study was a spring drought of approximately 3 months (97 days), from March 19, 2017 to June 23, 2017. This period was determined by considering the dates of the Landsat-8 images, the beginning of the drought, and the time just before the drought ended due to rain. The spring 2017 drought was the worst since 2010, when precipitation was less than 25% of the normal annual precipitation in the study area.

2.3. Variables

We defined four categories of surface factors (i.e., vegetation, topographic, water, and thermal factors) that can affect the soil moisture during short-term drought periods based on previous

studies. We then selected 15 input variables corresponding to the

operational definitions of the four categories mentioned earlier. Table 1

presents these variables with their abbreviations.

Table 1. Feature categories and their input variables.

| Land surface factors | Input variables (15) | Abbreviation | Data |
|---|---|---|---|
| Vegetation ¹ | Enhanced vegetation index [44–46] | EVI | Band 2, 4, 5 |
| | Normalized difference vegetation index [45–47] | NDVI | Band 4, 5 |
| | Soil-adjusted vegetation index [45,46,48] | SAVI | Band 4, 5 |
| | Modified soil-adjusted vegetation index [46,49] | MSAVI | Band |
| Topography ² | Topographic wetness index [50] | TWI | SRTM DEM |
| | Slope [51,52] | | SRTM DEM |
| | Aspect [51,53] | | SRTM DEM |
| Water ³ | Normalized difference moisture index [46,54] | NDMI | Band 5, 6 |
| | Modification of normalized difference water index [55] | MNDWI | Band 3, 6 |
| | Moisture stress index [56] | MSI | Band 5, 6 |
| Thermal ⁴ | Near infrared [57] | NIR | Band 5 |
| | Short-wavelength infrared 1 [57] | SWIR1 | Band 6 |
| | Short-wavelength infrared 2 [57] | SWIR2 | Band 7 |
| | Thermal infrared 1 [57] | TIRS1 | Band 10 |
| | Thermal infrared 2 [57] | TIRS2 | Band 11 |

¹ This factor limits soil moisture loss; ² this factor affects soil moisture content; ³ this factor is related to surface water; ⁴ this factor is related to soil moisture evaporation.

We calculated all the input variables as follows, according to the

USGS guidelines [46] and references, as shown in Table 1. The thermal

factor consists of the five bands of Landsat-8, without the calculation.

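$$\mathrm{EVI} = 2.5 \times \frac{\mathrm{NIR} - \mathrm{Red}}{\mathrm{NIR} + 6.0 \times \mathrm{Red} - 7.5 \times \mathrm{Blue} + 1.0} \tag{1}$$

$$\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{Red}}{\mathrm{NIR} + \mathrm{Red}} \tag{2}$$

$$\mathrm{SAVI} = \frac{\mathrm{NIR} - \mathrm{Red}}{\mathrm{NIR} + \mathrm{Red} + 0.5} \times 1.5 \tag{3}$$

$$\mathrm{MSAVI} = \frac{2 \times \mathrm{NIR} + 1 - \sqrt{(2 \times \mathrm{NIR} + 1)^2 - 8 \times (\mathrm{NIR} - \mathrm{Red})}}{2} \tag{4}$$

$$\mathrm{TWI} = \ln\left(\frac{a}{\tan b}\right) \tag{5}$$

where a (m2) is the upstream contributing area and b is the slope.

$$\mathrm{NDMI} = \frac{\mathrm{NIR} - \mathrm{SWIR1}}{\mathrm{NIR} + \mathrm{SWIR1}} \tag{6}$$

$$\mathrm{MNDWI} = \frac{\mathrm{Green} - \mathrm{SWIR1}}{\mathrm{Green} + \mathrm{SWIR1}} \tag{7}$$

where Green is a green wavelength band, which is band 3 in the Landsat-8 images.

$$\mathrm{MSI} = \frac{\mathrm{SWIR1}}{\mathrm{NIR}} \tag{8}$$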
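The index definitions listed in Table 1 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' ArcGIS workflow: the function names, the sample pixel values, and the assumption that reflectances are scaled to the 0–1 range and that the slope is given in radians are all ours.

```python
import math

def evi(nir, red, blue):
    """Enhanced vegetation index with the coefficients used in this study."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)

def ndvi(nir, red):
    """Normalized difference vegetation index."""
    return (nir - red) / (nir + red)

def savi(nir, red, soil_factor=0.5):
    """Soil-adjusted vegetation index with soil factor L = 0.5."""
    return (nir - red) / (nir + red + soil_factor) * (1.0 + soil_factor)

def msavi(nir, red):
    """Modified soil-adjusted vegetation index."""
    root = math.sqrt((2.0 * nir + 1.0) ** 2 - 8.0 * (nir - red))
    return (2.0 * nir + 1.0 - root) / 2.0

def ndmi(nir, swir1):
    """Normalized difference moisture index."""
    return (nir - swir1) / (nir + swir1)

def mndwi(green, swir1):
    """Modification of normalized difference water index."""
    return (green - swir1) / (green + swir1)

def msi(nir, swir1):
    """Moisture stress index (Landsat-8 band 6 over band 5)."""
    return swir1 / nir

def twi(upstream_area_m2, slope_radians):
    """Topographic wetness index ln(a / tan b); slope in radians is assumed."""
    return math.log(upstream_area_m2 / math.tan(slope_radians))

# Hypothetical surface reflectances (0-1 scale) for a single pixel.
pixel = {"blue": 0.05, "green": 0.08, "red": 0.10, "nir": 0.40, "swir1": 0.20}

indices = {
    "EVI": evi(pixel["nir"], pixel["red"], pixel["blue"]),
    "NDVI": ndvi(pixel["nir"], pixel["red"]),
    "SAVI": savi(pixel["nir"], pixel["red"]),
    "MSAVI": msavi(pixel["nir"], pixel["red"]),
    "NDMI": ndmi(pixel["nir"], pixel["swir1"]),
    "MNDWI": mndwi(pixel["green"], pixel["swir1"]),
    "MSI": msi(pixel["nir"], pixel["swir1"]),
}
```

If the bands are stored as integers scaled to 0–10,000, they would need to be rescaled before SAVI and EVI are computed, because the constants 0.5 and 1.0 presume reflectance in the 0–1 range.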

Soil moisture can be estimated using various methods and indices,

such as Soil Moisture Anomaly (SMA), Evapotranspiration Deficit

Index (ETDI), Soil Moisture Deficit Index (SMDI), and Soil Water

Storage (SWS) [45]. Accordingly, the appropriate indices must be

selected depending on the purpose of the drought analysis [58]. We reviewed several methods of obtaining the SMI for this study and found that the SMI derived by Sandholt et al. [59], calculated from NDVI and LST using moderate-resolution satellite images (such as Landsat-8), was the most appropriate for short-term drought prediction between spring and autumn. Welikhe et al. [56] suggested that this SMI calculation is

particularly advantageous in estimating soil moisture during growing

periods. The results of their study indicated that this SMI showed the

highest correlation with real soil moisture at a depth of 20 cm. The

formula used to calculate the SMI is presented as follows [59,60]:

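$$\mathrm{SMI} = \frac{T_{s\,\max} - T_s}{T_{s\,\max} - T_{s\,\min}} \tag{9}$$

where Ts is the observed land surface temperature, and Ts max and Ts min are the maximum and minimum surface temperature observations for a given NDVI.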
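As a small worked example of the SMI formula, the sketch below evaluates it for three hypothetical surface temperatures. The function name and the temperature values are illustrative assumptions; in practice Ts max and Ts min would be derived for each NDVI value from the image itself.

```python
def soil_moisture_index(ts, ts_min, ts_max):
    """SMI = (Ts_max - Ts) / (Ts_max - Ts_min) for one pixel."""
    if ts_max <= ts_min:
        raise ValueError("ts_max must be greater than ts_min")
    return (ts_max - ts) / (ts_max - ts_min)

# Hypothetical temperature bounds (kelvin) for one NDVI value.
TS_MIN, TS_MAX = 290.0, 320.0

# The coolest pixel gets SMI 1 (wet), the hottest gets SMI 0 (dry).
smi_values = [soil_moisture_index(ts, TS_MIN, TS_MAX) for ts in (290.0, 305.0, 320.0)]
```

The index therefore decreases linearly from 1 to 0 as the surface temperature rises from the minimum to the maximum observed for that NDVI.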

Subsequently, we constructed a drought function using the

aforementioned 15 input variables of the training area on March 19, 2017

and the SMI output variables on June 23, 2017 via RF. Figure 2 shows

the structure of the SDAP model including generation of the drought

function and prediction of the drought-severe area.


2.4. Drought Function

We performed machine learning via RF, using 62,500 (250 × 250) data samples from the training area, to generate the drought function f(x). After 100 shuffles, the samples were split into training (75%, n = 46,875) and test (25%, n = 15,625) datasets. We then used the RandomForestRegressor class of the ensemble module of the scikit-learn Python library for machine learning. When fitting f(x), max_depth is the most important parameter for preventing overfitting or underfitting during training. The optimal max_depth was 14; it was found by tuning to maximize the coefficient of determination (R2) of the training dataset while minimizing the R2 difference between the training and test datasets.
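The split sizes and the tuning rule described above can be sketched with the standard library alone. This is only one possible reading of the tuning criterion: the helper names, the `max_gap` tolerance, and the example scores are hypothetical and are not results from this study.

```python
import random

N_SAMPLES = 62_500      # 250 x 250 samples from the training area
TEST_FRACTION = 0.25

def shuffle_and_split(indices, test_fraction, n_shuffles, seed):
    """Shuffle sample indices n_shuffles times, then split into train and test."""
    rng = random.Random(seed)
    shuffled = list(indices)
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def pick_max_depth(scores, max_gap):
    """Pick the depth with the highest training R2 among depths whose
    train/test R2 gap stays within max_gap (an assumed tolerance).

    scores maps max_depth -> (r2_train, r2_test).
    """
    eligible = {d: s for d, s in scores.items() if s[0] - s[1] <= max_gap}
    if not eligible:
        raise ValueError("no depth satisfies the gap tolerance")
    return max(eligible, key=lambda d: eligible[d][0])

train_idx, test_idx = shuffle_and_split(
    range(N_SAMPLES), TEST_FRACTION, n_shuffles=100, seed=0
)

# Made-up scores for illustration only; they are NOT values from the study.
example_scores = {
    4: (0.70, 0.69),
    8: (0.85, 0.82),
    16: (0.93, 0.89),
    32: (0.97, 0.88),
}
chosen_depth = pick_max_depth(example_scores, max_gap=0.05)
```

In this toy example the deepest tree has the highest training R2 but is rejected because its gap to the test R2 exceeds the tolerance, which is the overfitting pattern that limiting max_depth is meant to avoid.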

We verified the performance of f(x) using root mean square error

(RMSE), normalized RMSE (NRMSE), mean absolute error (MAE) and

R2. In addition, we confirmed the spatial distribution of error by mapping.

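$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \tag{10}$$

$$\mathrm{NRMSE} = \frac{\mathrm{RMSE}}{y_{\max} - y_{\min}} \times 100\ (\%) \tag{11}$$

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i| \tag{12}$$

where y is the actual SMI, ŷ is the predicted SMI, ymax is the maximum actual SMI, and ymin is the minimum actual SMI.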
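The three error measures can be written directly from their definitions. The sketch below is a plain-Python illustration with a hypothetical three-sample dataset; the function names are ours.

```python
import math

def rmse(actual, predicted):
    """Root mean square error."""
    n = len(actual)
    return math.sqrt(sum((y - yh) ** 2 for y, yh in zip(actual, predicted)) / n)

def nrmse_percent(actual, predicted, y_min=None, y_max=None):
    """RMSE normalized by the range of the actual values, in percent."""
    y_min = min(actual) if y_min is None else y_min
    y_max = max(actual) if y_max is None else y_max
    return rmse(actual, predicted) / (y_max - y_min) * 100.0

def mae(actual, predicted):
    """Mean absolute error."""
    n = len(actual)
    return sum(abs(y - yh) for y, yh in zip(actual, predicted)) / n

# Hypothetical actual and predicted SMI values.
actual = [0.0, 0.5, 1.0]
predicted = [0.1, 0.4, 1.0]

errors = {
    "RMSE": rmse(actual, predicted),
    "NRMSE": nrmse_percent(actual, predicted),
    "MAE": mae(actual, predicted),
}
```

When the actual SMI spans the full range from 0 to 1, the NRMSE in percent is simply 100 times the RMSE.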

Figure 2. Structure of the severe drought area prediction (SDAP) model. xti: input variable of the training area; y: actual soil moisture index (SMI); xpi: input variable of the study area; and ŷp: predicted SMI of the study area.

2.5. Feature Importance

The importance of features in the tree-based RF regression can be confirmed after fitting, using the feature_importances_ attribute provided by the ensemble module of the scikit-learn library. We found the effect of each input variable on the retention and loss of soil moisture during non-rainfall periods by sorting the input variables according to their individual importance using two functions. In addition, the feature importances within each category (i.e., vegetation, topography, water, and thermal factors) were summed to obtain the order of category importance.
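The sorting and per-category summation can be sketched as follows. The importance values here are uniform placeholders (1/15 each), not results from this study; with a fitted scikit-learn model they would be read from its `feature_importances_` attribute. The dictionary and function names are illustrative.

```python
# Category of each of the 15 input variables, following Table 1.
CATEGORY_OF = {
    "EVI": "vegetation", "NDVI": "vegetation", "SAVI": "vegetation", "MSAVI": "vegetation",
    "TWI": "topography", "Slope": "topography", "Aspect": "topography",
    "NDMI": "water", "MNDWI": "water", "MSI": "water",
    "NIR": "thermal", "SWIR1": "thermal", "SWIR2": "thermal",
    "TIRS1": "thermal", "TIRS2": "thermal",
}

def rank_features(importances):
    """Sort (feature, importance) pairs from most to least important."""
    return sorted(importances.items(), key=lambda item: item[1], reverse=True)

def category_importance(importances):
    """Sum the feature importances within each land surface category."""
    totals = {}
    for feature, value in importances.items():
        category = CATEGORY_OF[feature]
        totals[category] = totals.get(category, 0.0) + value
    return totals

# Placeholder importances that sum to 1, as RF importances do.
placeholder = {feature: 1.0 / len(CATEGORY_OF) for feature in CATEGORY_OF}

ranked = rank_features(placeholder)
by_category = category_importance(placeholder)
```

Because the categories contain different numbers of variables, a category total reflects both how informative its variables are and how many of them there are.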

2.6. SMI Prediction on the Study Area

We estimated the SMI of the study area for June 2015, using the

data of 14 March 2015, by applying the drought function to predict the

agricultural drought of 18 June 2015. These are suitable data for

predicting agricultural drought because a real drought happened in that

area between March and July in 2015. Ideally, the SDAP model would be verified against a drought occurring after 2017, the year of the data used to build the drought function. However, no drought occurred after 2017; hence, the closest past case was used as an alternative. As previously mentioned, prediction in this study refers to conditions some months after the input date; thus, the calendar year does not matter.

We intended to use the Landsat-8 image of 18 June 2015 (97 days after 14 March 2015) for verification, but because that image contained many clouds, we instead used the image of 4 July 2015 (113 days after 14 March 2015). For reference, the drought function was trained on a drought period of 97 days.
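The day counts quoted here can be checked with simple date arithmetic. The sketch assumes that the counts include both the start and end dates, which is our inference from the numbers rather than something the text states.

```python
from datetime import date

def days_inclusive(start, end):
    """Number of days from start to end, counting both endpoints."""
    return (end - start).days + 1

# Training period of the drought function (2017 images).
training_span = days_inclusive(date(2017, 3, 19), date(2017, 6, 23))

# Intended and actual verification dates for the 2015 drought.
intended_span = days_inclusive(date(2015, 3, 14), date(2015, 6, 18))
actual_span = days_inclusive(date(2015, 3, 14), date(2015, 7, 4))
```

The 16-day difference between the intended and actual verification spans matches one Landsat-8 revisit cycle.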

The final predicted SMI map (the agricultural drought map) of the study area was obtained by the following process (see the prediction section in Figure 2). We generated 400,000 random points in the study area. The predicted SMI values of those random points (ŷp) were obtained via f(x) using the input variables (xpi) of 14 March 2015. The final SMI map was then completed by interpolating the SMI values of the random points. We validated the predicted SMI values of the random points against the actual SMI calculated using the Landsat-8 image (4 July 2015) and SRTM DEM.
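The point-based prediction and interpolation steps can be sketched as below. The text does not state which interpolation method was used, so inverse distance weighting (IDW) is swapped in here purely as an assumption; the stand-in predictor, the unit-square extent, and the point count are likewise hypothetical.

```python
import math
import random

def random_points(n, x_range, y_range, seed=0):
    """Draw n uniformly distributed random points inside a bounding box."""
    rng = random.Random(seed)
    return [(rng.uniform(*x_range), rng.uniform(*y_range)) for _ in range(n)]

def stand_in_drought_function(x, y):
    """Placeholder for the trained f(x); returns a value clipped to [0, 1]."""
    return min(1.0, max(0.0, 0.5 + 0.5 * math.sin(3.0 * x) * math.cos(3.0 * y)))

def idw(x, y, points, values, power=2.0):
    """Inverse-distance-weighted estimate at (x, y) from scattered samples."""
    weighted_sum = 0.0
    weight_total = 0.0
    for (px, py), value in zip(points, values):
        distance = math.hypot(x - px, y - py)
        if distance == 0.0:
            return value  # exactly on a sample point
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total

# A small demonstration: 200 points in a unit square instead of 400,000.
points = random_points(200, (0.0, 1.0), (0.0, 1.0))
predicted_smi = [stand_in_drought_function(px, py) for px, py in points]

# Interpolate the point predictions onto a coarse 5 x 5 grid.
grid = [[idw(col / 4.0, row / 4.0, points, predicted_smi) for col in range(5)]
        for row in range(5)]
```

With too few points, each grid cell is dominated by distant samples and the spatial pattern is smoothed away, which is consistent with the need for a sufficiently dense point set.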

3. Results

3.1. Training Performance

The training performance and error distribution of the drought function are shown in Figure 3 and Table 2. The training performance was R2 = 0.91, and the error distribution was not concentrated on a specific land cover. Thus, the drought function derived from the training area was considered able to represent drought in this area.

Figure 3. Verification of the training performance and error distribution.

Table 2. Evaluation of the training performance of the drought function.

| RMSE | NRMSE | MAE | R2 | Max. SMI 1 | Min. SMI 2 | Max. Error |
|---|---|---|---|---|---|---|
| 0.05294 | 5.29% | 0.03980 | 0.91 | 0.97940 | 0.05684 | 0.30352 |

1 The actual maximum SMI value is 1; 2 The actual minimum SMI value is 0.

3.2. Feature Importance

Figure 4 shows the importance of each input variable for f(x). Table 3 lists the total importance by category (i.e., thermal, water, topography, and vegetation features). The results showed that the thermal factor was the most important, although SWIR1 alone among the thermal variables had low explanatory power for soil moisture. In particular, among the thermal images of Landsat-8, TIRS1 was identified as a better predictor of soil moisture than TIRS2. The slope was the most important of the topographic features. The water and vegetation features were of similar importance.

Figure 4. Feature importance of drought function.

Table 3. The importance of land surface factor on drought function.

| Land Surface Factors | Importance |
|---|---|
| Thermal | 0.659 |
| Topography | 0.238 |
| Water | 0.059 |
| Vegetation | 0.044 |

3.3. SMI Prediction and Validation

Figure 5a illustrates the map of agricultural drought after a 3-month (97-day) non-rainfall period, predicted using the input variables of 14 March 2015 and the drought function. Figure 5b shows the actual SMI map calculated from the Landsat-8 image of 4 July 2015, 113 days after 14 March 2015.

Figure 5. Final predicted SMI map and actual SMI.

Figure 6 shows a scatter plot of the predicted SMI and the actual SMI for validation. The validation yielded R2 = 0.58, which was lower than the training performance (R2 = 0.91). However, the predicted map clearly shows the potential severe drought area when compared with the actual SMI in Figure 5b.

Figure 6. Scatter plot between the actual and the predicted SMI of the

study area.

4. Discussion

4.1. General Information on the SDAP Model

The appropriate training area, the selection of appropriate

drought period, and the number of suitable random points to cover the

study area are important in utilizing the SDAP model. Therefore, the

following factors should be considered in this model. First, in the training

area selection, similar proportions of different land cover types should

be chosen. If a particular land cover type is dominant in the training area,

the model is prone to errors in areas under the less representative land

cover types. In addition, irrigated areas should be avoided to insulate

against human influence on the output variable SMI.

Second, the SDAP model uses only land surface factors; hence,

it is recommended to develop the drought function using data from the

year in which the precipitation was the lowest (i.e., the drought year for

modeling should have had the least rainfall possible for minimizing the

interferences of meteorological factors). In this study, the short-term

drought function was trained using data from March to June (97 days).

However, drought periods can be freely selected within a one-year period

(e.g., May–July), depending on the application, region, and country. This means that the model is transferable to other locations with different conditions by generating a drought function from the local data of the study area to be analyzed.

Third, enough random points should be allocated for the

estimation and subsequent generation of the prediction map after

interpolation. Given that the SMI is a float number between 0 and 1, if

few points are available for estimation, the SMI pattern will not appear

after the interpolation. In fact, providing a threshold for the number of

appropriate points is difficult (it will vary depending on the scope of the

study area and the variety of land cover). However, increasing the

number of points by an adequate margin is recommended when the

results do not exhibit a clear pattern. In this study, we confirmed suitable

pattern generation for more than 400,000 points, whereas 300,000 points

were insufficient to show the trend of agricultural drought.

4.2. Coverage of the Trained Drought Function

This study shows that the function trained by the RF algorithm achieved R2 = 0.58 when predicting over an area approximately 20 times larger than the training area. Given regional deviations, it is difficult to determine the coverage that the trained function can predict; however, prediction accuracy is negatively correlated with the distance from the training area and decreases as the prediction area increases.

We conducted further analysis to verify the predictable coverage.

We specifically predicted the SMI of an area approximately 150 times larger than the training area and obtained a decrease in R2 from

0.58 to 0.39. As shown in Figure 7a, a region with a sharp decrease in

correlation between the actual and predicted SMI was observed. The

distribution of these samples was confirmed on the map. Figure 7b shows

that the highest error mostly occurred far from the training area in largely

heterogeneous areas. In contrast, the samples with a low correlation

occurring near the training area were mostly identified as water bodies,

and the error can be caused by the lack of sufficient water samples for

training.

Figure 7. (a) Relation between the actual and predicted SMI over an area

150 times (8249 km2) larger than the training area; (b) Distance from the

training area and distribution of erroneous samples.

4.3. SDAP Model Advantages

4.3.1. Prediction of Severe Drought Areas

Information on drought severity is the most practical and

important information for mitigating an upcoming drought [61]. The

objective of this study was to predict the distribution of soil moisture

assuming non-rainfall conditions. According to the National Drought Mitigation Center (NDMC), it is necessary to plan for drought before it begins, confirm the priority of water dependence for agriculture and the community, and apply the plan step by step as soon as water begins to become insufficient [62]. In this step, the SDAP model can help by providing spatial information on droughts in advance, allowing effective and planned resource allocation to reduce the impact of upcoming droughts.

The SDAP model can show future severe agricultural drought

areas through non-rainfall periods because the method can reduce the

uncertainty by separating two models (i.e., meteorological and

agricultural droughts) with different features in the analytical results [21].

Under non-rainfall conditions, the SDAP model works on the assumption that meteorological drought forecasting has already been carried out. Thus, the SDAP model can analyze drought without considering meteorological data.

However, the absence of meteorological data from the analysis does not mean that we have excluded climate factors. Machine learning on the growth of soil drought learns the trend caused by climate and seasonal changes in the area. Therefore, the farther a location is from the training area, the more the climate and weather differ; hence, the prediction error of the SMI increases.

We found that one of the studies [13] predicted soil moisture

using hybrid machine learning without climate data, with a similar

concept to the SDAP model. However, that study cannot show the spatial

distribution of the SMI because of the usage of the actual measured data

from the soil of only seven points, not the remote sensing data. In fact,

the prediction by [13] aimed to only predict the soil moisture of seven

trained points using training data (corresponding to the validation of the

performance of the drought function in our study); hence, this is just an

assessment of the training performance, and not prediction. However, in

our study, the training data were only used to generate the drought

function, and different data from another year were used for prediction.

Another study [22], using only meteorological data, cannot be used in

practical drought mitigation plans because of the exclusion of the spatial

distribution of agricultural drought in the prediction results.

In contrast, the results of the SDAP model can show clear SMI

trends including the severe area of the SMI (Figure 5). The mitigation of short-term drought requires SMI spatial trend information rather than

accurate SMI values [9,63]. In this model, although the SMI error

(RMSE = 0.382, MAE = 0.375) of the predicted study area increased

compared to the training performance of the drought function (RMSE =

0.052, MAE = 0.039), our model has achieved the aim sufficiently

because it can show the agricultural drought trend clearly.

4.3.2. Feature Importance

We found that the thermal factor is one of the major factors affecting soil moisture during a drought period, confirming the findings of Sruthi and Aslam [27], who noted that increased temperature reduces soil moisture. Furthermore, TIRS1 in the thermal factor was the most important feature in the drought function (Table 3). However, one should be cautious when interpreting feature importance. For example, vegetation features were shown to be of relatively low importance in our model. However, according to Sruthi and Aslam, vegetation and temperature are strongly correlated with each other [27]; hence, vegetation might play an important role in preserving soil moisture. Therefore, feature importance should be carefully considered in light of other studies.

We obtained this variable importance information because we

used the RF algorithm for drought training. The RF algorithm has the

advantage of providing variable importance; hence, it is also used to

extract important variables and remove unnecessary variables when

creating a model [64]. One case involved the estimation of the relative

importance of variables related to soil hydrological properties in the field

of hydrology using this importance function [65].

However, feature importance does not indicate whether the correlation between a feature and the SMI is negative or positive. Therefore, it is also helpful to refer to multi-linear regression coefficients to understand feature importance. Comparing the feature importance of drought functions calculated from various regions and periods can provide important insights into drought studies related to soil moisture.

4.4. Limitations and Applications of the SDAP Model

The SMI predicted using machine learning applied to satellite images with a resolution of 30 m is suitable drought information for planning the mitigation of short-term drought at the regional level because the local spatial distribution is acquired. However, some general remote sensing limitations exist, such as the difficulty in obtaining data for the date to be analyzed because of clouds or the revisit cycle of Landsat-8. This limitation could be addressed by using other satellite images with a higher temporal resolution, although this solution remains to be confirmed in further studies. Along with the general limitations of remote sensing, the SDAP model has the following limitations:

First, the SDAP model applies only to the prediction of short-

term droughts, within several months, because the SMI is a suitable

measure for that timescale [56]. In practice, the soil will become dry,

regardless of the surface state if non-rainfall conditions persist for longer

periods. Therefore, the SDAP model is suitable for short-term droughts

that can be mitigated by human action within a few months.

Second, the prediction of droughts may be difficult if the area has

no drought experience or different seasons, because this model is based

on learning from past droughts. Therefore, various drought functions by

period or season must be considered to make a suitable prediction of the

situation.

Third, this model must be used within a sequential approach in which each expert group analyzes its own type of drought to find solutions. Thus, it is not suitable for an integrated method of calculating meteorological, agricultural, and hydrological droughts at one time because this model predicts the trend of agricultural drought change after the meteorological drought analysis has been carried out. For example, the spatial information on agricultural drought, which is the result of this model, is then subject to a primary review against the corresponding hydrological drought. After all the analyses have been conducted and the areas of drought concern have been identified, priority plans for water reserves (e.g., water demand control) and water allocation for these areas should be established. Therefore, the proposed model is only suitable for preparing for short-term droughts and finding focused solutions by each expert group, rather than performing an integrated analysis of various models with high uncertainty.

5. Conclusions

Agricultural drought is a disaster that must be managed

effectively as it directly affects food security. For this reason, several

models have been developed to predict agricultural drought. However,

regardless of how good a model is, inaccuracies arise in the prediction of

droughts, due to meteorological anomalies where the model is based on

repeated patterns or historical precipitation data. The proposed SDAP model was hence developed by accepting those uncertainties, rather than attempting to overcome meteorological uncertainties. This SDAP model does not

depend on precipitation data because it predicts potential drought-severe

areas under the assumption that rainfall-deficit conditions already exist.

In addition, relying on other existing models can be catastrophic in the event that an unpredicted drought occurs. In contrast, because our model makes its prediction under the condition of no rainfall, it benefits society even if no drought occurs. Although our model is designed for short-term drought, the accumulation of these predictions also helps identify areas that are susceptible to drought so that mitigation plans can be prepared from a long-term perspective.

Given that the ultimate objective of predicting droughts is to develop mitigation plans, the model must be simple, practical, and useful, as the SDAP model is. This model is easy to understand because it uses a regression algorithm, in contrast to other huge prediction systems.

Further, the analysis of various case studies for many regions and

drought functions in SDAP model can provide meaningful insights into

the key factors associated with drought. At present, we are working on a

comparison study of machine learning algorithms to improve the SDAP

model as well as a case study involving the application of the SDAP

model.

MODEL ALGORITHM

Subtitle: Tree vs. Network: Which is better Machine Learning Algorithm

for Regression Prediction when using Remote Sensing Data?

1. Introduction

Random forest (RF), an ensemble technique based on decision trees, is the most widely used classifier for machine learning on remote sensing (RS) and geographic data [66]. This is because the importance of variables for parameter selection can be easily derived when using RF [64], and hence obtaining relevant variables from high-dimensional data becomes straightforward [67]. Furthermore, there are many studies [68–73] on the relation between RF and RS for classification, and there is even an RS study [36] that does not fully explain the appropriateness of using RF, as RF is already known to be suitable for RS applications. However, artificial neural networks (ANNs) seem to be the most widely used in other fields. In fact, one study reported more accurate results with ANNs than with RF and even stated that RF has been neglected in some fields, such as building research [74]. Nevertheless, RF algorithms remain dominant in RS along with ANNs.


In addition, some studies have already confirmed that RF can outperform ANNs as a machine learning approach for RS data. Rodriguez-Galiano et al. [75] obtained better performance from RF than from ANNs when mapping mineral prospects by classification using Landsat images. Hayes et al. [76] found that an RF classifier of land cover combining RS data with subsidiary data is effective and very accurate. However, such studies were restricted to simple comparisons of RF and ANN classification. Unlike similar comparative studies [73,75,77], which only evaluated algorithm performance, we aimed to comprehensively unveil the relation between RS data characteristics and training performance based on RF and a basic ANN architecture, the multi-layer perceptron (MLP) [78]. Furthermore, we sought to identify whether RF or MLP regression prediction performs better on RS data using the severe drought area prediction (SDAP) model introduced in our previous study. Here, RS data means reflectance, elevation, and map index values, and we considered regression problems exclusively. Other problems, such as object detection in satellite images, which is implemented using convolutional neural networks [79], are beyond the scope of this study.

Most drought prediction models are based on precipitation data

regardless of the type of drought because agricultural droughts (soil

drought) and hydrologic drought occur sequentially after meteorological

droughts [80]; eventually, these lead to a socio-economic drought. In

addition, droughts generally occur by the interruption of periodic

weather patterns that disrupt hydrological circulation [81]. However, in recent years, unprecedented droughts have been occurring increasingly often because climate change causes precipitation to deviate more frequently from regular patterns. Thus, predicting droughts is becoming more

challenging despite scientific progress.

Despite the uncertainty in precipitation forecasts, most drought prediction models published from 2007 to 2017 (see [23]) still mainly rely on patterns from rainfall or precipitation data. Consequently, these models require either massive weather data or large systems observed over several years or even decades, sometimes on a global scale (e.g., [13,37–40]). Thus, accurate drought prediction usually depends on the availability of huge historical data records. Moreover, semi-global analyses are sometimes unsuitable in scale for implementing regional drought mitigation plans. Furthermore, even if a model provides a very exact drought probability, it could be insufficient for devising mitigation policies when geospatial data are not used (see [22]).

When developing the SDAP model, we focused on spatially predicting short-term drought without using precipitation data; specifically, we dealt with soil drought distribution (agricultural drought), as it is directly related to food security [8]. The period considered, several months, falls between the timescales of weather and climate, and precipitation over it is almost impossible to predict [82]. Thus, short-term drought prediction based on precipitation is unrealistic.

Unlike drought probability methods, the SDAP is not a binary

(yes/no) drought prediction model but provides the spatial distribution of

soil drought assuming 3 months of no rainfall [8]. This model is intended

to give information for allocating water resources with priority to areas

where soil droughts are predicted to be more severe by assuming

meteorological drought. The structure of SDAP is illustrated in Figure 1.

Figure 1. SDAP model structure and analysis sequence in this study.

SDAP requires 15 input variables and 1 output variable [8]. The

inputs were selected based on existing studies [26–28] related to the loss

and maintenance of soil moisture during meteorological droughts and

correspond to four surface factors (i.e., vegetation, water, thermal, and

topographic factor). The output, the soil moisture index (SMI), is one of

the representative indexes for soil drought [45]. In this study, we compared the performance of the SDAP algorithm by implementing regression with either RF or MLP, following the analysis sequence itemized with circled numbers in Figure 1.

2. Methodology

2.1. Study Area and Training Area for Supervised Learning

This study covers an area (~3,575 km2) of the southern part of

Korea's metropolitan area, which is located between the longitudes

126°30'–127°30' and latitudes 36°30'–37°30' (Figure 2). This area has

historically been stress-free from drought but experienced unprecedented

severe spring drought in 2015–2017 [83]. Korean experts on climate

change consider that the abnormal droughts may be increasing gradually.

Hence, drought in this area should be predicted in a timely manner to

allocate water in advance, especially when unexpected meteorological

droughts occur. Thus, this area was suitable for applying the SDAP

model.

Figure 2. Study and training area to estimate soil drought distribution.

The training area (~56 km2) for supervised learning was part of the study area (Figure 2), and the drought period selected for training was March–June 2017 (3 months), when the spring drought was the worst during 2015–2017 in the study area; a more severe drought gives a better prediction because it implies less intervention of meteorological factors [8].

Although the size or shape of the training area is not important,

it is essential to include a variety of land use, exclude human intervention

factors such as irrigation, and have sufficient data to learn natural

phenomena [8]. We used 62,500 samples from the training area for learning the drought; in our previous study [8], the predictive performance declined if fewer than 50,000 samples were available.

Non-rainfall of the samples must be guaranteed for good training

because weather conditions can vary locally. Thus, we thoroughly

reviewed rainfall data from all weather stations in the study area and

selected the training area near a station where drought was predominant

[8]. This procedure for selecting the training area ensures non-rainfall, and using the training area instead of the whole study area saves time.

2.2. Data description and Processing

Features (15 input variables) from the 62,500 samples were

extracted using a Landsat-8 image from March 19, 2017 and the digital

elevation model of the Shuttle Radar Topography Mission. The SMI was

then calculated using a Landsat-8 image from June 23, 2017. All data

were downloaded from the EarthExplorer site [42]. The details and

calculation formulas for the variables are listed in Table 1.

The programming language for the machine learning

implementation was Python 3.6 running on 64-bit Microsoft Windows

10. RF was implemented using RandomForestRegressor from the Scikit-

learn library, and MLP was implemented using the Keras library. ArcGIS

Pro was used to calculate the image index map and topography attributes.

The 62,500 samples were divided into 46,875 (75%) samples for training

datasets and 15,625 (25%) for test datasets after 100 shuffles. In addition,

6875 samples (15% of the training dataset) were used as the validation

dataset for the MLP.

Table 1. Data description

| Surface factor | Variable | Symbol | Formula | Value type | Data type | Valid range |
|---|---|---|---|---|---|---|
| Vegetation | Enhanced Vegetation Index | EVI [44] | 2.5 × ((NIR – Red)/(NIR + 6.0 × Red – 7.5 × Blue + 1)) | Float | Index | -1 to 1 |
| | Normalized Difference Vegetation Index | NDVI [47] | (NIR – Red)/(NIR + Red) | Float | Index | -1 to 1 |
| | Soil-Adjusted Vegetation Index | SAVI [48] | ((NIR – Red)/(NIR + Red + 0.5)) × (1 + 0.5) | Float | Index | -1 to 1 |
| | Modified Soil-Adjusted Vegetation Index | MSAVI [49] | (2 × NIR + 1 – sqrt((2 × NIR + 1)^2 – 8 × (NIR – Red)))/2 | Float | Index | -1 to 1 |
| Water | Normalized Difference Moisture Index | NDMI [84] | (NIR – SWIR1)/(NIR + SWIR1) | Float | Index | -1 to 1 |
| | Modification of Normalized Difference Water Index | MNDWI [55] | (Green – SWIR1)/(Green + SWIR1) | Float | Index | -1 to 1 |
| | Moisture Stress Index | MSI [56] | MidIR/NIR | Float | Index | -1 to 1 |
| Thermal [57] | Near Infrared | NIR | 0.851–0.879 µm | Integer | Reflectance | 0 to 10,000 |
| | Short-Wavelength Infrared 1 | SWIR1 | 1.566–1.651 µm | Integer | Reflectance | 0 to 10,000 |
| | Short-Wavelength Infrared 2 | SWIR2 | 2.107–2.294 µm | Integer | Reflectance | 0 to 10,000 |
| | Thermal Infrared 1 | TIRS1 | 10.60–11.19 µm | Integer | Reflectance | 0 to 10,000 |
| | Thermal Infrared 2 | TIRS2 | 11.50–12.51 µm | Integer | Reflectance | 0 to 10,000 |
| Topography | Topographic Wetness Index | TWI [50] | ln(a/tan b) | Float | Index | 0 to 50 |
| | Slope | - | Degree of slope | Float | Degree | 0 to 90 |
| | Aspect | - | Degree of aspect | Float | Degree | -1 to 360 |
| Target | Soil moisture index | SMI [59] | (Tmax – T)/(Tmax – Tmin) | Float | Index | 0 to 1 |

2.3. Machine learning algorithm for regression prediction

2.3.1. Random forest

The RF algorithm proposed by Breiman [30] is an ensemble technique that uses the average of multiple decision trees to establish a regression model. Two parameters are the most important for performing RF regression, namely max_features, the number of variables to be randomly selected for splitting at a node, and max_depth, the depth of the tree. Figure 3 illustrates a regression tree with max_features set to 3 and max_depth set to 4 to show the RF tree structure using our study data.

Figure 3. Structure of RF tree for regression.

The procedure from training to prediction using RF regression can be summarized as follows [73,74,85]. First, bootstrap samples are drawn from the training data. Then, a regression tree is grown for each bootstrap sample by randomly choosing a predefined number of variables (max_features) to consider for the split at each node, until there are no further splits or until a limit is reached, such as the depth of the tree (max_depth). At each node, the best split is selected among the randomly selected subset of input variables. The best split threshold is set by an impurity criterion: the Gini index in classification trees [73,86] and, in regression trees such as those used here, typically the reduction in squared error. These steps are repeated until N trees are created, as in T1(X), T2(X), …, TN(X), where X = (x1, x2, …, xm) is an m-dimensional vector of inputs. Finally, an ensemble model, the RF regression model, is created by averaging the N regression trees, and prediction can be obtained using the resulting function:

$$ f_{RF}(X) = \frac{1}{N} \sum_{n=1}^{N} T_n(X) \qquad (1) $$
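The procedure above (bootstrap sampling, random variable selection, and averaging of N trees) can be sketched in a few lines. This is a simplified illustration and not the implementation used in the study: each tree is reduced to a single split (depth 1), the split is chosen by squared error, and the toy data and all function names are hypothetical.

```python
import random
from statistics import mean


def fit_stump(rows, targets, feature_ids):
    """Fit a depth-1 regression tree on the candidate features (lowest squared error)."""
    best = None
    for j in feature_ids:
        for threshold in sorted({r[j] for r in rows}):
            left = [y for r, y in zip(rows, targets) if r[j] <= threshold]
            right = [y for r, y in zip(rows, targets) if r[j] > threshold]
            if not left or not right:
                continue
            l_mean, r_mean = mean(left), mean(right)
            sse = (sum((y - l_mean) ** 2 for y in left)
                   + sum((y - r_mean) ** 2 for y in right))
            if best is None or sse < best[0]:
                best = (sse, j, threshold, l_mean, r_mean)
    if best is None:  # no valid split: fall back to a constant prediction
        const = mean(targets)
        return lambda x: const
    _, j, threshold, l_mean, r_mean = best
    return lambda x: l_mean if x[j] <= threshold else r_mean


def fit_forest(rows, targets, n_trees=25, max_feature=2, seed=0):
    """Grow n_trees stumps, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n, m = len(rows), len(rows[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]         # bootstrap sample
        feats = rng.sample(range(m), min(max_feature, m))  # random feature subset
        trees.append(fit_stump([rows[i] for i in idx],
                               [targets[i] for i in idx], feats))
    return trees


def predict_forest(trees, x):
    """f_RF(x) = (1/N) * sum of T_n(x), the average over all trees."""
    return sum(tree(x) for tree in trees) / len(trees)


# Toy data: three input variables per sample and an SMI-like target (hypothetical).
X = [[0.1, 5.0, 1.0], [0.2, 4.0, 0.0], [0.4, 3.5, 1.0],
     [0.6, 2.0, 0.0], [0.8, 1.5, 1.0], [0.9, 1.0, 0.0]]
y = [0.9, 0.8, 0.6, 0.4, 0.2, 0.1]
trees = fit_forest(X, y)
prediction = predict_forest(trees, X[0])
```

Because every stump predicts the mean of a subset of the training targets, the averaged prediction always lies within the range of the training targets.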

2.3.2. Multi-layer perceptron

There are many types of ANNs; however, their operating principles are similar, with nonlinear classification/regression being the most basic architecture for an ANN [78]. As our research is limited to regression for prediction, we only provide a summary of neural networks and prediction procedures using MLP, which is a multi-layer feedforward neural network and one of the most basic types of ANNs [20,87]. MLP is widely used for solving supervised learning problems in which class or quantity data are used for prediction.

A neural network aims to find the relationship between input and output by minimizing the prediction error between the actual and desired outputs (i.e., class or quantity labels). As the neurons can find the regression relationship between multiple inputs and one output, we used this architecture to regress soil moisture during the drought period on the 15 surface factors.

In a neural network, neurons are placed in layers, and a neuron is connected more strongly (with a larger weight) to the neurons in other layers with which it is more correlated. MLP consists of an input layer, one or more hidden layers, and an output layer, as illustrated in Figure 4.

Figure 4. Structure of MLP for regression.
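The layered structure described above can be made concrete with a minimal forward pass. This is only a sketch: the weights are random rather than trained, a ReLU activation in the hidden layers is an assumption, and the layer sizes (15 inputs, hidden layers of 30, 15, and 7 neurons, one output) follow the layout reported later in Table 4. All function names are hypothetical.

```python
import random


def relu(values):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, a) for a in values]


def dense(inputs, weights, biases):
    """One fully connected layer: out_k = sum_j w[k][j] * in_j + b[k]."""
    return [sum(w * a for w, a in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (untrained, for illustration)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(x, layers):
    """Hidden layers use ReLU (an assumption); the output layer is linear for regression."""
    a = x
    for weights, biases in layers[:-1]:
        a = relu(dense(a, weights, biases))
    weights, biases = layers[-1]
    return dense(a, weights, biases)[0]


rng = random.Random(0)
sizes = [15, 30, 15, 7, 1]  # input, three hidden layers, single output
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]
x = [0.5] * 15              # one sample with 15 scaled input variables
prediction = forward(x, layers)
```

Training would adjust the weights and biases to minimize the loss; only the flow of a single sample through the layers is shown here.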

Training an MLP for prediction requires selecting a suitable structure (the number of layers and the number of neurons per layer) and properly initializing the weights and learning rate to prevent overfitting [75]. Thus, the most important issue for researchers is to find the optimal hyperparameters.

Generally, the following procedure is adopted for quantity prediction using supervised learning and MLP [20,74]. First, the data are divided into training, test, and validation datasets. Then, to create a regression model, the optimal hyperparameters are determined by tuning. Finally, the regression model is validated and then used for prediction.

2.4. Training and Validation of Regression Model

Optimization of a regression model using machine learning is achieved by finding the hyperparameters that minimize a predefined loss function on given data [88]. We performed hyperparameter optimization using grid search over the hyperparameter ranges listed in Table 2. In MLP training, we used the activation function lelu in the hidden layers. In addition, we performed data scaling for MLP; that is, the input variables were normalized to 0–1, which is the main difference from the RF training.
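The two preprocessing and tuning steps mentioned above, normalization of inputs to 0–1 and grid search, can be sketched as follows. This is a minimal illustration: the grid values are a coarse, hypothetical subset of the RF ranges in Table 2, and toy_loss is an arbitrary placeholder standing in for "train the model and return its validation MSE"; it is not a result of this study.

```python
from itertools import product


def min_max_scale(column):
    """Rescale a list of numbers to the 0-1 range."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def grid_search(param_grid, loss_fn):
    """Evaluate every combination in param_grid; return (best_params, best_loss)."""
    names = list(param_grid)
    best_params, best_loss = None, float("inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        loss = loss_fn(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


# Coarse, hypothetical grid inside the RF ranges of Table 2.
rf_grid = {
    "max_depth": [2, 8, 14, 20],
    "max_feature": [2, 8, 15],
    "n_estimators": [100, 1000, 2000],
}


def toy_loss(params):
    """Arbitrary bowl-shaped placeholder; a real run would train and validate a model."""
    return ((params["max_depth"] - 8) ** 2
            + (params["max_feature"] - 8) ** 2
            + params["n_estimators"] / 1000)


best_params, best_loss = grid_search(rf_grid, toy_loss)
scaled_reflectance = min_max_scale([0, 2500, 10000])
```

The cost of grid search grows with the product of the grid sizes (here 4 × 3 × 3 = 36 evaluations), which is why a coarse grid is shown.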

Table 2. Hyperparameter ranges for RF and MLP regression.

RF | | MLP |
Criterion | MSE 1 | Loss function | MSE
max_depth | 2–20 | Hidden layers | 1–20
max_feature | 2–15 | Neurons | 3–30
n_estimators | 100–2000 | Epochs | 100–2000

1 Mean squared error

Both optimized models, fRF(x) and fMLP(x) in the area shown in Figure 1, were validated using various evaluation metrics, namely coefficient of determination R2, mean squared error (MSE), mean absolute error (MAE), and root-mean-square error (RMSE):

$$ MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \qquad (2) $$

$$ MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i| \qquad (3) $$

$$ RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \qquad (4) $$
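The evaluation metrics named above can be computed directly from paired observed and predicted values. The sketch below uses the usual definitions (R2 as one minus the ratio of residual to total sum of squares); the function names and the sample values are hypothetical and are not results of this study.

```python
from math import sqrt


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root-mean-square error: the square root of the MSE."""
    return sqrt(mse(y_true, y_pred))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean_true) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted SMI values (illustration only).
observed = [0.2, 0.4, 0.6, 0.8]
predicted = [0.25, 0.35, 0.65, 0.75]
scores = {
    "R2": r2(observed, predicted),
    "MSE": mse(observed, predicted),
    "MAE": mae(observed, predicted),
    "RMSE": rmse(observed, predicted),
}
```

With every prediction off by 0.05, the sample gives MSE = 0.0025, MAE = RMSE = 0.05, and R2 = 0.95.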

2.5. Prediction Performance

To compare the prediction performance of the two drought functions, fRF(x) and fMLP(x), we predicted the SMI three months later (June 2015) using the records from March 14, 2015 as input data. As no drought has occurred in the study area since 2017, we predicted and validated the regression models using data from 2015; whether the prediction year is before or after the training year does not matter for the drought function. Rather, the distance from the training area and the season of the input data are the relevant considerations, as verified in our previous study [78].

The predicted SMI was validated against the SMI calculated from the Landsat-8 image of July 14, 2015 as reference; although the image from June 18, 2015 would have been the best choice for verification, it was very cloudy.


2.6. Data Grouping and Training

To verify the training and prediction performance according to data characteristics, we divided the 15 input variables into three groups by their characteristics: Group 1 consists of map indices of float type, Group 2 consists of reflectance values of integer type, and Group 3 consists of degrees of float type. More details of the groups are listed in Table 3. Seven training groups were formed by taking all combinations of the three groups, and the RF and MLP algorithms were trained on each and evaluated using the comparison metrics.

Table 3. Training groups according to data characteristics.

Training Group (No. variables) | Input Variables | Data type | Data features | Valid range
Group 1 (7) | EVI, NDVI, SAVI, MSAVI, NDMI, MNDWI, MSI | float | index | -1 to 1
Group 2 (5) | NIR, SWIR1, SWIR2, TIRS1, TIRS2 | integer | reflectance | 0 to 10000
Group 3 (3) | TWI, Slope, Aspect | float | topographic attributes | -1 to 360 (-1: flat)
Groups 1 and 2 (12) | EVI, NDVI, SAVI, MSAVI, NDMI, MNDWI, MSI, NIR, SWIR1, SWIR2, TIRS1, TIRS2 | float and integer | index and reflectance | -1 to 1 and 0 to 10000
Groups 1 and 3 (10) | EVI, NDVI, SAVI, MSAVI, NDMI, MNDWI, MSI, TWI, Slope, Aspect | float | index and degree | -1 to 1 and -1 to 360
Groups 2 and 3 (8) | NIR, SWIR1, SWIR2, TIRS1, TIRS2, TWI, Slope, Aspect | integer and float | reflectance and degree | 0 to 10000 and -1 to 360
Groups 1, 2, and 3 (15) | EVI, NDVI, SAVI, MSAVI, NDMI, MNDWI, MSI, NIR, SWIR1, SWIR2, TIRS1, TIRS2, TWI, Slope, Aspect | float, integer, and float | index, reflectance, and degree | -1 to 1, 0 to 10000, and -1 to 360
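The seven training groups of Table 3 are all non-empty combinations of the three base groups. A short sketch of how they can be enumerated is given below; the dictionary layout and function name are hypothetical, while the variable names are those of Table 3.

```python
from itertools import combinations

# Base groups of input variables, as listed in Table 3.
GROUPS = {
    1: ["EVI", "NDVI", "SAVI", "MSAVI", "NDMI", "MNDWI", "MSI"],
    2: ["NIR", "SWIR1", "SWIR2", "TIRS1", "TIRS2"],
    3: ["TWI", "Slope", "Aspect"],
}


def training_sets(groups):
    """Return (group ids, variables) for every non-empty combination of groups."""
    result = []
    for size in range(1, len(groups) + 1):
        for combo in combinations(sorted(groups), size):
            variables = [name for group_id in combo for name in groups[group_id]]
            result.append((combo, variables))
    return result


sets = training_sets(GROUPS)
variable_counts = [len(variables) for _, variables in sets]
```

The enumeration yields 2^3 − 1 = 7 training groups with 7, 5, 3, 12, 10, 8, and 15 variables, matching the counts given in Table 3.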

Training for both RF and MLP was set up under the same conditions to verify differences in training and prediction performance by data characteristics. For RF, max_depth was set to 14 and max_feature to the number of inputs, whereas for MLP, the number of hidden layers was set to 3 with as many neurons per hidden layer as the number of inputs. Training proceeded up to a limit of 1000 iterations.

3. Results

3.1. Training Performance: RF vs. MLP

The hyperparameter optimization results are listed in Table 4, and

the training performance is shown in Figure 5, where RF regressor

clearly outperforms MLP.

Table 4. Optimal hyperparameters of machine learning using RF and MLP

RF | | MLP |
Criterion | MSE 1 | Loss function | MSE
max_depth | 14 | Hidden layers | 3
max_feature | 15 | Neurons 2 | 30, 15, 7
n_estimators | 1000 | Epochs | 1000

1 Mean squared error; 2 In order of hidden layers, there are 30 neurons in the first hidden layer, 15 in the second, and 7 in the third.

Figure 5. Training performance of regression model using RF and MLP

3.2. Prediction Performance: RF vs. MLP

Figure 6 shows the prediction performance of each drought function using RF and MLP, fRF(x) and fMLP(x), respectively. RF achieved R2 = 0.587, which is considerably lower than its training performance (0.915). However, it provides a clearer spatial drought pattern that is closer to the actual values than the results from MLP (Figure 7). The prediction of soil drought using MLP notably differs from the actual soil drought pattern, as shown in Figure 7.

Figure 6. Drought prediction performance using RF and MLP

Figure 7. Actual drought and predictions using RF and MLP

3.3. Training Performance According to Data Characteristics

Figure 8 shows the results of training performance from the seven

training groups by data characteristics. RF learning outperformed MLP

in all training groups, with notably higher performance in groups 1 and

2, and only a slightly superior performance in group 3.

Figure 8. Comparison of training performance according to groups with

different data.

4. Discussion

4.1. RF vs. MLP Regression on RS Data

Algorithm selection depends on the data characteristics, and it is most natural to use an index map or reflectance values when constructing regression variables from satellite images. The results of this study showed that RF might outperform MLP in this case. This may explain the persistent use of RF as a powerful algorithm in the RS field, unlike other fields in which ANNs are the mainstream. RS is the most active field for research on classification/regression based on satellite images. Below, we summarize the data characteristics that possibly allow RF to outperform MLP on RS data.

First, the small data range of index maps, which generally have normalized values between –1 and 1, may not significantly affect RF performance. In contrast, data types with small ranges and intervals, such as map indices, might be more disadvantageous when using MLP. According to our results, the difference in training performance between the two evaluated algorithms was the largest in group 1, which consists of index maps (Figure 8). This is also supported by the results from group 3, where the range was between –1 and 360 and the difference in training performance between RF and MLP was the smallest.

The second consideration is the use of data with a large valid range, such as reflectance values, which is disadvantageous to training when using MLP. Network-based ANNs require data scaling because they are sensitive to the Euclidean distance between data. On the other hand, RF does not depend on the distance between data points and requires no scaling, because the variable contributions are determined from the decision trees. This is supported by the training performance in group 2 (the reflectance group), which showed the second-largest difference after group 1. Thus, RF can give better performance than MLP when using either reflectance values with a wide range or index map values with a small range.

Some studies further support our findings. In [73,75], RF performed better on variables derived from satellite images, whereas a study on weather data and energy consumption showed that MLP performed better [74].

4.2. MLP and Data Precision

MLP is one of the most basic regression and prediction algorithms using ANN architectures, and by itself would be insufficient to explain the performance of ANNs in general compared to RF learning. Hence, we performed regression and further analysis using another ANN, the long short-term memory (LSTM) structure. Still, the performance evaluation gave results similar to those of MLP (MSE = 0.0044, MAE = 0.0497, R2 = 0.833).

Moreover, we considered data precision as well as data range in this study. Although we showed that the training performance of the evaluated algorithms differed according to data range, these results are not enough to explain the worse performance of MLP without discussing the contribution of data precision, because training can be affected by the number of digits in floating-point data. Thus, we performed a further analysis of MLP using only 3 decimal places instead of the original 6. However, this experiment did not show a significant difference (MSE = 0.0043, MAE = 0.0501, R2 = 0.839) compared with the result for 6 decimal places, and hence we conclude that the performance difference between the RF and MLP algorithms mostly depends on the range of the data regardless of the value precision.

4.3. Drought Prediction Using Machine Learning

We determined that RF is suitable for short-term drought prediction when using regression and RS data. In the same vein, according to [23], the thirty-one drought prediction models using ANNs were based on precipitation data, whereas those using RS data and regression used RF. Thus, ANNs might be more suitable when using precipitation data to predict drought, and RF might be better when using RS data, especially from satellite images. In this study, we did not specifically discuss topographic data apart from satellite imagery. This is because we did not identify a large difference in training performance between the two algorithms for topographic data (there were small differences, with RF being better). Remarkably, although the training performance using topographic data alone was below 0.4 (see group 3 in Figure 8), topography is a very important variable for increasing prediction accuracy. When we trained the drought function using RF, the R2 value remained below 0.85 without topographic information.

4.4. Limitations

Optimization of the model when using MLP depends not only on the designed structure but also on underlying factors affecting the loss function that are not well known [89]. Thus, the MLP analysis in this study might not guarantee optimization; however, we believe that the hyperparameters determined through numerous tests reached optimal values.

Our study is limited to regression (or classification) problems for prediction. In addition, we did not consider the combination of RS data with other data types. Thus, complementary studies are required to support our findings. Nevertheless, our study provides insights into the superior performance of RF as a machine learning approach for regression or classification using RS data.

5. Conclusions

When applying machine learning, the appropriate algorithm

should be selected according to the data characteristics and analysis

purpose. From our results, when using data with narrow or wide value

ranges, such as index maps or reflectance values, RF notably

outperforms MLP. Thus, RF may be a better selection over MLP for

regression/classification using satellite images given the data

characteristics of the derived variables.

APPLICATION to POLICY

Subtitle: Is Water Pricing Policy Adequate to Reduce Water Demand

for Drought Mitigation in Korea?

1. Introduction

Historically, Korea has been relatively rich in water resources and has not faced significant threats to its water supply during droughts. However, the southern Seoul metropolitan area suffered from unprecedented spring droughts during 2015–2017 [1,2]. The Korean government recognized the necessity of developing drought mitigation policies after this drought and has been proposing new policies since 2018. The proposed policies mainly target preparation for long-term droughts, using facilities such as dams and reservoirs to increase water supply [3].

Brears (2017) suggested that prior to planning to increase this

water supply, improved water supply could be achieved by reducing

existing water demand [4]. These reductions could be accomplished

through the conservation of water by reducing usage derived from a

water pricing system, setting public water saving targets, and

encouraging people to save water by changing their lifestyle [4].


California's drought management policies, implemented during the drought it experienced in 2012–2016 [5], suggest that the state improved its existing water usage in line with the suggestions of Brears (2017) [4]. The California government stated that the water pricing policy implemented in the last drought was an effective tool for reducing water demand and that, as a result, it played an important role in conserving water over both the long and the short term [6]. California's successful policy of reducing water demand through pricing provides a lesson [7] to Korea that existing water usage should be changed. Water is not an endless resource; therefore, water demand reduction policies need to be implemented in addition to long-term policies that address increasing water supply [8,9].

However, in Korea, implementing a pricing policy for water demand reduction is challenging. The effectiveness of water pricing policy in Korea is not well known, and the estimated price elasticity of water demand in Korea differs considerably depending on the researchers, data, and models used [10]. Not only was there a negative response from politicians to attempts to research water pricing policies [11], but Korean citizens are also sensitive to increases in water prices [10], even though the water rates are comparatively low [12]. The Korean government might have focused on long-term drought mitigation, such as water supply expansion, because of the difficulties of implementing a pricing policy to reduce water demand. Thus, Korea's current drought mitigation policy is still insufficient to address both long- and short-term droughts by improving existing water usage.

In this regard, our study simulated the policy effect by assuming that a water pricing policy had been implemented during the spring drought in Korea. In particular, we estimated the amount of emergency agricultural water that could be made available by reducing residential water usage during the drought period. Specifically, during the drought of 2015–2017 in Korea, there was significant damage to agriculture, while residential water use was only slightly inconvenienced. Most cities worldwide do not face the risk of running out of drinking water, but agriculture is not safe from the effects of drought [8]. Agricultural droughts require immediate mitigation because crop death or stunted growth caused by a lack of water cannot be reversed. Thus, prompt water supply for agricultural drought mitigation is important during a drought period.

This study investigated whether a policy of regulating water usage by price would be effective in these severe drought areas in Korea. We spatially predicted the severe agricultural drought areas in the region and simulated the effects of a water pricing policy using the severe drought area prediction (SDAP) model [13] and a system dynamics (SD) model. The SDAP model was developed in our previous research [13] based on machine learning (ML). ML can produce good prediction performance based on big data; hence, it is already widely used as a means of prediction in many fields. SD was developed by Jay W. Forrester, a professor at the Massachusetts Institute of Technology (MIT) [14], and is widely used in various fields, such as the military, politics, society, economy, and environment [15]. The SD model, which is composed of several causal connections, resembles a structural equation model in that it has several independent and dependent variables. However, unlike a structural equation model, it includes the concepts of time and loops, which allow human behavior to be analyzed dynamically. Therefore, SD is a useful tool for analyzing the effects of policy changes in advance [16].

Based on the results from these models, we discuss whether a water pricing policy is appropriate for reducing water demand for drought mitigation in Korea, and why the outcomes differ between countries. In addition, we describe non-pricing water demand reduction policies that could complement pricing policies where the latter are not effective.

2. Methodology

2.1. Study area and Data

We conducted a case study in the southern Seoul metropolitan

area in Korea, called Gyeonggi Province. This study area is located

��

longitudes (Figure 1), has been severely affected by unprecedented

droughts during the spring from 2015 to 2017.

The southern part of Gyeonggi Province has an average annual precipitation of approximately 1300 mm. During the spring droughts from 2015 to 2017, however, the province received only 50% of the usual annual precipitation [2]. Accordingly, we simulated the effect of a water pricing policy in this southern Seoul metropolitan area on the assumption that there was a drought in 2018 and that water rates were raised.

Figure 1. Location of the study area.

The data used in the SDAP model were Landsat 8 images and the Shuttle Radar Topography Mission (SRTM) digital elevation model (DEM) with 30 m resolution, downloaded from the USGS EarthExplorer [17]. The analysis was programmed in Python 3.6.1 on a 64-bit Windows platform, and the software for spatial data processing was ArcGIS Pro.

The data used in the SD model (the daily water usage per person, water price, population, and information on the water sources) were taken from the My Water website [18] and the Korean Statistical Information Service (KOSIS) [1]. Policy simulation was conducted using Vensim PLE for Windows.

2.2. SDAP and SD Model

This study uses a framework of two linked models, as shown in Figure 2, in which the SD model simulates the policy based on the result of the SDAP model [13]. The SDAP model (Figure 3) predicts the spatial distribution pattern of soil moisture after a non-rainfall period using a drought function trained by the random forest (RF) algorithm [13]. The SD model (Figure 4) estimates the amount of water available to the provincial government by simulating the price increase policy for the severe drought areas predicted in the previous step. We assess the effectiveness of the policy through the estimated amount of water resources.

Simulation methods that link two models in a similar way (a spatial information model and a simulation model) already exist [19]; in such methods, the two connected models continuously influence each other through internal parameters. In contrast, the two models in our study are driven independently, like modules, so that the parameters of the simulation model are adjusted sequentially based on the predicted results from the spatial information model. Therefore, these models are easy to understand and apply. In addition, each model can be improved and used separately depending on the purpose of use.

Figure 2. Framework of the linked severe drought area prediction (SDAP) model and system dynamics (SD) model.

Figure 3. Structure of the SDAP model.

Figure 4. Structure of the SD model.

2.2.1. SDAP Model: Prediction of Drought Spatial Distribution

Agricultural drought was trained and predicted based on the following concept [13]. The soil moisture remaining after a non-rainfall period differs depending on the present condition of the land surface [20]. In this regard, we classified the land surface factors that affect soil moisture during the non-rainfall period into four categories: vegetation, topographic, water, and thermal factors [13]. Thermal factors reduce soil moisture, whereas vegetation retards the loss of soil moisture by slowing down the increase in land surface heat [21]. Topography is another important determinant of soil moisture [22]. The initial conditions of the land, such as its existing water content, are also related to the soil moisture remaining after a drought period [20]. Thus, regressing the soil moisture after the non-rainfall period on the present environmental conditions, represented by these land surface factors, enables short-term drought prediction of soil moisture.

Table 1 shows the 15 features (variables) that correspond to the four land surface factors. These 15 features are the input variables for the RF regression and are regressed on the soil moisture index (SMI) after three months without precipitation [13].

Table 1. Variables of the SDAP model for predicting severe drought areas.

Land Surface Factors | Input Variables | Formula / Description | References
Vegetation | Enhanced vegetation index (EVI) | 2.5 × ((NIR − Red)/(NIR + 6.0 × Red − 7.5 × Blue + 1)) | [23–25]
Vegetation | Normalized difference vegetation index (NDVI) | (NIR − Red)/(NIR + Red) | [24–26]
Vegetation | Soil-adjusted vegetation index (SAVI) | ((NIR − Red)/(NIR + Red + 0.5)) × (1 + 0.5) | [24,25,27]
Vegetation | Modified soil-adjusted vegetation index (MSAVI) | (2 × NIR + 1 − sqrt((2 × NIR + 1)^2 − 8 × (NIR − Red)))/2 | [25,28]
Topography | Topographic wetness index (TWI) | Ln(α/tan β) 1 | [29]
Topography | Slope | Degree of slope | [30,31]
Topography | Aspect | Degree of aspect | [30,32]
Water | Normalized difference moisture index (NDMI) | (NIR − SWIR1)/(NIR + SWIR1) | [25,33]
Water | Modification of normalized difference water index (MNDWI) | (Green − SWIR1)/(Green + SWIR1) | [34]
Water | Moisture stress index (MSI) | MidIR/NIR | [35]
Thermal | Near infrared (NIR) | 0.851–0.879 µm | [36]
Thermal | Short-wavelength infrared 1 (SWIR1) | 1.566–1.651 µm | [36]
Thermal | Short-wavelength infrared 2 (SWIR2) | 2.107–2.294 µm | [36]
Thermal | Thermal infrared sensor 1 (TIRS1) | 10.60–11.19 µm | [36]
Thermal | Thermal infrared sensor 2 (TIRS2) | 11.50–12.51 µm | [36]

1 α is the upslope contributing area per unit contour length and β is the local slope in radians.
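The spectral indices in Table 1 can be computed per pixel from band reflectances. The sketch below uses the standard forms of these indices and assumes reflectance values scaled to 0–1 (the EVI and SAVI constants presuppose this scale); the function names, the use of SWIR1 as the MidIR band for MSI, and the sample pixel values are assumptions for illustration only.

```python
from math import sqrt


def ndvi(nir, red):
    return (nir - red) / (nir + red)


def evi(nir, red, blue):
    return 2.5 * ((nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0))


def savi(nir, red, soil_factor=0.5):
    return ((nir - red) / (nir + red + soil_factor)) * (1.0 + soil_factor)


def msavi(nir, red):
    return (2.0 * nir + 1.0 - sqrt((2.0 * nir + 1.0) ** 2 - 8.0 * (nir - red))) / 2.0


def ndmi(nir, swir1):
    return (nir - swir1) / (nir + swir1)


def mndwi(green, swir1):
    return (green - swir1) / (green + swir1)


def msi(mid_ir, nir):
    return mid_ir / nir


# One hypothetical vegetated pixel, reflectance on a 0-1 scale.
pixel = {"blue": 0.05, "green": 0.10, "red": 0.10, "nir": 0.50, "swir1": 0.30}
features = {
    "EVI": evi(pixel["nir"], pixel["red"], pixel["blue"]),
    "NDVI": ndvi(pixel["nir"], pixel["red"]),
    "SAVI": savi(pixel["nir"], pixel["red"]),
    "MSAVI": msavi(pixel["nir"], pixel["red"]),
    "NDMI": ndmi(pixel["nir"], pixel["swir1"]),
    "MNDWI": mndwi(pixel["green"], pixel["swir1"]),
    "MSI": msi(pixel["swir1"], pixel["nir"]),  # SWIR1 assumed as the MidIR band
}
```

If the bands are stored as scaled integers, they would need to be converted to the 0–1 scale before applying these formulas.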

The RF algorithm that produces the drought function (hereafter f(x)) for predicting the SMI is one of the machine learning methods developed by Breiman [37]. The f(x) was trained on the actual drought that occurred in this region in 2017, from March 23 to June 23 (approximately three months). We verified the f(x) in our previous study. The training performance of f(x) was R2 = 0.91, and it predicted the SMI for a drought of the same period in another year with R2 = 0.58. Additionally, this f(x) is characterized by the fact that the closer the prediction area is to the training area, the higher the accuracy, and the f(x) should be generated separately for each region and period. We predicted the soil moisture index of 26 June 2018 using the 15 features of 22 March 2018 and the f(x).

The SMI from Sandholt et al. [38] was used as the output variable (target variable) for RF regression. This index is suitable for representing soil moisture during the growing season of crops [35] since it includes a vegetation index. Thus, the SMI is effective for predicting agricultural drought in the spring-summer period, which is the season examined in this study. For reference, this SMI has a real value between 0 and 1 and is most correlated with soil moisture at 20 cm soil depth [35].

We performed the following procedure to obtain a smooth SMI map, that is, the final severe drought area prediction map, after a non-rainfall period. Within the study area, 400,000 random points were generated, and the SMI value was assigned to each point. Subsequently, the predicted agricultural drought map was generated by natural neighbor interpolation of all the points.

2.2.2. SD Model: Simulation of Increased Water Price Policy

Implementation

Based on the agricultural drought distribution map predicted for three months ahead by the above analysis, we identified drought-critical areas. Then, we identified the water sources that supply water to the severe drought areas and the administrative areas that share these water sources. The hypothetical policy simulated in this analysis applies temporarily, during a three-month drought period, to the residents of the severe drought areas and of the areas sharing their water sources. As a similar example, in 2015 the California government announced that cities and towns in severe drought areas must reduce water use by 25% during the drought period [39,40].

To design a simulation model of a policy, it is preferable to create a causal map that represents the causal relationship between the policy and the changes in human behavior caused by the policy. Based on this causal map, a model that can be simulated is built by entering formulas with variables and constants. These variables may be cumulative or contain constants. Figure 4 illustrates the process, including the variables, by which price increases result in a reduction of water usage. The definitions of the SD model variables for the simulation are listed in Table 2.

Table 2. Variables of the SD model for policy simulation.

Variable | Type | Equation
Daily water usage per person | Level | water usage per day + (-) water saving effort; initial value = 48.60 gallon 1
Monthly water usage per person 2 | Auxiliary | water usage per day × 30 days × 0.003785
Billing | Auxiliary | fixed fee + (monthly water usage × billing rate)
Billing rate | Constant | Base = 0.92, Plan 1 = 1.10, Plan 2 = 1.28, Plan 3 = 1.46 (Unit: USD/m3)
Fixed fee | Constant | 1.50 USD
Recognition of increase water price | Auxiliary | whether (desired billing < current billing)
Desired billing 3 | Constant | 6.63 USD/person per month
Price elasticity of water demand | Constant | −0.175
Water saving effort | Auxiliary 4 | the water demand changes rate (%) × water usage per day
Population 5 | Constant | 7,969,432
Monthly water usage per local | Auxiliary | Monthly Water Usage per person × Population

1 Average water usage per day in Korea [18]; 2 where 0.003785 is unit conversion (gallon to m3). Water fee is charged per m3; 3 existing monthly billing price; 4 water demand changes rate = water price changes rate (%) × the price elasticity of water demand; 5 severe drought area population + water source sharing area population.
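The billing variables of Table 2 can be evaluated with a few lines of arithmetic. The sketch below is a static calculation using the constants of Table 2 and is not the Vensim model itself; the function and constant names are hypothetical.

```python
GALLON_TO_M3 = 0.003785   # unit conversion from Table 2
FIXED_FEE_USD = 1.50      # fixed fee from Table 2
BILLING_RATES = {"Base": 0.92, "Plan 1": 1.10, "Plan 2": 1.28, "Plan 3": 1.46}  # USD/m3
INITIAL_DAILY_USAGE_GALLON = 48.60


def monthly_usage_m3(daily_gallons, days=30):
    """Monthly water usage per person = daily usage x 30 days x 0.003785."""
    return daily_gallons * days * GALLON_TO_M3


def monthly_bill_usd(daily_gallons, rate):
    """Billing = fixed fee + (monthly water usage x billing rate)."""
    return FIXED_FEE_USD + monthly_usage_m3(daily_gallons) * rate


usage = monthly_usage_m3(INITIAL_DAILY_USAGE_GALLON)
bills = {plan: monthly_bill_usd(INITIAL_DAILY_USAGE_GALLON, rate)
         for plan, rate in BILLING_RATES.items()}
```

With these inputs the monthly usage is about 5.52 m3 per person and the base bill about 6.58 USD, which is close to, but not identical with, the desired billing of 6.63 USD in Table 2; the small gap may stem from rounding or currency conversion, which cannot be verified from the table alone.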

The price elasticity of demand was primarily used to show the

change in water usage caused by price fluctuations; it is calculated as

follows:

$$ \text{Price elasticity of demand} = \frac{\Delta Q / Q}{\Delta P / P} \qquad (1) $$

where Q is the quantity of the demanded good and P is the price of the demanded good.
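As a worked illustration of the elasticity definition, the relative change in demand follows from the relative change in price multiplied by the elasticity. The sketch below uses the elasticity constant of −0.175 from Table 2 and deliberately ignores the time delay that the SD model captures; the function names and the example prices are hypothetical.

```python
ELASTICITY = -0.175  # price elasticity of water demand (Table 2)


def demand_change_rate(old_price, new_price, elasticity=ELASTICITY):
    """Relative demand change = relative price change x elasticity."""
    price_change = (new_price - old_price) / old_price
    return price_change * elasticity


def adjusted_daily_usage(daily_usage, old_price, new_price, elasticity=ELASTICITY):
    """Daily usage after the price change has fully taken effect (no delay modeled)."""
    return daily_usage * (1.0 + demand_change_rate(old_price, new_price, elasticity))


# A 20% price increase (for example 1.00 -> 1.20) with an elasticity of -0.175.
change = demand_change_rate(1.00, 1.20)
usage_after = adjusted_daily_usage(100.0, 1.00, 1.20)
```

A 20% price increase therefore lowers demand by only 3.5% in this static view, which illustrates how inelastic the demand is.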

The price elasticity of demand for residential water in Korea lies within the range reported in [10], and we used the median value of −0.175. For each municipality, the water rate used for the simulation was composed of the water billing based on a fixed fee and the usage rate.

We assumed three policies based on increased water prices. The base rate is 0.92 USD/m3, which is the same as the current rate. Plan 1 raised the rate to 120% of the base rate (1.10 USD/m3), Plan 2 to 140% (1.28 USD/m3), and Plan 3 to 160% (1.46 USD/m3). We then ascertained the amount of water saved over the three-month period from the change in individual water use under each plan. In Table 2, the desired billing reflects the desire for the water bill to return to its previous level; thus, the desired billing is calculated using the base rate.

We used the system dynamics model to simulate, six months ahead, the amount of water used and the amount of water secured under each rate plan. Ultimately, we want to identify water conservation and, specifically, water conservation in the three months following the drought.

3. Results

3.1. Predicted Agricultural Drought Severity Areas

By applying the SDAP model, we found four potential severe drought areas and confirmed that five farmland areas within the study area had a lower SMI than other areas, excluding the impervious areas (dark gray section in Figure 5). Following this, the four water sources that are used in the four predicted severe agricultural drought areas (five administrative districts) were identified. The water sources are shared by nine administrative districts, including the four severe drought areas [18].

Figure 5. Predicted agricultural drought map (excluding the impervious
areas) after three months without rainfall, from 22 March 2018 to
26 June 2018.

3.2. Simulation of Water Pricing Policy Effect

When water price increase policies were implemented in the nine
administrative districts during drought periods, the resulting individual
water usage and the amount of water available to local governments were
as shown in Figure 6. The effects on daily water usage began to appear
three months after the implementation of the policy; thus, little effect
was observed during the period when water for drought mitigation needed
to be secured.

Figure 6. Changes in water demand under the increased water rates.

Table 3 shows how much water was secured each month after each
plan was implemented. The amount of water secured during the three-month
drought period is expected to be very low, so raising the price would not
be an effective way to control water demand in that period.
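The cumulative column of Table 3 can be checked directly from the monthly values. The sketch below simply sums the reported figures; the conversion helper assumes US liquid gallons (about 264.172 gallons per cubic metre), which the source does not state, and all names are introduced here for illustration.

```python
# Monthly water secured under each plan, as reported in Table 3 (gallons).
MONTHLY_SAVED_GALLONS = {
    "Plan 1": [0, 63_290, 253_160],
    "Plan 2": [63_290, 189_870, 443_030],
    "Plan 3": [63_290, 316_450, 822_771],
}

# Cumulative amounts as reported in Table 3 (gallons).
REPORTED_CUMULATIVE = {
    "Plan 1": 316_450,
    "Plan 2": 696_190,
    "Plan 3": 1_202_511,
}

# ASSUMPTION: the table uses US liquid gallons.
GALLONS_PER_CUBIC_METRE = 264.172


def cumulative_matches(plan):
    """True if the monthly values sum to the reported cumulative amount."""
    return sum(MONTHLY_SAVED_GALLONS[plan]) == REPORTED_CUMULATIVE[plan]


def gallons_to_cubic_metres(gallons):
    """Convert gallons to cubic metres under the US-gallon assumption."""
    return gallons / GALLONS_PER_CUBIC_METRE


if __name__ == "__main__":
    for plan in MONTHLY_SAVED_GALLONS:
        total = sum(MONTHLY_SAVED_GALLONS[plan])
        print(f"{plan}: {total:,} gal "
              f"(~{gallons_to_cubic_metres(total):,.0f} m3), "
              f"matches table: {cumulative_matches(plan)}")
```

Converting to cubic metres puts the secured volumes in the same unit as the water rates (USD/m³) used for the three plans.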

Table 3. Quantity of available water for drought mitigation (unit: gallons).

Policy    One Month    Two Months    Three Months    Cumulative Amount
Plan 1            0        63,290         253,160              316,450
Plan 2       63,290       189,870         443,030              696,190
Plan 3       63,290       316,450         822,771            1,202,511

4. Discussion

4.1. Effectiveness of Price-Based Water Demand Reduction Policy in Korea

Korea needs an effective policy to restrict water demand that can
be implemented quickly during drought periods. However, our
simulation results show that a water demand control policy based on a
water price increase during the 2015–2017 drought would not have been
effective in Korea. This result is consistent with other studies showing
that water demand in Korea is price-inelastic and that a very high water
rate would be required to manage demand [10]. Thus, a policy to reduce
water usage that relies on water price alone would not be effective in
Korea.

We considered that the different outcomes across countries were due
not only to differences in the amount of water resources in each country,
but also to cultural differences in the perception of water rates and in
water use.

For instance, most Koreans are reluctant to accept higher water rates [10]
even though the water price is lower than in other countries [12].
Therefore, drought mitigation and water demand reduction policies must be
developed considering the specific factors and situations within a region
or country and not based solely on a policy’s success elsewhere. While
water pricing is one of many ways to reduce water demand, it is not
the only solution. For example, in California, a considerable amount of
water is used for watering residential lawns. The California government
therefore attempted to reduce individual water use by restricting this
activity, which resulted in a significant reduction in water use [7]. In
contrast, few houses in Korea have lawns, so restricting this activity in
Korea would not be effective. It would be more effective to reduce the
water that, because of the low water price, is used indiscriminately in
everyday life, such as for showers, car washes, and dishwashing.

Other studies on the effectiveness of pricing for reducing water
demand report a variety of results. While some studies showed that water
pricing is not effective at all [41], others showed that it was
effective [42]. In addition, several studies take the intermediate
position that pricing policy is not completely ineffective but is
effective only for a short period [43].

Our research aimed to confirm the effectiveness of water pricing
policy in Korea for securing emergency agricultural water. During this
process, we found a difference in how agricultural water is supplied in
Korea and in California. While Korea supplies water directly to the
drought area, California supported agricultural water indirectly. For
example, in Korea, residential water is used directly as emergency
agricultural water during a severe drought. Korea is attempting to
introduce the ‘Smart Water Grid’ concept to develop a system for
managing and sharing water to support areas with water scarcity.
Similarly, Singapore already operates an integrated water management
system [44]. In contrast, California indirectly supported agricultural
water by excluding agriculture from water regulations: the 25% water
demand reduction regulation applied only to residential, industrial, and
commercial water use [5,7,39]. Thus, depending on the circumstances of
the region and country, both the water demand reduction method for
drought mitigation and the water supply method can vary. Therefore, in
regions and countries facing a drought crisis, appropriate water demand
reduction policies should be designed to fit the circumstances of each
country, using simulations that consider both price and non-price
policies.

4.2. Non-Price Policy for Water Demand Reduction

In Korea, non-price policies should be implemented together with
high water prices to achieve effective water demand reduction [10]. Some
related studies on the effectiveness of non-price factors for controlling
water demand support this view. For example, one study showed that water
use can be reduced simply by increasing the frequency of water bills,
without a price control policy [45]. Another study found that non-price
methods for reducing water demand, such as water-saving campaigns in
times of drought or water-saving equipment for showers and toilets, can
also be effective [12].

System dynamics techniques, which can include human dynamics in
response to policy, make it possible to repeatedly test and verify
policies that have never been implemented, thus helping to develop
policies tailored to each country’s circumstances. In addition, the
results of the analysis can be used as a resource for building public
consensus so that policies can be implemented amicably. Furthermore, in
Korea, system dynamics (SD) can be used to simulate and repeatedly revise
policies that include both price and non-price factors, creating an
effective water management policy for drought response.

Figure 7 presents our proposed combined model, including price
and non-price factors, which may help future research to develop a plan
and understand the relation between reducing water demand and human
actions. This SD model includes non-price factors that can be used to
reduce water demand, such as increasing the billing frequency by using
the tax income generated from the increased water rate, implementing
water-saving campaigns to raise awareness, and supplying water-saving
household devices (such as water-saving faucets and toilet seats), which
would thereby lead to effective water saving. However, these causal
relationships, along with the concepts and ideas depicted in the diagram,
are beyond the scope of this research.

Figure 7. The SD model including price and non-price factors for saving
residential water.

5. Conclusions

In this study, we used machine learning and remote sensing data
in Korea to predict the soil moisture map expected after three months
without rainfall and to identify areas with severe drought conditions
based on the predicted agricultural drought maps. The system dynamics
method was used to simulate the water price increase policy for the
study area and to confirm the amount of water available during the
drought period. The simulation results showed that the amount of saved
water was not significant; therefore, the water pricing policy is not
effective for drought mitigation in Korea. However, the effectiveness of
pricing policy for water demand reduction cannot be generalized, because
its effects may vary with the situation in each country, such as
differences in culture and water reserves. Thus, drought policy should
take an approach appropriate to the situation of each country. Future
studies can discuss water conservation policies for drought mitigation
further through simulations using models that include both price and
non-price policies.

The use of analytical methods appropriate to the purpose of a
study is important. Traditional statistics such as linear regression are
useful for understanding causality, whereas machine learning is powerful
for prediction. Therefore, the black box problem of machine learning is
not an important consideration here, because the SDAP model aims to
establish a policy based on prediction rather than to understand the
causes of droughts. System dynamics is also a time-efficient method for
policies whose effects take a long time to confirm. Although the
validation of simulation results from system dynamics is disputed, no
means exist to completely verify the real world. Thus, the trend shown by
the simulation is worth reporting even if the resulting values themselves
are set aside. The results in Chapter 2 are meaningful in that random
forest can be considered first when using satellite imagery and machine
learning for regression prediction. In Chapter 3, the application of the
model is meaningful because it shows the entire process from the use of
the model to the establishment of the water management policy. While
existing models or analyses have stated only simple policy applicability,
this thesis is significant because it shows the whole policy-making
process based on scientific methods.

REFERENCES

1. Masson-Delmotte, V.; Pörtner, H.-O.; Skea, J.; Zhai, P.; Roberts,

D.; Shukla, P.R.; Pirani, A.; Moufouma-Okia, W.; Péan, C.;

Pidcock, R.; Connors, S.; Matthews, J.B.R.; Chen, Y.; Zhou, X.;

Gomis, M.I.; Lonnoy, E.; Maycock, T.; Tignor, M.; Waterfield,

T. Intergovernmental Panel on Climate Change (IPCC) Special

Report: Global Warming of 1.5 °C; 2018. In Press.

2. Lund, J.; Medellin-Azuara, J.; Durand, J.; Stone, K. Lessons from

California’s 2012–2016 Drought. J. Water Resour. Plan. Manag.

2018, 144, 4018067, doi:10.1061/(ASCE)WR.1943-5452.0000984.

3. National Drought Information Analysis Center (NDIAC).

Available online: http://drought.kwater.or.kr (accessed on Aug

30, 2018).

4. European Agriculture Impacted by Drought and Water Scarcity

Available online: https://www.euroscientist.com/european-

agriculture-impacted-by-drought-and-water-scarcity/ (accessed

on Apr 25, 2019).

5. Quiggin, J. Drought, Climate Change and Food Prices in

Australia. 2010, doi:10.1109/ICSMC.2004.1401241.

6. Predicting Drought. Available online:

http://drought.unl.edu/DroughtBasics/PredictingDrought.aspx

(accessed on Jan 19, 2018).

7. Butler, A.; Charlton-Perez, A.; Domeisen, D.; Garfinkel, C.;

Gerber, E.; Hitchcock, P.; Karpechko, A.; Maycock, A.;

Sigmond, M.; Simpson, I.; Son, S.-W. Sub-Seasonal to Seasonal

��! ��"��#��;

Elsevier, 2018; ISBN 9780128117149.

8. Park, H.; Kim, K.; Lee, D.K. Prediction of Severe Drought Area

Based on Random Forest: Using Satellite Image and Topography

Data. Water 2019, 11, doi:10.3390/w11040705.

9. Hao, Z.; Singh, V.P.; Xia, Y. Seasonal Drought Prediction:

Advances, Challenges, and Future Prospects. Rev. Geophys.

2018, 56, 108–141, doi:10.1002/2016RG000549.

10. Mishra, A.K.; Desai, V.R. Drought forecasting using stochastic

models. Stoch. Environ. Res. Risk Assess. 2005, 19, 326–339,

doi:10.1007/s00477-005-0238-4.

11. Schepen, A.; Wang, Q.J.; Robertson, D.E. Combining the

strengths of statistical and dynamical modeling approaches for

forecasting Australian seasonal rainfall. J. Geophys. Res. Atmos.

2012, 117, 1–9, doi:10.1029/2012JD018011.

12. Hao, Z.; Xia, Y.; Luo, L.; Singh, V.P.; Ouyang, W.; Hao, F.

Toward a categorical drought prediction system based on U.S.

Drought Monitor (USDM) and climate forecast. J. Hydrol. 2017,

551, 300–305, doi:10.1016/j.jhydrol.2017.06.005.

13. Prasad, R.; Deo, R.C.; Li, Y.; Maraseni, T. Soil moisture

forecasting by a hybrid machine learning technique: ELM

integrated with ensemble empirical mode decomposition.

Geoderma 2018, 330, 136–161,

doi:10.1016/j.geoderma.2018.05.035.

14. Rezaeianzadeh, M.; Stein, A.; Cox, J.P. Drought Forecasting

using Markov Chain Model and Artificial Neural Networks.

Water Resour. Manag. 2016, 30, 2245–2259,

doi:10.1007/s11269-016-1283-0.

15. Cook, B.I.; Smerdon, J.E.; Seager, R.; Coats, S. Global warming

and 21st century drying. Clim. Dyn. 2014, 43, 2607–2627,

doi:10.1007/s00382-014-2075-y.

16. Wilhite, D.; Glantz, M. Understanding: the Drought

Phenomenon: The Role of Definitions. Water Int. 1985, 10, 111–

120, doi:10.1080/02508068508686328.

17. Park, S.; Im, J.; Jang, E.; Rhee, J. Drought Assessment and

Monitoring through Blending of Multi-sensor Indices Using

Machine Learning Approaches for Different Climate Regions.

Agric. For. Meteorol. 2016, 217, 50,

doi:10.1016/j.agrformet.2016.01.040.

18. Mishra, A.K.; Desai, V.R.; Singh, V.P. Drought Forecasting Using a Hybrid

Stochastic and Neural Network Model. J. Hydrol. Eng. 2007, 12,

626–638, doi:10.1061/(ASCE)1084-0699(2007)12:6(626).

19. Durdu, Ö.F. Application of linear stochastic models for drought

forecasting in the Büyük Menderes river basin, western Turkey.

Stoch. Environ. Res. Risk Assess. 2010, 24, 1145–1162,

doi:10.1007/s00477-010-0366-3.

20. Ali, Z.; Hussain, I.; Faisal, M.; Nazir, H.M.; Hussain, T.; Shad,

M.Y.; Mohamd Shoukry, A.; Hussain Gani, S. Forecasting

Drought Using Multilayer Perceptron Artificial Neural Network

Model. Adv. Meteorol. 2017, 2017, doi:10.1155/2017/5681308.

21. Mastrandrea, M.D.; Mach, K.J.; Plattner, G.; Matschoss, P.R.

The IPCC AR5 guidance note on consistent treatment of

uncertainties: a common approach across the working groups. Clim. Change

2011, 108, 675–691, doi:10.1007/s10584-011-0178-6.

22. Maity, R.; Suman, M.; Verma, N.K. Drought prediction using a

wavelet based approach to model the temporal consequences of

different types of droughts. J. Hydrol. 2016, 539, 417–428,

doi:10.1016/j.jhydrol.2016.05.042.

23. Fung, K.F.; Huang, Y.F.; Koo, C.H.; Soh, Y.W. Drought

forecasting: A review of modelling approaches 2007–2017. J.

Water Clim. Chang. 2019, doi:10.2166/wcc.2019.236.

24. Olang, L.; Ali, A.; Demuth, S.; Wood, E.F.; Yuan, X.; Sadri, S.;

Chaney, N.; Guan, K.; Sheffield, J.; Amani, A.; Ogallo, L. A

Drought Monitoring and Forecasting System for Sub-Sahara

African Water Resources and Food Security. Bull. Am. Meteorol.

Soc. 2013, 95, 861–882, doi:10.1175/bams-d-12-00124.1.

25. Hao, Z.; AghaKouchak, A.; Nakhjiri, N.; Farahmand, A. Global

integrated drought monitoring and prediction system. Sci. data

2014, 1, 140001, doi:10.1038/sdata.2014.1.

26. Causes of Drought: What’s the Climate Connection. Available

online: http://www.ucsusa.org (accessed on Jan 19, 2018).

27. Sruthi, S.; Aslam, M.A.M. Agricultural Drought Analysis Using

the NDVI and Land Surface Temperature Data; a Case Study of

Raichur District. Aquat. Procedia 2015, 4, 1258–1264,

doi:10.1016/j.aqpro.2015.02.164.

28. Raduła, M.W.; Szymura, T.H.; Szymura, M. Topographic

wetness index explains soil moisture better than bioindication

with Ellenberg’s indicator values. Ecol. Indic. 2018, 85, 172–179,

doi:10.1016/j.ecolind.2017.10.011.

29. Park, H.; Lee, D. Disaster Prediction and Policy Simulation for

Evaluating Mitigation Effects Using Machine Learning and

System Dynamics: Case Study of Seasonal Drought in Gyeonggi

Province. J. Korean Soc. Hazard Mitig. 2019, 19, 45–53,

doi:10.9798/KOSHAM.2019.19.1.45.

30. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32,

doi:10.1023/A:1010933404324.

31. Rhee, J.; Im, J. Meteorological drought forecasting for ungauged

areas based on machine learning: Using long-range climate

forecast and remote sensing data. Agric. For. Meteorol. 2017,

237–238, 105–122, doi:10.1016/j.agrformet.2017.02.011.

32. Danandeh Mehr, A.; Kahya, E.; Özger, M. A gene-wavelet

model for long lead time drought forecasting. J. Hydrol. 2014,

517, 691–699, doi:10.1016/j.jhydrol.2014.06.012.

33. DeChant, C.M.; Moradkhani, H. Analyzing the sensitivity of

drought recovery forecasts to land surface initial conditions. J.

Hydrol. 2015, 526, 89–100, doi:10.1016/j.jhydrol.2014.10.021.

34. Zhu, Y.; Wang, W.; Singh, V.P.; Liu, Y. Combined use of

meteorological drought indices at multi-time scales for

improving hydrological drought detection. Sci. Total Environ.

2016, 571, 1058–1068, doi:10.1016/j.scitotenv.2016.07.096.

35. Yu, C.; Li, C.; Xin, Q.; Chen, H.; Zhang, J.; Zhang, F.; Li, X.;

Clinton, N.; Huang, X.; Yue, Y.; Gong, P. Dynamic assessment

of the impact of drought on agricultural yield and scale-

dependent return periods over large geographic regions. Environ.

Model. Softw. 2014, 62, 454–464,

doi:10.1016/j.envsoft.2014.08.004.

36. Park, S.; Seo, E.; Kang, D.; Im, J. Prediction of Drought on

Pentad Scale Using Remote Sensing Data and MJO Index

through Random Forest over East Asia. 2018, 1–18,

doi:10.3390/rs10111811.

37. AghaKouchak, A. A baseline probabilistic drought forecasting

framework using standardized soil moisture index: Application

to the 2012 United States drought. Hydrol. Earth Syst. Sci. 2014,

18, 2485–2492, doi:10.5194/hess-18-2485-2014.

38. Hao, Z.; Hao, F.; Singh, V.P.; Ouyang, W.; Cheng, H. An

integrated package for drought monitoring, prediction and

analysis to aid drought modeling and assessment. Environ.

Model. Softw. 2017, 91, 199–209,

doi:10.1016/j.envsoft.2017.02.008.

39. Mo, K.C.; Shukla, S.; Lettenmaier, D.P.; Chen, L.C. Do Climate

Forecast System (CFSv2) forecasts improve seasonal soil

moisture prediction? Geophys. Res. Lett. 2012, 39, 1–6,

doi:10.1029/2012GL053598.

40. Schäfer, D.; Samaniego, L.; Kumar, R.; Mai, J.; Thober, S.;

Sheffield, J. Seasonal Soil Moisture Drought Prediction over

Europe Using the North American Multi-Model Ensemble

(NMME). J. Hydrometeorol. 2015, 16, 2329–2344,

doi:10.1175/jhm-d-15-0053.1.

41. Korean Statistical Information Service (KOSIS). Available

online: http://kosis.kr (accessed on May 14, 2018).

42. Earth Explorer Available online: https://earthexplorer.usgs.gov

(accessed on May 11, 2018).

43. Environmental Geographic Information Service (EGIS).

Available online: https://egis.me.go.kr (accessed on May 2,

2018).

44. Huete, A.; Didan, K.; Miura, T.; Rodriguez, E.P.; Gao, X.;

Ferreira, L.G. Overview of the radiometric and biophysical

performance of the MODIS vegetation indices. Remote Sens.

Environ. 2002, 83, 195–213, doi:10.1016/S0034-4257(02)00096-2.

45. Handbook of drought indicators and indices; World

Meteorological Organization (WMO) & Global Water Partnership

(GWP): Geneva & Stockholm, 2016; ISBN 978-92-63-11173-9.

46. Landsat Surface Reflectance-derived Spectral

Indices; 3.6 version; Department of the Interior U.S. Geological

Survey (USGS), 2017;

47. Tucker, C.J. Red and photographic infrared linear combinations

for monitoring vegetation. Remote Sens. Environ. 1979, 8, 127–

150, doi:10.1016/0034-4257(79)90013-0.

48. Sydney, T. A Soil-Adjusted Vegetation Index (SAVI). Remote

Sens. Environ. 1988, 25, 295–309, doi:10.1016/0034-4257(88)90106-X.

49. Qi, J.; Chehbouni, A.; Huete, A.R.; Kerr, Y.H.; Sorooshian, S. A

modified soil adjusted vegetation index. Remote Sens. Environ.

1994, 48, 119–126, doi:10.1016/0034-4257(94)90134-1.

50. Beven, K.J.; Kirkby, M.J. A physically based, variable

contributing area model of basin hydrology. Hydrol. Sci. Bull.

1979, 24, 43–69, doi:10.1080/02626667909491834.

51. Burrough, P.A.; Mcdonnell, R.A. Data Models and Axioms.

Princ. Geogr. Inf. Syst. 1998, 17–34, doi:10.2307/144481.

52. How slope works Available online:

http://desktop.arcgis.com/en/arcmap/10.3/tools/spatial-analyst-

toolbox/how-slope-works.htm (accessed on Dec 21, 2018).

53. How aspect works. Available online:

http://pro.arcgis.com/en/pro-app/tool-reference/spatial-

analyst/how-aspect-works.htm (accessed on Dec 21, 2018).

54. Gao, B. NDWI—A normalized difference water index for

remote sensing of vegetation liquid water from space. Remote

Sens. Environ. 1996, 58, 257–266, doi:10.1016/S0034-4257(96)00067-3.

55. Xu, H. Modification of normalised difference water index

(NDWI) to enhance open water features in remotely sensed

imagery. Int. J. Remote Sens. 2006, 27, 3025–3033,

doi:10.1080/01431160600589179.

56. Welikhe, P.; Quansah, J.E.; Fall, S.; McElhenney, W. Estimation

of Soil Moisture Percentage Using LANDSAT-based Moisture

Stress Index. J. Remote Sens. GIS 2017, 6, doi:10.4172/2469-4134.1000200.

57. Bands Specifications of Landsat 8 Available online:

https://landsat.usgs.gov/provisional-landsat-8-surface-

reflectance-data-available (accessed on Dec 19, 2018).

58. Mishra, A.K.; Singh, V.P. A review of drought concepts. J.

Hydrol. 2010, 391, 202–216, doi:10.1016/j.jhydrol.2010.07.012.

59. Sandholt, I.; Rasmussen, K.; Andersen, J. A simple

interpretation of the surface temperature/vegetation index space

for assessment of surface moisture status. Remote Sens. Environ.

2002, 79, 213–224, doi:10.1016/S0034-4257(01)00274-7.

60. Zeng, Y.N.; Feng, Z.D.; Xiang, N.P. Assessment of soil moisture

using Landsat ETM+ temperature/vegetation index in semiarid

environment. Ieee Int. Geosci. Remote Sens. Symp. Proc. 2004,

1–7, 4306–4309, doi:10.1109/IGARSS.2004.1370089.

61. Panu, U.S.; Sharma, T.C. Challenges in drought research: some

perspectives and future directions. Hydrol. Sci. J. 2002, 47, S19–

S30, doi:10.1080/02626660209493019.

62. National Drought Mitigation Center(NDMC), Drought Basics.

Available online:

https://drought.unl.edu/Education/DroughtBasics.aspx (accessed

on Dec 26, 2018).

63. Leeuwen, B. Van GIS workflow for continuous soil moisture

estimation based on medium resolution satellite data. Agile 2015.

64. Gregorutti, B.; Michel, B.; Saint-Pierre, P. Correlation and

variable importance in random forests. Stat. Comput. 2017, 27,

659–678, doi:10.1007/s11222-016-9646-1.

65. Thompson, J.A.; Roecker, S.; Grunwald, S.; Owens, P.R.

Chapter 21 - Digital Soil Mapping: Interactions with and

Applications for Hydropedology A2 - Lin, Henry BT -

Hydropedology. In; Academic Press: Boston, 2012; pp. 665–709

ISBN 978-0-12-386941-8.

66. Belgiu, M.; Drăguţ, L. Random forest in remote sensing: A

review of applications and future directions. ISPRS J.

Photogramm. Remote Sens. 2016, 114, 24–31,

doi:10.1016/j.isprsjprs.2016.01.011.

67. Körting, T.S.; Fonseca, L.M.G.; Câmara, G. GeoDMA—

Geographic Data Mining Analyst. Comput. Geosci. 2013, 57,

133–145, doi:10.1016/j.cageo.2013.02.007.

68. Rodriguez-Galiano, V.F.; Ghimire, B.; Rogan, J.; Chica-Olmo,

M.; Rigol-Sanchez, J.P. An assessment of the effectiveness of a

random forest classifier for land-cover classification. ISPRS J.

Photogramm. Remote Sens. 2012, 67, 93–104,

doi:10.1016/j.isprsjprs.2011.11.002.

69. Ham, J.; Chen, Y.; Crawford, M.M.; Ghosh, J. Investigation of

the random forest framework for classification of hyperspectral

data. IEEE Trans. Geosci. Remote Sens. 2005, 43, 492–501,

doi:10.1109/TGRS.2004.842481.

70. Xia, J.; Du, P.; He, X.; Chanussot, J. Hyperspectral Remote

Sensing Image Classification Based on Rotation Forest. IEEE

Geosci. Remote Sens. Lett. 2014, 11, 239–243,

doi:10.1109/LGRS.2013.2254108.

71. Gislason, P.O.; Benediktsson, J.A.; Sveinsson, J.R. Random

Forest classification of multisource remote sensing and

geographic data. In IGARSS 2004. 2004 IEEE International

Geoscience and Remote Sensing Symposium; 2004; Vol. 2, pp.

1049–1052 vol.2.

72. Pal, M. Random forest classifier for remote sensing

classification. Int. J. Remote Sens. 2005, 26, 217–222,

doi:10.1080/01431160412331269698.

73. Cracknell, M.J.; Reading, A.M. Geological mapping using

remote sensing data: A comparison of five machine learning

algorithms, their response to variations in the spatial distribution

of training data and the use of explicit spatial information.

Comput. Geosci. 2014, 63, 22–33,

doi:10.1016/j.cageo.2013.10.008.

74. Ahmad, M.W.; Mourshed, M.; Rezgui, Y. Trees vs Neurons:

Comparison between random forest and ANN for high-

resolution prediction of building energy consumption. Energy

Build. 2017, 147, 77–89, doi:10.1016/j.enbuild.2017.04.038.

75. Rodriguez-Galiano, V.; Sanchez-Castillo, M.; Chica-Olmo, M.;

Chica-Rivas, M. Machine learning predictive models for mineral

prospectivity: An evaluation of neural networks, random forest,

regression trees and support vector machines. Ore Geol. Rev.

2015, 71, 804–818, doi:10.1016/j.oregeorev.2015.01.001.

76. Hayes, M.M.; Miller, S.N.; Murphy, M.A. High-resolution

landcover classification using random forest. Remote Sens. Lett.

2014, 5, 112–121, doi:10.1080/2150704X.2014.882526.

77. Gilardi, N. Comparison of four machine learning algorithms for

spatial data analysis. Evaluation 1995, 1–16.

78. Svozil, D.; Kvasnička, V.; Pospíchal, J. Introduction to multi-

layer feed-forward neural networks. Chemom. Intell. Lab. Syst.

1997, 39, 43–62.

79. Palafox, L.F.; Hamilton, C.W.; Scheidt, S.P.; Alvarez, A.M.

Automated detection of geological landforms on Mars using

Convolutional Neural Networks. Comput. Geosci. 2017, 101,

48–56, doi:10.1016/j.cageo.2016.12.015.

80. Maybank, J.; Bonsai, B.; Jones, K.; Lawford, R.; O’Brien, E.G.;

Ripley, E.A.; Wheaton, E. Drought as a natural disaster.

Atmosphere-Ocean 1995, 33, 195–222,

doi:10.1080/07055900.1995.9649532.

81. Causes of Drought Available online:

https://www.nationalgeographic.org/encyclopedia/drought/

(accessed on Apr 22, 2019).

82. Vitart, F.; Robertson, A.W. Chapter 1 - Introduction: Why Sub-

seasonal to Seasonal Prediction (S2S)? In Sub-Seasonal to

Seasonal Prediction; Robertson, A.W., Vitart, F., Eds.; Elsevier,

2019; pp. 3–15 ISBN 978-0-12-811714-9.

83. Gyeonggi Province Statistics Available online:

https://www.gg.go.kr/ggstat (accessed on May 14, 2018).

84. Gao, B.C. NDWI - A normalized difference water index for

remote sensing of vegetation liquid water from space. Remote

Sens. Environ. 1996, 58, 257–266, doi:10.1016/S0034-4257(96)00067-3.

85. Liaw, A.; Wiener, M. Classification and Regression by

randomForest. R news 2002, 2, 18–22,

doi:10.1177/154405910408300516.

86. Breiman, L.; Friedman, J.H.; Olshen, R.A.; Stone, C.J.

Classification and regression trees. Wadsworth and Brooks 1984.

87. Liu, M.; Wang, M.; Wang, J.; Li, D. Comparison of random

forest, support vector machine and back propagation neural

network for electronic tongue data classification: Application to

the recognition of orange beverage and Chinese vinegar. Sensors

Actuators, B Chem. 2013, 177, 970–980,

doi:10.1016/j.snb.2012.11.071.

88. Claesen, M.; De Moor, B. Hyperparameter Search in Machine

Learning. In The XI Metaheuristics International Conference;

Agadir, 2015; pp. 10–14.

89. Li, H.; Xu, Z.; Taylor, G.; Studer, C.; Goldstein, T. Visualizing

the Loss Landscape of Neural Nets. arXiv e-prints 2017,

arXiv:1712.09913.
