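Attribution-NonCommercial-NoDerivs 2.0 Korea
Users are free to copy, distribute, transmit, display, perform, and broadcast this work, provided that they follow the conditions below.
The following conditions must be followed:
Attribution. You must attribute the original author.
NonCommercial. You may not use this work for commercial purposes.
NoDerivs. You may not alter, transform, or build upon this work.
For any reuse or distribution, you must make clear the license terms applied to this work.
These conditions do not apply if you obtain separate permission from the copyright holder.
Your rights under copyright law are not affected by the above.
This is a human-readable summary of the Legal Code.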
PhD Dissertation of Engineering
Prediction Model of Soil Drought
Distribution Using Machine Learning
Algorithms and Geospatial Data
February 2020
Graduate School of Seoul National University Interdisciplinary Program in Landscape Architecture
HaeKyung Park
ABSTRACT
Prediction Model of Soil Drought Distribution Using Machine Learning Algorithms and Geospatial Data
HaeKyung Park
Interdisciplinary Program in Landscape Architecture,
Graduate School, Seoul National University
Supervised by Professor DongKun Lee
This study presents the entire process of establishing a water
management policy based on scientific methods through drought
prediction. Accordingly, this thesis includes the development of the
severe drought area prediction (SDAP) model, verification of the
algorithms used in the proposed model, and an application of the
proposed model to policymaking. The core technology of the SDAP
model is the convergence of machine learning and geospatial science,
which makes it possible to use geospatial data instead of tabular data to
visualize prediction results.
The SDAP model was developed by considering the importance
and difficulty of forecasting short-term droughts and the allocation of
priorities for rapid water supply in terms of the related policy. The
background of the model’s development began from the fact that, despite
the advancements of science and technology, it has become more
difficult to predict the probability of precipitation and prepare for
droughts based on this probability due to the increasingly abnormal
climate that has been associated with global warming. In fact, during
2015–17, Korea suffered from unpredictable, severe spring droughts.
Such droughts are predicted to increase due to the fluctuation of
precipitation as warming increases according to “Global Warming
1.5 °C”, a special report by the Intergovernmental Panel on Climate
Change (IPCC). Thus, water management policy has become
increasingly important over recent decades. In particular, rural areas are
directly adversely affected during drought periods; hence, if soil
droughts are not quickly resolved, crop damage could affect economic
inflation and human life. The US National Drought Mitigation Center
has recommended that water supply priorities be set before the
occurrence of droughts and that they be implemented immediately when
a drought commences in order to minimize damage. However, accurate
drought predictions that are based on probabilistic precipitation are
difficult because short-term droughts (i.e., lasting from several weeks to
months) are at the boundary between weather and climate.
The SDAP model can estimate the spatial distribution of a soil
drought in advance by assuming the subsequent lack of rainfall over the
short-term as opposed to the “yes/no” prediction of a drought. The
characteristics of the SDAP model enable it to predict future droughts by
training the actual past droughts by machine learning using satellite
imagery and topographic data without precipitation data. Prediction
results by the SDAP model therefore assist in the selection of water
supply priority areas through the provision of visualized maps of the
relative severity of a soil drought. In addition, overlaying these maps
with water resources (reservoirs and groundwater) or land use maps can
also help to rearrange priorities in consideration of local conditions.
The study area in this research is the southern part of Gyeonggi
Province, in the metropolitan area of Korea, which has experienced droughts
that are understood to be related to climate change. Python was used as
the programming language to develop the SDAP model. Each chapter of
this thesis consists of stand-alone papers with subtitles as follows.
Chapter 1: “Prediction of Severe Drought Area Based on
Random Forest: Using Satellite Image and Topography Data”. This
chapter covers the details of the SDAP model design, the consideration
of training areas, and the model’s coverage, advantages, and limitations.
The distribution of a soil drought is expressed as the soil moisture index
(SMI), a floating-point value between 0 and 1. The model development
began with the idea that machine learning might be able to learn the
relationships between soil moisture and surface environments (e.g.,
vegetation, topography, water, and temperature) during a drought.
Fifteen input variables corresponding to the surface environment were
generated using Landsat-8 imagery and a digital elevation model (DEM).
The training method is a form of supervised learning because it uses the
SMI after a period of 3 months without rainfall as the output variable. As a
result, the trained drought function (R² = 0.91) predicted the SMI
distribution after 3 months of no rain with a performance of R² = 0.58
using current Landsat-8 images and a DEM of the study area. The prediction
performance was somewhat lower than the training performance, but the
spatial patterns were similar to the actual SMI after the actual droughts.
Thus, the SDAP model could predict the areas likely to be more severely
affected if a drought occurred.
Chapter 2: “Tree vs. Network: Which Is Better Machine Learning
Algorithm for Regression Prediction When Using Remote Sensing Data?”
In this chapter, in order to verify the random forest algorithm in the
SDAP model, the method is compared with the multi-layer perceptron
method, which is well known among artificial neural network algorithms
as a non-linear regression method for prediction. Furthermore, an
attempt is made to explain why random forest is the mainstream algorithm
in the remote sensing field, unlike in other fields. To this end, the
15 training variables were divided into groups according to data type,
and the training performance of the groups was compared. The analysis
showed lower performance for data groups whose range was either very
small (-1 to 1) or very large (0 to 10,000) (e.g., map indexes or
reflectance values from satellite images) because the multi-layer
perceptron, being based on neural networks, was sensitive to data ranges.
In contrast, the random forest algorithm performed better because it is
based on decision trees, which are independent of data range and units.
Therefore, the random forest algorithm was verified to be more suitable
for the SDAP model, which applies machine learning to satellite images.
Chapter 3: “Is Water Pricing Policy Adequate to Reduce Water
Demand for Drought Mitigation in Korea?”. In this chapter, the process
of planning policy by applying the SDAP model is suggested in detail.
The two models were run independently: the SDAP model predicted the drought
severity, and a system dynamics model simulated water-use savings by
assuming a water price increase that targeted the severe drought area.
Three policy scenarios were simulated using data that included
individual daily water usage, population, the water source location, tap
water prices, and the residential water price elasticity index. The results
showed that a visible effect appeared only after 3 months of implementation,
which means that little water was saved during the drought itself, when
a water supply was required, and that the policy was not
effective. Where a price-based policy is not effective, as in Korea, this
suggests that the water control policy needs to be supplemented by extending
the system dynamics model to include non-price factors.
In conclusion, appropriate analytical methods should be used for
the given purpose of a study. Traditional statistics such as linear
regression are useful for understanding causality, and machine learning
is powerful for predictive purposes. Therefore, the black box problem of
machine learning is not an important consideration because the SDAP
model aims to establish a policy based on prediction rather than the cause
and understanding of droughts. Also, system dynamics is a time-efficient
method for studying policies whose effects take a long time to
confirm. Although the validation of simulation results from system
dynamics is disputed, no means exist to completely
verify the real world. Thus, the simulated trend is worth reporting even if
the resulting values themselves are set aside. The results in Chapter 2 are
meaningful in that they show the random forest method can be considered first
when using satellite imagery and machine learning for regression prediction.
In Chapter 3, the application of the model is meaningful because it shows
the entire process from the use of the model to the establishment of the
water management policy. While existing models or analyses have stated
only simple policy applicability, this thesis is significant because it
shows the whole policy-making process based on scientific methods.
Keywords: machine learning, random forest, artificial neural networks,
prediction, system dynamics, policy simulation, drought mitigation
Student number: 2016-32140
Publications
Please note that Chapters 1-3 of this dissertation proposal were written as stand-alone papers (see below), and therefore there is some repetition in the methods and results.
Chapter 1
Park, Haekyung, Kyungmin Kim, and Dong Kun Lee. “Prediction of
Severe Drought Area Based on Random Forest: Using Satellite Image
and Topography Data.” Water 11, no. 4 (2019). https://doi.org/10.3390/w11040705.
(Published, Patent No. 10-2019-0053212)
Chapter 2
Park, Haekyung and Dong Kun Lee. “Comparison of Prediction
Performance for Soil Drought Distribution Using Satellite Image:
Random Forest (RF) and Multi-layer Perceptron (MLP)”
(Submission in progress)
Chapter 3
Park, Haekyung, and Dong Kun Lee. “Is Water Pricing Policy Adequate
to Reduce Water Demand for Drought Mitigation in Korea?” Water 11,
no. 6 (2019): 1256. https://doi.org/10.3390/w11061256.
(Published)
Contents
INTRODUCTION
Chapter 1. MODEL DESIGN
Subtitle: Prediction of Severe Drought Area Based on Random Forest
using Satellite Image and Topography Data
1. Introduction
2. Methodology
3. Results
4. Discussion
5. Conclusion
Chapter 2. MODEL ALGORITHM
Subtitle: Tree vs. Network: Which is better Machine Learning Algorithm
for Regression Prediction when using Remote Sensing Data?
1. Introduction
2. Methodology
3. Results
4. Discussion
5. Conclusion
Chapter 3. APPLICATION to POLICY
Subtitle: Is Water Pricing Policy Adequate to Reduce Water Demand for
Drought Mitigation in Korea?
1. Introduction
2. Methodology
3. Results
4. Discussion
5. Conclusion
CONCLUSION
REFERENCES
List of Tables
Chapter 1
Table 1. Feature categories and their input variables.
Table 2. Evaluation of the training performance of the drought function.
Table 3. The importance of land surface factor on drought function.
Chapter 2
Table 1. Data description
Table 2. Hyperparameter ranges for RF and MLP regression.
Table 3. Training groups according to data characteristics.
Table 4. Optimal hyperparameters of machine learning using RF and MLP
Chapter 3
Table 1. Variables of the SDAP model for prediction severe drought area.
Table 2. Variables of the SD model for policy simulation.
Table 3. Quantity of available water for drought mitigation.
List of Figures
Chapter 1
Figure 1. Study and training areas for short-term agricultural drought
prediction.
Figure 2. Structure of the severe drought area prediction (SDAP) model.
xti: input variable of the training area; y: actual soil moisture index (SMI);
xpi: input variable of the study area; and ŷp: predicted SMI of the study
area.
Figure 3. Verification of the training performance and error distribution.
Figure 4. Feature importance of drought function.
Figure 5. Final predicted SMI map and actual SMI.
Figure 6. Scatter plot between the actual and the predicted SMI of the
study area.
Figure 7. (a) Relation between the actual and predicted SMI over an area
150 times (8249 km²) larger than the training area; (b) Distance from the
training area and distribution of erroneous samples.
Chapter 2
Figure 1. SDAP model structure and analysis sequence in this study.
Figure 2. Study and training area to estimate soil drought distribution.
Figure 3. Structure of RF tree for regression.
Figure 4. Structure of MLP for regression.
Figure 5. Training performance of regression model using RF and MLP
Figure 6. Drought prediction performance using RF and MLP.
Figure 7. Actual drought and predictions using RF and MLP
Figure 8. Comparison of training performance according to groups with
different data.
Chapter 3
Figure 1. Location of the study area.
Figure 2. Framework of linked the severe drought area prediction
(SDAP) model and the system dynamics (SD) model.
Figure 3. Structure of the SDAP Model.
Figure 4. Structure of the SD model
Figure 5. Predicted agricultural drought map (excluding for the
impervious areas) after non-rainfall during the three-month period from
22 March 2018 to 26 June 2018.
Figure 6. Changes in the amount of the water demand by increased water
rate.
Figure 7. The SD model included price and non-price factors for saving
residential water.
According to “Global Warming 1.5 °C”, the special report of the
Intergovernmental Panel on Climate Change (IPCC), as warming
progresses, abnormal weather and precipitation fluctuations occur [1];
thus, flood and drought will increase. Such events are already occurring
worldwide. For example, during 2012–2016, California suffered an
unprecedented drought [2], and during 2015–2017, Korea also
experienced damage from an unexpected drought [3].
During drought conditions, rural areas face considerable
problems that are not experienced by urban areas [4]. Hence, if soil
droughts are not quickly resolved, crop damage could affect economic
inflation and human life [5]. The US National Drought Mitigation Center
has recommended that water supply priorities be set before the
occurrence of droughts and that they be implemented immediately when
a drought commences in order to minimize damage [6]. However,
accurate drought predictions that are based on probabilistic precipitation
are difficult because short-term droughts (i.e., lasting from several weeks
to months) are at the boundary between weather and climate [7]. The
SDAP model was developed by considering the importance and
difficulty of forecasting short-term droughts and the allocation of
priorities for rapid water supply in terms of the related policy [8].
The SDAP model can estimate the spatial distribution of a soil
drought in advance by assuming the subsequent lack of rainfall over the
short-term as opposed to the “yes/no” prediction of a drought [8]. The
characteristics of the SDAP model enable it to predict future droughts by
training the actual past droughts by machine learning using satellite
imagery and topographic data without precipitation data [8]. Prediction
results by the SDAP model therefore assist in the selection of water
supply priority areas through the provision of visualized maps of the
relative severity of a soil drought. In addition, overlaying these maps
with water resources (reservoirs and groundwater) or land use maps can
also help to rearrange priorities in consideration of local conditions.
The core technology of the SDAP model is the convergence of
machine learning and geospatial science, which makes it possible to use
geospatial data instead of tabular data to visualize prediction results.
Python was used as the programming language to develop the model.
The selected study area is southern Gyeonggi Province in Korea, which
has experienced severe drought due to climate change, even though it is
not a drought area historically. Each chapter of this thesis consists of
stand-alone papers with subtitles as follows.
• Chapter 1 is the design section of the SDAP model, and deals
with the random forest algorithm, model learning of drought,
model coverage, and the advantages and limitations of the model.
• Chapter 2 is the verification section of the algorithm used in the
SDAP model, which was achieved by comparing the
performance between the random forest and multi-layer
perceptron. In addition, analysis was performed to determine the
data characteristics and the reason why the remote sensing data
was more advantageous in random forest.
• Chapter 3 is the SDAP model application section where the
SDAP model is linked to the system dynamics (SD) model. The
SDAP model predicts the severe drought area and the SD model
then simulates the policy effectiveness targeting this area.
MODEL DESIGN
Subtitle: Prediction of Severe Drought Area Based on Random
Forest using Satellite Image and Topography Data
1. Introduction
The ultimate objective of drought prediction is to prepare a
mitigation plan in advance, rather than resolve intellectual curiosity
about nature. Drought forecasting plays an important role in mitigating
the negative effects of drought [9]; hence, various approaches for
predicting droughts have constantly been attempted, such as stochastic
methods, combined statistical and dynamical models, categorical
prediction, machine learning approaches, and hybrid models [10–14].
However, drought prediction is still challenging because, in addition to
precipitation deficit, complicated interactions among other variables,
such as temperature, evapotranspiration, land surface processes, and
human activities, also contribute to droughts [6,9,15]. Further,
meteorological anomalies due to climate change have made it
increasingly difficult to predict precipitation, which forms the basis for
forecasting drought.
Agricultural droughts determined by soil moisture must be
predicted several months ahead for proper and rapid resource allocation
[9], because this allocation can mitigate the effects of upcoming droughts
by supplying timely water and guaranteeing suitable crop growth and
availability of food resources. Droughts are generally classified into four
categories, namely, meteorological, hydrological, agricultural, and
socio-economic droughts [16].
In recent years, the prevalence of machine learning
methodologies and the frequency of droughts and floods around the world have
increased efforts to predict agricultural drought. However, the
prediction uncertainties caused by the combination of meteorological
factors [17–20] remain a problem. According to the
Intergovernmental Panel on Climate Change (IPCC) AR5 guidance note,
the complex use of different models, complexity of models, and
inclusion of additional processes in the analysis are the main reasons for
the increase in uncertainties [21]. Thus, the complex models used for the
integrated analysis of meteorological and agricultural drought have
increased the uncertainty in the results [21]. However, agricultural
drought prediction methods that do not include spatial information and
are based only on precipitation, such as those put forth in [19,22], cannot
predict the spatial distribution of agricultural drought. Another difficulty
in predicting agricultural drought stems from reliance on
meteorological patterns, such as patterns of past precipitation. Even
well-designed agricultural drought models based on meteorological factors
cannot make accurate predictions because climate change shifts the existing
patterns. According to a study of agricultural
forecasting models from 2007 to 2017 (see Table 1 in [23]),
precipitation was used as an input variable in almost all the models.
Previous studies have attempted to develop drought prediction
models by using remote sensing data. For example, Sheffield et al. [24]
applied a downscaling technique to predict seasonal droughts by
combining hydrological and agricultural models. However, such
prediction models are difficult to use because they require decades of
weather data. Hao et al. [25] designed a system that predicts soil moisture
and severe drought affected areas by focusing on seasonal droughts,
which is similar to the approach used in our model. However, because of
its global scale and large system, it is difficult to use for developing
regional drought mitigation plans.
Therefore, alternative agricultural drought-forecasting methods
that are easy to use and scalable are required to realize practical drought
mitigation plans. Moreover, these methods must consider the uncertainty
of the precipitation forecasts and spatial information. To achieve this
objective herein, we designed a model, called the severe drought area
prediction (SDAP) model, to estimate soil moisture index (SMI) maps
after several months using surface factors. All variables were composed
of surface factors derived from remote sensing data, which are related to
soil moisture, to obtain spatial information of agricultural drought.
The SDAP model predicts the area of severe agricultural drought
as preparatory information to help mitigate agricultural drought
when there is a possibility of meteorological drought. This model
was proposed to help plan prioritized, rapid water allocation to
the areas most at risk when a drought occurs.
Therefore, prediction, in the SDAP model, does not imply a prediction
of the occurrence of drought. Rather, it provides information on the
future development and evolution of agricultural drought, assuming that
rainfall-deficit conditions continue to prevail, in preparation for drought
mitigation plans. We believe this is a realistic approach that avoids the
uncertainty problem associated with meteorological drought forecasting.
In order to design the SDAP model, we classified the land surface
factors that affect the soil moisture into four categories, which are
vegetation, topographic, thermal, and water factors during the non-
rainfall period. Thermal increase is one of the core factors that increases
the risk of agricultural drought by causing loss of soil moisture due to
increased evaporation [26], while the vegetation factor delays the loss of
soil moisture by slowing the increase in surface heat [27]. The
topography factor is another important determinant of soil moisture [28].
The water content status is also a factor related to the soil moisture
remaining after the drought period [9]. These environmental factors,
which are used as the initial land conditions as a predictor for drought
forecasting [9], are regressed on the soil moisture after the non-rainfall
period and can then predict soil moisture during short-term drought [29].
In this regard, our model is designed to regress 15 input variables
(derived from four surface factors) to the soil moisture index using the
random forest (RF) algorithm.
The RF algorithm, proposed by Breiman [30], is an ensemble
technique that uses an average of the multiple decision trees to reduce
overfitting and lead to sound regression performance. Furthermore, the
RF algorithm is advantageous for the analysis of large datasets; thus, it
is suitable for analysis involving satellite images. It generates a tree-
based regression function; hence, the scale adjustment between different
features is not necessary, which is a considerable advantage in cases
involving multiple features with different scales, as is the case in the
current study.
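The scale independence noted above can be illustrated with a minimal sketch. It assumes scikit-learn's RandomForestRegressor as the RF implementation (the thesis does not name its library here) and uses synthetic features in place of the real variables: one index-like feature in the range -1 to 1 and one reflectance-like feature in the range 0 to 10,000. Rescaling a feature by a positive constant should leave the forest's predictions unchanged.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic stand-ins (hypothetical): an index-like feature on a 0.01 grid
# in [-1, 1] and a reflectance-like feature with integer values in [0, 10000].
index_like = rng.integers(-100, 101, size=400) / 100.0
reflectance_like = rng.integers(0, 10001, size=400).astype(float)
X = np.column_stack([index_like, reflectance_like])
y = 0.5 + 0.3 * index_like - 0.00002 * reflectance_like + rng.normal(0, 0.01, 400)

# Same data with the first feature multiplied by 8192 (a power of two, so the
# rescaling is exact in floating point).
X_rescaled = X.copy()
X_rescaled[:, 0] *= 8192.0

def fit_predict(features):
    """Fit on the first 300 samples and predict the remaining 100."""
    model = RandomForestRegressor(n_estimators=50, random_state=0)
    model.fit(features[:300], y[:300])
    return model.predict(features[300:])

pred_raw = fit_predict(X)
pred_rescaled = fit_predict(X_rescaled)

# Tree splits depend only on the ordering of feature values, so both fits
# partition the samples in the same way.
same = np.allclose(pred_raw, pred_rescaled)
print("Predictions unchanged by rescaling:", same)
```

A network-based regressor would generally need the two features brought to a comparable range first, which is the preprocessing step that the tree-based approach avoids.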
The novelty of this study is as follows. First, we used simple data:
we derived the drought function from a selected training area in
years when drought occurred. In contrast, other studies (see [10,12,31–36])
trained on the entire study area or used multiple years of data. Second, once
created, the drought function can be reused for the same season
and local conditions. This drought function is based on the regression
information for soil moisture according to the land surface and climatic
conditions in the area during a non-rainfall period. Third,
precipitation data were not used because a rainfall deficit was assumed.
Fourth, we used topographical data, which are an important factor for
soil moisture at the local scale, as variables. Previously developed
agricultural drought forecasting models [13,37–40] were mostly
analyzed at the global scale and mainly used satellite imagery. In
contrast, our SDAP model adds terrain data to satellite imagery to
fit the model to the local scale.
To design the SDAP model, the following approach was used.
We first defined four surface factors that affect the maintenance and
decrease of the SMI during non-rainfall periods. We then selected input
variables corresponding to these categories of surface factors. We generated
a drought function with the RF algorithm, using the input variables and the
SMI three months later, to train the model on agricultural drought.
Additionally, we identified the order of feature importance among the input
variables (features) in drought training (regression). The drought
function f(x) was then used to predict the SMI of the study area. We verified
the training and prediction performance of the drought function using the
actual SMI values.
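The workflow above can be sketched as follows. This is a minimal illustration rather than the thesis code: it assumes scikit-learn's RandomForestRegressor, uses small random arrays in place of the Landsat-8 and DEM rasters, and introduces hypothetical names (stack_to_table, synthetic_smi, train_stack, study_stack). The raster stack is flattened to a pixel-by-feature table for training, and the predictions are reshaped back into a map.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

N_FEATURES = 15  # number of input variables in the SDAP model

def stack_to_table(stack):
    """Flatten a (rows, cols, features) raster stack to (pixels, features)."""
    rows, cols, n_features = stack.shape
    return stack.reshape(rows * cols, n_features)

def synthetic_smi(stack):
    """Hypothetical relation between surface factors and later SMI, in [0, 1]."""
    signal = 0.5 + 0.3 * stack[..., 0] - 0.2 * stack[..., 1]
    return np.clip(signal, 0.0, 1.0)

rng = np.random.default_rng(42)

# Synthetic stand-ins for the training area and the (larger) study area.
train_stack = rng.uniform(0.0, 1.0, size=(40, 40, N_FEATURES))
study_stack = rng.uniform(0.0, 1.0, size=(60, 80, N_FEATURES))
train_smi = synthetic_smi(train_stack)   # SMI three months after the inputs
study_smi = synthetic_smi(study_stack)   # "actual" SMI used for verification

# Train the drought function f(x) on the training area.
drought_function = RandomForestRegressor(n_estimators=100, random_state=0)
drought_function.fit(stack_to_table(train_stack), train_smi.ravel())

# Rank the input variables by their importance in the regression.
importance_order = np.argsort(drought_function.feature_importances_)[::-1]

# Predict the SMI of the study area and reshape the result into a map.
predicted = drought_function.predict(stack_to_table(study_stack))
predicted_map = predicted.reshape(study_stack.shape[:2])

# Verify training and prediction performance against the actual SMI.
train_r2 = r2_score(train_smi.ravel(),
                    drought_function.predict(stack_to_table(train_stack)))
predict_r2 = r2_score(study_smi.ravel(), predicted)
print(f"training R2 = {train_r2:.2f}, prediction R2 = {predict_r2:.2f}")
print("most important feature indices:", importance_order[:3])
```

Keeping the row and column counts of the study stack is what allows the one-dimensional prediction vector to be written back as a georeferenced SMI map.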
2. Methodology
2.1. Study and Training Area
Our study area covered 12 administrative districts in the western
part of Korea, where irregular droughts have occurred in spring since
mid-2010. This area, located between 36°50’ N and 37°35’ N latitude
and 126°30’ E and 127°35’ E longitude (Figure 1), was severely
affected by droughts in the springs of 2014–2017.
Figure 1. Study and training areas for short-term agricultural drought
prediction.
Within the study area, the area used for training drought was
selected based on the following criteria. First, the area with the least
precipitation was preferred in order to minimize the influence of
meteorological factors. To that end, we reviewed all of the
precipitation data [41] from the weather-observation points
in the study area. Second, the area needed to include
various types of land cover. Its extent also needed to include enough
samples to train the drought function.
In this study, we used only a part of the study area as the training
sample because, over the entire study area, the influence of meteorological
effects cannot be controlled and the representativeness of the drought
function cannot be guaranteed. In addition, using the entire study area
increases the number of training samples, which significantly increases
the learning time and makes training inefficient.
From among the candidate areas for training drought, we
extracted a square region of 7.5 km × 7.5 km (56 km², Figure 1) as the
final training area. The study area (approximately 3575 km²) is 20 times
larger than the training area.
2.2. Data and Trained Drought Period
We used Landsat-8 (Level 1) and the Shuttle Radar Topography
Mission (SRTM) 30 m DEM (ARC 1) data downloaded from the United
States Geological Survey (USGS) EarthExplorer [42]. In addition, the
land cover data for selecting the training area were retrieved from
Korea’s Environmental Information Service [43]. The analysis software
was ArcGIS Pro, and the programming language was
Python 3.6.1 on a 64-bit Windows 10 platform.
The drought used for training in our study
was a spring drought of approximately 3 months (97 days), corresponding
to the period from March 19, 2017 to June 23, 2017. This period was
determined by considering the dates of the Landsat-8 images, the beginning
of the drought, and the time just before the drought ended due to rain. The
spring 2017 drought was the worst since 2010; precipitation was less than 25%
of the normal annual precipitation in the study area.
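The 97-day figure appears to count both the first and the last day of the period. A quick check with Python's standard datetime module, offered only as a sanity check on the stated dates:

```python
from datetime import date

start = date(2017, 3, 19)   # date of the input-variable image
end = date(2017, 6, 23)     # date of the SMI output image, before the rain

elapsed_days = (end - start).days     # days between the two dates
inclusive_days = elapsed_days + 1     # counting both end dates
print(elapsed_days, inclusive_days)
```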
2.3. Variables
We defined four categories of surface factors (i.e., vegetation,
topographic, water, and thermal factors) that can affect
soil moisture during short-term drought periods based on previous
studies. We then selected 15 input variables corresponding to the
operational definitions of these four categories. Table 1
presents these variables with their abbreviations.
Table 1. Feature categories and their input variables.

Land surface factors | Input variables (15) | Abbreviation | Data
---------------------|----------------------|--------------|-----
Vegetation 1 | Enhanced vegetation index [44–46] | EVI | Band 2, 4, 5
Vegetation 1 | Normalized difference vegetation index [45–47] | NDVI | Band 4, 5
Vegetation 1 | Soil-adjusted vegetation index [45,46,48] | SAVI | Band 4, 5
Vegetation 1 | Modified soil-adjusted vegetation index [46,49] | MSAVI | Band
Topography 2 | Topographic wetness index [50] | TWI | SRTM DEM
Topography 2 | Slope [51,52] | - | SRTM DEM
Topography 2 | Aspect [51,53] | - | SRTM DEM
Water 3 | Normalized difference moisture index [46,54] | NDMI | Band 5, 6
Water 3 | Modification of normalized difference water index [55] | MNDWI | Band 3, 6
Water 3 | Moisture stress index [56] | MSI | Band 5, 6
Thermal 4 | Near infrared [57] | NIR | Band 5
Thermal 4 | Short-wavelength infrared 1 [57] | SWIR1 | Band 6
Thermal 4 | Short-wavelength infrared 2 [57] | SWIR2 | Band 7
Thermal 4 | Thermal infrared 1 [57] | TIRS1 | Band 10
Thermal 4 | Thermal infrared 2 [57] | TIRS2 | Band 11

1 This factor limits soil moisture loss; 2 this factor is affecting soil moisture content; 3 this factor is related to surface water; 4 this factor is related to the soil moisture evaporation.
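We calculated all the input variables as follows, according to the
USGS guidelines [46] and the references shown in Table 1. The thermal
factor consists of five Landsat-8 bands, used without further
calculation.
$$\mathrm{EVI} = 2.5 \times \frac{\mathrm{NIR} - \mathrm{Red}}{\mathrm{NIR} + 6.0 \times \mathrm{Red} - 7.5 \times \mathrm{Blue} + 1.0} \tag{1}$$
$$\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{Red}}{\mathrm{NIR} + \mathrm{Red}} \tag{2}$$
$$\mathrm{SAVI} = \frac{\mathrm{NIR} - \mathrm{Red}}{\mathrm{NIR} + \mathrm{Red} + 0.5} \times 1.5 \tag{3}$$
$$\mathrm{MSAVI} = \frac{2 \times \mathrm{NIR} + 1 - \sqrt{(2 \times \mathrm{NIR} + 1)^{2} - 8 \times (\mathrm{NIR} - \mathrm{Red})}}{2} \tag{4}$$
$$\mathrm{TWI} = \ln\left(\frac{a}{\tan b}\right) \tag{5}$$
where a (m2) is the upstream contributing area and b is the slope.
$$\mathrm{NDMI} = \frac{\mathrm{NIR} - \mathrm{SWIR1}}{\mathrm{NIR} + \mathrm{SWIR1}} \tag{6}$$
$$\mathrm{MNDWI} = \frac{\mathrm{Green} - \mathrm{SWIR1}}{\mathrm{Green} + \mathrm{SWIR1}} \tag{7}$$
where Green is a green wavelength band, which is band 3 in the
Landsat-8 images.
$$\mathrm{MSI} = \frac{\mathrm{MidIR}}{\mathrm{NIR}} \tag{8}$$
where MidIR is the mid-infrared band (band 6, SWIR1, according to
Table 1).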
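The index formulas in Equations (1)–(8) can be sketched in a few lines of Python. This is a minimal illustration, not the processing chain used in the study (which relied on ArcGIS Pro): it assumes reflectance values already scaled to the 0–1 range, a slope given in degrees, and per-pixel scalar inputs; the function names are chosen here for illustration only.

```python
import math

# Assumption: all band inputs are surface reflectance scaled to 0-1.

def evi(nir, red, blue):
    """Enhanced vegetation index, Equation (1)."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)

def ndvi(nir, red):
    """Normalized difference vegetation index, Equation (2)."""
    return (nir - red) / (nir + red)

def savi(nir, red, soil_factor=0.5):
    """Soil-adjusted vegetation index, Equation (3)."""
    return (nir - red) / (nir + red + soil_factor) * (1.0 + soil_factor)

def msavi(nir, red):
    """Modified soil-adjusted vegetation index, Equation (4)."""
    root = math.sqrt((2.0 * nir + 1.0) ** 2 - 8.0 * (nir - red))
    return (2.0 * nir + 1.0 - root) / 2.0

def twi(contributing_area, slope_deg):
    """Topographic wetness index, Equation (5); slope assumed in degrees."""
    return math.log(contributing_area / math.tan(math.radians(slope_deg)))

def ndmi(nir, swir1):
    """Normalized difference moisture index, Equation (6)."""
    return (nir - swir1) / (nir + swir1)

def mndwi(green, swir1):
    """Modification of normalized difference water index, Equation (7)."""
    return (green - swir1) / (green + swir1)

def msi(swir1, nir):
    """Moisture stress index, Equation (8), with SWIR1 as the mid-infrared band."""
    return swir1 / nir

# Example pixel with made-up reflectance values.
pixel = {"blue": 0.05, "green": 0.08, "red": 0.10, "nir": 0.50, "swir1": 0.30}
pixel_ndvi = ndvi(pixel["nir"], pixel["red"])
pixel_msavi = msavi(pixel["nir"], pixel["red"])
```

For whole Landsat-8 scenes the same expressions would normally be applied to arrays rather than scalars; the scalar form is kept here so that each formula can be checked by hand.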
Soil moisture can be estimated using various methods and indices,
such as Soil Moisture Anomaly (SMA), Evapotranspiration Deficit
Index (ETDI), Soil Moisture Deficit Index (SMDI), and Soil Water
Storage (SWS) [45]. Accordingly, the appropriate indices must be
selected depending on the purpose of the drought analysis [58]. We
reviewed several methods for obtaining the soil moisture index (SMI)
and found that the SMI derived by Sandholt et al. [59], which is
calculated from the NDVI and land surface temperature (LST) of
moderate-resolution satellite images such as Landsat-8, was the most
appropriate for short-term drought prediction between spring and
autumn. Welikhe et al. [56] suggested that this SMI calculation is
particularly advantageous in estimating soil moisture during growing
periods. The results of their study indicated that this SMI showed the
highest correlation with real soil moisture at a depth of 20 cm. The
formula used to calculate the SMI is presented as follows [59,60]:
$$\mathrm{SMI} = \frac{T_{s,\max} - T_{s}}{T_{s,\max} - T_{s,\min}} \tag{9}$$
where $T_{s,\max}$ and $T_{s,\min}$ are the maximum and minimum surface
temperature observations for a given NDVI, and $T_{s}$ is the observed
surface temperature.
Subsequently, we constructed a drought function via RF, using the
aforementioned 15 input variables of the training area on March 19,
2017 and the SMI output variable on June 23, 2017. Figure 2 shows the
structure of the SDAP model, including generation of the drought
function and prediction of the drought-severe area.
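The SMI in Equation (9) can be sketched as follows. The per-pixel formula follows the equation directly; the way the temperature limits are obtained here (taking the minimum and maximum surface temperature within fixed-width NDVI bins) is a simplifying assumption made for illustration, and the cited method may estimate these limits differently. The function names and the bin width of 0.1 are hypothetical.

```python
import math

def smi(ts, ts_max, ts_min):
    """Equation (9): soil moisture index from surface temperature limits."""
    return (ts_max - ts) / (ts_max - ts_min)

def smi_by_ndvi_bin(ndvi_values, ts_values, bin_width=0.1):
    """Compute SMI per pixel, taking Ts limits from pixels in the same NDVI bin.

    Assumption: the limits 'for a given NDVI' are approximated by the
    min/max surface temperature among pixels falling in one NDVI bin.
    """
    limits = {}
    for v, t in zip(ndvi_values, ts_values):
        key = math.floor(v / bin_width)
        lo, hi = limits.get(key, (t, t))
        limits[key] = (min(lo, t), max(hi, t))
    result = []
    for v, t in zip(ndvi_values, ts_values):
        lo, hi = limits[math.floor(v / bin_width)]
        # A bin with a single temperature has no usable range.
        result.append(smi(t, hi, lo) if hi > lo else float("nan"))
    return result

# Made-up NDVI and surface temperature (K) values for five pixels.
example_ndvi = [0.21, 0.25, 0.28, 0.61, 0.65]
example_ts = [310.0, 300.0, 305.0, 295.0, 290.0]
example_smi = smi_by_ndvi_bin(example_ndvi, example_ts)
```

With this definition the hottest pixel for a given NDVI receives an SMI of 0 (driest) and the coolest pixel an SMI of 1 (wettest), which matches the 0 to 1 range of the output variable.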
2.4. Drought Function
We performed machine learning via RF, using 62,500 (250 × 250)
data samples from the training area, in order to generate the drought
function f(x). After 100 shuffles, all samples were split into training
(75%, n = 46,875) and test (25%, n = 15,625) datasets. We then used the
RandomForestRegressor class of the ensemble module of the scikit-learn
Python library for machine learning. When fitting f(x), max_depth is
the most important parameter for preventing overfitting or underfitting
of data during training. The optimal max_depth was 14; it was found by
maximizing the coefficient of determination (R2) of the training
dataset while minimizing the R2 difference between the training and
test datasets.
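The split and the max_depth selection described above can be sketched with the Python standard library alone. The split reproduces the 75%/25% proportions after 100 shuffles. The selection rule is an assumed formalization of "maximize training R2 while keeping the training/test gap small": the gap tolerance `max_gap` and the score values are invented for illustration and are not results from the study.

```python
import random

def shuffle_and_split(samples, test_fraction=0.25, n_shuffles=100, seed=0):
    """Shuffle the samples n_shuffles times, then split into (train, test)."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    items = list(samples)
    for _ in range(n_shuffles):
        rng.shuffle(items)
    n_test = round(len(items) * test_fraction)
    return items[n_test:], items[:n_test]

def pick_max_depth(scores, max_gap=0.05):
    """Pick a depth from {max_depth: (train_r2, test_r2)}.

    Assumed rule: among depths whose train/test R2 gap is at most max_gap,
    return the one with the highest training R2; if none qualifies, return
    the depth with the smallest gap.
    """
    eligible = {d: s for d, s in scores.items() if abs(s[0] - s[1]) <= max_gap}
    if eligible:
        return max(eligible, key=lambda d: eligible[d][0])
    return min(scores, key=lambda d: abs(scores[d][0] - scores[d][1]))

# 62,500 sample indices stand in for the 250 x 250 training-area samples.
train_ids, test_ids = shuffle_and_split(range(62_500))

# Hypothetical (train R2, test R2) pairs, for illustration only.
example_scores = {6: (0.80, 0.79), 10: (0.88, 0.86), 14: (0.93, 0.90), 20: (0.98, 0.89)}
chosen_depth = pick_max_depth(example_scores)
```

In practice each score pair would come from fitting RandomForestRegressor with the corresponding max_depth and scoring it on both datasets; the deepest tree in the example is rejected because its training score runs well ahead of its test score, which is the overfitting pattern the paragraph describes.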
We verified the performance of f(x) using root mean square error
(RMSE), normalized RMSE (NRMSE), mean absolute error (MAE), and
R2. In addition, we confirmed the spatial distribution of error by mapping.
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_{i} - \hat{y}_{i}\right)^{2}} \tag{10}$$
$$\mathrm{NRMSE} = \frac{\mathrm{RMSE}}{y_{\max} - y_{\min}} \times 100\,(\%) \tag{11}$$
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_{i} - \hat{y}_{i}\right| \tag{12}$$
where $y$ is the actual SMI, $\hat{y}$ is the predicted SMI, $y_{\max}$ is the
maximum actual SMI, and $y_{\min}$ is the minimum actual SMI.
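The error measures in Equations (10)–(12), together with the coefficient of determination, can be written directly from their definitions. The sketch below uses plain Python lists; the R2 expression is the standard definition (one minus the ratio of residual to total sum of squares), which the text does not spell out, and the sample values are made up.

```python
import math

def rmse(y, y_hat):
    """Equation (10): root mean square error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))

def nrmse(y, y_hat):
    """Equation (11): RMSE normalized by the range of actual values, in percent."""
    return rmse(y, y_hat) / (max(y) - min(y)) * 100.0

def mae(y, y_hat):
    """Equation (12): mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)

def r2(y, y_hat):
    """Coefficient of determination (standard definition)."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

# Made-up actual and predicted SMI values.
actual = [0.0, 0.5, 1.0]
predicted = [0.1, 0.5, 0.8]
metrics = {
    "RMSE": rmse(actual, predicted),
    "NRMSE": nrmse(actual, predicted),
    "MAE": mae(actual, predicted),
    "R2": r2(actual, predicted),
}
```

When the actual SMI spans the full range from 0 to 1, the NRMSE in percent is simply 100 times the RMSE.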
Figure 2. Structure of the severe drought area prediction (SDAP)
model. xti: input variable of the training area; y: actual soil
moisture index (SMI); xpi: input variable of the study area; and ŷp:
predicted SMI of the study area.
2.5. Feature Importance
The tree-based RF regression exposes the importance of each
feature through the feature_importances_ attribute of the fitted
estimator in the ensemble module of the scikit-learn library. We found
the effect of each input variable on the retention and loss of soil
moisture during non-rainfall periods by sorting the input variables
according to their individual importance using two functions. In
addition, the feature importance values in each category (i.e.,
vegetation, topography, water, and thermal factors) were summed to
obtain the category importance order.
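The category summation described above can be sketched as follows. The variable-to-category mapping follows Table 1; the importance values are uniform placeholders (1/15 each) used only to make the example runnable and are not the importances obtained in the study. With a fitted scikit-learn estimator, the per-variable values would instead come from its feature_importances_ attribute.

```python
# Mapping of the 15 input variables to their land surface factor (Table 1).
CATEGORY_OF = {
    "EVI": "Vegetation", "NDVI": "Vegetation", "SAVI": "Vegetation", "MSAVI": "Vegetation",
    "TWI": "Topography", "Slope": "Topography", "Aspect": "Topography",
    "NDMI": "Water", "MNDWI": "Water", "MSI": "Water",
    "NIR": "Thermal", "SWIR1": "Thermal", "SWIR2": "Thermal",
    "TIRS1": "Thermal", "TIRS2": "Thermal",
}

def rank_variables(importances):
    """Sort variables from most to least important."""
    return sorted(importances.items(), key=lambda kv: kv[1], reverse=True)

def rank_categories(importances):
    """Sum importances per category and sort the categories."""
    totals = {}
    for name, value in importances.items():
        category = CATEGORY_OF[name]
        totals[category] = totals.get(category, 0.0) + value
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

# Placeholder importances (uniform), for illustration only.
placeholder = {name: 1.0 / len(CATEGORY_OF) for name in CATEGORY_OF}
category_ranking = rank_categories(placeholder)
```

Because the importances of a random forest sum to 1, the category totals also sum to 1, so they can be read as shares of the total explanatory contribution.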
2.6. SMI Prediction on the Study Area
We predicted the SMI of the study area for 18 June 2015 (i.e., the
agricultural drought on that date) by applying the drought function to
the data of 14 March 2015. These data are suitable for predicting
agricultural drought because a real drought occurred in that area
between March and July 2015. The best verification for the SDAP model
would be a drought that occurred after 2017, the year of the data used
to build the drought function. However, no drought occurred after
2017; hence, the closest past case was used as an alternative. As
previously mentioned, prediction in this study refers to the state some
months after the input date; thus, the order of the years does not
matter.
We intended to use the Landsat-8 image of 18 June 2015 (97 days
from 14 March 2015) for verification, but because that image contained
many clouds, we instead used an image from 4 July 2015 (113 days from
14 March 2015). For reference, the drought function was trained on a
drought of 97 days.
The final predicted SMI map (the agricultural drought map) of
the study area was obtained by the following process (see the
prediction section in Figure 2). We generated 400,000 random points in
the study area. The predicted SMI values of those random points (ŷp)
were obtained via f(x) using the input variables (xpi) of 14 March
2015. The final SMI map was then completed by interpolating the SMI
values of the random points. We validated the predicted SMI values of
the random points against the actual SMI calculated using the Landsat-8
image (4 July 2015) and SRTM DEM.
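The prediction process above (random points, prediction at each point, interpolation to a map) can be sketched as below. Several parts are assumptions made for illustration: the text does not name the interpolation method, so inverse distance weighting (IDW) over the nearest points is used here as one common choice; `stand_in_drought_function` is a hypothetical placeholder for the trained f(x), which in the study takes 15 input variables rather than coordinates; and 2,000 points on a 100 × 100 unit square are used instead of 400,000 points so the example runs quickly.

```python
import random

def random_points(n, xmin, ymin, xmax, ymax, seed=0):
    """Generate n uniformly distributed random points in a bounding box."""
    rng = random.Random(seed)
    return [(rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)) for _ in range(n)]

def idw(x, y, points, values, k=8, power=2.0):
    """Inverse distance weighting using the k nearest sample points."""
    nearest = sorted(
        ((px - x) ** 2 + (py - y) ** 2, v) for (px, py), v in zip(points, values)
    )[:k]
    if nearest[0][0] == 0.0:
        return nearest[0][1]  # query location coincides with a sample point
    weights = [d2 ** (-power / 2.0) for d2, _ in nearest]
    return sum(w * v for w, (_, v) in zip(weights, nearest)) / sum(weights)

def stand_in_drought_function(x, y):
    """Hypothetical placeholder for f(x); returns a value between 0 and 1."""
    return min(1.0, max(0.0, 0.5 + 0.004 * (x - y)))

# Step 1: random points over a made-up 100 x 100 study area.
points = random_points(2000, 0.0, 0.0, 100.0, 100.0)
# Step 2: predicted SMI at each random point.
smi_hat = [stand_in_drought_function(px, py) for px, py in points]
# Step 3: interpolate to the centers of a 10 x 10 grid of cells.
smi_map = [
    [idw(gx + 5.0, gy + 5.0, points, smi_hat) for gx in range(0, 100, 10)]
    for gy in range(0, 100, 10)
]
```

Because IDW is a weighted average with positive weights, every interpolated cell stays within the range of the predicted point values, so the map keeps the 0 to 1 scale of the SMI.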
3. Results
3.1. Training Performance
The training performance and error distribution of the drought
function are shown in Figure 3 and Table 2. The R2 of the training
performance was 0.91, and the error distribution was not concentrated
on a specific land cover. Thus, the drought function was considered
capable of representing drought in the training area.
Figure 3. Verification of the training performance and error distribution.
Table 2. Evaluation of the training performance of the drought function.
RMSE | NRMSE | MAE | R2 | Max. SMI 1 | Min. SMI 2 | Max. Error
0.05294 | 5.29% | 0.03980 | 0.91 | 0.97940 | 0.05684 | 0.30352
1 The actual maximum SMI value is 1; 2 The actual minimum SMI value is 0.
3.2. Feature Importance
Figure 4 shows the importance of each input variable for f(x).
Table 3 lists the total importance by category (i.e., thermal, water,
topography, and vegetation features). The results showed that the
thermal factor was the most important category, although SWIR1 alone
had low explanatory power for soil moisture. In particular, among the
thermal images of Landsat-8, TIRS1 was identified as a better predictor
of soil moisture than TIRS2. The slope was the most important of the
topographic features. The importance of the water and vegetation
features was similar.
Figure 4. Feature importance of drought function.
Table 3. The importance of land surface factor on drought function.
Land Surface Factors | Importance
Thermal | 0.659
Topography | 0.238
Water | 0.059
Vegetation | 0.044
3.3. SMI Prediction and Validation
Figure 5a illustrates the map of agricultural drought after 3
months (97 days) of a non-rainfall period, predicted using the input
variables of 14 March 2015 and the drought function. Figure 5b shows
the actual SMI map calculated from the Landsat-8 image of 4 July 2015,
113 days after 14 March 2015.
Figure 5. Final predicted SMI map and actual SMI.
Figure 6 shows a scatter plot of the predicted SMI and the actual
SMI for validation. The validation yielded R2 = 0.58, which was lower
than the training performance (R2 = 0.91). However, comparison with the
actual SMI in Figure 5b shows that the prediction clearly identifies
the potential severe drought area.
Figure 6. Scatter plot between the actual and the predicted SMI of the
study area.
4. Discussion
4.1. General Information on the SDAP Model
The appropriate training area, the selection of appropriate
drought period, and the number of suitable random points to cover the
study area are important in utilizing the SDAP model. Therefore, the
following factors should be considered in this model. First, the
training area should contain similar proportions of the different land
cover types. If a particular land cover type is dominant in the
training area, the model is prone to errors in areas under the less
represented land cover types. In addition, irrigated areas should be
avoided to isolate the output variable SMI from human influence.
Second, the SDAP model uses only land surface factors; hence,
it is recommended to develop the drought function using data from the
year in which the precipitation was the lowest (i.e., the drought year
used for modeling should have had as little rainfall as possible to
minimize the interference of meteorological factors). In this study,
the short-term drought function was trained using data from March to
June (97 days). However, drought periods can be freely selected within
a one-year period (e.g., May–July), depending on the application,
region, and country. This means that the model can be transferred to
other locations with different conditions by generating a drought
function from the local data of the study area to be analyzed.
Third, enough random points should be allocated for the
estimation and subsequent generation of the prediction map after
interpolation. Given that the SMI is a float number between 0 and 1, if
few points are available for estimation, the SMI pattern will not appear
after the interpolation. In fact, providing a threshold for the number of
appropriate points is difficult (it will vary depending on the scope of the
study area and the variety of land cover). However, increasing the
number of points by an adequate margin is recommended when the
results do not exhibit a clear pattern. In this study, we confirmed suitable
pattern generation for more than 400,000 points, whereas 300,000 points
were insufficient to show the trend of agricultural drought.
4.2. Coverage of the Trained Drought Function
This study shows that the function trained by the RF algorithm
achieved R2 = 0.58 when predicting over an area approximately 20 times
larger than the training area. Determining the predictable coverage of
a trained function is difficult, given regional deviations; however,
prediction accuracy is negatively correlated with the distance from the
training area and decreases as the prediction area increases.
We conducted further analysis to verify the predictable coverage.
Specifically, we predicted the SMI of an area approximately 150 times
larger than the training area and observed a decrease in R2 from 0.58
to 0.39. As shown in Figure 7a, a region with a sharp decrease in
correlation between the actual and predicted SMI was observed. The
distribution of these samples was confirmed on the map. Figure 7b shows
that the highest errors mostly occurred far from the training area in
largely heterogeneous areas. In contrast, the samples with a low
correlation occurring near the training area were mostly identified as
water bodies, and this error may be caused by the lack of sufficient
water samples for training.
Figure 7. (a) Relation between the actual and predicted SMI over an area
150 times (8249 km2) larger than the training area; (b) Distance from the
training area and distribution of erroneous samples.
4.3. SDAP Model Advantages
4.3.1. Prediction of Severe Drought Areas
Information on drought severity is the most practical and
important information for mitigating an upcoming drought [61]. The
objective of this study was to predict the distribution of soil moisture
assuming non-rainfall conditions. According to the National Drought
Mitigation Center (NDMC), it is necessary to plan for drought before it
begins, to confirm the priority of water dependence for agriculture and
the community, and to apply the plan step by step as soon as water
begins to become insufficient [62]. At this stage, the SDAP model can
help by providing spatial information on droughts in advance, allowing
effective and planned resource allocation to reduce the impact of
upcoming droughts.
The SDAP model can show future severe agricultural drought
areas after non-rainfall periods because the method reduces uncertainty
by separating two models with different features (i.e., meteorological
and agricultural droughts) in the analytical results [21]. The SDAP
model works under non-rainfall conditions, on the assumption that
meteorological drought forecasting has been carried out beforehand.
Thus, the SDAP model can analyze drought without considering
meteorological data.
However, the absence of meteorological data for analysis does
not mean that we have excluded climate factors. Machine learning on the
growth of soil drought is a learning of the trend caused by climate and
seasonal changes in the area. Therefore, the further away from the
training area, the more the climate and the weather become different;
hence, the prediction error of the SMI increases.
We found that one of the studies [13] predicted soil moisture
using hybrid machine learning without climate data, with a similar
concept to the SDAP model. However, that study cannot show the spatial
distribution of the SMI because of the usage of the actual measured data
from the soil of only seven points, not the remote sensing data. In fact,
the prediction by [13] aimed to only predict the soil moisture of seven
trained points using training data (corresponding to the validation of the
performance of the drought function in our study); hence, this is just an
assessment of the training performance, and not prediction. However, in
our study, the training data were only used to generate the drought
function, and different data from another year were used for prediction.
Another study [22], using only meteorological data, cannot be used in
practical drought mitigation plans because of the exclusion of the spatial
distribution of agricultural drought in the prediction results.
In contrast, the results of the SDAP model can show clear SMI
trends including the severe area of the SMI (Figure 5). The mitigation
of short-term drought requires SMI spatial trend information rather
than accurate SMI values [9,63]. In this model, although the SMI error
(RMSE = 0.382, MAE = 0.375) of the predicted study area increased
compared to the training performance of the drought function (RMSE =
0.052, MAE = 0.039), our model sufficiently achieved its aim because it
can clearly show the agricultural drought trend.
4.3.2. Feature Importance
We found that the thermal factor is one of the major factors
affecting soil moisture during a drought period, confirming the
findings of Sruthi and Aslam [27], who noted that increased temperature
reduces soil moisture. Furthermore, TIRS1 in the thermal factor was the
most important feature in the drought function (Table 3). However, one
should be cautious when interpreting feature importance. For example,
vegetation features were shown to be of relatively low importance in
our model. However, according to Sruthi and Aslam, vegetation and
temperature are strongly correlated with each other [27]; hence,
vegetation might still play an important role in preserving soil
moisture. Therefore, feature importance should be interpreted carefully
in light of other studies.
We obtained this variable importance information because we
used the RF algorithm for drought training. The RF algorithm has the
advantage of providing variable importance; hence, it is also used to
extract important variables and remove unnecessary variables when
creating a model [64]. One case involved the estimation of the relative
importance of variables related to soil hydrological properties in the field
of hydrology using this importance function [65].
However, feature importance does not indicate whether the
relationship between a feature and the SMI is negative or positive.
Therefore, it is also helpful to refer to multi-linear regression
coefficients when interpreting feature importance. Comparing the
feature importance of drought functions calculated from various regions
and periods can provide important insights into drought studies related
to soil moisture.
4.4. Limitations and Applications of the SDAP Model
The predicted SMI, obtained by applying machine learning to
satellite images with a resolution of 30 m, is suitable drought
information for planning the mitigation of short-term drought at the
regional level because the local spatial distribution is acquired.
However, some general remote sensing limitations exist, such as the
difficulty of obtaining data for the required date because of clouds or
the revisit cycle of Landsat-8. This limitation could be addressed by
using other satellite images with a higher temporal resolution,
although this solution remains to be confirmed in further studies.
Along with the general limitations of remote sensing, the SDAP model
has the following limitations:
First, the SDAP model applies only to the prediction of short-
term droughts, within several months, because the SMI is a suitable
measure for that timescale [56]. In practice, the soil will become dry,
regardless of the surface state if non-rainfall conditions persist for longer
periods. Therefore, the SDAP model is suitable for short-term droughts
that can be mitigated by human action within a few months.
Second, the prediction of droughts may be difficult if the area has
no drought experience or different seasons, because this model is based
on learning from past droughts. Therefore, various drought functions by
period or season must be considered to make a suitable prediction of the
situation.
Third, this model must be used within a sequential approach in
which each expert group analyzes its own type of drought to find
solutions. Thus, it is not suitable for an integrated method that
calculates meteorological, agricultural, and hydrological droughts at
one time, because this model predicts the trend of agricultural drought
change after the meteorological drought analysis has been carried out.
For example, the spatial information on agricultural drought, which is
the result of this model, is then subject to a primary review against
the corresponding hydrological drought. After all the analyses have
been conducted and the areas of concern have been identified, priority
plans for water reserves (e.g., water demand control) and water
allocation for these areas should be established. Therefore, the
proposed model is only suitable for preparing for short-term droughts
and finding focused solutions by each expert group, instead of
performing an integrated analysis of various models with high
uncertainty.
5. Conclusions
Agricultural drought is a disaster that must be managed
effectively as it directly affects food security. For this reason,
several models have been developed to predict agricultural drought.
However, regardless of how good a model is, inaccuracies arise in the
prediction of droughts owing to meteorological anomalies when the model
is based on repeated patterns or historical precipitation data. The
proposed SDAP model was hence developed by accepting those
meteorological uncertainties rather than trying to overcome them. The
SDAP model does not depend on precipitation data because it predicts
potential drought-severe areas under the assumption that
rainfall-deficit conditions already exist.
In addition, relying on other existing models can be catastrophic
when an unpredicted drought occurs. In contrast, because our model
makes predictions under the condition of no rainfall, it benefits
society even if no drought occurs. Although our model is designed for
short-term drought, the accumulation of these predictions also helps
identify areas that are susceptible to drought so that mitigation plans
can be prepared from a long-term perspective.
Given that the ultimate objective of predicting droughts is to
develop mitigation plans, a model must be simple, practical, and
useful, as the SDAP model is. This model is easy to understand because
it uses a regression algorithm, in contrast to other large prediction
systems. Further, the analysis of various case studies for many regions
and drought functions with the SDAP model can provide meaningful
insights into the key factors associated with drought. At present, we
are working on a comparison study of machine learning algorithms to
improve the SDAP model as well as a case study involving the
application of the SDAP model.
MODEL ALGORITHM
Subtitle: Tree vs. Network: Which Is the Better Machine Learning
Algorithm for Regression Prediction When Using Remote Sensing Data?
1. Introduction
Random forest (RF), an ensemble technique based on decision
trees, is the most widely used classifier for machine learning on
remote sensing (RS) and geographic data [66]. This is because the
importance of variables for parameter selection can be easily derived
when using RF [64], and hence obtaining relevant variables from
high-dimensional data becomes straightforward [67]. Furthermore, there
are many studies [68–73] on the relation between RF and RS for
classification, and there is even an RS study [36] that does not fully
explain why RF is appropriate, as RF is already known to be suitable
for RS applications. However, artificial neural networks (ANNs) seem to
be the most widely used in other fields. In fact, one study reported
more accurate results with ANNs than with RF and even stated that RF
has been neglected in some fields, such as building research [74].
Nevertheless, RF algorithms remain dominant in RS along with ANNs.
In addition, some studies have already confirmed that RF can
outperform ANNs as a machine learning approach for RS data.
Rodriguez-Galiano et al. [75] obtained a better performance from RF
than from ANNs when classifying Landsat images to map mineral
prospects. Hayes et al. [76] found that an RF classifier of land cover
combining RS data with subsidiary data is effective and very accurate.
However, such studies were restricted to simple comparisons of RF and
ANN classification. Unlike similar comparative studies [73,75,77],
which only evaluated algorithm performance, we aimed to comprehensively
reveal the relation between RS data characteristics and training
performance based on RF and a basic ANN architecture, the multi-layer
perceptron (MLP) [78]. Furthermore, we tried to identify whether RF or
MLP regression prediction performed better on RS data using the severe
drought area prediction (SDAP) model introduced in our previous study.
Here, RS data refers to reflectance, elevation, and map index values,
and we considered only the regression problem. Other problems, such as
object detection in satellite images, which is implemented using
convolutional neural networks [79], are beyond the scope of this study.
Most drought prediction models are based on precipitation data
regardless of the type of drought because agricultural drought (soil
drought) and hydrological drought occur sequentially after
meteorological drought [80]; eventually, these lead to socio-economic
drought. In addition, droughts generally occur when periodic weather
patterns are interrupted and hydrological circulation is disrupted
[81]. However, in recent years, unprecedented droughts have been
occurring more often because climate change causes precipitation to
deviate more frequently from regular patterns. Thus, predicting
droughts is becoming more challenging despite scientific progress.
Despite the uncertainty in precipitation forecasts, most drought
prediction models published from 2007 to 2017 (see [23]) still mainly
rely on patterns from rainfall or precipitation data. Consequently,
these models require either massive weather data or large systems
observed over several years or even decades, sometimes at a global
scale (e.g., [13,37–40]). Thus, accurate drought prediction usually
depends on the availability of huge historical data records. In some
cases, the scale of semi-global analysis is not suitable for
implementing regional drought mitigation plans. Furthermore, even if a
model provides a very exact drought probability, it may be insufficient
for devising mitigation policies if it does not use geospatial data
(see [22]).
When developing the SDAP model, we focused on spatially
predicting short-term drought without using precipitation data;
specifically, we dealt with soil drought distribution (agricultural
drought), as it is directly related to food security [8]. A period of
several months falls between the timescales of weather and climate, and
precipitation over such a period is almost impossible to predict [82].
Thus, short-term drought prediction based on precipitation is
unrealistic.
Unlike drought probability methods, the SDAP model is not a binary
(yes/no) drought prediction model but provides the spatial distribution
of soil drought assuming 3 months of no rainfall [8]. The model is
intended to provide information for allocating water resources with
priority to areas where soil drought is predicted to be more severe
under an assumed meteorological drought. The structure of SDAP is
illustrated in Figure 1.
Figure 1. SDAP model structure and analysis sequence in this study.
SDAP requires 15 input variables and 1 output variable [8]. The
inputs were selected based on existing studies [26–28] related to the
loss and maintenance of soil moisture during meteorological droughts
and correspond to four surface factors (i.e., vegetation, water,
thermal, and topographic factors). The output, the soil moisture index
(SMI), is one of the representative indices for soil drought [45]. In
this study, we compared the performance of the SDAP algorithm by
implementing regression with either RF or MLP, following the analysis
sequence itemized with circled numbers in Figure 1.
2. Methodology
2.1. Study Area and Training Area for Supervised Learning
This study covers an area (~3,575 km2) in the southern part of
Korea's metropolitan area, located between longitudes 126°30'–127°30'
and latitudes 36°30'–37°30' (Figure 2). This area has historically been
free from drought stress but experienced unprecedented severe spring
droughts in 2015–2017 [83]. Korean experts on climate change consider
that such abnormal droughts may be gradually increasing. Hence, drought
in this area should be predicted in a timely manner to allocate water
in advance, especially when unexpected meteorological droughts occur.
Thus, this area was suitable for applying the SDAP model.
Figure 2. Study and training area to estimate soil drought distribution.
The training area (~56 km2) for supervised learning was part of
the study area (Figure 2), and the drought period selected for training
was March–June 2017 (3 months), when the spring drought was the worst
during 2015–2017 in the study area; a more severe drought gives a
better prediction because it implies less intervention of
meteorological factors [8].
Although the size or shape of the training area is not important,
it is essential to include a variety of land uses, exclude human
intervention factors such as irrigation, and have sufficient data to
learn natural phenomena [8]. We used 62,500 samples from the training
area for learning the drought; in our previous study [8], the
predictive performance declined if fewer than 50,000 samples were
available.
The absence of rainfall over the samples must be guaranteed for
good training because weather conditions can vary locally. Thus, we
thoroughly reviewed rainfall data from all weather stations in the
study area and selected the training area near a station where drought
was predominant [8]. This procedure for selecting the training area
ensures non-rainfall, and using the training area instead of the whole
study area saves time.
2.2. Data description and Processing
Features (15 input variables) for the 62,500 samples were
extracted from a Landsat-8 image from March 19, 2017 and the digital
elevation model of the Shuttle Radar Topography Mission. The SMI was
then calculated using a Landsat-8 image from June 23, 2017. All data
were downloaded from the EarthExplorer site [42]. The details and
calculation formulas for the variables are listed in Table 1.
The machine learning was implemented in Python 3.6 running on
64-bit Microsoft Windows 10. RF was implemented using
RandomForestRegressor from the scikit-learn library, and MLP was
implemented using the Keras library. ArcGIS Pro was used to calculate
the image index maps and topographic attributes. The 62,500 samples
were divided into 46,875 (75%) samples for the training dataset and
15,625 (25%) for the test dataset after 100 shuffles. In addition, 6875
samples (15% of the training dataset) were used as the validation
dataset for the MLP.
Table 1. Data description
Surface factor | Variable | Symbol | Formula | Value type | Data type | Valid range
Vegetation | Enhanced Vegetation Index | EVI [44] | 2.5 × ((NIR – Red)/(NIR + 6.0 × Red – 7.5 × Blue + 1)) | Float | Index | -1 to 1
Vegetation | Normalized Difference Vegetation Index | NDVI [47] | (NIR – Red)/(NIR + Red) | Float | Index | -1 to 1
Vegetation | Soil-Adjusted Vegetation Index | SAVI [48] | ((NIR – Red)/(NIR + Red + 0.5)) × (1 + 0.5) | Float | Index | -1 to 1
Vegetation | Modified Soil-Adjusted Vegetation Index | MSAVI [49] | (2 × NIR + 1 – sqrt((2 × NIR + 1)^2 – 8 × (NIR – Red)))/2 | Float | Index | -1 to 1
Water | Normalized Difference Moisture Index | NDMI [84] | (NIR – SWIR1)/(NIR + SWIR1) | Float | Index | -1 to 1
Water | Modification of Normalized Difference Water Index | MNDWI [55] | (Green – SWIR1)/(Green + SWIR1) | Float | Index | -1 to 1
Water | Moisture Stress Index | MSI [56] | MidIR / NIR | Float | Index | -1 to 1
Thermal [57] | Near Infrared | NIR | 0.851–0.879 µm | Integer | Reflectance | 0 to 10,000
Thermal [57] | Short-Wavelength Infrared 1 | SWIR1 | 1.566–1.651 µm | Integer | Reflectance | 0 to 10,000
Thermal [57] | Short-Wavelength Infrared 2 | SWIR2 | 2.107–2.294 µm | Integer | Reflectance | 0 to 10,000
Thermal [57] | Thermal Infrared 1 | TIRS1 | 10.60–11.19 µm | Integer | Reflectance | 0 to 10,000
Thermal [57] | Thermal Infrared 2 | TIRS2 | 11.50–12.51 µm | Integer | Reflectance | 0 to 10,000
Topography | Topographic Wetness Index | TWI [50] | Ln(a / tan b) | Float | Index | 0 to 50
Topography | Slope | - | Degree of slope | Float | Degree | 0 to 90
Topography | Aspect | - | Degree of aspect | Float | Degree | -1 to 360
Target | Soil moisture index | SMI [59] | (Tmax – T)/(Tmax – Tmin) | Float | Index | 0 to 1
2.3. Machine learning algorithm for regression prediction
2.3.1. Random forest
The RF algorithm proposed by Breiman [30] is an ensemble
technique that uses the average of multiple decision trees to establish
a regression model. The two most important parameters for RF regression
are max_features, the number of variables randomly selected for
splitting at a node, and max_depth, the depth of the tree. Figure 3
illustrates a regression tree with max_features set to 3 and max_depth
set to 4 to show the RF tree structure using our study data.
Figure 3. Structure of RF tree for regression.
The procedure from training to prediction using RF regression
can be summarized as follows [73,74,85]. First, bootstrap samples are
drawn from the training data. Then, a regression tree is grown for each
bootstrap sample by randomly selecting a predefined number of variables
(max_features) as split candidates at each node until there are no
further splits or until a limit is reached, such as the depth of the
tree (max_depth). At each node, the best split is selected among the
randomly selected subset of input variables. The Gini index is used to
set the best split threshold of input values in classification trees
[73,86], whereas regression trees typically minimize the squared error
instead. These steps are repeated until N trees are created, as in
T1(X), T2(X), …, TN(X), where X = x1, x2, . . ., xm is an m-dimensional
vector of inputs. Finally, an ensemble model, the RF regression model,
is created by averaging the N regression trees, and predictions can be
obtained using the resulting function:
$$ f_{RF}(X) = \frac{1}{N} \sum_{n=1}^{N} T_n(X) \tag{1} $$
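The bootstrap-and-average procedure can be sketched in plain Python. This is a simplified illustration under stated assumptions: each tree is a depth-1 stump on a single input variable, so the random selection of max_feature variables is not shown, and the helper names and toy data are hypothetical.

```python
import random


def fit_stump(xs, ys):
    """Fit a depth-1 regression tree on one feature by minimizing squared error."""
    unique = sorted(set(xs))
    best = None
    for lo, hi in zip(unique, unique[1:]):
        threshold = (lo + hi) / 2  # candidate split halfway between neighbors
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((y - left_mean) ** 2 for y in left)
        sse += sum((y - right_mean) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # every x is identical, so predict the mean everywhere
        mean = sum(ys) / len(ys)
        return lambda x: mean
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def fit_forest(xs, ys, n_trees, seed=0):
    """Grow n_trees stumps, each on a bootstrap sample of the training data."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(xs)) for _ in xs]  # sample with replacement
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return trees


def predict_forest(trees, x):
    """Average the predictions of the individual trees."""
    return sum(tree(x) for tree in trees) / len(trees)


# Toy data: a single predictor and an SMI-like target between 0 and 1.
xs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
ys = [0.9, 0.8, 0.85, 0.8, 0.3, 0.2, 0.25, 0.2]
forest = fit_forest(xs, ys, n_trees=25)
print(predict_forest(forest, 0.15), predict_forest(forest, 0.85))
```

Because every tree sees a different bootstrap sample, the individual stumps disagree slightly, and the average smooths out those differences.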
2.3.2. Multi-layer perceptron
Although there are many types of ANNs, their operating
principles are similar, with nonlinear classification/regression being the
most basic task for an ANN [78]. As our research is limited to
regression for prediction, we only summarize neural networks
and the prediction procedure using MLP, which is a multi-layer
feedforward neural network and one of the most basic types of ANNs
[20,87]. MLP is widely used for supervised learning problems in which
class labels or quantities are predicted.
A neural network aims to find the relationship between input and
output by minimizing the prediction error between the actual and desired
outputs (i.e., class or quantity labels). As the neurons can find the
regression relationship between multiple inputs and one output, we
used this architecture to regress soil moisture during the drought
period on the 15 surface factors.
In a neural network, neurons are arranged in layers, and each
connection between neurons in adjacent layers carries a weight that
expresses the strength of the connection. MLP consists of an input
layer, one or more hidden layers, and an output layer, as illustrated in
Figure 4.
Figure 4. Structure of MLP for regression.
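The layered structure can be sketched as a forward pass in plain Python. This is an illustration only: weights are random rather than trained, ReLU is used as an illustrative hidden-layer activation, and the helper names are hypothetical. The layer sizes follow the 15 inputs of this study and the hidden layers reported in Table 4.

```python
import random


def relu(values):
    """Rectified linear activation applied element-wise."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """One fully connected layer: each output neuron has its own weight vector."""
    return [
        sum(x * w for x, w in zip(inputs, neuron_weights)) + b
        for neuron_weights, b in zip(weights, biases)
    ]


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for a layer with n_out neurons."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def mlp_forward(x, layers):
    """Hidden layers are nonlinear; the single output neuron is linear for regression."""
    hidden = x
    for weights, biases in layers[:-1]:
        hidden = relu(dense(hidden, weights, biases))
    weights, biases = layers[-1]
    return dense(hidden, weights, biases)[0]


rng = random.Random(0)
sizes = [15, 30, 15, 7, 1]  # input, three hidden layers, one output
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]
x = [rng.random() for _ in range(15)]  # inputs already scaled to 0..1
print(mlp_forward(x, layers))
```

Training would adjust the weights and biases to minimize the loss between this output and the reference SMI.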
Training an MLP for prediction requires selecting a suitable
structure (the number of layers and the number of neurons per layer)
and properly initializing the weights and learning rate to prevent
overfitting [75]. Thus, the most important issue for researchers is to
find the optimal hyperparameters.
Generally, the following procedure is adopted for quantity
prediction using supervised learning and MLP [20,74]. First, the data are
divided into training, validation, and test datasets. Then, to create a
regression model, the optimal hyperparameters are determined by tuning.
Finally, the regression model is validated and then used for prediction.
2.4. Training and Validation of Regression Model
Optimization of a regression model using machine learning is
achieved by finding the hyperparameters that minimize a predefined loss
function on given data [88]. We performed hyperparameter optimization
using grid search and the hyperparameter ranges listed in Table 2. In
MLP training, we used the ReLU activation function in the hidden layers.
In addition, we scaled the data for MLP; that is, the input variables were
normalized to 0–1, which is the main difference from the RF workflow.
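A minimal sketch of the 0 to 1 normalization is given below. The function name and the sample values are illustrative assumptions; in practice the minimum and maximum would be taken from the training data and reused for the prediction data.

```python
def min_max_scale(values):
    """Rescale a list of numbers linearly so that min -> 0 and max -> 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant variable carries no information; map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical samples of an integer reflectance band and of aspect in degrees.
reflectance = [0, 2500, 5000, 10000]
aspect = [-1, 89.25, 360]

print(min_max_scale(reflectance))  # [0.0, 0.25, 0.5, 1.0]
print(min_max_scale(aspect))       # [0.0, 0.25, 1.0]
```

After scaling, variables with very different valid ranges contribute on a comparable numeric scale to the MLP inputs.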
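Table 2. Hyperparameter ranges for RF and MLP regression.
RF parameter | Range | MLP parameter | Range
Criterion | MSE 1 | Loss function | MSE
max_depth | 2–20 | Hidden layers | 1–20
max_feature | 2–15 | Neurons | 3–30
n_estimators | 100–2000 | Epochs | 100–2000
1 Mean squared error
Both optimized models, fRF(x) and fMLP(x) in the area shown in
Figure 1, were validated using various evaluation metrics, namely the
coefficient of determination R2, mean squared error (MSE), mean
absolute error (MAE), and root-mean-square error (RMSE):
$$ MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \tag{2} $$
$$ MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i| \tag{3} $$
$$ RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \tag{4} $$
where y_i is the reference value, \hat{y}_i is the predicted value, and n
is the number of samples.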
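The four evaluation metrics can be computed as in the following sketch. The function names and the sample values are illustrative assumptions; the R2 function uses the usual definition of one minus the ratio of the residual to the total sum of squares and assumes that the reference values are not all identical.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root-mean-square error: the square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical reference and predicted SMI values.
reference = [0.2, 0.4, 0.6, 0.8]
predicted = [0.1, 0.4, 0.7, 0.8]
print(mse(reference, predicted), mae(reference, predicted))
print(rmse(reference, predicted), r2(reference, predicted))
```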
2.5. Prediction Performance
To compare the prediction performance of the two drought functions,
fRF(x) and fMLP(x), we predicted the SMI three months later (June 2015)
using the records from March 14, 2015 as input data. As no drought has
occurred in the study area since 2017, we ran and validated the
predictions with data from 2015; a drought function can be applied to
years either before or after its training year. Rather, the distance from
the training area and the season of the input data are the relevant
considerations, as verified in our previous study [78].
The predicted SMI was validated against the SMI calculated from
the Landsat-8 image of July 14, 2015. The image from June 18, 2015
would have been the best reference, but it was very cloudy.
2.6. Data Grouping and Training
To verify the training and prediction performance according to
data characteristics, we divided the 15 input variables into three groups
by their characteristics: Group 1 consists of map indices of float type,
Group 2 consists of reflectance values of integer type, and Group 3
consists of degrees of float type. More details of the groups are listed
in Table 3. Seven training groups were formed by taking all the
combinations of the three groups, and the RF and MLP algorithms were run
on each and evaluated using the comparison metrics.
Table 3. Training groups according to data characteristics.
Training Group (No. variables) | Input Variables | Data type | Data features | Valid range
Group 1 (7) | EVI, NDVI, SAVI, MSAVI, NDMI, MNDWI, MSI | float | index | -1 to 1
Group 2 (5) | NIR, SWIR1, SWIR2, TIRS1, TIRS2 | integer | reflectance | 0 to 10000
Group 3 (3) | TWI, Slope, Aspect | float | topographic attributes | -1 to 360 (-1: flat)
Groups 1 and 2 (12) | EVI, NDVI, SAVI, MSAVI, NDMI, MNDWI, MSI, NIR, SWIR1, SWIR2, TIRS1, TIRS2 | float and integer | index and reflectance | -1 to 1 and 0 to 10000
Groups 1 and 3 (10) | EVI, NDVI, SAVI, MSAVI, NDMI, MNDWI, MSI, TWI, Slope, Aspect | float | index and degree | -1 to 1 and -1 to 360
Groups 2 and 3 (8) | NIR, SWIR1, SWIR2, TIRS1, TIRS2, TWI, Slope, Aspect | integer and float | reflectance and degree | 0 to 10000 and -1 to 360
Groups 1, 2, and 3 (15) | EVI, NDVI, SAVI, MSAVI, NDMI, MNDWI, MSI, NIR, SWIR1, SWIR2, TIRS1, TIRS2, TWI, Slope, Aspect | float, integer, and float | index, reflectance, and degree | -1 to 1, 0 to 10000, and -1 to 360
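The seven training groups are all non-empty combinations of the three base groups, which can be enumerated as in the sketch below. The dictionary layout and function name are illustrative assumptions; the variable names are those listed in Table 3.

```python
from itertools import combinations

# Base groups of input variables, keyed by group number.
GROUPS = {
    1: ["EVI", "NDVI", "SAVI", "MSAVI", "NDMI", "MNDWI", "MSI"],
    2: ["NIR", "SWIR1", "SWIR2", "TIRS1", "TIRS2"],
    3: ["TWI", "Slope", "Aspect"],
}


def training_sets(groups):
    """Return every non-empty combination of groups with its merged variable list."""
    sets = {}
    for size in range(1, len(groups) + 1):
        for combo in combinations(sorted(groups), size):
            sets[combo] = [name for g in combo for name in groups[g]]
    return sets


for combo, variables in training_sets(GROUPS).items():
    print(combo, len(variables))
```

The loop prints seven combinations with 7, 5, 3, 12, 10, 8, and 15 variables, matching the counts in Table 3.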
Training for both RF and MLP was set up under the same
conditions to verify differences in training and prediction performance
by data characteristics. RF used a max_depth of 14 and set max_feature
to the number of inputs in each group, whereas MLP used three hidden
layers with as many neurons per hidden layer as the number of inputs.
Training proceeded up to a limit of 1000 iterations.
3. Results
3.1. Training Performance: RF vs. MLP
The hyperparameter optimization results are listed in Table 4, and
the training performance is shown in Figure 5, where the RF regressor
clearly outperforms MLP.
Table 4. Optimal hyperparameters of machine learning using RF and MLP.
RF parameter | Value | MLP parameter | Value
Criterion | MSE 1 | Loss function | MSE
max_depth | 14 | Hidden layers | 3
max_feature | 15 | Neurons 2 | 30, 15, 7
n_estimators | 1000 | Epochs | 1000
1 Mean squared error; 2 In order of hidden layers, there are 30 neurons in the first hidden layer, 15 in the second, and 7 in the third.
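The grid search that yields such optimal values can be sketched generically as follows. This is not the tuning code used in this study: `placeholder_mse` is a hypothetical stand-in for training a model and measuring its validation error, and its minimum is deliberately placed at the RF values of Table 4 only so that the example has a known answer. The grid is a coarse subset of the ranges in Table 2.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Evaluate every parameter combination and keep the one with the lowest score."""
    names = sorted(param_grid)
    best_params, best_score = None, float("inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def placeholder_mse(params):
    """Hypothetical error surface; a real run would train and validate a model here."""
    return (
        (params["max_depth"] - 14) ** 2
        + (params["max_feature"] - 15) ** 2
        + abs(params["n_estimators"] - 1000) / 1000
    )


rf_grid = {
    "max_depth": range(2, 21, 2),
    "max_feature": range(2, 16),
    "n_estimators": [100, 500, 1000, 2000],
}
best_params, best_score = grid_search(rf_grid, placeholder_mse)
print(best_params, best_score)
```

The cost of an exhaustive search grows with the product of the grid sizes (here 10 × 14 × 4 = 560 evaluations), which is why coarse grids are often refined in a second pass.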
Figure 5. Training performance of regression model using RF and MLP
3.2. Prediction Performance: RF vs. MLP
Figure 6 shows the prediction performance of the drought
functions fRF(x) and fMLP(x). RF achieved R2 = 0.587, which is lower
than its training performance (0.915). However, RF provides a clearer
spatial pattern of drought that is closer to the actual pattern than the
MLP result, which differs notably from the actual soil drought pattern
(Figure 7).
Figure 6. Drought prediction performance using RF and MLP
Figure 7. Actual drought and predictions using RF and MLP
3.3. Training Performance According to Data Characteristics
Figure 8 shows the training performance for the seven
training groups by data characteristics. RF outperformed MLP
in all training groups, with notably higher performance in groups 1 and
2, and only slightly superior performance in group 3.
Figure 8. Comparison of training performance according to groups with
different data.
4. Discussion
4.1. RF vs. MLP Regression on RS Data
Algorithm selection depends on the data characteristics, and
index maps or reflectance values are the most natural regression
variables to derive from satellite images. The results of this study
showed that RF might outperform MLP in this case. This may explain
the persistent use of RF as a powerful algorithm in the RS field, unlike
other fields in which ANNs are the mainstream. RS is the most active
field for research on classification/regression based on satellite
images. Below, we summarize the data characteristics that possibly
allow RF to outperform MLP on RS data.
First, the small data range of index maps, which generally have
normalized values between –1 and 1, may not significantly affect RF
performance. In contrast, data with small ranges and intervals, such
as map indices, might be disadvantageous for MLP.
According to our results, the difference in training performance between
the two evaluated algorithms was the largest in group 1, which consists
of index maps (Figure 8). This is also supported by the results from
group 3, where the range was between –1 and 360 and the difference in
training performance between RF and MLP was the smallest.
The second consideration is the use of data with a large valid
range, such as reflectance values, which is disadvantageous for
training with MLP. ANNs require data scaling
because they are sensitive to the Euclidean distance between data points.
On the other hand, RF does not depend on the distance between data
points and requires no scaling, because the variable contributions are
determined from the decision trees. This is supported by the training
performance in group 2 (reflectance values), which showed the
second-largest difference after group 1. Thus, RF can give better
performance than MLP when using either reflectance values with a wide
range or index map values with a small range.
Some studies further support our findings. In [73,75], RF
performed better on variables derived from satellite images,
whereas a study on weather data and energy consumption showed that
MLP performed better [74].
4.2. MLP and Data Precision
MLP is one of the most basic regression and prediction
algorithms among ANN architectures, and by itself it is insufficient
to characterize the performance of ANNs in general compared with RF.
Hence, we performed an additional regression analysis using a long
short-term memory (LSTM) network, another type of ANN.
Still, the performance evaluation gave results similar to those of MLP
(MSE = 0.0044, MAE = 0.0497, R2 = 0.833).
Moreover, we considered data precision as well as data range in
this study. Although we have shown that training performance differed
between the evaluated algorithms according to data range, this is not
enough to explain the worse performance of MLP without discussing
the contribution of data precision, because training can be affected
by the number of digits in floating-point data. Thus, we performed a
further analysis of MLP using 3 decimal places instead of the original 6.
This experiment did not show a significant difference (MSE =
0.0043, MAE = 0.0501, R2 = 0.839) from the result with 6 decimal
places; hence, the performance difference between the RF and MLP
algorithms appears to depend mostly on the range of the data rather
than on the value precision.
4.3. Drought Prediction Using Machine Learning
We determined that RF is suitable for short-term drought
prediction when using regression and RS data. In the same vein, according
to [23], the thirty-one drought prediction models using ANNs were based
on precipitation data, whereas those using RS data and regression used
RF. Thus, ANNs might be more suitable when using precipitation data
to predict drought, and RF might be better when using RS data,
especially from satellite images. In this study, we did not specifically
discuss topographic data alongside satellite imagery, because
topographic data produced only a small difference in training
performance between the two algorithms (RF was slightly better).
Remarkably, although training on topographic data alone gave a
performance below 0.4 (see group 3 in Figure 8), topography is a
very important variable for increasing prediction accuracy: when we
trained the drought function using RF without topographic information,
the R2 value remained below 0.85.
4.4. Limitations
Optimization of a model using MLP depends not only on
the designed structure but also on underlying factors affecting the loss
function that are not well known [89]. Thus, the MLP analysis in this
study might not guarantee optimization; however, we believe that the
hyperparameters determined through numerous tests reached optimal values.
Our study is limited to regression (or classification) problems for
prediction. In addition, we did not consider the combination of RS data
with other data types. Thus, complementary studies are required to
support our findings. Nevertheless, our study provides insights on the
superior performance of RF as a machine learning approach for
regression or classification using RS data.
5. Conclusions
When applying machine learning, the appropriate algorithm
should be selected according to the data characteristics and analysis
purpose. From our results, when using data with narrow or wide value
ranges, such as index maps or reflectance values, RF notably
outperforms MLP. Thus, RF may be a better choice than MLP for
regression/classification using satellite images given the data
characteristics of the derived variables.
APPLICATION to POLICY
Subtitle: Is Water Pricing Policy Adequate to Reduce Water Demand
for Drought Mitigation in Korea?
1. Introduction
Historically, Korea has been relatively rich in water supplies and
has not faced significant threats to its water supply during droughts.
However, the southern Seoul metropolitan area suffered from
unprecedented spring droughts during 2015–2017 [1,2]. The Korean
government recognized the necessity of developing drought mitigation
policies after this drought and has been proposing new policies since
2018. The proposed policies mainly target preparation for long-term
droughts, using facilities such as dams and reservoirs to increase water
supply [3].
Brears (2017) suggested that, before planning to increase the
water supply, the supply situation could be improved by reducing
existing water demand [4]. These reductions could be accomplished
by conserving water through a water pricing system, setting public
water saving targets, and
encouraging people to save water by changing their lifestyle [4].
California’s drought management policies implemented during the
2012–2016 drought [5] suggest that the state improved its existing
water usage in line with Brears’s (2017) [4] suggestions. The California
government stated that the water pricing policy implemented in that
drought was an effective tool for reducing water demand and that it
played an important role in conserving water over both the long and
short term [6]. California’s successful policy of reducing water demand
through pricing provides a lesson [7] to Korea that existing water usage
should be changed. Water is not an endless resource; therefore, water
demand reduction policies need to be implemented in addition to
long-term policies that address increasing water supply [8,9].
However, in Korea, implementing a pricing policy for water
demand reduction is challenging. The effectiveness of water pricing
policy in Korea is not well known, and estimates of the price elasticity
of water demand in Korea differ considerably depending on the
researchers, data, and models used [10]. Not only was there a negative
response from politicians to attempts to research water pricing
policies [11], but Korean citizens are also sensitive to increases in
water prices [10], even though water rates are comparatively cheap [12].
The Korean government might have focused on long-term drought
mitigation, such as water supply expansion, because of the difficulties
of implementing a pricing policy to reduce water demand. Thus,
Korea’s current drought mitigation policy is still insufficient to
address both long- and short-term droughts by improving existing
water usage.
In this regard, our study simulated the policy effect by assuming
that a water pricing policy was implemented during the spring drought
in Korea. In particular, we estimated the amount of emergency
agricultural water that could be made available by reducing residential
water usage during the drought period. During the drought of
2015–2017 in Korea, there was significant damage to agriculture,
while residential water use was only slightly inconvenienced. Most cities
worldwide do not face the risk of running out of drinking water, but
agriculture is not safe from the effects of drought [8]. Agricultural
droughts require immediate mitigation because crop death or stunted
growth caused by a lack of water cannot be reversed. Thus, a prompt
water supply for agricultural drought mitigation is important during a
drought period.
This study investigated whether a policy of regulating water usage
by price would be effective in the severe drought areas in
Korea. We spatially predicted the severe agricultural drought areas in
the region and simulated the effects of a water pricing policy using the
severe drought area prediction (SDAP) model [13] and a system
dynamics (SD) model. The SDAP model was developed in our previous
research [13] based on machine learning (ML). ML can produce good
prediction performance based on big data; hence, it is already widely
used as a means of prediction in many fields. SD was
developed by Jay W. Forrester, a professor at the Massachusetts
Institute of Technology (MIT) [14], and is widely used in various fields,
such as the military, politics, society, economy, and environment [15].
The SD model, which is composed of several causal connections,
resembles a structural equation model in that there are several
independent and dependent variables. However, unlike a structural
equation model, it includes the concepts of time and loops, which allows
human behavior to be analyzed dynamically. Therefore, SD is a useful
tool for analyzing the effects of policy changes in advance [16].
Based on the results from these models, we discuss whether
the water pricing policy is appropriate for reducing water demand for
drought mitigation in Korea, and why the outcomes differ between
countries. In addition, we describe non-pricing water demand reduction
policies that could complement pricing policies where these are not
effective.
2. Methodology
2.1. Study area and Data
We conducted a case study in southern Gyeonggi Province, in the
southern Seoul metropolitan area of Korea. The study area (Figure 1)
was severely affected by unprecedented
droughts during the spring from 2015 to 2017.
Southern Gyeonggi Province has an average annual
precipitation of approximately 1300 mm. During the spring droughts
from 2015 to 2017, however, the province
received only 50% of the usual annual precipitation [2]. Accordingly, we
simulated the effect of a water pricing policy in this southern Seoul
metropolitan area on the assumption that there was a drought in 2018 and
that water rates were raised.
Figure 1. Location of the study area.
The data used in the SDAP model were Landsat 8 images and
the Shuttle Radar Topography Mission (SRTM) digital elevation model
(DEM) with 30 m resolution, downloaded from USGS EarthExplorer
[17]. The analysis was programmed in Python 3.6.1 for the 64-bit
Windows platform, and the spatial data were processed in ArcGIS Pro.
The data used in the SD model, namely the daily water usage per
person, water price, population, and water source information, were
taken from the My Water website [18] and the Korean Statistical
Information Service (KOSIS) [1]. The policy simulation was conducted
using Vensim PLE for Windows.
2.2. SDAP and SD Model
This study uses a framework of two linked models, as shown in
Figure 2, in which the SD model simulates a policy based
on the result of the SDAP model [13]. The SDAP model (Figure 3) predicts
the spatial distribution pattern of soil moisture after a non-rainfall
period using a drought function trained by the random forest (RF)
algorithm [13]. The SD model (Figure 4) estimates the amount of water
available to the provincial government by simulating the price increase
policy for the severe drought areas predicted in the previous step. We
assess the effectiveness of the policy through the estimated amount of
water resources.
Simulation methods that link two models (a spatial information
model and a simulation model) already exist [19]; in such methods, the
two models continuously influence each other through internal
parameters. In contrast, the two models in our study
are run independently, like modules, so that the parameters of the
simulation model are adjusted sequentially based on the predicted results
from the spatial information model. Therefore, these models are easy to
understand and apply. In addition, each model can be improved and
used separately depending on the purpose.
Figure 2. Framework of the linked severe drought area prediction (SDAP)
model and the system dynamics (SD) model.
Figure 3. Structure of the SDAP model.
Figure 4. Structure of the SD model.
2.2.1. SDAP Model: Prediction of Drought Spatial Distribution
Agricultural drought was trained and predicted based on the
following concept [13]. The soil moisture remaining after a non-rainfall
period differs depending on the condition of the land surface at the
start of the period [20]. In this regard, we classified the land surface
factors that affect soil moisture during the non-rainfall period into
four categories: vegetation, topographic, water, and thermal factors
[13]. Thermal factors
reduce soil moisture, whereas vegetation retards the loss of soil moisture
by slowing down the increase in land surface heat [21]. Topography is
another important determinant of soil moisture [22]. Initial land
conditions, such as the existing water content, are also related to the
soil moisture remaining after a drought period [20]. Thus, regressing
the soil moisture after the non-rainfall period on the present
environmental conditions, represented by these land surface factors,
enables the short-term prediction of soil moisture drought.
Table 1 shows the 15 features (variables) that correspond to the
four land surface factors. These 15 features are the input variables for
the RF regression, and the target is the soil moisture index (SMI) after
three months without precipitation [13].
Table 1. Variables of the SDAP model for predicting severe drought areas.
Land Surface Factors | Input Variables | Formula / Description | References
Vegetation | Enhanced vegetation index (EVI) | 2.5 × ((NIR − Red)/(NIR + 6.0 × Red − 7.5 × Blue + 1)) | [23–25]
Vegetation | Normalized difference vegetation index (NDVI) | (NIR − Red)/(NIR + Red) | [24–26]
Vegetation | Soil-adjusted vegetation index (SAVI) | ((NIR − Red)/(NIR + Red + 0.5)) × (1 + 0.5) | [24,25,27]
Vegetation | Modified soil-adjusted vegetation index (MSAVI) | (2 × NIR + 1 − sqrt((2 × NIR + 1)^2 − 8 × (NIR − Red)))/2 | [25,28]
Topography | Topographic wetness index (TWI) | Ln(α/tan β) 1 | [29]
Topography | Slope | Degree of slope | [30,31]
Topography | Aspect | Degree of aspect | [30,32]
Water | Normalized difference moisture index (NDMI) | (NIR − SWIR1)/(NIR + SWIR1) | [25,33]
Water | Modification of normalized difference water index (MNDWI) | (Green − SWIR1)/(Green + SWIR1) | [34]
Water | Moisture stress index (MSI) | MidIR/NIR | [35]
Thermal | Near infrared (NIR) | 0.851–0.879 µm | [36]
Thermal | Short-wavelength infrared 1 (SWIR1) | 1.566–1.651 µm | [36]
Thermal | Short-wavelength infrared 2 (SWIR2) | 2.107–2.294 µm | [36]
Thermal | Thermal infrared sensor 1 (TIRS1) | 10.60–11.19 µm | [36]
Thermal | Thermal infrared sensor 2 (TIRS2) | 11.50–12.51 µm | [36]
1 where α is the upslope area, calculated as (flow accumulation + 1) × cell size, and β is the slope in radians.
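The index features in Table 1 can be computed per pixel as in the following sketch, which uses the standard published forms of the indices. Several points are assumptions made for illustration: band values are taken as surface reflectance scaled to 0 to 1, the SAVI soil factor is 0.5, MidIR in the MSI is taken to be the SWIR1 band, and the function names and sample values are hypothetical. Guards against zero denominators are omitted for brevity.

```python
import math


def ndvi(nir, red):
    return (nir - red) / (nir + red)


def evi(nir, red, blue):
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)


def savi(nir, red, soil_factor=0.5):
    return (nir - red) / (nir + red + soil_factor) * (1.0 + soil_factor)


def msavi(nir, red):
    root = math.sqrt((2.0 * nir + 1.0) ** 2 - 8.0 * (nir - red))
    return (2.0 * nir + 1.0 - root) / 2.0


def ndmi(nir, swir1):
    return (nir - swir1) / (nir + swir1)


def mndwi(green, swir1):
    return (green - swir1) / (green + swir1)


def msi(swir1, nir):
    # Assumption: the mid-infrared band of the MSI is SWIR1.
    return swir1 / nir


def twi(upslope_area, slope_radians):
    """Topographic wetness index ln(alpha / tan(beta)); the slope must be non-zero."""
    return math.log(upslope_area / math.tan(slope_radians))


# Hypothetical reflectance values for one vegetated pixel.
pixel = {"blue": 0.05, "green": 0.10, "red": 0.10, "nir": 0.50, "swir1": 0.30}
print(ndvi(pixel["nir"], pixel["red"]))
print(ndmi(pixel["nir"], pixel["swir1"]))
print(mndwi(pixel["green"], pixel["swir1"]))
```

Together with the five raw bands, slope, and aspect, these functions cover the 15 input features of the model.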
The RF algorithm that produces the drought function (hereafter
f(x)) for predicting the SMI is one of the machine learning methods
developed by Breiman [37]. The f(x) was trained on the actual drought
that occurred in this region in 2017, from March 23 to June 23
(approximately three months). We verified the f(x) in our previous
study: its training performance was R2 = 0.91, and it predicted the SMI
for the same drought period in another year with R2 = 0.58.
Additionally, the accuracy of this f(x) is higher the closer the
predicted drought area is to the training area, and the f(x)
should be generated separately for each region and period. We predicted
the soil moisture index of 26 June 2018 using the f(x) and the 15
features of 22 March 2018.
The SMI from Sandholt et al. [38] was used as the output (target)
variable for RF regression. It is suitable for representing soil
moisture during the growing season of crops [35] since this index
includes a vegetation index. Thus, the SMI is effective for predicting
agricultural drought in the spring-summer period, which is the season
examined in this study. For reference, this SMI has a real value between
0 and 1 and is most correlated with soil moisture at 20 cm soil depth [35].
We performed the following procedure to obtain a smooth SMI
map, the final severe drought area prediction map, after a non-rainfall
period. Within the study area, 400,000 random points were generated,
and the SMI value was assigned to each point. Subsequently, the predicted
agricultural drought maps were generated by natural neighbor
interpolation of all the points.
2.2.2. SD Model: Simulation of Increased Water Price Policy
Implementation
Based on the agricultural drought distribution map predicted for
three months ahead, we identified drought-critical
areas. Then, we identified the water sources that supply water to the
severe drought areas and the administrative areas that share these water
sources. The hypothetical policy simulated in this analysis applies
temporarily, during a three-month drought period, to the residents of
the severe drought areas and of the areas sharing their water sources.
In a similar vein, in 2015 the California government announced that
cities and towns in severe drought areas must reduce water use by 25%
during the drought period [39,40].
To design a simulation model of a policy, it is preferable to first
create a causal map that represents the causal relationships between the
policy and the changes in human behavior that it causes. Based on this
causal map, the model is converted into one that can be simulated
by entering formulas with variables and constants. These variables may
be cumulative (levels) or may contain constants. Figure 4 illustrates the
process by which price increases result in a reduction of
water usage. The definitions of the SD model variables for the simulation
are listed in Table 2.
Table 2. Variables of the SD model for policy simulation.
Variable | Type | Equation
Daily water usage per person | Level | water usage per day + (−) water saving effort; initial value = 48.60 gallon 1
Monthly water usage per person 2 | Auxiliary | water usage per day × 30 days × 0.003785
Billing | Auxiliary | fixed fee + (monthly water usage × billing rate)
Billing rate | Constant | Base = 0.92, Plan 1 = 1.10, Plan 2 = 1.28, Plan 3 = 1.46 (Unit: USD/m3)
Fixed fee | Constant | 1.50 USD
Recognition of increased water price | Auxiliary | whether (desired billing < current billing)
Desired billing 3 | Constant | 6.63 USD/person per month
Price elasticity of water demand | Constant | −0.175
Water saving effort | Auxiliary | 4 the water demand changes rate (%) × water usage per day
Population 5 | Constant | 7,969,432
Monthly water usage per local | Auxiliary | Monthly water usage per person × Population
1 Average water usage per day in Korea [18]; 2 where 0.003785 is the unit conversion (gallon to m3); the water fee is charged per m3; 3 existing monthly billing price; 4 water demand changes rate = water price changes rate (%) × the price elasticity of water demand; 5 severe drought area population + water source sharing area population.
The price elasticity of demand was primarily used to show the
change in water usage caused by price fluctuations; it is calculated as
follows:
$$ \text{Price elasticity of demand} = \frac{\Delta Q / Q}{\Delta P / P} \tag{1} $$
where Q is the quantity of the demanded good and P is the price of the
demanded good.
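A short worked example makes the definition concrete. The function names and the sample prices and quantities are illustrative assumptions; the elasticity of −0.175 is the constant listed in Table 2.

```python
def price_elasticity(q_old, q_new, p_old, p_new):
    """Elasticity = relative change in quantity / relative change in price."""
    return ((q_new - q_old) / q_old) / ((p_new - p_old) / p_old)


def demand_change_rate(price_change_rate, elasticity):
    """Relative change in demand implied by a relative change in price."""
    return elasticity * price_change_rate


# Hypothetical case: a 20% price rise lowers usage from 100 to 96.5 units.
print(price_elasticity(q_old=100.0, q_new=96.5, p_old=1.00, p_new=1.20))

# With an elasticity of -0.175, a 20% price rise implies a 3.5% drop in demand.
print(demand_change_rate(price_change_rate=0.20, elasticity=-0.175))
```

An elasticity with an absolute value well below 1, as here, means that demand is inelastic: even a large price increase produces only a small reduction in use.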
The price elasticity of demand for residential water in Korea
differs across studies [10], and we used the median value of
−0.175 for this study. In addition, as the water rate differs for each
municipality, the water rate used for the simulation was composed of the
water billing based on a fixed fee and the usage rate.
We assumed three policies based on increased water prices.
The base rate is 0.92 USD/m3, which is the same as the current rate. Plan
1 raised the rate to 120% of the base rate (1.10 USD/m3), Plan 2 to
140% (1.28 USD/m3), and Plan 3 to 160% (1.46 USD/m3). We then
ascertained the amount of water saved over the three-month period from
the change in individual water use under each plan. In Table 2, the
desired billing expresses the desire that the water bill return to its
previous level; thus, the desired billing is
calculated using the base rate.
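The relationships in Table 2 can be sketched as a simple monthly stock-and-flow loop. This is a simplified reading of the table and not the Vensim model used in this study: the monthly time step, the rule that the saving effort is applied whenever the bill exceeds the desired billing, and the function names are assumptions, and the sketch omits any response delays, so it is not expected to reproduce the reported results.

```python
GALLON_TO_M3 = 0.003785  # unit conversion from Table 2
DAYS_PER_MONTH = 30


def monthly_bill(daily_usage_gal, billing_rate, fixed_fee=1.50):
    """Billing = fixed fee + (monthly water usage in m3 x billing rate)."""
    monthly_m3 = daily_usage_gal * DAYS_PER_MONTH * GALLON_TO_M3
    return fixed_fee + monthly_m3 * billing_rate


def simulate_daily_usage(billing_rate, months, base_rate=0.92,
                         elasticity=-0.175, initial_usage=48.60,
                         desired_billing=6.63):
    """Return daily usage per person (gallons) at the start and after each month."""
    usage = initial_usage
    history = [usage]
    price_change_rate = (billing_rate - base_rate) / base_rate
    demand_change_rate = elasticity * price_change_rate  # negative for a price rise
    for _ in range(months):
        # Residents recognize the increase only when the bill exceeds the desired level.
        recognized = desired_billing < monthly_bill(usage, billing_rate)
        saving_effort = -demand_change_rate * usage if recognized else 0.0
        usage -= saving_effort  # the level accumulates the saving flow
        history.append(usage)
    return history


for name, rate in [("Base", 0.92), ("Plan 1", 1.10), ("Plan 2", 1.28), ("Plan 3", 1.46)]:
    print(name, [round(u, 2) for u in simulate_daily_usage(rate, months=3)])
```

Multiplying the per-person reduction by 30 days and by the population in Table 2 would give the volume of water made available to the local government each month under these assumptions.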
We used the system dynamics model to simulate, for six months
ahead, the amount of water used and the amount of water secured
under each rate plan. Ultimately, we want to identify the water
conservation achieved, specifically in the three months following the
drought.
3. Results
3.1. Predicted Agricultural Drought Severity Areas
As a result of applying the SDAP model, we found four potential
severe drought areas and confirmed that five farmland areas within
the study area had a lower SMI than other areas, excluding the
impervious areas (dark gray section in Figure 5). Following this, four
water sources that are used in the four predicted agricultural drought
severity areas (five administrative districts) were identified. The water
sources are shared by nine administrative districts, including the four
severe drought areas [18].
Figure 5. Predicted agricultural drought map (excluding the
impervious areas) after non-rainfall during the three-month period from
22 March 2018 to 26 June 2018.
3.2. Simulation of Water Pricing Policy Effect
When water price increase policies were implemented in the nine
administrative districts during drought periods, the resulting individual
water usage and the amount of water available to local governments were
as shown in Figure 6. The effects on daily water usage began to appear
three months after the implementation of the policy; thus, little effect
was observed during the period when water for drought
mitigation needed to be secured.
Figure 6. Changes in the amount of the water demand by increased water
rate.
Table 3 shows how much water was secured each month after each
plan had been implemented. The amount of water secured during the
three-month drought period is expected to be considerably
low, so the policy would not be an effective way to control water demand.
Table 3. Quantity of available water for drought mitigation.
4. Discussion
4.1. Effectiveness of Price Policy for Water Demand Reduction in
Korea
Korea needs an effective policy to restrict water demand that can
be implemented quickly during drought periods. However, our
simulation results show that a water demand control policy based on
water price increases during the 2015–2017 drought would not have been
effective in Korea. This result is consistent with other studies showing
that water demand in Korea is price-inelastic and that a very high
water rate is required to manage water demand [10]. Thus, a policy
to reduce water usage that is based only on water price would not be
effective in Korea.
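
To make the notion of inelastic demand concrete, the sketch below uses
the standard definition of price elasticity of demand (the percentage
change in quantity divided by the percentage change in price; demand is
called inelastic when the absolute value is below 1). The numbers are
hypothetical and chosen only for illustration; in particular, the
elasticity of -0.2 is an assumption and not an estimate from this study
or from [10].

```python
def price_elasticity(q_before, q_after, p_before, p_after):
    """Elasticity: percentage change in quantity / percentage change in price."""
    pct_quantity = (q_after - q_before) / q_before
    pct_price = (p_after - p_before) / p_before
    return pct_quantity / pct_price


def required_price_increase(target_reduction, elasticity):
    """Fractional price increase needed for a target fractional demand cut.

    Uses the constant-elasticity approximation
    (demand change) = elasticity * (price change).
    """
    return -target_reduction / elasticity


# Hypothetical example: a 10% price rise lowers use from 100 to 98 units.
example_elasticity = price_elasticity(100.0, 98.0, 1.0, 1.1)

# Assumed elasticity of -0.2 (hypothetical) and a 25% demand reduction target.
needed = required_price_increase(0.25, -0.2)
print(f"elasticity = {example_elasticity:.2f}")
print(f"price increase needed = {needed:.0%}")
```

Under these assumed values, a large relative price increase is needed
for a modest reduction in demand, which is the pattern described above.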
Policy (Unit: Gallon)   One Month   Two Months   Three Months   Cumulative Amount
Plan 1                  0           63,290       253,160        316,450
Plan 2                  63,290      189,870      443,030        696,190
Plan 3                  63,290      316,450      822,771        1,202,511

We attribute the different outcomes not only to differences in the
amount of water resources in each country but also to cultural
differences in how water rates are perceived and how water is used.
For instance, most Koreans are reluctant to accept higher water rates
[10] even though the water price is lower than in other countries [12].
Therefore, drought mitigation and water demand reduction policies must
be developed considering the specific factors and situations within a
region or country and not based solely on a policy’s success elsewhere.
Water pricing is one of many ways to reduce water demand, but it is not
the only solution. For example, in California, a considerable amount of
water is used for watering residential lawns. The California
government, therefore, attempted to reduce the water use of individuals
by restricting this activity, which resulted in a significant reduction
in water use [7]. In contrast, there are not many houses with lawns in
Korea, so restricting this activity would not be effective there. It
would be more effective to reduce the water that, because of the low
water price, is used indiscriminately in everyday activities such as
showers, car washes, and dishwashing.
Other studies on the effectiveness of pricing for reducing water
demand report varied results. While some studies showed that water
pricing is not effective at all [41], others showed that it is
effective [42]. In addition, several studies take the neutral position
that the pricing policy is not completely ineffective but is effective
only for a short period of time [43].
Our research aimed to confirm the effectiveness of a water
pricing policy in Korea for securing emergency agricultural water.
During this process, we found a difference in how agricultural water is
supplied in Korea and in California. While Korea supplies water
directly to the drought area, California supported agricultural water
indirectly. For example, in Korea, residential water is used directly as
emergency agricultural water during a severe drought. Korea is
attempting to introduce the ‘Smart Water Grid’ concept to develop a
system for managing and sharing water to support areas with water
scarcity. Similarly, Singapore already operates an integrated water
management system [44]. In contrast, California indirectly supported
agricultural water by excluding agriculture from water regulations. The
25% water demand reduction regulation applied only to residential,
industrial, and commercial water use [5,7,39]. Thus, depending on the
circumstances of the region and country, both the water demand
reduction method for drought mitigation and the water supply method
can vary. Therefore, regions and countries facing a drought crisis
should design water demand reduction policies that fit their own
circumstances, using simulations that consider both price and
non-price policies.
5.2. Non-Price Policy for Water Demand Reduction
In Korea, non-price policies should be implemented together with
high-level water pricing to achieve effective water demand reduction
[10]. Related studies on the effectiveness of non-price factors for
controlling water demand support this view. For example, one study
showed that water use can be reduced simply by increasing the
frequency of water bills, without a price control policy [45]. Another
study found that non-price methods for reducing water demand, such
as water-saving campaigns in times of drought or water-saving
equipment for showers and toilets, can also be effective [12].
System dynamics (SD) techniques, which can represent human
responses to policy, allow policies that have never been implemented
to be tested and verified repeatedly, thus helping to develop policies
tailored to each country’s circumstances. In addition, the results of
the analysis can be used as a resource for building public consensus so
that policies can be implemented amicably. Furthermore, in Korea, SD
can be used to simulate and repeatedly revise policies that include both
price and non-price factors, creating an effective water management
policy for drought response.
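
As an illustration of how such a simulation can be repeated for
different policies, the following is a toy stock-and-flow sketch in
Python. It is not the SD model used in this study: the structure (one
stock of saved water and a first-order delay in the response of demand
to a policy) and all parameter values are hypothetical assumptions.

```python
def simulate_saved_water(months, baseline_demand, max_reduction, delay_months):
    """Toy stock-and-flow simulation with a monthly time step.

    baseline_demand: monthly demand without any policy (hypothetical units)
    max_reduction:   long-run fractional demand reduction of the policy
    delay_months:    first-order response delay (assumed >= 1 month)
    Returns a list of (month, demand, cumulative_saved) tuples.
    """
    response = 0.0      # share of the long-run reduction realised so far
    saved_stock = 0.0   # stock: cumulative water saved
    history = []
    for month in range(1, months + 1):
        # First-order delay: the response closes part of its gap each month.
        response += (1.0 - response) / delay_months
        demand = baseline_demand * (1.0 - max_reduction * response)
        # Flow into the stock: water not used compared with the baseline.
        saved_stock += baseline_demand - demand
        history.append((month, demand, saved_stock))
    return history


# Compare two hypothetical policies over six months.
for label, reduction in [("price only", 0.05), ("price + non-price", 0.15)]:
    final_month, final_demand, total_saved = simulate_saved_water(
        6, 1000.0, reduction, 3.0)[-1]
    print(f"{label}: saved {total_saved:.1f} units after {final_month} months")
```

Because the response is delayed, most of the saving accumulates in the
later months, so rerunning the loop with different parameters shows how
the timing of a policy affects the water secured during a short drought.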
Figure 7 presents our proposed combined model, which includes
price and non-price factors and may help future research to develop a
plan and to understand the relation between reducing water demand and
human actions. This SD model includes non-price factors that can be
used to reduce water demand, such as increasing the billing frequency
by using the revenue generated from the increased water rate,
implementing water-saving campaigns to raise awareness, and supplying
water-saving household devices (such as water-saving faucets and toilet
seats), which would thereby lead to effective water saving. However,
these causal relationships, along with the concepts and ideas depicted
in the figure, are beyond the scope of this research.
Figure 7. The SD model included price and non-price factors for saving
residential water.
6. Conclusions
In this study, we used machine learning and remote sensing data
in Korea to predict the soil moisture map three months after the start
of a period without rainfall and to identify areas with severe drought
conditions based on the predicted agricultural drought maps. The system
dynamics method was used to simulate the water price increase policy
for the study area and to confirm the amount of water available during
the drought period. The simulation results showed that the amount of
saved water was not significant and, therefore, that the water pricing
policy for drought mitigation is not effective in Korea. However, the
effectiveness of the pricing policy for water demand reduction cannot
be generalized because the effects of implementation may vary depending
on the situation in each country, such as differences in culture and
water reserves. Thus, drought policy should take an approach suited to
the situation of each country. In future studies, water conservation
policies for drought mitigation can be discussed further through
simulation using models that include both price and non-price
policies.
Depending on the purpose of a study, the use of appropriate
analytical methods is important. Traditional statistics such as linear
regression are useful for understanding causality, whereas machine
learning is powerful for prediction. The black box problem of machine
learning is therefore not an important consideration here, because the
SDAP model aims to establish a policy based on prediction rather than
to explain the causes of droughts. System dynamics is likewise a
time-efficient method when the effect of a policy takes a long time to
confirm. Although the validation of simulation results from system
dynamics is disputed, no means exist to verify them completely against
the real world. Thus, the simulated trend is worth reporting even if
the absolute values are set aside. The results in Chapter 2 are
meaningful in that they show that the random forest method can be
considered first when satellite imagery and machine learning are used
for regression prediction. In Chapter 3, the application of the model is
meaningful because it shows the entire process from the use of the
model to the establishment of the water management policy. While
existing models or analyses have stated only simple policy
applicability, this thesis is significant because it shows the whole
policy-making process based on scientific methods.
REFERENCES
1. Masson-Delmotte, V.; Pörtner, H.-O.; Skea, J.; Zhai, P.; Roberts,
D.; Shukla, P.R.; Pirani, A.; Moufouma-Okia, W.; Péan, C.;
Pidcock, R.; Connors, S.; Matthews, J.B.R.; Chen, Y.; Zhou, X.;
Gomis, M.I.; Lonnoy, E.; Maycock, T.; Tignor, M.; Waterfield,
T. Intergovernmental Panel on Climate Change (IPCC) Special
Report on Global Warming of 1.5 °C; 2018. In Press.
2. Lund, J.; Medellin-Azuara, J.; Durand, J.; Stone, K. Lessons from
California’s 2012–2016 Drought. J. Water Resour. Plan. Manag.
2018, 144, 04018067, doi:10.1061/(ASCE)WR.1943-
5452.0000984.
3. National Drought Information Analysis Center (NDIAC).
Available online: http://drought.kwater.or.kr (accessed on Aug
30, 2018).
4. European Agriculture Impacted by Drought and Water Scarcity
Available online: https://www.euroscientist.com/european-
agriculture-impacted-by-drought-and-water-scarcity/ (accessed
on Apr 25, 2019).
5. Quiggin, J. Drought, Climate Change and Food Prices in
Australia. 2010, doi:10.1109/ICSMC.2004.1401241.
6. Predicting drought. Available online:
http://drought.unl.edu/DroughtBasics/PredictingDrought.aspx
(accessed on Jan 19, 2018).
7. Butler, A.; Charlton-Perez, A.; Domeisen, D.; Garfinkel, C.;
Gerber, E.; Hitchcock, P.; Karpechko, A.; Maycock, A.;
Sigmond, M.; Simpson, I.; Son, S.-W. Sub-Seasonal to Seasonal
Prediction: The Gap Between Weather and Climate Forecasting;
Elsevier, 2018; ISBN 9780128117149.
8. Park, H.; Kim, K.; Lee, D.K. Prediction of Severe Drought Area
Based on Random Forest: Using Satellite Image and Topography
Data. Water 2019, 11, doi:10.3390/w11040705.
9. Hao, Z.; Singh, V.P.; Xia, Y. Seasonal Drought Prediction:
Advances, Challenges, and Future Prospects. Rev. Geophys.
2018, 56, 108–141, doi:10.1002/2016RG000549.
10. Mishra, A.K.; Desai, V.R. Drought forecasting using stochastic
models. Stoch. Environ. Res. Risk Assess. 2005, 19, 326–339,
doi:10.1007/s00477-005-0238-4.
11. Schepen, A.; Wang, Q.J.; Robertson, D.E. Combining the
strengths of statistical and dynamical modeling approaches for
forecasting Australian seasonal rainfall. J. Geophys. Res. Atmos.
2012, 117, 1–9, doi:10.1029/2012JD018011.
12. Hao, Z.; Xia, Y.; Luo, L.; Singh, V.P.; Ouyang, W.; Hao, F.
Toward a categorical drought prediction system based on U.S.
Drought Monitor (USDM) and climate forecast. J. Hydrol. 2017,
551, 300–305, doi:10.1016/j.jhydrol.2017.06.005.
13. Prasad, R.; Deo, R.C.; Li, Y.; Maraseni, T. Soil moisture
forecasting by a hybrid machine learning technique: ELM
integrated with ensemble empirical mode decomposition.
Geoderma 2018, 330, 136–161,
doi:10.1016/j.geoderma.2018.05.035.
14. Rezaeianzadeh, M.; Stein, A.; Cox, J.P. Drought Forecasting
using Markov Chain Model and Artificial Neural Networks.
Water Resour. Manag. 2016, 30, 2245–2259,
doi:10.1007/s11269-016-1283-0.
15. Cook, B.I.; Smerdon, J.E.; Seager, R.; Coats, S. Global warming
and 21st century drying. Clim. Dyn. 2014, 43, 2607–2627,
doi:10.1007/s00382-014-2075-y.
16. Wilhite, D.; Glantz, M. Understanding the Drought
Phenomenon: The Role of Definitions. Water Int. 1985, 10, 111–
120, doi:10.1080/02508068508686328.
17. Park, S.; Im, J.; Jang, E.; Rhee, J. Drought Assessment and
Monitoring through Blending of Multi-sensor Indices Using
Machine Learning Approaches for Different Climate Regions.
Agric. For. Meteorol. 2016, 217, 50,
doi:10.1016/j.agrformet.2016.01.040.
18. Mishra, A.K.; Desai, V.R.; Singh, V.P. Drought Forecasting Using a Hybrid
Stochastic and Neural Network Model. J. Hydrol. Eng. 2007, 12,
626–638, doi:10.1061/(ASCE)1084-0699(2007)12:6(626).
19. Durdu, Ö.F. Application of linear stochastic models for drought
forecasting in the Büyük Menderes river basin, western Turkey.
Stoch. Environ. Res. Risk Assess. 2010, 24, 1145–1162,
doi:10.1007/s00477-010-0366-3.
20. Ali, Z.; Hussain, I.; Faisal, M.; Nazir, H.M.; Hussain, T.; Shad,
M.Y.; Mohamd Shoukry, A.; Hussain Gani, S. Forecasting
Drought Using Multilayer Perceptron Artificial Neural Network
Model. Adv. Meteorol. 2017, 2017, doi:10.1155/2017/5681308.
21. Mastrandrea, M.D.; Mach, K.J.; Plattner, G.; Matschoss, P.R.
The IPCC AR5 guidance note on consistent treatment of
uncertainties: a common approach across the working groups.
Clim. Change 2011, 108, 675–691, doi:10.1007/s10584-011-0178-6.
22. Maity, R.; Suman, M.; Verma, N.K. Drought prediction using a
wavelet based approach to model the temporal consequences of
different types of droughts. J. Hydrol. 2016, 539, 417–428,
doi:10.1016/j.jhydrol.2016.05.042.
23. Fung, K.F.; Huang, Y.F.; Koo, C.H.; Soh, Y.W. Drought
forecasting: A review of modelling approaches 2007–2017. J.
Water Clim. Chang. 2019, doi:10.2166/wcc.2019.236.
24. Olang, L.; Ali, A.; Demuth, S.; Wood, E.F.; Yuan, X.; Sadri, S.;
Chaney, N.; Guan, K.; Sheffield, J.; Amani, A.; Ogallo, L. A
Drought Monitoring and Forecasting System for Sub-Sahara
African Water Resources and Food Security. Bull. Am. Meteorol.
Soc. 2013, 95, 861–882, doi:10.1175/bams-d-12-00124.1.
25. Hao, Z.; AghaKouchak, A.; Nakhjiri, N.; Farahmand, A. Global
integrated drought monitoring and prediction system. Sci. data
2014, 1, 140001, doi:10.1038/sdata.2014.1.
26. Causes of Drought: What’s the Climate Connection. Available
online: http://www.ucsusa.org (accessed on Jan 19, 2018).
27. Sruthi, S.; Aslam, M.A.M. Agricultural Drought Analysis Using
the NDVI and Land Surface Temperature Data; a Case Study of
Raichur District. Aquat. Procedia 2015, 4, 1258–1264,
doi:10.1016/j.aqpro.2015.02.164.
28. Raduła, M.W.; Szymura, T.H.; Szymura, M. Topographic
wetness index explains soil moisture better than bioindication
with Ellenberg’s indicator values. Ecol. Indic. 2018, 85, 172–
179, doi:10.1016/j.ecolind.2017.10.011.
29. Park, H.; Lee, D. Disaster Prediction and Policy Simulation for
Evaluating Mitigation Effects Using Machine Learning and
System Dynamics: Case Study of Seasonal Drought in Gyeonggi
Province. J. Korean Soc. Hazard Mitig. 2019, 19, 45–53,
doi:10.9798/KOSHAM.2019.19.1.45.
30. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32,
doi:10.1023/A:1010933404324.
31. Rhee, J.; Im, J. Meteorological drought forecasting for ungauged
areas based on machine learning: Using long-range climate
forecast and remote sensing data. Agric. For. Meteorol. 2017,
237–238, 105–122, doi:10.1016/j.agrformet.2017.02.011.
32. Danandeh Mehr, A.; Kahya, E.; Özger, M. A gene-wavelet
model for long lead time drought forecasting. J. Hydrol. 2014,
517, 691–699, doi:10.1016/j.jhydrol.2014.06.012.
33. DeChant, C.M.; Moradkhani, H. Analyzing the sensitivity of
drought recovery forecasts to land surface initial conditions. J.
Hydrol. 2015, 526, 89–100, doi:10.1016/j.jhydrol.2014.10.021.
34. Zhu, Y.; Wang, W.; Singh, V.P.; Liu, Y. Combined use of
meteorological drought indices at multi-time scales for
improving hydrological drought detection. Sci. Total Environ.
2016, 571, 1058–1068, doi:10.1016/j.scitotenv.2016.07.096.
35. Yu, C.; Li, C.; Xin, Q.; Chen, H.; Zhang, J.; Zhang, F.; Li, X.;
Clinton, N.; Huang, X.; Yue, Y.; Gong, P. Dynamic assessment
of the impact of drought on agricultural yield and scale-
dependent return periods over large geographic regions. Environ.
Model. Softw. 2014, 62, 454–464,
doi:10.1016/j.envsoft.2014.08.004.
36. Park, S.; Seo, E.; Kang, D.; Im, J. Prediction of Drought on
Pentad Scale Using Remote Sensing Data and MJO Index
through Random Forest over East Asia. Remote Sens. 2018, 10, 1–18,
doi:10.3390/rs10111811.
37. AghaKouchak, A. A baseline probabilistic drought forecasting
framework using standardized soil moisture index: Application
to the 2012 United States drought. Hydrol. Earth Syst. Sci. 2014,
18, 2485–2492, doi:10.5194/hess-18-2485-2014.
38. Hao, Z.; Hao, F.; Singh, V.P.; Ouyang, W.; Cheng, H. An
integrated package for drought monitoring, prediction and
analysis to aid drought modeling and assessment. Environ.
Model. Softw. 2017, 91, 199–209,
doi:10.1016/j.envsoft.2017.02.008.
39. Mo, K.C.; Shukla, S.; Lettenmaier, D.P.; Chen, L.C. Do Climate
Forecast System (CFSv2) forecasts improve seasonal soil
moisture prediction? Geophys. Res. Lett. 2012, 39, 1–6,
doi:10.1029/2012GL053598.
40. Schäfer, D.; Samaniego, L.; Kumar, R.; Mai, J.; Thober, S.;
Sheffield, J. Seasonal Soil Moisture Drought Prediction over
Europe Using the North American Multi-Model Ensemble
(NMME). J. Hydrometeorol. 2015, 16, 2329–2344,
doi:10.1175/jhm-d-15-0053.1.
41. Korean Statistical Information Service (KOSIS). Available
online: http://kosis.kr (accessed on May 14, 2018).
42. Earth Explorer Available online: https://earthexplorer.usgs.gov
(accessed on May 11, 2018).
43. Environmental Geographic Information Service (EGIS).
Available online: https://egis.me.go.kr (accessed on May 2,
2018).
44. Huete, A.; Didan, K.; Miura, T.; Rodriguez, E.P.; Gao, X.;
Ferreira, L.G. Overview of the radiometric and biophysical
performance of the MODIS vegetation indices. Remote Sens.
Environ. 2002, 83, 195–213, doi:10.1016/S0034-
4257(02)00096-2.
45. Handbook of drought indicators and indices; World
Meteorological Organization (WMO) & Global Water Partnership
(GWP): Geneva & Stockholm, 2016; ISBN 978-92-63-11173-9.
46. Product Guide: Landsat Surface Reflectance-derived Spectral
Indices; 3.6 version; Department of the Interior U.S. Geological
Survey (USGS), 2017.
47. Tucker, C.J. Red and photographic infrared linear combinations
for monitoring vegetation. Remote Sens. Environ. 1979, 8, 127–
150, doi:10.1016/0034-4257(79)90013-0.
48. Huete, A.R. A Soil-Adjusted Vegetation Index (SAVI). Remote
Sens. Environ. 1988, 25, 295–309, doi:10.1016/0034-
4257(88)90106-X.
49. Qi, J.; Chehbouni, A.; Huete, A.R.; Kerr, Y.H.; Sorooshian, S. A
modified soil adjusted vegetation index. Remote Sens. Environ.
1994, 48, 119–126, doi:10.1016/0034-4257(94)90134-1.
50. Beven, K.J.; Kirkby, M.J. A physically based, variable
contributing area model of basin hydrology. Hydrol. Sci. Bull.
1979, 24, 43–69, doi:10.1080/02626667909491834.
51. Burrough, P.A.; McDonnell, R.A. Data Models and Axioms.
Princ. Geogr. Inf. Syst. 1998, 17–34, doi:10.2307/144481.
52. How slope works Available online:
http://desktop.arcgis.com/en/arcmap/10.3/tools/spatial-analyst-
toolbox/how-slope-works.htm (accessed on Dec 21, 2018).
53. How aspect works. Available online:
http://pro.arcgis.com/en/pro-app/tool-reference/spatial-
analyst/how-aspect-works.htm (accessed on Dec 21, 2018).
54. Gao, B. NDWI—A normalized difference water index for
remote sensing of vegetation liquid water from space. Remote
Sens. Environ. 1996, 58, 257–266, doi:10.1016/S0034-
4257(96)00067-3.
55. Xu, H. Modification of normalised difference water index
(NDWI) to enhance open water features in remotely sensed
imagery. Int. J. Remote Sens. 2006, 27, 3025–3033,
doi:10.1080/01431160600589179.
56. Welikhe, P.; Quansah, J.E.; Fall, S.; McElhenney, W. Estimation
of Soil Moisture Percentage Using LANDSAT-based Moisture
Stress Index. J. Remote Sens. GIS 2017, 6, doi:10.4172/2469-
4134.1000200.
57. Bands Specifications of Landsat 8 Available online:
https://landsat.usgs.gov/provisional-landsat-8-surface-
reflectance-data-available (accessed on Dec 19, 2018).
58. Mishra, A.K.; Singh, V.P. A review of drought concepts. J.
Hydrol. 2010, 391, 202–216, doi:10.1016/j.jhydrol.2010.07.012.
59. Sandholt, I.; Rasmussen, K.; Andersen, J. A simple
interpretation of the surface temperature/vegetation index space
for assessment of surface moisture status. Remote Sens. Environ.
2002, 79, 213–224, doi:10.1016/S0034-4257(01)00274-7.
60. Zeng, Y.N.; Feng, Z.D.; Xiang, N.P. Assessment of soil moisture
using Landsat ETM+ temperature/vegetation index in semiarid
environment. Ieee Int. Geosci. Remote Sens. Symp. Proc. 2004,
1–7, 4306–4309, doi:10.1109/IGARSS.2004.1370089.
61. Panu, U.S.; Sharma, T.C. Challenges in drought research: some
perspectives and future directions. Hydrol. Sci. J. 2002, 47, S19–
S30, doi:10.1080/02626660209493019.
62. National Drought Mitigation Center (NDMC), Drought Basics.
Available online:
https://drought.unl.edu/Education/DroughtBasics.aspx (accessed
on Dec 26, 2018).
63. Leeuwen, B. Van GIS workflow for continuous soil moisture
estimation based on medium resolution satellite data. Agile 2015.
64. Gregorutti, B.; Michel, B.; Saint-Pierre, P. Correlation and
variable importance in random forests. Stat. Comput. 2017, 27,
659–678, doi:10.1007/s11222-016-9646-1.
65. Thompson, J.A.; Roecker, S.; Grunwald, S.; Owens, P.R.
Chapter 21 - Digital Soil Mapping: Interactions with and
Applications for Hydropedology A2 - Lin, Henry BT -
Hydropedology. In; Academic Press: Boston, 2012; pp. 665–709
ISBN 978-0-12-386941-8.
66. Belgiu, M.; Drăguţ, L. Random forest in remote sensing: A
review of applications and future directions. ISPRS J.
Photogramm. Remote Sens. 2016, 114, 24–31,
doi:10.1016/j.isprsjprs.2016.01.011.
67. Körting, T.S.; Fonseca, L.M.G.; Câmara, G. GeoDMA—
Geographic Data Mining Analyst. Comput. Geosci. 2013, 57,
133–145, doi:10.1016/j.cageo.2013.02.007.
68. Rodriguez-Galiano, V.F.; Ghimire, B.; Rogan, J.; Chica-Olmo,
M.; Rigol-Sanchez, J.P. An assessment of the effectiveness of a
random forest classifier for land-cover classification. ISPRS J.
Photogramm. Remote Sens. 2012, 67, 93–104,
doi:10.1016/j.isprsjprs.2011.11.002.
69. Ham, J.; Chen, Y.; Crawford, M.M.; Ghosh, J. Investigation of
the random forest framework for classification of hyperspectral
data. IEEE Trans. Geosci. Remote Sens. 2005, 43, 492–501,
doi:10.1109/TGRS.2004.842481.
70. Xia, J.; Du, P.; He, X.; Chanussot, J. Hyperspectral Remote
Sensing Image Classification Based on Rotation Forest. IEEE
Geosci. Remote Sens. Lett. 2014, 11, 239–243,
doi:10.1109/LGRS.2013.2254108.
71. Gislason, P.O.; Benediktsson, J.A.; Sveinsson, J.R. Random
Forest classification of multisource remote sensing and
geographic data. In IGARSS 2004. 2004 IEEE International
Geoscience and Remote Sensing Symposium; 2004; Vol. 2, pp.
1049–1052 vol.2.
72. Pal, M. Random forest classifier for remote sensing
classification. Int. J. Remote Sens. 2005, 26, 217–222,
doi:10.1080/01431160412331269698.
73. Cracknell, M.J.; Reading, A.M. Geological mapping using
remote sensing data: A comparison of five machine learning
algorithms, their response to variations in the spatial distribution
of training data and the use of explicit spatial information.
Comput. Geosci. 2014, 63, 22–33,
doi:10.1016/j.cageo.2013.10.008.
74. Ahmad, M.W.; Mourshed, M.; Rezgui, Y. Trees vs Neurons:
Comparison between random forest and ANN for high-
resolution prediction of building energy consumption. Energy
Build. 2017, 147, 77–89, doi:10.1016/j.enbuild.2017.04.038.
75. Rodriguez-Galiano, V.; Sanchez-Castillo, M.; Chica-Olmo, M.;
Chica-Rivas, M. Machine learning predictive models for mineral
prospectivity: An evaluation of neural networks, random forest,
regression trees and support vector machines. Ore Geol. Rev.
2015, 71, 804–818, doi:10.1016/j.oregeorev.2015.01.001.
76. Hayes, M.M.; Miller, S.N.; Murphy, M.A. High-resolution
landcover classification using random forest. Remote Sens. Lett.
2014, 5, 112–121, doi:10.1080/2150704X.2014.882526.
77. Gilardi, N. Comparison of four machine learning algorithms for
spatial data analysis. Evaluation 1995, 1–16.
78. Svozil, D.; Kvasnička, V.; Pospíchal, J. Introduction to multi-
layer feed-forward neural networks. Chemom. Intell. Lab. Syst.
1997, 39, 43–62.
79. Palafox, L.F.; Hamilton, C.W.; Scheidt, S.P.; Alvarez, A.M.
Automated detection of geological landforms on Mars using
Convolutional Neural Networks. Comput. Geosci. 2017, 101,
48–56, doi:10.1016/j.cageo.2016.12.015.
80. Maybank, J.; Bonsal, B.; Jones, K.; Lawford, R.; O’Brien, E.G.;
Ripley, E.A.; Wheaton, E. Drought as a natural disaster.
Atmosphere-Ocean 1995, 33, 195–222,
doi:10.1080/07055900.1995.9649532.
81. Causes of Drought Available online:
https://www.nationalgeographic.org/encyclopedia/drought/
(accessed on Apr 22, 2019).
82. Vitart, F.; Robertson, A.W. Chapter 1 - Introduction: Why Sub-
seasonal to Seasonal Prediction (S2S)? In Sub-Seasonal to
Seasonal Prediction; Robertson, A.W., Vitart, F., Eds.; Elsevier,
2019; pp. 3–15 ISBN 978-0-12-811714-9.
83. Gyeonggi Province Statistics Available online:
https://www.gg.go.kr/ggstat (accessed on May 14, 2018).
84. Gao, B.C. NDWI - A normalized difference water index for
remote sensing of vegetation liquid water from space. Remote
Sens. Environ. 1996, 58, 257–266, doi:10.1016/S0034-
4257(96)00067-3.
85. Liaw, A.; Wiener, M. Classification and Regression by
randomForest. R news 2002, 2, 18–22,
doi:10.1177/154405910408300516.
86. Breiman, L.; Friedman, J.H.; Olshen, R.A.; Stone, C.J.
Classification and regression trees. Wadsworth and Brooks 1984.
87. Liu, M.; Wang, M.; Wang, J.; Li, D. Comparison of random
forest, support vector machine and back propagation neural
network for electronic tongue data classification: Application to
the recognition of orange beverage and Chinese vinegar. Sensors
Actuators, B Chem. 2013, 177, 970–980,
doi:10.1016/j.snb.2012.11.071.
88. Claesen, M.; De Moor, B. Hyperparameter Search in Machine
Learning. In The XI Metaheuristics International Conference;
Agadir, 2015; pp. 10–14.
89. Li, H.; Xu, Z.; Taylor, G.; Studer, C.; Goldstein, T. Visualizing
the Loss Landscape of Neural Nets. arXiv e-prints 2017,
arXiv:1712.09913.