

Wide-Area Land Cover Mapping with Sentinel-1 Imagery using Deep Learning Semantic Segmentation Models

Sanja Šćepanović (a), Oleg Antropov (a,b), Pekka Laurila (a), Vladimir Ignatenko (a), Jaan Praks (c)

(a) ICEYE Oy, Helsinki, Finland
(b) VTT Technical Research Centre of Finland, Helsinki, Finland
(c) Aalto University, Helsinki, Finland

Abstract

Land cover mapping is essential for monitoring the environment and understanding the effects of human activities on it. Automatic approaches to land cover mapping (i.e., image segmentation) have mostly used traditional machine learning, which requires heuristic feature design. On natural images, deep learning has outperformed traditional machine learning approaches on a range of tasks, including image segmentation. On remote sensing images, recent studies demonstrate successful application of specific deep learning models, or their adaptations, to particular small-scale land cover mapping tasks (e.g., classifying wetland complexes). However, it is not readily clear which of the existing state-of-the-art models for natural images are the best candidates for a particular remote sensing task and data.

In this study, we answer that question for mapping the fundamental land cover classes using satellite imaging radar data. We took ESA Sentinel-1 C-band SAR images, available at no cost to users, as representative data. The CORINE land cover map produced by the Finnish Environment Institute was used as a reference, and the models were trained to distinguish between the five Level-1 CORINE classes. We selected seven state-of-the-art semantic segmentation models covering a diverse set of approaches: U-Net, DeepLabV3+, PSPNet, BiSeNet, SegNet, FC-DenseNet, and FRRN-B. These models were pre-trained on the ImageNet dataset and further fine-tuned in this study. Specifically, we used 14 ESA Sentinel-1 scenes acquired during the summer season in Finland, which are representative of the land cover in the country.

Upon evaluation and benchmarking, all the models demonstrated solid performance. The best model, FC-DenseNet (Fully Convolutional DenseNets), achieved an overall accuracy of 90.7%. Except for the producer's accuracy on two classes (urban and water bodies), FC-DenseNet outperformed all the other models across the accuracy measures and the classes. Overall, our results indicate that semantic segmentation models are suitable for efficient wide-area mapping using satellite SAR imagery. Our results also provide baseline accuracy against which newly proposed models should be evaluated, and suggest that DenseNet-based models are the first candidate for this task.

Preprint submitted to arXiv.org, February 27, 2020. arXiv:1912.05067v2 [eess.IV] 26 Feb 2020

Keywords: synthetic aperture radar, deep learning, semantic segmentation, land cover mapping, image classification, Sentinel-1 data, C-band, CORINE

1. Introduction

Mapping of land cover and its change has a critical role in the characterization of the current state of the environment. Changes in land cover can be due either to human activities or to climate change on a regional scale. The land cover, on the other hand, affects climate through water and energy exchange with the atmosphere and by changing the carbon balance. Because of this, land cover belongs to the Essential Climate Variables [1]. Hence, timely assessment of land cover and its change is one of the most important applications in satellite remote sensing. Thematic maps are needed annually for various purposes, in medium resolution (circa 250 m) with less than 15% measurement uncertainty, and in high resolution (10-30 m) with less than 5% uncertainty.

CORINE Land Cover (CLC) is a notable example of a consistent Pan-European land cover mapping initiative [2, 3], coordinated by the European Environment Agency (EEA)¹. CORINE stands for coordination of information on the environment. It is an on-going, long-term effort providing the most harmonized land cover classification data in Europe, with updates approximately every 4 years. The CORINE maps are an important source of land cover information suitable for operational purposes for various customer groups in Europe. It has altogether 44 classes, though many of them are not strictly ecological classes but rather land use classes. On the continental scale, CORINE provides a harmonized map with a 25 ha minimum mapping unit (MMU) for areal phenomena and a minimum width of 100 m for linear phenomena [4]. National land cover maps in the CORINE framework can exhibit smaller mapping units. In Finland, the latest revision of the CORINE land cover map at the time of this study was the 2012 round, produced by the Finnish Environment Institute. The map has an MMU of 20 m × 20 m and was produced by a combined automated and manual interpretation of high-resolution optical satellite data, followed by data integration with existing basic map layers [5].

The state-of-the-art approaches used for land cover mapping mainly rely on satellite optical imagery. The key role is played by Landsat imagery, often augmented by MODIS or SPOT-5 imagery [6, 7, 8]. Other sources of information employed for land cover mapping include Digital Elevation Models (DEM) and very high-resolution imagery [9]. When it comes to large-scale and multitemporal land cover mapping, a more recent optical imagery source is Copernicus Sentinel-2. With a revisit of 5 days, it has become another key data source [10].

¹ https://land.copernicus.eu/pan-european/corine-land-cover


International programs such as the European Space Agency's (ESA's) Copernicus [11], behind the Sentinel satellites, are taking significant efforts to make Earth Observation (EO) data freely available for commercial and non-commercial purposes. The Copernicus programme is a multi-billion investment by the EU and ESA, aiming to provide essential services based on accurate and timely data from satellites. Its main goals are to improve the ways of managing the environment, to help mitigate the effects of climate change, and to enable the creation of new applications and services, such as for environmental monitoring and urban development.

The provision of free satellite data for mapping in the framework of such programs also enables the application of methods that could not be used earlier because they require vast and representative datasets for training, for example deep learning. In recent years, deep learning has brought about several breakthroughs in pattern recognition and computer vision [12, 13, 14]. The success of deep learning models can be attributed both to their deep multilayer structure, creating nonlinear functions and hence allowing extraction of hierarchical sets of features from the data, and to their end-to-end training scheme, allowing simultaneous learning of the features from the raw input and prediction of the task at hand. In this way, heuristic feature design is removed. This is advantageous compared to traditional machine learning methods (e.g., support vector machine (SVM) and random forest (RF)), which require a multistage feature engineering procedure. In deep learning, such a procedure is replaced with a simple end-to-end deep learning workflow. One of the key requirements for successful application of deep learning methods is a large amount of available data from which the model can automatically learn the representative features for the prediction task [15]. The availability of open satellite imagery, such as from Copernicus, offers just that.

Land cover mapping systems based solely on optical imagery suffer from issues with cloud cover and weather conditions, especially in the tropical areas, and with a lack of illumination in the polar regions. Among the free satellite data offered by the Copernicus programme are synthetic aperture radar (SAR) images from the Sentinel-1 satellites. SAR is an active radar imaging technique that does not require illumination and is not hampered by cloud cover, due to the penetration of microwave radiation through clouds. The utilisation of SAR imagery hence would allow mapping such challenging regions and increasing the mapping frequency in orchestrated efforts like CORINE. One of the significant issues previously was the absence of timely and consistent high-resolution wide-area SAR coverage. With the advent of the Copernicus Sentinel-1 satellites, operational use of imaging radar data becomes feasible for consistent wide-area mapping. The first Copernicus Sentinel-1 mission was launched in April 2014. At first, Sentinel-1A alone was capable of providing C-band SAR data in up to four imaging modes with a revisit time of 12 days. Once Sentinel-1B was launched in 2016, the revisit time was reduced to 6 days [11].

We studied wide-area SAR-based land cover mapping by methodologically combining the two discussed recent advances: the improved methods for large-scale image processing using deep learning, and the availability of SAR imagery from the Sentinel-1 satellites.

1.1. Land Cover Mapping with SAR Imagery

While using optical satellite data is still mainstream in land cover and land cover change mapping [16, 17, 18, 19, 5], SAR data has been getting more attention as more suitable sensors appear. To date, several studies have investigated the suitability of SAR for land cover mapping, focusing primarily on L-band, C-band and X-band polarimetric [20, 21], multitemporal and multi-frequency SAR [22, 23], as well as on the combined use of SAR and optical data [24, 25, 26, 27, 28].

Independently of the imagery used, the majority of land cover mapping methods so far are based on traditional supervised classification techniques [29]. Widely used classifiers are support vector machines (SVM), decision trees, random forests (RF), and maximum likelihood classifiers (MLC) [9, 7, 30, 29]. However, extracting the large number of features needed for classification, i.e., the feature engineering process, is time intensive and requires a lot of expert work in developing and fine-tuning classification approaches. This limits the application of traditional supervised classification methods on a large scale.

Backscattered microwave radiation is composed of multiple fundamental scattering mechanisms determined by vegetation water content, surface roughness, soil moisture, the horizontal and vertical structure of the scatterers, as well as the imaging geometry during the datatake. Accordingly, a considerable number of classes can be differentiated in SAR images [31, 20]. However, the majority of SAR classification algorithms use fixed SAR observables (e.g., polarimetric features) to infer specific land cover classes, despite the large temporal, seasonal, and environmental variability between different geographical sites. This leads to a lack of generalisation capability and a need to use extensive and representative reference and SAR data. The latter means the need to account not only for all variation of SAR signatures for a specific class, but also for seasonal effects, such as changes in the moisture of soil and vegetation as well as the frozen state of land [32], that strongly affect SAR backscatter. On the other hand, when using multitemporal approaches, such seasonal variation can be used as an effective discriminator among different land cover classes.

When using exclusively SAR data for land cover mapping, reported accuracies often turn out to be relatively low for operational land cover mapping and change monitoring. Methodologically, reported solutions utilized supervised approaches linking SAR observables and class labels to pixels, superpixels, or objects in a parametric or nonparametric manner [20, 21, 33, 31, 19, 34, 35, 36, 37, 38, 39, 40, 41].

However, tackling a relatively large number of classes was considered only in several studies, often with relatively low reported accuracies. For instance, in [42] it was found that P-band PolSAR imagery was unsatisfactory for mapping more than five classes with the iterated conditional mode (ICM) contextual classifier applied to several polarimetric parameters. They achieved a Kappa value of 76.8 when mapping four classes. The classification performance of L-band ALOS PALSAR and C-band RADARSAT-2 images was compared in the moist tropics [43]: L-band provided 72.2% classification accuracy for a coarse land cover classification system, and C-band only 54.7%. In a similar study in Lao PDR, ALOS PALSAR data were found to be mostly useful as a back-up option to optical ALOS AVNIR data [19]. Multitemporal Radarsat-1 data with HH polarization and ENVISAT ASAR data with VV polarization (both C-band) were studied for the classification of five land cover classes in Korea, with moderate accuracy [44]. Waske et al. [30] applied boosted decision trees and random forests to multi-temporal C-band SAR data, reaching accuracy up to 84%. Several studies [21, 20] investigated specifically SAR suitability for the boreal zone, with reported accuracy up to 83% depending on the classification technique (maximum likelihood, probabilistic neural networks, etc.), when five super-classes (based on CORINE data) were used.

The potential of Sentinel-1 imagery for CORINE-type thematic mapping was assessed in a study that used Sentinel-1A data for mapping class composition in Thuringia [31]. Long time series of Sentinel-1 SAR data are considered especially suitable for crop type mapping [45, 46, 47, 48], with an increasing number of studies attempting land cover mapping in general [49, 50].

Moreover, as Sentinel-1 data are presently the only free source of SAR data routinely available for wide-area mapping at no cost to users, they seem the best candidate data for the development and testing of improved classification approaches. Previous studies indicate a necessity for developing and testing new methodological approaches that can be effectively applied at a large scale and deal with the variability of SAR observables concerning ecological land cover classes. We suggest adopting state-of-the-art deep learning approaches for this purpose.

1.2. Deep Learning in Remote Sensing

The advances in deep learning techniques for computer vision, in particular Convolutional Neural Networks (CNNs) [12, 51], have led to the application of deep learning in several domains that rely on computer vision. Examples are self-driving cars, image search engines, medical diagnostics, and augmented reality. Deep learning approaches are starting to be adopted in the remote sensing domain as well.

Zhu et al. [52] provide a discussion of the specificities of remote sensing imagery (compared to ordinary RGB images) that result in specific deep learning challenges in this area. For example, remote sensing data are georeferenced and often multi-modal, with particular imaging geometries; there are interpretation difficulties; and the ground-truth or labelled data needed for deep learning is still often lacking. Additionally, most of the state-of-the-art CNNs are developed for three-channel input images (i.e., RGB), so certain adaptations are needed to apply them to remote sensing data [53].

Nevertheless, several research papers tackling remote sensing imagery with deep learning techniques were published in recent years. Zhang et al. [54] review the field and find applications to image preprocessing [55], target recognition [56, 57], classification [58, 59, 60], and semantic feature extraction and scene understanding [61, 62, 63, 64]. The deep learning approaches are found to outperform standard methods applied up to several years ago, i.e., SVMs and RFs [65, 66].

When it comes to deep learning for land cover or land use mapping, applications have been limited to optical satellite [53, 67, 53, 59] or aerial [68] imagery and hyperspectral imagery [60, 67], owing to the similarity of these images to the ordinary RGB images studied in computer vision [53].

When it comes to SAR images, Zhang et al. [54] found that there is already significant success in applying deep learning techniques for object detection and scene understanding. However, for classification on SAR data, applications are scarce and advances are yet to be achieved [54]. Published research includes deep learning for crop type mapping using combined optical and SAR imagery [66], as well as the use of SAR images exclusively [69]. However, those methods applied deep learning only to some part of the task at hand, and not in an end-to-end fashion. Wang et al. [59], for instance, used deep neural networks only for merging over-segmented elements, which are produced using traditional segmentation approaches. Similarly, Tuia et al. [60] applied deep learning to extract hierarchical features, which they further fed into a multiclass logistic classifier. Duan et al. [69] first used unsupervised deep learning and then continued with a couple of supervised labelling tasks. Chen et al. [67] applied a deep learning technique (stacked autoencoders) to discover the features, but then still used traditional machine learning (SVM, logistic regression) for the image segmentation. Unlike those methods, we applied deep learning in an end-to-end fashion, i.e., from supervised feature extraction to the land class prediction. This makes our approach more flexible, robust, and adaptable to SAR data from new regions, as well as more efficient.

When it comes to end-to-end approaches for SAR classification, there are several studies where the focus was on a small area and on a specific land cover mapping task. For instance, Mohammadimanesh et al. [70] used fully polarimetric SAR (PolSAR) imagery from RADARSAT-2 to classify wetland complexes, for which they developed a specifically tailored semantic segmentation model. However, the authors tackled a small test area (around 10 km × 10 km) and did not explore how their model generalizes to other types of areas. Similarly, Wang et al. [71] adapted existing CNN models into a fixed-feature-size CNN that they evaluated on small-scale RADARSAT-2 or AIRSAR (i.e., airborne) SAR data. In both cases, they used more advanced, fully polarimetric SAR imagery at better resolution, as opposed to Sentinel-1, which means imagery with more input information for the deep learning models. Importantly, it is only Sentinel-1 that offers open operational data with up to a 6-day repeat. Because of this, the discussed approaches, developed and tested specifically for PolSAR imagery at a higher resolution, cannot be considered applicable for wide-area mapping yet. Similarly, Ahishali et al. [72] applied end-to-end approaches to SAR data. They also worked with single-polarized COSMO-SkyMed imagery. However, all the imagery they considered was X-band SAR, contrary to the C-band imagery we use here, and again only on a small scale. The authors proposed a compact CNN model that they found had outperformed some of the off-the-shelf CNN methods, such as Xception and Inception-ResNet-v2. It is important to note that, compared to those, the off-the-shelf models we consider here are more sophisticated semantic segmentation models, some of which employ Xception or ResNet, but only as a module in their feature extraction parts.

In summary, the capabilities of deep learning approaches for classification have been investigated to a lesser extent for SAR imagery than for optical imagery. Attempts to use SAR data for land cover classification were relatively limited in scope, area, or the number of SAR scenes used. In particular, wide-area land cover mapping was never addressed. The reasons for this include the comparatively poor availability of SAR data compared to optical (greatly changed since the advent of Sentinel-1), complex scattering mechanisms leading to ambiguous SAR signatures for different classes (which makes SAR image segmentation more difficult than optical image segmentation [73]), as well as the speckle noise caused by the coherent nature of the SAR imaging process.

1.3. Study goals

The present study addresses the identified research gap of a lack of wide-area land cover mapping using SAR data. We achieve this by training, fine-tuning, and evaluating a set of suitable state-of-the-art deep learning models from the class of semantic segmentation models, and demonstrating their suitability for land cover mapping. Moreover, our work is the first to examine and demonstrate the suitability of deep learning for land cover mapping from SAR images on a large scale, i.e., across a whole country.

Specifically, we applied the semantic segmentation models to SAR images taken over Finland. We focused on images of Finland because there is a land cover mask of a suitable resolution that can be used for training labels (i.e., CORINE). The training is performed with the seven selected models (SegNet [74], PSPNet [75], BiSeNet [76], DeepLabV3+ [77, 78], U-Net [79, 80], FRRN-B [81], and FC-DenseNet [82]), which have encoder modules pre-trained on the large RGB image corpus ImageNet 2012². Those models are freely available³. In other words, we reused semantic segmentation architectures developed for natural images, with weights pre-trained on RGB images, and fine-tuned them on the SAR images. Our results (with over 90% overall accuracy) demonstrate the effectiveness of deep learning methods for land cover mapping with SAR data.

In addition to having the high-resolution CORINE map that can serve as ground truth (labels) for training the deep learning models, another reason we selected Finland is that it is a northern country with frequent cloud cover, which means that using optical imagery for wide-area mapping is often not feasible. Hence, demonstrating the usability of radar imagery for land cover mapping is particularly useful here.

² http://image-net.org/challenges/LSVRC/2012/
³ https://github.com/tensorflow/models/tree/master/research/slim#pre-trained-models


Even though Finland is a relatively small country, there is still considerable heterogeneity present in terms of land cover types and how they appear in SAR images. Namely, SAR backscattering is sensitive to several factors that likely differ between countries or between distant areas within a country. Examples of such factors are moisture levels, terrain variation and soil roughness, predominant forest biome and tree species proportions, types of shorter vegetation and crops in agricultural areas, and specific types of built environments. We did not confine our study to a particular area of Finland where the SAR signatures might be consistent, but obtained images across a wide area. Hence, demonstrating the suitability of our methods in this setting hints at their potential generalizability. Namely, it means that, similarly as we did here, the semantic segmentation models can be fine-tuned and adapted to work on data from other regions or countries with different SAR signatures.

On the other hand, we took into account that the same areas will appear somewhat different in SAR images across different seasons. Scattering characteristics of many land cover classes change considerably between the summer and winter months, and sometimes even within weeks during seasonal changes [83, 20]. These include snow cover and melting, freeze/thaw of soils, ice on rivers and lakes, the crop growing cycle, and leaf-on and leaf-off conditions in deciduous trees. Because of this, in the present study we focused only on scenes acquired during the summer season. However, we did allow our training dataset to contain several images of the same area taken at different times during the summer season. This way, not only spatial but also temporal variation of SAR signatures is introduced.

Our contributions can be summarised as follows:

C1 We thoroughly benchmarked seven selected state-of-the-art semantic segmentation models, covering a diverse set of approaches, for land cover mapping using Sentinel-1 SAR imagery. We provide insights on the best models in terms of both accuracy and efficiency.

C2 Our results demonstrate the power of deep learning models, along with SAR imagery, for accurate wide-area land cover mapping in the cloud-obscured boreal zone and polar regions.

2. Deep Learning Terminology

As with other representation learning models, the power of deep learning models comes from their ability to learn rich features (representations) from the dataset automatically [15]. The automatically learned features are usually better suited for the classifier or other task at hand than hand-engineered features. Moreover, thanks to the large number of layers employed, it has been proven that deep learning networks can discover hierarchical representations, so that higher-level representations are expressed in terms of lower-level, simpler ones. For example, in the case of images, the low-level representations that can be discovered are edges; using them, mid-level ones can be expressed, such as corners and shapes, and this helps to express high-level representations such as object elements and their identities [15].


Table 1: Terminology for the main tasks in computer vision and its use in the deep learning versus remote sensing communities.

Deep learning | Remote sensing | Task description
Classification [13] | Image Annotation, Scene Understanding, Scene Classification | Assigning a whole image to a class based on what is (mainly) represented in it, for example a ship, oil tank, sea, or land.
Object Detection / Localization / Recognition [15] | Automatic Target Recognition | Detecting (and localizing) the presence of particular objects in an image. These algorithms can detect several objects in the given image; for instance, ship detection in SAR images.
Semantic Segmentation [84] | Image Classification, Clustering | Assigning a class to each pixel in an image based on which image object or region it belongs to. These algorithms not only detect and localize objects in the image, but also output their exact areas and boundaries.


The deep learning models in computer vision can be grouped according to their main task into three categories; in Table 1, we provide a description of those categories. However, the deep learning terminology for those tasks does not always correspond well to the terminology used in the remote sensing community. Relevant to our task, a number of remote sensing studies use the term classification in the context of land cover mapping, inherently meaning pixel- or region-based classification, which in the deep learning terminology corresponds to semantic segmentation. In Table 1, we list the corresponding terminology that we encountered being used for each task in both the deep learning and remote sensing communities. This helps to disambiguate between different tasks and to recognize when the two domains are referring to the same task. In the present study, the focus is on land cover mapping. Hence, we tackle semantic segmentation in the deep learning terminology, and image classification, i.e., pixel-wise classification, in the remote sensing terminology.

Convolutional Neural Networks (CNNs) [12, 13] are the deep learning model that has transformed the computer vision field. Initially, CNNs were defined to tackle the image classification (deep learning terminology) task. Their structure is inspired by the visual perception of mammals [85]. CNNs are named after one of the most important operations, which is particular to them compared to other neural networks, i.e., convolutions. Mathematically, a convolution is a combination of two other functions. A convolution is applied to the image by sliding a filter (kernel) of a given size k × k, which is usually small compared to the original image size. Filters are designed for different purposes; for example, a filter can serve as a vertical edge detector. Application of such a convolution operation on an image results in a feature map. Another common operation that is usually applied after a convolution is pooling. Pooling reduces the size of the feature map while providing robustness to the extracted features. Common CNNs end with a fully connected layer, which is used for final predictions, commonly for image classification. By employing a large number of convolutional layers (depth), CNNs are able to extract gradually more complex and abstract features. The first CNN model to demonstrate impressive effectiveness in image classification (of hand-written digits) was LeNet [12]. Several years later, Krizhevsky et al. [13] developed AlexNet, the deep CNN that dramatically pushed the limits of classification accuracy on the famous ImageNet computer vision challenge [86]. Since then, a variety of CNN-based models have been proposed. Some notable examples are the VGG network [14], ResNet [87], DenseNet [88], and Inception V3 [89]. The effectiveness of CNNs has also been proven in various real-world applications [90, 91].
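To make the convolution and pooling operations concrete, the following minimal TensorFlow/Keras sketch (our illustration, not code from the study; the filter count, kernel size, and input shape are arbitrary choices) builds a single convolution + pooling stage and shows how it transforms a 512 × 512 three-channel input into a smaller stack of feature maps.

```python
import tensorflow as tf

# One convolution + pooling stage; all hyperparameters here are illustrative.
inputs = tf.keras.Input(shape=(512, 512, 3))        # H x W x channels
x = tf.keras.layers.Conv2D(
    filters=32, kernel_size=3, padding="same",
    activation="relu")(inputs)                      # k x k filters -> 32 feature maps
x = tf.keras.layers.MaxPooling2D(pool_size=2)(x)    # pooling halves the spatial size
model = tf.keras.Model(inputs, x)
model.summary()                                     # final shape: 256 x 256 x 32
```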

Once CNNs had proven their effectiveness in classifying images, Long et al. [84] were the first to show how a given CNN model can be augmented to make it suitable for the semantic segmentation task: they proposed the Fully Convolutional Neural Network (FCN) framework. This generic architecture can be used to adapt any CNN network used for classification into a segmentation model. Namely, the authors showed that by replacing the last fully connected layer with appropriate convolutional layers, so that they upsample and restore the resolution of the input at the output layer, CNNs can be transformed to classify each individual pixel (instead of the whole image). The basic idea is as follows. The encoder is used to learn the feature maps, and is usually based on a deep CNN pre-trained for classification, such as ResNet, VGG, or Inception. The decoder part serves to upsample the discriminative features that the encoder has learned from the coarse-level feature map to the fine pixel level. Long et al. [84] showed that this upsampling (backward) computation can be efficiently performed using backward convolutions (deconvolutions). Moreover, this means that specific CNN models, such as those mentioned above, can all be incorporated into the FCN framework for segmentation, giving rise to FCN-AlexNet [84], FCN-ResNet [87], FCN-VGG16 [84], FCN-DenseNet [82], etc. We present a diagram of the generic FCN architecture in Figure 1.


Figure 1: The architecture of Fully Convolutional Neural Networks (FCNs) [84].
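As a hedged illustration of the FCN idea described above (our sketch, not one of the seven models benchmarked in this study), the snippet below converts a pre-trained classification backbone into a per-pixel classifier in the FCN-32s style: the fully connected head is replaced by a 1 × 1 convolution, and a transposed ("backward") convolution restores the input resolution. The choice of ResNet50 and the single-step 32× upsampling are assumptions made for brevity.

```python
import tensorflow as tf

NUM_CLASSES = 5  # e.g., the five Level-1 CORINE classes

# Pre-trained encoder: ResNet50 without its fully connected top.
backbone = tf.keras.applications.ResNet50(
    include_top=False, weights="imagenet", input_shape=(512, 512, 3))

x = backbone.output                                  # coarse 16 x 16 feature map
x = tf.keras.layers.Conv2D(NUM_CLASSES, 1)(x)        # 1x1 conv replaces the FC layer
x = tf.keras.layers.Conv2DTranspose(                 # backward convolution: 16 -> 512
    NUM_CLASSES, kernel_size=64, strides=32, padding="same")(x)
outputs = tf.keras.layers.Softmax()(x)               # per-pixel class probabilities
fcn = tf.keras.Model(backbone.input, outputs)
```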

3. Materials and methods

Here we first describe the study site, SAR, and reference data. This is followed by an in-depth description of the deep learning terminology and the models used in the study. We finish with a description of the experimental setup and the evaluation metrics.

3.1. Study site

Our study site covers the area of Finland at latitudes from 61° to 67.5°. The processed area is shown in Figure 2. The study area includes central and northern areas of Finland, covered primarily by boreal forestland with inclusions of water bodies (primarily lakes), urban settlements, and agricultural areas, as well as marshland and open bogs. We omitted Lapland due to potential snow cover during the months of data acquisition. The terrain height variation is moderate and mostly within the 100-300 m range.

3.2. SAR data

Presently, Sentinel-1 is a C-band SAR dual-satellite system, with two satellites orbiting 180° apart [11], launched in 2014 and 2016, respectively. The operational acquisition modes are Stripmap (SM), Interferometric Wide-Swath (IW), Extra Wide Swath (EW), and Wave Mode (WV). The IW mode is the default mode over land, providing a 250 km wide swath composed of three sub-swaths, with single-look images at 5 m by 20 m spatial resolution. It uses the so-called TOPS (Terrain Observation with Progressive Scan) SAR mode.

The SAR data acquired by the Sentinel-1 satellites in IW mode are used in this study. Altogether, 14 Sentinel-1A images acquired during the summers of 2015 and 2016 were used, more concretely during June, July, and August of those two years. Their geographical coverage is schematically shown in Figure 2.


Figure 2: Study area in Finland with reference CORINE land cover data and schematic location of areas used for model training and accuracy assessment.


The original scenes were downloaded as Level-1 Ground Range Detected (GRD) products. They represent focused SAR data that has been detected, multi-looked, and projected to ground range using an Earth ellipsoid. They were orthorectified in ESA SNAP S1TBX software using a local digital terrain model (with 2 m resolution) available from the National Land Survey of Finland. The pixel spacing of the orthorectified scenes was set to 20 m. Orthorectification included terrain flattening to obtain backscatter in gamma-nought format [92]. The scenes were further re-projected to the ETRS89 / ETRS-TM35FIN projection (EPSG:3067) and resampled to a final pixel size of 20 metres.

3.3. Reference data

In Finland, the Finnish Environment Institute (SYKE) is responsible for the production of the CORINE maps. While for most of the EU territory the CORINE mask of 100 m × 100 m spatial resolution is available, national institutions may choose to create more precise maps, and SYKE in particular has produced a 20 m × 20 m spatial resolution mask for Finland (Figure 3). Since then, updates have been produced regularly, the latest one at the time of this study (which we used) being CLC2012. There are 48 different land use classes in the map, which can be hierarchically grouped into 4 CLC levels. In detail, there are 30 classes on CLC Level-3, 15 classes on CLC Level-2, and 5 top CLC Level-1 classes. According to the information provided by SYKE, the accuracy of CLC Level-3 is 61%, of CLC Level-2 83%, and of CLC Level-1 93%. The selected classes and their corresponding color codes used for our segmentation results are shown in Table 2.

Until the most recent CORINE production round, EEA member countries adopted national approaches for the production of CORINE. EEA Technical Guidelines include manual digitalization of land cover change based on visual interpretation of optical satellite imagery. In Finland, the European CLC was not applicable for the majority of national users due to its large minimal mapping unit (MMU); thus, a national version was produced with a somewhat modified nomenclature of classes [93, 94]. The national high-resolution CLC2012 data is in raster format of 20 m, with a corresponding MMU. In the provision of the 2012 update of CLC, obtaining optical imagery over Scandinavia and Britain was particularly challenging because of the frequent clouds, thus calling for the use of radar imagery to meet user requirements on accuracy and coverage [31]. The CORINE map itself is built from high-resolution satellite images acquired primarily during the summer and, to a smaller extent, during the spring months [2].

3.4. Semantic Segmentation Models

We selected the following seven state-of-the-art [95] semantic segmentation models to test for our land cover mapping task: SegNet [74], PSPNet [75], BiSeNet [76], DeepLabV3+ [77, 78], U-Net [79, 80], FRRN-B [81], and FC-DenseNet [82]. The models were selected to cover a wide set of approaches to semantic segmentation. In the following, we describe the specific architecture of each of these DL models. We will use the following common abbreviations: conv for the convolution operation, concat for concatenation, max pool for the max pooling operation, BN for batch normalisation, and ReLU for the rectified linear unit activation function.


Table 2: CORINE CLC Level-1 classes and their color codes used in our classification results.

Class | R | G | B | Color
Water bodies (500) | 0 | 191 | 255 | blue
Peatland, bogs and marshes (400) | 173 | 216 | 230 | light blue
Forested areas (300) | 127 | 255 | 0 | green
Agricultural areas (200) | 222 | 184 | 135 | brown
Urban fabric (100) | 128 | 0 | 0 | red

Figure 3: Zoomed-in area fragment with our reference data, i.e., CORINE, shown on top (left), along with the Google Earth layer (right).


3.4.1. BiSeNet (Bilateral Segmentation Network)

The BiSeNet model is designed to decouple the functions of encoding additional spatial information and enlarging the receptive field, which are fundamental to achieving good segmentation performance. As can be seen in Figure 4, there are two main components to this model: the Spatial Path (SP) and the Context Path (CP). The Spatial Path serves to encode rich spatial information. The Context Path serves to provide a sufficient receptive field, and uses global average pooling and pre-trained Xception [96] or ResNet [87] as the backbone. The goal of the creators was not only to obtain superior performance, but to achieve a balance between speed and performance. Hence, BiSeNet is a relatively fast semantic segmentation model.

3.4.2. SegNet (Encoder-Decoder-Skip)

Similarly to BiSeNet, SegNet is also designed with computational performance in mind, this time particularly during inference. Because of this, the network has a significantly smaller number of trainable parameters compared to most of the other architectures. The encoder in SegNet is based on VGG16: it consists of its first 13 convolutional layers, while the fully connected layers are omitted. Hence, the novelty of this network lies in its decoder part, as follows. The decoder consists of one decoder layer for each encoder layer, so it also has 13 layers. Each individual decoder layer utilizes max-pooling indices memorized from its corresponding encoder feature map. The authors showed that this enhances boundary delineation between classes. Finally, the decoder output is sent to a multi-class soft-max function, yielding a classification for each pixel (see Figure 5).


Figure 4: The architecture of BiSeNet. ARM stands for the Attention Refinement Module and FFM for the Feature Fusion Module introduced in the model's paper [76].

Figure 5: The architecture of the SegNet-based Encoder-Decoder with Skip connections [74]. Blue tiles represent Convolution + Batch Normalisation + ReLU, green tiles represent Pooling, red tiles Upsampling, and yellow a softmax operation.


3.4.3. Mobile U-Net

Mobile U-Net is based on the U-Net [97] semantic segmentation architecture shown in Figure 6. In designing U-Net, the Fully Convolutional approach was generally employed, with the following modification: the upsampling part of the architecture has no fully convolutional layer, but is nearly symmetrical to the feature extraction part, due to the use of similar feature maps. This results in a u-shaped architecture (see Figure 6), hence the name of the model. While originally developed for biomedical images, the U-Net architecture has proven successful for image segmentation in other domains as well. Here, we somewhat modify the U-Net architecture according to the MobileNets [80] framework to improve its efficiency. In particular, the MobileNets framework uses Depthwise Separable Convolutions, a form which factorizes standard convolutions (e.g., 3 × 3) into a depthwise convolution (applied separately to each input band) and a pointwise (1 × 1) convolution to combine the outputs of the depthwise convolution.

Figure 6: The architecture of U-Net [97].
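The depthwise separable factorization used by the MobileNets framework can be sketched as follows (our illustration; the layer width is arbitrary): a 3 × 3 depthwise convolution filters each input band independently, and a 1 × 1 pointwise convolution then combines the outputs, replacing one standard convolution at a fraction of its parameter count.

```python
import tensorflow as tf

def separable_block(x, filters):
    # 3x3 depthwise convolution: applied separately to each input band
    x = tf.keras.layers.DepthwiseConv2D(kernel_size=3, padding="same")(x)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.ReLU()(x)
    # 1x1 pointwise convolution: combines the depthwise outputs
    x = tf.keras.layers.Conv2D(filters, kernel_size=1)(x)
    x = tf.keras.layers.BatchNormalization()(x)
    return tf.keras.layers.ReLU()(x)
```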

3.4.4. DeepLab-V3+

DeepLab-V3+ [77] is an improved version of DeepLab-V3 [98], while the latter is an improved version of the original DeepLab [78] model. This segmentation model does not follow the FCN framework like the previously discussed models. The main features that distinguish the DeepLab model from FCNs are the atrous convolutions for upsampling and the application of probabilistic machine learning models, concretely conditional random fields (CRFs), for finer localization accuracy in the final fully connected layer. Atrous convolutions, in particular, allow enlarging the context from which the next-layer feature maps are learned, while preserving the number of parameters (and thus the same efficiency). Using a chain of atrous convolutions allows computing the final output layer of a CNN at an arbitrarily high resolution (removing the need for the upsampling part used in FCNs). In the follow-up work proposing DeepLab-V3, Chen et al. [98] change the approach to atrous convolutions to gradually double the atrous rates, and show that, with an adapted version, their new algorithm outperforms the previous one even without including the fully connected CRF layer. Finally, in the newest adaptation of the model, called DeepLab-V3+, Chen et al. [77] turn to an approach similar to the FCNs, i.e., they add a decoder module to the architecture (see Figure 7). That is, they employ the features extracted by the DeepLab-V3 module in the encoder part, and add a decoder module consisting of 1 × 1 and 3 × 3 convolutions.

Figure 7: The architecture of DeepLabV3+ [77].
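The contrast between a standard and an atrous convolution can be shown in two lines (a sketch with an arbitrary filter count): both layers below have exactly the same number of parameters, but the dilated kernel of the second covers a 5 × 5 neighbourhood instead of 3 × 3, enlarging the context from which the next feature map is learned.

```python
import tensorflow as tf

standard = tf.keras.layers.Conv2D(256, kernel_size=3, padding="same")
atrous = tf.keras.layers.Conv2D(256, kernel_size=3, padding="same",
                                dilation_rate=2)  # same parameters, wider context
```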

3.4.5. FRRN-B (Full-Resolution Residual Networks)

As we have seen, most semantic segmentation architectures are based on some form of FCN, and so they utilize existing classification networks such as ResNet or VGG16 as encoders. We also discussed the main reason for such approaches, which is to take advantage of the weights learned by those architectures pre-trained for the classification task. Nevertheless, one disadvantage of the FCN approach is that the resulting network outputs of the encoder part (particularly after the pooling operations) are at a lower resolution, which deteriorates the localization performance of the overall segmentation model. Pohlen et al. [81] proposed to tackle this by having two parallel network streams processing the input image: a pooling and a residual stream (Figure 8). As the name says, the pooling stream performs successive pooling and then unpooling operations, and it serves to obtain good recognition of the objects and classes. The residual stream computes residuals at the full image resolution, which enables low-level features, i.e., object pixel-level locations, to be propagated to the network output. The name of the model comes from its building blocks, i.e., full-resolution residual units. Each such unit simultaneously operates on the pooling and the residual stream. In the original paper [81], the authors propose two alternative architectures, FRRN-A and FRRN-B, and they show that FRRN-B achieves superior performance on the Cityscapes benchmark dataset. Hence, we employ the FRRN-B architecture.


Figure 8: The architecture of FRRN-B. RU_n and FRRU_n stand for residual units and full-resolution residual units with n-channel convolutions, respectively. FRRUs simultaneously operate on the two streams [81].


3.4.6. PSPNet (Pyramid Scene Parsing Network)

Figure 9: The architecture of PSPNet [75].

Zhao et al. [75] propose Pyramid Scene Parsing as a solution to the challenge of making local predictions based on the local context only, without considering the global image scene. In remote sensing, an example of this challenge could be when a model wrongly predicts water with waves present in it as the dry vegetation class, because they appear similar and the model did not consider that these pixels are part of a larger water surface, i.e., it missed the global context. Similarly to the other FCN-based approaches, PSPNet uses a pre-trained classification architecture to extract the feature map, in this case ResNet. The main module of this network is the pyramid pooling, which is enclosed by a square in Figure 9. As can be seen in the figure, this module fuses features at four scales, from the coarse (red) to the fine (green). Hence, the output of each level in the pyramid pooling module contains the feature map of a different resolution. In the end, the different features are stacked together, yielding the final pyramid pooling global feature for predictions.
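A minimal sketch of the pyramid pooling idea follows (our simplified reading of [75], with the commonly used 1/2/3/6 bin sizes assumed; it requires a fixed spatial input size): pool the feature map at several scales, project each pooled map with a 1 × 1 convolution, upsample everything back, and stack the results.

```python
import tensorflow as tf

def pyramid_pooling(features, bin_sizes=(1, 2, 3, 6)):
    # features: (batch, h, w, c) tensor with static h and w
    h, w, c = features.shape[1], features.shape[2], features.shape[3]
    branches = [features]
    for bins in bin_sizes:
        x = tf.keras.layers.AveragePooling2D(
            pool_size=(h // bins, w // bins))(features)        # coarse-to-fine pooling
        x = tf.keras.layers.Conv2D(c // len(bin_sizes), 1)(x)  # reduce depth
        x = tf.image.resize(x, (h, w), method="bilinear")      # back to full size
        branches.append(x)
    return tf.keras.layers.Concatenate()(branches)  # stacked global feature
```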

3.4.7. FC-DenseNet (Fully Convolutional DenseNets)

This semantic segmentation algorithm is built using the DenseNet CNN [88] as a basis for the encoder, followed by applying the FCN approach [82]. The specificity of the DenseNet architecture is the presence of blocks where each layer is connected to all other layers in a feed-forward manner. Figure 10 shows the architecture of FC-DenseNet, where the blocks are represented by the Dense Block units. According to [88], such an architecture scales well to hundreds of layers without any optimization issues, while yielding excellent results in classification tasks. In order to efficiently upsample the DenseNet feature maps, Jegou et al. [82] substitute the upsampling convolutions of FCNs with Dense Blocks and Transitions Up. The Transition Up modules consist of transposed convolutions, which are then concatenated with the outputs from the input skip connection (the dashed lines in Figure 10).


Figure 10: The architecture of FC-DenseNet [82].
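The dense connectivity that gives FC-DenseNet its name can be sketched as follows (our illustration; the number of layers and the growth rate are arbitrary): within a Dense Block, each layer's output is concatenated with all preceding feature maps, so the block output grows by a fixed number of channels per layer.

```python
import tensorflow as tf

def dense_block(x, num_layers=4, growth_rate=16):
    for _ in range(num_layers):
        y = tf.keras.layers.BatchNormalization()(x)
        y = tf.keras.layers.ReLU()(y)
        y = tf.keras.layers.Conv2D(growth_rate, 3, padding="same")(y)
        x = tf.keras.layers.Concatenate()([x, y])  # feed-forward dense connectivity
    return x
```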

3.5. Training approach

To accomplish better segmentation performance, there is an option to pre-train the semantic segmentation models (in particular, their encoder modules) using a larger set of available images of another type (such as natural images). By using a model pre-trained on natural images and continuing training with the limited set of SAR images, the knowledge becomes effectively transferred from the natural-image task to the SAR task [99]. To accomplish such transfer, we used models whose encoders were pre-trained for the ImageNet classification task, and fine-tuned them using our SAR dataset (described next).

3.6. Experimental Setup

In this section, we first describe how we prepared the SAR images for training with the deep learning models, which are designed for natural images. Then we provide the details of our implementation.

3.6.1. SAR Data Preprocessing for Deep Learning

Sentinel-1 imagery comes in two polarization channels, each of them being informative about certain types of land cover. Hence, using their combination is expected to yield better land cover mapping results than using either of them independently. Moreover, previous work has shown the benefits of also using a DEM model for land cover mapping [9]. Hence, as the third layer, we used the DEM of Finland from the National Land Survey.

SAR backscatter for both polarizations was converted to decibels by applying the 10 · log10 transformation. In addition, for the deep learning models, each band should be normalized so that the distribution of the pixel values resembles a Gaussian distribution centered at zero. This is done to yield faster convergence during training. To normalize the data, each pixel value is subtracted by the mean of all pixels and then divided by their standard deviation. In addition, given that the semantic segmentation models expect pixel values in the range (0, 255), we scaled the normalized data, and also the DEM values, to this range. Such preprocessed layers are then used to create the image dataset for training.

We named the created dataset SAR RGB-DEM. The naming comes from the process used to create the images in this dataset. Namely, one of the two channels of a Sentinel-1 image is assigned to the R and the other to the G channel. For the third, B channel, we use the DEM layer.
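The per-band preprocessing described above can be summarised in a short NumPy sketch (variable names are ours; the min-max step used to reach the (0, 255) range is our assumption, as the exact rescaling is not spelled out):

```python
import numpy as np

def preprocess_band(band_linear):
    db = 10.0 * np.log10(np.maximum(band_linear, 1e-6))  # linear backscatter -> dB
    z = (db - db.mean()) / db.std()                      # zero mean, unit variance
    z = (z - z.min()) / (z.max() - z.min())              # map to (0, 1)
    return (255.0 * z).astype(np.uint8)                  # scale to (0, 255)

# A "SAR RGB-DEM" image: one polarization -> R, the other -> G, DEM -> B.
# vv, vh and dem are hypothetical arrays of identical shape:
# image = np.dstack([preprocess_band(vv), preprocess_band(vh),
#                    (255.0 * (dem - dem.min()) / (dem.max() - dem.min())).astype(np.uint8)])
```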

3.6.2. Train/Development and Test (Accuracy Assessment) Datasets

The images from the SAR RGB-DEM dataset needed to be split into 512 px × 512 px partial images (further in the text called imagelets) for training. Thus, each imagelet represented an area of roughly 10 × 10 km². The first reason for this preprocessing concerns the square shape: some of the selected models required square-shaped images. Some of the other models were flexible with the image shape and size, but we wanted to make the setups for all the models the same, so that their results would be comparable. The second reason for the preprocessing concerns computational capacity: with our hardware setup (described below), this was the largest image size we could work with.

Upon splitting the SAR RGB-DEM images, we discarded those imagelets that were completely outside the land mass area, as well as those for which we did not have a complete CORINE label (such as if they fell in part outside the Finnish borders). This resulted in more than 7,000 imagelets of size 512 px × 512 px.

Given the geography of Finland, to have representative training data it seems useful to include imagelets from both the northern and southern (including the large cities) parts of the country in the model training. On the other hand, noticeable differences are found also in the gradient from the east to the west of the country. To achieve a representative training dataset, we selected all imagelets between the longitudes of 24° and 28° for the accuracy assessment (model testing), and all the other imagelets for model training (that is, training and development in the computer vision terminology). In this way, we prevented the situation in which two images of the same area, but acquired at different times, are used one for training and the other for testing. Images that overlapped any border of the introduced strip were discarded. The procedure resulted in 3,104 images in the training and development set, and 3,784 images in the test (accuracy assessment) set. Finally, we used 60% of the training and development set for training, and the rest for development of the deep learning models.
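A simplified sketch of the imagelet preparation follows (our illustration; the geolocation lookup used for the longitude-based split is omitted):

```python
import numpy as np

TILE = 512  # pixels; about 10 x 10 km at 20 m pixel spacing

def split_into_imagelets(scene, labels):
    """Split a preprocessed scene (H x W x 3) and its CORINE label mask (H x W)
    into non-overlapping 512 x 512 imagelets, dropping incomplete edge tiles."""
    tiles = []
    h, w = labels.shape
    for i in range(0, h - TILE + 1, TILE):
        for j in range(0, w - TILE + 1, TILE):
            tiles.append((scene[i:i + TILE, j:j + TILE],
                          labels[i:i + TILE, j:j + TILE]))
    return tiles

# Imagelets whose longitude falls between 24E and 28E would then be routed to
# the accuracy-assessment set, and the rest to training/development.
```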


Table 3: The properties of the examined semantic segmentation architectures.

Architecture | Base model | Parameters
BiSeNet | ResNet101 | 24.75M
SegNet | VGG16 | 34.97M
Mobile U-Net | Not applicable | 8.87M
DeepLabV3+ | ResNet101 | 47.96M
FRRN-B | ResNet101 | 24.75M
PSPNet | ResNet101 | 56M
FC-DenseNet | ResNet101 | 9.27M

3.6.3. Data Augmentation

Further, we employed the data augmentation technique. The main idea behind data augmentation is to enable improved learning by reusing original images with slight transformations, such as rotation, flipping, adding Gaussian noise, or slightly changing the brightness. This provides additional information to the model, and the dataset size is effectively increased. Moreover, an additional benefit of data augmentation is in helping the model to learn some invariant data properties for which no examples are present in the original dataset. Given the sensitivity of the SAR backscatter, we did not want to augment the images in terms of color or brightness, or by adding noise. However, we could safely employ rotations and flipping. For rotations, we only used 90° increments, giving three possible rotated versions of an image. For image flipping, we applied horizontal and vertical flipping, or both at the same time, giving another three possible versions of the original image⁴. Notice that our images are square, so the transformations did not change the image dimensions. Finally, we applied online augmentation, as opposed to the offline version. In the online process, each augmented image is seen only once, and so this process yields a network that generalises better.

⁴ The vertical flip operation switches between top-left and bottom-left image origin (reflection along the central horizontal axis), and the horizontal flip switches between top-left and top-right image origin (reflection along the central vertical axis).
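The rotation/flip augmentation can be sketched as follows (our illustration): each imagelet and its label mask receive the same random combination of a 90° rotation and horizontal/vertical flips, with no radiometric changes.

```python
import numpy as np

def random_dihedral(image, label, rng=np.random.default_rng()):
    """Apply the same random 90-degree rotation and flips to an imagelet
    and its CORINE label mask (geometric transformations only)."""
    k = rng.integers(0, 4)                 # 0, 90, 180 or 270 degrees
    image, label = np.rot90(image, k), np.rot90(label, k)
    if rng.integers(0, 2):                 # horizontal flip
        image, label = np.fliplr(image), np.fliplr(label)
    if rng.integers(0, 2):                 # vertical flip
        image, label = np.flipud(image), np.flipud(label)
    return image, label
```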

3.6.4. Implementation

To apply the described semantic segmentation models, we adapted the open-source Semantic Segmentation Suite. We used Python with the TensorFlow [100] backend.

3.6.5. Hardware and Training Setup

We trained and tested each of the deep learning models separately, on a single GPU (NVIDIA GeForce GTX 1080) on a machine with 32 GB of RAM.

We used the RMSProp optimisation algorithm, a learning rate of 0.0001, and a learning-rate decay of 0.9954. Each model was trained for an equal number of epochs (500), and during the process the checkpoint for the best model was saved. Then we used that model for evaluation on the test set, and we report those results.
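A sketch of this training configuration in current Keras terms follows; the stand-in one-layer model and the cross-entropy loss are our assumptions, since the optimizer and schedule are specified above but the loss is not.

```python
import tensorflow as tf

# Stand-in network so the snippet is self-contained; the real models are the
# seven architectures described in Section 3.4.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(5, 1, activation="softmax",
                           input_shape=(512, 512, 3)),
])
model.compile(
    optimizer=tf.keras.optimizers.RMSprop(learning_rate=1e-4),
    loss="sparse_categorical_crossentropy",  # assumed loss
    metrics=["accuracy"],
)
callbacks = [
    # multiply the learning rate by 0.9954 after every epoch
    tf.keras.callbacks.LearningRateScheduler(lambda epoch, lr: lr * 0.9954),
    # keep only the best checkpoint seen during the 500 epochs
    tf.keras.callbacks.ModelCheckpoint("best_model.h5", save_best_only=True),
]
# model.fit(train_ds, validation_data=dev_ds, epochs=500, callbacks=callbacks)
```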

3.7. Evaluation Metrics

In a review of the metrics used in land cover classification, Costa et al. [101] found a lack of consistency, complicating the intercomparison of different studies. To avoid such issues and ensure that our results are easily comparable with the literature, we thoroughly evaluated our models. For each model and class, we report the following measures of accuracy: precision, also known as producer's accuracy (PA); recall, also known as user's accuracy (UA); overall accuracy; and the Kappa coefficient. The formulas are as follows.

For each segmentation class (land cover type) c, we calculate precision (producer's accuracy):

\[ P_c = \frac{Tp_c}{Tp_c + Fp_c}, \]

and recall (user's accuracy):

\[ R_c = \frac{Tp_c}{Tp_c + Fn_c}, \]

where $Tp_c$ represents true positive, $Fp_c$ false positive, and $Fn_c$ false negative pixels for the class c.

When it comes to accuracy [102], we calculate the per-class accuracy⁵:

\[ Acc_c = \frac{C_{ii}}{G_i}, \]

and the overall pixel accuracy:

\[ Acc_{OP} = \frac{\sum_{i=1}^{L} C_{ii}}{\sum_{i=1}^{L} G_i}, \]

where $C_{ij}$ is the number of pixels having ground truth label i and being classified/predicted as j, $G_i$ is the total number of pixels labelled with i, and L is the number of classes. All these metrics can take values from 0 to 1.

Finally, we also use the Kappa statistic (Cohen's measure of agreement), indicating how the classification results compare to values assigned by chance [103]. The Kappa statistic can take values from 0 to 1. Starting from a k by k confusion matrix with elements $f_{ij}$, the following calculations are done:

⁵ Effectively, the per-class accuracy is defined as the recall obtained on each class.


\[ P_o = \frac{1}{N} \sum_{j=1}^{k} f_{jj} \tag{1} \]

\[ r_i = \sum_{j=1}^{k} f_{ij} \;\; \forall i \quad \text{and} \quad c_j = \sum_{i=1}^{k} f_{ij} \;\; \forall j \tag{2} \]

\[ P_e = \frac{1}{N^2} \sum_{i=1}^{k} r_i c_i \tag{3} \]

where $P_o$ is the observed proportional agreement (effectively the overall accuracy), $r_i$ and $c_j$ are the row and column totals for classes i and j, and $P_e$ is the expected proportion of agreement. The final measure of agreement is given by the statistic [103]:

\[ \kappa = \frac{P_o - P_e}{1 - P_e} \tag{4} \]

Depending on the value of Kappa, the observed agreement is considered as either poor (0.0 to 0.2), fair (0.2 to 0.4), moderate (0.4 to 0.6), good (0.6 to 0.8), or very good (0.8 to 1.0).
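The metrics of this section can be computed directly from a confusion matrix; the sketch below (our code, with rows as ground truth and columns as predictions, matching Table 5) implements Equations (1)-(4) together with the per-class precision and recall as defined above.

```python
import numpy as np

def evaluation_metrics(f):
    """Metrics of Section 3.7 from a k x k confusion matrix `f`
    (rows: ground-truth classes, columns: predicted classes)."""
    f = np.asarray(f, dtype=float)
    n = f.sum()
    tp = np.diag(f)                             # true positives per class
    precision = tp / f.sum(axis=0)              # P_c = Tp / (Tp + Fp)
    recall = tp / f.sum(axis=1)                 # R_c = Tp / (Tp + Fn)
    p_o = tp.sum() / n                          # observed agreement = overall accuracy
    p_e = (f.sum(axis=1) * f.sum(axis=0)).sum() / n**2  # expected chance agreement
    kappa = (p_o - p_e) / (1.0 - p_e)           # Eq. (4)
    return precision, recall, p_o, kappa
```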

4. Results and Discussion

Using the experimental setup described in the previous section, we evaluated the seven selected semantic segmentation models: SegNet [74], PSPNet [75], BiSeNet [76], DeepLabV3+ [77, 78], U-Net [79, 80], FRRN-B [81], and FC-DenseNet [82]. The overall classification performance statistics for all studied models are gathered in Table 4. Figure 11 shows maps produced for several imagelets with the best performing model, FC-DenseNet. The obtained results are compared to prior work, and classification performance for different land cover classes is discussed further.

4.1. Classification Performance

All the models performed relatively well in terms of classification, achieving an overall accuracy above 83%. Three models performed particularly well, achieving an accuracy score above 89%: SegNet, FRRN-B, and the best performing model, FC-DenseNet, which achieved an accuracy of 90.7%.

Before further analysis, let us recall that CORINE is not exclusively a land cover map, but rather a land cover and land use map; thus, specific classes can differ from the ecological classes observed by Sentinel-1. Also, the aggregation to Level-1 is sometimes not strictly "ecological" or compliant with physics-based surface scattering considerations. For example, roads, airports, major industrial areas and the road network often exhibit areas similar to field; the presence of trees and green vegetation near summer cottages can cause them to exhibit signatures closer to forest rather than urban; sometimes forest on rocky terrain can be misclassified as urban instead, due to the presence of very bright targets and strong disruptive features; while confusion between peatland and field areas is also often commonplace. Finally, the accuracy of the CORINE data itself is only somewhat higher than 90%.

Table 4: Summary of the classification performance and efficiency of various deep learning models (UA = user's accuracy, PA = producer's accuracy, both in %; per-class entries are given as UA/PA; the average inference time is per image in the dataset).

LC class (CLC code)               Test scale (km²)   BiSeNet   DeepLabV3+   SegNet   FRRN-B   U-Net    PSPNet   FC-DenseNet
Urban fabric (100)                1081.6             26/21     15/14        36/31    38/30    45/25    38/18    62/27
Agricultural areas (200)          2516.0             49/51     50/49        69/66    68/68    66/66    53/48    72/71
Forested areas (300)              28546.2            90/91     88/96        93/94    92/95    92/95    89/95    93/96
Peatland, bogs and marshes (400)  2099.0             54/43     56/13        67/57    71/55    70/52    65/31    74/58
Water bodies (500)                5356.4             85/91     94/92        96/96    95/96    96/96    94/94    96/96
Overall accuracy (%)                                 83.86     85.49        89.03    89.27    89.25    86.51    90.66
Kappa                                                0.641     0.649        0.754    0.758    0.754    0.680    0.785
Average inference time (s)                           0.0389    0.0267       0.0761   0.1424   0.0848   0.0495   0.1930

Table 5: Confusion matrix for the classification with the FC-DenseNet model (FC-DenseNet103). Rows are CLC2012 reference classes, columns are Sentinel-1 predicted classes; the bottom-right value (90.7%) is the overall accuracy.

CLC2012 \ Sentinel-1 class   urban        water         forest        field        peatland     total         PA (%)
urban                        7,301,999    413,073       15,892,771    3,212,839    221,476      27,042,158    27.0
water                        78,331       128,294,872   3,457,634     171,029      1,935,276    133,937,142   95.8
forest                       3,663,698    2,703,632     686,788,977   12,795,703   7,730,444    713,682,454   96.2
field                        766,200      121,609       16,527,970    44,866,048   620,934      62,902,761    71.3
peatland                     56,097       1,866,020     19,164,137    1,091,008    30,309,189   52,486,451    57.8
total                        11,866,325   133,399,206   741,831,489   62,136,627   40,817,319   990,050,966
UA (%)                       61.5         96.2          92.6          72.2         74.3                       90.7

Figure 11: Illustration of the FC-DenseNet model performance: a selection of classification results, i.e., the direct output of the network without any post-processing (bottom row), versus the reference CORINE data (upper row).

vegetation near summer cottages can cause them to exhibit signatures closer to forest than to urban. Sometimes, forest on rocky terrain can be misclassified as urban due to the presence of very bright targets and strong disruptive features, while confusion between peatland and field areas is also commonplace. Finally, the accuracy of the CORINE data itself is only somewhat higher than 90%.

As for the results across the different land cover classes, all the models performed particularly well in recognising water bodies and forested areas, while the urban fabric was the most challenging class for all the models. We expect that the inclusion of the DEM as one layer in the training images helped to achieve good results on the water bodies class for most of the models (except for BiSeNet, all the models achieved both a user's and a producer's accuracy above 90%). The urban class was particularly challenging for the following main reasons. First, this class changes the most, as new houses, roads, and urban areas are built. While we took the most suitable available CORINE class in terms of timing for our Sentinel-1 images, there are almost certain differences between the urban class as it was in 2012 and in 2015-2016. Second, the CORINE map itself does not have perfect accuracy, nor are its aggregation rules perfect. As a matter of fact, in the majority of studies where SAR-based classification was compared against CLC or similar data, a poor or modest overall agreement was observed for this class [21, 41, 83, 20], while the user's accuracy was notably higher than the producer's [104]. The latter is exactly because radar senses sharp boundaries and bright targets very well, whereas such bright targets often do not dominate the whole CORINE Level-1 urban class. We argue that any inaccuracies present will be particularly pronounced in our models for the urban class, because of the sharp and sudden boundary changes in this class, unlike for others such as forest and water. The top performing model, FC-DenseNet, performed the best across all the classes. It is particularly notable that it achieved a user's accuracy (i.e., precision) for the urban class of 62%, improving on all the other models significantly. Nevertheless, its producer's accuracy (i.e., recall) of 27% on this class is outperformed by the two other top models, SegNet and FRRN-B.

We mentioned the issue of SAR backscattering sensitivity to several ground factors, such that the same classes might appear differently in images between countries or between distant areas within a country. An interesting indication of our study, however, is that deep learning models might be able to deal with this issue. Namely, we used models pre-trained on ImageNet and fine-tuned them with a relatively small number (14) of Sentinel-1 scenes. The models learned to recognise the varying types of backscattering signal across the country of Finland. This indicates that, with a similar type of fine-tuning, the present models could be relatively easily adapted to other areas and countries with different SAR backscattering patterns. Such robustness and adaptability of the deep learning models comes from their automatic learning of feature representations, without the need for a human to pre-define those features.
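To make this transfer-learning recipe concrete, the sketch below shows one possible way to attach a small decoder to an ImageNet-pretrained encoder in TensorFlow/Keras and fine-tune it end-to-end on three-channel SAR composites. It is an illustration only, not any of the seven architectures benchmarked here: the ResNet50 encoder, the naive transposed-convolution decoder, the optimizer choice, and the variable names sar_imagelets and corine_labels are all assumptions.

    import tensorflow as tf

    NUM_CLASSES = 5  # the five Level-1 CORINE classes

    # ImageNet-pretrained encoder; the three input channels can hold,
    # e.g., two SAR backscatter layers plus the DEM layer.
    encoder = tf.keras.applications.ResNet50(
        weights="imagenet", include_top=False, input_shape=(512, 512, 3))

    # Naive decoder: upsample the 16x16 encoder feature map back to 512x512.
    x = encoder.output
    for filters in (256, 128, 64, 32, 16):
        x = tf.keras.layers.Conv2DTranspose(
            filters, 3, strides=2, padding="same", activation="relu")(x)
    logits = tf.keras.layers.Conv2D(NUM_CLASSES, 1, padding="same")(x)

    model = tf.keras.Model(encoder.input, logits)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(1e-4),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
    # model.fit(sar_imagelets, corine_labels, ...)  # fine-tune on SAR data

A common variant of this recipe is to freeze the pre-trained encoder for the first few epochs and then unfreeze it for full fine-tuning at a lower learning rate.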

4.2. Computational Performance

The training times with our hardware configuration ranged from 6 days up to 2 weeks for the different models. This could be significantly improved by training each model on a multi-GPU system instead of the single GPU we used.

In terms of inference time, we also saw differences in performance. In Table 4, we present the average inference time per 512 px × 512 px imagelet that we worked with. The results show that there is a trade-off between classification and computational performance: the best models in terms of classification results (i.e., FC-DenseNet and FRRN-B) take several times longer inference time compared to the rest. Depending on the application, this might not be of particular importance.
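For reference, an average per-imagelet inference time of this kind can be measured with a simple loop such as the hedged sketch below; the model and imagelets objects are placeholders, and a few warm-up passes are excluded so that one-off start-up costs do not bias the average.

    import time
    import numpy as np

    def mean_inference_time(model, imagelets, warmup=3):
        """Average inference time (s) per 512 px x 512 px imagelet."""
        for img in imagelets[:warmup]:           # warm-up passes, not timed
            model.predict(img[np.newaxis, ...], verbose=0)
        start = time.perf_counter()
        for img in imagelets:
            model.predict(img[np.newaxis, ...], verbose=0)
        return (time.perf_counter() - start) / len(imagelets)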

4.3. Comparison to Similar Work

The obtained results compare favourably to previous similar studies on land cover classification with SAR data [83, 20, 21, 41, 28, 31]. Depending on the level of class aggregation (4-5 major classes or more), and using mostly statistical or classical machine learning approaches, reported classification accuracies ranged from as high as 80-87% down to as low as 30% when only SAR imagery was used.

Two recent studies that employed neural networks for SAR imagery classification (albeit in combination with satellite optical data) for land cover mapping were [28] and [66], with reported classification accuracies of up to 97.5% and 94.6%, respectively.

The best model in our experiments achieved an overall accuracy of 90.7%. However, our results are obtained using solely the SAR imagery. In contrast, SAR imagery (PALSAR) alone yielded an overall accuracy of 78.1% in [28]. The types of classes they studied are also different from ours (crops versus vegetation versus land cover types), and our study is performed on a larger area. Importantly, the previous studies applied different types of models (regular NNs versus CNNs versus semantic segmentation). In particular, the CNN models work on 7 × 7 resolution windows, while we applied more advanced semantic segmentation models, which work on the level of a pixel. Keeping in mind the finding from [28] that the addition of optical images on top of SAR improved the results by over 10%, we expect that our models would perform comparably well or outperform these previous works if applied to combined SAR and optical imagery.

In terms of the deep learning setup, the studies most similar to ours are [53] and [70]. However, RapidEye optical imagery at 5 m spatial resolution was used in [53], and the test site was considerably smaller. Study [70], similar to our research, relied exclusively on SAR imagery; however, the images were fully polarimetric and acquired by RADARSAT-2 at a considerably better resolution. They developed an FCN-type semantic segmentation model 'specifically designed for the classification of wetland complexes using PolSAR imagery'. Using this model to classify eight wetland map classes, they achieved an overall accuracy of 93%. However, because their model is designed specifically for wetland complexes, it is not clear whether such a model would generalise to other types of areas. Compared to our study, they focused on a considerably smaller area (nearly the size of a single imagelet we used) and on a very specific task (wetland type mapping). Thus, it is not readily clear how general their approach is, nor how it compares to our presented approach.

4.4. Outlook and Future Work

There are several lines of potential improvement based on the results of this study, as well as directions for future work.

First, using an even larger set of Sentinel-1 images can be recommended, since for supervised deep learning models large amounts of data are crucial. Here, we processed only 6,888 imagelets altogether, but deep learning algorithms typically become efficient only once they are trained with hundreds of thousands or millions of images.

Second, if SAR images and reference data of a higher resolution are used, we expect better classification performance, too, as smaller details could potentially be captured. Also, better agreement in the acquisition timing of the reference and SAR imagery can be recommended: the reference and training data should come from the same months or year if possible, and the reference maps should represent the reality as accurately as possible. The models in our experiments were certainly limited by CORINE's own limited accuracy.

Third, in this study we tested the effectiveness of off-the-shelf deep learning models for land cover mapping from SAR data. While the results show their effectiveness, it is also likely that novel types of models developed specifically for radar data (such as [70]) will yield even better results. Based on our results, we suggest DenseNet-based models as a starting point. In particular, one could develop deep learning models that directly handle SLC data, which preserve the phase information.

Focusing on a single season is both an advantage and a limitation. Importantly, we avoided confusion between SAR signatures that vary seasonally for several land cover classes. However, multitemporal dynamics itself can potentially be used as an additional, useful class-discriminating parameter. Incorporating the seasonal dynamics of each land cover pixel (as a time series) is left for future work, perhaps with the additional need to incorporate recurrent neural networks into the approach.

As discussed in Section 3.1.1, it could be suitable to use more detailed (specific) land cover classes, as the aggregation of smaller LC classes into Level-1 CORINE classes is not exactly ecological, leading to several distinct SAR signatures being mixed in one class and thus causing additional confusion for the classifier. The classified specific classes can later be aggregated into larger classes, potentially showing improved performance [19], as sketched below.
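As an example of this "classify specific, aggregate later" strategy: since CLC codes are hierarchical (1xx artificial surfaces, 2xx agricultural areas, and so on, matching the Level-1 codes in Table 4), a map of predicted detailed classes could be collapsed afterwards along the following hypothetical lines.

    import numpy as np

    def aggregate_to_level1(class_map):
        """Collapse detailed CLC class codes (e.g. 112, 243, 412) into
        Level-1 super-class codes (100, 200, ...) by their leading digit."""
        return (np.asarray(class_map) // 100) * 100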

Finally, we used only SAR images and a freely available DEM for the presented large-scale land cover mapping. If one were to also combine other types of remote sensing images, in particular optical images, we expect that the results would improve significantly. This holds for those areas where such imagery can be collected, given the cloud coverage, while in an operational scenario it would potentially require the use of at least two models (with and without optical satellite imagery). It is also important to assess the added value of SAR imagery with deep learning models when optical satellite images are available, as well as possible data fusion and decision fusion scenarios, before a decision on the mapping approach is made [19].

5. Conclusion

Our study demonstrated the potential for applying state-of-the-art semantic segmentation models to SAR image classification with high accuracy. Several models were benchmarked in a countrywide classification experiment using Sentinel-1 IW-mode SAR data, reaching nearly 91% overall classification accuracy with the best performing model (FC-DenseNet). Given that the 14 used Sentinel-1 scenes resulted in roughly 7,000 training images, this indicates a strong potential for using pre-trained CNNs for further fine-tuning, and it seems particularly suitable when the number of training images is limited (to thousands or tens of thousands instead of millions). In addition to suggesting the best candidate semantic segmentation models for land cover mapping with SAR data (that is, the DenseNet-based models), our study offers baseline results against which newly proposed models should be evaluated. Several possible improvements for future work were identified, including the necessity of testing multitemporal approaches, data fusion, and very high-resolution SAR imagery, as well as developing models specifically for SAR; these will be addressed in future work.

References

[1] S. Bojinski, M. Verstraete, T. C. Peterson, C. Richter, A. Simmons, M. Zemp, The concept of essential climate variables in support of climate research, applications, and policy, Bulletin of the American Meteorological Society 95 (9) (2014) 1431–1443.

[2] G. Büttner, J. Feranec, G. Jaffrain, L. Mari, G. Maucha, T. Soukup, The CORINE land cover 2000 project, EARSeL eProceedings 3 (3) (2004) 331–346.

[3] M. Bossard, J. Feranec, J. Otahel, et al., CORINE land cover technical guide: Addendum 2000 (2000).

[4] G. Büttner, CORINE land cover and land cover change products, in: Land Use and Land Cover Mapping in Europe, Springer, 2014, pp. 55–74.

[5] M. Törmä, T. Markkanen, S. Hatunen, P. Härmä, O.-P. Mattila, A. Arslan, Assessment of land-cover data for land-surface modelling in regional climate studies, Boreal Environment Research 20 (2) (2015) 243–260.

[6] J. Chen, J. Chen, A. Liao, X. Cao, L. Chen, X. Chen, C. He, G. Han, S. Peng, M. Lu, et al., Global land cover mapping at 30 m resolution: A POK-based operational approach, ISPRS Journal of Photogrammetry and Remote Sensing 103 (2015) 7–27.

[7] C. A. d. Almeida, A. C. Coutinho, J. C. D. M. Esquerdo, M. Adami, A. Venturieri, C. G. Diniz, N. Dessay, L. Durieux, A. R. Gomes, High spatial resolution land use and land cover mapping of the Brazilian Legal Amazon in 2008 using Landsat-5/TM and MODIS data, Acta Amazonica 46 (3) (2016) 291–302.

[8] C. Homer, J. Dewitz, L. Yang, S. Jin, P. Danielson, G. Xian, J. Coulston, N. Herold, J. Wickham, K. Megown, Completion of the 2011 National Land Cover Database for the conterminous United States - representing a decade of land cover change information, Photogrammetric Engineering & Remote Sensing 81 (5) (2015) 345–354.

[9] Y. Zhao, D. Feng, L. Yu, X. Wang, Y. Chen, Y. Bai, H. J. Hernández, M. Galleguillos, C. Estades, G. S. Biging, et al., Detailed dynamic land cover mapping of Chile: Accuracy improvement by integrating multi-temporal data, Remote Sensing of Environment 183 (2016) 170–185.

[10] P. Griffiths, C. Nendel, P. Hostert, Intra-annual reflectance composites from Sentinel-2 and Landsat for national-scale crop and land cover mapping, Remote Sensing of Environment 220 (2019) 135–151.

[11] R. Torres, P. Snoeij, D. Geudtner, D. Bibby, M. Davidson, E. Attema, P. Potin, B. Rommen, N. Floury, M. Brown, I. Traver, P. Deghaye, B. Duesmann, B. Rosich, N. Miranda, C. Bruno, M. L'Abbate, R. Croci, A. Pietropaolo, M. Huchler, F. Rostan, GMES Sentinel-1 mission, Remote Sensing of Environment 120 (2012) 9–24. doi:10.1016/j.rse.2011.05.028.

[12] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE 86 (11) (1998) 2278–2324.

[13] A. Krizhevsky, I. Sutskever, G. E. Hinton, ImageNet classification with deep convolutional neural networks, in: Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.

[14] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, CoRR abs/1409.1556 (2014).

[15] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning (2016).

[16] W. Cohen, S. Goward, Landsat's role in ecological applications of remote sensing, BioScience 54 (6) (2004) 535–545. doi:10.1641/0006-3568(2004)054[0535:LRIEAO]2.0.CO;2.

[17] S. Goetz, A. Baccini, N. Laporte, T. Johns, W. Walker, J. Kellndorfer, R. Houghton, M. Sun, Mapping and monitoring carbon stocks with satellite observations: A comparison of methods, Carbon Balance and Management 4 (2009). doi:10.1186/1750-0680-4-2.

[18] C. Atzberger, Advances in remote sensing of agriculture: Context description, existing operational monitoring systems and major information needs, Remote Sensing 5 (2) (2013) 949–981. doi:10.3390/rs5020949.

[19] T. Häme, J. Kilpi, H. A. Ahola, Y. Rauste, O. Antropov, M. Rautiainen, L. Sirro, S. Bounpone, Improved mapping of tropical forests with optical and SAR imagery, part I: Forest cover and accuracy assessment using multi-resolution data, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 6 (1) (2013) 74–91.

[20] O. Antropov, Y. Rauste, H. Astola, J. Praks, T. Häme, M. Hallikainen, Land cover and soil type mapping from spaceborne PolSAR data at L-band with probabilistic neural network, IEEE Transactions on Geoscience and Remote Sensing 52 (9) (2014) 5256–5270. doi:10.1109/TGRS.2013.2287712.

[21] A. Lönnqvist, Y. Rauste, M. Molinier, T. Häme, Polarimetric SAR data in land cover mapping in boreal zone, IEEE Transactions on Geoscience and Remote Sensing 48 (10) (2010) 3652–3662. doi:10.1109/TGRS.2010.2048115.

[22] B. Waske, M. Braun, Classifier ensembles for land cover mapping using multitemporal SAR imagery, ISPRS Journal of Photogrammetry and Remote Sensing 64 (5) (2009) 450–457. doi:10.1016/j.isprsjprs.2009.01.003.

[23] L. Bruzzone, M. Marconcini, U. Wegmüller, A. Wiesmann, An advanced system for the automatic classification of multitemporal SAR images, IEEE Transactions on Geoscience and Remote Sensing 42 (6) (2004) 1321–1334. doi:10.1109/TGRS.2004.826821.

[24] T. Ullmann, A. Schmitt, A. Roth, J. Duffe, S. Dech, H.-W. Hubberten, R. Baumhauer, Land cover characterization and classification of arctic tundra environments by means of polarized synthetic aperture X- and C-band radar (PolSAR) and Landsat 8 multispectral imagery, Richards Island, Canada, Remote Sensing 6 (9) (2014) 8565–8593. doi:10.3390/rs6098565.

[25] N. Clerici, C. A. V. Calderón, J. M. Posada, Fusion of Sentinel-1A and Sentinel-2A data for land cover mapping: a case study in the lower Magdalena region, Colombia, Journal of Maps 13 (2) (2017) 718–726. doi:10.1080/17445647.2017.1372316.

[26] C. Castañeda, D. Ducrot, Land cover mapping of wetland areas in an agricultural landscape using SAR and Landsat imagery, Journal of Environmental Management 90 (7) (2009) 2270–2277.

[27] Y. Ban, H. Hu, I. M. Rangel, Fusion of Quickbird MS and RADARSAT SAR data for urban land-cover mapping: Object-based and knowledge-based approach, International Journal of Remote Sensing 31 (6) (2010) 1391–1410.

[28] G. V. Laurin, V. Liesenberg, Q. Chen, L. Guerriero, F. Del Frate, A. Bartolini, D. Coomes, B. Wilebore, J. Lindsell, R. Valentini, Optical and SAR sensor synergies for forest and land cover mapping in a tropical site in West Africa, International Journal of Applied Earth Observation and Geoinformation 21 (2013) 7–16.

[29] R. Khatami, G. Mountrakis, S. V. Stehman, A meta-analysis of remote sensing research on supervised pixel-based land-cover image classification processes: General guidelines for practitioners and future research, Remote Sensing of Environment 177 (2016) 89–100.

[30] B. Waske, M. Braun, Classifier ensembles for land cover mapping using multitemporal SAR imagery, ISPRS Journal of Photogrammetry and Remote Sensing 64 (5) (2009) 450–457.

[31] H. Balzter, B. Cole, C. Thiel, C. Schmullius, Mapping CORINE land cover from Sentinel-1A SAR and SRTM digital elevation model data using random forests, Remote Sensing 7 (11) (2015) 14876–14898. doi:10.3390/rs71114876.

[32] S.-E. Park, Variations of microwave scattering properties by seasonal freeze/thaw transition in the permafrost active layer observed by ALOS PALSAR polarimetric data, Remote Sensing 7 (12) (2015) 17135–17148. doi:10.3390/rs71215874.

[33] M. C. Dobson, L. E. Pierce, F. T. Ulaby, Knowledge-based land-cover classification using ERS-1/JERS-1 SAR composites, IEEE Transactions on Geoscience and Remote Sensing 34 (1) (1996) 83–99. doi:10.1109/36.481896.

[34] L. Sirro, T. Häme, Y. Rauste, J. Kilpi, J. Hämäläinen, K. Gunia, B. de Jong, F. Paz Pellat, Potential of different optical and SAR data in forest and land cover classification to support REDD+ MRV, Remote Sensing 10 (6) (2018). doi:10.3390/rs10060942.

[35] J. D. T. De Alban, G. M. Connette, P. Oswald, E. L. Webb, Combined Landsat and L-band SAR data improves land cover classification and change detection in dynamic tropical landscapes, Remote Sensing 10 (2) (2018). doi:10.3390/rs10020306.

[36] N. Longépé, P. Rakwatin, O. Isoguchi, M. Shimada, Y. Uryu, K. Yulianto, Assessment of ALOS PALSAR 50 m orthorectified FBD data for regional land cover classification by support vector machines, IEEE Transactions on Geoscience and Remote Sensing 49 (6) (2011) 2135–2150. doi:10.1109/TGRS.2010.2102041.

[37] T. Esch, A. Schenk, T. Ullmann, M. Thiel, A. Roth, S. Dech, Characterization of land cover types in TerraSAR-X images by combined analysis of speckle statistics and intensity information, IEEE Transactions on Geoscience and Remote Sensing 49 (6) (2011) 1911–1925. doi:10.1109/TGRS.2010.2091644.

[38] J. W. Cable, J. M. Kovacs, J. Shang, X. Jiao, Multi-temporal polarimetric RADARSAT-2 for land cover monitoring in northeastern Ontario, Canada, Remote Sensing 6 (3) (2014) 2372–2392. doi:10.3390/rs6032372.

[39] X. Niu, Y. Ban, Multi-temporal RADARSAT-2 polarimetric SAR data for urban land-cover classification using an object-based support vector machine and a rule-based approach, International Journal of Remote Sensing 34 (1) (2013) 1–26. doi:10.1080/01431161.2012.700133.

[40] T. L. Evans, M. Costa, K. Telmer, T. S. F. Silva, Using ALOS/PALSAR and RADARSAT-2 to map land cover and seasonal inundation in the Brazilian Pantanal, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 3 (4) (2010) 560–575. doi:10.1109/JSTARS.2010.2089042.

[41] P. Lumsdon, S. R. Cloude, G. Wright, Polarimetric classification of land cover for Glen Affric radar project, IEE Proceedings - Radar, Sonar and Navigation 152 (6) (2005) 404–412. doi:10.1049/ip-rsn:20041313.

[42] C. da Costa Freitas, L. de Souza Soler, S. J. S. Sant'Anna, L. V. Dutra, J. R. Dos Santos, J. C. Mura, A. H. Correia, Land use and land cover mapping in the Brazilian Amazon using polarimetric airborne P-band SAR data, IEEE Transactions on Geoscience and Remote Sensing 46 (10) (2008) 2956–2970.

[43] G. Li, D. Lu, E. Moran, L. Dutra, M. Batistella, A comparative analysis of ALOS PALSAR L-band and RADARSAT-2 C-band data for land-cover classification in a tropical moist region, ISPRS Journal of Photogrammetry and Remote Sensing 70 (2012) 26–38. doi:10.1016/j.isprsjprs.2012.03.010.

[44] N. Park, K. Chi, Integration of multitemporal/polarization C-band SAR data sets for land-cover classification, International Journal of Remote Sensing 29 (16) (2008) 4667–4688. doi:10.1080/01431160801947341.

[45] E. Tomppo, O. Antropov, J. Praks, Cropland classification using Sentinel-1 time series: Methodological performance and prediction uncertainty assessment, Remote Sensing 11 (21) (2019). doi:10.3390/rs11212480.

[46] D. B. Nguyen, A. Gruber, W. Wagner, Mapping rice extent and cropping scheme in the Mekong Delta using Sentinel-1A data, Remote Sensing Letters 7 (12) (2016) 1209–1218.

[47] A. Veloso, S. Mermoz, A. Bouvet, T. Le Toan, M. Planells, J.-F. Dejoux, E. Ceschia, Understanding the temporal behavior of crops using Sentinel-1 and Sentinel-2-like data for agricultural applications, Remote Sensing of Environment 199 (2017) 415–426.

[48] G. Satalino, A. Balenzano, F. Mattia, M. W. Davidson, C-band SAR data for mapping crops dominated by surface or volume scattering, IEEE Geoscience and Remote Sensing Letters 11 (2) (2013) 384–388.

[49] F. Vicente-Guijalba, A. Jacob, J. M. Lopez-Sanchez, C. Lopez-Martinez, J. Duro, C. Notarnicola, D. Ziolkowski, A. Mestre-Quereda, E. Pottier, J. J. Mallorquí, M. Lavalle, M. Engdahl, SInCohMap: Land-cover and vegetation mapping using multi-temporal Sentinel-1 interferometric coherence, in: IGARSS 2018 - 2018 IEEE International Geoscience and Remote Sensing Symposium, 2018, pp. 6631–6634. doi:10.1109/IGARSS.2018.8517926.

[50] S. Ge, O. Antropov, W. Su, H. Gu, J. Praks, Deep recurrent neural networks for land-cover classification using Sentinel-1 InSAR time series, in: IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium, 2019, pp. 473–476. doi:10.1109/IGARSS.2019.8900088.

[51] Y. LeCun, Y. Bengio, et al., Convolutional networks for images, speech, and time series, The Handbook of Brain Theory and Neural Networks 3361 (10) (1995) 1995.

[52] X. X. Zhu, D. Tuia, L. Mou, G.-S. Xia, L. Zhang, F. Xu, F. Fraundorfer, Deep learning in remote sensing: a review, arXiv preprint arXiv:1710.03959 (2017).

[53] M. Mahdianpari, B. Salehi, M. Rezaee, F. Mohammadimanesh, Y. Zhang, Very deep convolutional neural networks for complex land cover mapping using multispectral remote sensing imagery, Remote Sensing 10 (7) (2018) 1119.

[54] L. Zhang, L. Zhang, B. Du, Deep learning for remote sensing data: A technical tutorial on the state of the art, IEEE Geoscience and Remote Sensing Magazine 4 (2) (2016) 22–40.

[55] J. Zhang, P. Zhong, Y. Chen, S. Li, L1/2-regularized deconvolution network for the representation and restoration of optical remote sensing images, IEEE Transactions on Geoscience and Remote Sensing 52 (5) (2014) 2617–2627.

[56] X. Chen, S. Xiang, C.-L. Liu, C.-H. Pan, Aircraft detection by deep belief nets, in: Pattern Recognition (ACPR), 2013 2nd IAPR Asian Conference on, IEEE, 2013, pp. 54–58.

[57] X. Chen, S. Xiang, C.-L. Liu, C.-H. Pan, Vehicle detection in satellite images by hybrid deep convolutional neural networks, IEEE Geoscience and Remote Sensing Letters 11 (10) (2014) 1797–1801.

[58] Y. Liu, G. Cao, Q. Sun, M. Siegel, Hyperspectral classification via deep networks and superpixel segmentation, International Journal of Remote Sensing 36 (13) (2015) 3459–3482. doi:10.1080/01431161.2015.1055607.

[59] J. Wang, Q. Qin, Z. Li, X. Ye, J. Wang, X. Yang, X. Qin, Deep hierarchical representation and segmentation of high resolution remote sensing images, in: Geoscience and Remote Sensing Symposium (IGARSS), 2015 IEEE International, IEEE, 2015, pp. 4320–4323.

[60] D. Tuia, R. Flamary, N. Courty, Multiclass feature learning for hyperspectral image classification: Sparse and hierarchical solutions, ISPRS Journal of Photogrammetry and Remote Sensing 105 (2015) 272–285.

[61] F. Hu, G.-S. Xia, J. Hu, L. Zhang, Transferring deep convolutional neural networks for the scene classification of high-resolution remote sensing imagery, Remote Sensing 7 (11) (2015) 14680–14707.

[62] O. A. B. Penatti, K. Nogueira, J. A. dos Santos, Do deep features generalize from everyday objects to remote sensing and aerial scenes domains?, in: 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2015, pp. 44–51. doi:10.1109/CVPRW.2015.7301382.

[63] F. P. Luus, B. P. Salmon, F. Van den Bergh, B. T. J. Maharaj, Multiview deep learning for land-use classification, IEEE Geoscience and Remote Sensing Letters 12 (12) (2015) 2448–2452.

[64] F. Zhang, B. Du, L. Zhang, Scene classification via a gradient boosting random convolutional network framework, IEEE Transactions on Geoscience and Remote Sensing 54 (3) (2016) 1793–1802.

[65] T. Ishii, R. Nakamura, H. Nakada, Y. Mochizuki, H. Ishikawa, Surface object recognition with CNN and SVM in Landsat 8 images, in: 2015 14th IAPR International Conference on Machine Vision Applications (MVA), 2015, pp. 341–344. doi:10.1109/MVA.2015.7153200.

[66] N. Kussul, M. Lavreniuk, S. Skakun, A. Shelestov, Deep learning classification of land cover and crop types using remote sensing data, IEEE Geoscience and Remote Sensing Letters 14 (5) (2017) 778–782.

[67] Y. Chen, Z. Lin, X. Zhao, G. Wang, Y. Gu, Deep learning-based classification of hyperspectral data, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 7 (6) (2014) 2094–2107.

[68] G. Wu, X. Shao, Z. Guo, Q. Chen, W. Yuan, X. Shi, Y. Xu, R. Shibasaki, Automatic building segmentation of aerial imagery using multi-constraint fully convolutional networks, Remote Sensing 10 (3) (2018) 407.

[69] Y. Duan, F. Liu, L. Jiao, P. Zhao, L. Zhang, SAR image segmentation based on convolutional-wavelet neural network and Markov random field, Pattern Recognition 64 (2017) 255–267.

[70] F. Mohammadimanesh, B. Salehi, M. Mahdianpari, E. Gill, M. Molinier, A new fully convolutional neural network for semantic segmentation of polarimetric SAR imagery in complex land cover ecosystem, ISPRS Journal of Photogrammetry and Remote Sensing 151 (2019) 223–236.

[71] L. Wang, X. Xu, H. Dong, R. Gui, F. Pu, Multi-pixel simultaneous classification of PolSAR image using convolutional neural networks, Sensors 18 (3) (2018) 769.

[72] M. Ahishali, S. Kiranyaz, T. Ince, M. Gabbouj, Dual and single polarized SAR image classification using compact convolutional neural networks, Remote Sensing 11 (11) (2019) 1340.

[73] Z. Li, Z. Yang, H. Xiong, Homogeneous region segmentation for SAR images based on two steps segmentation algorithm, in: Computers, Communications and Systems (ICCCS), International Conference on, IEEE, 2015, pp. 196–200.

[74] V. Badrinarayanan, A. Kendall, R. Cipolla, SegNet: A deep convolutional encoder-decoder architecture for image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence 39 (12) (2017) 2481–2495.

[75] H. Zhao, J. Shi, X. Qi, X. Wang, J. Jia, Pyramid scene parsing network, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2881–2890.

[76] C. Yu, J. Wang, C. Peng, C. Gao, G. Yu, N. Sang, BiSeNet: Bilateral segmentation network for real-time semantic segmentation, in: Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 325–341.

[77] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, H. Adam, Encoder-decoder with atrous separable convolution for semantic image segmentation, arXiv preprint arXiv:1802.02611 (2018).

[78] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, A. L. Yuille, DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs, IEEE Transactions on Pattern Analysis and Machine Intelligence 40 (4) (2018) 834–848.

[79] O. Ronneberger, P. Fischer, T. Brox, U-Net: Convolutional networks for biomedical image segmentation, in: Medical Image Computing and Computer-Assisted Intervention (MICCAI), Vol. 9351 of LNCS, Springer, 2015, pp. 234–241 (available on arXiv:1505.04597 [cs.CV]). URL http://lmb.informatik.uni-freiburg.de/Publications/2015/RFB15a

[80] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, H. Adam, MobileNets: Efficient convolutional neural networks for mobile vision applications, arXiv preprint arXiv:1704.04861 (2017).

[81] T. Pohlen, A. Hermans, M. Mathias, B. Leibe, Full-resolution residual networks for semantic segmentation in street scenes, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4151–4160.

[82] S. Jégou, M. Drozdzal, D. Vazquez, A. Romero, Y. Bengio, The one hundred layers tiramisu: Fully convolutional DenseNets for semantic segmentation, in: Computer Vision and Pattern Recognition Workshops (CVPRW), 2017 IEEE Conference on, IEEE, 2017, pp. 1175–1183.

[83] O. Antropov, Y. Rauste, A. Lönnqvist, T. Häme, PolSAR mosaic normalization for improved land-cover mapping, IEEE Geoscience and Remote Sensing Letters 9 (6) (2012) 1074–1078.

[84] J. Long, E. Shelhamer, T. Darrell, Fully convolutional networks for semantic segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431–3440.

[85] D. H. Hubel, T. N. Wiesel, Receptive fields, binocular interaction and functional architecture in the cat's visual cortex, The Journal of Physiology 160 (1) (1962) 106–154.

[86] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al., ImageNet large scale visual recognition challenge, International Journal of Computer Vision 115 (3) (2015) 211–252.

[87] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.

[88] G. Huang, Z. Liu, L. Van Der Maaten, K. Q. Weinberger, Densely connected convolutional networks, in: CVPR, Vol. 1, 2017, p. 3.

[89] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, Going deeper with convolutions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1–9.

[90] S. Ji, W. Xu, M. Yang, K. Yu, 3D convolutional neural networks for human action recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (1) (2013) 221–231.

[91] T. N. Sainath, A.-r. Mohamed, B. Kingsbury, B. Ramabhadran, Deep convolutional neural networks for LVCSR, in: Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, IEEE, 2013, pp. 8614–8618.

[92] D. Small, L. Zuberbühler, A. Schubert, E. Meier, Terrain-flattened gamma nought Radarsat-2 backscatter, Canadian Journal of Remote Sensing 37 (5) (2012) 493–499.

[93] P. Härmä, R. Teiniranta, M. Törmä, R. Repo, E. Järvenpää, M. Kallio, The production of Finnish CORINE land cover 2000 classification, XXth ISPRS Congress, Istanbul, Turkey (2004).

[94] P. Härmä, R. Teiniranta, M. Törmä, R. Repo, E. Järvenpää, M. Kallio, Finnish CORINE land cover 2000 classification, XXth ISPRS Congress, Anchorage, US (2004).

[95] A. Garcia-Garcia, S. Orts-Escolano, S. Oprea, V. Villena-Martinez, J. Garcia-Rodriguez, A review on deep learning techniques applied to semantic segmentation, arXiv preprint arXiv:1704.06857 (2017).

[96] F. Chollet, Xception: Deep learning with depthwise separable convolutions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1251–1258.

[97] O. Ronneberger, P. Fischer, T. Brox, U-Net: Convolutional networks for biomedical image segmentation, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2015, pp. 234–241.

[98] L.-C. Chen, G. Papandreou, F. Schroff, H. Adam, Rethinking atrous convolution for semantic image segmentation, arXiv preprint arXiv:1706.05587 (2017).

[99] Y. Bengio, Deep learning of representations for unsupervised and transfer learning, in: Proceedings of ICML Workshop on Unsupervised and Transfer Learning, 2012, pp. 17–36.

[100] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, et al., TensorFlow: A system for large-scale machine learning, in: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), 2016, pp. 265–283.

[101] H. Costa, G. M. Foody, D. S. Boyd, Supervised methods of image segmentation accuracy assessment in land cover mapping, Remote Sensing of Environment 205 (2018) 338–351.

[102] G. Csurka, D. Larlus, F. Perronnin, F. Meylan, What is a good evaluation measure for semantic segmentation?, in: BMVC, Vol. 27, Citeseer, 2013, p. 2013.

[103] J. Cohen, A coefficient of agreement for nominal scales, Educational and Psychological Measurement 20 (1) (1960) 37–46.

[104] O. Antropov, Y. Rauste, T. Häme, Volume scattering modeling in PolSAR decompositions: Study of ALOS PALSAR data over boreal forest, IEEE Transactions on Geoscience and Remote Sensing 49 (10) (2011) 3838–3848.

  • 1 Introduction
    • 11 Land Cover Mapping with SAR Imagery
    • 12 Deep Learning in Remote Sensing
    • 13 Study goals
      • 2 Deep Learning Terminology
      • 3 Materials and methods
        • 31 Study site
        • 32 SAR data
        • 33 Reference data
        • 34 Semantic Segmentation Models
          • 341 BiSeNet (Bilateral Segmentation Network)
          • 342 SegNet (Encoder-Decoder-Skip)
          • 343 Mobile U-Net
          • 344 DeepLab-V3+
          • 345 FRRN-B (Full-Resolution Residual Networks)
          • 346 PSPNet (Pyramid Scene Parsing Network)
          • 347 FC-DenseNet (Fully Convolutional DenseNets)
            • 35 Training approach
            • 36 Experimental Setup
              • 361 SAR Data Preprocessing for Deep Learning
              • 362 TrainDevelopment and Test (Accuracy Assessment) Dataset
              • 363 Data Augmentation
              • 364 Implementation
              • 365 Hardware and Training Setup
                • 37 Evaluation Metrics
                  • 4 Results and Discussion
                    • 41 Classification Performance
                    • 42 Computational Performance
                    • 43 Comparison to Similar Work
                    • 44 Outlook and Future Work
                      • 5 Conclusion
Page 2: Wide-Area Land Cover Mapping with Sentinel-1 Imagery using … · 2020-02-27 · Wide-Area Land Cover Mapping with Sentinel-1 Imagery using Deep Learning Semantic Segmentation Models

suggest the DenseNet-based models are the first candidate for this task

Keywords synthetic aperture radar deep learning semantic segmentationland cover mapping image classification Sentinel-1 data C-band CORINE

1 Introduction

Mapping of land cover and its change has a critical role in the characteri-zation of the current state of the environment The changes in land cover canbe due either to human activities as well as caused by climate changes on aregional scale The land cover on the other hand affects climate through waterand energy exchange with the atmosphere and by changing carbon balance Be-cause of this land cover belongs to the Essential Climate Variables [1] Hencetimely assessment of land cover and its change is one of the most importantapplications in satellite remote sensing Thematic maps are needed annuallyfor various purposes in medium resolution (circa 250 m) with less than 15measurement uncertainty and in high resolution (10-30 m) with less than 5uncertainty

CORINE Land Cover (CLC) is a notable example of a consistent Pan-European land cover mapping initiative [2 3] coordinated by the EuropeanEnvironment Agency (EEA)1 CORINE stands for coordination of informationon the environment It is an on-going long-term effort providing most harmo-nized classification land cover data in Europe with updates approximately every4 years The CORINE maps are an important source of land cover informationsuitable for operational purposes also for various customer groups in EuropeIt has altogether 44 classes though many of them are not strictly ecologicalclasses but rather land use classes On the continental scale CORINE providesa harmonized map with 25 ha minimum mapping unit (MMU) for areal phe-nomena and a minimum width of 100 m for linear phenomena [4] Nationalland cover maps in the CORINE framework can exhibit smaller mapping unitsIn Finland the latest revision of CORINE land cover map at the time of thisstudy was 2012 round produced by the Finnish Environment Institute Themap has an MMU of 20mtimes 20m and was produced by a combined automatedand manual interpretation of the high-resolution satellite optical data followedby the data integration with existing basic map layers [5]

The state-of-the-art approaches used for land cover mapping mainly rely onthe satellite optical imagery The key role is played by the Landsat imageryoften augmented by the MODIS or SPOT-5 imagery [6 7 8] Other sources ofinformation employed for land cover mapping include Digital Elevation Models(DEM) and very high-resolution imagery [9] When it comes to the large-scaleand multitemporal land cover mapping a more recent optical imagery source isCopernicus Sentinel-2 With a revisit of 5 days it has become another key datasource [10]

1httpslandcopernicuseupan-europeancorine-land-cover

2

International programs such as the European Space Agencyrsquos (ESArsquos) Coper-nicus [11] behind the Sentinel satellites are taking significant efforts to makeEarth Observation (EO) data freely available for commercial and non-commercialpurposes The Copernicus programme is a multi-billion investment by the EUand ESA aiming to provide essential services based on accurate and timely datafrom satellites Its main goals are to improve the ways of managing the environ-ment to help mitigate the effects of climate change and enable the creation ofnew applications and services such as for environmental monitoring and urbandevelopment

The provision of free satellite data for mapping in the framework of suchprograms also enables application of methods that could not be used earlierbecause they require vast and representative datasets for training for exam-ple deep learning In recent years deep learning has brought about severalbreakthroughs in the pattern recognition and computer vision [12 13 14] Thesuccess of the deep learning models can be attributed to both their deep mul-tilayer structure creating nonlinear functions and hence allowing extractionof hierarchical sets of features from the data and to their end-to-end trainingscheme allowing for simultaneous learning of the features from the raw inputand predicting the task at hand In this way the heuristic feature design isremoved This is advantageous compared to the traditional machine learningmethods (eg support vector machine (SVM) and random forest (RF)) whichrequire a multistage feature engineering procedure In deep learning such aprocedure is replaced with a simple end-to-end deep learning workflow One ofthe key requirements for successful application of deep learning methods is alarge amount of data available from which the model can automatically learnthe representative features for the prediction task [15] The availability of opensatellite imagery such as from Copernicus offers just that

The land cover mapping systems based solely on optical imagery suffer fromissues with cloud cover and weather conditions especially in the tropical areasand with a lack of illumination in the polar regions Among the free satellitedata offered by the Copernicus programme are synthetic aperture radar (SAR)images from the Sentinel-1 satellites SAR is an active radar imaging techniquethat does not require illumination and is not hampered by cloud-cover due topenetration of microwave radiation through clouds The utilisation of SAR im-agery hence would allow mapping such challenging regions and increasing themapping frequency in the orchestrated efforts like CORINE One of the signifi-cant issues previously was the absence of timely and consistent high-resolutionwide-area SAR coverage With the advent of Copernicus Sentinel-1 satellitesoperational use of imaging radar data becomes feasible for consistent wide-areamapping The first Copernicus Sentinel-1 mission was launched in April 2014Firstly Sentinel-1A alone was capable of providing C-band SAR data in upto four imaging modes with a revisit time of 12 days Once Sentinel-1B waslaunched in 2016 the revisit time has reduced to 6 days [11]

We studied wide-area SAR-based land cover mapping by methodologicallycombining the two discussed recent advances the improved methods for large-scale image processing using deep learning and the availability of SAR imagery

3

from the Sentinel-1 satellites

11 Land Cover Mapping with SAR Imagery

While using optical satellite data is still a mainstream in land cover andland cover change mapping [16 17 18 19 5] SAR data has been getting moreattention as more suitable sensors appear To date several studies have in-vestigated the suitability of SAR for land cover mapping focusing primarilyat L-band C-band and X-band polarimetric [20 21] multitemporal and multi-frequency SAR [22] [23] as well as at the combined use of SAR and opticaldata [24 25 26 27 28]

Independently of the imagery used the majority of land cover mappingmethods so far are based on traditional supervised classification techniques [29]Widely used classifiers are support vector machines (SVM) decision trees ran-dom forests (RF) and maximum likelihood classifiers (MLC) [9 7 30 29]However extracting a large number of features needed for classification iethe feature engineering process is time intensive and requires lots of expertwork in developing an fine-tuning classification approaches This limits the ap-plications of the traditional supervised classification methods on a large scale

Backscattered microwave radiation is composed from multiple fundamen-tal scattering mechanisms determined by the vegetation water content surfaceroughness soil moisture horizontal and vertical structure of the scatterers aswell as imaging geometry during the datatake Accordingly a considerable num-ber of classes can be differentiated in SAR images [31 20] However majorityof SAR classification algorithms use fixed SAR observables (eg polarimetricfeatures) to infer specific land cover classes despite the large temporal seasonaland environmental variability between different geographical sites This leadsto a lack of generalisation capability and a need to use extensive and represen-tative reference data and SAR data The latter means the need to account fornot only all variation of SAR signatures for a specific class but also the needto consider seasonal effects as changes in moisture of soil and vegetation aswell as frozen state of land [32] that strongly affect SAR backscatter On theother hand when using multitemporal approaches such seasonal variation canbe used as an effective discriminator among different land cover classes

When using exclusively SAR data for land cover mapping reported accuracyoften turn out to be relatively low for operational land cover mapping andchange monitoring Methodologically reported solutions utilized supervisedapproaches linking SAR observables and class labels to pixels superpixels orobjects in parametric or nonparametric manner [20 21 33 31 19 34 35 3637 38 39 40 41]

However tackling relatively large number of classes was considered only inseveral studies often with relatively low reported accuracies For instance in[42] it was found that P-band PolSAR imagery was unsatisfactory for mappingmore than five classes with the iterated conditional mode (ICM) contextualclassifier applied to several polarimetric parameters They achieved a Kappavalue of 768 when mapping four classes Classification performance of the L-band ALOS PALSAR and C-band RADARSAT-2 images was compared in the

4

moist tropics [43] L-band provided 722 classification accuracy for a coarseland cover classification system and C-band only 547In a similar study inLao PDR ALOS PALSAR data were found to be mostly useful as a back-up option to optical ALOS AVNIR data[19] Multitemporal Radarsat1 datawith HH polarization and ENVISAT ASAR data with VV polarization (bothC-band) were studied for classification of five land cover classes in Korea withmoderate accuracy [44] Waske et al [30] applied boosted decision tree andrandom forests to multi-temporal C-band SAR data reaching accuracy up to84 Several studies [21] [20] investigated specifically SAR suitability for theboreal zone with reported accuracy up to 83 depending on the classificationtechnique (maximum likelihood probabilistic neural networks etc) when fivesuper-classes (based on CORINE data) were used

The potential of Sentinel-1 imagery for CORINE-type thematic mappingwas assessed in a study that used Sentinel-1A data for mapping class composi-tion in Thuringia [31] Long-time series of Sentinel-1 SAR data are consideredespecially suitable for crop type mapping [45 46 47 48] with increased numberof studies attempting land cover mapping in general [49 50]

Moreover as Sentinel-1 data are presently the only free source of SAR dataroutinely available for wide-area mapping at no cost for users it seems thebest candidate data for development and testing of improved classification ap-proaches Previous studies indicate a necessity for developing and testing newmethodological approaches that can be effectively applied to a large-scale anddeal with the variability of SAR observables concerning ecological land coverclasses We suggest adopting state-of-the-art deep learning approaches for thispurpose

12 Deep Learning in Remote Sensing

The advances in the deep learning techniques for computer vision in particu-lar Convolutional Neural Networks (CNNs) [12 51] have led to the applicationof deep learning in several domains that rely on computer vision Examples areself-driving cars image search engines medical diagnostics and augmented re-ality Deep learning approaches are starting to be adopted in the remote sensingdomain as well

Zhu et al [52] provide a discussion on the specificities of remote sensingimagery (compared to ordinary RGB images) that result in specific deep learningchallenges in this area For example remote sensing data are georeferencedoften multi-modal with particular imaging geometries there are interpretationdifficulties and the ground-truth or labelled data needed for deep learning is stilloften lacking Additionally most of the state-of-the-art CNNs are developed forthree-channel input images (ie RGB) and so certain adaptations are neededto apply them on the remote sensing data [53]

Nevertheless several research papers tackling remote sensing imagery withdeep learning techniques were published in recent years Zhang et al [54] reviewthe field and find applications to image preprocessing [55] target recognition[56 57] classification [58 59 60] and semantic feature extraction and sceneunderstanding [61 62 63 64] The deep learning approaches are found to

5

outperform standard methods applied up to several years ago ie SVMs andRFs [65 66]

When it comes to deep learning for land cover or land use mapping applica-tions have been limited to optical satellite [53 67 53 59] or aerial [68] imageryand hyperspectral imagery [60 67] owing to the similarity of these images toordinary RGB images studied in computer vision [53]

When it comes to SAR images Zhang et al [54] found that there is alreadya significant success in applying deep learning techniques for object detectionand scene understanding However for classification on SAR data applicationsare scarce and advances are yet to be achieved [54] Published research includesdeep learning for crop types mapping using combined optical and SAR imagery[66] as well as the use of SAR images exclusively [69] However those meth-ods applied deep learning only to some part of the task at hand and not inan end-to-end fashion Wang et al [59] for instance just used deep neuralnetworks for merging over-segmented elements which are produced using tradi-tional segmentation approaches Similarly Tuia et al [60] applied deep learningto extract hierarchical features which they further fed into a multiclass logis-tic classifier Duan et al [69] used first unsupervised deep learning and thencontinued with a couple of supervised labelling tasks Chen et al [67] applieda deep learning technique (stacked autoencoders) to discover the features butthen they still used traditional machine learning (SVM logistic regression) forthe image segmentation Unlike those methods we applied the deep learning inan end-to-end fashion ie from supervised feature extraction to the land classprediction This makes our approach more flexible robust and adaptable tothe SAR data from new regions as well as more efficient

When it comes to the end-to-end approaches for SAR classification thereare several studies where the focus was on a small area and on a specific landcover mapping task For instance Mohammadimanesh et al [70] used fullypolarimetric SAR (PolSAR) imagery from RADARSAT-2 to classify wetlandcomplexes for which they have developed a specifically tailored semantic seg-mentation model However the authors have tackled a small test area (around10km times 10km) and have not explored how their model generalizes to othertypes of areas Similarly Wang et al [71] adapted existing CNN models into afixed-feature-size CNN that they have evaluated on a small scale RADARSAT-2 or AIRSAR (ie airborne SAR data) In both cases they have used moreadvanced fully polarimetric SAR imagery at better resolution as opposed toSentinel-1 which means the imagery with more input information to the deeplearning models Importantly it is only Sentinel-1 that offers open operationaldata with up to every 6 days repeat Because of this the discussed approachesdeveloped and tested specifically for PolSAR imagery at a higher resolution can-not be considered applicable for a wide-area mapping yet Similarly Ahishaliet al [72] applied end-to-end approaches to SAR data They have also workedwith single polarized COSMO-SkyMed imagery However all the imagery theyconsidered was X-band SAR contrary to C-band imagery we use here and againonly on a small scale The authors proposed a compact CNN model that theyfound had outperformed some of the off-the-shelf CNN methods such as Xcep-

6

tion and Inception-ResNet-v2 It is important to note that compared to thosethe off-the-shelf models that we consider here are more sophisticated semanticsegmentation models some which employ Xception or ResNet but only as amodule in their feature extraction parts

In summary the capabilities of the deep learning approaches for the clas-sification have been investigated to a lesser extent for SAR imagery than foroptical imagery The attempts to use SAR data for land cover classificationwere relatively limited in scope area or the number of used SAR scenes Par-ticularly wide-area land cover mapping was never addressed The reasons forthis include comparatively poor availability of SAR data compared to optical(greatly changed since the advent of Sentinel-1) complex scattering mechanismsleading to ambiguous SAR signatures for different classes (which makes SARimage segmentation more difficult than the optical image segmentation [73])as well as the speckle noise caused by the coherent nature of the SAR imagingprocess

13 Study goals

Present study addresses the identified research gap of a lack of wide-arealand cover mapping using SAR data We achieve this by training fine-tuningand evaluating a set of suitable state-of-the-art deep learning models from theclass of semantic segmentation models and demonstrating their suitability forland cover mapping Moreover our work is the first to examine and demonstratethe suitability of deep learning for land cover mapping from SAR images on alarge-scale ie across the whole country

Specifically we applied the semantic segmentation models on the SAR im-ages taken over Finland We focused on the images of Finland because thereis the land cover mask of a suitable resolution that can be used for training la-bels (ie CORINE) The training is performed with the seven selected models(SegNet [74] PSPNet [75] BiSeNet [76] DeepLabV3+ [77 78] U-Net [79 80]FRRN-B [81] and FC-DenseNet [82]) which have encoder modules pre-trainedon the large RGB image corpus ImageNet 20122 Those models are freely avail-able3 In other words we reused semantic segmentation architectures developedfor natural images with pre-trained weights on RGB images and we fine-tunedthem on the SAR images Our results (with over 90 overall accuracy) demon-strate the effectiveness of the deep learning methods for the land cover mappingwith SAR data

In addition to having the high-resolution CORINE map that can serve asa ground-truth (labels) for training the deep learning models another reasonthat we selected Finland is that it is a northern country with frequent cloudcover which means that using optical imagery for wide-area mapping is oftennot feasible Hence demonstrating the usability of radar imagery for land covermapping is particularly useful here

2httpimage-netorgchallengesLSVRC20123httpsgithubcomtensorflowmodelstreemasterresearchslimpre-trained-models

7

Even though Finland is a relatively small country there is still considerableheterogeneity present in terms of land cover types and how they appear in theSAR images Namely SAR backscattering is sensitive to several factors thatlikely differ between countries or between distant areas within a country Ex-amples of such factors are moisture levels terrain variation and soil roughnesspredominant forest biome and tree species proportions types of shorter vege-tation and crops in agricultural areas and specific types of built environmentsWe did not contain our study to a particular area of Finland where the SARsignatures might be consistent but we obtained the images across a wide areaHence demonstrating the suitability of our methods in this setting hints at theirpotential generalizability Namely it means that similarly as we did here thesemantic segmentation models can be fine-tuned and adapted to work on datafrom other regions or countries with the different SAR signatures

On the other hand we took into account that the same areas will appearsomewhat different on the SAR images across different seasons Scattering char-acteristics of many land cover classes change considerably between the summerand winter months and sometimes even within weeks during seasonal changes[83 20] These include snow cover and melting freezethaw of soils ice onrivers and lakes crops growing cycle leaf-on and leaf-off conditions in decidu-ous trees Because of this in the present study we focused only on the scenesacquired during the summer season However we did allow our training datasetto contain several images of the same area taken during different times duringthe summer season This way not only spatial but also temporal variation ofSAR signatures is introduced

Our contributions can be summarised as follows

C1 We thoroughly benchmarked seven selected state-of-the-art semantic seg-mentation models covering a diverse set of approaches for land cover map-ping using Sentinel-1 SAR imagery We provide insights on the best mod-els in terms of both accuracy and efficiency

C2 Our results demonstrate the power of deep learning models, along with SAR imagery, for accurate wide-area land cover mapping in the cloud-obscured boreal zone and polar regions.

2. Deep Learning Terminology

As with other representation learning models, the power of deep learning models comes from their ability to learn rich features (representations) from the dataset automatically [15]. The automatically learned features are usually better suited to the classifier or other task at hand than hand-engineered features. Moreover, thanks to the large number of layers employed, it has been proven that deep learning networks can discover hierarchical representations, so that higher-level representations are expressed in terms of lower-level, simpler ones. For example, in the case of images, the low-level representations that can be discovered are edges; using them, mid-level representations such as corners and shapes can be expressed, and these in turn help to express high-level representations such as object elements and their identities [15].

Table 1 Terminology for the main tasks in computer vision and its use in the deep learning versus remote sensing communities.

| Deep learning | Remote sensing | Task description |
|---|---|---|
| Classification [13] | Image Annotation, Scene Understanding, Scene Classification | Assigning a whole image to a class based on what is (mainly) represented in it, for example a ship, oil tank, sea, or land. |
| Object Detection, Localization, Recognition [15] | Automatic Target Recognition | Detecting (and localizing) the presence of particular objects in an image. These algorithms can detect several objects in the given image. For instance, ship detection in SAR images. |
| Semantic Segmentation [84] | Image Classification, Clustering | Assigning a class to each pixel in an image based on which image object or region it belongs to. These algorithms not only detect and localize objects in the image but also output their exact areas and boundaries. |

The deep learning models in computer vision can be grouped into three categories according to their main task. In Table 1 we provide a description of those categories. However, the deep learning terminology for those tasks does not always correspond well to the terminology used in the remote sensing community. Relevant to our task, a number of remote sensing studies use the term classification in the context of land cover mapping, inherently meaning pixel- or region-based classification, which in the deep learning terminology corresponds to semantic segmentation. In Table 1 we list the corresponding terminology that we encountered being used for each task in both the deep learning and remote sensing communities. This helps to disambiguate when different tasks are discussed and to recognize when the same task is meant in the two domains. In the present study, the focus is on land cover mapping. Hence, we tackle semantic segmentation in the deep learning terminology, i.e., image classification (pixel-wise classification) in the remote sensing terminology.

Convolutional Neural Networks (CNNs) [12, 13] are the deep learning model that has transformed the computer vision field. CNNs were initially designed to tackle the image classification task (in the deep learning terminology). Their structure is inspired by the visual perception of mammals [85]. CNNs are named after one of the most important operations, particular to them compared to other neural networks, i.e., convolutions. Mathematically, a convolution is a combination of two other functions. A convolution is applied to the image by sliding a filter (kernel) of a given size k × k, which is usually small compared to the original image size. Filters are designed for different purposes; for example, a filter can serve as a vertical edge detector. Application of such a convolution operation to an image results in a feature map. Another common operation, usually applied after a convolution, is pooling. Pooling reduces the size of the feature map while providing robustness to the extracted features. Common CNNs end with a fully connected layer, which is used for final predictions, commonly for image classification. By employing a large number of convolutional layers (depth), CNNs are able to extract gradually more complex and abstract features. The first CNN model to demonstrate their impressive effectiveness in image classification (of hand digits) was LeNet [12]. Several years later, Krizhevsky et al. [13] developed AlexNet, the deep CNN that dramatically pushed the limits of classification accuracy on the famous ImageNet computer vision challenge [86]. Since then, a variety of CNN-based models have been proposed. Some notable examples are the VGG network [14], ResNet [87], DenseNet [88], and Inception V3 [89]. The effectiveness of CNNs has also been proven in various real-world applications [90, 91].
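As an illustration of these building blocks, the following minimal sketch (assuming a Keras-style TensorFlow setup; the layer sizes and the 5-class output are purely illustrative, not the configuration used in this study) chains convolutions, pooling, and a final fully connected layer for whole-image classification:

```python
import tensorflow as tf

# A minimal CNN sketch: convolutions extract feature maps, pooling
# shrinks them, and a fully connected (Dense) layer makes the final
# whole-image prediction.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu",
                           input_shape=(512, 512, 3)),  # k x k filters yield feature maps
    tf.keras.layers.MaxPooling2D((2, 2)),               # pooling reduces feature-map size
    tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(5, activation="softmax"),     # image-level class prediction
])
```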

Once CNNs had proven their effectiveness in classifying images, Long et al. [84] were the first to discover how a given CNN model can be augmented to make it suitable for the semantic segmentation task: they proposed the Fully Convolutional Neural Network (FCN) framework. This generic architecture can be used to adapt any CNN network used for classification into a segmentation model. Namely, the authors have shown that, by replacing the last fully connected layer with appropriate convolutional layers that upsample and restore the resolution of the input at the output layer, CNNs can be transformed to classify each individual pixel (instead of the whole image). The basic idea is as follows. The encoder is used to learn the feature maps, and is usually based on a pre-trained deep CNN for classification, such as ResNet, VGG, or Inception. The decoder part serves to upsample the discriminative features that the encoder has learned from the coarse-level feature map to the fine pixel level. Long et al. [84] have shown that this upsampling (backward) computation can be efficiently performed using backward convolutions (deconvolutions). Moreover, this means that specific CNN models, such as those mentioned above, can all be incorporated in the FCN framework for segmentation, giving rise to FCN-AlexNet [84], FCN-ResNet [87], FCN-VGG16 [84], FCN-DenseNet [82], etc. We present a diagram of the generic FCN architecture in Figure 1.
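To make the adaptation concrete, here is a minimal sketch (assumptions: TensorFlow/Keras, a VGG16 encoder, five output classes; not the exact configuration of any model in this study) of turning a classification CNN into an FCN:

```python
import tensorflow as tf

# FCN sketch: a pre-trained classification CNN becomes the encoder, the
# fully connected head is replaced by a 1x1 convolution producing
# per-location class scores, and a transposed ("backward") convolution
# upsamples the coarse map back to the input resolution.
encoder = tf.keras.applications.VGG16(include_top=False, weights="imagenet",
                                      input_shape=(512, 512, 3))
scores = tf.keras.layers.Conv2D(5, (1, 1))(encoder.output)           # 16x16x5 class scores
upsampled = tf.keras.layers.Conv2DTranspose(5, (64, 64), strides=32,
                                            padding="same")(scores)  # back to 512x512x5
outputs = tf.keras.layers.Softmax()(upsampled)                        # pixel-wise classification
fcn = tf.keras.Model(encoder.input, outputs)
```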


Figure 1 The architecture of Fully Convolutional Neural Networks (FCNs) [84]

3. Materials and methods

Here we first describe the study site, the SAR data, and the reference data. This is followed by an in-depth description of the deep learning terminology and the models used in the study. We finish with the description of the experimental setup and the evaluation metrics.

3.1. Study site

Our study site covers the area of Finland at latitudes from 61° to 67.5°. The processed area is shown in Figure 2. The study area includes central and northern areas of Finland, covered primarily by boreal forestland with inclusions of water bodies (primarily lakes), urban settlements and agricultural areas, as well as marshland and open bogs. We have omitted Lapland due to potential snow cover during the months of data acquisition. The terrain height variation is moderate, mostly within the 100–300 meter range.

3.2. SAR data

Presently, Sentinel-1 is a C-band SAR dual-satellite system, with two satellites orbiting 180° apart [11], launched in 2014 and 2016, respectively. The operational acquisition modes are Stripmap (SM), Interferometric Wide-Swath (IW), Extra Wide Swath (EW), and Wave Mode (WV). The IW mode is the default mode over land, providing a 250 km wide swath composed of three sub-swaths, with a single-look image at 5 m by 20 m spatial resolution. It uses the so-called TOPS (Terrain Observation with Progressive Scans) SAR mode.

The SAR data acquired by Sentinel-1 satellites in IW mode are used in this study. Altogether, 14 Sentinel-1A images acquired during the summers of 2015 and 2016 were used, more concretely during June, July and August of those two years. Their geographical coverage is schematically shown in Figure 2.


Figure 2 Study area in Finland with reference CORINE land cover data and schematic location of areas used for model training and accuracy assessment.


The original scenes were downloaded as Level-1 Ground Range Detected (GRD) products. They represent focused SAR data that has been detected, multi-looked, and projected to ground range using an Earth ellipsoid. They were orthorectified in ESA SNAP S1TBX software using a local digital terrain model (with 2 m resolution) available from the National Land Survey of Finland. The pixel spacing of the orthorectified scenes was set to 20 m. Orthorectification included terrain flattening to obtain backscatter in gamma-nought format [92]. The scenes were further re-projected to the ETRS89 / ETRS-TM35FIN projection (EPSG:3067) and resampled to a final pixel size of 20 metres.

3.3. Reference data

In Finland, the Finnish Environment Institute (SYKE) is responsible for the production of the CORINE maps. While for most of the EU territory the CORINE mask of 100 m × 100 m spatial resolution is available, national institutions may choose to create more precise maps, and SYKE in particular has produced a 20 m × 20 m spatial resolution mask for Finland (Figure 3). Since then, updates have been produced regularly, the latest one at the time of this study, which we used, being CLC2012. There are 48 different land use classes in the map, which can be hierarchically grouped into 4 CLC levels. In detail, there are 30 classes on CLC Level-3, 15 classes on CLC Level-2, and 5 top CLC Level-1 classes. According to the information provided by SYKE, the accuracy of the CLC Level-3 is 61%, of the CLC Level-2 83%, and of the CLC Level-1 it is 93%. The selected classes and their corresponding color codes used for our segmentation results are shown in Table 2.

Until the most recent CORINE production round, EEA member countries adopted national approaches for the production of CORINE. The EEA Technical Guidelines include manual digitalization of land cover change based on visual interpretation of optical satellite imagery. In Finland, the European CLC was not applicable for the majority of national users due to its large minimal mapping unit (MMU); thus, a national version was produced with a somewhat modified nomenclature of classes [93, 94]. The national high-resolution CLC2012 data is in raster format of 20 m, with a corresponding MMU. In the provision of the 2012 update of CLC, obtaining optical imagery over Scandinavia and Britain was particularly challenging because of the frequent clouds, thus calling for the use of radar imagery to meet user requirements on accuracy and coverage [31]. The CORINE map itself is built from high-resolution satellite images acquired primarily during the summer and, to a smaller extent, during the spring months [2].

3.4. Semantic Segmentation Models

We selected the following seven state-of-the-art [95] semantic segmentation models to test for our land cover mapping task: SegNet [74], PSPNet [75], BiSeNet [76], DeepLabV3+ [77, 78], U-Net [79, 80], FRRN-B [81], and FC-DenseNet [82]. The models were selected to cover a wide set of approaches to semantic segmentation. In the following, we describe the specific architecture of each of these DL models.


Table 2 CORINE CLC Level-1 classes and their color codes used in our classification results.

| Class | R | G | B | Color |
|---|---|---|---|---|
| Water bodies (500) | 0 | 191 | 255 | blue |
| Peatland, bogs and marshes (400) | 173 | 216 | 230 | light blue |
| Forested areas (300) | 127 | 255 | 0 | green |
| Agricultural areas (200) | 222 | 184 | 135 | brown |
| Urban fabric (100) | 128 | 0 | 0 | red |

Figure 3 Zoomed-in area fragment with our reference data, i.e., CORINE, shown on top (left), along with the Google Earth layer (right).

We will use the following common abbreviations: conv for the convolution operation, concat for concatenation, max pool for the max pooling operation, BN for batch normalisation, and ReLU for the rectified linear unit activation function.

3.4.1. BiSeNet (Bilateral Segmentation Network)

The BiSeNet model is designed to decouple the functions of encoding additional spatial information and enlarging the receptive field, which are fundamental to achieving good segmentation performance. As can be seen in Figure 4, there are two main components to this model: the Spatial Path (SP) and the Context Path (CP). The Spatial Path serves to encode rich spatial information. The Context Path serves to provide a sufficient receptive field, and uses global average pooling and pre-trained Xception [96] or ResNet [87] as the backbone. The goal of the creators was not only to obtain superior performance but to achieve a balance between speed and performance. Hence, BiSeNet is a relatively fast semantic segmentation model.

3.4.2. SegNet (Encoder-Decoder-Skip)

Similarly to BiSeNet, SegNet is also designed with computational performance in mind, this time particularly during inference. Because of this, the network has a significantly smaller number of trainable parameters compared to most of the other architectures. The encoder in SegNet is based on VGG16: it consists of its first 13 convolutional layers, while the fully connected layers are omitted.


Figure 4 The architecture of BiSeNet. ARM stands for the Attention Refinement Module and FFM for the Feature Fusion Module introduced in the model's paper [76].

Figure 5 The architecture of SegNet-based Encoder-Decoder with Skip connections [74]. Blue tiles represent Convolution + Batch Normalisation + ReLU, green tiles represent Pooling, red – Upsampling, and yellow – a softmax operation.

Hence, the novelty of this network lies in its decoder part, as follows. The decoder consists of one decoder layer for each encoder layer, and so it also has 13 layers. Each individual decoder layer utilizes max-pooling indices memorized from its corresponding encoder feature map. The authors have shown that this enhances boundary delineation between classes. Finally, the decoder output is sent to a multi-class soft-max function, yielding a classification for each pixel (see Figure 5).

3.4.3. Mobile U-Net

Mobile U-Net is based on the U-Net [97] semantic segmentation architecture shown in Figure 6. In designing U-Net, the Fully Convolutional approach was generally employed, with the following modification: the upsampling part of the architecture has no fully connected layer, but is nearly symmetrical to the feature extraction part due to the use of similar feature maps. This results in a u-shaped architecture (see Figure 6), hence the name of the model. While originally developed for biomedical images, the U-Net architecture has proven successful for image segmentation in other domains as well. Here, we somewhat modify the U-Net architecture according to the MobileNets [80] framework to improve its efficiency. In particular, the MobileNets framework uses Depthwise Separable Convolutions, a form which factorizes standard convolutions (e.g., 3 × 3) into a depthwise convolution (applied separately to each input band) and a pointwise (1 × 1) convolution to combine the outputs of the depthwise convolution.
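The following sketch (TensorFlow/Keras; the filter count and normalisation choices are illustrative assumptions) shows that factorization, with a depthwise 3 × 3 convolution applied per input band followed by a pointwise 1 × 1 convolution combining the bands:

```python
import tensorflow as tf

# MobileNets-style factorization of a standard convolution.
def separable_block(x, filters):
    x = tf.keras.layers.DepthwiseConv2D((3, 3), padding="same")(x)  # one 3x3 filter per band
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.ReLU()(x)
    x = tf.keras.layers.Conv2D(filters, (1, 1))(x)                  # pointwise combination
    x = tf.keras.layers.BatchNormalization()(x)
    return tf.keras.layers.ReLU()(x)
```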

Figure 6 The architecture of U-Net [97]

3.4.4. DeepLab-V3+

DeepLab-V3+ [77] is an improved version of DeepLab-V3 [98], while the latter is an improved version of the original DeepLab [78] model. This segmentation model does not follow the FCN framework like the previously discussed models. The main features that distinguish the DeepLab model from FCNs are the atrous convolutions for upsampling and the application of probabilistic machine learning models, concretely conditional random fields (CRFs), for finer localization accuracy in the final fully connected layer. Atrous convolutions, in particular, allow enlarging the context from which the next-layer feature maps are learned, while preserving the number of parameters (and thus the same efficiency). Using a chain of atrous convolutions allows computing the final output layer of a CNN at an arbitrarily high resolution (removing the need for the upsampling part used in FCNs). In the follow-up work proposing DeepLab-V3, Chen et al. [98] change the approach to atrous convolutions to gradually double the atrous rates, and show that, with an adapted version, their new algorithm outperforms the previous one even without including the fully connected CRF layer. Finally, in their newest adaptation of the model, called DeepLab-V3+, Chen et al. [77] turn to an approach similar to the FCNs, i.e., they add a decoder module to the architecture (see Figure 7). That is, they employ the features extracted by the DeepLab-V3 module in the encoder part, and add a decoder module consisting of 1 × 1 and 3 × 3 convolutions.
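A short sketch of the idea (TensorFlow/Keras; the rates and filter counts are illustrative assumptions): stacking 3 × 3 convolutions with gradually doubled dilation rates enlarges the receptive field while the number of parameters per layer stays the same:

```python
import tensorflow as tf

# Atrous (dilated) convolutions: same 3x3 kernels, progressively larger
# "holes" between the taps, so each layer sees a wider context.
inputs = tf.keras.Input(shape=(512, 512, 3))
x = tf.keras.layers.Conv2D(64, (3, 3), dilation_rate=1, padding="same")(inputs)
x = tf.keras.layers.Conv2D(64, (3, 3), dilation_rate=2, padding="same")(x)  # doubled rate
x = tf.keras.layers.Conv2D(64, (3, 3), dilation_rate=4, padding="same")(x)  # doubled again
model = tf.keras.Model(inputs, x)  # spatial resolution is preserved throughout
```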

Figure 7 The architecture of DeepLabV3+ [77]

3.4.5. FRRN-B (Full-Resolution Residual Networks)

As we have seen, most of the semantic segmentation architectures are based on some form of an FCN, and so they utilize existing classification networks, such as ResNet or VGG16, as encoders. We also discussed the main reason for such approaches, which is to take advantage of the learned weights from those architectures pre-trained for the classification task.


Figure 8 The architecture of FRRN-B. RU n and FRRU n stand for residual units and full-resolution residual units with n-channel convolutions, respectively. FRRUs simultaneously operate on the two streams [81].

Nevertheless, one disadvantage of the FCN approach is that the resulting network outputs of the encoder part (particularly after the pooling operations) are at a lower resolution, which deteriorates the localization performance of the overall segmentation model. Pohlen et al. [81] proposed to tackle this by having two parallel network streams processing the input image: a pooling and a residual stream (Figure 8). As the name says, the pooling stream performs successive pooling and then unpooling operations, and it serves to obtain good recognition of the objects and classes. The residual stream computes residuals at the full image resolution, which enables low-level features, i.e., object pixel-level locations, to be propagated to the network output. The name of the model comes from its building blocks, i.e., full-resolution residual units. Each such unit simultaneously operates on the pooling and the residual stream. In the original paper [81], the authors propose two alternative architectures, FRRN-A and FRRN-B, and they show that FRRN-B achieves superior performance on the Cityscapes benchmark dataset. Hence, we employ the FRRN-B architecture.

3.4.6. PSPNet (Pyramid Scene Parsing Network)

Figure 9 The architecture of PSPNet [75]

Zhao et al. [75] propose Pyramid Scene Parsing as a solution to the challenge of making local predictions based on the local context only, without considering the global image scene. In remote sensing, an example of this challenge could be a model wrongly predicting water with waves present in it as the dry vegetation class, because the two appear similar and the model did not consider that these pixels are part of a larger water surface, i.e., it missed the global context. Similarly to the other FCN-based approaches, PSPNet uses a pre-trained classification architecture to extract the feature map, in this case ResNet. The main module of this network is the pyramid pooling, which is enclosed by a square in Figure 9. As can be seen in the Figure, this module fuses features at four scales, from the coarse (red) to the fine (green). Hence, the output of each level in the pyramid pooling module contains the feature map of a different resolution. In the end, the different features are stacked together, yielding the final pyramid pooling global feature for predictions.
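The module can be sketched as follows (TensorFlow/Keras). Note the assumptions: PSPNet itself uses bin sizes of 1, 2, 3 and 6, while here we assume a 16 × 16 encoder feature map and bins of 1, 2, 4 and 8 so that the pooling divides evenly:

```python
import tensorflow as tf

# Pyramid pooling sketch: pool at several scales, compress with 1x1
# convolutions, upsample back, and concatenate with the input features.
def pyramid_pooling(features):
    h, w = features.shape[1], features.shape[2]
    levels = [features]
    for bins in (1, 2, 4, 8):  # illustrative bins; PSPNet uses (1, 2, 3, 6)
        y = tf.keras.layers.AveragePooling2D(pool_size=(h // bins, w // bins))(features)
        y = tf.keras.layers.Conv2D(64, (1, 1))(y)                       # compress each scale
        y = tf.keras.layers.UpSampling2D(size=(h // bins, w // bins),
                                         interpolation="bilinear")(y)   # back to h x w
        levels.append(y)
    return tf.keras.layers.Concatenate()(levels)  # fused global pyramid feature

inputs = tf.keras.Input(shape=(16, 16, 512))  # e.g., an encoder output
fused = pyramid_pooling(inputs)
```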

3.4.7. FC-DenseNet (Fully Convolutional DenseNets)

This semantic segmentation algorithm is built using the DenseNet CNN [88] as a basis for the encoder, followed by applying the FCN approach [82]. The specificity of the DenseNet architecture is the presence of blocks where each layer is connected to all other layers in a feed-forward manner. Figure 10 shows the architecture of FC-DenseNet, where the blocks are represented by the Dense Block units. According to [88], such an architecture scales well to hundreds of layers without any optimization issues, while yielding excellent results in classification tasks. In order to efficiently upsample the DenseNet feature maps, Jegou et al. [82] substitute the upsampling convolutions of FCNs by Dense Blocks and Transitions Up. The Transition Up modules consist of transposed convolutions, which are then concatenated with the outputs from the input skip connection (the dashed lines in Figure 10).
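A dense block can be sketched as follows (TensorFlow/Keras; the number of layers and the growth rate are illustrative assumptions), with each layer's output concatenated to everything that came before:

```python
import tensorflow as tf

# DenseNet-style block: every layer receives the concatenation of all
# preceding feature maps, the pattern FC-DenseNet reuses throughout.
def dense_block(x, num_layers=4, growth_rate=16):
    for _ in range(num_layers):
        y = tf.keras.layers.BatchNormalization()(x)
        y = tf.keras.layers.ReLU()(y)
        y = tf.keras.layers.Conv2D(growth_rate, (3, 3), padding="same")(y)
        x = tf.keras.layers.Concatenate()([x, y])  # feed-forward link to all later layers
    return x
```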


Figure 10 The architecture of FC-DenseNet [82]

3.5. Training approach

To accomplish better segmentation performance, there is an option to pre-train the semantic segmentation models (in particular, their encoder modules) using a larger set of available images of another type (such as natural images). By taking a model pre-trained with natural images and continuing training with the limited set of SAR images, the knowledge becomes effectively transferred from the natural-image task to the SAR task [99]. To accomplish such transfer, we used the models whose encoders were pre-trained for the ImageNet classification task, and fine-tuned them using our SAR dataset (described next).
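In Keras terms, the recipe can be sketched as follows (the encoder choice and settings are illustrative assumptions; the actual models used here come from the adapted Semantic Segmentation Suite):

```python
import tensorflow as tf

# Transfer-learning sketch: load an ImageNet-pre-trained encoder, keep
# its weights trainable so fine-tuning on the SAR imagelets can adjust
# them, and attach a segmentation decoder on top.
encoder = tf.keras.applications.ResNet101(include_top=False,
                                          weights="imagenet",
                                          input_shape=(512, 512, 3))
encoder.trainable = True  # fine-tune rather than freeze the pre-trained weights
# A decoder (e.g., one of the architectures from Section 3.4) is then
# built on encoder.output, and the full network is trained on SAR data.
```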

3.6. Experimental Setup

In this section, we first describe how we prepared the SAR images for training with the deep learning models, which are designed for natural images. Then we provide the details of our implementation.

3.6.1. SAR Data Preprocessing for Deep Learning

Sentinel-1 imagery comes in two polarization channels, each of them being informative about certain types of land cover. Hence, using their combination is expected to yield better land cover mapping results than using either of them independently. Moreover, previous work has shown the benefits of also using a DEM model for land cover mapping [9]. Hence, as the third layer we used the DEM of Finland from the National Land Survey.

SAR backscatter for both polarizations was converted to decibels by applying the 10 · log10 transformation. In addition, for the deep learning models, each band should be normalized so that the distribution of the pixel values resembles a Gaussian distribution centered at zero; this is done to yield faster convergence during training. To normalize the data, the mean of all pixels is subtracted from each pixel value, which is then divided by their standard deviation. In addition, given that the semantic segmentation models expect pixel values in the range (0, 255), we scaled the normalized data, and also the DEM values, to this range. Such preprocessed layers are then used to create the image dataset for training.

We named the created dataset SAR RGB-DEM. The naming comes from the process used to create the images in this dataset. Namely, one of the two channels of a Sentinel-1 image is assigned to the R channel and the other to the G channel. For the third, B channel, we use the DEM layer.
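A compact sketch of this pipeline (NumPy; the arrays below are random placeholders standing in for real Sentinel-1 VV/VH backscatter and the 20 m DEM):

```python
import numpy as np

vv = np.abs(np.random.randn(512, 512)) + 1e-6   # placeholder VV backscatter
vh = np.abs(np.random.randn(512, 512)) + 1e-6   # placeholder VH backscatter
dem = np.random.rand(512, 512) * 300.0          # placeholder elevation (m)

def to_db(band):
    return 10.0 * np.log10(band)                # backscatter to decibels

def normalize(band):
    band = (band - band.mean()) / band.std()    # zero-centred, unit variance
    band = (band - band.min()) / (band.max() - band.min())
    return band * 255.0                         # scaled to the (0, 255) range

# Stack the channels: VV -> R, VH -> G, DEM -> B ("SAR RGB-DEM" image).
rgb_dem = np.dstack([normalize(to_db(vv)),
                     normalize(to_db(vh)),
                     normalize(dem)])
```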

3.6.2. Train/Development and Test (Accuracy Assessment) Dataset

The images from the SAR RGB-DEM dataset needed to be split into 512 px × 512 px partial images (further in the text called imagelets) for training. Thus, each imagelet represented an area of roughly 10 × 10 km². The first reason for this preprocessing is the square shape: some of the selected models required square-shaped images. Some other models were flexible with the image shape and size, but we wanted to make the setups for all the models the same so that their results are comparable. The second reason for the preprocessing is computational capacity: with our hardware setup (described below), this was the largest image size that we could work with.

Upon splitting the SAR RGB-DEM images, we discarded those imagelets that were completely outside the land mass area, as well as those for which we did not have a complete CORINE label (such as if they fell in part outside the Finnish borders). This resulted in more than 7,000 imagelets of size 512 px × 512 px.

Given the geography of Finland, to have representative training data it seems useful to include imagelets from both the northern and the southern (including the large cities) parts of the country in the model training. On the other hand, some noticeable differences are found also in the gradient from east to west of the country. To achieve a representative training dataset, we selected all imagelets between the longitudes of 24° and 28° for the accuracy assessment (model testing), and all the other imagelets for the model training (that is, training and development in the computer vision terminology). In this way, we prevented the situation in which two images of the same area, but acquired at different times, are used one for training and the other for testing. Images overlapping any border of the introduced strip were discarded. The procedure resulted in 3104 images in the training and development set and 3784 images in the test (accuracy assessment) set. Finally, we used 60% of the training and development set for training and the rest for development of the deep learning models.
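The tiling and the longitude-based split can be sketched as follows (NumPy; the helper names and the per-imagelet longitude bounds are assumptions for illustration):

```python
import numpy as np

# Cut a preprocessed scene into non-overlapping 512 x 512 px imagelets.
def tile(scene, size=512):
    rows, cols = scene.shape[0] // size, scene.shape[1] // size
    for r in range(rows):
        for c in range(cols):
            yield scene[r * size:(r + 1) * size, c * size:(c + 1) * size]

# Assign an imagelet to a split based on its longitude extent:
# inside the 24-28 degree strip -> test; fully outside -> train/dev;
# overlapping the strip border -> discarded.
def assign_split(lon_min, lon_max):
    if 24.0 <= lon_min and lon_max <= 28.0:
        return "test"        # accuracy-assessment strip
    if lon_max < 24.0 or lon_min > 28.0:
        return "train_dev"   # later split 60/40 into training and development
    return "discard"         # overlaps the strip border
```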


Table 3 The properties of the examined semantic segmentation architectures.

| Architecture | Base model | Parameters |
|---|---|---|
| BiSeNet | ResNet101 | 24.75M |
| SegNet | VGG16 | 34.97M |
| Mobile U-Net | Not applicable | 8.87M |
| DeepLabV3+ | ResNet101 | 47.96M |
| FRRN-B | ResNet101 | 24.75M |
| PSPNet | ResNet101 | 56M |
| FC-DenseNet | ResNet101 | 9.27M |

3.6.3. Data Augmentation

Further, we employed the data augmentation technique. The main idea behind data augmentation is to enable improved learning by reusing original images with slight transformations, such as rotation, flipping, adding Gaussian noise, or slightly changing the brightness. This provides additional information to the model, and the dataset size is effectively increased. Moreover, an additional benefit of data augmentation is in helping the model to learn some invariant data properties for which no examples are present in the original dataset. Given the sensitivity of the SAR backscatter, we did not want to augment the images in terms of color or brightness or by adding noise. However, we could safely employ rotations and flipping. For rotations, we only used 90° increments, giving three possible rotated versions of an image. For image flipping, we applied horizontal and vertical flipping, or both at the same time, giving another three possible versions of the original image (the vertical flip operation switches between top-left and bottom-left image origin, i.e., reflection along the central horizontal axis, and the horizontal flip switches between top-left and top-right image origin, i.e., reflection along the central vertical axis). Notice that our images are square, so the transformations did not change the image dimensions. Finally, we applied online augmentation, as opposed to the offline version. In the online process, each augmented image is seen only once, and so this process yields a network that generalises better.
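An online version of this augmentation can be sketched as follows (NumPy; applied independently to each imagelet and its label mask as they are fed to the network):

```python
import numpy as np

# Online augmentation restricted to 90-degree rotations and flips, since
# brightness/noise changes could distort the physical meaning of SAR
# backscatter. The same transform is applied to image and label.
def random_augment(imagelet, label, rng=np.random.default_rng()):
    k = int(rng.integers(0, 4))                 # 0, 90, 180 or 270 degrees
    imagelet, label = np.rot90(imagelet, k), np.rot90(label, k)
    if rng.random() < 0.5:                      # horizontal flip
        imagelet, label = np.fliplr(imagelet), np.fliplr(label)
    if rng.random() < 0.5:                      # vertical flip
        imagelet, label = np.flipud(imagelet), np.flipud(label)
    return imagelet, label
```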

3.6.4. Implementation

To apply the described semantic segmentation models, we adapted the open-source Semantic Segmentation Suite. We used Python with the TensorFlow [100] backend.

3.6.5. Hardware and Training Setup

We trained and tested each of the deep learning models separately on a single GPU (NVIDIA GeForce GTX 1080) on a machine with 32 GB of RAM.

We used the RMSProp optimisation algorithm, a learning rate of 0.0001, and a learning-rate decay of 0.9954. Each model was trained for an equal number of epochs (500), and during the process the checkpoint for the best model was saved. Then we used that model for evaluation on the test set, and we report those results.
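A sketch of this configuration in Keras terms (assumptions: `model`, `train_data` and `dev_data` are defined elsewhere; the placeholder `steps_per_epoch` and our reading of the stated decay as a per-epoch multiplier are illustrative):

```python
import tensorflow as tf

# Training-setup sketch: RMSProp, initial learning rate 0.0001, the rate
# multiplied by 0.9954 once per epoch, 500 epochs, keeping the checkpoint
# that performs best on the development set for test-set evaluation.
steps_per_epoch = 100  # assumed placeholder; set from the training set size
schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-4, decay_steps=steps_per_epoch,
    decay_rate=0.9954, staircase=True)
model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=schedule),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
best = tf.keras.callbacks.ModelCheckpoint("best_model.h5",
                                          monitor="val_accuracy",
                                          save_best_only=True)
model.fit(train_data, validation_data=dev_data, epochs=500, callbacks=[best])
```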

3.7. Evaluation Metrics

In their review of the metrics used in land cover classification, Costa et al. [101] found a lack of consistency, complicating the intercomparison of different studies. To avoid such issues and to ensure that our results are easily comparable with the literature, we thoroughly evaluated our models. For each model and class, we report the following measures of accuracy: precision, also known as producer's accuracy (PA); recall, also known as user's accuracy (UA); the overall accuracy; and the Kappa coefficient. The formulas are as follows.

For each segmentation class (land cover type) $c$, we calculate precision (producer's accuracy):

$$P_c = \frac{T_{p_c}}{T_{p_c} + F_{p_c}},$$

and recall (user's accuracy):

$$R_c = \frac{T_{p_c}}{T_{p_c} + F_{n_c}},$$

where $T_{p_c}$ represents true positive, $F_{p_c}$ false positive, and $F_{n_c}$ false negative pixels for the class $c$.

When it comes to accuracy [102], we calculate the per-class accuracy (effectively, the per-class accuracy is defined as the recall obtained on each class):

$$Acc_c = \frac{C_{ii}}{G_i},$$

and the overall pixel accuracy:

$$Acc_{OP} = \frac{\sum_{i=1}^{L} C_{ii}}{\sum_{i=1}^{L} G_i},$$

where $C_{ij}$ is the number of pixels having a ground truth label $i$ and being classified/predicted as $j$, $G_i$ is the total number of pixels labelled with $i$, and $L$ is the number of classes. All these metrics can take values from 0 to 1.
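These per-class measures can be computed directly from a confusion matrix, as in the following minimal sketch (NumPy; `C[i, j]` counts pixels with ground-truth class `i` predicted as class `j`):

```python
import numpy as np

def per_class_metrics(C):
    """Precision, recall and overall pixel accuracy from a confusion matrix."""
    tp = np.diag(C).astype(float)
    precision = tp / C.sum(axis=0)  # Tp / (Tp + Fp), per predicted class
    recall = tp / C.sum(axis=1)     # Tp / (Tp + Fn), per ground-truth class
    overall = tp.sum() / C.sum()    # overall pixel accuracy
    return precision, recall, overall
```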

Finally, we also use the Kappa statistic (Cohen's measure of agreement), indicating how the classification results compare to values assigned by chance [103]. The Kappa statistic can take values from 0 to 1. Starting from a $k \times k$ confusion matrix with elements $f_{ij}$, the following calculations are done:

$$P_o = \frac{1}{N}\sum_{j=1}^{k} f_{jj}, \tag{1}$$

$$r_i = \sum_{j=1}^{k} f_{ij} \;\; \forall i \quad \text{and} \quad c_j = \sum_{i=1}^{k} f_{ij} \;\; \forall j, \tag{2}$$

$$P_e = \frac{1}{N^2}\sum_{i=1}^{k} r_i c_i, \tag{3}$$

where $P_o$ is the observed proportional agreement (effectively, the overall accuracy), $r_i$ and $c_j$ are the row and column totals for classes $i$ and $j$, and $P_e$ is the expected proportion of agreement. The final measure of agreement is given by the statistic [103]:

$$\kappa = \frac{P_o - P_e}{1 - P_e}. \tag{4}$$

Depending on the value of Kappa, the observed agreement is considered either poor (0.0 to 0.2), fair (0.2 to 0.4), moderate (0.4 to 0.6), good (0.6 to 0.8), or very good (0.8 to 1.0).
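The same confusion matrix also yields the Kappa statistic, per Equations (1)-(4); a minimal sketch (NumPy):

```python
import numpy as np

def kappa(f):
    """Cohen's Kappa from a k x k confusion matrix f."""
    n = f.sum()
    p_o = np.trace(f) / n              # Eq. (1): observed agreement
    r, c = f.sum(axis=1), f.sum(axis=0)  # Eq. (2): row and column totals
    p_e = (r * c).sum() / n**2         # Eq. (3): chance agreement
    return (p_o - p_e) / (1.0 - p_e)   # Eq. (4)
```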

4. Results and Discussion

Using the experimental setup described in the previous section, we evaluated the seven selected semantic segmentation models: SegNet [74], PSPNet [75], BiSeNet [76], DeepLabV3+ [77, 78], U-Net [79, 80], FRRN-B [81], and FC-DenseNet [82]. The overall classification performance statistics for all studied models are gathered in Table 4. Figure 11 shows maps produced for several imagelets with the best performing model, FC-DenseNet. Obtained results are compared to prior work, and the classification performance for different land cover classes is discussed further.

4.1. Classification Performance

All the models performed relatively well in terms of classification, achieving an overall accuracy above 83%. Three models performed particularly well, achieving an accuracy score above 89%: SegNet, FRRN-B, and the best performing model, FC-DenseNet, which achieved an accuracy of 90.7%.

Before further analysis, let us recall that CORINE is not exclusively a land cover map, but rather a land cover and land use map; thus, specific classes can differ from the ecological classes observed by Sentinel-1. Also, the aggregation to Level-1 is sometimes not strictly "ecological" and does not always comply with physical surface scattering considerations. For example, roads, airports, major industrial areas and the road network often exhibit areas similar to field, and the presence of trees and green vegetation near summer cottages can cause such areas to exhibit signatures closer to forest than to urban.


Table 4 Summary of the classification performance and efficiency of the various deep learning models (UA - user's accuracy, PA - producer's accuracy; average inference time is per image in the dataset).

| LC class (UA/PA, %) | Test scale (km²) | BiSeNet | DeepLabV3+ | SegNet | FRRN-B | U-Net | PSPNet | FC-DenseNet |
|---|---|---|---|---|---|---|---|---|
| Urban fabric (100) | 1081.6 | 26/21 | 15/14 | 36/31 | 38/30 | 45/25 | 38/18 | 62/27 |
| Agricultural areas (200) | 2516.0 | 49/51 | 50/49 | 69/66 | 68/68 | 66/66 | 53/48 | 72/71 |
| Forested areas (300) | 28546.2 | 90/91 | 88/96 | 93/94 | 92/95 | 92/95 | 89/95 | 93/96 |
| Peatland, bogs and marshes (400) | 2099.0 | 54/43 | 56/13 | 67/57 | 71/55 | 70/52 | 65/31 | 74/58 |
| Water bodies (500) | 5356.4 | 85/91 | 94/92 | 96/96 | 95/96 | 96/96 | 94/94 | 96/96 |
| Overall Accuracy (%) | | 83.86 | 85.49 | 89.03 | 89.27 | 89.25 | 86.51 | 90.66 |
| Kappa | | 0.641 | 0.649 | 0.754 | 0.758 | 0.754 | 0.680 | 0.785 |
| Average inference time (s) | | 0.0389 | 0.0267 | 0.0761 | 0.1424 | 0.0848 | 0.0495 | 0.1930 |

Table 5 Confusion matrix for classification with the FC-DenseNet model (FC-DenseNet103): CLC2012 reference (rows) versus Sentinel-1 classification (columns).

| CLC2012 | urban | water | forest | field | peatland | total | PA (%) |
|---|---|---|---|---|---|---|---|
| urban (1) | 7301999 | 413073 | 15892771 | 3212839 | 221476 | 27042158 | 27.0 |
| water (2) | 78331 | 128294872 | 3457634 | 171029 | 1935276 | 133937142 | 95.8 |
| forest (3) | 3663698 | 2703632 | 686788977 | 12795703 | 7730444 | 713682454 | 96.2 |
| field (4) | 766200 | 121609 | 16527970 | 44866048 | 620934 | 62902761 | 71.3 |
| peatland (5) | 56097 | 1866020 | 19164137 | 1091008 | 30309189 | 52486451 | 57.8 |
| total | 11866325 | 133399206 | 741831489 | 62136627 | 40817319 | 990050966 | |
| UA (%) | 61.5 | 96.2 | 92.6 | 72.2 | 74.3 | | 90.7 |

Figure 11 Illustration of the FC-DenseNet model performance: a selection of classification results, i.e., direct output of the network without any post-processing (bottom row), versus reference CORINE data (upper row).

Sometimes, forest on rocky terrain can be misclassified as urban due to the presence of very bright targets and strong disruptive features, while confusion between peatland and field areas is also often commonplace. Finally, the accuracy of the CORINE data itself is only somewhat higher than 90%.

As for the results across the different land classes, all the models performed particularly well in recognising the water bodies and forested areas, while the urban fabric represented the most challenging class for all the models. We expect that the inclusion of the DEM as one layer in the training images has helped to achieve good results on the water bodies class for most of the models (except for BiSeNet, all the models achieved both user's and producer's accuracy above 90%). The urban class was particularly challenging for the following main reasons. First, this class changes the most, as new houses, roads and urban areas are built. While we took the most suitable available CORINE class in terms of time for our Sentinel-1 images, there are almost certainly differences between the urban class as it was in 2012 and in 2015–2016. Second, the CORINE map itself does not have perfect accuracy, and neither are the aggregation rules perfect. As a matter of fact, in the majority of studies where SAR-based classification was compared against CLC or similar data, a poor or modest overall agreement was observed for this class [21, 41, 83, 20], while the user's accuracy was strongly higher than the producer's [104]. The latter is exactly because radar senses sharp boundaries and bright targets very well, whereas such bright targets often don't dominate the whole CORINE Level-1 urban class. We argue that any inaccuracies present will be particularly attenuated in our models for the urban class, because of the sharp and sudden boundary changes in this class, unlike for others such as forest and water. The top performing model, i.e., FC-DenseNet, performed the best across all the classes. It is particularly notable that it achieved a user's accuracy, i.e., precision, of 62% for the urban class, improving on it significantly compared to all the other models. Nevertheless, its score on the producer's accuracy, i.e., recall, of 27% on this class is outperformed by two other top models, i.e., SegNet and FRRN-B.

We mentioned the issues of SAR backscattering sensitivity to several ground factors, so that the same classes might appear differently in the images between countries or between distant areas within a country. An interesting indication of our study, however, is that the deep learning models might be able to deal with this issue. Namely, we used the models pre-trained on ImageNet and fine-tuned them with a relatively small number (14) of Sentinel-1 scenes. The models learned to recognize varying types of the backscattering signal across the country of Finland. This indicates that, with a similar type of fine-tuning, the present models could be relatively easily adapted to other areas and countries with different SAR backscattering patterns. Such robustness and adaptability of the deep learning models come from their automatic learning of feature representations, without the need for a human to pre-define those features.

4.2. Computational Performance

The training times with our hardware configuration took from 6 days up to 2 weeks for the different models. This could be significantly improved by training each model on a multi-GPU system instead of the single-GPU setup that we used.

In terms of inference time, we also saw differences in performance. In Table 4 we present the average inference time per 512 px × 512 px imagelet that we worked with. The results show that there is a trade-off between classification and computational performance: the best models in terms of classification results (i.e., FC-DenseNet and FRRN-B) take a several times longer inference time compared to the rest. Depending on the application, this might not be of particular importance.

4.3. Comparison to Similar Work

The obtained results compare favourably to previous similar studies on land cover classification with SAR data [83, 20, 21, 41, 28, 31]. Depending on the level of class aggregation (4-5 major classes or more), and using mostly statistical or classical machine learning approaches, reported classification accuracies ranged from as high as 80-87% to as low as 30% when only SAR imagery was used.

Two recent studies that employed neural networks for SAR imagery classification (albeit in combination with satellite optical data) for land cover mapping were [28] and [66], with reported classification accuracies of up to 97.5% and 94.6%, respectively.

The best model in our experiments achieved an overall accuracy of 90.7%. However, our results are obtained using solely the SAR imagery. In contrast, SAR imagery (PALSAR) alone yielded an overall accuracy of 78.1% in [28]. The types of classes they studied are also different compared to ours (crops versus vegetation versus land cover types), and our study is performed on a larger area. Importantly, the previous studies have applied different types of models (regular NNs versus CNNs versus semantic segmentation). In particular, the CNN models work on 7 × 7 resolution windows, while we have applied more advanced semantic segmentation models, which work on the level of a pixel. Keeping in mind the finding from [28] that the addition of optical images on top of SAR improved the results by over 10%, we expect that our models would perform comparably well or outperform these previous works if applied to combined SAR and optical imagery.

In terms of the deep learning setup, the studies most similar to ours are [53] and [70]. However, RapidEye optical imagery at 5 m spatial resolution was used in [53], and the test site was considerably smaller. Study [70], similarly to our research, relied exclusively on SAR imagery, however on fully polarimetric images acquired by RADARSAT-2 at considerably better resolution. They developed an FCN-type semantic segmentation model 'specifically designed for the classification of wetland complexes using PolSAR imagery'. Using this model to classify eight wetland map classes, they achieved an overall accuracy of 93%. However, because their model is designed specifically for wetland complexes, it is not clear if such a model would generalize to other types of areas. Compared to our study, they focused on a considerably smaller area (nearly the size of a single imagelet we used) and on a very specific task (wetland type mapping). Thus, it is not readily clear how general their approach is and how it compares to our presented approach.

4.4. Outlook and Future Work

There are several lines of potential improvement based on the results of this study, as well as future work directions.

First, using an even larger set of Sentinel-1 images can be recommended, since for supervised deep learning models large amounts of data are crucial. Here, we processed only 6888 imagelets altogether, but deep learning algorithms typically become efficient only once they are trained with hundreds of thousands or millions of images.

Second, if SAR images and reference data of higher resolution are used, we expect better classification performance too, as smaller details could potentially be captured. Also, better agreement in the acquisition timing of the reference and SAR imagery can be recommended. The reference and training data should come from the same months or year, if possible, and the reference maps should represent the reality as accurately as possible. The models in our experiments were certainly limited by CORINE's own limited accuracy.

Third, in this study we have tested the effectiveness of off-the-shelf deep learning models for land cover mapping from SAR data. While the results show their effectiveness, it is also likely that novel types of models, specifically developed for radar data (such as [70]), will yield even better results. Based on our results, we suggest DenseNet-based models as a starting point. In particular, one could develop deep learning models to handle directly the SLC data, which preserve the phase information.

Focusing on a single season is both an advantage and a limitation. Importantly, we have avoided confusion between SAR signatures varying seasonally for several land cover classes. However, multitemporal dynamics itself can potentially be used as an additional, useful class-discriminating parameter. Incorporating the seasonal dynamics of each land cover pixel (as a time series) is left for future work, perhaps with the additional need to incorporate recurrent neural networks into the approach.

As discussed in Section 3.3, it could be suitable to use more detailed (specific) land cover classes, as the aggregation of smaller LC classes into Level-1 CORINE classes is not exactly ecological, leading to mixing several distinct SAR signatures in one class and thus causing additional confusion for the classifier. Afterwards, the classified specific classes can be aggregated into larger classes, potentially showing improved performance [19].

Finally, we have used only SAR images and a freely available DEM model for the presented large-scale land cover mapping. If one were to combine other types of remote sensing images, in particular optical images, we expect that the results would significantly improve. This is true for those areas where such imagery can be collected, given the cloud coverage, while in an operational scenario it would potentially require the use of at least two models (with and without optical satellite imagery). It is also important to assess the added value of SAR imagery with deep learning models when optical satellite images are available, as well as possible data fusion and decision fusion scenarios, before a decision on the mapping approach is made [19].

5. Conclusion

Our study demonstrated the potential for applying state-of-the-art semantic segmentation models to SAR image classification with high accuracy. Several models were benchmarked in a countrywide classification experiment using Sentinel-1 IW-mode SAR data, reaching nearly 91% overall classification accuracy with the best performing model (FC-DenseNet). Given that the 14 used Sentinel-1 scenes resulted in about 7,000 training images, this indicates strong potential for using pre-trained CNNs for further fine-tuning, and seems particularly suitable when the number of training images is limited (to thousands or tens of thousands instead of millions). In addition to suggesting the best candidate semantic segmentation models for land cover mapping with SAR data (that is, the DenseNet-based models), our study offers baseline results against which newly proposed models should be evaluated. Several possible improvements for future work were identified, including the necessity of testing multitemporal approaches, data fusion, and very high-resolution SAR imagery, as well as developing models specifically for SAR; these will be addressed in future work.

References

[1] S. Bojinski, M. Verstraete, T. C. Peterson, C. Richter, A. Simmons, M. Zemp, The concept of essential climate variables in support of climate research, applications, and policy, Bulletin of the American Meteorological Society 95 (9) (2014) 1431–1443.

[2] G. Büttner, J. Feranec, G. Jaffrain, L. Mari, G. Maucha, T. Soukup, The CORINE land cover 2000 project, EARSeL eProceedings 3 (3) (2004) 331–346.

[3] M. Bossard, J. Feranec, J. Otahel, et al., CORINE land cover technical guide – Addendum 2000 (2000).

[4] G. Büttner, CORINE land cover and land cover change products, in: Land Use and Land Cover Mapping in Europe, Springer, 2014, pp. 55–74.

[5] M. Törmä, T. Markkanen, S. Hatunen, P. Härmä, O.-P. Mattila, A. Arslan, Assessment of land-cover data for land-surface modelling in regional climate studies, Boreal Environment Research 20 (2) (2015) 243–260.

[6] J. Chen, J. Chen, A. Liao, X. Cao, L. Chen, X. Chen, C. He, G. Han, S. Peng, M. Lu, et al., Global land cover mapping at 30 m resolution: A POK-based operational approach, ISPRS Journal of Photogrammetry and Remote Sensing 103 (2015) 7–27.

[7] C. A. d. Almeida, A. C. Coutinho, J. C. D. M. Esquerdo, M. Adami, A. Venturieri, C. G. Diniz, N. Dessay, L. Durieux, A. R. Gomes, High spatial resolution land use and land cover mapping of the Brazilian Legal Amazon in 2008 using Landsat-5/TM and MODIS data, Acta Amazonica 46 (3) (2016) 291–302.

[8] C. Homer, J. Dewitz, L. Yang, S. Jin, P. Danielson, G. Xian, J. Coulston, N. Herold, J. Wickham, K. Megown, Completion of the 2011 National Land Cover Database for the conterminous United States – representing a decade of land cover change information, Photogrammetric Engineering & Remote Sensing 81 (5) (2015) 345–354.

[9] Y. Zhao, D. Feng, L. Yu, X. Wang, Y. Chen, Y. Bai, H. J. Hernandez, M. Galleguillos, C. Estades, G. S. Biging, et al., Detailed dynamic land cover mapping of Chile: Accuracy improvement by integrating multi-temporal data, Remote Sensing of Environment 183 (2016) 170–185.

[10] P. Griffiths, C. Nendel, P. Hostert, Intra-annual reflectance composites from Sentinel-2 and Landsat for national-scale crop and land cover mapping, Remote Sensing of Environment 220 (2019) 135–151.

[11] R. Torres, P. Snoeij, D. Geudtner, D. Bibby, M. Davidson, E. Attema, P. Potin, B. Rommen, N. Floury, M. Brown, I. Traver, P. Deghaye, B. Duesmann, B. Rosich, N. Miranda, C. Bruno, M. L'Abbate, R. Croci, A. Pietropaolo, M. Huchler, F. Rostan, GMES Sentinel-1 mission, Remote Sensing of Environment 120 (2012) 9–24. doi:10.1016/j.rse.2011.05.028.

[12] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE 86 (11) (1998) 2278–2324.

[13] A. Krizhevsky, I. Sutskever, G. E. Hinton, ImageNet classification with deep convolutional neural networks, in: Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.

[14] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, CoRR abs/1409.1556 (2014).

[15] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning (2016).

[16] W. Cohen, S. Goward, Landsat's role in ecological applications of remote sensing, BioScience 54 (6) (2004) 535–545. doi:10.1641/0006-3568(2004)054[0535:LRIEAO]2.0.CO;2.

[17] S. Goetz, A. Baccini, N. Laporte, T. Johns, W. Walker, J. Kellndorfer, R. Houghton, M. Sun, Mapping and monitoring carbon stocks with satellite observations: A comparison of methods, Carbon Balance and Management 4 (2009). doi:10.1186/1750-0680-4-2.

[18] C. Atzberger, Advances in remote sensing of agriculture: Context description, existing operational monitoring systems and major information needs, Remote Sensing 5 (2) (2013) 949–981. doi:10.3390/rs5020949.

[19] T. Häme, J. Kilpi, H. A. Ahola, Y. Rauste, O. Antropov, M. Rautiainen, L. Sirro, S. Bounpone, Improved mapping of tropical forests with optical and SAR imagery, part I: Forest cover and accuracy assessment using multi-resolution data, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 6 (1) (2013) 74–91.

[20] O. Antropov, Y. Rauste, H. Astola, J. Praks, T. Häme, M. Hallikainen, Land cover and soil type mapping from spaceborne PolSAR data at L-band with probabilistic neural network, IEEE Transactions on Geoscience and Remote Sensing 52 (9) (2014) 5256–5270. doi:10.1109/TGRS.2013.2287712.

[21] A. Lönnqvist, Y. Rauste, M. Molinier, T. Häme, Polarimetric SAR data in land cover mapping in boreal zone, IEEE Transactions on Geoscience and Remote Sensing 48 (10) (2010) 3652–3662. doi:10.1109/TGRS.2010.2048115.

[22] B. Waske, M. Braun, Classifier ensembles for land cover mapping using multitemporal SAR imagery, ISPRS Journal of Photogrammetry and Remote Sensing 64 (5) (2009) 450–457. doi:10.1016/j.isprsjprs.2009.01.003.

[23] L. Bruzzone, M. Marconcini, U. Wegmüller, A. Wiesmann, An advanced system for the automatic classification of multitemporal SAR images, IEEE Transactions on Geoscience and Remote Sensing 42 (6) (2004) 1321–1334. doi:10.1109/TGRS.2004.826821.

[24] T. Ullmann, A. Schmitt, A. Roth, J. Duffe, S. Dech, H.-W. Hubberten, R. Baumhauer, Land cover characterization and classification of Arctic tundra environments by means of polarized synthetic aperture X- and C-band radar (PolSAR) and Landsat 8 multispectral imagery – Richards Island, Canada, Remote Sensing 6 (9) (2014) 8565–8593. doi:10.3390/rs6098565. URL http://www.mdpi.com/2072-4292/6/9/8565

[25] N. Clerici, C. A. V. Calderón, J. M. Posada, Fusion of Sentinel-1A and Sentinel-2A data for land cover mapping: a case study in the lower Magdalena region, Colombia, Journal of Maps 13 (2) (2017) 718–726. doi:10.1080/17445647.2017.1372316.

[26] C. Castañeda, D. Ducrot, Land cover mapping of wetland areas in an agricultural landscape using SAR and Landsat imagery, Journal of Environmental Management 90 (7) (2009) 2270–2277.

[27] Y. Ban, H. Hu, I. M. Rangel, Fusion of Quickbird MS and RADARSAT SAR data for urban land-cover mapping: Object-based and knowledge-based approach, International Journal of Remote Sensing 31 (6) (2010) 1391–1410.

[28] G. V. Laurin, V. Liesenberg, Q. Chen, L. Guerriero, F. Del Frate, A. Bartolini, D. Coomes, B. Wilebore, J. Lindsell, R. Valentini, Optical and SAR sensor synergies for forest and land cover mapping in a tropical site in West Africa, International Journal of Applied Earth Observation and Geoinformation 21 (2013) 7–16.

[29] R. Khatami, G. Mountrakis, S. V. Stehman, A meta-analysis of remote sensing research on supervised pixel-based land-cover image classification processes: General guidelines for practitioners and future research, Remote Sensing of Environment 177 (2016) 89–100.

[30] B. Waske, M. Braun, Classifier ensembles for land cover mapping using multitemporal SAR imagery, ISPRS Journal of Photogrammetry and Remote Sensing 64 (5) (2009) 450–457.

[31] H. Balzter, B. Cole, C. Thiel, C. Schmullius, Mapping CORINE land cover from Sentinel-1A SAR and SRTM digital elevation model data using random forests, Remote Sensing 7 (11) (2015) 14876–14898. doi:10.3390/rs71114876.

[32] S.-E. Park, Variations of microwave scattering properties by seasonal freeze/thaw transition in the permafrost active layer observed by ALOS PALSAR polarimetric data, Remote Sensing 7 (12) (2015) 17135–17148. doi:10.3390/rs71215874.

[33] M. C. Dobson, L. E. Pierce, F. T. Ulaby, Knowledge-based land-cover classification using ERS-1/JERS-1 SAR composites, IEEE Transactions on Geoscience and Remote Sensing 34 (1) (1996) 83–99. doi:10.1109/36.481896.

[34] L. Sirro, T. Häme, Y. Rauste, J. Kilpi, J. Hämäläinen, K. Gunia, B. de Jong, F. Paz Pellat, Potential of different optical and SAR data in forest and land cover classification to support REDD+ MRV, Remote Sensing 10 (6) (2018). doi:10.3390/rs10060942.

[35] J. D. T. De Alban, G. M. Connette, P. Oswald, E. L. Webb, Combined Landsat and L-band SAR data improves land cover classification and change detection in dynamic tropical landscapes, Remote Sensing 10 (2) (2018). doi:10.3390/rs10020306.

[36] N. Longépé, P. Rakwatin, O. Isoguchi, M. Shimada, Y. Uryu, K. Yulianto, Assessment of ALOS PALSAR 50 m orthorectified FBD data for regional land cover classification by support vector machines, IEEE Transactions on Geoscience and Remote Sensing 49 (6) (2011) 2135–2150. doi:10.1109/TGRS.2010.2102041.

[37] T. Esch, A. Schenk, T. Ullmann, M. Thiel, A. Roth, S. Dech, Characterization of land cover types in TerraSAR-X images by combined analysis of speckle statistics and intensity information, IEEE Transactions on Geoscience and Remote Sensing 49 (6) (2011) 1911–1925. doi:10.1109/TGRS.2010.2091644.

[38] J. W. Cable, J. M. Kovacs, J. Shang, X. Jiao, Multi-temporal polarimetric RADARSAT-2 for land cover monitoring in northeastern Ontario, Canada, Remote Sensing 6 (3) (2014) 2372–2392. doi:10.3390/rs6032372. URL http://www.mdpi.com/2072-4292/6/3/2372

[39] X. Niu, Y. Ban, Multi-temporal RADARSAT-2 polarimetric SAR data for urban land-cover classification using an object-based support vector machine and a rule-based approach, International Journal of Remote Sensing 34 (1) (2013) 1–26. doi:10.1080/01431161.2012.700133.

[40] T. L. Evans, M. Costa, K. Telmer, T. S. F. Silva, Using ALOS/PALSAR and RADARSAT-2 to map land cover and seasonal inundation in the Brazilian Pantanal, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 3 (4) (2010) 560–575. doi:10.1109/JSTARS.2010.2089042.

[41] P. Lumsdon, S. R. Cloude, G. Wright, Polarimetric classification of land cover for Glen Affric radar project, IEE Proceedings - Radar, Sonar and Navigation 152 (6) (2005) 404–412. doi:10.1049/ip-rsn:20041313.

[42] C. da Costa Freitas, L. de Souza Soler, S. J. S. Sant'Anna, L. V. Dutra, J. R. Dos Santos, J. C. Mura, A. H. Correia, Land use and land cover mapping in the Brazilian Amazon using polarimetric airborne P-band SAR data, IEEE Transactions on Geoscience and Remote Sensing 46 (10) (2008) 2956–2970.

[43] G. Li, D. Lu, E. Moran, L. Dutra, M. Batistella, A comparative analysis of ALOS PALSAR L-band and RADARSAT-2 C-band data for land-cover classification in a tropical moist region, ISPRS Journal of Photogrammetry and Remote Sensing 70 (2012) 26–38. doi:10.1016/j.isprsjprs.2012.03.010.

[44] N. Park, K. Chi, Integration of multitemporal/polarization C-band SAR data sets for land-cover classification, International Journal of Remote Sensing 29 (16) (2008) 4667–4688. doi:10.1080/01431160801947341.

[45] E. Tomppo, O. Antropov, J. Praks, Cropland classification using Sentinel-1 time series: Methodological performance and prediction uncertainty assessment, Remote Sensing 11 (21) (2019). doi:10.3390/rs11212480.

[46] D. B. Nguyen, A. Gruber, W. Wagner, Mapping rice extent and cropping scheme in the Mekong Delta using Sentinel-1A data, Remote Sensing Letters 7 (12) (2016) 1209–1218.

[47] A. Veloso, S. Mermoz, A. Bouvet, T. Le Toan, M. Planells, J.-F. Dejoux, E. Ceschia, Understanding the temporal behavior of crops using Sentinel-1 and Sentinel-2-like data for agricultural applications, Remote Sensing of Environment 199 (2017) 415–426.

[48] G. Satalino, A. Balenzano, F. Mattia, M. W. Davidson, C-band SAR data for mapping crops dominated by surface or volume scattering, IEEE Geoscience and Remote Sensing Letters 11 (2) (2013) 384–388.

[49] F. Vicente-Guijalba, A. Jacob, J. M. Lopez-Sanchez, C. Lopez-Martinez, J. Duro, C. Notarnicola, D. Ziolkowski, A. Mestre-Quereda, E. Pottier, J. J. Mallorquí, M. Lavalle, M. Engdahl, SinCohMap: Land-cover and vegetation mapping using multi-temporal Sentinel-1 interferometric coherence, in: IGARSS 2018 - 2018 IEEE International Geoscience and Remote Sensing Symposium, 2018, pp. 6631–6634. doi:10.1109/IGARSS.2018.8517926.

[50] S. Ge, O. Antropov, W. Su, H. Gu, J. Praks, Deep recurrent neural networks for land-cover classification using Sentinel-1 InSAR time series, in: IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium, 2019, pp. 473–476. doi:10.1109/IGARSS.2019.8900088.

[51] Y. LeCun, Y. Bengio, et al., Convolutional networks for images, speech, and time series, The Handbook of Brain Theory and Neural Networks 3361 (10) (1995) 1995.

[52] X. X. Zhu, D. Tuia, L. Mou, G.-S. Xia, L. Zhang, F. Xu, F. Fraundorfer, Deep learning in remote sensing: a review, arXiv preprint arXiv:1710.03959 (2017).

[53] M. Mahdianpari, B. Salehi, M. Rezaee, F. Mohammadimanesh, Y. Zhang, Very deep convolutional neural networks for complex land cover mapping using multispectral remote sensing imagery, Remote Sensing 10 (7) (2018) 1119.

[54] L. Zhang, L. Zhang, B. Du, Deep learning for remote sensing data: A technical tutorial on the state of the art, IEEE Geoscience and Remote Sensing Magazine 4 (2) (2016) 22–40.

[55] J. Zhang, P. Zhong, Y. Chen, S. Li, ℓ1/2-regularized deconvolution network for the representation and restoration of optical remote sensing images, IEEE Transactions on Geoscience and Remote Sensing 52 (5) (2014) 2617–2627.

[56] X. Chen, S. Xiang, C.-L. Liu, C.-H. Pan, Aircraft detection by deep belief nets, in: Pattern Recognition (ACPR), 2013 2nd IAPR Asian Conference on, IEEE, 2013, pp. 54–58.

[57] X. Chen, S. Xiang, C.-L. Liu, C.-H. Pan, Vehicle detection in satellite images by hybrid deep convolutional neural networks, IEEE Geoscience and Remote Sensing Letters 11 (10) (2014) 1797–1801.

[58] Y. Liu, G. Cao, Q. Sun, M. Siegel, Hyperspectral classification via deep networks and superpixel segmentation, International Journal of Remote Sensing 36 (13) (2015) 3459–3482. doi:10.1080/01431161.2015.1055607.

[59] J. Wang, Q. Qin, Z. Li, X. Ye, J. Wang, X. Yang, X. Qin, Deep hierarchical representation and segmentation of high resolution remote sensing images, in: Geoscience and Remote Sensing Symposium (IGARSS), 2015 IEEE International, IEEE, 2015, pp. 4320–4323.

[60] D. Tuia, R. Flamary, N. Courty, Multiclass feature learning for hyperspectral image classification: Sparse and hierarchical solutions, ISPRS Journal of Photogrammetry and Remote Sensing 105 (2015) 272–285.

[61] F. Hu, G.-S. Xia, J. Hu, L. Zhang, Transferring deep convolutional neural networks for the scene classification of high-resolution remote sensing imagery, Remote Sensing 7 (11) (2015) 14680–14707.

[62] O. A. B. Penatti, K. Nogueira, J. A. dos Santos, Do deep features generalize from everyday objects to remote sensing and aerial scenes domains?, in: 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2015, pp. 44–51. doi:10.1109/CVPRW.2015.7301382.

[63] F. P. Luus, B. P. Salmon, F. Van den Bergh, B. T. J. Maharaj, Multiview deep learning for land-use classification, IEEE Geoscience and Remote Sensing Letters 12 (12) (2015) 2448–2452.

[64] F. Zhang, B. Du, L. Zhang, Scene classification via a gradient boosting random convolutional network framework, IEEE Transactions on Geoscience and Remote Sensing 54 (3) (2016) 1793–1802.

[65] T. Ishii, R. Nakamura, H. Nakada, Y. Mochizuki, H. Ishikawa, Surface object recognition with CNN and SVM in Landsat 8 images, in: 2015 14th IAPR International Conference on Machine Vision Applications (MVA), 2015, pp. 341–344. doi:10.1109/MVA.2015.7153200.

[66] N. Kussul, M. Lavreniuk, S. Skakun, A. Shelestov, Deep learning classification of land cover and crop types using remote sensing data, IEEE Geoscience and Remote Sensing Letters 14 (5) (2017) 778–782.

[67] Y. Chen, Z. Lin, X. Zhao, G. Wang, Y. Gu, Deep learning-based classification of hyperspectral data, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 7 (6) (2014) 2094–2107.

[68] G. Wu, X. Shao, Z. Guo, Q. Chen, W. Yuan, X. Shi, Y. Xu, R. Shibasaki, Automatic building segmentation of aerial imagery using multi-constraint fully convolutional networks, Remote Sensing 10 (3) (2018) 407.

[69] Y. Duan, F. Liu, L. Jiao, P. Zhao, L. Zhang, SAR image segmentation based on convolutional-wavelet neural network and Markov random field, Pattern Recognition 64 (2017) 255–267.

[70] F. Mohammadimanesh, B. Salehi, M. Mahdianpari, E. Gill, M. Molinier, A new fully convolutional neural network for semantic segmentation of polarimetric SAR imagery in complex land cover ecosystem, ISPRS Journal of Photogrammetry and Remote Sensing 151 (2019) 223–236.

[71] L. Wang, X. Xu, H. Dong, R. Gui, F. Pu, Multi-pixel simultaneous classification of PolSAR image using convolutional neural networks, Sensors 18 (3) (2018) 769.


[72] M. Ahishali, S. Kiranyaz, T. Ince, M. Gabbouj, Dual and single polarized SAR image classification using compact convolutional neural networks, Remote Sensing 11 (11) (2019) 1340.

[73] Z. Li, Z. Yang, H. Xiong, Homogeneous region segmentation for SAR images based on two steps segmentation algorithm, in: Computers, Communications and Systems (ICCCS), International Conference on, IEEE, 2015, pp. 196–200.

[74] V. Badrinarayanan, A. Kendall, R. Cipolla, SegNet: A deep convolutional encoder-decoder architecture for image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence 39 (12) (2017) 2481–2495.

[75] H. Zhao, J. Shi, X. Qi, X. Wang, J. Jia, Pyramid scene parsing network, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2881–2890.

[76] C. Yu, J. Wang, C. Peng, C. Gao, G. Yu, N. Sang, BiSeNet: Bilateral segmentation network for real-time semantic segmentation, in: Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 325–341.

[77] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, H. Adam, Encoder-decoder with atrous separable convolution for semantic image segmentation, arXiv preprint arXiv:1802.02611 (2018).

[78] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, A. L. Yuille, DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs, IEEE Transactions on Pattern Analysis and Machine Intelligence 40 (4) (2018) 834–848.

[79] O. Ronneberger, P. Fischer, T. Brox, U-Net: Convolutional networks for biomedical image segmentation, in: Medical Image Computing and Computer-Assisted Intervention (MICCAI), Vol. 9351 of LNCS, Springer, 2015, pp. 234–241, (available on arXiv:1505.04597 [cs.CV]). URL http://lmb.informatik.uni-freiburg.de/Publications/2015/RFB15a

[80] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, H. Adam, MobileNets: Efficient convolutional neural networks for mobile vision applications, arXiv preprint arXiv:1704.04861 (2017).

[81] T. Pohlen, A. Hermans, M. Mathias, B. Leibe, Full-resolution residual networks for semantic segmentation in street scenes, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4151–4160.


[82] S. Jégou, M. Drozdzal, D. Vazquez, A. Romero, Y. Bengio, The one hundred layers tiramisu: Fully convolutional DenseNets for semantic segmentation, in: Computer Vision and Pattern Recognition Workshops (CVPRW), 2017 IEEE Conference on, IEEE, 2017, pp. 1175–1183.

[83] O. Antropov, Y. Rauste, A. Lönnqvist, T. Häme, PolSAR mosaic normalization for improved land-cover mapping, IEEE Geoscience and Remote Sensing Letters 9 (6) (2012) 1074–1078.

[84] J. Long, E. Shelhamer, T. Darrell, Fully convolutional networks for semantic segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431–3440.

[85] D. H. Hubel, T. N. Wiesel, Receptive fields, binocular interaction and functional architecture in the cat's visual cortex, The Journal of Physiology 160 (1) (1962) 106–154.

[86] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al., ImageNet large scale visual recognition challenge, International Journal of Computer Vision 115 (3) (2015) 211–252.

[87] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.

[88] G. Huang, Z. Liu, L. Van Der Maaten, K. Q. Weinberger, Densely connected convolutional networks, in: CVPR, Vol. 1, 2017, p. 3.

[89] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, Going deeper with convolutions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1–9.

[90] S. Ji, W. Xu, M. Yang, K. Yu, 3D convolutional neural networks for human action recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (1) (2013) 221–231.

[91] T. N. Sainath, A.-r. Mohamed, B. Kingsbury, B. Ramabhadran, Deep convolutional neural networks for LVCSR, in: Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, IEEE, 2013, pp. 8614–8618.

[92] D. Small, L. Zuberbühler, A. Schubert, E. Meier, Terrain-flattened gamma nought Radarsat-2 backscatter, Canadian Journal of Remote Sensing 37 (5) (2012) 493–499.

[93] P. Härmä, R. Teiniranta, M. Törmä, R. Repo, E. Järvenpää, M. Kallio, The production of Finnish CORINE land cover 2000 classification, XXth ISPRS Congress, Istanbul, Turkey (2004).


[94] P. Härmä, R. Teiniranta, M. Törmä, R. Repo, E. Järvenpää, M. Kallio, Finnish CORINE land cover 2000 classification, XXth ISPRS Congress, Anchorage, US (2004).

[95] A. Garcia-Garcia, S. Orts-Escolano, S. Oprea, V. Villena-Martinez, J. Garcia-Rodriguez, A review on deep learning techniques applied to semantic segmentation, arXiv preprint arXiv:1704.06857 (2017).

[96] F. Chollet, Xception: Deep learning with depthwise separable convolutions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1251–1258.

[97] O. Ronneberger, P. Fischer, T. Brox, U-Net: Convolutional networks for biomedical image segmentation, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2015, pp. 234–241.

[98] L.-C. Chen, G. Papandreou, F. Schroff, H. Adam, Rethinking atrous convolution for semantic image segmentation, arXiv preprint arXiv:1706.05587 (2017).

[99] Y. Bengio, Deep learning of representations for unsupervised and transfer learning, in: Proceedings of ICML Workshop on Unsupervised and Transfer Learning, 2012, pp. 17–36.

[100] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, et al., TensorFlow: A system for large-scale machine learning, in: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), 2016, pp. 265–283.

[101] H. Costa, G. M. Foody, D. S. Boyd, Supervised methods of image segmentation accuracy assessment in land cover mapping, Remote Sensing of Environment 205 (2018) 338–351.

[102] G. Csurka, D. Larlus, F. Perronnin, F. Meylan, What is a good evaluation measure for semantic segmentation?, in: BMVC, Vol. 27, Citeseer, 2013, p. 2013.

[103] J. Cohen, A coefficient of agreement for nominal scales, Educational and Psychological Measurement 20 (1) (1960) 37–46.

[104] O. Antropov, Y. Rauste, T. Häme, Volume scattering modeling in PolSAR decompositions: Study of ALOS PALSAR data over boreal forest, IEEE Transactions on Geoscience and Remote Sensing 49 (10) (2011) 3838–3848.



International programs such as the European Space Agency's (ESA's) Copernicus [11], behind the Sentinel satellites, are taking significant efforts to make Earth Observation (EO) data freely available for commercial and non-commercial purposes. The Copernicus programme is a multi-billion investment by the EU and ESA, aiming to provide essential services based on accurate and timely data from satellites. Its main goals are to improve the ways of managing the environment, to help mitigate the effects of climate change, and to enable the creation of new applications and services, such as for environmental monitoring and urban development.

The provision of free satellite data for mapping in the framework of such programs also enables the application of methods that could not be used earlier because they require vast and representative datasets for training, for example, deep learning. In recent years, deep learning has brought about several breakthroughs in pattern recognition and computer vision [12, 13, 14]. The success of the deep learning models can be attributed both to their deep multilayer structure, creating nonlinear functions and hence allowing extraction of hierarchical sets of features from the data, and to their end-to-end training scheme, allowing for simultaneous learning of the features from the raw input and predicting the task at hand. In this way, the heuristic feature design is removed. This is advantageous compared to the traditional machine learning methods (e.g., support vector machine (SVM) and random forest (RF)), which require a multistage feature engineering procedure. In deep learning, such a procedure is replaced with a simple end-to-end deep learning workflow. One of the key requirements for successful application of deep learning methods is a large amount of data available from which the model can automatically learn the representative features for the prediction task [15]. The availability of open satellite imagery, such as from Copernicus, offers just that.

The land cover mapping systems based solely on optical imagery suffer from issues with cloud cover and weather conditions, especially in the tropical areas, and with a lack of illumination in the polar regions. Among the free satellite data offered by the Copernicus programme are synthetic aperture radar (SAR) images from the Sentinel-1 satellites. SAR is an active radar imaging technique that does not require illumination and is not hampered by cloud cover, due to the penetration of microwave radiation through clouds. The utilisation of SAR imagery hence would allow mapping such challenging regions and increasing the mapping frequency in orchestrated efforts like CORINE. One of the significant issues previously was the absence of timely and consistent high-resolution wide-area SAR coverage. With the advent of the Copernicus Sentinel-1 satellites, operational use of imaging radar data becomes feasible for consistent wide-area mapping. The first Copernicus Sentinel-1 mission was launched in April 2014. Firstly, Sentinel-1A alone was capable of providing C-band SAR data in up to four imaging modes with a revisit time of 12 days. Once Sentinel-1B was launched in 2016, the revisit time was reduced to 6 days [11].

We studied wide-area SAR-based land cover mapping by methodologically combining the two discussed recent advances: the improved methods for large-scale image processing using deep learning, and the availability of SAR imagery from the Sentinel-1 satellites.

1.1. Land Cover Mapping with SAR Imagery

While using optical satellite data is still mainstream in land cover and land cover change mapping [16, 17, 18, 19, 5], SAR data has been getting more attention as more suitable sensors appear. To date, several studies have investigated the suitability of SAR for land cover mapping, focusing primarily on L-band, C-band, and X-band polarimetric [20, 21], multitemporal and multi-frequency SAR [22, 23], as well as on the combined use of SAR and optical data [24, 25, 26, 27, 28].

Independently of the imagery used, the majority of land cover mapping methods so far are based on traditional supervised classification techniques [29]. Widely used classifiers are support vector machines (SVM), decision trees, random forests (RF), and maximum likelihood classifiers (MLC) [9, 7, 30, 29]. However, extracting a large number of features needed for classification, i.e., the feature engineering process, is time intensive and requires lots of expert work in developing and fine-tuning classification approaches. This limits the applications of the traditional supervised classification methods on a large scale.

Backscattered microwave radiation is composed of multiple fundamental scattering mechanisms, determined by the vegetation water content, surface roughness, soil moisture, horizontal and vertical structure of the scatterers, as well as the imaging geometry during the datatake. Accordingly, a considerable number of classes can be differentiated in SAR images [31, 20]. However, the majority of SAR classification algorithms use fixed SAR observables (e.g., polarimetric features) to infer specific land cover classes, despite the large temporal, seasonal, and environmental variability between different geographical sites. This leads to a lack of generalisation capability and a need to use extensive and representative reference data and SAR data. The latter means the need to account not only for all variation of SAR signatures for a specific class, but also for seasonal effects, such as changes in the moisture of soil and vegetation, as well as the frozen state of land [32], that strongly affect SAR backscatter. On the other hand, when using multitemporal approaches, such seasonal variation can be used as an effective discriminator among different land cover classes.

When using exclusively SAR data for land cover mapping, the reported accuracies often turn out to be relatively low for operational land cover mapping and change monitoring. Methodologically, the reported solutions utilized supervised approaches linking SAR observables and class labels to pixels, superpixels, or objects, in a parametric or nonparametric manner [20, 21, 33, 31, 19, 34, 35, 36, 37, 38, 39, 40, 41].

However, tackling a relatively large number of classes was considered in only several studies, often with relatively low reported accuracies. For instance, in [42] it was found that P-band PolSAR imagery was unsatisfactory for mapping more than five classes with the iterated conditional mode (ICM) contextual classifier applied to several polarimetric parameters. They achieved a Kappa value of 0.768 when mapping four classes. Classification performance of the L-band ALOS PALSAR and C-band RADARSAT-2 images was compared in the


moist tropics [43]. L-band provided 72.2% classification accuracy for a coarse land cover classification system, and C-band only 54.7%. In a similar study in Lao PDR, ALOS PALSAR data were found to be mostly useful as a back-up option to optical ALOS AVNIR data [19]. Multitemporal Radarsat-1 data with HH polarization and ENVISAT ASAR data with VV polarization (both C-band) were studied for classification of five land cover classes in Korea, with moderate accuracy [44]. Waske et al. [30] applied boosted decision trees and random forests to multi-temporal C-band SAR data, reaching accuracy up to 84%. Several studies [21, 20] investigated specifically SAR suitability for the boreal zone, with reported accuracy up to 83%, depending on the classification technique (maximum likelihood, probabilistic neural networks, etc.), when five super-classes (based on CORINE data) were used.

The potential of Sentinel-1 imagery for CORINE-type thematic mapping was assessed in a study that used Sentinel-1A data for mapping class composition in Thuringia [31]. Long time series of Sentinel-1 SAR data are considered especially suitable for crop type mapping [45, 46, 47, 48], with an increased number of studies attempting land cover mapping in general [49, 50].

Moreover, as Sentinel-1 data are presently the only free source of SAR data routinely available for wide-area mapping at no cost for users, they seem the best candidate data for the development and testing of improved classification approaches. Previous studies indicate a necessity for developing and testing new methodological approaches that can be effectively applied at a large scale and deal with the variability of SAR observables concerning ecological land cover classes. We suggest adopting state-of-the-art deep learning approaches for this purpose.

1.2. Deep Learning in Remote Sensing

The advances in deep learning techniques for computer vision, in particular Convolutional Neural Networks (CNNs) [12, 51], have led to the application of deep learning in several domains that rely on computer vision. Examples are self-driving cars, image search engines, medical diagnostics, and augmented reality. Deep learning approaches are starting to be adopted in the remote sensing domain as well.

Zhu et al. [52] provide a discussion on the specificities of remote sensing imagery (compared to ordinary RGB images) that result in specific deep learning challenges in this area. For example, remote sensing data are georeferenced, often multi-modal, with particular imaging geometries; there are interpretation difficulties, and the ground-truth or labelled data needed for deep learning is still often lacking. Additionally, most of the state-of-the-art CNNs are developed for three-channel input images (i.e., RGB), and so certain adaptations are needed to apply them to remote sensing data [53].

Nevertheless, several research papers tackling remote sensing imagery with deep learning techniques were published in recent years. Zhang et al. [54] review the field and find applications to image preprocessing [55], target recognition [56, 57], classification [58, 59, 60], and semantic feature extraction and scene understanding [61, 62, 63, 64]. The deep learning approaches are found to outperform standard methods applied up to several years ago, i.e., SVMs and RFs [65, 66].

When it comes to deep learning for land cover or land use mapping, applications have been limited to optical satellite [53, 67, 53, 59] or aerial [68] imagery, and hyperspectral imagery [60, 67], owing to the similarity of these images to the ordinary RGB images studied in computer vision [53].

When it comes to SAR images, Zhang et al. [54] found that there is already significant success in applying deep learning techniques for object detection and scene understanding. However, for classification on SAR data, applications are scarce and advances are yet to be achieved [54]. Published research includes deep learning for crop type mapping using combined optical and SAR imagery [66], as well as the use of SAR images exclusively [69]. However, those methods applied deep learning only to some part of the task at hand, and not in an end-to-end fashion. Wang et al. [59], for instance, used deep neural networks only for merging over-segmented elements, which are produced using traditional segmentation approaches. Similarly, Tuia et al. [60] applied deep learning to extract hierarchical features, which they further fed into a multiclass logistic classifier. Duan et al. [69] first used unsupervised deep learning, and then continued with a couple of supervised labelling tasks. Chen et al. [67] applied a deep learning technique (stacked autoencoders) to discover the features, but then still used traditional machine learning (SVM, logistic regression) for the image segmentation. Unlike those methods, we applied deep learning in an end-to-end fashion, i.e., from supervised feature extraction to the land class prediction. This makes our approach more flexible, robust, and adaptable to SAR data from new regions, as well as more efficient.

When it comes to end-to-end approaches for SAR classification, there are several studies where the focus was on a small area and on a specific land cover mapping task. For instance, Mohammadimanesh et al. [70] used fully polarimetric SAR (PolSAR) imagery from RADARSAT-2 to classify wetland complexes, for which they developed a specifically tailored semantic segmentation model. However, the authors tackled a small test area (around 10 km × 10 km) and did not explore how their model generalizes to other types of areas. Similarly, Wang et al. [71] adapted existing CNN models into a fixed-feature-size CNN that they evaluated on small-scale RADARSAT-2 or AIRSAR (i.e., airborne SAR) data. In both cases, they used more advanced fully polarimetric SAR imagery at better resolution, as opposed to Sentinel-1, which means imagery with more input information for the deep learning models. Importantly, it is only Sentinel-1 that offers open operational data with up to a 6-day repeat. Because of this, the discussed approaches, developed and tested specifically for PolSAR imagery at a higher resolution, cannot be considered applicable for wide-area mapping yet. Similarly, Ahishali et al. [72] applied end-to-end approaches to SAR data. They also worked with single polarized COSMO-SkyMed imagery. However, all the imagery they considered was X-band SAR, contrary to the C-band imagery we use here, and again only on a small scale. The authors proposed a compact CNN model that they found had outperformed some of the off-the-shelf CNN methods, such as Xception and Inception-ResNet-v2. It is important to note that, compared to those, the off-the-shelf models that we consider here are more sophisticated semantic segmentation models, some of which employ Xception or ResNet, but only as a module in their feature extraction parts.

In summary, the capabilities of deep learning approaches for classification have been investigated to a lesser extent for SAR imagery than for optical imagery. The attempts to use SAR data for land cover classification were relatively limited in scope, area, or the number of used SAR scenes. In particular, wide-area land cover mapping was never addressed. The reasons for this include the comparatively poor availability of SAR data compared to optical (greatly changed since the advent of Sentinel-1), complex scattering mechanisms leading to ambiguous SAR signatures for different classes (which makes SAR image segmentation more difficult than optical image segmentation [73]), as well as the speckle noise caused by the coherent nature of the SAR imaging process.

1.3. Study goals

The present study addresses the identified research gap of a lack of wide-area land cover mapping using SAR data. We achieve this by training, fine-tuning, and evaluating a set of suitable state-of-the-art deep learning models from the class of semantic segmentation models, and demonstrating their suitability for land cover mapping. Moreover, our work is the first to examine and demonstrate the suitability of deep learning for land cover mapping from SAR images on a large scale, i.e., across a whole country.

Specifically, we applied the semantic segmentation models to SAR images taken over Finland. We focused on images of Finland because there is a land cover mask of suitable resolution that can be used for training labels (i.e., CORINE). The training is performed with the seven selected models (SegNet [74], PSPNet [75], BiSeNet [76], DeepLabV3+ [77, 78], U-Net [79, 80], FRRN-B [81], and FC-DenseNet [82]), which have encoder modules pre-trained on the large RGB image corpus ImageNet 2012². Those models are freely available³. In other words, we reused semantic segmentation architectures developed for natural images, with pre-trained weights on RGB images, and fine-tuned them on the SAR images. Our results (with over 90% overall accuracy) demonstrate the effectiveness of the deep learning methods for land cover mapping with SAR data.

In addition to having the high-resolution CORINE map that can serve as ground truth (labels) for training the deep learning models, another reason that we selected Finland is that it is a northern country with frequent cloud cover, which means that using optical imagery for wide-area mapping is often not feasible. Hence, demonstrating the usability of radar imagery for land cover mapping is particularly useful here.

² http://image-net.org/challenges/LSVRC/2012/
³ https://github.com/tensorflow/models/tree/master/research/slim#pre-trained-models


Even though Finland is a relatively small country, there is still considerable heterogeneity present in terms of land cover types and how they appear in the SAR images. Namely, SAR backscattering is sensitive to several factors that likely differ between countries or between distant areas within a country. Examples of such factors are moisture levels, terrain variation and soil roughness, predominant forest biome and tree species proportions, types of shorter vegetation and crops in agricultural areas, and specific types of built environments. We did not constrain our study to a particular area of Finland where the SAR signatures might be consistent, but obtained the images across a wide area. Hence, demonstrating the suitability of our methods in this setting hints at their potential generalizability. Namely, it means that, similarly to what we did here, the semantic segmentation models can be fine-tuned and adapted to work on data from other regions or countries with different SAR signatures.

On the other hand, we took into account that the same areas will appear somewhat different in the SAR images across different seasons. Scattering characteristics of many land cover classes change considerably between the summer and winter months, and sometimes even within weeks during seasonal changes [83, 20]. These include snow cover and melting, freeze/thaw of soils, ice on rivers and lakes, the crop growing cycle, and leaf-on and leaf-off conditions in deciduous trees. Because of this, in the present study we focused only on the scenes acquired during the summer season. However, we did allow our training dataset to contain several images of the same area taken at different times during the summer season. This way, not only spatial but also temporal variation of SAR signatures is introduced.

Our contributions can be summarised as follows:

C1 We thoroughly benchmarked seven selected state-of-the-art semantic segmentation models, covering a diverse set of approaches, for land cover mapping using Sentinel-1 SAR imagery. We provide insights on the best models in terms of both accuracy and efficiency.

C2 Our results demonstrate the power of deep learning models, along with SAR imagery, for accurate wide-area land cover mapping in the cloud-obscured boreal zone and polar regions.

2. Deep Learning Terminology

As with other representation learning models, the power of deep learning models comes from their ability to learn rich features (representations) from the dataset automatically [15]. The automatically learned features are usually better suited for the classifier or other task at hand than hand-engineered features. Moreover, thanks to the large number of layers employed, it has been proven that deep learning networks can discover hierarchical representations, so that the higher-level representations are expressed in terms of the lower-level, simpler ones. For example, in the case of images, the low-level representations that can be discovered are edges; using them, the mid-level ones can be expressed, such as corners and shapes, and this helps to express the high-level representations, such as object elements and their identities [15].

Table 1: Terminology for the main tasks in computer vision and its use in the deep learning versus remote sensing communities.

- Classification [13] (remote sensing: Image Annotation, Scene Understanding, Scene Classification): Assigning a whole image to a class based on what is (mainly) represented in it, for example a ship, oil tank, sea, or land.

- Object Detection / Localization / Recognition [15] (remote sensing: Automatic Target Recognition): Detecting (and localizing) the presence of particular objects in an image. These algorithms can detect several objects in the given image; for instance, ship detection in SAR images.

- Semantic Segmentation [84] (remote sensing: Image Classification, Clustering): Assigning a class to each pixel in an image, based on which image object or region it belongs to. These algorithms not only detect and localize objects in the image, but also output their exact areas and boundaries.

The deep learning models in computer vision can be grouped according to their main task into three categories; in Table 1, we provide a description of those categories. However, the deep learning terminology for those tasks does not always correspond well to the terminology used in the remote sensing community. Relevant to our task, a number of remote sensing studies use the term classification in the context of land cover mapping, inherently meaning pixel- or region-based classification, which in the deep learning terminology corresponds to semantic segmentation. In Table 1, we list the corresponding terminology that we encountered being used for each task in both the deep learning and remote sensing communities. This is helpful to disambiguate when talking about different tasks, and to recognize when talking about the same tasks, in the two domains. In the present study, the focus is on land cover mapping. Hence, we tackle semantic segmentation in the deep learning terminology, i.e., image classification (pixel-wise classification) in the remote sensing terminology.

Convolutional Neural Networks (CNNs) [12, 13] are the deep learning models that have transformed the computer vision field. Initially, CNNs were defined to tackle the image classification (deep learning terminology) task. Their structure is inspired by the visual perception of mammals [85]. CNNs are named after one of the most important operations, which is particular to them compared to other neural networks, i.e., convolutions. Mathematically, a convolution is a combination of two other functions. A convolution is applied to the image by sliding a filter (kernel) of a given size k × k, which is usually small compared to the original image size. Different-purpose filters are designed; for example, a filter can serve as a vertical edge detector. Application of such a convolution operation on an image results in a feature map. Another common operation that is usually applied after a convolution is pooling. Pooling reduces the size of the feature map while providing robustness to the extracted features. Common CNNs end with a fully connected layer, which is used for final predictions, commonly for image classification. By employing a large number of convolutional layers (depth), CNNs are able to extract gradually more complex and abstract features. The first CNN model to demonstrate this impressive effectiveness in image classification (of hand-written digits) was LeNet [12]. Several years later, Krizhevsky et al. [13] developed AlexNet, the deep CNN that dramatically pushed the limits of classification accuracy on the famous ImageNet computer vision challenge [86]. Since then, a variety of CNN-based models have been proposed. Some notable examples are the VGG network [14], ResNet [87], DenseNet [88], and Inception V3 [89]. The effectiveness of CNNs has also been proven in various real-world applications [90, 91].

Once CNNs had proven their effectiveness in classifying images, Long et al. [84] were the first to discover how a given CNN model can be augmented to make it suitable for the semantic segmentation task: they proposed the Fully Convolutional Neural Network (FCN) framework. This generic architecture can be used to adapt any CNN network used for classification into a segmentation model. Namely, the authors have shown that, by replacing the last fully connected layer with an appropriate convolution layer that upsamples and restores the resolution of the input at the output layer, CNNs can be transformed to classify each individual pixel (instead of the whole image). The basic idea is as follows. The encoder is used to learn the feature maps, and is usually based on a pre-trained deep CNN for classification, such as ResNet, VGG, or Inception. The decoder part serves to upsample the discriminative features that the encoder has learned from the coarse-level feature map to the fine pixel level. Long et al. [84] have shown that this upsampling (backward) computation can be efficiently performed using backward convolutions (deconvolutions). Moreover, this means that specific CNN models, such as those mentioned above, can all be incorporated in the FCN framework for segmentation, giving rise to FCN-AlexNet [84], FCN-ResNet [87], FCN-VGG16 [84], FCN-DenseNet [82], etc. We present a diagram of the generic FCN architecture in Figure 1.
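To make this idea concrete, the following is a minimal FCN-style sketch in Keras; it is an illustration under our own assumptions (backbone choice, kernel sizes, and strides are illustrative), not the configuration used in this study.

```python
# Minimal FCN sketch: cut a pre-trained classification backbone before its
# fully connected head, predict per-pixel class scores at coarse resolution
# with a 1x1 convolution, and upsample back with a transposed convolution.
import tensorflow as tf

NUM_CLASSES = 5  # the five CLC Level-1 classes

backbone = tf.keras.applications.ResNet50(
    include_top=False,                  # drop the fully connected classifier
    input_shape=(512, 512, 3))
coarse = backbone.output                # e.g. a 16x16 feature map (stride 32)
scores = tf.keras.layers.Conv2D(NUM_CLASSES, kernel_size=1)(coarse)
upsampled = tf.keras.layers.Conv2DTranspose(
    NUM_CLASSES, kernel_size=64, strides=32,
    padding='same')(scores)             # "backward convolution" to 512x512
probs = tf.keras.layers.Softmax()(upsampled)
fcn = tf.keras.Model(backbone.input, probs)
```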


Figure 1: The architecture of Fully Convolutional Neural Networks (FCNs) [84].

3. Materials and methods

Here, we first describe the study site, the SAR data, and the reference data. This is followed by an in-depth description of the deep learning terminology and the models used in the study. We finish with the description of the experimental setup and the evaluation metrics.

3.1. Study site

Our study site covers the area of Finland at latitudes from 61° to 67.5°. The processed area is shown in Figure 2. The study area includes central and northern areas of Finland, covered primarily by boreal forestland with inclusions of water bodies (primarily lakes), urban settlements, and agricultural areas, as well as marshland and open bogs. We have omitted Lapland due to potential snow cover during the months of data acquisition. The terrain height variation is moderate and mostly within the 100–300 meters range.

3.2. SAR data

Presently, Sentinel-1 is a C-band SAR dual-satellite system, with two satellites orbiting 180° apart [11], launched in 2014 and 2016, respectively. The operational acquisition modes are Stripmap (SM), Interferometric Wide-Swath (IW), Extra Wide Swath (EW), and Wave Mode (WV). The IW mode is the default mode over land, providing a 250 km wide swath composed of three sub-swaths, with single look images at 5 m by 20 m spatial resolution. It uses the so-called TOPS (Terrain Observation with Progressive Scan) SAR mode.

The SAR data acquired by the Sentinel-1 satellites in IW mode are used in this study. Altogether, 14 Sentinel-1A images acquired during the summers of 2015 and 2016 were used, more concretely during June, July, and August of those two years. Their geographical coverage is schematically shown in Figure 2.


Figure 2: Study area in Finland with reference CORINE land cover data and schematic location of areas used for model training and accuracy assessment.


The original scenes were downloaded as Level-1 Ground Range Detected (GRD) products. They represent focused SAR data that have been detected, multi-looked, and projected to ground range using an Earth ellipsoid. They were orthorectified in ESA SNAP S1TBX software using a local digital terrain model (with 2 meter resolution) available from the National Land Survey of Finland. The pixel spacing of the orthorectified scenes was set to 20 meters. Orthorectification included terrain flattening to obtain backscatter in gamma-nought format [92]. The scenes were further re-projected to the ETRS89 / ETRS-TM35FIN projection (EPSG:3067) and resampled to a final pixel size of 20 metres.
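A chain of this kind can be scripted with SNAP's Python interface (snappy). The sketch below is our own illustration, not the processing graph used in the study: the operator names follow S1TBX conventions, but exact parameter names should be verified against the installed SNAP version, and the DEM path is hypothetical.

```python
# Hedged sketch of a GRD preprocessing chain with SNAP's Python API (snappy).
# '/data/nls_dem_2m.tif' is a hypothetical path standing in for the National
# Land Survey of Finland digital terrain model.
from snappy import ProductIO, GPF, HashMap

product = ProductIO.readProduct('S1A_IW_GRDH_example.zip')

cal = HashMap()
cal.put('outputGammaBand', True)               # gamma-nought backscatter
calibrated = GPF.createProduct('Calibration', cal, product)

tf_params = HashMap()
tf_params.put('demName', 'External DEM')
tf_params.put('externalDEMFile', '/data/nls_dem_2m.tif')
flattened = GPF.createProduct('Terrain-Flattening', tf_params, calibrated)

tc = HashMap()
tc.put('demName', 'External DEM')
tc.put('externalDEMFile', '/data/nls_dem_2m.tif')
tc.put('pixelSpacingInMeter', 20.0)
tc.put('mapProjection', 'EPSG:3067')           # ETRS-TM35FIN
corrected = GPF.createProduct('Terrain-Correction', tc, flattened)

ProductIO.writeProduct(corrected, 'S1A_gamma0_tm35fin', 'GeoTIFF')
```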

3.3. Reference data

In Finland, the Finnish Environment Institute (SYKE) is responsible for the production of the CORINE maps. While for most of the EU territory the CORINE mask of 100 m × 100 m spatial resolution is available, the national institutions might choose to create more precise maps, and SYKE in particular has produced a 20 m × 20 m spatial resolution mask for Finland (Figure 3). Since then, the updates have been produced regularly, the latest one at the time of this study, which we used, being CLC2012. There are 48 different land use classes in the map, which can be hierarchically grouped into 4 CLC levels. In detail, there are 30 classes on CLC Level-3, 15 classes on CLC Level-2, and 5 top CLC Level-1 classes. According to the information provided by SYKE, the accuracy of the CLC Level-3 is 61%, of the CLC Level-2 it is 83%, and of the CLC Level-1 it is 93%. The selected classes and their corresponding color codes used for our segmentation results are shown in Table 2.

Until the most recent CORINE production round, EEA member countries adopted national approaches for the production of CORINE. The EEA Technical Guidelines include manual digitalization of land cover change based on visual interpretation of optical satellite imagery. In Finland, the European CLC was not applicable for the majority of national users due to the large minimal mapping unit (MMU). Thus, a national version was produced with a somewhat modified nomenclature of classes [93, 94]. The national high-resolution CLC2012 data is in raster format of 20 m, with corresponding MMU. In the provision of the 2012 update of CLC, obtaining optical imagery over Scandinavia and Britain was particularly challenging because of the frequent clouds, thus calling for the use of radar imagery to meet user requirements on accuracy and coverage [31]. The CORINE map itself is built from high resolution satellite images acquired primarily during the summer and, to a smaller extent, during the spring months [2].

3.4. Semantic Segmentation Models

We selected the following seven state-of-the-art [95] semantic segmentation models to test for our land cover mapping task: SegNet [74], PSPNet [75], BiSeNet [76], DeepLabV3+ [77, 78], U-Net [79, 80], FRRN-B [81], and FC-DenseNet [82]. The models were selected to cover a wide set of approaches to semantic segmentation. In the following, we describe the specific architecture of each of these DL models. We will use the following common abbreviations: conv for the convolution operation, concat for concatenation, max pool for the max pooling operation, BN for batch normalisation, and ReLU for the rectified linear unit activation function.

Table 2: CORINE CLC Level-1 classes and their color codes used in our classification results.

  class                              R    G    B    color
  Water bodies (500)                 0    191  255  blue
  Peatland, bogs and marshes (400)   173  216  230  light blue
  Forested areas (300)               127  255  0    green
  Agricultural areas (200)           222  184  135  brown
  Urban fabric (100)                 128  0    0    red

Figure 3: Zoomed-in area fragment with our reference data, i.e., CORINE, shown on top (left), along with the Google Earth layer (right).

3.4.1. BiSeNet (Bilateral Segmentation Network)

The BiSeNet model is designed to decouple the functions of encoding additional spatial information and enlarging the receptive field, which are fundamental to achieving good segmentation performance. As can be seen in Figure 4, there are two main components to this model: the Spatial Path (SP) and the Context Path (CP). The Spatial Path serves to encode rich spatial information. The Context Path serves to provide a sufficient receptive field, and uses global average pooling and pre-trained Xception [96] or ResNet [87] as the backbone. The goal of the creators was not only to obtain superior performance, but to achieve a balance between speed and performance. Hence, BiSeNet is a relatively fast semantic segmentation model.

Figure 4: The architecture of BiSeNet. ARM stands for the Attention Refinement Module and FFM for the Feature Fusion Module introduced in the model's paper [76].

3.4.2. SegNet (Encoder-Decoder-Skip)

Similarly to BiSeNet, SegNet is also designed with computational performance in mind, this time particularly during inference. Because of this, the network has a significantly smaller number of trainable parameters compared to most of the other architectures. The encoder in SegNet is based on VGG16: it consists of its first 13 convolutional layers, while the fully connected layers are omitted. Hence, the novelty of this network lies in its decoder part, as follows. The decoder consists of one decoder layer for each encoder layer, and so it also has 13 layers. Each individual decoder layer utilizes max-pooling indices memorized from its corresponding encoder feature map. The authors have shown that this enhances boundary delineation between classes. Finally, the decoder output is sent to a multi-class soft-max function, yielding a classification for each pixel (see Figure 5).

Figure 5: The architecture of the SegNet-based Encoder-Decoder with Skip connections [74]. Blue tiles represent Convolution + Batch Normalisation + ReLU, green tiles represent Pooling, red – Upsampling, and yellow – a softmax operation.

3.4.3. Mobile U-Net

Mobile U-Net is based on the U-Net [97] semantic segmentation architecture shown in Figure 6. In designing U-Net, the Fully Convolutional approach was generally employed, with the following modification: the upsampling part of the architecture has no fully convolutional layer, but is nearly symmetrical to the feature extraction part, due to the use of similar feature maps. This results in a u-shaped architecture (see Figure 6), and hence the name of the model. While originally developed for biomedical images, the U-Net architecture has proven successful for image segmentation in other domains as well. Here, we somewhat modify the U-Net architecture according to the MobileNets [80] framework to improve its efficiency. In particular, the MobileNets framework uses Depthwise Separable Convolutions, a form which factorizes standard convolutions (e.g., 3 × 3) into a depthwise convolution (applied separately to each input band) and a pointwise (1 × 1) convolution to combine the outputs of the depthwise convolution.
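The following is a short sketch of this factorization, assuming Keras layers (our own illustration, not the authors' implementation); it makes the parameter saving explicit.

```python
# Depthwise separable block: a per-channel 3x3 depthwise convolution
# followed by a 1x1 pointwise convolution that mixes channels.
import tensorflow as tf

def separable_block(x, filters):
    x = tf.keras.layers.DepthwiseConv2D(kernel_size=3, padding='same')(x)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.ReLU()(x)
    x = tf.keras.layers.Conv2D(filters, kernel_size=1)(x)  # pointwise 1x1
    x = tf.keras.layers.BatchNormalization()(x)
    return tf.keras.layers.ReLU()(x)

# For a 3x3 kernel with C_in input and C_out output channels, this uses
# roughly 9*C_in + C_in*C_out weights instead of 9*C_in*C_out.
```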

Figure 6: The architecture of U-Net [97].

3.4.4. DeepLab-V3+

DeepLab-V3+ [77] is an improved version of DeepLab-V3 [98], while the latter is an improved version of the original DeepLab [78] model. This segmentation model does not follow the FCN framework like the previously discussed models. The main features that distinguish the DeepLab model from FCNs are the atrous convolutions for upsampling and the application of probabilistic machine learning models, concretely conditional random fields (CRFs), for a finer localization accuracy in the final fully connected layer. Atrous convolutions, in particular, allow enlarging the context from which the next layer feature maps are learned, while preserving the number of parameters (and thus the same efficiency). Using a chain of atrous convolutions allows computing the final output layer of a CNN at an arbitrarily high resolution (removing the need for the upsampling part as used in FCNs). In the follow-up work proposing DeepLab-V3, Chen et al. [98] change the approach to atrous convolutions to gradually double the atrous rates, and show that, with an adapted version, their new algorithm outperforms the previous one even without including the fully connected CRF layer. Finally, in their newest adaptation of the model, called DeepLab-V3+, Chen et al. [77] turn to an approach similar to the FCNs, i.e., they add a decoder module to the architecture (see Figure 7). That is, they employ the features extracted by the DeepLab-V3 module in the encoder part, and add a decoder module consisting of 1 × 1 and 3 × 3 convolutions.
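As a small illustration of an atrous (dilated) convolution, again assuming Keras layers and illustrative shapes and rates:

```python
# dilation_rate=2 inserts gaps between kernel taps, so a 3x3 kernel sees a
# 5x5 context with the same 9 weights; stacking increasing rates (2, 4, ...)
# grows the receptive field without pooling or extra parameters.
import tensorflow as tf

x = tf.keras.Input(shape=(512, 512, 64))
y = tf.keras.layers.Conv2D(64, kernel_size=3, padding='same',
                           dilation_rate=2)(x)   # atrous rate 2
y = tf.keras.layers.Conv2D(64, kernel_size=3, padding='same',
                           dilation_rate=4)(y)   # atrous rate 4
```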

Figure 7: The architecture of DeepLabV3+ [77].

3.4.5. FRRN-B (Full-Resolution Residual Networks)

As we have seen, most of the semantic segmentation architectures are based on some form of an FCN, and so they utilize existing classification networks, such as ResNet or VGG16, as encoders. We also discussed the main reason for such approaches, which is to take advantage of the learned weights from those architectures pretrained for the classification task. Nevertheless, one disadvantage of the FCN approach is that the resulting network outputs of the encoder part (particularly after the pooling operations) are at a lower resolution, which deteriorates the localization performance of the overall segmentation model. Pohlen et al. [81] proposed to tackle this by having two parallel network streams processing the input image: a pooling and a residual stream (Figure 8). As the name says, the pooling stream performs successive pooling and then unpooling operations, and it serves to obtain good recognition of the objects and classes. The residual stream computes residuals at the full image resolution, which enables low-level features, i.e., object pixel-level locations, to be propagated to the network output. The name of the model comes from its building blocks, i.e., full-resolution residual units. Each such unit simultaneously operates on the pooling and the residual stream. In the original paper [81], the authors propose two alternative architectures, FRRN-A and FRRN-B, and they show that FRRN-B achieves superior performance on the Cityscapes benchmark dataset. Hence, we employ the FRRN-B architecture.

Figure 8: The architecture of FRRN-B. RU n and FRRU n stand for residual units and full-resolution residual units with n-channel convolutions, respectively. FRRUs simultaneously operate on the two streams [81].

3.4.6. PSPNet (Pyramid Scene Parsing Network)

Figure 9: The architecture of PSPNet [75].

Zhao et al. [75] propose Pyramid Scene Parsing as a solution to the challenge of making local predictions based on the local context only, without considering the global image scene. In remote sensing, an example of this challenge could be when a model wrongly predicts water with waves present in it as the dry vegetation class, because they appear similar and the model did not consider that these pixels are part of a larger water surface, i.e., it missed the global context. Similarly to the other FCN-based approaches, PSPNet uses a pre-trained classification architecture to extract the feature map, in this case ResNet. The main module of this network is the pyramid pooling, which is enclosed by a square in Figure 9. As can be seen in the figure, this module fuses features at four scales, from the coarse (red) to the fine (green). Hence, the output of each level in the pyramid pooling module contains the feature map of a different resolution. In the end, the different features are stacked together, yielding the final pyramid pooling global feature for predictions.
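A sketch of such a pyramid pooling module is given below, assuming Keras layers; note that the original paper pools to bin sizes 1, 2, 3, and 6, whereas this illustration uses power-of-two bins so that the pooling and upsampling factors match exactly on a 16 × 16 feature map.

```python
# Pyramid pooling sketch: average-pool the feature map to several grid
# sizes, reduce each level with a 1x1 convolution, upsample back, and
# concatenate all levels with the input feature map.
import tensorflow as tf

def pyramid_pooling(feats, bin_sizes=(1, 2, 4, 8)):
    h, w, c = feats.shape[1], feats.shape[2], feats.shape[3]
    levels = [feats]
    for bins in bin_sizes:
        x = tf.keras.layers.AveragePooling2D(pool_size=(h // bins, w // bins))(feats)
        x = tf.keras.layers.Conv2D(c // len(bin_sizes), kernel_size=1)(x)
        x = tf.keras.layers.UpSampling2D(size=(h // bins, w // bins),
                                         interpolation='bilinear')(x)
        levels.append(x)
    return tf.keras.layers.Concatenate()(levels)

# Example: pooled = pyramid_pooling(tf.keras.Input(shape=(16, 16, 64)))
```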

3.4.7. FC-DenseNet (Fully Convolutional DenseNets)

This semantic segmentation algorithm is built using the DenseNet CNN [88] as a basis for the encoder, followed by applying the FCN approach [82]. The specificity of the DenseNet architecture is the presence of blocks where each layer is connected to all other layers in a feed-forward manner. Figure 10 shows the architecture of FC-DenseNet, where the blocks are represented by the Dense Block units. According to [88], such an architecture scales well to hundreds of layers without any optimization issues, while yielding excellent results in classification tasks. In order to efficiently upsample the DenseNet feature maps, Jégou et al. [82] substitute the upsampling convolutions of FCNs with Dense Blocks and Transitions Up. The Transition Up modules consist of transposed convolutions, which are then concatenated with the outputs from the input skip connection (the dashed lines in Figure 10).
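A minimal sketch of such a dense block, assuming Keras layers and illustrative depth and growth rate (not the FC-DenseNet configuration itself):

```python
# Dense block sketch: each layer produces `growth_rate` new feature maps,
# and its input is the concatenation of all previous outputs in the block.
import tensorflow as tf

def dense_block(x, num_layers=4, growth_rate=16):
    outputs = [x]
    for _ in range(num_layers):
        h = tf.keras.layers.Concatenate()(outputs) if len(outputs) > 1 else x
        h = tf.keras.layers.BatchNormalization()(h)
        h = tf.keras.layers.ReLU()(h)
        h = tf.keras.layers.Conv2D(growth_rate, kernel_size=3, padding='same')(h)
        outputs.append(h)
    return tf.keras.layers.Concatenate()(outputs)
```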


Figure 10: The architecture of FC-DenseNet [82].

3.5. Training approach

To accomplish better segmentation performance, there is an option to pre-train the semantic segmentation models (in particular, their encoder modules) using a larger set of available images of another type (such as natural images). By using a model pre-trained with natural images and continuing training with the limited set of SAR images, the knowledge becomes effectively transferred from the natural-image task to the SAR task [99]. To accomplish such a transfer, we used the models whose encoders were pre-trained for the ImageNet classification task, and fine-tuned them using our SAR dataset (described next).

3.6. Experimental Setup

In this section, we first describe how we prepared the SAR images for training with the deep learning models, which are designed for natural images. Then, we provide the details of our implementation.

3.6.1. SAR Data Preprocessing for Deep Learning

Sentinel-1 imagery comes in two polarization channels, each of them being informative about certain types of land cover. Hence, using their combination is expected to yield better land cover mapping results than using either of them independently. Moreover, previous work has shown the benefits of also using the DEM model for land cover mapping [9]. Hence, as the third layer, we used the DEM of Finland from the National Land Survey.

SAR backscatter for both polarizations was converted to decibels by applying the 10 · log10 transformation. In addition, for the deep learning models, each band should be normalized so that the distribution of the pixel values resembles a Gaussian distribution centered at zero. This is done to yield faster convergence during training. To normalize the data, each pixel value is subtracted by the mean of all pixels and then divided by their standard deviation. In addition, given that the semantic segmentation models expect pixel values in the range (0, 255), we scaled the normalized data, and also the DEM values, to this range. Such preprocessed layers are then used to create the image dataset for training.

We named the created dataset SAR RGB-DEM. The naming comes from the process used to create the images in this dataset. Namely, one of the two channels of a Sentinel-1 image is assigned to the R channel and the other to the G channel. For the third, B channel, we use the DEM layer.
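A minimal numpy sketch of this band preparation is given below; the exact rescaling used to reach the (0, 255) range, and the VV-to-R / VH-to-G ordering, are our assumptions (the text only states that one polarization goes to R and the other to G).

```python
# Band preparation sketch: dB conversion, zero-mean/unit-variance
# normalization, rescaling to 0..255, and stacking into a 3-band image.
# vv, vh are linear-power backscatter arrays; dem is the elevation layer.
import numpy as np

def to_uint8(band):
    band = (band - band.mean()) / band.std()                 # standardize
    band = (band - band.min()) / (band.max() - band.min())   # rescale to 0..1
    return (band * 255).astype(np.uint8)                     # 0..255 range

def make_sar_rgb_dem(vv, vh, dem):
    vv_db = 10.0 * np.log10(vv)   # backscatter to decibels
    vh_db = 10.0 * np.log10(vh)
    return np.dstack([to_uint8(vv_db), to_uint8(vh_db), to_uint8(dem)])
```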

3.6.2. Train/Development and Test (Accuracy Assessment) Dataset

The images from the SAR RGB-DEM dataset needed to be split into 512 px × 512 px partial images (further in the text called imagelets) for training. Thus, each imagelet represented an area of roughly 10 × 10 km². The first reason for this preprocessing concerns the squared shape: some of the selected models required square-shaped images. Some other models were flexible with the image shape and size, but we wanted to make the setups for all the models the same, so that their results are comparable. The second reason for the preprocessing concerns the computational capacity: with our hardware setup (described below), this was the largest image size that we could work with.

Upon splitting the SAR RGB-DEM images, we discarded those imagelets that were completely outside the land mass area, as well as those for which we did not have a complete CORINE label (such as if they fell in part outside the Finnish borders). This resulted in more than 7K imagelets of size 512 px × 512 px.
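A sketch of this tiling step follows; the function and the NODATA code are hypothetical illustrations, assuming `image` and `label` are aligned numpy arrays for one scene and its CORINE mask.

```python
# Tile one SAR RGB-DEM scene and its label mask into 512x512 imagelets,
# discarding tiles whose CORINE label is incomplete.
import numpy as np

TILE = 512
NODATA = 255  # hypothetical nodata code in the label raster

def tile_scene(image, label):
    imagelets = []
    h, w = label.shape
    for r in range(0, h - TILE + 1, TILE):
        for c in range(0, w - TILE + 1, TILE):
            lab = label[r:r + TILE, c:c + TILE]
            if (lab == NODATA).any():      # incomplete CORINE label: discard
                continue
            imagelets.append((image[r:r + TILE, c:c + TILE], lab))
    return imagelets
```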

Given the geography of Finland, to have representative training data, it seems useful to include imagelets from both the northern and the southern (including the large cities) parts of the country in the model training. On the other hand, some noticeable differences are found also in the gradient from east to west of the country. To achieve a representative training dataset, we selected all imagelets between the longitudes of 24° and 28° for the accuracy assessment (model testing), and all the other imagelets for the model training (that is, training and development in the computer vision terminology). In this way, we prevented the situation in which two images of the same area, but acquired at different times, are used one for training and the other one for testing. Images that were overlapping any border of the introduced strip were discarded. The procedure resulted in 3104 images in the training and development set, and 3784 images in the test (accuracy assessment) set. Finally, we used 60% of the training and development set for training, and the rest for development of the deep learning models.


Table 3: The properties of the examined semantic segmentation architectures.

  Architecture   Base model       Parameters
  BiSeNet        ResNet101        24.75M
  SegNet         VGG16            34.97M
  Mobile U-Net   Not applicable   8.87M
  DeepLabV3+     ResNet101        47.96M
  FRRN-B         ResNet101        24.75M
  PSPNet         ResNet101        56M
  FC-DenseNet    ResNet101        9.27M

3.6.3. Data Augmentation

Further, we employed the data augmentation technique. The main idea behind data augmentation is to enable improved learning by reusing original images with slight transformations, such as rotation, flipping, adding Gaussian noise, or slightly changing the brightness. This provides additional information to the model, and the dataset size is effectively increased. Moreover, an additional benefit of data augmentation is in helping the model to learn some invariant data properties for which no examples are present in the original dataset. Given the sensitivity of the SAR backscatter, we did not want to augment the images in terms of color or brightness, or by adding noise. However, we could safely employ rotations and flipping. For rotations, we only used 90° increments, giving three possible rotated versions of an image. For image flipping, we applied horizontal and vertical flipping, or both at the same time, giving another three possible versions of the original image⁴. Notice that our images are square, so the transformations did not change the image dimensions. Finally, we applied online augmentation, as opposed to the offline version. In the online process, each augmented image is seen only once, and so this process yields a network that generalises better.
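A sketch of this rotation/flip augmentation with numpy (our own illustration, not the authors' code), applied identically to the imagelet and its label mask and drawn on the fly (online augmentation):

```python
import numpy as np

def random_rot_flip(image, label, rng=np.random.default_rng()):
    k = rng.integers(0, 4)                  # rotate by 0, 90, 180 or 270 deg
    image, label = np.rot90(image, k), np.rot90(label, k)
    if rng.integers(0, 2):
        image, label = np.flipud(image), np.flipud(label)   # vertical flip
    if rng.integers(0, 2):
        image, label = np.fliplr(image), np.fliplr(label)   # horizontal flip
    return image, label
```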

3.6.4. Implementation

To apply the described semantic segmentation models, we adapted the open-source Semantic Segmentation Suite. We used Python with the TensorFlow [100] backend.

3.6.5. Hardware and Training Setup

We trained and tested each of the deep learning models separately on a single GPU (NVIDIA GeForce GTX 1080), on a machine with 32 GB of RAM.

We used the RMSProp optimisation algorithm, a learning rate of 0.0001, and a decay of the learning rate of 0.9954. Each model was trained for an equal number of epochs (500), and during the process the checkpoint for the best model was saved. Then, we used that model for evaluation on the test set, and we report those results.

⁴ The vertical flip operation switches between top-left and bottom-left image origin (reflection along the central horizontal axis), and the horizontal flip switches between top-left and top-right image origin (reflection along the central vertical axis).
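The stated configuration can be expressed in TensorFlow/Keras roughly as below; this is a sketch under our own assumptions (the adapted Semantic Segmentation Suite wires this up differently, and whether the decay is applied per step or per epoch is our assumption).

```python
# RMSProp with learning rate 1e-4 decayed by a factor of 0.9954,
# 500 epochs, keeping the checkpoint of the best model.
import tensorflow as tf

schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-4,
    decay_steps=1,            # assumption: one decay application per epoch
    decay_rate=0.9954)
optimizer = tf.keras.optimizers.RMSprop(learning_rate=schedule)

checkpoint = tf.keras.callbacks.ModelCheckpoint(
    'best_model.h5', save_best_only=True, monitor='val_loss')

# model.compile(optimizer=optimizer,
#               loss='sparse_categorical_crossentropy', metrics=['accuracy'])
# model.fit(train_ds, validation_data=dev_ds, epochs=500,
#           callbacks=[checkpoint])
```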

3.7. Evaluation Metrics

In their review of the metrics used in land cover classification, Costa et al. [101] found a lack of consistency, complicating the intercomparison of different studies. To avoid such issues, and to ensure that our results are easily comparable with the literature, we thoroughly evaluated our models. For each model and class, we report the following measures of accuracy: precision, also known as producer's accuracy (PA); recall, also known as user's accuracy (UA); the overall accuracy; and the Kappa coefficient. The formulas are as follows.

For each segmentation class (land cover type) c, we calculate precision (producer's accuracy)

$$P_c = \frac{Tp_c}{Tp_c + Fp_c},$$

and recall (user's accuracy)

$$R_c = \frac{Tp_c}{Tp_c + Fn_c},$$

where $Tp_c$ represents true positive, $Fp_c$ false positive, and $Fn_c$ false negative pixels for the class c.

When it comes to accuracy [102], we calculate per-class accuracy⁵

$$Acc_c = \frac{C_{ii}}{G_i},$$

and overall pixel accuracy

$$Acc_{OP} = \frac{\sum_{i=1}^{L} C_{ii}}{\sum_{i=1}^{L} G_i},$$

where $C_{ij}$ is the number of pixels having ground truth label i and being classified/predicted as j, $G_i$ is the total number of pixels labelled with i, and L is the number of classes. All these metrics can take values from 0 to 1.

Finally, we also use the Kappa statistic (Cohen's measure of agreement), indicating how the classification results compare to values assigned by chance [103]. The Kappa statistic can take values from 0 to 1. Starting from a k-by-k confusion matrix with elements $f_{ij}$, the following calculations are done:

$$P_o = \frac{1}{N}\sum_{j=1}^{k} f_{jj}, \quad (1)$$

$$r_i = \sum_{j=1}^{k} f_{ij} \ \forall i \quad \text{and} \quad c_j = \sum_{i=1}^{k} f_{ij} \ \forall j, \quad (2)$$

$$P_e = \frac{1}{N^2}\sum_{i=1}^{k} r_i c_i, \quad (3)$$

where $P_o$ is the observed proportional agreement (effectively the overall accuracy), $r_i$ and $c_j$ are the row and column totals for classes i and j, and $P_e$ is the expected proportion of agreement. The final measure of agreement is given by the statistic [103]:

$$\kappa = \frac{P_o - P_e}{1 - P_e}. \quad (4)$$

⁵ Effectively, per-class accuracy is defined as the recall obtained on each class.

Depending on the value of Kappa, the observed agreement is considered either poor (0.0 to 0.2), fair (0.2 to 0.4), moderate (0.4 to 0.6), good (0.6 to 0.8), or very good (0.8 to 1.0).
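All of these quantities follow directly from the confusion matrix; a minimal numpy sketch (our own illustration, not the authors' evaluation code), with rows as ground-truth classes and columns as predicted classes:

```python
import numpy as np

def evaluate(f):
    # f[i, j]: number of pixels with ground truth i predicted as j
    f = f.astype(float)
    n = f.sum()
    tp = np.diag(f)
    precision = tp / f.sum(axis=0)      # Tp / (Tp + Fp), per class
    recall = tp / f.sum(axis=1)         # Tp / (Tp + Fn), per class
    overall_accuracy = tp.sum() / n     # P_o
    p_e = (f.sum(axis=1) * f.sum(axis=0)).sum() / n ** 2
    kappa = (overall_accuracy - p_e) / (1 - p_e)
    return precision, recall, overall_accuracy, kappa
```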

4. Results and Discussion

Using the experimental setup described in the previous section, we evaluated the seven selected semantic segmentation models: SegNet [74], PSPNet [75], BiSeNet [76], DeepLabV3+ [77, 78], U-Net [79, 80], FRRN-B [81], and FC-DenseNet [82]. The overall classification performance statistics for all studied models are gathered in Table 4. Figure 11 shows maps produced for several imagelets with the best performing model, FC-DenseNet. Obtained results are compared to prior work, and classification performance for the different land cover classes is discussed further.

4.1. Classification Performance

All the models performed relatively well in terms of classification, achieving overall accuracies above 83%. Four models performed particularly well, achieving accuracy scores above 89%: SegNet, U-Net, FRRN-B, and the best performing model, FC-DenseNet, which achieved an accuracy of 90.7%.

Before further analysis, let us recall that CORINE is not exclusively a land cover map, but rather a land cover and land use map; thus, specific classes can differ from the ecological classes observed by Sentinel-1. Also, the aggregation to Level-1 is sometimes not strictly "ecological", nor does it comply with surface scattering physics considerations. For example, roads, airports, major industrial areas, and the road network often exhibit areas similar to fields; the presence of trees and green


Table 4: Summary of the classification performance and efficiency of various deep learning models (UA - user's accuracy, PA - producer's accuracy; per-class entries are UA/PA in %; average inference time is per image in the dataset).

LC classes (test scale, km2)               BiSeNet  DeepLabV3+  SegNet  FRRN-B  U-Net   PSPNet  FC-DenseNet
Urban fabric (100), 1081.6                  26/21    15/14      36/31   38/30   45/25   38/18    62/27
Agricultural areas (200), 2516.0            49/51    50/49      69/66   68/68   66/66   53/48    72/71
Forested areas (300), 28546.2               90/91    88/96      93/94   92/95   92/95   89/95    93/96
Peatland, bogs and marshes (400), 2099.0    54/43    56/13      67/57   71/55   70/52   65/31    74/58
Water bodies (500), 5356.4                  85/91    94/92      96/96   95/96   96/96   94/94    96/96
Overall Accuracy (%)                        83.86    85.49      89.03   89.27   89.25   86.51    90.66
Kappa                                       0.641    0.649      0.754   0.758   0.754   0.680    0.785
Average inference time (s)                  0.0389   0.0267     0.0761  0.1424  0.0848  0.0495   0.1930

Table 5: Confusion matrix for classification with the FC-DenseNet model (FC-DenseNet103). Rows: CLC2012 reference classes; columns: Sentinel-1 classification. The bottom-right entry is the overall accuracy (%).

CLC2012    |     urban       water       forest       field    peatland        total   PA (%)
urban      |   7301999      413073     15892771     3212839      221476     27042158     27.0
water      |     78331   128294872      3457634      171029     1935276    133937142     95.8
forest     |   3663698     2703632    686788977    12795703     7730444    713682454     96.2
field      |    766200      121609     16527970    44866048      620934     62902761     71.3
peatland   |     56097     1866020     19164137     1091008    30309189     52486451     57.8
total      |  11866325   133399206    741831489    62136627    40817319    990050966
UA (%)     |      61.5        96.2         92.6        72.2        74.3                   90.7

Figure 11: Illustration of the FC-DenseNet model performance: a selection of classification results, i.e., the direct output of the network without any post-processing (bottom row), versus the reference CORINE data (upper row).

vegetation near summer cottages can cause them to exhibit signatures closer to forest than to urban; sometimes forest on rocky terrain can be misclassified as urban due to the presence of very bright targets and strong disruptive features, while confusion between peatland and field areas is also commonplace. Finally, the accuracy of the CORINE data itself is only somewhat higher than 90%.

As for the results across the different land classes, all the models performed particularly well in recognising the water bodies and forested areas, while the urban fabric represented the most challenging class for all the models. We expect that the inclusion of the DEM as one layer in the training images helped to achieve good results on the water bodies class for most of the models (except for BiSeNet, all the models achieved both user's and producer's accuracy above 90%). The urban class was particularly challenging for the following main reasons. First, this class changes the most, as new houses, roads, and urban areas are built. While we took the most suitable available CORINE class in terms of time for our Sentinel-1 images, there are almost certainly


differences between the urban class as it was in 2012 and in 2015-2016. Second, the CORINE map itself does not have perfect accuracy, and neither are the aggregation rules perfect. As a matter of fact, in the majority of studies where SAR-based classification was compared against CLC or similar data, a poor or modest overall agreement was observed for this class [21, 41, 83, 20], while the user's accuracy was notably higher than the producer's [104]. The latter is exactly due to radar being able to sense sharp boundaries and bright targets very well, whereas such bright targets often don't dominate the whole CORINE Level-1 urban class. We argue that any inaccuracies present will be particularly attenuated in our models for the urban class because of the sharp and sudden boundary changes in this class, unlike for the others, such as forest and water. The top performing model, i.e., FC-DenseNet, performed the best across all the classes. It is particularly notable that it achieved a user's accuracy, i.e., precision, for the urban class of 62%, improving on it significantly compared to all the other models. Nevertheless, its score on the producer's accuracy, i.e., recall, on this class of 27% is outperformed by the two other top models, i.e., SegNet and FRRN-B.

We mentioned the issue of SAR backscattering sensitivity to several ground factors, so that the same classes might appear differently in the images between countries or between distant areas within a country. An interesting indication of our study, however, is that the deep learning models might be able to deal with this issue. Namely, we used the models pre-trained on ImageNet and fine-tuned them with a relatively small number (14) of Sentinel-1 scenes. The models learned to recognise varying types of the backscattering signal across the country of Finland. This indicates that, with a similar type of fine-tuning, the present models could be relatively easily adapted to other areas and countries with different SAR backscattering patterns. Such robustness and adaptability of the deep learning models comes from their automatic learning of feature representations, without the need for a human to pre-define those features.
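As a minimal sketch of this transfer-learning recipe (our own simplified illustration, not one of the exact architectures benchmarked here), an ImageNet-pretrained encoder can be reused as a feature extractor and fine-tuned end-to-end on three-channel SAR imagelets:

    import tensorflow as tf

    NUM_CLASSES = 5  # the five CORINE Level-1 classes

    # ImageNet-pretrained encoder; ResNet50 stands in for the various
    # backbones used by the benchmarked models.
    encoder = tf.keras.applications.ResNet50(
        include_top=False, weights="imagenet", input_shape=(512, 512, 3))

    # A deliberately simple decoder head: 1x1 convolution to class scores,
    # then bilinear upsampling back to the 512 x 512 input resolution.
    x = tf.keras.layers.Conv2D(NUM_CLASSES, 1)(encoder.output)
    logits = tf.keras.layers.UpSampling2D(32, interpolation="bilinear")(x)
    model = tf.keras.Model(encoder.input, logits)

    # Fine-tune all layers with a small learning rate, letting the ImageNet
    # filters adapt to SAR backscatter statistics.
    model.compile(
        optimizer=tf.keras.optimizers.RMSprop(1e-4),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))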

4.2. Computational Performance

The training times with our hardware configuration took from 6 days up to 2 weeks for the different models. This could be significantly improved by training each model on a multi-GPU system, instead of the single-GPU setup we used.

In terms of inference time, we also observed differences in performance. In Table 4, we present the average inference time per 512 px × 512 px imagelet that we worked with. The results show that there is a trade-off between classification and computational performance: the best models in terms of classification results (i.e., FC-DenseNet and FRRN-B) take several times longer inference time compared to the rest. Depending on the application, this might not be of particular importance.

4.3. Comparison to Similar Work

Obtained results compare favourably to previous similar studies on land cover classification with SAR data [83, 20, 21, 41, 28, 31]. Depending on the level of class aggregation (4-5 major classes or more), and using mostly statistical


or classical machine learning approaches, the reported classification accuracies ranged from as high as 80-87% down to as low as 30% when only SAR imagery was used.

Two recent studies that employed neural networks for SAR imagery classification (albeit in combination with satellite optical data) for land cover mapping were [28] and [66], with reported classification accuracies of up to 97.5% and 94.6%, respectively.

The best model in our experiments achieved an overall accuracy of 90.7%. However, our results were obtained using solely the SAR imagery. In contrast, SAR imagery (PALSAR) alone yielded an overall accuracy of 78.1% in [28]. The types of classes they studied are also different compared to ours (crops versus vegetation versus land cover types), and our study is performed on a larger area. Importantly, the previous studies applied different types of models (regular NNs versus CNNs versus semantic segmentation). In particular, the CNN models work on 7 × 7 resolution windows, while we have applied more advanced semantic segmentation models, which work on the level of a pixel. Keeping in mind the finding from [28] that the addition of optical images on top of SAR improved the results by over 10%, we expect that our models would perform comparably well or outperform these previous works if applied to combined SAR and optical imagery.

In terms of the deep learning setup, the studies most similar to ours are [53] and [70]. However, RapidEye optical imagery at 5 m spatial resolution was used in [53], and the test site was considerably smaller. Study [70], similarly to our research, relied exclusively on SAR imagery, however on fully polarimetric images acquired by RADARSAT-2 at a considerably better resolution. They developed an FCN-type semantic segmentation model 'specifically designed for the classification of wetland complexes using PolSAR imagery'. Using this model to classify eight wetland map classes, they achieved an overall accuracy of 93%. However, because their model is designed specifically for wetland complexes, it is not clear whether such a model would generalise to other types of areas. Compared to our study, they focused on a considerably smaller area (nearly the size of a single imagelet we used) and on a very specific task (wetland type mapping). Thus, it is not readily clear how general their approach is and how it compares to our presented approach.

4.4. Outlook and Future Work

There are several lines of potential improvement based on the results of this study, as well as future work directions.

First, using an even larger set of Sentinel-1 images can be recommended, since large amounts of data are crucial for supervised deep learning models. Here, we processed only 6888 imagelets altogether, but deep learning algorithms typically become efficient only once they are trained with hundreds of thousands or millions of images.

Second, if SAR images and reference data of a higher resolution are used, we expect better classification performance too, as smaller details could potentially be captured. Also, better agreement in the acquisition timing of the reference and


SAR imagery can be recommended. The reference and training data should come from the same months or year if possible, and the reference maps should represent the reality as accurately as possible. The models in our experiments were certainly limited by CORINE's own limited accuracy.

Third, in this study we tested the effectiveness of off-the-shelf deep learning models for land cover mapping from SAR data. While the results show their effectiveness, it is also likely that novel types of models developed specifically for radar data (such as [70]) will yield even better results. Based on our results, we suggest DenseNet-based models as a starting point. In particular, one could develop deep learning models to handle directly the SLC data, which preserve the phase information.

Focusing on a single season is both an advantage and a limitation. Importantly, we avoided confusion between SAR signatures that vary seasonally for several land cover classes. However, the multitemporal dynamics itself can potentially be used as an additional, useful class-discriminating parameter. Incorporating the seasonal dynamics of each land cover pixel (as a time series) is left for future work, perhaps with the additional need to incorporate recurrent neural networks into the approach.

As discussed in Section 3.3, it could be suitable to use more detailed (specific) land cover classes, as the aggregation of smaller LC classes into Level-1 CORINE classes is not exactly ecological, leading to the mixing of several distinct SAR signatures in one class and thus causing additional confusion for the classifier. Later, the classified specific classes can be aggregated into larger classes, potentially showing improved performance [19].

Finally, we used only SAR images and a freely available DEM model for the presented large-scale land cover mapping. If one were to combine other types of remote sensing images, in particular optical images, we expect that the results would improve significantly. This holds for those areas where such imagery can be collected given the cloud coverage, while in an operational scenario it would potentially require the use of at least two models (with and without optical satellite imagery). It is also important to assess the added value of SAR imagery with deep learning models when optical satellite images are available, as well as possible data fusion and decision fusion scenarios, before a decision on the mapping approach is made [19].

5. Conclusion

Our study demonstrated the potential for applying state-of-the-art semantic segmentation models to SAR image classification with high accuracy. Several models were benchmarked in a countrywide classification experiment using Sentinel-1 IW-mode SAR data, reaching nearly 91% overall classification accuracy with the best performing model (FC-DenseNet). Given that the 14 used Sentinel-1 scenes resulted in about 7,000 training images, this indicates strong potential for using pre-trained CNNs for further fine-tuning, and seems particularly suitable when the number of training images is limited (to thousands or tens of thousands instead of millions). In addition to suggesting the best candidate


semantic segmentation models for land cover mapping with SAR data (that is, the DenseNet-based models), our study offers baseline results against which newly proposed models should be evaluated. Several possible improvements for future work were identified, including the necessity of testing multitemporal approaches, data fusion, and very high-resolution SAR imagery, as well as developing models specifically for SAR; these will be addressed in future work.

References

[1] S. Bojinski, M. Verstraete, T. C. Peterson, C. Richter, A. Simmons, M. Zemp, The concept of essential climate variables in support of climate research, applications, and policy, Bulletin of the American Meteorological Society 95 (9) (2014) 1431–1443.

[2] G. Büttner, J. Feranec, G. Jaffrain, L. Mari, G. Maucha, T. Soukup, The CORINE land cover 2000 project, EARSeL eProceedings 3 (3) (2004) 331–346.

[3] M. Bossard, J. Feranec, J. Otahel, et al., CORINE land cover technical guide: Addendum 2000 (2000).

[4] G. Büttner, CORINE land cover and land cover change products, in: Land Use and Land Cover Mapping in Europe, Springer, 2014, pp. 55–74.

[5] M. Törmä, T. Markkanen, S. Hatunen, P. Härmä, O.-P. Mattila, A. Arslan, Assessment of land-cover data for land-surface modelling in regional climate studies, Boreal Environment Research 20 (2) (2015) 243–260.

[6] J. Chen, J. Chen, A. Liao, X. Cao, L. Chen, X. Chen, C. He, G. Han, S. Peng, M. Lu, et al., Global land cover mapping at 30 m resolution: A POK-based operational approach, ISPRS Journal of Photogrammetry and Remote Sensing 103 (2015) 7–27.

[7] C. A. d. Almeida, A. C. Coutinho, J. C. D. M. Esquerdo, M. Adami, A. Venturieri, C. G. Diniz, N. Dessay, L. Durieux, A. R. Gomes, High spatial resolution land use and land cover mapping of the Brazilian Legal Amazon in 2008 using Landsat-5/TM and MODIS data, Acta Amazonica 46 (3) (2016) 291–302.

[8] C. Homer, J. Dewitz, L. Yang, S. Jin, P. Danielson, G. Xian, J. Coulston, N. Herold, J. Wickham, K. Megown, Completion of the 2011 National Land Cover Database for the conterminous United States – representing a decade of land cover change information, Photogrammetric Engineering & Remote Sensing 81 (5) (2015) 345–354.

[9] Y. Zhao, D. Feng, L. Yu, X. Wang, Y. Chen, Y. Bai, H. J. Hernández, M. Galleguillos, C. Estades, G. S. Biging, et al., Detailed dynamic land cover mapping of Chile: Accuracy improvement by integrating multi-temporal data, Remote Sensing of Environment 183 (2016) 170–185.

[10] P. Griffiths, C. Nendel, P. Hostert, Intra-annual reflectance composites from Sentinel-2 and Landsat for national-scale crop and land cover mapping, Remote Sensing of Environment 220 (2019) 135–151.

[11] R. Torres, P. Snoeij, D. Geudtner, D. Bibby, M. Davidson, E. Attema, P. Potin, B. Rommen, N. Floury, M. Brown, I. Traver, P. Deghaye, B. Duesmann, B. Rosich, N. Miranda, C. Bruno, M. L'Abbate, R. Croci, A. Pietropaolo, M. Huchler, F. Rostan, GMES Sentinel-1 mission, Remote Sensing of Environment 120 (2012) 9–24. doi:10.1016/j.rse.2011.05.028.

[12] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE 86 (11) (1998) 2278–2324.

[13] A. Krizhevsky, I. Sutskever, G. E. Hinton, ImageNet classification with deep convolutional neural networks, in: Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.

[14] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, CoRR abs/1409.1556 (2014).

[15] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning (2016).

[16] W. Cohen, S. Goward, Landsat's role in ecological applications of remote sensing, BioScience 54 (6) (2004) 535–545. doi:10.1641/0006-3568(2004)054[0535:LRIEAO]2.0.CO;2.

[17] S. Goetz, A. Baccini, N. Laporte, T. Johns, W. Walker, J. Kellndorfer, R. Houghton, M. Sun, Mapping and monitoring carbon stocks with satellite observations: A comparison of methods, Carbon Balance and Management 4 (2009). doi:10.1186/1750-0680-4-2.

[18] C. Atzberger, Advances in remote sensing of agriculture: Context description, existing operational monitoring systems and major information needs, Remote Sensing 5 (2) (2013) 949–981. doi:10.3390/rs5020949.

[19] T. Häme, J. Kilpi, H. A. Ahola, Y. Rauste, O. Antropov, M. Rautiainen, L. Sirro, S. Bounpone, Improved mapping of tropical forests with optical and SAR imagery, part I: Forest cover and accuracy assessment using multi-resolution data, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 6 (1) (2013) 74–91.

[20] O. Antropov, Y. Rauste, H. Astola, J. Praks, T. Häme, M. Hallikainen, Land cover and soil type mapping from spaceborne PolSAR data at L-band with probabilistic neural network, IEEE Transactions on Geoscience and Remote Sensing 52 (9) (2014) 5256–5270. doi:10.1109/TGRS.2013.2287712.

[21] A. Lönnqvist, Y. Rauste, M. Molinier, T. Häme, Polarimetric SAR data in land cover mapping in boreal zone, IEEE Transactions on Geoscience and Remote Sensing 48 (10) (2010) 3652–3662. doi:10.1109/TGRS.2010.2048115.

[22] B. Waske, M. Braun, Classifier ensembles for land cover mapping using multitemporal SAR imagery, ISPRS Journal of Photogrammetry and Remote Sensing 64 (5) (2009) 450–457. doi:10.1016/j.isprsjprs.2009.01.003.

[23] L. Bruzzone, M. Marconcini, U. Wegmüller, A. Wiesmann, An advanced system for the automatic classification of multitemporal SAR images, IEEE Transactions on Geoscience and Remote Sensing 42 (6) (2004) 1321–1334. doi:10.1109/TGRS.2004.826821.

[24] T. Ullmann, A. Schmitt, A. Roth, J. Duffe, S. Dech, H.-W. Hubberten, R. Baumhauer, Land cover characterization and classification of arctic tundra environments by means of polarized synthetic aperture X- and C-band radar (PolSAR) and Landsat 8 multispectral imagery – Richards Island, Canada, Remote Sensing 6 (9) (2014) 8565–8593. doi:10.3390/rs6098565. URL http://www.mdpi.com/2072-4292/6/9/8565.

[25] N. Clerici, C. A. V. Calderón, J. M. Posada, Fusion of Sentinel-1A and Sentinel-2A data for land cover mapping: a case study in the lower Magdalena region, Colombia, Journal of Maps 13 (2) (2017) 718–726. doi:10.1080/17445647.2017.1372316.

[26] C. Castañeda, D. Ducrot, Land cover mapping of wetland areas in an agricultural landscape using SAR and Landsat imagery, Journal of Environmental Management 90 (7) (2009) 2270–2277.

[27] Y. Ban, H. Hu, I. M. Rangel, Fusion of QuickBird MS and RADARSAT SAR data for urban land-cover mapping: Object-based and knowledge-based approach, International Journal of Remote Sensing 31 (6) (2010) 1391–1410.

[28] G. V. Laurin, V. Liesenberg, Q. Chen, L. Guerriero, F. Del Frate, A. Bartolini, D. Coomes, B. Wilebore, J. Lindsell, R. Valentini, Optical and SAR sensor synergies for forest and land cover mapping in a tropical site in West Africa, International Journal of Applied Earth Observation and Geoinformation 21 (2013) 7–16.

[29] R. Khatami, G. Mountrakis, S. V. Stehman, A meta-analysis of remote sensing research on supervised pixel-based land-cover image classification processes: General guidelines for practitioners and future research, Remote Sensing of Environment 177 (2016) 89–100.

[30] B. Waske, M. Braun, Classifier ensembles for land cover mapping using multitemporal SAR imagery, ISPRS Journal of Photogrammetry and Remote Sensing 64 (5) (2009) 450–457.

[31] H. Balzter, B. Cole, C. Thiel, C. Schmullius, Mapping CORINE land cover from Sentinel-1A SAR and SRTM digital elevation model data using random forests, Remote Sensing 7 (11) (2015) 14876–14898. doi:10.3390/rs71114876.

[32] S.-E. Park, Variations of microwave scattering properties by seasonal freeze/thaw transition in the permafrost active layer observed by ALOS PALSAR polarimetric data, Remote Sensing 7 (12) (2015) 17135–17148. doi:10.3390/rs71215874.

[33] M. C. Dobson, L. E. Pierce, F. T. Ulaby, Knowledge-based land-cover classification using ERS-1/JERS-1 SAR composites, IEEE Transactions on Geoscience and Remote Sensing 34 (1) (1996) 83–99. doi:10.1109/36.481896.

[34] L. Sirro, T. Häme, Y. Rauste, J. Kilpi, J. Hämäläinen, K. Gunia, B. de Jong, F. Paz Pellat, Potential of different optical and SAR data in forest and land cover classification to support REDD+ MRV, Remote Sensing 10 (6) (2018). doi:10.3390/rs10060942.

[35] J. D. T. De Alban, G. M. Connette, P. Oswald, E. L. Webb, Combined Landsat and L-band SAR data improves land cover classification and change detection in dynamic tropical landscapes, Remote Sensing 10 (2) (2018). doi:10.3390/rs10020306.

[36] N. Longépé, P. Rakwatin, O. Isoguchi, M. Shimada, Y. Uryu, K. Yulianto, Assessment of ALOS PALSAR 50 m orthorectified FBD data for regional land cover classification by support vector machines, IEEE Transactions on Geoscience and Remote Sensing 49 (6) (2011) 2135–2150. doi:10.1109/TGRS.2010.2102041.

[37] T. Esch, A. Schenk, T. Ullmann, M. Thiel, A. Roth, S. Dech, Characterization of land cover types in TerraSAR-X images by combined analysis of speckle statistics and intensity information, IEEE Transactions on Geoscience and Remote Sensing 49 (6) (2011) 1911–1925. doi:10.1109/TGRS.2010.2091644.

[38] J. W. Cable, J. M. Kovacs, J. Shang, X. Jiao, Multi-temporal polarimetric RADARSAT-2 for land cover monitoring in northeastern Ontario, Canada, Remote Sensing 6 (3) (2014) 2372–2392. doi:10.3390/rs6032372. URL http://www.mdpi.com/2072-4292/6/3/2372.

[39] X. Niu, Y. Ban, Multi-temporal RADARSAT-2 polarimetric SAR data for urban land-cover classification using an object-based support vector machine and a rule-based approach, International Journal of Remote Sensing 34 (1) (2013) 1–26. doi:10.1080/01431161.2012.700133.

[40] T. L. Evans, M. Costa, K. Telmer, T. S. F. Silva, Using ALOS/PALSAR and RADARSAT-2 to map land cover and seasonal inundation in the Brazilian Pantanal, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 3 (4) (2010) 560–575. doi:10.1109/JSTARS.2010.2089042.

[41] P. Lumsdon, S. R. Cloude, G. Wright, Polarimetric classification of land cover for Glen Affric radar project, IEE Proceedings - Radar, Sonar and Navigation 152 (6) (2005) 404–412. doi:10.1049/ip-rsn:20041313.

[42] C. da Costa Freitas, L. de Souza Soler, S. J. S. Sant'Anna, L. V. Dutra, J. R. Dos Santos, J. C. Mura, A. H. Correia, Land use and land cover mapping in the Brazilian Amazon using polarimetric airborne P-band SAR data, IEEE Transactions on Geoscience and Remote Sensing 46 (10) (2008) 2956–2970.

[43] G. Li, D. Lu, E. Moran, L. Dutra, M. Batistella, A comparative analysis of ALOS PALSAR L-band and RADARSAT-2 C-band data for land-cover classification in a tropical moist region, ISPRS Journal of Photogrammetry and Remote Sensing 70 (2012) 26–38. doi:10.1016/j.isprsjprs.2012.03.010.

[44] N. Park, K. Chi, Integration of multitemporal/polarization C-band SAR data sets for land-cover classification, International Journal of Remote Sensing 29 (16) (2008) 4667–4688. doi:10.1080/01431160801947341.

[45] E. Tomppo, O. Antropov, J. Praks, Cropland classification using Sentinel-1 time series: Methodological performance and prediction uncertainty assessment, Remote Sensing 11 (21) (2019). doi:10.3390/rs11212480.

[46] D. B. Nguyen, A. Gruber, W. Wagner, Mapping rice extent and cropping scheme in the Mekong Delta using Sentinel-1A data, Remote Sensing Letters 7 (12) (2016) 1209–1218.

[47] A. Veloso, S. Mermoz, A. Bouvet, T. Le Toan, M. Planells, J.-F. Dejoux, E. Ceschia, Understanding the temporal behavior of crops using Sentinel-1 and Sentinel-2-like data for agricultural applications, Remote Sensing of Environment 199 (2017) 415–426.

[48] G. Satalino, A. Balenzano, F. Mattia, M. W. Davidson, C-band SAR data for mapping crops dominated by surface or volume scattering, IEEE Geoscience and Remote Sensing Letters 11 (2) (2013) 384–388.

[49] F. Vicente-Guijalba, A. Jacob, J. M. Lopez-Sanchez, C. Lopez-Martinez, J. Duro, C. Notarnicola, D. Ziolkowski, A. Mestre-Quereda, E. Pottier, J. J. Mallorquí, M. Lavalle, M. Engdahl, SInCohMap: Land-cover and vegetation mapping using multi-temporal Sentinel-1 interferometric coherence, in: IGARSS 2018 - 2018 IEEE International Geoscience and Remote Sensing Symposium, 2018, pp. 6631–6634. doi:10.1109/IGARSS.2018.8517926.

[50] S. Ge, O. Antropov, W. Su, H. Gu, J. Praks, Deep recurrent neural networks for land-cover classification using Sentinel-1 InSAR time series, in: IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium, 2019, pp. 473–476. doi:10.1109/IGARSS.2019.8900088.

[51] Y. LeCun, Y. Bengio, et al., Convolutional networks for images, speech, and time series, The Handbook of Brain Theory and Neural Networks 3361 (10) (1995) 1995.

[52] X. X. Zhu, D. Tuia, L. Mou, G.-S. Xia, L. Zhang, F. Xu, F. Fraundorfer, Deep learning in remote sensing: a review, arXiv preprint arXiv:1710.03959 (2017).

[53] M. Mahdianpari, B. Salehi, M. Rezaee, F. Mohammadimanesh, Y. Zhang, Very deep convolutional neural networks for complex land cover mapping using multispectral remote sensing imagery, Remote Sensing 10 (7) (2018) 1119.

[54] L. Zhang, L. Zhang, B. Du, Deep learning for remote sensing data: A technical tutorial on the state of the art, IEEE Geoscience and Remote Sensing Magazine 4 (2) (2016) 22–40.

[55] J. Zhang, P. Zhong, Y. Chen, S. Li, l1/2-regularized deconvolution network for the representation and restoration of optical remote sensing images, IEEE Transactions on Geoscience and Remote Sensing 52 (5) (2014) 2617–2627.

[56] X. Chen, S. Xiang, C.-L. Liu, C.-H. Pan, Aircraft detection by deep belief nets, in: Pattern Recognition (ACPR), 2013 2nd IAPR Asian Conference on, IEEE, 2013, pp. 54–58.

[57] X. Chen, S. Xiang, C.-L. Liu, C.-H. Pan, Vehicle detection in satellite images by hybrid deep convolutional neural networks, IEEE Geoscience and Remote Sensing Letters 11 (10) (2014) 1797–1801.

[58] Y. Liu, G. Cao, Q. Sun, M. Siegel, Hyperspectral classification via deep networks and superpixel segmentation, International Journal of Remote Sensing 36 (13) (2015) 3459–3482. doi:10.1080/01431161.2015.1055607.

[59] J. Wang, Q. Qin, Z. Li, X. Ye, J. Wang, X. Yang, X. Qin, Deep hierarchical representation and segmentation of high resolution remote sensing images, in: Geoscience and Remote Sensing Symposium (IGARSS), 2015 IEEE International, IEEE, 2015, pp. 4320–4323.

[60] D. Tuia, R. Flamary, N. Courty, Multiclass feature learning for hyperspectral image classification: Sparse and hierarchical solutions, ISPRS Journal of Photogrammetry and Remote Sensing 105 (2015) 272–285.

[61] F. Hu, G.-S. Xia, J. Hu, L. Zhang, Transferring deep convolutional neural networks for the scene classification of high-resolution remote sensing imagery, Remote Sensing 7 (11) (2015) 14680–14707.

[62] O. A. B. Penatti, K. Nogueira, J. A. dos Santos, Do deep features generalize from everyday objects to remote sensing and aerial scenes domains?, in: 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2015, pp. 44–51. doi:10.1109/CVPRW.2015.7301382.

[63] F. P. Luus, B. P. Salmon, F. Van den Bergh, B. T. J. Maharaj, Multiview deep learning for land-use classification, IEEE Geoscience and Remote Sensing Letters 12 (12) (2015) 2448–2452.

[64] F. Zhang, B. Du, L. Zhang, Scene classification via a gradient boosting random convolutional network framework, IEEE Transactions on Geoscience and Remote Sensing 54 (3) (2016) 1793–1802.

[65] T. Ishii, R. Nakamura, H. Nakada, Y. Mochizuki, H. Ishikawa, Surface object recognition with CNN and SVM in Landsat 8 images, in: 2015 14th IAPR International Conference on Machine Vision Applications (MVA), 2015, pp. 341–344. doi:10.1109/MVA.2015.7153200.

[66] N. Kussul, M. Lavreniuk, S. Skakun, A. Shelestov, Deep learning classification of land cover and crop types using remote sensing data, IEEE Geoscience and Remote Sensing Letters 14 (5) (2017) 778–782.

[67] Y. Chen, Z. Lin, X. Zhao, G. Wang, Y. Gu, Deep learning-based classification of hyperspectral data, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 7 (6) (2014) 2094–2107.

[68] G. Wu, X. Shao, Z. Guo, Q. Chen, W. Yuan, X. Shi, Y. Xu, R. Shibasaki, Automatic building segmentation of aerial imagery using multi-constraint fully convolutional networks, Remote Sensing 10 (3) (2018) 407.

[69] Y. Duan, F. Liu, L. Jiao, P. Zhao, L. Zhang, SAR image segmentation based on convolutional-wavelet neural network and Markov random field, Pattern Recognition 64 (2017) 255–267.

[70] F. Mohammadimanesh, B. Salehi, M. Mahdianpari, E. Gill, M. Molinier, A new fully convolutional neural network for semantic segmentation of polarimetric SAR imagery in complex land cover ecosystem, ISPRS Journal of Photogrammetry and Remote Sensing 151 (2019) 223–236.

[71] L. Wang, X. Xu, H. Dong, R. Gui, F. Pu, Multi-pixel simultaneous classification of PolSAR image using convolutional neural networks, Sensors 18 (3) (2018) 769.

[72] M. Ahishali, S. Kiranyaz, T. Ince, M. Gabbouj, Dual and single polarized SAR image classification using compact convolutional neural networks, Remote Sensing 11 (11) (2019) 1340.

[73] Z. Li, Z. Yang, H. Xiong, Homogeneous region segmentation for SAR images based on two steps segmentation algorithm, in: Computers, Communications and Systems (ICCCS), International Conference on, IEEE, 2015, pp. 196–200.

[74] V. Badrinarayanan, A. Kendall, R. Cipolla, SegNet: A deep convolutional encoder-decoder architecture for image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence 39 (12) (2017) 2481–2495.

[75] H. Zhao, J. Shi, X. Qi, X. Wang, J. Jia, Pyramid scene parsing network, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2881–2890.

[76] C. Yu, J. Wang, C. Peng, C. Gao, G. Yu, N. Sang, BiSeNet: Bilateral segmentation network for real-time semantic segmentation, in: Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 325–341.

[77] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, H. Adam, Encoder-decoder with atrous separable convolution for semantic image segmentation, arXiv preprint arXiv:1802.02611 (2018).

[78] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, A. L. Yuille, DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs, IEEE Transactions on Pattern Analysis and Machine Intelligence 40 (4) (2018) 834–848.

[79] O. Ronneberger, P. Fischer, T. Brox, U-Net: Convolutional networks for biomedical image segmentation, in: Medical Image Computing and Computer-Assisted Intervention (MICCAI), Vol. 9351 of LNCS, Springer, 2015, pp. 234–241 (available on arXiv:1505.04597 [cs.CV]). URL http://lmb.informatik.uni-freiburg.de/Publications/2015/RFB15a.

[80] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, H. Adam, MobileNets: Efficient convolutional neural networks for mobile vision applications, arXiv preprint arXiv:1704.04861 (2017).

[81] T. Pohlen, A. Hermans, M. Mathias, B. Leibe, Full-resolution residual networks for semantic segmentation in street scenes, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4151–4160.

[82] S. Jégou, M. Drozdzal, D. Vazquez, A. Romero, Y. Bengio, The one hundred layers tiramisu: Fully convolutional DenseNets for semantic segmentation, in: Computer Vision and Pattern Recognition Workshops (CVPRW), 2017 IEEE Conference on, IEEE, 2017, pp. 1175–1183.

[83] O. Antropov, Y. Rauste, A. Lönnqvist, T. Häme, PolSAR mosaic normalization for improved land-cover mapping, IEEE Geoscience and Remote Sensing Letters 9 (6) (2012) 1074–1078.

[84] J. Long, E. Shelhamer, T. Darrell, Fully convolutional networks for semantic segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431–3440.

[85] D. H. Hubel, T. N. Wiesel, Receptive fields, binocular interaction and functional architecture in the cat's visual cortex, The Journal of Physiology 160 (1) (1962) 106–154.

[86] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al., ImageNet large scale visual recognition challenge, International Journal of Computer Vision 115 (3) (2015) 211–252.

[87] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.

[88] G. Huang, Z. Liu, L. Van Der Maaten, K. Q. Weinberger, Densely connected convolutional networks, in: CVPR, Vol. 1, 2017, p. 3.

[89] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, Going deeper with convolutions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1–9.

[90] S. Ji, W. Xu, M. Yang, K. Yu, 3D convolutional neural networks for human action recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (1) (2013) 221–231.

[91] T. N. Sainath, A.-r. Mohamed, B. Kingsbury, B. Ramabhadran, Deep convolutional neural networks for LVCSR, in: Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, IEEE, 2013, pp. 8614–8618.

[92] D. Small, L. Zuberbühler, A. Schubert, E. Meier, Terrain-flattened gamma nought Radarsat-2 backscatter, Canadian Journal of Remote Sensing 37 (5) (2012) 493–499.

[93] P. Härmä, R. Teiniranta, M. Törmä, R. Repo, E. Järvenpää, M. Kallio, The production of Finnish CORINE land cover 2000 classification, XXth ISPRS Congress, Istanbul, Turkey (2004).

[94] P. Härmä, R. Teiniranta, M. Törmä, R. Repo, E. Järvenpää, M. Kallio, Finnish CORINE land cover 2000 classification, XXth ISPRS Congress, Anchorage, US (2004).

[95] A. Garcia-Garcia, S. Orts-Escolano, S. Oprea, V. Villena-Martinez, J. Garcia-Rodriguez, A review on deep learning techniques applied to semantic segmentation, arXiv preprint arXiv:1704.06857 (2017).

[96] F. Chollet, Xception: Deep learning with depthwise separable convolutions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1251–1258.

[97] O. Ronneberger, P. Fischer, T. Brox, U-Net: Convolutional networks for biomedical image segmentation, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2015, pp. 234–241.

[98] L.-C. Chen, G. Papandreou, F. Schroff, H. Adam, Rethinking atrous convolution for semantic image segmentation, arXiv preprint arXiv:1706.05587 (2017).

[99] Y. Bengio, Deep learning of representations for unsupervised and transfer learning, in: Proceedings of ICML Workshop on Unsupervised and Transfer Learning, 2012, pp. 17–36.

[100] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, et al., TensorFlow: A system for large-scale machine learning, in: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), 2016, pp. 265–283.

[101] H. Costa, G. M. Foody, D. S. Boyd, Supervised methods of image segmentation accuracy assessment in land cover mapping, Remote Sensing of Environment 205 (2018) 338–351.

[102] G. Csurka, D. Larlus, F. Perronnin, F. Meylan, What is a good evaluation measure for semantic segmentation?, in: BMVC, Vol. 27, Citeseer, 2013, p. 2013.

[103] J. Cohen, A coefficient of agreement for nominal scales, Educational and Psychological Measurement 20 (1) (1960) 37–46.

[104] O. Antropov, Y. Rauste, T. Häme, Volume scattering modeling in PolSAR decompositions: Study of ALOS PALSAR data over boreal forest, IEEE Transactions on Geoscience and Remote Sensing 49 (10) (2011) 3838–3848.


from the Sentinel-1 satellites

1.1. Land Cover Mapping with SAR Imagery

While using optical satellite data is still mainstream in land cover and land cover change mapping [16, 17, 18, 19, 5], SAR data has been getting more attention as more suitable sensors appear. To date, several studies have investigated the suitability of SAR for land cover mapping, focusing primarily on L-band, C-band, and X-band polarimetric [20, 21], multitemporal and multi-frequency SAR [22, 23], as well as on the combined use of SAR and optical data [24, 25, 26, 27, 28].

Independently of the imagery used, the majority of land cover mapping methods so far have been based on traditional supervised classification techniques [29]. Widely used classifiers are support vector machines (SVM), decision trees, random forests (RF), and maximum likelihood classifiers (MLC) [9, 7, 30, 29]. However, extracting the large number of features needed for classification, i.e., the feature engineering process, is time-intensive and requires a lot of expert work in developing and fine-tuning classification approaches. This limits the application of traditional supervised classification methods on a large scale.

Backscattered microwave radiation is composed of multiple fundamental scattering mechanisms, determined by the vegetation water content, surface roughness, soil moisture, horizontal and vertical structure of the scatterers, as well as the imaging geometry during the datatake. Accordingly, a considerable number of classes can be differentiated in SAR images [31, 20]. However, the majority of SAR classification algorithms use fixed SAR observables (e.g., polarimetric features) to infer specific land cover classes, despite the large temporal, seasonal, and environmental variability between different geographical sites. This leads to a lack of generalisation capability and a need to use extensive and representative reference data and SAR data. The latter means the need to account not only for all variation of SAR signatures for a specific class, but also for seasonal effects, such as changes in the moisture of soil and vegetation, as well as the frozen state of land [32], that strongly affect SAR backscatter. On the other hand, when using multitemporal approaches, such seasonal variation can be used as an effective discriminator among different land cover classes.

When using exclusively SAR data for land cover mapping, the reported accuracies often turn out to be relatively low for operational land cover mapping and change monitoring. Methodologically, the reported solutions utilised supervised approaches linking SAR observables and class labels to pixels, superpixels, or objects, in a parametric or nonparametric manner [20, 21, 33, 31, 19, 34, 35, 36, 37, 38, 39, 40, 41].

However, tackling a relatively large number of classes was considered only in several studies, often with relatively low reported accuracies. For instance, in [42] it was found that P-band PolSAR imagery was unsatisfactory for mapping more than five classes with the iterated conditional mode (ICM) contextual classifier applied to several polarimetric parameters. They achieved a Kappa value of 0.768 when mapping four classes. The classification performance of L-band ALOS PALSAR and C-band RADARSAT-2 images was compared in the


moist tropics [43]: L-band provided 72.2% classification accuracy for a coarse land cover classification system, and C-band only 54.7%. In a similar study in Lao PDR, ALOS PALSAR data were found to be mostly useful as a back-up option to optical ALOS AVNIR data [19]. Multitemporal RADARSAT-1 data with HH polarisation and ENVISAT ASAR data with VV polarisation (both C-band) were studied for the classification of five land cover classes in Korea, with moderate accuracy [44]. Waske et al. [30] applied boosted decision trees and random forests to multi-temporal C-band SAR data, reaching accuracies of up to 84%. Several studies [21, 20] investigated specifically the suitability of SAR for the boreal zone, with reported accuracies of up to 83%, depending on the classification technique (maximum likelihood, probabilistic neural networks, etc.), when five super-classes (based on CORINE data) were used.

The potential of Sentinel-1 imagery for CORINE-type thematic mapping was assessed in a study that used Sentinel-1A data for mapping class composition in Thuringia [31]. Long time series of Sentinel-1 SAR data are considered especially suitable for crop type mapping [45, 46, 47, 48], with an increasing number of studies attempting land cover mapping in general [49, 50].

Moreover, as Sentinel-1 data are presently the only free source of SAR data routinely available for wide-area mapping at no cost to users, they seem the best candidate data for the development and testing of improved classification approaches. Previous studies indicate a necessity for developing and testing new methodological approaches that can be effectively applied on a large scale and that deal with the variability of SAR observables concerning ecological land cover classes. We suggest adopting state-of-the-art deep learning approaches for this purpose.

1.2. Deep Learning in Remote Sensing

The advances in deep learning techniques for computer vision, in particular Convolutional Neural Networks (CNNs) [12, 51], have led to the application of deep learning in several domains that rely on computer vision. Examples are self-driving cars, image search engines, medical diagnostics, and augmented reality. Deep learning approaches are starting to be adopted in the remote sensing domain as well.

Zhu et al. [52] provide a discussion of the specificities of remote sensing imagery (compared to ordinary RGB images) that result in specific deep learning challenges in this area. For example, remote sensing data are georeferenced, often multi-modal, and have particular imaging geometries; there are interpretation difficulties, and the ground-truth or labelled data needed for deep learning is still often lacking. Additionally, most of the state-of-the-art CNNs are developed for three-channel input images (i.e., RGB), and so certain adaptations are needed to apply them to remote sensing data [53].

Nevertheless, several research papers tackling remote sensing imagery with deep learning techniques have been published in recent years. Zhang et al. [54] review the field and find applications to image preprocessing [55], target recognition [56, 57], classification [58, 59, 60], and semantic feature extraction and scene understanding [61, 62, 63, 64]. The deep learning approaches are found to


outperform the standard methods applied up until several years ago, i.e., SVMs and RFs [65, 66].

When it comes to deep learning for land cover or land use mapping, applications have been limited to optical satellite [53, 67, 59] or aerial [68] imagery and hyperspectral imagery [60, 67], owing to the similarity of these images to the ordinary RGB images studied in computer vision [53].

When it comes to SAR images, Zhang et al. [54] found that there is already significant success in applying deep learning techniques to object detection and scene understanding. However, for classification on SAR data, applications are scarce and advances are yet to be achieved [54]. Published research includes deep learning for crop type mapping using combined optical and SAR imagery [66], as well as the use of SAR images exclusively [69]. However, those methods applied deep learning only to some part of the task at hand, and not in an end-to-end fashion. Wang et al. [59], for instance, used deep neural networks only for merging over-segmented elements, which are produced using traditional segmentation approaches. Similarly, Tuia et al. [60] applied deep learning to extract hierarchical features, which they further fed into a multiclass logistic classifier. Duan et al. [69] first used unsupervised deep learning and then continued with a couple of supervised labelling tasks. Chen et al. [67] applied a deep learning technique (stacked autoencoders) to discover the features, but then still used traditional machine learning (SVM, logistic regression) for the image segmentation. Unlike those methods, we applied deep learning in an end-to-end fashion, i.e., from supervised feature extraction to land class prediction. This makes our approach more flexible, robust, and adaptable to SAR data from new regions, as well as more efficient.

When it comes to end-to-end approaches for SAR classification, there are several studies where the focus was on a small area and on a specific land cover mapping task. For instance, Mohammadimanesh et al. [70] used fully polarimetric SAR (PolSAR) imagery from RADARSAT-2 to classify wetland complexes, for which they developed a specifically tailored semantic segmentation model. However, the authors tackled a small test area (around 10 km × 10 km) and did not explore how their model generalises to other types of areas. Similarly, Wang et al. [71] adapted existing CNN models into a fixed-feature-size CNN that they evaluated on small-scale RADARSAT-2 or AIRSAR (i.e., airborne) SAR data. In both cases, they used more advanced, fully polarimetric SAR imagery at better resolution, as opposed to Sentinel-1, which means imagery with more input information for the deep learning models. Importantly, it is only Sentinel-1 that offers open operational data with up to a 6-day repeat. Because of this, the discussed approaches, developed and tested specifically for PolSAR imagery at a higher resolution, cannot yet be considered applicable for wide-area mapping. Similarly, Ahishali et al. [72] applied end-to-end approaches to SAR data. They also worked with single-polarised COSMO-SkyMed imagery. However, all the imagery they considered was X-band SAR, contrary to the C-band imagery we use here, and again only on a small scale. The authors proposed a compact CNN model that they found outperformed some of the off-the-shelf CNN methods, such as Xcep-


tion and Inception-ResNet-v2. It is important to note that, compared to those, the off-the-shelf models that we consider here are more sophisticated semantic segmentation models, some of which employ Xception or ResNet, but only as a module in their feature extraction parts.

In summary, the capabilities of deep learning approaches for classification have been investigated to a lesser extent for SAR imagery than for optical imagery. The attempts to use SAR data for land cover classification have been relatively limited in scope, area, or the number of used SAR scenes. In particular, wide-area land cover mapping was never addressed. The reasons for this include the comparatively poor availability of SAR data compared to optical data (greatly changed since the advent of Sentinel-1), complex scattering mechanisms leading to ambiguous SAR signatures for different classes (which makes SAR image segmentation more difficult than optical image segmentation [73]), as well as the speckle noise caused by the coherent nature of the SAR imaging process.

1.3. Study goals

The present study addresses the identified research gap of the lack of wide-area land cover mapping using SAR data. We achieve this by training, fine-tuning, and evaluating a set of suitable state-of-the-art deep learning models from the class of semantic segmentation models, and demonstrating their suitability for land cover mapping. Moreover, our work is the first to examine and demonstrate the suitability of deep learning for land cover mapping from SAR images on a large scale, i.e., across a whole country.

Specifically, we applied the semantic segmentation models to the SAR images taken over Finland. We focused on the images of Finland because there is a land cover mask of a suitable resolution that can be used for training labels (i.e., CORINE). The training is performed with the seven selected models (SegNet [74], PSPNet [75], BiSeNet [76], DeepLabV3+ [77, 78], U-Net [79, 80], FRRN-B [81], and FC-DenseNet [82]), which have encoder modules pre-trained on the large RGB image corpus ImageNet 2012.2 Those models are freely available.3 In other words, we reused semantic segmentation architectures developed for natural images, with weights pre-trained on RGB images, and we fine-tuned them on the SAR images. Our results (with over 90% overall accuracy) demonstrate the effectiveness of the deep learning methods for land cover mapping with SAR data.

In addition to having the high-resolution CORINE map that can serve as ground truth (labels) for training the deep learning models, another reason we selected Finland is that it is a northern country with frequent cloud cover, which means that using optical imagery for wide-area mapping is often not feasible. Hence, demonstrating the usability of radar imagery for land cover mapping is particularly useful here.

2 http://image-net.org/challenges/LSVRC/2012/
3 https://github.com/tensorflow/models/tree/master/research/slim#pre-trained-models


Even though Finland is a relatively small country, there is still considerable heterogeneity present in terms of land cover types and how they appear in the SAR images. Namely, SAR backscattering is sensitive to several factors that likely differ between countries or between distant areas within a country. Examples of such factors are moisture levels, terrain variation and soil roughness, the predominant forest biome and tree species proportions, the types of shorter vegetation and crops in agricultural areas, and specific types of built environments. We did not confine our study to a particular area of Finland where the SAR signatures might be consistent, but obtained the images across a wide area. Hence, demonstrating the suitability of our methods in this setting hints at their potential generalisability. Namely, it means that, similarly as we did here, the semantic segmentation models can be fine-tuned and adapted to work on data from other regions or countries with different SAR signatures.

On the other hand, we took into account that the same areas will appear somewhat different in the SAR images across different seasons. Scattering characteristics of many land cover classes change considerably between the summer and winter months, and sometimes even within weeks during seasonal changes [83, 20]. These include snow cover and melting, freeze/thaw of soils, ice on rivers and lakes, the crop growing cycle, and leaf-on and leaf-off conditions in deciduous trees. Because of this, in the present study we focused only on scenes acquired during the summer season. However, we did allow our training dataset to contain several images of the same area taken at different times during the summer season. This way, not only spatial but also temporal variation of SAR signatures is introduced.

Our contributions can be summarised as follows:

C1 We thoroughly benchmarked seven selected state-of-the-art semantic segmentation models, covering a diverse set of approaches, for land cover mapping using Sentinel-1 SAR imagery. We provide insights on the best models in terms of both accuracy and efficiency.

C2 Our results demonstrate the power of deep learning models, along with SAR imagery, for accurate wide-area land cover mapping in the cloud-obscured boreal zone and polar regions.

2. Deep Learning Terminology

As with other representation learning models, the power of deep learning models comes from their ability to learn rich features (representations) from the dataset automatically [15]. The automatically learned features are usually better suited to the classifier or other task at hand than hand-engineered features. Moreover, thanks to the large number of layers employed, it has been proven that deep learning networks can discover hierarchical representations, so that the higher-level representations are expressed in terms of the lower-level, simpler ones. For example, in the case of images, the low-level representations that can be discovered are edges, and using them, the mid-level ones can be


Table 1: Terminology for the main tasks in computer vision and its use in the deep learning versus remote sensing communities.

- Deep learning: Classification [13]. Remote sensing: Image Annotation, Scene Understanding, Scene Classification. Task description: assigning a whole image to a class based on what is (mainly) represented in it, for example, a ship, oil tank, sea, or land.

- Deep learning: Object Detection / Localization / Recognition [15]. Remote sensing: Automatic Target Recognition. Task description: detecting (and localising) the presence of particular objects in an image. These algorithms can detect several objects in the given image; for instance, ship detection in SAR images.

- Deep learning: Semantic Segmentation [84]. Remote sensing: Image Classification, Clustering. Task description: assigning a class to each pixel in an image, based on which image object or region it belongs to. These algorithms not only detect and localise objects in the image, but also output their exact areas and boundaries.

expressed, such as corners and shapes, and this helps to express the high-level representations, such as object elements and their identities [15].

The deep learning models in computer vision can be grouped according to their main task into three categories. In Table 1, we provide a description of those categories. However, the deep learning terminology for those tasks does not always correspond well to the terminology used in the remote sensing community. Relevant to our task, a number of remote sensing studies use the term classification in the context of land cover mapping, inherently meaning pixel- or region-based classification, which in the deep learning terminology corresponds to semantic segmentation. In Table 1, we list the corresponding terminology that we encountered being used for each task in both the deep learning and remote sensing communities. This is helpful to disambiguate when talking about different tasks, and to recognise when talking about the same tasks, in the two domains. In the present study, the focus is on land cover mapping. Hence, we tackle semantic segmentation in the deep learning terminology and image classification, i.e., pixel-wise classification, in the remote sensing terminology.

Convolutional Neural Networks (CNNs) [12, 13] are the deep learning model


that has transformed the computer vision field. Initially, CNNs were defined to tackle the image classification (deep learning terminology) task. Their structure is inspired by the visual perception of mammals [85]. CNNs are named after one of the most important operations, which is particular to them compared to other neural networks, i.e., convolutions. Mathematically, a convolution is a combination of two other functions. A convolution is applied to the image by sliding a filter (kernel) of a given size k × k, which is usually small compared to the original image size. Different-purpose filters are designed; for example, a filter can serve as a vertical edge detector. Application of such a convolution operation on an image results in a feature map. Another common operation, usually applied after a convolution, is pooling. Pooling reduces the size of the feature map while providing robustness to the extracted features. Common CNNs end with a fully connected layer, which is used for final predictions, commonly for image classification. By employing a large number of convolutional layers (depth), CNNs are able to extract gradually more complex and abstract features. The first CNN model to demonstrate impressive effectiveness in image classification (of hand-written digits) was LeNet [12]. Several years later, Krizhevsky et al. [13] developed AlexNet, the deep CNN that dramatically pushed the limits of classification accuracy on the famous ImageNet computer vision challenge [86]. Since then, a variety of CNN-based models have been proposed. Some notable examples are the VGG network [14], ResNet [87], DenseNet [88], and Inception V3 [89]. The effectiveness of CNNs has also been proven in various real-world applications [90, 91].
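As a small, self-contained illustration of these two operations (our own sketch, not code from the study), consider a single convolution with a hand-designed vertical edge detector, followed by 2 × 2 max pooling:

    import numpy as np

    def conv2d(image, kernel):
        # 'Valid' 2D convolution: slide the k x k kernel over the image and
        # take a dot product at every position, producing a feature map.
        # (As is conventional in deep learning, the kernel is not flipped.)
        k = kernel.shape[0]
        h, w = image.shape[0] - k + 1, image.shape[1] - k + 1
        out = np.empty((h, w))
        for i in range(h):
            for j in range(w):
                out[i, j] = np.sum(image[i:i + k, j:j + k] * kernel)
        return out

    def max_pool(fmap, size=2):
        # 2 x 2 max pooling: shrink the feature map while keeping the
        # strongest local responses, adding robustness to small shifts.
        h, w = fmap.shape[0] // size, fmap.shape[1] // size
        return fmap[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

    vertical_edge = np.array([[1, 0, -1],
                              [1, 0, -1],
                              [1, 0, -1]])  # a hand-designed vertical edge detector
    image = np.random.rand(8, 8)            # stand-in for a small image patch
    feature_map = conv2d(image, vertical_edge)   # shape (6, 6)
    pooled = max_pool(feature_map)               # shape (3, 3)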

Once CNNs had proven their effectiveness in classifying images, Long et al. [84] were the first to discover how a given CNN model can be augmented to make it suitable for the semantic segmentation task: they proposed the Fully Convolutional Neural Network (FCN) framework. This generic architecture can be used to adapt any CNN network used for classification into a segmentation model. Namely, the authors have shown that, by replacing the last fully connected layer with an appropriate convolutional layer, so that the output layer upsamples and restores the resolution of the input, CNNs can be transformed to classify each individual pixel (instead of the whole image). The basic idea is as follows. The encoder is used to learn the feature maps, and is usually based on a pre-trained deep CNN for classification, such as ResNet, VGG, or Inception. The decoder part serves to upsample the discriminative features that the encoder has learned, from the coarse-level feature map to the fine pixel level. Long et al. [84] have shown that this upsampling (backward) computation can be efficiently performed using backward convolutions (deconvolutions). Moreover, this means that specific CNN models, such as those mentioned above, can all be incorporated in the FCN framework for segmentation, giving rise to FCN-AlexNet [84], FCN-ResNet [87], FCN-VGG16 [84], FCN-DenseNet [82], etc. We present a diagram of the generic FCN architecture in Figure 1.
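The following fragment (an illustrative sketch under our own assumed shapes, not the exact FCN implementation) shows the core idea: class scores from a 1 × 1 convolution on the coarse encoder output, upsampled back to the input resolution by a backward (transposed) convolution:

    import tensorflow as tf

    NUM_CLASSES = 5

    # Coarse feature map from a hypothetical encoder with output stride 32,
    # e.g. 16 x 16 x 2048 for a 512 x 512 input.
    coarse = tf.keras.Input(shape=(16, 16, 2048))

    # The fully connected classifier is replaced by a 1 x 1 convolution,
    # yielding per-location class scores.
    scores = tf.keras.layers.Conv2D(NUM_CLASSES, 1)(coarse)

    # Backward (transposed) convolution restores the input resolution, so
    # every pixel receives a class score: output is 512 x 512 x NUM_CLASSES.
    upsampled = tf.keras.layers.Conv2DTranspose(
        NUM_CLASSES, kernel_size=64, strides=32, padding="same")(scores)

    fcn_head = tf.keras.Model(coarse, upsampled)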


Figure 1. The architecture of Fully Convolutional Neural Networks (FCNs) [84].

3. Materials and methods

Here we first describe the study site, the SAR data, and the reference data. This is followed by an in-depth description of the deep learning terminology and the models used in the study. We finish with the description of the experimental setup and the evaluation metrics.

3.1. Study site

Our study site covers the area of Finland at latitudes from 61° to 67.5°. The processed area is shown in Figure 2. The study area includes central and northern areas of Finland, covered primarily by boreal forestland with inclusions of water bodies (primarily lakes), urban settlements, and agricultural areas, as well as marshland and open bogs. We omitted Lapland due to potential snow cover during the months of data acquisition. The terrain height variation is moderate, mostly within the 100–300 m range.

3.2. SAR data

Presently, Sentinel-1 is a C-band SAR dual-satellite system, with the two satellites orbiting 180° apart [11], launched in 2014 and 2016, respectively. The operational acquisition modes are Stripmap (SM), Interferometric Wide-Swath (IW), Extra Wide Swath (EW), and Wave Mode (WV). The IW mode is the default mode over land, providing a 250 km wide swath composed of three sub-swaths, with a single-look image at 5 m by 20 m spatial resolution. It uses the so-called TOPS (Terrain Observation with Progressive Scan) SAR mode.

The SAR data acquired by the Sentinel-1 satellites in IW mode were used in this study. Altogether, 14 Sentinel-1A images acquired during the summers of 2015 and 2016 were used, more concretely during June, July, and August of those two years. Their geographical coverage is schematically shown in Figure 2.


Figure 2. Study area in Finland with reference CORINE land cover data and schematic location of areas used for model training and accuracy assessment.


Original scenes were downloaded as Level-1 Ground Range Detected (GRD) products. These represent focused SAR data that have been detected, multi-looked, and projected to ground range using an Earth ellipsoid. They were orthorectified in the ESA SNAP S1TBX software using a local digital terrain model (with 2 m resolution) available from the National Land Survey of Finland, with the pixel spacing of the orthorectified scenes set to 20 m. Orthorectification included terrain flattening to obtain backscatter in the gamma-nought format [92]. The scenes were further re-projected to the ETRS89 / ETRS-TM35FIN projection (EPSG:3067) and resampled to a final pixel size of 20 m.

3.3. Reference data

In Finland, the Finnish Environment Institute (SYKE) is responsible for the production of the CORINE maps. While for most of the EU territory the CORINE mask is available at 100 m × 100 m spatial resolution, national institutions may choose to create more precise maps; SYKE, in particular, has produced a 20 m × 20 m spatial resolution mask for Finland (Figure 3). Since then, updates have been produced regularly; the latest one available at the time of this study, which we used, is CLC2012. There are 48 different land use classes in the map, which can be hierarchically grouped into 4 CLC levels. In detail, there are 30 classes on CLC Level-3, 15 classes on CLC Level-2, and 5 top CLC Level-1 classes. According to the information provided by SYKE, the accuracy of CLC Level-3 is 61%, of CLC Level-2 83%, and of CLC Level-1 93%. The selected classes and their corresponding color codes used for our segmentation results are shown in Table 2.

Until the most recent CORINE production round, EEA member countries adopted national approaches for the production of CORINE. The EEA technical guidelines include manual digitalization of land cover change based on visual interpretation of optical satellite imagery. In Finland, the European CLC was not applicable for the majority of national users due to its large minimal mapping unit (MMU); thus, a national version was produced with a somewhat modified nomenclature of classes [93, 94]. The national high-resolution CLC2012 data are in 20 m raster format, with a corresponding MMU. In the provision of the 2012 update of CLC, obtaining optical imagery over Scandinavia and Britain was particularly challenging because of the frequent clouds, thus calling for the use of radar imagery to meet user requirements on accuracy and coverage [31]. The CORINE map itself is built from high-resolution satellite images acquired primarily during the summer and, to a smaller extent, during the spring months [2].

3.4. Semantic Segmentation Models

We selected the following seven state-of-the-art [95] semantic segmentation models to test for our land cover mapping task: SegNet [74], PSPNet [75], BiSeNet [76], DeepLabV3+ [77, 78], U-Net [79, 80], FRRN-B [81], and FC-DenseNet [82]. The models were selected to cover a diverse set of approaches to semantic segmentation. In the following, we describe the specific architecture of each of these DL models.


Table 2. CORINE CLC Level-1 classes and their color codes used in our classification results.

class                               R    G    B    color
Water bodies (500)                  0    191  255  blue
Peatland, bogs and marshes (400)    173  216  230  light blue
Forested areas (300)                127  255  0    green
Agricultural areas (200)            222  184  135  brown
Urban fabric (100)                  128  0    0    red

Figure 3. Zoomed-in area fragment with our reference data, i.e., CORINE, shown on top (left), along with the Google Earth layer (right).

We will use the following common abbreviations: conv for the convolution operation, concat for concatenation, max pool for the max pooling operation, BN for batch normalisation, and ReLU for the rectified linear unit activation function.

3.4.1. BiSeNet (Bilateral Segmentation Network)

The BiSeNet model is designed to decouple the functions of encoding additional spatial information and enlarging the receptive field, both of which are fundamental to achieving good segmentation performance. As can be seen in Figure 4, there are two main components in this model: the Spatial Path (SP) and the Context Path (CP). The Spatial Path serves to encode rich spatial information. The Context Path serves to provide a sufficient receptive field, and uses global average pooling and a pre-trained Xception [96] or ResNet [87] as the backbone. The goal of the creators was not only to obtain superior performance, but to achieve a balance between speed and performance; hence, BiSeNet is a relatively fast semantic segmentation model.

3.4.2. SegNet (Encoder-Decoder-Skip)

Similarly to BiSeNet, SegNet is also designed with computational performance in mind, this time particularly during inference. Because of this, the network has a significantly smaller number of trainable parameters compared to most of the other architectures. The encoder in SegNet is based on VGG16: it consists of its first 13 convolutional layers, while the fully connected layers are omitted.


Figure 4. The architecture of BiSeNet. ARM stands for the Attention Refinement Module and FFM for the Feature Fusion Module introduced in the model's paper [76].

Figure 5. The architecture of the SegNet-based Encoder-Decoder with Skip connections [74]. Blue tiles represent Convolution + Batch Normalisation + ReLU, green tiles represent Pooling, red – Upsampling, and yellow – a softmax operation.

Hence, the novelty of this network lies in its decoder part, as follows. The decoder consists of one decoder layer for each encoder layer, so it also has 13 layers. Each individual decoder layer utilizes max-pooling indices memorized from its corresponding encoder feature map. The authors showed that this enhances boundary delineation between classes. Finally, the decoder output is sent to a multi-class softmax function, yielding a classification for each pixel (see Figure 5).


3.4.3. Mobile U-Net

Mobile U-Net is based on the U-Net [97] semantic segmentation architecture shown in Figure 6. In designing U-Net, the fully convolutional approach was generally employed, with the following modification: the upsampling part of the architecture has no fully connected layer, but is nearly symmetrical to the feature extraction part, due to the use of similar feature maps. This results in a u-shaped architecture (see Figure 6), hence the name of the model. While originally developed for biomedical images, the U-Net architecture has proven successful for image segmentation in other domains as well. Here, we somewhat modify the U-Net architecture according to the MobileNets [80] framework to improve its efficiency. In particular, the MobileNets framework uses Depthwise Separable Convolutions, a form which factorizes standard convolutions (e.g., 3 × 3) into a depthwise convolution (applied separately to each input band) and a pointwise (1 × 1) convolution that combines the outputs of the depthwise convolution, as illustrated below.
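A minimal sketch of this factorization (the channel count is illustrative):

    import tensorflow as tf

    # Standard 3x3 convolution: spatial filtering and cross-channel mixing in one step
    standard = tf.keras.layers.Conv2D(64, 3, padding="same")

    # MobileNets-style factorization: a 3x3 depthwise convolution applied
    # separately to each input band, followed by a 1x1 pointwise convolution
    # that combines the depthwise outputs
    separable = tf.keras.Sequential([
        tf.keras.layers.DepthwiseConv2D(3, padding="same"),
        tf.keras.layers.Conv2D(64, 1),
    ])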

Figure 6. The architecture of U-Net [97].

3.4.4. DeepLab-V3+

DeepLab-V3+ [77] is an improved version of DeepLab-V3 [98], while the latter is an improved version of the original DeepLab [78] model. This segmentation model does not follow the FCN framework like the previously discussed models.


The main features that distinguish the DeepLab model from FCNs are the atrous convolutions for upsampling and the application of probabilistic machine learning models, concretely conditional random fields (CRFs), for finer localization accuracy in the final fully connected layer. Atrous convolutions in particular allow enlarging the context from which the next-layer feature maps are learned, while preserving the number of parameters (and thus the same efficiency). Using a chain of atrous convolutions allows computing the final output layer of a CNN at an arbitrarily high resolution (removing the need for the upsampling part used in FCNs). In the follow-up work proposing DeepLab-V3, Chen et al. [98] change the approach to gradually doubling the atrous rates, and show that the adapted version of the algorithm outperforms the previous one even without including the fully connected CRF layer. Finally, in the newest adaptation of the model, called DeepLab-V3+, Chen et al. [77] turn to an approach similar to the FCNs, i.e., they add a decoder module to the architecture (see Figure 7). That is, they employ the features extracted by the DeepLab-V3 module in the encoder part, and add a decoder module consisting of 1 × 1 and 3 × 3 convolutions. The effect of atrous rates is sketched below.

Figure 7. The architecture of DeepLabV3+ [77].
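A brief sketch of atrous convolutions with gradually doubled rates; the filter count and the specific rates here are illustrative, not the exact DeepLab-V3 configuration.

    import tensorflow as tf

    # Each doubling of the dilation rate enlarges the receptive field while
    # keeping the parameter count of a plain 3x3 kernel
    atrous_stack = tf.keras.Sequential([
        tf.keras.layers.Conv2D(256, 3, dilation_rate=2, padding="same"),
        tf.keras.layers.Conv2D(256, 3, dilation_rate=4, padding="same"),
        tf.keras.layers.Conv2D(256, 3, dilation_rate=8, padding="same"),
    ])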

3.4.5. FRRN-B (Full-Resolution Residual Networks)

As we have seen, most of the semantic segmentation architectures are based on some form of FCN, and so they utilize existing classification networks such as ResNet or VGG16 as encoders. We also discussed the main reason for such approaches, which is to take advantage of the learned weights from those architectures, pretrained for the classification task.


Figure 8. The architecture of FRRN-B. RU n and FRRU n stand for residual units and full-resolution residual units with n-channel convolutions, respectively. FRRUs simultaneously operate on the two streams [81].

Nevertheless, one disadvantage of the FCN approach is that the resulting network outputs of the encoder part (particularly after the pooling operations) are at a lower resolution, which deteriorates the localization performance of the overall segmentation model. Pohlen et al. [81] proposed to tackle this by having two parallel network streams processing the input image: a pooling stream and a residual stream (Figure 8). As the name says, the pooling stream performs successive pooling and then unpooling operations, and serves to obtain good recognition of objects and classes. The residual stream computes residuals at the full image resolution, which ensures that low-level features, i.e., object pixel-level locations, are propagated to the network output. The name of the model comes from its building blocks, i.e., full-resolution residual units. Each such unit simultaneously operates on the pooling and the residual stream. In the original paper [81], the authors propose two alternative architectures, FRRN-A and FRRN-B, and show that FRRN-B achieves superior performance on the Cityscapes benchmark dataset. Hence, we employ the FRRN-B architecture.


3.4.6. PSPNet (Pyramid Scene Parsing Network)

Figure 9. The architecture of PSPNet [75].

Zhao et al. [75] propose Pyramid Scene Parsing as a solution to the challenge of making local predictions based on the local context only, without considering the global image scene. In remote sensing, an example of this challenge would be a model wrongly predicting water with waves in it as the dry vegetation class, because the two appear similar and the model did not consider that these pixels are part of a larger water surface, i.e., it missed the global context. Similarly to the other FCN-based approaches, PSPNet uses a pre-trained classification architecture to extract the feature map, in this case ResNet. The main module of this network is the pyramid pooling, which is enclosed by a square in Figure 9. As can be seen in the figure, this module fuses features at four scales, from the coarse (red) to the fine (green). Hence, the output of each level in the pyramid pooling module contains a feature map of a different resolution. In the end, the different features are stacked together, yielding the final pyramid-pooling global feature used for predictions.

3.4.7. FC-DenseNet (Fully Convolutional DenseNets)

This semantic segmentation algorithm is built using the DenseNet CNN [88] as the basis for the encoder, followed by applying the FCN approach [82]. The specificity of the DenseNet architecture is the presence of blocks in which each layer is connected to all other layers in a feed-forward manner. Figure 10 shows the architecture of FC-DenseNet, where the blocks are represented by the Dense Block units. According to [88], such an architecture scales well to hundreds of layers without any optimization issues, while yielding excellent results in classification tasks. In order to efficiently upsample the DenseNet feature maps, Jegou et al. [82] substitute the upsampling convolutions of FCNs with Dense Blocks and Transitions Up. The Transition Up modules consist of transposed convolutions, which are then concatenated with the outputs from the input skip connection (the dashed lines in Figure 10).


Figure 10. The architecture of FC-DenseNet [82].

3.5. Training approach

To accomplish better segmentation performance, one can pre-train the semantic segmentation models (in particular, their encoder modules) using a larger set of available images of another type (such as natural images). By taking a model pre-trained on natural images and continuing training with the limited set of SAR images, knowledge becomes effectively transferred from the natural-image task to the SAR task [99]. To accomplish such a transfer, we used models whose encoders were pre-trained for the ImageNet classification task and fine-tuned them using our SAR dataset (described next), along the lines of the sketch below.
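In Keras-style code, this transfer amounts to loading ImageNet weights into the encoder and leaving all layers trainable; the choice of ResNet101 here is only an example (see Table 3 for the encoders used per model).

    import tensorflow as tf

    # Load ImageNet-pretrained weights and fine-tune (rather than freeze) them
    encoder = tf.keras.applications.ResNet101(include_top=False, weights="imagenet")
    encoder.trainable = True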

3.6. Experimental Setup

In this section, we first describe how we prepared the SAR images for training with the deep learning models, which are designed for natural images. Then we provide the details of our implementation.

3.6.1. SAR Data Preprocessing for Deep Learning

Sentinel-1 imagery comes in two polarization channels, each of them being informative about certain types of land cover. Hence, using their combination is expected to yield better land cover mapping results than using either of them independently. Moreover, previous work has shown the benefits of also using a DEM for land cover mapping [9]. Hence, as the third layer, we used the DEM of Finland from the National Land Survey.

SAR backscatter for both polarizations was converted to decibels by applying the 10 · log10 transformation. In addition, for the deep learning models, each band should be normalized so that the distribution of the pixel values resembles a Gaussian distribution centered at zero; this is done to yield faster convergence during training. To normalize the data, the mean of all pixels is subtracted from each pixel value, and the result is divided by their standard deviation. Furthermore, given that the semantic segmentation models expect pixel values in the range (0, 255), we scaled the normalized data, and also the DEM values, to this range. The layers preprocessed in this way were then used to create the image dataset for training.

We named the created dataset SAR RGB-DEM. The naming comes from the process used to create the images in this dataset: one of the two channels of a Sentinel-1 image is assigned to the R channel and the other to the G channel, while for the third, B channel, we use the DEM layer, as sketched below.
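A sketch of this preprocessing chain; the variable names and the min-max stretch used for the final (0, 255) scaling are our assumptions.

    import numpy as np

    def db_scale(band):
        # SAR backscatter (linear intensity) to decibels
        return 10.0 * np.log10(band)

    def standardise(band):
        # zero-mean, unit-variance normalisation over the band
        return (band - band.mean()) / band.std()

    def stretch_to_uint8(band):
        # scale to the (0, 255) range expected by the segmentation models
        lo, hi = band.min(), band.max()
        return np.round(255.0 * (band - lo) / (hi - lo)).astype(np.uint8)

    def make_sar_rgb_dem(vv, vh, dem):
        # vv, vh: the two polarisation channels of a scene; dem: the DEM layer
        return np.dstack([
            stretch_to_uint8(standardise(db_scale(vv))),   # R
            stretch_to_uint8(standardise(db_scale(vh))),   # G
            stretch_to_uint8(dem.astype(np.float64))])     # B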

3.6.2. Train/Development and Test (Accuracy Assessment) Dataset

The images from the SAR RGB-DEM dataset needed to be split into 512 px × 512 px partial images (further in the text called imagelets) for training. Each imagelet thus represented an area of roughly 10 × 10 km². The first reason for this preprocessing is the square shape: some of the selected models required square images. Other models were flexible with the image shape and size, but we wanted to make the setups for all the models the same so that their results would be comparable. The second reason for the preprocessing is computational capacity: with our hardware setup (described below), this was the largest image size that we could work with.

Upon splitting the SAR RGB-DEM images, we discarded those imagelets that were completely outside the land mass area, as well as those for which we did not have a complete CORINE label (such as those falling partly outside the Finnish borders). This resulted in more than 7K imagelets of size 512 px × 512 px, produced as sketched below.
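A minimal tiling sketch; reading the rasters and the filtering steps described above are omitted.

    def split_into_imagelets(scene, size=512):
        # Yield non-overlapping size x size tiles from a (H, W, C) array
        height, width = scene.shape[:2]
        for row in range(0, height - size + 1, size):
            for col in range(0, width - size + 1, size):
                yield scene[row:row + size, col:col + size]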

Given the geography of Finland, to have representative training data it seems useful to include imagelets from both the northern and the southern (including the large cities) parts of the country in the model training. On the other hand, noticeable differences are found also in the gradient from the east to the west of the country. To achieve a representative training dataset, we selected all imagelets between the longitudes of 24° and 28° for the accuracy assessment (model testing), and all the other imagelets for the model training (that is, training and development in the computer vision terminology). In this way, we prevented the situation in which two images of the same area, but acquired at different times, are used one for training and the other for testing. Imagelets overlapping any border of the introduced strip were discarded. The procedure, summarized in the sketch below, resulted in 3104 images in the training and development set and 3784 images in the test (accuracy assessment) set. Finally, we used 60% of the training and development set for training and the rest for development of the deep learning models.
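The assignment rule can be written compactly as follows; lon_min and lon_max denote the (illustrative) bounding longitudes of an imagelet.

    def assign_split(lon_min, lon_max):
        # Imagelets fully inside the 24-28 degree strip form the test set,
        # imagelets fully outside go to training/development, and those
        # overlapping a border of the strip are discarded
        if 24.0 <= lon_min and lon_max <= 28.0:
            return "test"
        if lon_max <= 24.0 or lon_min >= 28.0:
            return "train_dev"
        return None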


Table 3. The properties of the examined semantic segmentation architectures.

Architecture    Base model        Parameters
BiSeNet         ResNet101         24.75M
SegNet          VGG16             34.97M
Mobile U-Net    Not applicable    8.87M
DeepLabV3+      ResNet101         47.96M
FRRN-B          ResNet101         24.75M
PSPNet          ResNet101         56M
FC-DenseNet     ResNet101         9.27M

3.6.3. Data Augmentation

Further, we employed data augmentation. The main idea behind data augmentation is to enable improved learning by reusing original images with slight transformations, such as rotation, flipping, adding Gaussian noise, or slightly changing the brightness. This provides additional information to the model, and the dataset size is effectively increased. Moreover, an additional benefit of data augmentation is that it helps the model learn some invariant data properties for which no examples are present in the original dataset. Given the sensitivity of the SAR backscatter, we did not want to augment the images in terms of color or brightness, or by adding noise. However, we could safely employ rotations and flipping. For rotations, we only used 90° increments, giving three possible rotated versions of an image. For image flipping, we applied horizontal and vertical flipping, or both at the same time, giving another three possible versions of the original image (the vertical flip reflects along the central horizontal axis, switching between top-left and bottom-left image origin; the horizontal flip reflects along the central vertical axis, switching between top-left and top-right image origin). Notice that our images are square, so the transformations did not change the image dimensions. Finally, we applied online augmentation, as opposed to the offline version. In the online process, each augmented image is seen only once, and so this process yields a network that generalises better. The transformations are sketched below.
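A sketch of the six additional versions generated per imagelet:

    import numpy as np

    def augmented_versions(imagelet):
        # Rotations by 90, 180 and 270 degrees...
        for k in (1, 2, 3):
            yield np.rot90(imagelet, k)
        # ...plus horizontal, vertical and combined flips
        yield np.fliplr(imagelet)
        yield np.flipud(imagelet)
        yield np.flipud(np.fliplr(imagelet))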

3.6.4. Implementation

To apply the described semantic segmentation models, we adapted the open-source Semantic Segmentation Suite. We used Python with the TensorFlow [100] backend.

3.6.5. Hardware and Training Setup

We trained and tested each of the deep learning models separately on a single GPU (NVIDIA GeForce GTX 1080) on a machine with 32 GB of RAM.

We used the RMSProp optimisation algorithm with a learning rate of 0.0001 and a learning-rate decay of 0.9954. Each model was trained for an equal number of epochs (500), and during the process the checkpoint for the best model was saved. We then used that model for evaluation on the test set, and we report those results. The corresponding training configuration is sketched below.
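A sketch of this configuration in Keras; applying the 0.9954 decay once per epoch and the checkpoint file name are our assumptions.

    import tensorflow as tf

    optimizer = tf.keras.optimizers.RMSprop(learning_rate=1e-4)
    # Multiplicative learning-rate decay applied per epoch
    lr_schedule = tf.keras.callbacks.LearningRateScheduler(
        lambda epoch: 1e-4 * 0.9954 ** epoch)
    # Keep only the best model seen during the 500 training epochs
    checkpoint = tf.keras.callbacks.ModelCheckpoint(
        "best_model.h5", save_best_only=True)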

3.7. Evaluation Metrics

In their review of the metrics used in land cover classification, Costa et al. [101] found a lack of consistency, complicating the intercomparison of different studies. To avoid such issues and ensure that our results are easily comparable with the literature, we thoroughly evaluated our models. For each model and class, we report the following measures of accuracy: precision, also known as user's accuracy (UA); recall, also known as producer's accuracy (PA); the overall accuracy; and the Kappa coefficient. The formulas are as follows.

For each segmentation class (land cover type) c, we calculate precision (user's accuracy)

P_c = \frac{Tp_c}{Tp_c + Fp_c},

and recall (producer's accuracy)

R_c = \frac{Tp_c}{Tp_c + Fn_c},

where Tp_c represents true positive, Fp_c false positive, and Fn_c false negative pixels for the class c.

When it comes to accuracy [102], we calculate the per-class accuracy (effectively, the per-class accuracy is defined as the recall obtained on each class),

Acc_c = \frac{C_{ii}}{G_i},

and the overall pixel accuracy,

Acc_{OP} = \frac{\sum_{i=1}^{L} C_{ii}}{\sum_{i=1}^{L} G_i},

where C_{ij} is the number of pixels having ground truth label i and being classified/predicted as j, G_i is the total number of pixels labelled with i, and L is the number of classes. All these metrics can take values from 0 to 1.

Finally, we also use the Kappa statistic (Cohen's measure of agreement), indicating how the classification results compare to values assigned by chance [103]. The Kappa statistic can take values from 0 to 1. Starting from a k × k confusion matrix with elements f_{ij}, the following calculations are done:

P_o = \frac{1}{N} \sum_{j=1}^{k} f_{jj},    (1)

r_i = \sum_{j=1}^{k} f_{ij} \;\; \forall i, \qquad c_j = \sum_{i=1}^{k} f_{ij} \;\; \forall j,    (2)

P_e = \frac{1}{N^2} \sum_{i=1}^{k} r_i c_i,    (3)

where P_o is the observed proportional agreement (effectively the overall accuracy), r_i and c_j are the row and column totals for classes i and j, and P_e is the expected proportion of agreement. The final measure of agreement is given by the statistic [103]

\kappa = \frac{P_o - P_e}{1 - P_e}.    (4)

Depending on the value of Kappa, the observed agreement is considered either poor (0.0 to 0.2), fair (0.2 to 0.4), moderate (0.4 to 0.6), good (0.6 to 0.8), or very good (0.8 to 1.0). All of these metrics can be computed directly from the confusion matrix, as sketched below.
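A compact sketch computing the above from a confusion matrix:

    import numpy as np

    def evaluation_metrics(conf):
        # conf[i, j]: number of pixels with reference label i predicted as j
        conf = conf.astype(float)
        n = conf.sum()
        diag = np.diag(conf)
        row_totals = conf.sum(axis=1)       # reference totals r_i
        col_totals = conf.sum(axis=0)       # prediction totals c_j
        precision = diag / col_totals       # user's accuracy per class
        recall = diag / row_totals          # producer's accuracy per class
        p_o = diag.sum() / n                # observed agreement (overall accuracy)
        p_e = (row_totals * col_totals).sum() / n ** 2
        kappa = (p_o - p_e) / (1.0 - p_e)
        return precision, recall, p_o, kappa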

4. Results and Discussion

Using the experimental setup described in the previous section, we evaluated the seven selected semantic segmentation models: SegNet [74], PSPNet [75], BiSeNet [76], DeepLabV3+ [77, 78], U-Net [79, 80], FRRN-B [81], and FC-DenseNet [82]. The overall classification performance statistics for all studied models are gathered in Table 4. Figure 11 shows maps produced for several imagelets with the best performing model, FC-DenseNet. In the following, the obtained results are compared to prior work, and the classification performance for the different land cover classes is discussed.

4.1. Classification Performance

All the models performed relatively well in terms of classification, achieving an overall accuracy above 83%. Three models performed particularly well, achieving an accuracy score above 89%: SegNet, FRRN-B, and the best performing model, FC-DenseNet, which achieved an accuracy of 90.7%.

Before further analysis, let us recall that CORINE is not exclusively a land cover map, but rather a land cover and land use map; thus, specific classes can differ from the ecological classes observed by Sentinel-1. Also, the aggregation to Level-1 is sometimes not strictly "ecological", nor does it comply with surface-scattering physics considerations. For example, roads, airports, major industrial areas, and the road network often exhibit areas similar to fields; the presence of trees and green


Table 4. Summary of the classification performance and efficiency of the various deep learning models (UA – user's accuracy, PA – producer's accuracy, both in %; average inference time is per image in the dataset; a dash marks a value not available).

LC classes (UA/PA)                  Test scale (km²)  BiSeNet  DeepLabV3+  SegNet  FRRN-B  U-Net  PSPNet  FC-DenseNet
Urban fabric (100)                  10816.26          21/15    14/36       31/38   30/45   25/38  18/–    62/27
Agricultural areas (200)            25160.49          51/50    49/69       66/68   68/66   66/53  48/–    72/71
Forested areas (300)                285462            90/91    88/96       93/94   92/95   92/95  89/95   93/96
Peatland, bogs and marshes (400)    20990.54          43/56    13/67       57/71   55/70   52/65  31/–    74/58
Water bodies (500)                  53564             85/91    94/92       96/96   95/96   96/96  94/94   96/96
Overall Accuracy (%)                                  83.86    85.49       89.03   89.27   89.25  86.51   90.66
Kappa                                                 0.641    0.649       0.754   0.758   0.754  0.680   0.785
Average inference time (s)                            0.0389   0.0267      0.0761  0.1424  0.0848 0.0495  0.1930

Table 5. Confusion matrix for classification with the FC-DenseNet (FC-DenseNet103) model. Rows: CLC2012 reference; columns: Sentinel-1 classification. PA and UA are in %; the bottom-right value is the overall accuracy.

CLC2012     urban       water        forest       field       peatland     total        PA
urban       7301999     413073       15892771     3212839     221476       27042158     27.0
water       78331       128294872    3457634      171029      1935276      133937142    95.8
forest      3663698     2703632      686788977    12795703    7730444      713682454    96.2
field       766200      121609       16527970     44866048    620934       62902761     71.3
peatland    56097       1866020      19164137     1091008     30309189     52486451     57.8
total       11866325    133399206    741831489    62136627    40817319     990050966
UA          61.5        96.2         92.6         72.2        74.3                      90.7

Figure 11. Illustration of the FC-DenseNet model performance: a selection of classification results, i.e., direct output of the network without any post-processing (bottom row), versus the reference CORINE data (upper row).

vegetation near summer cottages can cause them to exhibit signatures closer to forest than to urban; forest on rocky terrain can sometimes be misclassified as urban due to the presence of very bright targets and strong disruptive features; and confusion between peatland and field areas is also commonplace. Finally, the accuracy of the CORINE data itself is only somewhat higher than 90%.

As for the results across the different land cover classes, all the models performed particularly well in recognising water bodies and forested areas, while urban fabric was the most challenging class for all the models. We expect that the inclusion of the DEM as one layer in the training images helped achieve the good results on the water bodies class for most of the models (except for BiSeNet, all the models achieved both user and producer accuracy above 90%). The urban class was particularly challenging for the following main reasons. First, this class changes the most, as new houses, roads, and urban areas are built. While we took the most suitable available CORINE class in terms of time for our Sentinel-1 images, there are almost certain differences between the urban class as it was in 2012 and in 2015–2016.


Second, the CORINE map itself does not have perfect accuracy, nor are the aggregation rules perfect. As a matter of fact, in the majority of studies where SAR-based classification was evaluated against CLC or similar data, a poor or modest overall agreement was observed for this class [21, 41, 83, 20], while the user's accuracy was strongly higher than the producer's [104]. The latter is exactly due to radar being able to sense sharp boundaries and bright targets very well, whereas such bright targets often do not dominate the whole CORINE Level-1 urban class. We argue that any inaccuracies present will be particularly accentuated in our models for the urban class, because of the sharp and sudden boundary changes in this class, unlike for others such as forest and water. The top performing model, i.e., FC-DenseNet, performed the best across all the classes. It is particularly notable that it achieved a user accuracy, i.e., precision, for the urban class of 62%, improving on it significantly compared to all the other models. Nevertheless, its score on the producer accuracy, i.e., recall, on this class (27%) is outperformed by the two other top models, i.e., SegNet and FRRN-B.

We mentioned earlier the issue of SAR backscatter sensitivity to several ground factors, such that the same classes might appear differently in images from different countries, or from distant areas within a country. An interesting indication of our study, however, is that deep learning models might be able to deal with this issue. Namely, we used models pre-trained on ImageNet and fine-tuned them with a relatively small number (14) of Sentinel-1 scenes. The models learned to recognize varying types of the backscattered signal across the country of Finland. This indicates that, with a similar type of fine-tuning, the present models could be relatively easily adapted to other areas and countries with different SAR backscattering patterns. Such robustness and adaptability of the deep learning models come from their automatic learning of feature representations, without the need for a human to pre-define those features.

4.2. Computational Performance

With our hardware configuration, training took from 6 days up to 2 weeks for the different models. This could be significantly improved by training each model on a multi-GPU system instead of the single GPU we used.

In terms of inference time, we also saw differences in performance. In Table 4, we present the average inference time per 512 px × 512 px imagelet. The results show that there is a trade-off between classification and computational performance: the best models in terms of classification results (i.e., FC-DenseNet and FRRN-B) take several times longer inference time compared to the rest. Depending on the application, this might not be of particular importance.

4.3. Comparison to Similar Work

The obtained results compare favourably to previous similar studies on land cover classification with SAR data [83, 20, 21, 41, 28, 31]. Depending on the level of class aggregation (4–5 major classes or more), and mostly using statistical or classical machine learning approaches, reported classification accuracies ranged from as high as 80–87% down to as low as 30% when only SAR imagery was used.


Two recent studies that employed neural networks for SAR imagery classification (albeit in combination with satellite optical data) for land cover mapping were [28] and [66], with reported classification accuracies of up to 97.5% and 94.6%, respectively.

The best model in our experiments achieved an overall accuracy of 90.7%. However, our results were obtained using solely the SAR imagery. In contrast, SAR imagery (PALSAR) alone yielded an overall accuracy of 78.1% in [28]. The types of classes they studied are also different from ours (crops versus vegetation versus land cover types), and our study is performed over a larger area. Importantly, the previous studies applied different types of models (regular NNs versus CNNs versus semantic segmentation). In particular, the CNN models work on 7 × 7 resolution windows, while we applied more advanced semantic segmentation models, which work on the level of a pixel. Keeping in mind the finding from [28] that the addition of optical images on top of SAR improved the results by over 10%, we expect that our models would perform comparably well, or outperform these previous works, if applied to combined SAR and optical imagery.

In terms of the deep learning setup, the studies most similar to ours are [53] and [70]. However, RapidEye optical imagery at 5 m spatial resolution was used in [53], and the test site was considerably smaller. Study [70], similarly to our research, relied exclusively on SAR imagery; however, the images were fully polarimetric and acquired by RADARSAT-2 at a considerably better resolution. The authors developed an FCN-type semantic segmentation model 'specifically designed for the classification of wetland complexes using PolSAR imagery'. Using this model to classify eight wetland map classes, they achieved an overall accuracy of 93%. However, because their model is designed specifically for wetland complexes, it is not clear whether such a model would generalize to other types of areas. Compared to our study, they focused on a considerably smaller area (nearly the size of a single imagelet we used) and on a very specific task (wetland type mapping). Thus, it is not readily clear how general their approach is and how it compares to our presented approach.

4.4. Outlook and Future Work

Based on the results of this study, there are several lines for potential improvement, as well as future work directions.

First, using an even larger set of Sentinel-1 images can be recommended, since large amounts of data are crucial for supervised deep learning models. Here, we processed only 6888 imagelets altogether, but deep learning algorithms typically become efficient only once they are trained with hundreds of thousands or millions of images.

Second, if SAR images and reference data of a higher resolution are used, we expect better classification performance too, as smaller details could potentially be captured. Also, better agreement in the acquisition timing of the reference and SAR imagery can be recommended: the reference and training data should come from the same months or year if possible, and the reference maps should represent the reality as accurately as possible. The models in our experiments were certainly limited by CORINE's own limited accuracy.


Third, in this study we tested the effectiveness of off-the-shelf deep learning models for land cover mapping from SAR data. While the results show their effectiveness, it is also likely that novel types of models developed specifically for radar data (such as [70]) will yield even better results. Based on our results, we suggest DenseNet-based models as a starting point. In particular, one could develop deep learning models to handle directly the single-look complex (SLC) data, which preserve the phase information.

Focusing on a single season is both an advantage and a limitation. Importantly, we avoided confusion between SAR signatures that vary seasonally for several land cover classes. However, the multitemporal dynamics itself can potentially be used as an additional, useful class-discriminating parameter. Incorporating the seasonal dynamics of each land cover pixel (as a time series) is left for future work, perhaps with the additional need to incorporate recurrent neural networks into the approach.

As discussed in Section 3.3, it could be suitable to use more detailed (specific) land cover classes, as the aggregation of smaller LC classes into Level-1 CORINE classes is not exactly ecological, leading to the mixing of several distinct SAR signatures in one class and thus causing additional confusion for the classifier. The classified specific classes can later be aggregated into larger classes, potentially showing improved performance [19].

Finally, we used only SAR images and a freely available DEM for the presented large-scale land cover mapping. If one were to also combine other types of remote sensing images, in particular optical images, we expect that the results would improve significantly. Due to cloud coverage, this holds only for those areas where such imagery can be collected, while an operational scenario would potentially require the use of at least two models (with and without optical satellite imagery). It is also important to assess the added value of SAR imagery with deep learning models when optical satellite images are available, as well as possible data fusion and decision fusion scenarios, before a decision on the mapping approach is made [19].

5. Conclusion

Our study demonstrated the potential for applying state-of-the-art semantic segmentation models to SAR image classification with high accuracy. Several models were benchmarked in a countrywide classification experiment using Sentinel-1 IW-mode SAR data, reaching nearly 91% overall classification accuracy with the best performing model (FC-DenseNet). Given that the 14 used Sentinel-1 scenes resulted in about 7K training images, this indicates a strong potential for using pre-trained CNNs for further fine-tuning, and seems particularly suitable when the number of training images is limited (to thousands or tens of thousands instead of millions). In addition to suggesting the best candidate semantic segmentation models for land cover mapping with SAR data (that is, the DenseNet-based models), our study offers baseline results against which newly proposed models should be evaluated. Several possible improvements were identified, including the necessity of testing multitemporal approaches, data fusion, and very high-resolution SAR imagery, as well as developing models specifically for SAR data; these will be addressed in future work.


References

[1] S. Bojinski, M. Verstraete, T. C. Peterson, C. Richter, A. Simmons, M. Zemp, The concept of essential climate variables in support of climate research, applications, and policy, Bulletin of the American Meteorological Society 95 (9) (2014) 1431–1443.

[2] G. Büttner, J. Feranec, G. Jaffrain, L. Mari, G. Maucha, T. Soukup, The CORINE land cover 2000 project, EARSeL eProceedings 3 (3) (2004) 331–346.

[3] M. Bossard, J. Feranec, J. Otahel, et al., CORINE land cover technical guide: Addendum 2000 (2000).

[4] G. Büttner, CORINE land cover and land cover change products, in: Land Use and Land Cover Mapping in Europe, Springer, 2014, pp. 55–74.

[5] M. Törmä, T. Markkanen, S. Hatunen, P. Härmä, O.-P. Mattila, A. Arslan, Assessment of land-cover data for land-surface modelling in regional climate studies, Boreal Environment Research 20 (2) (2015) 243–260.

[6] J. Chen, J. Chen, A. Liao, X. Cao, L. Chen, X. Chen, C. He, G. Han, S. Peng, M. Lu, et al., Global land cover mapping at 30 m resolution: A POK-based operational approach, ISPRS Journal of Photogrammetry and Remote Sensing 103 (2015) 7–27.

[7] C. A. d. Almeida, A. C. Coutinho, J. C. D. M. Esquerdo, M. Adami, A. Venturieri, C. G. Diniz, N. Dessay, L. Durieux, A. R. Gomes, High spatial resolution land use and land cover mapping of the Brazilian Legal Amazon in 2008 using Landsat-5/TM and MODIS data, Acta Amazonica 46 (3) (2016) 291–302.

[8] C. Homer, J. Dewitz, L. Yang, S. Jin, P. Danielson, G. Xian, J. Coulston, N. Herold, J. Wickham, K. Megown, Completion of the 2011 National Land Cover Database for the conterminous United States – representing a decade of land cover change information, Photogrammetric Engineering & Remote Sensing 81 (5) (2015) 345–354.

[9] Y. Zhao, D. Feng, L. Yu, X. Wang, Y. Chen, Y. Bai, H. J. Hernández, M. Galleguillos, C. Estades, G. S. Biging, et al., Detailed dynamic land cover mapping of Chile: Accuracy improvement by integrating multi-temporal data, Remote Sensing of Environment 183 (2016) 170–185.

[10] P. Griffiths, C. Nendel, P. Hostert, Intra-annual reflectance composites from Sentinel-2 and Landsat for national-scale crop and land cover mapping, Remote Sensing of Environment 220 (2019) 135–151.

[11] R. Torres, P. Snoeij, D. Geudtner, D. Bibby, M. Davidson, E. Attema, P. Potin, B. Rommen, N. Floury, M. Brown, I. Traver, P. Deghaye, B. Duesmann, B. Rosich, N. Miranda, C. Bruno, M. L'Abbate, R. Croci, A. Pietropaolo, M. Huchler, F. Rostan, GMES Sentinel-1 mission, Remote Sensing of Environment 120 (2012) 9–24. doi:10.1016/j.rse.2011.05.028.

[12] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE 86 (11) (1998) 2278–2324.

[13] A. Krizhevsky, I. Sutskever, G. E. Hinton, ImageNet classification with deep convolutional neural networks, in: Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.

[14] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, CoRR abs/1409.1556 (2014).

[15] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning (2016).

[16] W. Cohen, S. Goward, Landsat's role in ecological applications of remote sensing, BioScience 54 (6) (2004) 535–545. doi:10.1641/0006-3568(2004)054[0535:LRIEAO]2.0.CO;2.

[17] S. Goetz, A. Baccini, N. Laporte, T. Johns, W. Walker, J. Kellndorfer, R. Houghton, M. Sun, Mapping and monitoring carbon stocks with satellite observations: A comparison of methods, Carbon Balance and Management 4 (2009). doi:10.1186/1750-0680-4-2.

[18] C. Atzberger, Advances in remote sensing of agriculture: Context description, existing operational monitoring systems and major information needs, Remote Sensing 5 (2) (2013) 949–981. doi:10.3390/rs5020949.

[19] T. Häme, J. Kilpi, H. A. Ahola, Y. Rauste, O. Antropov, M. Rautiainen, L. Sirro, S. Bounpone, Improved mapping of tropical forests with optical and SAR imagery, part I: Forest cover and accuracy assessment using multi-resolution data, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 6 (1) (2013) 74–91.

[20] O. Antropov, Y. Rauste, H. Astola, J. Praks, T. Häme, M. Hallikainen, Land cover and soil type mapping from spaceborne PolSAR data at L-band with probabilistic neural network, IEEE Transactions on Geoscience and Remote Sensing 52 (9) (2014) 5256–5270. doi:10.1109/TGRS.2013.2287712.

[21] A. Lönnqvist, Y. Rauste, M. Molinier, T. Häme, Polarimetric SAR data in land cover mapping in boreal zone, IEEE Transactions on Geoscience and Remote Sensing 48 (10) (2010) 3652–3662. doi:10.1109/TGRS.2010.2048115.

[22] B. Waske, M. Braun, Classifier ensembles for land cover mapping using multitemporal SAR imagery, ISPRS Journal of Photogrammetry and Remote Sensing 64 (5) (2009) 450–457. doi:10.1016/j.isprsjprs.2009.01.003.

[23] L. Bruzzone, M. Marconcini, U. Wegmüller, A. Wiesmann, An advanced system for the automatic classification of multitemporal SAR images, IEEE Transactions on Geoscience and Remote Sensing 42 (6) (2004) 1321–1334. doi:10.1109/TGRS.2004.826821.

[24] T. Ullmann, A. Schmitt, A. Roth, J. Duffe, S. Dech, H.-W. Hubberten, R. Baumhauer, Land cover characterization and classification of arctic tundra environments by means of polarized synthetic aperture X- and C-band radar (PolSAR) and Landsat 8 multispectral imagery – Richards Island, Canada, Remote Sensing 6 (9) (2014) 8565–8593. doi:10.3390/rs6098565. URL http://www.mdpi.com/2072-4292/6/9/8565

[25] N. Clerici, C. A. V. Calderón, J. M. Posada, Fusion of Sentinel-1A and Sentinel-2A data for land cover mapping: a case study in the lower Magdalena region, Colombia, Journal of Maps 13 (2) (2017) 718–726. doi:10.1080/17445647.2017.1372316.

[26] C. Castañeda, D. Ducrot, Land cover mapping of wetland areas in an agricultural landscape using SAR and Landsat imagery, Journal of Environmental Management 90 (7) (2009) 2270–2277.

[27] Y. Ban, H. Hu, I. M. Rangel, Fusion of QuickBird MS and RADARSAT SAR data for urban land-cover mapping: Object-based and knowledge-based approach, International Journal of Remote Sensing 31 (6) (2010) 1391–1410.

[28] G. V. Laurin, V. Liesenberg, Q. Chen, L. Guerriero, F. Del Frate, A. Bartolini, D. Coomes, B. Wilebore, J. Lindsell, R. Valentini, Optical and SAR sensor synergies for forest and land cover mapping in a tropical site in West Africa, International Journal of Applied Earth Observation and Geoinformation 21 (2013) 7–16.

[29] R. Khatami, G. Mountrakis, S. V. Stehman, A meta-analysis of remote sensing research on supervised pixel-based land-cover image classification processes: General guidelines for practitioners and future research, Remote Sensing of Environment 177 (2016) 89–100.

[30] B. Waske, M. Braun, Classifier ensembles for land cover mapping using multitemporal SAR imagery, ISPRS Journal of Photogrammetry and Remote Sensing 64 (5) (2009) 450–457.

[31] H. Balzter, B. Cole, C. Thiel, C. Schmullius, Mapping CORINE land cover from Sentinel-1A SAR and SRTM digital elevation model data using random forests, Remote Sensing 7 (11) (2015) 14876–14898. doi:10.3390/rs71114876.

[32] S.-E. Park, Variations of microwave scattering properties by seasonal freeze/thaw transition in the permafrost active layer observed by ALOS PALSAR polarimetric data, Remote Sensing 7 (12) (2015) 17135–17148. doi:10.3390/rs71215874.

[33] M. C. Dobson, L. E. Pierce, F. T. Ulaby, Knowledge-based land-cover classification using ERS-1/JERS-1 SAR composites, IEEE Transactions on Geoscience and Remote Sensing 34 (1) (1996) 83–99. doi:10.1109/36.481896.

[34] L. Sirro, T. Häme, Y. Rauste, J. Kilpi, J. Hämäläinen, K. Gunia, B. de Jong, F. Paz Pellat, Potential of different optical and SAR data in forest and land cover classification to support REDD+ MRV, Remote Sensing 10 (6) (2018). doi:10.3390/rs10060942.

[35] J. D. T. De Alban, G. M. Connette, P. Oswald, E. L. Webb, Combined Landsat and L-band SAR data improves land cover classification and change detection in dynamic tropical landscapes, Remote Sensing 10 (2) (2018). doi:10.3390/rs10020306.

[36] N. Longépé, P. Rakwatin, O. Isoguchi, M. Shimada, Y. Uryu, K. Yulianto, Assessment of ALOS PALSAR 50 m orthorectified FBD data for regional land cover classification by support vector machines, IEEE Transactions on Geoscience and Remote Sensing 49 (6) (2011) 2135–2150. doi:10.1109/TGRS.2010.2102041.

[37] T. Esch, A. Schenk, T. Ullmann, M. Thiel, A. Roth, S. Dech, Characterization of land cover types in TerraSAR-X images by combined analysis of speckle statistics and intensity information, IEEE Transactions on Geoscience and Remote Sensing 49 (6) (2011) 1911–1925. doi:10.1109/TGRS.2010.2091644.

[38] J. W. Cable, J. M. Kovacs, J. Shang, X. Jiao, Multi-temporal polarimetric RADARSAT-2 for land cover monitoring in northeastern Ontario, Canada, Remote Sensing 6 (3) (2014) 2372–2392. doi:10.3390/rs6032372. URL http://www.mdpi.com/2072-4292/6/3/2372

[39] X. Niu, Y. Ban, Multi-temporal RADARSAT-2 polarimetric SAR data for urban land-cover classification using an object-based support vector machine and a rule-based approach, International Journal of Remote Sensing 34 (1) (2013) 1–26. doi:10.1080/01431161.2012.700133.

[40] T. L. Evans, M. Costa, K. Telmer, T. S. F. Silva, Using ALOS/PALSAR and RADARSAT-2 to map land cover and seasonal inundation in the Brazilian Pantanal, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 3 (4) (2010) 560–575. doi:10.1109/JSTARS.2010.2089042.

[41] P. Lumsdon, S. R. Cloude, G. Wright, Polarimetric classification of land cover for Glen Affric radar project, IEE Proceedings - Radar, Sonar and Navigation 152 (6) (2005) 404–412. doi:10.1049/ip-rsn:20041313.

[42] C. da Costa Freitas, L. de Souza Soler, S. J. S. Sant'Anna, L. V. Dutra, J. R. Dos Santos, J. C. Mura, A. H. Correia, Land use and land cover mapping in the Brazilian Amazon using polarimetric airborne P-band SAR data, IEEE Transactions on Geoscience and Remote Sensing 46 (10) (2008) 2956–2970.

[43] G. Li, D. Lu, E. Moran, L. Dutra, M. Batistella, A comparative analysis of ALOS PALSAR L-band and RADARSAT-2 C-band data for land-cover classification in a tropical moist region, ISPRS Journal of Photogrammetry and Remote Sensing 70 (2012) 26–38. doi:10.1016/j.isprsjprs.2012.03.010.

[44] N. Park, K. Chi, Integration of multitemporal/polarization C-band SAR data sets for land-cover classification, International Journal of Remote Sensing 29 (16) (2008) 4667–4688. doi:10.1080/01431160801947341.

[45] E. Tomppo, O. Antropov, J. Praks, Cropland classification using Sentinel-1 time series: Methodological performance and prediction uncertainty assessment, Remote Sensing 11 (21) (2019). doi:10.3390/rs11212480.

[46] D. B. Nguyen, A. Gruber, W. Wagner, Mapping rice extent and cropping scheme in the Mekong Delta using Sentinel-1A data, Remote Sensing Letters 7 (12) (2016) 1209–1218.

[47] A. Veloso, S. Mermoz, A. Bouvet, T. Le Toan, M. Planells, J.-F. Dejoux, E. Ceschia, Understanding the temporal behavior of crops using Sentinel-1 and Sentinel-2-like data for agricultural applications, Remote Sensing of Environment 199 (2017) 415–426.

[48] G. Satalino, A. Balenzano, F. Mattia, M. W. Davidson, C-band SAR data for mapping crops dominated by surface or volume scattering, IEEE Geoscience and Remote Sensing Letters 11 (2) (2013) 384–388.

[49] F. Vicente-Guijalba, A. Jacob, J. M. Lopez-Sanchez, C. Lopez-Martinez, J. Duro, C. Notarnicola, D. Ziolkowski, A. Mestre-Quereda, E. Pottier, J. J. Mallorquí, M. Lavalle, M. Engdahl, SInCohMap: Land-cover and vegetation mapping using multi-temporal Sentinel-1 interferometric coherence, in: IGARSS 2018 - 2018 IEEE International Geoscience and Remote Sensing Symposium, 2018, pp. 6631–6634. doi:10.1109/IGARSS.2018.8517926.

[50] S. Ge, O. Antropov, W. Su, H. Gu, J. Praks, Deep recurrent neural networks for land-cover classification using Sentinel-1 InSAR time series, in: IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium, 2019, pp. 473–476. doi:10.1109/IGARSS.2019.8900088.

[51] Y. LeCun, Y. Bengio, et al., Convolutional networks for images, speech, and time series, The Handbook of Brain Theory and Neural Networks 3361 (10) (1995) 1995.

[52] X. X. Zhu, D. Tuia, L. Mou, G.-S. Xia, L. Zhang, F. Xu, F. Fraundorfer, Deep learning in remote sensing: a review, arXiv preprint arXiv:1710.03959 (2017).

[53] M. Mahdianpari, B. Salehi, M. Rezaee, F. Mohammadimanesh, Y. Zhang, Very deep convolutional neural networks for complex land cover mapping using multispectral remote sensing imagery, Remote Sensing 10 (7) (2018) 1119.

[54] L. Zhang, L. Zhang, B. Du, Deep learning for remote sensing data: A technical tutorial on the state of the art, IEEE Geoscience and Remote Sensing Magazine 4 (2) (2016) 22–40.

[55] J. Zhang, P. Zhong, Y. Chen, S. Li, l1/2-regularized deconvolution network for the representation and restoration of optical remote sensing images, IEEE Transactions on Geoscience and Remote Sensing 52 (5) (2014) 2617–2627.

[56] X. Chen, S. Xiang, C.-L. Liu, C.-H. Pan, Aircraft detection by deep belief nets, in: Pattern Recognition (ACPR), 2013 2nd IAPR Asian Conference on, IEEE, 2013, pp. 54–58.

[57] X. Chen, S. Xiang, C.-L. Liu, C.-H. Pan, Vehicle detection in satellite images by hybrid deep convolutional neural networks, IEEE Geoscience and Remote Sensing Letters 11 (10) (2014) 1797–1801.

[58] Y. Liu, G. Cao, Q. Sun, M. Siegel, Hyperspectral classification via deep networks and superpixel segmentation, International Journal of Remote Sensing 36 (13) (2015) 3459–3482. doi:10.1080/01431161.2015.1055607.

[59] J. Wang, Q. Qin, Z. Li, X. Ye, J. Wang, X. Yang, X. Qin, Deep hierarchical representation and segmentation of high resolution remote sensing images, in: Geoscience and Remote Sensing Symposium (IGARSS), 2015 IEEE International, IEEE, 2015, pp. 4320–4323.

[60] D. Tuia, R. Flamary, N. Courty, Multiclass feature learning for hyperspectral image classification: Sparse and hierarchical solutions, ISPRS Journal of Photogrammetry and Remote Sensing 105 (2015) 272–285.

[61] F. Hu, G.-S. Xia, J. Hu, L. Zhang, Transferring deep convolutional neural networks for the scene classification of high-resolution remote sensing imagery, Remote Sensing 7 (11) (2015) 14680–14707.

[62] O. A. B. Penatti, K. Nogueira, J. A. dos Santos, Do deep features generalize from everyday objects to remote sensing and aerial scenes domains?, in: 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2015, pp. 44–51. doi:10.1109/CVPRW.2015.7301382.

[63] F. P. Luus, B. P. Salmon, F. Van den Bergh, B. T. J. Maharaj, Multiview deep learning for land-use classification, IEEE Geoscience and Remote Sensing Letters 12 (12) (2015) 2448–2452.

[64] F. Zhang, B. Du, L. Zhang, Scene classification via a gradient boosting random convolutional network framework, IEEE Transactions on Geoscience and Remote Sensing 54 (3) (2016) 1793–1802.

[65] T. Ishii, R. Nakamura, H. Nakada, Y. Mochizuki, H. Ishikawa, Surface object recognition with CNN and SVM in Landsat 8 images, in: 2015 14th IAPR International Conference on Machine Vision Applications (MVA), 2015, pp. 341–344. doi:10.1109/MVA.2015.7153200.

[66] N. Kussul, M. Lavreniuk, S. Skakun, A. Shelestov, Deep learning classification of land cover and crop types using remote sensing data, IEEE Geoscience and Remote Sensing Letters 14 (5) (2017) 778–782.

[67] Y. Chen, Z. Lin, X. Zhao, G. Wang, Y. Gu, Deep learning-based classification of hyperspectral data, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 7 (6) (2014) 2094–2107.

[68] G. Wu, X. Shao, Z. Guo, Q. Chen, W. Yuan, X. Shi, Y. Xu, R. Shibasaki, Automatic building segmentation of aerial imagery using multi-constraint fully convolutional networks, Remote Sensing 10 (3) (2018) 407.

[69] Y. Duan, F. Liu, L. Jiao, P. Zhao, L. Zhang, SAR image segmentation based on convolutional-wavelet neural network and Markov random field, Pattern Recognition 64 (2017) 255–267.

[70] F. Mohammadimanesh, B. Salehi, M. Mahdianpari, E. Gill, M. Molinier, A new fully convolutional neural network for semantic segmentation of polarimetric SAR imagery in complex land cover ecosystem, ISPRS Journal of Photogrammetry and Remote Sensing 151 (2019) 223–236.

[71] L. Wang, X. Xu, H. Dong, R. Gui, F. Pu, Multi-pixel simultaneous classification of PolSAR image using convolutional neural networks, Sensors 18 (3) (2018) 769.

[72] M. Ahishali, S. Kiranyaz, T. Ince, M. Gabbouj, Dual and single polarized SAR image classification using compact convolutional neural networks, Remote Sensing 11 (11) (2019) 1340.

[73] Z. Li, Z. Yang, H. Xiong, Homogeneous region segmentation for SAR images based on two steps segmentation algorithm, in: Computers, Communications and Systems (ICCCS), International Conference on, IEEE, 2015, pp. 196–200.

[74] V. Badrinarayanan, A. Kendall, R. Cipolla, SegNet: A deep convolutional encoder-decoder architecture for image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence 39 (12) (2017) 2481–2495.

[75] H. Zhao, J. Shi, X. Qi, X. Wang, J. Jia, Pyramid scene parsing network, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2881–2890.

[76] C. Yu, J. Wang, C. Peng, C. Gao, G. Yu, N. Sang, BiSeNet: Bilateral segmentation network for real-time semantic segmentation, in: Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 325–341.

[77] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, H. Adam, Encoder-decoder with atrous separable convolution for semantic image segmentation, arXiv preprint arXiv:1802.02611 (2018).

[78] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, A. L. Yuille, DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs, IEEE Transactions on Pattern Analysis and Machine Intelligence 40 (4) (2018) 834–848.

[79] O. Ronneberger, P. Fischer, T. Brox, U-Net: Convolutional networks for biomedical image segmentation, in: Medical Image Computing and Computer-Assisted Intervention (MICCAI), Vol. 9351 of LNCS, Springer, 2015, pp. 234–241, (available on arXiv:1505.04597 [cs.CV]). URL http://lmb.informatik.uni-freiburg.de/Publications/2015/RFB15a

[80] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, H. Adam, MobileNets: Efficient convolutional neural networks for mobile vision applications, arXiv preprint arXiv:1704.04861 (2017).

[81] T. Pohlen, A. Hermans, M. Mathias, B. Leibe, Full-resolution residual networks for semantic segmentation in street scenes, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4151–4160.

[82] S. Jégou, M. Drozdzal, D. Vazquez, A. Romero, Y. Bengio, The one hundred layers tiramisu: Fully convolutional DenseNets for semantic segmentation, in: Computer Vision and Pattern Recognition Workshops (CVPRW), 2017 IEEE Conference on, IEEE, 2017, pp. 1175–1183.

[83] O. Antropov, Y. Rauste, A. Lönnqvist, T. Häme, PolSAR mosaic normalization for improved land-cover mapping, IEEE Geoscience and Remote Sensing Letters 9 (6) (2012) 1074–1078.

[84] J Long E Shelhamer T Darrell Fully convolutional networks for se-mantic segmentation in Proceedings of the IEEE conference on computervision and pattern recognition 2015 pp 3431ndash3440

[85] D H Hubel T N Wiesel Receptive fields binocular interaction andfunctional architecture in the catrsquos visual cortex The Journal of physiol-ogy 160 (1) (1962) 106ndash154

[86] O Russakovsky J Deng H Su J Krause S Satheesh S Ma Z HuangA Karpathy A Khosla M Bernstein et al Imagenet large scale visualrecognition challenge International journal of computer vision 115 (3)(2015) 211ndash252

[87] K He X Zhang S Ren J Sun Deep residual learning for image recog-nition in Proceedings of the IEEE conference on computer vision andpattern recognition 2016 pp 770ndash778

[88] G Huang Z Liu L Van Der Maaten K Q Weinberger Densely con-nected convolutional networks in CVPR Vol 1 2017 p 3

[89] C Szegedy W Liu Y Jia P Sermanet S Reed D Anguelov D Er-han V Vanhoucke A Rabinovich Going deeper with convolutions inProceedings of the IEEE conference on computer vision and pattern recog-nition 2015 pp 1ndash9

[90] S Ji W Xu M Yang K Yu 3D convolutional neural networks for humanaction recognition IEEE transactions on pattern analysis and machineintelligence 35 (1) (2013) 221ndash231

[91] T N Sainath A-r Mohamed B Kingsbury B Ramabhadran Deepconvolutional neural networks for LVCSR in Acoustics speech and signalprocessing (ICASSP) 2013 IEEE international conference on IEEE 2013pp 8614ndash8618

[92] D Small L Zuberbuhler A Schubert E Meier Terrain-flattened gammanought Radarsat-2 backscatter Canadian Journal of Remote Sensing37 (5) (2012) 493ndash499

[93] P Hrm R Teiniranta M Trm R Repo E Jrvenp M Kallio Theproduction of finnish corine land cover 2000 classification XXth ISPRSCongress Istanbul Turkey (2004)


[94] P. Härmä, R. Teiniranta, M. Törmä, R. Repo, E. Järvenpää, M. Kallio, Finnish CORINE land cover 2000 classification, in: XXth ISPRS Congress, Anchorage, US, 2004.

[95] A. Garcia-Garcia, S. Orts-Escolano, S. Oprea, V. Villena-Martinez, J. Garcia-Rodriguez, A review on deep learning techniques applied to semantic segmentation, arXiv preprint arXiv:1704.06857 (2017).

[96] F. Chollet, Xception: Deep learning with depthwise separable convolutions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1251–1258.

[97] O. Ronneberger, P. Fischer, T. Brox, U-Net: Convolutional networks for biomedical image segmentation, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2015, pp. 234–241.

[98] L.-C. Chen, G. Papandreou, F. Schroff, H. Adam, Rethinking atrous convolution for semantic image segmentation, arXiv preprint arXiv:1706.05587 (2017).

[99] Y. Bengio, Deep learning of representations for unsupervised and transfer learning, in: Proceedings of ICML Workshop on Unsupervised and Transfer Learning, 2012, pp. 17–36.

[100] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, et al., TensorFlow: A system for large-scale machine learning, in: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), 2016, pp. 265–283.

[101] H. Costa, G. M. Foody, D. S. Boyd, Supervised methods of image segmentation accuracy assessment in land cover mapping, Remote Sensing of Environment 205 (2018) 338–351.

[102] G. Csurka, D. Larlus, F. Perronnin, F. Meylan, What is a good evaluation measure for semantic segmentation?, in: BMVC, Vol. 27, Citeseer, 2013, p. 2013.

[103] J. Cohen, A coefficient of agreement for nominal scales, Educational and Psychological Measurement 20 (1) (1960) 37–46.

[104] O. Antropov, Y. Rauste, T. Häme, Volume scattering modeling in PolSAR decompositions: Study of ALOS PALSAR data over boreal forest, IEEE Transactions on Geoscience and Remote Sensing 49 (10) (2011) 3838–3848.


Contents

• 1 Introduction
  • 1.1 Land Cover Mapping with SAR Imagery
  • 1.2 Deep Learning in Remote Sensing
  • 1.3 Study goals
• 2 Deep Learning Terminology
• 3 Materials and methods
  • 3.1 Study site
  • 3.2 SAR data
  • 3.3 Reference data
  • 3.4 Semantic Segmentation Models
    • 3.4.1 BiSeNet (Bilateral Segmentation Network)
    • 3.4.2 SegNet (Encoder-Decoder-Skip)
    • 3.4.3 Mobile U-Net
    • 3.4.4 DeepLab-V3+
    • 3.4.5 FRRN-B (Full-Resolution Residual Networks)
    • 3.4.6 PSPNet (Pyramid Scene Parsing Network)
    • 3.4.7 FC-DenseNet (Fully Convolutional DenseNets)
  • 3.5 Training approach
  • 3.6 Experimental Setup
    • 3.6.1 SAR Data Preprocessing for Deep Learning
    • 3.6.2 Train/Development and Test (Accuracy Assessment) Dataset
    • 3.6.3 Data Augmentation
    • 3.6.4 Implementation
    • 3.6.5 Hardware and Training Setup
  • 3.7 Evaluation Metrics
• 4 Results and Discussion
  • 4.1 Classification Performance
  • 4.2 Computational Performance
  • 4.3 Comparison to Similar Work
  • 4.4 Outlook and Future Work
• 5 Conclusion

moist tropics [43], L-band provided 72.2% classification accuracy for a coarse land cover classification system, and C-band only 54.7%. In a similar study in Lao PDR, ALOS PALSAR data were found to be mostly useful as a back-up option to optical ALOS AVNIR data [19]. Multitemporal Radarsat-1 data with HH polarization and ENVISAT ASAR data with VV polarization (both C-band) were studied for the classification of five land cover classes in Korea, with moderate accuracy [44]. Waske et al. [30] applied boosted decision trees and random forests to multi-temporal C-band SAR data, reaching accuracies of up to 84%. Several studies [21, 20] investigated specifically the suitability of SAR for the boreal zone, with reported accuracies of up to 83%, depending on the classification technique (maximum likelihood, probabilistic neural networks, etc.), when five super-classes (based on CORINE data) were used.

The potential of Sentinel-1 imagery for CORINE-type thematic mapping was assessed in a study that used Sentinel-1A data for mapping class composition in Thuringia [31]. Long time series of Sentinel-1 SAR data are considered especially suitable for crop type mapping [45, 46, 47, 48], with an increasing number of studies attempting land cover mapping in general [49, 50].

Moreover, as Sentinel-1 data are presently the only source of SAR data routinely available for wide-area mapping at no cost for users, they seem the best candidate data for the development and testing of improved classification approaches. Previous studies indicate the necessity of developing and testing new methodological approaches that can be effectively applied at a large scale and that can deal with the variability of SAR observables with respect to ecological land cover classes. We suggest adopting state-of-the-art deep learning approaches for this purpose.

1.2 Deep Learning in Remote Sensing

The advances in deep learning techniques for computer vision, in particular Convolutional Neural Networks (CNNs) [12, 51], have led to the application of deep learning in several domains that rely on computer vision. Examples are self-driving cars, image search engines, medical diagnostics, and augmented reality. Deep learning approaches are starting to be adopted in the remote sensing domain as well.

Zhu et al. [52] provide a discussion of the specificities of remote sensing imagery (compared to ordinary RGB images) that result in particular deep learning challenges in this area. For example, remote sensing data are georeferenced, often multi-modal, and acquired with particular imaging geometries; there are interpretation difficulties; and the ground-truth or labelled data needed for deep learning are still often lacking. Additionally, most of the state-of-the-art CNNs are developed for three-channel input images (i.e., RGB), so certain adaptations are needed to apply them to remote sensing data [53].

Nevertheless, several research papers tackling remote sensing imagery with deep learning techniques have been published in recent years. Zhang et al. [54] review the field and find applications to image preprocessing [55], target recognition [56, 57], classification [58, 59, 60], and semantic feature extraction and scene understanding [61, 62, 63, 64]. The deep learning approaches are found to outperform the standard methods applied up to several years ago, i.e., SVMs and RFs [65, 66].

When it comes to deep learning for land cover or land use mapping, applications have been limited to optical satellite [53, 67, 59] or aerial [68] imagery, and hyperspectral imagery [60, 67], owing to the similarity of these images to the ordinary RGB images studied in computer vision [53].

When it comes to SAR images, Zhang et al. [54] found that there is already significant success in applying deep learning techniques for object detection and scene understanding. However, for classification on SAR data, applications are scarce and advances are yet to be achieved [54]. Published research includes deep learning for crop type mapping using combined optical and SAR imagery [66], as well as the use of SAR images exclusively [69]. However, those methods applied deep learning only to some part of the task at hand, not in an end-to-end fashion. Wang et al. [59], for instance, used deep neural networks only for merging over-segmented elements, which were produced using traditional segmentation approaches. Similarly, Tuia et al. [60] applied deep learning to extract hierarchical features, which they further fed into a multiclass logistic classifier. Duan et al. [69] first used unsupervised deep learning and then continued with a couple of supervised labelling tasks. Chen et al. [67] applied a deep learning technique (stacked autoencoders) to discover the features, but then still used traditional machine learning (SVM, logistic regression) for the image segmentation. Unlike those methods, we applied deep learning in an end-to-end fashion, i.e., from supervised feature extraction to the land class prediction. This makes our approach more flexible, robust, and adaptable to SAR data from new regions, as well as more efficient.

When it comes to end-to-end approaches for SAR classification, there are several studies in which the focus was on a small area and on a specific land cover mapping task. For instance, Mohammadimanesh et al. [70] used fully polarimetric SAR (PolSAR) imagery from RADARSAT-2 to classify wetland complexes, for which they developed a specifically tailored semantic segmentation model. However, the authors tackled a small test area (around 10 km × 10 km) and did not explore how their model generalizes to other types of areas. Similarly, Wang et al. [71] adapted existing CNN models into a fixed-feature-size CNN that they evaluated on small-scale RADARSAT-2 or AIRSAR (i.e., airborne) SAR data. In both cases, they used more advanced, fully polarimetric SAR imagery at a better resolution than Sentinel-1, which means imagery with more input information for the deep learning models. Importantly, it is only Sentinel-1 that offers open operational data with a repeat of up to every 6 days. Because of this, the discussed approaches, developed and tested specifically for PolSAR imagery at a higher resolution, cannot yet be considered applicable for wide-area mapping. Similarly, Ahishali et al. [72] applied end-to-end approaches to SAR data; they also worked with single-polarized COSMO-SkyMed imagery. However, all the imagery they considered was X-band SAR, contrary to the C-band imagery we use here, and again only at a small scale. The authors proposed a compact CNN model that they found outperformed some off-the-shelf CNN methods, such as Xception and Inception-ResNet-v2. It is important to note that, compared to those, the off-the-shelf models that we consider here are more sophisticated semantic segmentation models, some of which employ Xception or ResNet, but only as a module in their feature extraction parts.

In summary, the capabilities of deep learning approaches for classification have been investigated to a lesser extent for SAR imagery than for optical imagery. The attempts to use SAR data for land cover classification have been relatively limited in scope, area, or the number of SAR scenes used. In particular, wide-area land cover mapping has never been addressed. The reasons for this include the comparatively poor availability of SAR data compared to optical data (greatly changed since the advent of Sentinel-1), complex scattering mechanisms leading to ambiguous SAR signatures for different classes (which makes SAR image segmentation more difficult than optical image segmentation [73]), as well as the speckle noise caused by the coherent nature of the SAR imaging process.

1.3 Study goals

The present study addresses the identified research gap, namely the lack of wide-area land cover mapping using SAR data. We achieve this by training, fine-tuning, and evaluating a set of suitable state-of-the-art deep learning models from the class of semantic segmentation models, and by demonstrating their suitability for land cover mapping. Moreover, our work is the first to examine and demonstrate the suitability of deep learning for land cover mapping from SAR images at a large scale, i.e., across a whole country.

Specifically, we applied the semantic segmentation models to SAR images taken over Finland. We focused on images of Finland because a land cover mask of a suitable resolution that can be used for training labels exists there (i.e., CORINE). The training was performed with the seven selected models (SegNet [74], PSPNet [75], BiSeNet [76], DeepLabV3+ [77, 78], U-Net [79, 80], FRRN-B [81], and FC-DenseNet [82]), which have encoder modules pre-trained on the large RGB image corpus ImageNet 2012². Those models are freely available³. In other words, we reused semantic segmentation architectures developed for natural images, with weights pre-trained on RGB images, and fine-tuned them on the SAR images. Our results (with over 90% overall accuracy) demonstrate the effectiveness of deep learning methods for land cover mapping with SAR data.

In addition to having the high-resolution CORINE map that can serve as ground truth (labels) for training the deep learning models, another reason we selected Finland is that it is a northern country with frequent cloud cover, which means that using optical imagery for wide-area mapping is often not feasible. Hence, demonstrating the usability of radar imagery for land cover mapping is particularly useful here.

² http://image-net.org/challenges/LSVRC/2012/
³ https://github.com/tensorflow/models/tree/master/research/slim#pre-trained-models


Even though Finland is a relatively small country, there is still considerable heterogeneity present in terms of land cover types and how they appear in SAR images. Namely, SAR backscattering is sensitive to several factors that likely differ between countries, or between distant areas within a country. Examples of such factors are moisture levels, terrain variation and soil roughness, the predominant forest biome and tree species proportions, the types of shorter vegetation and crops in agricultural areas, and the specific types of built environments. We did not confine our study to a particular area of Finland where the SAR signatures might be consistent, but obtained the images across a wide area. Hence, demonstrating the suitability of our methods in this setting hints at their potential generalizability. Namely, it means that, similarly to what we did here, the semantic segmentation models can be fine-tuned and adapted to work on data from other regions or countries with different SAR signatures.

On the other hand, we took into account that the same areas will appear somewhat different in SAR images across different seasons. Scattering characteristics of many land cover classes change considerably between the summer and winter months, and sometimes even within weeks during seasonal changes [83, 20]. These changes include snow cover and melting, freeze/thaw of soils, ice on rivers and lakes, the crop growing cycle, and leaf-on and leaf-off conditions in deciduous trees. Because of this, in the present study we focused only on scenes acquired during the summer season. However, we did allow our training dataset to contain several images of the same area taken at different times during the summer season. This way, not only spatial but also temporal variation of SAR signatures is introduced.

Our contributions can be summarised as follows:

C1 We thoroughly benchmarked seven selected state-of-the-art semantic segmentation models, covering a diverse set of approaches, for land cover mapping using Sentinel-1 SAR imagery. We provide insights on the best models in terms of both accuracy and efficiency.

C2 Our results demonstrate the power of deep learning models, along with SAR imagery, for accurate wide-area land cover mapping in the cloud-obscured boreal zone and polar regions.

2 Deep Learning Terminology

As with other representation learning models, the power of deep learning models comes from their ability to learn rich features (representations) from the dataset automatically [15]. The automatically learned features are usually better suited for the classifier or other task at hand than hand-engineered features. Moreover, thanks to the large number of layers employed, it has been proven that deep learning networks can discover hierarchical representations, so that higher-level representations are expressed in terms of lower-level, simpler ones. For example, in the case of images, the low-level representations that can be discovered are edges, and using them, mid-level ones can be


Table 1: Terminology for the main tasks in computer vision and its use in the deep learning versus remote sensing communities.

Deep learning | Remote sensing | Task description
Classification [13] | Image Annotation, Scene Understanding, Scene Classification | Assigning a whole image to a class based on what is (mainly) represented in it, for example a ship, oil tank, sea, or land.
Object Detection/Localization/Recognition [15] | Automatic Target Recognition | Detecting (and localizing) the presence of particular objects in an image. These algorithms can detect several objects in the given image, for instance ship detection in SAR images.
Semantic Segmentation [84] | Image Classification, Clustering | Assigning a class to each pixel in an image, based on which image object or region it belongs to. These algorithms not only detect and localize objects in the image, but also output their exact areas and boundaries.

expressed, such as corners and shapes, and this helps to express high-level representations, such as object elements and their identities [15].

The deep learning models in computer vision can be grouped, according to their main task, into three categories, described in Table 1. However, the deep learning terminology for those tasks does not always correspond well to the terminology used in the remote sensing community. Relevant to our task, a number of remote sensing studies use the term classification in the context of land cover mapping, inherently meaning pixel- or region-based classification, which in the deep learning terminology corresponds to semantic segmentation. In Table 1, we list the corresponding terminology that we encountered being used for each task in both the deep learning and remote sensing communities. This helps to disambiguate the terms when they refer to different tasks, and to recognize when they refer to the same task in the two domains. In the present study, the focus is on land cover mapping; hence, we tackle semantic segmentation in the deep learning terminology, i.e., image classification (pixel-wise classification) in the remote sensing terminology.

Convolutional Neural Networks (CNNs) [12, 13] are the deep learning models that have transformed the computer vision field. CNNs were initially defined to tackle the image classification (deep learning terminology) task. Their structure is inspired by the visual perception of mammals [85]. CNNs are named after one of the most important operations, which is particular to them compared to other neural networks, i.e., the convolution. Mathematically, a convolution is a combination of two other functions. A convolution is applied to an image by sliding a filter (kernel) of a given size k × k, which is usually small compared to the original image size. Filters can be designed for different purposes; for example, a filter can serve as a vertical edge detector. Applying such a convolution operation to an image results in a feature map. Another common operation, usually applied after a convolution, is pooling. Pooling reduces the size of the feature map while providing robustness to the extracted features. Common CNNs end with a fully connected layer, which is used for the final predictions, commonly for image classification. By employing a large number of convolutional layers (depth), CNNs are able to extract gradually more complex and abstract features. The first CNN model to demonstrate impressive effectiveness in image classification (of hand-written digits) was LeNet [12]. Several years later, Krizhevsky et al. [13] developed AlexNet, a deep CNN that dramatically pushed the limits of classification accuracy in the famous ImageNet computer vision challenge [86]. Since then, a variety of CNN-based models have been proposed; some notable examples are the VGG network [14], ResNet [87], DenseNet [88], and Inception V3 [89]. The effectiveness of CNNs has also been proven in various real-world applications [90, 91].
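To make the convolution and pooling operations concrete, the following minimal NumPy sketch applies a vertical-edge-detector kernel to a single image band and then max-pools the resulting feature map; the array sizes and the random image are illustrative only.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide a k x k kernel over a 2-D image (valid padding, stride 1)."""
    k = kernel.shape[0]
    h, w = image.shape
    out = np.zeros((h - k + 1, w - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + k, j:j + k] * kernel)
    return out

def max_pool(feature_map, size=2):
    """Reduce each non-overlapping size x size window to its maximum."""
    h, w = feature_map.shape
    h, w = h // size * size, w // size * size
    fm = feature_map[:h, :w]
    return fm.reshape(h // size, size, w // size, size).max(axis=(1, 3))

# A classic vertical-edge-detector kernel (Sobel-like).
kernel = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]], dtype=float)

image = np.random.rand(8, 8)          # stand-in for one image band
feature_map = conv2d(image, kernel)   # 6 x 6 feature map
pooled = max_pool(feature_map)        # 3 x 3 pooled map
print(feature_map.shape, pooled.shape)
```

In a trained CNN the kernel values are, of course, not hand-designed as here but learned from data.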

Once CNNs had proven their effectiveness in classifying images, Long et al. [84] were the first to show how a given CNN model can be augmented to make it suitable for the semantic segmentation task: they proposed the Fully Convolutional Neural Network (FCN) framework. This generic architecture can be used to adapt any CNN used for classification into a segmentation model. Namely, the authors showed that by replacing the last fully connected layer with appropriate convolutional layers, which upsample and restore the resolution of the input at the output layer, CNNs can be transformed to classify each individual pixel (instead of the whole image). The basic idea is as follows. The encoder is used to learn the feature maps, and is usually based on a pre-trained deep CNN for classification, such as ResNet, VGG, or Inception. The decoder part serves to upsample the discriminative features that the encoder has learned, from the coarse-level feature map to the fine pixel level. Long et al. [84] showed that this upsampling (backward) computation can be efficiently performed using backward convolutions (deconvolutions). Moreover, this means that specific CNN models, such as those mentioned above, can all be incorporated in the FCN framework for segmentation, giving rise to FCN-AlexNet [84], FCN-ResNet [87], FCN-VGG16 [84], FCN-DenseNet [82], etc. We present a diagram of the generic FCN architecture in Figure 1.
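The following sketch illustrates the FCN idea in Keras notation, under the assumption of a ResNet50 encoder and a single transposed convolution as the decoder; actual FCN variants use more elaborate upsampling with skip connections.

```python
import tensorflow as tf

def build_fcn(num_classes, input_shape=(512, 512, 3)):
    """Sketch of the FCN idea: a pre-trained CNN encoder whose fully
    connected head is replaced by a 1x1 convolution, followed by a learned
    upsampling back to the input resolution."""
    # Encoder: a classification CNN without its fully connected layers.
    encoder = tf.keras.applications.ResNet50(
        include_top=False, weights="imagenet", input_shape=input_shape)
    x = encoder.output                        # coarse 16 x 16 feature map here
    # The 1x1 convolution plays the role of the removed dense layer, per pixel.
    x = tf.keras.layers.Conv2D(num_classes, 1)(x)
    # Decoder: a transposed ("backward") convolution restores the resolution.
    x = tf.keras.layers.Conv2DTranspose(
        num_classes, kernel_size=64, strides=32, padding="same",
        activation="softmax")(x)
    return tf.keras.Model(encoder.input, x)

model = build_fcn(num_classes=5)   # five CLC Level-1 classes
```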


Figure 1: The architecture of Fully Convolutional Neural Networks (FCNs) [84].

3 Materials and methods

Here we first describe the study site and the SAR and reference data. This is followed by an in-depth description of the deep learning models used in the study. We finish with a description of the experimental setup and the evaluation metrics.

3.1 Study site

Our study site covers the area of Finland between latitudes 61° and 67.5°. The processed area is shown in Figure 2. The study area includes the central and northern parts of Finland, covered primarily by boreal forestland, with inclusions of water bodies (primarily lakes), urban settlements, and agricultural areas, as well as marshland and open bogs. We omitted Lapland due to the potential snow cover during the months of data acquisition. The terrain height variation is moderate, mostly within the 100–300 m range.

3.2 SAR data

Presently, Sentinel-1 is a C-band SAR dual-satellite system, with two satellites orbiting 180° apart [11], launched in 2014 and 2016, respectively. The operational acquisition modes are Stripmap (SM), Interferometric Wide-Swath (IW), Extra Wide Swath (EW), and Wave Mode (WV). The IW mode is the default mode over land, providing a 250 km wide swath composed of three sub-swaths, with single-look images at 5 m by 20 m spatial resolution. It uses the so-called TOPS (Terrain Observation with Progressive Scan) SAR mode.

The SAR data acquired by the Sentinel-1 satellites in IW mode are used in this study. Altogether, 14 Sentinel-1A images acquired during the summers of 2015 and 2016 were used, more concretely during June, July, and August of those two years. Their geographical coverage is schematically shown in Figure 2.


Figure 2: Study area in Finland with reference CORINE land cover data and the schematic location of areas used for model training and accuracy assessment.


The original scenes were downloaded as Level-1 Ground Range Detected (GRD) products. These represent focused SAR data that have been detected, multi-looked, and projected to ground range using an Earth ellipsoid. They were orthorectified in ESA SNAP S1TBX software using a local digital terrain model (with 2 m resolution) available from the National Land Survey of Finland. The pixel spacing of the orthorectified scenes was set to 20 m. Orthorectification included terrain flattening, to obtain backscatter in the gamma-nought format [92]. The scenes were further re-projected to the ETRS89 / ETRS-TM35FIN projection (EPSG:3067) and resampled to a final pixel size of 20 metres.
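As an illustration of the final re-projection and resampling step only (the orthorectification and terrain flattening themselves were done in ESA SNAP, as described above), a GDAL-based sketch could look as follows; the file names are placeholders.

```python
from osgeo import gdal

# Reproject an orthorectified, terrain-flattened gamma-nought scene to the
# ETRS89 / TM35FIN grid (EPSG:3067) with a 20 m pixel size.
gdal.Warp(
    "s1_scene_tm35fin.tif",      # destination (illustrative name)
    "s1_scene_gamma0.tif",       # source, after SNAP processing
    dstSRS="EPSG:3067",
    xRes=20, yRes=20,
    resampleAlg="bilinear")
```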

3.3 Reference data

In Finland, the Finnish Environment Institute (SYKE) is responsible for the production of the CORINE maps. While for most of the EU territory the CORINE mask is available at a 100 m × 100 m spatial resolution, national institutions may choose to create more precise maps, and SYKE in particular has produced a 20 m × 20 m spatial resolution mask for Finland (Figure 3). Since then, updates have been produced regularly; the latest one at the time of this study, which we used, is CLC2012. There are 48 different land use classes in the map, which can be hierarchically grouped into four CLC levels. In detail, there are 30 classes on CLC Level-3, 15 classes on CLC Level-2, and 5 top CLC Level-1 classes. According to the information provided by SYKE, the accuracy of CLC Level-3 is 61%, of CLC Level-2 83%, and of CLC Level-1 93%. The selected classes and their corresponding color codes used for our segmentation results are shown in Table 2.

Until the most recent CORINE production round, EEA member countries adopted national approaches for the production of CORINE. The EEA Technical Guidelines include manual digitalization of land cover change based on visual interpretation of optical satellite imagery. In Finland, the European CLC was not applicable for the majority of national users due to its large minimal mapping unit (MMU); thus, a national version was produced with a somewhat modified nomenclature of classes [93, 94]. The national high-resolution CLC2012 data is in raster format at 20 m resolution, with a corresponding MMU. In the provision of the 2012 update of CLC, obtaining optical imagery over Scandinavia and Britain was particularly challenging because of the frequent clouds, thus calling for the use of radar imagery to meet user requirements on accuracy and coverage [31]. The CORINE map itself is built from high-resolution satellite images acquired primarily during the summer and, to a smaller extent, during the spring months [2].

3.4 Semantic Segmentation Models

We selected the following seven state-of-the-art [95] semantic segmentation models to test for our land cover mapping task: SegNet [74], PSPNet [75], BiSeNet [76], DeepLabV3+ [77, 78], U-Net [79, 80], FRRN-B [81], and FC-DenseNet [82]. The models were selected to cover a wide set of approaches to semantic segmentation. In the following, we describe the specific architecture of each


Table 2: CORINE CLC Level-1 classes and their color codes used in our classification results.

Class | R | G | B | Color
Water bodies (500) | 0 | 191 | 255 | blue
Peatland, bogs and marshes (400) | 173 | 216 | 230 | light blue
Forested areas (300) | 127 | 255 | 0 | green
Agricultural areas (200) | 222 | 184 | 135 | brown
Urban fabric (100) | 128 | 0 | 0 | red

Figure 3: Zoomed-in area fragment with our reference data, i.e., CORINE, shown on top (left), along with the Google Earth layer (right).

of these DL models. We will use the following common abbreviations: conv for the convolution operation, concat for concatenation, max pool for the max pooling operation, BN for batch normalisation, and ReLU for the rectified linear unit activation function.

3.4.1 BiSeNet (Bilateral Segmentation Network)

The BiSeNet model is designed to decouple the functions of encoding additional spatial information and enlarging the receptive field, both of which are fundamental to achieving good segmentation performance. As can be seen in Figure 4, there are two main components to this model: the Spatial Path (SP) and the Context Path (CP). The Spatial Path serves to encode rich spatial information, while the Context Path serves to provide a sufficient receptive field, using global average pooling and pre-trained Xception [96] or ResNet [87] as the backbone. The goal of its creators was not only to obtain superior performance, but to achieve a balance between speed and performance; hence, BiSeNet is a relatively fast semantic segmentation model.

3.4.2 SegNet (Encoder-Decoder-Skip)

Similarly to BiSeNet, SegNet is also designed with computational performance in mind, this time particularly during inference. Because of this, the network has a significantly smaller number of trainable parameters compared to most other architectures. The encoder in SegNet is based on VGG16: it consists of its first 13 convolutional layers, while the fully connected layers are omitted.


Figure 4: The architecture of BiSeNet. ARM stands for the Attention Refinement Module and FFM for the Feature Fusion Module introduced in the model's paper [76].

Figure 5: The architecture of the SegNet-based Encoder-Decoder with Skip connections [74]. Blue tiles represent Convolution + Batch Normalisation + ReLU, green tiles represent Pooling, red – Upsampling, and yellow – a softmax operation.

The novelty of this network thus lies in its decoder part, as follows. The decoder consists of one decoder layer for each encoder layer, so it also has 13 layers. Each individual decoder layer utilizes max-pooling indices memorized from its corresponding encoder feature map. The authors showed that this enhances boundary delineation between classes. Finally, the decoder output is sent to a multi-class softmax function, yielding a classification for each pixel (see Figure 5).
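The max-pooling indices that SegNet memorizes are readily available in TensorFlow; a minimal sketch:

```python
import tensorflow as tf

# tf.nn.max_pool_with_argmax returns both the pooled values and the flat
# indices of the maxima; a SegNet-style decoder reuses these indices to
# place activations back at their original locations when upsampling.
features = tf.random.normal([1, 4, 4, 1])
pooled, argmax = tf.nn.max_pool_with_argmax(
    features, ksize=2, strides=2, padding="SAME")
print(pooled.shape, argmax.shape)   # both (1, 2, 2, 1)
```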

3.4.3 Mobile U-Net

Mobile U-Net is based on the U-Net [97] semantic segmentation architecture, shown in Figure 6. In designing U-Net, the fully convolutional approach was generally employed, with the following modification: the upsampling part of the architecture has no fully connected layers, but is nearly symmetrical to the feature extraction part, due to the use of similar feature maps. This results in a u-shaped architecture (see Figure 6), hence the name of the model. While originally developed for biomedical images, the U-Net architecture has proven successful for image segmentation in other domains as well. Here, we somewhat modify the U-Net architecture according to the MobileNets [80] framework to improve its efficiency. In particular, the MobileNets framework uses Depthwise Separable Convolutions, a form which factorizes standard convolutions (e.g., 3 × 3) into a depthwise convolution (applied separately to each input band) and a pointwise (1 × 1) convolution that combines the outputs of the depthwise convolution.
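A small sketch contrasting a standard convolution with its depthwise separable counterpart in Keras; the channel sizes are arbitrary and chosen only to show the parameter savings.

```python
import tensorflow as tf

inputs = tf.keras.Input(shape=(512, 512, 32))

# Standard 3x3 convolution: 32 * 64 * 3 * 3 weights (+ 64 biases).
standard = tf.keras.layers.Conv2D(64, 3, padding="same")(inputs)

# Depthwise separable version: a 3x3 depthwise convolution per input band,
# followed by a 1x1 pointwise convolution that mixes the bands.
separable = tf.keras.layers.SeparableConv2D(64, 3, padding="same")(inputs)

m1 = tf.keras.Model(inputs, standard)
m2 = tf.keras.Model(inputs, separable)
print(m1.count_params(), m2.count_params())  # 18496 vs. 2400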

Figure 6: The architecture of U-Net [97].

3.4.4 DeepLab-V3+

DeepLab-V3+ [77] is an improved version of DeepLab-V3 [98], while the latter is an improved version of the original DeepLab [78] model. This segmentation model does not follow the FCN framework like the previously discussed models.


The main features that distinguish the DeepLab model from FCNs are the atrous convolutions for upsampling and the application of probabilistic machine learning models, concretely conditional random fields (CRFs), for finer localization accuracy in the final fully connected layer. Atrous convolutions, in particular, allow enlarging the context from which the next-layer feature maps are learned, while preserving the number of parameters (and thus the same efficiency). Using a chain of atrous convolutions allows the final output layer of a CNN to be computed at an arbitrarily high resolution (removing the need for the upsampling part used in FCNs). In the follow-up work proposing DeepLab-V3, Chen et al. [98] changed the approach to atrous convolutions, gradually doubling the atrous rates, and showed that the adapted version of their new algorithm outperforms the previous one, even without including the fully connected CRF layer. Finally, in their newest adaptation of the model, called DeepLab-V3+, Chen et al. [77] turned to an approach similar to the FCNs, i.e., they added a decoder module to the architecture (see Figure 7). That is, they employ the features extracted by the DeepLab-V3 module in the encoder part, and add a decoder module consisting of 1 × 1 and 3 × 3 convolutions.
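A minimal sketch of atrous convolutions in Keras: increasing the dilation rate enlarges the receptive field while the parameter count of each layer stays constant (the feature-map size here is arbitrary).

```python
import tensorflow as tf

x = tf.keras.Input(shape=(64, 64, 256))

# Three 3x3 convolutions with increasing dilation (atrous) rates.  The
# output resolution and the number of weights are the same for each; only
# the spatial context seen by each output pixel grows.
for rate in (1, 2, 4):
    y = tf.keras.layers.Conv2D(
        256, 3, padding="same", dilation_rate=rate)(x)
    print(rate, y.shape)
```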

Figure 7: The architecture of DeepLabV3+ [77].

3.4.5 FRRN-B (Full-Resolution Residual Networks)

As we have seen, most semantic segmentation architectures are based on some form of FCN, and so they utilize existing classification networks, such as ResNet or VGG16, as encoders.


Figure 8: The architecture of FRRN-B. RU n and FRRU n stand for residual units and full-resolution residual units with n-channel convolutions, respectively. FRRUs simultaneously operate on the two streams [81].

The main reason for such approaches, as discussed, is to take advantage of the learned weights from those architectures, pretrained for the classification task. Nevertheless, one disadvantage of the FCN approach is that the resulting network outputs of the encoder part (particularly after the pooling operations) are at a lower resolution, which deteriorates the localization performance of the overall segmentation model. Pohlen et al. [81] proposed to tackle this by having two parallel network streams processing the input image: a pooling stream and a residual stream (Figure 8). As the name says, the pooling stream performs successive pooling and then unpooling operations, and it serves to obtain good recognition of objects and classes. The residual stream computes residuals at the full image resolution, which ensures that low-level features, i.e., object pixel-level locations, are propagated to the network output. The name of the model comes from its building blocks, i.e., full-resolution residual units. Each such unit simultaneously operates on the pooling and the residual stream. In the original paper [81], the authors propose two alternative architectures, FRRN-A and FRRN-B, and they show that


FRRN-B achieves superior performance on the Cityscapes benchmark dataset. Hence, we employed the FRRN-B architecture.

3.4.6 PSPNet (Pyramid Scene Parsing Network)

Figure 9: The architecture of PSPNet [75].

Zhao et al. [75] propose the Pyramid Scene Parsing Network as a solution to the challenge of making local predictions based on the local context only, without considering the global image scene. In remote sensing, an example of this challenge could be a model wrongly predicting water with waves in it as the dry vegetation class, because the two appear similar and the model does not consider that these pixels are part of a larger water surface, i.e., it misses the global context. Similarly to the other FCN-based approaches, PSPNet uses a pre-trained classification architecture to extract the feature map, in this case ResNet. The main module of this network is the pyramid pooling, which is enclosed by a square in Figure 9. As can be seen in the figure, this module fuses features at four scales, from coarse (red) to fine (green). Hence, the output of each level in the pyramid pooling module contains a feature map of a different resolution. In the end, the different features are stacked together, yielding the final pyramid pooling global feature for predictions.
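A schematic sketch of the pyramid pooling idea in Keras notation, assuming a 48 × 48 input feature map with illustrative bin sizes and channel counts; the original PSPNet configuration differs in detail.

```python
import tensorflow as tf
from tensorflow.keras import layers

def pyramid_pooling(feature_map, bin_sizes=(1, 2, 3, 6)):
    """Pool the feature map into a few coarse grids, project each with a
    1x1 convolution, upsample back, and stack with the original map."""
    h, w = feature_map.shape[1], feature_map.shape[2]
    branches = [feature_map]
    for bins in bin_sizes:
        x = layers.AveragePooling2D(pool_size=(h // bins, w // bins))(feature_map)
        x = layers.Conv2D(64, 1)(x)   # reduce channels per pyramid level
        x = layers.UpSampling2D(size=(h // bins, w // bins),
                                interpolation="bilinear")(x)
        branches.append(x)
    return layers.Concatenate()(branches)

inputs = tf.keras.Input(shape=(48, 48, 512))
fused = pyramid_pooling(inputs)
print(fused.shape)   # (None, 48, 48, 512 + 4 * 64)
```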

3.4.7 FC-DenseNet (Fully Convolutional DenseNets)

This semantic segmentation algorithm is built using the DenseNet CNN [88] as a basis for the encoder, followed by applying the FCN approach [82]. The specificity of the DenseNet architecture is the presence of blocks in which each layer is connected to all other layers in a feed-forward manner. Figure 10 shows the architecture of FC-DenseNet, where the blocks are represented by the Dense Block units. According to [88], such an architecture scales well to hundreds of layers without any optimization issues, while yielding excellent results in classification tasks. In order to efficiently upsample the DenseNet feature maps, Jégou et al. [82] substitute the upsampling convolutions of FCNs with Dense Blocks and Transitions Up. The Transition Up modules consist of transposed convolutions, which are then concatenated with the outputs from the input skip connection (the dashed lines in Figure 10).
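A minimal sketch of a dense block in Keras notation; the number of layers and the growth rate are illustrative, not the FC-DenseNet103 configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers

def dense_block(x, num_layers=4, growth_rate=16):
    """Each layer's output is concatenated with all previous feature maps
    before being fed to the next layer (the DenseNet connectivity)."""
    features = [x]
    for _ in range(num_layers):
        y = layers.Concatenate()(features) if len(features) > 1 else features[0]
        y = layers.BatchNormalization()(y)
        y = layers.ReLU()(y)
        y = layers.Conv2D(growth_rate, 3, padding="same")(y)
        features.append(y)
    return layers.Concatenate()(features)

inputs = tf.keras.Input(shape=(128, 128, 48))
out = dense_block(inputs)
print(out.shape)   # (None, 128, 128, 48 + 4 * 16)
```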


Figure 10: The architecture of FC-DenseNet [82].

3.5 Training approach

To accomplish better segmentation performance, there is the option of pre-training the semantic segmentation models (in particular, their encoder modules) using a larger set of available images of another type (such as natural images). When a model pre-trained on natural images is used to continue training with a limited set of SAR images, knowledge is effectively transferred from the natural-image task to the SAR task [99]. To accomplish such a transfer, we used models whose encoders were pre-trained on the ImageNet classification task, and fine-tuned them using our SAR dataset (described next).
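A minimal sketch of this transfer-learning recipe in Keras, assuming an ImageNet-pre-trained ResNet50 encoder and an illustrative one-layer decoder; the actual models we fine-tuned are those listed in Section 3.4.

```python
import tensorflow as tf

# Encoder pre-trained on ImageNet; the small decoder head is new.
encoder = tf.keras.applications.ResNet50(
    include_top=False, weights="imagenet", input_shape=(512, 512, 3))
head = tf.keras.Sequential([
    tf.keras.layers.Conv2D(5, 1),                              # 5 CLC classes
    tf.keras.layers.UpSampling2D(32, interpolation="bilinear"),
    tf.keras.layers.Softmax()])
model = tf.keras.Model(encoder.input, head(encoder.output))

encoder.trainable = False        # 1) warm up the new head on SAR imagelets
model.compile(optimizer=tf.keras.optimizers.RMSprop(1e-3),
              loss="sparse_categorical_crossentropy")
# model.fit(...)

encoder.trainable = True         # 2) then fine-tune end-to-end, smaller LR
model.compile(optimizer=tf.keras.optimizers.RMSprop(1e-4),
              loss="sparse_categorical_crossentropy")
# model.fit(...)
```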

3.6 Experimental Setup

In this section, we first describe how we prepared the SAR images for training with the deep learning models, which are designed for natural images. Then we provide the details of our implementation.

3.6.1 SAR Data Preprocessing for Deep Learning

Sentinel-1 imagery comes in two polarization channels, each of them being informative about certain types of land cover. Hence, using their combination is expected to yield better land cover mapping results than using either of them independently. Moreover, previous work has shown the benefits of also using a DEM for land cover mapping [9]. Hence, as the third layer, we used the DEM of Finland from the National Land Survey.

The SAR backscatter for both polarizations was converted to decibels by applying the 10 · log10 transformation. In addition, for the deep learning models, each band should be normalized so that the distribution of pixel values resembles a Gaussian distribution centered at zero; this is done to yield faster convergence during training. To normalize the data, each pixel value is reduced by the mean of all pixels and then divided by their standard deviation. In addition, given that the semantic segmentation models expect pixel values in the range (0, 255), we scaled the normalized data, and also the DEM values, to this range. The preprocessed layers were then used to create the image dataset for training.

We named the created dataset SAR RGB-DEM. The name comes from the process used to create the images in this dataset: one of the two channels of a Sentinel-1 image is assigned to the R and the other to the G channel, while for the third, B channel, we use the DEM layer.
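The preprocessing chain described above can be summarised by the following NumPy sketch; the arrays are placeholders for co-registered VV, VH and DEM rasters, and the exact scaling details in our pipeline may differ.

```python
import numpy as np

def to_db(backscatter, eps=1e-6):
    """Convert linear backscatter to decibels (eps guards against log(0))."""
    return 10.0 * np.log10(np.maximum(backscatter, eps))

def normalize_to_byte_range(band):
    """Zero-mean/unit-variance normalisation, then rescale to (0, 255)."""
    z = (band - band.mean()) / band.std()
    z = (z - z.min()) / (z.max() - z.min())
    return z * 255.0

# vv, vh: Sentinel-1 backscatter bands; dem: elevation on the same grid.
vv, vh, dem = (np.random.rand(512, 512) for _ in range(3))  # placeholders
r = normalize_to_byte_range(to_db(vv))
g = normalize_to_byte_range(to_db(vh))
b = normalize_to_byte_range(dem)
sar_rgb_dem = np.dstack([r, g, b]).astype(np.uint8)   # one training image
```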

3.6.2 Train/Development and Test (Accuracy Assessment) Dataset

The images from the SAR RGB-DEM dataset needed to be split into 512 px × 512 px partial images (further in the text called imagelets) for training. Thus, each imagelet represented an area of roughly 10 × 10 km². The first reason for this preprocessing concerns the square shape: some of the selected models required square-shaped images. Others were flexible with the image shape and size, but we wanted the setups for all the models to be the same, so that their results would be comparable. The second reason concerns computational capacity: with our hardware setup (described below), this was the largest image size we could work with.

Upon splitting the SAR RGB-DEM images, we discarded those imagelets that were completely outside the land mass area, as well as those for which we did not have a complete CORINE label (e.g., if they fell partly outside the Finnish borders). This resulted in more than 7,000 imagelets of size 512 px × 512 px.
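A minimal sketch of the imagelet extraction, assuming a scene already stacked as an (H, W, 3) array; the land-mass and CORINE-coverage filtering described above would be applied on top of this.

```python
import numpy as np

def split_into_imagelets(scene, tile=512):
    """Cut an (H, W, C) scene into non-overlapping tile x tile imagelets,
    dropping incomplete tiles at the right and bottom edges."""
    h, w = scene.shape[:2]
    return [scene[i:i + tile, j:j + tile]
            for i in range(0, h - tile + 1, tile)
            for j in range(0, w - tile + 1, tile)]

scene = np.zeros((12000, 9000, 3), dtype=np.uint8)   # placeholder scene
imagelets = split_into_imagelets(scene)
print(len(imagelets))   # 23 * 17 = 391 imagelets
```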

Given the geography of Finland, for representative training data it seemed useful to include imagelets from both the northern and the southern (including the large cities) parts of the country in the model training. On the other hand, noticeable differences are also found in the gradient from the east to the west of the country. To achieve a representative training dataset, we selected all imagelets between the longitudes of 24° and 28° for the accuracy assessment (model testing), and all the other imagelets for model training (that is, training and development in computer vision terminology). In this way, we prevented the situation in which two images of the same area, but acquired at different times, are used one for training and the other for testing. Images overlapping any border of the introduced strip were discarded. The procedure resulted in 3,104 images in the training and development set and 3,784 images in the test (accuracy assessment) set. Finally, we used 60% of the training and development set for training and the rest for the development of the deep learning models.


Table 3: The properties of the examined semantic segmentation architectures.

Architecture | Base model | Parameters
BiSeNet | ResNet101 | 24.75M
SegNet | VGG16 | 34.97M
Mobile U-Net | Not applicable | 8.87M
DeepLabV3+ | ResNet101 | 47.96M
FRRN-B | ResNet101 | 24.75M
PSPNet | ResNet101 | 56M
FC-DenseNet | ResNet101 | 9.27M

3.6.3 Data Augmentation

Further, we employed the data augmentation technique. The main idea behind data augmentation is to enable improved learning by reusing original images with slight transformations, such as rotation, flipping, adding Gaussian noise, or slightly changing the brightness. This provides additional information to the model, and the dataset size is effectively increased. Moreover, an additional benefit of data augmentation is in helping the model learn some invariant data properties for which no examples are present in the original dataset. Given the sensitivity of the SAR backscatter, we did not want to augment the images in terms of color or brightness, or by adding noise. However, we could safely employ rotations and flipping. For rotations, we only used 90° increments, giving three possible rotated versions of an image. For image flipping, we applied horizontal and vertical flipping, or both at the same time, giving another three possible versions of the original image⁴. Notice that our images are square, so the transformations did not change the image dimensions. Finally, we applied online augmentation, as opposed to the offline version. In the online process, each augmented image is seen only once, and so this process yields a network that generalises better.
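A sketch of the geometric augmentations used (90° rotations and flips), applied identically to an imagelet and its label mask; the random-choice logic is illustrative.

```python
import numpy as np

def random_augment(image, label, rng=None):
    """Apply a random multiple-of-90-degree rotation and/or horizontal and
    vertical flips, identically to the image and its label mask."""
    rng = rng or np.random.default_rng()
    k = rng.integers(0, 4)                  # 0, 90, 180 or 270 degrees
    image, label = np.rot90(image, k), np.rot90(label, k)
    if rng.integers(0, 2):                  # horizontal flip
        image, label = np.flip(image, axis=1), np.flip(label, axis=1)
    if rng.integers(0, 2):                  # vertical flip
        image, label = np.flip(image, axis=0), np.flip(label, axis=0)
    return image, label
```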

3.6.4 Implementation

To apply the described semantic segmentation models, we adapted the open-source Semantic Segmentation Suite. We used Python with the TensorFlow [100] backend.

3.6.5 Hardware and Training Setup

We trained and tested each of the deep learning models separately on a single GPU (NVIDIA GeForce GTX 1080) in a machine with 32 GB of RAM.

We used the RMSProp optimisation algorithm, a learning rate of 0.0001, and a learning-rate decay of 0.9954. Each model was trained for an equal number of epochs (500), and during the process the checkpoint for the best model was saved.

⁴ The vertical flip operation switches between top-left and bottom-left image origin (reflection along the central horizontal axis), and the horizontal flip switches between top-left and top-right image origin (reflection along the central vertical axis).


We then used the saved best model for evaluation on the test set, and we report those results.
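In Keras notation, this training configuration could be expressed as follows; whether the 0.9954 decay is applied per epoch or per step is an assumption here, and steps_per_epoch is a placeholder for the actual number of batches.

```python
import tensorflow as tf

steps_per_epoch = 1862   # placeholder: training-set size / batch size
schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-4,   # learning rate of 0.0001
    decay_steps=steps_per_epoch,  # assumed: decay applied once per epoch
    decay_rate=0.9954,
    staircase=True)
optimizer = tf.keras.optimizers.RMSprop(learning_rate=schedule)
```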

3.7 Evaluation Metrics

In a review of the metrics used in land cover classification, Costa et al. [101] found a lack of consistency, complicating the intercomparison of different studies. To avoid such issues and to ensure that our results are easily comparable with the literature, we thoroughly evaluated our models. For each model and class, we report the following accuracy measures: precision, also known as user's accuracy (UA); recall, also known as producer's accuracy (PA); the overall accuracy; and the Kappa coefficient. The formulas are as follows.

For each segmentation class (land cover type) c, we calculate the precision (user's accuracy)

$$P_c = \frac{T_{p_c}}{T_{p_c} + F_{p_c}}$$

and the recall (producer's accuracy)

$$R_c = \frac{T_{p_c}}{T_{p_c} + F_{n_c}}$$

where $T_{p_c}$ represents true positive, $F_{p_c}$ false positive, and $F_{n_c}$ false negative pixels for the class c.

When it comes to accuracy [102], we calculate the per-class accuracy⁵

$$Acc_c = \frac{C_{ii}}{G_i}$$

and the overall pixel accuracy

$$Acc_{OP} = \frac{\sum_{i=1}^{L} C_{ii}}{\sum_{i=1}^{L} G_i}$$

where $C_{ij}$ is the number of pixels having a ground truth label i and being classified/predicted as j, $G_i$ is the total number of pixels labelled with i, and L is the number of classes. All these metrics can take values from 0 to 1.

Finally, we also use the Kappa statistic (Cohen's measure of agreement), indicating how the classification results compare to values assigned by chance [103]. The Kappa statistic can take values from 0 to 1. Starting from a k × k confusion matrix with elements $f_{ij}$, the following calculations are done:

⁵ Effectively, the per-class accuracy is defined as the recall obtained on each class.


$$P_o = \frac{1}{N} \sum_{j=1}^{k} f_{jj} \quad (1)$$

$$r_i = \sum_{j=1}^{k} f_{ij} \;\; \forall i \qquad \text{and} \qquad c_j = \sum_{i=1}^{k} f_{ij} \;\; \forall j \quad (2)$$

$$P_e = \frac{1}{N^2} \sum_{i=1}^{k} r_i c_i \quad (3)$$

where N is the total number of pixels, $P_o$ is the observed proportional agreement (effectively the overall accuracy), $r_i$ and $c_j$ are the row and column totals for classes i and j, and $P_e$ is the expected proportion of agreement. The final measure of agreement is given by the statistic [103]

$$\kappa = \frac{P_o - P_e}{1 - P_e} \quad (4)$$

Depending on the value of Kappa, the observed agreement is considered either poor (0.0 to 0.2), fair (0.2 to 0.4), moderate (0.4 to 0.6), good (0.6 to 0.8), or very good (0.8 to 1.0).
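The metrics above can be computed directly from a confusion matrix; the following sketch implements the formulas of this section (precision/UA, recall/PA, overall accuracy, and Kappa) on a toy example.

```python
import numpy as np

def evaluation_metrics(f):
    """Compute the metrics of Section 3.7 from a k x k confusion matrix f,
    where f[i, j] counts pixels with reference class i predicted as j."""
    n = f.sum()
    tp = np.diag(f).astype(float)
    precision = tp / f.sum(axis=0)     # user's accuracy (column-wise)
    recall = tp / f.sum(axis=1)        # producer's accuracy (row-wise)
    po = tp.sum() / n                  # observed agreement = overall accuracy
    pe = (f.sum(axis=1) * f.sum(axis=0)).sum() / n**2   # chance agreement
    kappa = (po - pe) / (1.0 - pe)
    return precision, recall, po, kappa

# Toy 3-class confusion matrix.
f = np.array([[50,  2,  3],
              [ 4, 60,  6],
              [ 1,  5, 70]])
precision, recall, acc, kappa = evaluation_metrics(f)
print(round(acc, 3), round(kappa, 3))
```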

4 Results and Discussion

Using the experimental setup described in the previous section, we evaluated the seven selected semantic segmentation models: SegNet [74], PSPNet [75], BiSeNet [76], DeepLabV3+ [77, 78], U-Net [79, 80], FRRN-B [81], and FC-DenseNet [82]. The overall classification performance statistics for all studied models are gathered in Table 4. Figure 11 shows maps produced for several imagelets with the best performing model, FC-DenseNet. The obtained results are compared to prior work, and the classification performance for the different land cover classes is discussed further below.

4.1 Classification Performance

All the models performed relatively well in terms of classification, achieving an overall accuracy above 83%. Three models performed particularly well, achieving accuracy scores above 89%: SegNet, FRRN-B, and the best performing model, FC-DenseNet, which achieved an accuracy of 90.7%.

Before further analysis, let us recall that CORINE is not exclusively a land cover map, but rather a land cover and land use map; thus, specific classes can differ from the ecological classes observed by Sentinel-1. Also, the aggregation to Level-1 is sometimes not strictly "ecological", nor does it fully comply with physical surface scattering considerations. For example, roads, airports, and major industrial areas, as well as the road network, often exhibit areas similar to fields; the presence of trees and green


Table 4: Summary of the classification performance and efficiency of the various deep learning models (UA – user's accuracy, PA – producer's accuracy; average inference time is per image in the dataset).

LC classes | Test scale (km²) | BiSeNet | DeepLabV3+ | SegNet | FRRN-B | U-Net | PSPNet | FC-DenseNet
Accuracy (UA/PA, %):
Urban fabric (100) | 10816 | 26/21 | 15/14 | 36/31 | 38/30 | 45/25 | 38/18 | 62/27
Agricultural areas (200) | 25160 | 49/51 | 50/49 | 69/66 | 68/68 | 66/66 | 53/48 | 72/71
Forested areas (300) | 285462 | 90/91 | 88/96 | 93/94 | 92/95 | 92/95 | 89/95 | 93/96
Peatland, bogs and marshes (400) | 20990 | 54/43 | 56/13 | 67/57 | 71/55 | 70/52 | 65/31 | 74/58
Water bodies (500) | 53564 | 85/91 | 94/92 | 96/96 | 95/96 | 96/96 | 94/94 | 96/96
Overall Accuracy (%) | | 83.86 | 85.49 | 89.03 | 89.27 | 89.25 | 86.51 | 90.66
Kappa | | 0.641 | 0.649 | 0.754 | 0.758 | 0.754 | 0.680 | 0.785
Average inference time (s) | | 0.0389 | 0.0267 | 0.0761 | 0.1424 | 0.0848 | 0.0495 | 0.1930

PS

PN

etF

C-D

ense

Net

Urb

anfa

bri

c(1

00)

1081

626

21

15

14

3631

38

30

45

25

38

18

62

27

Agr

icu

ltu

ral

area

s(2

00)

2516

049

51

50

49

69

66

68

68

66

66

53

48

7271

For

este

dar

eas

(300

)28

5462

90

91

88

96

93

94

92

95

92

95

89

95

9396

Pea

tlan

d

bog

san

dm

arsh

es(4

00)

2099

054

43

56

13

67

57

71

55

70

52

65

31

7458

Wat

erb

od

ies

(500

)53

564

85

91

94

92

96

96

9596

96

96

94

94

96

96

Overa

llAccura

cy(

)838

6854

9890

3892

7892

5865

19066

Kappa

06

41

06

49

07

54

07

58

07

54

06

80

0785

Avera

geinfere

ncetime(s)

00

389

00267

00

761

01

424

00

848

00

495

01

930

25

Table 5 Confusion matrix for classification with FC-DenseNet modelFC-DenseNet103

CLC2012 Sentinel-1 classurban water forest field peatland total PA

1 7301999 413073 15892771 3212839 221476 27042158 2702 78331 128294872 3457634 171029 1935276 133937142 9583 3663698 2703632 686788977 12795703 7730444 713682454 9624 766200 121609 16527970 44866048 620934 62902761 7135 56097 1866020 19164137 1091008 30309189 52486451 578

total 11866325 133399206 741831489 62136627 40817319 990050966UA 615 962 926 722 743 907

Figure 11 Illustration of the FC-DenseNet model performance selection of classificationresults ie direct output of the network without any post-processing (bottom row) versusreference Corine data (upper row)

vegetation near summer cottages can cause them exhibit signatures close to for-est rather than urban sometimes forest on the rocky terrain can be misclassifiedas urban instead due to presence of very bright targets and strong disruptivefeatures while confusion between peatland and field areas is also often a com-mon place Finally the accuracy of the CORINE data is only somewhat higherthan 90

As for the results across the different land classes all the models performedparticularly well in recognising the water bodies and forested areas while theurban fabric represented the most challenging class for all the models We ex-pect that the inclusion of the DEM as one layer in the training images hashelped to achieve good results on the water bodies class for most of the mod-els (except for BiSeNet all the models achieved both the user and produceraccuracy above 90) The urban class was particularly challenging for the fol-lowing main reasons First this class changes the most as new houses roadsand urban areas are built While we took the most suitable available CORINEclass in terms of time for our Sentinel-1 images there are almost certain dif-

26

ferences between the urban class as it was in 2012 and in 2015-2016 Secondthe CORINE map itself does not have a perfect accuracy neither aggregationrules are perfect As a matter of fact in majority of studies where SAR basedclassification was done versus CLC or similar data a poor or modest overallagreement was observed for this class [21 41 83 20] while the userrsquos accuracywas strongly higher than producerrsquos [104] The latter is exactly due to radarbeing able to sense sharp boundaries and bright targets very well whereas suchbright targets often donrsquot dominate the whole CORINE Level-1 urban classWe argue that any inaccuracies present will be particularly attenuated in ourmodels for the urban class because of the sharp and sudden boundary changesin this class unlike for the others such as forest and water The top performingmodel ie FC-DenseNet performed the best across all the classes It is par-ticularly notable that it achieved the user accuracy ie precision for the urbanclass of 62 improving on it significantly compared to all the other modelsNevertheless its score on the producer accuracy ie recall on this class of 27is outperformed by the two other top models ie SegNet and FRRN-B

We mentioned the issues of SAR backscattering sensitivity to several groundfactors so that the same classes might appear differently on the images betweencountries or between distant areas within a country An interesting indicationof our study however is that the deep learning models might be able to dealwith this issue Namely we used the models pre-trained on ImageNet and finetuned them with a relatively small number (14) of Sentinel-1 scenes The modelslearned to recognize varying types of the backscattering signal across the countryof Finland This indicates that with a similar type of fine-tuning present modelscould be relatively easily adapted to the other areas and countries with differentSAR backscattering patterns Such robustness and adaptability of the deeplearning models come from their automatic learning of feature representationwithout the need for a human pre-defining those features

42 Computational Performance

The training times with our hardware configuration took from 6 days up to 2weeks for the different models This could be significantly improved by trainingeach model using a multi-GPU system instead of a single-GPU as we did

In terms of the inference time we also saw the differences in the perfor-mance In Table 4 we present the average inference time per the 512pxtimes512pximagelets that we worked with The results show that there is a trade-off be-tween classification and computational performance the best models in terms ofclassification results (ie FC-DenseNet and FRRN-B) take several times longerinference time compared to the rest Depending on the application this mightnot be of particular importance

43 Comparison to Similar Work

Obtained results compare favourably to previous similar studies on landcover classification with SAR data [83 20 21 41 28 31] Depending on the levelof classes aggregation (4-5 major classes or more) with using mostly statistical

27

or classical machine learning approaches reported classification accuracies wereas high as 80-87 to as low as 30 when only SAR imagery were used

Two recent studies that employed neural networks to SAR imagery classifi-cation (albeit in combination with satellite optical data) for land cover mappingwere [28] and [66] with reported classification accuracies of up to 975 and946 respectively

The best model in our experiments achieved the overall accuracy of 907However our results are obtained using solely the SAR imagery In contrastSAR imagery (PALSAR) alone yielded the overall accuracy of 781 in [28]The types of classes they studied are also different compared to ours (cropsversus vegetation versus land cover types) and our study is performed on alarger area Importantly the previous studies have applied different types ofmodels (regular NNs versus CNN versus semantic segmentation) In particularthe CNN models work on the 7 times 7 resolution windows while we have appliedmore advanced semantic segmentation models which work on the level of apixel Keeping in mind findings from [28] that the addition of optical imageson top of SAR improved the results for over 10 we expect that our modelswould perform comparably well or outperform these previous works if appliedto a combined SAR and optical imagery

In terms of the deep learning setup the most similar to ours are the studies[53] and [70] However RapidEye optical imagery at 5 m spatial resolutionwas used in [53] and the test site was considerably smaller Study [70] similarto our research relied exclusively on SAR imagery however fully polarimetricimages and acquired by RADARSAT-2 at considerably better resolution Theyhave developed an FCN-type of a semantic segmentation model lsquospecificallydesigned for the classification of wetland complexes using PolSAR imageryrsquoUsing this model to classify eight wetland map classes they achieved the overallaccuracy of 93 However because their model is designed specifically forwetland complexes it is not clear if such a model would generalize to othertypes of areas Compared to our study they have focused on a considerablysmaller area (nearly the size of a single imagelet we used) and on a very specifictask (wetland types mapping) Thus it is not readily clear how general theirapproach is and how it compares to our presented approach

44 Outlook and Future Work

There are several lines for potential improvement based on the results of thisstudy as well as future work directions

First using even a larger set of Sentinel-1 images can be recommended sincefor the supervised deep learning models large amounts of data are crucial Herewe processed only 6888 imagelets altogether but deep learning algorithms be-come efficient typically only once they are trained with hundreds of thousandsor millions of images

Second if SAR images and reference data of a higher resolution are used weexpect better classification performance too as smaller details could be poten-tially captured Also better agreement in acquisition timing of reference and

28

SAR imagery can be recommended The reference and training data shouldcome from the same months or year if possible and that the reference mapsshould represent the reality as accurately as possible The models in our exper-iments were certainly limited by the CORINErsquos own limited accuracy

Third in this study we have tested the effectiveness of off-the-shelf deeplearning models for land cover mapping from SAR data While the results showtheir effectiveness it is also likely that the novel types of models specificallydeveloped for the radar data (such as [70]) will yield even better results Basedon our results we suggest DenseNet-based models as a starting point In par-ticular one could develop the deep learning models to handle directly the SLCdata which preserve the phase information

Focusing on a single season is both an advantage and a limitation Impor-tantly we have avoided confusion between SAR signatures varying seasonallyfor several land cover classes However multitemporal dynamics itself can bepotentially used as an additional useful class-discriminating parameter Incor-porating seasonal dynamics of each land cover pixel (as a time series) is leftfor future work perhaps with additional need to incorporate recurrent neuralnetworks into the approach

As discussed in Section 311 it could be suitable to use more detailed(specific) land cover classes as aggregation of smaller LC classes into Level-1 CORINE classes is not exactly ecological leading to mixing several distinctSAR signatures in one class and thus causing additional confusion for the clas-sifier Later classified specific classes can be aggregated into larger classespotentially showing improved performance [19]

Finally we have used only SAR images and a freely-available DEM modelfor the presented large-scale land cover mapping If one were to combine othertype of remote sensing images in particular the optical images we expect thatthe results would significantly improve This is true for those areas where suchimagery can be collected due to cloud coverage while in operational scenario itwould potentially require use of at least two models (with and without opticalsatellite imagery) It is also important to access added value of SAR imagerywith deep learning models when optical satellite images are available as wellas possible data fusion and decision fusion scenarios before a decision on themapping approach is done [19]

5. Conclusion

Our study demonstrated the potential for applying state-of-the-art semantic segmentation models to SAR image classification with high accuracy. Several models were benchmarked in a countrywide classification experiment using Sentinel-1 IW-mode SAR data, reaching nearly 91% overall classification accuracy with the best performing model (FC-DenseNet). Given that the 14 used Sentinel-1 scenes resulted in about 7K training images, this indicates strong potential for using pre-trained CNNs for further fine-tuning, and seems particularly suitable when the number of training images is limited (to thousands or tens of thousands instead of millions). In addition to suggesting the best candidate semantic segmentation models for land cover mapping with SAR data (that is, the DenseNet-based models), our study offers baseline results against which newly proposed models should be evaluated. Several possible improvements were identified, including the necessity of testing multitemporal approaches, data fusion, and very high-resolution SAR imagery, as well as developing models specifically for SAR; these will be addressed in future work.

References

[1] S. Bojinski, M. Verstraete, T. C. Peterson, C. Richter, A. Simmons, M. Zemp, The concept of essential climate variables in support of climate research, applications, and policy, Bulletin of the American Meteorological Society 95 (9) (2014) 1431–1443.

[2] G. Büttner, J. Feranec, G. Jaffrain, L. Mari, G. Maucha, T. Soukup, The CORINE land cover 2000 project, EARSeL eProceedings 3 (3) (2004) 331–346.

[3] M. Bossard, J. Feranec, J. Otahel, et al., CORINE land cover technical guide: Addendum 2000 (2000).

[4] G. Büttner, CORINE land cover and land cover change products, in: Land Use and Land Cover Mapping in Europe, Springer, 2014, pp. 55–74.

[5] M. Törmä, T. Markkanen, S. Hatunen, P. Härmä, O.-P. Mattila, A. Arslan, Assessment of land-cover data for land-surface modelling in regional climate studies, Boreal Environment Research 20 (2) (2015) 243–260.

[6] J. Chen, J. Chen, A. Liao, X. Cao, L. Chen, X. Chen, C. He, G. Han, S. Peng, M. Lu, et al., Global land cover mapping at 30 m resolution: A POK-based operational approach, ISPRS Journal of Photogrammetry and Remote Sensing 103 (2015) 7–27.

[7] C. A. d. Almeida, A. C. Coutinho, J. C. D. M. Esquerdo, M. Adami, A. Venturieri, C. G. Diniz, N. Dessay, L. Durieux, A. R. Gomes, High spatial resolution land use and land cover mapping of the Brazilian Legal Amazon in 2008 using Landsat-5/TM and MODIS data, Acta Amazonica 46 (3) (2016) 291–302.

[8] C. Homer, J. Dewitz, L. Yang, S. Jin, P. Danielson, G. Xian, J. Coulston, N. Herold, J. Wickham, K. Megown, Completion of the 2011 National Land Cover Database for the conterminous United States – representing a decade of land cover change information, Photogrammetric Engineering & Remote Sensing 81 (5) (2015) 345–354.

[9] Y. Zhao, D. Feng, L. Yu, X. Wang, Y. Chen, Y. Bai, H. J. Hernández, M. Galleguillos, C. Estades, G. S. Biging, et al., Detailed dynamic land cover mapping of Chile: Accuracy improvement by integrating multi-temporal data, Remote Sensing of Environment 183 (2016) 170–185.


[10] P. Griffiths, C. Nendel, P. Hostert, Intra-annual reflectance composites from Sentinel-2 and Landsat for national-scale crop and land cover mapping, Remote Sensing of Environment 220 (2019) 135–151.

[11] R. Torres, P. Snoeij, D. Geudtner, D. Bibby, M. Davidson, E. Attema, P. Potin, B. Rommen, N. Floury, M. Brown, I. Traver, P. Deghaye, B. Duesmann, B. Rosich, N. Miranda, C. Bruno, M. L'Abbate, R. Croci, A. Pietropaolo, M. Huchler, F. Rostan, GMES Sentinel-1 mission, Remote Sensing of Environment 120 (2012) 9–24. doi:10.1016/j.rse.2011.05.028.

[12] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE 86 (11) (1998) 2278–2324.

[13] A. Krizhevsky, I. Sutskever, G. E. Hinton, Imagenet classification with deep convolutional neural networks, in: Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.

[14] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, CoRR abs/1409.1556 (2014).

[15] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning (2016).

[16] W. Cohen, S. Goward, Landsat's role in ecological applications of remote sensing, BioScience 54 (6) (2004) 535–545. doi:10.1641/0006-3568(2004)054[0535:LRIEAO]2.0.CO;2.

[17] S. Goetz, A. Baccini, N. Laporte, T. Johns, W. Walker, J. Kellndorfer, R. Houghton, M. Sun, Mapping and monitoring carbon stocks with satellite observations: A comparison of methods, Carbon Balance and Management 4 (2009). doi:10.1186/1750-0680-4-2.

[18] C. Atzberger, Advances in remote sensing of agriculture: Context description, existing operational monitoring systems and major information needs, Remote Sensing 5 (2) (2013) 949–981. doi:10.3390/rs5020949.

[19] T. Häme, J. Kilpi, H. A. Ahola, Y. Rauste, O. Antropov, M. Rautiainen, L. Sirro, S. Bounpone, Improved mapping of tropical forests with optical and SAR imagery, part I: Forest cover and accuracy assessment using multi-resolution data, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 6 (1) (2013) 74–91.

[20] O. Antropov, Y. Rauste, H. Astola, J. Praks, T. Häme, M. Hallikainen, Land cover and soil type mapping from spaceborne PolSAR data at L-band with probabilistic neural network, IEEE Transactions on Geoscience and Remote Sensing 52 (9) (2014) 5256–5270. doi:10.1109/TGRS.2013.2287712.


[21] A. Lönnqvist, Y. Rauste, M. Molinier, T. Häme, Polarimetric SAR data in land cover mapping in boreal zone, IEEE Transactions on Geoscience and Remote Sensing 48 (10) (2010) 3652–3662. doi:10.1109/TGRS.2010.2048115.

[22] B. Waske, M. Braun, Classifier ensembles for land cover mapping using multitemporal SAR imagery, ISPRS Journal of Photogrammetry and Remote Sensing 64 (5) (2009) 450–457. doi:10.1016/j.isprsjprs.2009.01.003.

[23] L. Bruzzone, M. Marconcini, U. Wegmüller, A. Wiesmann, An advanced system for the automatic classification of multitemporal SAR images, IEEE Transactions on Geoscience and Remote Sensing 42 (6) (2004) 1321–1334. doi:10.1109/TGRS.2004.826821.

[24] T. Ullmann, A. Schmitt, A. Roth, J. Duffe, S. Dech, H.-W. Hubberten, R. Baumhauer, Land cover characterization and classification of Arctic tundra environments by means of polarized synthetic aperture X- and C-band radar (PolSAR) and Landsat 8 multispectral imagery – Richards Island, Canada, Remote Sensing 6 (9) (2014) 8565–8593. doi:10.3390/rs6098565. URL http://www.mdpi.com/2072-4292/6/9/8565

[25] N. Clerici, C. A. V. Calderón, J. M. Posada, Fusion of Sentinel-1A and Sentinel-2A data for land cover mapping: a case study in the lower Magdalena region, Colombia, Journal of Maps 13 (2) (2017) 718–726. doi:10.1080/17445647.2017.1372316.

[26] C. Castañeda, D. Ducrot, Land cover mapping of wetland areas in an agricultural landscape using SAR and Landsat imagery, Journal of Environmental Management 90 (7) (2009) 2270–2277.

[27] Y. Ban, H. Hu, I. M. Rangel, Fusion of QuickBird MS and RADARSAT SAR data for urban land-cover mapping: Object-based and knowledge-based approach, International Journal of Remote Sensing 31 (6) (2010) 1391–1410.

[28] G. V. Laurin, V. Liesenberg, Q. Chen, L. Guerriero, F. Del Frate, A. Bartolini, D. Coomes, B. Wilebore, J. Lindsell, R. Valentini, Optical and SAR sensor synergies for forest and land cover mapping in a tropical site in West Africa, International Journal of Applied Earth Observation and Geoinformation 21 (2013) 7–16.

[29] R. Khatami, G. Mountrakis, S. V. Stehman, A meta-analysis of remote sensing research on supervised pixel-based land-cover image classification processes: General guidelines for practitioners and future research, Remote Sensing of Environment 177 (2016) 89–100.


[30] B. Waske, M. Braun, Classifier ensembles for land cover mapping using multitemporal SAR imagery, ISPRS Journal of Photogrammetry and Remote Sensing 64 (5) (2009) 450–457.

[31] H. Balzter, B. Cole, C. Thiel, C. Schmullius, Mapping CORINE land cover from Sentinel-1A SAR and SRTM digital elevation model data using random forests, Remote Sensing 7 (11) (2015) 14876–14898. doi:10.3390/rs71114876.

[32] S.-E. Park, Variations of microwave scattering properties by seasonal freeze/thaw transition in the permafrost active layer observed by ALOS PALSAR polarimetric data, Remote Sensing 7 (12) (2015) 17135–17148. doi:10.3390/rs71215874.

[33] M. C. Dobson, L. E. Pierce, F. T. Ulaby, Knowledge-based land-cover classification using ERS-1/JERS-1 SAR composites, IEEE Transactions on Geoscience and Remote Sensing 34 (1) (1996) 83–99. doi:10.1109/36.481896.

[34] L. Sirro, T. Häme, Y. Rauste, J. Kilpi, J. Hämäläinen, K. Gunia, B. de Jong, F. Paz Pellat, Potential of different optical and SAR data in forest and land cover classification to support REDD+ MRV, Remote Sensing 10 (6) (2018). doi:10.3390/rs10060942.

[35] J. D. T. De Alban, G. M. Connette, P. Oswald, E. L. Webb, Combined Landsat and L-band SAR data improves land cover classification and change detection in dynamic tropical landscapes, Remote Sensing 10 (2) (2018). doi:10.3390/rs10020306.

[36] N. Longépé, P. Rakwatin, O. Isoguchi, M. Shimada, Y. Uryu, K. Yulianto, Assessment of ALOS PALSAR 50 m orthorectified FBD data for regional land cover classification by support vector machines, IEEE Transactions on Geoscience and Remote Sensing 49 (6) (2011) 2135–2150. doi:10.1109/TGRS.2010.2102041.

[37] T. Esch, A. Schenk, T. Ullmann, M. Thiel, A. Roth, S. Dech, Characterization of land cover types in TerraSAR-X images by combined analysis of speckle statistics and intensity information, IEEE Transactions on Geoscience and Remote Sensing 49 (6) (2011) 1911–1925. doi:10.1109/TGRS.2010.2091644.

[38] J. W. Cable, J. M. Kovacs, J. Shang, X. Jiao, Multi-temporal polarimetric RADARSAT-2 for land cover monitoring in northeastern Ontario, Canada, Remote Sensing 6 (3) (2014) 2372–2392. doi:10.3390/rs6032372. URL http://www.mdpi.com/2072-4292/6/3/2372

[39] X. Niu, Y. Ban, Multi-temporal RADARSAT-2 polarimetric SAR data for urban land-cover classification using an object-based support vector machine and a rule-based approach, International Journal of Remote Sensing 34 (1) (2013) 1–26. doi:10.1080/01431161.2012.700133.


[40] T. L. Evans, M. Costa, K. Telmer, T. S. F. Silva, Using ALOS/PALSAR and RADARSAT-2 to map land cover and seasonal inundation in the Brazilian Pantanal, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 3 (4) (2010) 560–575. doi:10.1109/JSTARS.2010.2089042.

[41] P. Lumsdon, S. R. Cloude, G. Wright, Polarimetric classification of land cover for Glen Affric radar project, IEE Proceedings - Radar, Sonar and Navigation 152 (6) (2005) 404–412. doi:10.1049/ip-rsn:20041313.

[42] C. da Costa Freitas, L. de Souza Soler, S. J. S. Sant'Anna, L. V. Dutra, J. R. Dos Santos, J. C. Mura, A. H. Correia, Land use and land cover mapping in the Brazilian Amazon using polarimetric airborne P-band SAR data, IEEE Transactions on Geoscience and Remote Sensing 46 (10) (2008) 2956–2970.

[43] G. Li, D. Lu, E. Moran, L. Dutra, M. Batistella, A comparative analysis of ALOS PALSAR L-band and RADARSAT-2 C-band data for land-cover classification in a tropical moist region, ISPRS Journal of Photogrammetry and Remote Sensing 70 (2012) 26–38. doi:10.1016/j.isprsjprs.2012.03.010.

[44] N. Park, K. Chi, Integration of multitemporal/polarization C-band SAR data sets for land-cover classification, International Journal of Remote Sensing 29 (16) (2008) 4667–4688. doi:10.1080/01431160801947341.

[45] E. Tomppo, O. Antropov, J. Praks, Cropland classification using Sentinel-1 time series: Methodological performance and prediction uncertainty assessment, Remote Sensing 11 (21) (2019). doi:10.3390/rs11212480.

[46] D. B. Nguyen, A. Gruber, W. Wagner, Mapping rice extent and cropping scheme in the Mekong Delta using Sentinel-1A data, Remote Sensing Letters 7 (12) (2016) 1209–1218.

[47] A. Veloso, S. Mermoz, A. Bouvet, T. Le Toan, M. Planells, J.-F. Dejoux, E. Ceschia, Understanding the temporal behavior of crops using Sentinel-1 and Sentinel-2-like data for agricultural applications, Remote Sensing of Environment 199 (2017) 415–426.

[48] G. Satalino, A. Balenzano, F. Mattia, M. W. Davidson, C-band SAR data for mapping crops dominated by surface or volume scattering, IEEE Geoscience and Remote Sensing Letters 11 (2) (2013) 384–388.

[49] F. Vicente-Guijalba, A. Jacob, J. M. Lopez-Sanchez, C. Lopez-Martinez, J. Duro, C. Notarnicola, D. Ziolkowski, A. Mestre-Quereda, E. Pottier, J. J. Mallorquí, M. Lavalle, M. Engdahl, SInCohMap: Land-cover and vegetation mapping using multi-temporal Sentinel-1 interferometric coherence, in: IGARSS 2018 - 2018 IEEE International Geoscience and Remote Sensing Symposium, 2018, pp. 6631–6634. doi:10.1109/IGARSS.2018.8517926.


[50] S. Ge, O. Antropov, W. Su, H. Gu, J. Praks, Deep recurrent neural networks for land-cover classification using Sentinel-1 InSAR time series, in: IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium, 2019, pp. 473–476. doi:10.1109/IGARSS.2019.8900088.

[51] Y. LeCun, Y. Bengio, et al., Convolutional networks for images, speech, and time series, The Handbook of Brain Theory and Neural Networks 3361 (10) (1995) 1995.

[52] X. X. Zhu, D. Tuia, L. Mou, G.-S. Xia, L. Zhang, F. Xu, F. Fraundorfer, Deep learning in remote sensing: a review, arXiv preprint arXiv:1710.03959 (2017).

[53] M. Mahdianpari, B. Salehi, M. Rezaee, F. Mohammadimanesh, Y. Zhang, Very deep convolutional neural networks for complex land cover mapping using multispectral remote sensing imagery, Remote Sensing 10 (7) (2018) 1119.

[54] L. Zhang, L. Zhang, B. Du, Deep learning for remote sensing data: A technical tutorial on the state of the art, IEEE Geoscience and Remote Sensing Magazine 4 (2) (2016) 22–40.

[55] J. Zhang, P. Zhong, Y. Chen, S. Li, l1/2-regularized deconvolution network for the representation and restoration of optical remote sensing images, IEEE Transactions on Geoscience and Remote Sensing 52 (5) (2014) 2617–2627.

[56] X. Chen, S. Xiang, C.-L. Liu, C.-H. Pan, Aircraft detection by deep belief nets, in: Pattern Recognition (ACPR), 2013 2nd IAPR Asian Conference on, IEEE, 2013, pp. 54–58.

[57] X. Chen, S. Xiang, C.-L. Liu, C.-H. Pan, Vehicle detection in satellite images by hybrid deep convolutional neural networks, IEEE Geoscience and Remote Sensing Letters 11 (10) (2014) 1797–1801.

[58] Y. Liu, G. Cao, Q. Sun, M. Siegel, Hyperspectral classification via deep networks and superpixel segmentation, International Journal of Remote Sensing 36 (13) (2015) 3459–3482. doi:10.1080/01431161.2015.1055607.

[59] J. Wang, Q. Qin, Z. Li, X. Ye, J. Wang, X. Yang, X. Qin, Deep hierarchical representation and segmentation of high resolution remote sensing images, in: Geoscience and Remote Sensing Symposium (IGARSS), 2015 IEEE International, IEEE, 2015, pp. 4320–4323.

[60] D. Tuia, R. Flamary, N. Courty, Multiclass feature learning for hyperspectral image classification: Sparse and hierarchical solutions, ISPRS Journal of Photogrammetry and Remote Sensing 105 (2015) 272–285.


[61] F. Hu, G.-S. Xia, J. Hu, L. Zhang, Transferring deep convolutional neural networks for the scene classification of high-resolution remote sensing imagery, Remote Sensing 7 (11) (2015) 14680–14707.

[62] O. A. B. Penatti, K. Nogueira, J. A. dos Santos, Do deep features generalize from everyday objects to remote sensing and aerial scenes domains?, in: 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2015, pp. 44–51. doi:10.1109/CVPRW.2015.7301382.

[63] F. P. Luus, B. P. Salmon, F. Van den Bergh, B. T. J. Maharaj, Multiview deep learning for land-use classification, IEEE Geoscience and Remote Sensing Letters 12 (12) (2015) 2448–2452.

[64] F. Zhang, B. Du, L. Zhang, Scene classification via a gradient boosting random convolutional network framework, IEEE Transactions on Geoscience and Remote Sensing 54 (3) (2016) 1793–1802.

[65] T. Ishii, R. Nakamura, H. Nakada, Y. Mochizuki, H. Ishikawa, Surface object recognition with CNN and SVM in Landsat 8 images, in: 2015 14th IAPR International Conference on Machine Vision Applications (MVA), 2015, pp. 341–344. doi:10.1109/MVA.2015.7153200.

[66] N. Kussul, M. Lavreniuk, S. Skakun, A. Shelestov, Deep learning classification of land cover and crop types using remote sensing data, IEEE Geoscience and Remote Sensing Letters 14 (5) (2017) 778–782.

[67] Y. Chen, Z. Lin, X. Zhao, G. Wang, Y. Gu, Deep learning-based classification of hyperspectral data, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 7 (6) (2014) 2094–2107.

[68] G. Wu, X. Shao, Z. Guo, Q. Chen, W. Yuan, X. Shi, Y. Xu, R. Shibasaki, Automatic building segmentation of aerial imagery using multi-constraint fully convolutional networks, Remote Sensing 10 (3) (2018) 407.

[69] Y. Duan, F. Liu, L. Jiao, P. Zhao, L. Zhang, SAR image segmentation based on convolutional-wavelet neural network and Markov random field, Pattern Recognition 64 (2017) 255–267.

[70] F. Mohammadimanesh, B. Salehi, M. Mahdianpari, E. Gill, M. Molinier, A new fully convolutional neural network for semantic segmentation of polarimetric SAR imagery in complex land cover ecosystem, ISPRS Journal of Photogrammetry and Remote Sensing 151 (2019) 223–236.

[71] L. Wang, X. Xu, H. Dong, R. Gui, F. Pu, Multi-pixel simultaneous classification of PolSAR image using convolutional neural networks, Sensors 18 (3) (2018) 769.


[72] M. Ahishali, S. Kiranyaz, T. Ince, M. Gabbouj, Dual and single polarized SAR image classification using compact convolutional neural networks, Remote Sensing 11 (11) (2019) 1340.

[73] Z. Li, Z. Yang, H. Xiong, Homogeneous region segmentation for SAR images based on two steps segmentation algorithm, in: Computers, Communications and Systems (ICCCS), International Conference on, IEEE, 2015, pp. 196–200.

[74] V. Badrinarayanan, A. Kendall, R. Cipolla, SegNet: A deep convolutional encoder-decoder architecture for image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence 39 (12) (2017) 2481–2495.

[75] H. Zhao, J. Shi, X. Qi, X. Wang, J. Jia, Pyramid scene parsing network, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2881–2890.

[76] C. Yu, J. Wang, C. Peng, C. Gao, G. Yu, N. Sang, BiSeNet: Bilateral segmentation network for real-time semantic segmentation, in: Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 325–341.

[77] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, H. Adam, Encoder-decoder with atrous separable convolution for semantic image segmentation, arXiv preprint arXiv:1802.02611 (2018).

[78] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, A. L. Yuille, DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs, IEEE Transactions on Pattern Analysis and Machine Intelligence 40 (4) (2018) 834–848.

[79] O. Ronneberger, P. Fischer, T. Brox, U-net: Convolutional networks for biomedical image segmentation, in: Medical Image Computing and Computer-Assisted Intervention (MICCAI), Vol. 9351 of LNCS, Springer, 2015, pp. 234–241, (available on arXiv:1505.04597 [cs.CV]). URL http://lmb.informatik.uni-freiburg.de/Publications/2015/RFB15a

[80] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, H. Adam, MobileNets: Efficient convolutional neural networks for mobile vision applications, arXiv preprint arXiv:1704.04861 (2017).

[81] T. Pohlen, A. Hermans, M. Mathias, B. Leibe, Full-resolution residual networks for semantic segmentation in street scenes, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4151–4160.


[82] S. Jégou, M. Drozdzal, D. Vazquez, A. Romero, Y. Bengio, The one hundred layers tiramisu: Fully convolutional densenets for semantic segmentation, in: Computer Vision and Pattern Recognition Workshops (CVPRW), 2017 IEEE Conference on, IEEE, 2017, pp. 1175–1183.

[83] O. Antropov, Y. Rauste, A. Lönnqvist, T. Häme, PolSAR mosaic normalization for improved land-cover mapping, IEEE Geoscience and Remote Sensing Letters 9 (6) (2012) 1074–1078.

[84] J. Long, E. Shelhamer, T. Darrell, Fully convolutional networks for semantic segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431–3440.

[85] D. H. Hubel, T. N. Wiesel, Receptive fields, binocular interaction and functional architecture in the cat's visual cortex, The Journal of Physiology 160 (1) (1962) 106–154.

[86] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al., Imagenet large scale visual recognition challenge, International Journal of Computer Vision 115 (3) (2015) 211–252.

[87] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.

[88] G. Huang, Z. Liu, L. Van Der Maaten, K. Q. Weinberger, Densely connected convolutional networks, in: CVPR, Vol. 1, 2017, p. 3.

[89] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, Going deeper with convolutions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1–9.

[90] S. Ji, W. Xu, M. Yang, K. Yu, 3D convolutional neural networks for human action recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (1) (2013) 221–231.

[91] T. N. Sainath, A.-r. Mohamed, B. Kingsbury, B. Ramabhadran, Deep convolutional neural networks for LVCSR, in: Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, IEEE, 2013, pp. 8614–8618.

[92] D. Small, L. Zuberbühler, A. Schubert, E. Meier, Terrain-flattened gamma nought Radarsat-2 backscatter, Canadian Journal of Remote Sensing 37 (5) (2012) 493–499.

[93] P. Härmä, R. Teiniranta, M. Törmä, R. Repo, E. Järvenpää, M. Kallio, The production of Finnish CORINE land cover 2000 classification, XXth ISPRS Congress, Istanbul, Turkey (2004).


[94] P. Härmä, R. Teiniranta, M. Törmä, R. Repo, E. Järvenpää, M. Kallio, Finnish CORINE land cover 2000 classification, XXth ISPRS Congress, Anchorage, US (2004).

[95] A. Garcia-Garcia, S. Orts-Escolano, S. Oprea, V. Villena-Martinez, J. Garcia-Rodriguez, A review on deep learning techniques applied to semantic segmentation, arXiv preprint arXiv:1704.06857 (2017).

[96] F. Chollet, Xception: Deep learning with depthwise separable convolutions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1251–1258.

[97] O. Ronneberger, P. Fischer, T. Brox, U-net: Convolutional networks for biomedical image segmentation, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2015, pp. 234–241.

[98] L.-C. Chen, G. Papandreou, F. Schroff, H. Adam, Rethinking atrous convolution for semantic image segmentation, arXiv preprint arXiv:1706.05587 (2017).

[99] Y. Bengio, Deep learning of representations for unsupervised and transfer learning, in: Proceedings of ICML Workshop on Unsupervised and Transfer Learning, 2012, pp. 17–36.

[100] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, et al., Tensorflow: A system for large-scale machine learning, in: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), 2016, pp. 265–283.

[101] H. Costa, G. M. Foody, D. S. Boyd, Supervised methods of image segmentation accuracy assessment in land cover mapping, Remote Sensing of Environment 205 (2018) 338–351.

[102] G. Csurka, D. Larlus, F. Perronnin, F. Meylan, What is a good evaluation measure for semantic segmentation?, in: BMVC, Vol. 27, Citeseer, 2013, p. 2013.

[103] J. Cohen, A coefficient of agreement for nominal scales, Educational and Psychological Measurement 20 (1) (1960) 37–46.

[104] O. Antropov, Y. Rauste, T. Häme, Volume scattering modeling in PolSAR decompositions: Study of ALOS PALSAR data over boreal forest, IEEE Transactions on Geoscience and Remote Sensing 49 (10) (2011) 3838–3848.



outperform standard methods applied up to several years ago, i.e., SVMs and RFs [65, 66].

When it comes to deep learning for land cover or land use mapping, applications have been limited to optical satellite [53, 67, 59] or aerial [68] imagery and hyperspectral imagery [60, 67], owing to the similarity of these images to the ordinary RGB images studied in computer vision [53].

When it comes to SAR images, Zhang et al. [54] found that there is already significant success in applying deep learning techniques for object detection and scene understanding. However, for classification on SAR data, applications are scarce and advances are yet to be achieved [54]. Published research includes deep learning for crop type mapping using combined optical and SAR imagery [66], as well as the use of SAR images exclusively [69]. However, those methods applied deep learning only to some part of the task at hand, and not in an end-to-end fashion. Wang et al. [59], for instance, used deep neural networks only for merging over-segmented elements, which are produced using traditional segmentation approaches. Similarly, Tuia et al. [60] applied deep learning to extract hierarchical features, which they further fed into a multiclass logistic classifier. Duan et al. [69] first used unsupervised deep learning and then continued with a couple of supervised labelling tasks. Chen et al. [67] applied a deep learning technique (stacked autoencoders) to discover the features, but then still used traditional machine learning (SVM, logistic regression) for the image segmentation. Unlike those methods, we applied deep learning in an end-to-end fashion, i.e., from supervised feature extraction to the land class prediction. This makes our approach more flexible, robust, and adaptable to SAR data from new regions, as well as more efficient.

When it comes to end-to-end approaches for SAR classification, there are several studies in which the focus was on a small area and on a specific land cover mapping task. For instance, Mohammadimanesh et al. [70] used fully polarimetric SAR (PolSAR) imagery from RADARSAT-2 to classify wetland complexes, for which they developed a specifically tailored semantic segmentation model. However, the authors tackled a small test area (around 10 km × 10 km) and did not explore how their model generalizes to other types of areas. Similarly, Wang et al. [71] adapted existing CNN models into a fixed-feature-size CNN that they evaluated on a small scale using RADARSAT-2 or AIRSAR (i.e., airborne) SAR data. In both cases, they used more advanced, fully polarimetric SAR imagery at a better resolution, as opposed to Sentinel-1, which means imagery with more input information for the deep learning models. Importantly, it is only Sentinel-1 that offers open, operational data with up to a 6-day repeat. Because of this, the discussed approaches, developed and tested specifically for PolSAR imagery at a higher resolution, cannot yet be considered applicable for wide-area mapping. Similarly, Ahishali et al. [72] applied end-to-end approaches to SAR data. They also worked with single-polarized COSMO-SkyMed imagery. However, all the imagery they considered was X-band SAR, contrary to the C-band imagery we use here, and again only on a small scale. The authors proposed a compact CNN model that they found had outperformed some off-the-shelf CNN methods, such as Xception and Inception-ResNet-v2. It is important to note that, compared to those, the off-the-shelf models that we consider here are more sophisticated semantic segmentation models, some of which employ Xception or ResNet, but only as a module in their feature extraction parts.

In summary, the capabilities of deep learning approaches for classification have been investigated to a lesser extent for SAR imagery than for optical imagery. The attempts to use SAR data for land cover classification were relatively limited in scope, area, or the number of used SAR scenes; in particular, wide-area land cover mapping was never addressed. The reasons for this include the comparatively poor availability of SAR data compared to optical (greatly changed since the advent of Sentinel-1), complex scattering mechanisms leading to ambiguous SAR signatures for different classes (which makes SAR image segmentation more difficult than optical image segmentation [73]), as well as the speckle noise caused by the coherent nature of the SAR imaging process.

1.3. Study goals

The present study addresses the identified research gap, namely the lack of wide-area land cover mapping using SAR data. We achieve this by training, fine-tuning, and evaluating a set of suitable state-of-the-art deep learning models from the class of semantic segmentation models, and demonstrating their suitability for land cover mapping. Moreover, our work is the first to examine and demonstrate the suitability of deep learning for land cover mapping from SAR images on a large scale, i.e., across a whole country.

Specifically, we applied the semantic segmentation models to SAR images taken over Finland. We focused on images of Finland because a land cover mask of suitable resolution is available there for training labels (i.e., CORINE). The training is performed with the seven selected models (SegNet [74], PSPNet [75], BiSeNet [76], DeepLabV3+ [77, 78], U-Net [79, 80], FRRN-B [81], and FC-DenseNet [82]), whose encoder modules were pre-trained on the large RGB image corpus ImageNet 2012². Those models are freely available³. In other words, we reused semantic segmentation architectures developed for natural images, with weights pre-trained on RGB images, and fine-tuned them on the SAR images. Our results (with over 90% overall accuracy) demonstrate the effectiveness of deep learning methods for land cover mapping with SAR data.

In addition to having the high-resolution CORINE map that can serve as ground truth (labels) for training the deep learning models, another reason we selected Finland is that it is a northern country with frequent cloud cover, which means that using optical imagery for wide-area mapping is often not feasible. Hence, demonstrating the usability of radar imagery for land cover mapping is particularly useful here.

² http://image-net.org/challenges/LSVRC/2012/
³ https://github.com/tensorflow/models/tree/master/research/slim#pre-trained-models


Even though Finland is a relatively small country, there is still considerable heterogeneity present in terms of land cover types and how they appear in the SAR images. Namely, SAR backscattering is sensitive to several factors that likely differ between countries, or between distant areas within a country. Examples of such factors are moisture levels, terrain variation and soil roughness, the predominant forest biome and tree species proportions, the types of shorter vegetation and crops in agricultural areas, and specific types of built environments. We did not confine our study to a particular area of Finland where the SAR signatures might be consistent, but obtained the images across a wide area. Hence, demonstrating the suitability of our methods in this setting hints at their potential generalizability. Namely, it means that, similarly as we did here, the semantic segmentation models can be fine-tuned and adapted to work on data from other regions or countries with different SAR signatures.

On the other hand, we took into account that the same areas appear somewhat different in SAR images across different seasons. Scattering characteristics of many land cover classes change considerably between the summer and winter months, and sometimes even within weeks during seasonal transitions [83, 20]. These changes include snow cover and melting, freeze/thaw of soils, ice on rivers and lakes, the crop growing cycle, and leaf-on and leaf-off conditions in deciduous trees. Because of this, in the present study we focused only on scenes acquired during the summer season. However, we did allow our training dataset to contain several images of the same area taken at different times during the summer season. This way, not only spatial but also temporal variation of SAR signatures is introduced.

Our contributions can be summarised as follows:

C1 We thoroughly benchmarked seven selected state-of-the-art semantic segmentation models, covering a diverse set of approaches, for land cover mapping using Sentinel-1 SAR imagery. We provide insights on the best models in terms of both accuracy and efficiency.

C2 Our results demonstrate the power of deep learning models, along with SAR imagery, for accurate wide-area land cover mapping in the cloud-obscured boreal zone and polar regions.

2. Deep Learning Terminology

As with other representation learning models, the power of deep learning models comes from their ability to learn rich features (representations) from the dataset automatically [15]. The automatically learned features are usually better suited for the classifier or other task at hand than hand-engineered features. Moreover, thanks to the large number of layers employed, it has been proven that deep learning networks can discover hierarchical representations, so that the higher-level representations are expressed in terms of the lower-level, simpler ones. For example, in the case of images, the low-level representations that can be discovered are edges, and using them, mid-level ones such as corners and shapes can be expressed, which in turn helps to express the high-level representations such as object elements and their identities [15].

Table 1: Terminology for the main tasks in computer vision and its use in the deep learning versus remote sensing communities.

Deep learning: Classification [13]
Remote sensing: Image Annotation, Scene Understanding, Scene Classification
Task description: Assigning a whole image to a class based on what is (mainly) represented in it, for example a ship, oil tank, sea, or land.

Deep learning: Object Detection/Localization/Recognition [15]
Remote sensing: Automatic Target Recognition
Task description: Detecting (and localizing) the presence of particular objects in an image. These algorithms can detect several objects in the given image, for instance, ship detection in SAR images.

Deep learning: Semantic Segmentation [84]
Remote sensing: Image Classification/Clustering
Task description: Assigning a class to each pixel in an image, based on which image object or region it belongs to. These algorithms not only detect and localize objects in the image, but also output their exact areas and boundaries.

The deep learning models in computer vision can be grouped according to their main task into three categories, described in Table 1. However, the deep learning terminology for those tasks does not always correspond well to the terminology used in the remote sensing community. Relevant to our task, a number of remote sensing studies use the term classification in the context of land cover mapping, inherently meaning pixel- or region-based classification, which in the deep learning terminology corresponds to semantic segmentation. In Table 1 we list the corresponding terminology that we encountered being used for each task in both the deep learning and remote sensing communities. This is helpful to disambiguate when talking about different tasks, and to recognize when talking about the same tasks, in the two domains. In the present study, the focus is on land cover mapping. Hence, we tackle semantic segmentation in the deep learning terminology, that is, image classification (pixel-wise classification) in the remote sensing terminology.

Convolutional Neural Networks (CNNs) [12, 13] are the deep learning model that has transformed the computer vision field. Initially, CNNs were defined to tackle the image classification (deep learning terminology) task. Their structure is inspired by the visual perception of mammals [85]. CNNs are named after one of the most important operations, particular to them compared to other neural networks, i.e., the convolution. Mathematically, a convolution is a combination of two other functions. A convolution is applied to the image by sliding a filter (kernel) of a given size k × k, which is usually small compared to the original image size. Different-purpose filters are designed; for example, a filter can serve as a vertical edge detector. Application of such a convolution operation on an image results in a feature map. Another common operation, usually applied after a convolution, is pooling. Pooling reduces the size of the feature map while providing robustness to the extracted features. Common CNNs end with a fully connected layer, which is used for final predictions, commonly for image classification. By employing a large number of convolutional layers (depth), CNNs are able to extract gradually more complex and abstract features. The first CNN model to demonstrate impressive effectiveness in image classification (of handwritten digits) was LeNet [12]. Several years later, Krizhevsky et al. [13] developed AlexNet, the deep CNN that dramatically pushed the limits of classification accuracy on the famous ImageNet computer vision challenge [86]. Since then, a variety of CNN-based models have been proposed; some notable examples are the VGG network [14], ResNet [87], DenseNet [88], and Inception V3 [89]. The effectiveness of CNNs has also been proven in various real-world applications [90, 91].
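To make the convolution–pooling pipeline concrete, the following minimal Keras sketch (our own illustrative example; the layer sizes are arbitrary and do not correspond to any model used in this study) chains two conv–pool blocks with a fully connected classification head:

```python
# A minimal, illustrative CNN: conv -> pool blocks ending in a fully
# connected head that predicts a class for the whole image.
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", padding="same",
                  input_shape=(64, 64, 3)),   # k x k filters -> feature maps
    layers.MaxPooling2D((2, 2)),              # pooling shrinks the feature map
    layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),   # whole-image class prediction
])
```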

Once CNNs had proven their effectiveness in classifying images, Long et al. [84] were the first to show how a given CNN model can be augmented to make it suitable for the semantic segmentation task: they proposed the Fully Convolutional Neural Network (FCN) framework. This generic architecture can be used to adapt any CNN used for classification into a segmentation model. Namely, the authors showed that by replacing the last fully connected layer with appropriate convolutional layers, which upsample and restore the resolution of the input at the output layer, CNNs can be transformed to classify each individual pixel (instead of the whole image). The basic idea is as follows. The encoder is used to learn the feature maps and is usually based on a deep CNN pre-trained for classification, such as ResNet, VGG, or Inception. The decoder part serves to upsample the discriminative features that the encoder has learned, from the coarse-level feature map to the fine pixel level. Long et al. [84] showed that this upsampling (backward) computation can be efficiently performed using backward convolutions (deconvolutions). Moreover, this means that specific CNN models, such as those mentioned above, can all be incorporated in the FCN framework for segmentation, giving rise to FCN-AlexNet [84], FCN-ResNet [87], FCN-VGG16 [84], FCN-DenseNet [82], etc. We present a diagram of the generic FCN architecture in Figure 1.
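As an illustration of the FCN idea, the sketch below (our own schematic, not the exact architecture of any model benchmarked here; the ResNet50 backbone and layer sizes are assumptions for the example) replaces the fully connected head with a 1 × 1 convolution producing per-pixel class scores, followed by a transposed convolution that upsamples back to the input resolution:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

num_classes = 5  # e.g., the five CLC Level-1 classes

# Encoder: a pre-trained classification CNN without its fully connected head.
backbone = tf.keras.applications.ResNet50(
    include_top=False, weights="imagenet", input_shape=(512, 512, 3))

# Decoder: 1x1 convolution gives a coarse per-pixel score map; a backward
# (transposed) convolution restores the input resolution (stride 32 here,
# matching the backbone's total downsampling factor).
x = layers.Conv2D(num_classes, 1)(backbone.output)
x = layers.Conv2DTranspose(num_classes, 64, strides=32,
                           padding="same", activation="softmax")(x)
fcn = models.Model(backbone.input, x)
```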


Figure 1: The architecture of Fully Convolutional Neural Networks (FCNs) [84].

3. Materials and methods

Here we first describe the study site and the SAR and reference data. This is followed by an in-depth description of the semantic segmentation models used in the study. We finish with the description of the experimental setup and the evaluation metrics.

3.1. Study site

Our study site covers the area of Finland at latitudes from 61° to 67.5°. The processed area is shown in Figure 2. The study area includes the central and northern parts of Finland, covered primarily by boreal forestland with inclusions of water bodies (primarily lakes), urban settlements, and agricultural areas, as well as marshland and open bogs. We omitted Lapland due to potential snow cover during the months of data acquisition. The terrain height variation is moderate, mostly within the 100–300 m range.

3.2. SAR data

Presently, Sentinel-1 is a C-band SAR dual-satellite system, with the two satellites orbiting 180° apart [11], launched in 2014 and 2016, respectively. The operational acquisition modes are Stripmap (SM), Interferometric Wide-Swath (IW), Extra Wide Swath (EW), and Wave Mode (WV). The IW mode is the default mode over land, providing a 250 km wide swath composed of three sub-swaths, with a single-look image at 5 m by 20 m spatial resolution. It uses the so-called TOPS (Terrain Observation with Progressive Scan) SAR mode.

The SAR data acquired by the Sentinel-1 satellites in IW mode were used in the study. Altogether, 14 Sentinel-1A images acquired during the summers of 2015 and 2016 were used, more concretely, during June, July, and August of those two years. Their geographical coverage is schematically shown in Figure 2.


Figure 2: Study area in Finland with reference CORINE land cover data and schematic location of areas used for model training and accuracy assessment.


The original scenes were downloaded as Level-1 Ground Range Detected (GRD) products. They represent focused SAR data that have been detected, multi-looked, and projected to ground range using an Earth ellipsoid. They were orthorectified in ESA SNAP S1TBX software using a local digital terrain model (with 2 m resolution) available from the National Land Survey of Finland. The pixel spacing of the orthorectified scenes was set to 20 m. Orthorectification included terrain flattening to obtain backscatter in gamma-nought format [92]. The scenes were further re-projected to the ETRS89 / ETRS-TM35FIN projection (EPSG:3067) and resampled to a final pixel size of 20 m.

3.3. Reference data

In Finland, the Finnish Environment Institute (SYKE) is responsible for the production of the CORINE maps. While for most of the EU territory the CORINE mask is available at 100 m × 100 m spatial resolution, national institutions may choose to create more precise maps, and SYKE in particular has produced a 20 m × 20 m spatial resolution mask for Finland (Figure 3). Since then, updates have been produced regularly, the latest at the time of this study, which we used, being CLC2012. There are 48 different land use classes in the map, which can be hierarchically grouped into 4 CLC levels. In detail, there are 30 classes on CLC Level-3, 15 classes on CLC Level-2, and 5 top CLC Level-1 classes. According to the information provided by SYKE, the accuracy of CLC Level-3 is 61%, of CLC Level-2 83%, and of CLC Level-1 93%. The selected classes and their corresponding color codes used for our segmentation results are shown in Table 2.

Until the most recent CORINE production round, EEA member countries adopted national approaches for the production of CORINE. The EEA Technical Guidelines include manual digitalization of land cover change based on visual interpretation of optical satellite imagery. In Finland, the European CLC was not applicable for the majority of national users due to its large minimal mapping unit (MMU); thus, a national version was produced with a somewhat modified nomenclature of classes [93, 94]. The national high-resolution CLC2012 data is in raster format of 20 m, with a corresponding MMU. In the provision of the 2012 update of CLC, obtaining optical imagery over Scandinavia and Britain was particularly challenging because of the frequent clouds, thus calling for the use of radar imagery to meet user requirements on accuracy and coverage [31]. The CORINE map itself is built from high-resolution satellite images acquired primarily during the summer and, to a smaller extent, during the spring months [2].

3.4. Semantic Segmentation Models

We selected the following seven state-of-the-art [95] semantic segmentation models to test for our land cover mapping task: SegNet [74], PSPNet [75], BiSeNet [76], DeepLabV3+ [77, 78], U-Net [79, 80], FRRN-B [81], and FC-DenseNet [82]. The models were selected to cover a wide range of approaches to semantic segmentation.


Table 2: CORINE CLC Level-1 classes and their color codes used in our classification results.

Class | R | G | B | Color
Water bodies (500) | 0 | 191 | 255 | blue
Peatland, bogs and marshes (400) | 173 | 216 | 230 | light blue
Forested areas (300) | 127 | 255 | 0 | green
Agricultural areas (200) | 222 | 184 | 135 | brown
Urban fabric (100) | 128 | 0 | 0 | red

Figure 3: Zoomed-in area fragment with our reference data, i.e., CORINE, shown on top (left), along with the Google Earth layer (right).

In the following, we describe the specific architecture of each of these DL models. We will use the following common abbreviations: conv for the convolution operation, concat for concatenation, max pool for the max pooling operation, BN for batch normalisation, and ReLU for the rectified linear unit activation function.

3.4.1. BiSeNet (Bilateral Segmentation Network)

The BiSeNet model is designed to decouple the functions of encoding additional spatial information and enlarging the receptive field, which are fundamental to achieving good segmentation performance. As can be seen in Figure 4, there are two main components in this model: the Spatial Path (SP) and the Context Path (CP). The Spatial Path serves to encode rich spatial information. The Context Path serves to provide a sufficient receptive field; it uses global average pooling and a pre-trained Xception [96] or ResNet [87] as the backbone. The goal of the creators was not only to obtain superior performance, but to achieve a balance between speed and performance. Hence, BiSeNet is a relatively fast semantic segmentation model.

3.4.2. SegNet (Encoder-Decoder-Skip)

Similarly to BiSeNet, SegNet is also designed with computational performance in mind, this time particularly during inference. Because of this, the network has a significantly smaller number of trainable parameters compared to most of the other architectures. The encoder in SegNet is based on VGG16: it consists of its first 13 convolutional layers, while the fully connected layers are


Figure 4: The architecture of BiSeNet. ARM stands for the Attention Refinement Module and FFM for the Feature Fusion Module introduced in the model's paper [76].

Figure 5: The architecture of the SegNet-based Encoder-Decoder with Skip connections [74]. Blue tiles represent Convolution + Batch Normalisation + ReLU, green tiles represent Pooling, red – Upsampling, and yellow – a softmax operation.

omitted. Hence, the novelty of this network lies in its decoder part, as follows. The decoder consists of one decoder layer for each encoder layer, and so it also has 13 layers. Each individual decoder layer utilizes max-pooling indices memorized from its corresponding encoder feature map. The authors showed that this enhances boundary delineation between classes. Finally, the decoder output is sent to a multi-class soft-max function, yielding a classification for each


pixel (see Figure 5).

3.4.3. Mobile U-Net

Mobile U-Net is based on the U-Net [97] semantic segmentation architecture shown in Figure 6. In designing U-Net, the fully convolutional approach was generally employed, with the following modification: the upsampling part of the architecture has no fully connected layer and is nearly symmetrical to the feature extraction part, due to the use of similar feature maps. This results in a u-shaped architecture (see Figure 6), hence the name of the model. While originally developed for biomedical images, the U-Net architecture has proven successful for image segmentation in other domains as well. Here, we somewhat modify the U-Net architecture according to the MobileNets [80] framework to improve its efficiency. In particular, the MobileNets framework uses Depthwise Separable Convolutions, a form that factorizes standard convolutions (e.g., 3 × 3) into a depthwise convolution (applied separately to each input band) and a pointwise (1 × 1) convolution that combines the outputs of the depthwise convolution.
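The parameter saving of this factorization can be illustrated with the following Keras sketch (our own example; the filter count of 64 is arbitrary):

```python
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(512, 512, 3))

# Standard convolution: each of the 64 filters spans all input bands.
standard = layers.Conv2D(64, (3, 3), padding="same")(inputs)

# Depthwise separable factorization: a 3x3 depthwise convolution applied
# to each band independently, followed by a 1x1 pointwise convolution
# that combines the per-band outputs -- far fewer parameters.
x = layers.DepthwiseConv2D((3, 3), padding="same")(inputs)
separable = layers.Conv2D(64, (1, 1), padding="same")(x)
# (equivalently: layers.SeparableConv2D(64, (3, 3), padding="same"))
```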

Figure 6: The architecture of U-Net [97].

3.4.4. DeepLab-V3+

DeepLab-V3+ [77] is an improved version of DeepLab-V3 [98], while the latter is an improved version of the original DeepLab [78] model. This segmentation model does not follow the FCN framework like the previously discussed


models. The main features that distinguish the DeepLab model from FCNs are the atrous convolutions for upsampling, and the application of probabilistic machine learning models, concretely conditional random fields (CRFs), for finer localization accuracy in the final fully connected layer. Atrous convolutions, in particular, allow enlarging the context from which the next-layer feature maps are learned, while preserving the number of parameters (and thus the same efficiency). Using a chain of atrous convolutions allows computing the final output layer of a CNN at an arbitrarily high resolution (removing the need for the upsampling part used in FCNs). In the follow-up work proposing DeepLab-V3, Chen et al. [98] change the approach to atrous convolutions to gradually double the atrous rates, and show that the adapted version of their algorithm outperforms the previous one, even without including the fully connected CRF layer. Finally, in their newest adaptation of the model, called DeepLab-V3+, Chen et al. [77] turn to an approach similar to the FCNs, i.e., they add a decoder module to the architecture (see Figure 7). That is, they employ the features extracted by the DeepLab-V3 module in the encoder part, and add a decoder module consisting of 1 × 1 and 3 × 3 convolutions.
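In Keras terms, an atrous convolution is a standard convolution with a dilation rate larger than one; the sketch below (our own illustration, with rates chosen to show the gradual doubling) builds a small multi-rate block:

```python
import tensorflow as tf
from tensorflow.keras import layers

x = tf.keras.Input(shape=(64, 64, 256))  # a feature map from the encoder

# Atrous convolutions enlarge the receptive field without extra parameters:
# a 3x3 kernel with dilation rate r covers a (2r+1)x(2r+1) neighbourhood
# while still holding only 3x3 weights.
branches = [
    layers.Conv2D(256, (3, 3), dilation_rate=r, padding="same")(x)
    for r in (1, 2, 4, 8)  # gradually doubled rates
]
out = layers.Concatenate()(branches)  # fuse context from multiple scales
```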

Figure 7: The architecture of DeepLabV3+ [77].

3.4.5. FRRN-B (Full-Resolution Residual Networks)

As we have seen, most semantic segmentation architectures are based on some form of FCN, and so they utilize existing classification networks, such as ResNet or VGG16, as encoders. We also discussed the main reason for such


Figure 8: The architecture of FRRN-B. RU n and FRRU n stand for residual units and full-resolution residual units with n-channel convolutions, respectively. FRRUs simultaneously operate on the two streams [81].

approaches, which is to take advantage of the learned weights from those architectures pre-trained for the classification task. Nevertheless, one disadvantage of the FCN approach is that the resulting network outputs of the encoder part (particularly after the pooling operations) are at a lower resolution, which deteriorates the localization performance of the overall segmentation model. Pohlen et al. [81] proposed to tackle this by having two parallel network streams processing the input image: a pooling and a residual stream (Figure 8). As the name says, the pooling stream performs successive pooling and then unpooling operations, and it serves to obtain good recognition of the objects and classes. The residual stream computes residuals at the full image resolution, which enables low-level features, i.e., object pixel-level locations, to be propagated to the network output. The name of the model comes from its building blocks, i.e., full-resolution residual units. Each such unit simultaneously operates on the pooling and the residual stream. In the original paper [81], the authors propose two alternative architectures, FRRN-A and FRRN-B, and they show that


FRRN-B achieves superior performance on the Cityscapes benchmark dataset. Hence, we employ the FRRN-B architecture.

3.4.6. PSPNet (Pyramid Scene Parsing Network)

Figure 9: The architecture of PSPNet [75].

Zhao et al. [75] propose Pyramid Scene Parsing as a solution to the challenge of making local predictions based on the local context only, without considering the global image scene. In remote sensing, an example of this challenge could be a model wrongly predicting water with waves in it as the dry vegetation class, because the two appear similar and the model did not consider that these pixels are part of a larger water surface, i.e., it missed the global context. In similarity to the other FCN-based approaches, PSPNet uses a pre-trained classification architecture to extract the feature map, in this case ResNet. The main module of this network is the pyramid pooling, which is enclosed by a square in Figure 9. As can be seen in the figure, this module fuses features at four scales, from the coarse (red) to the fine (green). Hence, the output of each level in the pyramid pooling module contains the feature map at a different resolution. In the end, the different features are stacked together, yielding the final pyramid pooling global feature for predictions.
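A schematic rendering of the pyramid pooling module is given below (our own sketch; the bin sizes 1, 2, 3, and 6 follow the original paper, while the feature-map size and filter counts are illustrative):

```python
import tensorflow as tf
from tensorflow.keras import layers

feat = tf.keras.Input(shape=(60, 60, 512))  # encoder feature map

def pyramid_branch(x, bins):
    # Pool the map into bins x bins regions, compress with a 1x1 conv,
    # then upsample back to the feature-map size.
    size = 60 // bins
    p = layers.AveragePooling2D(pool_size=size, strides=size)(x)
    p = layers.Conv2D(128, 1)(p)
    return layers.UpSampling2D(size=size, interpolation="bilinear")(p)

# Fuse context at four scales, from global (1x1) to finer (6x6) bins,
# and stack with the original features for the final prediction head.
pooled = [pyramid_branch(feat, b) for b in (1, 2, 3, 6)]
out = layers.Concatenate()([feat] + pooled)
```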

3.4.7. FC-DenseNet (Fully Convolutional DenseNets)

This semantic segmentation algorithm is built using the DenseNet CNN [88] as a basis for the encoder, followed by applying the FCN approach [82]. The specificity of the DenseNet architecture is the presence of blocks in which each layer is connected to all other layers in a feed-forward manner. Figure 10 shows the architecture of FC-DenseNet, where the blocks are represented by the Dense Block units. According to [88], such an architecture scales well to hundreds of layers without any optimization issues, while yielding excellent results in classification tasks. In order to efficiently upsample the DenseNet feature maps, Jégou et al. [82] substitute the upsampling convolutions of FCNs with Dense Blocks and Transitions Up. The Transition Up modules consist of transposed convolutions, which are then concatenated with the outputs from the input skip connection (the dashed lines in Figure 10).
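The dense connectivity and the Transition Up module can be sketched as follows (our own illustrative helpers; the growth rate and block depth are arbitrary):

```python
import tensorflow as tf
from tensorflow.keras import layers

def dense_block(x, num_layers=4, growth_rate=16):
    # Each layer sees the concatenation of all previous outputs in the
    # block (feed-forward dense connectivity).
    for _ in range(num_layers):
        y = layers.BatchNormalization()(x)
        y = layers.ReLU()(y)
        y = layers.Conv2D(growth_rate, (3, 3), padding="same")(y)
        x = layers.Concatenate()([x, y])
    return x

def transition_up(x, skip, filters):
    # A transposed convolution upsamples the feature map, which is then
    # concatenated with the skip connection from the downsampling path
    # (the dashed lines in Figure 10).
    x = layers.Conv2DTranspose(filters, (3, 3), strides=2, padding="same")(x)
    return layers.Concatenate()([x, skip])
```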


Figure 10: The architecture of FC-DenseNet [82].

3.5. Training approach

To accomplish better segmentation performance, there is an option to pre-train the semantic segmentation models (in particular, their encoder modules) using a larger set of available images of another type (such as natural images). When a model pre-trained with natural images continues training with a limited set of SAR images, the knowledge becomes effectively transferred from the natural-image task to the SAR task [99]. To accomplish such a transfer, we used models whose encoders were pre-trained for the ImageNet classification task, and fine-tuned them using our SAR dataset (described next).

3.6. Experimental Setup

In this section, we first describe how we prepared the SAR images for training with the deep learning models, which are designed for natural images. Then we provide the details of our implementation.

3.6.1. SAR Data Preprocessing for Deep Learning

Sentinel-1 imagery comes in two polarization channels, each of them being informative about certain types of land cover. Hence, using their combination is expected to yield better land cover mapping results than using either of them independently. Moreover, previous work has shown the benefits of also using


the DEM model for land cover mapping [9]. Hence, as the third layer, we used the DEM of Finland from the National Land Survey.

SAR backscatter for both polarizations was converted to decibels by applying the 10 · log10 transformation. In addition, for the deep learning models, each band should be normalized so that the distribution of the pixel values resembles a Gaussian distribution centered at zero; this is done to yield faster convergence during training. To normalize the data, each pixel value is reduced by the mean of all pixels and then divided by their standard deviation. In addition, given that the semantic segmentation models expect pixel values in the range (0, 255), we scaled the normalized data, and also the DEM values, to this range. The preprocessed layers are then used to create the image dataset for training.

We named the created dataset SAR RGB-DEM. The naming comes from the process used to create the images in this dataset: one of the two channels of a Sentinel-1 image is assigned to the R channel and the other to the G channel, while for the third, B channel, we use the DEM layer.
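The preprocessing chain can be summarised with the following NumPy sketch (function and variable names are ours; the epsilon guarding the logarithm is an assumption):

```python
import numpy as np

def to_decibels(band):
    # SAR backscatter (linear power) to dB; a small epsilon guards log(0).
    return 10.0 * np.log10(band + 1e-10)

def normalize_and_scale(band):
    # Zero-mean, unit-variance normalisation, then scaling to (0, 255)
    # as expected by the ImageNet-pretrained segmentation models.
    z = (band - band.mean()) / band.std()
    z = (z - z.min()) / (z.max() - z.min())
    return (z * 255.0).astype(np.float32)

def make_sar_rgb_dem(pol_1, pol_2, dem):
    # R and G carry the two Sentinel-1 polarization channels (in dB);
    # B carries the DEM layer.
    r = normalize_and_scale(to_decibels(pol_1))
    g = normalize_and_scale(to_decibels(pol_2))
    b = normalize_and_scale(dem)
    return np.dstack([r, g, b])
```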

3.6.2. Train/Development and Test (Accuracy Assessment) Dataset

The images from the SAR RGB-DEM dataset needed to be split into 512 px × 512 px partial images (further in the text called imagelets) for training. Thus, each imagelet represented an area of roughly 10 × 10 km². The first reason for this preprocessing concerns the squared shape: some of the selected models required square-shaped images. Others were flexible with the image shape and size, but we wanted to make the setups for all the models the same, so that their results would be comparable. The second reason concerns the computational capacity: with our hardware setup (described below), this was the largest image size that we could work with.

Upon splitting the SAR RGB-DEM images, we discarded those imagelets that were completely outside the land mass area, as well as those for which we did not have a complete CORINE label (such as when they fell partly outside the Finnish borders). This resulted in nearly 7K imagelets of size 512 px × 512 px.
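The tiling step can be sketched as follows (our own helper; the validity mask that encodes land mass and complete CORINE coverage is only indicated):

```python
import numpy as np

TILE = 512  # pixels, i.e., roughly 10 x 10 km^2 at 20 m pixel spacing

def split_into_imagelets(image, labels, valid_mask):
    """Yield aligned 512x512 image/label pairs, skipping tiles that fall
    completely outside the land mass or lack a complete CORINE label."""
    h, w = labels.shape
    for i in range(0, h - TILE + 1, TILE):
        for j in range(0, w - TILE + 1, TILE):
            m = valid_mask[i:i + TILE, j:j + TILE]
            if not m.all():          # incomplete label or off the land mass
                continue
            yield (image[i:i + TILE, j:j + TILE, :],
                   labels[i:i + TILE, j:j + TILE])
```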

Given the geography of Finland, to have representative training data it seemed useful to include imagelets from both the northern and southern (including the large cities) parts of the country in the model training. On the other hand, some noticeable differences are found also in the gradient from the east to the west of the country. To achieve a representative training dataset, we selected all imagelets between the longitudes of 24° and 28° for the accuracy assessment (model testing), and all the other imagelets for the model training (that is, training and development in the computer vision terminology). In this way, we prevented the situation in which two images of the same area, acquired at different times, are used one for training and the other for testing. Images overlapping any border of the introduced strip were discarded. The procedure resulted in 3,104 images in the training and development set and 3,784 images in the test (accuracy assessment) set. Finally, we used 60% of the training and development set for training, and the rest for the development of the deep learning models.


Table 3: The properties of the examined semantic segmentation architectures.

Architecture   Base model       Parameters
BiSeNet        ResNet101        24.75M
SegNet         VGG16            34.97M
Mobile U-Net   Not applicable    8.87M
DeepLabV3+     ResNet101        47.96M
FRRN-B         ResNet101        24.75M
PSPNet         ResNet101        56M
FC-DenseNet    ResNet101         9.27M

3.6.3. Data Augmentation

Further, we employed the data augmentation technique. The main idea behind data augmentation is to enable improved learning by reusing original images with slight transformations, such as rotation, flipping, adding Gaussian noise, or slightly changing the brightness. This provides additional information to the model, and the dataset size is effectively increased. Moreover, an additional benefit of data augmentation is in helping the model to learn some invariant data properties for which no examples are present in the original dataset. Given the sensitivity of the SAR backscatter, we did not want to augment the images in terms of color or brightness, or by adding noise. However, we could safely employ rotations and flipping. For rotations, we only used 90° increments, giving three possible rotated versions of an image. For image flipping, we applied horizontal and vertical flipping, or both at the same time, giving another three possible versions of the original image.4 Notice that our images are square, so the transformations did not change the image dimensions. Finally, we applied online augmentation, as opposed to the offline version. In the online process, each augmented image is seen only once, and so this process yields a network that generalises better.
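One way to realise this online augmentation is sketched below (our own helper, not the study's code); drawing a random rotation and random flips per sample covers the original image and all six transformed versions described above, with the same transform applied to the imagelet and its label mask.

```python
import numpy as np

def random_augment(image, label, rng=np.random):
    """Randomly rotate by a multiple of 90 degrees and/or flip image and label."""
    k = rng.randint(4)                                   # 0, 90, 180 or 270 degrees
    image, label = np.rot90(image, k), np.rot90(label, k)
    if rng.rand() < 0.5:                                 # horizontal flip
        image, label = np.fliplr(image), np.fliplr(label)
    if rng.rand() < 0.5:                                 # vertical flip
        image, label = np.flipud(image), np.flipud(label)
    return image, label
```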

3.6.4. Implementation

To apply the described semantic segmentation models, we adapted the open-source Semantic Segmentation Suite. We used Python with the TensorFlow [100] backend.

3.6.5. Hardware and Training Setup

We trained and tested each of the deep learning models separately on a single GPU (NVIDIA GeForce GTX 1080) on a machine with 32 GB of RAM.

We used the RMSProp optimisation algorithm with a learning rate of 0.0001 and a learning rate decay of 0.9954. Each model was trained for an equal number of epochs (500), and during the process the checkpoint for the best model was saved. We then used that model for evaluation on the test set, and we report those results.

4 The vertical flip operation switches between top-left and bottom-left image origin (reflection along the central horizontal axis), and the horizontal flip switches between top-left and top-right image origin (reflection along the central vertical axis).
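In modern TensorFlow/Keras terms, this training setup corresponds roughly to the sketch below. The original study used the TensorFlow-1-based Semantic Segmentation Suite, so this is only an approximation: `model`, `train_ds` and `dev_ds` are assumed to exist, the loss choice is ours, and applying the 0.9954 decay once per epoch is one plausible reading of the stated decay.

```python
import tensorflow as tf

# model, train_ds, dev_ds are assumed to be defined elsewhere
optimizer = tf.keras.optimizers.RMSprop(learning_rate=1e-4)   # RMSProp, lr = 0.0001
model.compile(optimizer=optimizer,
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])

callbacks = [
    # multiply the learning rate by the decay factor 0.9954 after every epoch (assumption)
    tf.keras.callbacks.LearningRateScheduler(lambda epoch, lr: lr * 0.9954),
    # keep only the checkpoint of the best model seen so far
    tf.keras.callbacks.ModelCheckpoint("best_model.h5", save_best_only=True),
]
model.fit(train_ds, validation_data=dev_ds, epochs=500, callbacks=callbacks)
```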

3.7. Evaluation Metrics

In a review of the metrics used in land cover classification, Costa et al. [101] found a lack of consistency, complicating the intercomparison of different studies. To avoid such issues and ensure that our results are easily comparable with the literature, we thoroughly evaluated our models. For each model and class, we report the following measures of accuracy: precision, also known as user's accuracy (UA); recall, also known as producer's accuracy (PA); overall accuracy; and the Kappa coefficient. (The original UA/PA naming is used consistently with Tables 4 and 5.) The formulas are as follows.

For each segmentation class (land cover type) c, we calculate precision (user's accuracy)

$$P_c = \frac{Tp_c}{Tp_c + Fp_c}$$

and recall (producer's accuracy)

$$R_c = \frac{Tp_c}{Tp_c + Fn_c},$$

where Tp_c represents true positive, Fp_c false positive, and Fn_c false negative pixels for the class c.

When it comes to accuracy [102], we calculate per-class accuracy5

$$Acc_c = \frac{C_{ii}}{G_i}$$

and overall pixel accuracy

$$Acc_{OP} = \frac{\sum_{i=1}^{L} C_{ii}}{\sum_{i=1}^{L} G_i},$$

where C_{ij} is the number of pixels having a ground truth label i and being classified/predicted as j, G_i is the total number of pixels labelled with i, and L is the number of classes. All these metrics can take values from 0 to 1.

Finally, we also use the Kappa statistic (Cohen's measure of agreement), indicating how the classification results compare to values assigned by chance [103]. The Kappa statistic can take values from 0 to 1. Starting from a k × k confusion matrix with elements f_{ij}, the following calculations are done:

5 Effectively, per-class accuracy is defined as the recall obtained on each class.


$$P_o = \frac{1}{N}\sum_{j=1}^{k} f_{jj} \qquad (1)$$

$$r_i = \sum_{j=1}^{k} f_{ij} \;\; \forall i \quad \text{and} \quad c_j = \sum_{i=1}^{k} f_{ij} \;\; \forall j \qquad (2)$$

$$P_e = \frac{1}{N^2}\sum_{i=1}^{k} r_i c_i \qquad (3)$$

where P_o is the observed proportional agreement (effectively the overall accuracy), r_i and c_j are the row and column totals for classes i and j, and P_e is the expected proportion of agreement. The final measure of agreement is given by the statistic [103]

$$\kappa = \frac{P_o - P_e}{1 - P_e}. \qquad (4)$$

Depending on the value of Kappa, the observed agreement is considered as either poor (0.0 to 0.2), fair (0.2 to 0.4), moderate (0.4 to 0.6), good (0.6 to 0.8), or very good (0.8 to 1.0).
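All of the above can be computed from a single confusion matrix; the following self-contained sketch (our own code, not from the study) mirrors equations (1)-(4):

```python
import numpy as np

def metrics_from_confusion(f):
    """f[i, j] = number of pixels with ground-truth class i predicted as class j."""
    f = np.asarray(f, dtype=float)
    n = f.sum()
    tp = np.diag(f)
    precision = tp / f.sum(axis=0)   # per-class precision (user's accuracy)
    recall = tp / f.sum(axis=1)      # per-class recall (producer's accuracy)
    overall = tp.sum() / n           # overall pixel accuracy, equals P_o
    # expected chance agreement P_e from row totals r_i and column totals c_i
    pe = (f.sum(axis=1) * f.sum(axis=0)).sum() / n**2
    kappa = (overall - pe) / (1.0 - pe)
    return precision, recall, overall, kappa
```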

4. Results and Discussion

Using the experimental setup described in the previous section, we evaluated the seven selected semantic segmentation models: SegNet [74], PSPNet [75], BiSeNet [76], DeepLabV3+ [77, 78], U-Net [79, 80], FRRN-B [81], and FC-DenseNet [82]. The overall classification performance statistics for all studied models are gathered in Table 4. Figure 11 shows maps produced for several imagelets with the best performing model, FC-DenseNet. The obtained results are compared to prior work, and the classification performance for the different land cover classes is discussed further below.

4.1. Classification Performance

All the models performed relatively well in terms of classification, achieving an overall accuracy above 83%. Three models performed particularly well, achieving an accuracy score above 89%: SegNet, FRRN-B, and the best performing model, FC-DenseNet, which achieved an accuracy of 90.7%.

Before further analysis, let us recall that CORINE is not exclusively a land cover map, but rather a land cover and land use map; thus, specific classes can differ from the ecological classes observed by Sentinel-1. Also, the aggregation to Level-1 is sometimes not strictly "ecological" and does not always comply with physical surface scattering considerations. For example, roads, airports, major industrial areas and the road network often exhibit areas similar to fields; the presence of trees and green


Table 4: Summary of the classification performance and efficiency of various deep learning models (UA = user's accuracy, PA = producer's accuracy; per-class accuracies are given as UA/PA in %; the average inference time is per image in the dataset).

LC classes (code)                  Test scale (km²)   BiSeNet   DeepLabV3+   SegNet   FRRN-B   U-Net    PSPNet   FC-DenseNet
Urban fabric (100)                 1081.6             26/21     15/14        36/31    38/30    45/25    38/18    62/27
Agricultural areas (200)           2516.0             49/51     50/49        69/66    68/68    66/66    53/48    72/71
Forested areas (300)               28546.2            90/91     88/96        93/94    92/95    92/95    89/95    93/96
Peatland, bogs and marshes (400)   2099.0             54/43     56/13        67/57    71/55    70/52    65/31    74/58
Water bodies (500)                 5356.4             85/91     94/92        96/96    95/96    96/96    94/94    96/96
Overall accuracy (%)                                  83.86     85.49        89.03    89.27    89.25    86.51    90.66
Kappa                                                 0.641     0.649        0.754    0.758    0.754    0.680    0.785
Average inference time (s)                            0.0389    0.0267       0.0761   0.1424   0.0848   0.0495   0.1930


Table 5: Confusion matrix for classification with the FC-DenseNet model (FC-DenseNet103). Rows are CLC2012 (ground truth) classes, columns are Sentinel-1 classification results.

CLC2012 \ class    urban        water         forest        field         peatland      total         PA (%)
urban              7301999      413073        15892771      3212839       221476        27042158      27.0
water              78331        128294872     3457634       171029        1935276       133937142     95.8
forest             3663698      2703632       686788977     12795703      7730444       713682454     96.2
field              766200       121609        16527970      44866048      620934        62902761      71.3
peatland           56097        1866020       19164137      1091008       30309189      52486451      57.8
total              11866325     133399206     741831489     62136627      40817319      990050966
UA (%)             61.5         96.2          92.6          72.2          74.3                        90.7

Figure 11: Illustration of the FC-DenseNet model performance: a selection of classification results, i.e., direct output of the network without any post-processing (bottom row), versus the reference CORINE data (upper row).

vegetation near summer cottages can cause them to exhibit signatures closer to forest than to urban; sometimes forest on rocky terrain can be misclassified as urban due to the presence of very bright targets and strong disruptive features, while confusion between peatland and field areas is also common. Finally, the accuracy of the CORINE data itself is only somewhat higher than 90%.

As for the results across the different land cover classes, all the models performed particularly well in recognising the water bodies and forested areas, while the urban fabric represented the most challenging class for all the models. We expect that the inclusion of the DEM as one layer in the training images helped to achieve good results on the water bodies class for most of the models (except for BiSeNet, all the models achieved both the user's and the producer's accuracy above 90%). The urban class was particularly challenging for the following main reasons. First, this class changes the most, as new houses, roads, and urban areas are built. While we took the most suitable available CORINE class in terms of timing for our Sentinel-1 images, there are almost certainly differences between the urban class as it was in 2012 and in 2015-2016. Second, the CORINE map itself does not have a perfect accuracy, and neither are the aggregation rules perfect. As a matter of fact, in the majority of studies where SAR-based classification was compared against CLC or similar data, a poor or modest overall agreement was observed for this class [21, 41, 83, 20], while the user's accuracy was markedly higher than the producer's [104]. The latter is exactly because radar senses sharp boundaries and bright targets very well, whereas such bright targets often do not dominate the whole CORINE Level-1 urban class. We argue that any inaccuracies present will be particularly pronounced in our models for the urban class, because of the sharp and sudden boundary changes in this class, unlike for the others, such as forest and water. The top performing model, FC-DenseNet, performed the best across all the classes. It is particularly notable that it achieved a user's accuracy, i.e., precision, for the urban class of 62%, improving on it significantly compared to all the other models. Nevertheless, its score on the producer's accuracy, i.e., recall, on this class (27%) is outperformed by two other top models, SegNet and FRRN-B.

We mentioned the issue of SAR backscattering sensitivity to several ground factors, which means that the same classes might appear differently in the images between countries or between distant areas within a country. An interesting indication of our study, however, is that the deep learning models might be able to deal with this issue. Namely, we used models pre-trained on ImageNet and fine-tuned them with a relatively small number (14) of Sentinel-1 scenes. The models learned to recognize varying types of backscattered signal across the country of Finland. This indicates that, with a similar type of fine-tuning, the present models could be relatively easily adapted to other areas and countries with different SAR backscattering patterns. Such robustness and adaptability of the deep learning models comes from their automatic learning of feature representations, without the need for a human to pre-define those features.

4.2. Computational Performance

The training times with our hardware configuration took from 6 days up to 2 weeks for the different models. This could be significantly improved by training each model on a multi-GPU system instead of the single GPU we used.

In terms of inference time, we also observed differences in performance. In Table 4, we present the average inference time per 512 px × 512 px imagelet. The results show that there is a trade-off between classification and computational performance: the best models in terms of classification results (FC-DenseNet and FRRN-B) take a several times longer inference time than the rest. Depending on the application, this might not be of particular importance.

4.3. Comparison to Similar Work

The obtained results compare favourably to previous similar studies on land cover classification with SAR data [83, 20, 21, 41, 28, 31]. Depending on the level of class aggregation (4-5 major classes or more), and using mostly statistical or classical machine learning approaches, the reported classification accuracies ranged from as high as 80-87% down to as low as 30% when only SAR imagery was used.

Two recent studies that employed neural networks for SAR imagery classification (albeit in combination with satellite optical data) for land cover mapping were [28] and [66], with reported classification accuracies of up to 97.5% and 94.6%, respectively.

The best model in our experiments achieved an overall accuracy of 90.7%. However, our results were obtained using solely the SAR imagery; in contrast, SAR imagery (PALSAR) alone yielded an overall accuracy of 78.1% in [28]. The types of classes they studied are also different compared to ours (crops versus vegetation versus land cover types), and our study is performed over a larger area. Importantly, the previous studies applied different types of models (regular NNs versus CNNs versus semantic segmentation). In particular, the CNN models work on 7 × 7 resolution windows, while we applied more advanced semantic segmentation models, which work on the level of a pixel. Keeping in mind the finding from [28] that the addition of optical images on top of SAR improved the results by over 10%, we expect that our models would perform comparably well, or outperform these previous works, if applied to combined SAR and optical imagery.

In terms of the deep learning setup, the studies most similar to ours are [53] and [70]. However, RapidEye optical imagery at 5 m spatial resolution was used in [53], and the test site was considerably smaller. Study [70], similarly to our research, relied exclusively on SAR imagery, however on fully polarimetric images acquired by RADARSAT-2 at a considerably better resolution. They developed an FCN-type semantic segmentation model 'specifically designed for the classification of wetland complexes using PolSAR imagery'. Using this model to classify eight wetland map classes, they achieved an overall accuracy of 93%. However, because their model is designed specifically for wetland complexes, it is not clear whether such a model would generalize to other types of areas. Compared to our study, they focused on a considerably smaller area (nearly the size of a single imagelet we used) and on a very specific task (wetland type mapping). Thus, it is not readily clear how general their approach is and how it compares to our presented approach.

4.4. Outlook and Future Work

There are several lines of potential improvement based on the results of this study, as well as future work directions.

First, using an even larger set of Sentinel-1 images can be recommended, since for supervised deep learning models large amounts of data are crucial. Here, we processed only 6888 imagelets altogether, whereas deep learning algorithms typically become efficient only once they are trained with hundreds of thousands or millions of images.

Second, if SAR images and reference data of a higher resolution are used, we expect better classification performance too, as smaller details could potentially be captured. Also, better agreement in the acquisition timing of the reference and SAR imagery can be recommended: the reference and training data should come from the same months or year if possible, and the reference maps should represent reality as accurately as possible. The models in our experiments were certainly limited by CORINE's own limited accuracy.

Third, in this study we tested the effectiveness of off-the-shelf deep learning models for land cover mapping from SAR data. While the results show their effectiveness, it is also likely that novel types of models, specifically developed for radar data (such as [70]), will yield even better results. Based on our results, we suggest the DenseNet-based models as a starting point. In particular, one could develop deep learning models to directly handle SLC data, which preserve the phase information.

Focusing on a single season is both an advantage and a limitation. Importantly, we avoided confusion between SAR signatures that vary seasonally for several land cover classes. However, multitemporal dynamics itself could potentially be used as an additional, useful class-discriminating parameter. Incorporating the seasonal dynamics of each land cover pixel (as a time series) is left for future work, perhaps with the additional need to incorporate recurrent neural networks into the approach.

As discussed in Section 3.3, it could be suitable to use more detailed (specific) land cover classes, as the aggregation of smaller LC classes into Level-1 CORINE classes is not exactly ecological, leading to the mixing of several distinct SAR signatures in one class and thus causing additional confusion for the classifier. The classified specific classes can later be aggregated into larger classes, potentially showing improved performance [19].

Finally, we used only SAR images and a freely available DEM for the presented large-scale land cover mapping. If one were to combine other types of remote sensing images, in particular optical images, we expect that the results would improve significantly. This holds for those areas where such imagery can be collected given the cloud coverage, while in an operational scenario it would potentially require the use of at least two models (with and without optical satellite imagery). It is also important to assess the added value of SAR imagery with deep learning models when optical satellite images are available, as well as possible data fusion and decision fusion scenarios, before a decision on the mapping approach is made [19].

5. Conclusion

Our study demonstrated the potential for applying state-of-the-art semantic segmentation models to SAR image classification with high accuracy. Several models were benchmarked in a countrywide classification experiment using Sentinel-1 IW-mode SAR data, reaching nearly 91% overall classification accuracy with the best performing model (FC-DenseNet). Given that the 14 used Sentinel-1 scenes resulted in about 7K training images, this indicates strong potential for using pre-trained CNNs for further fine-tuning, and seems particularly suitable when the number of training images is limited (to thousands or tens of thousands instead of millions). In addition to suggesting the best candidate semantic segmentation models for land cover mapping with SAR data (that is, the DenseNet-based models), our study offers baseline results against which newly proposed models should be evaluated. Several possible improvements were identified for future work, including the necessity of testing multitemporal approaches, data fusion, and very high-resolution SAR imagery, as well as developing models specifically for SAR; these will be addressed in future work.

References

[1] S. Bojinski, M. Verstraete, T. C. Peterson, C. Richter, A. Simmons, M. Zemp, The concept of essential climate variables in support of climate research, applications, and policy, Bulletin of the American Meteorological Society 95 (9) (2014) 1431–1443.

[2] G. Büttner, J. Feranec, G. Jaffrain, L. Mari, G. Maucha, T. Soukup, The CORINE land cover 2000 project, EARSeL eProceedings 3 (3) (2004) 331–346.

[3] M. Bossard, J. Feranec, J. Otahel, et al., CORINE land cover technical guide: Addendum 2000 (2000).

[4] G. Büttner, CORINE land cover and land cover change products, in: Land Use and Land Cover Mapping in Europe, Springer, 2014, pp. 55–74.

[5] M. Törmä, T. Markkanen, S. Hatunen, P. Härmä, O.-P. Mattila, A. Arslan, Assessment of land-cover data for land-surface modelling in regional climate studies, Boreal Environment Research 20 (2) (2015) 243–260.

[6] J. Chen, J. Chen, A. Liao, X. Cao, L. Chen, X. Chen, C. He, G. Han, S. Peng, M. Lu, et al., Global land cover mapping at 30 m resolution: A POK-based operational approach, ISPRS Journal of Photogrammetry and Remote Sensing 103 (2015) 7–27.

[7] C. A. d. Almeida, A. C. Coutinho, J. C. D. M. Esquerdo, M. Adami, A. Venturieri, C. G. Diniz, N. Dessay, L. Durieux, A. R. Gomes, High spatial resolution land use and land cover mapping of the Brazilian Legal Amazon in 2008 using Landsat-5/TM and MODIS data, Acta Amazonica 46 (3) (2016) 291–302.

[8] C. Homer, J. Dewitz, L. Yang, S. Jin, P. Danielson, G. Xian, J. Coulston, N. Herold, J. Wickham, K. Megown, Completion of the 2011 National Land Cover Database for the conterminous United States – representing a decade of land cover change information, Photogrammetric Engineering & Remote Sensing 81 (5) (2015) 345–354.

[9] Y. Zhao, D. Feng, L. Yu, X. Wang, Y. Chen, Y. Bai, H. J. Hernández, M. Galleguillos, C. Estades, G. S. Biging, et al., Detailed dynamic land cover mapping of Chile: Accuracy improvement by integrating multi-temporal data, Remote Sensing of Environment 183 (2016) 170–185.

[10] P. Griffiths, C. Nendel, P. Hostert, Intra-annual reflectance composites from Sentinel-2 and Landsat for national-scale crop and land cover mapping, Remote Sensing of Environment 220 (2019) 135–151.

[11] R. Torres, P. Snoeij, D. Geudtner, D. Bibby, M. Davidson, E. Attema, P. Potin, B. Rommen, N. Floury, M. Brown, I. Traver, P. Deghaye, B. Duesmann, B. Rosich, N. Miranda, C. Bruno, M. L'Abbate, R. Croci, A. Pietropaolo, M. Huchler, F. Rostan, GMES Sentinel-1 mission, Remote Sensing of Environment 120 (2012) 9–24. doi:10.1016/j.rse.2011.05.028.

[12] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE 86 (11) (1998) 2278–2324.

[13] A. Krizhevsky, I. Sutskever, G. E. Hinton, ImageNet classification with deep convolutional neural networks, in: Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.

[14] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, CoRR abs/1409.1556 (2014).

[15] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning (2016).

[16] W. Cohen, S. Goward, Landsat's role in ecological applications of remote sensing, BioScience 54 (6) (2004) 535–545. doi:10.1641/0006-3568(2004)054[0535:LRIEAO]2.0.CO;2.

[17] S. Goetz, A. Baccini, N. Laporte, T. Johns, W. Walker, J. Kellndorfer, R. Houghton, M. Sun, Mapping and monitoring carbon stocks with satellite observations: A comparison of methods, Carbon Balance and Management 4 (2009). doi:10.1186/1750-0680-4-2.

[18] C. Atzberger, Advances in remote sensing of agriculture: Context description, existing operational monitoring systems and major information needs, Remote Sensing 5 (2) (2013) 949–981. doi:10.3390/rs5020949.

[19] T. Häme, J. Kilpi, H. A. Ahola, Y. Rauste, O. Antropov, M. Rautiainen, L. Sirro, S. Bounpone, Improved mapping of tropical forests with optical and SAR imagery, part I: Forest cover and accuracy assessment using multi-resolution data, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 6 (1) (2013) 74–91.

[20] O. Antropov, Y. Rauste, H. Astola, J. Praks, T. Häme, M. Hallikainen, Land cover and soil type mapping from spaceborne PolSAR data at L-band with probabilistic neural network, IEEE Transactions on Geoscience and Remote Sensing 52 (9) (2014) 5256–5270. doi:10.1109/TGRS.2013.2287712.

[21] A. Lönnqvist, Y. Rauste, M. Molinier, T. Häme, Polarimetric SAR data in land cover mapping in boreal zone, IEEE Transactions on Geoscience and Remote Sensing 48 (10) (2010) 3652–3662. doi:10.1109/TGRS.2010.2048115.

[22] B. Waske, M. Braun, Classifier ensembles for land cover mapping using multitemporal SAR imagery, ISPRS Journal of Photogrammetry and Remote Sensing 64 (5) (2009) 450–457. doi:10.1016/j.isprsjprs.2009.01.003.

[23] L. Bruzzone, M. Marconcini, U. Wegmüller, A. Wiesmann, An advanced system for the automatic classification of multitemporal SAR images, IEEE Transactions on Geoscience and Remote Sensing 42 (6) (2004) 1321–1334. doi:10.1109/TGRS.2004.826821.

[24] T. Ullmann, A. Schmitt, A. Roth, J. Duffe, S. Dech, H.-W. Hubberten, R. Baumhauer, Land cover characterization and classification of arctic tundra environments by means of polarized synthetic aperture X- and C-band radar (PolSAR) and Landsat 8 multispectral imagery – Richards Island, Canada, Remote Sensing 6 (9) (2014) 8565–8593. doi:10.3390/rs6098565. URL http://www.mdpi.com/2072-4292/6/9/8565

[25] N. Clerici, C. A. V. Calderón, J. M. Posada, Fusion of Sentinel-1A and Sentinel-2A data for land cover mapping: a case study in the lower Magdalena region, Colombia, Journal of Maps 13 (2) (2017) 718–726. doi:10.1080/17445647.2017.1372316.

[26] C. Castaneda, D. Ducrot, Land cover mapping of wetland areas in an agricultural landscape using SAR and Landsat imagery, Journal of Environmental Management 90 (7) (2009) 2270–2277.

[27] Y. Ban, H. Hu, I. M. Rangel, Fusion of Quickbird MS and RADARSAT SAR data for urban land-cover mapping: Object-based and knowledge-based approach, International Journal of Remote Sensing 31 (6) (2010) 1391–1410.

[28] G. V. Laurin, V. Liesenberg, Q. Chen, L. Guerriero, F. Del Frate, A. Bartolini, D. Coomes, B. Wilebore, J. Lindsell, R. Valentini, Optical and SAR sensor synergies for forest and land cover mapping in a tropical site in West Africa, International Journal of Applied Earth Observation and Geoinformation 21 (2013) 7–16.

[29] R. Khatami, G. Mountrakis, S. V. Stehman, A meta-analysis of remote sensing research on supervised pixel-based land-cover image classification processes: General guidelines for practitioners and future research, Remote Sensing of Environment 177 (2016) 89–100.

[30] B. Waske, M. Braun, Classifier ensembles for land cover mapping using multitemporal SAR imagery, ISPRS Journal of Photogrammetry and Remote Sensing 64 (5) (2009) 450–457.

[31] H. Balzter, B. Cole, C. Thiel, C. Schmullius, Mapping CORINE land cover from Sentinel-1A SAR and SRTM digital elevation model data using random forests, Remote Sensing 7 (11) (2015) 14876–14898. doi:10.3390/rs71114876.

[32] S.-E. Park, Variations of microwave scattering properties by seasonal freeze/thaw transition in the permafrost active layer observed by ALOS PALSAR polarimetric data, Remote Sensing 7 (12) (2015) 17135–17148. doi:10.3390/rs71215874.

[33] M. C. Dobson, L. E. Pierce, F. T. Ulaby, Knowledge-based land-cover classification using ERS-1/JERS-1 SAR composites, IEEE Transactions on Geoscience and Remote Sensing 34 (1) (1996) 83–99. doi:10.1109/36.481896.

[34] L. Sirro, T. Häme, Y. Rauste, J. Kilpi, J. Hämäläinen, K. Gunia, B. de Jong, F. Paz Pellat, Potential of different optical and SAR data in forest and land cover classification to support REDD+ MRV, Remote Sensing 10 (6) (2018). doi:10.3390/rs10060942.

[35] J. D. T. De Alban, G. M. Connette, P. Oswald, E. L. Webb, Combined Landsat and L-band SAR data improves land cover classification and change detection in dynamic tropical landscapes, Remote Sensing 10 (2) (2018). doi:10.3390/rs10020306.

[36] N. Longepe, P. Rakwatin, O. Isoguchi, M. Shimada, Y. Uryu, K. Yulianto, Assessment of ALOS PALSAR 50 m orthorectified FBD data for regional land cover classification by support vector machines, IEEE Transactions on Geoscience and Remote Sensing 49 (6) (2011) 2135–2150. doi:10.1109/TGRS.2010.2102041.

[37] T. Esch, A. Schenk, T. Ullmann, M. Thiel, A. Roth, S. Dech, Characterization of land cover types in TerraSAR-X images by combined analysis of speckle statistics and intensity information, IEEE Transactions on Geoscience and Remote Sensing 49 (6) (2011) 1911–1925. doi:10.1109/TGRS.2010.2091644.

[38] J. W. Cable, J. M. Kovacs, J. Shang, X. Jiao, Multi-temporal polarimetric RADARSAT-2 for land cover monitoring in northeastern Ontario, Canada, Remote Sensing 6 (3) (2014) 2372–2392. doi:10.3390/rs6032372. URL http://www.mdpi.com/2072-4292/6/3/2372

[39] X. Niu, Y. Ban, Multi-temporal RADARSAT-2 polarimetric SAR data for urban land-cover classification using an object-based support vector machine and a rule-based approach, International Journal of Remote Sensing 34 (1) (2013) 1–26. doi:10.1080/01431161.2012.700133.

[40] T. L. Evans, M. Costa, K. Telmer, T. S. F. Silva, Using ALOS/PALSAR and RADARSAT-2 to map land cover and seasonal inundation in the Brazilian Pantanal, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 3 (4) (2010) 560–575. doi:10.1109/JSTARS.2010.2089042.

[41] P. Lumsdon, S. R. Cloude, G. Wright, Polarimetric classification of land cover for Glen Affric radar project, IEE Proceedings - Radar, Sonar and Navigation 152 (6) (2005) 404–412. doi:10.1049/ip-rsn:20041313.

[42] C. da Costa Freitas, L. de Souza Soler, S. J. S. Sant'Anna, L. V. Dutra, J. R. Dos Santos, J. C. Mura, A. H. Correia, Land use and land cover mapping in the Brazilian Amazon using polarimetric airborne P-band SAR data, IEEE Transactions on Geoscience and Remote Sensing 46 (10) (2008) 2956–2970.

[43] G. Li, D. Lu, E. Moran, L. Dutra, M. Batistella, A comparative analysis of ALOS PALSAR L-band and RADARSAT-2 C-band data for land-cover classification in a tropical moist region, ISPRS Journal of Photogrammetry and Remote Sensing 70 (2012) 26–38. doi:10.1016/j.isprsjprs.2012.03.010.

[44] N. Park, K. Chi, Integration of multitemporal/polarization C-band SAR data sets for land-cover classification, International Journal of Remote Sensing 29 (16) (2008) 4667–4688. doi:10.1080/01431160801947341.

[45] E. Tomppo, O. Antropov, J. Praks, Cropland classification using Sentinel-1 time series: Methodological performance and prediction uncertainty assessment, Remote Sensing 11 (21) (2019). doi:10.3390/rs11212480.

[46] D. B. Nguyen, A. Gruber, W. Wagner, Mapping rice extent and cropping scheme in the Mekong Delta using Sentinel-1A data, Remote Sensing Letters 7 (12) (2016) 1209–1218.

[47] A. Veloso, S. Mermoz, A. Bouvet, T. Le Toan, M. Planells, J.-F. Dejoux, E. Ceschia, Understanding the temporal behavior of crops using Sentinel-1 and Sentinel-2-like data for agricultural applications, Remote Sensing of Environment 199 (2017) 415–426.

[48] G. Satalino, A. Balenzano, F. Mattia, M. W. Davidson, C-band SAR data for mapping crops dominated by surface or volume scattering, IEEE Geoscience and Remote Sensing Letters 11 (2) (2013) 384–388.

[49] F. Vicente-Guijalba, A. Jacob, J. M. Lopez-Sanchez, C. Lopez-Martinez, J. Duro, C. Notarnicola, D. Ziolkowski, A. Mestre-Quereda, E. Pottier, J. J. Mallorquí, M. Lavalle, M. Engdahl, SInCohMap: Land-cover and vegetation mapping using multi-temporal Sentinel-1 interferometric coherence, in: IGARSS 2018 – 2018 IEEE International Geoscience and Remote Sensing Symposium, 2018, pp. 6631–6634. doi:10.1109/IGARSS.2018.8517926.

[50] S. Ge, O. Antropov, W. Su, H. Gu, J. Praks, Deep recurrent neural networks for land-cover classification using Sentinel-1 InSAR time series, in: IGARSS 2019 – 2019 IEEE International Geoscience and Remote Sensing Symposium, 2019, pp. 473–476. doi:10.1109/IGARSS.2019.8900088.

[51] Y. LeCun, Y. Bengio, et al., Convolutional networks for images, speech, and time series, The Handbook of Brain Theory and Neural Networks 3361 (10) (1995) 1995.

[52] X. X. Zhu, D. Tuia, L. Mou, G.-S. Xia, L. Zhang, F. Xu, F. Fraundorfer, Deep learning in remote sensing: a review, arXiv preprint arXiv:1710.03959 (2017).

[53] M. Mahdianpari, B. Salehi, M. Rezaee, F. Mohammadimanesh, Y. Zhang, Very deep convolutional neural networks for complex land cover mapping using multispectral remote sensing imagery, Remote Sensing 10 (7) (2018) 1119.

[54] L. Zhang, L. Zhang, B. Du, Deep learning for remote sensing data: A technical tutorial on the state of the art, IEEE Geoscience and Remote Sensing Magazine 4 (2) (2016) 22–40.

[55] J. Zhang, P. Zhong, Y. Chen, S. Li, L1/2-regularized deconvolution network for the representation and restoration of optical remote sensing images, IEEE Transactions on Geoscience and Remote Sensing 52 (5) (2014) 2617–2627.

[56] X. Chen, S. Xiang, C.-L. Liu, C.-H. Pan, Aircraft detection by deep belief nets, in: Pattern Recognition (ACPR), 2013 2nd IAPR Asian Conference on, IEEE, 2013, pp. 54–58.

[57] X. Chen, S. Xiang, C.-L. Liu, C.-H. Pan, Vehicle detection in satellite images by hybrid deep convolutional neural networks, IEEE Geoscience and Remote Sensing Letters 11 (10) (2014) 1797–1801.

[58] Y. Liu, G. Cao, Q. Sun, M. Siegel, Hyperspectral classification via deep networks and superpixel segmentation, International Journal of Remote Sensing 36 (13) (2015) 3459–3482. doi:10.1080/01431161.2015.1055607.

[59] J. Wang, Q. Qin, Z. Li, X. Ye, J. Wang, X. Yang, X. Qin, Deep hierarchical representation and segmentation of high resolution remote sensing images, in: Geoscience and Remote Sensing Symposium (IGARSS), 2015 IEEE International, IEEE, 2015, pp. 4320–4323.

[60] D. Tuia, R. Flamary, N. Courty, Multiclass feature learning for hyperspectral image classification: Sparse and hierarchical solutions, ISPRS Journal of Photogrammetry and Remote Sensing 105 (2015) 272–285.

[61] F. Hu, G.-S. Xia, J. Hu, L. Zhang, Transferring deep convolutional neural networks for the scene classification of high-resolution remote sensing imagery, Remote Sensing 7 (11) (2015) 14680–14707.

[62] O. A. B. Penatti, K. Nogueira, J. A. dos Santos, Do deep features generalize from everyday objects to remote sensing and aerial scenes domains?, in: 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2015, pp. 44–51. doi:10.1109/CVPRW.2015.7301382.

[63] F. P. Luus, B. P. Salmon, F. Van den Bergh, B. T. J. Maharaj, Multiview deep learning for land-use classification, IEEE Geoscience and Remote Sensing Letters 12 (12) (2015) 2448–2452.

[64] F. Zhang, B. Du, L. Zhang, Scene classification via a gradient boosting random convolutional network framework, IEEE Transactions on Geoscience and Remote Sensing 54 (3) (2016) 1793–1802.

[65] T. Ishii, R. Nakamura, H. Nakada, Y. Mochizuki, H. Ishikawa, Surface object recognition with CNN and SVM in Landsat 8 images, in: 2015 14th IAPR International Conference on Machine Vision Applications (MVA), 2015, pp. 341–344. doi:10.1109/MVA.2015.7153200.

[66] N. Kussul, M. Lavreniuk, S. Skakun, A. Shelestov, Deep learning classification of land cover and crop types using remote sensing data, IEEE Geoscience and Remote Sensing Letters 14 (5) (2017) 778–782.

[67] Y. Chen, Z. Lin, X. Zhao, G. Wang, Y. Gu, Deep learning-based classification of hyperspectral data, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 7 (6) (2014) 2094–2107.

[68] G. Wu, X. Shao, Z. Guo, Q. Chen, W. Yuan, X. Shi, Y. Xu, R. Shibasaki, Automatic building segmentation of aerial imagery using multi-constraint fully convolutional networks, Remote Sensing 10 (3) (2018) 407.

[69] Y. Duan, F. Liu, L. Jiao, P. Zhao, L. Zhang, SAR image segmentation based on convolutional-wavelet neural network and Markov random field, Pattern Recognition 64 (2017) 255–267.

[70] F. Mohammadimanesh, B. Salehi, M. Mahdianpari, E. Gill, M. Molinier, A new fully convolutional neural network for semantic segmentation of polarimetric SAR imagery in complex land cover ecosystem, ISPRS Journal of Photogrammetry and Remote Sensing 151 (2019) 223–236.

[71] L. Wang, X. Xu, H. Dong, R. Gui, F. Pu, Multi-pixel simultaneous classification of PolSAR image using convolutional neural networks, Sensors 18 (3) (2018) 769.

[72] M. Ahishali, S. Kiranyaz, T. Ince, M. Gabbouj, Dual and single polarized SAR image classification using compact convolutional neural networks, Remote Sensing 11 (11) (2019) 1340.

[73] Z. Li, Z. Yang, H. Xiong, Homogeneous region segmentation for SAR images based on two steps segmentation algorithm, in: Computers, Communications and Systems (ICCCS), International Conference on, IEEE, 2015, pp. 196–200.

[74] V. Badrinarayanan, A. Kendall, R. Cipolla, SegNet: A deep convolutional encoder-decoder architecture for image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence 39 (12) (2017) 2481–2495.

[75] H. Zhao, J. Shi, X. Qi, X. Wang, J. Jia, Pyramid scene parsing network, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2881–2890.

[76] C. Yu, J. Wang, C. Peng, C. Gao, G. Yu, N. Sang, BiSeNet: Bilateral segmentation network for real-time semantic segmentation, in: Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 325–341.

[77] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, H. Adam, Encoder-decoder with atrous separable convolution for semantic image segmentation, arXiv preprint arXiv:1802.02611 (2018).

[78] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, A. L. Yuille, DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs, IEEE Transactions on Pattern Analysis and Machine Intelligence 40 (4) (2018) 834–848.

[79] O. Ronneberger, P. Fischer, T. Brox, U-Net: Convolutional networks for biomedical image segmentation, in: Medical Image Computing and Computer-Assisted Intervention (MICCAI), Vol. 9351 of LNCS, Springer, 2015, pp. 234–241, (available on arXiv:1505.04597 [cs.CV]). URL http://lmb.informatik.uni-freiburg.de/Publications/2015/RFB15a

[80] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, H. Adam, MobileNets: Efficient convolutional neural networks for mobile vision applications, arXiv preprint arXiv:1704.04861 (2017).

[81] T. Pohlen, A. Hermans, M. Mathias, B. Leibe, Full-resolution residual networks for semantic segmentation in street scenes, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4151–4160.

[82] S. Jégou, M. Drozdzal, D. Vazquez, A. Romero, Y. Bengio, The one hundred layers tiramisu: Fully convolutional DenseNets for semantic segmentation, in: Computer Vision and Pattern Recognition Workshops (CVPRW), 2017 IEEE Conference on, IEEE, 2017, pp. 1175–1183.

[83] O. Antropov, Y. Rauste, A. Lönnqvist, T. Häme, PolSAR mosaic normalization for improved land-cover mapping, IEEE Geoscience and Remote Sensing Letters 9 (6) (2012) 1074–1078.

[84] J. Long, E. Shelhamer, T. Darrell, Fully convolutional networks for semantic segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431–3440.

[85] D. H. Hubel, T. N. Wiesel, Receptive fields, binocular interaction and functional architecture in the cat's visual cortex, The Journal of Physiology 160 (1) (1962) 106–154.

[86] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al., ImageNet large scale visual recognition challenge, International Journal of Computer Vision 115 (3) (2015) 211–252.

[87] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.

[88] G. Huang, Z. Liu, L. Van Der Maaten, K. Q. Weinberger, Densely connected convolutional networks, in: CVPR, Vol. 1, 2017, p. 3.

[89] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, Going deeper with convolutions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1–9.

[90] S. Ji, W. Xu, M. Yang, K. Yu, 3D convolutional neural networks for human action recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (1) (2013) 221–231.

[91] T. N. Sainath, A.-r. Mohamed, B. Kingsbury, B. Ramabhadran, Deep convolutional neural networks for LVCSR, in: Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, IEEE, 2013, pp. 8614–8618.

[92] D. Small, L. Zuberbühler, A. Schubert, E. Meier, Terrain-flattened gamma nought Radarsat-2 backscatter, Canadian Journal of Remote Sensing 37 (5) (2012) 493–499.

[93] P. Härmä, R. Teiniranta, M. Törmä, R. Repo, E. Järvenpää, M. Kallio, The production of Finnish CORINE land cover 2000 classification, XXth ISPRS Congress, Istanbul, Turkey (2004).

[94] P. Härmä, R. Teiniranta, M. Törmä, R. Repo, E. Järvenpää, M. Kallio, Finnish CORINE land cover 2000 classification, XXth ISPRS Congress, Anchorage, US (2004).

[95] A. Garcia-Garcia, S. Orts-Escolano, S. Oprea, V. Villena-Martinez, J. Garcia-Rodriguez, A review on deep learning techniques applied to semantic segmentation, arXiv preprint arXiv:1704.06857 (2017).

[96] F. Chollet, Xception: Deep learning with depthwise separable convolutions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1251–1258.

[97] O. Ronneberger, P. Fischer, T. Brox, U-Net: Convolutional networks for biomedical image segmentation, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2015, pp. 234–241.

[98] L.-C. Chen, G. Papandreou, F. Schroff, H. Adam, Rethinking atrous convolution for semantic image segmentation, arXiv preprint arXiv:1706.05587 (2017).

[99] Y. Bengio, Deep learning of representations for unsupervised and transfer learning, in: Proceedings of ICML Workshop on Unsupervised and Transfer Learning, 2012, pp. 17–36.

[100] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, et al., TensorFlow: A system for large-scale machine learning, in: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), 2016, pp. 265–283.

[101] H. Costa, G. M. Foody, D. S. Boyd, Supervised methods of image segmentation accuracy assessment in land cover mapping, Remote Sensing of Environment 205 (2018) 338–351.

[102] G. Csurka, D. Larlus, F. Perronnin, F. Meylan, What is a good evaluation measure for semantic segmentation?, in: BMVC, Vol. 27, Citeseer, 2013, p. 2013.

[103] J. Cohen, A coefficient of agreement for nominal scales, Educational and Psychological Measurement 20 (1) (1960) 37–46.

[104] O. Antropov, Y. Rauste, T. Häme, Volume scattering modeling in PolSAR decompositions: Study of ALOS PALSAR data over boreal forest, IEEE Transactions on Geoscience and Remote Sensing 49 (10) (2011) 3838–3848.


…Xception and Inception-ResNet-v2. It is important to note that, compared to those, the off-the-shelf models that we consider here are more sophisticated semantic segmentation models, some of which employ Xception or ResNet, but only as a module in their feature extraction parts.

In summary, the capabilities of deep learning approaches for classification have been investigated to a lesser extent for SAR imagery than for optical imagery. The attempts to use SAR data for land cover classification were relatively limited in scope, area, or the number of used SAR scenes. In particular, wide-area land cover mapping was never addressed. The reasons for this include the comparatively poor availability of SAR data compared to optical data (greatly changed since the advent of Sentinel-1), complex scattering mechanisms leading to ambiguous SAR signatures for different classes (which makes SAR image segmentation more difficult than optical image segmentation [73]), as well as the speckle noise caused by the coherent nature of the SAR imaging process.

1.3. Study goals

The present study addresses the identified research gap of a lack of wide-area land cover mapping using SAR data. We achieve this by training, fine-tuning, and evaluating a set of suitable state-of-the-art deep learning models from the class of semantic segmentation models, and demonstrating their suitability for land cover mapping. Moreover, our work is the first to examine and demonstrate the suitability of deep learning for land cover mapping from SAR images on a large scale, i.e., across a whole country.

Specifically, we applied the semantic segmentation models to SAR images taken over Finland. We focused on images of Finland because a land cover mask of a suitable resolution is available there and can be used for training labels (i.e., CORINE). The training was performed with the seven selected models (SegNet [74], PSPNet [75], BiSeNet [76], DeepLabV3+ [77, 78], U-Net [79, 80], FRRN-B [81], and FC-DenseNet [82]), which have encoder modules pre-trained on the large RGB image corpus ImageNet 2012.2 Those models are freely available.3 In other words, we reused semantic segmentation architectures developed for natural images, with weights pre-trained on RGB images, and fine-tuned them on the SAR images. Our results (with over 90% overall accuracy) demonstrate the effectiveness of deep learning methods for land cover mapping with SAR data.

In addition to having the high-resolution CORINE map that can serve as ground truth (labels) for training the deep learning models, another reason that we selected Finland is that it is a northern country with frequent cloud cover, which means that using optical imagery for wide-area mapping is often not feasible. Hence, demonstrating the usability of radar imagery for land cover mapping is particularly useful here.

2 http://image-net.org/challenges/LSVRC/2012/
3 https://github.com/tensorflow/models/tree/master/research/slim#pre-trained-models


Even though Finland is a relatively small country, there is still considerable heterogeneity present in terms of land cover types and how they appear in the SAR images. Namely, SAR backscattering is sensitive to several factors that likely differ between countries or between distant areas within a country. Examples of such factors are moisture levels, terrain variation and soil roughness, the predominant forest biome and tree species proportions, the types of shorter vegetation and crops in agricultural areas, and the specific types of built environments. We did not constrain our study to a particular area of Finland where the SAR signatures might be consistent, but obtained the images across a wide area. Hence, demonstrating the suitability of our methods in this setting hints at their potential generalizability. Namely, it means that, similarly as we did here, the semantic segmentation models can be fine-tuned and adapted to work on data from other regions or countries with different SAR signatures.

On the other hand, we took into account that the same areas will appear somewhat different in the SAR images across different seasons. The scattering characteristics of many land cover classes change considerably between the summer and winter months, and sometimes even within weeks during seasonal changes [83, 20]. These include snow cover and melting, freeze/thaw of soils, ice on rivers and lakes, the crops' growing cycle, and leaf-on and leaf-off conditions in deciduous trees. Because of this, in the present study we focused only on scenes acquired during the summer season. However, we did allow our training dataset to contain several images of the same area taken at different times during the summer season. This way, not only spatial but also temporal variation of SAR signatures is introduced.

Our contributions can be summarised as follows:

C1 We thoroughly benchmarked seven selected state-of-the-art semantic segmentation models, covering a diverse set of approaches, for land cover mapping using Sentinel-1 SAR imagery. We provide insights on the best models in terms of both accuracy and efficiency.

C2 Our results demonstrate the power of deep learning models, along with SAR imagery, for accurate wide-area land cover mapping in the cloud-obscured boreal zone and polar regions.

2. Deep Learning Terminology

As with other representation learning models, the power of deep learning models comes from their ability to learn rich features (representations) from the dataset automatically [15]. The automatically learned features are usually better suited for the classifier or other task at hand than hand-engineered features. Moreover, thanks to the large number of layers employed, it has been proven that deep learning networks can discover hierarchical representations, so that the higher-level representations are expressed in terms of the lower-level, simpler ones. For example, in the case of images, the low-level representations that can be discovered are edges; using them, the mid-level ones can be


Table 1: Terminology for the main tasks in computer vision and its use in the deep learning versus remote sensing communities.

Deep learning | Remote sensing | Task description
Classification [13] | Image Annotation, Scene Understanding, Scene Classification | Assigning a whole image to a class based on what is (mainly) represented in it, for example a ship, oil tank, sea, or land.
Object Detection/Localization/Recognition [15] | Automatic Target Recognition | Detecting (and localizing) the presence of particular objects in an image. These algorithms can detect several objects in the given image; for instance, ship detection in SAR images.
Semantic Segmentation [84] | Image Classification, Clustering | Assigning a class to each pixel in an image, based on which image object or region it belongs to. These algorithms not only detect and localize objects in the image, but also output their exact areas and boundaries.

expressed, such as corners and shapes, and this helps to express the high-level representations, such as object elements and their identities [15].

The deep learning models in computer vision can be grouped according to their main task into three categories; in Table 1, we provide a description of those categories. However, the deep learning terminology for those tasks does not always correspond well to the terminology used in the remote sensing community. Relevant to our task, a number of remote sensing studies use the term classification in the context of land cover mapping, inherently meaning pixel- or region-based classification, which in the deep learning terminology corresponds to semantic segmentation. In Table 1, we list the corresponding terminology that we encountered being used for each task in both the deep learning and remote sensing communities. This is helpful to disambiguate when talking about different tasks, and to recognize when talking about the same tasks, in the two domains. In the present study, the focus is on land cover mapping. Hence, we tackle semantic segmentation in the deep learning terminology, that is, image classification, i.e., pixel-wise classification, in the remote sensing terminology.

Convolutional Neural Networks (CNNs) [12, 13] are the deep learning model that has transformed the computer vision field. Initially, CNNs were defined to tackle the image classification (deep learning terminology) task. Their structure is inspired by the visual perception of mammals [85]. CNNs are named after one of the most important operations, which is particular to them compared to other neural networks, i.e., convolution. Mathematically, a convolution is a combination of two other functions. A convolution is applied to the image by sliding a filter (kernel) of a given size k × k, which is usually small compared to the original image size. Different-purpose filters can be designed; for example, a filter can serve as a vertical edge detector. Application of such a convolution operation on an image results in a feature map. Another common operation, which is usually applied after a convolution, is pooling. Pooling reduces the size of the feature map while providing robustness to the extracted features. Common CNNs end with a fully connected layer, which is used for the final predictions, commonly for image classification. By employing a large number of convolutional layers (depth), CNNs are able to extract gradually more complex and abstract features. The first CNN model to demonstrate impressive effectiveness in image classification (of handwritten digits) was LeNet [12]. Several years later, Krizhevsky et al. [13] developed AlexNet, the deep CNN that dramatically pushed the limits of classification accuracy on the famous ImageNet computer vision challenge [86]. Since then, a variety of CNN-based models have been proposed. Some notable examples are the VGG network [14], ResNet [87], DenseNet [88], and Inception V3 [89]. The effectiveness of CNNs has also been proven in various real-world applications [90, 91].
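As a small illustration of the operations just described (our own toy example, with an assumed Sobel-like kernel), a 3 × 3 vertical-edge filter can be slid over an image band and the resulting feature map max-pooled:

```python
import numpy as np
from scipy.signal import convolve2d

# A simple 3x3 vertical edge detector (Sobel-like kernel)
kernel = np.array([[1, 0, -1],
                   [2, 0, -2],
                   [1, 0, -1]])

image = np.random.rand(512, 512)                       # stand-in for one image band
feature_map = convolve2d(image, kernel, mode="same")   # convolution -> feature map

# 2x2 max pooling: halves the feature map size while keeping the strongest responses
pooled = feature_map.reshape(256, 2, 256, 2).max(axis=(1, 3))
```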

Once CNNs had proven their effectiveness in classifying images, Long et al. [84] were the first to discover how a given CNN model can be augmented to make it suitable for the semantic segmentation task: they proposed the Fully Convolutional Neural Network (FCN) framework. This generic architecture can be used to adapt any CNN used for classification into a segmentation model. Namely, the authors showed that, by replacing the last fully connected layer with appropriate convolutional layers, so that they upsample and restore the resolution of the input at the output layer, CNNs can be transformed to classify each individual pixel (instead of the whole image). The basic idea is as follows. The encoder is used to learn the feature maps, and is usually based on a pre-trained deep CNN for classification, such as ResNet, VGG, or Inception. The decoder part serves to upsample the discriminative features that the encoder has learned, from the coarse-level feature map to the fine pixel level. Long et al. [84] showed that this upsampling (backward) computation can be efficiently performed using backward convolutions (deconvolutions). Moreover, this means that specific CNN models, such as those mentioned above, can all be incorporated into the FCN framework for segmentation, giving rise to FCN-AlexNet [84], FCN-ResNet [87], FCN-VGG16 [84], FCN-DenseNet [82], etc. We present a diagram of the generic FCN architecture in Figure 1.
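A minimal FCN-style model in Keras terms could look as follows; this is a toy sketch of the idea (small hand-rolled encoder, assumed layer sizes), not the architecture from [84]:

```python
import tensorflow as tf
from tensorflow.keras import layers

NUM_CLASSES = 5  # e.g., the five CLC Level-1 classes used in this study

inputs = layers.Input(shape=(512, 512, 3))
# Encoder: a small stack of convolutions and poolings (a real FCN would use a
# pre-trained backbone such as VGG or ResNet here)
x = layers.Conv2D(64, 3, padding="same", activation="relu")(inputs)
x = layers.MaxPooling2D(2)(x)
x = layers.Conv2D(128, 3, padding="same", activation="relu")(x)
x = layers.MaxPooling2D(2)(x)
# A 1x1 convolution replaces the fully connected classifier: per-location class scores
x = layers.Conv2D(NUM_CLASSES, 1)(x)
# Decoder: a transposed convolution upsamples the coarse map back to input resolution
outputs = layers.Conv2DTranspose(NUM_CLASSES, 8, strides=4,
                                 padding="same", activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)
```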


Figure 1: The architecture of Fully Convolutional Neural Networks (FCNs) [84].

3. Materials and methods

Here, we first describe the study site, the SAR data, and the reference data. This is followed by an in-depth description of the deep learning terminology and the models used in the study. We finish with the description of the experimental setup and the evaluation metrics.

3.1. Study site

Our study site covers the area of Finland at latitudes from 61° to 67.5°. The processed area is shown in Figure 2. The study area includes central and northern areas of Finland, covered primarily by boreal forestland with inclusions of water bodies (primarily lakes), urban settlements, and agricultural areas, as well as marshland and open bogs. We have omitted Lapland due to potential snow cover during the months of data acquisition. The terrain height variation is moderate, mostly within the 100-300 m range.

3.2. SAR data

Presently, Sentinel-1 is a C-band SAR dual-satellite system, with the two satellites orbiting 180° apart [11], launched in 2014 and 2016, respectively. The operational acquisition modes are Stripmap (SM), Interferometric Wide-Swath (IW), Extra Wide Swath (EW), and Wave Mode (WV). The IW mode is the default mode over land, providing a 250 km wide swath composed of three sub-swaths, with a single-look image at 5 m by 20 m spatial resolution. It uses the so-called TOPS (Terrain Observation with Progressive Scan) SAR mode.

The SAR data acquired by the Sentinel-1 satellites in IW mode were used in the study. Altogether, 14 Sentinel-1A images acquired during the summers of 2015 and 2016 were used, more concretely during June, July, and August of those two years. Their geographical coverage is schematically shown in Figure 2.


Figure 2: Study area in Finland with reference CORINE land cover data and the schematic location of areas used for model training and accuracy assessment.


The original scenes were downloaded as Level-1 Ground Range Detected (GRD) products. They represent focused SAR data that has been detected, multi-looked, and projected to ground range using an Earth ellipsoid. They were orthorectified in ESA SNAP S1TBX software using a local digital terrain model (with 2 m resolution) available from the National Land Survey of Finland. The pixel spacing of the orthorectified scenes was set to 20 m. Orthorectification included terrain flattening to obtain backscatter in the gamma-nought format [92]. The scenes were further re-projected to the ETRS89 / ETRS-TM35FIN projection (EPSG:3067) and resampled to a final pixel size of 20 m.

3.3. Reference data

In Finland, the Finnish Environment Institute (SYKE) is responsible for the production of the CORINE maps. While for most of the EU territory the CORINE mask is available at 100 m × 100 m spatial resolution, the national institutions may choose to create more precise maps, and SYKE in particular has produced a 20 m × 20 m spatial resolution mask for Finland (Figure 3). Since then, updates have been produced regularly; the latest one at the time of this study, which we used, is CLC2012. There are 48 different land use classes in the map, which can be hierarchically grouped into 4 CLC levels. In detail, there are 30 classes on CLC Level-3, 15 classes on CLC Level-2, and 5 top CLC Level-1 classes. According to the information provided by SYKE, the accuracy of CLC Level-3 is 61%, of CLC Level-2 83%, and of CLC Level-1 93%. The selected classes and their corresponding color codes used for our segmentation results are shown in Table 2.

Until the most recent CORINE production round, EEA member countries adopted national approaches for the production of CORINE. The EEA Technical Guidelines include manual digitalization of land cover change based on visual interpretation of optical satellite imagery. In Finland, the European CLC was not applicable for the majority of national users due to its large minimal mapping unit (MMU); thus, a national version was produced with a somewhat modified nomenclature of classes [93, 94]. The national high-resolution CLC2012 data is in raster format of 20 m, with a corresponding MMU. In the provision of the 2012 update of CLC, obtaining optical imagery over Scandinavia and Britain was particularly challenging because of the frequent clouds, thus calling for the use of radar imagery to meet user requirements on accuracy and coverage [31]. The CORINE map itself is built from high-resolution satellite images acquired primarily during the summer and, to a smaller extent, during the spring months [2].

3.4. Semantic Segmentation Models

We selected the following seven state-of-the-art [95] semantic segmentation models to test for our land cover mapping task: SegNet [74], PSPNet [75], BiSeNet [76], DeepLabV3+ [77, 78], U-Net [79, 80], FRRN-B [81], and FC-DenseNet [82]. The models were selected to cover a wide set of approaches to semantic segmentation. In the following, we describe the specific architecture of each of these DL models.


Table 2: CORINE CLC Level-1 classes and their color codes used in our classification results.

class                               R    G    B    color
Water bodies (500)                  0    191  255  blue
Peatland, bogs and marshes (400)    173  216  230  light blue
Forested areas (300)                127  255  0    green
Agricultural areas (200)            222  184  135  brown
Urban fabric (100)                  128  0    0    red

Figure 3: Zoomed-in area fragment with our reference data, i.e., CORINE, shown on top (left), along with the Google Earth layer (right).

We will use the following common abbreviations: conv for the convolution operation, concat for concatenation, max pool for the max pooling operation, BN for batch normalisation, and ReLU for the rectified linear unit activation function.

3.4.1. BiSeNet (Bilateral Segmentation Network)

The BiSeNet model is designed to decouple the functions of encoding additional spatial information and enlarging the receptive field, which are fundamental to achieving good segmentation performance. As can be seen in Figure 4, there are two main components to this model: the Spatial Path (SP) and the Context Path (CP). The Spatial Path serves to encode rich spatial information. The Context Path serves to provide a sufficient receptive field, and uses global average pooling and a pre-trained Xception [96] or ResNet [87] as the backbone. The goal of the creators was not only to obtain superior performance, but to achieve a balance between speed and performance; hence, BiSeNet is a relatively fast semantic segmentation model.
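In Keras terms, the two paths can be sketched as below. This is a minimal illustration under our own naming and filter choices, not the authors' implementation; the callable backbone argument is assumed, and the Attention Refinement and Feature Fusion modules shown in Figure 4 are omitted for brevity.

    import tensorflow as tf
    from tensorflow.keras import layers

    def spatial_path(x):
        # Encode rich spatial detail: three stride-2 convolutions keep the
        # output at 1/8 of the input resolution with wide feature maps.
        for filters in (64, 128, 256):
            x = layers.Conv2D(filters, 3, strides=2, padding='same')(x)
            x = layers.BatchNormalization()(x)
            x = layers.ReLU()(x)
        return x

    def context_path(x, backbone):
        # A pre-trained backbone (e.g., Xception or ResNet) provides a large
        # receptive field; global average pooling adds image-level context.
        features = backbone(x)
        context = layers.GlobalAveragePooling2D(keepdims=True)(features)
        return features, context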

3.4.2. SegNet (Encoder-Decoder-Skip)

Similarly to BiSeNet, SegNet is also designed with computational performance in mind, this time particularly during inference. Because of this, the network has a significantly smaller number of trainable parameters compared to most of the other architectures. The encoder in SegNet is based on VGG16: it consists of its first 13 convolutional layers, while the fully connected layers are omitted.


Figure 4: The architecture of BiSeNet. ARM stands for the Attention Refinement Module and FFM for the Feature Fusion Module introduced in the model's paper [76].

Figure 5: The architecture of the SegNet-based Encoder-Decoder with Skip connections [74]. Blue tiles represent Convolution + Batch Normalisation + ReLU, green tiles represent Pooling, red – Upsampling, and yellow – a softmax operation.

Hence, the novelty of this network lies in its decoder part, as follows. The decoder consists of one decoder layer for each encoder layer, and so it also has 13 layers. Each individual decoder layer utilizes max-pooling indices memorized from its corresponding encoder feature map. The authors showed that this enhances boundary delineation between classes. Finally, the decoder output is sent to a multi-class softmax function, yielding a classification for each pixel (see Figure 5).
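The index-preserving pooling and the matching sparse upsampling can be sketched with TensorFlow primitives as below. The helper names are our own illustration, not the authors' code; the sketch assumes static NHWC shapes.

    import tensorflow as tf

    def encoder_pool(x):
        # 2x2 max pooling that also records the flat position of each maximum.
        return tf.nn.max_pool_with_argmax(x, ksize=2, strides=2, padding='SAME',
                                          include_batch_in_index=True)

    def decoder_unpool(pooled, argmax, output_shape):
        # Sparse upsampling: place each pooled value back at the position
        # memorized by the corresponding encoder layer; the remaining zeros
        # are densified by the trainable convolutions that follow.
        n = output_shape[0] * output_shape[1] * output_shape[2] * output_shape[3]
        flat = tf.scatter_nd(tf.reshape(argmax, [-1, 1]),
                             tf.reshape(pooled, [-1]), [n])
        return tf.reshape(flat, output_shape)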

3.4.3. Mobile U-Net

Mobile U-Net is based on the U-Net [97] semantic segmentation architecture shown in Figure 6. In designing U-Net, the fully convolutional approach was generally employed, with the following modification: the upsampling part of the architecture has no fully connected layers, but is nearly symmetrical to the feature extraction part, due to the use of similar feature maps. This results in a u-shaped architecture (see Figure 6), hence the name of the model. While originally developed for biomedical images, the U-Net architecture has proven successful for image segmentation in other domains as well. Here, we somewhat modify the U-Net architecture according to the MobileNets [80] framework to improve its efficiency. In particular, the MobileNets framework uses Depthwise Separable Convolutions, a form which factorizes standard convolutions (e.g., 3×3) into a depthwise convolution (applied separately to each input band) and a pointwise (1×1) convolution to combine the outputs of the depthwise convolution.
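The factorization can be written compactly in Keras. The block below is a sketch of one such unit, with our own helper name and one common placement of batch normalisation and ReLU; it is not necessarily the exact block configuration used in our implementation.

    import tensorflow as tf
    from tensorflow.keras import layers

    def depthwise_separable_conv(x, filters, stride=1):
        # 3x3 depthwise convolution: one filter per input channel.
        x = layers.DepthwiseConv2D(3, strides=stride, padding='same',
                                   use_bias=False)(x)
        x = layers.BatchNormalization()(x)
        x = layers.ReLU()(x)
        # 1x1 pointwise convolution mixes the per-channel outputs.
        x = layers.Conv2D(filters, 1, padding='same', use_bias=False)(x)
        x = layers.BatchNormalization()(x)
        return layers.ReLU()(x)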

Figure 6: The architecture of U-Net [97].

3.4.4. DeepLab-V3+

DeepLab-V3+ [77] is an improved version of DeepLab-V3 [98], while the latter is an improved version of the original DeepLab [78] model. This segmentation model does not follow the FCN framework like the previously discussed models.


The main features that distinguish the DeepLab model from FCNs are the atrous convolutions for upsampling and the application of probabilistic machine learning models, concretely conditional random fields (CRFs), for finer localization accuracy in the final fully connected layer. Atrous convolutions, in particular, allow enlarging the context from which the next layer feature maps are learned, while preserving the number of parameters (and thus the same efficiency). Using a chain of atrous convolutions allows computing the final output layer of a CNN at an arbitrarily high resolution (removing the need for the upsampling part as used in FCNs). In the follow-up work proposing DeepLab-V3, Chen et al. [98] change the approach to atrous convolutions to gradually double the atrous rates, and show that with this adapted version their new algorithm outperforms the previous one, even without including the fully connected CRF layer. Finally, in their newest adaptation of the model, called DeepLab-V3+, Chen et al. [77] turn to an approach similar to the FCNs, i.e., they add a decoder module to the architecture (see Figure 7). That is, they employ the features extracted by the DeepLab-V3 module in the encoder part, and add a decoder module consisting of 1×1 and 3×3 convolutions.
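The rate-doubling idea can be illustrated with a few dilated convolutions in Keras. The helper below is our own sketch (the name, filter count, and rates are placeholders): the dilation rate, rather than extra parameters, is what grows the receptive field.

    import tensorflow as tf
    from tensorflow.keras import layers

    def atrous_chain(x, filters, rates=(1, 2, 4, 8)):
        # Each 3x3 convolution has the same parameter count; doubling the
        # dilation rate widens the context seen by the next feature map.
        for rate in rates:
            x = layers.Conv2D(filters, 3, padding='same',
                              dilation_rate=rate, activation='relu')(x)
        return x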

Figure 7: The architecture of DeepLabV3+ [77].

3.4.5. FRRN-B (Full-Resolution Residual Networks)

As we have seen, most of the semantic segmentation architectures are based on some form of FCN, and so they utilize existing classification networks, such as ResNet or VGG16, as encoders.


Figure 8: The architecture of FRRN-B. RU n and FRRU n stand for residual units and full-resolution residual units with n-channel convolutions, respectively. FRRUs simultaneously operate on the two streams [81].

We also discussed the main reason for such approaches, which is to take advantage of the learned weights from those architectures pre-trained for the classification task. Nevertheless, one disadvantage of the FCN approach is that the resulting network outputs of the encoder part (particularly after the pooling operations) are at a lower resolution, which deteriorates the localization performance of the overall segmentation model. Pohlen et al. [81] proposed to tackle this by having two parallel network streams processing the input image: a pooling and a residual stream (Figure 8). As the name says, the pooling stream performs successive pooling and then unpooling operations, and it serves to obtain good recognition of the objects and classes. The residual stream computes residuals at the full image resolution, which enables low-level features, i.e., object pixel-level locations, to be propagated to the network output. The name of the model comes from its building blocks, i.e., full-resolution residual units. Each such unit simultaneously operates on the pooling and the residual stream. In the original paper [81], the authors propose two alternative architectures, FRRN-A and FRRN-B, and they show that FRRN-B achieves superior performance on the Cityscapes benchmark dataset. Hence, we employ the FRRN-B architecture.
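One full-resolution residual unit can be approximated as below. This Keras sketch reflects our own reading of the unit (the names and exact operation order are assumptions): the two streams are merged at the pooled resolution, processed, and a residual is projected back into the full-resolution stream. It assumes spatial dimensions divisible by the pooling scale.

    import tensorflow as tf
    from tensorflow.keras import layers

    def frru(y, z, filters, scale):
        # y: residual stream at full resolution; z: pooling stream at 1/scale.
        y_pooled = layers.MaxPooling2D(pool_size=scale)(y)
        h = layers.Concatenate()([y_pooled, z])
        h = layers.Conv2D(filters, 3, padding='same', activation='relu')(h)
        z_out = layers.Conv2D(filters, 3, padding='same', activation='relu')(h)
        # Project the pooled features and add them back to the residual stream.
        r = layers.Conv2D(y.shape[-1], 1)(z_out)
        y_out = layers.Add()([y, layers.UpSampling2D(size=scale)(r)])
        return y_out, z_out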

3.4.6. PSPNet (Pyramid Scene Parsing Network)

Figure 9: The architecture of PSPNet [75].

Zhao et al. [75] propose the Pyramid Scene Parsing network as a solution to the challenge of making local predictions based on the local context only, without considering the global image scene. In remote sensing, an example of this challenge could be a model wrongly predicting water with waves in it as the dry vegetation class, because the two appear similar and the model does not consider that these pixels are part of a larger water surface, i.e., it misses the global context. In similarity to the other FCN-based approaches, PSPNet uses a pre-trained classification architecture to extract the feature map, in this case ResNet. The main module of this network is the pyramid pooling, which is enclosed by a square in Figure 9. As can be seen in the figure, this module fuses features at four scales, from the coarse (red) to the fine (green). Hence, the output of each level in the pyramid pooling module contains the feature map of a different resolution. In the end, the different features are stacked together, yielding the final pyramid pooling global feature for predictions.
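A compact sketch of the pyramid pooling module is given below. It is our own illustration (the bin sizes follow the original paper; the other names are placeholders) and assumes a feature map with static spatial dimensions.

    import tensorflow as tf
    from tensorflow.keras import layers

    def pyramid_pooling(x, bin_sizes=(1, 2, 3, 6)):
        # Pool the backbone features to coarse grids of several sizes,
        # project each with a 1x1 convolution, resize back to the input
        # resolution, and stack everything as the global context feature.
        h, w, c = x.shape[1], x.shape[2], x.shape[3]
        features = [x]
        for b in bin_sizes:
            p = layers.AveragePooling2D(pool_size=(h // b, w // b))(x)
            p = layers.Conv2D(c // len(bin_sizes), 1, activation='relu')(p)
            p = tf.image.resize(p, (h, w), method='bilinear')
            features.append(p)
        return layers.Concatenate()(features)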

3.4.7. FC-DenseNet (Fully Convolutional DenseNets)

This semantic segmentation algorithm is built using the DenseNet CNN [88] as a basis for the encoder, followed by applying the FCN approach [82]. The specificity of the DenseNet architecture is the presence of blocks in which each layer is connected to all other layers in a feed-forward manner. Figure 10 shows the architecture of FC-DenseNet, where the blocks are represented by the Dense Block units. According to [88], such an architecture scales well to hundreds of layers without any optimization issues, while yielding excellent results in classification tasks. In order to efficiently upsample the DenseNet feature maps, Jegou et al. [82] substitute the upsampling convolutions of FCNs with Dense Blocks and Transitions Up. The Transition Up modules consist of transposed convolutions, which are then concatenated with the outputs from the input skip connection (the dashed lines in Figure 10).
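The dense connectivity inside one block can be sketched as follows; this is our own minimal Keras illustration (the layer count and growth rate are placeholders), not the exact FC-DenseNet block configuration.

    import tensorflow as tf
    from tensorflow.keras import layers

    def dense_block(x, n_layers=4, growth_rate=16):
        # Each layer sees the concatenation of all preceding feature maps
        # and contributes growth_rate new channels.
        new_features = []
        for _ in range(n_layers):
            y = layers.BatchNormalization()(x)
            y = layers.ReLU()(y)
            y = layers.Conv2D(growth_rate, 3, padding='same')(y)
            new_features.append(y)
            x = layers.Concatenate()([x, y])
        # In the upsampling path, FC-DenseNet forwards only the newly
        # produced feature maps to keep the feature count bounded.
        return layers.Concatenate()(new_features)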


Figure 10: The architecture of FC-DenseNet [82].

3.5. Training approach

To accomplish better segmentation performance, there is an option to pre-train the semantic segmentation models (in particular, their encoder modules) using a larger set of available images of another type (such as natural images). By using a model pre-trained on natural images and continuing training with the limited set of SAR images, the knowledge becomes effectively transferred from the natural-image task to the SAR task [99]. To accomplish such a transfer, we used models whose encoders were pre-trained for the ImageNet classification task, and fine-tuned them using our SAR dataset (described next).
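In Keras terms, the transfer amounts to loading ImageNet weights into the encoder and then training the full network on the SAR imagelets. The sketch below is illustrative only: the one-layer decoder head is a placeholder, since each model in Section 3.4 defines its own.

    import tensorflow as tf
    from tensorflow.keras import layers, Model
    from tensorflow.keras.applications import ResNet101

    # Encoder pre-trained on ImageNet; the classification head is dropped.
    encoder = ResNet101(weights='imagenet', include_top=False,
                        input_shape=(512, 512, 3))

    # Placeholder decoder for the 5 CLC Level-1 classes (output stride 32).
    x = layers.Conv2D(5, 1)(encoder.output)
    x = layers.UpSampling2D(size=32, interpolation='bilinear')(x)
    outputs = layers.Softmax()(x)

    model = Model(encoder.input, outputs)
    # Fine-tuning: all layers stay trainable, but optimization starts from
    # the pre-trained weights rather than from random initialization.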

3.6. Experimental Setup

In this section, we first describe how we prepared the SAR images for training with the deep learning models, which are designed for natural images. Then, we provide the details of our implementation.

3.6.1. SAR Data Preprocessing for Deep Learning

Sentinel-1 imagery comes in two polarization channels, each of them being informative about certain types of land cover. Hence, using their combination is expected to yield better land cover mapping results than using either of them independently. Moreover, previous work has shown the benefits of also using a DEM for land cover mapping [9]. Hence, as the third layer, we used the DEM of Finland from the National Land Survey.

SAR backscatter for both polarizations was converted to decibels by applying the 10·log10 transformation. In addition, for the deep learning models, each band should be normalized so that the distribution of the pixel values resembles a Gaussian distribution centered at zero; this is done to yield faster convergence during training. To normalize the data, the mean of all pixels is subtracted from each pixel value, and the result is then divided by the standard deviation. In addition, given that the semantic segmentation models expect pixel values in the range (0, 255), we scaled the normalized data, and also the DEM values, to this range. Such preprocessed layers are then used to create the image dataset for training.

We named the created dataset SAR RGB-DEM. The naming comes from the process used to create the images in this dataset: namely, one of the two channels of a Sentinel-1 image is assigned to the R channel and the other to the G channel, while for the third, B channel, we use the DEM layer.
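The whole chain can be sketched in NumPy as below. The helper is our own illustration: the VH-to-R / VV-to-G order and the small epsilon guard are assumptions (the text only fixes that one polarization goes to each color channel), and the normalization statistics are computed per band here for brevity.

    import numpy as np

    def to_sar_rgb_dem(vh, vv, dem, eps=1e-6):
        # Assemble one SAR RGB-DEM image: VH -> R, VV -> G, DEM -> B.
        def prep(band, to_db):
            if to_db:
                band = 10.0 * np.log10(np.maximum(band, eps))  # backscatter to dB
            z = (band - band.mean()) / band.std()  # zero mean, unit variance
            # Rescale to the 0..255 range expected by the pre-trained models.
            return 255.0 * (z - z.min()) / (z.max() - z.min())
        return np.dstack([prep(vh, True), prep(vv, True), prep(dem, False)])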

3.6.2. Train/Development and Test (Accuracy Assessment) Dataset

The images from the SAR RGB-DEM dataset needed to be split into 512 px × 512 px partial images (further in the text called imagelets) for training. Thus, each imagelet represented an area of roughly 10×10 km². The first reason for this preprocessing is the square shape: some of the selected models required square-shaped images. Some of the other models were flexible with the image shape and size, but we wanted to make the setups for all the models the same, so that their results would be comparable. The second reason for the preprocessing is the computational capacity: with our hardware setup (described below), this was the largest image size that we could work with.

Upon splitting the SAR RGB-DEM images, we discarded those imagelets that were completely outside the land mass area, as well as those for which we did not have a complete CORINE label (such as those falling partly outside the Finnish borders). This resulted in more than 7K imagelets of size 512 px × 512 px.
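A minimal sketch of the tiling step is shown below (the helper name is ours); the filtering of sea-only and incompletely labelled tiles is applied afterwards, as described above.

    import numpy as np

    def split_into_imagelets(scene, mask, size=512):
        # Cut an orthorectified scene and its CORINE mask into aligned,
        # non-overlapping size x size tiles, dropping partial edge tiles.
        tiles = []
        rows, cols = scene.shape[:2]
        for top in range(0, rows - size + 1, size):
            for left in range(0, cols - size + 1, size):
                tiles.append((scene[top:top + size, left:left + size],
                              mask[top:top + size, left:left + size]))
        return tiles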

Given the geography of Finland, to have representative training data it seems useful to include imagelets from both the northern and the southern (including the large cities) parts of the country in the model training. On the other hand, some noticeable differences are found also in the gradient from east to west of the country. To achieve a representative training dataset, we selected all imagelets between the longitudes of 24° and 28° for the accuracy assessment (model testing), and all the other imagelets for the model training (that is, training and development in the computer vision terminology). In this way, we prevented the situation in which two images of the same area, but acquired at different times, are used one for training and the other for testing. Images overlapping any border of the introduced strip were discarded. The procedure resulted in 3104 images in the training and development set and 3784 images in the test (accuracy assessment) set. Finally, we used 60% of the training and development set for training and the rest for development of the deep learning models.


Table 3: The properties of the examined semantic segmentation architectures.

Architecture   Base model      Parameters
BiSeNet        ResNet101       24.75M
SegNet         VGG16           34.97M
Mobile U-Net   Not applicable  8.87M
DeepLabV3+     ResNet101       47.96M
FRRN-B         ResNet101       24.75M
PSPNet         ResNet101       56M
FC-DenseNet    ResNet101       9.27M

3.6.3. Data Augmentation

Further, we employed the data augmentation technique. The main idea behind data augmentation is to enable improved learning by reusing original images with slight transformations, such as rotation, flipping, adding Gaussian noise, or slightly changing the brightness. This provides additional information to the model, and the dataset size is effectively increased. Moreover, an additional benefit of data augmentation is in helping the model learn some invariant data properties for which no examples are present in the original dataset. Given the sensitivity of the SAR backscatter, we did not want to augment the images in terms of color or brightness, or by adding noise. However, we could safely employ rotations and flipping. For rotations, we used only 90° increments, giving three possible rotated versions of an image. For image flipping, we applied horizontal and vertical flipping, or both at the same time, giving another three possible versions of the original image (the vertical flip operation reflects the image along the central horizontal axis, and the horizontal flip along the central vertical axis). Notice that our images are square, so these transformations do not change the image dimensions. Finally, we applied online augmentation, as opposed to the offline version: in the online process, each augmented image is seen only once, and so this process yields a network that generalises better.
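A sketch of one such online augmentation step is given below (our own helper); in the online regime, it is invoked independently every time an imagelet is drawn into a batch, so each augmented variant is seen only once.

    import numpy as np

    def augment(image, label, rng=np.random):
        # Geometric transforms only, applied identically to image and label;
        # radiometric changes (brightness, noise) are avoided because SAR
        # backscatter levels carry class information.
        k = rng.randint(4)  # rotation by 0, 90, 180 or 270 degrees
        image, label = np.rot90(image, k), np.rot90(label, k)
        if rng.rand() < 0.5:
            image, label = np.fliplr(image), np.fliplr(label)
        if rng.rand() < 0.5:
            image, label = np.flipud(image), np.flipud(label)
        return image, label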

3.6.4. Implementation

To apply the described semantic segmentation models, we adapted the open-source Semantic Segmentation Suite. We used Python with the TensorFlow [100] backend.

3.6.5. Hardware and Training Setup

We trained and tested each of the deep learning models separately on a single GPU (NVIDIA GeForce GTX 1080) on a machine with 32 GB of RAM.

We used the RMSProp optimisation algorithm, with a learning rate of 0.0001 and a decay of the learning rate of 0.9954. Each model was trained for an equal number of epochs (500), and during the process the checkpoint for the best model was saved. We then used that model for evaluation on the test set, and we report those results.
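In Keras terms, this setup could look as follows. The batch size and the assumption that the decay factor is applied once per epoch are ours; the paper does not state them.

    import tensorflow as tf

    BATCH_SIZE = 1  # assumption for illustration
    steps_per_epoch = 1862 // BATCH_SIZE  # 60% of the 3104 train/dev imagelets

    # Learning rate 1e-4, decayed by a factor of 0.9954 once per epoch.
    schedule = tf.keras.optimizers.schedules.ExponentialDecay(
        initial_learning_rate=1e-4, decay_steps=steps_per_epoch,
        decay_rate=0.9954, staircase=True)
    optimizer = tf.keras.optimizers.RMSprop(learning_rate=schedule)

    # Keep only the best checkpoint observed over the 500 training epochs.
    checkpoint = tf.keras.callbacks.ModelCheckpoint(
        'best_model.h5', monitor='val_loss', save_best_only=True)

    # model.compile(optimizer=optimizer,
    #               loss='sparse_categorical_crossentropy',
    #               metrics=['accuracy'])
    # model.fit(train_ds, validation_data=val_ds, epochs=500,
    #           callbacks=[checkpoint])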

3.7. Evaluation Metrics

In a review of the metrics used in land cover classification, Costa et al. [101] found a lack of consistency, complicating the intercomparison of different studies. To avoid such issues and to ensure that our results are easily comparable with the literature, we evaluated our models thoroughly. For each model and class, we report the following measures of accuracy: precision, also known as user's accuracy (UA); recall, also known as producer's accuracy (PA); the overall accuracy; and the Kappa coefficient. The formulas are as follows.

For each segmentation class (land cover type) $c$, we calculate precision (user's accuracy):

$$P_c = \frac{Tp_c}{Tp_c + Fp_c},$$

and recall (producer's accuracy):

$$R_c = \frac{Tp_c}{Tp_c + Fn_c},$$

where $Tp_c$ represents true positive, $Fp_c$ false positive, and $Fn_c$ false negative pixels for the class $c$.

When it comes to accuracy [102], we calculate the per-class accuracy (effectively defined as the recall obtained on each class):

$$Acc_c = \frac{C_{ii}}{G_i},$$

and the overall pixel accuracy:

$$Acc_{OP} = \frac{\sum_{i=1}^{L} C_{ii}}{\sum_{i=1}^{L} G_i},$$

where $C_{ij}$ is the number of pixels having a ground truth label $i$ and being classified/predicted as $j$, $G_i$ is the total number of pixels labelled with $i$, and $L$ is the number of classes. All these metrics can take values from 0 to 1.

Finally, we also use the Kappa statistic (Cohen's measure of agreement), indicating how the classification results compare to values assigned by chance [103]. The Kappa statistic can take values from 0 to 1. Starting from a $k \times k$ confusion matrix with elements $f_{ij}$, the following calculations are done:

$$P_o = \frac{1}{N}\sum_{j=1}^{k} f_{jj}, \tag{1}$$

$$r_i = \sum_{j=1}^{k} f_{ij} \;\;\forall i \quad \text{and} \quad c_j = \sum_{i=1}^{k} f_{ij} \;\;\forall j, \tag{2}$$

$$P_e = \frac{1}{N^2}\sum_{i=1}^{k} r_i c_i, \tag{3}$$

where $P_o$ is the observed proportional agreement (effectively the overall accuracy), $r_i$ and $c_j$ are the row and column totals for classes $i$ and $j$, $N$ is the total number of pixels, and $P_e$ is the expected proportion of agreement. The final measure of agreement is given by the statistic [103]:

$$\kappa = \frac{P_o - P_e}{1 - P_e}. \tag{4}$$

Depending on the value of Kappa, the observed agreement is considered either poor (0.0 to 0.2), fair (0.2 to 0.4), moderate (0.4 to 0.6), good (0.6 to 0.8), or very good (0.8 to 1.0).
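All of the above metrics can be computed directly from a confusion matrix laid out as in Table 5 (reference classes in rows, predictions in columns); the helper below is our own illustration.

    import numpy as np

    def evaluation_metrics(cm):
        # cm[i, j]: number of pixels with reference class i predicted as j.
        n = cm.sum()
        diag = np.diag(cm).astype(float)
        precision = diag / cm.sum(axis=0)  # P_c, user's accuracy
        recall = diag / cm.sum(axis=1)     # R_c, producer's / per-class accuracy
        overall = diag.sum() / n           # Acc_OP, identical to P_o
        p_e = (cm.sum(axis=1) * cm.sum(axis=0)).sum() / n ** 2
        kappa = (overall - p_e) / (1.0 - p_e)
        return precision, recall, overall, kappa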

4. Results and Discussion

Using the experimental setup described in the previous section, we evaluated the seven selected semantic segmentation models: SegNet [74], PSPNet [75], BiSeNet [76], DeepLabV3+ [77, 78], U-Net [79, 80], FRRN-B [81], and FC-DenseNet [82]. The overall classification performance statistics for all studied models are gathered in Table 4. Figure 11 shows maps produced for several imagelets with the best performing model, FC-DenseNet. The obtained results are compared to prior work, and the classification performance for different land cover classes is discussed further below.

4.1. Classification Performance

All the models performed relatively well in terms of classification, achieving an overall accuracy above 83%. Three models performed particularly well, achieving an accuracy score above 89%: SegNet, FRRN-B, and the best performing model, FC-DenseNet, which achieved an accuracy of 90.7%.

Before further analysis, let us recall that CORINE is not exclusively a land cover map, but rather a land cover and land use map; thus, specific classes can differ from the ecological classes observed by Sentinel-1. Also, the aggregation to Level-1 is sometimes not strictly "ecological", nor compliant with surface-scattering physics considerations. For example, roads, airports, major industrial areas and the road network often exhibit areas similar to fields, and the presence of trees and green vegetation near summer cottages can cause them to exhibit signatures close to forest rather than urban.


Table 4: Summary of the classification performance and efficiency of various deep learning models (UA – user's accuracy, PA – producer's accuracy; per-class entries are UA/PA in %; average inference time is per 512 px × 512 px image in the dataset; '–' marks a value that could not be recovered).

LC class (test scale, km²)                  BiSeNet  DeepLabV3+  SegNet  FRRN-B  U-Net  PSPNet  FC-DenseNet
Urban fabric (100), 10816.26                21/15    14/36       31/38   30/45   25/38  18/–    62/27
Agricultural areas (200), 25160.49          51/50    49/69       66/68   68/66   66/53  48/–    72/71
Forested areas (300), 285462                90/91    88/96       93/94   92/95   92/95  89/95   93/96
Peatland, bogs and marshes (400), 20990.54  43/56    13/67       57/71   55/70   52/65  31/–    74/58
Water bodies (500), 53564                   85/91    94/92       96/96   95/96   96/96  94/94   96/96
Overall accuracy (%)                        83.86    85.49       89.03   89.27   89.25  86.51   90.66
Kappa                                       0.641    0.649       0.754   0.758   0.754  0.680   0.785
Average inference time (s)                  0.0389   0.0267      0.0761  0.1424  0.0848 0.0495  0.1930

Table 5: Confusion matrix for classification with the FC-DenseNet model (FC-DenseNet103). Rows: CLC2012 reference classes; columns: Sentinel-1 classification. The bottom-right cell gives the overall accuracy (%).

                urban       water        forest       field       peatland    total        PA (%)
urban (1)       7301999     413073       15892771     3212839     221476      27042158     27.0
water (2)       78331       128294872    3457634      171029      1935276     133937142    95.8
forest (3)      3663698     2703632      686788977    12795703    7730444     713682454    96.2
field (4)       766200      121609       16527970     44866048    620934      62902761     71.3
peatland (5)    56097       1866020      19164137     1091008     30309189    52486451     57.8
total           11866325    133399206    741831489    62136627    40817319    990050966
UA (%)          61.5        96.2         92.6         72.2        74.3                     90.7

Figure 11: Illustration of the FC-DenseNet model performance: a selection of classification results, i.e., direct output of the network without any post-processing (bottom row), versus reference CORINE data (upper row).

Sometimes, forest on rocky terrain can be misclassified as urban instead, due to the presence of very bright targets and strong disruptive features, while confusion between peatland and field areas is also often commonplace. Finally, the accuracy of the CORINE data itself is only somewhat higher than 90%.

As for the results across the different land cover classes, all the models performed particularly well in recognising the water bodies and forested areas, while the urban fabric represented the most challenging class for all the models. We expect that the inclusion of the DEM as one layer in the training images helped to achieve good results on the water bodies class for most of the models (except for BiSeNet, all the models achieved both the user's and producer's accuracy above 90%). The urban class was particularly challenging for the following main reasons. First, this class changes the most, as new houses, roads and urban areas are built. While we took the most suitable available CORINE class in terms of time for our Sentinel-1 images, there are almost certain differences between the urban class as it was in 2012 and in 2015-2016.


Second, the CORINE map itself does not have perfect accuracy, and neither are the aggregation rules perfect. As a matter of fact, in the majority of studies where SAR-based classification was compared against CLC or similar data, a poor or modest overall agreement was observed for this class [21, 41, 83, 20], while the user's accuracy was strongly higher than the producer's [104]. The latter is exactly due to radar being able to sense sharp boundaries and bright targets very well, whereas such bright targets often do not dominate the whole CORINE Level-1 urban class. We argue that any inaccuracies present will be particularly pronounced in our models for the urban class, because of the sharp and sudden boundary changes in this class, unlike for the others, such as forest and water. The top performing model, i.e., FC-DenseNet, performed the best across all the classes. It is particularly notable that it achieved a user's accuracy, i.e., precision, for the urban class of 62%, improving on it significantly compared to all the other models. Nevertheless, its score on the producer's accuracy, i.e., recall, on this class (27%) is outperformed by two other top models, i.e., SegNet and FRRN-B.

We mentioned the issues of SAR backscattering sensitivity to several ground factors, such that the same classes might appear differently in the images between countries or between distant areas within a country. An interesting indication of our study, however, is that the deep learning models might be able to deal with this issue. Namely, we used the models pre-trained on ImageNet and fine-tuned them with a relatively small number (14) of Sentinel-1 scenes. The models learned to recognize varying types of the backscattering signal across the country of Finland. This indicates that, with a similar type of fine-tuning, the present models could be relatively easily adapted to other areas and countries with different SAR backscattering patterns. Such robustness and adaptability of the deep learning models comes from their automatic learning of feature representations, without the need for a human to pre-define those features.

4.2. Computational Performance

The training times with our hardware configuration ranged from 6 days up to 2 weeks for the different models. This could be significantly improved by training each model using a multi-GPU system instead of the single GPU we used.

In terms of the inference time, we also saw differences in performance. In Table 4, we present the average inference time per 512 px × 512 px imagelet. The results show that there is a trade-off between classification and computational performance: the best models in terms of classification results (i.e., FC-DenseNet and FRRN-B) take a several times longer inference time compared to the rest. Depending on the application, this might not be of particular importance.

4.3. Comparison to Similar Work

The obtained results compare favourably to previous similar studies on land cover classification with SAR data [83, 20, 21, 41, 28, 31]. Depending on the level of class aggregation (4-5 major classes or more), and using mostly statistical or classical machine learning approaches, reported classification accuracies ranged from as high as 80-87% to as low as 30% when only SAR imagery was used.

Two recent studies that employed neural networks for SAR imagery classification (albeit in combination with satellite optical data) for land cover mapping were [28] and [66], with reported classification accuracies of up to 97.5% and 94.6%, respectively.

The best model in our experiments achieved an overall accuracy of 90.7%. However, our results were obtained using solely the SAR imagery. In contrast, SAR imagery (PALSAR) alone yielded an overall accuracy of 78.1% in [28]. The types of classes they studied are also different compared to ours (crops versus vegetation versus land cover types), and our study was performed on a larger area. Importantly, the previous studies have applied different types of models (regular NNs versus CNNs versus semantic segmentation). In particular, the CNN models work on 7×7 resolution windows, while we have applied more advanced semantic segmentation models, which work on the level of a pixel. Keeping in mind the findings from [28], that the addition of optical images on top of SAR improved the results by over 10%, we expect that our models would perform comparably well or outperform these previous works if applied to combined SAR and optical imagery.

In terms of the deep learning setup, the studies most similar to ours are [53] and [70]. However, RapidEye optical imagery at 5 m spatial resolution was used in [53], and the test site was considerably smaller. Study [70], similarly to our research, relied exclusively on SAR imagery, however on fully polarimetric images acquired by RADARSAT-2 at a considerably better resolution. They developed an FCN-type semantic segmentation model 'specifically designed for the classification of wetland complexes using PolSAR imagery'. Using this model to classify eight wetland map classes, they achieved an overall accuracy of 93%. However, because their model is designed specifically for wetland complexes, it is not clear whether such a model would generalize to other types of areas. Compared to our study, they focused on a considerably smaller area (nearly the size of a single imagelet we used) and on a very specific task (wetland type mapping). Thus, it is not readily clear how general their approach is and how it compares to our presented approach.

4.4. Outlook and Future Work

There are several lines of potential improvement based on the results of this study, as well as future work directions.

First, using an even larger set of Sentinel-1 images can be recommended, since for supervised deep learning models large amounts of data are crucial. Here, we processed only 6888 imagelets altogether, but deep learning algorithms typically become efficient only once they are trained with hundreds of thousands or millions of images.

Second, if SAR images and reference data of a higher resolution are used, we expect better classification performance too, as smaller details could potentially be captured. Also, better agreement in the acquisition timing of the reference and SAR imagery can be recommended: the reference and training data should come from the same months or year if possible, and the reference maps should represent the reality as accurately as possible. The models in our experiments were certainly limited by CORINE's own limited accuracy.

Third, in this study we have tested the effectiveness of off-the-shelf deep learning models for land cover mapping from SAR data. While the results show their effectiveness, it is also likely that novel types of models developed specifically for radar data (such as [70]) will yield even better results. Based on our results, we suggest DenseNet-based models as a starting point. In particular, one could develop deep learning models to directly handle the SLC data, which preserve the phase information.

Focusing on a single season is both an advantage and a limitation. Importantly, we have avoided confusion between SAR signatures that vary seasonally for several land cover classes. However, the multitemporal dynamics itself can potentially be used as an additional, useful class-discriminating parameter. Incorporating the seasonal dynamics of each land cover pixel (as a time series) is left for future work, perhaps with the additional need to incorporate recurrent neural networks into the approach.

As discussed in Section 3.3, it could be suitable to use more detailed (specific) land cover classes, as the aggregation of smaller LC classes into Level-1 CORINE classes is not exactly ecological, leading to the mixing of several distinct SAR signatures in one class and thus causing additional confusion for the classifier. The classified specific classes can later be aggregated into larger classes, potentially showing improved performance [19].

Finally, we have used only SAR images and a freely-available DEM model for the presented large-scale land cover mapping. If one were to combine other types of remote sensing images, in particular optical images, we expect that the results would improve significantly. This holds for those areas where such imagery can be collected, given the cloud coverage, while in an operational scenario it would potentially require the use of at least two models (with and without optical satellite imagery). It is also important to assess the added value of SAR imagery with deep learning models when optical satellite images are available, as well as possible data fusion and decision fusion scenarios, before a decision on the mapping approach is made [19].

5. Conclusion

Our study demonstrated the potential for applying state-of-the-art semantic segmentation models to SAR image classification with high accuracy. Several models were benchmarked in a countrywide classification experiment using Sentinel-1 IW-mode SAR data, reaching nearly 91% overall classification accuracy with the best performing model (FC-DenseNet). Given that the 14 used Sentinel-1 scenes resulted in 7K training images, this indicates a strong potential for using pre-trained CNNs for further fine-tuning, and seems particularly suitable when the number of training images is limited (to thousands or tens of thousands instead of millions). In addition to suggesting the best candidate


semantic segmentation models for land cover mapping with SAR data (that is, the DenseNet-based models), our study offers baseline results against which newly proposed models should be evaluated. Several possible improvements for future work were identified, including the necessity of testing multitemporal approaches, data fusion, and very high-resolution SAR imagery, as well as developing models specifically for SAR; these will be addressed in future work.

References

[1] S. Bojinski, M. Verstraete, T. C. Peterson, C. Richter, A. Simmons, M. Zemp, The concept of essential climate variables in support of climate research, applications, and policy, Bulletin of the American Meteorological Society 95 (9) (2014) 1431–1443.

[2] G. Buttner, J. Feranec, G. Jaffrain, L. Mari, G. Maucha, T. Soukup, The CORINE land cover 2000 project, EARSeL eProceedings 3 (3) (2004) 331–346.

[3] M. Bossard, J. Feranec, J. Otahel, et al., CORINE land cover technical guide – Addendum 2000 (2000).

[4] G. Buttner, CORINE land cover and land cover change products, in: Land Use and Land Cover Mapping in Europe, Springer, 2014, pp. 55–74.

[5] M. Törmä, T. Markkanen, S. Hatunen, P. Härmä, O.-P. Mattila, A. Arslan, Assessment of land-cover data for land-surface modelling in regional climate studies, Boreal Environment Research 20 (2) (2015) 243–260.

[6] J. Chen, J. Chen, A. Liao, X. Cao, L. Chen, X. Chen, C. He, G. Han, S. Peng, M. Lu, et al., Global land cover mapping at 30 m resolution: A POK-based operational approach, ISPRS Journal of Photogrammetry and Remote Sensing 103 (2015) 7–27.

[7] C. A. d. Almeida, A. C. Coutinho, J. C. D. M. Esquerdo, M. Adami, A. Venturieri, C. G. Diniz, N. Dessay, L. Durieux, A. R. Gomes, High spatial resolution land use and land cover mapping of the Brazilian Legal Amazon in 2008 using Landsat-5/TM and MODIS data, Acta Amazonica 46 (3) (2016) 291–302.

[8] C. Homer, J. Dewitz, L. Yang, S. Jin, P. Danielson, G. Xian, J. Coulston, N. Herold, J. Wickham, K. Megown, Completion of the 2011 National Land Cover Database for the conterminous United States – representing a decade of land cover change information, Photogrammetric Engineering & Remote Sensing 81 (5) (2015) 345–354.

[9] Y. Zhao, D. Feng, L. Yu, X. Wang, Y. Chen, Y. Bai, H. J. Hernandez, M. Galleguillos, C. Estades, G. S. Biging, et al., Detailed dynamic land cover mapping of Chile: Accuracy improvement by integrating multi-temporal data, Remote Sensing of Environment 183 (2016) 170–185.

[10] P. Griffiths, C. Nendel, P. Hostert, Intra-annual reflectance composites from Sentinel-2 and Landsat for national-scale crop and land cover mapping, Remote Sensing of Environment 220 (2019) 135–151.

[11] R. Torres, P. Snoeij, D. Geudtner, D. Bibby, M. Davidson, E. Attema, P. Potin, B. Rommen, N. Floury, M. Brown, I. Traver, P. Deghaye, B. Duesmann, B. Rosich, N. Miranda, C. Bruno, M. L'Abbate, R. Croci, A. Pietropaolo, M. Huchler, F. Rostan, GMES Sentinel-1 mission, Remote Sensing of Environment 120 (2012) 9–24. doi:10.1016/j.rse.2011.05.028.

[12] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE 86 (11) (1998) 2278–2324.

[13] A. Krizhevsky, I. Sutskever, G. E. Hinton, ImageNet classification with deep convolutional neural networks, in: Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.

[14] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, CoRR abs/1409.1556 (2014).

[15] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning (2016).

[16] W. Cohen, S. Goward, Landsat's role in ecological applications of remote sensing, BioScience 54 (6) (2004) 535–545. doi:10.1641/0006-3568(2004)054[0535:LRIEAO]2.0.CO;2.

[17] S. Goetz, A. Baccini, N. Laporte, T. Johns, W. Walker, J. Kellndorfer, R. Houghton, M. Sun, Mapping and monitoring carbon stocks with satellite observations: A comparison of methods, Carbon Balance and Management 4 (2009). doi:10.1186/1750-0680-4-2.

[18] C. Atzberger, Advances in remote sensing of agriculture: Context description, existing operational monitoring systems and major information needs, Remote Sensing 5 (2) (2013) 949–981. doi:10.3390/rs5020949.

[19] T. Häme, J. Kilpi, H. A. Ahola, Y. Rauste, O. Antropov, M. Rautiainen, L. Sirro, S. Bounpone, Improved mapping of tropical forests with optical and SAR imagery, part I: Forest cover and accuracy assessment using multi-resolution data, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 6 (1) (2013) 74–91.

[20] O. Antropov, Y. Rauste, H. Astola, J. Praks, T. Häme, M. Hallikainen, Land cover and soil type mapping from spaceborne PolSAR data at L-band with probabilistic neural network, IEEE Transactions on Geoscience and Remote Sensing 52 (9) (2014) 5256–5270. doi:10.1109/TGRS.2013.2287712.

[21] A. Lönnqvist, Y. Rauste, M. Molinier, T. Häme, Polarimetric SAR data in land cover mapping in boreal zone, IEEE Transactions on Geoscience and Remote Sensing 48 (10) (2010) 3652–3662. doi:10.1109/TGRS.2010.2048115.

[22] B. Waske, M. Braun, Classifier ensembles for land cover mapping using multitemporal SAR imagery, ISPRS Journal of Photogrammetry and Remote Sensing 64 (5) (2009) 450–457. doi:10.1016/j.isprsjprs.2009.01.003.

[23] L. Bruzzone, M. Marconcini, U. Wegmüller, A. Wiesmann, An advanced system for the automatic classification of multitemporal SAR images, IEEE Transactions on Geoscience and Remote Sensing 42 (6) (2004) 1321–1334. doi:10.1109/TGRS.2004.826821.

[24] T. Ullmann, A. Schmitt, A. Roth, J. Duffe, S. Dech, H.-W. Hubberten, R. Baumhauer, Land cover characterization and classification of arctic tundra environments by means of polarized synthetic aperture X- and C-band radar (PolSAR) and Landsat 8 multispectral imagery – Richards Island, Canada, Remote Sensing 6 (9) (2014) 8565–8593. doi:10.3390/rs6098565. URL http://www.mdpi.com/2072-4292/6/9/8565

[25] N. Clerici, C. A. V. Calderón, J. M. Posada, Fusion of Sentinel-1A and Sentinel-2A data for land cover mapping: a case study in the lower Magdalena region, Colombia, Journal of Maps 13 (2) (2017) 718–726. doi:10.1080/17445647.2017.1372316.

[26] C. Castaneda, D. Ducrot, Land cover mapping of wetland areas in an agricultural landscape using SAR and Landsat imagery, Journal of Environmental Management 90 (7) (2009) 2270–2277.

[27] Y. Ban, H. Hu, I. M. Rangel, Fusion of QuickBird MS and RADARSAT SAR data for urban land-cover mapping: Object-based and knowledge-based approach, International Journal of Remote Sensing 31 (6) (2010) 1391–1410.

[28] G. V. Laurin, V. Liesenberg, Q. Chen, L. Guerriero, F. Del Frate, A. Bartolini, D. Coomes, B. Wilebore, J. Lindsell, R. Valentini, Optical and SAR sensor synergies for forest and land cover mapping in a tropical site in West Africa, International Journal of Applied Earth Observation and Geoinformation 21 (2013) 7–16.

[29] R. Khatami, G. Mountrakis, S. V. Stehman, A meta-analysis of remote sensing research on supervised pixel-based land-cover image classification processes: General guidelines for practitioners and future research, Remote Sensing of Environment 177 (2016) 89–100.

[30] B. Waske, M. Braun, Classifier ensembles for land cover mapping using multitemporal SAR imagery, ISPRS Journal of Photogrammetry and Remote Sensing 64 (5) (2009) 450–457.

[31] H. Balzter, B. Cole, C. Thiel, C. Schmullius, Mapping CORINE land cover from Sentinel-1A SAR and SRTM digital elevation model data using random forests, Remote Sensing 7 (11) (2015) 14876–14898. doi:10.3390/rs71114876.

[32] S.-E. Park, Variations of microwave scattering properties by seasonal freeze/thaw transition in the permafrost active layer observed by ALOS PALSAR polarimetric data, Remote Sensing 7 (12) (2015) 17135–17148. doi:10.3390/rs71215874.

[33] M. C. Dobson, L. E. Pierce, F. T. Ulaby, Knowledge-based land-cover classification using ERS-1/JERS-1 SAR composites, IEEE Transactions on Geoscience and Remote Sensing 34 (1) (1996) 83–99. doi:10.1109/36.481896.

[34] L. Sirro, T. Häme, Y. Rauste, J. Kilpi, J. Hämäläinen, K. Gunia, B. de Jong, F. Paz Pellat, Potential of different optical and SAR data in forest and land cover classification to support REDD+ MRV, Remote Sensing 10 (6) (2018). doi:10.3390/rs10060942.

[35] J. D. T. De Alban, G. M. Connette, P. Oswald, E. L. Webb, Combined Landsat and L-band SAR data improves land cover classification and change detection in dynamic tropical landscapes, Remote Sensing 10 (2) (2018). doi:10.3390/rs10020306.

[36] N. Longepe, P. Rakwatin, O. Isoguchi, M. Shimada, Y. Uryu, K. Yulianto, Assessment of ALOS PALSAR 50 m orthorectified FBD data for regional land cover classification by support vector machines, IEEE Transactions on Geoscience and Remote Sensing 49 (6) (2011) 2135–2150. doi:10.1109/TGRS.2010.2102041.

[37] T. Esch, A. Schenk, T. Ullmann, M. Thiel, A. Roth, S. Dech, Characterization of land cover types in TerraSAR-X images by combined analysis of speckle statistics and intensity information, IEEE Transactions on Geoscience and Remote Sensing 49 (6) (2011) 1911–1925. doi:10.1109/TGRS.2010.2091644.

[38] J. W. Cable, J. M. Kovacs, J. Shang, X. Jiao, Multi-temporal polarimetric RADARSAT-2 for land cover monitoring in northeastern Ontario, Canada, Remote Sensing 6 (3) (2014) 2372–2392. doi:10.3390/rs6032372. URL http://www.mdpi.com/2072-4292/6/3/2372

[39] X. Niu, Y. Ban, Multi-temporal RADARSAT-2 polarimetric SAR data for urban land-cover classification using an object-based support vector machine and a rule-based approach, International Journal of Remote Sensing 34 (1) (2013) 1–26. doi:10.1080/01431161.2012.700133.

[40] T. L. Evans, M. Costa, K. Telmer, T. S. F. Silva, Using ALOS/PALSAR and RADARSAT-2 to map land cover and seasonal inundation in the Brazilian Pantanal, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 3 (4) (2010) 560–575. doi:10.1109/JSTARS.2010.2089042.

[41] P. Lumsdon, S. R. Cloude, G. Wright, Polarimetric classification of land cover for Glen Affric radar project, IEE Proceedings - Radar, Sonar and Navigation 152 (6) (2005) 404–412. doi:10.1049/ip-rsn:20041313.

[42] C. da Costa Freitas, L. de Souza Soler, S. J. S. Sant'Anna, L. V. Dutra, J. R. Dos Santos, J. C. Mura, A. H. Correia, Land use and land cover mapping in the Brazilian Amazon using polarimetric airborne P-band SAR data, IEEE Transactions on Geoscience and Remote Sensing 46 (10) (2008) 2956–2970.

[43] G. Li, D. Lu, E. Moran, L. Dutra, M. Batistella, A comparative analysis of ALOS PALSAR L-band and RADARSAT-2 C-band data for land-cover classification in a tropical moist region, ISPRS Journal of Photogrammetry and Remote Sensing 70 (2012) 26–38. doi:10.1016/j.isprsjprs.2012.03.010.

[44] N. Park, K. Chi, Integration of multitemporal/polarization C-band SAR data sets for land-cover classification, International Journal of Remote Sensing 29 (16) (2008) 4667–4688. doi:10.1080/01431160801947341.

[45] E. Tomppo, O. Antropov, J. Praks, Cropland classification using Sentinel-1 time series: Methodological performance and prediction uncertainty assessment, Remote Sensing 11 (21) (2019). doi:10.3390/rs11212480.

[46] D. B. Nguyen, A. Gruber, W. Wagner, Mapping rice extent and cropping scheme in the Mekong Delta using Sentinel-1A data, Remote Sensing Letters 7 (12) (2016) 1209–1218.

[47] A. Veloso, S. Mermoz, A. Bouvet, T. Le Toan, M. Planells, J.-F. Dejoux, E. Ceschia, Understanding the temporal behavior of crops using Sentinel-1 and Sentinel-2-like data for agricultural applications, Remote Sensing of Environment 199 (2017) 415–426.

[48] G. Satalino, A. Balenzano, F. Mattia, M. W. Davidson, C-band SAR data for mapping crops dominated by surface or volume scattering, IEEE Geoscience and Remote Sensing Letters 11 (2) (2013) 384–388.

[49] F. Vicente-Guijalba, A. Jacob, J. M. Lopez-Sanchez, C. Lopez-Martinez, J. Duro, C. Notarnicola, D. Ziolkowski, A. Mestre-Quereda, E. Pottier, J. J. Mallorquí, M. Lavalle, M. Engdahl, SinCohMap: Land-cover and vegetation mapping using multi-temporal Sentinel-1 interferometric coherence, in: IGARSS 2018 - 2018 IEEE International Geoscience and Remote Sensing Symposium, 2018, pp. 6631–6634. doi:10.1109/IGARSS.2018.8517926.

[50] S. Ge, O. Antropov, W. Su, H. Gu, J. Praks, Deep recurrent neural networks for land-cover classification using Sentinel-1 InSAR time series, in: IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium, 2019, pp. 473–476. doi:10.1109/IGARSS.2019.8900088.

[51] Y. LeCun, Y. Bengio, et al., Convolutional networks for images, speech, and time series, The Handbook of Brain Theory and Neural Networks 3361 (10) (1995) 1995.

[52] X. X. Zhu, D. Tuia, L. Mou, G.-S. Xia, L. Zhang, F. Xu, F. Fraundorfer, Deep learning in remote sensing: a review, arXiv preprint arXiv:1710.03959 (2017).

[53] M. Mahdianpari, B. Salehi, M. Rezaee, F. Mohammadimanesh, Y. Zhang, Very deep convolutional neural networks for complex land cover mapping using multispectral remote sensing imagery, Remote Sensing 10 (7) (2018) 1119.

[54] L. Zhang, L. Zhang, B. Du, Deep learning for remote sensing data: A technical tutorial on the state of the art, IEEE Geoscience and Remote Sensing Magazine 4 (2) (2016) 22–40.

[55] J. Zhang, P. Zhong, Y. Chen, S. Li, l1/2-regularized deconvolution network for the representation and restoration of optical remote sensing images, IEEE Transactions on Geoscience and Remote Sensing 52 (5) (2014) 2617–2627.

[56] X. Chen, S. Xiang, C.-L. Liu, C.-H. Pan, Aircraft detection by deep belief nets, in: Pattern Recognition (ACPR), 2013 2nd IAPR Asian Conference on, IEEE, 2013, pp. 54–58.

[57] X. Chen, S. Xiang, C.-L. Liu, C.-H. Pan, Vehicle detection in satellite images by hybrid deep convolutional neural networks, IEEE Geoscience and Remote Sensing Letters 11 (10) (2014) 1797–1801.

[58] Y. Liu, G. Cao, Q. Sun, M. Siegel, Hyperspectral classification via deep networks and superpixel segmentation, International Journal of Remote Sensing 36 (13) (2015) 3459–3482. doi:10.1080/01431161.2015.1055607.

[59] J. Wang, Q. Qin, Z. Li, X. Ye, J. Wang, X. Yang, X. Qin, Deep hierarchical representation and segmentation of high resolution remote sensing images, in: Geoscience and Remote Sensing Symposium (IGARSS), 2015 IEEE International, IEEE, 2015, pp. 4320–4323.

[60] D. Tuia, R. Flamary, N. Courty, Multiclass feature learning for hyperspectral image classification: Sparse and hierarchical solutions, ISPRS Journal of Photogrammetry and Remote Sensing 105 (2015) 272–285.

[61] F. Hu, G.-S. Xia, J. Hu, L. Zhang, Transferring deep convolutional neural networks for the scene classification of high-resolution remote sensing imagery, Remote Sensing 7 (11) (2015) 14680–14707.

[62] O. A. B. Penatti, K. Nogueira, J. A. dos Santos, Do deep features generalize from everyday objects to remote sensing and aerial scenes domains?, in: 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2015, pp. 44–51. doi:10.1109/CVPRW.2015.7301382.

[63] F. P. Luus, B. P. Salmon, F. Van den Bergh, B. T. J. Maharaj, Multiview deep learning for land-use classification, IEEE Geoscience and Remote Sensing Letters 12 (12) (2015) 2448–2452.

[64] F. Zhang, B. Du, L. Zhang, Scene classification via a gradient boosting random convolutional network framework, IEEE Transactions on Geoscience and Remote Sensing 54 (3) (2016) 1793–1802.

[65] T. Ishii, R. Nakamura, H. Nakada, Y. Mochizuki, H. Ishikawa, Surface object recognition with CNN and SVM in Landsat 8 images, in: 2015 14th IAPR International Conference on Machine Vision Applications (MVA), 2015, pp. 341–344. doi:10.1109/MVA.2015.7153200.

[66] N. Kussul, M. Lavreniuk, S. Skakun, A. Shelestov, Deep learning classification of land cover and crop types using remote sensing data, IEEE Geoscience and Remote Sensing Letters 14 (5) (2017) 778–782.

[67] Y. Chen, Z. Lin, X. Zhao, G. Wang, Y. Gu, Deep learning-based classification of hyperspectral data, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 7 (6) (2014) 2094–2107.

[68] G. Wu, X. Shao, Z. Guo, Q. Chen, W. Yuan, X. Shi, Y. Xu, R. Shibasaki, Automatic building segmentation of aerial imagery using multi-constraint fully convolutional networks, Remote Sensing 10 (3) (2018) 407.

[69] Y. Duan, F. Liu, L. Jiao, P. Zhao, L. Zhang, SAR image segmentation based on convolutional-wavelet neural network and Markov random field, Pattern Recognition 64 (2017) 255–267.

[70] F. Mohammadimanesh, B. Salehi, M. Mahdianpari, E. Gill, M. Molinier, A new fully convolutional neural network for semantic segmentation of polarimetric SAR imagery in complex land cover ecosystem, ISPRS Journal of Photogrammetry and Remote Sensing 151 (2019) 223–236.

[71] L. Wang, X. Xu, H. Dong, R. Gui, F. Pu, Multi-pixel simultaneous classification of PolSAR image using convolutional neural networks, Sensors 18 (3) (2018) 769.

[72] M. Ahishali, S. Kiranyaz, T. Ince, M. Gabbouj, Dual and single polarized SAR image classification using compact convolutional neural networks, Remote Sensing 11 (11) (2019) 1340.

[73] Z. Li, Z. Yang, H. Xiong, Homogeneous region segmentation for SAR images based on two steps segmentation algorithm, in: Computers, Communications and Systems (ICCCS), International Conference on, IEEE, 2015, pp. 196–200.

[74] V. Badrinarayanan, A. Kendall, R. Cipolla, SegNet: A deep convolutional encoder-decoder architecture for image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence 39 (12) (2017) 2481–2495.

[75] H. Zhao, J. Shi, X. Qi, X. Wang, J. Jia, Pyramid scene parsing network, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2881–2890.

[76] C. Yu, J. Wang, C. Peng, C. Gao, G. Yu, N. Sang, BiSeNet: Bilateral segmentation network for real-time semantic segmentation, in: Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 325–341.

[77] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, H. Adam, Encoder-decoder with atrous separable convolution for semantic image segmentation, arXiv preprint arXiv:1802.02611 (2018).

[78] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, A. L. Yuille, DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs, IEEE Transactions on Pattern Analysis and Machine Intelligence 40 (4) (2018) 834–848.

[79] O. Ronneberger, P. Fischer, T. Brox, U-Net: Convolutional networks for biomedical image segmentation, in: Medical Image Computing and Computer-Assisted Intervention (MICCAI), Vol. 9351 of LNCS, Springer, 2015, pp. 234–241 (available on arXiv:1505.04597 [cs.CV]). URL http://lmb.informatik.uni-freiburg.de/Publications/2015/RFB15a

[80] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, H. Adam, MobileNets: Efficient convolutional neural networks for mobile vision applications, arXiv preprint arXiv:1704.04861 (2017).

[81] T. Pohlen, A. Hermans, M. Mathias, B. Leibe, Full-resolution residual networks for semantic segmentation in street scenes, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4151–4160.

[82] S. Jegou, M. Drozdzal, D. Vazquez, A. Romero, Y. Bengio, The one hundred layers tiramisu: Fully convolutional DenseNets for semantic segmentation, in: Computer Vision and Pattern Recognition Workshops (CVPRW), 2017 IEEE Conference on, IEEE, 2017, pp. 1175–1183.

[83] O. Antropov, Y. Rauste, A. Lönnqvist, T. Häme, PolSAR mosaic normalization for improved land-cover mapping, IEEE Geoscience and Remote Sensing Letters 9 (6) (2012) 1074–1078.

[84] J. Long, E. Shelhamer, T. Darrell, Fully convolutional networks for semantic segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431–3440.

[85] D. H. Hubel, T. N. Wiesel, Receptive fields, binocular interaction and functional architecture in the cat's visual cortex, The Journal of Physiology 160 (1) (1962) 106–154.

[86] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al., ImageNet large scale visual recognition challenge, International Journal of Computer Vision 115 (3) (2015) 211–252.

[87] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.

[88] G. Huang, Z. Liu, L. Van Der Maaten, K. Q. Weinberger, Densely connected convolutional networks, in: CVPR, Vol. 1, 2017, p. 3.

[89] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, Going deeper with convolutions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1–9.

[90] S. Ji, W. Xu, M. Yang, K. Yu, 3D convolutional neural networks for human action recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (1) (2013) 221–231.

[91] T. N. Sainath, A.-r. Mohamed, B. Kingsbury, B. Ramabhadran, Deep convolutional neural networks for LVCSR, in: Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, IEEE, 2013, pp. 8614–8618.

[92] D. Small, L. Zuberbuhler, A. Schubert, E. Meier, Terrain-flattened gamma nought Radarsat-2 backscatter, Canadian Journal of Remote Sensing 37 (5) (2012) 493–499.

[93] P. Härmä, R. Teiniranta, M. Törmä, R. Repo, E. Järvenpää, M. Kallio, The production of Finnish CORINE land cover 2000 classification, XXth ISPRS Congress, Istanbul, Turkey (2004).

[94] P. Härmä, R. Teiniranta, M. Törmä, R. Repo, E. Järvenpää, M. Kallio, Finnish CORINE land cover 2000 classification, XXth ISPRS Congress, Anchorage, US (2004).

[95] A. Garcia-Garcia, S. Orts-Escolano, S. Oprea, V. Villena-Martinez, J. Garcia-Rodriguez, A review on deep learning techniques applied to semantic segmentation, arXiv preprint arXiv:1704.06857 (2017).

[96] F. Chollet, Xception: Deep learning with depthwise separable convolutions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1251–1258.

[97] O. Ronneberger, P. Fischer, T. Brox, U-Net: Convolutional networks for biomedical image segmentation, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2015, pp. 234–241.

[98] L.-C. Chen, G. Papandreou, F. Schroff, H. Adam, Rethinking atrous convolution for semantic image segmentation, arXiv preprint arXiv:1706.05587 (2017).

[99] Y. Bengio, Deep learning of representations for unsupervised and transfer learning, in: Proceedings of ICML Workshop on Unsupervised and Transfer Learning, 2012, pp. 17–36.

[100] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, et al., TensorFlow: A system for large-scale machine learning, in: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), 2016, pp. 265–283.

[101] H. Costa, G. M. Foody, D. S. Boyd, Supervised methods of image segmentation accuracy assessment in land cover mapping, Remote Sensing of Environment 205 (2018) 338–351.

[102] G. Csurka, D. Larlus, F. Perronnin, F. Meylan, What is a good evaluation measure for semantic segmentation?, in: BMVC, Vol. 27, Citeseer, 2013, p. 2013.

[103] J. Cohen, A coefficient of agreement for nominal scales, Educational and Psychological Measurement 20 (1) (1960) 37–46.

[104] O. Antropov, Y. Rauste, T. Häme, Volume scattering modeling in PolSAR decompositions: Study of ALOS PALSAR data over boreal forest, IEEE Transactions on Geoscience and Remote Sensing 49 (10) (2011) 3838–3848.

  • 1 Introduction
    • 11 Land Cover Mapping with SAR Imagery
    • 12 Deep Learning in Remote Sensing
    • 13 Study goals
      • 2 Deep Learning Terminology
      • 3 Materials and methods
        • 31 Study site
        • 32 SAR data
        • 33 Reference data
        • 34 Semantic Segmentation Models
          • 341 BiSeNet (Bilateral Segmentation Network)
          • 342 SegNet (Encoder-Decoder-Skip)
          • 343 Mobile U-Net
          • 344 DeepLab-V3+
          • 345 FRRN-B (Full-Resolution Residual Networks)
          • 346 PSPNet (Pyramid Scene Parsing Network)
          • 347 FC-DenseNet (Fully Convolutional DenseNets)
            • 35 Training approach
            • 36 Experimental Setup
              • 361 SAR Data Preprocessing for Deep Learning
              • 362 TrainDevelopment and Test (Accuracy Assessment) Dataset
              • 363 Data Augmentation
              • 364 Implementation
              • 365 Hardware and Training Setup
                • 37 Evaluation Metrics
                  • 4 Results and Discussion
                    • 41 Classification Performance
                    • 42 Computational Performance
                    • 43 Comparison to Similar Work
                    • 44 Outlook and Future Work
                      • 5 Conclusion
Page 8: Wide-Area Land Cover Mapping with Sentinel-1 Imagery using … · 2020-02-27 · Wide-Area Land Cover Mapping with Sentinel-1 Imagery using Deep Learning Semantic Segmentation Models

Even though Finland is a relatively small country there is still considerableheterogeneity present in terms of land cover types and how they appear in theSAR images Namely SAR backscattering is sensitive to several factors thatlikely differ between countries or between distant areas within a country Ex-amples of such factors are moisture levels terrain variation and soil roughnesspredominant forest biome and tree species proportions types of shorter vege-tation and crops in agricultural areas and specific types of built environmentsWe did not contain our study to a particular area of Finland where the SARsignatures might be consistent but we obtained the images across a wide areaHence demonstrating the suitability of our methods in this setting hints at theirpotential generalizability Namely it means that similarly as we did here thesemantic segmentation models can be fine-tuned and adapted to work on datafrom other regions or countries with the different SAR signatures

On the other hand, we took into account that the same areas will appear somewhat different in the SAR images across seasons. Scattering characteristics of many land cover classes change considerably between the summer and winter months, and sometimes even within weeks during seasonal changes [83, 20]. These include snow cover and melting, freeze/thaw of soils, ice on rivers and lakes, the crop growing cycle, and leaf-on and leaf-off conditions in deciduous trees. Because of this, in the present study we focused only on the scenes acquired during the summer season. However, we did allow our training dataset to contain several images of the same area taken at different times during the summer season. This way, not only spatial but also temporal variation of SAR signatures is introduced.

Our contributions can be summarised as follows:

C1 We thoroughly benchmarked seven selected state-of-the-art semantic segmentation models, covering a diverse set of approaches, for land cover mapping using Sentinel-1 SAR imagery. We provide insights on the best models in terms of both accuracy and efficiency.

C2 Our results demonstrate the power of deep learning models, along with SAR imagery, for accurate wide-area land cover mapping in the cloud-obscured boreal zone and polar regions.

2 Deep Learning Terminology

As with other representation learning models, the power of deep learning models comes from their ability to learn rich features (representations) from the dataset automatically [15]. The automatically learned features are usually better suited for the classifier or other task at hand than hand-engineered features. Moreover, thanks to the large number of layers employed, it has been proven that deep learning networks can discover hierarchical representations, so that the higher-level representations are expressed in terms of the lower-level, simpler ones. For example, in the case of images, the low-level representations that can be discovered are edges; using them, mid-level ones, such as corners and shapes, can be expressed, and this helps to express the high-level representations, such as object elements and their identities [15].


Table 1: Terminology for the main tasks in computer vision and its use in the deep learning versus remote sensing communities.

Deep learning           | Remote sensing          | Task description
------------------------|-------------------------|---------------------------------------------------
Classification [13]     | Image Annotation,       | Assigning a whole image to a class based on what
                        | Scene Understanding,    | is (mainly) represented in it, for example, a
                        | Scene Classification    | ship, oil tank, sea, or land.
Object Detection /      | Automatic Target        | Detecting (and localizing) the presence of
Localization /          | Recognition             | particular objects in an image. These algorithms
Recognition [15]        |                         | can detect several objects in the given image,
                        |                         | for instance, ship detection in SAR images.
Semantic                | Image Classification,   | Assigning a class to each pixel in an image,
Segmentation [84]       | Clustering              | based on which image object or region it belongs
                        |                         | to. These algorithms not only detect and localize
                        |                         | objects in the image, but also output their exact
                        |                         | areas and boundaries.


The deep learning models in computer vision can be grouped according to their main task into three categories; in Table 1 we provide a description of those categories. However, the deep learning terminology for those tasks does not always correspond well to the terminology used in the remote sensing community. Relevant to our task, a number of remote sensing studies use the term classification in the context of land cover mapping, inherently meaning pixel- or region-based classification, which in the deep learning terminology corresponds to semantic segmentation. In Table 1 we list the corresponding terminology that we encountered being used for each task in both the deep learning and remote sensing communities. This is helpful to disambiguate when talking about different tasks, and to recognize when talking about the same tasks, in the two domains. In the present study, the focus is on land cover mapping. Hence, we tackle semantic segmentation in the deep learning terminology, i.e., image classification (pixel-wise classification) in the remote sensing terminology.

Convolutional Neural Networks (CNNs) [12, 13] are the deep learning model that has transformed the computer vision field. Initially, CNNs were defined to tackle the image classification (deep learning terminology) task. Their structure is inspired by the visual perception of mammals [85]. CNNs are named after one of the most important operations, which is particular to them compared to other neural networks, i.e., convolutions. Mathematically, a convolution is a combination of two other functions. A convolution is applied to the image by sliding a filter (kernel) of a given size k × k, which is usually small compared to the original image size. Filters are designed for different purposes; for example, a filter can serve as a vertical edge detector. Application of such a convolution operation on an image results in a feature map. Another common operation that is usually applied after a convolution is pooling. Pooling reduces the size of the feature map while providing robustness to the extracted features. Common CNNs end with a fully connected layer, which is used for final predictions, commonly for image classification. By employing a large number of convolutional layers (depth), CNNs are able to extract gradually more complex and abstract features. The first CNN model to demonstrate this impressive effectiveness in image classification (of hand-written digits) was LeNet [12]. Several years later, Krizhevsky et al. [13] developed AlexNet, the deep CNN that dramatically pushed the limits of classification accuracy on the famous ImageNet computer vision challenge [86]. Since then, a variety of CNN-based models have been proposed; some notable examples are the VGG network [14], ResNet [87], DenseNet [88], and Inception-V3 [89]. The effectiveness of CNNs has also been proven in various real-world applications [90, 91].
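To make the convolution operation concrete, the following minimal NumPy sketch (our illustration, not part of the study's pipeline) applies a 3 × 3 vertical edge-detector kernel to a toy single-band image:

    import numpy as np

    def conv2d(image, kernel):
        """Slide `kernel` over `image` ("valid" padding) and return the feature map."""
        kh, kw = kernel.shape
        oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
        out = np.empty((oh, ow))
        for i in range(oh):
            for j in range(ow):
                out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
        return out

    # A Sobel-like vertical edge detector: responds to horizontal intensity changes.
    kernel = np.array([[-1.0, 0.0, 1.0],
                       [-2.0, 0.0, 2.0],
                       [-1.0, 0.0, 1.0]])

    image = np.zeros((8, 8))
    image[:, 4:] = 1.0              # a vertical edge in the middle of the image
    feature_map = conv2d(image, kernel)
    print(feature_map.shape)        # (6, 6); columns crossing the edge respond strongly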

Once CNNs had proven their effectiveness to classify images, Long et al. [84] were the first to discover how a given CNN model can be augmented to make it suitable for the semantic segmentation task: they proposed the Fully Convolutional Neural Network (FCN) framework. This generic architecture can be used to adapt any CNN network used for classification into a segmentation model. Namely, the authors have shown that by replacing the last fully connected layer with appropriate convolutional layers, so that they upsample and restore the resolution of the input at the output layer, CNNs can be transformed to classify each individual pixel (instead of the whole image). The basic idea is as follows. The encoder is used to learn the feature maps, and is usually based on a pre-trained deep CNN for classification, such as ResNet, VGG, or Inception. The decoder part serves to upsample the discriminative features that the encoder has learned from the coarse-level feature map to the fine pixel level. Long et al. [84] have shown that this upsampling (backward) computation can be efficiently performed using backward convolutions (deconvolutions). Moreover, this means that specific CNN models, such as those mentioned above, can all be incorporated in the FCN framework for segmentation, giving rise to FCN-AlexNet [84], FCN-ResNet [87], FCN-VGG16 [84], FC-DenseNet [82], etc. We present a diagram of the generic FCN architecture in Figure 1.
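This conversion can be sketched in a few lines of tf.keras code (a hedged illustration under our own layer choices, not the exact architecture of Long et al. [84]): the fully connected head of a pre-trained classifier is dropped, a 1 × 1 convolution produces per-location class scores, and a transposed convolution upsamples them back to the input resolution.

    import tensorflow as tf

    NUM_CLASSES = 5   # e.g., the five Level-1 CORINE classes

    backbone = tf.keras.applications.ResNet50(
        include_top=False,           # discard the fully connected classifier head
        weights="imagenet",
        input_shape=(512, 512, 3))

    x = backbone.output                                # coarse 16 x 16 feature map
    x = tf.keras.layers.Conv2D(NUM_CLASSES, 1)(x)      # per-location class scores
    x = tf.keras.layers.Conv2DTranspose(               # learnable (backward) upsampling
        NUM_CLASSES, kernel_size=64, strides=32, padding="same")(x)
    outputs = tf.keras.layers.Softmax()(x)             # per-pixel class probabilities

    fcn = tf.keras.Model(backbone.input, outputs)      # 512 x 512 x NUM_CLASSES output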


Figure 1: The architecture of Fully Convolutional Neural Networks (FCNs) [84].

3 Materials and methods

Here we first describe the study site, the SAR data, and the reference data. This is followed by an in-depth description of the deep learning terminology and the models used in the study. We finish with the description of the experimental setup and the evaluation metrics.

3.1 Study site

Our study site covers the area of Finland at latitudes from 61° to 67.5°. The processed area is shown in Figure 2. The study area includes central and northern areas of Finland, covered primarily by boreal forestland with inclusions of water bodies (primarily lakes), urban settlements and agricultural areas, as well as marshland and open bogs. We have omitted Lapland due to potential snow cover during the months of data acquisition. The terrain height variation is moderate, mostly within the 100–300 meter range.

3.2 SAR data

Presently, Sentinel-1 is a C-band SAR dual-satellite system, with two satellites orbiting 180° apart [11], launched in 2014 and 2016, respectively. The operational acquisition modes are Stripmap (SM), Interferometric Wide-Swath (IW), Extra Wide Swath (EW), and Wave Mode (WV). The IW mode is the default mode over land, providing a 250 km wide swath composed of three sub-swaths, with single-look images at 5 m by 20 m spatial resolution. It uses the so-called TOPS (Terrain Observation with Progressive Scans) SAR mode.

The SAR data acquired by the Sentinel-1 satellites in IW mode are used in this study. Altogether, 14 Sentinel-1A images acquired during the summers of 2015 and 2016 were used, more concretely during June, July, and August of those two years. Their geographical coverage is schematically shown in Figure 2.


Figure 2: Study area in Finland, with reference CORINE land cover data and schematic location of areas used for model training and accuracy assessment.


Original scenes were downloaded as Level-1 Ground Range Detected (GRD) products. They represent focused SAR data that has been detected, multi-looked, and projected to ground-range using an Earth ellipsoid. They were orthorectified in ESA SNAP S1TBX software using a local digital terrain model (with 2 meter resolution) available from the National Land Survey of Finland. The pixel spacing of the orthorectified scenes was set to 20 meters. Orthorectification included terrain flattening to obtain backscatter in gamma-nought format [92]. The scenes were further re-projected to the ETRS89 / ETRS-TM35FIN projection (EPSG:3067) and resampled to a final pixel size of 20 metres.

3.3 Reference data

In Finland, the Finnish Environment Institute (SYKE) is responsible for production of the CORINE maps. While for most of the EU territory the CORINE mask of 100 m × 100 m spatial resolution is available, the national institutions may choose to create more precise maps, and SYKE in particular has produced a 20 m × 20 m spatial resolution mask for Finland (Figure 3). Since then, the updates have been produced regularly, the latest one at the time of this study, which we used, being CLC2012. There are 48 different land use classes in the map, which can be hierarchically grouped into 4 CLC levels. In detail, there are 30 classes on CLC Level-3, 15 classes on CLC Level-2, and 5 top CLC Level-1 classes. According to the information provided by SYKE, the accuracy of the CLC Level-3 is 61%, of the CLC Level-2 83%, and of the CLC Level-1 it is 93%. The selected classes and their corresponding color codes used for our segmentation results are shown in Table 2.

Until the most recent CORINE production round, EEA member countries adopted national approaches for the production of CORINE. The EEA Technical Guidelines include manual digitalization of land cover change based on visual interpretation of optical satellite imagery. In Finland, the European CLC was not applicable for the majority of national users due to the large minimal mapping unit (MMU); thus, a national version was produced with a somewhat modified nomenclature of classes [93, 94]. The national high-resolution CLC2012 data is in raster format of 20 m, with corresponding MMU. In the provision of the 2012 update of CLC, obtaining optical imagery over Scandinavia and Britain was particularly challenging because of the frequent clouds, thus calling for the use of radar imagery to meet user requirements on accuracy and coverage [31]. The CORINE map itself is built from high resolution satellite images acquired primarily during the summer and, to a smaller extent, during the spring months [2].

3.4 Semantic Segmentation Models

We selected the following seven state-of-the-art [95] semantic segmentation models to test for our land cover mapping task: SegNet [74], PSPNet [75], BiSeNet [76], DeepLabV3+ [77, 78], U-Net [79, 80], FRRN-B [81], and FC-DenseNet [82]. The models were selected to cover a diverse set of approaches to semantic segmentation. In the following, we describe the specific architecture of each of these DL models. We will use the following common abbreviations: conv for the convolution operation, concat for concatenation, max pool for the max pooling operation, BN for batch normalisation, and ReLU for the rectified linear unit activation function.


Table 2: CORINE CLC Level-1 classes and their color codes used in our classification results.

Class                              | R   | G   | B   | Color
-----------------------------------|-----|-----|-----|-----------
Water bodies (500)                 | 0   | 191 | 255 | blue
Peatland, bogs and marshes (400)   | 173 | 216 | 230 | light blue
Forested areas (300)               | 127 | 255 | 0   | green
Agricultural areas (200)           | 222 | 184 | 135 | brown
Urban fabric (100)                 | 128 | 0   | 0   | red

Figure 3: Zoomed-in area fragment with our reference data, i.e., CORINE, shown on top (left), along with the Google Earth layer (right).


3.4.1 BiSeNet (Bilateral Segmentation Network)

The BiSeNet model is designed to decouple the functions of encoding additional spatial information and enlarging the receptive field, which are fundamental to achieving good segmentation performance. As can be seen in Figure 4, there are two main components to this model: the Spatial Path (SP) and the Context Path (CP). The Spatial Path serves to encode rich spatial information. The Context Path serves to provide a sufficient receptive field, and uses global average pooling and pre-trained Xception [96] or ResNet [87] as the backbone. The goal of the creators was not only to obtain superior performance, but to achieve a balance between speed and performance. Hence, BiSeNet is a relatively fast semantic segmentation model.

3.4.2 SegNet (Encoder-Decoder-Skip)

Similarly to BiSeNet, SegNet is also designed with computational performance in mind, this time particularly during inference. Because of this, the network has a significantly smaller number of trainable parameters compared to most of the other architectures. The encoder in SegNet is based on VGG16: it consists of its first 13 convolutional layers, while the fully connected layers are omitted. Hence, the novelty of this network lies in its decoder part, as follows. The decoder consists of one decoder layer for each encoder layer, and so it also has 13 layers. Each individual decoder layer utilizes max-pooling indices memorized from its corresponding encoder feature map. The authors have showed that this enhances boundary delineation between classes. Finally, the decoder output is sent to a multi-class soft-max function, yielding a classification for each pixel (see Figure 5).


Figure 4: The architecture of BiSeNet. ARM stands for the Attention Refinement Module and FFM for the Feature Fusion Module introduced in the model's paper [76].

Figure 5: The architecture of the SegNet-based Encoder-Decoder with Skip connections [74]. Blue tiles represent Convolution + Batch Normalisation + ReLU, green tiles represent Pooling, red tiles represent Upsampling, and yellow a softmax operation.




3.4.3 Mobile U-Net

Mobile U-Net is based on the U-Net [97] semantic segmentation architecture shown in Figure 6. In designing U-Net, the Fully Convolutional approach was generally employed, with the following modification: the upsampling part of the architecture has no fully convolutional layer, but is nearly symmetrical to the feature extraction part, due to the use of similar feature maps. This results in a u-shaped architecture (see Figure 6), hence the name of the model. While originally developed for biomedical images, the U-Net architecture has proven successful for image segmentation in other domains as well. Here, we somewhat modify the U-Net architecture according to the MobileNets [80] framework, to improve its efficiency. In particular, the MobileNets framework uses Depthwise Separable Convolutions, a form which factorizes standard convolutions (e.g., 3 × 3) into a depthwise convolution (applied separately to each input band) and a pointwise (1 × 1) convolution to combine the outputs of the depthwise convolution.
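A minimal sketch (ours; the layer sizes are arbitrary) of this factorization and the parameter savings it brings:

    import tensorflow as tf

    inputs = tf.keras.Input(shape=(512, 512, 32))

    # Standard 3 x 3 convolution: 3 * 3 * 32 * 64 = 18,432 weights (plus biases).
    standard = tf.keras.layers.Conv2D(64, 3, padding="same")(inputs)

    # Depthwise separable factorization: 3 * 3 * 32 + 32 * 64 = 2,336 weights.
    x = tf.keras.layers.DepthwiseConv2D(3, padding="same")(inputs)  # per-band filtering
    separable = tf.keras.layers.Conv2D(64, 1)(x)                    # 1 x 1 band mixing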

Figure 6: The architecture of U-Net [97].

3.4.4 DeepLab-V3+

DeepLab-V3+ [77] is an improved version of DeepLab-V3 [98], while the latter is an improved version of the original DeepLab [78] model. This segmentation model does not follow the FCN framework like the previously discussed models.


The main features that distinguish the DeepLab model from FCNs are the atrous convolutions for upsampling, and the application of probabilistic machine learning models, concretely conditional random fields (CRFs), for a finer localization accuracy in the final fully connected layer. Atrous convolutions, in particular, allow one to enlarge the context from which the next-layer feature maps are learned, while preserving the number of parameters (and thus the same efficiency). Using a chain of atrous convolutions allows computing the final output layer of a CNN at an arbitrarily high resolution (removing the need for the upsampling part as used in FCNs). In the follow-up work proposing DeepLab-V3, Chen et al. [98] change the approach to atrous convolutions to gradually double the atrous rates, and show that, with an adapted version, their new algorithm outperforms the previous one, even without including the fully connected CRF layer. Finally, in their newest adaptation of the model, called DeepLab-V3+, Chen et al. [77] turn to an approach similar to the FCNs, i.e., they add a decoder module to the architecture (see Figure 7). That is, they employ the features extracted by the DeepLab-V3 module in the encoder part, and add the decoder module consisting of 1 × 1 and 3 × 3 convolutions.
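A minimal sketch (ours; the dilation rates are illustrative, not those of DeepLab) showing how atrous (dilated) convolutions grow the receptive field at a constant parameter count:

    import tensorflow as tf

    inputs = tf.keras.Input(shape=(64, 64, 16))
    # Each 3 x 3 kernel below has the same number of weights, but samples its
    # input over a 3 x 3, 5 x 5 and 9 x 9 neighbourhood, respectively.
    x = tf.keras.layers.Conv2D(16, 3, dilation_rate=1, padding="same")(inputs)
    x = tf.keras.layers.Conv2D(16, 3, dilation_rate=2, padding="same")(x)
    x = tf.keras.layers.Conv2D(16, 3, dilation_rate=4, padding="same")(x)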

Figure 7: The architecture of DeepLabV3+ [77].

3.4.5 FRRN-B (Full-Resolution Residual Networks)

As we have seen, most of the semantic segmentation architectures are based on some form of an FCN, and so they utilize existing classification networks, such as ResNet or VGG16, as encoders. We also discussed the main reason for such approaches, which is to take advantage of the learned weights from those architectures, pretrained for the classification task. Nevertheless, one disadvantage of the FCN approach is that the resulting network outputs of the encoder part (particularly after the pooling operations) are at a lower resolution, which deteriorates the localization performance of the overall segmentation model. Pohlen et al. [81] proposed to tackle this by having two parallel network streams processing the input image: a pooling and a residual stream (Figure 8). As the name says, the pooling stream performs successive pooling and then unpooling operations, and it serves to obtain good recognition of the objects and classes. The residual stream computes residuals at the full image resolution, which enables low-level features, i.e., object pixel-level locations, to be propagated to the network output. The name of the model comes from its building blocks, i.e., full-resolution residual units. Each such unit simultaneously operates on the pooling and the residual stream. In the original paper [81], the authors propose two alternative architectures, FRRN-A and FRRN-B, and they show that FRRN-B achieves superior performance on the Cityscapes benchmark dataset. Hence, we employ the FRRN-B architecture.


Figure 8: The architecture of FRRN-B. RU_n and FRRU_n stand for residual units and full-resolution residual units with n-channel convolutions, respectively. FRRUs simultaneously operate on the two streams [81].




3.4.6 PSPNet (Pyramid Scene Parsing Network)

Figure 9: The architecture of PSPNet [75].

Zhao et al. [75] propose the Pyramid Scene Parsing Network as a solution to the challenge of making local predictions based on the local context only, without considering the global image scene. In remote sensing, an example of this challenge could be a model wrongly predicting water with waves present in it as the dry vegetation class, because they appear similar and the model did not consider that these pixels are part of a larger water surface, i.e., it missed the global context. Similarly to the other FCN-based approaches, PSPNet uses a pre-trained classification architecture to extract the feature map, in this case ResNet. The main module of this network is the pyramid pooling, which is enclosed by a square in Figure 9. As can be seen in the figure, this module fuses features at four scales, from the coarse (red) to the fine (green). Hence, the output of each level in the pyramid pooling module contains the feature map of a different resolution. In the end, the different features are stacked together, yielding the final pyramid pooling global feature for predictions.

3.4.7 FC-DenseNet (Fully Convolutional DenseNets)

This semantic segmentation algorithm is built using the DenseNet CNN [88] as a basis for the encoder, followed by applying the FCN approach [82]. The specificity of the DenseNet architecture is the presence of blocks in which each layer is connected to all other layers in a feed-forward manner. Figure 10 shows the architecture of FC-DenseNet, where the blocks are represented by the Dense Block units. According to [88], such an architecture scales well to hundreds of layers without any optimization issues, while yielding excellent results in classification tasks. In order to efficiently upsample the DenseNet feature maps, Jegou et al. [82] substitute the upsampling convolutions of FCNs with Dense Blocks and Transitions Up. The Transition Up modules consist of transposed convolutions, which are then concatenated with the outputs from the input skip connection (the dashed lines in Figure 10).
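A minimal sketch (ours; the number of layers and the growth rate are arbitrary, not the exact FC-DenseNet103 configuration) of such a dense block:

    import tensorflow as tf

    def dense_block(x, num_layers=4, growth_rate=16):
        """Each layer sees the concatenation of all preceding feature maps."""
        features = [x]
        for _ in range(num_layers):
            y = tf.keras.layers.Concatenate()(features) if len(features) > 1 else x
            y = tf.keras.layers.BatchNormalization()(y)
            y = tf.keras.layers.ReLU()(y)
            y = tf.keras.layers.Conv2D(growth_rate, 3, padding="same")(y)
            features.append(y)
        return tf.keras.layers.Concatenate()(features)

    inputs = tf.keras.Input(shape=(128, 128, 48))
    outputs = dense_block(inputs)    # 48 + 4 * 16 = 112 output channels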


Figure 10: The architecture of FC-DenseNet [82].

3.5 Training approach

To accomplish better segmentation performance, there is an option to pre-train the semantic segmentation models (in particular, their encoder modules) using a larger set of available images of another type (such as natural images). By using a model pre-trained with natural images and continuing training with the limited set of SAR images, the knowledge becomes effectively transferred from the natural-image to the SAR task [99]. To accomplish such a transfer, we used the models whose encoders were pre-trained for the ImageNet classification task, and fine-tuned them using our SAR dataset (described next).

3.6 Experimental Setup

In this section, we first describe how we prepared the SAR images for training with the deep learning models, which are designed for natural images. Then, we provide the details of our implementation.

3.6.1 SAR Data Preprocessing for Deep Learning

Sentinel-1 imagery comes in two polarization channels, each of them being informative about certain types of land cover. Hence, using their combination is expected to yield better land cover mapping results than using either of them independently. Moreover, previous work has shown the benefits of also using a DEM model for land cover mapping [9]. Hence, as the third layer, we used the DEM of Finland from the National Land Survey.



SAR backscatter for both polarizations was converted to decibels by applying the 10 · log10 transformation. In addition, for the deep learning models, each band should be normalized so that the distribution of the pixel values resembles a Gaussian distribution centered at zero. This is done to yield faster convergence during training. To normalize the data, each pixel value is reduced by the mean of all pixels and then divided by their standard deviation. In addition, given that the semantic segmentation models expect pixel values in the range (0, 255), we scaled the normalized data, and also the DEM values, to this range. Such preprocessed layers are then used to create the image dataset for training.

We named the created dataset SAR RGB-DEM. The naming comes from the process used to create the images in this dataset. Namely, one of the two channels of a Sentinel-1 image is assigned to the R channel and the other to the G channel. For the third, B channel, we use the DEM layer.
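The described preprocessing can be sketched as follows (our illustration; the exact scaling details of the study's pipeline may differ, and the input arrays here are random stand-ins):

    import numpy as np

    def to_decibels(sigma0, eps=1e-6):
        return 10.0 * np.log10(sigma0 + eps)     # eps guards against log10(0)

    def normalize_to_byte_range(band):
        z = (band - band.mean()) / band.std()    # zero-centered, unit variance
        z = (z - z.min()) / (z.max() - z.min())  # rescale to (0, 1)
        return z * 255.0                         # then to the (0, 255) range

    def make_sar_rgb_dem(vv, vh, dem):
        r = normalize_to_byte_range(to_decibels(vv))   # R: first polarization
        g = normalize_to_byte_range(to_decibels(vh))   # G: second polarization
        b = normalize_to_byte_range(dem)               # B: DEM heights
        return np.dstack([r, g, b])

    vv = np.random.gamma(2.0, 0.05, (512, 512))    # stand-in backscatter values
    vh = np.random.gamma(2.0, 0.02, (512, 512))
    dem = np.random.uniform(100, 300, (512, 512))  # stand-in terrain heights (m)
    imagelet = make_sar_rgb_dem(vv, vh, dem)       # shape (512, 512, 3)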

3.6.2 Train/Development and Test (Accuracy Assessment) Dataset

The images from the SAR RGB-DEM dataset needed to be split into 512 px × 512 px partial images (further in the text called imagelets) for training. Thus, each imagelet represented an area of roughly 10 × 10 km². The first reason for this preprocessing concerns the square shape: some of the selected models required square-shaped images. Some of the other models were flexible with the image shape and size, but we wanted to make the setups for all the models the same, so that their results are comparable. The second reason for the preprocessing concerns computational capacity: with our hardware setup (described below), this was the largest image size that we could work with.

Upon splitting the SAR RGB-DEM images, we discarded those imagelets that were completely outside the land mass area, as well as those for which we did not have a complete CORINE label (such as if they fell in part outside the Finnish borders). This resulted in more than 7,000 imagelets of size 512 px × 512 px.

Given the geography of Finland, to have representative training data it seems useful to include imagelets from both the northern and southern (including the large cities) parts of the country in the model training. On the other hand, some noticeable differences are found also in the gradient from the east to the west of the country. To achieve a representative training dataset, we selected all imagelets between the longitudes of 24° and 28° for the accuracy assessment (model testing), and all the other imagelets for the model training (that is, training and development in the computer vision terminology); a sketch of this split is given below. In this way, we prevented the situation in which two images of the same area, but acquired at different times, are used one for training and the other one for testing. Images that were overlapping any border of the introduced strip were discarded. The procedure resulted in 3,104 images in the training and development set and 3,784 images in the test (accuracy assessment) set. Finally, we used 60% of the training and development set for training, and the rest for development of the deep learning models.
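The longitude-based split can be sketched as follows (our illustration; the 'lon_min'/'lon_max' metadata keys are hypothetical):

    import random

    def split_imagelets(imagelets, west=24.0, east=28.0, train_frac=0.6, seed=42):
        """Each imagelet is assumed to be a dict carrying its longitude extent."""
        train_dev, test = [], []
        for im in imagelets:
            if west <= im["lon_min"] and im["lon_max"] <= east:
                test.append(im)                  # fully inside the test strip
            elif im["lon_max"] < west or im["lon_min"] > east:
                train_dev.append(im)             # fully outside the strip
            # imagelets overlapping a strip border are discarded
        random.Random(seed).shuffle(train_dev)
        cut = int(train_frac * len(train_dev))
        return train_dev[:cut], train_dev[cut:], test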


Table 3: The properties of the examined semantic segmentation architectures.

Architecture  | Base model     | Parameters
--------------|----------------|-----------
BiSeNet       | ResNet101      | 24.75M
SegNet        | VGG16          | 34.97M
Mobile U-Net  | Not applicable | 8.87M
DeepLabV3+    | ResNet101      | 47.96M
FRRN-B        | ResNet101      | 24.75M
PSPNet        | ResNet101      | 56M
FC-DenseNet   | ResNet101      | 9.27M

3.6.3 Data Augmentation

Further, we employed the data augmentation technique. The main idea behind data augmentation is to enable improved learning by reusing original images with slight transformations, such as rotation, flipping, adding Gaussian noise, or slightly changing the brightness. This provides additional information to the model, and the dataset size is effectively increased. Moreover, an additional benefit of data augmentation is in helping the model to learn some invariant data properties for which no examples are present in the original dataset. Given the sensitivity of the SAR backscatter, we did not want to augment the images in terms of color or brightness, or by adding noise. However, we could safely employ rotations and flipping. For rotations, we only used 90° increments, giving three possible rotated versions of an image. For image flipping, we applied horizontal and vertical flipping, or both at the same time, giving another three possible versions of the original image.⁴ Notice that our images are square, so the transformations did not change the image dimensions. Finally, we applied online augmentation, as opposed to the offline version. In the online process, each augmented image is seen only once, and so this process yields a network that generalises better.
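A minimal sketch (ours) of these label-preserving transformations, applied online to an imagelet and its per-pixel label mask:

    import numpy as np

    def random_rotate_flip(image, label):
        k = np.random.randint(4)                  # rotate by 0, 90, 180 or 270 degrees
        image, label = np.rot90(image, k), np.rot90(label, k)
        if np.random.rand() < 0.5:                # horizontal flip
            image, label = np.fliplr(image), np.fliplr(label)
        if np.random.rand() < 0.5:                # vertical flip
            image, label = np.flipud(image), np.flipud(label)
        return image, label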

3.6.4 Implementation

To apply the described semantic segmentation models, we adapted the open-source Semantic Segmentation Suite. We used Python with the TensorFlow [100] backend.

3.6.5 Hardware and Training Setup

We trained and tested each of the deep learning models separately, on a single GPU (NVIDIA GeForce GTX 1080), on a machine with 32 GB of RAM.

We used the RMSProp optimisation algorithm, with a learning rate of 0.0001 and a learning rate decay of 0.9954. Each model was trained for an equal number of epochs (500), and during the process the checkpoint for the best model was saved. Then, we used that model for evaluation on the test set, and we report those results.
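In tf.keras terms, this configuration might look roughly as follows (a sketch of ours, not the adapted Semantic Segmentation Suite code; applying the decay once per epoch, and the steps_per_epoch value, are our assumptions):

    import tensorflow as tf

    steps_per_epoch = 1862   # hypothetical number of training batches per epoch

    schedule = tf.keras.optimizers.schedules.ExponentialDecay(
        initial_learning_rate=1e-4,   # learning rate of 0.0001
        decay_steps=steps_per_epoch,  # assumed: decay applied once per epoch
        decay_rate=0.9954)
    optimizer = tf.keras.optimizers.RMSprop(learning_rate=schedule)

    # Keep only the best checkpoint over the 500 training epochs.
    checkpoint = tf.keras.callbacks.ModelCheckpoint(
        "best_model.h5", monitor="val_accuracy", save_best_only=True)
    # model.fit(train_data, validation_data=dev_data,
    #           epochs=500, callbacks=[checkpoint])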

⁴The vertical flip operation switches between top-left and bottom-left image origin (reflection along the central horizontal axis), and the horizontal flip switches between top-left and top-right image origin (reflection along the central vertical axis).



3.7 Evaluation Metrics

In their review of the metrics used in land cover classification, Costa et al. [101] found a lack of consistency, complicating the intercomparison of different studies. To avoid such issues and ensure that our results are easily comparable with the literature, we thoroughly evaluated our models. For each model and class, we report the following measures of accuracy: precision, also known as user's accuracy (UA); recall, also known as producer's accuracy (PA); overall accuracy; and the Kappa coefficient. The formulas are as follows.

For each segmentation class (land cover type) $c$, we calculate precision (user's accuracy)

$$P_c = \frac{Tp_c}{Tp_c + Fp_c},$$

and recall (producer's accuracy)

$$R_c = \frac{Tp_c}{Tp_c + Fn_c},$$

where $Tp_c$ represents true positive, $Fp_c$ false positive, and $Fn_c$ false negative pixels for the class $c$.

When it comes to accuracy [102], we calculate the per-class accuracy⁵

$$Acc_c = \frac{C_{ii}}{G_i},$$

and the overall pixel accuracy

$$Acc_{OP} = \frac{\sum_{i=1}^{L} C_{ii}}{\sum_{i=1}^{L} G_i},$$

where $C_{ij}$ is the number of pixels having a ground truth label $i$ and being classified/predicted as $j$, $G_i$ is the total number of pixels labelled with $i$, and $L$ is the number of classes. All these metrics can take values from 0 to 1.

Finally, we also use the Kappa statistic (Cohen's measure of agreement), indicating how the classification results compare to values assigned by chance [103]. The Kappa statistic can take values from 0 to 1. Starting from a $k \times k$ confusion matrix with elements $f_{ij}$, where $N$ is the total number of classified pixels, the following calculations are done:

⁵Effectively, the per-class accuracy is defined as the recall obtained on each class.


$$P_o = \frac{1}{N} \sum_{j=1}^{k} f_{jj} \quad (1)$$

$$r_i = \sum_{j=1}^{k} f_{ij} \;\; \forall i, \qquad c_j = \sum_{i=1}^{k} f_{ij} \;\; \forall j \quad (2)$$

$$P_e = \frac{1}{N^2} \sum_{i=1}^{k} r_i c_i \quad (3)$$

where $P_o$ is the observed proportional agreement (effectively the overall accuracy), $r_i$ and $c_j$ are the row and column totals for classes $i$ and $j$, and $P_e$ is the expected proportion of agreement. The final measure of agreement is given by the statistic [103]

$$\kappa = \frac{P_o - P_e}{1 - P_e}. \quad (4)$$

Depending on the value of Kappa, the observed agreement is considered either poor (0.0 to 0.2), fair (0.2 to 0.4), moderate (0.4 to 0.6), good (0.6 to 0.8), or very good (0.8 to 1.0).
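All of the above metrics can be computed from a single confusion matrix; a minimal NumPy sketch (ours), with F[i, j] counting pixels of reference class i predicted as class j, checked here against the FC-DenseNet confusion matrix of Table 5:

    import numpy as np

    def evaluation_metrics(F):
        N = F.sum()
        tp = np.diag(F)
        precision = tp / F.sum(axis=0)   # user's accuracy: diagonal / predicted totals
        recall = tp / F.sum(axis=1)      # producer's accuracy: diagonal / reference totals
        overall = tp.sum() / N           # P_o in Eq. (1)
        pe = (F.sum(axis=0) * F.sum(axis=1)).sum() / N**2   # P_e in Eq. (3)
        kappa = (overall - pe) / (1.0 - pe)                 # Eq. (4)
        return precision, recall, overall, kappa

    # Rows: reference classes urban, water, forest, field, peatland (Table 5).
    F = np.array([
        [  7301999,   413073,  15892771,  3212839,   221476],
        [    78331, 128294872,   3457634,   171029,  1935276],
        [  3663698,   2703632, 686788977, 12795703,  7730444],
        [   766200,    121609,  16527970, 44866048,   620934],
        [    56097,   1866020,  19164137,  1091008, 30309189]])
    p, r, oa, kappa = evaluation_metrics(F)
    print(round(100 * oa, 1), round(kappa, 3))   # 90.7 and ~0.785, as reported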

4 Results and Discussion

Using the experimental setup described in the previous section, we evaluated the seven selected semantic segmentation models: SegNet [74], PSPNet [75], BiSeNet [76], DeepLabV3+ [77, 78], U-Net [79, 80], FRRN-B [81], and FC-DenseNet [82]. The overall classification performance statistics for all studied models are gathered in Table 4. Figure 11 shows maps produced for several imagelets with the best performing model, FC-DenseNet. The obtained results are compared to prior work, and the classification performance for the different land cover classes is discussed further below.

4.1 Classification Performance

All the models performed relatively well in terms of classification, achieving an overall accuracy above 83%. Three models performed particularly well, achieving an accuracy score above 89%: SegNet, FRRN-B, and the best performing model, FC-DenseNet, which achieved an accuracy of 90.7%.

Before further analysis, let us recall that CORINE is not exclusively a land cover map, but rather a land cover and land use map; thus, specific classes can differ from the ecological classes observed by Sentinel-1. Also, the aggregation to Level-1 is sometimes not strictly "ecological", nor does it comply with physical surface scattering considerations. For example, roads, airports, major industrial areas and the road network often exhibit areas similar to field; the presence of trees and green vegetation near summer cottages can cause them to exhibit signatures closer to forest rather than urban; sometimes, forest on rocky terrain can be misclassified as urban instead, due to the presence of very bright targets and strong disruptive features; while confusion between peatland and field areas is also commonplace. Finally, the accuracy of the CORINE data itself is only somewhat higher than 90%.


Table 4: Summary of the classification performance and efficiency of the various deep learning models (UA – user's accuracy, PA – producer's accuracy; average inference time is per image in the dataset). Per-class accuracies are given as UA/PA, in %.

LC class (test scale, km²)                 | BiSeNet | DeepLabV3+ | SegNet | FRRN-B | U-Net  | PSPNet | FC-DenseNet
-------------------------------------------|---------|------------|--------|--------|--------|--------|------------
Urban fabric (100) (1081.6)                | 26/21   | 15/14      | 36/31  | 38/30  | 45/25  | 38/18  | 62/27
Agricultural areas (200) (2516.0)          | 49/51   | 50/49      | 69/66  | 68/68  | 66/66  | 53/48  | 72/71
Forested areas (300) (28546.2)             | 90/91   | 88/96      | 93/94  | 92/95  | 92/95  | 89/95  | 93/96
Peatland, bogs and marshes (400) (2099.0)  | 54/43   | 56/13      | 67/57  | 71/55  | 70/52  | 65/31  | 74/58
Water bodies (500) (5356.4)                | 85/91   | 94/92      | 96/96  | 95/96  | 96/96  | 94/94  | 96/96
Overall accuracy (%)                       | 83.86   | 85.49      | 89.03  | 89.27  | 89.25  | 86.51  | 90.66
Kappa                                      | 0.641   | 0.649      | 0.754  | 0.758  | 0.754  | 0.680  | 0.785
Average inference time (s)                 | 0.0389  | 0.0267     | 0.0761 | 0.1424 | 0.0848 | 0.0495 | 0.1930

Table 5: Confusion matrix for classification with the FC-DenseNet (FC-DenseNet103) model.

CLC2012 class   |                        Sentinel-1 (predicted) class                        |             |
                | urban      | water       | forest      | field      | peatland   | total       | PA (%)
----------------|------------|-------------|-------------|------------|------------|-------------|-------
urban (1)       | 7301999    | 413073      | 15892771    | 3212839    | 221476     | 27042158    | 27.0
water (2)       | 78331      | 128294872   | 3457634     | 171029     | 1935276    | 133937142   | 95.8
forest (3)      | 3663698    | 2703632     | 686788977   | 12795703   | 7730444    | 713682454   | 96.2
field (4)       | 766200     | 121609      | 16527970    | 44866048   | 620934     | 62902761    | 71.3
peatland (5)    | 56097      | 1866020     | 19164137    | 1091008    | 30309189   | 52486451    | 57.8
total           | 11866325   | 133399206   | 741831489   | 62136627   | 40817319   | 990050966   |
UA (%)          | 61.5       | 96.2        | 92.6        | 72.2       | 74.3       |             | 90.7

Figure 11: Illustration of the FC-DenseNet model performance: a selection of classification results, i.e., the direct output of the network without any post-processing (bottom row), versus the reference CORINE data (upper row).


As for the results across the different land cover classes, all the models performed particularly well in recognising the water bodies and forested areas, while the urban fabric represented the most challenging class for all the models. We expect that the inclusion of the DEM as one layer in the training images has helped to achieve good results on the water bodies class for most of the models (except for BiSeNet, all the models achieved both user and producer accuracy above 90%). The urban class was particularly challenging for the following main reasons. First, this class changes the most, as new houses, roads, and urban areas are built. While we took the most suitable available CORINE class in terms of time for our Sentinel-1 images, there are almost certain differences between the urban class as it was in 2012 and in 2015-2016.


Second, the CORINE map itself does not have perfect accuracy, nor are the aggregation rules perfect. As a matter of fact, in the majority of studies where SAR-based classification was compared against CLC or similar data, a poor or modest overall agreement was observed for this class [21, 41, 83, 20], while the user's accuracy was strongly higher than the producer's [104]. The latter is exactly due to radar being able to sense sharp boundaries and bright targets very well, whereas such bright targets often do not dominate the whole CORINE Level-1 urban class. We argue that any inaccuracies present will be particularly attenuated in our models for the urban class, because of the sharp and sudden boundary changes in this class, unlike for the others, such as forest and water. The top performing model, i.e., FC-DenseNet, performed the best across all the classes. It is particularly notable that it achieved a user accuracy, i.e., precision, for the urban class of 62%, improving significantly on all the other models. Nevertheless, its score on the producer accuracy, i.e., recall, for this class, of 27%, is outperformed by the two other top models, i.e., SegNet and FRRN-B.

We mentioned the issues of SAR backscattering sensitivity to several ground factors, so that the same classes might appear differently in the images between countries, or between distant areas within a country. An interesting indication of our study, however, is that the deep learning models might be able to deal with this issue. Namely, we used the models pre-trained on ImageNet and fine-tuned them with a relatively small number (14) of Sentinel-1 scenes. The models learned to recognize the varying types of the backscattering signal across the country of Finland. This indicates that, with a similar type of fine-tuning, the present models could be relatively easily adapted to other areas and countries with different SAR backscattering patterns. Such robustness and adaptability of the deep learning models comes from their automatic learning of feature representations, without the need for a human to pre-define those features.

4.2 Computational Performance

The training times with our hardware configuration took from 6 days up to 2 weeks for the different models. This could be significantly improved by training each model on a multi-GPU system, instead of the single GPU we used.

In terms of inference time, we also saw differences in performance. In Table 4 we present the average inference time per 512 px × 512 px imagelet. The results show that there is a trade-off between classification and computational performance: the best models in terms of classification results (i.e., FC-DenseNet and FRRN-B) take several times longer inference time compared to the rest. Depending on the application, this might not be of particular importance.

4.3 Comparison to Similar Work

The obtained results compare favourably to previous similar studies on land cover classification with SAR data [83, 20, 21, 41, 28, 31]. Depending on the level of class aggregation (4-5 major classes or more), and using mostly statistical or classical machine learning approaches, the reported classification accuracies ranged from as high as 80-87% down to as low as 30% when only SAR imagery was used.



Two recent studies that employed neural networks for SAR imagery classification (albeit in combination with satellite optical data) for land cover mapping were [28] and [66], with reported classification accuracies of up to 97.5% and 94.6%, respectively.

The best model in our experiments achieved an overall accuracy of 90.7%. However, our results were obtained using solely the SAR imagery. In contrast, SAR imagery (PALSAR) alone yielded an overall accuracy of 78.1% in [28]. The types of classes they studied are also different compared to ours (crops versus vegetation versus land cover types), and our study was performed on a larger area. Importantly, the previous studies applied different types of models (regular NNs versus CNNs versus semantic segmentation). In particular, the CNN models work on 7 × 7 resolution windows, while we applied more advanced semantic segmentation models, which work on the level of a pixel. Keeping in mind the findings from [28], that the addition of optical images on top of SAR improved the results by over 10%, we expect that our models would perform comparably well, or outperform these previous works, if applied to combined SAR and optical imagery.

In terms of the deep learning setup, the studies most similar to ours are [53] and [70]. However, RapidEye optical imagery at 5 m spatial resolution was used in [53], and the test site was considerably smaller. Study [70], similarly to our research, relied exclusively on SAR imagery; however, it used fully polarimetric images, acquired by RADARSAT-2 at a considerably better resolution. They developed an FCN-type semantic segmentation model 'specifically designed for the classification of wetland complexes using PolSAR imagery'. Using this model to classify eight wetland map classes, they achieved an overall accuracy of 93%. However, because their model is designed specifically for wetland complexes, it is not clear whether such a model would generalize to other types of areas. Compared to our study, they focused on a considerably smaller area (nearly the size of a single imagelet we used) and on a very specific task (wetland type mapping). Thus, it is not readily clear how general their approach is, and how it compares to our presented approach.

4.4 Outlook and Future Work

There are several lines of potential improvement based on the results of this study, as well as future work directions.

First, using an even larger set of Sentinel-1 images can be recommended, since for supervised deep learning models large amounts of data are crucial. Here, we processed only 6,888 imagelets altogether, but deep learning algorithms typically become efficient only once they are trained with hundreds of thousands or millions of images.

Second, if SAR images and reference data of a higher resolution are used, we expect better classification performance too, as smaller details could potentially be captured. Also, better agreement in the acquisition timing of the reference and SAR imagery can be recommended. The reference and training data should come from the same months or year, if possible, and the reference maps should represent the reality as accurately as possible. The models in our experiments were certainly limited by CORINE's own limited accuracy.



Third, in this study we have tested the effectiveness of off-the-shelf deep learning models for land cover mapping from SAR data. While the results show their effectiveness, it is also likely that novel types of models, specifically developed for radar data (such as [70]), will yield even better results. Based on our results, we suggest DenseNet-based models as a starting point. In particular, one could develop deep learning models to handle directly the SLC data, which preserve the phase information.

Focusing on a single season is both an advantage and a limitation. Importantly, we have avoided confusion between SAR signatures varying seasonally for several land cover classes. However, the multitemporal dynamics itself can potentially be used as an additional, useful class-discriminating parameter. Incorporating the seasonal dynamics of each land cover pixel (as a time series) is left for future work, perhaps with the additional need to incorporate recurrent neural networks into the approach.

As discussed in Section 3.3, it could be suitable to use more detailed (specific) land cover classes, as the aggregation of smaller LC classes into Level-1 CORINE classes is not exactly ecological, leading to the mixing of several distinct SAR signatures in one class, and thus causing additional confusion for the classifier. The classified specific classes can later be aggregated into larger classes, potentially showing improved performance [19].

Finally, we have used only SAR images and a freely-available DEM model for the presented large-scale land cover mapping. If one were to also combine other types of remote sensing images, in particular optical images, we expect that the results would significantly improve. This holds for those areas where such imagery can be collected, given the cloud coverage, while in an operational scenario it would potentially require the use of at least two models (with and without optical satellite imagery). It is also important to assess the added value of SAR imagery with deep learning models when optical satellite images are available, as well as possible data fusion and decision fusion scenarios, before a decision on the mapping approach is made [19].

5 Conclusion

Our study demonstrated the potential for applying state-of-the-art semantic segmentation models to SAR image classification with high accuracy. Several models were benchmarked in a countrywide classification experiment using Sentinel-1 IW-mode SAR data, reaching nearly 91% overall classification accuracy with the best performing model (FC-DenseNet). Given that the 14 used Sentinel-1 scenes resulted in 7K training images, this indicates strong potential for using pre-trained CNNs for further fine-tuning, and seems particularly suitable when the number of training images is limited (to thousands or tens of thousands, instead of millions). In addition to suggesting the best candidate semantic segmentation models for land cover mapping with SAR data (that is, the DenseNet-based models), our study offers baseline results against which newly proposed models should be evaluated. Several possible improvements for future work were identified, including the necessity for testing multitemporal approaches, data fusion, and very high-resolution SAR imagery, as well as developing models specifically for SAR; these will be addressed in future work.



References

[1] S. Bojinski, M. Verstraete, T. C. Peterson, C. Richter, A. Simmons, M. Zemp, The concept of essential climate variables in support of climate research, applications, and policy, Bulletin of the American Meteorological Society 95 (9) (2014) 1431–1443.

[2] G. Büttner, J. Feranec, G. Jaffrain, L. Mari, G. Maucha, T. Soukup, The CORINE land cover 2000 project, EARSeL eProceedings 3 (3) (2004) 331–346.

[3] M. Bossard, J. Feranec, J. Otahel, et al., CORINE land cover technical guide: Addendum 2000 (2000).

[4] G. Büttner, CORINE land cover and land cover change products, in: Land Use and Land Cover Mapping in Europe, Springer, 2014, pp. 55–74.

[5] M. Törmä, T. Markkanen, S. Hatunen, P. Härmä, O.-P. Mattila, A. Arslan, Assessment of land-cover data for land-surface modelling in regional climate studies, Boreal Environment Research 20 (2) (2015) 243–260.

[6] J. Chen, J. Chen, A. Liao, X. Cao, L. Chen, X. Chen, C. He, G. Han, S. Peng, M. Lu, et al., Global land cover mapping at 30 m resolution: A POK-based operational approach, ISPRS Journal of Photogrammetry and Remote Sensing 103 (2015) 7–27.

[7] C. A. d. Almeida, A. C. Coutinho, J. C. D. M. Esquerdo, M. Adami, A. Venturieri, C. G. Diniz, N. Dessay, L. Durieux, A. R. Gomes, High spatial resolution land use and land cover mapping of the Brazilian Legal Amazon in 2008 using Landsat-5/TM and MODIS data, Acta Amazonica 46 (3) (2016) 291–302.

[8] C. Homer, J. Dewitz, L. Yang, S. Jin, P. Danielson, G. Xian, J. Coulston, N. Herold, J. Wickham, K. Megown, Completion of the 2011 National Land Cover Database for the conterminous United States – representing a decade of land cover change information, Photogrammetric Engineering & Remote Sensing 81 (5) (2015) 345–354.

[9] Y. Zhao, D. Feng, L. Yu, X. Wang, Y. Chen, Y. Bai, H. J. Hernández, M. Galleguillos, C. Estades, G. S. Biging, et al., Detailed dynamic land cover mapping of Chile: Accuracy improvement by integrating multi-temporal data, Remote Sensing of Environment 183 (2016) 170–185.

[10] P. Griffiths, C. Nendel, P. Hostert, Intra-annual reflectance composites from Sentinel-2 and Landsat for national-scale crop and land cover mapping, Remote Sensing of Environment 220 (2019) 135–151.

[11] R. Torres, P. Snoeij, D. Geudtner, D. Bibby, M. Davidson, E. Attema, P. Potin, B. Rommen, N. Floury, M. Brown, I. Traver, P. Deghaye, B. Duesmann, B. Rosich, N. Miranda, C. Bruno, M. L'Abbate, R. Croci, A. Pietropaolo, M. Huchler, F. Rostan, GMES Sentinel-1 mission, Remote Sensing of Environment 120 (2012) 9–24. doi:10.1016/j.rse.2011.05.028.

[12] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE 86 (11) (1998) 2278–2324.

[13] A. Krizhevsky, I. Sutskever, G. E. Hinton, ImageNet classification with deep convolutional neural networks, in: Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.

[14] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, CoRR abs/1409.1556 (2014).

[15] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning (2016).

[16] W. Cohen, S. Goward, Landsat's role in ecological applications of remote sensing, BioScience 54 (6) (2004) 535–545. doi:10.1641/0006-3568(2004)054[0535:LRIEAO]2.0.CO;2.

[17] S. Goetz, A. Baccini, N. Laporte, T. Johns, W. Walker, J. Kellndorfer, R. Houghton, M. Sun, Mapping and monitoring carbon stocks with satellite observations: A comparison of methods, Carbon Balance and Management 4 (2009). doi:10.1186/1750-0680-4-2.

[18] C. Atzberger, Advances in remote sensing of agriculture: Context description, existing operational monitoring systems and major information needs, Remote Sensing 5 (2) (2013) 949–981. doi:10.3390/rs5020949.

[19] T. Häme, J. Kilpi, H. A. Ahola, Y. Rauste, O. Antropov, M. Rautiainen, L. Sirro, S. Bounpone, Improved mapping of tropical forests with optical and SAR imagery, part I: Forest cover and accuracy assessment using multi-resolution data, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 6 (1) (2013) 74–91.

[20] O. Antropov, Y. Rauste, H. Astola, J. Praks, T. Häme, M. Hallikainen, Land cover and soil type mapping from spaceborne PolSAR data at L-band with probabilistic neural network, IEEE Transactions on Geoscience and Remote Sensing 52 (9) (2014) 5256–5270. doi:10.1109/TGRS.2013.2287712.


[21] A. Lönnqvist, Y. Rauste, M. Molinier, T. Häme, Polarimetric SAR data in land cover mapping in boreal zone, IEEE Transactions on Geoscience and Remote Sensing 48 (10) (2010) 3652–3662. doi:10.1109/TGRS.2010.2048115.

[22] B. Waske, M. Braun, Classifier ensembles for land cover mapping using multitemporal SAR imagery, ISPRS Journal of Photogrammetry and Remote Sensing 64 (5) (2009) 450–457. doi:10.1016/j.isprsjprs.2009.01.003.

[23] L. Bruzzone, M. Marconcini, U. Wegmüller, A. Wiesmann, An advanced system for the automatic classification of multitemporal SAR images, IEEE Transactions on Geoscience and Remote Sensing 42 (6) (2004) 1321–1334. doi:10.1109/TGRS.2004.826821.

[24] T. Ullmann, A. Schmitt, A. Roth, J. Duffe, S. Dech, H.-W. Hubberten, R. Baumhauer, Land cover characterization and classification of arctic tundra environments by means of polarized synthetic aperture X- and C-band radar (PolSAR) and Landsat 8 multispectral imagery – Richards Island, Canada, Remote Sensing 6 (9) (2014) 8565–8593. doi:10.3390/rs6098565. URL http://www.mdpi.com/2072-4292/6/9/8565.

[25] N. Clerici, C. A. V. Calderón, J. M. Posada, Fusion of Sentinel-1A and Sentinel-2A data for land cover mapping: a case study in the lower Magdalena region, Colombia, Journal of Maps 13 (2) (2017) 718–726. doi:10.1080/17445647.2017.1372316.

[26] C. Castañeda, D. Ducrot, Land cover mapping of wetland areas in an agricultural landscape using SAR and Landsat imagery, Journal of Environmental Management 90 (7) (2009) 2270–2277.

[27] Y. Ban, H. Hu, I. M. Rangel, Fusion of QuickBird MS and RADARSAT SAR data for urban land-cover mapping: Object-based and knowledge-based approach, International Journal of Remote Sensing 31 (6) (2010) 1391–1410.

[28] G. V. Laurin, V. Liesenberg, Q. Chen, L. Guerriero, F. Del Frate, A. Bartolini, D. Coomes, B. Wilebore, J. Lindsell, R. Valentini, Optical and SAR sensor synergies for forest and land cover mapping in a tropical site in West Africa, International Journal of Applied Earth Observation and Geoinformation 21 (2013) 7–16.

[29] R. Khatami, G. Mountrakis, S. V. Stehman, A meta-analysis of remote sensing research on supervised pixel-based land-cover image classification processes: General guidelines for practitioners and future research, Remote Sensing of Environment 177 (2016) 89–100.

[30] B. Waske, M. Braun, Classifier ensembles for land cover mapping using multitemporal SAR imagery, ISPRS Journal of Photogrammetry and Remote Sensing 64 (5) (2009) 450–457.

[31] H. Balzter, B. Cole, C. Thiel, C. Schmullius, Mapping CORINE land cover from Sentinel-1A SAR and SRTM digital elevation model data using random forests, Remote Sensing 7 (11) (2015) 14876–14898. doi:10.3390/rs71114876.

[32] S.-E. Park, Variations of microwave scattering properties by seasonal freeze/thaw transition in the permafrost active layer observed by ALOS PALSAR polarimetric data, Remote Sensing 7 (12) (2015) 17135–17148. doi:10.3390/rs71215874.

[33] M. C. Dobson, L. E. Pierce, F. T. Ulaby, Knowledge-based land-cover classification using ERS-1/JERS-1 SAR composites, IEEE Transactions on Geoscience and Remote Sensing 34 (1) (1996) 83–99. doi:10.1109/36.481896.

[34] L. Sirro, T. Häme, Y. Rauste, J. Kilpi, J. Hämäläinen, K. Gunia, B. de Jong, F. Paz Pellat, Potential of different optical and SAR data in forest and land cover classification to support REDD+ MRV, Remote Sensing 10 (6) (2018). doi:10.3390/rs10060942.

[35] J. D. T. De Alban, G. M. Connette, P. Oswald, E. L. Webb, Combined Landsat and L-band SAR data improves land cover classification and change detection in dynamic tropical landscapes, Remote Sensing 10 (2) (2018). doi:10.3390/rs10020306.

[36] N. Longepe, P. Rakwatin, O. Isoguchi, M. Shimada, Y. Uryu, K. Yulianto, Assessment of ALOS PALSAR 50 m orthorectified FBD data for regional land cover classification by support vector machines, IEEE Transactions on Geoscience and Remote Sensing 49 (6) (2011) 2135–2150. doi:10.1109/TGRS.2010.2102041.

[37] T. Esch, A. Schenk, T. Ullmann, M. Thiel, A. Roth, S. Dech, Characterization of land cover types in TerraSAR-X images by combined analysis of speckle statistics and intensity information, IEEE Transactions on Geoscience and Remote Sensing 49 (6) (2011) 1911–1925. doi:10.1109/TGRS.2010.2091644.

[38] J. W. Cable, J. M. Kovacs, J. Shang, X. Jiao, Multi-temporal polarimetric RADARSAT-2 for land cover monitoring in northeastern Ontario, Canada, Remote Sensing 6 (3) (2014) 2372–2392. doi:10.3390/rs6032372. URL http://www.mdpi.com/2072-4292/6/3/2372.

[39] X. Niu, Y. Ban, Multi-temporal RADARSAT-2 polarimetric SAR data for urban land-cover classification using an object-based support vector machine and a rule-based approach, International Journal of Remote Sensing 34 (1) (2013) 1–26. doi:10.1080/01431161.2012.700133.


[40] T. L. Evans, M. Costa, K. Telmer, T. S. F. Silva, Using ALOS/PALSAR and RADARSAT-2 to map land cover and seasonal inundation in the Brazilian Pantanal, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 3 (4) (2010) 560–575. doi:10.1109/JSTARS.2010.2089042.

[41] P. Lumsdon, S. R. Cloude, G. Wright, Polarimetric classification of land cover for Glen Affric radar project, IEE Proceedings - Radar, Sonar and Navigation 152 (6) (2005) 404–412. doi:10.1049/ip-rsn:20041313.

[42] C. da Costa Freitas, L. de Souza Soler, S. J. S. Sant'Anna, L. V. Dutra, J. R. Dos Santos, J. C. Mura, A. H. Correia, Land use and land cover mapping in the Brazilian Amazon using polarimetric airborne P-band SAR data, IEEE Transactions on Geoscience and Remote Sensing 46 (10) (2008) 2956–2970.

[43] G. Li, D. Lu, E. Moran, L. Dutra, M. Batistella, A comparative analysis of ALOS PALSAR L-band and RADARSAT-2 C-band data for land-cover classification in a tropical moist region, ISPRS Journal of Photogrammetry and Remote Sensing 70 (2012) 26–38. doi:10.1016/j.isprsjprs.2012.03.010.

[44] N. Park, K. Chi, Integration of multitemporal/polarization C-band SAR data sets for land-cover classification, International Journal of Remote Sensing 29 (16) (2008) 4667–4688. doi:10.1080/01431160801947341.

[45] E. Tomppo, O. Antropov, J. Praks, Cropland classification using Sentinel-1 time series: Methodological performance and prediction uncertainty assessment, Remote Sensing 11 (21) (2019). doi:10.3390/rs11212480.

[46] D. B. Nguyen, A. Gruber, W. Wagner, Mapping rice extent and cropping scheme in the Mekong Delta using Sentinel-1A data, Remote Sensing Letters 7 (12) (2016) 1209–1218.

[47] A. Veloso, S. Mermoz, A. Bouvet, T. Le Toan, M. Planells, J.-F. Dejoux, E. Ceschia, Understanding the temporal behavior of crops using Sentinel-1 and Sentinel-2-like data for agricultural applications, Remote Sensing of Environment 199 (2017) 415–426.

[48] G. Satalino, A. Balenzano, F. Mattia, M. W. Davidson, C-band SAR data for mapping crops dominated by surface or volume scattering, IEEE Geoscience and Remote Sensing Letters 11 (2) (2013) 384–388.

[49] F. Vicente-Guijalba, A. Jacob, J. M. Lopez-Sanchez, C. Lopez-Martinez, J. Duro, C. Notarnicola, D. Ziolkowski, A. Mestre-Quereda, E. Pottier, J. J. Mallorquí, M. Lavalle, M. Engdahl, SInCohMap: Land-cover and vegetation mapping using multi-temporal Sentinel-1 interferometric coherence, in: IGARSS 2018 - 2018 IEEE International Geoscience and Remote Sensing Symposium, 2018, pp. 6631–6634. doi:10.1109/IGARSS.2018.8517926.

[50] S. Ge, O. Antropov, W. Su, H. Gu, J. Praks, Deep recurrent neural networks for land-cover classification using Sentinel-1 InSAR time series, in: IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium, 2019, pp. 473–476. doi:10.1109/IGARSS.2019.8900088.

[51] Y. LeCun, Y. Bengio, et al., Convolutional networks for images, speech, and time series, The Handbook of Brain Theory and Neural Networks 3361 (10) (1995) 1995.

[52] X. X. Zhu, D. Tuia, L. Mou, G.-S. Xia, L. Zhang, F. Xu, F. Fraundorfer, Deep learning in remote sensing: a review, arXiv preprint arXiv:1710.03959 (2017).

[53] M. Mahdianpari, B. Salehi, M. Rezaee, F. Mohammadimanesh, Y. Zhang, Very deep convolutional neural networks for complex land cover mapping using multispectral remote sensing imagery, Remote Sensing 10 (7) (2018) 1119.

[54] L. Zhang, L. Zhang, B. Du, Deep learning for remote sensing data: A technical tutorial on the state of the art, IEEE Geoscience and Remote Sensing Magazine 4 (2) (2016) 22–40.

[55] J. Zhang, P. Zhong, Y. Chen, S. Li, L1/2-regularized deconvolution network for the representation and restoration of optical remote sensing images, IEEE Transactions on Geoscience and Remote Sensing 52 (5) (2014) 2617–2627.

[56] X. Chen, S. Xiang, C.-L. Liu, C.-H. Pan, Aircraft detection by deep belief nets, in: Pattern Recognition (ACPR), 2013 2nd IAPR Asian Conference on, IEEE, 2013, pp. 54–58.

[57] X. Chen, S. Xiang, C.-L. Liu, C.-H. Pan, Vehicle detection in satellite images by hybrid deep convolutional neural networks, IEEE Geoscience and Remote Sensing Letters 11 (10) (2014) 1797–1801.

[58] Y. Liu, G. Cao, Q. Sun, M. Siegel, Hyperspectral classification via deep networks and superpixel segmentation, International Journal of Remote Sensing 36 (13) (2015) 3459–3482. doi:10.1080/01431161.2015.1055607.

[59] J. Wang, Q. Qin, Z. Li, X. Ye, J. Wang, X. Yang, X. Qin, Deep hierarchical representation and segmentation of high resolution remote sensing images, in: Geoscience and Remote Sensing Symposium (IGARSS), 2015 IEEE International, IEEE, 2015, pp. 4320–4323.

[60] D. Tuia, R. Flamary, N. Courty, Multiclass feature learning for hyperspectral image classification: Sparse and hierarchical solutions, ISPRS Journal of Photogrammetry and Remote Sensing 105 (2015) 272–285.


[61] F. Hu, G.-S. Xia, J. Hu, L. Zhang, Transferring deep convolutional neural networks for the scene classification of high-resolution remote sensing imagery, Remote Sensing 7 (11) (2015) 14680–14707.

[62] O. A. B. Penatti, K. Nogueira, J. A. dos Santos, Do deep features generalize from everyday objects to remote sensing and aerial scenes domains?, in: 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2015, pp. 44–51. doi:10.1109/CVPRW.2015.7301382.

[63] F. P. Luus, B. P. Salmon, F. Van den Bergh, B. T. J. Maharaj, Multiview deep learning for land-use classification, IEEE Geoscience and Remote Sensing Letters 12 (12) (2015) 2448–2452.

[64] F. Zhang, B. Du, L. Zhang, Scene classification via a gradient boosting random convolutional network framework, IEEE Transactions on Geoscience and Remote Sensing 54 (3) (2016) 1793–1802.

[65] T. Ishii, R. Nakamura, H. Nakada, Y. Mochizuki, H. Ishikawa, Surface object recognition with CNN and SVM in Landsat 8 images, in: 2015 14th IAPR International Conference on Machine Vision Applications (MVA), 2015, pp. 341–344. doi:10.1109/MVA.2015.7153200.

[66] N. Kussul, M. Lavreniuk, S. Skakun, A. Shelestov, Deep learning classification of land cover and crop types using remote sensing data, IEEE Geoscience and Remote Sensing Letters 14 (5) (2017) 778–782.

[67] Y. Chen, Z. Lin, X. Zhao, G. Wang, Y. Gu, Deep learning-based classification of hyperspectral data, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 7 (6) (2014) 2094–2107.

[68] G. Wu, X. Shao, Z. Guo, Q. Chen, W. Yuan, X. Shi, Y. Xu, R. Shibasaki, Automatic building segmentation of aerial imagery using multi-constraint fully convolutional networks, Remote Sensing 10 (3) (2018) 407.

[69] Y. Duan, F. Liu, L. Jiao, P. Zhao, L. Zhang, SAR image segmentation based on convolutional-wavelet neural network and Markov random field, Pattern Recognition 64 (2017) 255–267.

[70] F. Mohammadimanesh, B. Salehi, M. Mahdianpari, E. Gill, M. Molinier, A new fully convolutional neural network for semantic segmentation of polarimetric SAR imagery in complex land cover ecosystem, ISPRS Journal of Photogrammetry and Remote Sensing 151 (2019) 223–236.

[71] L. Wang, X. Xu, H. Dong, R. Gui, F. Pu, Multi-pixel simultaneous classification of PolSAR image using convolutional neural networks, Sensors 18 (3) (2018) 769.


[72] M. Ahishali, S. Kiranyaz, T. Ince, M. Gabbouj, Dual and single polarized SAR image classification using compact convolutional neural networks, Remote Sensing 11 (11) (2019) 1340.

[73] Z. Li, Z. Yang, H. Xiong, Homogeneous region segmentation for SAR images based on two steps segmentation algorithm, in: Computers, Communications, and Systems (ICCCS), International Conference on, IEEE, 2015, pp. 196–200.

[74] V. Badrinarayanan, A. Kendall, R. Cipolla, SegNet: A deep convolutional encoder-decoder architecture for image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence 39 (12) (2017) 2481–2495.

[75] H. Zhao, J. Shi, X. Qi, X. Wang, J. Jia, Pyramid scene parsing network, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2881–2890.

[76] C. Yu, J. Wang, C. Peng, C. Gao, G. Yu, N. Sang, BiSeNet: Bilateral segmentation network for real-time semantic segmentation, in: Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 325–341.

[77] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, H. Adam, Encoder-decoder with atrous separable convolution for semantic image segmentation, arXiv preprint arXiv:1802.02611 (2018).

[78] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, A. L. Yuille, DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs, IEEE Transactions on Pattern Analysis and Machine Intelligence 40 (4) (2018) 834–848.

[79] O. Ronneberger, P. Fischer, T. Brox, U-Net: Convolutional networks for biomedical image segmentation, in: Medical Image Computing and Computer-Assisted Intervention (MICCAI), Vol. 9351 of LNCS, Springer, 2015, pp. 234–241, (available on arXiv:1505.04597 [cs.CV]). URL http://lmb.informatik.uni-freiburg.de/Publications/2015/RFB15a

[80] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, H. Adam, MobileNets: Efficient convolutional neural networks for mobile vision applications, arXiv preprint arXiv:1704.04861 (2017).

[81] T. Pohlen, A. Hermans, M. Mathias, B. Leibe, Full-resolution residual networks for semantic segmentation in street scenes, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4151–4160.


[82] S. Jégou, M. Drozdzal, D. Vazquez, A. Romero, Y. Bengio, The one hundred layers tiramisu: Fully convolutional DenseNets for semantic segmentation, in: Computer Vision and Pattern Recognition Workshops (CVPRW), 2017 IEEE Conference on, IEEE, 2017, pp. 1175–1183.

[83] O. Antropov, Y. Rauste, A. Lönnqvist, T. Häme, PolSAR mosaic normalization for improved land-cover mapping, IEEE Geoscience and Remote Sensing Letters 9 (6) (2012) 1074–1078.

[84] J. Long, E. Shelhamer, T. Darrell, Fully convolutional networks for semantic segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431–3440.

[85] D. H. Hubel, T. N. Wiesel, Receptive fields, binocular interaction and functional architecture in the cat's visual cortex, The Journal of Physiology 160 (1) (1962) 106–154.

[86] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al., ImageNet large scale visual recognition challenge, International Journal of Computer Vision 115 (3) (2015) 211–252.

[87] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.

[88] G. Huang, Z. Liu, L. Van Der Maaten, K. Q. Weinberger, Densely connected convolutional networks, in: CVPR, Vol. 1, 2017, p. 3.

[89] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, Going deeper with convolutions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1–9.

[90] S. Ji, W. Xu, M. Yang, K. Yu, 3D convolutional neural networks for human action recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (1) (2013) 221–231.

[91] T. N. Sainath, A.-r. Mohamed, B. Kingsbury, B. Ramabhadran, Deep convolutional neural networks for LVCSR, in: Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, IEEE, 2013, pp. 8614–8618.

[92] D. Small, L. Zuberbühler, A. Schubert, E. Meier, Terrain-flattened gamma nought Radarsat-2 backscatter, Canadian Journal of Remote Sensing 37 (5) (2012) 493–499.

[93] P. Härmä, R. Teiniranta, M. Törmä, R. Repo, E. Järvenpää, M. Kallio, The production of Finnish CORINE land cover 2000 classification, XXth ISPRS Congress, Istanbul, Turkey (2004).


[94] P. Härmä, R. Teiniranta, M. Törmä, R. Repo, E. Järvenpää, M. Kallio, Finnish CORINE land cover 2000 classification, XXth ISPRS Congress, Anchorage, US (2004).

[95] A. Garcia-Garcia, S. Orts-Escolano, S. Oprea, V. Villena-Martinez, J. Garcia-Rodriguez, A review on deep learning techniques applied to semantic segmentation, arXiv preprint arXiv:1704.06857 (2017).

[96] F. Chollet, Xception: Deep learning with depthwise separable convolutions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1251–1258.

[97] O. Ronneberger, P. Fischer, T. Brox, U-Net: Convolutional networks for biomedical image segmentation, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2015, pp. 234–241.

[98] L.-C. Chen, G. Papandreou, F. Schroff, H. Adam, Rethinking atrous convolution for semantic image segmentation, arXiv preprint arXiv:1706.05587 (2017).

[99] Y. Bengio, Deep learning of representations for unsupervised and transfer learning, in: Proceedings of ICML Workshop on Unsupervised and Transfer Learning, 2012, pp. 17–36.

[100] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, et al., TensorFlow: A system for large-scale machine learning, in: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), 2016, pp. 265–283.

[101] H. Costa, G. M. Foody, D. S. Boyd, Supervised methods of image segmentation accuracy assessment in land cover mapping, Remote Sensing of Environment 205 (2018) 338–351.

[102] G. Csurka, D. Larlus, F. Perronnin, F. Meylan, What is a good evaluation measure for semantic segmentation?, in: BMVC, Vol. 27, Citeseer, 2013, p. 2013.

[103] J. Cohen, A coefficient of agreement for nominal scales, Educational and Psychological Measurement 20 (1) (1960) 37–46.

[104] O. Antropov, Y. Rauste, T. Häme, Volume scattering modeling in PolSAR decompositions: Study of ALOS PALSAR data over boreal forest, IEEE Transactions on Geoscience and Remote Sensing 49 (10) (2011) 3838–3848.


• 1 Introduction
  • 1.1 Land Cover Mapping with SAR Imagery
  • 1.2 Deep Learning in Remote Sensing
  • 1.3 Study goals
• 2 Deep Learning Terminology
• 3 Materials and methods
  • 3.1 Study site
  • 3.2 SAR data
  • 3.3 Reference data
  • 3.4 Semantic Segmentation Models
    • 3.4.1 BiSeNet (Bilateral Segmentation Network)
    • 3.4.2 SegNet (Encoder-Decoder-Skip)
    • 3.4.3 Mobile U-Net
    • 3.4.4 DeepLab-V3+
    • 3.4.5 FRRN-B (Full-Resolution Residual Networks)
    • 3.4.6 PSPNet (Pyramid Scene Parsing Network)
    • 3.4.7 FC-DenseNet (Fully Convolutional DenseNets)
  • 3.5 Training approach
  • 3.6 Experimental Setup
    • 3.6.1 SAR Data Preprocessing for Deep Learning
    • 3.6.2 Train/Development and Test (Accuracy Assessment) Dataset
    • 3.6.3 Data Augmentation
    • 3.6.4 Implementation
    • 3.6.5 Hardware and Training Setup
  • 3.7 Evaluation Metrics
• 4 Results and Discussion
  • 4.1 Classification Performance
  • 4.2 Computational Performance
  • 4.3 Comparison to Similar Work
  • 4.4 Outlook and Future Work
• 5 Conclusion

Table 1: Terminology for the main tasks in computer vision and its use in the deep learning versus remote sensing communities.

Deep learning | Remote sensing | Task description
Classification [13] | Image Annotation, Scene Understanding, Scene Classification | Assigning a whole image to a class based on what is (mainly) represented in it, for example a ship, oil tank, sea, or land.
Object Detection/Localization/Recognition [15] | Automatic Target Recognition | Detecting (and localizing) the presence of particular objects in an image. These algorithms can detect several objects in the given image, for instance, ship detection in SAR images.
Semantic Segmentation [84] | Image Classification, Clustering | Assigning a class to each pixel in an image based on which image object or region it belongs to. These algorithms not only detect and localize objects in the image, but also output their exact areas and boundaries.

expressed, such as corners and shapes, and this helps to express the high-level representations, such as object elements and their identities [15].

The deep learning models in computer vision can be grouped according to their main task into three categories; in Table 1 we provide a description of those categories. However, the deep learning terminology for those tasks does not always correspond well to the terminology used in the remote sensing community. Relevant to our task, a number of remote sensing studies use the term classification in the context of land cover mapping, inherently meaning pixel- or region-based classification, which in the deep learning terminology corresponds to semantic segmentation. In Table 1 we list the corresponding terminology that we encountered being used for each task in both the deep learning and remote sensing communities. This is helpful to disambiguate between different tasks and to recognize when the two domains refer to the same task. In the present study, the focus is on land cover mapping. Hence, we tackle semantic segmentation in the deep learning terminology, i.e., image classification (pixel-wise classification) in the remote sensing terminology.

Convolutional Neural Networks (CNNs) [12, 13] are the deep learning model


that has transformed the computer vision field. Initially, CNNs were defined to tackle the image classification (deep learning terminology) task. Their structure is inspired by the visual perception of mammals [85]. CNNs are named after one of the most important operations, which is particular to them compared to other neural networks, i.e., convolutions. Mathematically, a convolution is a combination of two other functions. A convolution is applied on the image by sliding a filter (kernel) of a given size k × k, which is usually small compared to the original image size. Different-purpose filters are designed; for example, a filter can serve as a vertical edge detector. Application of such a convolution operation on an image results in a feature map. Another common operation, usually applied after a convolution, is pooling. Pooling reduces the size of the feature map while providing robustness to the extracted features. Common CNNs end with a fully connected layer, which is used for final predictions, commonly for image classification. By employing a large number of convolutional layers (depth), CNNs are able to extract gradually more complex and abstract features. The first CNN model to demonstrate impressive effectiveness in image classification (of hand-written digits) was LeNet [12]. Several years later, Krizhevsky et al. [13] developed AlexNet, the deep CNN that dramatically pushed the limits of classification accuracy on the famous ImageNet computer vision challenge [86]. Since then, a variety of CNN-based models have been proposed; some notable examples are the VGG network [14], ResNet [87], DenseNet [88], and Inception [89]. The effectiveness of CNNs has also been proven in various real-world applications [90, 91].
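To make the convolution and pooling operations concrete, the following minimal sketch applies a 3 × 3 vertical edge-detector filter to a single-band image and then max-pools the resulting feature map. It uses TensorFlow, which we also use in our implementation; the image and filter values are illustrative assumptions.

```python
import numpy as np
import tensorflow as tf

# Illustrative single-band "image" (batch=1, height=8, width=8, channels=1).
image = tf.constant(np.random.rand(1, 8, 8, 1), dtype=tf.float32)

# A hand-crafted 3x3 vertical edge-detector kernel (Sobel-like),
# shaped (height, width, in_channels, out_channels).
kernel = tf.constant(
    [[-1., 0., 1.],
     [-2., 0., 2.],
     [-1., 0., 1.]], dtype=tf.float32)
kernel = tf.reshape(kernel, [3, 3, 1, 1])

# Convolution: slide the kernel over the image, producing a feature map.
feature_map = tf.nn.conv2d(image, kernel, strides=1, padding="SAME")

# Pooling: reduce the feature map size (here 2x2 max pooling).
pooled = tf.nn.max_pool2d(feature_map, ksize=2, strides=2, padding="VALID")

print(feature_map.shape)  # (1, 8, 8, 1)
print(pooled.shape)       # (1, 4, 4, 1)
```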

Once CNNs had proven their effectiveness in classifying images, Long et al. [84] were the first to show how a given CNN model can be augmented to make it suitable for the semantic segmentation task: they proposed the Fully Convolutional Neural Network (FCN) framework. This generic architecture can be used to adapt any CNN used for classification into a segmentation model. Namely, the authors have shown that by replacing the last fully connected layer with appropriate convolutional layers, so that they upsample and restore the resolution of the input at the output layer, CNNs can be transformed to classify each individual pixel (instead of the whole image). The basic idea is as follows. The encoder is used to learn the feature maps and is usually based on a pre-trained deep CNN for classification, such as ResNet, VGG, or Inception. The decoder part serves to upsample the discriminative features that the encoder has learned from the coarse-level feature map to the fine pixel level. Long et al. [84] have shown that this upsampling (backward) computation can be efficiently performed using backward convolutions (deconvolutions). Moreover, this means that specific CNN models, such as those mentioned above, can all be incorporated in the FCN framework for segmentation, giving rise to FCN-AlexNet [84], FCN-ResNet [87], FCN-VGG16 [84], FCN-DenseNet [82], etc. We present a diagram of the generic FCN architecture in Figure 1.
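As a minimal illustration of this encoder-decoder idea (a sketch only, not the exact FCN of [84]; layer sizes and the number of classes are assumptions), a classification backbone can be turned into a per-pixel classifier by replacing the dense head with a 1 × 1 convolution and a transposed convolution that restores the input resolution:

```python
import tensorflow as tf

NUM_CLASSES = 5  # e.g., the five CLC Level-1 classes

inputs = tf.keras.Input(shape=(512, 512, 3))

# Encoder: a small stack of conv + pooling blocks (downsamples by 4).
x = tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
x = tf.keras.layers.MaxPooling2D(2)(x)
x = tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu")(x)
x = tf.keras.layers.MaxPooling2D(2)(x)

# Replace the fully connected head with a 1x1 convolution producing
# class scores at the coarse resolution.
x = tf.keras.layers.Conv2D(NUM_CLASSES, 1)(x)

# Decoder: a transposed convolution ("deconvolution") upsamples the
# coarse score map back to the input resolution.
outputs = tf.keras.layers.Conv2DTranspose(
    NUM_CLASSES, kernel_size=8, strides=4, padding="same",
    activation="softmax")(x)

model = tf.keras.Model(inputs, outputs)
model.summary()
```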


Figure 1: The architecture of Fully Convolutional Neural Networks (FCNs) [84].

3 Materials and methods

Here, we first describe the study site, the SAR data, and the reference data. This is followed by an in-depth description of the deep learning terminology and the models used in the study. We finish with a description of the experimental setup and the evaluation metrics.

3.1 Study site

Our study site covers the area of Finland at latitudes from 61° to 67.5°. The processed area is shown in Figure 2. The study area includes central and northern areas of Finland, covered primarily by boreal forestland with inclusions of water bodies (primarily lakes), urban settlements and agricultural areas, as well as marshland and open bogs. We have omitted Lapland due to potential snow cover during the months of data acquisition. The terrain height variation is moderate and mostly within the 100–300 meters range.

3.2 SAR data

Presently, Sentinel-1 is a C-band SAR dual-satellite system with two satellites orbiting 180° apart [11], launched in 2014 and 2016, respectively. The operational acquisition modes are Stripmap (SM), Interferometric Wide-Swath (IW), Extra Wide Swath (EW), and Wave Mode (WV). The IW mode is the default mode over land, providing a 250 km wide swath composed of three sub-swaths, with a single-look image at 5 m by 20 m spatial resolution. It uses the so-called TOPS (Terrain Observation with Progressive Scan) SAR mode.

The SAR data acquired by Sentinel-1 satellites in IW mode are used in the study. Altogether, 14 Sentinel-1A images acquired during the summers of 2015 and 2016 were used, more concretely, during June, July and August of those two years. Their geographical coverage is schematically shown in Figure 2.


Figure 2: Study area in Finland with reference CORINE land cover data and schematic location of areas used for model training and accuracy assessment.


Original scenes were downloaded as Level-1 Ground Range Detected (GRD) products. They represent focused SAR data that has been detected, multi-looked, and projected to ground range using an Earth ellipsoid. They were orthorectified in ESA SNAP S1TBX software using a local digital terrain model (with 2 meters resolution) available from the National Land Survey of Finland. The pixel spacing of orthorectified scenes was set to 20 meters. Orthorectification included terrain flattening to obtain backscatter in gamma-nought format [92]. The scenes were further re-projected to the ETRS89 / ETRS-TM35FIN projection (EPSG:3067) and resampled to a final pixel size of 20 metres.

3.3 Reference data

In Finland, the Finnish Environment Institute (SYKE) is responsible for the production of the CORINE maps. While for most of the EU territory the CORINE mask of 100 m × 100 m spatial resolution is available, national institutions might choose to create more precise maps, and SYKE in particular had produced a 20 m × 20 m spatial resolution mask for Finland (Figure 3). Since then, updates have been produced regularly, the latest one at the time of this study, which we used, being CLC2012. There are 48 different land use classes in the map, which can be hierarchically grouped into 4 CLC levels. In detail, there are 30 classes on CLC Level-3, 15 classes on CLC Level-2, and 5 top CLC Level-1 classes. According to the information provided by SYKE, the accuracy of the CLC Level-3 is 61%, of the CLC Level-2 83%, and of the CLC Level-1 it is 93%. The selected classes and their corresponding color codes used for our segmentation results are shown in Table 2.

Until the most recent CORINE production round, EEA member countries adopted national approaches for the production of CORINE. The EEA Technical Guidelines include manual digitalization of land cover change based on visual interpretation of optical satellite imagery. In Finland, the European CLC was not applicable for the majority of national users due to the large minimal mapping unit (MMU). Thus, a national version was produced with a somewhat modified nomenclature of classes [93, 94]. The national high-resolution CLC2012 data is in raster format of 20 m, with corresponding MMU. In the provision of the 2012 update of CLC, obtaining optical imagery over Scandinavia and Britain was particularly challenging because of the frequent clouds, thus calling for the use of radar imagery to meet user requirements on accuracy and coverage [31]. The CORINE map itself is built from high resolution satellite images acquired primarily during the summer and, to a smaller extent, during the spring months [2].

3.4 Semantic Segmentation Models

We selected the following seven state-of-the-art [95] semantic segmentation models to test on our land cover mapping task: SegNet [74], PSPNet [75], BiSeNet [76], DeepLabV3+ [77, 78], U-Net [79, 80], FRRN-B [81], and FC-DenseNet [82]. The models were selected to cover a wide set of approaches to semantic segmentation. In the following, we describe the specific architecture of each


Table 2: CORINE CLC Level-1 classes and their color codes used in our classification results.

class | R | G | B | color
Water bodies (500) | 0 | 191 | 255 | blue
Peatland, bogs and marshes (400) | 173 | 216 | 230 | light blue
Forested areas (300) | 127 | 255 | 0 | green
Agricultural areas (200) | 222 | 184 | 135 | brown
Urban fabric (100) | 128 | 0 | 0 | red

Figure 3: Zoomed-in area fragment with our reference data, i.e., CORINE, shown on top (left), along with the Google Earth layer (right).

of these DL models. We will use the following common abbreviations: conv for convolution operation, concat for concatenation, max pool for max pooling operation, BN for batch normalisation, and ReLU for the rectified linear unit activation function.

3.4.1 BiSeNet (Bilateral Segmentation Network)

The BiSeNet model is designed to decouple the functions of encoding additional spatial information and enlarging the receptive field, which are fundamental to achieving good segmentation performance. As can be seen in Figure 4, there are two main components in this model: the Spatial Path (SP) and the Context Path (CP). The Spatial Path serves to encode rich spatial information. The Context Path serves to provide a sufficient receptive field, and uses global average pooling and a pre-trained Xception [96] or ResNet [87] as the backbone. The goal of the creators was not only to obtain superior performance, but to achieve a balance between speed and performance. Hence, BiSeNet is a relatively fast semantic segmentation model.

3.4.2 SegNet (Encoder-Decoder-Skip)

Similarly to BiSeNet, SegNet is also designed with computational performance in mind, this time particularly during inference. Because of this, the network has a significantly smaller number of trainable parameters compared to most of the other architectures. The encoder in SegNet is based on VGG16: it consists of its first 13 convolutional layers, while the fully connected layers are


Figure 4: The architecture of BiSeNet. ARM stands for the Attention Refinement Module and FFM for the Feature Fusion Module introduced in the model's paper [76].

Figure 5: The architecture of SegNet-based Encoder-Decoder with Skip connections [74]. Blue tiles represent Convolution + Batch Normalisation + ReLU, green tiles represent Pooling, red tiles Upsampling, and yellow tiles a softmax operation.

omitted. Hence, the novelty of this network lies in its decoder part, as follows. The decoder consists of one decoder layer for each encoder layer, and so it also has 13 layers. Each individual decoder layer utilizes max-pooling indices memorized from its corresponding encoder feature map. The authors have shown that this enhances boundary delineation between classes. Finally, the decoder output is sent to a multi-class soft-max function, yielding a classification for each


pixel (see Figure 5).
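The memorized max-pooling indices can be illustrated with TensorFlow's tf.nn.max_pool_with_argmax, which returns both the pooled values and the flat positions of the maxima; a decoder can then scatter features back to those positions. This is a simplified sketch under the assumption of a single image per batch; the unpool helper is our own illustrative function, not SegNet's published code:

```python
import tensorflow as tf

# Encoder side: pool and remember where each maximum came from.
x = tf.random.normal([1, 8, 8, 16])  # (batch=1, h, w, channels)
pooled, argmax = tf.nn.max_pool_with_argmax(
    x, ksize=2, strides=2, padding="SAME")

def unpool(pooled, argmax, output_shape):
    """Scatter pooled values back to their original positions
    (assumes batch size 1, matching argmax's default flat indexing)."""
    n = output_shape[0] * output_shape[1] * output_shape[2] * output_shape[3]
    flat_values = tf.reshape(pooled, [-1])
    flat_indices = tf.reshape(argmax, [-1, 1])
    flat_output = tf.scatter_nd(flat_indices, flat_values, [n])
    return tf.reshape(flat_output, output_shape)

# Decoder side: upsample using the memorized indices; all other
# positions stay zero and are filled in by subsequent convolutions.
upsampled = unpool(pooled, argmax, [1, 8, 8, 16])
print(upsampled.shape)  # (1, 8, 8, 16)
```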

3.4.3 Mobile U-Net

Mobile U-Net is based on the U-Net [97] semantic segmentation architecture shown in Figure 6. In designing U-Net, the fully convolutional approach was generally employed, with the following modification: the upsampling part of the architecture has no fully connected layers, but is nearly symmetrical to the feature extraction part, due to the use of similar feature maps. This results in a u-shaped architecture (see Figure 6), hence the name of the model. While originally developed for biomedical images, the U-Net architecture has proven successful for image segmentation in other domains as well. Here, we somewhat modify the U-Net architecture according to the MobileNets [80] framework to improve its efficiency. In particular, the MobileNets framework uses Depthwise Separable Convolutions, a form which factorizes standard convolutions (e.g., 3 × 3) into a depthwise convolution (applied separately to each input band) and a pointwise (1 × 1) convolution to combine the outputs of the depthwise convolution.
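The factorization can be sketched as follows (illustrative channel counts; Keras also offers SeparableConv2D, which fuses the two steps):

```python
import tensorflow as tf

inputs = tf.keras.Input(shape=(512, 512, 32))

# Standard convolution: one 3x3 kernel spanning all 32 input bands
# per output channel (32*64 kernels of size 3x3).
standard = tf.keras.layers.Conv2D(64, 3, padding="same")(inputs)

# Depthwise separable factorization:
# 1) depthwise 3x3 convolution, applied to each input band separately;
x = tf.keras.layers.DepthwiseConv2D(3, padding="same")(inputs)
# 2) pointwise 1x1 convolution, combining the depthwise outputs.
separable = tf.keras.layers.Conv2D(64, 1, padding="same")(x)

# The factorized form needs far fewer parameters and multiplications,
# which is the efficiency gain exploited by MobileNets.
```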

Figure 6: The architecture of U-Net [97].

3.4.4 DeepLab-V3+

DeepLab-V3+ [77] is an improved version of DeepLab-V3 [98], while the latter is an improved version of the original DeepLab [78] model. This segmentation model does not follow the FCN framework like the previously discussed


models. The main features that distinguish the DeepLab model from FCNs are the atrous convolutions for upsampling and the application of probabilistic machine learning models, concretely conditional random fields (CRFs), for a finer localization accuracy in the final fully connected layer. Atrous convolutions, in particular, allow enlarging the context from which the next-layer feature maps are learned, while preserving the number of parameters (and thus the same efficiency). Using a chain of atrous convolutions allows computing the final output layer of a CNN at an arbitrarily high resolution (removing the need for the upsampling part as used in FCNs). In the follow-up work proposing DeepLab-V3, Chen et al. [98] change the approach to atrous convolutions to gradually double the atrous rates, and show that the adapted version of their algorithm outperforms the previous one even without the fully connected CRF layer. Finally, in the newest adaptation of the model, called DeepLab-V3+, Chen et al. [77] turn to an approach similar to the FCNs, i.e., they add a decoder module to the architecture (see Figure 7). That is, they employ the features extracted by the DeepLab-V3 module in the encoder part, and add a decoder module consisting of 1 × 1 and 3 × 3 convolutions.
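In Keras terms, an atrous (dilated) convolution is a standard convolution with a dilation rate; the sketch below (illustrative sizes) shows how doubling the rate enlarges the receptive field without adding parameters:

```python
import tensorflow as tf

inputs = tf.keras.Input(shape=(128, 128, 64))

# Standard 3x3 convolution: receptive field 3x3.
y1 = tf.keras.layers.Conv2D(64, 3, padding="same", dilation_rate=1)(inputs)

# Atrous 3x3 convolutions with rates 2 and 4: the kernel samples the
# input with gaps, covering 5x5 and 9x9 neighbourhoods respectively,
# while still having only 3x3 weights per channel.
y2 = tf.keras.layers.Conv2D(64, 3, padding="same", dilation_rate=2)(y1)
y3 = tf.keras.layers.Conv2D(64, 3, padding="same", dilation_rate=4)(y2)
```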

Figure 7: The architecture of DeepLabV3+ [77].

3.4.5 FRRN-B (Full-Resolution Residual Networks)

As we have seen, most of the semantic segmentation architectures are based on some form of an FCN, and so they utilize existing classification networks, such as ResNet or VGG16, as encoders. We also discussed the main reason for such


Figure 8: The architecture of FRRN-B. RU_n and FRRU_n stand for residual units and full-resolution residual units with n-channel convolutions, respectively. FRRUs simultaneously operate on the two streams [81].

approaches, which is to take advantage of the learned weights from those architectures pretrained for the classification task. Nevertheless, one disadvantage of the FCN approach is that the resulting network outputs of the encoder part (particularly after the pooling operations) are at a lower resolution, which deteriorates the localization performance of the overall segmentation model. Pohlen et al. [81] proposed to tackle this by having two parallel network streams processing the input image: a pooling and a residual stream (Figure 8). As the name says, the pooling stream performs successive pooling and then unpooling operations, and it serves to obtain good recognition of the objects and classes. The residual stream computes residuals at the full image resolution, which enables low-level features, i.e., object pixel-level locations, to be propagated to the network output. The name of the model comes from its building blocks, i.e., full-resolution residual units. Each such unit simultaneously operates on the pooling and the residual stream. In the original paper [81], the authors propose two alternative architectures, FRRN-A and FRRN-B, and they show that


FRRN-B achieves superior performance on the Cityscapes benchmark dataset. Hence, we employ the FRRN-B architecture.

3.4.6 PSPNet (Pyramid Scene Parsing Network)

Figure 9: The architecture of PSPNet [75].

Zhao et al. [75] propose the Pyramid Scene Parsing Network as a solution to the challenge of making local predictions based on the local context only, without considering the global image scene. In remote sensing, an example of this challenge could be a model wrongly predicting water with waves in it as the dry vegetation class, because the two appear similar and the model did not consider that these pixels are part of a larger water surface, i.e., it missed the global context. Similarly to the other FCN-based approaches, PSPNet uses a pre-trained classification architecture to extract the feature map, in this case ResNet. The main module of this network is the pyramid pooling, which is enclosed by a square in Figure 9. As can be seen in the figure, this module fuses features at four scales, from the coarse (red) to the fine (green). Hence, the output of each level in the pyramid pooling module contains the feature map of a different resolution. In the end, the different features are stacked together, yielding the final pyramid pooling global feature for predictions.
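A pyramid pooling module can be sketched with average pooling at several bin sizes, followed by upsampling and concatenation (the bin sizes and channel counts here are illustrative assumptions, not the exact PSPNet configuration):

```python
import tensorflow as tf

def pyramid_pooling(features, bin_sizes=(1, 2, 4, 8)):
    """Fuse context at several scales and stack it onto the input."""
    h, w = features.shape[1], features.shape[2]
    branches = [features]
    for bins in bin_sizes:
        # Pool the map down to bins x bins cells, summarizing context.
        x = tf.keras.layers.AveragePooling2D(
            pool_size=(h // bins, w // bins))(features)
        x = tf.keras.layers.Conv2D(64, 1, padding="same")(x)  # reduce depth
        # Upsample back to the original resolution.
        x = tf.keras.layers.UpSampling2D(
            size=(h // bins, w // bins), interpolation="bilinear")(x)
        branches.append(x)
    return tf.keras.layers.Concatenate()(branches)

inputs = tf.keras.Input(shape=(64, 64, 256))  # coarse encoder feature map
fused = pyramid_pooling(inputs)
print(fused.shape)  # (None, 64, 64, 256 + 4*64)
```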

3.4.7 FC-DenseNet (Fully Convolutional DenseNets)

This semantic segmentation algorithm is built using the DenseNet CNN [88] as a basis for the encoder, followed by applying the FCN approach [82]. The specificity of the DenseNet architecture is the presence of blocks where each layer is connected to all other layers in a feed-forward manner. Figure 10 shows the architecture of FC-DenseNet, where the blocks are represented by the Dense Block units. According to [88], such an architecture scales well to hundreds of layers without any optimization issues, while yielding excellent results in classification tasks. In order to efficiently upsample the DenseNet feature maps, Jégou et al. [82] substitute the upsampling convolutions of FCNs with Dense Blocks and Transitions Up. The Transition Up modules consist of transposed convolutions, which are then concatenated with the outputs from the input skip connection (the dashed lines in Figure 10).
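The dense connectivity pattern amounts to concatenating each layer's output with everything that came before it; a minimal sketch (the growth rate and depth are illustrative assumptions):

```python
import tensorflow as tf

def dense_block(x, num_layers=4, growth_rate=16):
    """Each layer sees the concatenation of all previous outputs."""
    for _ in range(num_layers):
        y = tf.keras.layers.BatchNormalization()(x)
        y = tf.keras.layers.ReLU()(y)
        y = tf.keras.layers.Conv2D(growth_rate, 3, padding="same")(y)
        x = tf.keras.layers.Concatenate()([x, y])  # dense connectivity
    return x

inputs = tf.keras.Input(shape=(128, 128, 48))
outputs = dense_block(inputs)
print(outputs.shape)  # (None, 128, 128, 48 + 4*16)
```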


Figure 10: The architecture of FC-DenseNet [82].

3.5 Training approach

To accomplish better segmentation performance, there is an option to pre-train the semantic segmentation models (in particular, their encoder modules) using a larger set of available images of another type (such as natural images). By using a model pre-trained on natural images and continuing training with the limited set of SAR images, the knowledge becomes effectively transferred from the natural-image task to the SAR task [99]. To accomplish such a transfer, we used the models whose encoders were pre-trained for the ImageNet classification task, and fine-tuned them using our SAR dataset (described next).
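In Keras, such a transfer is typically set up by loading an ImageNet-pretrained backbone as the encoder and fine-tuning it together with a randomly initialized decoder. This is a generic sketch of the idea, not the exact setup of the Semantic Segmentation Suite used in this study; the decoder head and the number of frozen layers are assumptions:

```python
import tensorflow as tf

# Encoder: ResNet101 with ImageNet weights, without the classification head.
encoder = tf.keras.applications.ResNet101(
    include_top=False, weights="imagenet", input_shape=(512, 512, 3))

# Optionally freeze early layers and fine-tune only the deeper ones first.
for layer in encoder.layers[:100]:
    layer.trainable = False

# Decoder head (illustrative): upsample encoder features to per-pixel scores.
x = encoder.output  # shape (None, 16, 16, 2048) for a 512x512 input
x = tf.keras.layers.Conv2DTranspose(64, 8, strides=8, padding="same",
                                    activation="relu")(x)
outputs = tf.keras.layers.Conv2DTranspose(5, 8, strides=4, padding="same",
                                          activation="softmax")(x)

model = tf.keras.Model(encoder.input, outputs)
```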

3.6 Experimental Setup

In this section, we first describe how we prepared the SAR images for training with the deep learning models, which are designed for natural images. Then, we provide the details of our implementation.

3.6.1 SAR Data Preprocessing for Deep Learning

Sentinel-1 imagery comes in two polarization channels, each of them being informative about certain types of land cover. Hence, using their combination is expected to yield better land cover mapping results than using either of them independently. Moreover, previous work has shown the benefits of also using


the DEM model for land cover mapping [9]. Hence, as the third layer, we used the DEM of Finland from the National Land Survey.

SAR backscatter for both polarizations was converted to decibels by applying the 10 · log10 transformation. In addition, for the deep learning models, each band should be normalized so that the distribution of the pixel values resembles a Gaussian distribution centered at zero. This is done to yield a faster convergence during the training. To normalize the data, the mean of all pixels is subtracted from each pixel value, and the result is divided by their standard deviation. In addition, given that the semantic segmentation models expect pixel values in the range (0, 255), we scaled the normalized data, and also the DEM values, to this range. Such preprocessed layers are then used to create the image dataset for training.

We named the created dataset SAR RGB-DEM. The naming comes from the process used to create the images in this dataset: one of the two channels of a Sentinel-1 image is assigned to the R and the other to the G channel, while for the third, B channel, we use the DEM layer.
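The preprocessing chain can be sketched as follows (a minimal NumPy version; the array names and the exact scaling of the DEM are assumptions for illustration):

```python
import numpy as np

def to_db(linear):
    """Convert linear backscatter to decibels."""
    return 10.0 * np.log10(np.maximum(linear, 1e-10))

def normalize_to_0_255(band):
    """Zero-mean, unit-variance normalization, then scaling to (0, 255)."""
    z = (band - band.mean()) / band.std()
    z = (z - z.min()) / (z.max() - z.min())
    return (z * 255.0).astype(np.float32)

# vv, vh: 2-D arrays of linear backscatter; dem: 2-D elevation array.
def make_sar_rgb_dem(vv, vh, dem):
    r = normalize_to_0_255(to_db(vv))   # polarization channel 1 -> R
    g = normalize_to_0_255(to_db(vh))   # polarization channel 2 -> G
    b = normalize_to_0_255(dem)         # DEM -> B
    return np.stack([r, g, b], axis=-1)
```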

3.6.2 Train/Development and Test (Accuracy Assessment) Dataset

The images from the SAR RGB-DEM dataset needed to be split into 512 px × 512 px partial images (further in the text called imagelets) for training. Thus, each imagelet represented an area of roughly 10 × 10 km². The first reason for this preprocessing concerns the square shape: some of the selected models required square-shaped images. Other models were flexible with the image shape and size, but we wanted to make the setups for all the models the same, so that their results are comparable. The second reason for the preprocessing concerns the computational capacity: with our hardware setup (described below), this was the largest image size that we could work with.

Upon splitting the SAR RGB-DEM images, we discarded those imagelets that were completely outside the land mass area, as well as those for which we did not have a complete CORINE label (such as if they fell in part outside the Finnish borders). This resulted in more than 7K imagelets of size 512 px × 512 px.
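Tiling a scene into imagelets and filtering out those without full labels might look as follows (a sketch; the no-data check stands in for the land-mass and CORINE-coverage tests, whose exact masks are not shown here):

```python
import numpy as np

TILE = 512

def split_into_imagelets(scene, labels, nodata_value=255):
    """Cut a (H, W, 3) scene and its (H, W) label mask into 512x512 tiles,
    keeping only tiles with a complete CORINE label."""
    tiles = []
    h, w = labels.shape
    for i in range(0, h - TILE + 1, TILE):
        for j in range(0, w - TILE + 1, TILE):
            lab = labels[i:i + TILE, j:j + TILE]
            # Discard tiles with missing labels (e.g., outside borders).
            if np.any(lab == nodata_value):
                continue
            tiles.append((scene[i:i + TILE, j:j + TILE], lab))
    return tiles
```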

Given the geography of Finland, to have representative training data it is useful to include imagelets from both the northern and the southern (including the large cities) parts of the country in the model training. On the other hand, some noticeable differences are found also in the gradient from the east to the west of the country. To achieve a representative dataset, we selected all imagelets between the longitudes of 24° and 28° for the accuracy assessment (model testing), and all the other imagelets for the model training (that is, training and development in the computer vision terminology). In this way, we prevented the situation in which two images of the same area, but acquired at different times, are used one for training and the other one for testing. Images overlapping any border of the introduced strip were discarded. The procedure resulted in 3104 images in the training and development set and 3784 images in the test (accuracy assessment) set. Finally, we used 60% of the training and development set for training and the rest for the development of the deep learning models.


Table 3: The properties of the examined semantic segmentation architectures.

Architecture | Base model | Parameters
BiSeNet | ResNet101 | 24.75M
SegNet | VGG16 | 34.97M
Mobile U-Net | Not applicable | 8.87M
DeepLabV3+ | ResNet101 | 47.96M
FRRN-B | ResNet101 | 24.75M
PSPNet | ResNet101 | 56M
FC-DenseNet | ResNet101 | 9.27M

3.6.3 Data Augmentation

Further, we have employed the data augmentation technique. The main idea behind data augmentation is to enable improved learning by reusing original images with slight transformations, such as rotation, flipping, adding Gaussian noise, or slightly changing the brightness. This provides additional information to the model, and the dataset size is effectively increased. Moreover, an additional benefit of data augmentation is in helping the model to learn some invariant data properties for which no examples are present in the original dataset. Given the sensitivity of the SAR backscatter, we did not want to augment the images in terms of color or brightness, or by adding noise. However, we could safely employ rotations and flipping. For rotations, we only used 90° increments, giving three possible rotated versions of an image. For image flipping, we applied horizontal and vertical flipping, or both at the same time, giving another three possible versions of the original image.4 Notice that our images are square, so the transformations did not change the image dimensions. Finally, we applied the online augmentation, as opposed to the offline version. In the online process, each augmented image is seen only once, and so this process yields a network that generalises better.
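These label-preserving variants can be generated as follows (a NumPy sketch, applied identically to the imagelet and its label mask; the random-draw scheme is an illustrative assumption):

```python
import numpy as np

def augment_pair(image, labels, rng):
    """Randomly rotate by a multiple of 90 degrees and/or flip,
    applying the same transform to the image and its label mask."""
    k = rng.integers(0, 4)            # 0, 90, 180 or 270 degrees
    image, labels = np.rot90(image, k), np.rot90(labels, k)
    if rng.random() < 0.5:            # horizontal flip
        image, labels = np.flip(image, axis=1), np.flip(labels, axis=1)
    if rng.random() < 0.5:            # vertical flip
        image, labels = np.flip(image, axis=0), np.flip(labels, axis=0)
    return image, labels

rng = np.random.default_rng(42)
# augmented_img, augmented_lab = augment_pair(img, lab, rng)
```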

3.6.4 Implementation

To apply the described semantic segmentation models, we adapted the open-source Semantic Segmentation Suite. We used Python with the TensorFlow [100] backend.

3.6.5 Hardware and Training Setup

We trained and tested each of the deep learning models separately on a single GPU (NVIDIA GeForce GTX 1080) on a machine with 32 GB of RAM.

We used the RMSProp optimisation algorithm, a learning rate of 0.0001, and a decay of the learning rate of 0.9954. Each model was trained for an equal number of epochs (500), and during the process the checkpoint for the best

4 The vertical flip operation switches between top-left and bottom-left image origin (reflection along the central horizontal axis), and the horizontal flip switches between top-left and top-right image origin (reflection along the central vertical axis).


model was saved. Then, we used that model for evaluation on the test set, and we report those results.
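In Keras terms, this training setup corresponds roughly to the following (a sketch; the per-epoch decay schedule, steps per epoch, and checkpoint path are assumptions consistent with the stated hyper-parameters):

```python
import tensorflow as tf

STEPS_PER_EPOCH = 1000  # assumption: depends on batch size and dataset size

# Learning rate 0.0001, decayed by a factor of 0.9954 once per epoch.
schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-4,
    decay_steps=STEPS_PER_EPOCH,
    decay_rate=0.9954,
    staircase=True)

optimizer = tf.keras.optimizers.RMSprop(learning_rate=schedule)

# Keep only the best checkpoint seen during the 500 epochs.
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    "best_model.h5", monitor="val_accuracy", save_best_only=True)

# model.compile(optimizer=optimizer,
#               loss="sparse_categorical_crossentropy",
#               metrics=["accuracy"])
# model.fit(train_ds, validation_data=dev_ds, epochs=500,
#           callbacks=[checkpoint])
```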

3.7 Evaluation Metrics

In their review of the metrics used in land cover classification, Costa et al. [101] found a lack of consistency, complicating the intercomparison of different studies. To avoid such issues and ensure that our results are easily comparable with the literature, we thoroughly evaluated our models. For each model and class, we report the following measures of accuracy: precision, also known as user's accuracy (UA), recall, also known as producer's accuracy (PA), the overall accuracy, and the Kappa coefficient. The formulas are as follows.

For each segmentation class (land cover type) c, we calculate the precision (user's accuracy):

$$P_c = \frac{T_{p_c}}{T_{p_c} + F_{p_c}},$$

and the recall (producer's accuracy):

$$R_c = \frac{T_{p_c}}{T_{p_c} + F_{n_c}},$$

where $T_{p_c}$ represents true positive, $F_{p_c}$ false positive, and $F_{n_c}$ false negative pixels for class c.

When it comes to accuracy [102], we calculate the per-class accuracy (effectively, the per-class accuracy is defined as the recall obtained on each class):

$$Acc_c = \frac{C_{ii}}{G_i},$$

and the overall pixel accuracy:

$$Acc_{OP} = \frac{\sum_{i=1}^{L} C_{ii}}{\sum_{i=1}^{L} G_i},$$

where $C_{ij}$ is the number of pixels having a ground truth label i and being classified/predicted as j, $G_i$ is the total number of pixels labelled with i, and L is the number of classes. All these metrics can take values from 0 to 1.

Finally, we also use the Kappa statistic (Cohen's measure of agreement), indicating how the classification results compare to values assigned by chance [103]. The Kappa statistic can take values from 0 to 1. Starting from a k × k confusion matrix with elements $f_{ij}$, the following calculations are done:


$$P_o = \frac{1}{N}\sum_{j=1}^{k} f_{jj}, \quad (1)$$

$$r_i = \sum_{j=1}^{k} f_{ij} \;\; \forall i \quad \text{and} \quad c_j = \sum_{i=1}^{k} f_{ij} \;\; \forall j, \quad (2)$$

$$P_e = \frac{1}{N^2}\sum_{i=1}^{k} r_i c_i, \quad (3)$$

where $P_o$ is the observed proportional agreement (effectively the overall accuracy), $r_i$ and $c_j$ are the row and column totals for classes i and j, N is the total number of pixels, and $P_e$ is the expected proportion of agreement. The final measure of agreement is given by the statistic [103]:

$$\kappa = \frac{P_o - P_e}{1 - P_e}. \quad (4)$$

Depending on the value of Kappa, the observed agreement is considered as either poor (0.0 to 0.2), fair (0.2 to 0.4), moderate (0.4 to 0.6), good (0.6 to 0.8), or very good (0.8 to 1.0).
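For reference, all of the above metrics can be computed from a single confusion matrix; a compact NumPy sketch (the example matrix is illustrative):

```python
import numpy as np

def metrics_from_confusion(f):
    """f[i, j]: number of pixels with ground truth i predicted as j."""
    f = np.asarray(f, dtype=np.float64)
    n = f.sum()
    tp = np.diag(f)
    precision = tp / f.sum(axis=0)   # user's accuracy, per class
    recall = tp / f.sum(axis=1)      # producer's accuracy, per class
    overall = tp.sum() / n           # P_o, the overall accuracy
    # Expected agreement by chance from row and column totals.
    p_e = (f.sum(axis=1) * f.sum(axis=0)).sum() / n**2
    kappa = (overall - p_e) / (1.0 - p_e)
    return precision, recall, overall, kappa

# Example with a small 3-class confusion matrix:
conf = [[50, 2, 3], [4, 40, 6], [1, 5, 60]]
print(metrics_from_confusion(conf))
```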

4 Results and Discussion

Using the experimental setup described in the previous section, we evaluated the seven selected semantic segmentation models: SegNet [74], PSPNet [75], BiSeNet [76], DeepLabV3+ [77, 78], U-Net [79, 80], FRRN-B [81], and FC-DenseNet [82]. The overall classification performance statistics for all studied models are gathered in Table 4. Figure 11 shows maps produced for several imagelets with the best performing model, FC-DenseNet. The obtained results are compared to prior work, and the classification performance for different land cover classes is discussed further.

4.1 Classification Performance

All the models performed relatively well in terms of classification, achieving an overall accuracy above 83%. Three models performed particularly well, achieving an accuracy score above 89%: SegNet, FRRN-B, and the best performing model, FC-DenseNet, which achieved an accuracy of 90.7%.

Before further analysis, let us recall that CORINE is not exclusively a land cover map, but rather a land cover and land use map; thus, its specific classes can differ from the ecological classes observed by Sentinel-1. Also, the aggregation to Level-1 is sometimes not strictly "ecological", nor does it comply with physical surface scattering considerations. For example, roads, airports, major industrial areas, and the road network often exhibit areas similar to field; the presence of trees and green


Table 4: Summary of the classification performance and efficiency of the various deep learning models. Per-class accuracy is given as UA/PA in %, where UA is the user's accuracy and PA the producer's accuracy; the average inference time is per image in the dataset.

LC classes (test scale, km²) | BiSeNet | DeepLabV3+ | SegNet | FRRN-B | U-Net | PSPNet | FC-DenseNet
Urban fabric (100) (10816) | 26/21 | 15/14 | 36/31 | 38/30 | 45/25 | 38/18 | 62/27
Agricultural areas (200) (25160) | 49/51 | 50/49 | 69/66 | 68/68 | 66/66 | 53/48 | 72/71
Forested areas (300) (285462) | 90/91 | 88/96 | 93/94 | 92/95 | 92/95 | 89/95 | 93/96
Peatland, bogs and marshes (400) (20990) | 54/43 | 56/13 | 67/57 | 71/55 | 70/52 | 65/31 | 74/58
Water bodies (500) (53564) | 85/91 | 94/92 | 96/96 | 95/96 | 96/96 | 94/94 | 96/96
Overall accuracy (%) | 83.86 | 85.49 | 89.03 | 89.27 | 89.25 | 86.51 | 90.66
Kappa | 0.641 | 0.649 | 0.754 | 0.758 | 0.754 | 0.680 | 0.785
Average inference time (s) | 0.0389 | 0.0267 | 0.0761 | 0.1424 | 0.0848 | 0.0495 | 0.1930


Table 5: Confusion matrix for classification with the FC-DenseNet (FC-DenseNet103) model.

CLC2012 \ Sentinel-1 class | urban | water | forest | field | peatland | total | PA (%)
1 (urban) | 7301999 | 413073 | 15892771 | 3212839 | 221476 | 27042158 | 27.0
2 (water) | 78331 | 128294872 | 3457634 | 171029 | 1935276 | 133937142 | 95.8
3 (forest) | 3663698 | 2703632 | 686788977 | 12795703 | 7730444 | 713682454 | 96.2
4 (field) | 766200 | 121609 | 16527970 | 44866048 | 620934 | 62902761 | 71.3
5 (peatland) | 56097 | 1866020 | 19164137 | 1091008 | 30309189 | 52486451 | 57.8
total | 11866325 | 133399206 | 741831489 | 62136627 | 40817319 | 990050966 |
UA (%) | 61.5 | 96.2 | 92.6 | 72.2 | 74.3 | | 90.7

Figure 11: Illustration of the FC-DenseNet model performance: a selection of classification results, i.e., direct output of the network without any post-processing (bottom row), versus the reference CORINE data (upper row).

vegetation near summer cottages can cause them to exhibit signatures close to forest rather than urban; sometimes forest on rocky terrain can instead be misclassified as urban, due to the presence of very bright targets and strong disruptive features, while confusion between peatland and field areas is also quite common. Finally, the accuracy of the CORINE data itself is only somewhat higher than 90%.

As for the results across the different land classes, all the models performed particularly well in recognising the water bodies and forested areas, while the urban fabric represented the most challenging class for all the models. We expect that the inclusion of the DEM as one layer in the training images helped to achieve good results on the water bodies class for most of the models (except for BiSeNet, all the models achieved both a user's and a producer's accuracy above 90%). The urban class was particularly challenging for the following main reasons. First, this class changes the most, as new houses, roads, and urban areas are built. While we took the most suitable available CORINE class in terms of time for our Sentinel-1 images, there are almost certain


differences between the urban class as it was in 2012 and in 2015–2016. Second, the CORINE map itself does not have perfect accuracy, and neither are the aggregation rules perfect. As a matter of fact, in the majority of studies where SAR-based classification was done versus CLC or similar data, a poor or modest overall agreement was observed for this class [21, 41, 83, 20], while the user's accuracy was strongly higher than the producer's [104]. The latter is exactly due to radar being able to sense sharp boundaries and bright targets very well, whereas such bright targets often do not dominate the whole CORINE Level-1 urban class. We argue that any inaccuracies present will be particularly attenuated in our models for the urban class, because of the sharp and sudden boundary changes in this class, unlike for the others, such as forest and water. The top performing model, i.e., FC-DenseNet, performed the best across all the classes. It is particularly notable that it achieved a user's accuracy, i.e., precision, for the urban class of 62%, improving on this significantly compared to all the other models. Nevertheless, its score on the producer's accuracy, i.e., recall, on this class (27%) is outperformed by two other top models, i.e., SegNet and FRRN-B.

We mentioned the issues of SAR backscattering sensitivity to several ground factors, so that the same classes might appear differently in the images between countries, or between distant areas within a country. An interesting indication of our study, however, is that the deep learning models might be able to deal with this issue. Namely, we used the models pre-trained on ImageNet and fine-tuned them with a relatively small number (14) of Sentinel-1 scenes. The models learned to recognize varying types of the backscattered signal across the country of Finland. This indicates that, with a similar type of fine-tuning, the present models could be relatively easily adapted to other areas and countries with different SAR backscattering patterns. Such robustness and adaptability of the deep learning models comes from their automatic learning of the feature representation, without the need for a human to pre-define those features.

4.2 Computational Performance

With our hardware configuration, the training times ranged from 6 days up to 2 weeks for the different models. This could be significantly improved by training each model on a multi-GPU system instead of the single GPU we used.

In terms of the inference time, we also saw differences in performance. In Table 4 we present the average inference time per 512 px × 512 px imagelet that we worked with. The results show that there is a trade-off between classification and computational performance: the best models in terms of classification results (i.e., FC-DenseNet and FRRN-B) take a several times longer inference time compared to the rest. Depending on the application, this might not be of particular importance.

4.3 Comparison to Similar Work

The obtained results compare favourably to previous similar studies on land cover classification with SAR data [83, 20, 21, 41, 28, 31]. Depending on the level of class aggregation (4–5 major classes or more), and using mostly statistical


or classical machine learning approaches, the reported classification accuracies ranged from as high as 80–87% to as low as 30% when only SAR imagery was used.

Two recent studies that employed neural networks for SAR imagery classification (albeit in combination with satellite optical data) for land cover mapping were [28] and [66], with reported classification accuracies of up to 97.5% and 94.6%, respectively.

The best model in our experiments achieved an overall accuracy of 90.7%. However, our results were obtained using solely SAR imagery. In contrast, SAR imagery (PALSAR) alone yielded an overall accuracy of 78.1% in [28]. The types of classes they studied are also different compared to ours (crops versus vegetation versus land cover types), and our study is performed on a larger area. Importantly, the previous studies have applied different types of models (regular NNs versus CNNs versus semantic segmentation). In particular, the CNN models work on 7 × 7 resolution windows, while we have applied more advanced semantic segmentation models, which work on the level of a pixel. Keeping in mind the finding from [28] that the addition of optical images on top of SAR improved the results by over 10%, we expect that our models would perform comparably well or outperform these previous works if applied to combined SAR and optical imagery.

In terms of the deep learning setup, the studies most similar to ours are [53] and [70]. However, RapidEye optical imagery at 5 m spatial resolution was used in [53], and the test site was considerably smaller. Study [70], similar to our research, relied exclusively on SAR imagery, however on fully polarimetric images, acquired by RADARSAT-2 at a considerably better resolution. They developed an FCN-type semantic segmentation model "specifically designed for the classification of wetland complexes using PolSAR imagery". Using this model to classify eight wetland map classes, they achieved an overall accuracy of 93%. However, because their model is designed specifically for wetland complexes, it is not clear if such a model would generalize to other types of areas. Compared to our study, they focused on a considerably smaller area (nearly the size of a single imagelet we used) and on a very specific task (wetland type mapping). Thus, it is not readily clear how general their approach is and how it compares to our presented approach.

4.4 Outlook and Future Work

There are several lines of potential improvement based on the results of this study, as well as future work directions.

First, using an even larger set of Sentinel-1 images can be recommended, since for supervised deep learning models large amounts of data are crucial. Here, we processed only 6888 imagelets altogether, but deep learning algorithms typically become efficient only once they are trained with hundreds of thousands or millions of images.

Second, if SAR images and reference data of a higher resolution are used, we expect better classification performance, too, as smaller details could potentially be captured. Also, a better agreement in the acquisition timing of the reference and


SAR imagery can be recommended. The reference and training data should come from the same months or year, if possible, and the reference maps should represent the reality as accurately as possible. The models in our experiments were certainly limited by CORINE's own limited accuracy.

Third, in this study we have tested the effectiveness of off-the-shelf deep learning models for land cover mapping from SAR data. While the results show their effectiveness, it is also likely that novel types of models, specifically developed for radar data (such as [70]), will yield even better results. Based on our results, we suggest DenseNet-based models as a starting point. In particular, one could develop deep learning models that directly handle the SLC data, which preserve the phase information.

Focusing on a single season is both an advantage and a limitation. Importantly, we have avoided confusion between SAR signatures varying seasonally for several land cover classes. However, the multitemporal dynamics itself can potentially be used as an additional, useful class-discriminating parameter. Incorporating the seasonal dynamics of each land cover pixel (as a time series) is left for future work, perhaps with the additional need to incorporate recurrent neural networks into the approach.

As discussed in Section 3.3, it could be suitable to use more detailed (specific) land cover classes, as the aggregation of smaller LC classes into Level-1 CORINE classes is not exactly ecological, leading to mixing several distinct SAR signatures in one class and thus causing additional confusion for the classifier. Later, the classified specific classes can be aggregated into larger classes, potentially showing improved performance [19].

Finally, we have used only SAR images and a freely available DEM model for the presented large-scale land cover mapping. If one were to combine other types of remote sensing images, in particular optical images, we expect that the results would improve significantly. This holds for those areas where such imagery can be collected, given the cloud coverage, while in an operational scenario it would potentially require the use of at least two models (with and without optical satellite imagery). It is also important to assess the added value of SAR imagery with deep learning models when optical satellite images are available, as well as possible data fusion and decision fusion scenarios, before a decision on the mapping approach is made [19].

5 Conclusion

Our study demonstrated the potential for applying state-of-the-art semantic segmentation models to SAR image classification with high accuracy. Several models were benchmarked in a countrywide classification experiment using Sentinel-1 IW-mode SAR data, reaching nearly 91% overall classification accuracy with the best performing model (FC-DenseNet). Given that the 14 used Sentinel-1 scenes resulted in 7K training images, this indicates a strong potential for using pre-trained CNNs for further fine-tuning, and seems particularly suitable when the number of training images is limited (to thousands or tens of thousands instead of millions). In addition to suggesting the best candidate


semantic segmentation models for land cover mapping with SAR data (that is, the DenseNet-based models), our study offers baseline results against which newly proposed models should be evaluated. Several possible improvements for future work were identified, including the necessity of testing multitemporal approaches, data fusion, and very high-resolution SAR imagery, as well as developing models specifically for SAR; these will be addressed in future work.

References

[1] S. Bojinski, M. Verstraete, T. C. Peterson, C. Richter, A. Simmons, M. Zemp, The concept of essential climate variables in support of climate research, applications, and policy, Bulletin of the American Meteorological Society 95 (9) (2014) 1431–1443.

[2] G. Büttner, J. Feranec, G. Jaffrain, L. Mari, G. Maucha, T. Soukup, The CORINE land cover 2000 project, EARSeL eProceedings 3 (3) (2004) 331–346.

[3] M. Bossard, J. Feranec, J. Otahel, et al., CORINE land cover technical guide: Addendum 2000 (2000).

[4] G. Büttner, CORINE land cover and land cover change products, in: Land Use and Land Cover Mapping in Europe, Springer, 2014, pp. 55–74.

[5] M. Törmä, T. Markkanen, S. Hatunen, P. Härmä, O.-P. Mattila, A. Arslan, Assessment of land-cover data for land-surface modelling in regional climate studies, Boreal Environment Research 20 (2) (2015) 243–260.

[6] J. Chen, J. Chen, A. Liao, X. Cao, L. Chen, X. Chen, C. He, G. Han, S. Peng, M. Lu, et al., Global land cover mapping at 30 m resolution: A POK-based operational approach, ISPRS Journal of Photogrammetry and Remote Sensing 103 (2015) 7–27.

[7] C. A. d. Almeida, A. C. Coutinho, J. C. D. M. Esquerdo, M. Adami, A. Venturieri, C. G. Diniz, N. Dessay, L. Durieux, A. R. Gomes, High spatial resolution land use and land cover mapping of the Brazilian Legal Amazon in 2008 using Landsat-5/TM and MODIS data, Acta Amazonica 46 (3) (2016) 291–302.

[8] C. Homer, J. Dewitz, L. Yang, S. Jin, P. Danielson, G. Xian, J. Coulston, N. Herold, J. Wickham, K. Megown, Completion of the 2011 National Land Cover Database for the conterminous United States – representing a decade of land cover change information, Photogrammetric Engineering & Remote Sensing 81 (5) (2015) 345–354.

[9] Y. Zhao, D. Feng, L. Yu, X. Wang, Y. Chen, Y. Bai, H. J. Hernandez, M. Galleguillos, C. Estades, G. S. Biging, et al., Detailed dynamic land cover mapping of Chile: Accuracy improvement by integrating multi-temporal data, Remote Sensing of Environment 183 (2016) 170–185.


[10] P. Griffiths, C. Nendel, P. Hostert, Intra-annual reflectance composites from Sentinel-2 and Landsat for national-scale crop and land cover mapping, Remote Sensing of Environment 220 (2019) 135–151.

[11] R. Torres, P. Snoeij, D. Geudtner, D. Bibby, M. Davidson, E. Attema, P. Potin, B. Rommen, N. Floury, M. Brown, I. Traver, P. Deghaye, B. Duesmann, B. Rosich, N. Miranda, C. Bruno, M. L'Abbate, R. Croci, A. Pietropaolo, M. Huchler, F. Rostan, GMES Sentinel-1 mission, Remote Sensing of Environment 120 (2012) 9–24. doi:10.1016/j.rse.2011.05.028.

[12] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE 86 (11) (1998) 2278–2324.

[13] A. Krizhevsky, I. Sutskever, G. E. Hinton, ImageNet classification with deep convolutional neural networks, in: Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.

[14] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, CoRR abs/1409.1556 (2014).

[15] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning (2016).

[16] W. Cohen, S. Goward, Landsat's role in ecological applications of remote sensing, BioScience 54 (6) (2004) 535–545. doi:10.1641/0006-3568(2004)054[0535:LRIEAO]2.0.CO;2.

[17] S. Goetz, A. Baccini, N. Laporte, T. Johns, W. Walker, J. Kellndorfer, R. Houghton, M. Sun, Mapping and monitoring carbon stocks with satellite observations: A comparison of methods, Carbon Balance and Management 4 (2009). doi:10.1186/1750-0680-4-2.

[18] C. Atzberger, Advances in remote sensing of agriculture: Context description, existing operational monitoring systems and major information needs, Remote Sensing 5 (2) (2013) 949–981. doi:10.3390/rs5020949.

[19] T. Häme, J. Kilpi, H. A. Ahola, Y. Rauste, O. Antropov, M. Rautiainen, L. Sirro, S. Bounpone, Improved mapping of tropical forests with optical and SAR imagery, part I: Forest cover and accuracy assessment using multi-resolution data, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 6 (1) (2013) 74–91.

[20] O. Antropov, Y. Rauste, H. Astola, J. Praks, T. Häme, M. Hallikainen, Land cover and soil type mapping from spaceborne PolSAR data at L-band with probabilistic neural network, IEEE Transactions on Geoscience and Remote Sensing 52 (9) (2014) 5256–5270. doi:10.1109/TGRS.2013.2287712.

31

[21] A. Lönnqvist, Y. Rauste, M. Molinier, T. Häme, Polarimetric SAR data in land cover mapping in boreal zone, IEEE Transactions on Geoscience and Remote Sensing 48 (10) (2010) 3652–3662. doi:10.1109/TGRS.2010.2048115.

[22] B. Waske, M. Braun, Classifier ensembles for land cover mapping using multitemporal SAR imagery, ISPRS Journal of Photogrammetry and Remote Sensing 64 (5) (2009) 450–457. doi:10.1016/j.isprsjprs.2009.01.003.

[23] L. Bruzzone, M. Marconcini, U. Wegmüller, A. Wiesmann, An advanced system for the automatic classification of multitemporal SAR images, IEEE Transactions on Geoscience and Remote Sensing 42 (6) (2004) 1321–1334. doi:10.1109/TGRS.2004.826821.

[24] T. Ullmann, A. Schmitt, A. Roth, J. Duffe, S. Dech, H.-W. Hubberten, R. Baumhauer, Land cover characterization and classification of Arctic tundra environments by means of polarized synthetic aperture X- and C-band radar (PolSAR) and Landsat 8 multispectral imagery – Richards Island, Canada, Remote Sensing 6 (9) (2014) 8565–8593. doi:10.3390/rs6098565. URL http://www.mdpi.com/2072-4292/6/9/8565

[25] N. Clerici, C. A. V. Calderón, J. M. Posada, Fusion of Sentinel-1A and Sentinel-2A data for land cover mapping: a case study in the lower Magdalena region, Colombia, Journal of Maps 13 (2) (2017) 718–726. doi:10.1080/17445647.2017.1372316.

[26] C. Castañeda, D. Ducrot, Land cover mapping of wetland areas in an agricultural landscape using SAR and Landsat imagery, Journal of Environmental Management 90 (7) (2009) 2270–2277.

[27] Y. Ban, H. Hu, I. M. Rangel, Fusion of QuickBird MS and RADARSAT SAR data for urban land-cover mapping: Object-based and knowledge-based approach, International Journal of Remote Sensing 31 (6) (2010) 1391–1410.

[28] G. V. Laurin, V. Liesenberg, Q. Chen, L. Guerriero, F. Del Frate, A. Bartolini, D. Coomes, B. Wilebore, J. Lindsell, R. Valentini, Optical and SAR sensor synergies for forest and land cover mapping in a tropical site in West Africa, International Journal of Applied Earth Observation and Geoinformation 21 (2013) 7–16.

[29] R. Khatami, G. Mountrakis, S. V. Stehman, A meta-analysis of remote sensing research on supervised pixel-based land-cover image classification processes: General guidelines for practitioners and future research, Remote Sensing of Environment 177 (2016) 89–100.

[30] B. Waske, M. Braun, Classifier ensembles for land cover mapping using multitemporal SAR imagery, ISPRS Journal of Photogrammetry and Remote Sensing 64 (5) (2009) 450–457.

[31] H. Balzter, B. Cole, C. Thiel, C. Schmullius, Mapping CORINE land cover from Sentinel-1A SAR and SRTM digital elevation model data using random forests, Remote Sensing 7 (11) (2015) 14876–14898. doi:10.3390/rs71114876.

[32] S.-E. Park, Variations of microwave scattering properties by seasonal freeze/thaw transition in the permafrost active layer observed by ALOS PALSAR polarimetric data, Remote Sensing 7 (12) (2015) 17135–17148. doi:10.3390/rs71215874.

[33] M. C. Dobson, L. E. Pierce, F. T. Ulaby, Knowledge-based land-cover classification using ERS-1/JERS-1 SAR composites, IEEE Transactions on Geoscience and Remote Sensing 34 (1) (1996) 83–99. doi:10.1109/36.481896.

[34] L. Sirro, T. Häme, Y. Rauste, J. Kilpi, J. Hämäläinen, K. Gunia, B. de Jong, F. Paz Pellat, Potential of different optical and SAR data in forest and land cover classification to support REDD+ MRV, Remote Sensing 10 (6) (2018). doi:10.3390/rs10060942.

[35] J. D. T. De Alban, G. M. Connette, P. Oswald, E. L. Webb, Combined Landsat and L-band SAR data improves land cover classification and change detection in dynamic tropical landscapes, Remote Sensing 10 (2) (2018). doi:10.3390/rs10020306.

[36] N. Longépé, P. Rakwatin, O. Isoguchi, M. Shimada, Y. Uryu, K. Yulianto, Assessment of ALOS PALSAR 50 m orthorectified FBD data for regional land cover classification by support vector machines, IEEE Transactions on Geoscience and Remote Sensing 49 (6) (2011) 2135–2150. doi:10.1109/TGRS.2010.2102041.

[37] T. Esch, A. Schenk, T. Ullmann, M. Thiel, A. Roth, S. Dech, Characterization of land cover types in TerraSAR-X images by combined analysis of speckle statistics and intensity information, IEEE Transactions on Geoscience and Remote Sensing 49 (6) (2011) 1911–1925. doi:10.1109/TGRS.2010.2091644.

[38] J. W. Cable, J. M. Kovacs, J. Shang, X. Jiao, Multi-temporal polarimetric RADARSAT-2 for land cover monitoring in northeastern Ontario, Canada, Remote Sensing 6 (3) (2014) 2372–2392. doi:10.3390/rs6032372. URL http://www.mdpi.com/2072-4292/6/3/2372

[39] X. Niu, Y. Ban, Multi-temporal RADARSAT-2 polarimetric SAR data for urban land-cover classification using an object-based support vector machine and a rule-based approach, International Journal of Remote Sensing 34 (1) (2013) 1–26. doi:10.1080/01431161.2012.700133.

[40] T. L. Evans, M. Costa, K. Telmer, T. S. F. Silva, Using ALOS/PALSAR and RADARSAT-2 to map land cover and seasonal inundation in the Brazilian Pantanal, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 3 (4) (2010) 560–575. doi:10.1109/JSTARS.2010.2089042.

[41] P. Lumsdon, S. R. Cloude, G. Wright, Polarimetric classification of land cover for Glen Affric radar project, IEE Proceedings – Radar, Sonar and Navigation 152 (6) (2005) 404–412. doi:10.1049/ip-rsn:20041313.

[42] C. da Costa Freitas, L. de Souza Soler, S. J. S. Sant'Anna, L. V. Dutra, J. R. Dos Santos, J. C. Mura, A. H. Correia, Land use and land cover mapping in the Brazilian Amazon using polarimetric airborne P-band SAR data, IEEE Transactions on Geoscience and Remote Sensing 46 (10) (2008) 2956–2970.

[43] G. Li, D. Lu, E. Moran, L. Dutra, M. Batistella, A comparative analysis of ALOS PALSAR L-band and RADARSAT-2 C-band data for land-cover classification in a tropical moist region, ISPRS Journal of Photogrammetry and Remote Sensing 70 (2012) 26–38. doi:10.1016/j.isprsjprs.2012.03.010.

[44] N. Park, K. Chi, Integration of multitemporal/polarization C-band SAR data sets for land-cover classification, International Journal of Remote Sensing 29 (16) (2008) 4667–4688. doi:10.1080/01431160801947341.

[45] E. Tomppo, O. Antropov, J. Praks, Cropland classification using Sentinel-1 time series: Methodological performance and prediction uncertainty assessment, Remote Sensing 11 (21) (2019). doi:10.3390/rs11212480.

[46] D. B. Nguyen, A. Gruber, W. Wagner, Mapping rice extent and cropping scheme in the Mekong Delta using Sentinel-1A data, Remote Sensing Letters 7 (12) (2016) 1209–1218.

[47] A. Veloso, S. Mermoz, A. Bouvet, T. Le Toan, M. Planells, J.-F. Dejoux, E. Ceschia, Understanding the temporal behavior of crops using Sentinel-1 and Sentinel-2-like data for agricultural applications, Remote Sensing of Environment 199 (2017) 415–426.

[48] G. Satalino, A. Balenzano, F. Mattia, M. W. Davidson, C-band SAR data for mapping crops dominated by surface or volume scattering, IEEE Geoscience and Remote Sensing Letters 11 (2) (2013) 384–388.

[49] F. Vicente-Guijalba, A. Jacob, J. M. Lopez-Sanchez, C. Lopez-Martinez, J. Duro, C. Notarnicola, D. Ziolkowski, A. Mestre-Quereda, E. Pottier, J. J. Mallorquí, M. Lavalle, M. Engdahl, Sincohmap: Land-cover and vegetation mapping using multi-temporal Sentinel-1 interferometric coherence, in: IGARSS 2018 – 2018 IEEE International Geoscience and Remote Sensing Symposium, 2018, pp. 6631–6634. doi:10.1109/IGARSS.2018.8517926.

[50] S. Ge, O. Antropov, W. Su, H. Gu, J. Praks, Deep recurrent neural networks for land-cover classification using Sentinel-1 InSAR time series, in: IGARSS 2019 – 2019 IEEE International Geoscience and Remote Sensing Symposium, 2019, pp. 473–476. doi:10.1109/IGARSS.2019.8900088.

[51] Y. LeCun, Y. Bengio, et al., Convolutional networks for images, speech, and time series, The Handbook of Brain Theory and Neural Networks 3361 (10) (1995) 1995.

[52] X. X. Zhu, D. Tuia, L. Mou, G.-S. Xia, L. Zhang, F. Xu, F. Fraundorfer, Deep learning in remote sensing: a review, arXiv preprint arXiv:1710.03959 (2017).

[53] M. Mahdianpari, B. Salehi, M. Rezaee, F. Mohammadimanesh, Y. Zhang, Very deep convolutional neural networks for complex land cover mapping using multispectral remote sensing imagery, Remote Sensing 10 (7) (2018) 1119.

[54] L. Zhang, L. Zhang, B. Du, Deep learning for remote sensing data: A technical tutorial on the state of the art, IEEE Geoscience and Remote Sensing Magazine 4 (2) (2016) 22–40.

[55] J. Zhang, P. Zhong, Y. Chen, S. Li, L1/2-regularized deconvolution network for the representation and restoration of optical remote sensing images, IEEE Transactions on Geoscience and Remote Sensing 52 (5) (2014) 2617–2627.

[56] X. Chen, S. Xiang, C.-L. Liu, C.-H. Pan, Aircraft detection by deep belief nets, in: Pattern Recognition (ACPR), 2013 2nd IAPR Asian Conference on, IEEE, 2013, pp. 54–58.

[57] X. Chen, S. Xiang, C.-L. Liu, C.-H. Pan, Vehicle detection in satellite images by hybrid deep convolutional neural networks, IEEE Geoscience and Remote Sensing Letters 11 (10) (2014) 1797–1801.

[58] Y. Liu, G. Cao, Q. Sun, M. Siegel, Hyperspectral classification via deep networks and superpixel segmentation, International Journal of Remote Sensing 36 (13) (2015) 3459–3482. doi:10.1080/01431161.2015.1055607.

[59] J. Wang, Q. Qin, Z. Li, X. Ye, J. Wang, X. Yang, X. Qin, Deep hierarchical representation and segmentation of high resolution remote sensing images, in: Geoscience and Remote Sensing Symposium (IGARSS), 2015 IEEE International, IEEE, 2015, pp. 4320–4323.

[60] D. Tuia, R. Flamary, N. Courty, Multiclass feature learning for hyperspectral image classification: Sparse and hierarchical solutions, ISPRS Journal of Photogrammetry and Remote Sensing 105 (2015) 272–285.

[61] F. Hu, G.-S. Xia, J. Hu, L. Zhang, Transferring deep convolutional neural networks for the scene classification of high-resolution remote sensing imagery, Remote Sensing 7 (11) (2015) 14680–14707.

[62] O. A. B. Penatti, K. Nogueira, J. A. dos Santos, Do deep features generalize from everyday objects to remote sensing and aerial scenes domains?, in: 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2015, pp. 44–51. doi:10.1109/CVPRW.2015.7301382.

[63] F. P. Luus, B. P. Salmon, F. Van den Bergh, B. T. J. Maharaj, Multiview deep learning for land-use classification, IEEE Geoscience and Remote Sensing Letters 12 (12) (2015) 2448–2452.

[64] F. Zhang, B. Du, L. Zhang, Scene classification via a gradient boosting random convolutional network framework, IEEE Transactions on Geoscience and Remote Sensing 54 (3) (2016) 1793–1802.

[65] T. Ishii, R. Nakamura, H. Nakada, Y. Mochizuki, H. Ishikawa, Surface object recognition with CNN and SVM in Landsat 8 images, in: 2015 14th IAPR International Conference on Machine Vision Applications (MVA), 2015, pp. 341–344. doi:10.1109/MVA.2015.7153200.

[66] N. Kussul, M. Lavreniuk, S. Skakun, A. Shelestov, Deep learning classification of land cover and crop types using remote sensing data, IEEE Geoscience and Remote Sensing Letters 14 (5) (2017) 778–782.

[67] Y. Chen, Z. Lin, X. Zhao, G. Wang, Y. Gu, Deep learning-based classification of hyperspectral data, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 7 (6) (2014) 2094–2107.

[68] G. Wu, X. Shao, Z. Guo, Q. Chen, W. Yuan, X. Shi, Y. Xu, R. Shibasaki, Automatic building segmentation of aerial imagery using multi-constraint fully convolutional networks, Remote Sensing 10 (3) (2018) 407.

[69] Y. Duan, F. Liu, L. Jiao, P. Zhao, L. Zhang, SAR image segmentation based on convolutional-wavelet neural network and Markov random field, Pattern Recognition 64 (2017) 255–267.

[70] F. Mohammadimanesh, B. Salehi, M. Mahdianpari, E. Gill, M. Molinier, A new fully convolutional neural network for semantic segmentation of polarimetric SAR imagery in complex land cover ecosystem, ISPRS Journal of Photogrammetry and Remote Sensing 151 (2019) 223–236.

[71] L. Wang, X. Xu, H. Dong, R. Gui, F. Pu, Multi-pixel simultaneous classification of PolSAR image using convolutional neural networks, Sensors 18 (3) (2018) 769.

[72] M. Ahishali, S. Kiranyaz, T. Ince, M. Gabbouj, Dual and single polarized SAR image classification using compact convolutional neural networks, Remote Sensing 11 (11) (2019) 1340.

[73] Z. Li, Z. Yang, H. Xiong, Homogeneous region segmentation for SAR images based on two steps segmentation algorithm, in: Computers, Communications, and Systems (ICCCS), International Conference on, IEEE, 2015, pp. 196–200.

[74] V. Badrinarayanan, A. Kendall, R. Cipolla, SegNet: A deep convolutional encoder-decoder architecture for image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence 39 (12) (2017) 2481–2495.

[75] H. Zhao, J. Shi, X. Qi, X. Wang, J. Jia, Pyramid scene parsing network, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2881–2890.

[76] C. Yu, J. Wang, C. Peng, C. Gao, G. Yu, N. Sang, BiSeNet: Bilateral segmentation network for real-time semantic segmentation, in: Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 325–341.

[77] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, H. Adam, Encoder-decoder with atrous separable convolution for semantic image segmentation, arXiv preprint arXiv:1802.02611 (2018).

[78] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, A. L. Yuille, DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs, IEEE Transactions on Pattern Analysis and Machine Intelligence 40 (4) (2018) 834–848.

[79] O. Ronneberger, P. Fischer, T. Brox, U-Net: Convolutional networks for biomedical image segmentation, in: Medical Image Computing and Computer-Assisted Intervention (MICCAI), Vol. 9351 of LNCS, Springer, 2015, pp. 234–241 (available on arXiv:1505.04597 [cs.CV]). URL http://lmb.informatik.uni-freiburg.de/Publications/2015/RFB15a

[80] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, H. Adam, MobileNets: Efficient convolutional neural networks for mobile vision applications, arXiv preprint arXiv:1704.04861 (2017).

[81] T. Pohlen, A. Hermans, M. Mathias, B. Leibe, Full-resolution residual networks for semantic segmentation in street scenes, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4151–4160.

[82] S. Jégou, M. Drozdzal, D. Vazquez, A. Romero, Y. Bengio, The one hundred layers tiramisu: Fully convolutional DenseNets for semantic segmentation, in: Computer Vision and Pattern Recognition Workshops (CVPRW), 2017 IEEE Conference on, IEEE, 2017, pp. 1175–1183.

[83] O. Antropov, Y. Rauste, A. Lönnqvist, T. Häme, PolSAR mosaic normalization for improved land-cover mapping, IEEE Geoscience and Remote Sensing Letters 9 (6) (2012) 1074–1078.

[84] J. Long, E. Shelhamer, T. Darrell, Fully convolutional networks for semantic segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431–3440.

[85] D. H. Hubel, T. N. Wiesel, Receptive fields, binocular interaction and functional architecture in the cat's visual cortex, The Journal of Physiology 160 (1) (1962) 106–154.

[86] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al., ImageNet large scale visual recognition challenge, International Journal of Computer Vision 115 (3) (2015) 211–252.

[87] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.

[88] G. Huang, Z. Liu, L. Van Der Maaten, K. Q. Weinberger, Densely connected convolutional networks, in: CVPR, Vol. 1, 2017, p. 3.

[89] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, Going deeper with convolutions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1–9.

[90] S. Ji, W. Xu, M. Yang, K. Yu, 3D convolutional neural networks for human action recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (1) (2013) 221–231.

[91] T. N. Sainath, A.-r. Mohamed, B. Kingsbury, B. Ramabhadran, Deep convolutional neural networks for LVCSR, in: Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, IEEE, 2013, pp. 8614–8618.

[92] D. Small, L. Zuberbühler, A. Schubert, E. Meier, Terrain-flattened gamma nought Radarsat-2 backscatter, Canadian Journal of Remote Sensing 37 (5) (2012) 493–499.

[93] P. Härmä, R. Teiniranta, M. Törmä, R. Repo, E. Järvenpää, M. Kallio, The production of Finnish CORINE land cover 2000 classification, XXth ISPRS Congress, Istanbul, Turkey (2004).

[94] P. Härmä, R. Teiniranta, M. Törmä, R. Repo, E. Järvenpää, M. Kallio, Finnish CORINE land cover 2000 classification, XXth ISPRS Congress, Anchorage, US (2004).

[95] A. Garcia-Garcia, S. Orts-Escolano, S. Oprea, V. Villena-Martinez, J. Garcia-Rodriguez, A review on deep learning techniques applied to semantic segmentation, arXiv preprint arXiv:1704.06857 (2017).

[96] F. Chollet, Xception: Deep learning with depthwise separable convolutions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1251–1258.

[97] O. Ronneberger, P. Fischer, T. Brox, U-Net: Convolutional networks for biomedical image segmentation, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2015, pp. 234–241.

[98] L.-C. Chen, G. Papandreou, F. Schroff, H. Adam, Rethinking atrous convolution for semantic image segmentation, arXiv preprint arXiv:1706.05587 (2017).

[99] Y. Bengio, Deep learning of representations for unsupervised and transfer learning, in: Proceedings of ICML Workshop on Unsupervised and Transfer Learning, 2012, pp. 17–36.

[100] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, et al., TensorFlow: A system for large-scale machine learning, in: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), 2016, pp. 265–283.

[101] H. Costa, G. M. Foody, D. S. Boyd, Supervised methods of image segmentation accuracy assessment in land cover mapping, Remote Sensing of Environment 205 (2018) 338–351.

[102] G. Csurka, D. Larlus, F. Perronnin, F. Meylan, What is a good evaluation measure for semantic segmentation?, in: BMVC, Vol. 27, Citeseer, 2013, p. 2013.

[103] J. Cohen, A coefficient of agreement for nominal scales, Educational and Psychological Measurement 20 (1) (1960) 37–46.

[104] O. Antropov, Y. Rauste, T. Häme, Volume scattering modeling in PolSAR decompositions: Study of ALOS PALSAR data over boreal forest, IEEE Transactions on Geoscience and Remote Sensing 49 (10) (2011) 3838–3848.


that has transformed the computer vision field. Initially, CNNs were defined to tackle the image classification (deep learning terminology) task. Their structure is inspired by the visual perception of mammals [85]. CNNs are named after one of the most important operations, which is particular to them compared to other neural networks, i.e., convolutions. Mathematically, a convolution is a combination of two other functions. A convolution is applied on the image by sliding a filter (kernel) of a given size $k \times k$, which is usually small compared to the original image size. Different purpose filters are designed; for example, a filter can serve as a vertical edge detector. Application of such a convolution operation on an image results in a feature map. Another common operation that is usually applied after a convolution is pooling. Pooling reduces the size of the feature map while providing robustness to the extracted features. Common CNNs end with a fully connected layer, which is used for final predictions, commonly for image classification. By employing a large number of convolutional layers (depth), CNNs are able to extract gradually more complex and abstract features. The first CNN model to demonstrate their impressive effectiveness in image classification (of handwritten digits) was LeNet [12]. Several years later, Krizhevsky et al. [13] developed AlexNet, the deep CNN that dramatically pushed the limits of classification accuracy on the famous ImageNet computer vision challenge [86]. Since then, a variety of CNN-based models have been proposed. Some notable examples are the VGG network [14], ResNet [87], DenseNet [88], and Inception V3 [89]. The effectiveness of CNNs has also been proven in various real-world applications [90, 91].
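To make these operations concrete, the following is a minimal illustrative sketch in TensorFlow/Keras (the framework used later in this study); the layer sizes and the input shape are arbitrary examples, not the configuration of any of the benchmarked models.

import tensorflow as tf

# A toy CNN classifier: convolution and pooling blocks followed by a fully
# connected layer producing the final image-level prediction.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, kernel_size=3, activation="relu",
                           input_shape=(64, 64, 3)),  # 3 x 3 filters -> feature maps
    tf.keras.layers.MaxPooling2D(pool_size=2),        # pooling shrinks the feature map
    tf.keras.layers.Conv2D(64, kernel_size=3, activation="relu"),
    tf.keras.layers.MaxPooling2D(pool_size=2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(5, activation="softmax"),   # probabilities over 5 classes
])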

Once CNNs had proven their effectiveness to classify images, Long et al. [84] were the first to discover how to augment a given CNN model to make it suitable for the semantic segmentation task: they proposed the Fully Convolutional Neural Network (FCN) framework. This generic architecture can be used to adapt any CNN network used for classification into a segmentation model. Namely, the authors have shown that by replacing the last fully connected layer with appropriate convolutional layers, so that they upsample and restore the resolution of the input at the output layer, CNNs can be transformed to classify each individual pixel (instead of the whole image). The basic idea is as follows. The encoder is used to learn the feature maps, and is usually based on a pre-trained deep CNN for classification, such as ResNet, VGG, or Inception. The decoder part serves to upsample the discriminative features that the encoder has learned from the coarse-level feature map to the fine pixel level. Long et al. [84] have shown that this upsampling (backward) computation can be efficiently performed using backward convolutions (deconvolutions). Moreover, this means that the specific CNN models such as those mentioned above can all be incorporated in the FCN framework for segmentation, giving rise to FCN-AlexNet [84], FCN-ResNet [87], FCN-VGG16 [84], FCN-DenseNet [82], etc. We present a diagram of the generic FCN architecture in Figure 1.
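The following is a hedged sketch of this idea in TensorFlow/Keras: a small convolutional encoder whose fully connected head is replaced by transposed convolutions that restore the input resolution. It illustrates the FCN principle, not the exact architecture of [84]; all sizes are arbitrary.

import tensorflow as tf

# Encoder: strided convolutions learn increasingly coarse feature maps.
inputs = tf.keras.layers.Input(shape=(512, 512, 3))
x = tf.keras.layers.Conv2D(64, 3, strides=2, padding="same", activation="relu")(inputs)
x = tf.keras.layers.Conv2D(128, 3, strides=2, padding="same", activation="relu")(x)
# Decoder: transposed (backward) convolutions upsample back to full resolution.
x = tf.keras.layers.Conv2DTranspose(64, 3, strides=2, padding="same", activation="relu")(x)
x = tf.keras.layers.Conv2DTranspose(32, 3, strides=2, padding="same", activation="relu")(x)
# A 1 x 1 convolution replaces the fully connected layer: one prediction per pixel.
outputs = tf.keras.layers.Conv2D(5, 1, activation="softmax")(x)
fcn = tf.keras.Model(inputs, outputs)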


Figure 1: The architecture of Fully Convolutional Neural Networks (FCNs) [84].

3 Materials and methods

Here we first describe the study site, the SAR data, and the reference data. This is followed by an in-depth description of the deep learning terminology and the models used in the study. We finish with the description of the experimental setup and the evaluation metrics.

3.1 Study site

Our study site covers the area of Finland at latitudes from 61° to 67.5°. The processed area is shown in Figure 2. The study area includes central and northern areas of Finland, covered primarily by boreal forestland with inclusions of water bodies (primarily lakes), urban settlements and agricultural areas, as well as marshland and open bogs. We have omitted Lapland due to potential snow cover during the months of data acquisition. The terrain height variation is moderate, mostly within the 100–300 m range.

3.2 SAR data

Presently, Sentinel-1 is a C-band SAR dual-satellite system, with two satellites orbiting 180° apart [11], launched in 2014 and 2016, respectively. The operational acquisition modes are Stripmap (SM), Interferometric Wide-Swath (IW), Extra Wide Swath (EW), and Wave Mode (WV). The IW mode is the default mode over land, providing a 250 km wide swath composed of three sub-swaths, with a single-look image at 5 m by 20 m spatial resolution. It uses the so-called TOPS (Terrain Observation with Progressive Scan) SAR mode.

The SAR data acquired by the Sentinel-1 satellites in IW mode are used in the study. Altogether, 14 Sentinel-1A images acquired during the summers of 2015 and 2016 were used in the study, more concretely during June, July, and August of those two years. Their geographical coverage is schematically shown in Figure 2.


Figure 2: Study area in Finland with reference CORINE land cover data and schematic location of areas used for model training and accuracy assessment.


Original scenes were downloaded as Level-1 Ground Range Detected (GRD) products. They represent focused SAR data that have been detected, multi-looked, and projected to ground range using an Earth ellipsoid. They were orthorectified in ESA SNAP S1TBX software using a local digital terrain model (with 2 m resolution) available from the National Land Survey of Finland. The pixel spacing of the orthorectified scenes was set to 20 m. Orthorectification included terrain flattening to obtain backscatter in gamma-nought format [92]. The scenes were further re-projected to the ETRS89 / ETRS-TM35FIN projection (EPSG:3067) and resampled to a final pixel size of 20 metres.

3.3 Reference data

In Finland, the Finnish Environment Institute (SYKE) is responsible for the production of the CORINE maps. While for most of the EU territory the CORINE mask of 100 m × 100 m spatial resolution is available, the national institutions might choose to create more precise maps, and SYKE in particular has produced a 20 m × 20 m spatial resolution mask for Finland (Figure 3). Since then, the updates have been produced regularly, the latest one at the time of this study (which we used) being CLC2012. There are 48 different land use classes in the map that can be hierarchically grouped into 4 CLC levels. In detail, there are 30 classes on CLC Level-3, 15 classes on CLC Level-2, and 5 top CLC Level-1 classes. According to the information provided by SYKE, the accuracy of the CLC Level-3 is 61%, of the CLC Level-2 83%, and of the CLC Level-1 it is 93%. The selected classes and their corresponding color codes used for our segmentation results are shown in Table 2.

Until the most recent CORINE production round, EEA member countries adopted national approaches for the production of CORINE. EEA Technical Guidelines include manual digitalization of land cover change based on visual interpretation of optical satellite imagery. In Finland, the European CLC was not applicable for the majority of national users due to the large minimal mapping unit (MMU). Thus, a national version was produced, with a somewhat modified nomenclature of classes [93, 94]. The national high-resolution CLC2012 data is in raster format of 20 m, with a corresponding MMU. In the provision of the 2012 update of CLC, obtaining optical imagery over Scandinavia and Britain was particularly challenging because of the frequent clouds, thus calling for the use of radar imagery to meet user requirements on accuracy and coverage [31]. The CORINE map itself is built from high-resolution satellite images acquired primarily during the summer and, to a smaller extent, during the spring months [2].

3.4 Semantic Segmentation Models

We selected the following seven state-of-the-art [95] semantic segmentation models to test on our land cover mapping task: SegNet [74], PSPNet [75], BiSeNet [76], DeepLabV3+ [77, 78], U-Net [79, 80], FRRN-B [81], and FC-DenseNet [82]. The models were selected to cover a wide set of approaches to semantic segmentation. In the following, we describe the specific architecture of each


Table 2: CORINE CLC Level-1 classes and their color codes used in our classification results.

class | R | G | B | color
Water bodies (500) | 0 | 191 | 255 | blue
Peatland, bogs and marshes (400) | 173 | 216 | 230 | light blue
Forested areas (300) | 127 | 255 | 0 | green
Agricultural areas (200) | 222 | 184 | 135 | brown
Urban fabric (100) | 128 | 0 | 0 | red

Figure 3: Zoomed-in area fragment with our reference data, i.e., CORINE, shown on top (left) along with the Google Earth layer (right).

of these DL models. We will use the following common abbreviations: conv for the convolution operation, concat for concatenation, max pool for the max pooling operation, BN for batch normalisation, and ReLU for the rectified linear unit activation function.

3.4.1 BiSeNet (Bilateral Segmentation Network)

The BiSeNet model is designed to decouple the functions of encoding additional spatial information and enlarging the receptive field, which are fundamental to achieving good segmentation performance. As can be seen in Figure 4, there are two main components in this model: the Spatial Path (SP) and the Context Path (CP). The Spatial Path serves to encode rich spatial information. The Context Path serves to provide a sufficient receptive field, and uses global average pooling and pre-trained Xception [96] or ResNet [87] as the backbone. The goal of the creators was not only to obtain superior performance, but to achieve a balance between speed and performance. Hence, BiSeNet is a relatively fast semantic segmentation model.

3.4.2 SegNet (Encoder-Decoder-Skip)

Similarly to BiSeNet, SegNet is also designed with computational performance in mind, this time particularly during inference. Because of this, the network has a significantly smaller number of trainable parameters compared to most of the other architectures. The encoder in SegNet is based on VGG16: it consists of its first 13 convolutional layers, while the fully connected layers are


Figure 4: The architecture of BiSeNet. ARM stands for the Attention Refinement Module and FFM for the Feature Fusion Module introduced in the model's paper [76].

Figure 5: The architecture of SegNet-based Encoder-Decoder with Skip connections [74]. Blue tiles represent Convolution + Batch Normalisation + ReLU, green tiles represent Pooling, red – Upsampling, and yellow – a softmax operation.

omitted. Hence, the novelty of this network lies in its decoder part, as follows. The decoder consists of one decoder layer for each encoder layer, and so it also has 13 layers. Each individual decoder layer utilizes max-pooling indices memorized from its corresponding encoder feature map. The authors have shown that this enhances boundary delineation between classes. Finally, the decoder output is sent to a multi-class soft-max function, yielding a classification for each


pixel (see Figure 5)

3.4.3 Mobile U-Net

Mobile U-Net is based on the U-Net [97] semantic segmentation architecture shown in Figure 6. In designing U-Net, the fully convolutional approach was generally employed, with the following modification: the upsampling part of the architecture has no fully connected layer, and is nearly symmetrical to the feature extraction part due to the use of similar feature maps. This results in a u-shaped architecture (see Figure 6), hence the name of the model. While originally developed for biomedical images, the U-Net architecture has proven successful for image segmentation in other domains as well. Here, we somewhat modify the U-Net architecture according to the MobileNets [80] framework to improve its efficiency. In particular, the MobileNets framework uses Depthwise Separable Convolutions, a form which factorizes standard convolutions (e.g., 3 × 3) into a depthwise convolution (applied separately to each input band) and a pointwise (1 × 1) convolution to combine the outputs of the depthwise convolution.
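As a minimal illustration of this factorization (a sketch assuming TensorFlow/Keras; the channel count is an arbitrary example):

import tensorflow as tf

# A standard 3 x 3 convolution producing 64 output channels...
standard = tf.keras.layers.Conv2D(64, kernel_size=3, padding="same")

# ...and its depthwise separable factorization: a per-band 3 x 3 depthwise
# convolution followed by a 1 x 1 pointwise convolution that mixes the bands.
separable = tf.keras.Sequential([
    tf.keras.layers.DepthwiseConv2D(kernel_size=3, padding="same"),
    tf.keras.layers.Conv2D(64, kernel_size=1, padding="same"),
])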

Figure 6: The architecture of U-Net [97].

3.4.4 DeepLab-V3+

DeepLab-V3+ [77] is an improved version of DeepLab-V3 [98], while the latter is an improved version of the original DeepLab [78] model. This segmentation model does not follow the FCN framework like the previously discussed


models. The main features that distinguish the DeepLab model from FCNs are the atrous convolutions for upsampling and the application of probabilistic machine learning models, concretely conditional random fields (CRFs), for a finer localization accuracy in the final fully connected layer. Atrous convolutions, in particular, allow enlarging the context from which the next-layer feature maps are learned, while preserving the number of parameters (and thus the same efficiency). Using a chain of atrous convolutions allows computing the final output layer of a CNN at an arbitrarily high resolution (removing the need for the upsampling part as used in FCNs). In the follow-up work proposing DeepLab-V3, Chen et al. [98] change the approach to atrous convolutions to gradually double the atrous rates, and show that with this adapted version their new algorithm outperforms the previous one even without including the fully connected CRF layer. Finally, in their newest adaptation of the model, called DeepLab-V3+, Chen et al. [77] turn to an approach similar to the FCNs, i.e., they add a decoder module to the architecture (see Figure 7). That is, they employ the features extracted by the DeepLab-V3 module in the encoder part, and add a decoder module consisting of 1 × 1 and 3 × 3 convolutions.
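A minimal sketch of a chain of atrous (dilated) convolutions in TensorFlow/Keras, with the doubling dilation rates mentioned above; the channel counts are arbitrary examples, not the DeepLab configuration:

import tensorflow as tf

# Each layer sees a progressively larger context without extra parameters
# and without reducing the spatial resolution of the feature maps.
atrous_chain = tf.keras.Sequential([
    tf.keras.layers.Conv2D(64, 3, padding="same", dilation_rate=1, activation="relu"),
    tf.keras.layers.Conv2D(64, 3, padding="same", dilation_rate=2, activation="relu"),
    tf.keras.layers.Conv2D(64, 3, padding="same", dilation_rate=4, activation="relu"),
])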

Figure 7: The architecture of DeepLabV3+ [77].

3.4.5 FRRN-B (Full-Resolution Residual Networks)

As we have seen, most of the semantic segmentation architectures are based on some form of an FCN, and so they utilize existing classification networks, such as ResNet or VGG16, as encoders. We also discussed the main reason for such


Figure 8: The architecture of FRRN-B. RU n and FRRU n stand for residual units and full-resolution residual units with n-channel convolutions, respectively. FRRUs simultaneously operate on the two streams [81].

approaches, which is to take advantage of the learned weights from those architectures, pretrained for the classification task. Nevertheless, one disadvantage of the FCN approach is that the resulting network outputs of the encoder part (particularly after the pooling operations) are at a lower resolution, which deteriorates the localization performance of the overall segmentation model. Pohlen et al. [81] proposed to tackle this by having two parallel network streams processing the input image: a pooling and a residual stream (Figure 8). As the name says, the pooling stream performs successive pooling and then unpooling operations, and it serves to obtain good recognition of the objects and classes. The residual stream computes residuals at the full image resolution, which enables low-level features, i.e., object pixel-level locations, to be propagated to the network output. The name of the model comes from its building blocks, i.e., full-resolution residual units. Each such unit simultaneously operates on the pooling and the residual stream. In the original paper [81], the authors propose two alternative architectures, FRRN-A and FRRN-B, and they show that


FRRN-B achieves superior performance on the Cityscapes benchmark dataset. Hence, we employ the FRRN-B architecture.

3.4.6 PSPNet (Pyramid Scene Parsing Network)

Figure 9: The architecture of PSPNet [75].

Zhao et al. [75] propose the Pyramid Scene Parsing Network as a solution to the challenge of making local predictions based on the local context only, without considering the global image scene. In remote sensing, an example of this challenge could be a model wrongly predicting water with waves present in it as the dry vegetation class, because the two appear similar and the model did not consider that these pixels are part of a larger water surface, i.e., it missed the global context. In similarity to the other FCN-based approaches, PSPNet uses a pre-trained classification architecture to extract the feature map, in this case ResNet. The main module of this network is the pyramid pooling, which is enclosed by a square in Figure 9. As can be seen in the figure, this module fuses features at four scales, from the coarse (red) to the fine (green). Hence, the output of each level in the pyramid pooling module contains the feature map of a different resolution. In the end, the different features are stacked together, yielding the final pyramid pooling global feature for predictions.

3.4.7 FC-DenseNet (Fully Convolutional DenseNets)

This semantic segmentation algorithm is built using the DenseNet CNN [88] as a basis for the encoder, followed by applying the FCN approach [82]. The specificity of the DenseNet architecture is the presence of blocks where each layer is connected to all other layers in a feed-forward manner. Figure 10 shows the architecture of FC-DenseNet, where the blocks are represented by the Dense Block units. According to [88], such an architecture scales well to hundreds of layers without any optimization issues, while yielding excellent results in classification tasks. In order to efficiently upsample the DenseNet feature maps, Jégou et al. [82] substitute the upsampling convolutions of FCNs with Dense Blocks and Transitions Up. The Transition Up modules consist of transposed convolutions, which are then concatenated with the outputs from the input skip connection (the dashed lines in Figure 10).


Figure 10: The architecture of FC-DenseNet [82].

3.5 Training approach

To accomplish better segmentation performance, there is an option to pre-train the semantic segmentation models (in particular, their encoder modules) using a larger set of available images of another type (such as natural images). By using a model pre-trained with natural images and continuing training with the limited set of SAR images, the knowledge becomes effectively transferred from the natural to the SAR task [99]. To accomplish such a transfer, we used the models whose encoders were pre-trained on the ImageNet classification task, and fine-tuned them using our SAR dataset (described next).
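As an illustration of such a transfer (a hedged sketch assuming TensorFlow/Keras, not the exact code of the implementation described in Section 3.6.4):

import tensorflow as tf

# An ImageNet-pretrained backbone (here ResNet101, the base of several of the
# tested models) instantiated as an encoder without its classification head;
# its weights are then fine-tuned together with the decoder on the SAR data.
encoder = tf.keras.applications.ResNet101(
    include_top=False,
    weights="imagenet",
    input_shape=(512, 512, 3),
)
encoder.trainable = True  # fine-tune rather than freeze the transferred weights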

3.6 Experimental Setup

In this section, we first describe how we prepared the SAR images for training with the deep learning models, which are designed for natural images. Then we provide the details of our implementation.

3.6.1 SAR Data Preprocessing for Deep Learning

Sentinel-1 imagery comes in two polarization channels, each of them being informative about certain types of land cover. Hence, using their combination is expected to yield better land cover mapping results than using either of them independently. Moreover, previous work has shown the benefits of also using


the DEM model for land cover mapping [9]. Hence, as the third layer, we used the DEM of Finland from the National Land Survey.

SAR backscatter for both polarizations was converted to decibels by applying the 10 · log10 transformation. In addition, for the deep learning models, each band should be normalized so that the distribution of the pixel values resembles a Gaussian distribution centered at zero. This is done to yield a faster convergence during training. To normalize the data, each pixel value is subtracted by the mean of all pixels and then divided by their standard deviation. In addition, given that the semantic segmentation models expect pixel values in the range (0, 255), we scaled the normalized data, and also the DEM values, to this range. Such preprocessed layers are then used to create the image dataset for training.

We named the created dataset SAR RGB-DEM. The naming comes from the process used to create the images in this dataset: namely, one of the two channels of a Sentinel-1 image is assigned to the R and the other to the G channel. For the third, B channel, we use the DEM layer.
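A compact sketch of this preprocessing chain, assuming NumPy arrays; the variable names and the random placeholder inputs are illustrative only:

import numpy as np

def to_db(band, eps=1e-10):
    # Convert linear SAR backscatter to decibels.
    return 10.0 * np.log10(band + eps)

def normalize(band):
    # Zero-centered, unit-variance normalization, then scaling to (0, 255)
    # as expected by the pre-trained segmentation models.
    z = (band - band.mean()) / band.std()
    return 255.0 * (z - z.min()) / (z.max() - z.min())

# vv, vh and dem are assumed to be co-registered 2-D arrays of equal size.
vv = np.random.rand(512, 512)   # placeholder for the first polarization channel
vh = np.random.rand(512, 512)   # placeholder for the second polarization channel
dem = np.random.rand(512, 512)  # placeholder for the elevation layer

# One SAR channel goes to R, the other to G, and the DEM to B.
sar_rgb_dem = np.dstack([normalize(to_db(vv)), normalize(to_db(vh)), normalize(dem)])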

3.6.2 Train/Development and Test (Accuracy Assessment) Dataset

The images from the SAR RGB-DEM dataset needed to be split into 512 px × 512 px partial images (further in the text called imagelets) for training. Thus, each imagelet represented an area of roughly 10 × 10 km². The first reason for this preprocessing concerns the square shape: some of the selected models required square-shaped images. Some other models were flexible with the image shape and size, but we wanted to make the setups for all the models the same so that their results are comparable. The second reason for the preprocessing concerns the computational capacity: with our hardware setup (described below), this was the largest image size that we could work with.
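A sketch of this splitting step, assuming a NumPy array holding one preprocessed scene; the function name is an illustrative placeholder:

import numpy as np

def to_imagelets(scene, tile=512):
    # Split a preprocessed scene (H x W x 3) into non-overlapping
    # tile x tile imagelets, dropping incomplete tiles at the borders.
    rows, cols = scene.shape[0] // tile, scene.shape[1] // tile
    return [scene[i * tile:(i + 1) * tile, j * tile:(j + 1) * tile]
            for i in range(rows) for j in range(cols)]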

Upon splitting the SAR RGB-DEM images, we discarded those imagelets that were completely outside the land mass area, as well as those for which we did not have a complete CORINE label (such as if they fell in part outside the Finnish borders). This resulted in more than 7K imagelets of size 512 px × 512 px.

Given the geography of Finland, to have representative training data it seems useful to include imagelets from both northern and southern (including the large cities) parts of the country in the model training. On the other hand, some noticeable differences are found also in the gradient from east to west of the country. To achieve a representative training dataset, we selected all imagelets between the longitudes of 24° and 28° for the accuracy assessment (model testing), and all the other imagelets for the model training (that is, training and development in the computer vision terminology). In this way, we prevented the situation in which two images of the same area, but acquired at different times, are used one for training and the other one for testing. Images that were overlapping any border of the introduced strip were discarded. The procedure resulted in 3104 images in the training and development set and 3784 images in the test (accuracy assessment) set. Finally, we used 60% of the training and development set for training and the rest for development of the deep learning models.


Table 3: The properties of the examined semantic segmentation architectures.

Architecture | Base model | Parameters
BiSeNet | ResNet101 | 24.75M
SegNet | VGG16 | 34.97M
Mobile U-Net | Not applicable | 8.87M
DeepLabV3+ | ResNet101 | 47.96M
FRRN-B | ResNet101 | 24.75M
PSPNet | ResNet101 | 56M
FC-DenseNet | ResNet101 | 9.27M

3.6.3 Data Augmentation

Further, we have employed the data augmentation technique. The main idea behind data augmentation is to enable improved learning by reusing original images with slight transformations, such as rotation, flipping, adding Gaussian noise, or slightly changing the brightness. This provides additional information to the model, and the dataset size is effectively increased. Moreover, an additional benefit of data augmentation is in helping the model to learn some invariant data properties for which no examples are present in the original dataset. Given the sensitivity of the SAR backscatter, we did not want to augment the images in terms of the color, brightness, or by adding noise. However, we could safely employ rotations and flipping. For rotations, we only used the 90° increments, giving three possible rotated versions of an image. For image flipping, we applied horizontal and vertical flipping, or both at the same time, giving another three possible versions of the original image.4 Notice that our images are square, so the transformations did not change the image dimensions. Finally, we applied online augmentation, as opposed to the offline version. In the online process, each augmented image is seen only once, and so this process yields a network that generalises better.
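A sketch of these six transformations, assuming NumPy arrays (in practice, the same transformation must be applied to an imagelet and its label mask):

import numpy as np

def augmentations(image):
    # The six label-preserving transformations used here: three 90-degree
    # rotations plus horizontal, vertical, and combined flips.
    yield np.rot90(image, 1)
    yield np.rot90(image, 2)
    yield np.rot90(image, 3)
    yield np.flip(image, axis=1)       # horizontal flip
    yield np.flip(image, axis=0)       # vertical flip
    yield np.flip(image, axis=(0, 1))  # both flips at once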

3.6.4 Implementation

To apply the described semantic segmentation models, we adapted the open-source Semantic Segmentation Suite. We used Python with the TensorFlow [100] backend.

3.6.5 Hardware and Training Setup

We trained and tested each of the deep learning models separately on a single GPU (NVIDIA GeForce GTX 1080) on a machine with 32 GB of RAM.

We used the RMSProp optimisation algorithm, a learning rate of 0.0001, and a decay of the learning rate of 0.9954. Each model was trained for an equal number of epochs (500), and during the process the checkpoint for the best

4 Vertical flip operation switches between the top-left and bottom-left image origin (reflection along the central horizontal axis), and horizontal flip switches between the top-left and top-right image origin (reflection along the central vertical axis).


model was saved. Then we used that model for evaluation on the test set, and we report those results.
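For reference, a hedged sketch of this optimizer configuration in TensorFlow/Keras; whether the decay was applied per epoch or per step is an assumption here, so STEPS_PER_EPOCH is an illustrative placeholder:

import tensorflow as tf

STEPS_PER_EPOCH = 100  # placeholder: set to the actual number of batches per epoch

# RMSProp with an initial learning rate of 0.0001 and an exponential decay
# factor of 0.9954, applied here once per epoch.
schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-4,
    decay_steps=STEPS_PER_EPOCH,
    decay_rate=0.9954,
    staircase=True,
)
optimizer = tf.keras.optimizers.RMSprop(learning_rate=schedule)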

3.7 Evaluation Metrics

In their review of the metrics used in land cover classification, Costa et al. [101] found a lack of consistency, complicating the intercomparison of different studies. To avoid such issues and ensure that our results are easily comparable with the literature, we thoroughly evaluated our models. For each model and class, we report the following measures of accuracy: precision, also known as producer's accuracy (PA); recall, also known as user's accuracy (UA); overall accuracy; and the Kappa coefficient. The formulas are as follows.

For each segmentation class (land cover type) c, we calculate precision (producer's accuracy)

$$P_c = \frac{T_{p_c}}{T_{p_c} + F_{p_c}}$$

and recall (user's accuracy)

$$R_c = \frac{T_{p_c}}{T_{p_c} + F_{n_c}},$$

where $T_{p_c}$ represents true positive, $F_{p_c}$ false positive, and $F_{n_c}$ false negative pixels for the class c.

When it comes to accuracy [102], we calculate per-class accuracy5

$$Acc_c = \frac{C_{ii}}{G_i}$$

and overall pixel accuracy

$$Acc_{OP} = \frac{\sum_{i=1}^{L} C_{ii}}{\sum_{i=1}^{L} G_i},$$

where $C_{ij}$ is the number of pixels having a ground truth label i and being classified/predicted as j, $G_i$ is the total number of pixels labelled with i, and L is the number of classes. All these metrics can take values from 0 to 1.

Finally, we also use the Kappa statistic (Cohen's measure of agreement), indicating how the classification results compare to values assigned by chance [103]. The Kappa statistic can take values from 0 to 1. Starting from a k × k confusion matrix with elements $f_{ij}$, the following calculations are done:

5 Effectively, per-class accuracy is defined as the recall obtained on each class.


$$P_o = \frac{1}{N} \sum_{j=1}^{k} f_{jj}, \qquad (1)$$

$$r_i = \sum_{j=1}^{k} f_{ij} \;\; \forall i \qquad \text{and} \qquad c_j = \sum_{i=1}^{k} f_{ij} \;\; \forall j, \qquad (2)$$

$$P_e = \frac{1}{N^2} \sum_{i=1}^{k} r_i c_i, \qquad (3)$$

where $P_o$ is the observed proportional agreement (effectively the overall accuracy), $r_i$ and $c_j$ are the row and column totals for classes i and j, and $P_e$ is the expected proportion of agreement. The final measure of agreement is given by the statistic [103]

$$\kappa = \frac{P_o - P_e}{1 - P_e}. \qquad (4)$$

Depending on the value of Kappa, the observed agreement is considered as either poor (0.0 to 0.2), fair (0.2 to 0.4), moderate (0.4 to 0.6), good (0.6 to 0.8), or very good (0.8 to 1.0).
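These metrics can be computed directly from a confusion matrix laid out as in Table 5 (rows: reference classes, columns: predictions); a minimal NumPy sketch:

import numpy as np

def evaluation_metrics(f):
    # f: k x k confusion matrix, rows = reference labels, columns = predictions.
    n = f.sum()
    diag = np.diag(f)
    per_row = diag / f.sum(axis=1)  # share of each reference class predicted correctly
    per_col = diag / f.sum(axis=0)  # share of each predicted class that is correct
    p_o = diag.sum() / n            # observed agreement, i.e., the overall accuracy
    p_e = (f.sum(axis=1) * f.sum(axis=0)).sum() / n ** 2  # expected chance agreement
    kappa = (p_o - p_e) / (1.0 - p_e)
    return per_row, per_col, p_o, kappa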

4 Results and Discussion

Using the experimental setup described in the previous section, we evaluated the seven selected semantic segmentation models: SegNet [74], PSPNet [75], BiSeNet [76], DeepLabV3+ [77, 78], U-Net [79, 80], FRRN-B [81], and FC-DenseNet [82]. The overall classification performance statistics for all studied models are gathered in Table 4. Figure 11 shows maps produced for several imagelets with the best performing model, FC-DenseNet. The obtained results are compared to prior work, and classification performance for different land cover classes is discussed further.

4.1 Classification Performance

All the models performed relatively well in terms of classification, achieving an overall accuracy above 83%. Three models performed particularly well, achieving an accuracy score above 89%: SegNet, FRRN-B, and the best performing model, FC-DenseNet, which achieved an accuracy of 90.7%.

Before further analysis, let us recall that CORINE is not exclusively a land cover map, but rather a land cover and land use map; thus, its specific classes can differ from the ecological classes observed by Sentinel-1. Also, the aggregation to Level-1 is sometimes not strictly "ecological", nor does it comply with physical surface scattering considerations. For example, roads, airports, major industrial areas and the road network often exhibit areas similar to field; the presence of trees and green


Table 4: Summary of the classification performance and efficiency of the various deep learning models (UA – user's accuracy, PA – producer's accuracy, reported as UA/PA in %; the average inference time is per image in the dataset; a dash marks a value not recoverable from the source).

LC classes | Test scale (km²) | BiSeNet | DeepLabV3+ | SegNet | FRRN-B | U-Net | PSPNet | FC-DenseNet
Urban fabric (100) | 10816.26 | 21/15 | 14/36 | 31/38 | 30/45 | 25/38 | 18/– | 62/27
Agricultural areas (200) | 25160.49 | 51/50 | 49/69 | 66/68 | 68/66 | 66/53 | 48/– | 72/71
Forested areas (300) | 285462 | 90/91 | 88/96 | 93/94 | 92/95 | 92/95 | 89/95 | 93/96
Peatland, bogs and marshes (400) | 20990.54 | 43/5 | 6/13 | 67/57 | 71/55 | 70/52 | 65/31 | 74/58
Water bodies (500) | 53564 | 85/91 | 94/92 | 96/96 | 95/96 | 96/96 | 94/94 | 96/96
Overall Accuracy (%) | | 83.86 | 85.49 | 89.03 | 89.27 | 89.25 | 86.51 | 90.66
Kappa | | 0.641 | 0.649 | 0.754 | 0.758 | 0.754 | 0.680 | 0.785
Average inference time (s) | | 0.0389 | 0.0267 | 0.0761 | 0.1424 | 0.0848 | 0.0495 | 0.1930


Table 5: Confusion matrix for classification with the FC-DenseNet (FC-DenseNet103) model.

CLC2012 \ Sentinel-1 class | urban | water | forest | field | peatland | total | PA (%)
1 (urban) | 7301999 | 413073 | 15892771 | 3212839 | 221476 | 27042158 | 27.0
2 (water) | 78331 | 128294872 | 3457634 | 171029 | 1935276 | 133937142 | 95.8
3 (forest) | 3663698 | 2703632 | 686788977 | 12795703 | 7730444 | 713682454 | 96.2
4 (field) | 766200 | 121609 | 16527970 | 44866048 | 620934 | 62902761 | 71.3
5 (peatland) | 56097 | 1866020 | 19164137 | 1091008 | 30309189 | 52486451 | 57.8
total | 11866325 | 133399206 | 741831489 | 62136627 | 40817319 | 990050966 |
UA (%) | 61.5 | 96.2 | 92.6 | 72.2 | 74.3 | | 90.7

Figure 11: Illustration of the FC-DenseNet model performance: a selection of classification results, i.e., direct output of the network without any post-processing (bottom row), versus the reference CORINE data (upper row).

vegetation near summer cottages can cause them to exhibit signatures closer to forest than urban; sometimes forest on rocky terrain can be misclassified as urban instead, due to the presence of very bright targets and strong disruptive features, while confusion between peatland and field areas is also common. Finally, the accuracy of the CORINE data itself is only somewhat higher than 90%.

As for the results across the different land classes, all the models performed particularly well in recognising the water bodies and forested areas, while the urban fabric represented the most challenging class for all the models. We expect that the inclusion of the DEM as one layer in the training images has helped to achieve good results on the water bodies class for most of the models (except for BiSeNet, all the models achieved both the user and producer accuracy above 90%). The urban class was particularly challenging for the following main reasons. First, this class changes the most, as new houses, roads, and urban areas are built. While we took the most suitable available CORINE class in terms of time for our Sentinel-1 images, there are almost certain


differences between the urban class as it was in 2012 and in 2015–2016. Second, the CORINE map itself does not have perfect accuracy, nor are the aggregation rules perfect. As a matter of fact, in the majority of studies where SAR-based classification was done versus CLC or similar data, a poor or modest overall agreement was observed for this class [21, 41, 83, 20], while the user's accuracy was strongly higher than the producer's [104]. The latter is exactly due to radar being able to sense sharp boundaries and bright targets very well, whereas such bright targets often do not dominate the whole CORINE Level-1 urban class. We argue that any inaccuracies present will be particularly attenuated in our models for the urban class because of the sharp and sudden boundary changes in this class, unlike for the others, such as forest and water. The top performing model, i.e., FC-DenseNet, performed the best across all the classes. It is particularly notable that it achieved the user accuracy, i.e., precision, for the urban class of 62%, improving on it significantly compared to all the other models. Nevertheless, its score on the producer accuracy, i.e., recall, on this class of 27% is outperformed by the two other top models, i.e., SegNet and FRRN-B.

We mentioned the issues of SAR backscattering sensitivity to several ground factors, so that the same classes might appear differently in the images between countries or between distant areas within a country. An interesting indication of our study, however, is that the deep learning models might be able to deal with this issue. Namely, we used the models pre-trained on ImageNet and fine-tuned them with a relatively small number (14) of Sentinel-1 scenes. The models learned to recognize varying types of the backscattering signal across the country of Finland. This indicates that, with a similar type of fine-tuning, the present models could be relatively easily adapted to other areas and countries with different SAR backscattering patterns. Such robustness and adaptability of the deep learning models comes from their automatic learning of feature representations, without the need for a human pre-defining those features.

4.2 Computational Performance

The training times with our hardware configuration took from 6 days up to 2 weeks for the different models. This could be significantly improved by training each model on a multi-GPU system instead of a single GPU, as we did.

In terms of the inference time, we also saw differences in performance. In Table 4, we present the average inference time per the 512 px × 512 px imagelets that we worked with. The results show that there is a trade-off between classification and computational performance: the best models in terms of classification results (i.e., FC-DenseNet and FRRN-B) take a several times longer inference time compared to the rest. Depending on the application, this might not be of particular importance.

4.3 Comparison to Similar Work

The obtained results compare favourably to previous similar studies on land cover classification with SAR data [83, 20, 21, 41, 28, 31]. Depending on the level of class aggregation (4–5 major classes or more), and using mostly statistical


or classical machine learning approaches, reported classification accuracies were as high as 80–87% and as low as 30% when only SAR imagery was used.

Two recent studies that employed neural networks for SAR imagery classification (albeit in combination with satellite optical data) for land cover mapping were [28] and [66], with reported classification accuracies of up to 97.5% and 94.6%, respectively.

The best model in our experiments achieved an overall accuracy of 90.7%. However, our results are obtained using solely the SAR imagery. In contrast, SAR imagery (PALSAR) alone yielded an overall accuracy of 78.1% in [28]. The types of classes they studied are also different compared to ours (crops versus vegetation versus land cover types), and our study is performed on a larger area. Importantly, the previous studies have applied different types of models (regular NNs versus CNNs versus semantic segmentation). In particular, the CNN models work on 7 × 7 resolution windows, while we have applied more advanced semantic segmentation models, which work on the level of a pixel. Keeping in mind the findings from [28] that the addition of optical images on top of SAR improved the results by over 10%, we expect that our models would perform comparably well or outperform these previous works if applied to combined SAR and optical imagery.

In terms of the deep learning setup, the most similar to ours are the studies [53] and [70]. However, RapidEye optical imagery at 5 m spatial resolution was used in [53], and the test site was considerably smaller. Study [70], similar to our research, relied exclusively on SAR imagery, however on fully polarimetric images, acquired by RADARSAT-2 at a considerably better resolution. They developed an FCN-type semantic segmentation model 'specifically designed for the classification of wetland complexes using PolSAR imagery'. Using this model to classify eight wetland map classes, they achieved an overall accuracy of 93%. However, because their model is designed specifically for wetland complexes, it is not clear if such a model would generalize to other types of areas. Compared to our study, they focused on a considerably smaller area (nearly the size of a single imagelet we used) and on a very specific task (wetland type mapping). Thus, it is not readily clear how general their approach is and how it compares to our presented approach.

4.4 Outlook and Future Work

There are several lines for potential improvement based on the results of this study, as well as future work directions.

First, using an even larger set of Sentinel-1 images can be recommended, since for the supervised deep learning models large amounts of data are crucial. Here, we processed only 6888 imagelets altogether, but deep learning algorithms typically become efficient only once they are trained with hundreds of thousands or millions of images.

Second, if SAR images and reference data of a higher resolution are used, we expect better classification performance too, as smaller details could potentially be captured. Also, better agreement in the acquisition timing of reference and


SAR imagery can be recommended. The reference and training data should come from the same months or year, if possible, and the reference maps should represent the reality as accurately as possible. The models in our experiments were certainly limited by CORINE's own limited accuracy.

Third, in this study we have tested the effectiveness of off-the-shelf deep learning models for land cover mapping from SAR data. While the results show their effectiveness, it is also likely that novel types of models, specifically developed for radar data (such as [70]), will yield even better results. Based on our results, we suggest DenseNet-based models as a starting point. In particular, one could develop deep learning models to handle directly the SLC data, which preserve the phase information.

Focusing on a single season is both an advantage and a limitation. Importantly, we have avoided confusion between SAR signatures that vary seasonally for several land cover classes. However, multitemporal dynamics itself can potentially be used as an additional useful class-discriminating parameter. Incorporating the seasonal dynamics of each land cover pixel (as a time series) is left for future work, perhaps with the additional need to incorporate recurrent neural networks into the approach.

As discussed in Section 3.3, it could be suitable to use more detailed (specific) land cover classes, as the aggregation of smaller LC classes into Level-1 CORINE classes is not exactly ecological, leading to mixing several distinct SAR signatures in one class and thus causing additional confusion for the classifier. The classified specific classes can later be aggregated into larger classes, potentially showing improved performance [19].

Finally, we have used only SAR images and a freely available DEM model for the presented large-scale land cover mapping. If one were to combine other types of remote sensing images, in particular optical images, we expect that the results would improve significantly. This holds for those areas where such imagery can be collected (given the cloud coverage), while in an operational scenario it would potentially require the use of at least two models (with and without optical satellite imagery). It is also important to assess the added value of SAR imagery with deep learning models when optical satellite images are available, as well as possible data fusion and decision fusion scenarios, before a decision on the mapping approach is made [19].

5 Conclusion

Our study demonstrated the potential for applying state-of-the-art semantic segmentation models to SAR image classification with high accuracy. Several models were benchmarked in a countrywide classification experiment using Sentinel-1 IW-mode SAR data, reaching nearly 91% overall classification accuracy with the best performing model (FC-DenseNet). Given that the 14 used Sentinel-1 scenes resulted in about 7000 training images, this indicates strong potential for using pre-trained CNNs for further fine-tuning, and seems particularly suitable when the number of training images is limited (to thousands or tens of thousands instead of millions). In addition to suggesting the best candidate semantic segmentation models for land cover mapping with SAR data (that is, the DenseNet-based models), our study offers baseline results against which newly proposed models should be evaluated. Several possible improvements for future work were identified, including the necessity of testing multitemporal approaches, data fusion and very high-resolution SAR imagery, as well as developing models specifically for SAR; these will be addressed in future work.

References

[1] S. Bojinski, M. Verstraete, T. C. Peterson, C. Richter, A. Simmons, M. Zemp, The concept of essential climate variables in support of climate research, applications and policy, Bulletin of the American Meteorological Society 95 (9) (2014) 1431–1443.

[2] G. Büttner, J. Feranec, G. Jaffrain, L. Mari, G. Maucha, T. Soukup, The CORINE land cover 2000 project, EARSeL eProceedings 3 (3) (2004) 331–346.

[3] M. Bossard, J. Feranec, J. Otahel, et al., CORINE land cover technical guide: Addendum 2000 (2000).

[4] G. Büttner, CORINE land cover and land cover change products, in: Land Use and Land Cover Mapping in Europe, Springer, 2014, pp. 55–74.

[5] M. Törmä, T. Markkanen, S. Hatunen, P. Härmä, O.-P. Mattila, A. Arslan, Assessment of land-cover data for land-surface modelling in regional climate studies, Boreal Environment Research 20 (2) (2015) 243–260.

[6] J. Chen, J. Chen, A. Liao, X. Cao, L. Chen, X. Chen, C. He, G. Han, S. Peng, M. Lu, et al., Global land cover mapping at 30 m resolution: A POK-based operational approach, ISPRS Journal of Photogrammetry and Remote Sensing 103 (2015) 7–27.

[7] C. A. d. Almeida, A. C. Coutinho, J. C. D. M. Esquerdo, M. Adami, A. Venturieri, C. G. Diniz, N. Dessay, L. Durieux, A. R. Gomes, High spatial resolution land use and land cover mapping of the Brazilian Legal Amazon in 2008 using Landsat-5/TM and MODIS data, Acta Amazonica 46 (3) (2016) 291–302.

[8] C. Homer, J. Dewitz, L. Yang, S. Jin, P. Danielson, G. Xian, J. Coulston, N. Herold, J. Wickham, K. Megown, Completion of the 2011 national land cover database for the conterminous United States – representing a decade of land cover change information, Photogrammetric Engineering & Remote Sensing 81 (5) (2015) 345–354.

[9] Y. Zhao, D. Feng, L. Yu, X. Wang, Y. Chen, Y. Bai, H. J. Hernández, M. Galleguillos, C. Estades, G. S. Biging, et al., Detailed dynamic land cover mapping of Chile: Accuracy improvement by integrating multi-temporal data, Remote Sensing of Environment 183 (2016) 170–185.

[10] P. Griffiths, C. Nendel, P. Hostert, Intra-annual reflectance composites from Sentinel-2 and Landsat for national-scale crop and land cover mapping, Remote Sensing of Environment 220 (2019) 135–151.

[11] R. Torres, P. Snoeij, D. Geudtner, D. Bibby, M. Davidson, E. Attema, P. Potin, B. Rommen, N. Floury, M. Brown, I. Traver, P. Deghaye, B. Duesmann, B. Rosich, N. Miranda, C. Bruno, M. L'Abbate, R. Croci, A. Pietropaolo, M. Huchler, F. Rostan, GMES Sentinel-1 mission, Remote Sensing of Environment 120 (2012) 9–24. doi:10.1016/j.rse.2011.05.028.

[12] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE 86 (11) (1998) 2278–2324.

[13] A. Krizhevsky, I. Sutskever, G. E. Hinton, ImageNet classification with deep convolutional neural networks, in: Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.

[14] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, CoRR abs/1409.1556 (2014).

[15] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning (2016).

[16] W. Cohen, S. Goward, Landsat's role in ecological applications of remote sensing, BioScience 54 (6) (2004) 535–545. doi:10.1641/0006-3568(2004)054[0535:LRIEAO]2.0.CO;2.

[17] S. Goetz, A. Baccini, N. Laporte, T. Johns, W. Walker, J. Kellndorfer, R. Houghton, M. Sun, Mapping and monitoring carbon stocks with satellite observations: A comparison of methods, Carbon Balance and Management 4 (2009). doi:10.1186/1750-0680-4-2.

[18] C. Atzberger, Advances in remote sensing of agriculture: Context description, existing operational monitoring systems and major information needs, Remote Sensing 5 (2) (2013) 949–981. doi:10.3390/rs5020949.

[19] T. Häme, J. Kilpi, H. A. Ahola, Y. Rauste, O. Antropov, M. Rautiainen, L. Sirro, S. Bounpone, Improved mapping of tropical forests with optical and SAR imagery, part I: Forest cover and accuracy assessment using multi-resolution data, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 6 (1) (2013) 74–91.

[20] O. Antropov, Y. Rauste, H. Astola, J. Praks, T. Häme, M. Hallikainen, Land cover and soil type mapping from spaceborne PolSAR data at L-band with probabilistic neural network, IEEE Transactions on Geoscience and Remote Sensing 52 (9) (2014) 5256–5270. doi:10.1109/TGRS.2013.2287712.

[21] A. Lönnqvist, Y. Rauste, M. Molinier, T. Häme, Polarimetric SAR data in land cover mapping in boreal zone, IEEE Transactions on Geoscience and Remote Sensing 48 (10) (2010) 3652–3662. doi:10.1109/TGRS.2010.2048115.

[22] B. Waske, M. Braun, Classifier ensembles for land cover mapping using multitemporal SAR imagery, ISPRS Journal of Photogrammetry and Remote Sensing 64 (5) (2009) 450–457. doi:10.1016/j.isprsjprs.2009.01.003.

[23] L. Bruzzone, M. Marconcini, U. Wegmüller, A. Wiesmann, An advanced system for the automatic classification of multitemporal SAR images, IEEE Transactions on Geoscience and Remote Sensing 42 (6) (2004) 1321–1334. doi:10.1109/TGRS.2004.826821.

[24] T. Ullmann, A. Schmitt, A. Roth, J. Duffe, S. Dech, H.-W. Hubberten, R. Baumhauer, Land cover characterization and classification of arctic tundra environments by means of polarized synthetic aperture X- and C-band radar (PolSAR) and Landsat 8 multispectral imagery – Richards Island, Canada, Remote Sensing 6 (9) (2014) 8565–8593. doi:10.3390/rs6098565. URL http://www.mdpi.com/2072-4292/6/9/8565

[25] N. Clerici, C. A. V. Calderón, J. M. Posada, Fusion of Sentinel-1A and Sentinel-2A data for land cover mapping: a case study in the lower Magdalena region, Colombia, Journal of Maps 13 (2) (2017) 718–726. doi:10.1080/17445647.2017.1372316.

[26] C. Castañeda, D. Ducrot, Land cover mapping of wetland areas in an agricultural landscape using SAR and Landsat imagery, Journal of Environmental Management 90 (7) (2009) 2270–2277.

[27] Y. Ban, H. Hu, I. M. Rangel, Fusion of QuickBird MS and RADARSAT SAR data for urban land-cover mapping: Object-based and knowledge-based approach, International Journal of Remote Sensing 31 (6) (2010) 1391–1410.

[28] G. V. Laurin, V. Liesenberg, Q. Chen, L. Guerriero, F. Del Frate, A. Bartolini, D. Coomes, B. Wilebore, J. Lindsell, R. Valentini, Optical and SAR sensor synergies for forest and land cover mapping in a tropical site in West Africa, International Journal of Applied Earth Observation and Geoinformation 21 (2013) 7–16.

[29] R. Khatami, G. Mountrakis, S. V. Stehman, A meta-analysis of remote sensing research on supervised pixel-based land-cover image classification processes: General guidelines for practitioners and future research, Remote Sensing of Environment 177 (2016) 89–100.

[30] B. Waske, M. Braun, Classifier ensembles for land cover mapping using multitemporal SAR imagery, ISPRS Journal of Photogrammetry and Remote Sensing 64 (5) (2009) 450–457.

[31] H. Balzter, B. Cole, C. Thiel, C. Schmullius, Mapping CORINE land cover from Sentinel-1A SAR and SRTM digital elevation model data using random forests, Remote Sensing 7 (11) (2015) 14876–14898. doi:10.3390/rs71114876.

[32] S.-E. Park, Variations of microwave scattering properties by seasonal freeze/thaw transition in the permafrost active layer observed by ALOS PALSAR polarimetric data, Remote Sensing 7 (12) (2015) 17135–17148. doi:10.3390/rs71215874.

[33] M. C. Dobson, L. E. Pierce, F. T. Ulaby, Knowledge-based land-cover classification using ERS-1/JERS-1 SAR composites, IEEE Transactions on Geoscience and Remote Sensing 34 (1) (1996) 83–99. doi:10.1109/36.481896.

[34] L. Sirro, T. Häme, Y. Rauste, J. Kilpi, J. Hämäläinen, K. Gunia, B. de Jong, F. Paz Pellat, Potential of different optical and SAR data in forest and land cover classification to support REDD+ MRV, Remote Sensing 10 (6) (2018). doi:10.3390/rs10060942.

[35] J. D. T. De Alban, G. M. Connette, P. Oswald, E. L. Webb, Combined Landsat and L-band SAR data improves land cover classification and change detection in dynamic tropical landscapes, Remote Sensing 10 (2) (2018). doi:10.3390/rs10020306.

[36] N. Longepe, P. Rakwatin, O. Isoguchi, M. Shimada, Y. Uryu, K. Yulianto, Assessment of ALOS PALSAR 50 m orthorectified FBD data for regional land cover classification by support vector machines, IEEE Transactions on Geoscience and Remote Sensing 49 (6) (2011) 2135–2150. doi:10.1109/TGRS.2010.2102041.

[37] T. Esch, A. Schenk, T. Ullmann, M. Thiel, A. Roth, S. Dech, Characterization of land cover types in TerraSAR-X images by combined analysis of speckle statistics and intensity information, IEEE Transactions on Geoscience and Remote Sensing 49 (6) (2011) 1911–1925. doi:10.1109/TGRS.2010.2091644.

[38] J. W. Cable, J. M. Kovacs, J. Shang, X. Jiao, Multi-temporal polarimetric RADARSAT-2 for land cover monitoring in northeastern Ontario, Canada, Remote Sensing 6 (3) (2014) 2372–2392. doi:10.3390/rs6032372. URL http://www.mdpi.com/2072-4292/6/3/2372

[39] X. Niu, Y. Ban, Multi-temporal RADARSAT-2 polarimetric SAR data for urban land-cover classification using an object-based support vector machine and a rule-based approach, International Journal of Remote Sensing 34 (1) (2013) 1–26. doi:10.1080/01431161.2012.700133.

[40] T. L. Evans, M. Costa, K. Telmer, T. S. F. Silva, Using ALOS/PALSAR and RADARSAT-2 to map land cover and seasonal inundation in the Brazilian Pantanal, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 3 (4) (2010) 560–575. doi:10.1109/JSTARS.2010.2089042.

[41] P. Lumsdon, S. R. Cloude, G. Wright, Polarimetric classification of land cover for Glen Affric radar project, IEE Proceedings - Radar, Sonar and Navigation 152 (6) (2005) 404–412. doi:10.1049/ip-rsn:20041313.

[42] C. da Costa Freitas, L. de Souza Soler, S. J. S. Sant'Anna, L. V. Dutra, J. R. Dos Santos, J. C. Mura, A. H. Correia, Land use and land cover mapping in the Brazilian Amazon using polarimetric airborne P-band SAR data, IEEE Transactions on Geoscience and Remote Sensing 46 (10) (2008) 2956–2970.

[43] G. Li, D. Lu, E. Moran, L. Dutra, M. Batistella, A comparative analysis of ALOS PALSAR L-band and RADARSAT-2 C-band data for land-cover classification in a tropical moist region, ISPRS Journal of Photogrammetry and Remote Sensing 70 (2012) 26–38. doi:10.1016/j.isprsjprs.2012.03.010.

[44] N. Park, K. Chi, Integration of multitemporal/polarization C-band SAR data sets for land-cover classification, International Journal of Remote Sensing 29 (16) (2008) 4667–4688. doi:10.1080/01431160801947341.

[45] E. Tomppo, O. Antropov, J. Praks, Cropland classification using Sentinel-1 time series: Methodological performance and prediction uncertainty assessment, Remote Sensing 11 (21) (2019). doi:10.3390/rs11212480.

[46] D. B. Nguyen, A. Gruber, W. Wagner, Mapping rice extent and cropping scheme in the Mekong Delta using Sentinel-1A data, Remote Sensing Letters 7 (12) (2016) 1209–1218.

[47] A. Veloso, S. Mermoz, A. Bouvet, T. Le Toan, M. Planells, J.-F. Dejoux, E. Ceschia, Understanding the temporal behavior of crops using Sentinel-1 and Sentinel-2-like data for agricultural applications, Remote Sensing of Environment 199 (2017) 415–426.

[48] G. Satalino, A. Balenzano, F. Mattia, M. W. Davidson, C-band SAR data for mapping crops dominated by surface or volume scattering, IEEE Geoscience and Remote Sensing Letters 11 (2) (2013) 384–388.

[49] F. Vicente-Guijalba, A. Jacob, J. M. Lopez-Sanchez, C. Lopez-Martinez, J. Duro, C. Notarnicola, D. Ziolkowski, A. Mestre-Quereda, E. Pottier, J. J. Mallorquí, M. Lavalle, M. Engdahl, Sincohmap: Land-cover and vegetation mapping using multi-temporal Sentinel-1 interferometric coherence, in: IGARSS 2018 - 2018 IEEE International Geoscience and Remote Sensing Symposium, 2018, pp. 6631–6634. doi:10.1109/IGARSS.2018.8517926.

[50] S. Ge, O. Antropov, W. Su, H. Gu, J. Praks, Deep recurrent neural networks for land-cover classification using Sentinel-1 InSAR time series, in: IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium, 2019, pp. 473–476. doi:10.1109/IGARSS.2019.8900088.

[51] Y. LeCun, Y. Bengio, et al., Convolutional networks for images, speech, and time series, The Handbook of Brain Theory and Neural Networks 3361 (10) (1995) 1995.

[52] X. X. Zhu, D. Tuia, L. Mou, G.-S. Xia, L. Zhang, F. Xu, F. Fraundorfer, Deep learning in remote sensing: a review, arXiv preprint arXiv:1710.03959 (2017).

[53] M. Mahdianpari, B. Salehi, M. Rezaee, F. Mohammadimanesh, Y. Zhang, Very deep convolutional neural networks for complex land cover mapping using multispectral remote sensing imagery, Remote Sensing 10 (7) (2018) 1119.

[54] L. Zhang, L. Zhang, B. Du, Deep learning for remote sensing data: A technical tutorial on the state of the art, IEEE Geoscience and Remote Sensing Magazine 4 (2) (2016) 22–40.

[55] J. Zhang, P. Zhong, Y. Chen, S. Li, ℓ1/2-regularized deconvolution network for the representation and restoration of optical remote sensing images, IEEE Transactions on Geoscience and Remote Sensing 52 (5) (2014) 2617–2627.

[56] X. Chen, S. Xiang, C.-L. Liu, C.-H. Pan, Aircraft detection by deep belief nets, in: Pattern Recognition (ACPR), 2013 2nd IAPR Asian Conference on, IEEE, 2013, pp. 54–58.

[57] X. Chen, S. Xiang, C.-L. Liu, C.-H. Pan, Vehicle detection in satellite images by hybrid deep convolutional neural networks, IEEE Geoscience and Remote Sensing Letters 11 (10) (2014) 1797–1801.

[58] Y. Liu, G. Cao, Q. Sun, M. Siegel, Hyperspectral classification via deep networks and superpixel segmentation, International Journal of Remote Sensing 36 (13) (2015) 3459–3482. doi:10.1080/01431161.2015.1055607.

[59] J. Wang, Q. Qin, Z. Li, X. Ye, J. Wang, X. Yang, X. Qin, Deep hierarchical representation and segmentation of high resolution remote sensing images, in: Geoscience and Remote Sensing Symposium (IGARSS), 2015 IEEE International, IEEE, 2015, pp. 4320–4323.

[60] D. Tuia, R. Flamary, N. Courty, Multiclass feature learning for hyperspectral image classification: Sparse and hierarchical solutions, ISPRS Journal of Photogrammetry and Remote Sensing 105 (2015) 272–285.

[61] F. Hu, G.-S. Xia, J. Hu, L. Zhang, Transferring deep convolutional neural networks for the scene classification of high-resolution remote sensing imagery, Remote Sensing 7 (11) (2015) 14680–14707.

[62] O. A. B. Penatti, K. Nogueira, J. A. dos Santos, Do deep features generalize from everyday objects to remote sensing and aerial scenes domains?, in: 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2015, pp. 44–51. doi:10.1109/CVPRW.2015.7301382.

[63] F. P. Luus, B. P. Salmon, F. Van den Bergh, B. T. J. Maharaj, Multiview deep learning for land-use classification, IEEE Geoscience and Remote Sensing Letters 12 (12) (2015) 2448–2452.

[64] F. Zhang, B. Du, L. Zhang, Scene classification via a gradient boosting random convolutional network framework, IEEE Transactions on Geoscience and Remote Sensing 54 (3) (2016) 1793–1802.

[65] T. Ishii, R. Nakamura, H. Nakada, Y. Mochizuki, H. Ishikawa, Surface object recognition with CNN and SVM in Landsat 8 images, in: 2015 14th IAPR International Conference on Machine Vision Applications (MVA), 2015, pp. 341–344. doi:10.1109/MVA.2015.7153200.

[66] N. Kussul, M. Lavreniuk, S. Skakun, A. Shelestov, Deep learning classification of land cover and crop types using remote sensing data, IEEE Geoscience and Remote Sensing Letters 14 (5) (2017) 778–782.

[67] Y. Chen, Z. Lin, X. Zhao, G. Wang, Y. Gu, Deep learning-based classification of hyperspectral data, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 7 (6) (2014) 2094–2107.

[68] G. Wu, X. Shao, Z. Guo, Q. Chen, W. Yuan, X. Shi, Y. Xu, R. Shibasaki, Automatic building segmentation of aerial imagery using multi-constraint fully convolutional networks, Remote Sensing 10 (3) (2018) 407.

[69] Y. Duan, F. Liu, L. Jiao, P. Zhao, L. Zhang, SAR image segmentation based on convolutional-wavelet neural network and Markov random field, Pattern Recognition 64 (2017) 255–267.

[70] F. Mohammadimanesh, B. Salehi, M. Mahdianpari, E. Gill, M. Molinier, A new fully convolutional neural network for semantic segmentation of polarimetric SAR imagery in complex land cover ecosystem, ISPRS Journal of Photogrammetry and Remote Sensing 151 (2019) 223–236.

[71] L. Wang, X. Xu, H. Dong, R. Gui, F. Pu, Multi-pixel simultaneous classification of PolSAR image using convolutional neural networks, Sensors 18 (3) (2018) 769.

[72] M. Ahishali, S. Kiranyaz, T. Ince, M. Gabbouj, Dual and single polarized SAR image classification using compact convolutional neural networks, Remote Sensing 11 (11) (2019) 1340.

[73] Z. Li, Z. Yang, H. Xiong, Homogeneous region segmentation for SAR images based on two steps segmentation algorithm, in: Computers, Communications and Systems (ICCCS), International Conference on, IEEE, 2015, pp. 196–200.

[74] V. Badrinarayanan, A. Kendall, R. Cipolla, SegNet: A deep convolutional encoder-decoder architecture for image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence 39 (12) (2017) 2481–2495.

[75] H. Zhao, J. Shi, X. Qi, X. Wang, J. Jia, Pyramid scene parsing network, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2881–2890.

[76] C. Yu, J. Wang, C. Peng, C. Gao, G. Yu, N. Sang, BiSeNet: Bilateral segmentation network for real-time semantic segmentation, in: Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 325–341.

[77] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, H. Adam, Encoder-decoder with atrous separable convolution for semantic image segmentation, arXiv preprint arXiv:1802.02611 (2018).

[78] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, A. L. Yuille, DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs, IEEE Transactions on Pattern Analysis and Machine Intelligence 40 (4) (2018) 834–848.

[79] O. Ronneberger, P. Fischer, T. Brox, U-Net: Convolutional networks for biomedical image segmentation, in: Medical Image Computing and Computer-Assisted Intervention (MICCAI), Vol. 9351 of LNCS, Springer, 2015, pp. 234–241, (available on arXiv:1505.04597 [cs.CV]). URL http://lmb.informatik.uni-freiburg.de/Publications/2015/RFB15a

[80] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, H. Adam, MobileNets: Efficient convolutional neural networks for mobile vision applications, arXiv preprint arXiv:1704.04861 (2017).

[81] T. Pohlen, A. Hermans, M. Mathias, B. Leibe, Full-resolution residual networks for semantic segmentation in street scenes, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4151–4160.

[82] S. Jégou, M. Drozdzal, D. Vazquez, A. Romero, Y. Bengio, The one hundred layers tiramisu: Fully convolutional DenseNets for semantic segmentation, in: Computer Vision and Pattern Recognition Workshops (CVPRW), 2017 IEEE Conference on, IEEE, 2017, pp. 1175–1183.

[83] O. Antropov, Y. Rauste, A. Lönnqvist, T. Häme, PolSAR mosaic normalization for improved land-cover mapping, IEEE Geoscience and Remote Sensing Letters 9 (6) (2012) 1074–1078.

[84] J. Long, E. Shelhamer, T. Darrell, Fully convolutional networks for semantic segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431–3440.

[85] D. H. Hubel, T. N. Wiesel, Receptive fields, binocular interaction and functional architecture in the cat's visual cortex, The Journal of Physiology 160 (1) (1962) 106–154.

[86] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al., ImageNet large scale visual recognition challenge, International Journal of Computer Vision 115 (3) (2015) 211–252.

[87] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.

[88] G. Huang, Z. Liu, L. Van Der Maaten, K. Q. Weinberger, Densely connected convolutional networks, in: CVPR, Vol. 1, 2017, p. 3.

[89] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, Going deeper with convolutions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1–9.

[90] S. Ji, W. Xu, M. Yang, K. Yu, 3D convolutional neural networks for human action recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (1) (2013) 221–231.

[91] T. N. Sainath, A.-r. Mohamed, B. Kingsbury, B. Ramabhadran, Deep convolutional neural networks for LVCSR, in: Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, IEEE, 2013, pp. 8614–8618.

[92] D. Small, L. Zuberbühler, A. Schubert, E. Meier, Terrain-flattened gamma nought Radarsat-2 backscatter, Canadian Journal of Remote Sensing 37 (5) (2012) 493–499.

[93] P. Härmä, R. Teiniranta, M. Törmä, R. Repo, E. Järvenpää, M. Kallio, The production of Finnish CORINE land cover 2000 classification, XXth ISPRS Congress, Istanbul, Turkey (2004).

[94] P. Härmä, R. Teiniranta, M. Törmä, R. Repo, E. Järvenpää, M. Kallio, Finnish CORINE land cover 2000 classification, XXth ISPRS Congress, Anchorage, US (2004).

[95] A. Garcia-Garcia, S. Orts-Escolano, S. Oprea, V. Villena-Martinez, J. Garcia-Rodriguez, A review on deep learning techniques applied to semantic segmentation, arXiv preprint arXiv:1704.06857 (2017).

[96] F. Chollet, Xception: Deep learning with depthwise separable convolutions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1251–1258.

[97] O. Ronneberger, P. Fischer, T. Brox, U-Net: Convolutional networks for biomedical image segmentation, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2015, pp. 234–241.

[98] L.-C. Chen, G. Papandreou, F. Schroff, H. Adam, Rethinking atrous convolution for semantic image segmentation, arXiv preprint arXiv:1706.05587 (2017).

[99] Y. Bengio, Deep learning of representations for unsupervised and transfer learning, in: Proceedings of ICML Workshop on Unsupervised and Transfer Learning, 2012, pp. 17–36.

[100] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, et al., TensorFlow: A system for large-scale machine learning, in: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), 2016, pp. 265–283.

[101] H. Costa, G. M. Foody, D. S. Boyd, Supervised methods of image segmentation accuracy assessment in land cover mapping, Remote Sensing of Environment 205 (2018) 338–351.

[102] G. Csurka, D. Larlus, F. Perronnin, F. Meylan, What is a good evaluation measure for semantic segmentation?, in: BMVC, Vol. 27, Citeseer, 2013, p. 2013.

[103] J. Cohen, A coefficient of agreement for nominal scales, Educational and Psychological Measurement 20 (1) (1960) 37–46.

[104] O. Antropov, Y. Rauste, T. Häme, Volume scattering modeling in PolSAR decompositions: Study of ALOS PALSAR data over boreal forest, IEEE Transactions on Geoscience and Remote Sensing 49 (10) (2011) 3838–3848.


Figure 1: The architecture of Fully Convolutional Neural Networks (FCNs) [84].

3 Materials and methods

Here we first describe the study site, the SAR data, and the reference data. This is followed by an in-depth description of the deep learning terminology and the models used in the study. We finish with the description of the experimental setup and the evaluation metrics.

3.1 Study site

Our study site covers the area of Finland at latitudes from 61° to 67.5°. The processed area is shown in Figure 2. The study area includes central and northern areas of Finland, covered primarily by boreal forestland with inclusions of water bodies (primarily lakes), urban settlements and agricultural areas, as well as marshland and open bogs. We have omitted Lapland due to potential snow cover during the months of data acquisition. The terrain height variation is moderate and mostly within the 100–300 m range.

3.2 SAR data

Presently, Sentinel-1 is a C-band SAR dual-satellite system with two satellites orbiting 180° apart [11], launched in 2014 and 2016, respectively. The operational acquisition modes are Stripmap (SM), Interferometric Wide-Swath (IW), Extra Wide Swath (EW), and Wave Mode (WV). The IW mode is the default mode over land, providing a 250 km wide swath composed of three sub-swaths, with a single-look image at 5 m by 20 m spatial resolution. It uses the so-called TOPS (Terrain Observation with Progressive Scan) SAR mode.

The SAR data acquired by Sentinel-1 satellites in IW mode were used in the study: altogether 14 Sentinel-1A images acquired during the summers of 2015 and 2016, more concretely during June, July and August of those two years. Their geographical coverage is schematically shown in Figure 2.


Figure 2: Study area in Finland with reference CORINE land cover data and schematic location of areas used for model training and accuracy assessment.


Original scenes were downloaded as Level-1 Ground Range Detected (GRD) products. They represent focused SAR data that has been detected, multi-looked and projected to ground range using an Earth ellipsoid. They were orthorectified in ESA SNAP S1TBX software using a local digital terrain model (with 2 m resolution) available from the National Land Survey of Finland. The pixel spacing of orthorectified scenes was set to 20 m. Orthorectification included terrain flattening to obtain backscatter in gamma-nought format [92]. The scenes were further re-projected to the ETRS89 / ETRS-TM35FIN projection (EPSG:3067) and resampled to a final pixel size of 20 metres.
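For illustration, such a chain can be scripted; the following is a minimal sketch using ESA SNAP's Python bindings (snappy). The operator names are standard S1TBX ones, but the parameter keys and the external-DEM configuration shown here are assumptions to be checked against the installed SNAP version, not the exact processing graph used in this study.

```python
# Hedged sketch of the GRD preprocessing chain with ESA SNAP's snappy.
from snappy import GPF, HashMap, ProductIO

def preprocess_grd(grd_path, out_path):
    product = ProductIO.readProduct(grd_path)

    # Radiometric calibration to beta0, required before terrain flattening
    params = HashMap()
    params.put('outputBetaBand', True)
    calibrated = GPF.createProduct('Calibration', params, product)

    # Terrain flattening with a DEM -> gamma-nought backscatter
    params = HashMap()
    params.put('demName', 'External DEM')  # assumption: local 2 m DTM configured here
    flattened = GPF.createProduct('Terrain-Flattening', params, calibrated)

    # Range-Doppler terrain correction to 20 m pixels in EPSG:3067
    params = HashMap()
    params.put('pixelSpacingInMeter', 20.0)
    params.put('mapProjection', 'EPSG:3067')
    corrected = GPF.createProduct('Terrain-Correction', params, flattened)

    ProductIO.writeProduct(corrected, out_path, 'GeoTIFF')
```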

3.3 Reference data

In Finland, the Finnish Environment Institute (SYKE) is responsible for the production of the CORINE maps. While for most of the EU territory the CORINE mask of 100 m × 100 m spatial resolution is available, national institutions might choose to create more precise maps, and SYKE in particular has produced a 20 m × 20 m spatial resolution mask for Finland (Figure 3). Since then, the updates have been produced regularly, the latest one at the time of this study, which we used, being CLC2012. There are 48 different land use classes in the map, which can be hierarchically grouped into 4 CLC levels. In detail, there are 30 classes on CLC Level-3, 15 classes on CLC Level-2, and 5 top CLC Level-1 classes. According to the information provided by SYKE, the accuracy of CLC Level-3 is 61%, of CLC Level-2 83%, and of CLC Level-1 it is 93%. The selected classes and their corresponding color codes used for our segmentation results are shown in Table 2.

Until the most recent CORINE production round, EEA member countries adopted national approaches for the production of CORINE. The EEA technical guidelines include manual digitalization of land cover change based on visual interpretation of optical satellite imagery. In Finland, the European CLC was not applicable for the majority of national users due to the large minimal mapping unit (MMU); thus, a national version was produced with a somewhat modified nomenclature of classes [93, 94]. The national high-resolution CLC2012 data is in raster format of 20 m, with a corresponding MMU. In the provision of the 2012 update of CLC, obtaining optical imagery over Scandinavia and Britain was particularly challenging because of the frequent clouds, thus calling for the use of radar imagery to meet user requirements on accuracy and coverage [31]. The CORINE map itself is built from high-resolution satellite images acquired primarily during the summer and, to a smaller extent, during the spring months [2].

3.4 Semantic Segmentation Models

We selected the following seven state-of-the-art [95] semantic segmentation models to test for our land cover mapping task: SegNet [74], PSPNet [75], BiSeNet [76], DeepLabV3+ [77, 78], U-Net [79, 80], FRRN-B [81], and FC-DenseNet [82]. The models were selected to cover a wide set of approaches to semantic segmentation. In the following, we describe the specific architecture of each of these DL models.


Table 2: CORINE CLC Level-1 classes and their color codes used in our classification results.

class                               R     G     B     color
Water bodies (500)                  0     191   255   blue
Peatland, bogs and marshes (400)    173   216   230   light blue
Forested areas (300)                127   255   0     green
Agricultural areas (200)            222   184   135   brown
Urban fabric (100)                  128   0     0     red

Figure 3: Zoomed-in area fragment with our reference data, i.e., CORINE, shown on top (left), along with the Google Earth layer (right).

We will use the following common abbreviations: conv for the convolution operation, concat for concatenation, max pool for the max pooling operation, BN for batch normalisation, and ReLU for the rectified linear unit activation function.

3.4.1 BiSeNet (Bilateral Segmentation Network)

The BiSeNet model is designed to decouple the functions of encoding additional spatial information and enlarging the receptive field, which are fundamental to achieving good segmentation performance. As can be seen in Figure 4, there are two main components in this model: the Spatial Path (SP) and the Context Path (CP). The Spatial Path serves to encode rich spatial information. The Context Path serves to provide a sufficient receptive field, and uses global average pooling and pre-trained Xception [96] or ResNet [87] as the backbone. The goal of the creators was not only to obtain superior performance but also to achieve a balance between speed and performance; hence, BiSeNet is a relatively fast semantic segmentation model.

3.4.2 SegNet (Encoder-Decoder-Skip)

Similarly to BiSeNet, SegNet is also designed with computational performance in mind, this time particularly during inference. Because of this, the network has a significantly smaller number of trainable parameters compared to most of the other architectures. The encoder in SegNet is based on VGG16: it consists of its first 13 convolutional layers, while the fully connected layers are omitted.


Figure 4: The architecture of BiSeNet. ARM stands for the Attention Refinement Module and FFM for the Feature Fusion Module introduced in the model's paper [76].

Figure 5: The architecture of the SegNet-based Encoder-Decoder with Skip connections [74]. Blue tiles represent Convolution + Batch Normalisation + ReLU, green tiles represent Pooling, red tiles Upsampling, and yellow a softmax operation.

Hence, the novelty of this network lies in its decoder part, as follows. The decoder consists of one decoder layer for each encoder layer, so it also has 13 layers. Each individual decoder layer utilizes max-pooling indices memorized from its corresponding encoder feature map. The authors have shown that this enhances boundary delineation between classes. Finally, the decoder output is sent to a multi-class soft-max function, yielding a classification for each pixel (see Figure 5).

3.4.3 Mobile U-Net

Mobile U-Net is based on the U-Net [97] semantic segmentation architecture shown in Figure 6. In designing U-Net, the fully convolutional approach was generally employed, with the following modification: the upsampling part of the architecture has no fully connected layer, but is nearly symmetrical to the feature extraction part, due to the use of similar feature maps. This results in a u-shaped architecture (see Figure 6), hence the name of the model. While originally developed for biomedical images, the U-Net architecture has proven successful for image segmentation in other domains as well. Here, we somewhat modify the U-Net architecture according to the MobileNets [80] framework to improve its efficiency. In particular, the MobileNets framework uses Depthwise Separable Convolutions, a form which factorizes standard convolutions (e.g., 3 × 3) into a depthwise convolution (applied separately to each input band) and a pointwise (1 × 1) convolution that combines the outputs of the depthwise convolution.
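To make the factorization concrete, the following is a minimal TensorFlow/Keras sketch of one depthwise separable block; the function name and layer counts are ours, for illustration only.

```python
import tensorflow as tf

def separable_block(x, filters):
    # Depthwise 3x3 convolution: one filter applied separately per input channel
    x = tf.keras.layers.DepthwiseConv2D(kernel_size=3, padding='same')(x)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.ReLU()(x)
    # Pointwise 1x1 convolution: combines the per-channel outputs across depth
    x = tf.keras.layers.Conv2D(filters, kernel_size=1)(x)
    x = tf.keras.layers.BatchNormalization()(x)
    return tf.keras.layers.ReLU()(x)
```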

Figure 6: The architecture of U-Net [97].

3.4.4 DeepLab-V3+

DeepLab-V3+ [77] is an improved version of DeepLab-V3 [98], while the latter is an improved version of the original DeepLab [78] model. This segmentation model does not follow the FCN framework like the previously discussed models.


The main features that distinguish the DeepLab model from FCNs are the atrous convolutions for upsampling and the application of probabilistic machine learning models, concretely conditional random fields (CRFs), for a finer localization accuracy in the final fully connected layer. Atrous convolutions, in particular, allow enlarging the context from which the next layer feature maps are learned, while preserving the number of parameters (and thus the same efficiency). Using a chain of atrous convolutions allows computing the final output layer of a CNN at an arbitrarily high resolution (removing the need for the upsampling part as used in FCNs). In the follow-up work proposing DeepLab-V3, Chen et al. [98] change the approach to atrous convolutions to gradually double the atrous rates, and show that the adapted version of their algorithm outperforms the previous one even without including the fully connected CRF layer. Finally, in their newest adaptation of the model, called DeepLab-V3+, Chen et al. [77] turn to an approach similar to the FCNs, i.e., they add a decoder module to the architecture (see Figure 7). That is, they employ the features extracted by the DeepLab-V3 module in the encoder part and add a decoder module consisting of 1 × 1 and 3 × 3 convolutions.
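In Keras terms, an atrous (dilated) convolution is an ordinary convolution with a dilation rate; the short illustration below (our code, not from the study) shows that enlarging the receptive field this way adds no parameters.

```python
import tensorflow as tf

# A standard 3x3 convolution and its atrous counterpart: with dilation_rate=2
# the 3x3 kernel samples a 5x5 neighbourhood while keeping the same 9 weights
# per channel, so the parameter count (and thus efficiency) is unchanged.
standard = tf.keras.layers.Conv2D(64, kernel_size=3, padding='same')
atrous = tf.keras.layers.Conv2D(64, kernel_size=3, padding='same',
                                dilation_rate=2)
```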

Figure 7: The architecture of DeepLab-V3+ [77].

3.4.5 FRRN-B (Full-Resolution Residual Networks)

As we have seen, most of the semantic segmentation architectures are based on some form of FCN, and so they utilize existing classification networks, such as ResNet or VGG16, as encoders. We also discussed the main reason for such approaches, which is to take advantage of the learned weights from those architectures, pretrained for the classification task.


Figure 8: The architecture of FRRN-B. RU n and FRRU n stand for residual units and full-resolution residual units with n-channel convolutions, respectively. FRRUs simultaneously operate on the two streams [81].

Nevertheless, one disadvantage of the FCN approach is that the resulting network outputs of the encoder part (particularly after the pooling operations) are at a lower resolution, which deteriorates the localization performance of the overall segmentation model. Pohlen et al. [81] proposed to tackle this by having two parallel network streams processing the input image: a pooling and a residual stream (Figure 8). As the name says, the pooling stream performs successive pooling and then unpooling operations, and it serves to obtain good recognition of the objects and classes. The residual stream computes residuals at the full image resolution, which enables low-level features, i.e., object pixel-level locations, to be propagated to the network output. The name of the model comes from its building blocks, i.e., full-resolution residual units; each such unit simultaneously operates on the pooling and the residual stream. In the original paper [81], the authors propose two alternative architectures, FRRN-A and FRRN-B.


They show that FRRN-B achieves superior performance on the Cityscapes benchmark dataset; hence, we employ the FRRN-B architecture.

3.4.6 PSPNet (Pyramid Scene Parsing Network)

Figure 9: The architecture of PSPNet [75].

Zhao et al. [75] propose the Pyramid Scene Parsing Network as a solution to the challenge of making local predictions based on the local context only, without considering the global image scene. In remote sensing, an example of this challenge could be a model wrongly predicting water with waves present in it as the dry vegetation class, because the two appear similar and the model did not consider that these pixels are part of a larger water surface, i.e., it missed the global context. In similarity to the other FCN-based approaches, PSPNet uses a pre-trained classification architecture to extract the feature map, in this case ResNet. The main module of this network is the pyramid pooling, which is enclosed by a square in Figure 9. As can be seen in the figure, this module fuses features at four scales, from the coarse (red) to the fine (green). Hence, the output of each level in the pyramid pooling module contains the feature map of a different resolution. In the end, the different features are stacked together, yielding the final pyramid pooling global feature for predictions.
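A pyramid pooling module of this kind can be sketched in a few lines of Keras. The bin sizes below are powers of two, an assumption chosen for divisibility of the feature map (the original PSPNet uses bins 1, 2, 3 and 6 with appropriate resizing), and the function name is ours.

```python
import tensorflow as tf

def pyramid_pooling(features, bins=(1, 2, 4, 8)):
    """Pool the feature map at several scales, compress each with a 1x1
    convolution, upsample back, and concatenate with the input map."""
    h, w, depth = features.shape[1], features.shape[2], features.shape[3]
    branches = [features]
    for b in bins:
        x = tf.keras.layers.AveragePooling2D(pool_size=(h // b, w // b))(features)
        x = tf.keras.layers.Conv2D(depth // len(bins), kernel_size=1)(x)
        x = tf.keras.layers.UpSampling2D(size=(h // b, w // b),
                                         interpolation='bilinear')(x)
        branches.append(x)
    return tf.keras.layers.Concatenate()(branches)
```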

3.4.7 FC-DenseNet (Fully Convolutional DenseNets)

This semantic segmentation algorithm is built using the DenseNet CNN [88] as a basis for the encoder, followed by applying the FCN approach [82]. The specificity of the DenseNet architecture is the presence of blocks where each layer is connected to all other layers in a feed-forward manner. Figure 10 shows the architecture of FC-DenseNet, where the blocks are represented by the Dense Block units. According to [88], such an architecture scales well to hundreds of layers without any optimization issues, while yielding excellent results in classification tasks. In order to efficiently upsample the DenseNet feature maps, Jégou et al. [82] substitute the upsampling convolutions of FCNs with Dense Blocks and Transition Up modules. The Transition Up modules consist of transposed convolutions, which are then concatenated with the outputs from the input skip connection (the dashed lines in Figure 10).
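The feed-forward concatenation pattern of one dense block can be sketched as follows; the layer count and growth rate here are arbitrary illustration values, not the study's configuration.

```python
import tensorflow as tf

def dense_block(x, num_layers=4, growth_rate=16):
    """Each layer receives the concatenation of all preceding feature maps
    and contributes growth_rate new channels to the block output."""
    features = [x]
    for _ in range(num_layers):
        inp = features[0] if len(features) == 1 \
            else tf.keras.layers.Concatenate()(features)
        y = tf.keras.layers.BatchNormalization()(inp)
        y = tf.keras.layers.ReLU()(y)
        y = tf.keras.layers.Conv2D(growth_rate, kernel_size=3, padding='same')(y)
        features.append(y)
    return tf.keras.layers.Concatenate()(features)
```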


Figure 10: The architecture of FC-DenseNet [82].

3.5 Training approach

To accomplish better segmentation performance, there is an option to pre-train the semantic segmentation models (in particular, their encoder modules) using a larger set of available images of another type (such as natural images). By using a model pre-trained with natural images and continuing training with the limited set of SAR images, the knowledge becomes effectively transferred from the natural-image task to the SAR task [99]. To accomplish such transfer, we used the models whose encoders were pre-trained for the ImageNet classification task and fine-tuned them using our SAR dataset (described next).
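In Keras-style code, such transfer amounts to loading ImageNet weights into the encoder and continuing training on the SAR imagelets. The snippet below is a hedged illustration (the actual implementation in this study is the adapted Semantic Segmentation Suite, not this code, and the freezing cut-off is arbitrary):

```python
import tensorflow as tf

# Load an ImageNet-pretrained ResNet101 as the encoder backbone and
# optionally freeze its earliest layers before fine-tuning on SAR data.
encoder = tf.keras.applications.ResNet101(
    include_top=False, weights='imagenet', input_shape=(512, 512, 3))
for layer in encoder.layers[:30]:   # arbitrary cut-off, for illustration
    layer.trainable = False
# A segmentation decoder would be attached on top of `encoder` here,
# and the whole network then trained on the SAR RGB-DEM imagelets.
```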

3.6 Experimental Setup

In this section, we first describe how we prepared the SAR images for training with the deep learning models, which are designed for natural images. Then we provide the details of our implementation.

3.6.1 SAR Data Preprocessing for Deep Learning

Sentinel-1 imagery comes in two polarization channels, each of them being informative about certain types of land cover. Hence, using their combination is expected to yield better land cover mapping results than using either of them independently. Moreover, previous work has shown the benefits of also using the DEM for land cover mapping [9].


Hence, as the third layer, we used the DEM of Finland from the National Land Survey.

SAR backscatter for both polarizations was converted to decibels by applying the 10 · log10 transformation. In addition, for the deep learning models, each band should be normalized so that the distribution of the pixel values resembles a Gaussian distribution centered at zero; this is done to yield faster convergence during training. To normalize the data, each pixel value is subtracted by the mean of all pixels and then divided by their standard deviation. In addition, given that the semantic segmentation models expect pixel values in the range (0, 255), we scaled the normalized data, and also the DEM values, to this range. Such preprocessed layers are then used to create the image dataset for training.

We named the created dataset SAR RGB-DEM. The naming comes from the process used to create the images in this dataset: one of the two channels of a Sentinel-1 image is assigned to the R channel and the other to the G channel, while for the third, B channel, we use the DEM layer.
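A minimal NumPy sketch of this band preparation could look as follows; per-image statistics are used here for brevity, which is an assumption rather than the study's exact normalization constants.

```python
import numpy as np

def to_uint8(band):
    """Standardize a band, then stretch it to the 0..255 range."""
    z = (band - band.mean()) / band.std()        # zero mean, unit variance
    z = (z - z.min()) / (z.max() - z.min())      # rescale to 0..1
    return np.round(255.0 * z).astype(np.uint8)

def make_sar_rgb_dem(vv, vh, dem):
    """Stack one polarization as R, the other as G, and the DEM as B."""
    r = to_uint8(10.0 * np.log10(vv))            # backscatter to decibels
    g = to_uint8(10.0 * np.log10(vh))
    b = to_uint8(dem.astype(np.float64))         # DEM scaled to 0..255
    return np.dstack([r, g, b])
```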

3.6.2 Train/Development and Test (Accuracy Assessment) Dataset

The images from the SAR RGB-DEM dataset needed to be split into 512 px × 512 px partial images (further in the text called imagelets) for training; thus, each imagelet represented an area of roughly 10 × 10 km². The first reason for this preprocessing is the square shape: some of the selected models required square-shaped images. Some other models were flexible with the image shape and size, but we wanted to make the setups for all the models the same, so that their results are comparable. The second reason for the preprocessing is the computational capacity: with our hardware setup (described below), this was the largest image size that we could work with.

Upon splitting the SAR RGB-DEM images, we discarded those imagelets that were completely outside the land mass area, as well as those for which we did not have a complete CORINE label (for example, if they fell in part outside the Finnish borders). This resulted in more than 7000 imagelets of size 512 px × 512 px.
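The tiling and filtering step is simple to express in NumPy; the sketch below assumes that a label value of 0 marks pixels without a valid CORINE class, which is an assumption made for illustration.

```python
import numpy as np

def make_imagelets(scene, labels, size=512):
    """Split a scene and its label raster into size x size imagelets,
    keeping only tiles that are fully covered by valid labels."""
    imagelets = []
    rows, cols = labels.shape
    for r in range(0, rows - size + 1, size):
        for c in range(0, cols - size + 1, size):
            lab = labels[r:r + size, c:c + size]
            if np.all(lab > 0):      # assumption: 0 = no valid label
                imagelets.append((scene[r:r + size, c:c + size], lab))
    return imagelets
```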

Given the geography of Finland, to have representative training data, it seems useful to include imagelets from both the northern and the southern (including the large cities) parts of the country in the model training. On the other hand, some noticeable differences are found also in the gradient from the east to the west of the country. To achieve a representative training dataset, we selected all imagelets between the longitudes of 24° and 28° for the accuracy assessment (model testing), and all the other imagelets for the model training (that is, training and development in the computer vision terminology). In this way, we prevented the situation in which two images of the same area, but acquired at different times, are used one for training and the other for testing. Images that were overlapping any border of the introduced strip were discarded. The procedure resulted in 3104 images in the training and development set and 3784 images in the test (accuracy assessment) set. Finally, we used 60% of the training and development set for training and the rest for the development of the deep learning models.


Table 3: The properties of the examined semantic segmentation architectures.

Architecture    Base model       Parameters
BiSeNet         ResNet101        24.75M
SegNet          VGG16            34.97M
Mobile U-Net    Not applicable   8.87M
DeepLabV3+      ResNet101        47.96M
FRRN-B          ResNet101        24.75M
PSPNet          ResNet101        56M
FC-DenseNet     ResNet101        9.27M

3.6.3 Data Augmentation

Further, we employed the data augmentation technique. The main idea behind data augmentation is to enable improved learning by reusing original images with slight transformations, such as rotation, flipping, adding Gaussian noise, or slightly changing the brightness. This provides additional information to the model, and the dataset size is effectively increased. Moreover, an additional benefit of data augmentation is in helping the model to learn some invariant data properties for which no examples are present in the original dataset. Given the sensitivity of the SAR backscatter, we did not want to augment the images in terms of color or brightness, or by adding noise. However, we could safely employ rotations and flipping. For rotations, we only used 90° increments, giving three possible rotated versions of an image. For image flipping, we applied horizontal and vertical flipping, or both at the same time, giving another three possible versions of the original image.4 Notice that our images are square, so the transformations did not change the image dimensions. Finally, we applied the online augmentation, as opposed to the offline version: in the online process, each augmented image is seen only once, and so this process yields a network that generalises better.
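These geometric transformations are cheap to apply on the fly; a hedged NumPy sketch of one online augmentation step, applied identically to an imagelet and its label mask, could be:

```python
import numpy as np

def augment(imagelet, labels, rng):
    """Random 90-degree rotation plus optional horizontal/vertical flips."""
    k = rng.integers(4)                     # rotate by 0, 90, 180 or 270 degrees
    imagelet, labels = np.rot90(imagelet, k), np.rot90(labels, k)
    if rng.random() < 0.5:                  # horizontal flip
        imagelet, labels = np.fliplr(imagelet), np.fliplr(labels)
    if rng.random() < 0.5:                  # vertical flip
        imagelet, labels = np.flipud(imagelet), np.flipud(labels)
    return imagelet, labels

# Example usage: augment(img, lab, np.random.default_rng(seed=0))
```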

3.6.4 Implementation

To apply the described semantic segmentation models, we adapted the open-source Semantic Segmentation Suite. We used Python with the TensorFlow [100] backend.

3.6.5 Hardware and Training Setup

We trained and tested each of the deep learning models separately on a single GPU (NVIDIA GeForce GTX 1080) on a machine with 32 GB of RAM.

We used the RMSProp optimisation algorithm, a learning rate of 0.0001, and a decay of the learning rate of 0.9954. Each model was trained for an equal number of epochs (500), and during the process the checkpoint for the best model was saved. Then we used that model for evaluation on the test set, and we report those results.

4 The vertical flip operation switches between the top-left and bottom-left image origin (reflection along the central horizontal axis), and the horizontal flip switches between the top-left and top-right image origin (reflection along the central vertical axis).
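A hedged Keras-style sketch of this configuration follows; the function and variable names are ours, and the decay is assumed to be applied once per epoch.

```python
import tensorflow as tf

def train(model, train_ds, dev_ds, steps_per_epoch):
    """RMSProp, initial learning rate 1e-4 decayed by 0.9954 per epoch,
    500 epochs, keeping the checkpoint of the best model seen so far."""
    schedule = tf.keras.optimizers.schedules.ExponentialDecay(
        initial_learning_rate=1e-4,
        decay_steps=steps_per_epoch,
        decay_rate=0.9954)
    model.compile(
        optimizer=tf.keras.optimizers.RMSprop(learning_rate=schedule),
        loss='sparse_categorical_crossentropy')
    best = tf.keras.callbacks.ModelCheckpoint(
        'best_model.h5', monitor='val_loss', save_best_only=True)
    model.fit(train_ds, validation_data=dev_ds, epochs=500, callbacks=[best])
```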

3.7 Evaluation Metrics

In the review on the metrics used in land cover classification, Costa et al. [101] found a lack of consistency, complicating the intercomparison of different studies. To avoid such issues and to ensure that our results are easily comparable with the literature, we evaluated our models thoroughly. For each model and class, we report the following measures of accuracy: precision, also known as user's accuracy (UA); recall, also known as producer's accuracy (PA); the overall accuracy; and the Kappa coefficient. The formulas are as follows.

For each segmentation class (land cover type) $c$, we calculate precision (user's accuracy):

$P_c = \frac{T_{p_c}}{T_{p_c} + F_{p_c}}$

and recall (producer's accuracy):

$R_c = \frac{T_{p_c}}{T_{p_c} + F_{n_c}}$

where $T_{p_c}$ represents true positive, $F_{p_c}$ false positive, and $F_{n_c}$ false negative pixels for the class $c$.

When it comes to accuracy [102], we calculate the per-class accuracy⁵:

$Acc_c = \frac{C_{ii}}{G_i}$

and the overall pixel accuracy:

$Acc_{OP} = \frac{\sum_{i=1}^{L} C_{ii}}{\sum_{i=1}^{L} G_i}$

where $C_{ij}$ is the number of pixels having a ground truth label $i$ and being classified/predicted as $j$, $G_i$ is the total number of pixels labelled with $i$, and $L$ is the number of classes. All these metrics can take values from 0 to 1.

Finally, we also use the Kappa statistic (Cohen's measure of agreement), indicating how the classification results compare to values assigned by chance [103]. The Kappa statistic can take values from 0 to 1. Starting from a $k \times k$ confusion matrix with elements $f_{ij}$, the following calculations are done:

⁵ Effectively, the per-class accuracy is defined as the recall obtained on each class.


$P_o = \frac{1}{N} \sum_{j=1}^{k} f_{jj} \quad (1)$

$r_i = \sum_{j=1}^{k} f_{ij} \;\forall i \qquad \textrm{and} \qquad c_j = \sum_{i=1}^{k} f_{ij} \;\forall j \quad (2)$

$P_e = \frac{1}{N^2} \sum_{i=1}^{k} r_i c_i \quad (3)$

where $P_o$ is the observed proportional agreement (effectively the overall accuracy), $r_i$ and $c_j$ are the row and column totals for classes $i$ and $j$, $N$ is the total number of pixels, and $P_e$ is the expected proportion of agreement. The final measure of agreement is given by the statistic [103]:

$\kappa = \frac{P_o - P_e}{1 - P_e} \quad (4)$

Depending on the value of Kappa, the observed agreement is considered either poor (0.0 to 0.2), fair (0.2 to 0.4), moderate (0.4 to 0.6), good (0.6 to 0.8), or very good (0.8 to 1.0).
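All of these measures follow directly from the confusion matrix; a compact NumPy sketch (illustrative code, not part of the original experiments) is:

```python
import numpy as np

def evaluation_metrics(f):
    """Precision (UA), recall (PA), overall accuracy and Cohen's kappa from
    a k x k confusion matrix f, where f[i, j] counts pixels with ground
    truth class i predicted as class j."""
    f = f.astype(float)
    n = f.sum()
    precision = np.diag(f) / f.sum(axis=0)              # user's accuracy
    recall = np.diag(f) / f.sum(axis=1)                 # producer's accuracy
    p_o = np.trace(f) / n                               # observed agreement = OA
    p_e = (f.sum(axis=1) * f.sum(axis=0)).sum() / n**2  # agreement by chance
    kappa = (p_o - p_e) / (1.0 - p_e)
    return precision, recall, p_o, kappa
```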

4 Results and Discussion

Using the experimental setup described in the previous section, we evaluated the seven selected semantic segmentation models: SegNet [74], PSPNet [75], BiSeNet [76], DeepLabV3+ [77, 78], U-Net [79, 80], FRRN-B [81], and FC-DenseNet [82]. The overall classification performance statistics for all studied models are gathered in Table 4. Figure 11 shows maps produced for several imagelets with the best performing model, FC-DenseNet. The obtained results are compared to prior work, and the classification performance for the different land cover classes is discussed further below.

4.1 Classification Performance

All the models performed relatively well in terms of classification, achieving an overall accuracy above 83%. Three models performed particularly well, achieving an accuracy score above 89%: SegNet, FRRN-B, and the best performing model, FC-DenseNet, which achieved an accuracy of 90.7%.

Before further analysis, let us recall that CORINE is not exclusively a land cover map, but rather a land cover and land use map; thus, its specific classes can differ from the ecological classes observed by Sentinel-1. Also, the aggregation to Level-1 is sometimes not strictly "ecological", nor does it comply with physical surface scattering considerations. For example, roads, airports, major industrial areas and the road network often exhibit areas similar to field.


Table 4: Summary of the classification performance and efficiency of the various deep learning models (UA – user's accuracy, PA – producer's accuracy; accuracies are given as UA/PA in percent; the average inference time is per image in the dataset).

LC classes                         Test scale (km²)   BiSeNet   DeepLabV3+   SegNet   FRRN-B   U-Net    PSPNet   FC-DenseNet
Urban fabric (100)                 1081.6             26/21     15/14        36/31    38/30    45/25    38/18    62/27
Agricultural areas (200)           2516.0             49/51     50/49        69/66    68/68    66/66    53/48    72/71
Forested areas (300)               28546.2            90/91     88/96        93/94    92/95    92/95    89/95    93/96
Peatland, bogs and marshes (400)   2099.0             54/43     56/13        67/57    71/55    70/52    65/31    74/58
Water bodies (500)                 5356.4             85/91     94/92        96/96    95/96    96/96    94/94    96/96
Overall accuracy (%)                                  83.86     85.49        89.03    89.27    89.25    86.51    90.66
Kappa                                                 0.641     0.649        0.754    0.758    0.754    0.680    0.785
Average inference time (s)                            0.0389    0.0267       0.0761   0.1424   0.0848   0.0495   0.1930


Table 5: Confusion matrix for classification with the FC-DenseNet (FC-DenseNet103) model. Rows are CLC2012 reference classes, columns are Sentinel-1 predicted classes; the bottom-right value is the overall accuracy (%).

CLC2012 \ Sentinel-1 class   urban       water        forest       field       peatland    total        PA (%)
urban                        7301999     413073       15892771     3212839     221476      27042158     27.0
water                        78331       128294872    3457634      171029      1935276     133937142    95.8
forest                       3663698     2703632      686788977    12795703    7730444     713682454    96.2
field                        766200      121609       16527970     44866048    620934      62902761     71.3
peatland                     56097       1866020      19164137     1091008     30309189    52486451     57.8
total                        11866325    133399206    741831489    62136627    40817319    990050966
UA (%)                       61.5        96.2         92.6         72.2        74.3        90.7

Figure 11: Illustration of the FC-DenseNet model performance: a selection of classification results, i.e., direct output of the network without any post-processing (bottom row), versus the reference CORINE data (upper row).

The presence of trees and green vegetation near summer cottages can cause them to exhibit signatures close to forest rather than urban; sometimes forest on rocky terrain can be misclassified as urban due to the presence of very bright targets and strong disruptive features; and confusion between peatland and field areas is also commonplace. Finally, the accuracy of the CORINE data itself is only somewhat higher than 90%.

As for the results across the different land cover classes, all the models performed particularly well in recognising the water bodies and forested areas, while the urban fabric represented the most challenging class for all the models. We expect that the inclusion of the DEM as one layer in the training images has helped to achieve good results on the water bodies class for most of the models (except for BiSeNet, all the models achieved both the user's and producer's accuracy above 90%). The urban class was particularly challenging for the following main reasons. First, this class changes the most, as new houses, roads and urban areas are built. While we took the most suitable available CORINE class in terms of timing for our Sentinel-1 images, there are almost certain differences between the urban class as it was in 2012 and in 2015-2016.


Second, the CORINE map itself does not have perfect accuracy, nor are the aggregation rules perfect. As a matter of fact, in the majority of studies where SAR-based classification was compared against CLC or similar data, a poor or modest overall agreement was observed for this class [21, 41, 83, 20], while the user's accuracy was strongly higher than the producer's [104]. The latter is exactly due to radar being able to sense sharp boundaries and bright targets very well, whereas such bright targets often do not dominate the whole CORINE Level-1 urban class. We argue that any inaccuracies present will be particularly attenuated in our models for the urban class, because of the sharp and sudden boundary changes in this class, unlike in the others, such as forest and water. The top performing model, i.e., FC-DenseNet, performed the best across all the classes. It is particularly notable that it achieved a user's accuracy, i.e., precision, for the urban class of 62%, improving significantly on all the other models. Nevertheless, its score on the producer's accuracy, i.e., recall, for this class, of 27%, is outperformed by two other top models, i.e., SegNet and FRRN-B.

We mentioned the issue of SAR backscattering sensitivity to several ground factors, so that the same classes might appear differently on the images between countries, or between distant areas within a country. An interesting indication of our study, however, is that the deep learning models might be able to deal with this issue. Namely, we used the models pre-trained on ImageNet and fine-tuned them with a relatively small number (14) of Sentinel-1 scenes. The models learned to recognize the varying types of the backscattering signal across the country of Finland. This indicates that, with a similar type of fine-tuning, the present models could be relatively easily adapted to other areas and countries with different SAR backscattering patterns. Such robustness and adaptability of the deep learning models come from their automatic learning of the feature representation, without the need for a human to pre-define those features.

4.2 Computational Performance

The training times with our hardware configuration took from 6 days up to 2 weeks for the different models. This could be significantly improved by training each model using a multi-GPU system instead of a single-GPU one, as we did.

In terms of the inference time, we also saw differences in performance. In Table 4, we present the average inference time per 512 px × 512 px imagelet that we worked with. The results show that there is a trade-off between classification and computational performance: the best models in terms of classification results (i.e., FC-DenseNet and FRRN-B) take a several times longer inference time compared to the rest. Depending on the application, this might not be of particular importance.

4.3 Comparison to Similar Work

The obtained results compare favourably to previous similar studies on land cover classification with SAR data [83, 20, 21, 41, 28, 31]. Depending on the level of class aggregation (4-5 major classes or more), and using mostly statistical or classical machine learning approaches, reported classification accuracies ranged from as high as 80-87% down to as low as 30% when only SAR imagery was used.

Two recent studies that employed neural networks to SAR imagery classifi-cation (albeit in combination with satellite optical data) for land cover mappingwere [28] and [66] with reported classification accuracies of up to 975 and946 respectively

The best model in our experiments achieved an overall accuracy of 90.7%. However, our results are obtained using solely the SAR imagery. In contrast, SAR imagery (PALSAR) alone yielded an overall accuracy of 78.1% in [28]. The types of classes they studied are also different compared to ours (crops versus vegetation versus land cover types), and our study is performed on a larger area. Importantly, the previous studies have applied different types of models (regular NNs versus CNNs versus semantic segmentation). In particular, the CNN models work on 7 × 7 resolution windows, while we have applied more advanced semantic segmentation models, which work on the level of a pixel. Keeping in mind the finding from [28] that the addition of optical images on top of SAR improved the results by over 10%, we expect that our models would perform comparably well or outperform these previous works if applied to combined SAR and optical imagery.

In terms of the deep learning setup, the studies most similar to ours are [53] and [70]. However, RapidEye optical imagery at 5 m spatial resolution was used in [53], and the test site was considerably smaller. Study [70], similar to our research, relied exclusively on SAR imagery, however with fully polarimetric images acquired by RADARSAT-2 at considerably better resolution. They developed an FCN-type semantic segmentation model 'specifically designed for the classification of wetland complexes using PolSAR imagery'. Using this model to classify eight wetland map classes, they achieved an overall accuracy of 93%. However, because their model is designed specifically for wetland complexes, it is not clear whether it would generalize to other types of areas. Compared to our study, they focused on a considerably smaller area (nearly the size of a single imagelet we used) and on a very specific task (wetland type mapping). Thus, it is not readily clear how general their approach is and how it compares to our presented approach.

4.4 Outlook and Future Work

There are several lines of potential improvement based on the results of this study, as well as future work directions.

First, using an even larger set of Sentinel-1 images can be recommended, since large amounts of data are crucial for supervised deep learning models. Here, we processed only 6,888 imagelets altogether, while deep learning algorithms typically become efficient only once they are trained with hundreds of thousands or millions of images.

Second, if SAR images and reference data of a higher resolution are used, we expect better classification performance, too, as smaller details could potentially be captured. Also, better agreement in the acquisition timing of reference and SAR imagery can be recommended. The reference and training data should come from the same months or year if possible, and the reference maps should represent the reality as accurately as possible. The models in our experiments were certainly limited by CORINE's own limited accuracy.

Third, in this study we tested the effectiveness of off-the-shelf deep learning models for land cover mapping from SAR data. While the results show their effectiveness, it is also likely that novel types of models developed specifically for radar data (such as [70]) will yield even better results. Based on our results, we suggest DenseNet-based models as a starting point. In particular, one could develop deep learning models that directly handle the SLC data, which preserve the phase information.

Focusing on a single season is both an advantage and a limitation. Importantly, we have avoided confusion between SAR signatures that vary seasonally for several land cover classes. However, multitemporal dynamics itself can potentially be used as an additional, useful class-discriminating parameter. Incorporating the seasonal dynamics of each land cover pixel (as a time series) is left for future work, perhaps with the additional need to incorporate recurrent neural networks into the approach.

As discussed in Section 3.1.1, it could be suitable to use more detailed (specific) land cover classes, as the aggregation of smaller LC classes into Level-1 CORINE classes is not exactly ecological, leading to several distinct SAR signatures being mixed in one class and thus causing additional confusion for the classifier. Later, the classified specific classes can be aggregated into larger classes, potentially showing improved performance [19].

Finally, we have used only SAR images and a freely available DEM model for the presented large-scale land cover mapping. If one were to combine other types of remote sensing images, in particular optical images, we expect that the results would improve significantly. This holds for those areas where such imagery can be collected, which is limited by cloud coverage; in an operational scenario it would potentially require the use of at least two models (with and without optical satellite imagery). It is also important to assess the added value of SAR imagery with deep learning models when optical satellite images are available, as well as possible data fusion and decision fusion scenarios, before a decision on the mapping approach is made [19].

5 Conclusion

Our study demonstrated the potential for applying state-of-the-art semantic segmentation models to SAR image classification with high accuracy. Several models were benchmarked in a countrywide classification experiment using Sentinel-1 IW-mode SAR data, reaching nearly 91% overall classification accuracy with the best performing model (FC-DenseNet). Given that the 14 used Sentinel-1 scenes resulted in about 7K training images, this indicates strong potential for using pre-trained CNNs for further fine-tuning, and seems particularly suitable when the number of training images is limited (to thousands or tens of thousands instead of millions). In addition to suggesting the best candidate semantic segmentation models for land cover mapping with SAR data (that is, the DenseNet-based models), our study offers baseline results against which newly proposed models should be evaluated. Several possible improvements for future work were identified, including the necessity of testing multitemporal approaches, data fusion and very high-resolution SAR imagery, as well as developing models specifically for SAR; these will be addressed in future work.

References

[1] S. Bojinski, M. Verstraete, T. C. Peterson, C. Richter, A. Simmons, M. Zemp, The concept of essential climate variables in support of climate research, applications, and policy, Bulletin of the American Meteorological Society 95 (9) (2014) 1431-1443.
[2] G. Büttner, J. Feranec, G. Jaffrain, L. Mari, G. Maucha, T. Soukup, The CORINE land cover 2000 project, EARSeL eProceedings 3 (3) (2004) 331-346.
[3] M. Bossard, J. Feranec, J. Otahel, et al., CORINE land cover technical guide: Addendum 2000 (2000).
[4] G. Büttner, CORINE land cover and land cover change products, in: Land Use and Land Cover Mapping in Europe, Springer, 2014, pp. 55-74.
[5] M. Törmä, T. Markkanen, S. Hatunen, P. Härmä, O.-P. Mattila, A. Arslan, Assessment of land-cover data for land-surface modelling in regional climate studies, Boreal Environment Research 20 (2) (2015) 243-260.
[6] J. Chen, J. Chen, A. Liao, X. Cao, L. Chen, X. Chen, C. He, G. Han, S. Peng, M. Lu, et al., Global land cover mapping at 30 m resolution: A POK-based operational approach, ISPRS Journal of Photogrammetry and Remote Sensing 103 (2015) 7-27.
[7] C. A. d. Almeida, A. C. Coutinho, J. C. D. M. Esquerdo, M. Adami, A. Venturieri, C. G. Diniz, N. Dessay, L. Durieux, A. R. Gomes, High spatial resolution land use and land cover mapping of the Brazilian Legal Amazon in 2008 using Landsat-5/TM and MODIS data, Acta Amazonica 46 (3) (2016) 291-302.
[8] C. Homer, J. Dewitz, L. Yang, S. Jin, P. Danielson, G. Xian, J. Coulston, N. Herold, J. Wickham, K. Megown, Completion of the 2011 National Land Cover Database for the conterminous United States - representing a decade of land cover change information, Photogrammetric Engineering & Remote Sensing 81 (5) (2015) 345-354.
[9] Y. Zhao, D. Feng, L. Yu, X. Wang, Y. Chen, Y. Bai, H. J. Hernandez, M. Galleguillos, C. Estades, G. S. Biging, et al., Detailed dynamic land cover mapping of Chile: Accuracy improvement by integrating multi-temporal data, Remote Sensing of Environment 183 (2016) 170-185.
[10] P. Griffiths, C. Nendel, P. Hostert, Intra-annual reflectance composites from Sentinel-2 and Landsat for national-scale crop and land cover mapping, Remote Sensing of Environment 220 (2019) 135-151.
[11] R. Torres, P. Snoeij, D. Geudtner, D. Bibby, M. Davidson, E. Attema, P. Potin, B. Rommen, N. Floury, M. Brown, I. Traver, P. Deghaye, B. Duesmann, B. Rosich, N. Miranda, C. Bruno, M. L'Abbate, R. Croci, A. Pietropaolo, M. Huchler, F. Rostan, GMES Sentinel-1 mission, Remote Sensing of Environment 120 (2012) 9-24. doi:10.1016/j.rse.2011.05.028.
[12] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE 86 (11) (1998) 2278-2324.
[13] A. Krizhevsky, I. Sutskever, G. E. Hinton, ImageNet classification with deep convolutional neural networks, in: Advances in Neural Information Processing Systems, 2012, pp. 1097-1105.
[14] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, CoRR abs/1409.1556 (2014).
[15] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning (2016).
[16] W. Cohen, S. Goward, Landsat's role in ecological applications of remote sensing, BioScience 54 (6) (2004) 535-545. doi:10.1641/0006-3568(2004)054[0535:LRIEAO]2.0.CO;2.
[17] S. Goetz, A. Baccini, N. Laporte, T. Johns, W. Walker, J. Kellndorfer, R. Houghton, M. Sun, Mapping and monitoring carbon stocks with satellite observations: A comparison of methods, Carbon Balance and Management 4 (2009). doi:10.1186/1750-0680-4-2.
[18] C. Atzberger, Advances in remote sensing of agriculture: Context description, existing operational monitoring systems and major information needs, Remote Sensing 5 (2) (2013) 949-981. doi:10.3390/rs5020949.
[19] T. Häme, J. Kilpi, H. A. Ahola, Y. Rauste, O. Antropov, M. Rautiainen, L. Sirro, S. Bounpone, Improved mapping of tropical forests with optical and SAR imagery, part I: Forest cover and accuracy assessment using multi-resolution data, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 6 (1) (2013) 74-91.
[20] O. Antropov, Y. Rauste, H. Astola, J. Praks, T. Häme, M. Hallikainen, Land cover and soil type mapping from spaceborne PolSAR data at L-band with probabilistic neural network, IEEE Transactions on Geoscience and Remote Sensing 52 (9) (2014) 5256-5270. doi:10.1109/TGRS.2013.2287712.
[21] A. Lönnqvist, Y. Rauste, M. Molinier, T. Häme, Polarimetric SAR data in land cover mapping in boreal zone, IEEE Transactions on Geoscience and Remote Sensing 48 (10) (2010) 3652-3662. doi:10.1109/TGRS.2010.2048115.
[22] B. Waske, M. Braun, Classifier ensembles for land cover mapping using multitemporal SAR imagery, ISPRS Journal of Photogrammetry and Remote Sensing 64 (5) (2009) 450-457. doi:10.1016/j.isprsjprs.2009.01.003.
[23] L. Bruzzone, M. Marconcini, U. Wegmüller, A. Wiesmann, An advanced system for the automatic classification of multitemporal SAR images, IEEE Transactions on Geoscience and Remote Sensing 42 (6) (2004) 1321-1334. doi:10.1109/TGRS.2004.826821.
[24] T. Ullmann, A. Schmitt, A. Roth, J. Duffe, S. Dech, H.-W. Hubberten, R. Baumhauer, Land cover characterization and classification of Arctic tundra environments by means of polarized synthetic aperture X- and C-band radar (PolSAR) and Landsat 8 multispectral imagery, Richards Island, Canada, Remote Sensing 6 (9) (2014) 8565-8593. doi:10.3390/rs6098565. URL http://www.mdpi.com/2072-4292/6/9/8565
[25] N. Clerici, C. A. V. Calderón, J. M. Posada, Fusion of Sentinel-1A and Sentinel-2A data for land cover mapping: a case study in the lower Magdalena region, Colombia, Journal of Maps 13 (2) (2017) 718-726. doi:10.1080/17445647.2017.1372316.
[26] C. Castaneda, D. Ducrot, Land cover mapping of wetland areas in an agricultural landscape using SAR and Landsat imagery, Journal of Environmental Management 90 (7) (2009) 2270-2277.
[27] Y. Ban, H. Hu, I. M. Rangel, Fusion of QuickBird MS and RADARSAT SAR data for urban land-cover mapping: Object-based and knowledge-based approach, International Journal of Remote Sensing 31 (6) (2010) 1391-1410.
[28] G. V. Laurin, V. Liesenberg, Q. Chen, L. Guerriero, F. Del Frate, A. Bartolini, D. Coomes, B. Wilebore, J. Lindsell, R. Valentini, Optical and SAR sensor synergies for forest and land cover mapping in a tropical site in West Africa, International Journal of Applied Earth Observation and Geoinformation 21 (2013) 7-16.
[29] R. Khatami, G. Mountrakis, S. V. Stehman, A meta-analysis of remote sensing research on supervised pixel-based land-cover image classification processes: General guidelines for practitioners and future research, Remote Sensing of Environment 177 (2016) 89-100.
[30] B. Waske, M. Braun, Classifier ensembles for land cover mapping using multitemporal SAR imagery, ISPRS Journal of Photogrammetry and Remote Sensing 64 (5) (2009) 450-457.
[31] H. Balzter, B. Cole, C. Thiel, C. Schmullius, Mapping CORINE land cover from Sentinel-1A SAR and SRTM digital elevation model data using random forests, Remote Sensing 7 (11) (2015) 14876-14898. doi:10.3390/rs71114876.
[32] S.-E. Park, Variations of microwave scattering properties by seasonal freeze/thaw transition in the permafrost active layer observed by ALOS PALSAR polarimetric data, Remote Sensing 7 (12) (2015) 17135-17148. doi:10.3390/rs71215874.
[33] M. C. Dobson, L. E. Pierce, F. T. Ulaby, Knowledge-based land-cover classification using ERS-1/JERS-1 SAR composites, IEEE Transactions on Geoscience and Remote Sensing 34 (1) (1996) 83-99. doi:10.1109/36.481896.
[34] L. Sirro, T. Häme, Y. Rauste, J. Kilpi, J. Hämäläinen, K. Gunia, B. de Jong, F. Paz Pellat, Potential of different optical and SAR data in forest and land cover classification to support REDD+ MRV, Remote Sensing 10 (6) (2018). doi:10.3390/rs10060942.
[35] J. D. T. De Alban, G. M. Connette, P. Oswald, E. L. Webb, Combined Landsat and L-band SAR data improves land cover classification and change detection in dynamic tropical landscapes, Remote Sensing 10 (2) (2018). doi:10.3390/rs10020306.
[36] N. Longepe, P. Rakwatin, O. Isoguchi, M. Shimada, Y. Uryu, K. Yulianto, Assessment of ALOS PALSAR 50 m orthorectified FBD data for regional land cover classification by support vector machines, IEEE Transactions on Geoscience and Remote Sensing 49 (6) (2011) 2135-2150. doi:10.1109/TGRS.2010.2102041.
[37] T. Esch, A. Schenk, T. Ullmann, M. Thiel, A. Roth, S. Dech, Characterization of land cover types in TerraSAR-X images by combined analysis of speckle statistics and intensity information, IEEE Transactions on Geoscience and Remote Sensing 49 (6) (2011) 1911-1925. doi:10.1109/TGRS.2010.2091644.
[38] J. W. Cable, J. M. Kovacs, J. Shang, X. Jiao, Multi-temporal polarimetric RADARSAT-2 for land cover monitoring in northeastern Ontario, Canada, Remote Sensing 6 (3) (2014) 2372-2392. doi:10.3390/rs6032372. URL http://www.mdpi.com/2072-4292/6/3/2372
[39] X. Niu, Y. Ban, Multi-temporal RADARSAT-2 polarimetric SAR data for urban land-cover classification using an object-based support vector machine and a rule-based approach, International Journal of Remote Sensing 34 (1) (2013) 1-26. doi:10.1080/01431161.2012.700133.
[40] T. L. Evans, M. Costa, K. Telmer, T. S. F. Silva, Using ALOS/PALSAR and RADARSAT-2 to map land cover and seasonal inundation in the Brazilian Pantanal, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 3 (4) (2010) 560-575. doi:10.1109/JSTARS.2010.2089042.
[41] P. Lumsdon, S. R. Cloude, G. Wright, Polarimetric classification of land cover for Glen Affric radar project, IEE Proceedings - Radar, Sonar and Navigation 152 (6) (2005) 404-412. doi:10.1049/ip-rsn:20041313.
[42] C. da Costa Freitas, L. de Souza Soler, S. J. S. Sant'Anna, L. V. Dutra, J. R. Dos Santos, J. C. Mura, A. H. Correia, Land use and land cover mapping in the Brazilian Amazon using polarimetric airborne P-band SAR data, IEEE Transactions on Geoscience and Remote Sensing 46 (10) (2008) 2956-2970.
[43] G. Li, D. Lu, E. Moran, L. Dutra, M. Batistella, A comparative analysis of ALOS PALSAR L-band and RADARSAT-2 C-band data for land-cover classification in a tropical moist region, ISPRS Journal of Photogrammetry and Remote Sensing 70 (2012) 26-38. doi:10.1016/j.isprsjprs.2012.03.010.
[44] N. Park, K. Chi, Integration of multitemporal/polarization C-band SAR data sets for land-cover classification, International Journal of Remote Sensing 29 (16) (2008) 4667-4688. doi:10.1080/01431160801947341.
[45] E. Tomppo, O. Antropov, J. Praks, Cropland classification using Sentinel-1 time series: Methodological performance and prediction uncertainty assessment, Remote Sensing 11 (21) (2019). doi:10.3390/rs11212480.
[46] D. B. Nguyen, A. Gruber, W. Wagner, Mapping rice extent and cropping scheme in the Mekong Delta using Sentinel-1A data, Remote Sensing Letters 7 (12) (2016) 1209-1218.
[47] A. Veloso, S. Mermoz, A. Bouvet, T. Le Toan, M. Planells, J.-F. Dejoux, E. Ceschia, Understanding the temporal behavior of crops using Sentinel-1 and Sentinel-2-like data for agricultural applications, Remote Sensing of Environment 199 (2017) 415-426.
[48] G. Satalino, A. Balenzano, F. Mattia, M. W. Davidson, C-band SAR data for mapping crops dominated by surface or volume scattering, IEEE Geoscience and Remote Sensing Letters 11 (2) (2013) 384-388.
[49] F. Vicente-Guijalba, A. Jacob, J. M. Lopez-Sanchez, C. Lopez-Martinez, J. Duro, C. Notarnicola, D. Ziolkowski, A. Mestre-Quereda, E. Pottier, J. J. Mallorquí, M. Lavalle, M. Engdahl, SInCohMap: Land-cover and vegetation mapping using multi-temporal Sentinel-1 interferometric coherence, in: IGARSS 2018 - 2018 IEEE International Geoscience and Remote Sensing Symposium, 2018, pp. 6631-6634. doi:10.1109/IGARSS.2018.8517926.
[50] S. Ge, O. Antropov, W. Su, H. Gu, J. Praks, Deep recurrent neural networks for land-cover classification using Sentinel-1 InSAR time series, in: IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium, 2019, pp. 473-476. doi:10.1109/IGARSS.2019.8900088.
[51] Y. LeCun, Y. Bengio, et al., Convolutional networks for images, speech, and time series, The Handbook of Brain Theory and Neural Networks 3361 (10) (1995) 1995.
[52] X. X. Zhu, D. Tuia, L. Mou, G.-S. Xia, L. Zhang, F. Xu, F. Fraundorfer, Deep learning in remote sensing: a review, arXiv preprint arXiv:1710.03959 (2017).
[53] M. Mahdianpari, B. Salehi, M. Rezaee, F. Mohammadimanesh, Y. Zhang, Very deep convolutional neural networks for complex land cover mapping using multispectral remote sensing imagery, Remote Sensing 10 (7) (2018) 1119.
[54] L. Zhang, L. Zhang, B. Du, Deep learning for remote sensing data: A technical tutorial on the state of the art, IEEE Geoscience and Remote Sensing Magazine 4 (2) (2016) 22-40.
[55] J. Zhang, P. Zhong, Y. Chen, S. Li, l1/2-regularized deconvolution network for the representation and restoration of optical remote sensing images, IEEE Transactions on Geoscience and Remote Sensing 52 (5) (2014) 2617-2627.
[56] X. Chen, S. Xiang, C.-L. Liu, C.-H. Pan, Aircraft detection by deep belief nets, in: Pattern Recognition (ACPR), 2013 2nd IAPR Asian Conference on, IEEE, 2013, pp. 54-58.
[57] X. Chen, S. Xiang, C.-L. Liu, C.-H. Pan, Vehicle detection in satellite images by hybrid deep convolutional neural networks, IEEE Geoscience and Remote Sensing Letters 11 (10) (2014) 1797-1801.
[58] Y. Liu, G. Cao, Q. Sun, M. Siegel, Hyperspectral classification via deep networks and superpixel segmentation, International Journal of Remote Sensing 36 (13) (2015) 3459-3482. doi:10.1080/01431161.2015.1055607.
[59] J. Wang, Q. Qin, Z. Li, X. Ye, J. Wang, X. Yang, X. Qin, Deep hierarchical representation and segmentation of high resolution remote sensing images, in: Geoscience and Remote Sensing Symposium (IGARSS), 2015 IEEE International, IEEE, 2015, pp. 4320-4323.
[60] D. Tuia, R. Flamary, N. Courty, Multiclass feature learning for hyperspectral image classification: Sparse and hierarchical solutions, ISPRS Journal of Photogrammetry and Remote Sensing 105 (2015) 272-285.
[61] F. Hu, G.-S. Xia, J. Hu, L. Zhang, Transferring deep convolutional neural networks for the scene classification of high-resolution remote sensing imagery, Remote Sensing 7 (11) (2015) 14680-14707.
[62] O. A. B. Penatti, K. Nogueira, J. A. dos Santos, Do deep features generalize from everyday objects to remote sensing and aerial scenes domains?, in: 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2015, pp. 44-51. doi:10.1109/CVPRW.2015.7301382.
[63] F. P. Luus, B. P. Salmon, F. Van den Bergh, B. T. J. Maharaj, Multiview deep learning for land-use classification, IEEE Geoscience and Remote Sensing Letters 12 (12) (2015) 2448-2452.
[64] F. Zhang, B. Du, L. Zhang, Scene classification via a gradient boosting random convolutional network framework, IEEE Transactions on Geoscience and Remote Sensing 54 (3) (2016) 1793-1802.
[65] T. Ishii, R. Nakamura, H. Nakada, Y. Mochizuki, H. Ishikawa, Surface object recognition with CNN and SVM in Landsat 8 images, in: 2015 14th IAPR International Conference on Machine Vision Applications (MVA), 2015, pp. 341-344. doi:10.1109/MVA.2015.7153200.
[66] N. Kussul, M. Lavreniuk, S. Skakun, A. Shelestov, Deep learning classification of land cover and crop types using remote sensing data, IEEE Geoscience and Remote Sensing Letters 14 (5) (2017) 778-782.
[67] Y. Chen, Z. Lin, X. Zhao, G. Wang, Y. Gu, Deep learning-based classification of hyperspectral data, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 7 (6) (2014) 2094-2107.
[68] G. Wu, X. Shao, Z. Guo, Q. Chen, W. Yuan, X. Shi, Y. Xu, R. Shibasaki, Automatic building segmentation of aerial imagery using multi-constraint fully convolutional networks, Remote Sensing 10 (3) (2018) 407.
[69] Y. Duan, F. Liu, L. Jiao, P. Zhao, L. Zhang, SAR image segmentation based on convolutional-wavelet neural network and Markov random field, Pattern Recognition 64 (2017) 255-267.
[70] F. Mohammadimanesh, B. Salehi, M. Mahdianpari, E. Gill, M. Molinier, A new fully convolutional neural network for semantic segmentation of polarimetric SAR imagery in complex land cover ecosystem, ISPRS Journal of Photogrammetry and Remote Sensing 151 (2019) 223-236.
[71] L. Wang, X. Xu, H. Dong, R. Gui, F. Pu, Multi-pixel simultaneous classification of PolSAR image using convolutional neural networks, Sensors 18 (3) (2018) 769.
[72] M. Ahishali, S. Kiranyaz, T. Ince, M. Gabbouj, Dual and single polarized SAR image classification using compact convolutional neural networks, Remote Sensing 11 (11) (2019) 1340.
[73] Z. Li, Z. Yang, H. Xiong, Homogeneous region segmentation for SAR images based on two steps segmentation algorithm, in: Computers, Communications and Systems (ICCCS), International Conference on, IEEE, 2015, pp. 196-200.
[74] V. Badrinarayanan, A. Kendall, R. Cipolla, SegNet: A deep convolutional encoder-decoder architecture for image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence 39 (12) (2017) 2481-2495.
[75] H. Zhao, J. Shi, X. Qi, X. Wang, J. Jia, Pyramid scene parsing network, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2881-2890.
[76] C. Yu, J. Wang, C. Peng, C. Gao, G. Yu, N. Sang, BiSeNet: Bilateral segmentation network for real-time semantic segmentation, in: Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 325-341.
[77] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, H. Adam, Encoder-decoder with atrous separable convolution for semantic image segmentation, arXiv preprint arXiv:1802.02611 (2018).
[78] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, A. L. Yuille, DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs, IEEE Transactions on Pattern Analysis and Machine Intelligence 40 (4) (2018) 834-848.
[79] O. Ronneberger, P. Fischer, T. Brox, U-Net: Convolutional networks for biomedical image segmentation, in: Medical Image Computing and Computer-Assisted Intervention (MICCAI), Vol. 9351 of LNCS, Springer, 2015, pp. 234-241 (available on arXiv:1505.04597 [cs.CV]). URL http://lmb.informatik.uni-freiburg.de/Publications/2015/RFB15a
[80] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, H. Adam, MobileNets: Efficient convolutional neural networks for mobile vision applications, arXiv preprint arXiv:1704.04861 (2017).
[81] T. Pohlen, A. Hermans, M. Mathias, B. Leibe, Full-resolution residual networks for semantic segmentation in street scenes, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4151-4160.
[82] S. Jégou, M. Drozdzal, D. Vazquez, A. Romero, Y. Bengio, The one hundred layers tiramisu: Fully convolutional DenseNets for semantic segmentation, in: Computer Vision and Pattern Recognition Workshops (CVPRW), 2017 IEEE Conference on, IEEE, 2017, pp. 1175-1183.
[83] O. Antropov, Y. Rauste, A. Lönnqvist, T. Häme, PolSAR mosaic normalization for improved land-cover mapping, IEEE Geoscience and Remote Sensing Letters 9 (6) (2012) 1074-1078.
[84] J. Long, E. Shelhamer, T. Darrell, Fully convolutional networks for semantic segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431-3440.
[85] D. H. Hubel, T. N. Wiesel, Receptive fields, binocular interaction and functional architecture in the cat's visual cortex, The Journal of Physiology 160 (1) (1962) 106-154.
[86] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al., ImageNet large scale visual recognition challenge, International Journal of Computer Vision 115 (3) (2015) 211-252.
[87] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770-778.
[88] G. Huang, Z. Liu, L. Van Der Maaten, K. Q. Weinberger, Densely connected convolutional networks, in: CVPR, Vol. 1, 2017, p. 3.
[89] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, Going deeper with convolutions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1-9.
[90] S. Ji, W. Xu, M. Yang, K. Yu, 3D convolutional neural networks for human action recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (1) (2013) 221-231.
[91] T. N. Sainath, A.-r. Mohamed, B. Kingsbury, B. Ramabhadran, Deep convolutional neural networks for LVCSR, in: Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, IEEE, 2013, pp. 8614-8618.
[92] D. Small, L. Zuberbühler, A. Schubert, E. Meier, Terrain-flattened gamma nought Radarsat-2 backscatter, Canadian Journal of Remote Sensing 37 (5) (2012) 493-499.
[93] P. Härmä, R. Teiniranta, M. Törmä, R. Repo, E. Järvenpää, M. Kallio, The production of Finnish CORINE land cover 2000 classification, XXth ISPRS Congress, Istanbul, Turkey (2004).
[94] P. Härmä, R. Teiniranta, M. Törmä, R. Repo, E. Järvenpää, M. Kallio, Finnish CORINE land cover 2000 classification, XXth ISPRS Congress, Anchorage, US (2004).
[95] A. Garcia-Garcia, S. Orts-Escolano, S. Oprea, V. Villena-Martinez, J. Garcia-Rodriguez, A review on deep learning techniques applied to semantic segmentation, arXiv preprint arXiv:1704.06857 (2017).
[96] F. Chollet, Xception: Deep learning with depthwise separable convolutions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1251-1258.
[97] O. Ronneberger, P. Fischer, T. Brox, U-Net: Convolutional networks for biomedical image segmentation, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2015, pp. 234-241.
[98] L.-C. Chen, G. Papandreou, F. Schroff, H. Adam, Rethinking atrous convolution for semantic image segmentation, arXiv preprint arXiv:1706.05587 (2017).
[99] Y. Bengio, Deep learning of representations for unsupervised and transfer learning, in: Proceedings of ICML Workshop on Unsupervised and Transfer Learning, 2012, pp. 17-36.
[100] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, et al., TensorFlow: A system for large-scale machine learning, in: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), 2016, pp. 265-283.
[101] H. Costa, G. M. Foody, D. S. Boyd, Supervised methods of image segmentation accuracy assessment in land cover mapping, Remote Sensing of Environment 205 (2018) 338-351.
[102] G. Csurka, D. Larlus, F. Perronnin, F. Meylan, What is a good evaluation measure for semantic segmentation?, in: BMVC, Vol. 27, Citeseer, 2013, p. 2013.
[103] J. Cohen, A coefficient of agreement for nominal scales, Educational and Psychological Measurement 20 (1) (1960) 37-46.
[104] O. Antropov, Y. Rauste, T. Häme, Volume scattering modeling in PolSAR decompositions: Study of ALOS PALSAR data over boreal forest, IEEE Transactions on Geoscience and Remote Sensing 49 (10) (2011) 3838-3848.


Figure 2: Study area in Finland with reference CORINE land cover data and schematic location of areas used for model training and accuracy assessment.


Original scenes were downloaded as Level-1 Ground Range Detected (GRD) products. They represent focused SAR data that has been detected, multi-looked, and projected to ground range using an Earth ellipsoid. They were orthorectified in ESA SNAP S1TBX software using a local digital terrain model (with 2 m resolution) available from the National Land Survey of Finland. The pixel spacing of the orthorectified scenes was set to 20 m. Orthorectification included terrain flattening to obtain backscatter in the gamma-nought format [92]. The scenes were further re-projected to the ETRS89 / ETRS-TM35FIN projection (EPSG:3067) and resampled to a final pixel size of 20 metres.
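For orientation, this chain can be scripted with SNAP's Python bindings (snappy); the sketch below is ours and only approximates the setup described above (hypothetical file names, a bundled DEM instead of the 2 m national DTM, and operator parameters left mostly at defaults):

```python
import snappy
from snappy import GPF, ProductIO

HashMap = snappy.jpy.get_type("java.util.HashMap")

# Hypothetical input file name; the study used 14 Sentinel-1 GRD scenes.
product = ProductIO.readProduct("S1A_IW_GRDH_scene.zip")

# Radiometric calibration to beta-nought, which terrain flattening expects.
cal_params = HashMap()
cal_params.put("outputBetaBand", True)
calibrated = GPF.createProduct("Calibration", cal_params, product)

# Terrain flattening produces gamma-nought backscatter [92].
flattened = GPF.createProduct("Terrain-Flattening", HashMap(), calibrated)

# Range-Doppler terrain correction (orthorectification) at 20 m pixel spacing.
tc_params = HashMap()
tc_params.put("pixelSpacingInMeter", 20.0)
corrected = GPF.createProduct("Terrain-Correction", tc_params, flattened)

ProductIO.writeProduct(corrected, "gamma0_20m", "GeoTIFF")
```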

3.3 Reference data

In Finland, the Finnish Environment Institute (SYKE) is responsible for the production of the CORINE maps. While for most of the EU territory the CORINE mask is available at 100 m × 100 m spatial resolution, national institutions may choose to create more precise maps, and SYKE in particular has produced a 20 m × 20 m spatial resolution mask for Finland (Figure 3). Since then, updates have been produced regularly, the latest one at the time of this study, which we used, being CLC2012. There are 48 different land use classes in the map, which can be hierarchically grouped into 4 CLC levels. In detail, there are 30 classes on CLC Level-3, 15 classes on CLC Level-2, and 5 top CLC Level-1 classes. According to the information provided by SYKE, the accuracy of CLC Level-3 is 61%, of CLC Level-2 83%, and of CLC Level-1 93%. The selected classes and their corresponding color codes used for our segmentation results are shown in Table 2.

Until the most recent CORINE production round, EEA member countries adopted national approaches for the production of CORINE. EEA Technical Guidelines include manual digitalization of land cover change based on visual interpretation of optical satellite imagery. In Finland, the European CLC was not applicable for the majority of national users due to its large minimal mapping unit (MMU). Thus, a national version was produced with a somewhat modified nomenclature of classes [93, 94]. The national high-resolution CLC2012 data is in raster format of 20 m, with a corresponding MMU. In the provision of the 2012 update of CLC, obtaining optical imagery over Scandinavia and Britain was particularly challenging because of the frequent clouds, thus calling for the use of radar imagery to meet user requirements on accuracy and coverage [31]. The CORINE map itself is built from high-resolution satellite images acquired primarily during the summer and, to a smaller extent, during the spring months [2].

3.4 Semantic Segmentation Models

We selected the following seven state-of-the-art [95] semantic segmentation models to test for our land cover mapping task: SegNet [74], PSPNet [75], BiSeNet [76], DeepLabV3+ [77, 78], U-Net [79, 80], FRRN-B [81], and FC-DenseNet [82]. The models were selected to cover a diverse set of approaches to semantic segmentation. In the following, we describe the specific architecture of each of these DL models. We will use the following common abbreviations: conv for the convolution operation, concat for concatenation, max pool for the max pooling operation, BN for batch normalisation, and ReLU for the rectified linear unit activation function.


Table 2: CORINE CLC Level-1 classes and their color codes used in our classification results.

class                             R    G    B    color
Water bodies (500)                0    191  255  blue
Peatland, bogs and marshes (400)  173  216  230  light blue
Forested areas (300)              127  255  0    green
Agricultural areas (200)          222  184  135  brown
Urban fabric (100)                128  0    0    red

Figure 3: Zoomed-in area fragment with our reference data, i.e., CORINE, shown on top (left), along with the Google Earth layer (right).

3.4.1 BiSeNet (Bilateral Segmentation Network)

The BiSeNet model is designed to decouple the functions of encoding additional spatial information and enlarging the receptive field, which are fundamental to achieving good segmentation performance. As can be seen in Figure 4, there are two main components in this model: the Spatial Path (SP) and the Context Path (CP). The Spatial Path serves to encode rich spatial information. The Context Path serves to provide a sufficient receptive field, and uses global average pooling and pre-trained Xception [96] or ResNet [87] as the backbone. The goal of the creators was not only to obtain superior performance, but to achieve a balance between speed and performance. Hence, BiSeNet is a relatively fast semantic segmentation model.

3.4.2 SegNet (Encoder-Decoder-Skip)

Similarly to BiSeNet, SegNet is also designed with computational performance in mind, this time particularly during inference. Because of this, the network has a significantly smaller number of trainable parameters compared to most of the other architectures. The encoder in SegNet is based on VGG16: it consists of its first 13 convolutional layers, while the fully connected layers are omitted. Hence, the novelty of this network lies in its decoder part, as follows. The decoder consists of one decoder layer for each encoder layer, and so it also has 13 layers. Each individual decoder layer utilizes max-pooling indices memorized from its corresponding encoder feature map. The authors have shown that this enhances boundary delineation between classes. Finally, the decoder output is sent to a multi-class soft-max function, yielding a classification for each pixel (see Figure 5).

Figure 4: The architecture of BiSeNet. ARM stands for the Attention Refinement Module and FFM for the Feature Fusion Module introduced in the model's paper [76].

Figure 5: The architecture of SegNet-based Encoder-Decoder with Skip connections [74]. Blue tiles represent Convolution + Batch Normalisation + ReLU, green tiles represent Pooling, red tiles Upsampling, and yellow tiles a softmax operation.

3.4.3 Mobile U-Net

Mobile U-Net is based on the U-Net [97] semantic segmentation architecture shown in Figure 6. In designing U-Net, the fully convolutional approach was generally employed, with the following modification: the upsampling part of the architecture has no fully convolutional layer, but is nearly symmetrical to the feature extraction part, due to the use of similar feature maps. This results in a u-shaped architecture (see Figure 6), hence the name of the model. While originally developed for biomedical images, the U-Net architecture has proven successful for image segmentation in other domains as well. Here, we somewhat modify the U-Net architecture according to the MobileNets [80] framework to improve its efficiency. In particular, the MobileNets framework uses depthwise separable convolutions, a form which factorizes standard convolutions (e.g., 3 × 3) into a depthwise convolution (applied separately to each input band) and a pointwise (1 × 1) convolution to combine the outputs of the depthwise convolution.
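To make the factorization concrete, here is a small Keras sketch (ours, not the study's implementation); with 3 input bands and 64 output filters, the standard convolution has 1,792 parameters, while the separable pair has only 286:

```python
import tensorflow as tf

inputs = tf.keras.Input(shape=(512, 512, 3))  # e.g., an R/G/B = VV/VH/DEM imagelet

# Standard 3x3 convolution: 3*3*3*64 + 64 = 1,792 parameters.
standard = tf.keras.layers.Conv2D(64, 3, padding="same")(inputs)

# Depthwise separable factorization of the same mapping:
depthwise = tf.keras.layers.DepthwiseConv2D(3, padding="same")(inputs)  # 30 params
pointwise = tf.keras.layers.Conv2D(64, 1)(depthwise)                    # 256 params

model = tf.keras.Model(inputs, [standard, pointwise])
model.summary()  # compare the per-layer parameter counts
```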

Figure 6: The architecture of U-Net [97].

3.4.4 DeepLab-V3+

DeepLab-V3+ [77] is an improved version of DeepLab-V3 [98], while the latter is an improved version of the original DeepLab [78] model. This segmentation model does not follow the FCN framework like the previously discussed models. The main features that distinguish the DeepLab model from FCNs are the atrous convolutions for upsampling and the application of probabilistic machine learning models, concretely conditional random fields (CRFs), for finer localization accuracy in the final fully connected layer. Atrous convolutions, in particular, allow enlarging the context from which the next-layer feature maps are learned, while preserving the number of parameters (and thus the same efficiency). Using a chain of atrous convolutions allows computing the final output layer of a CNN at an arbitrarily high resolution (removing the need for the upsampling part used in FCNs). In the follow-up work proposing DeepLab-V3, Chen et al. [98] change the approach to atrous convolutions to gradually double the atrous rates, and show that the adapted version of their algorithm outperforms the previous one even without including the fully connected CRF layer. Finally, in their newest adaptation of the model, called DeepLab-V3+, Chen et al. [77] turn to an approach similar to the FCNs, i.e., they add a decoder module to the architecture (see Figure 7). That is, they employ the features extracted by the DeepLab-V3 module in the encoder part, and add a decoder module consisting of 1 × 1 and 3 × 3 convolutions.
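For illustration (our sketch, not DeepLab's implementation), Keras exposes atrous convolutions through the dilation_rate argument; a 3 × 3 kernel with rate 2 covers a 5 × 5 neighbourhood at the 3 × 3 parameter cost:

```python
import tensorflow as tf

x = tf.keras.Input(shape=(128, 128, 64))
# Rate-2 atrous 3x3 convolution: the same 3*3*64*64 weights, an enlarged
# context, and the 128x128 spatial resolution is preserved (no pooling needed).
y = tf.keras.layers.Conv2D(64, 3, dilation_rate=2, padding="same")(x)
model = tf.keras.Model(x, y)
```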

Figure 7: The architecture of DeepLabV3+ [77].

3.4.5 FRRN-B (Full-Resolution Residual Networks)

As we have seen, most of the semantic segmentation architectures are based on some form of an FCN, and so they utilize existing classification networks, such as ResNet or VGG16, as encoders. We also discussed the main reason for such approaches, which is to take advantage of the learned weights from those architectures pre-trained for the classification task. Nevertheless, one disadvantage of the FCN approach is that the resulting network outputs of the encoder part (particularly after the pooling operations) are at a lower resolution, which deteriorates the localization performance of the overall segmentation model. Pohlen et al. [81] proposed to tackle this by having two parallel network streams processing the input image: a pooling and a residual stream (Figure 8). As the name says, the pooling stream performs successive pooling and then unpooling operations, and it serves to obtain good recognition of the objects and classes. The residual stream computes residuals at the full image resolution, which enables low-level features, i.e., object pixel-level locations, to be propagated to the network output. The name of the model comes from its building blocks, i.e., full-resolution residual units. Each such unit simultaneously operates on the pooling and the residual stream. In the original paper [81], the authors propose two alternative architectures, FRRN-A and FRRN-B, and they show that FRRN-B achieves superior performance on the Cityscapes benchmark dataset. Hence, we employ the FRRN-B architecture.

Figure 8: The architecture of FRRN-B. RU n and FRRU n stand for residual units and full-resolution residual units with n-channel convolutions, respectively. FRRUs simultaneously operate on the two streams [81].

3.4.6 PSPNet (Pyramid Scene Parsing Network)

Figure 9: The architecture of PSPNet [75].

Zhao et al. [75] propose pyramid scene parsing as a solution to the challenge of making local predictions based on the local context only, without considering the global image scene. In remote sensing, an example of this challenge could be a model wrongly predicting water with waves present in it as the dry vegetation class, because the two appear similar and the model did not consider that these pixels are part of a larger water surface, i.e., it missed the global context. Similarly to the other FCN-based approaches, PSPNet uses a pre-trained classification architecture to extract the feature map, in this case ResNet. The main module of this network is the pyramid pooling, which is enclosed by a square in Figure 9. As can be seen in the figure, this module fuses features at four scales, from the coarse (red) to the fine (green). Hence, the output of each level in the pyramid pooling module contains a feature map of a different resolution. In the end, the different features are stacked together, yielding the final pyramid pooling global feature for predictions.

3.4.7 FC-DenseNet (Fully Convolutional DenseNets)

This semantic segmentation algorithm is built using the DenseNet CNN [88] as a basis for the encoder, followed by applying the FCN approach [82]. The specificity of the DenseNet architecture is the presence of blocks where each layer is connected to all other layers in a feed-forward manner. Figure 10 shows the architecture of FC-DenseNet, where the blocks are represented by the Dense Block units. According to [88], such an architecture scales well to hundreds of layers without any optimization issues, while yielding excellent results in classification tasks. In order to efficiently upsample the DenseNet feature maps, Jégou et al. [82] substitute the upsampling convolutions of FCNs with Dense Blocks and Transitions Up. The Transition Up modules consist of transposed convolutions, which are then concatenated with the outputs from the input skip connection (the dashed lines in Figure 10).


Figure 10: The architecture of FC-DenseNet [82].
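To make these building blocks concrete, here is a minimal Keras sketch (ours; the layer count and growth rate are illustrative, not the FC-DenseNet103 configuration):

```python
import tensorflow as tf

def dense_block(x, num_layers=4, growth_rate=16):
    # Each layer receives the concatenation of all previous outputs:
    # the feed-forward, all-to-all connectivity of a Dense Block.
    features = [x]
    for _ in range(num_layers):
        h = features[0] if len(features) == 1 else tf.keras.layers.Concatenate()(features)
        h = tf.keras.layers.BatchNormalization()(h)
        h = tf.keras.layers.ReLU()(h)
        h = tf.keras.layers.Conv2D(growth_rate, 3, padding="same")(h)
        features.append(h)
    return tf.keras.layers.Concatenate()(features)

def transition_up(x, skip, filters):
    # A transposed convolution upsamples the feature maps, and the result is
    # concatenated with the skip connection (the dashed lines in Figure 10).
    up = tf.keras.layers.Conv2DTranspose(filters, 3, strides=2, padding="same")(x)
    return tf.keras.layers.Concatenate()([up, skip])
```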

3.5 Training approach

To accomplish better segmentation performance, there is an option to pre-train the semantic segmentation models (in particular, their encoder modules) using a larger set of available images of another type (such as natural images). By using a model pre-trained with natural images and continuing training with the limited set of SAR images, the knowledge becomes effectively transferred from the natural-image task to the SAR task [99]. To accomplish such a transfer, we used models whose encoders were pre-trained on the ImageNet classification task, and fine-tuned them using our SAR dataset (described next).
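As an illustration of this recipe (our sketch; the actual encoders and decoders are those of the seven architectures of Section 3.4, not this toy head), one can take an ImageNet-pretrained backbone and fine-tune it end-to-end on the SAR imagelets:

```python
import tensorflow as tf

# Encoder pre-trained on ImageNet; inputs are 512x512 "SAR RGB-DEM" imagelets.
backbone = tf.keras.applications.ResNet50(
    include_top=False, weights="imagenet", input_shape=(512, 512, 3))

x = backbone.output                                # 16x16 feature map
x = tf.keras.layers.Conv2D(5, 1)(x)                # 5 CORINE Level-1 classes
x = tf.keras.layers.UpSampling2D(32, interpolation="bilinear")(x)
outputs = tf.keras.layers.Softmax()(x)

model = tf.keras.Model(backbone.input, outputs)
# Fine-tune all weights on the SAR dataset with a small learning rate.
model.compile(optimizer=tf.keras.optimizers.RMSprop(1e-4),
              loss="sparse_categorical_crossentropy")
```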

3.6 Experimental Setup

In this section, we first describe how we prepared the SAR images for training with the deep learning models, which are designed for natural images. Then, we provide the details of our implementation.

3.6.1 SAR Data Preprocessing for Deep Learning

Sentinel-1 imagery comes in two polarization channels, each of them being informative about certain types of land cover. Hence, using their combination is expected to yield better land cover mapping results than using either of them independently. Moreover, previous work has shown the benefits of also using a DEM model for land cover mapping [9]. Hence, as the third layer, we used the DEM of Finland from the National Land Survey.

SAR backscatter for both polarizations was converted to decibels by applying the 10 · log10 transformation. In addition, for the deep learning models, each band should be normalized so that the distribution of the pixel values resembles a Gaussian distribution centered at zero; this is done to yield faster convergence during training. To normalize the data, each pixel value is reduced by the mean of all pixels and then divided by their standard deviation. In addition, given that the semantic segmentation models expect pixel values in the range (0, 255), we scaled the normalized data, and also the DEM values, to this range. Such preprocessed layers are then used to create the image dataset for training.

We named the created dataset SAR RGB-DEM. The naming comes from the process used to create the images in this dataset: one of the two channels of a Sentinel-1 image is assigned to the R channel and the other to the G channel, while for the third, B, channel we use the DEM layer.
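The preprocessing and stacking steps can be summarized in the following NumPy sketch (ours; the exact mapping of the normalized values into the (0, 255) range is not specified in the text, so a min-max rescaling is assumed):

```python
import numpy as np

def preprocess_band(linear_backscatter):
    db = 10.0 * np.log10(linear_backscatter)         # convert to decibels
    z = (db - db.mean()) / db.std()                  # zero-centered, unit variance
    z = (z - z.min()) / (z.max() - z.min())          # assumed rescaling to (0, 1)
    return (z * 255.0).astype(np.uint8)

def scale_dem(dem):
    d = (dem - dem.min()) / (dem.max() - dem.min())  # DEM scaled to the same range
    return (d * 255.0).astype(np.uint8)

def make_sar_rgb_dem(vv, vh, dem):
    # R and G carry the two polarization channels, B carries the DEM.
    return np.dstack([preprocess_band(vv), preprocess_band(vh), scale_dem(dem)])
```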

3.6.2 Train/Development and Test (Accuracy Assessment) Dataset

The images from the SAR RGB-DEM dataset needed to be split into 512px×512px partial images (further in the text called imagelets) for training. Thus, each imagelet represented an area of roughly 10 × 10 km². The first reason for this preprocessing is the square shape: some of the selected models required square-shaped images. Some other models were flexible with the image shape and size, but we wanted to make the setups for all the models the same, so that their results are comparable. The second reason for the preprocessing is the computational capacity: with our hardware setup (described below), this was the largest image size that we could work with.
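For illustration, the non-overlapping split can be sketched as follows (our helper; ragged borders are simply dropped):

```python
import numpy as np

def split_into_imagelets(scene, size=512):
    # Non-overlapping size x size tiles from a preprocessed SAR RGB-DEM scene.
    h, w = scene.shape[:2]
    return [scene[r:r + size, c:c + size]
            for r in range(0, h - size + 1, size)
            for c in range(0, w - size + 1, size)]
```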

Upon splitting the SAR RGB-DEM images, we discarded those imagelets that were completely outside the land mass area, as well as those for which we did not have a complete CORINE label (such as those falling partly outside the Finnish borders). This resulted in more than 7,000 imagelets of size 512px×512px.

Given the geography of Finland, to have representative training data it seems useful to include imagelets from both the northern and southern (including the large cities) parts of the country in the model training. On the other hand, some noticeable differences are found also in the gradient from the east to the west of the country. To achieve a representative training dataset, we selected all imagelets between the longitudes of 24° and 28° for the accuracy assessment (model testing), and all the other imagelets for the model training (that is, training and development in the computer vision terminology). In this way, we prevented the situation in which two images of the same area, but acquired at different times, are used one for training and the other for testing. Images overlapping any border of the introduced strip were discarded. The procedure resulted in 3,104 images in the training and development set and 3,784 images in the test (accuracy assessment) set. Finally, we used 60% of the training and development set for training and the rest for development of the deep learning models.


Table 3: The properties of the examined semantic segmentation architectures.

Architecture   Base model      Parameters
BiSeNet        ResNet101       24.75M
SegNet         VGG16           34.97M
Mobile U-Net   Not applicable  8.87M
DeepLabV3+     ResNet101       47.96M
FRRN-B         ResNet101       24.75M
PSPNet         ResNet101       56M
FC-DenseNet    ResNet101       9.27M

3.6.3 Data Augmentation

Further, we employed the data augmentation technique. The main idea behind data augmentation is to enable improved learning by reusing original images with slight transformations, such as rotation, flipping, adding Gaussian noise, or slightly changing the brightness. This provides additional information to the model, and the dataset size is effectively increased. Moreover, an additional benefit of data augmentation is in helping the model to learn some invariant data properties for which no examples are present in the original dataset. Given the sensitivity of the SAR backscatter, we did not want to augment the images in terms of color or brightness, or by adding noise. However, we could safely employ rotations and flipping. For rotations, we only used 90° increments, giving three possible rotated versions of an image. For image flipping, we applied horizontal and vertical flipping, or both at the same time, giving another three possible versions of the original image (the vertical flip reflects the image along the central horizontal axis, i.e., it switches between top-left and bottom-left image origin, while the horizontal flip reflects along the central vertical axis, switching between top-left and top-right image origin). Notice that our images are square, so the transformations did not change the image dimensions. Finally, we applied online augmentation, as opposed to the offline version. In the online process, each augmented image is seen only once, and so this process yields a network that generalises better.
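A minimal online-augmentation sketch (ours) implementing exactly these rotations and flips:

```python
import numpy as np

rng = np.random.default_rng()

def augment(imagelet):
    # Random rotation by 0, 90, 180 or 270 degrees.
    img = np.rot90(imagelet, k=int(rng.integers(0, 4)))
    if rng.random() < 0.5:
        img = np.flipud(img)   # reflection along the central horizontal axis
    if rng.random() < 0.5:
        img = np.fliplr(img)   # reflection along the central vertical axis
    return img
```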

3.6.4 Implementation

To apply the described semantic segmentation models, we adapted the open-source Semantic Segmentation Suite. We used Python with the TensorFlow [100] backend.

3.6.5 Hardware and Training Setup

We trained and tested each of the deep learning models separately on a single GPU (NVIDIA GeForce GTX 1080) on a machine with 32 GB of RAM.

We used the RMSProp optimisation algorithm, a learning rate of 0.0001, and a learning-rate decay of 0.9954. Each model was trained for an equal number of epochs (500), and during the process the checkpoint for the best model was saved. We then used that model for evaluation on the test set, and we report those results.
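Interpreted in TensorFlow terms, the stated schedule could be set up as below (our sketch; the text does not specify the decay unit, so a per-epoch decay is assumed, and steps_per_epoch is hypothetical):

```python
import tensorflow as tf

steps_per_epoch = 1862  # hypothetical: ~60% of 3,104 imagelets at batch size 1
schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-4,   # learning rate of 0.0001
    decay_steps=steps_per_epoch,  # decay applied once per epoch (assumption)
    decay_rate=0.9954)            # stated decay factor
optimizer = tf.keras.optimizers.RMSprop(learning_rate=schedule)
```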


3.7 Evaluation Metrics

In a review of the metrics used in land cover classification, Costa et al. [101] found a lack of consistency, complicating the intercomparison of different studies. To avoid such issues and to ensure that our results are easily comparable with the literature, we evaluated our models thoroughly. For each model and class, we report the following accuracy measures: precision, also known as user's accuracy (UA); recall, also known as producer's accuracy (PA); the overall accuracy; and the Kappa coefficient. The formulas are as follows.

For each segmentation class (land cover type) $c$, we calculate the precision (user's accuracy)

$$P_c = \frac{Tp_c}{Tp_c + Fp_c}$$

and the recall (producer's accuracy)

$$R_c = \frac{Tp_c}{Tp_c + Fn_c},$$

where $Tp_c$ represents the true positive, $Fp_c$ the false positive, and $Fn_c$ the false negative pixels for class $c$.

When it comes to accuracy [102], we calculate the per-class accuracy (effectively, the recall obtained on each class)

$$Acc_c = \frac{C_{cc}}{G_c}$$

and the overall pixel accuracy

$$Acc_{OP} = \frac{\sum_{i=1}^{L} C_{ii}}{\sum_{i=1}^{L} G_i},$$

where $C_{ij}$ is the number of pixels having a ground truth label $i$ and being classified/predicted as $j$, $G_i$ is the total number of pixels labelled with $i$, and $L$ is the number of classes. All these metrics can take values from 0 to 1.

Finally, we also use the Kappa statistic (Cohen's measure of agreement), indicating how the classification results compare to values assigned by chance [103]. The Kappa statistic can take values from 0 to 1. Starting from a $k \times k$ confusion matrix with elements $f_{ij}$, the following calculations are done:

$$P_o = \frac{1}{N}\sum_{j=1}^{k} f_{jj}, \tag{1}$$

$$r_i = \sum_{j=1}^{k} f_{ij} \;\; \forall i \qquad \text{and} \qquad c_j = \sum_{i=1}^{k} f_{ij} \;\; \forall j, \tag{2}$$

$$P_e = \frac{1}{N^2}\sum_{i=1}^{k} r_i c_i, \tag{3}$$

where $P_o$ is the observed proportional agreement (effectively the overall accuracy), $r_i$ and $c_j$ are the row and column totals for classes $i$ and $j$, and $P_e$ is the expected proportion of agreement. The final measure of agreement is given by the statistic [103]

$$\kappa = \frac{P_o - P_e}{1 - P_e}. \tag{4}$$

Depending on the value of Kappa, the observed agreement is considered either poor (0.0 to 0.2), fair (0.2 to 0.4), moderate (0.4 to 0.6), good (0.6 to 0.8), or very good (0.8 to 1.0).
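These formulas translate directly into code; the following NumPy sketch (ours) computes all of the above metrics from a $k \times k$ confusion matrix:

```python
import numpy as np

def accuracy_metrics(f):
    """Metrics of Section 3.7 from a confusion matrix f, where f[i, j]
    counts pixels with ground-truth class i predicted as class j."""
    f = np.asarray(f, dtype=float)
    n = f.sum()
    ua = np.diag(f) / f.sum(axis=0)      # precision / user's accuracy per class
    pa = np.diag(f) / f.sum(axis=1)      # recall / producer's accuracy per class
    acc = np.trace(f) / n                # overall pixel accuracy, also P_o (eq. 1)
    r, c = f.sum(axis=1), f.sum(axis=0)  # row and column totals (eq. 2)
    p_e = (r * c).sum() / n**2           # expected chance agreement (eq. 3)
    kappa = (acc - p_e) / (1 - p_e)      # eq. (4)
    return ua, pa, acc, kappa
```

Applied to the confusion matrix in Table 5, this reproduces, for example, the urban-class user's accuracy of 61.5% and the overall accuracy of 90.7%.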

4 Results and Discussion

Using the experimental setup described in the previous section, we evaluated the seven selected semantic segmentation models: SegNet [74], PSPNet [75], BiSeNet [76], DeepLabV3+ [77, 78], U-Net [79, 80], FRRN-B [81], and FC-DenseNet [82]. The overall classification performance statistics for all the studied models are gathered in Table 4. Figure 11 shows maps produced for several imagelets with the best performing model, FC-DenseNet. The obtained results are compared to prior work, and the classification performance for the different land cover classes is discussed further below.

4.1 Classification Performance

All the models performed relatively well in terms of classification, achieving an overall accuracy above 83%. Three models performed particularly well, achieving an accuracy score above 89%: SegNet, FRRN-B, and the best performing model, FC-DenseNet, which achieved an accuracy of 90.7%.

Before further analysis, let us recall that CORINE is not exclusively a land cover map, but rather a land cover and land use map; thus, specific classes can differ from the ecological classes observed by Sentinel-1. Also, the aggregation to Level-1 is sometimes not strictly 'ecological', nor does it comply with physics-based surface scattering considerations. For example, roads, airports, major industrial areas and the road network often exhibit areas similar to fields; the presence of trees and green vegetation near summer cottages can cause them to exhibit signatures closer to forest rather than urban; sometimes, forest on rocky terrain can be misclassified as urban, due to the presence of very bright targets and strong disruptive features; and confusion between peatland and field areas is also a common occurrence. Finally, the accuracy of the CORINE data itself is only somewhat higher than 90%.

Table 4: Summary of the classification performance and efficiency of the various deep learning models (UA - user's accuracy, PA - producer's accuracy, both in %; average inference time is per image in the dataset).

LC classes (UA/PA)                Test scale (km²)  BiSeNet  DeepLabV3+  SegNet  FRRN-B  U-Net   PSPNet  FC-DenseNet
Urban fabric (100)                1081.6            26/21    15/14       36/31   38/30   45/25   38/18   62/27
Agricultural areas (200)          2516.0            49/51    50/49       69/66   68/68   66/66   53/48   72/71
Forested areas (300)              28546.2           90/91    88/96       93/94   92/95   92/95   89/95   93/96
Peatland, bogs and marshes (400)  2099.0            54/43    56/13       67/57   71/55   70/52   65/31   74/58
Water bodies (500)                5356.4            85/91    94/92       96/96   95/96   96/96   94/94   96/96
Overall accuracy (%)                                83.86    85.49       89.03   89.27   89.25   86.51   90.66
Kappa                                               0.641    0.649       0.754   0.758   0.754   0.680   0.785
Average inference time (s)                          0.0389   0.0267      0.0761  0.1424  0.0848  0.0495  0.1930
Table 5: Confusion matrix for classification with the FC-DenseNet103 model.

CLC2012 \ Sentinel-1 class   urban       water        forest       field       peatland    total        PA (%)
urban                        7,301,999   413,073      15,892,771   3,212,839   221,476     27,042,158   27.0
water                        78,331      128,294,872  3,457,634    171,029     1,935,276   133,937,142  95.8
forest                       3,663,698   2,703,632    686,788,977  12,795,703  7,730,444   713,682,454  96.2
field                        766,200     121,609      16,527,970   44,866,048  620,934     62,902,761   71.3
peatland                     56,097      1,866,020    19,164,137   1,091,008   30,309,189  52,486,451   57.8
total                        11,866,325  133,399,206  741,831,489  62,136,627  40,817,319  990,050,966
UA (%)                       61.5        96.2         92.6         72.2        74.3                     90.7

Figure 11: Illustration of the FC-DenseNet model performance: a selection of classification results, i.e., direct output of the network without any post-processing (bottom row), versus reference CORINE data (upper row).

As for the results across the different land classes all the models performedparticularly well in recognising the water bodies and forested areas while theurban fabric represented the most challenging class for all the models We ex-pect that the inclusion of the DEM as one layer in the training images hashelped to achieve good results on the water bodies class for most of the mod-els (except for BiSeNet all the models achieved both the user and produceraccuracy above 90) The urban class was particularly challenging for the fol-lowing main reasons First this class changes the most as new houses roadsand urban areas are built While we took the most suitable available CORINEclass in terms of time for our Sentinel-1 images there are almost certain dif-

26

ferences between the urban class as it was in 2012 and in 2015-2016 Secondthe CORINE map itself does not have a perfect accuracy neither aggregationrules are perfect As a matter of fact in majority of studies where SAR basedclassification was done versus CLC or similar data a poor or modest overallagreement was observed for this class [21 41 83 20] while the userrsquos accuracywas strongly higher than producerrsquos [104] The latter is exactly due to radarbeing able to sense sharp boundaries and bright targets very well whereas suchbright targets often donrsquot dominate the whole CORINE Level-1 urban classWe argue that any inaccuracies present will be particularly attenuated in ourmodels for the urban class because of the sharp and sudden boundary changesin this class unlike for the others such as forest and water The top performingmodel ie FC-DenseNet performed the best across all the classes It is par-ticularly notable that it achieved the user accuracy ie precision for the urbanclass of 62 improving on it significantly compared to all the other modelsNevertheless its score on the producer accuracy ie recall on this class of 27is outperformed by the two other top models ie SegNet and FRRN-B

We mentioned the issues of SAR backscattering sensitivity to several groundfactors so that the same classes might appear differently on the images betweencountries or between distant areas within a country An interesting indicationof our study however is that the deep learning models might be able to dealwith this issue Namely we used the models pre-trained on ImageNet and finetuned them with a relatively small number (14) of Sentinel-1 scenes The modelslearned to recognize varying types of the backscattering signal across the countryof Finland This indicates that with a similar type of fine-tuning present modelscould be relatively easily adapted to the other areas and countries with differentSAR backscattering patterns Such robustness and adaptability of the deeplearning models come from their automatic learning of feature representationwithout the need for a human pre-defining those features

4.2 Computational Performance

The training times with our hardware configuration took from 6 days up to 2 weeks for the different models. This could be significantly improved by training each model on a multi-GPU system instead of the single GPU we used.
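For illustration, data-parallel training over several GPUs requires only a few extra lines in TensorFlow 2.x. The sketch below is hypothetical (build_model is a stand-in for any of the benchmarked networks) and was not part of our experimental setup.

    import tensorflow as tf

    def build_model():
        # stand-in for a real segmentation network constructor
        return tf.keras.Sequential([
            tf.keras.layers.Conv2D(5, 1, activation="softmax",
                                   input_shape=(512, 512, 3))])

    # MirroredStrategy replicates the model on all visible GPUs and
    # splits each global batch across them during model.fit
    strategy = tf.distribute.MirroredStrategy()
    with strategy.scope():
        model = build_model()
        model.compile(optimizer=tf.keras.optimizers.RMSprop(1e-4),
                      loss="sparse_categorical_crossentropy")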

In terms of the inference time, we also saw differences in performance. In Table 4 we present the average inference time per 512 px × 512 px imagelet that we worked with. The results show that there is a trade-off between classification and computational performance: the best models in terms of classification results (i.e., FC-DenseNet and FRRN-B) take several times longer inference time compared to the rest. Depending on the application, this might not be of particular importance.
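Average inference time per imagelet can be measured along the following lines; this is a minimal sketch in which a dummy single-layer network and random data stand in for a trained model and the test set.

    import time
    import numpy as np
    import tensorflow as tf

    # dummy stand-in for a trained segmentation network (5 output classes)
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(5, 1, activation="softmax",
                               input_shape=(512, 512, 3))])
    imagelets = np.random.rand(20, 512, 512, 3).astype("float32")

    model.predict(imagelets[:1], verbose=0)       # warm-up run, not timed
    start = time.perf_counter()
    for img in imagelets:
        model.predict(img[None, ...], verbose=0)  # one 512 x 512 imagelet
    elapsed = time.perf_counter() - start
    print(f"{elapsed / len(imagelets):.4f} s per imagelet on average")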

4.3 Comparison to Similar Work

Obtained results compare favourably to previous similar studies on land cover classification with SAR data [83, 20, 21, 41, 28, 31]. Depending on the level of class aggregation (4-5 major classes or more), and using mostly statistical or classical machine learning approaches, reported classification accuracies ranged from as high as 80-87% to as low as 30% when only SAR imagery was used.

Two recent studies that employed neural networks for SAR imagery classification (albeit in combination with satellite optical data) for land cover mapping were [28] and [66], with reported classification accuracies of up to 97.5% and 94.6%, respectively.

The best model in our experiments achieved the overall accuracy of 90.7%. However, our results are obtained using solely the SAR imagery. In contrast, SAR imagery (PALSAR) alone yielded the overall accuracy of 78.1% in [28]. The types of classes they studied are also different compared to ours (crops versus vegetation versus land cover types), and our study is performed on a larger area. Importantly, the previous studies have applied different types of models (regular NNs versus CNN versus semantic segmentation). In particular, the CNN models work on 7 × 7 resolution windows, while we have applied more advanced semantic segmentation models, which work on the level of a pixel. Keeping in mind the finding from [28] that the addition of optical images on top of SAR improved the results by over 10%, we expect that our models would perform comparably well or outperform these previous works if applied to combined SAR and optical imagery.

In terms of the deep learning setup, the most similar to ours are the studies [53] and [70]. However, RapidEye optical imagery at 5 m spatial resolution was used in [53], and the test site was considerably smaller. Study [70], similar to our research, relied exclusively on SAR imagery, however on fully polarimetric images acquired by RADARSAT-2 at a considerably better resolution. They developed an FCN-type semantic segmentation model 'specifically designed for the classification of wetland complexes using PolSAR imagery'. Using this model to classify eight wetland map classes, they achieved the overall accuracy of 93%. However, because their model is designed specifically for wetland complexes, it is not clear whether it would generalize to other types of areas. Compared to our study, they focused on a considerably smaller area (nearly the size of a single imagelet we used) and on a very specific task (wetland types mapping). Thus, it is not readily clear how general their approach is and how it compares to our presented approach.

4.4 Outlook and Future Work

There are several lines of potential improvement based on the results of this study, as well as future work directions.

First, using an even larger set of Sentinel-1 images can be recommended, since large amounts of data are crucial for supervised deep learning models. Here, we processed only 6888 imagelets altogether, but deep learning algorithms typically become efficient only once they are trained with hundreds of thousands or millions of images.

Second, if SAR images and reference data of a higher resolution are used, we expect better classification performance too, as smaller details could potentially be captured. Also, better agreement in the acquisition timing of reference and SAR imagery can be recommended: the reference and training data should come from the same months or year if possible, and the reference maps should represent the reality as accurately as possible. The models in our experiments were certainly limited by CORINE's own limited accuracy.

Third, in this study we tested the effectiveness of off-the-shelf deep learning models for land cover mapping from SAR data. While the results show their effectiveness, it is also likely that novel types of models developed specifically for radar data (such as [70]) will yield even better results. Based on our results, we suggest DenseNet-based models as a starting point. In particular, one could develop deep learning models that handle directly the SLC data, which preserve the phase information.

Focusing on a single season is both an advantage and a limitation. Importantly, we avoided confusion between SAR signatures that vary seasonally for several land cover classes. However, the multitemporal dynamics itself can potentially be used as an additional useful class-discriminating parameter. Incorporating the seasonal dynamics of each land cover pixel (as a time series) is left for future work, perhaps with the additional need to incorporate recurrent neural networks into the approach.

As discussed in Section 3.1.1, it could be suitable to use more detailed (specific) land cover classes, as the aggregation of smaller LC classes into Level-1 CORINE classes is not exactly ecological, leading to mixing of several distinct SAR signatures in one class and thus causing additional confusion for the classifier. The classified specific classes can later be aggregated into larger classes, potentially showing improved performance [19], as sketched below.
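Because CLC class codes are hierarchical, such post-classification aggregation is straightforward; the sketch below assumes the standard three-digit CLC nomenclature, in which the leading digit identifies the Level-1 class.

    import numpy as np

    def aggregate_to_level1(level3_codes):
        # e.g. 312 (coniferous forest) -> 300 (forested areas)
        return (np.asarray(level3_codes) // 100) * 100

    predicted = np.array([[112, 211],    # urban fabric, arable land
                          [312, 512]])   # coniferous forest, water body
    print(aggregate_to_level1(predicted))
    # [[100 200]
    #  [300 500]]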

Finally, we used only SAR images and a freely available DEM for the presented large-scale land cover mapping. If one were to combine other types of remote sensing images, in particular optical images, we expect that the results would improve significantly. This, however, holds only for those areas where such imagery can be collected given the cloud coverage, so an operational scenario would potentially require the use of at least two models (with and without optical satellite imagery). It is also important to assess the added value of SAR imagery with deep learning models when optical satellite images are available, as well as possible data fusion and decision fusion scenarios, before a decision on the mapping approach is made [19].

5 Conclusion

Our study demonstrated the potential for applying state-of-the-art semantic segmentation models to SAR image classification with high accuracy. Several models were benchmarked in a countrywide classification experiment using Sentinel-1 IW-mode SAR data, reaching nearly 91% overall classification accuracy with the best performing model (FC-DenseNet). Given that the 14 used Sentinel-1 scenes resulted in 7K training images, this indicates strong potential for using pre-trained CNNs for further fine-tuning, an approach that seems particularly suitable when the number of training images is limited (to thousands or tens of thousands instead of millions). In addition to suggesting the best candidate semantic segmentation models for land cover mapping with SAR data (that is, the DenseNet-based models), our study offers baseline results against which newly proposed models should be evaluated. Several possible improvements were identified, including the necessity of testing multitemporal approaches, data fusion and very high-resolution SAR imagery, as well as developing models specifically for SAR; these will be addressed in future work.

References

[1] S. Bojinski, M. Verstraete, T. C. Peterson, C. Richter, A. Simmons, M. Zemp, The concept of essential climate variables in support of climate research, applications, and policy, Bulletin of the American Meteorological Society 95 (9) (2014) 1431–1443.

[2] G. Büttner, J. Feranec, G. Jaffrain, L. Mari, G. Maucha, T. Soukup, The CORINE land cover 2000 project, EARSeL eProceedings 3 (3) (2004) 331–346.

[3] M. Bossard, J. Feranec, J. Otahel, et al., CORINE land cover technical guide: Addendum 2000 (2000).

[4] G. Büttner, CORINE land cover and land cover change products, in: Land Use and Land Cover Mapping in Europe, Springer, 2014, pp. 55–74.

[5] M. Törmä, T. Markkanen, S. Hatunen, P. Härmä, O.-P. Mattila, A. Arslan, Assessment of land-cover data for land-surface modelling in regional climate studies, Boreal Environment Research 20 (2) (2015) 243–260.

[6] J. Chen, J. Chen, A. Liao, X. Cao, L. Chen, X. Chen, C. He, G. Han, S. Peng, M. Lu, et al., Global land cover mapping at 30 m resolution: A POK-based operational approach, ISPRS Journal of Photogrammetry and Remote Sensing 103 (2015) 7–27.

[7] C. A. d. Almeida, A. C. Coutinho, J. C. D. M. Esquerdo, M. Adami, A. Venturieri, C. G. Diniz, N. Dessay, L. Durieux, A. R. Gomes, High spatial resolution land use and land cover mapping of the Brazilian Legal Amazon in 2008 using Landsat-5/TM and MODIS data, Acta Amazonica 46 (3) (2016) 291–302.

[8] C. Homer, J. Dewitz, L. Yang, S. Jin, P. Danielson, G. Xian, J. Coulston, N. Herold, J. Wickham, K. Megown, Completion of the 2011 National Land Cover Database for the conterminous United States – representing a decade of land cover change information, Photogrammetric Engineering & Remote Sensing 81 (5) (2015) 345–354.

[9] Y. Zhao, D. Feng, L. Yu, X. Wang, Y. Chen, Y. Bai, H. J. Hernández, M. Galleguillos, C. Estades, G. S. Biging, et al., Detailed dynamic land cover mapping of Chile: Accuracy improvement by integrating multi-temporal data, Remote Sensing of Environment 183 (2016) 170–185.

[10] P. Griffiths, C. Nendel, P. Hostert, Intra-annual reflectance composites from Sentinel-2 and Landsat for national-scale crop and land cover mapping, Remote Sensing of Environment 220 (2019) 135–151.

[11] R. Torres, P. Snoeij, D. Geudtner, D. Bibby, M. Davidson, E. Attema, P. Potin, B. Rommen, N. Floury, M. Brown, I. Traver, P. Deghaye, B. Duesmann, B. Rosich, N. Miranda, C. Bruno, M. L'Abbate, R. Croci, A. Pietropaolo, M. Huchler, F. Rostan, GMES Sentinel-1 mission, Remote Sensing of Environment 120 (2012) 9–24. doi:10.1016/j.rse.2011.05.028.

[12] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE 86 (11) (1998) 2278–2324.

[13] A. Krizhevsky, I. Sutskever, G. E. Hinton, ImageNet classification with deep convolutional neural networks, in: Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.

[14] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, CoRR abs/1409.1556 (2014).

[15] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning (2016).

[16] W. Cohen, S. Goward, Landsat's role in ecological applications of remote sensing, BioScience 54 (6) (2004) 535–545. doi:10.1641/0006-3568(2004)054[0535:LRIEAO]2.0.CO;2.

[17] S. Goetz, A. Baccini, N. Laporte, T. Johns, W. Walker, J. Kellndorfer, R. Houghton, M. Sun, Mapping and monitoring carbon stocks with satellite observations: A comparison of methods, Carbon Balance and Management 4 (2009). doi:10.1186/1750-0680-4-2.

[18] C. Atzberger, Advances in remote sensing of agriculture: Context description, existing operational monitoring systems and major information needs, Remote Sensing 5 (2) (2013) 949–981. doi:10.3390/rs5020949.

[19] T. Häme, J. Kilpi, H. A. Ahola, Y. Rauste, O. Antropov, M. Rautiainen, L. Sirro, S. Bounpone, Improved mapping of tropical forests with optical and SAR imagery, part I: Forest cover and accuracy assessment using multi-resolution data, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 6 (1) (2013) 74–91.

[20] O. Antropov, Y. Rauste, H. Astola, J. Praks, T. Häme, M. Hallikainen, Land cover and soil type mapping from spaceborne PolSAR data at L-band with probabilistic neural network, IEEE Transactions on Geoscience and Remote Sensing 52 (9) (2014) 5256–5270. doi:10.1109/TGRS.2013.2287712.

[21] A. Lönnqvist, Y. Rauste, M. Molinier, T. Häme, Polarimetric SAR data in land cover mapping in boreal zone, IEEE Transactions on Geoscience and Remote Sensing 48 (10) (2010) 3652–3662. doi:10.1109/TGRS.2010.2048115.

[22] B. Waske, M. Braun, Classifier ensembles for land cover mapping using multitemporal SAR imagery, ISPRS Journal of Photogrammetry and Remote Sensing 64 (5) (2009) 450–457. doi:10.1016/j.isprsjprs.2009.01.003.

[23] L. Bruzzone, M. Marconcini, U. Wegmüller, A. Wiesmann, An advanced system for the automatic classification of multitemporal SAR images, IEEE Transactions on Geoscience and Remote Sensing 42 (6) (2004) 1321–1334. doi:10.1109/TGRS.2004.826821.

[24] T. Ullmann, A. Schmitt, A. Roth, J. Duffe, S. Dech, H.-W. Hubberten, R. Baumhauer, Land cover characterization and classification of Arctic tundra environments by means of polarized synthetic aperture X- and C-band radar (PolSAR) and Landsat 8 multispectral imagery, Richards Island, Canada, Remote Sensing 6 (9) (2014) 8565–8593. doi:10.3390/rs6098565. URL http://www.mdpi.com/2072-4292/6/9/8565.

[25] N. Clerici, C. A. V. Calderón, J. M. Posada, Fusion of Sentinel-1A and Sentinel-2A data for land cover mapping: a case study in the lower Magdalena region, Colombia, Journal of Maps 13 (2) (2017) 718–726. doi:10.1080/17445647.2017.1372316.

[26] C. Castañeda, D. Ducrot, Land cover mapping of wetland areas in an agricultural landscape using SAR and Landsat imagery, Journal of Environmental Management 90 (7) (2009) 2270–2277.

[27] Y. Ban, H. Hu, I. M. Rangel, Fusion of QuickBird MS and RADARSAT SAR data for urban land-cover mapping: Object-based and knowledge-based approach, International Journal of Remote Sensing 31 (6) (2010) 1391–1410.

[28] G. V. Laurin, V. Liesenberg, Q. Chen, L. Guerriero, F. Del Frate, A. Bartolini, D. Coomes, B. Wilebore, J. Lindsell, R. Valentini, Optical and SAR sensor synergies for forest and land cover mapping in a tropical site in West Africa, International Journal of Applied Earth Observation and Geoinformation 21 (2013) 7–16.

[29] R. Khatami, G. Mountrakis, S. V. Stehman, A meta-analysis of remote sensing research on supervised pixel-based land-cover image classification processes: General guidelines for practitioners and future research, Remote Sensing of Environment 177 (2016) 89–100.

[30] B. Waske, M. Braun, Classifier ensembles for land cover mapping using multitemporal SAR imagery, ISPRS Journal of Photogrammetry and Remote Sensing 64 (5) (2009) 450–457.

[31] H. Balzter, B. Cole, C. Thiel, C. Schmullius, Mapping CORINE land cover from Sentinel-1A SAR and SRTM digital elevation model data using random forests, Remote Sensing 7 (11) (2015) 14876–14898. doi:10.3390/rs71114876.

[32] S.-E. Park, Variations of microwave scattering properties by seasonal freeze/thaw transition in the permafrost active layer observed by ALOS PALSAR polarimetric data, Remote Sensing 7 (12) (2015) 17135–17148. doi:10.3390/rs71215874.

[33] M. C. Dobson, L. E. Pierce, F. T. Ulaby, Knowledge-based land-cover classification using ERS-1/JERS-1 SAR composites, IEEE Transactions on Geoscience and Remote Sensing 34 (1) (1996) 83–99. doi:10.1109/36.481896.

[34] L. Sirro, T. Häme, Y. Rauste, J. Kilpi, J. Hämäläinen, K. Gunia, B. de Jong, F. Paz Pellat, Potential of different optical and SAR data in forest and land cover classification to support REDD+ MRV, Remote Sensing 10 (6) (2018). doi:10.3390/rs10060942.

[35] J. D. T. De Alban, G. M. Connette, P. Oswald, E. L. Webb, Combined Landsat and L-band SAR data improves land cover classification and change detection in dynamic tropical landscapes, Remote Sensing 10 (2) (2018). doi:10.3390/rs10020306.

[36] N. Longépé, P. Rakwatin, O. Isoguchi, M. Shimada, Y. Uryu, K. Yulianto, Assessment of ALOS PALSAR 50 m orthorectified FBD data for regional land cover classification by support vector machines, IEEE Transactions on Geoscience and Remote Sensing 49 (6) (2011) 2135–2150. doi:10.1109/TGRS.2010.2102041.

[37] T. Esch, A. Schenk, T. Ullmann, M. Thiel, A. Roth, S. Dech, Characterization of land cover types in TerraSAR-X images by combined analysis of speckle statistics and intensity information, IEEE Transactions on Geoscience and Remote Sensing 49 (6) (2011) 1911–1925. doi:10.1109/TGRS.2010.2091644.

[38] J. W. Cable, J. M. Kovacs, J. Shang, X. Jiao, Multi-temporal polarimetric RADARSAT-2 for land cover monitoring in northeastern Ontario, Canada, Remote Sensing 6 (3) (2014) 2372–2392. doi:10.3390/rs6032372. URL http://www.mdpi.com/2072-4292/6/3/2372.

[39] X. Niu, Y. Ban, Multi-temporal RADARSAT-2 polarimetric SAR data for urban land-cover classification using an object-based support vector machine and a rule-based approach, International Journal of Remote Sensing 34 (1) (2013) 1–26. doi:10.1080/01431161.2012.700133.

[40] T. L. Evans, M. Costa, K. Telmer, T. S. F. Silva, Using ALOS/PALSAR and RADARSAT-2 to map land cover and seasonal inundation in the Brazilian Pantanal, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 3 (4) (2010) 560–575. doi:10.1109/JSTARS.2010.2089042.

[41] P. Lumsdon, S. R. Cloude, G. Wright, Polarimetric classification of land cover for Glen Affric radar project, IEE Proceedings - Radar, Sonar and Navigation 152 (6) (2005) 404–412. doi:10.1049/ip-rsn:20041313.

[42] C. da Costa Freitas, L. de Souza Soler, S. J. S. Sant'Anna, L. V. Dutra, J. R. Dos Santos, J. C. Mura, A. H. Correia, Land use and land cover mapping in the Brazilian Amazon using polarimetric airborne P-band SAR data, IEEE Transactions on Geoscience and Remote Sensing 46 (10) (2008) 2956–2970.

[43] G. Li, D. Lu, E. Moran, L. Dutra, M. Batistella, A comparative analysis of ALOS PALSAR L-band and RADARSAT-2 C-band data for land-cover classification in a tropical moist region, ISPRS Journal of Photogrammetry and Remote Sensing 70 (2012) 26–38. doi:10.1016/j.isprsjprs.2012.03.010.

[44] N. Park, K. Chi, Integration of multitemporal/polarization C-band SAR data sets for land-cover classification, International Journal of Remote Sensing 29 (16) (2008) 4667–4688. doi:10.1080/01431160801947341.

[45] E. Tomppo, O. Antropov, J. Praks, Cropland classification using Sentinel-1 time series: Methodological performance and prediction uncertainty assessment, Remote Sensing 11 (21) (2019). doi:10.3390/rs11212480.

[46] D. B. Nguyen, A. Gruber, W. Wagner, Mapping rice extent and cropping scheme in the Mekong Delta using Sentinel-1A data, Remote Sensing Letters 7 (12) (2016) 1209–1218.

[47] A. Veloso, S. Mermoz, A. Bouvet, T. Le Toan, M. Planells, J.-F. Dejoux, E. Ceschia, Understanding the temporal behavior of crops using Sentinel-1 and Sentinel-2-like data for agricultural applications, Remote Sensing of Environment 199 (2017) 415–426.

[48] G. Satalino, A. Balenzano, F. Mattia, M. W. Davidson, C-band SAR data for mapping crops dominated by surface or volume scattering, IEEE Geoscience and Remote Sensing Letters 11 (2) (2013) 384–388.

[49] F. Vicente-Guijalba, A. Jacob, J. M. Lopez-Sanchez, C. Lopez-Martinez, J. Duro, C. Notarnicola, D. Ziolkowski, A. Mestre-Quereda, E. Pottier, J. J. Mallorquí, M. Lavalle, M. Engdahl, SInCohMap: Land-cover and vegetation mapping using multi-temporal Sentinel-1 interferometric coherence, in: IGARSS 2018 - 2018 IEEE International Geoscience and Remote Sensing Symposium, 2018, pp. 6631–6634. doi:10.1109/IGARSS.2018.8517926.

[50] S. Ge, O. Antropov, W. Su, H. Gu, J. Praks, Deep recurrent neural networks for land-cover classification using Sentinel-1 InSAR time series, in: IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium, 2019, pp. 473–476. doi:10.1109/IGARSS.2019.8900088.

[51] Y. LeCun, Y. Bengio, et al., Convolutional networks for images, speech, and time series, The Handbook of Brain Theory and Neural Networks 3361 (10) (1995) 1995.

[52] X. X. Zhu, D. Tuia, L. Mou, G.-S. Xia, L. Zhang, F. Xu, F. Fraundorfer, Deep learning in remote sensing: a review, arXiv preprint arXiv:1710.03959 (2017).

[53] M. Mahdianpari, B. Salehi, M. Rezaee, F. Mohammadimanesh, Y. Zhang, Very deep convolutional neural networks for complex land cover mapping using multispectral remote sensing imagery, Remote Sensing 10 (7) (2018) 1119.

[54] L. Zhang, L. Zhang, B. Du, Deep learning for remote sensing data: A technical tutorial on the state of the art, IEEE Geoscience and Remote Sensing Magazine 4 (2) (2016) 22–40.

[55] J. Zhang, P. Zhong, Y. Chen, S. Li, L1/2-regularized deconvolution network for the representation and restoration of optical remote sensing images, IEEE Transactions on Geoscience and Remote Sensing 52 (5) (2014) 2617–2627.

[56] X. Chen, S. Xiang, C.-L. Liu, C.-H. Pan, Aircraft detection by deep belief nets, in: Pattern Recognition (ACPR), 2013 2nd IAPR Asian Conference on, IEEE, 2013, pp. 54–58.

[57] X. Chen, S. Xiang, C.-L. Liu, C.-H. Pan, Vehicle detection in satellite images by hybrid deep convolutional neural networks, IEEE Geoscience and Remote Sensing Letters 11 (10) (2014) 1797–1801.

[58] Y. Liu, G. Cao, Q. Sun, M. Siegel, Hyperspectral classification via deep networks and superpixel segmentation, International Journal of Remote Sensing 36 (13) (2015) 3459–3482. doi:10.1080/01431161.2015.1055607.

[59] J. Wang, Q. Qin, Z. Li, X. Ye, J. Wang, X. Yang, X. Qin, Deep hierarchical representation and segmentation of high resolution remote sensing images, in: Geoscience and Remote Sensing Symposium (IGARSS), 2015 IEEE International, IEEE, 2015, pp. 4320–4323.

[60] D. Tuia, R. Flamary, N. Courty, Multiclass feature learning for hyperspectral image classification: Sparse and hierarchical solutions, ISPRS Journal of Photogrammetry and Remote Sensing 105 (2015) 272–285.

[61] F. Hu, G.-S. Xia, J. Hu, L. Zhang, Transferring deep convolutional neural networks for the scene classification of high-resolution remote sensing imagery, Remote Sensing 7 (11) (2015) 14680–14707.

[62] O. A. B. Penatti, K. Nogueira, J. A. dos Santos, Do deep features generalize from everyday objects to remote sensing and aerial scenes domains?, in: 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2015, pp. 44–51. doi:10.1109/CVPRW.2015.7301382.

[63] F. P. Luus, B. P. Salmon, F. Van den Bergh, B. T. J. Maharaj, Multiview deep learning for land-use classification, IEEE Geoscience and Remote Sensing Letters 12 (12) (2015) 2448–2452.

[64] F. Zhang, B. Du, L. Zhang, Scene classification via a gradient boosting random convolutional network framework, IEEE Transactions on Geoscience and Remote Sensing 54 (3) (2016) 1793–1802.

[65] T. Ishii, R. Nakamura, H. Nakada, Y. Mochizuki, H. Ishikawa, Surface object recognition with CNN and SVM in Landsat 8 images, in: 2015 14th IAPR International Conference on Machine Vision Applications (MVA), 2015, pp. 341–344. doi:10.1109/MVA.2015.7153200.

[66] N. Kussul, M. Lavreniuk, S. Skakun, A. Shelestov, Deep learning classification of land cover and crop types using remote sensing data, IEEE Geoscience and Remote Sensing Letters 14 (5) (2017) 778–782.

[67] Y. Chen, Z. Lin, X. Zhao, G. Wang, Y. Gu, Deep learning-based classification of hyperspectral data, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 7 (6) (2014) 2094–2107.

[68] G. Wu, X. Shao, Z. Guo, Q. Chen, W. Yuan, X. Shi, Y. Xu, R. Shibasaki, Automatic building segmentation of aerial imagery using multi-constraint fully convolutional networks, Remote Sensing 10 (3) (2018) 407.

[69] Y. Duan, F. Liu, L. Jiao, P. Zhao, L. Zhang, SAR image segmentation based on convolutional-wavelet neural network and Markov random field, Pattern Recognition 64 (2017) 255–267.

[70] F. Mohammadimanesh, B. Salehi, M. Mahdianpari, E. Gill, M. Molinier, A new fully convolutional neural network for semantic segmentation of polarimetric SAR imagery in complex land cover ecosystem, ISPRS Journal of Photogrammetry and Remote Sensing 151 (2019) 223–236.

[71] L. Wang, X. Xu, H. Dong, R. Gui, F. Pu, Multi-pixel simultaneous classification of PolSAR image using convolutional neural networks, Sensors 18 (3) (2018) 769.

[72] M. Ahishali, S. Kiranyaz, T. Ince, M. Gabbouj, Dual and single polarized SAR image classification using compact convolutional neural networks, Remote Sensing 11 (11) (2019) 1340.

[73] Z. Li, Z. Yang, H. Xiong, Homogeneous region segmentation for SAR images based on two steps segmentation algorithm, in: Computers, Communications and Systems (ICCCS), International Conference on, IEEE, 2015, pp. 196–200.

[74] V. Badrinarayanan, A. Kendall, R. Cipolla, SegNet: A deep convolutional encoder-decoder architecture for image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence 39 (12) (2017) 2481–2495.

[75] H. Zhao, J. Shi, X. Qi, X. Wang, J. Jia, Pyramid scene parsing network, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2881–2890.

[76] C. Yu, J. Wang, C. Peng, C. Gao, G. Yu, N. Sang, BiSeNet: Bilateral segmentation network for real-time semantic segmentation, in: Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 325–341.

[77] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, H. Adam, Encoder-decoder with atrous separable convolution for semantic image segmentation, arXiv preprint arXiv:1802.02611 (2018).

[78] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, A. L. Yuille, DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs, IEEE Transactions on Pattern Analysis and Machine Intelligence 40 (4) (2018) 834–848.

[79] O. Ronneberger, P. Fischer, T. Brox, U-Net: Convolutional networks for biomedical image segmentation, in: Medical Image Computing and Computer-Assisted Intervention (MICCAI), Vol. 9351 of LNCS, Springer, 2015, pp. 234–241, (available on arXiv:1505.04597 [cs.CV]). URL http://lmb.informatik.uni-freiburg.de/Publications/2015/RFB15a.

[80] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, H. Adam, MobileNets: Efficient convolutional neural networks for mobile vision applications, arXiv preprint arXiv:1704.04861 (2017).

[81] T. Pohlen, A. Hermans, M. Mathias, B. Leibe, Full-resolution residual networks for semantic segmentation in street scenes, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4151–4160.

[82] S. Jégou, M. Drozdzal, D. Vazquez, A. Romero, Y. Bengio, The one hundred layers tiramisu: Fully convolutional DenseNets for semantic segmentation, in: Computer Vision and Pattern Recognition Workshops (CVPRW), 2017 IEEE Conference on, IEEE, 2017, pp. 1175–1183.

[83] O. Antropov, Y. Rauste, A. Lönnqvist, T. Häme, PolSAR mosaic normalization for improved land-cover mapping, IEEE Geoscience and Remote Sensing Letters 9 (6) (2012) 1074–1078.

[84] J. Long, E. Shelhamer, T. Darrell, Fully convolutional networks for semantic segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431–3440.

[85] D. H. Hubel, T. N. Wiesel, Receptive fields, binocular interaction and functional architecture in the cat's visual cortex, The Journal of Physiology 160 (1) (1962) 106–154.

[86] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al., ImageNet large scale visual recognition challenge, International Journal of Computer Vision 115 (3) (2015) 211–252.

[87] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.

[88] G. Huang, Z. Liu, L. Van Der Maaten, K. Q. Weinberger, Densely connected convolutional networks, in: CVPR, Vol. 1, 2017, p. 3.

[89] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, Going deeper with convolutions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1–9.

[90] S. Ji, W. Xu, M. Yang, K. Yu, 3D convolutional neural networks for human action recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (1) (2013) 221–231.

[91] T. N. Sainath, A.-r. Mohamed, B. Kingsbury, B. Ramabhadran, Deep convolutional neural networks for LVCSR, in: Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, IEEE, 2013, pp. 8614–8618.

[92] D. Small, L. Zuberbühler, A. Schubert, E. Meier, Terrain-flattened gamma nought Radarsat-2 backscatter, Canadian Journal of Remote Sensing 37 (5) (2012) 493–499.

[93] P. Härmä, R. Teiniranta, M. Törmä, R. Repo, E. Järvenpää, M. Kallio, The production of Finnish CORINE land cover 2000 classification, XXth ISPRS Congress, Istanbul, Turkey (2004).

[94] P. Härmä, R. Teiniranta, M. Törmä, R. Repo, E. Järvenpää, M. Kallio, Finnish CORINE land cover 2000 classification, XXth ISPRS Congress, Anchorage, US (2004).

[95] A. Garcia-Garcia, S. Orts-Escolano, S. Oprea, V. Villena-Martinez, J. Garcia-Rodriguez, A review on deep learning techniques applied to semantic segmentation, arXiv preprint arXiv:1704.06857 (2017).

[96] F. Chollet, Xception: Deep learning with depthwise separable convolutions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1251–1258.

[97] O. Ronneberger, P. Fischer, T. Brox, U-Net: Convolutional networks for biomedical image segmentation, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2015, pp. 234–241.

[98] L.-C. Chen, G. Papandreou, F. Schroff, H. Adam, Rethinking atrous convolution for semantic image segmentation, arXiv preprint arXiv:1706.05587 (2017).

[99] Y. Bengio, Deep learning of representations for unsupervised and transfer learning, in: Proceedings of ICML Workshop on Unsupervised and Transfer Learning, 2012, pp. 17–36.

[100] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, et al., TensorFlow: A system for large-scale machine learning, in: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), 2016, pp. 265–283.

[101] H. Costa, G. M. Foody, D. S. Boyd, Supervised methods of image segmentation accuracy assessment in land cover mapping, Remote Sensing of Environment 205 (2018) 338–351.

[102] G. Csurka, D. Larlus, F. Perronnin, F. Meylan, What is a good evaluation measure for semantic segmentation?, in: BMVC, Vol. 27, Citeseer, 2013, p. 2013.

[103] J. Cohen, A coefficient of agreement for nominal scales, Educational and Psychological Measurement 20 (1) (1960) 37–46.

[104] O. Antropov, Y. Rauste, T. Häme, Volume scattering modeling in PolSAR decompositions: Study of ALOS PALSAR data over boreal forest, IEEE Transactions on Geoscience and Remote Sensing 49 (10) (2011) 3838–3848.

39

  • 1 Introduction
    • 11 Land Cover Mapping with SAR Imagery
    • 12 Deep Learning in Remote Sensing
    • 13 Study goals
      • 2 Deep Learning Terminology
      • 3 Materials and methods
        • 31 Study site
        • 32 SAR data
        • 33 Reference data
        • 34 Semantic Segmentation Models
          • 341 BiSeNet (Bilateral Segmentation Network)
          • 342 SegNet (Encoder-Decoder-Skip)
          • 343 Mobile U-Net
          • 344 DeepLab-V3+
          • 345 FRRN-B (Full-Resolution Residual Networks)
          • 346 PSPNet (Pyramid Scene Parsing Network)
          • 347 FC-DenseNet (Fully Convolutional DenseNets)
            • 35 Training approach
            • 36 Experimental Setup
              • 361 SAR Data Preprocessing for Deep Learning
              • 362 TrainDevelopment and Test (Accuracy Assessment) Dataset
              • 363 Data Augmentation
              • 364 Implementation
              • 365 Hardware and Training Setup
                • 37 Evaluation Metrics
                  • 4 Results and Discussion
                    • 41 Classification Performance
                    • 42 Computational Performance
                    • 43 Comparison to Similar Work
                    • 44 Outlook and Future Work
                      • 5 Conclusion
Page 13: Wide-Area Land Cover Mapping with Sentinel-1 Imagery using … · 2020-02-27 · Wide-Area Land Cover Mapping with Sentinel-1 Imagery using Deep Learning Semantic Segmentation Models

Original scenes were downloaded as Level-1 Ground Range Detected (GRD)products They represent focused SAR data that has been detected multi-looked and projected to ground-range using an Earth ellipsoid They wereorthorectified in ESA SNAP S1TBX software using local digital terrain model(with 2 meters resolution) available from National Land Survey of Finland Thepixel spacing of ortho-rectified scenes was set to 20 meters Orthorectificationincluded terrain flattening to obtain backscatter in gamma-nought format [92]The scenes were further re-projected to the ERTS89 ETRS-TM35FIN projec-tion (EPSG3067) and resampled to a final pixel size of 20 metres

33 Reference data

In Finland the Finnish Environment Institute (SYKE) is responsible forproduction of the CORINE maps While for most of the EU territory theCORINE mask of 100m times 100m spatial resolution is available the nationalinstitutions might choose to create more precise maps and SYKE in particularhad produced a 20mtimes20m spatial resolution mask for Finland (Figure 3) Sincethen the updates have been produced regularly the latest one at the time ofthis study which we used being CLC2012 There are 48 different land useclasses in the map that can be hierarchically grouped into 4 CLC Levels Indetail there are 30 classes on CLC Level-3 15 classes on CLC Level-2 and 5top CLC Level-1 classes According to the information provided by SYKE theaccuracy of the CLC Level-3 is 61 of the CLC Level-3 83 and of the CLCLevel-1 it is 93 The selected classes and their corresponding color codes usedfor our segmentation results are shown in Table 2

Until the most recent CORINE production round EEA member countriesadopted national approaches for the production of CORINE EEA TechnicalGuidelines include manual digitalization of land cover change based on visualinterpretation of optical satellite imagery In Finland the European CLC wasnot applicable for the majority of national users due to large minimal map-ping unit (MMU) Thus national version was produced with somewhat mod-ified nomenclature of classes [93 94] The national high-resolution CLC2012data is in raster format of 20 m with corresponding MMU In the provision of2012 update of CLC obtaining optical imagery over Scandinavia and Britainwas particularly challenging because of the frequent clouds thus calling for theuse of radar imagery to meet user requirements on accuracy and coverage [31]CORINE map itself is built from high resolution satellite images acquired pri-marily during the summer and to a smaller extent during the spring months[2]

34 Semantic Segmentation Models

We selected following seven state-of-the-art [95] semantic segmentation mod-els to test for our land cover mapping task SegNet [74] PSPNet [75] BiSeNet[76] DeepLabV3+ [77 78] U-Net [79 80] FRRN-B [81] and FC-DenseNet[82] The models were selected to cover a wide set of approaches to semanticsegmentation In the following we describe its specific architecture for each

13

Table 2 CORINE CLC Level-1 classes and their color codes used in our classification resultsclass R G B colorWater bodies (500) 0 191 255 bluePeatland bogs and marshes (400) 173 216 230 light blueForested areas (300) 127 255 0 greenAgricultural areas (200) 222 184 135 brownUrban fabric (100) 128 0 0 red

Figure 3 Zoomed in area fragment with our reference data ie CORINE shown on top (left)along with the Google Earth layer (right)

of these DL models We will use the following common abbreviations conv

for convolution operation concat for concatenation max pool for max pool-ing operation BN for batch normalisation and ReLU for the rectified linear unitactivation function

341 BiSeNet (Bilateral Segmentation Network)

BiSeNet model is designed to decouple the functions of encoding additionalspatial information and enlarging the receptive field which are fundamental toachieving good segmentation performance As can be seen in Figure 4 there aretwo main components to this model Spatial Path (SP) and Context Path (CP)Spatial Path serves to encode rich spatial information Context Path serves toprovide sufficient receptive field and uses global average pooling and pre-trainedXception [96] or ResNet [87] as the backbone The goal of the creators was notonly to obtain superior performance but to achieve a balance between the speedand performance Hence BiSeNet is a relatively fast semantic segmentationmodel

342 SegNet (Encoder-Decoder-Skip)

Similarly to BiSeNet SegNet is also designed with computational perfor-mance in mind this time particularly during inference Because of this thenetwork has a significantly smaller number of trainable parameters compared tomost of the other architectures The encoder in SegNet is based on VGG16 itconsists of its first 13 convolutional layers while the fully connected layers are

14

Figure 4 The architecture of BiSeNet ARM stands for the Attention Refinement Moduleand FFM for the Feature Fusion Module introduced in the modelrsquos paper [76]

Figure 5 The architecture of SegNet-based Encoder-Decoder with Skip connections [74] Bluetiles represent Convolution + Batch Normalisation + ReLU green tiles represent Pooling redndash Upsampling and yellow ndash a softmax operation

omitted Hence the novelty of this network lies in its decoder part as followsThe decoder consists of one decoder layer for each encoder layer and so it alsohas 13 layers Each individual decoder layer utilizes max-pooling indices mem-orized from its corresponding encoder feature map The authors have showedthat this enhances boundary delineation between classes Finally the decoderoutput is sent to a multi-class soft-max function yielding classification for each

15

pixel (see Figure 5)

343 Mobile U-Net

Mobile U-Net is based on the U-Net [97] semantic segmentation architec-ture shown in Figure 6 In designing U-Net Fully Convolutional approach wasgenerally employed with a following modification Their upsampling part of thearchitecture has no fully convolutional layer but is nearly symmetrical to thefeature extraction part due to the use of the similar feature maps This resultsin a u-shaped architecture (see Figure 6) and hence the name of the modelWhile originally developed for biomedical images the U-net architecture hasproven successful for image segmentation in other domains as well Here wesomewhat modify the U-Net architecture according to MobileNets [80] frame-work to improve its efficiency In particular the MobileNets framework usesDepthwise Separable Convolutions a form which factorizes standard convolu-tions (eg 3times3) into a depthwise convolution (applied separately to each inputband) and a pointwise (1times 1) convolution to combine the outputs of depthwiseconvolution

Figure 6 The architecture of U-Net [97]

344 DeepLab-V3+

DeepLab-V3+ [77] is an improved version of DeepLab-V3 [98] while thelatter is an improved version the original DeepLab [78] model This segmen-tation model does not follow the FCN framework like the previously discussed

16

models The main features that distinguish the DeepLab model from FCNs arethe atrous convolutions for upsampling and the application of probabilistic ma-chine learning models concretely conditional random fields (CRFs) for a finerlocalization accuracy in the final fully connected layer Atrous convolutionsin particular allow to enlarge the context from which the next layer featuremaps are learned while preserving the number of parameters (and thus thesame efficiency) Using a chain of atrous convolutions allows to compute thefinal output layer of a CNN at an arbitrarily high resolution (removing the needfor the upsampling part as used in FCNs) In the follow up work proposingDeepLab-V3 Chen et al [98] change the approach to atrous convolutions togradually double the atrous rates and show that with an adapted version theirnew algorithm outperforms the previous one even without including the fullyconnected CRF layer Finally in their newest adaption to the model calledDeepLab-V3+ Chen et al [77] turn to a similar approach to the FCNs iethey add a decoder module to the architecture (see Figure 7) That is theyemploy the features extracted by the DeepLab-V3 module in the encoder partand add the decoder module consisting of 1times 1 and 3times 3 convolutions

Figure 7 The architecture of DeepLabV3+ [77]

345 FRRN-B (Full-Resolution Residual Networks)

As we have seen most of the semantic segmentation architectures are basedon some form a FCN and so they utilize existing classification networks suchon ResNet or VGG16 as encoders We also discussed the main reason for such

17

Figure 8 The architecture of FRRN-B RU n and FRRU n stand for residual units and full-resolution residual units with n-channel convolutions respectively FRRUs simultaneouslyoperate on the two streams [81]

approaches which is to take advantage of the learned weights from those archi-tectures pretrained for the classification task Nevertheless one disadvantageof the FCN approach is that the resulting network outputs of the encoder part(particularly after the pooling operations) are at a lower resolution which de-teriorates localization performance of the overall segmentation model Pohlenet al [81] proposed to tackle this by having two parallel network streams pro-cessing the input image a pooling and a residual stream (Figure 8) As thename says the pooling stream performs successive pooling and then unpoolingoperations and it serves to obtain good recognition of the objects and classesThe residual stream computes residuals at the full image resolution which en-ables that low level features ie object pixel-level locations are propagated tothe network output The name of the model comes from its building blocksie full-resolution residual units Each such a unit simultaneously operates onthe pooling and the residual stream In the original paper [81] the authors pro-pose two alternative architecture FRRN-A and FRRN-B and they show that

18

FRRN-B achieves superior performance on the Cityscapes benchmark datasetHence we employ the FRRN-B architecture

346 PSPNet (Pyramid Scene Parsing Network)

Figure 9 The architecture of PSPNet [75]

Zhao et al [75] propose the Pyramid Scene Parsing as a solution to thechallenge of making the local predictions based on the local context only andnot considering the global image scene In remote sensing an example for thischallenge happening could be when a model wrongly predicts the water withwaves present in it as the dry vegetation class because they appear similar andthe model did not consider that these pixels are being part of a larger watersurface ie it missed the global context In similarity to the other FCN-basedapproaches PSPNet uses a pre-trained classification architecture to extract thefeature map in this case ResNet The main module of this network is thepyramid pooling which is enclosed by a square in Figure 9 As can be seen inthe Figure this module fuses features at four scales from the coarse (red) tothe fine (green) Hence the output of each level in the pyramid pooling modulecontains the feature map of a different resolution In the end the differentfeatures are stacked together yielding the final pyramid pooling global featurefor predictions

347 FC-DenseNet (Fully Convolutional DenseNets)

This semantic segmentation algorithm is built using DenseNet CNN [88]as a basis for the encoder followed by applying the FCN approach [82] Thespecificity of the DenseNet architecture is the presence of blocks where eachlayer is connected to all other layers in a feed-forward manner Figure 10 showsthe architecture of FC-DenseNet where the blocks are represented by the DenseBlock units According to [88] such architecture scales well to hundreds of layerswithout any optimization issues while yielding excellent results in classificationtasks In order to efficiently upsample the DenseNet feature maps Jegou etal [82] substitute the upsampling convolutions of FCNs by Dense Blocks andTransitions Up The Transition Up modules consist of transposed convolutionswhich are then concatenated with the outputs from the input skip connection(the dashed lines in Figure 10)

19

Figure 10 The architecture of FC-DenseNet [82]

35 Training approach

To accomplish better segmentation performance there is an option to pre-train the semantic segmentation models (in particular their encoder modules)using a larger set of available images of another type (such as natural images)Using the model pre-trained with natural images to continue training with thelimited set of SAR images the knowledge becomes effectively transferred fromthe natural to the SAR task [99] To accomplish such transfer we used themodels whose encoders were pre-trained for the ImageNet classification taskand fine-tuned them using our SAR dataset (described next)

36 Experimental Setup

In this section we describe first how we prepared the SAR images for trainingwith the deep learning models which are designed for natural images Then weprovide the details of our implementation

361 SAR Data Preprocessing for Deep Learning

Sentinel-1 imagery comes in two polarization channels each of them beinginformative about certain types of land cover Hence using their combinationis expected to yield better land cover mapping results than using any of themindependently Moreover the previous work has shown the benefits of also using

20

the DEM model for land cover mapping [9] Hence as the third layer we usedthe DEM of Finland from the National Land Survey

SAR backscatter for both polarizations was converted to decibels by applyingthe 10 middot log10 transformation In addition for the deep learning models eachband should be normalized so that the distribution of the pixel values wouldresemble a Gaussian distribution centered at zero This is done to yield afaster convergence during the training To normalize the data each pixel valueis subtracted by the mean of all pixels and then divided by their standarddeviation In addition given that the semantic segmentation models expectpixel values in the range (0255) we scaled the normalized data and also theDEM values to this range Such preprocessed layers are then used to create theimage dataset for training

We named the created dataset SAR RGB-DEM The naming comes fromthe process used to create the images in this dataset Namely one of the twochannels of a Sentinel-1 image is assigned to R and the other to G channel Forthe third B channel we use the DEM layer

362 TrainDevelopment and Test (Accuracy Assessment) Dataset

The images from the SAR RGB-DEM dataset needed to be split into 512pxtimes512px partial images (further in the text called imagelets) for training Thuseach imagelet represented an area of roughly 10times10km2 The first reason for thispreprocessing is about the squared shape some of the selected models requiredsquared-shaped images Some other of the models were flexible with the imageshape and size but we wanted to make the setups for all the models the sameso that their results are comparable The second reason for the preprocessing isabout the computational capacity with our hardware setup (described below)this was the largest image size that we could work with

Upon splitting the SAR RGB-DEM images we discarded those imageletsthat were completely outside the land mass area as well as those for which wedid not have a complete CORINE label (such as if they fell in part outside theFinnish borders) This resulted in more than 7K imagelets of size 512pxtimes512px

Given the geography of Finland to have representative training data itseems useful to include imagelets from both northern and southern (includingthe large cities) parts of the country into the model training On the otherhand some noticeable differences are found also in the gradient from east towest of the country To achieve representative training dataset we selectedall imagelets between the longitudes of 24and 28for the accuracy assessment(model testing) and all the other imagelets for the model training (that istraining and development in the computer vision terminology) In this way weprevented the situation in which two images of the same area but acquired atdifferent times are used one for training and the other one for testing Imagesthat were overlapping any border of the introduced strip were discarded Theprocedure resulted in 3104 images in the training and development set and 3784images in the test (accuracy assessment) set Finally we used 60 from thetraining and development set for training and the rest for development of thedeep learning models

21

Table 3 The properties of the examined semantic segmentation architectures

Architecture Base model ParametersBiSeNet ResNet101 2475MSegNet VGG16 3497MMobile U-Net Not applicable 887MDeepLabV3+ ResNet101 4796MFRRN-B ResNet101 2475MPSPNet ResNet101 56MFC-DenseNet ResNet101 927M

363 Data Augmentation

Further we have employed the data augmentation technique The mainidea behind the data augmentation is to enable improved learning by reusingoriginal images with slight transformations such as rotation flipping addingGaussian noise or slightly changing the brightness This provides additionalinformation to the model and the dataset size is effectively increased More-over an additional benefit of the data augmentation is in helping the model tolearn some invariant data properties for which no examples are present in theoriginal dataset Given the sensitivity of the SAR backscatter we did not wantto augment the images in terms of the color brightness or by adding noiseHowever we could safely employ rotations and flipping For rotations we onlyused the 90increments giving three possible rotated versions of an image Forimage flipping we applied horizontal and vertical flipping or both at the sametime giving another three possible versions of the original image4 Notice thatour images are square so the transformations did not change the image dimen-sions Finally we applied the online augmentation as opposite to the offlineversion In the online process each augmented image is seen only once and sothis process yields a network that generalises better

364 Implementation

To apply the described semantic segmentation models we adapted the open-source Semantic Segmentation Suite We used Python with TensorFlow [100]backend

365 Hardware and Training Setup

We trained and tested separately each of the deep learning models on a singleGPU (NVIDIA GeForce GTX 1080) on a machine with 32GB of RAM

We used the RMSProp optimisation algorithm learning rate of 00001 anddecay of the learning rate of 09954 Each model was trained for an equalnumber of epochs = 500 and during the process the checkpoint for the best

4Vertical flip operation switches between top-left and bottom-left image origin (reflectionalong the central horizontal axis) and horizontal flip switches between top-left and top-rightimage origin (reflection along the central vertical axis)

22

model was saved Then we used that model for evaluation on the test set andwe report those results

37 Evaluation Metrics

In the review on the metrics used in land cover classification Costa et al[101] have found a lack of consistency complicating intercomparison of differentstudies To avoid such issues and ensure that our results are easily compara-ble with the literature we thoroughly evaluated our models For each modeland class we report the following measures of accuracy precision also knownas producerrsquos accuracy (PA) recall also known as userrsquos accuracy (UA) andoverall accuracy and Kappa coefficient The formulas are as follows

For each segmentation class (land cover type) c we calculate precision (pro-ducerrsquos accuracy)

Pc =Tpc

Tpc + Fpc

and recall (userrsquos accuracy)

Rc =Tpc

Tpc + Fnc

where Tpc represents true positive Fpc false positive and Fnc false negativepixels for the class c

When it comes to accuracy [102] we calculate per class accuracy 5

Accc =Cii

Gi

and overall pixel accuracy

AccOP =ΣL

i=1Cii

ΣLi=1Gi

where Cij is the number of pixels having a ground truth label i and beingclassifiedpredicted as j Gi is the total number of pixels labelled with i and Lis the number of classes All these metrics can take values from 0 to 1

Finally we also use a Kappa statistic (Cohenrsquos measure of agreement) indi-cating how the classification results compare to the values assigned by chance[103] Kappa statistics can take values from 0 to 1 Starting from a k by kconfusion matrix with elements fij following calculations are done

5Effectively per class accuracy is defined as the recall obtained on each class

23

Po =1

N

ksumj=1

fjj (1)

ri =

ksumj=1

fij foralli and cj =

ksumi=1

fij forallj (2)

Pe =1

N2

ksumi=1

rici (3)

where Po the observed proportional agreement (effectively the overall accuracy)ri and cj are the row and column totals for classes i and j and Pe is theexpected proportion of agreement The final measure of agreement is given bysuch statistic [103]

κ =Po minus Pe

1minus Pe (4)

Depending on the value of Kappa the observed agreement is considered aseither poor (00 to 02) fair (02 to 04) moderate (04 to 06) good (06 to 08)or very good (08 to 10)

4 Results and Discussion

Using the experimental setup described in previous section we evaluatedthe seven selected semantic segmentation models SegNet [74] PSPNet [75]BiSeNet [76] DeepLabV3+ [77 78] U-Net [79 80] FRRN-B [81] and FC-DenseNet [82] The overall classification performance statistics for all studiedmodels is gathered in Table 4 Figure 11 shows maps produced for severalimagelets with the best performing model FC-DenseNet Obtained results arecompared to prior work and classification performance for different land coverclasses is discussed further

41 Classification Performance

All the models performed relatively well in terms of classification achiev-ing the overall accuracy above 83 Three models performed particularly wellachieving the accuracy score above 89 SegNet FRRN-B and the best per-forming model FC-DenseNet which achieved the accuracy of 907

Before further analysis let us recall that CORINE is not exclusively a landcover map but rather land cover and land use map thus for specific classescan differ from ecological classes observed by Sentinel-1 Also the aggregationto Level-1 is sometimes not strictly ldquoecologicalrdquo or complies to physics surfacescattering considerations For example roads airports major industrial areasand road network often exhibit areas similar to field presence of trees and green

24

Table 4 Summary of the classification performance and efficiency of various Deep Learningmodels (UA-userrsquos accuracy PA - producerrsquos accuracy average inference time is per image inthe dataset)

LC

classes

Test

scale

(km

2)

Accura

cy(U

APA

)

BiS

eNet

Dee

pL

ab

V3+

Seg

Net

FR

RN

-BU

-Net

PS

PN

etF

C-D

ense

Net

Urb

anfa

bri

c(1

00)

1081

626

21

15

14

3631

38

30

45

25

38

18

62

27

Agr

icu

ltu

ral

area

s(2

00)

2516

049

51

50

49

69

66

68

68

66

66

53

48

7271

For

este

dar

eas

(300

)28

5462

90

91

88

96

93

94

92

95

92

95

89

95

9396

Pea

tlan

d

bog

san

dm

arsh

es(4

00)

2099

054

43

56

13

67

57

71

55

70

52

65

31

7458

Wat

erb

od

ies

(500

)53

564

85

91

94

92

96

96

9596

96

96

94

94

96

96

Overa

llAccura

cy(

)838

6854

9890

3892

7892

5865

19066

Kappa

06

41

06

49

07

54

07

58

07

54

06

80

0785

Avera

geinfere

ncetime(s)

00

389

00267

00

761

01

424

00

848

00

495

01

930

25

Table 5 Confusion matrix for classification with FC-DenseNet modelFC-DenseNet103

CLC2012 Sentinel-1 classurban water forest field peatland total PA

1 7301999 413073 15892771 3212839 221476 27042158 2702 78331 128294872 3457634 171029 1935276 133937142 9583 3663698 2703632 686788977 12795703 7730444 713682454 9624 766200 121609 16527970 44866048 620934 62902761 7135 56097 1866020 19164137 1091008 30309189 52486451 578

total 11866325 133399206 741831489 62136627 40817319 990050966UA 615 962 926 722 743 907

Figure 11 Illustration of the FC-DenseNet model performance selection of classificationresults ie direct output of the network without any post-processing (bottom row) versusreference Corine data (upper row)

vegetation near summer cottages can cause them exhibit signatures close to for-est rather than urban sometimes forest on the rocky terrain can be misclassifiedas urban instead due to presence of very bright targets and strong disruptivefeatures while confusion between peatland and field areas is also often a com-mon place Finally the accuracy of the CORINE data is only somewhat higherthan 90

As for the results across the different land classes all the models performedparticularly well in recognising the water bodies and forested areas while theurban fabric represented the most challenging class for all the models We ex-pect that the inclusion of the DEM as one layer in the training images hashelped to achieve good results on the water bodies class for most of the mod-els (except for BiSeNet all the models achieved both the user and produceraccuracy above 90) The urban class was particularly challenging for the fol-lowing main reasons First this class changes the most as new houses roadsand urban areas are built While we took the most suitable available CORINEclass in terms of time for our Sentinel-1 images there are almost certain dif-

26

ferences between the urban class as it was in 2012 and in 2015-2016 Secondthe CORINE map itself does not have a perfect accuracy neither aggregationrules are perfect As a matter of fact in majority of studies where SAR basedclassification was done versus CLC or similar data a poor or modest overallagreement was observed for this class [21 41 83 20] while the userrsquos accuracywas strongly higher than producerrsquos [104] The latter is exactly due to radarbeing able to sense sharp boundaries and bright targets very well whereas suchbright targets often donrsquot dominate the whole CORINE Level-1 urban classWe argue that any inaccuracies present will be particularly attenuated in ourmodels for the urban class because of the sharp and sudden boundary changesin this class unlike for the others such as forest and water The top performingmodel ie FC-DenseNet performed the best across all the classes It is par-ticularly notable that it achieved the user accuracy ie precision for the urbanclass of 62 improving on it significantly compared to all the other modelsNevertheless its score on the producer accuracy ie recall on this class of 27is outperformed by the two other top models ie SegNet and FRRN-B

We mentioned the issue of SAR backscatter sensitivity to several ground factors, such that the same classes might appear differently in images between countries, or between distant areas within a country. An interesting indication of our study, however, is that the deep learning models might be able to deal with this issue. Namely, we used models pre-trained on ImageNet and fine-tuned them with a relatively small number (14) of Sentinel-1 scenes. The models learned to recognize varying types of backscattering signal across the country of Finland. This indicates that, with a similar type of fine-tuning, the present models could be relatively easily adapted to other areas and countries with different SAR backscattering patterns. Such robustness and adaptability of the deep learning models come from their automatic learning of feature representations, without the need for a human to pre-define those features.

4.2 Computational Performance

The training times with our hardware configuration ranged from 6 days up to 2 weeks for the different models. This could be significantly improved by training each model on a multi-GPU system instead of the single GPU we used.

In terms of the inference time, we also saw differences in performance. In Table 4 we present the average inference time per 512 px × 512 px imagelet that we worked with. The results show that there is a trade-off between classification and computational performance: the best models in terms of classification results (i.e., FC-DenseNet and FRRN-B) take several times longer at inference compared to the rest. Depending on the application, this might not be of particular importance.

4.3 Comparison to Similar Work

The obtained results compare favourably to previous similar studies on land cover classification with SAR data [83, 20, 21, 41, 28, 31]. Depending on the level of class aggregation (4-5 major classes or more), and using mostly statistical or classical machine learning approaches, the reported classification accuracies ranged from as low as 30% to as high as 80-87% when only SAR imagery was used.

Two recent studies that employed neural networks for SAR imagery classification (albeit in combination with satellite optical data) for land cover mapping were [28] and [66], with reported classification accuracies of up to 97.5% and 94.6%, respectively.

The best model in our experiments achieved an overall accuracy of 90.7%. However, our results are obtained using solely the SAR imagery. In contrast, SAR imagery (PALSAR) alone yielded an overall accuracy of 78.1% in [28]. The types of classes they studied are also different compared to ours (crops versus vegetation versus land cover types), and our study is performed on a larger area. Importantly, the previous studies have applied different types of models (regular NNs versus CNNs versus semantic segmentation). In particular, the CNN models work on 7 × 7 resolution windows, while we have applied more advanced semantic segmentation models, which work at the level of a pixel. Keeping in mind the finding from [28] that the addition of optical images on top of SAR improved the results by over 10%, we expect that our models would perform comparably well or outperform these previous works if applied to combined SAR and optical imagery.

In terms of the deep learning setup, the studies most similar to ours are [53] and [70]. However, RapidEye optical imagery at 5 m spatial resolution was used in [53], and the test site was considerably smaller. Study [70], similar to our research, relied exclusively on SAR imagery, however on fully polarimetric images acquired by RADARSAT-2 at a considerably better resolution. They developed an FCN-type semantic segmentation model 'specifically designed for the classification of wetland complexes using PolSAR imagery'. Using this model to classify eight wetland map classes, they achieved an overall accuracy of 93%. However, because their model is designed specifically for wetland complexes, it is not clear whether such a model would generalize to other types of areas. Compared to our study, they focused on a considerably smaller area (nearly the size of a single imagelet we used) and on a very specific task (wetland type mapping). Thus, it is not readily clear how general their approach is and how it compares to our presented approach.

4.4 Outlook and Future Work

There are several lines of potential improvement based on the results of this study, as well as future work directions.

First, using an even larger set of Sentinel-1 images can be recommended, since large amounts of data are crucial for supervised deep learning models. Here, we processed only 6888 imagelets altogether, while deep learning algorithms typically become efficient only once they are trained with hundreds of thousands or millions of images.

Second, if SAR images and reference data of a higher resolution are used, we expect better classification performance, too, as smaller details could potentially be captured. Better agreement in the acquisition timing of the reference and SAR imagery can also be recommended: the reference and training data should come from the same months or year if possible, and the reference maps should represent the reality as accurately as possible. The models in our experiments were certainly limited by CORINE's own limited accuracy.

Third, in this study we tested the effectiveness of off-the-shelf deep learning models for land cover mapping from SAR data. While the results show their effectiveness, it is also likely that novel types of models developed specifically for radar data (such as [70]) will yield even better results. Based on our results, we suggest DenseNet-based models as a starting point. In particular, one could develop deep learning models that directly handle SLC data, which preserve the phase information.

Focusing on a single season is both an advantage and a limitation. Importantly, we have avoided confusion between SAR signatures that vary seasonally for several land cover classes. However, the multitemporal dynamics itself could potentially be used as an additional class-discriminating parameter. Incorporating the seasonal dynamics of each land cover pixel (as a time series) is left for future work, perhaps with the additional need to incorporate recurrent neural networks into the approach.

As discussed in Section 3.1.1, it could be suitable to use more detailed (specific) land cover classes, as the aggregation of smaller LC classes into Level-1 CORINE classes is not exactly ecological, leading to the mixing of several distinct SAR signatures in one class and thus causing additional confusion for the classifier. The classified specific classes can later be aggregated into larger classes, potentially showing improved performance [19].

Finally, we used only SAR images and a freely available DEM for the presented large-scale land cover mapping. If one were to combine other types of remote sensing images, in particular optical images, we expect that the results would significantly improve. This holds for those areas where such imagery can be collected, given the cloud coverage, while in an operational scenario it would potentially require the use of at least two models (with and without optical satellite imagery). It is also important to assess the added value of SAR imagery with deep learning models when optical satellite images are available, as well as possible data fusion and decision fusion scenarios, before a decision on the mapping approach is made [19].

5 Conclusion

Our study demonstrated the potential for applying state-of-the-art semantic segmentation models to SAR image classification with high accuracy. Several models were benchmarked in a countrywide classification experiment using Sentinel-1 IW-mode SAR data, reaching nearly 91% overall classification accuracy with the best performing model (FC-DenseNet). Given that the 14 used Sentinel-1 scenes resulted in about 7,000 training images, this indicates strong potential for using pre-trained CNNs for further fine-tuning, and seems particularly suitable when the number of training images is limited (to thousands or tens of thousands instead of millions). In addition to suggesting the best candidate semantic segmentation models for land cover mapping with SAR data (that is, the DenseNet-based models), our study offers baseline results against which newly proposed models should be evaluated. Several possible improvements were identified, including the necessity of testing multitemporal approaches, data fusion, and very high-resolution SAR imagery, as well as developing models specifically for SAR data; these will be addressed in future work.

References

[1] S Bojinski, M Verstraete, T C Peterson, C Richter, A Simmons, M Zemp, The concept of essential climate variables in support of climate research, applications and policy, Bulletin of the American Meteorological Society 95 (9) (2014) 1431–1443.

[2] G Büttner, J Feranec, G Jaffrain, L Mari, G Maucha, T Soukup, The CORINE land cover 2000 project, EARSeL eProceedings 3 (3) (2004) 331–346.

[3] M Bossard, J Feranec, J Otahel, et al, CORINE land cover technical guide: Addendum 2000 (2000).

[4] G Büttner, CORINE land cover and land cover change products, in: Land Use and Land Cover Mapping in Europe, Springer, 2014, pp 55–74.

[5] M Törmä, T Markkanen, S Hatunen, P Härmä, O-P Mattila, A Arslan, Assessment of land-cover data for land-surface modelling in regional climate studies, Boreal Environment Research 20 (2) (2015) 243–260.

[6] J Chen, J Chen, A Liao, X Cao, L Chen, X Chen, C He, G Han, S Peng, M Lu, et al, Global land cover mapping at 30 m resolution: A POK-based operational approach, ISPRS Journal of Photogrammetry and Remote Sensing 103 (2015) 7–27.

[7] C A d Almeida, A C Coutinho, J C D M Esquerdo, M Adami, A Venturieri, C G Diniz, N Dessay, L Durieux, A R Gomes, High spatial resolution land use and land cover mapping of the Brazilian Legal Amazon in 2008 using Landsat-5/TM and MODIS data, Acta Amazonica 46 (3) (2016) 291–302.

[8] C Homer, J Dewitz, L Yang, S Jin, P Danielson, G Xian, J Coulston, N Herold, J Wickham, K Megown, Completion of the 2011 national land cover database for the conterminous United States – representing a decade of land cover change information, Photogrammetric Engineering & Remote Sensing 81 (5) (2015) 345–354.

[9] Y Zhao, D Feng, L Yu, X Wang, Y Chen, Y Bai, H J Hernandez, M Galleguillos, C Estades, G S Biging, et al, Detailed dynamic land cover mapping of Chile: Accuracy improvement by integrating multi-temporal data, Remote Sensing of Environment 183 (2016) 170–185.

[10] P Griffiths, C Nendel, P Hostert, Intra-annual reflectance composites from Sentinel-2 and Landsat for national-scale crop and land cover mapping, Remote Sensing of Environment 220 (2019) 135–151.

[11] R Torres, P Snoeij, D Geudtner, D Bibby, M Davidson, E Attema, P Potin, B Rommen, N Floury, M Brown, I Traver, P Deghaye, B Duesmann, B Rosich, N Miranda, C Bruno, M L'Abbate, R Croci, A Pietropaolo, M Huchler, F Rostan, GMES Sentinel-1 mission, Remote Sensing of Environment 120 (2012) 9–24. doi:10.1016/j.rse.2011.05.028.

[12] Y LeCun, L Bottou, Y Bengio, P Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE 86 (11) (1998) 2278–2324.

[13] A Krizhevsky, I Sutskever, G E Hinton, Imagenet classification with deep convolutional neural networks, in: Advances in Neural Information Processing Systems, 2012, pp 1097–1105.

[14] K Simonyan, A Zisserman, Very deep convolutional networks for large-scale image recognition, CoRR abs/1409.1556 (2014).

[15] I Goodfellow, Y Bengio, A Courville, Deep learning (2016).

[16] W Cohen, S Goward, Landsat's role in ecological applications of remote sensing, BioScience 54 (6) (2004) 535–545. doi:10.1641/0006-3568(2004)054[0535:LRIEAO]2.0.CO;2.

[17] S Goetz, A Baccini, N Laporte, T Johns, W Walker, J Kellndorfer, R Houghton, M Sun, Mapping and monitoring carbon stocks with satellite observations: A comparison of methods, Carbon Balance and Management 4 (2009). doi:10.1186/1750-0680-4-2.

[18] C Atzberger, Advances in remote sensing of agriculture: Context description, existing operational monitoring systems and major information needs, Remote Sensing 5 (2) (2013) 949–981. doi:10.3390/rs5020949.

[19] T Häme, J Kilpi, H A Ahola, Y Rauste, O Antropov, M Rautiainen, L Sirro, S Bounpone, Improved mapping of tropical forests with optical and SAR imagery, part I: Forest cover and accuracy assessment using multi-resolution data, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 6 (1) (2013) 74–91.

[20] O Antropov, Y Rauste, H Astola, J Praks, T Häme, M Hallikainen, Land cover and soil type mapping from spaceborne PolSAR data at L-band with probabilistic neural network, IEEE Transactions on Geoscience and Remote Sensing 52 (9) (2014) 5256–5270. doi:10.1109/TGRS.2013.2287712.

[21] A Lönnqvist, Y Rauste, M Molinier, T Häme, Polarimetric SAR data in land cover mapping in boreal zone, IEEE Transactions on Geoscience and Remote Sensing 48 (10) (2010) 3652–3662. doi:10.1109/TGRS.2010.2048115.

[22] B Waske, M Braun, Classifier ensembles for land cover mapping using multitemporal SAR imagery, ISPRS Journal of Photogrammetry and Remote Sensing 64 (5) (2009) 450–457. doi:10.1016/j.isprsjprs.2009.01.003.

[23] L Bruzzone, M Marconcini, U Wegmüller, A Wiesmann, An advanced system for the automatic classification of multitemporal SAR images, IEEE Transactions on Geoscience and Remote Sensing 42 (6) (2004) 1321–1334. doi:10.1109/TGRS.2004.826821.

[24] T Ullmann, A Schmitt, A Roth, J Duffe, S Dech, H-W Hubberten, R Baumhauer, Land cover characterization and classification of arctic tundra environments by means of polarized synthetic aperture X- and C-band radar (PolSAR) and Landsat 8 multispectral imagery – Richards Island, Canada, Remote Sensing 6 (9) (2014) 8565–8593. doi:10.3390/rs6098565. URL http://www.mdpi.com/2072-4292/6/9/8565.

[25] N Clerici, C A V Calderón, J M Posada, Fusion of Sentinel-1A and Sentinel-2A data for land cover mapping: a case study in the lower Magdalena region, Colombia, Journal of Maps 13 (2) (2017) 718–726. doi:10.1080/17445647.2017.1372316.

[26] C Castañeda, D Ducrot, Land cover mapping of wetland areas in an agricultural landscape using SAR and Landsat imagery, Journal of Environmental Management 90 (7) (2009) 2270–2277.

[27] Y Ban, H Hu, I M Rangel, Fusion of QuickBird MS and RADARSAT SAR data for urban land-cover mapping: Object-based and knowledge-based approach, International Journal of Remote Sensing 31 (6) (2010) 1391–1410.

[28] G V Laurin, V Liesenberg, Q Chen, L Guerriero, F Del Frate, A Bartolini, D Coomes, B Wilebore, J Lindsell, R Valentini, Optical and SAR sensor synergies for forest and land cover mapping in a tropical site in West Africa, International Journal of Applied Earth Observation and Geoinformation 21 (2013) 7–16.

[29] R Khatami, G Mountrakis, S V Stehman, A meta-analysis of remote sensing research on supervised pixel-based land-cover image classification processes: General guidelines for practitioners and future research, Remote Sensing of Environment 177 (2016) 89–100.

[30] B Waske, M Braun, Classifier ensembles for land cover mapping using multitemporal SAR imagery, ISPRS Journal of Photogrammetry and Remote Sensing 64 (5) (2009) 450–457.

[31] H Balzter, B Cole, C Thiel, C Schmullius, Mapping CORINE land cover from Sentinel-1A SAR and SRTM digital elevation model data using random forests, Remote Sensing 7 (11) (2015) 14876–14898. doi:10.3390/rs71114876.

[32] S-E Park, Variations of microwave scattering properties by seasonal freeze/thaw transition in the permafrost active layer observed by ALOS PALSAR polarimetric data, Remote Sensing 7 (12) (2015) 17135–17148. doi:10.3390/rs71215874.

[33] M C Dobson, L E Pierce, F T Ulaby, Knowledge-based land-cover classification using ERS-1/JERS-1 SAR composites, IEEE Transactions on Geoscience and Remote Sensing 34 (1) (1996) 83–99. doi:10.1109/36.481896.

[34] L Sirro, T Häme, Y Rauste, J Kilpi, J Hämäläinen, K Gunia, B de Jong, F Paz Pellat, Potential of different optical and SAR data in forest and land cover classification to support REDD+ MRV, Remote Sensing 10 (6) (2018). doi:10.3390/rs10060942.

[35] J D T De Alban, G M Connette, P Oswald, E L Webb, Combined Landsat and L-band SAR data improves land cover classification and change detection in dynamic tropical landscapes, Remote Sensing 10 (2) (2018). doi:10.3390/rs10020306.

[36] N Longépé, P Rakwatin, O Isoguchi, M Shimada, Y Uryu, K Yulianto, Assessment of ALOS PALSAR 50 m orthorectified FBD data for regional land cover classification by support vector machines, IEEE Transactions on Geoscience and Remote Sensing 49 (6) (2011) 2135–2150. doi:10.1109/TGRS.2010.2102041.

[37] T Esch, A Schenk, T Ullmann, M Thiel, A Roth, S Dech, Characterization of land cover types in TerraSAR-X images by combined analysis of speckle statistics and intensity information, IEEE Transactions on Geoscience and Remote Sensing 49 (6) (2011) 1911–1925. doi:10.1109/TGRS.2010.2091644.

[38] J W Cable, J M Kovacs, J Shang, X Jiao, Multi-temporal polarimetric RADARSAT-2 for land cover monitoring in northeastern Ontario, Canada, Remote Sensing 6 (3) (2014) 2372–2392. doi:10.3390/rs6032372. URL http://www.mdpi.com/2072-4292/6/3/2372.

[39] X Niu, Y Ban, Multi-temporal RADARSAT-2 polarimetric SAR data for urban land-cover classification using an object-based support vector machine and a rule-based approach, International Journal of Remote Sensing 34 (1) (2013) 1–26. doi:10.1080/01431161.2012.700133.

[40] T L Evans, M Costa, K Telmer, T S F Silva, Using ALOS/PALSAR and RADARSAT-2 to map land cover and seasonal inundation in the Brazilian Pantanal, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 3 (4) (2010) 560–575. doi:10.1109/JSTARS.2010.2089042.

[41] P Lumsdon, S R Cloude, G Wright, Polarimetric classification of land cover for Glen Affric radar project, IEE Proceedings - Radar, Sonar and Navigation 152 (6) (2005) 404–412. doi:10.1049/ip-rsn:20041313.

[42] C da Costa Freitas, L de Souza Soler, S J S Sant'Anna, L V Dutra, J R Dos Santos, J C Mura, A H Correia, Land use and land cover mapping in the Brazilian Amazon using polarimetric airborne P-band SAR data, IEEE Transactions on Geoscience and Remote Sensing 46 (10) (2008) 2956–2970.

[43] G Li, D Lu, E Moran, L Dutra, M Batistella, A comparative analysis of ALOS PALSAR L-band and RADARSAT-2 C-band data for land-cover classification in a tropical moist region, ISPRS Journal of Photogrammetry and Remote Sensing 70 (2012) 26–38. doi:10.1016/j.isprsjprs.2012.03.010.

[44] N Park, K Chi, Integration of multitemporal/polarization C-band SAR data sets for land-cover classification, International Journal of Remote Sensing 29 (16) (2008) 4667–4688. doi:10.1080/01431160801947341.

[45] E Tomppo, O Antropov, J Praks, Cropland classification using Sentinel-1 time series: Methodological performance and prediction uncertainty assessment, Remote Sensing 11 (21) (2019). doi:10.3390/rs11212480.

[46] D B Nguyen, A Gruber, W Wagner, Mapping rice extent and cropping scheme in the Mekong Delta using Sentinel-1A data, Remote Sensing Letters 7 (12) (2016) 1209–1218.

[47] A Veloso, S Mermoz, A Bouvet, T Le Toan, M Planells, J-F Dejoux, E Ceschia, Understanding the temporal behavior of crops using Sentinel-1 and Sentinel-2-like data for agricultural applications, Remote Sensing of Environment 199 (2017) 415–426.

[48] G Satalino, A Balenzano, F Mattia, M W Davidson, C-band SAR data for mapping crops dominated by surface or volume scattering, IEEE Geoscience and Remote Sensing Letters 11 (2) (2013) 384–388.

[49] F Vicente-Guijalba, A Jacob, J M Lopez-Sanchez, C Lopez-Martinez, J Duro, C Notarnicola, D Ziolkowski, A Mestre-Quereda, E Pottier, J J Mallorquí, M Lavalle, M Engdahl, SinCohMap: Land-cover and vegetation mapping using multi-temporal Sentinel-1 interferometric coherence, in: IGARSS 2018 - 2018 IEEE International Geoscience and Remote Sensing Symposium, 2018, pp 6631–6634. doi:10.1109/IGARSS.2018.8517926.

[50] S Ge, O Antropov, W Su, H Gu, J Praks, Deep recurrent neural networks for land-cover classification using Sentinel-1 InSAR time series, in: IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium, 2019, pp 473–476. doi:10.1109/IGARSS.2019.8900088.

[51] Y LeCun, Y Bengio, et al, Convolutional networks for images, speech, and time series, The Handbook of Brain Theory and Neural Networks 3361 (10) (1995) 1995.

[52] X X Zhu, D Tuia, L Mou, G-S Xia, L Zhang, F Xu, F Fraundorfer, Deep learning in remote sensing: a review, arXiv preprint arXiv:1710.03959 (2017).

[53] M Mahdianpari, B Salehi, M Rezaee, F Mohammadimanesh, Y Zhang, Very deep convolutional neural networks for complex land cover mapping using multispectral remote sensing imagery, Remote Sensing 10 (7) (2018) 1119.

[54] L Zhang, L Zhang, B Du, Deep learning for remote sensing data: A technical tutorial on the state of the art, IEEE Geoscience and Remote Sensing Magazine 4 (2) (2016) 22–40.

[55] J Zhang, P Zhong, Y Chen, S Li, ℓ1/2-regularized deconvolution network for the representation and restoration of optical remote sensing images, IEEE Transactions on Geoscience and Remote Sensing 52 (5) (2014) 2617–2627.

[56] X Chen, S Xiang, C-L Liu, C-H Pan, Aircraft detection by deep belief nets, in: Pattern Recognition (ACPR), 2013 2nd IAPR Asian Conference on, IEEE, 2013, pp 54–58.

[57] X Chen, S Xiang, C-L Liu, C-H Pan, Vehicle detection in satellite images by hybrid deep convolutional neural networks, IEEE Geoscience and Remote Sensing Letters 11 (10) (2014) 1797–1801.

[58] Y Liu, G Cao, Q Sun, M Siegel, Hyperspectral classification via deep networks and superpixel segmentation, International Journal of Remote Sensing 36 (13) (2015) 3459–3482. doi:10.1080/01431161.2015.1055607.

[59] J Wang, Q Qin, Z Li, X Ye, J Wang, X Yang, X Qin, Deep hierarchical representation and segmentation of high resolution remote sensing images, in: Geoscience and Remote Sensing Symposium (IGARSS), 2015 IEEE International, IEEE, 2015, pp 4320–4323.

[60] D Tuia, R Flamary, N Courty, Multiclass feature learning for hyperspectral image classification: Sparse and hierarchical solutions, ISPRS Journal of Photogrammetry and Remote Sensing 105 (2015) 272–285.

[61] F Hu, G-S Xia, J Hu, L Zhang, Transferring deep convolutional neural networks for the scene classification of high-resolution remote sensing imagery, Remote Sensing 7 (11) (2015) 14680–14707.

[62] O A B Penatti, K Nogueira, J A dos Santos, Do deep features generalize from everyday objects to remote sensing and aerial scenes domains?, in: 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2015, pp 44–51. doi:10.1109/CVPRW.2015.7301382.

[63] F P Luus, B P Salmon, F Van den Bergh, B T J Maharaj, Multiview deep learning for land-use classification, IEEE Geoscience and Remote Sensing Letters 12 (12) (2015) 2448–2452.

[64] F Zhang, B Du, L Zhang, Scene classification via a gradient boosting random convolutional network framework, IEEE Transactions on Geoscience and Remote Sensing 54 (3) (2016) 1793–1802.

[65] T Ishii, R Nakamura, H Nakada, Y Mochizuki, H Ishikawa, Surface object recognition with CNN and SVM in Landsat 8 images, in: 2015 14th IAPR International Conference on Machine Vision Applications (MVA), 2015, pp 341–344. doi:10.1109/MVA.2015.7153200.

[66] N Kussul, M Lavreniuk, S Skakun, A Shelestov, Deep learning classification of land cover and crop types using remote sensing data, IEEE Geoscience and Remote Sensing Letters 14 (5) (2017) 778–782.

[67] Y Chen, Z Lin, X Zhao, G Wang, Y Gu, Deep learning-based classification of hyperspectral data, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 7 (6) (2014) 2094–2107.

[68] G Wu, X Shao, Z Guo, Q Chen, W Yuan, X Shi, Y Xu, R Shibasaki, Automatic building segmentation of aerial imagery using multi-constraint fully convolutional networks, Remote Sensing 10 (3) (2018) 407.

[69] Y Duan, F Liu, L Jiao, P Zhao, L Zhang, SAR image segmentation based on convolutional-wavelet neural network and Markov random field, Pattern Recognition 64 (2017) 255–267.

[70] F Mohammadimanesh, B Salehi, M Mahdianpari, E Gill, M Molinier, A new fully convolutional neural network for semantic segmentation of polarimetric SAR imagery in complex land cover ecosystem, ISPRS Journal of Photogrammetry and Remote Sensing 151 (2019) 223–236.

[71] L Wang, X Xu, H Dong, R Gui, F Pu, Multi-pixel simultaneous classification of PolSAR image using convolutional neural networks, Sensors 18 (3) (2018) 769.

[72] M Ahishali, S Kiranyaz, T Ince, M Gabbouj, Dual and single polarized SAR image classification using compact convolutional neural networks, Remote Sensing 11 (11) (2019) 1340.

[73] Z Li, Z Yang, H Xiong, Homogeneous region segmentation for SAR images based on two steps segmentation algorithm, in: Computers, Communications and Systems (ICCCS), International Conference on, IEEE, 2015, pp 196–200.

[74] V Badrinarayanan, A Kendall, R Cipolla, SegNet: A deep convolutional encoder-decoder architecture for image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence 39 (12) (2017) 2481–2495.

[75] H Zhao, J Shi, X Qi, X Wang, J Jia, Pyramid scene parsing network, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp 2881–2890.

[76] C Yu, J Wang, C Peng, C Gao, G Yu, N Sang, BiSeNet: Bilateral segmentation network for real-time semantic segmentation, in: Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp 325–341.

[77] L-C Chen, Y Zhu, G Papandreou, F Schroff, H Adam, Encoder-decoder with atrous separable convolution for semantic image segmentation, arXiv preprint arXiv:1802.02611 (2018).

[78] L-C Chen, G Papandreou, I Kokkinos, K Murphy, A L Yuille, DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs, IEEE Transactions on Pattern Analysis and Machine Intelligence 40 (4) (2018) 834–848.

[79] O Ronneberger, P Fischer, T Brox, U-Net: Convolutional networks for biomedical image segmentation, in: Medical Image Computing and Computer-Assisted Intervention (MICCAI), Vol 9351 of LNCS, Springer, 2015, pp 234–241 (available on arXiv:1505.04597 [cs.CV]). URL http://lmb.informatik.uni-freiburg.de/Publications/2015/RFB15a.

[80] A G Howard, M Zhu, B Chen, D Kalenichenko, W Wang, T Weyand, M Andreetto, H Adam, MobileNets: Efficient convolutional neural networks for mobile vision applications, arXiv preprint arXiv:1704.04861 (2017).

[81] T Pohlen, A Hermans, M Mathias, B Leibe, Full-resolution residual networks for semantic segmentation in street scenes, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp 4151–4160.

[82] S Jégou, M Drozdzal, D Vazquez, A Romero, Y Bengio, The one hundred layers tiramisu: Fully convolutional DenseNets for semantic segmentation, in: Computer Vision and Pattern Recognition Workshops (CVPRW), 2017 IEEE Conference on, IEEE, 2017, pp 1175–1183.

[83] O Antropov, Y Rauste, A Lönnqvist, T Häme, PolSAR mosaic normalization for improved land-cover mapping, IEEE Geoscience and Remote Sensing Letters 9 (6) (2012) 1074–1078.

[84] J Long, E Shelhamer, T Darrell, Fully convolutional networks for semantic segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp 3431–3440.

[85] D H Hubel, T N Wiesel, Receptive fields, binocular interaction and functional architecture in the cat's visual cortex, The Journal of Physiology 160 (1) (1962) 106–154.

[86] O Russakovsky, J Deng, H Su, J Krause, S Satheesh, S Ma, Z Huang, A Karpathy, A Khosla, M Bernstein, et al, ImageNet large scale visual recognition challenge, International Journal of Computer Vision 115 (3) (2015) 211–252.

[87] K He, X Zhang, S Ren, J Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp 770–778.

[88] G Huang, Z Liu, L Van Der Maaten, K Q Weinberger, Densely connected convolutional networks, in: CVPR, Vol 1, 2017, p 3.

[89] C Szegedy, W Liu, Y Jia, P Sermanet, S Reed, D Anguelov, D Erhan, V Vanhoucke, A Rabinovich, Going deeper with convolutions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp 1–9.

[90] S Ji, W Xu, M Yang, K Yu, 3D convolutional neural networks for human action recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (1) (2013) 221–231.

[91] T N Sainath, A-r Mohamed, B Kingsbury, B Ramabhadran, Deep convolutional neural networks for LVCSR, in: Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, IEEE, 2013, pp 8614–8618.

[92] D Small, L Zuberbühler, A Schubert, E Meier, Terrain-flattened gamma nought Radarsat-2 backscatter, Canadian Journal of Remote Sensing 37 (5) (2012) 493–499.

[93] P Härmä, R Teiniranta, M Törmä, R Repo, E Järvenpää, M Kallio, The production of Finnish CORINE land cover 2000 classification, XXth ISPRS Congress, Istanbul, Turkey (2004).

[94] P Härmä, R Teiniranta, M Törmä, R Repo, E Järvenpää, M Kallio, Finnish CORINE land cover 2000 classification, XXth ISPRS Congress, Anchorage, US (2004).

[95] A Garcia-Garcia, S Orts-Escolano, S Oprea, V Villena-Martinez, J Garcia-Rodriguez, A review on deep learning techniques applied to semantic segmentation, arXiv preprint arXiv:1704.06857 (2017).

[96] F Chollet, Xception: Deep learning with depthwise separable convolutions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp 1251–1258.

[97] O Ronneberger, P Fischer, T Brox, U-Net: Convolutional networks for biomedical image segmentation, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2015, pp 234–241.

[98] L-C Chen, G Papandreou, F Schroff, H Adam, Rethinking atrous convolution for semantic image segmentation, arXiv preprint arXiv:1706.05587 (2017).

[99] Y Bengio, Deep learning of representations for unsupervised and transfer learning, in: Proceedings of ICML Workshop on Unsupervised and Transfer Learning, 2012, pp 17–36.

[100] M Abadi, P Barham, J Chen, Z Chen, A Davis, J Dean, M Devin, S Ghemawat, G Irving, M Isard, et al, TensorFlow: A system for large-scale machine learning, in: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), 2016, pp 265–283.

[101] H Costa, G M Foody, D S Boyd, Supervised methods of image segmentation accuracy assessment in land cover mapping, Remote Sensing of Environment 205 (2018) 338–351.

[102] G Csurka, D Larlus, F Perronnin, F Meylan, What is a good evaluation measure for semantic segmentation?, in: BMVC, Vol 27, Citeseer, 2013, p 2013.

[103] J Cohen, A coefficient of agreement for nominal scales, Educational and Psychological Measurement 20 (1) (1960) 37–46.

[104] O Antropov, Y Rauste, T Häme, Volume scattering modeling in PolSAR decompositions: Study of ALOS PALSAR data over boreal forest, IEEE Transactions on Geoscience and Remote Sensing 49 (10) (2011) 3838–3848.


Table 2 CORINE CLC Level-1 classes and their color codes used in our classification results.

class                              R     G     B     color
Water bodies (500)                 0     191   255   blue
Peatland, bogs and marshes (400)   173   216   230   light blue
Forested areas (300)               127   255   0     green
Agricultural areas (200)           222   184   135   brown
Urban fabric (100)                 128   0     0     red

Figure 3 Zoomed-in area fragment with our reference data, i.e., CORINE, shown on top (left), along with the Google Earth layer (right).

Below we describe the details of these DL models. We will use the following common abbreviations: conv for the convolution operation, concat for concatenation, max pool for the max pooling operation, BN for batch normalisation, and ReLU for the rectified linear unit activation function.

3.4.1 BiSeNet (Bilateral Segmentation Network)

The BiSeNet model is designed to decouple the functions of encoding additional spatial information and enlarging the receptive field, which are fundamental to achieving good segmentation performance. As can be seen in Figure 4, there are two main components in this model: the Spatial Path (SP) and the Context Path (CP). The Spatial Path serves to encode rich spatial information. The Context Path serves to provide a sufficient receptive field, and uses global average pooling and a pre-trained Xception [96] or ResNet [87] as the backbone. The goal of the creators was not only to obtain superior performance, but also to achieve a balance between speed and performance. Hence, BiSeNet is a relatively fast semantic segmentation model.

3.4.2 SegNet (Encoder-Decoder-Skip)

Similarly to BiSeNet, SegNet is also designed with computational performance in mind, this time particularly during inference. Because of this, the network has a significantly smaller number of trainable parameters compared to most of the other architectures. The encoder in SegNet is based on VGG16: it consists of its first 13 convolutional layers, while the fully connected layers are omitted.


Figure 4 The architecture of BiSeNet. ARM stands for the Attention Refinement Module and FFM for the Feature Fusion Module introduced in the model's paper [76].

Figure 5 The architecture of the SegNet-based Encoder-Decoder with Skip connections [74]. Blue tiles represent Convolution + Batch Normalisation + ReLU, green tiles represent Pooling, red – Upsampling, and yellow – a softmax operation.

Hence, the novelty of this network lies in its decoder part, as follows. The decoder consists of one decoder layer for each encoder layer, so it also has 13 layers. Each individual decoder layer utilizes the max-pooling indices memorized from its corresponding encoder feature map. The authors have shown that this enhances boundary delineation between classes. Finally, the decoder output is sent to a multi-class softmax function, yielding a classification for each pixel (see Figure 5).
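To make the index-passing mechanism concrete, below is a minimal numpy sketch (our illustration, not the authors' implementation) of max pooling that memorizes the argmax positions, and of the corresponding sparse unpooling used in the decoder; it assumes a single-channel 2-D feature map and non-overlapping 2 × 2 windows.

import numpy as np

def max_pool_with_indices(x, k=2):
    # 2x2 max pooling that also returns the flat argmax indices, as SegNet's encoder does
    h, w = x.shape
    pooled = np.zeros((h // k, w // k), dtype=x.dtype)
    indices = np.zeros((h // k, w // k), dtype=np.int64)
    for i in range(h // k):
        for j in range(w // k):
            window = x[i * k:(i + 1) * k, j * k:(j + 1) * k]
            r, c = np.unravel_index(np.argmax(window), window.shape)
            pooled[i, j] = window[r, c]
            indices[i, j] = (i * k + r) * w + (j * k + c)
    return pooled, indices

def unpool_with_indices(pooled, indices, out_shape):
    # sparse upsampling: each value returns to the exact position it came from,
    # which is what sharpens class boundaries in the decoder
    out = np.zeros(out_shape, dtype=pooled.dtype)
    out.flat[indices.ravel()] = pooled.ravel()
    return out

x = np.random.rand(4, 4)
p, idx = max_pool_with_indices(x)
u = unpool_with_indices(p, idx, x.shape)  # non-maxima positions become zeros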

3.4.3 Mobile U-Net

Mobile U-Net is based on the U-Net [97] semantic segmentation architecture shown in Figure 6. In designing U-Net, the fully convolutional approach was generally employed, with the following modification: the upsampling part of the architecture has no fully connected layers, but is nearly symmetrical to the feature extraction part, due to the use of similar feature maps. This results in a u-shaped architecture (see Figure 6), hence the name of the model. While originally developed for biomedical images, the U-Net architecture has proven successful for image segmentation in other domains as well. Here, we somewhat modify the U-Net architecture according to the MobileNets [80] framework to improve its efficiency. In particular, the MobileNets framework uses Depthwise Separable Convolutions, a form which factorizes standard convolutions (e.g., 3 × 3) into a depthwise convolution (applied separately to each input band) and a pointwise (1 × 1) convolution that combines the outputs of the depthwise convolution.

Figure 6 The architecture of U-Net [97]
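As an illustration of this factorization, the following numpy sketch (ours; naive loops are used for clarity, not speed) applies a depthwise 3 × 3 convolution per input band followed by a pointwise 1 × 1 convolution. The factorized form needs k·k·C_in + C_in·C_out weights instead of the k·k·C_in·C_out of a standard convolution.

import numpy as np

def depthwise_separable_conv(x, depthwise_k, pointwise_w):
    # x: (H, W, C_in); depthwise_k: (k, k, C_in); pointwise_w: (C_in, C_out)
    H, W, C = x.shape
    k = depthwise_k.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    dw = np.zeros((H, W, C))
    for c in range(C):  # depthwise: each band is filtered separately with its own kernel
        for i in range(H):
            for j in range(W):
                dw[i, j, c] = np.sum(xp[i:i + k, j:j + k, c] * depthwise_k[:, :, c])
    return dw @ pointwise_w  # pointwise 1x1 convolution mixes the channels

x = np.random.rand(8, 8, 3)
out = depthwise_separable_conv(x, np.random.rand(3, 3, 3), np.random.rand(3, 16))
print(out.shape)  # (8, 8, 16)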

3.4.4 DeepLab-V3+

DeepLab-V3+ [77] is an improved version of DeepLab-V3 [98], while the latter is an improved version of the original DeepLab [78] model. This segmentation model does not follow the FCN framework like the previously discussed models. The main features that distinguish the DeepLab model from FCNs are the atrous convolutions for upsampling and the application of probabilistic machine learning models, concretely conditional random fields (CRFs), for a finer localization accuracy in the final fully connected layer. Atrous convolutions, in particular, make it possible to enlarge the context from which the next-layer feature maps are learned, while preserving the number of parameters (and thus the same efficiency). Using a chain of atrous convolutions allows computing the final output layer of a CNN at an arbitrarily high resolution (removing the need for the upsampling part used in FCNs). In the follow-up work proposing DeepLab-V3, Chen et al [98] change the approach to atrous convolutions to gradually double the atrous rates, and show that the adapted version of their algorithm outperforms the previous one, even without including the fully connected CRF layer. Finally, in the newest adaptation of the model, called DeepLab-V3+, Chen et al [77] turn to an approach similar to the FCNs, i.e., they add a decoder module to the architecture (see Figure 7). That is, they employ the features extracted by the DeepLab-V3 module in the encoder part, and add a decoder module consisting of 1 × 1 and 3 × 3 convolutions.

Figure 7 The architecture of DeepLabV3+ [77]
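To clarify how atrous convolutions enlarge the context without adding parameters, here is a small 1-D numpy sketch (our simplification of the 2-D case): the kernel taps are spaced `rate` samples apart, so a 3-tap kernel with rate 4 covers a receptive field of 9 samples with the same 3 weights.

import numpy as np

def atrous_conv1d(x, w, rate):
    # dilated convolution: the receptive field is rate * (len(w) - 1) + 1 samples
    k = len(w)
    span = rate * (k - 1)
    out = np.zeros(len(x) - span)
    for i in range(len(out)):
        out[i] = sum(w[j] * x[i + j * rate] for j in range(k))
    return out

x = np.arange(16, dtype=float)
print(atrous_conv1d(x, np.array([1.0, 1.0, 1.0]), rate=1))  # receptive field 3
print(atrous_conv1d(x, np.array([1.0, 1.0, 1.0]), rate=4))  # receptive field 9, same weights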

3.4.5 FRRN-B (Full-Resolution Residual Networks)

As we have seen, most of the semantic segmentation architectures are based on some form of an FCN, and so they utilize existing classification networks, such as ResNet or VGG16, as encoders. We also discussed the main reason for such approaches, which is to take advantage of the learned weights from those architectures, pretrained for the classification task.


Figure 8 The architecture of FRRN-B. RU n and FRRU n stand for residual units and full-resolution residual units with n-channel convolutions, respectively. FRRUs simultaneously operate on the two streams [81].

Nevertheless, one disadvantage of the FCN approach is that the resulting network outputs of the encoder part (particularly after the pooling operations) are at a lower resolution, which deteriorates the localization performance of the overall segmentation model. Pohlen et al [81] proposed to tackle this by having two parallel network streams processing the input image: a pooling stream and a residual stream (Figure 8). As the name says, the pooling stream performs successive pooling and then unpooling operations, and it serves to obtain good recognition of the objects and classes. The residual stream computes residuals at the full image resolution, which ensures that low-level features, i.e., object pixel-level locations, are propagated to the network output. The name of the model comes from its building blocks, i.e., full-resolution residual units. Each such unit simultaneously operates on the pooling and the residual streams. In the original paper [81], the authors propose two alternative architectures, FRRN-A and FRRN-B, and they show that

FRRN-B achieves superior performance on the Cityscapes benchmark dataset. Hence, we employ the FRRN-B architecture.

3.4.6 PSPNet (Pyramid Scene Parsing Network)

Figure 9 The architecture of PSPNet [75]

Zhao et al [75] propose the Pyramid Scene Parsing Network as a solution to the challenge of making local predictions based only on the local context, without considering the global image scene. In remote sensing, an example of this challenge could be a model wrongly predicting water with waves in it as the dry vegetation class: the two appear similar, and the model does not take into account that these pixels are part of a larger water surface, i.e., it misses the global context. Similarly to the other FCN-based approaches, PSPNet uses a pre-trained classification architecture to extract the feature map, in this case ResNet. The main module of this network is the pyramid pooling, which is enclosed by a square in Figure 9. As can be seen in the figure, this module fuses features at four scales, from the coarse (red) to the fine (green). Hence, the output of each level in the pyramid pooling module contains the feature map of a different resolution. In the end, the different features are stacked together, yielding the final pyramid pooling global feature used for predictions.
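The following numpy sketch (ours) shows the essence of the pyramid pooling module: the feature map is average-pooled onto coarse grids, each pooled grid is upsampled back to the input resolution, and everything is stacked along the channel axis. The bin sizes (1, 2, 3, 6) follow the original PSPNet paper and are an assumption here.

import numpy as np

def pyramid_pooling(fmap, bin_sizes=(1, 2, 3, 6)):
    # fmap: (H, W, C) feature map produced by the backbone (e.g., ResNet)
    H, W, C = fmap.shape
    levels = [fmap]
    for b in bin_sizes:
        pooled = np.zeros((b, b, C))
        for i in range(b):  # average pooling onto a b x b grid
            for j in range(b):
                cell = fmap[i * H // b:(i + 1) * H // b, j * W // b:(j + 1) * W // b, :]
                pooled[i, j] = cell.mean(axis=(0, 1))
        rows = (np.arange(H) * b) // H  # nearest-neighbour upsampling back to (H, W)
        cols = (np.arange(W) * b) // W
        levels.append(pooled[rows][:, cols])
    return np.concatenate(levels, axis=-1)  # global context stacked onto local features

f = np.random.rand(12, 12, 8)
print(pyramid_pooling(f).shape)  # (12, 12, 40)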

3.4.7 FC-DenseNet (Fully Convolutional DenseNets)

This semantic segmentation algorithm is built using the DenseNet CNN [88] as a basis for the encoder, followed by applying the FCN approach [82]. The specificity of the DenseNet architecture is the presence of blocks in which each layer is connected to all the other layers in a feed-forward manner. Figure 10 shows the architecture of FC-DenseNet, where the blocks are represented by the Dense Block units. According to [88], such an architecture scales well to hundreds of layers without any optimization issues, while yielding excellent results in classification tasks. In order to efficiently upsample the DenseNet feature maps, Jégou et al [82] substitute the upsampling convolutions of FCNs with Dense Blocks and Transitions Up. The Transition Up modules consist of transposed convolutions, which are then concatenated with the outputs from the input skip connection (the dashed lines in Figure 10).


Figure 10 The architecture of FC-DenseNet [82]
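A minimal numpy sketch (ours) of the dense connectivity pattern: every layer receives the concatenation of the block input and all previously produced feature maps, and the block output stacks the newly produced maps. The toy layers below stand in for the conv-BN-ReLU units of the real network.

import numpy as np

def dense_block(x, layer_fns):
    features = [x]
    for fn in layer_fns:
        out = fn(np.concatenate(features, axis=-1))  # each layer sees all earlier outputs
        features.append(out)
    return np.concatenate(features[1:], axis=-1)  # stack the newly produced feature maps

rng = np.random.default_rng(0)

def make_layer(c_in, growth=4):
    # toy 1x1 "convolution" + ReLU producing `growth` new feature maps
    w = rng.normal(size=(c_in, growth))
    return lambda t: np.maximum(t @ w, 0.0)

x = rng.normal(size=(8, 8, 3))
layers = [make_layer(3), make_layer(3 + 4), make_layer(3 + 8)]
print(dense_block(x, layers).shape)  # (8, 8, 12): 3 layers x growth rate 4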

3.5 Training approach

To accomplish better segmentation performance, there is an option to pre-train the semantic segmentation models (in particular, their encoder modules) using a larger set of available images of another type (such as natural images). By using a model pre-trained on natural images and continuing the training with the limited set of SAR images, the knowledge becomes effectively transferred from the natural-image task to the SAR task [99]. To accomplish such a transfer, we used models whose encoders were pre-trained for the ImageNet classification task, and fine-tuned them using our SAR dataset (described next).

3.6 Experimental Setup

In this section, we first describe how we prepared the SAR images for training with the deep learning models, which are designed for natural images. Then we provide the details of our implementation.

3.6.1 SAR Data Preprocessing for Deep Learning

Sentinel-1 imagery comes in two polarization channels, each of them being informative about certain types of land cover. Hence, using their combination is expected to yield better land cover mapping results than using either of them independently. Moreover, previous work has shown the benefits of also using the DEM for land cover mapping [9]. Hence, as the third layer, we used the DEM of Finland from the National Land Survey.

SAR backscatter for both polarizations was converted to decibels by applying the 10 · log10 transformation. In addition, for the deep learning models, each band should be normalized so that the distribution of the pixel values resembles a Gaussian distribution centered at zero; this is done to yield faster convergence during training. To normalize the data, the mean of all pixels is subtracted from each pixel value, which is then divided by the standard deviation. Furthermore, given that the semantic segmentation models expect pixel values in the range (0, 255), we scaled the normalized data, and also the DEM values, to this range. Such preprocessed layers are then used to create the image dataset for training.

We named the created dataset SAR RGB-DEM. The naming comes from the process used to create the images in this dataset: one of the two channels of a Sentinel-1 image is assigned to the R channel and the other to the G channel, while for the third, B, channel we use the DEM layer.
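The preprocessing described above can be summarised in a short numpy sketch (ours; the band names vv/vh and the small floor added before the logarithm are assumptions, as the text does not state how zero backscatter values were handled).

import numpy as np

def to_db(sigma0, eps=1e-6):
    # SAR backscatter to decibels; eps guards against the log of zero
    return 10.0 * np.log10(np.maximum(sigma0, eps))

def normalize_to_range(band):
    # zero-mean, unit-variance normalisation, then rescaling to (0, 255)
    z = (band - band.mean()) / band.std()
    z = (z - z.min()) / (z.max() - z.min())
    return z * 255.0

def make_sar_rgb_dem(vv, vh, dem):
    # R and G carry the two Sentinel-1 polarisation channels, B carries the DEM
    return np.stack([normalize_to_range(to_db(vv)),
                     normalize_to_range(to_db(vh)),
                     normalize_to_range(dem.astype(float))], axis=-1)

img = make_sar_rgb_dem(np.random.rand(512, 512),
                       np.random.rand(512, 512),
                       np.random.rand(512, 512) * 300.0)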

3.6.2 Train/Development and Test (Accuracy Assessment) Dataset

The images from the SAR RGB-DEM dataset needed to be split into 512 px × 512 px partial images (further in the text called imagelets) for training. Each imagelet thus represented an area of roughly 10 × 10 km². The first reason for this preprocessing concerns the square shape: some of the selected models required square-shaped images. Other models were flexible with the image shape and size, but we wanted to make the setups for all the models the same, so that their results would be comparable. The second reason concerns the computational capacity: with our hardware setup (described below), this was the largest image size we could work with.

Upon splitting the SAR RGB-DEM images, we discarded those imagelets that were completely outside the landmass area, as well as those for which we did not have a complete CORINE label (e.g., if they fell partly outside the Finnish borders). This resulted in more than 7,000 imagelets of size 512 px × 512 px.
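A sketch of this tiling step (ours; the use of 0 as a no-data label marking pixels outside the landmass or the Finnish borders is an assumption):

import numpy as np

def split_into_imagelets(scene, labels, tile=512):
    # scene: (H, W, 3) SAR RGB-DEM raster; labels: (H, W) CORINE raster
    imagelets = []
    H, W = labels.shape
    for i in range(0, H - tile + 1, tile):
        for j in range(0, W - tile + 1, tile):
            lab = labels[i:i + tile, j:j + tile]
            if np.any(lab == 0):  # discard imagelets without a complete CORINE label
                continue
            imagelets.append((scene[i:i + tile, j:j + tile], lab))
    return imagelets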

Given the geography of Finland, to have representative training data it seems useful to include imagelets from both the northern and southern (including the large cities) parts of the country in the model training. On the other hand, some noticeable differences are also found in the gradient from the east to the west of the country. To achieve a representative training dataset, we selected all imagelets between the longitudes of 24° and 28° for the accuracy assessment (model testing), and all the other imagelets for the model training (that is, training and development in the computer vision terminology). In this way, we prevented the situation in which two images of the same area, but acquired at different times, are used one for training and the other for testing. Images overlapping any border of the introduced strip were discarded. The procedure resulted in 3104 images in the training and development set and 3784 images in the test (accuracy assessment) set. Finally, we used 60% of the training and development set for training and the rest for the development of the deep learning models.


Table 3 The properties of the examined semantic segmentation architectures.

Architecture    Base model       Parameters
BiSeNet         ResNet101        24.75M
SegNet          VGG16            34.97M
Mobile U-Net    Not applicable   8.87M
DeepLabV3+      ResNet101        47.96M
FRRN-B          ResNet101        24.75M
PSPNet          ResNet101        56M
FC-DenseNet     ResNet101        9.27M

3.6.3 Data Augmentation

Furthermore, we employed the data augmentation technique. The main idea behind data augmentation is to enable improved learning by reusing original images with slight transformations, such as rotation, flipping, adding Gaussian noise, or slightly changing the brightness. This provides additional information to the model, and the dataset size is effectively increased. Moreover, an additional benefit of data augmentation is that it helps the model learn some invariant data properties for which no examples are present in the original dataset. Given the sensitivity of the SAR backscatter, we did not want to augment the images in terms of color or brightness, or by adding noise. However, we could safely employ rotations and flipping. For rotations, we only used 90° increments, giving three possible rotated versions of an image. For image flipping, we applied horizontal and vertical flipping, or both at the same time, giving another three possible versions of the original image (see footnote 4). Notice that our images are square, so the transformations did not change the image dimensions. Finally, we applied online augmentation, as opposed to the offline version: in the online process, each augmented image is seen only once, and so this process yields a network that generalises better.
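A numpy sketch of the label-safe online augmentation described above (ours; applying the rotation and the two flips with equal probability at load time is an assumption):

import numpy as np

def random_augment(image, label, rng):
    k = int(rng.integers(0, 4))  # rotation by 0, 90, 180 or 270 degrees
    image, label = np.rot90(image, k, axes=(0, 1)), np.rot90(label, k, axes=(0, 1))
    if rng.random() < 0.5:  # vertical flip (reflection along the central horizontal axis)
        image, label = image[::-1], label[::-1]
    if rng.random() < 0.5:  # horizontal flip (reflection along the central vertical axis)
        image, label = image[:, ::-1], label[:, ::-1]
    return image, label

rng = np.random.default_rng(42)
img, lab = random_augment(np.zeros((512, 512, 3)), np.zeros((512, 512)), rng)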

3.6.4 Implementation

To apply the described semantic segmentation models, we adapted the open-source Semantic Segmentation Suite. We used Python with the TensorFlow [100] backend.

3.6.5 Hardware and Training Setup

We trained and tested each of the deep learning models separately on a single GPU (NVIDIA GeForce GTX 1080) on a machine with 32 GB of RAM.

We used the RMSProp optimisation algorithm, a learning rate of 0.0001, and a decay of the learning rate of 0.9954. Each model was trained for an equal number of epochs (500), and during the process the checkpoint for the best model was saved.

4. Vertical flip operation switches between top-left and bottom-left image origin (reflection along the central horizontal axis), and horizontal flip switches between top-left and top-right image origin (reflection along the central vertical axis).


We then used that model for evaluation on the test set, and we report those results.
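For reference, the stated optimiser settings translate to roughly the following TensorFlow 2 / Keras configuration (a sketch: the text does not state whether the 0.9954 decay was applied per epoch or per step, so the per-epoch staircase schedule and the steps_per_epoch placeholder below are our assumptions).

import tensorflow as tf

steps_per_epoch = 100  # placeholder: the actual number of training batches per epoch
schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-4,   # learning rate of 0.0001
    decay_steps=steps_per_epoch,  # decay applied once per epoch (assumption)
    decay_rate=0.9954,
    staircase=True)
optimizer = tf.keras.optimizers.RMSprop(learning_rate=schedule)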

3.7 Evaluation Metrics

In their review of the metrics used in land cover classification, Costa et al [101] found a lack of consistency, complicating the intercomparison of different studies. To avoid such issues, and to ensure that our results are easily comparable with the literature, we thoroughly evaluated our models. For each model and class, we report the following measures of accuracy: precision, also known as user's accuracy (UA), recall, also known as producer's accuracy (PA), as well as the overall accuracy and the Kappa coefficient. The formulas are as follows.

For each segmentation class (land cover type) c, we calculate precision (user's accuracy)

$$P_c = \frac{Tp_c}{Tp_c + Fp_c},$$

and recall (producer's accuracy)

$$R_c = \frac{Tp_c}{Tp_c + Fn_c},$$

where $Tp_c$ represents true positive, $Fp_c$ false positive, and $Fn_c$ false negative pixels for the class c.

When it comes to accuracy [102], we calculate the per-class accuracy (see footnote 5)

$$Acc_i = \frac{C_{ii}}{G_i},$$

and the overall pixel accuracy

$$Acc_{OP} = \frac{\sum_{i=1}^{L} C_{ii}}{\sum_{i=1}^{L} G_i},$$

where $C_{ij}$ is the number of pixels having a ground truth label i and being classified/predicted as j, $G_i$ is the total number of pixels labelled with i, and L is the number of classes. All these metrics can take values from 0 to 1.

Finally, we also use the Kappa statistic (Cohen's measure of agreement), indicating how the classification results compare to values assigned by chance [103]. The Kappa statistic can take values from 0 to 1. Starting from a k-by-k confusion matrix with elements $f_{ij}$, the following calculations are done:

5. Effectively, the per-class accuracy is defined as the recall obtained on each class.

$$P_o = \frac{1}{N}\sum_{j=1}^{k} f_{jj} \qquad (1)$$

$$r_i = \sum_{j=1}^{k} f_{ij} \quad \forall i, \qquad c_j = \sum_{i=1}^{k} f_{ij} \quad \forall j \qquad (2)$$

$$P_e = \frac{1}{N^2}\sum_{i=1}^{k} r_i c_i \qquad (3)$$

where $P_o$ is the observed proportional agreement (effectively the overall accuracy), $r_i$ and $c_j$ are the row and column totals for classes i and j, and $P_e$ is the expected proportion of agreement. The final measure of agreement is given by the statistic [103]

$$\kappa = \frac{P_o - P_e}{1 - P_e}. \qquad (4)$$

Depending on the value of Kappa, the observed agreement is considered either poor (0.0 to 0.2), fair (0.2 to 0.4), moderate (0.4 to 0.6), good (0.6 to 0.8), or very good (0.8 to 1.0).
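To make the relation between these formulas and a confusion matrix explicit, the following numpy sketch (ours) computes the per-class precision (UA) and recall (PA), the overall accuracy $P_o$, and the Kappa statistic from a k × k matrix laid out as in Table 5 (rows: reference classes, columns: predictions); applied to the Table 5 matrix, it reproduces the reported overall accuracy of 90.7% and the FC-DenseNet Kappa of 0.785.

import numpy as np

def segmentation_metrics(cm):
    # cm[i, j] = number of pixels with ground-truth class i predicted as class j
    cm = np.asarray(cm, dtype=np.float64)
    n = cm.sum()
    diag = np.diag(cm)
    precision = diag / cm.sum(axis=0)  # user's accuracy (UA) per class
    recall = diag / cm.sum(axis=1)     # producer's accuracy (PA) per class
    po = diag.sum() / n                # observed agreement = overall accuracy
    pe = (cm.sum(axis=1) * cm.sum(axis=0)).sum() / n ** 2  # chance agreement, Eq. (3)
    kappa = (po - pe) / (1.0 - pe)     # Cohen's kappa, Eq. (4)
    return precision, recall, po, kappa

cm = np.array([
    [7301999, 413073, 15892771, 3212839, 221476],
    [78331, 128294872, 3457634, 171029, 1935276],
    [3663698, 2703632, 686788977, 12795703, 7730444],
    [766200, 121609, 16527970, 44866048, 620934],
    [56097, 1866020, 19164137, 1091008, 30309189]])
ua, pa, oa, kappa = segmentation_metrics(cm)
print(round(oa, 3), round(kappa, 3))  # 0.907 0.785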

4 Results and Discussion

Using the experimental setup described in the previous section, we evaluated the seven selected semantic segmentation models: SegNet [74], PSPNet [75], BiSeNet [76], DeepLabV3+ [77, 78], U-Net [79, 80], FRRN-B [81], and FC-DenseNet [82]. The overall classification performance statistics for all the studied models are gathered in Table 4. Figure 11 shows maps produced for several imagelets with the best performing model, FC-DenseNet. The obtained results are compared to prior work, and classification performance for the different land cover classes is discussed further below.

4.1 Classification Performance

All the models performed relatively well in terms of classification, achieving an overall accuracy above 83%. Three models performed particularly well, achieving an accuracy score above 89%: SegNet, FRRN-B, and the best performing model, FC-DenseNet, which achieved an accuracy of 90.7%.

Before further analysis, let us recall that CORINE is not exclusively a land cover map, but rather a land cover and land use map; thus, specific classes can differ from the ecological classes observed by Sentinel-1. Also, the aggregation to Level-1 is sometimes not strictly "ecological", nor does it comply with surface scattering physics. For example, roads, airports, major industrial areas and the road network often exhibit areas similar to field, while the presence of trees and green vegetation near summer cottages can cause urban areas to exhibit signatures close to forest.


Third in this study we have tested the effectiveness of off-the-shelf deeplearning models for land cover mapping from SAR data While the results showtheir effectiveness it is also likely that the novel types of models specificallydeveloped for the radar data (such as [70]) will yield even better results Basedon our results we suggest DenseNet-based models as a starting point In par-ticular one could develop the deep learning models to handle directly the SLCdata which preserve the phase information

Focusing on a single season is both an advantage and a limitation Impor-tantly we have avoided confusion between SAR signatures varying seasonallyfor several land cover classes However multitemporal dynamics itself can bepotentially used as an additional useful class-discriminating parameter Incor-porating seasonal dynamics of each land cover pixel (as a time series) is leftfor future work perhaps with additional need to incorporate recurrent neuralnetworks into the approach

As discussed in Section 311 it could be suitable to use more detailed(specific) land cover classes as aggregation of smaller LC classes into Level-1 CORINE classes is not exactly ecological leading to mixing several distinctSAR signatures in one class and thus causing additional confusion for the clas-sifier Later classified specific classes can be aggregated into larger classespotentially showing improved performance [19]

Finally we have used only SAR images and a freely-available DEM modelfor the presented large-scale land cover mapping If one were to combine othertype of remote sensing images in particular the optical images we expect thatthe results would significantly improve This is true for those areas where suchimagery can be collected due to cloud coverage while in operational scenario itwould potentially require use of at least two models (with and without opticalsatellite imagery) It is also important to access added value of SAR imagerywith deep learning models when optical satellite images are available as wellas possible data fusion and decision fusion scenarios before a decision on themapping approach is done [19]

5 Conclusion

Our study demonstrated the potential for applying state-of-the-art seman-tic segmentation models to SAR image classification with high accuracy Sev-eral models were benchmarked in a countrywide classification experiment usingSentinel-1 IW-mode SAR data reaching nearly 91 overall classification accu-racy with the best performing model (FC-DenseNet) Given that the 14 usedSentinel-1 scenes resulted in 7K training images this indicates strong poten-tial for using pre-trained CNNs for further fine-tuning and seems particularlysuitable when the number of training images is limited (to thousand or tensof thousands instead of millions) In addition to suggesting the best candidate

29

semantic segmentation models for land cover mapping with SAR data (that isthe DenseNet-based models) our study offers baseline results against which thenewly proposed models should be evaluated Several possible improvements forthe future work were identified including the necessity for testing multitempo-ral approaches data fusion and very high-resolution SAR imagery as well asdeveloping models specifically for SAR and will be addressed in future work

References

[1] S. Bojinski, M. Verstraete, T. C. Peterson, C. Richter, A. Simmons, M. Zemp, The concept of essential climate variables in support of climate research, applications, and policy, Bulletin of the American Meteorological Society 95 (9) (2014) 1431–1443.

[2] G. Büttner, J. Feranec, G. Jaffrain, L. Mari, G. Maucha, T. Soukup, The CORINE land cover 2000 project, EARSeL eProceedings 3 (3) (2004) 331–346.

[3] M. Bossard, J. Feranec, J. Otahel, et al., CORINE land cover technical guide: Addendum 2000 (2000).

[4] G. Büttner, CORINE land cover and land cover change products, in: Land Use and Land Cover Mapping in Europe, Springer, 2014, pp. 55–74.

[5] M. Törmä, T. Markkanen, S. Hatunen, P. Härmä, O.-P. Mattila, A. Arslan, Assessment of land-cover data for land-surface modelling in regional climate studies, Boreal Environment Research 20 (2) (2015) 243–260.

[6] J. Chen, J. Chen, A. Liao, X. Cao, L. Chen, X. Chen, C. He, G. Han, S. Peng, M. Lu, et al., Global land cover mapping at 30 m resolution: A POK-based operational approach, ISPRS Journal of Photogrammetry and Remote Sensing 103 (2015) 7–27.

[7] C. A. d. Almeida, A. C. Coutinho, J. C. D. M. Esquerdo, M. Adami, A. Venturieri, C. G. Diniz, N. Dessay, L. Durieux, A. R. Gomes, High spatial resolution land use and land cover mapping of the Brazilian Legal Amazon in 2008 using Landsat-5/TM and MODIS data, Acta Amazonica 46 (3) (2016) 291–302.

[8] C. Homer, J. Dewitz, L. Yang, S. Jin, P. Danielson, G. Xian, J. Coulston, N. Herold, J. Wickham, K. Megown, Completion of the 2011 national land cover database for the conterminous United States – representing a decade of land cover change information, Photogrammetric Engineering & Remote Sensing 81 (5) (2015) 345–354.

[9] Y. Zhao, D. Feng, L. Yu, X. Wang, Y. Chen, Y. Bai, H. J. Hernández, M. Galleguillos, C. Estades, G. S. Biging, et al., Detailed dynamic land cover mapping of Chile: Accuracy improvement by integrating multi-temporal data, Remote Sensing of Environment 183 (2016) 170–185.

[10] P. Griffiths, C. Nendel, P. Hostert, Intra-annual reflectance composites from Sentinel-2 and Landsat for national-scale crop and land cover mapping, Remote Sensing of Environment 220 (2019) 135–151.

[11] R. Torres, P. Snoeij, D. Geudtner, D. Bibby, M. Davidson, E. Attema, P. Potin, B. Rommen, N. Floury, M. Brown, I. Traver, P. Deghaye, B. Duesmann, B. Rosich, N. Miranda, C. Bruno, M. L'Abbate, R. Croci, A. Pietropaolo, M. Huchler, F. Rostan, GMES Sentinel-1 mission, Remote Sensing of Environment 120 (2012) 9–24. doi:10.1016/j.rse.2011.05.028.

[12] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE 86 (11) (1998) 2278–2324.

[13] A. Krizhevsky, I. Sutskever, G. E. Hinton, Imagenet classification with deep convolutional neural networks, in: Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.

[14] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, CoRR abs/1409.1556 (2014).

[15] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning (2016).

[16] W. Cohen, S. Goward, Landsat's role in ecological applications of remote sensing, BioScience 54 (6) (2004) 535–545. doi:10.1641/0006-3568(2004)054[0535:LRIEAO]2.0.CO;2.

[17] S. Goetz, A. Baccini, N. Laporte, T. Johns, W. Walker, J. Kellndorfer, R. Houghton, M. Sun, Mapping and monitoring carbon stocks with satellite observations: A comparison of methods, Carbon Balance and Management 4 (2009). doi:10.1186/1750-0680-4-2.

[18] C. Atzberger, Advances in remote sensing of agriculture: Context description, existing operational monitoring systems and major information needs, Remote Sensing 5 (2) (2013) 949–981. doi:10.3390/rs5020949.

[19] T. Häme, J. Kilpi, H. A. Ahola, Y. Rauste, O. Antropov, M. Rautiainen, L. Sirro, S. Bounpone, Improved mapping of tropical forests with optical and SAR imagery, part I: Forest cover and accuracy assessment using multi-resolution data, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 6 (1) (2013) 74–91.

[20] O. Antropov, Y. Rauste, H. Astola, J. Praks, T. Häme, M. Hallikainen, Land cover and soil type mapping from spaceborne PolSAR data at L-band with probabilistic neural network, IEEE Transactions on Geoscience and Remote Sensing 52 (9) (2014) 5256–5270. doi:10.1109/TGRS.2013.2287712.

[21] A. Lönnqvist, Y. Rauste, M. Molinier, T. Häme, Polarimetric SAR data in land cover mapping in boreal zone, IEEE Transactions on Geoscience and Remote Sensing 48 (10) (2010) 3652–3662. doi:10.1109/TGRS.2010.2048115.

[22] B. Waske, M. Braun, Classifier ensembles for land cover mapping using multitemporal SAR imagery, ISPRS Journal of Photogrammetry and Remote Sensing 64 (5) (2009) 450–457. doi:10.1016/j.isprsjprs.2009.01.003.

[23] L. Bruzzone, M. Marconcini, U. Wegmüller, A. Wiesmann, An advanced system for the automatic classification of multitemporal SAR images, IEEE Transactions on Geoscience and Remote Sensing 42 (6) (2004) 1321–1334. doi:10.1109/TGRS.2004.826821.

[24] T. Ullmann, A. Schmitt, A. Roth, J. Duffe, S. Dech, H.-W. Hubberten, R. Baumhauer, Land cover characterization and classification of arctic tundra environments by means of polarized synthetic aperture X- and C-band radar (PolSAR) and Landsat 8 multispectral imagery, Richards Island, Canada, Remote Sensing 6 (9) (2014) 8565–8593. doi:10.3390/rs6098565. URL http://www.mdpi.com/2072-4292/6/9/8565

[25] N. Clerici, C. A. V. Calderón, J. M. Posada, Fusion of Sentinel-1A and Sentinel-2A data for land cover mapping: a case study in the lower Magdalena region, Colombia, Journal of Maps 13 (2) (2017) 718–726. doi:10.1080/17445647.2017.1372316.

[26] C. Castañeda, D. Ducrot, Land cover mapping of wetland areas in an agricultural landscape using SAR and Landsat imagery, Journal of Environmental Management 90 (7) (2009) 2270–2277.

[27] Y. Ban, H. Hu, I. M. Rangel, Fusion of Quickbird MS and RADARSAT SAR data for urban land-cover mapping: Object-based and knowledge-based approach, International Journal of Remote Sensing 31 (6) (2010) 1391–1410.

[28] G. V. Laurin, V. Liesenberg, Q. Chen, L. Guerriero, F. Del Frate, A. Bartolini, D. Coomes, B. Wilebore, J. Lindsell, R. Valentini, Optical and SAR sensor synergies for forest and land cover mapping in a tropical site in West Africa, International Journal of Applied Earth Observation and Geoinformation 21 (2013) 7–16.

[29] R. Khatami, G. Mountrakis, S. V. Stehman, A meta-analysis of remote sensing research on supervised pixel-based land-cover image classification processes: General guidelines for practitioners and future research, Remote Sensing of Environment 177 (2016) 89–100.

[30] B. Waske, M. Braun, Classifier ensembles for land cover mapping using multitemporal SAR imagery, ISPRS Journal of Photogrammetry and Remote Sensing 64 (5) (2009) 450–457.

[31] H. Balzter, B. Cole, C. Thiel, C. Schmullius, Mapping CORINE land cover from Sentinel-1A SAR and SRTM digital elevation model data using random forests, Remote Sensing 7 (11) (2015) 14876–14898. doi:10.3390/rs71114876.

[32] S.-E. Park, Variations of microwave scattering properties by seasonal freeze/thaw transition in the permafrost active layer observed by ALOS PALSAR polarimetric data, Remote Sensing 7 (12) (2015) 17135–17148. doi:10.3390/rs71215874.

[33] M. C. Dobson, L. E. Pierce, F. T. Ulaby, Knowledge-based land-cover classification using ERS-1/JERS-1 SAR composites, IEEE Transactions on Geoscience and Remote Sensing 34 (1) (1996) 83–99. doi:10.1109/36.481896.

[34] L. Sirro, T. Häme, Y. Rauste, J. Kilpi, J. Hämäläinen, K. Gunia, B. de Jong, F. Paz Pellat, Potential of different optical and SAR data in forest and land cover classification to support REDD+ MRV, Remote Sensing 10 (6) (2018). doi:10.3390/rs10060942.

[35] J. D. T. De Alban, G. M. Connette, P. Oswald, E. L. Webb, Combined Landsat and L-band SAR data improves land cover classification and change detection in dynamic tropical landscapes, Remote Sensing 10 (2) (2018). doi:10.3390/rs10020306.

[36] N. Longepe, P. Rakwatin, O. Isoguchi, M. Shimada, Y. Uryu, K. Yulianto, Assessment of ALOS PALSAR 50 m orthorectified FBD data for regional land cover classification by support vector machines, IEEE Transactions on Geoscience and Remote Sensing 49 (6) (2011) 2135–2150. doi:10.1109/TGRS.2010.2102041.

[37] T. Esch, A. Schenk, T. Ullmann, M. Thiel, A. Roth, S. Dech, Characterization of land cover types in TerraSAR-X images by combined analysis of speckle statistics and intensity information, IEEE Transactions on Geoscience and Remote Sensing 49 (6) (2011) 1911–1925. doi:10.1109/TGRS.2010.2091644.

[38] J. W. Cable, J. M. Kovacs, J. Shang, X. Jiao, Multi-temporal polarimetric RADARSAT-2 for land cover monitoring in northeastern Ontario, Canada, Remote Sensing 6 (3) (2014) 2372–2392. doi:10.3390/rs6032372. URL http://www.mdpi.com/2072-4292/6/3/2372

[39] X. Niu, Y. Ban, Multi-temporal RADARSAT-2 polarimetric SAR data for urban land-cover classification using an object-based support vector machine and a rule-based approach, International Journal of Remote Sensing 34 (1) (2013) 1–26. doi:10.1080/01431161.2012.700133.

[40] T. L. Evans, M. Costa, K. Telmer, T. S. F. Silva, Using ALOS/PALSAR and RADARSAT-2 to map land cover and seasonal inundation in the Brazilian Pantanal, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 3 (4) (2010) 560–575. doi:10.1109/JSTARS.2010.2089042.

[41] P. Lumsdon, S. R. Cloude, G. Wright, Polarimetric classification of land cover for Glen Affric radar project, IEE Proceedings - Radar, Sonar and Navigation 152 (6) (2005) 404–412. doi:10.1049/ip-rsn:20041313.

[42] C. da Costa Freitas, L. de Souza Soler, S. J. S. Sant'Anna, L. V. Dutra, J. R. Dos Santos, J. C. Mura, A. H. Correia, Land use and land cover mapping in the Brazilian Amazon using polarimetric airborne P-band SAR data, IEEE Transactions on Geoscience and Remote Sensing 46 (10) (2008) 2956–2970.

[43] G. Li, D. Lu, E. Moran, L. Dutra, M. Batistella, A comparative analysis of ALOS PALSAR L-band and RADARSAT-2 C-band data for land-cover classification in a tropical moist region, ISPRS Journal of Photogrammetry and Remote Sensing 70 (2012) 26–38. doi:10.1016/j.isprsjprs.2012.03.010.

[44] N. Park, K. Chi, Integration of multitemporal/polarization C-band SAR data sets for land-cover classification, International Journal of Remote Sensing 29 (16) (2008) 4667–4688. doi:10.1080/01431160801947341.

[45] E. Tomppo, O. Antropov, J. Praks, Cropland classification using Sentinel-1 time series: Methodological performance and prediction uncertainty assessment, Remote Sensing 11 (21) (2019). doi:10.3390/rs11212480.

[46] D. B. Nguyen, A. Gruber, W. Wagner, Mapping rice extent and cropping scheme in the Mekong Delta using Sentinel-1A data, Remote Sensing Letters 7 (12) (2016) 1209–1218.

[47] A. Veloso, S. Mermoz, A. Bouvet, T. Le Toan, M. Planells, J.-F. Dejoux, E. Ceschia, Understanding the temporal behavior of crops using Sentinel-1 and Sentinel-2-like data for agricultural applications, Remote Sensing of Environment 199 (2017) 415–426.

[48] G. Satalino, A. Balenzano, F. Mattia, M. W. Davidson, C-band SAR data for mapping crops dominated by surface or volume scattering, IEEE Geoscience and Remote Sensing Letters 11 (2) (2013) 384–388.

[49] F. Vicente-Guijalba, A. Jacob, J. M. Lopez-Sanchez, C. Lopez-Martinez, J. Duro, C. Notarnicola, D. Ziolkowski, A. Mestre-Quereda, E. Pottier, J. J. Mallorquí, M. Lavalle, M. Engdahl, SinCohMap: Land-cover and vegetation mapping using multi-temporal Sentinel-1 interferometric coherence, in: IGARSS 2018 - 2018 IEEE International Geoscience and Remote Sensing Symposium, 2018, pp. 6631–6634. doi:10.1109/IGARSS.2018.8517926.

[50] S. Ge, O. Antropov, W. Su, H. Gu, J. Praks, Deep recurrent neural networks for land-cover classification using Sentinel-1 InSAR time series, in: IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium, 2019, pp. 473–476. doi:10.1109/IGARSS.2019.8900088.

[51] Y. LeCun, Y. Bengio, et al., Convolutional networks for images, speech, and time series, The Handbook of Brain Theory and Neural Networks 3361 (10) (1995) 1995.

[52] X. X. Zhu, D. Tuia, L. Mou, G.-S. Xia, L. Zhang, F. Xu, F. Fraundorfer, Deep learning in remote sensing: a review, arXiv preprint arXiv:1710.03959 (2017).

[53] M. Mahdianpari, B. Salehi, M. Rezaee, F. Mohammadimanesh, Y. Zhang, Very deep convolutional neural networks for complex land cover mapping using multispectral remote sensing imagery, Remote Sensing 10 (7) (2018) 1119.

[54] L. Zhang, L. Zhang, B. Du, Deep learning for remote sensing data: A technical tutorial on the state of the art, IEEE Geoscience and Remote Sensing Magazine 4 (2) (2016) 22–40.

[55] J. Zhang, P. Zhong, Y. Chen, S. Li, ℓ1/2-regularized deconvolution network for the representation and restoration of optical remote sensing images, IEEE Transactions on Geoscience and Remote Sensing 52 (5) (2014) 2617–2627.

[56] X. Chen, S. Xiang, C.-L. Liu, C.-H. Pan, Aircraft detection by deep belief nets, in: Pattern Recognition (ACPR), 2013 2nd IAPR Asian Conference on, IEEE, 2013, pp. 54–58.

[57] X. Chen, S. Xiang, C.-L. Liu, C.-H. Pan, Vehicle detection in satellite images by hybrid deep convolutional neural networks, IEEE Geoscience and Remote Sensing Letters 11 (10) (2014) 1797–1801.

[58] Y. Liu, G. Cao, Q. Sun, M. Siegel, Hyperspectral classification via deep networks and superpixel segmentation, International Journal of Remote Sensing 36 (13) (2015) 3459–3482. doi:10.1080/01431161.2015.1055607.

[59] J. Wang, Q. Qin, Z. Li, X. Ye, J. Wang, X. Yang, X. Qin, Deep hierarchical representation and segmentation of high resolution remote sensing images, in: Geoscience and Remote Sensing Symposium (IGARSS), 2015 IEEE International, IEEE, 2015, pp. 4320–4323.

[60] D. Tuia, R. Flamary, N. Courty, Multiclass feature learning for hyperspectral image classification: Sparse and hierarchical solutions, ISPRS Journal of Photogrammetry and Remote Sensing 105 (2015) 272–285.

[61] F. Hu, G.-S. Xia, J. Hu, L. Zhang, Transferring deep convolutional neural networks for the scene classification of high-resolution remote sensing imagery, Remote Sensing 7 (11) (2015) 14680–14707.

[62] O. A. B. Penatti, K. Nogueira, J. A. dos Santos, Do deep features generalize from everyday objects to remote sensing and aerial scenes domains?, in: 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2015, pp. 44–51. doi:10.1109/CVPRW.2015.7301382.

[63] F. P. Luus, B. P. Salmon, F. Van den Bergh, B. T. J. Maharaj, Multiview deep learning for land-use classification, IEEE Geoscience and Remote Sensing Letters 12 (12) (2015) 2448–2452.

[64] F. Zhang, B. Du, L. Zhang, Scene classification via a gradient boosting random convolutional network framework, IEEE Transactions on Geoscience and Remote Sensing 54 (3) (2016) 1793–1802.

[65] T. Ishii, R. Nakamura, H. Nakada, Y. Mochizuki, H. Ishikawa, Surface object recognition with CNN and SVM in Landsat 8 images, in: 2015 14th IAPR International Conference on Machine Vision Applications (MVA), 2015, pp. 341–344. doi:10.1109/MVA.2015.7153200.

[66] N. Kussul, M. Lavreniuk, S. Skakun, A. Shelestov, Deep learning classification of land cover and crop types using remote sensing data, IEEE Geoscience and Remote Sensing Letters 14 (5) (2017) 778–782.

[67] Y. Chen, Z. Lin, X. Zhao, G. Wang, Y. Gu, Deep learning-based classification of hyperspectral data, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 7 (6) (2014) 2094–2107.

[68] G. Wu, X. Shao, Z. Guo, Q. Chen, W. Yuan, X. Shi, Y. Xu, R. Shibasaki, Automatic building segmentation of aerial imagery using multi-constraint fully convolutional networks, Remote Sensing 10 (3) (2018) 407.

[69] Y. Duan, F. Liu, L. Jiao, P. Zhao, L. Zhang, SAR image segmentation based on convolutional-wavelet neural network and Markov random field, Pattern Recognition 64 (2017) 255–267.

[70] F. Mohammadimanesh, B. Salehi, M. Mahdianpari, E. Gill, M. Molinier, A new fully convolutional neural network for semantic segmentation of polarimetric SAR imagery in complex land cover ecosystem, ISPRS Journal of Photogrammetry and Remote Sensing 151 (2019) 223–236.

[71] L. Wang, X. Xu, H. Dong, R. Gui, F. Pu, Multi-pixel simultaneous classification of PolSAR image using convolutional neural networks, Sensors 18 (3) (2018) 769.

[72] M. Ahishali, S. Kiranyaz, T. Ince, M. Gabbouj, Dual and single polarized SAR image classification using compact convolutional neural networks, Remote Sensing 11 (11) (2019) 1340.

[73] Z. Li, Z. Yang, H. Xiong, Homogeneous region segmentation for SAR images based on two steps segmentation algorithm, in: Computers, Communications and Systems (ICCCS), International Conference on, IEEE, 2015, pp. 196–200.

[74] V. Badrinarayanan, A. Kendall, R. Cipolla, SegNet: A deep convolutional encoder-decoder architecture for image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence 39 (12) (2017) 2481–2495.

[75] H. Zhao, J. Shi, X. Qi, X. Wang, J. Jia, Pyramid scene parsing network, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2881–2890.

[76] C. Yu, J. Wang, C. Peng, C. Gao, G. Yu, N. Sang, BiSeNet: Bilateral segmentation network for real-time semantic segmentation, in: Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 325–341.

[77] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, H. Adam, Encoder-decoder with atrous separable convolution for semantic image segmentation, arXiv preprint arXiv:1802.02611 (2018).

[78] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, A. L. Yuille, DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs, IEEE Transactions on Pattern Analysis and Machine Intelligence 40 (4) (2018) 834–848.

[79] O. Ronneberger, P. Fischer, T. Brox, U-Net: Convolutional networks for biomedical image segmentation, in: Medical Image Computing and Computer-Assisted Intervention (MICCAI), Vol. 9351 of LNCS, Springer, 2015, pp. 234–241 (available on arXiv:1505.04597 [cs.CV]). URL http://lmb.informatik.uni-freiburg.de/Publications/2015/RFB15a

[80] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, H. Adam, MobileNets: Efficient convolutional neural networks for mobile vision applications, arXiv preprint arXiv:1704.04861 (2017).

[81] T. Pohlen, A. Hermans, M. Mathias, B. Leibe, Full-resolution residual networks for semantic segmentation in street scenes, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4151–4160.

[82] S. Jegou, M. Drozdzal, D. Vazquez, A. Romero, Y. Bengio, The one hundred layers tiramisu: Fully convolutional DenseNets for semantic segmentation, in: Computer Vision and Pattern Recognition Workshops (CVPRW), 2017 IEEE Conference on, IEEE, 2017, pp. 1175–1183.

[83] O. Antropov, Y. Rauste, A. Lönnqvist, T. Häme, PolSAR mosaic normalization for improved land-cover mapping, IEEE Geoscience and Remote Sensing Letters 9 (6) (2012) 1074–1078.

[84] J. Long, E. Shelhamer, T. Darrell, Fully convolutional networks for semantic segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431–3440.

[85] D. H. Hubel, T. N. Wiesel, Receptive fields, binocular interaction and functional architecture in the cat's visual cortex, The Journal of Physiology 160 (1) (1962) 106–154.

[86] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al., Imagenet large scale visual recognition challenge, International Journal of Computer Vision 115 (3) (2015) 211–252.

[87] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.

[88] G. Huang, Z. Liu, L. Van Der Maaten, K. Q. Weinberger, Densely connected convolutional networks, in: CVPR, Vol. 1, 2017, p. 3.

[89] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, Going deeper with convolutions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1–9.

[90] S. Ji, W. Xu, M. Yang, K. Yu, 3D convolutional neural networks for human action recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (1) (2013) 221–231.

[91] T. N. Sainath, A.-r. Mohamed, B. Kingsbury, B. Ramabhadran, Deep convolutional neural networks for LVCSR, in: Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, IEEE, 2013, pp. 8614–8618.

[92] D. Small, L. Zuberbühler, A. Schubert, E. Meier, Terrain-flattened gamma nought Radarsat-2 backscatter, Canadian Journal of Remote Sensing 37 (5) (2012) 493–499.

[93] P. Härmä, R. Teiniranta, M. Törmä, R. Repo, E. Järvenpää, M. Kallio, The production of Finnish CORINE land cover 2000 classification, XXth ISPRS Congress, Istanbul, Turkey (2004).

[94] P. Härmä, R. Teiniranta, M. Törmä, R. Repo, E. Järvenpää, M. Kallio, Finnish CORINE land cover 2000 classification, XXth ISPRS Congress, Anchorage, US (2004).

[95] A. Garcia-Garcia, S. Orts-Escolano, S. Oprea, V. Villena-Martinez, J. Garcia-Rodriguez, A review on deep learning techniques applied to semantic segmentation, arXiv preprint arXiv:1704.06857 (2017).

[96] F. Chollet, Xception: Deep learning with depthwise separable convolutions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1251–1258.

[97] O. Ronneberger, P. Fischer, T. Brox, U-Net: Convolutional networks for biomedical image segmentation, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2015, pp. 234–241.

[98] L.-C. Chen, G. Papandreou, F. Schroff, H. Adam, Rethinking atrous convolution for semantic image segmentation, arXiv preprint arXiv:1706.05587 (2017).

[99] Y. Bengio, Deep learning of representations for unsupervised and transfer learning, in: Proceedings of ICML Workshop on Unsupervised and Transfer Learning, 2012, pp. 17–36.

[100] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, et al., Tensorflow: A system for large-scale machine learning, in: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), 2016, pp. 265–283.

[101] H. Costa, G. M. Foody, D. S. Boyd, Supervised methods of image segmentation accuracy assessment in land cover mapping, Remote Sensing of Environment 205 (2018) 338–351.

[102] G. Csurka, D. Larlus, F. Perronnin, F. Meylan, What is a good evaluation measure for semantic segmentation?, in: BMVC, Vol. 27, Citeseer, 2013, p. 2013.

[103] J. Cohen, A coefficient of agreement for nominal scales, Educational and Psychological Measurement 20 (1) (1960) 37–46.

[104] O. Antropov, Y. Rauste, T. Häme, Volume scattering modeling in PolSAR decompositions: Study of ALOS PALSAR data over boreal forest, IEEE Transactions on Geoscience and Remote Sensing 49 (10) (2011) 3838–3848.


Figure 4: The architecture of BiSeNet. ARM stands for the Attention Refinement Module and FFM for the Feature Fusion Module introduced in the model's paper [76].

Figure 5: The architecture of SegNet-based Encoder-Decoder with Skip connections [74]. Blue tiles represent Convolution + Batch Normalisation + ReLU, green tiles represent Pooling, red – Upsampling, and yellow – a softmax operation.

omitted. Hence, the novelty of this network lies in its decoder part, as follows. The decoder consists of one decoder layer for each encoder layer, so it also has 13 layers. Each individual decoder layer utilizes max-pooling indices memorized from its corresponding encoder feature map. The authors have shown that this enhances boundary delineation between classes. Finally, the decoder output is sent to a multi-class soft-max function, yielding a classification for each pixel (see Figure 5).

3.4.3. Mobile U-Net

Mobile U-Net is based on the U-Net [97] semantic segmentation architecture, shown in Figure 6. In designing U-Net, the fully convolutional approach was generally employed, with the following modification: the upsampling part of the architecture has no fully connected layers, but is nearly symmetrical to the feature extraction part, due to the use of similar feature maps. This results in a u-shaped architecture (see Figure 6), hence the name of the model. While originally developed for biomedical images, the U-Net architecture has proven successful for image segmentation in other domains as well. Here, we somewhat modify the U-Net architecture according to the MobileNets [80] framework to improve its efficiency. In particular, the MobileNets framework uses depthwise separable convolutions, a form which factorizes standard convolutions (e.g., 3 × 3) into a depthwise convolution (applied separately to each input band) and a pointwise (1 × 1) convolution to combine the outputs of the depthwise convolution.
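To make the factorization concrete, the following minimal tf.keras sketch contrasts a standard convolution with its depthwise separable counterpart; the 64-filter width and the input size are ours, chosen for illustration only.

    import tensorflow as tf

    # Standard 3x3 convolution: spatial filtering and channel mixing
    # happen in one jointly parameterized operation.
    standard = tf.keras.layers.Conv2D(filters=64, kernel_size=3, padding="same")

    # Depthwise separable version: a per-band 3x3 depthwise convolution
    # followed by a pointwise 1x1 convolution that recombines the bands.
    separable = tf.keras.Sequential([
        tf.keras.layers.DepthwiseConv2D(kernel_size=3, padding="same"),
        tf.keras.layers.Conv2D(filters=64, kernel_size=1),
    ])

    x = tf.random.normal((1, 512, 512, 3))        # one imagelet-sized input
    print(standard(x).shape, separable(x).shape)  # both: (1, 512, 512, 64)

For this layer, the factorization cuts the weight count from 3·3·3·64 = 1,728 to 3·3·3 + 3·64 = 219 (biases aside), which is the efficiency gain MobileNets exploit.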

Figure 6: The architecture of U-Net [97].

3.4.4. DeepLab-V3+

DeepLab-V3+ [77] is an improved version of DeepLab-V3 [98], while the latter is an improved version of the original DeepLab [78] model. This segmentation model does not follow the FCN framework like the previously discussed models. The main features that distinguish the DeepLab model from FCNs are the atrous convolutions for upsampling, and the application of probabilistic machine learning models, concretely conditional random fields (CRFs), for a finer localization accuracy in the final fully connected layer. Atrous convolutions, in particular, allow enlarging the context from which the next-layer feature maps are learned, while preserving the number of parameters (and thus the same efficiency). Using a chain of atrous convolutions allows computing the final output layer of a CNN at an arbitrarily high resolution (removing the need for the upsampling part used in FCNs). In the follow-up work proposing DeepLab-V3, Chen et al. [98] change the approach to atrous convolutions to gradually double the atrous rates, and show that the adapted version of their algorithm outperforms the previous one even without the fully connected CRF layer. Finally, in the newest adaptation of the model, called DeepLab-V3+, Chen et al. [77] turn to an approach similar to the FCNs, i.e., they add a decoder module to the architecture (see Figure 7). That is, they employ the features extracted by the DeepLab-V3 module in the encoder part, and add a decoder module consisting of 1 × 1 and 3 × 3 convolutions.
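The effect of the atrous rate is easy to verify in a few lines; the sketch below (tf.keras, with illustrative rates) shows that the number of kernel weights and the output resolution stay unchanged as the dilation rate grows, while the receptive field widens.

    import tensorflow as tf

    x = tf.random.normal((1, 512, 512, 3))
    for rate in (1, 2, 4):  # rate 1 is an ordinary convolution
        conv = tf.keras.layers.Conv2D(filters=64, kernel_size=3,
                                      dilation_rate=rate, padding="same")
        y = conv(x)
        # 9 taps per filter regardless of the rate; only their spacing grows.
        print(rate, y.shape, conv.kernel.shape)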

Figure 7: The architecture of DeepLabV3+ [77].

3.4.5. FRRN-B (Full-Resolution Residual Networks)

Figure 8: The architecture of FRRN-B. RU n and FRRU n stand for residual units and full-resolution residual units with n-channel convolutions, respectively. FRRUs simultaneously operate on the two streams [81].

As we have seen, most of the semantic segmentation architectures are based on some form of an FCN, and so they utilize existing classification networks, such as ResNet or VGG16, as encoders. We also discussed the main reason for such approaches, which is to take advantage of the learned weights from those architectures, pretrained for the classification task. Nevertheless, one disadvantage of the FCN approach is that the resulting network outputs of the encoder part (particularly after the pooling operations) are at a lower resolution, which deteriorates the localization performance of the overall segmentation model. Pohlen et al. [81] proposed to tackle this by having two parallel network streams processing the input image: a pooling and a residual stream (Figure 8). As the name says, the pooling stream performs successive pooling and then unpooling operations, and it serves to obtain good recognition of the objects and classes. The residual stream computes residuals at the full image resolution, which enables low-level features, i.e., object pixel-level locations, to be propagated to the network output. The name of the model comes from its building blocks, i.e., full-resolution residual units. Each such unit simultaneously operates on the pooling and the residual stream. In the original paper [81], the authors propose two alternative architectures, FRRN-A and FRRN-B, and they show that FRRN-B achieves superior performance on the Cityscapes benchmark dataset. Hence, we employ the FRRN-B architecture.

3.4.6. PSPNet (Pyramid Scene Parsing Network)

Figure 9: The architecture of PSPNet [75].

Zhao et al. [75] propose the pyramid scene parsing network as a solution to the challenge of making local predictions based on the local context only, without considering the global image scene. In remote sensing, an example of this challenge could be a model wrongly predicting water with waves in it as the dry vegetation class, because the two appear similar and the model did not consider that these pixels are part of a larger water surface, i.e., it missed the global context. Similarly to the other FCN-based approaches, PSPNet uses a pre-trained classification architecture to extract the feature map, in this case ResNet. The main module of this network is the pyramid pooling, which is enclosed by a square in Figure 9. As can be seen in the figure, this module fuses features at four scales, from the coarse (red) to the fine (green). Hence, the output of each level in the pyramid pooling module contains the feature map of a different resolution. In the end, the different features are stacked together, yielding the final pyramid pooling global feature for predictions.
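A minimal sketch of the pyramid pooling module is given below. The bin sizes (1, 2, 3, 6) follow the PSPNet paper, while the 64-channel projection width and the use of tf.image.resize for the upsampling step are our simplifications.

    import tensorflow as tf

    def pyramid_pooling(features, bin_sizes=(1, 2, 3, 6)):
        """Pool the feature map into coarse grids, project each with a 1x1
        convolution, upsample back to full size and stack with the input."""
        h, w = features.shape[1], features.shape[2]
        outputs = [features]
        for size in bin_sizes:
            x = tf.keras.layers.AveragePooling2D(
                pool_size=(h // size, w // size))(features)
            x = tf.keras.layers.Conv2D(filters=64, kernel_size=1)(x)
            x = tf.image.resize(x, (h, w), method="bilinear")
            outputs.append(x)
        return tf.keras.layers.Concatenate()(outputs)

    fused = pyramid_pooling(tf.random.normal((1, 64, 64, 256)))
    print(fused.shape)  # (1, 64, 64, 256 + 4 * 64)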

3.4.7. FC-DenseNet (Fully Convolutional DenseNets)

This semantic segmentation algorithm is built using the DenseNet CNN [88] as a basis for the encoder, followed by applying the FCN approach [82]. The specificity of the DenseNet architecture is the presence of blocks in which each layer is connected to all other layers in a feed-forward manner. Figure 10 shows the architecture of FC-DenseNet, where the blocks are represented by the Dense Block units. According to [88], such an architecture scales well to hundreds of layers without any optimization issues, while yielding excellent results in classification tasks. In order to efficiently upsample the DenseNet feature maps, Jegou et al. [82] substitute the upsampling convolutions of FCNs with Dense Blocks and Transitions Up. The Transition Up modules consist of transposed convolutions, which are then concatenated with the outputs from the input skip connection (the dashed lines in Figure 10).

Figure 10: The architecture of FC-DenseNet [82].
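The dense connectivity underlying the model can be summarized in a few lines. The following is a sketch of a single Dense Block with an illustrative layer count and growth rate, not the exact FC-DenseNet configuration.

    import tensorflow as tf

    def dense_block(x, num_layers=4, growth_rate=16):
        """DenseNet connectivity: every layer receives the concatenation
        of the block input and all previously produced feature maps."""
        for _ in range(num_layers):
            y = tf.keras.layers.BatchNormalization()(x)
            y = tf.keras.layers.ReLU()(y)
            y = tf.keras.layers.Conv2D(growth_rate, kernel_size=3,
                                       padding="same")(y)
            x = tf.keras.layers.Concatenate()([x, y])
        return x  # channel count grows by num_layers * growth_rate

In FC-DenseNet, the upsampling path combines such blocks with the Transition Up modules and the skip connections shown in Figure 10.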

3.5. Training approach

To accomplish better segmentation performance, there is an option to pre-train the semantic segmentation models (in particular, their encoder modules) using a larger set of available images of another type (such as natural images). By using a model pre-trained on natural images and continuing its training with the limited set of SAR images, the knowledge becomes effectively transferred from the natural-image task to the SAR task [99]. To accomplish such a transfer, we used models whose encoders were pre-trained for the ImageNet classification task, and fine-tuned them using our SAR dataset (described next).
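A minimal sketch of this transfer setup is given below. ResNet50 serves only as a stand-in encoder (Table 3 lists the encoders actually used by the benchmarked models), and the one-layer decoder head is deliberately naive; the point is the loading of ImageNet weights followed by fine-tuning on the SAR data.

    import tensorflow as tf

    # Encoder initialized with ImageNet weights (the transferred knowledge).
    encoder = tf.keras.applications.ResNet50(include_top=False,
                                             weights="imagenet",
                                             input_shape=(512, 512, 3))
    # Naive decoder head: per-pixel scores for the 5 Level-1 CORINE classes.
    x = tf.keras.layers.Conv2D(filters=5, kernel_size=1)(encoder.output)
    x = tf.keras.layers.UpSampling2D(size=32, interpolation="bilinear")(x)
    model = tf.keras.Model(encoder.input, tf.keras.layers.Softmax()(x))

    # Fine-tuning on the SAR imagelets then proceeds as usual (the dataset
    # name below is a placeholder):
    # model.compile(optimizer=tf.keras.optimizers.RMSprop(1e-4),
    #               loss="sparse_categorical_crossentropy")
    # model.fit(sar_rgb_dem_dataset, epochs=...)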

3.6. Experimental Setup

In this section, we first describe how we prepared the SAR images for training with the deep learning models, which are designed for natural images. Then we provide the details of our implementation.

3.6.1. SAR Data Preprocessing for Deep Learning

Sentinel-1 imagery comes in two polarization channels, each of them being informative about certain types of land cover. Hence, using their combination is expected to yield better land cover mapping results than using either of them independently. Moreover, previous work has shown the benefits of also using a DEM for land cover mapping [9]. Hence, as the third layer, we used the DEM of Finland from the National Land Survey.

The SAR backscatter for both polarizations was converted to decibels by applying the $10 \cdot \log_{10}$ transformation. In addition, for the deep learning models, each band should be normalized so that the distribution of the pixel values resembles a Gaussian distribution centered at zero; this is done to yield faster convergence during training. To normalize the data, the mean of all pixels is subtracted from each pixel value, and the result is divided by their standard deviation. In addition, given that the semantic segmentation models expect pixel values in the range (0, 255), we scaled the normalized data, and also the DEM values, to this range. The layers preprocessed in this way are then used to create the image dataset for training.

We named the created dataset SAR RGB-DEM. The naming comes from the process used to create its images: one of the two channels of a Sentinel-1 image is assigned to the R and the other to the G channel, while for the third, B channel, we use the DEM layer.
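The composition of a training image can be sketched in a few lines of NumPy; the use of min-max rescaling to reach the (0, 255) range is our assumption about the implementation detail.

    import numpy as np

    def make_sar_rgb_dem(vv, vh, dem):
        """Stack a Sentinel-1 scene and the DEM into one training image:
        backscatter to decibels, zero-mean/unit-variance normalisation per
        band, rescaling to (0, 255), and stacking as R, G, B channels."""
        def prep(band):
            band = (band - band.mean()) / band.std()            # standardize
            band = (band - band.min()) / (band.max() - band.min())
            return 255.0 * band                                  # to (0, 255)
        return np.dstack([prep(10.0 * np.log10(vv)),             # R: VV in dB
                          prep(10.0 * np.log10(vh)),             # G: VH in dB
                          prep(dem)])                            # B: elevation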

3.6.2. Train/Development and Test (Accuracy Assessment) Dataset

The images from the SAR RGB-DEM dataset needed to be split into 512 px × 512 px partial images (referred to below as imagelets) for training. Thus, each imagelet represented an area of roughly 10 × 10 km². The first reason for this preprocessing is the square shape: some of the selected models required square-shaped images. Other models were flexible with the image shape and size, but we wanted to keep the setup identical for all the models so that their results are comparable. The second reason is computational capacity: with our hardware setup (described below), this was the largest image size we could work with.

Upon splitting the SAR RGB-DEM images, we discarded those imagelets that fell completely outside the land mass area, as well as those for which we did not have a complete CORINE label (for instance, if they fell partly outside the Finnish borders). This resulted in more than 7,000 imagelets of size 512 px × 512 px.

Given the geography of Finland, to have representative training data it is useful to include imagelets from both the northern and southern (including the large cities) parts of the country in the model training. On the other hand, noticeable differences are also found in the gradient from east to west. To achieve a representative training dataset, we selected all imagelets between the longitudes of 24° and 28° for the accuracy assessment (model testing), and all the other imagelets for model training (that is, training and development in the computer vision terminology). In this way, we prevented the situation in which two images of the same area, acquired at different times, are used one for training and the other for testing. Images overlapping any border of the introduced strip were discarded. The procedure resulted in 3,104 images in the training and development set and 3,784 images in the test (accuracy assessment) set. Finally, we used 60% of the training and development set for training and the rest for development of the deep learning models.
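In code, the geographic split amounts to the following sketch; the imagelet objects and their longitude attributes are illustrative names.

    def split_by_longitude(imagelets, west=24.0, east=28.0):
        """Assign imagelets to the 24-28 degree test strip or to the
        training/development pool; imagelets crossing either border of
        the strip are discarded, as described above."""
        train_dev, test = [], []
        for im in imagelets:
            if im.max_lon < west or im.min_lon > east:
                train_dev.append(im)       # entirely outside the strip
            elif west <= im.min_lon and im.max_lon <= east:
                test.append(im)            # entirely inside the strip
            # else: overlaps a border -> dropped
        return train_dev, test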

Table 3: The properties of the examined semantic segmentation architectures.

    Architecture    Base model        Parameters
    BiSeNet         ResNet101         24.75M
    SegNet          VGG16             34.97M
    Mobile U-Net    Not applicable     8.87M
    DeepLabV3+      ResNet101         47.96M
    FRRN-B          ResNet101         24.75M
    PSPNet          ResNet101         56M
    FC-DenseNet     ResNet101          9.27M

3.6.3. Data Augmentation

Further, we employed the data augmentation technique. The main idea behind data augmentation is to enable improved learning by reusing original images with slight transformations, such as rotation, flipping, adding Gaussian noise, or slightly changing the brightness. This provides additional information to the model, and the dataset size is effectively increased. Moreover, an additional benefit of data augmentation is in helping the model to learn some invariant data properties for which no examples are present in the original dataset. Given the sensitivity of the SAR backscatter, we did not want to augment the images in terms of color or brightness, or by adding noise. However, we could safely employ rotations and flipping. For rotations, we used only 90° increments, giving three possible rotated versions of an image. For image flipping, we applied horizontal and vertical flipping, or both at the same time, giving another three possible versions of the original image (vertical flipping reflects the image along its central horizontal axis, and horizontal flipping along its central vertical axis). Notice that our images are square, so the transformations did not change the image dimensions. Finally, we applied online augmentation, as opposed to the offline version: in the online process each augmented image is seen only once, and so this process yields a network that generalises better.
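The geometric augmentations described above can be implemented compactly; the sketch below applies a random 90° rotation and random flips identically to an imagelet and its label mask, with the selection probabilities being our choice.

    import numpy as np

    def augment(image, label):
        """Online augmentation: a random multiple-of-90-degree rotation
        plus random horizontal/vertical flips of image and label alike."""
        k = np.random.randint(4)              # 0, 90, 180 or 270 degrees
        image, label = np.rot90(image, k), np.rot90(label, k)
        if np.random.rand() < 0.5:            # horizontal flip
            image, label = image[:, ::-1], label[:, ::-1]
        if np.random.rand() < 0.5:            # vertical flip
            image, label = image[::-1], label[::-1]
        return image, label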

3.6.4. Implementation

To apply the described semantic segmentation models, we adapted the open-source Semantic Segmentation Suite. We used Python with the TensorFlow [100] backend.

3.6.5. Hardware and Training Setup

We trained and tested each of the deep learning models separately on a single GPU (NVIDIA GeForce GTX 1080) in a machine with 32 GB of RAM.

We used the RMSProp optimisation algorithm, a learning rate of 0.0001, and a learning-rate decay of 0.9954. Each model was trained for an equal number of epochs (500), and during the process the checkpoint for the best model was saved. We then used that model for evaluation on the test set, and we report those results.
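In tf.keras terms, the described schedule and checkpointing correspond roughly to the following sketch; we assume the 0.9954 decay is applied once per epoch, which the text does not state explicitly.

    import tensorflow as tf

    def lr_schedule(epoch, lr):
        # Exponential decay from the initial rate of 1e-4, factor 0.9954.
        return 1e-4 * (0.9954 ** epoch)

    callbacks = [
        tf.keras.callbacks.LearningRateScheduler(lr_schedule),
        tf.keras.callbacks.ModelCheckpoint("best_model.h5",
                                           monitor="val_loss",
                                           save_best_only=True),
    ]
    # model.fit(train_data, validation_data=dev_data,
    #           epochs=500, callbacks=callbacks)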

3.7. Evaluation Metrics

In a review of the metrics used in land cover classification, Costa et al. [101] found a lack of consistency, complicating the intercomparison of different studies. To avoid such issues and ensure that our results are easily comparable with the literature, we evaluated our models thoroughly. For each model and class we report the following measures of accuracy: precision, also known as user's accuracy (UA); recall, also known as producer's accuracy (PA); the overall accuracy; and the Kappa coefficient. The formulas are as follows.

For each segmentation class (land cover type) $c$ we calculate precision (user's accuracy),

$$P_c = \frac{Tp_c}{Tp_c + Fp_c},$$

and recall (producer's accuracy),

$$R_c = \frac{Tp_c}{Tp_c + Fn_c},$$

where $Tp_c$ represents true positive, $Fp_c$ false positive, and $Fn_c$ false negative pixels for the class $c$.

When it comes to accuracy [102], we calculate the per-class accuracy (effectively, the recall obtained on each class),

$$Acc_c = \frac{C_{ii}}{G_i},$$

and the overall pixel accuracy,

$$Acc_{OP} = \frac{\sum_{i=1}^{L} C_{ii}}{\sum_{i=1}^{L} G_i},$$

where $C_{ij}$ is the number of pixels having ground truth label $i$ and being classified/predicted as $j$, $G_i$ is the total number of pixels labelled with $i$, and $L$ is the number of classes. All these metrics can take values from 0 to 1.

Finally, we also use the Kappa statistic (Cohen's measure of agreement), indicating how the classification results compare to values assigned by chance [103]. The Kappa statistic can take values from 0 to 1. Starting from a $k \times k$ confusion matrix with elements $f_{ij}$, the following calculations are done:

$$P_o = \frac{1}{N}\sum_{j=1}^{k} f_{jj}, \qquad (1)$$

$$r_i = \sum_{j=1}^{k} f_{ij} \;\;\forall i \quad \text{and} \quad c_j = \sum_{i=1}^{k} f_{ij} \;\;\forall j, \qquad (2)$$

$$P_e = \frac{1}{N^2}\sum_{i=1}^{k} r_i c_i, \qquad (3)$$

where $P_o$ is the observed proportional agreement (effectively the overall accuracy), $r_i$ and $c_j$ are the row and column totals for classes $i$ and $j$, and $P_e$ is the expected proportion of agreement. The final measure of agreement is given by the statistic [103]:

$$\kappa = \frac{P_o - P_e}{1 - P_e}. \qquad (4)$$

Depending on the value of Kappa, the observed agreement is considered either poor (0.0 to 0.2), fair (0.2 to 0.4), moderate (0.4 to 0.6), good (0.6 to 0.8), or very good (0.8 to 1.0).
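For concreteness, the sketch below computes all of the above metrics from a confusion matrix with NumPy, assuming rows hold the reference labels and columns the predictions (the orientation used in Table 5).

    import numpy as np

    def metrics_from_confusion(f):
        # f[i, j]: pixels with reference label i predicted as class j.
        f = np.asarray(f, dtype=np.float64)
        n = f.sum()
        tp = np.diag(f)
        precision = tp / f.sum(axis=0)        # user's accuracy P_c per class
        recall = tp / f.sum(axis=1)           # producer's accuracy R_c per class
        p_o = tp.sum() / n                    # overall accuracy, Eq. (1)
        r, c = f.sum(axis=1), f.sum(axis=0)   # row/column totals, Eq. (2)
        p_e = (r * c).sum() / n ** 2          # chance agreement, Eq. (3)
        kappa = (p_o - p_e) / (1.0 - p_e)     # Eq. (4)
        return precision, recall, p_o, kappa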

4. Results and Discussion

Using the experimental setup described in the previous section, we evaluated the seven selected semantic segmentation models: SegNet [74], PSPNet [75], BiSeNet [76], DeepLabV3+ [77, 78], U-Net [79, 80], FRRN-B [81], and FC-DenseNet [82]. The overall classification performance statistics for all studied models are gathered in Table 4. Figure 11 shows maps produced for several imagelets with the best performing model, FC-DenseNet. The obtained results are compared to prior work, and the classification performance for the different land cover classes is discussed further below.

4.1. Classification Performance

All the models performed relatively well in terms of classification, achieving an overall accuracy above 83%. Three models performed particularly well, achieving an accuracy score above 89%: SegNet, FRRN-B, and the best performing model, FC-DenseNet, which achieved an accuracy of 90.7%.

Before further analysis, let us recall that CORINE is not exclusively a land cover map, but rather a land cover and land use map; thus, specific classes can differ from the ecological classes observed by Sentinel-1. Also, the aggregation to Level-1 is sometimes not strictly "ecological", nor does it comply with surface scattering physics considerations. For example, roads, airports, major industrial areas, and the road network often exhibit areas similar to fields.

Table 4: Summary of the classification performance and efficiency of the various deep learning models (UA - user's accuracy, PA - producer's accuracy; the average inference time is per image in the dataset). Per-class entries are UA/PA in percent.

    LC classes (test scale, km²)                 BiSeNet  DeepLabV3+  SegNet  FRRN-B  U-Net   PSPNet  FC-DenseNet
    Urban fabric (100) (10816.26)                21/15    14/36       31/38   30/45   25/38   18/-    62/27
    Agricultural areas (200) (25160.49)          51/50    49/69       66/68   68/66   66/53   48/-    72/71
    Forested areas (300) (285462)                90/91    88/96       93/94   92/95   92/95   89/95   93/96
    Peatland, bogs and marshes (400) (20990.54)  43/56    13/67       57/71   55/70   52/65   31/-    74/58
    Water bodies (500) (53564)                   85/91    94/92       96/96   95/96   96/96   94/94   96/96
    Overall accuracy (%)                         83.86    85.49       89.03   89.27   89.25   86.51   90.66
    Kappa                                        0.641    0.649       0.754   0.758   0.754   0.680   0.785
    Average inference time (s)                   0.0389   0.0267      0.0761  0.1424  0.0848  0.0495  0.1930

Table 5: Confusion matrix for classification with the FC-DenseNet model (FC-DenseNet103). Rows are the CLC2012 reference classes, columns the Sentinel-1 classification.

                 urban       water        forest       field       peatland    total        PA (%)
    urban        7301999     413073       15892771     3212839     221476      27042158     27.0
    water        78331       128294872    3457634      171029      1935276     133937142    95.8
    forest       3663698     2703632      686788977    12795703    7730444     713682454    96.2
    field        766200      121609       16527970     44866048    620934      62902761     71.3
    peatland     56097       1866020      19164137     1091008     30309189    52486451     57.8
    total        11866325    133399206    741831489    62136627    40817319    990050966
    UA (%)       61.5        96.2         92.6         72.2        74.3                     90.7

Figure 11: Illustration of the FC-DenseNet model performance: a selection of classification results, i.e., direct output of the network without any post-processing (bottom row), versus the reference CORINE data (upper row).

The presence of trees and green vegetation near summer cottages can cause such areas to exhibit signatures closer to forest than to urban; sometimes forest on rocky terrain can be misclassified as urban, due to the presence of very bright targets and strong disruptive features, while confusion between peatland and field areas is also common. Finally, the accuracy of the CORINE data itself is only somewhat higher than 90%.

As for the results across the different land cover classes, all the models performed particularly well in recognising water bodies and forested areas, while the urban fabric represented the most challenging class for all the models. We expect that the inclusion of the DEM as one layer in the training images helped to achieve the good results on the water bodies class for most of the models (except for BiSeNet, all the models achieved both the user's and producer's accuracy above 90%). The urban class was particularly challenging for the following main reasons. First, this class changes the most, as new houses, roads, and urban areas are built. While we took the most suitable available CORINE class in terms of time for our Sentinel-1 images, there are almost certain differences between the urban class as it was in 2012 and in 2015-2016. Second, the CORINE map itself does not have perfect accuracy, nor are its aggregation rules perfect. As a matter of fact, in the majority of studies where SAR-based classification was compared against CLC or similar data, a poor or modest overall agreement was observed for this class [21, 41, 83, 20], while the user's accuracy was considerably higher than the producer's [104]. The latter is exactly due to radar being able to sense sharp boundaries and bright targets very well, whereas such bright targets often do not dominate the whole CORINE Level-1 urban class. We argue that any inaccuracies present will be particularly pronounced in our models for the urban class, because of the sharp and sudden boundary changes in this class, unlike for the others such as forest and water. The top performing model, i.e., FC-DenseNet, performed the best across all the classes. It is particularly notable that it achieved a user's accuracy, i.e., precision, for the urban class of 62%, improving on all the other models significantly. Nevertheless, its score on the producer's accuracy, i.e., recall, for this class (27%) is outperformed by the two other top models, i.e., SegNet and FRRN-B.

We mentioned the issues of SAR backscattering sensitivity to several ground factors, so that the same classes might appear differently in the images between countries, or between distant areas within a country. An interesting indication of our study, however, is that the deep learning models might be able to deal with this issue. Namely, we used models pre-trained on ImageNet and fine-tuned them with a relatively small number (14) of Sentinel-1 scenes. The models learned to recognize the varying types of backscattering signal across the country of Finland. This indicates that, with a similar type of fine-tuning, the present models could be relatively easily adapted to other areas and countries with different SAR backscattering patterns. Such robustness and adaptability of the deep learning models come from their automatic learning of feature representations, without the need for a human to pre-define those features.

4.2. Computational Performance

The training times with our hardware configuration ranged from 6 days up to 2 weeks for the different models. This could be significantly improved by training each model on a multi-GPU system instead of the single GPU we used.

In terms of the inference time, we also saw differences in performance. In Table 4, we present the average inference time per 512 px × 512 px imagelet that we worked with. The results show that there is a trade-off between classification and computational performance: the best models in terms of classification results (i.e., FC-DenseNet and FRRN-B) require several times longer inference than the rest. Depending on the application, this might not be of particular importance.

4.3. Comparison to Similar Work

The obtained results compare favourably to previous similar studies on land cover classification with SAR data [83, 20, 21, 41, 28, 31]. Depending on the level of class aggregation (4-5 major classes or more), and using mostly statistical or classical machine learning approaches, the reported classification accuracies ranged from as high as 80-87% down to as low as 30% when only SAR imagery was used.

Two recent studies that employed neural networks for SAR imagery classification (albeit in combination with satellite optical data) for land cover mapping were [28] and [66], with reported classification accuracies of up to 97.5% and 94.6%, respectively.

The best model in our experiments achieved an overall accuracy of 90.7%. However, our results are obtained using solely the SAR imagery. In contrast, SAR imagery (PALSAR) alone yielded an overall accuracy of 78.1% in [28]. The types of classes they studied are also different compared to ours (crops versus vegetation versus land cover types), and our study is performed on a larger area. Importantly, the previous studies have applied different types of models (regular NNs versus CNNs versus semantic segmentation). In particular, the CNN models work on 7 × 7 resolution windows, while we have applied more advanced semantic segmentation models, which work on the level of a pixel. Keeping in mind the findings from [28] that the addition of optical images on top of SAR improved the results by over 10%, we expect that our models would perform comparably well or outperform these previous works if applied to combined SAR and optical imagery.

In terms of the deep learning setup, the studies most similar to ours are [53] and [70]. However, RapidEye optical imagery at 5 m spatial resolution was used in [53], and the test site was considerably smaller. Study [70], similarly to our research, relied exclusively on SAR imagery; however, it used fully polarimetric images, acquired by RADARSAT-2 at a considerably better resolution. The authors developed an FCN-type semantic segmentation model 'specifically designed for the classification of wetland complexes using PolSAR imagery'. Using this model to classify eight wetland map classes, they achieved an overall accuracy of 93%. However, because their model is designed specifically for wetland complexes, it is not clear whether such a model would generalize to other types of areas. Compared to our study, they focused on a considerably smaller area (nearly the size of a single imagelet we used) and on a very specific task (wetland type mapping). Thus, it is not readily clear how general their approach is and how it compares to our presented approach.

4.4. Outlook and Future Work

There are several lines of potential improvement based on the results of this study, as well as future work directions.

First, using an even larger set of Sentinel-1 images can be recommended, since large amounts of data are crucial for supervised deep learning models. Here we processed only 6,888 imagelets altogether, but deep learning algorithms typically become efficient only once they are trained with hundreds of thousands or millions of images.

Second, if SAR images and reference data of a higher resolution are used, we expect better classification performance too, as smaller details could potentially be captured. Also, better agreement in the acquisition timing of the reference and SAR imagery can be recommended: the reference and training data should come from the same months or year if possible, and the reference maps should represent the reality as accurately as possible. The models in our experiments were certainly limited by CORINE's own limited accuracy.

Third in this study we have tested the effectiveness of off-the-shelf deeplearning models for land cover mapping from SAR data While the results showtheir effectiveness it is also likely that the novel types of models specificallydeveloped for the radar data (such as [70]) will yield even better results Basedon our results we suggest DenseNet-based models as a starting point In par-ticular one could develop the deep learning models to handle directly the SLCdata which preserve the phase information

Focusing on a single season is both an advantage and a limitation Impor-tantly we have avoided confusion between SAR signatures varying seasonallyfor several land cover classes However multitemporal dynamics itself can bepotentially used as an additional useful class-discriminating parameter Incor-porating seasonal dynamics of each land cover pixel (as a time series) is leftfor future work perhaps with additional need to incorporate recurrent neuralnetworks into the approach

As discussed in Section 311 it could be suitable to use more detailed(specific) land cover classes as aggregation of smaller LC classes into Level-1 CORINE classes is not exactly ecological leading to mixing several distinctSAR signatures in one class and thus causing additional confusion for the clas-sifier Later classified specific classes can be aggregated into larger classespotentially showing improved performance [19]

Finally we have used only SAR images and a freely-available DEM modelfor the presented large-scale land cover mapping If one were to combine othertype of remote sensing images in particular the optical images we expect thatthe results would significantly improve This is true for those areas where suchimagery can be collected due to cloud coverage while in operational scenario itwould potentially require use of at least two models (with and without opticalsatellite imagery) It is also important to access added value of SAR imagerywith deep learning models when optical satellite images are available as wellas possible data fusion and decision fusion scenarios before a decision on themapping approach is done [19]

5 Conclusion

Our study demonstrated the potential for applying state-of-the-art seman-tic segmentation models to SAR image classification with high accuracy Sev-eral models were benchmarked in a countrywide classification experiment usingSentinel-1 IW-mode SAR data reaching nearly 91 overall classification accu-racy with the best performing model (FC-DenseNet) Given that the 14 usedSentinel-1 scenes resulted in 7K training images this indicates strong poten-tial for using pre-trained CNNs for further fine-tuning and seems particularlysuitable when the number of training images is limited (to thousand or tensof thousands instead of millions) In addition to suggesting the best candidate

29

semantic segmentation models for land cover mapping with SAR data (that isthe DenseNet-based models) our study offers baseline results against which thenewly proposed models should be evaluated Several possible improvements forthe future work were identified including the necessity for testing multitempo-ral approaches data fusion and very high-resolution SAR imagery as well asdeveloping models specifically for SAR and will be addressed in future work

References

[1] S. Bojinski, M. Verstraete, T. C. Peterson, C. Richter, A. Simmons, M. Zemp, The concept of essential climate variables in support of climate research, applications, and policy, Bulletin of the American Meteorological Society 95 (9) (2014) 1431–1443.

[2] G. Buttner, J. Feranec, G. Jaffrain, L. Mari, G. Maucha, T. Soukup, The CORINE land cover 2000 project, EARSeL eProceedings 3 (3) (2004) 331–346.

[3] M. Bossard, J. Feranec, J. Otahel, et al., CORINE land cover technical guide: Addendum 2000 (2000).

[4] G. Buttner, CORINE land cover and land cover change products, in: Land Use and Land Cover Mapping in Europe, Springer, 2014, pp. 55–74.

[5] M. Törmä, T. Markkanen, S. Hatunen, P. Härmä, O.-P. Mattila, A. Arslan, Assessment of land-cover data for land-surface modelling in regional climate studies, Boreal Environment Research 20 (2) (2015) 243–260.

[6] J. Chen, J. Chen, A. Liao, X. Cao, L. Chen, X. Chen, C. He, G. Han, S. Peng, M. Lu, et al., Global land cover mapping at 30 m resolution: A POK-based operational approach, ISPRS Journal of Photogrammetry and Remote Sensing 103 (2015) 7–27.

[7] C. A. d. Almeida, A. C. Coutinho, J. C. D. M. Esquerdo, M. Adami, A. Venturieri, C. G. Diniz, N. Dessay, L. Durieux, A. R. Gomes, High spatial resolution land use and land cover mapping of the Brazilian Legal Amazon in 2008 using Landsat-5/TM and MODIS data, Acta Amazonica 46 (3) (2016) 291–302.

[8] C. Homer, J. Dewitz, L. Yang, S. Jin, P. Danielson, G. Xian, J. Coulston, N. Herold, J. Wickham, K. Megown, Completion of the 2011 National Land Cover Database for the conterminous United States – representing a decade of land cover change information, Photogrammetric Engineering & Remote Sensing 81 (5) (2015) 345–354.

[9] Y. Zhao, D. Feng, L. Yu, X. Wang, Y. Chen, Y. Bai, H. J. Hernandez, M. Galleguillos, C. Estades, G. S. Biging, et al., Detailed dynamic land cover mapping of Chile: Accuracy improvement by integrating multi-temporal data, Remote Sensing of Environment 183 (2016) 170–185.

[10] P. Griffiths, C. Nendel, P. Hostert, Intra-annual reflectance composites from Sentinel-2 and Landsat for national-scale crop and land cover mapping, Remote Sensing of Environment 220 (2019) 135–151.

[11] R. Torres, P. Snoeij, D. Geudtner, D. Bibby, M. Davidson, E. Attema, P. Potin, B. Rommen, N. Floury, M. Brown, I. Traver, P. Deghaye, B. Duesmann, B. Rosich, N. Miranda, C. Bruno, M. L'Abbate, R. Croci, A. Pietropaolo, M. Huchler, F. Rostan, GMES Sentinel-1 mission, Remote Sensing of Environment 120 (2012) 9–24. doi:10.1016/j.rse.2011.05.028.

[12] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE 86 (11) (1998) 2278–2324.

[13] A. Krizhevsky, I. Sutskever, G. E. Hinton, ImageNet classification with deep convolutional neural networks, in: Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.

[14] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, CoRR abs/1409.1556 (2014).

[15] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning (2016).

[16] W. Cohen, S. Goward, Landsat's role in ecological applications of remote sensing, BioScience 54 (6) (2004) 535–545. doi:10.1641/0006-3568(2004)054[0535:LRIEAO]2.0.CO;2.

[17] S. Goetz, A. Baccini, N. Laporte, T. Johns, W. Walker, J. Kellndorfer, R. Houghton, M. Sun, Mapping and monitoring carbon stocks with satellite observations: A comparison of methods, Carbon Balance and Management 4 (2009). doi:10.1186/1750-0680-4-2.

[18] C. Atzberger, Advances in remote sensing of agriculture: Context description, existing operational monitoring systems and major information needs, Remote Sensing 5 (2) (2013) 949–981. doi:10.3390/rs5020949.

[19] T. Häme, J. Kilpi, H. A. Ahola, Y. Rauste, O. Antropov, M. Rautiainen, L. Sirro, S. Bounpone, Improved mapping of tropical forests with optical and SAR imagery, part I: Forest cover and accuracy assessment using multi-resolution data, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 6 (1) (2013) 74–91.

[20] O. Antropov, Y. Rauste, H. Astola, J. Praks, T. Häme, M. Hallikainen, Land cover and soil type mapping from spaceborne PolSAR data at L-band with probabilistic neural network, IEEE Transactions on Geoscience and Remote Sensing 52 (9) (2014) 5256–5270. doi:10.1109/TGRS.2013.2287712.

[21] A. Lönnqvist, Y. Rauste, M. Molinier, T. Häme, Polarimetric SAR data in land cover mapping in boreal zone, IEEE Transactions on Geoscience and Remote Sensing 48 (10) (2010) 3652–3662. doi:10.1109/TGRS.2010.2048115.

[22] B. Waske, M. Braun, Classifier ensembles for land cover mapping using multitemporal SAR imagery, ISPRS Journal of Photogrammetry and Remote Sensing 64 (5) (2009) 450–457. doi:10.1016/j.isprsjprs.2009.01.003.

[23] L. Bruzzone, M. Marconcini, U. Wegmüller, A. Wiesmann, An advanced system for the automatic classification of multitemporal SAR images, IEEE Transactions on Geoscience and Remote Sensing 42 (6) (2004) 1321–1334. doi:10.1109/TGRS.2004.826821.

[24] T. Ullmann, A. Schmitt, A. Roth, J. Duffe, S. Dech, H.-W. Hubberten, R. Baumhauer, Land cover characterization and classification of Arctic tundra environments by means of polarized synthetic aperture X- and C-band radar (PolSAR) and Landsat 8 multispectral imagery – Richards Island, Canada, Remote Sensing 6 (9) (2014) 8565–8593. doi:10.3390/rs6098565. URL http://www.mdpi.com/2072-4292/6/9/8565.

[25] N. Clerici, C. A. V. Calderón, J. M. Posada, Fusion of Sentinel-1A and Sentinel-2A data for land cover mapping: a case study in the lower Magdalena region, Colombia, Journal of Maps 13 (2) (2017) 718–726. doi:10.1080/17445647.2017.1372316.

[26] C. Castaneda, D. Ducrot, Land cover mapping of wetland areas in an agricultural landscape using SAR and Landsat imagery, Journal of Environmental Management 90 (7) (2009) 2270–2277.

[27] Y. Ban, H. Hu, I. M. Rangel, Fusion of QuickBird MS and RADARSAT SAR data for urban land-cover mapping: Object-based and knowledge-based approach, International Journal of Remote Sensing 31 (6) (2010) 1391–1410.

[28] G. V. Laurin, V. Liesenberg, Q. Chen, L. Guerriero, F. Del Frate, A. Bartolini, D. Coomes, B. Wilebore, J. Lindsell, R. Valentini, Optical and SAR sensor synergies for forest and land cover mapping in a tropical site in West Africa, International Journal of Applied Earth Observation and Geoinformation 21 (2013) 7–16.

[29] R. Khatami, G. Mountrakis, S. V. Stehman, A meta-analysis of remote sensing research on supervised pixel-based land-cover image classification processes: General guidelines for practitioners and future research, Remote Sensing of Environment 177 (2016) 89–100.

[30] B. Waske, M. Braun, Classifier ensembles for land cover mapping using multitemporal SAR imagery, ISPRS Journal of Photogrammetry and Remote Sensing 64 (5) (2009) 450–457.

[31] H. Balzter, B. Cole, C. Thiel, C. Schmullius, Mapping CORINE land cover from Sentinel-1A SAR and SRTM digital elevation model data using random forests, Remote Sensing 7 (11) (2015) 14876–14898. doi:10.3390/rs71114876.

[32] S.-E. Park, Variations of microwave scattering properties by seasonal freeze/thaw transition in the permafrost active layer observed by ALOS PALSAR polarimetric data, Remote Sensing 7 (12) (2015) 17135–17148. doi:10.3390/rs71215874.

[33] M. C. Dobson, L. E. Pierce, F. T. Ulaby, Knowledge-based land-cover classification using ERS-1/JERS-1 SAR composites, IEEE Transactions on Geoscience and Remote Sensing 34 (1) (1996) 83–99. doi:10.1109/36.481896.

[34] L. Sirro, T. Häme, Y. Rauste, J. Kilpi, J. Hämäläinen, K. Gunia, B. de Jong, F. Paz Pellat, Potential of different optical and SAR data in forest and land cover classification to support REDD+ MRV, Remote Sensing 10 (6) (2018). doi:10.3390/rs10060942.

[35] J. D. T. De Alban, G. M. Connette, P. Oswald, E. L. Webb, Combined Landsat and L-band SAR data improves land cover classification and change detection in dynamic tropical landscapes, Remote Sensing 10 (2) (2018). doi:10.3390/rs10020306.

[36] N. Longepe, P. Rakwatin, O. Isoguchi, M. Shimada, Y. Uryu, K. Yulianto, Assessment of ALOS PALSAR 50 m orthorectified FBD data for regional land cover classification by support vector machines, IEEE Transactions on Geoscience and Remote Sensing 49 (6) (2011) 2135–2150. doi:10.1109/TGRS.2010.2102041.

[37] T. Esch, A. Schenk, T. Ullmann, M. Thiel, A. Roth, S. Dech, Characterization of land cover types in TerraSAR-X images by combined analysis of speckle statistics and intensity information, IEEE Transactions on Geoscience and Remote Sensing 49 (6) (2011) 1911–1925. doi:10.1109/TGRS.2010.2091644.

[38] J. W. Cable, J. M. Kovacs, J. Shang, X. Jiao, Multi-temporal polarimetric RADARSAT-2 for land cover monitoring in northeastern Ontario, Canada, Remote Sensing 6 (3) (2014) 2372–2392. doi:10.3390/rs6032372. URL http://www.mdpi.com/2072-4292/6/3/2372.

[39] X. Niu, Y. Ban, Multi-temporal RADARSAT-2 polarimetric SAR data for urban land-cover classification using an object-based support vector machine and a rule-based approach, International Journal of Remote Sensing 34 (1) (2013) 1–26. doi:10.1080/01431161.2012.700133.

[40] T. L. Evans, M. Costa, K. Telmer, T. S. F. Silva, Using ALOS/PALSAR and RADARSAT-2 to map land cover and seasonal inundation in the Brazilian Pantanal, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 3 (4) (2010) 560–575. doi:10.1109/JSTARS.2010.2089042.

[41] P. Lumsdon, S. R. Cloude, G. Wright, Polarimetric classification of land cover for Glen Affric radar project, IEE Proceedings – Radar, Sonar and Navigation 152 (6) (2005) 404–412. doi:10.1049/ip-rsn:20041313.

[42] C. da Costa Freitas, L. de Souza Soler, S. J. S. Sant'Anna, L. V. Dutra, J. R. Dos Santos, J. C. Mura, A. H. Correia, Land use and land cover mapping in the Brazilian Amazon using polarimetric airborne P-band SAR data, IEEE Transactions on Geoscience and Remote Sensing 46 (10) (2008) 2956–2970.

[43] G. Li, D. Lu, E. Moran, L. Dutra, M. Batistella, A comparative analysis of ALOS PALSAR L-band and RADARSAT-2 C-band data for land-cover classification in a tropical moist region, ISPRS Journal of Photogrammetry and Remote Sensing 70 (2012) 26–38. doi:10.1016/j.isprsjprs.2012.03.010.

[44] N. Park, K. Chi, Integration of multitemporal/polarization C-band SAR data sets for land-cover classification, International Journal of Remote Sensing 29 (16) (2008) 4667–4688. doi:10.1080/01431160801947341.

[45] E. Tomppo, O. Antropov, J. Praks, Cropland classification using Sentinel-1 time series: Methodological performance and prediction uncertainty assessment, Remote Sensing 11 (21) (2019). doi:10.3390/rs11212480.

[46] D. B. Nguyen, A. Gruber, W. Wagner, Mapping rice extent and cropping scheme in the Mekong Delta using Sentinel-1A data, Remote Sensing Letters 7 (12) (2016) 1209–1218.

[47] A. Veloso, S. Mermoz, A. Bouvet, T. Le Toan, M. Planells, J.-F. Dejoux, E. Ceschia, Understanding the temporal behavior of crops using Sentinel-1 and Sentinel-2-like data for agricultural applications, Remote Sensing of Environment 199 (2017) 415–426.

[48] G. Satalino, A. Balenzano, F. Mattia, M. W. Davidson, C-band SAR data for mapping crops dominated by surface or volume scattering, IEEE Geoscience and Remote Sensing Letters 11 (2) (2013) 384–388.

[49] F. Vicente-Guijalba, A. Jacob, J. M. Lopez-Sanchez, C. Lopez-Martinez, J. Duro, C. Notarnicola, D. Ziolkowski, A. Mestre-Quereda, E. Pottier, J. J. Mallorquí, M. Lavalle, M. Engdahl, SinCohMap: Land-cover and vegetation mapping using multi-temporal Sentinel-1 interferometric coherence, in: IGARSS 2018 – 2018 IEEE International Geoscience and Remote Sensing Symposium, 2018, pp. 6631–6634. doi:10.1109/IGARSS.2018.8517926.

[50] S. Ge, O. Antropov, W. Su, H. Gu, J. Praks, Deep recurrent neural networks for land-cover classification using Sentinel-1 InSAR time series, in: IGARSS 2019 – 2019 IEEE International Geoscience and Remote Sensing Symposium, 2019, pp. 473–476. doi:10.1109/IGARSS.2019.8900088.

[51] Y. LeCun, Y. Bengio, et al., Convolutional networks for images, speech, and time series, The Handbook of Brain Theory and Neural Networks 3361 (10) (1995) 1995.

[52] X. X. Zhu, D. Tuia, L. Mou, G.-S. Xia, L. Zhang, F. Xu, F. Fraundorfer, Deep learning in remote sensing: a review, arXiv preprint arXiv:1710.03959 (2017).

[53] M. Mahdianpari, B. Salehi, M. Rezaee, F. Mohammadimanesh, Y. Zhang, Very deep convolutional neural networks for complex land cover mapping using multispectral remote sensing imagery, Remote Sensing 10 (7) (2018) 1119.

[54] L. Zhang, L. Zhang, B. Du, Deep learning for remote sensing data: A technical tutorial on the state of the art, IEEE Geoscience and Remote Sensing Magazine 4 (2) (2016) 22–40.

[55] J. Zhang, P. Zhong, Y. Chen, S. Li, L1/2-regularized deconvolution network for the representation and restoration of optical remote sensing images, IEEE Transactions on Geoscience and Remote Sensing 52 (5) (2014) 2617–2627.

[56] X. Chen, S. Xiang, C.-L. Liu, C.-H. Pan, Aircraft detection by deep belief nets, in: Pattern Recognition (ACPR), 2013 2nd IAPR Asian Conference on, IEEE, 2013, pp. 54–58.

[57] X. Chen, S. Xiang, C.-L. Liu, C.-H. Pan, Vehicle detection in satellite images by hybrid deep convolutional neural networks, IEEE Geoscience and Remote Sensing Letters 11 (10) (2014) 1797–1801.

[58] Y. Liu, G. Cao, Q. Sun, M. Siegel, Hyperspectral classification via deep networks and superpixel segmentation, International Journal of Remote Sensing 36 (13) (2015) 3459–3482. doi:10.1080/01431161.2015.1055607.

[59] J. Wang, Q. Qin, Z. Li, X. Ye, J. Wang, X. Yang, X. Qin, Deep hierarchical representation and segmentation of high resolution remote sensing images, in: Geoscience and Remote Sensing Symposium (IGARSS), 2015 IEEE International, IEEE, 2015, pp. 4320–4323.

[60] D. Tuia, R. Flamary, N. Courty, Multiclass feature learning for hyperspectral image classification: Sparse and hierarchical solutions, ISPRS Journal of Photogrammetry and Remote Sensing 105 (2015) 272–285.

[61] F. Hu, G.-S. Xia, J. Hu, L. Zhang, Transferring deep convolutional neural networks for the scene classification of high-resolution remote sensing imagery, Remote Sensing 7 (11) (2015) 14680–14707.

[62] O. A. B. Penatti, K. Nogueira, J. A. dos Santos, Do deep features generalize from everyday objects to remote sensing and aerial scenes domains?, in: 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2015, pp. 44–51. doi:10.1109/CVPRW.2015.7301382.

[63] F. P. Luus, B. P. Salmon, F. Van den Bergh, B. T. J. Maharaj, Multiview deep learning for land-use classification, IEEE Geoscience and Remote Sensing Letters 12 (12) (2015) 2448–2452.

[64] F. Zhang, B. Du, L. Zhang, Scene classification via a gradient boosting random convolutional network framework, IEEE Transactions on Geoscience and Remote Sensing 54 (3) (2016) 1793–1802.

[65] T. Ishii, R. Nakamura, H. Nakada, Y. Mochizuki, H. Ishikawa, Surface object recognition with CNN and SVM in Landsat 8 images, in: 2015 14th IAPR International Conference on Machine Vision Applications (MVA), 2015, pp. 341–344. doi:10.1109/MVA.2015.7153200.

[66] N. Kussul, M. Lavreniuk, S. Skakun, A. Shelestov, Deep learning classification of land cover and crop types using remote sensing data, IEEE Geoscience and Remote Sensing Letters 14 (5) (2017) 778–782.

[67] Y. Chen, Z. Lin, X. Zhao, G. Wang, Y. Gu, Deep learning-based classification of hyperspectral data, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 7 (6) (2014) 2094–2107.

[68] G. Wu, X. Shao, Z. Guo, Q. Chen, W. Yuan, X. Shi, Y. Xu, R. Shibasaki, Automatic building segmentation of aerial imagery using multi-constraint fully convolutional networks, Remote Sensing 10 (3) (2018) 407.

[69] Y. Duan, F. Liu, L. Jiao, P. Zhao, L. Zhang, SAR image segmentation based on convolutional-wavelet neural network and Markov random field, Pattern Recognition 64 (2017) 255–267.

[70] F. Mohammadimanesh, B. Salehi, M. Mahdianpari, E. Gill, M. Molinier, A new fully convolutional neural network for semantic segmentation of polarimetric SAR imagery in complex land cover ecosystem, ISPRS Journal of Photogrammetry and Remote Sensing 151 (2019) 223–236.

[71] L. Wang, X. Xu, H. Dong, R. Gui, F. Pu, Multi-pixel simultaneous classification of PolSAR image using convolutional neural networks, Sensors 18 (3) (2018) 769.

[72] M. Ahishali, S. Kiranyaz, T. Ince, M. Gabbouj, Dual and single polarized SAR image classification using compact convolutional neural networks, Remote Sensing 11 (11) (2019) 1340.

[73] Z. Li, Z. Yang, H. Xiong, Homogeneous region segmentation for SAR images based on two steps segmentation algorithm, in: Computers, Communications and Systems (ICCCS), International Conference on, IEEE, 2015, pp. 196–200.

[74] V. Badrinarayanan, A. Kendall, R. Cipolla, SegNet: A deep convolutional encoder-decoder architecture for image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence 39 (12) (2017) 2481–2495.

[75] H. Zhao, J. Shi, X. Qi, X. Wang, J. Jia, Pyramid scene parsing network, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2881–2890.

[76] C. Yu, J. Wang, C. Peng, C. Gao, G. Yu, N. Sang, BiSeNet: Bilateral segmentation network for real-time semantic segmentation, in: Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 325–341.

[77] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, H. Adam, Encoder-decoder with atrous separable convolution for semantic image segmentation, arXiv preprint arXiv:1802.02611 (2018).

[78] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, A. L. Yuille, DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs, IEEE Transactions on Pattern Analysis and Machine Intelligence 40 (4) (2018) 834–848.

[79] O. Ronneberger, P. Fischer, T. Brox, U-Net: Convolutional networks for biomedical image segmentation, in: Medical Image Computing and Computer-Assisted Intervention (MICCAI), Vol. 9351 of LNCS, Springer, 2015, pp. 234–241 (available on arXiv:1505.04597 [cs.CV]). URL http://lmb.informatik.uni-freiburg.de/Publications/2015/RFB15a.

[80] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, H. Adam, MobileNets: Efficient convolutional neural networks for mobile vision applications, arXiv preprint arXiv:1704.04861 (2017).

[81] T. Pohlen, A. Hermans, M. Mathias, B. Leibe, Full-resolution residual networks for semantic segmentation in street scenes, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4151–4160.

[82] S. Jegou, M. Drozdzal, D. Vazquez, A. Romero, Y. Bengio, The one hundred layers tiramisu: Fully convolutional DenseNets for semantic segmentation, in: Computer Vision and Pattern Recognition Workshops (CVPRW), 2017 IEEE Conference on, IEEE, 2017, pp. 1175–1183.

[83] O. Antropov, Y. Rauste, A. Lönnqvist, T. Häme, PolSAR mosaic normalization for improved land-cover mapping, IEEE Geoscience and Remote Sensing Letters 9 (6) (2012) 1074–1078.

[84] J. Long, E. Shelhamer, T. Darrell, Fully convolutional networks for semantic segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431–3440.

[85] D. H. Hubel, T. N. Wiesel, Receptive fields, binocular interaction and functional architecture in the cat's visual cortex, The Journal of Physiology 160 (1) (1962) 106–154.

[86] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al., ImageNet large scale visual recognition challenge, International Journal of Computer Vision 115 (3) (2015) 211–252.

[87] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.

[88] G. Huang, Z. Liu, L. Van Der Maaten, K. Q. Weinberger, Densely connected convolutional networks, in: CVPR, Vol. 1, 2017, p. 3.

[89] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, Going deeper with convolutions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1–9.

[90] S. Ji, W. Xu, M. Yang, K. Yu, 3D convolutional neural networks for human action recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (1) (2013) 221–231.

[91] T. N. Sainath, A.-r. Mohamed, B. Kingsbury, B. Ramabhadran, Deep convolutional neural networks for LVCSR, in: Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, IEEE, 2013, pp. 8614–8618.

[92] D. Small, L. Zuberbühler, A. Schubert, E. Meier, Terrain-flattened gamma nought Radarsat-2 backscatter, Canadian Journal of Remote Sensing 37 (5) (2012) 493–499.

[93] P. Härmä, R. Teiniranta, M. Törmä, R. Repo, E. Järvenpää, M. Kallio, The production of Finnish CORINE land cover 2000 classification, XXth ISPRS Congress, Istanbul, Turkey (2004).

[94] P. Härmä, R. Teiniranta, M. Törmä, R. Repo, E. Järvenpää, M. Kallio, Finnish CORINE land cover 2000 classification, XXth ISPRS Congress, Anchorage, US (2004).

[95] A. Garcia-Garcia, S. Orts-Escolano, S. Oprea, V. Villena-Martinez, J. Garcia-Rodriguez, A review on deep learning techniques applied to semantic segmentation, arXiv preprint arXiv:1704.06857 (2017).

[96] F. Chollet, Xception: Deep learning with depthwise separable convolutions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1251–1258.

[97] O. Ronneberger, P. Fischer, T. Brox, U-Net: Convolutional networks for biomedical image segmentation, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2015, pp. 234–241.

[98] L.-C. Chen, G. Papandreou, F. Schroff, H. Adam, Rethinking atrous convolution for semantic image segmentation, arXiv preprint arXiv:1706.05587 (2017).

[99] Y. Bengio, Deep learning of representations for unsupervised and transfer learning, in: Proceedings of ICML Workshop on Unsupervised and Transfer Learning, 2012, pp. 17–36.

[100] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, et al., TensorFlow: A system for large-scale machine learning, in: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), 2016, pp. 265–283.

[101] H. Costa, G. M. Foody, D. S. Boyd, Supervised methods of image segmentation accuracy assessment in land cover mapping, Remote Sensing of Environment 205 (2018) 338–351.

[102] G. Csurka, D. Larlus, F. Perronnin, F. Meylan, What is a good evaluation measure for semantic segmentation?, in: BMVC, Vol. 27, Citeseer, 2013, p. 2013.

[103] J. Cohen, A coefficient of agreement for nominal scales, Educational and Psychological Measurement 20 (1) (1960) 37–46.

[104] O. Antropov, Y. Rauste, T. Häme, Volume scattering modeling in PolSAR decompositions: Study of ALOS PALSAR data over boreal forest, IEEE Transactions on Geoscience and Remote Sensing 49 (10) (2011) 3838–3848.


pixel (see Figure 5).

3.4.3. Mobile U-Net

Mobile U-Net is based on the U-Net [97] semantic segmentation architecture, shown in Figure 6. In designing U-Net, the fully convolutional approach was generally employed, with the following modification: the upsampling part of the architecture has no fully connected layer, but is nearly symmetrical to the feature extraction part, due to the use of similar feature maps. This results in a u-shaped architecture (see Figure 6), and hence the name of the model. While originally developed for biomedical images, the U-Net architecture has proven successful for image segmentation in other domains as well. Here, we somewhat modify the U-Net architecture according to the MobileNets [80] framework, to improve its efficiency. In particular, the MobileNets framework uses Depthwise Separable Convolutions, a form which factorizes standard convolutions (e.g., 3 × 3) into a depthwise convolution (applied separately to each input band) and a pointwise (1 × 1) convolution to combine the outputs of the depthwise convolution.
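To illustrate the factorization, here is a minimal sketch of one such block in Keras/TensorFlow (the function name and parameter choices are ours, not those of the original implementation):

```python
import tensorflow as tf
from tensorflow.keras import layers

def depthwise_separable_block(x, filters):
    # Depthwise 3x3 convolution: one filter applied separately to each input band
    x = layers.DepthwiseConv2D(kernel_size=3, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    # Pointwise 1x1 convolution: combines the depthwise outputs across bands
    x = layers.Conv2D(filters, kernel_size=1, padding="same")(x)
    x = layers.BatchNormalization()(x)
    return layers.ReLU()(x)
```

Compared to a standard 3 × 3 convolution, this factorization substantially reduces the number of parameters and multiply–add operations, which is the source of Mobile U-Net's efficiency.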

Figure 6: The architecture of U-Net [97].

3.4.4. DeepLab-V3+

DeepLab-V3+ [77] is an improved version of DeepLab-V3 [98], while the latter is an improved version of the original DeepLab [78] model. This segmentation model does not follow the FCN framework like the previously discussed models. The main features that distinguish the DeepLab model from FCNs are the atrous convolutions for upsampling and the application of probabilistic machine learning models, concretely conditional random fields (CRFs), for finer localization accuracy in the final fully connected layer. Atrous convolutions, in particular, allow enlarging the context from which the next-layer feature maps are learned, while preserving the number of parameters (and thus the same efficiency). Using a chain of atrous convolutions allows computing the final output layer of a CNN at an arbitrarily high resolution (removing the need for the upsampling part used in FCNs). In the follow-up work proposing DeepLab-V3, Chen et al. [98] change the approach to atrous convolutions to gradually double the atrous rates, and show that the adapted version of their algorithm outperforms the previous one even without the fully connected CRF layer. Finally, in the newest adaptation of the model, called DeepLab-V3+, Chen et al. [77] turn to an approach similar to the FCNs, i.e., they add a decoder module to the architecture (see Figure 7). That is, they employ the features extracted by the DeepLab-V3 module in the encoder part, and add a decoder module consisting of 1 × 1 and 3 × 3 convolutions.
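As an illustration of the mechanism (a sketch of ours, not the authors' code), a 3 × 3 convolution with dilation rate 2 covers a 5 × 5 context with the same nine weights per filter:

```python
import tensorflow as tf
from tensorflow.keras import layers

# A standard 3x3 convolution sees a 3x3 neighbourhood of the previous layer.
standard = layers.Conv2D(64, kernel_size=3, padding="same")
# An atrous (dilated) 3x3 convolution with rate 2 sees a 5x5 neighbourhood
# while keeping the same number of parameters per filter.
atrous = layers.Conv2D(64, kernel_size=3, dilation_rate=2, padding="same")
```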

Figure 7: The architecture of DeepLabV3+ [77].

3.4.5. FRRN-B (Full-Resolution Residual Networks)

As we have seen, most of the semantic segmentation architectures are based on some form of FCN, and so they utilize existing classification networks, such as ResNet or VGG16, as encoders.

17

Figure 8: The architecture of FRRN-B. RU n and FRRU n stand for residual units and full-resolution residual units with n-channel convolutions, respectively. FRRUs simultaneously operate on the two streams [81].

We also discussed the main reason for such approaches, which is to take advantage of the learned weights from those architectures, pretrained for the classification task. Nevertheless, one disadvantage of the FCN approach is that the resulting network outputs of the encoder part (particularly after the pooling operations) are at a lower resolution, which deteriorates the localization performance of the overall segmentation model. Pohlen et al. [81] proposed to tackle this by having two parallel network streams processing the input image: a pooling and a residual stream (Figure 8). As the name says, the pooling stream performs successive pooling and then unpooling operations, and it serves to obtain good recognition of the objects and classes. The residual stream computes residuals at the full image resolution, which enables low-level features, i.e., object pixel-level locations, to be propagated to the network output. The name of the model comes from its building blocks, i.e., full-resolution residual units. Each such unit simultaneously operates on the pooling and the residual stream. In the original paper [81], the authors propose two alternative architectures, FRRN-A and FRRN-B, and show that FRRN-B achieves superior performance on the Cityscapes benchmark dataset. Hence, we employ the FRRN-B architecture.
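The two-stream coupling can be sketched as follows (our simplified Keras/TensorFlow rendering of a full-resolution residual unit; the layer sizes are illustrative assumptions):

```python
import tensorflow as tf
from tensorflow.keras import layers

def frru(y, z, channels, scale):
    """One full-resolution residual unit: y is the (low-resolution) pooling
    stream, z is the full-resolution residual stream (a sketch after [81])."""
    # Bring the residual stream down to the pooling stream's resolution
    z_down = layers.MaxPooling2D(pool_size=scale)(z)
    h = layers.Concatenate()([y, z_down])
    for _ in range(2):
        h = layers.Conv2D(channels, kernel_size=3, padding="same")(h)
        h = layers.BatchNormalization()(h)
        h = layers.ReLU()(h)
    # Project, upsample back to full resolution, and update the residual stream
    r = layers.Conv2D(z.shape[-1], kernel_size=1, padding="same")(h)
    r = layers.UpSampling2D(size=scale)(r)
    return h, layers.Add()([z, r])
```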

3.4.6. PSPNet (Pyramid Scene Parsing Network)

Figure 9: The architecture of PSPNet [75].

Zhao et al. [75] propose the pyramid scene parsing network as a solution to the challenge of making local predictions based on the local context only, without considering the global image scene. In remote sensing, an example of this challenge could be a model that wrongly predicts water with waves in it as the dry vegetation class, because the two appear similar and the model does not consider that these pixels are part of a larger water surface, i.e., it misses the global context. Similarly to the other FCN-based approaches, PSPNet uses a pre-trained classification architecture to extract the feature map, in this case ResNet. The main module of this network is the pyramid pooling, which is enclosed by a square in Figure 9. As can be seen in the figure, this module fuses features at four scales, from the coarse (red) to the fine (green). Hence, the output of each level in the pyramid pooling module contains the feature map of a different resolution. In the end, the different features are stacked together, yielding the final pyramid pooling global feature for predictions.
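A simplified sketch of the pyramid pooling module (our approximation in Keras/TensorFlow, assuming a feature map with known static shape; the bin sizes follow [75]):

```python
import tensorflow as tf
from tensorflow.keras import layers

def pyramid_pooling(feature_map, bin_sizes=(1, 2, 3, 6)):
    # Assumes a static spatial shape, e.g. (batch, 16, 16, 2048)
    h, w, c = feature_map.shape[1], feature_map.shape[2], feature_map.shape[3]
    branches = [feature_map]
    for size in bin_sizes:
        # Pool into a coarse (size x size) grid: one global-to-local context level
        x = layers.AveragePooling2D(pool_size=(h // size, w // size))(feature_map)
        x = layers.Conv2D(c // len(bin_sizes), kernel_size=1)(x)
        # Resize back to the input resolution so all levels can be stacked
        x = tf.image.resize(x, (h, w))
        branches.append(x)
    return layers.Concatenate()(branches)
```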

3.4.7. FC-DenseNet (Fully Convolutional DenseNets)

This semantic segmentation algorithm is built using the DenseNet CNN [88] as a basis for the encoder, followed by applying the FCN approach [82]. The specificity of the DenseNet architecture is the presence of blocks in which each layer is connected to all other layers in a feed-forward manner. Figure 10 shows the architecture of FC-DenseNet, where the blocks are represented by the Dense Block units. According to [88], such an architecture scales well to hundreds of layers without any optimization issues, while yielding excellent results in classification tasks. In order to efficiently upsample the DenseNet feature maps, Jegou et al. [82] substitute the upsampling convolutions of FCNs with Dense Blocks and Transitions Up. The Transition Up modules consist of transposed convolutions, which are then concatenated with the outputs from the input skip connection (the dashed lines in Figure 10).
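A minimal sketch of one Dense Block (our illustration in Keras/TensorFlow; the number of layers and the growth rate are assumptions, not the values used in FC-DenseNet103):

```python
import tensorflow as tf
from tensorflow.keras import layers

def dense_block(x, num_layers=4, growth_rate=16):
    """Each layer sees the concatenation of all previous feature maps [88]."""
    features = [x]
    for _ in range(num_layers):
        h = layers.Concatenate()(features) if len(features) > 1 else features[0]
        h = layers.BatchNormalization()(h)
        h = layers.ReLU()(h)
        # Each layer contributes growth_rate new feature maps
        h = layers.Conv2D(growth_rate, kernel_size=3, padding="same")(h)
        features.append(h)
    # The block outputs the newly produced feature maps, concatenated
    return layers.Concatenate()(features[1:])
```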


Figure 10: The architecture of FC-DenseNet [82].

3.5. Training approach

To accomplish better segmentation performance, there is an option to pre-train the semantic segmentation models (in particular, their encoder modules) using a larger set of available images of another type (such as natural images). By using a model pre-trained on natural images and continuing the training with the limited set of SAR images, the knowledge becomes effectively transferred from the natural-image task to the SAR task [99]. To accomplish such a transfer, we used models whose encoders were pre-trained for the ImageNet classification task, and fine-tuned them using our SAR dataset (described next).
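The principle can be sketched as follows (a simplified illustration assuming a ResNet101 encoder with a naive decoder head; the actual architectures we used are those listed in Table 3):

```python
import tensorflow as tf
from tensorflow.keras import layers

# Encoder initialised with ImageNet weights; the decoder is trained from
# scratch, and the whole network is then fine-tuned on the SAR dataset.
encoder = tf.keras.applications.ResNet101(
    include_top=False, weights="imagenet", input_shape=(512, 512, 3))
x = layers.Conv2D(256, kernel_size=3, padding="same", activation="relu")(encoder.output)
x = layers.UpSampling2D(size=32, interpolation="bilinear")(x)  # back to 512 x 512
logits = layers.Conv2D(5, kernel_size=1)(x)  # 5 Level-1 CORINE classes
model = tf.keras.Model(encoder.input, logits)
```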

3.6. Experimental Setup

In this section, we first describe how we prepared the SAR images for training with the deep learning models, which are designed for natural images. Then, we provide the details of our implementation.

3.6.1. SAR Data Preprocessing for Deep Learning

Sentinel-1 imagery comes in two polarization channels, each of them being informative about certain types of land cover. Hence, using their combination is expected to yield better land cover mapping results than using either of them independently. Moreover, previous work has shown the benefits of also using the DEM model for land cover mapping [9]. Hence, as the third layer, we used the DEM of Finland from the National Land Survey.

SAR backscatter for both polarizations was converted to decibels by applying the 10 · log10 transformation. In addition, for the deep learning models, each band should be normalized so that the distribution of the pixel values resembles a Gaussian distribution centered at zero; this is done to yield faster convergence during the training. To normalize the data, from each pixel value the mean of all pixels is subtracted, and the result is divided by their standard deviation. In addition, given that the semantic segmentation models expect pixel values in the range (0, 255), we scaled the normalized data, and also the DEM values, to this range. Such preprocessed layers were then used to create the image dataset for training.
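A sketch of this per-band preprocessing chain in NumPy (the function name is ours; strictly positive linear backscatter values are assumed):

```python
import numpy as np

def to_model_range(band):
    """dB-scale a backscatter band, normalise it, and map it to (0, 255)."""
    db = 10.0 * np.log10(band)               # convert backscatter to decibels
    z = (db - db.mean()) / db.std()          # zero mean, unit variance
    return 255.0 * (z - z.min()) / (z.max() - z.min())

# The SAR RGB-DEM image described below: VV -> R, VH -> G, scaled DEM -> B
# image = np.dstack([to_model_range(vv), to_model_range(vh), dem_scaled])
```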

We named the created dataset SAR RGB-DEM. The naming comes from the process used to create the images in this dataset: namely, one of the two channels of a Sentinel-1 image is assigned to the R and the other to the G channel, while for the third, B channel, we use the DEM layer.

3.6.2. Train/Development and Test (Accuracy Assessment) Dataset

The images from the SAR RGB-DEM dataset needed to be split into 512 px × 512 px partial images (further in the text called imagelets) for training. Thus, each imagelet represented an area of roughly 10 × 10 km². The first reason for this preprocessing is the square shape: some of the selected models required square-shaped images. Some other models were flexible with the image shape and size, but we wanted to make the setups for all the models the same, so that their results are comparable. The second reason for the preprocessing is the computational capacity: with our hardware setup (described below), this was the largest image size that we could work with.

Upon splitting the SAR RGB-DEM images, we discarded those imagelets that were completely outside the land mass area, as well as those for which we did not have a complete CORINE label (such as imagelets that fell in part outside the Finnish borders). This resulted in more than 7,000 imagelets of size 512 px × 512 px.

Given the geography of Finland, to have representative training data it seems useful to include imagelets from both the northern and the southern (including the large cities) parts of the country in the model training. On the other hand, some noticeable differences are found also in the gradient from east to west of the country. To achieve a representative training dataset, we selected all imagelets between the longitudes of 24° and 28° for the accuracy assessment (model testing), and all the other imagelets for the model training (that is, training and development in the computer vision terminology). In this way, we prevented the situation in which two images of the same area, but acquired at different times, are used one for training and the other for testing. Images overlapping any border of the introduced strip were discarded. The procedure resulted in 3,104 images in the training and development set and 3,784 images in the test (accuracy assessment) set. Finally, we used 60% of the training and development set for training, and the rest for development of the deep learning models.
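The tiling step can be sketched as follows (our illustration, assuming the scene is already a preprocessed three-channel array; the longitude-based train/test assignment is then done per imagelet as described above):

```python
import numpy as np

def split_into_imagelets(scene, tile=512):
    """Cut a scene (H x W x 3 array) into non-overlapping 512 px x 512 px
    imagelets; border remainders are discarded (a sketch)."""
    h, w = scene.shape[:2]
    return [scene[r:r + tile, c:c + tile]
            for r in range(0, h - tile + 1, tile)
            for c in range(0, w - tile + 1, tile)]
```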


Table 3: The properties of the examined semantic segmentation architectures.

Architecture    Base model      Parameters
BiSeNet         ResNet101       24.75M
SegNet          VGG16           34.97M
Mobile U-Net    Not applicable  8.87M
DeepLabV3+      ResNet101       47.96M
FRRN-B          ResNet101       24.75M
PSPNet          ResNet101       56M
FC-DenseNet     ResNet101       9.27M

3.6.3. Data Augmentation

Further, we employed the data augmentation technique. The main idea behind data augmentation is to enable improved learning by reusing original images with slight transformations, such as rotation, flipping, adding Gaussian noise, or slightly changing the brightness. This provides additional information to the model, and the dataset size is effectively increased. Moreover, an additional benefit of data augmentation is in helping the model learn some invariant data properties for which no examples are present in the original dataset. Given the sensitivity of the SAR backscatter, we did not want to augment the images in terms of color or brightness, or by adding noise. However, we could safely employ rotations and flipping. For rotations, we only used the 90° increments, giving three possible rotated versions of an image. For image flipping, we applied horizontal and vertical flipping, or both at the same time, giving another three possible versions of the original image (vertical flipping switches between top-left and bottom-left image origin, i.e., reflection along the central horizontal axis, and horizontal flipping switches between top-left and top-right image origin, i.e., reflection along the central vertical axis). Notice that our images are square, so the transformations did not change the image dimensions. Finally, we applied the online augmentation, as opposed to the offline version: in the online process, each augmented image is seen only once, and so this process yields a network that generalises better.
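A sketch of the online augmentation step (our illustration; the same random transformation must be applied to the imagelet and to its label mask):

```python
import random
import numpy as np

def augment(image, labels):
    k = random.randint(0, 3)                  # rotation by 0, 90, 180 or 270 degrees
    image, labels = np.rot90(image, k), np.rot90(labels, k)
    if random.random() < 0.5:                 # horizontal flip
        image, labels = np.fliplr(image), np.fliplr(labels)
    if random.random() < 0.5:                 # vertical flip
        image, labels = np.flipud(image), np.flipud(labels)
    return image, labels
```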

3.6.4. Implementation

To apply the described semantic segmentation models, we adapted the open-source Semantic Segmentation Suite. We used Python with the TensorFlow [100] backend.

3.6.5. Hardware and Training Setup

We trained and tested each of the deep learning models separately on a single GPU (NVIDIA GeForce GTX 1080) on a machine with 32 GB of RAM.

We used the RMSProp optimisation algorithm, a learning rate of 0.0001, and a decay of the learning rate of 0.9954. Each model was trained for an equal number of epochs (500), and during the process the checkpoint for the best model was saved. Then, we used that model for evaluation on the test set, and we report those results.
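The reported optimisation settings can be expressed as follows (a sketch; wiring the decay as a per-epoch exponential schedule, and the steps-per-epoch value, are our assumptions):

```python
import tensorflow as tf

# Hypothetical value: one pass over the ~1.9K training imagelets at batch size 1
steps_per_epoch = 1862

# RMSProp with initial learning rate 0.0001 and learning-rate decay 0.9954
schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-4,
    decay_steps=steps_per_epoch,
    decay_rate=0.9954)
optimizer = tf.keras.optimizers.RMSprop(learning_rate=schedule)
```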

3.7. Evaluation Metrics

In a review of the metrics used in land cover classification, Costa et al. [101] found a lack of consistency, complicating the intercomparison of different studies. To avoid such issues and ensure that our results are easily comparable with the literature, we thoroughly evaluated our models. For each model and class, we report the following measures of accuracy: precision, also known as user's accuracy (UA); recall, also known as producer's accuracy (PA); the overall accuracy; and the Kappa coefficient. The formulas are as follows.

For each segmentation class (land cover type) $c$, we calculate precision (user's accuracy),

$$P_c = \frac{Tp_c}{Tp_c + Fp_c},$$

and recall (producer's accuracy),

$$R_c = \frac{Tp_c}{Tp_c + Fn_c},$$

where $Tp_c$ represents true positive, $Fp_c$ false positive, and $Fn_c$ false negative pixels for the class $c$.

When it comes to accuracy [102], we calculate the per-class accuracy (effectively, the recall obtained on each class),

$$Acc_i = \frac{C_{ii}}{G_i},$$

and the overall pixel accuracy,

$$Acc_{OP} = \frac{\sum_{i=1}^{L} C_{ii}}{\sum_{i=1}^{L} G_i},$$

where $C_{ij}$ is the number of pixels having the ground truth label $i$ and being classified/predicted as $j$, $G_i$ is the total number of pixels labelled with $i$, and $L$ is the number of classes. All these metrics can take values from 0 to 1.

Finally, we also use the Kappa statistic (Cohen's measure of agreement), indicating how the classification results compare to values assigned by chance [103]. The Kappa statistic can take values from 0 to 1. Starting from a $k \times k$ confusion matrix with elements $f_{ij}$, the following calculations are done:

$$P_o = \frac{1}{N}\sum_{j=1}^{k} f_{jj}, \qquad (1)$$

$$r_i = \sum_{j=1}^{k} f_{ij} \;\forall i \qquad \text{and} \qquad c_j = \sum_{i=1}^{k} f_{ij} \;\forall j, \qquad (2)$$

$$P_e = \frac{1}{N^2}\sum_{i=1}^{k} r_i c_i, \qquad (3)$$

where $P_o$ is the observed proportional agreement (effectively the overall accuracy), $r_i$ and $c_j$ are the row and column totals for classes $i$ and $j$, $N$ is the total number of pixels, and $P_e$ is the expected proportion of agreement. The final measure of agreement is given by the statistic [103]

$$\kappa = \frac{P_o - P_e}{1 - P_e}. \qquad (4)$$

Depending on the value of Kappa, the observed agreement is considered either poor (0.0 to 0.2), fair (0.2 to 0.4), moderate (0.4 to 0.6), good (0.6 to 0.8), or very good (0.8 to 1.0).
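All of the above metrics can be computed directly from a single confusion matrix, as in the following sketch (our illustration, in NumPy):

```python
import numpy as np

def metrics_from_confusion(f):
    """Metrics from a k x k confusion matrix f, where f[i, j] counts pixels
    with ground truth class i that were predicted as class j (a sketch)."""
    f = f.astype(float)
    n = f.sum()
    precision = np.diag(f) / f.sum(axis=0)   # user's accuracy per class
    recall = np.diag(f) / f.sum(axis=1)      # producer's / per-class accuracy
    overall = np.trace(f) / n                # P_o, the overall pixel accuracy
    p_e = (f.sum(axis=1) * f.sum(axis=0)).sum() / n**2  # chance agreement P_e
    kappa = (overall - p_e) / (1.0 - p_e)    # Cohen's kappa, Eq. (4)
    return precision, recall, overall, kappa
```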

4. Results and Discussion

Using the experimental setup described in the previous section, we evaluated the seven selected semantic segmentation models: SegNet [74], PSPNet [75], BiSeNet [76], DeepLabV3+ [77, 78], U-Net [79, 80], FRRN-B [81], and FC-DenseNet [82]. The overall classification performance statistics for all studied models are gathered in Table 4. Figure 11 shows maps produced for several imagelets with the best performing model, FC-DenseNet. The obtained results are compared to prior work, and the classification performance for the different land cover classes is discussed further below.

4.1. Classification Performance

All the models performed relatively well in terms of classification, achieving an overall accuracy above 83%. Three models performed particularly well, achieving an accuracy score above 89%: SegNet, FRRN-B, and the best performing model, FC-DenseNet, which achieved an accuracy of 90.7%.

Before further analysis, let us recall that CORINE is not exclusively a land cover map, but rather a land cover and land use map; thus, specific classes can differ from the ecological classes observed by Sentinel-1. Also, the aggregation to Level-1 is sometimes not strictly "ecological", nor does it comply with physical surface scattering considerations. For example, roads, airports, major industrial areas and the road network often exhibit areas similar to field.


Table 4: Summary of the classification performance and efficiency of the various deep learning models (accuracy given as UA/PA in %, where UA is the user's accuracy and PA the producer's accuracy; the average inference time is per 512 px × 512 px image in the dataset).

LC class (test scale, km²)               BiSeNet  DeepLabV3+  SegNet  FRRN-B  U-Net   PSPNet  FC-DenseNet
Urban fabric (100), 10816                26/21    15/14       36/31   38/30   45/25   38/18   62/27
Agricultural areas (200), 25160          49/51    50/49       69/66   68/68   66/66   53/48   72/71
Forested areas (300), 285462             90/91    88/96       93/94   92/95   92/95   89/95   93/96
Peatland, bogs and marshes (400), 20990  54/43    56/13       67/57   71/55   70/52   65/31   74/58
Water bodies (500), 53564                85/91    94/92       96/96   95/96   96/96   94/94   96/96
Overall accuracy (%)                     83.86    85.49       89.03   89.27   89.25   86.51   90.66
Kappa                                    0.641    0.649       0.754   0.758   0.754   0.680   0.785
Average inference time (s)               0.0389   0.0267      0.0761  0.1424  0.0848  0.0495  0.1930

Table 5: Confusion matrix for classification with the FC-DenseNet (FC-DenseNet103) model.

CLC2012 \ Sentinel-1 class   urban      water        forest       field       peatland    total        PA (%)
urban                        7301999    413073       15892771     3212839     221476      27042158     27.0
water                        78331      128294872    3457634      171029      1935276     133937142    95.8
forest                       3663698    2703632      686788977    12795703    7730444     713682454    96.2
field                        766200     121609       16527970     44866048    620934      62902761     71.3
peatland                     56097      1866020      19164137     1091008     30309189    52486451     57.8
total                        11866325   133399206    741831489    62136627    40817319    990050966
UA (%)                       61.5       96.2         92.6         72.2        74.3                     90.7

Figure 11: Illustration of the FC-DenseNet model performance: a selection of classification results, i.e., the direct output of the network without any post-processing (bottom row), versus the reference CORINE data (upper row).

The presence of trees and green vegetation near summer cottages can cause them to exhibit signatures close to forest rather than urban; sometimes, forest on rocky terrain can instead be misclassified as urban, due to the presence of very bright targets and strong disruptive features, while confusion between peatland and field areas is also common. Finally, the accuracy of the CORINE data itself is only somewhat higher than 90%.

As for the results across the different land cover classes, all the models performed particularly well in recognising the water bodies and forested areas, while the urban fabric represented the most challenging class for all the models. We expect that the inclusion of the DEM as one layer in the training images helped to achieve good results on the water bodies class for most of the models (except for BiSeNet, all the models achieved both the user's and the producer's accuracy above 90%). The urban class was particularly challenging for the following main reasons. First, this class changes the most, as new houses, roads and urban areas are built. While we took the most suitable available CORINE class in terms of time for our Sentinel-1 images, there are almost certain differences between the urban class as it was in 2012 and in 2015–2016. Second, the CORINE map itself does not have perfect accuracy, nor are the aggregation rules perfect. As a matter of fact, in the majority of studies where SAR-based classification was compared against CLC or similar data, a poor or modest overall agreement was observed for this class [21, 41, 83, 20], while the user's accuracy was markedly higher than the producer's [104]. The latter is exactly because radar is able to sense sharp boundaries and bright targets very well, whereas such bright targets often do not dominate the whole CORINE Level-1 urban class. We argue that any inaccuracies present will be particularly attenuated in our models for the urban class, because of the sharp and sudden boundary changes in this class, unlike for the others, such as forest and water. The top performing model, i.e., FC-DenseNet, performed the best across all the classes. It is particularly notable that it achieved a user's accuracy, i.e., precision, for the urban class of 62%, improving on it significantly compared to all the other models. Nevertheless, its score on the producer's accuracy, i.e., recall, on this class (27%) is outperformed by the two other top models, i.e., SegNet and FRRN-B.

We mentioned the issue of SAR backscattering sensitivity to several ground factors, so that the same classes might appear differently in the images between countries or between distant areas within a country. An interesting indication of our study, however, is that the deep learning models might be able to deal with this issue. Namely, we used models pre-trained on ImageNet and fine-tuned them with a relatively small number (14) of Sentinel-1 scenes. The models learned to recognize the varying types of backscattering signal across the country of Finland. This indicates that, with a similar type of fine-tuning, the present models could be relatively easily adapted to other areas and countries with different SAR backscattering patterns. Such robustness and adaptability of the deep learning models come from their automatic learning of feature representations, without the need for a human to pre-define those features.

4.2. Computational Performance

The training times with our hardware configuration took from 6 days up to 2 weeks for the different models. This could be significantly improved by training each model on a multi-GPU system instead of the single GPU we used.

In terms of the inference time, we also saw differences in performance. In Table 4, we present the average inference time per 512 px × 512 px imagelet that we worked with. The results show that there is a trade-off between classification and computational performance: the best models in terms of classification results (i.e., FC-DenseNet and FRRN-B) take a several times longer inference time compared to the rest. Depending on the application, this might not be of particular importance.

4.3. Comparison to Similar Work

The obtained results compare favourably to previous similar studies on land cover classification with SAR data [83, 20, 21, 41, 28, 31]. Depending on the level of class aggregation (4–5 major classes or more), and using mostly statistical or classical machine learning approaches, the reported classification accuracies ranged from as high as 80–87% down to as low as 30% when only SAR imagery was used.

Two recent studies that employed neural networks for SAR imagery classification (albeit in combination with satellite optical data) for land cover mapping were [28] and [66], with reported classification accuracies of up to 97.5% and 94.6%, respectively.

The best model in our experiments achieved an overall accuracy of 90.7%. However, our results are obtained using solely the SAR imagery. In contrast, SAR imagery (PALSAR) alone yielded an overall accuracy of 78.1% in [28]. The types of classes they studied are also different compared to ours (crops versus vegetation versus land cover types), and our study is performed on a larger area. Importantly, the previous studies have applied different types of models (regular NNs versus CNNs versus semantic segmentation). In particular, the CNN models work on 7 × 7 resolution windows, while we have applied more advanced semantic segmentation models, which work on the level of a pixel. Keeping in mind the finding from [28] that the addition of optical images on top of SAR improved the results by over 10%, we expect that our models would perform comparably well or outperform these previous works if applied to combined SAR and optical imagery.

In terms of the deep learning setup, the studies most similar to ours are [53] and [70]. However, RapidEye optical imagery at 5 m spatial resolution was used in [53], and the test site was considerably smaller. Study [70], similarly to our research, relied exclusively on SAR imagery, however on fully polarimetric images acquired by RADARSAT-2 at a considerably better resolution. They developed an FCN-type semantic segmentation model "specifically designed for the classification of wetland complexes using PolSAR imagery". Using this model to classify eight wetland map classes, they achieved an overall accuracy of 93%. However, because their model is designed specifically for wetland complexes, it is not clear whether such a model would generalize to other types of areas. Compared to our study, they focused on a considerably smaller area (nearly the size of a single imagelet we used) and on a very specific task (wetland type mapping). Thus, it is not readily clear how general their approach is and how it compares to our presented approach.

4.4. Outlook and Future Work

There are several lines of potential improvement based on the results of this study, as well as future work directions.

First, using an even larger set of Sentinel-1 images can be recommended, since for supervised deep learning models large amounts of data are crucial. Here, we processed only 6,888 imagelets altogether, but deep learning algorithms typically become efficient only once they are trained with hundreds of thousands or millions of images.

Second, if SAR images and reference data of a higher resolution are used, we expect better classification performance, too, as smaller details could potentially be captured. Also, better agreement in the acquisition timing of the reference and SAR imagery can be recommended. The reference and training data should come from the same months or year if possible, and the reference maps should represent the reality as accurately as possible. The models in our experiments were certainly limited by CORINE's own limited accuracy.

Third, in this study we tested the effectiveness of off-the-shelf deep learning models for land cover mapping from SAR data. While the results show their effectiveness, it is also likely that novel types of models, specifically developed for radar data (such as [70]), will yield even better results. Based on our results, we suggest the DenseNet-based models as a starting point. In particular, one could develop deep learning models that directly handle the SLC data, which preserve the phase information.

Focusing on a single season is both an advantage and a limitation. Importantly, we have avoided confusion between SAR signatures that vary seasonally for several land cover classes. However, the multitemporal dynamics itself can potentially be used as an additional, useful class-discriminating parameter. Incorporating the seasonal dynamics of each land cover pixel (as a time series) is left for future work, perhaps with the additional need to incorporate recurrent neural networks into the approach.

As discussed in Section 3.1.1, it could be suitable to use more detailed (specific) land cover classes, as the aggregation of smaller LC classes into Level-1 CORINE classes is not exactly ecological, leading to the mixing of several distinct SAR signatures in one class and thus causing additional confusion for the classifier. The classified specific classes can later be aggregated into larger classes, potentially showing improved performance [19].

Finally, we used only SAR images and a freely available DEM model for the presented large-scale land cover mapping. If one were to combine other types of remote sensing images, in particular optical images, we expect that the results would significantly improve. This is true for those areas where such imagery can be collected given the cloud coverage, while in an operational scenario it would potentially require the use of at least two models (with and without optical satellite imagery). It is also important to assess the added value of SAR imagery with deep learning models when optical satellite images are available, as well as possible data fusion and decision fusion scenarios, before a decision on the mapping approach is made [19].

5. Conclusion

Our study demonstrated the potential for applying state-of-the-art semantic segmentation models to SAR image classification with high accuracy. Several models were benchmarked in a countrywide classification experiment using Sentinel-1 IW-mode SAR data, reaching nearly 91% overall classification accuracy with the best performing model (FC-DenseNet). Given that the 14 used Sentinel-1 scenes resulted in about 7,000 training images, this indicates a strong potential for using pre-trained CNNs for further fine-tuning, and seems particularly suitable when the number of training images is limited (to thousands or tens of thousands instead of millions). In addition to suggesting the best candidate semantic segmentation models for land cover mapping with SAR data (that is, the DenseNet-based models), our study offers baseline results against which newly proposed models should be evaluated. Several possible improvements were identified for future work, including the necessity of testing multitemporal approaches, data fusion, and very high-resolution SAR imagery, as well as developing models specifically for SAR; these will be addressed in future work.


models. The main features that distinguish the DeepLab model from FCNs are the atrous convolutions for upsampling and the application of probabilistic machine learning models, concretely conditional random fields (CRFs), for finer localization accuracy in the final fully connected layer. Atrous convolutions, in particular, allow enlarging the context from which the next-layer feature maps are learned, while preserving the number of parameters (and thus the same efficiency). Using a chain of atrous convolutions allows computing the final output layer of a CNN at an arbitrarily high resolution (removing the need for the upsampling part used in FCNs). In the follow-up work proposing DeepLab-V3, Chen et al. [98] changed the approach to atrous convolutions to gradually double the atrous rates, and showed that the adapted version of their new algorithm outperforms the previous one even without the fully connected CRF layer. Finally, in the newest adaptation of the model, called DeepLab-V3+, Chen et al. [77] turned to an approach similar to the FCNs, i.e., they added a decoder module to the architecture (see Figure 7). That is, they employ the features extracted by the DeepLab-V3 module in the encoder part, and add a decoder module consisting of 1×1 and 3×3 convolutions.
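To make the parameter-preservation property concrete, the following minimal TensorFlow sketch (a representative setup of our own, not code from the study) contrasts a standard 3×3 convolution with an atrous one, and chains growing dilation rates as in DeepLab-V3:

    import tensorflow as tf

    # Both layers below have identical parameter counts (3x3x64x64 kernels),
    # but the dilated one aggregates context from a 5x5 neighbourhood.
    x = tf.keras.Input(shape=(512, 512, 64))
    standard = tf.keras.layers.Conv2D(64, 3, padding='same')(x)
    atrous = tf.keras.layers.Conv2D(64, 3, padding='same', dilation_rate=2)(x)

    # Chaining convolutions with gradually doubled dilation rates grows the
    # receptive field exponentially while keeping the full spatial resolution.
    y = x
    for rate in (1, 2, 4, 8):
        y = tf.keras.layers.Conv2D(64, 3, padding='same', dilation_rate=rate,
                                   activation='relu')(y)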

Figure 7: The architecture of DeepLabV3+ [77].

3.4.5. FRRN-B (Full-Resolution Residual Networks)

As we have seen, most of the semantic segmentation architectures are based on some form of an FCN, and so they utilize existing classification networks, such as ResNet or VGG16, as encoders. We also discussed the main reason for such


Figure 8: The architecture of FRRN-B. RU n and FRRU n stand for residual units and full-resolution residual units with n-channel convolutions, respectively. FRRUs simultaneously operate on the two streams [81].

approaches, which is to take advantage of the learned weights from those architectures pre-trained for the classification task. Nevertheless, one disadvantage of the FCN approach is that the resulting network outputs of the encoder part (particularly after the pooling operations) are at a lower resolution, which deteriorates the localization performance of the overall segmentation model. Pohlen et al. [81] proposed to tackle this by having two parallel network streams processing the input image: a pooling and a residual stream (Figure 8). As the name suggests, the pooling stream performs successive pooling and then unpooling operations, and it serves to obtain good recognition of the objects and classes. The residual stream computes residuals at the full image resolution, which ensures that low-level features, i.e., object pixel-level locations, are propagated to the network output. The name of the model comes from its building blocks, i.e., full-resolution residual units. Each such unit simultaneously operates on the pooling and the residual stream. In the original paper [81], the authors propose two alternative architectures, FRRN-A and FRRN-B, and show that


FRRN-B achieves superior performance on the Cityscapes benchmark dataset; hence, we employ the FRRN-B architecture.
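The two-stream idea can be illustrated with a short, hypothetical sketch of one full-resolution residual unit; the function name frru, the channel counts and the pooling factor are illustrative assumptions, not the authors' implementation:

    import tensorflow as tf
    from tensorflow.keras import layers

    def frru(y, z, channels, scale):
        # y: pooling stream (low resolution); z: residual stream (full
        # resolution). Both streams are read, and both are updated.
        z_down = layers.MaxPooling2D(pool_size=scale)(z)   # bring z to y's scale
        h = layers.Concatenate()([y, z_down])              # fuse the two streams
        h = layers.Conv2D(channels, 3, padding='same', activation='relu')(h)
        h = layers.Conv2D(channels, 3, padding='same', activation='relu')(h)
        # Project the fused features back onto the full-resolution stream.
        z_res = layers.Conv2D(z.shape[-1], 1, padding='same')(h)
        z_res = layers.UpSampling2D(size=scale)(z_res)
        return h, layers.Add()([z, z_res])

    # Example: a 1/4-resolution pooling stream alongside a full-resolution,
    # 32-channel residual stream.
    y0 = tf.keras.Input(shape=(128, 128, 96))
    z0 = tf.keras.Input(shape=(512, 512, 32))
    y1, z1 = frru(y0, z0, channels=96, scale=4)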

3.4.6. PSPNet (Pyramid Scene Parsing Network)

Figure 9: The architecture of PSPNet [75].

Zhao et al. [75] propose the Pyramid Scene Parsing Network as a solution to the challenge of making local predictions based on the local context only, without considering the global image scene. In remote sensing, an example of this challenge occurring could be a model wrongly predicting water with waves in it as the dry vegetation class, because the two appear similar and the model did not consider that these pixels are part of a larger water surface, i.e., it missed the global context. Similarly to the other FCN-based approaches, PSPNet uses a pre-trained classification architecture to extract the feature map, in this case ResNet. The main module of this network is the pyramid pooling, which is enclosed by a square in Figure 9. As can be seen in the figure, this module fuses features at four scales, from the coarse (red) to the fine (green). Hence, the output of each level in the pyramid pooling module contains the feature map of a different resolution. In the end, the different features are stacked together, yielding the final pyramid pooling global feature for predictions.
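A sketch of the pyramid pooling idea is given below; the bin sizes (1, 2, 4, 8) are chosen so that they divide the feature-map size evenly (the original PSPNet uses bins of 1, 2, 3 and 6), and the depth-reduction width of 64 is an illustrative assumption:

    import tensorflow as tf
    from tensorflow.keras import layers

    def pyramid_pooling(feature_map, bin_sizes=(1, 2, 4, 8)):
        # Pool the backbone feature map into grids of several sizes, project
        # each pooled map, upsample it back, and stack everything into one
        # global context feature.
        h, w = feature_map.shape[1], feature_map.shape[2]
        branches = [feature_map]
        for bins in bin_sizes:
            p = layers.AveragePooling2D(pool_size=(h // bins, w // bins))(feature_map)
            p = layers.Conv2D(64, 1, activation='relu')(p)   # reduce depth
            p = layers.UpSampling2D(size=(h // bins, w // bins),
                                    interpolation='bilinear')(p)
            branches.append(p)
        return layers.Concatenate()(branches)

    # E.g. a 64x64 ResNet feature map of depth 512:
    f = tf.keras.Input(shape=(64, 64, 512))
    g = pyramid_pooling(f)   # 64 x 64 x (512 + 4*64)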

3.4.7. FC-DenseNet (Fully Convolutional DenseNets)

This semantic segmentation algorithm is built using the DenseNet CNN [88] as a basis for the encoder, followed by applying the FCN approach [82]. The specificity of the DenseNet architecture is the presence of blocks in which each layer is connected to all the other layers in a feed-forward manner. Figure 10 shows the architecture of FC-DenseNet, where the blocks are represented by the Dense Block units. According to [88], such an architecture scales well to hundreds of layers without any optimization issues, while yielding excellent results in classification tasks. In order to efficiently upsample the DenseNet feature maps, Jegou et al. [82] substitute the upsampling convolutions of FCNs with Dense Blocks and Transitions Up. The Transition Up modules consist of transposed convolutions, which are then concatenated with the outputs from the input skip connection (the dashed lines in Figure 10).
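The dense connectivity pattern can be sketched as follows; the number of layers and the growth rate are illustrative assumptions:

    import tensorflow as tf
    from tensorflow.keras import layers

    def dense_block(x, num_layers=4, growth_rate=16):
        # Every layer receives the concatenation of all preceding feature
        # maps (feed-forward dense connectivity); the block output stacks
        # only the newly produced maps, as in FC-DenseNet.
        features = [x]
        for _ in range(num_layers):
            h = layers.Concatenate()(features) if len(features) > 1 else features[0]
            h = layers.BatchNormalization()(h)
            h = layers.ReLU()(h)
            h = layers.Conv2D(growth_rate, 3, padding='same')(h)
            features.append(h)
        return layers.Concatenate()(features[1:])

    x = tf.keras.Input(shape=(64, 64, 48))
    y = dense_block(x)   # 64 x 64 x (4 * 16) new feature maps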


Figure 10: The architecture of FC-DenseNet [82].

3.5. Training approach

To accomplish better segmentation performance, there is an option to pre-train the semantic segmentation models (in particular, their encoder modules) using a larger set of available images of another type (such as natural images). When the model pre-trained on natural images is then further trained with the limited set of SAR images, the knowledge becomes effectively transferred from the natural-image task to the SAR task [99]. To accomplish such a transfer, we used models whose encoders were pre-trained for the ImageNet classification task, and fine-tuned them using our SAR dataset (described next).
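A minimal sketch of this transfer-learning setup is shown below, assuming a Keras ResNet101 encoder and a simple FCN-32s-like head; the actual models used here attach the more elaborate decoders described in Section 3.4:

    import tensorflow as tf

    NUM_CLASSES = 5   # the 5 Level-1 CORINE classes
    # ImageNet-pre-trained encoder, kept trainable so that the pre-trained
    # filters are fine-tuned on the SAR dataset rather than frozen.
    encoder = tf.keras.applications.ResNet101(weights='imagenet',
                                              include_top=False,
                                              input_shape=(512, 512, 3))
    x = encoder.output                                    # 16x16 feature map
    x = tf.keras.layers.Conv2D(NUM_CLASSES, 1)(x)         # per-class scores
    x = tf.keras.layers.UpSampling2D(32, interpolation='bilinear')(x)
    model = tf.keras.Model(encoder.input, x)              # back to 512x512

    model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=1e-4),
                  loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True))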

3.6. Experimental Setup

In this section, we first describe how we prepared the SAR images for training with the deep learning models, which are designed for natural images. Then, we provide the details of our implementation.

3.6.1. SAR Data Preprocessing for Deep Learning

Sentinel-1 imagery comes in two polarization channels, each of them being informative about certain types of land cover. Hence, using their combination is expected to yield better land cover mapping results than using either of them independently. Moreover, previous work has shown the benefits of also using


a DEM for land cover mapping [9]. Hence, as the third layer, we used the DEM of Finland from the National Land Survey.

SAR backscatter for both polarizations was converted to decibels by applying the 10·log10 transformation. In addition, for the deep learning models, each band should be normalized so that the distribution of the pixel values resembles a Gaussian distribution centered at zero; this is done to yield faster convergence during training. To normalize the data, the mean of all pixels is subtracted from each pixel value, which is then divided by their standard deviation. In addition, given that the semantic segmentation models expect pixel values in the range (0, 255), we scaled the normalized data, and also the DEM values, to this range. The layers preprocessed in this way were then used to create the image dataset for training.

We named the created dataset SAR RGB-DEM. The naming comes from the process used to create the images in this dataset: one of the two channels of a Sentinel-1 image is assigned to the R and the other to the G channel, while for the third, B channel, we use the DEM layer.
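The preprocessing pipeline can be summarized with the following sketch; the array names (vv, vh, dem), the per-band statistics and the epsilon guard against log(0) are illustrative assumptions:

    import numpy as np

    def to_training_image(vv, vh, dem, eps=1e-6):
        # vv, vh: co-registered backscatter intensity arrays; dem: matching
        # elevation array. Dataset-wide statistics could be substituted for
        # the per-band statistics used here.
        bands = []
        for band in (10.0 * np.log10(vv + eps),   # backscatter to decibels
                     10.0 * np.log10(vh + eps),
                     dem.astype(np.float64)):     # DEM used as-is (no dB)
            band = (band - band.mean()) / band.std()            # zero mean, unit variance
            band = (band - band.min()) / (band.max() - band.min()) * 255.0
            bands.append(band)                                  # scaled to (0, 255)
        # R = one polarization, G = the other, B = DEM.
        return np.stack(bands, axis=-1).astype(np.uint8)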

3.6.2. Train/Development and Test (Accuracy Assessment) Dataset

The images from the SAR RGB-DEM dataset needed to be split into 512px × 512px partial images (further in the text called imagelets) for training. Each imagelet thus represented an area of roughly 10 × 10 km². The first reason for this preprocessing is the square shape: some of the selected models required square-shaped images. Others were flexible with the image shape and size, but we wanted to make the setup the same for all the models so that their results are comparable. The second reason for the preprocessing is the computational capacity: with our hardware setup (described below), this was the largest image size that we could work with.

Upon splitting the SAR RGB-DEM images, we discarded those imagelets that were completely outside the land mass area, as well as those for which we did not have a complete CORINE label (such as when they fell partly outside the Finnish borders). This resulted in more than 7000 imagelets of size 512px × 512px.
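A sketch of this tiling and filtering step is given below; the use of 255 as a no-data label value is an illustrative assumption, and the land-mass check is simplified to the label coverage test:

    import numpy as np

    def split_into_imagelets(image, labels, tile=512, nodata=255):
        # Cut a scene and its CORINE raster into aligned 512px x 512px
        # imagelets, skipping tiles without a complete reference label.
        pairs = []
        rows, cols = image.shape[0] // tile, image.shape[1] // tile
        for r in range(rows):
            for c in range(cols):
                win = (slice(r * tile, (r + 1) * tile),
                       slice(c * tile, (c + 1) * tile))
                img, lab = image[win], labels[win]
                if (lab == nodata).any():   # incomplete reference coverage
                    continue
                pairs.append((img, lab))
        return pairs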

Given the geography of Finland, to have representative training data it seems useful to include imagelets from both the northern and the southern (including the large cities) parts of the country in the model training. On the other hand, noticeable differences are also found in the gradient from the east to the west of the country. To achieve a representative training dataset, we selected all imagelets between the longitudes of 24° and 28° for the accuracy assessment (model testing), and all the other imagelets for the model training (that is, training and development in the computer vision terminology). In this way, we prevented the situation in which two images of the same area, but acquired at different times, are used one for training and the other for testing. Images overlapping any border of the introduced strip were discarded. The procedure resulted in 3104 images in the training and development set and 3784 images in the test (accuracy assessment) set. Finally, we used 60% of the training and development set for training, and the rest for the development of the deep learning models.


Table 3: The properties of the examined semantic segmentation architectures.

Architecture    Base model       Parameters
BiSeNet         ResNet101        24.75M
SegNet          VGG16            34.97M
Mobile U-Net    Not applicable    8.87M
DeepLabV3+      ResNet101        47.96M
FRRN-B          ResNet101        24.75M
PSPNet          ResNet101        56M
FC-DenseNet     ResNet101         9.27M

3.6.3. Data Augmentation

Further, we employed data augmentation. The main idea behind data augmentation is to enable improved learning by reusing original images with slight transformations, such as rotation, flipping, adding Gaussian noise, or slightly changing the brightness. This provides additional information to the model, and the dataset size is effectively increased. Moreover, an additional benefit of data augmentation is in helping the model to learn some invariant data properties for which no examples are present in the original dataset. Given the sensitivity of the SAR backscatter, we did not want to augment the images in terms of color or brightness, or by adding noise. However, we could safely employ rotations and flipping. For rotations, we used only the 90° increments, giving three possible rotated versions of an image. For image flipping, we applied horizontal and vertical flipping, or both at the same time, giving another three possible versions of the original image.⁴ Notice that our images are square, so the transformations did not change the image dimensions. Finally, we applied online augmentation, as opposed to the offline version. In the online process, each augmented image is seen only once, and so this process yields a network that generalises better.
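The six geometric variants can be sketched as follows (the generator form is an illustrative choice; in the online setting, one variant, or the original, would be sampled on the fly each time an imagelet is fed to the network):

    import numpy as np

    def augmentations(image, label):
        # The three 90-degree rotations plus horizontal, vertical and
        # combined flips, applied identically to image and label mask.
        for k in (1, 2, 3):                                  # 90, 180, 270 degrees
            yield np.rot90(image, k), np.rot90(label, k)
        yield np.flipud(image), np.flipud(label)             # vertical flip
        yield np.fliplr(image), np.fliplr(label)             # horizontal flip
        yield np.flipud(np.fliplr(image)), np.flipud(np.fliplr(label))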

3.6.4. Implementation

To apply the described semantic segmentation models, we adapted the open-source Semantic Segmentation Suite. We used Python with the TensorFlow [100] backend.

3.6.5. Hardware and Training Setup

We trained and tested each of the deep learning models separately on a single GPU (NVIDIA GeForce GTX 1080) on a machine with 32 GB of RAM.

We used the RMSProp optimisation algorithm, a learning rate of 0.0001, and a decay of the learning rate of 0.9954. Each model was trained for an equal number of epochs (500), and during the process the checkpoint for the best

⁴ The vertical flip operation switches between top-left and bottom-left image origin (reflection along the central horizontal axis); the horizontal flip switches between top-left and top-right image origin (reflection along the central vertical axis).


model was saved. We then used that model for the evaluation on the test set, and we report those results.
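A sketch of this optimisation setup in TensorFlow/Keras is given below; the batch size, the per-epoch staircase scheduling and the checkpoint criterion are assumptions on top of the stated hyperparameters:

    import tensorflow as tf

    # RMSProp, initial learning rate 0.0001, learning-rate decay 0.9954
    # applied once per epoch, 500 epochs, keeping the best checkpoint for
    # the final test-set evaluation.
    steps_per_epoch = 1862 // 8   # e.g. ~60% of 3104 training imagelets, batch 8
    schedule = tf.keras.optimizers.schedules.ExponentialDecay(
        initial_learning_rate=1e-4,
        decay_steps=steps_per_epoch,
        decay_rate=0.9954,
        staircase=True)
    optimizer = tf.keras.optimizers.RMSprop(learning_rate=schedule)

    checkpoint = tf.keras.callbacks.ModelCheckpoint(
        'best_model.h5', monitor='val_loss', save_best_only=True)
    # model.fit(train_data, validation_data=dev_data, epochs=500,
    #           callbacks=[checkpoint])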

3.7. Evaluation Metrics

In their review of the metrics used in land cover classification, Costa et al. [101] found a lack of consistency, complicating the intercomparison of different studies. To avoid such issues and to ensure that our results are easily comparable with the literature, we evaluated our models thoroughly. For each model and class, we report the following measures of accuracy: precision, also known as user's accuracy (UA); recall, also known as producer's accuracy (PA); overall accuracy; and the Kappa coefficient. The formulas are as follows.

For each segmentation class (land cover type) $c$, we calculate the precision (user's accuracy)
$$P_c = \frac{Tp_c}{Tp_c + Fp_c}$$
and the recall (producer's accuracy)
$$R_c = \frac{Tp_c}{Tp_c + Fn_c},$$
where $Tp_c$ represents true positive, $Fp_c$ false positive, and $Fn_c$ false negative pixels for the class $c$.

When it comes to accuracy [102], we calculate the per-class accuracy⁵
$$Acc_c = \frac{C_{ii}}{G_i}$$
and the overall pixel accuracy
$$Acc_{OP} = \frac{\sum_{i=1}^{L} C_{ii}}{\sum_{i=1}^{L} G_i},$$
where $C_{ij}$ is the number of pixels having a ground truth label $i$ and being classified/predicted as $j$, $G_i$ is the total number of pixels labelled with $i$, and $L$ is the number of classes. All these metrics can take values from 0 to 1.

Finally, we also use the Kappa statistic (Cohen's measure of agreement), indicating how the classification results compare to values assigned by chance [103]. The Kappa statistic can take values from 0 to 1. Starting from a $k \times k$ confusion matrix with elements $f_{ij}$, the following calculations are done:

⁵ Effectively, the per-class accuracy is defined as the recall obtained on each class.


$$P_o = \frac{1}{N}\sum_{j=1}^{k} f_{jj} \quad (1)$$
$$r_i = \sum_{j=1}^{k} f_{ij}\ \forall i \quad \text{and} \quad c_j = \sum_{i=1}^{k} f_{ij}\ \forall j \quad (2)$$
$$P_e = \frac{1}{N^2}\sum_{i=1}^{k} r_i c_i \quad (3)$$
where $P_o$ is the observed proportional agreement (effectively the overall accuracy), $r_i$ and $c_j$ are the row and column totals for classes $i$ and $j$, and $P_e$ is the expected proportion of agreement. The final measure of agreement is given by the statistic [103]
$$\kappa = \frac{P_o - P_e}{1 - P_e}. \quad (4)$$

Depending on the value of Kappa, the observed agreement is considered either poor (0.0 to 0.2), fair (0.2 to 0.4), moderate (0.4 to 0.6), good (0.6 to 0.8), or very good (0.8 to 1.0).
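All of the above metrics can be computed directly from the confusion matrix; the following sketch summarizes the calculations:

    import numpy as np

    def metrics_from_confusion(f):
        # f: k x k confusion matrix, rows = ground truth, columns = predictions.
        n = f.sum()
        diag = np.diag(f)
        precision = diag / f.sum(axis=0)    # user's accuracy, per class
        recall = diag / f.sum(axis=1)       # producer's accuracy, per class
        overall = diag.sum() / n            # P_o, the overall accuracy
        p_e = (f.sum(axis=1) * f.sum(axis=0)).sum() / n**2   # chance agreement
        kappa = (overall - p_e) / (1.0 - p_e)
        return precision, recall, overall, kappa

    # Applied to the confusion matrix of Table 5, this should reproduce the
    # FC-DenseNet figures reported in Table 4 (kappa of about 0.785).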

4. Results and Discussion

Using the experimental setup described in the previous section, we evaluated the seven selected semantic segmentation models: SegNet [74], PSPNet [75], BiSeNet [76], DeepLabV3+ [77, 78], U-Net [79, 80], FRRN-B [81], and FC-DenseNet [82]. The overall classification performance statistics for all studied models are gathered in Table 4. Figure 11 shows maps produced for several imagelets with the best performing model, FC-DenseNet. The obtained results are compared to prior work, and the classification performance for the different land cover classes is discussed further below.

4.1. Classification Performance

All the models performed relatively well in terms of classification, achieving an overall accuracy above 83%. Three models performed particularly well, achieving an accuracy score above 89%: SegNet, FRRN-B, and the best performing model, FC-DenseNet, which achieved an accuracy of 90.7%.

Before further analysis, let us recall that CORINE is not exclusively a land cover map, but rather a land cover and land use map; thus, specific classes can differ from the ecological classes observed by Sentinel-1. Also, the aggregation to Level-1 is sometimes not strictly "ecological", nor does it comply with physical surface scattering considerations. For example, roads, airports, major industrial areas and the road network often exhibit areas similar to field; the presence of trees and green


Table 4: Summary of the classification performance and efficiency of the various deep learning models (accuracies given as UA/PA in %; UA – user's accuracy, PA – producer's accuracy; average inference time is per image in the dataset).

LC class                           Test scale (km²)   BiSeNet   DeepLabV3+   SegNet   FRRN-B   U-Net    PSPNet   FC-DenseNet
Urban fabric (100)                  1 081.6            26/21     15/14        36/31    38/30    45/25    38/18    62/27
Agricultural areas (200)            2 516.0            49/51     50/49        69/66    68/68    66/66    53/48    72/71
Forested areas (300)               28 546.2            90/91     88/96        93/94    92/95    92/95    89/95    93/96
Peatland, bogs and marshes (400)    2 099.0            54/43     56/13        67/57    71/55    70/52    65/31    74/58
Water bodies (500)                  5 356.4            85/91     94/92        96/96    95/96    96/96    94/94    96/96
Overall accuracy (%)                                   83.86     85.49        89.03    89.27    89.25    86.51    90.66
Kappa                                                  0.641     0.649        0.754    0.758    0.754    0.680    0.785
Average inference time (s)                             0.0389    0.0267       0.0761   0.1424   0.0848   0.0495   0.1930

Table 5: Confusion matrix for the classification with the FC-DenseNet103 model.

                                     Sentinel-1 class
CLC2012         urban        water        forest        field      peatland         total   PA (%)
1 (urban)     7301999       413073      15892771      3212839        221476      27042158     27.0
2 (water)       78331    128294872       3457634       171029       1935276     133937142     95.8
3 (forest)    3663698      2703632     686788977     12795703       7730444     713682454     96.2
4 (field)      766200       121609      16527970     44866048        620934      62902761     71.3
5 (peatland)    56097      1866020      19164137      1091008      30309189      52486451     57.8
total        11866325    133399206     741831489     62136627      40817319     990050966
UA (%)           61.5         96.2          92.6         72.2          74.3                    90.7

Figure 11: Illustration of the FC-DenseNet model performance: a selection of classification results, i.e., the direct output of the network without any post-processing (bottom row), versus the reference CORINE data (upper row).

vegetation near summer cottages can cause them to exhibit signatures closer to forest than to urban; forest on rocky terrain can sometimes be misclassified as urban due to the presence of very bright targets and strong disruptive features; and confusion between peatland and field areas is also commonplace. Finally, the accuracy of the CORINE data itself is only somewhat higher than 90%.

As for the results across the different land cover classes, all the models performed particularly well in recognising water bodies and forested areas, while urban fabric was the most challenging class for all the models. We expect that the inclusion of the DEM as one layer in the training images helped to achieve the good results on the water bodies class for most of the models (except for BiSeNet, all the models achieved both user's and producer's accuracy above 90%). The urban class was particularly challenging for the following main reasons. First, this class changes the most, as new houses, roads and urban areas are built. While we took the most suitable available CORINE class in terms of time for our Sentinel-1 images, there are almost certainly


differences between the urban class as it was in 2012 and in 2015-2016. Second, the CORINE map itself does not have perfect accuracy, and neither are the aggregation rules perfect. As a matter of fact, in the majority of studies where SAR-based classification was compared against CLC or similar data, a poor or modest overall agreement was observed for this class [21, 41, 83, 20], while the user's accuracy was markedly higher than the producer's [104]. The latter is precisely because radar senses sharp boundaries and bright targets very well, whereas such bright targets often do not dominate the whole CORINE Level-1 urban class. We argue that any inaccuracies present will be particularly pronounced in our models for the urban class, because of the sharp and sudden boundary changes in this class, unlike for others such as forest and water. The top performing model, FC-DenseNet, performed the best across all the classes. It is particularly notable that it achieved a user's accuracy, i.e., precision, of 62% for the urban class, improving on all the other models significantly. Nevertheless, its producer's accuracy, i.e., recall, of 27% on this class is outperformed by two other top models, SegNet and FRRN-B.

We mentioned the issues of SAR backscatter sensitivity to several ground factors, whereby the same classes might appear differently in the images between countries or between distant areas within a country. An interesting indication of our study, however, is that deep learning models might be able to deal with this issue. Namely, we used models pre-trained on ImageNet and fine-tuned them with a relatively small number (14) of Sentinel-1 scenes. The models learned to recognize varying types of backscattering signal across the country of Finland. This indicates that, with a similar type of fine-tuning, the present models could be relatively easily adapted to other areas and countries with different SAR backscattering patterns. Such robustness and adaptability of deep learning models comes from their automatic learning of feature representations, without the need for a human to pre-define those features.

4.2. Computational Performance

With our hardware configuration, training took from 6 days up to 2 weeks for the different models. This could be significantly improved by training each model on a multi-GPU system instead of the single GPU that we used.

In terms of the inference time, we also saw differences in performance. In Table 4, we present the average inference time per 512px × 512px imagelet. The results show that there is a trade-off between classification and computational performance: the best models in terms of classification results (i.e., FC-DenseNet and FRRN-B) take a several times longer inference time compared to the rest. Depending on the application, this might or might not be of particular importance.

4.3. Comparison to Similar Work

The obtained results compare favourably to previous similar studies on land cover classification with SAR data [83, 20, 21, 41, 28, 31]. Depending on the level of class aggregation (4-5 major classes or more), and using mostly statistical


or classical machine learning approaches, the reported classification accuracies ranged from as high as 80-87% to as low as 30% when only SAR imagery was used.

Two recent studies that employed neural networks for SAR imagery classification (albeit in combination with satellite optical data) for land cover mapping were [28] and [66], with reported classification accuracies of up to 97.5% and 94.6%, respectively.

The best model in our experiments achieved an overall accuracy of 90.7%. However, our results were obtained using solely SAR imagery. In contrast, SAR imagery (PALSAR) alone yielded an overall accuracy of 78.1% in [28]. The types of classes they studied are also different compared to ours (crops versus vegetation versus land cover types), and our study was performed over a larger area. Importantly, the previous studies applied different types of models (regular NNs versus CNNs versus semantic segmentation). In particular, the CNN models work on 7 × 7 resolution windows, while we applied more advanced semantic segmentation models, which work at the level of a pixel. Keeping in mind the finding from [28] that the addition of optical images on top of SAR improved the results by over 10%, we expect that our models would perform comparably well, or outperform these previous works, if applied to combined SAR and optical imagery.

In terms of the deep learning setup, the studies most similar to ours are [53] and [70]. However, RapidEye optical imagery at 5 m spatial resolution was used in [53], and the test site was considerably smaller. Study [70], similarly to our research, relied exclusively on SAR imagery, however on fully polarimetric images acquired by RADARSAT-2 at a considerably better resolution. They developed an FCN-type semantic segmentation model 'specifically designed for the classification of wetland complexes using PolSAR imagery'. Using this model to classify eight wetland map classes, they achieved an overall accuracy of 93%. However, because their model is designed specifically for wetland complexes, it is not clear whether such a model would generalize to other types of areas. Compared to our study, they focused on a considerably smaller area (nearly the size of a single imagelet we used) and on a very specific task (wetland type mapping). Thus, it is not readily clear how general their approach is and how it compares to our presented approach.

4.4. Outlook and Future Work

There are several lines of potential improvement based on the results of this study, as well as future work directions.

First, using an even larger set of Sentinel-1 images can be recommended, since large amounts of data are crucial for supervised deep learning models. Here, we processed only 6888 imagelets altogether, but deep learning algorithms typically become efficient only once they are trained with hundreds of thousands or millions of images.

Second, if SAR images and reference data of a higher resolution are used, we expect better classification performance too, as smaller details could potentially be captured. Also, better agreement in the acquisition timing of the reference and


SAR imagery can be recommended: the reference and training data should come from the same months or year if possible, and the reference maps should represent the reality as accurately as possible. The models in our experiments were certainly limited by CORINE's own limited accuracy.

Third, in this study we tested the effectiveness of off-the-shelf deep learning models for land cover mapping from SAR data. While the results show their effectiveness, it is also likely that novel types of models developed specifically for radar data (such as [70]) will yield even better results. Based on our results, we suggest DenseNet-based models as a starting point. In particular, one could develop deep learning models that handle SLC data directly, which preserve the phase information.

Focusing on a single season is both an advantage and a limitation. Importantly, we have avoided confusion between SAR signatures that vary seasonally for several land cover classes. However, multitemporal dynamics itself can potentially be used as an additional, useful class-discriminating parameter. Incorporating the seasonal dynamics of each land cover pixel (as a time series) is left for future work, perhaps with the additional need to incorporate recurrent neural networks into the approach.

As discussed in Section 3.1.1, it could be suitable to use more detailed (specific) land cover classes, as the aggregation of smaller LC classes into Level-1 CORINE classes is not exactly ecological, leading to the mixing of several distinct SAR signatures in one class and thus causing additional confusion for the classifier. Afterwards, the classified specific classes can be aggregated into larger classes, potentially showing improved performance [19].

Finally, we used only SAR images and a freely available DEM for the presented large-scale land cover mapping. If one were to combine other types of remote sensing images, in particular optical images, we expect that the results would improve significantly. This holds for those areas where such imagery can be collected given cloud coverage, while in an operational scenario it would potentially require the use of at least two models (with and without optical satellite imagery). It is also important to assess the added value of SAR imagery with deep learning models when optical satellite images are available, as well as possible data fusion and decision fusion scenarios, before a decision on the mapping approach is made [19].

5. Conclusion

Our study demonstrated the potential for applying state-of-the-art semantic segmentation models to SAR image classification with high accuracy. Several models were benchmarked in a countrywide classification experiment using Sentinel-1 IW-mode SAR data, reaching nearly 91% overall classification accuracy with the best performing model (FC-DenseNet). Given that the 14 used Sentinel-1 scenes resulted in about 7000 training images, this indicates strong potential for using pre-trained CNNs for further fine-tuning, and seems particularly suitable when the number of training images is limited (to thousands or tens of thousands instead of millions). In addition to suggesting the best candidate


semantic segmentation models for land cover mapping with SAR data (that is, the DenseNet-based models), our study offers baseline results against which newly proposed models should be evaluated. Several possible improvements for future work were identified, including the necessity of testing multitemporal approaches, data fusion and very high-resolution SAR imagery, as well as developing models specifically for SAR; these will be addressed in future work.

References

[1] S. Bojinski, M. Verstraete, T. C. Peterson, C. Richter, A. Simmons, M. Zemp, The concept of essential climate variables in support of climate research, applications, and policy, Bulletin of the American Meteorological Society 95 (9) (2014) 1431–1443.

[2] G. Büttner, J. Feranec, G. Jaffrain, L. Mari, G. Maucha, T. Soukup, The CORINE land cover 2000 project, EARSeL eProceedings 3 (3) (2004) 331–346.

[3] M. Bossard, J. Feranec, J. Otahel, et al., CORINE land cover technical guide: Addendum 2000 (2000).

[4] G. Büttner, CORINE land cover and land cover change products, in: Land Use and Land Cover Mapping in Europe, Springer, 2014, pp. 55–74.

[5] M. Törmä, T. Markkanen, S. Hatunen, P. Härmä, O.-P. Mattila, A. Arslan, Assessment of land-cover data for land-surface modelling in regional climate studies, Boreal Environment Research 20 (2) (2015) 243–260.

[6] J. Chen, J. Chen, A. Liao, X. Cao, L. Chen, X. Chen, C. He, G. Han, S. Peng, M. Lu, et al., Global land cover mapping at 30 m resolution: A POK-based operational approach, ISPRS Journal of Photogrammetry and Remote Sensing 103 (2015) 7–27.

[7] C. A. d. Almeida, A. C. Coutinho, J. C. D. M. Esquerdo, M. Adami, A. Venturieri, C. G. Diniz, N. Dessay, L. Durieux, A. R. Gomes, High spatial resolution land use and land cover mapping of the Brazilian Legal Amazon in 2008 using Landsat-5/TM and MODIS data, Acta Amazonica 46 (3) (2016) 291–302.

[8] C. Homer, J. Dewitz, L. Yang, S. Jin, P. Danielson, G. Xian, J. Coulston, N. Herold, J. Wickham, K. Megown, Completion of the 2011 National Land Cover Database for the conterminous United States – representing a decade of land cover change information, Photogrammetric Engineering & Remote Sensing 81 (5) (2015) 345–354.

[9] Y. Zhao, D. Feng, L. Yu, X. Wang, Y. Chen, Y. Bai, H. J. Hernández, M. Galleguillos, C. Estades, G. S. Biging, et al., Detailed dynamic land cover mapping of Chile: Accuracy improvement by integrating multi-temporal data, Remote Sensing of Environment 183 (2016) 170–185.

[10] P. Griffiths, C. Nendel, P. Hostert, Intra-annual reflectance composites from Sentinel-2 and Landsat for national-scale crop and land cover mapping, Remote Sensing of Environment 220 (2019) 135–151.

[11] R. Torres, P. Snoeij, D. Geudtner, D. Bibby, M. Davidson, E. Attema, P. Potin, B. Rommen, N. Floury, M. Brown, I. Traver, P. Deghaye, B. Duesmann, B. Rosich, N. Miranda, C. Bruno, M. L'Abbate, R. Croci, A. Pietropaolo, M. Huchler, F. Rostan, GMES Sentinel-1 mission, Remote Sensing of Environment 120 (2012) 9–24. doi:10.1016/j.rse.2011.05.028.

[12] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE 86 (11) (1998) 2278–2324.

[13] A. Krizhevsky, I. Sutskever, G. E. Hinton, ImageNet classification with deep convolutional neural networks, in: Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.

[14] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, CoRR abs/1409.1556 (2014).

[15] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning (2016).

[16] W. Cohen, S. Goward, Landsat's role in ecological applications of remote sensing, BioScience 54 (6) (2004) 535–545. doi:10.1641/0006-3568(2004)054[0535:LRIEAO]2.0.CO;2.

[17] S. Goetz, A. Baccini, N. Laporte, T. Johns, W. Walker, J. Kellndorfer, R. Houghton, M. Sun, Mapping and monitoring carbon stocks with satellite observations: A comparison of methods, Carbon Balance and Management 4 (2009). doi:10.1186/1750-0680-4-2.

[18] C. Atzberger, Advances in remote sensing of agriculture: Context description, existing operational monitoring systems and major information needs, Remote Sensing 5 (2) (2013) 949–981. doi:10.3390/rs5020949.

[19] T. Häme, J. Kilpi, H. A. Ahola, Y. Rauste, O. Antropov, M. Rautiainen, L. Sirro, S. Bounpone, Improved mapping of tropical forests with optical and SAR imagery, part I: Forest cover and accuracy assessment using multi-resolution data, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 6 (1) (2013) 74–91.

[20] O. Antropov, Y. Rauste, H. Astola, J. Praks, T. Häme, M. Hallikainen, Land cover and soil type mapping from spaceborne PolSAR data at L-band with probabilistic neural network, IEEE Transactions on Geoscience and Remote Sensing 52 (9) (2014) 5256–5270. doi:10.1109/TGRS.2013.2287712.

[21] A. Lönnqvist, Y. Rauste, M. Molinier, T. Häme, Polarimetric SAR data in land cover mapping in boreal zone, IEEE Transactions on Geoscience and Remote Sensing 48 (10) (2010) 3652–3662. doi:10.1109/TGRS.2010.2048115.

[22] B. Waske, M. Braun, Classifier ensembles for land cover mapping using multitemporal SAR imagery, ISPRS Journal of Photogrammetry and Remote Sensing 64 (5) (2009) 450–457. doi:10.1016/j.isprsjprs.2009.01.003.

[23] L. Bruzzone, M. Marconcini, U. Wegmüller, A. Wiesmann, An advanced system for the automatic classification of multitemporal SAR images, IEEE Transactions on Geoscience and Remote Sensing 42 (6) (2004) 1321–1334. doi:10.1109/TGRS.2004.826821.

[24] T. Ullmann, A. Schmitt, A. Roth, J. Duffe, S. Dech, H.-W. Hubberten, R. Baumhauer, Land cover characterization and classification of arctic tundra environments by means of polarized synthetic aperture X- and C-band radar (PolSAR) and Landsat 8 multispectral imagery – Richards Island, Canada, Remote Sensing 6 (9) (2014) 8565–8593. doi:10.3390/rs6098565. URL http://www.mdpi.com/2072-4292/6/9/8565

[25] N. Clerici, C. A. V. Calderón, J. M. Posada, Fusion of Sentinel-1A and Sentinel-2A data for land cover mapping: a case study in the lower Magdalena region, Colombia, Journal of Maps 13 (2) (2017) 718–726. doi:10.1080/17445647.2017.1372316.

[26] C. Castañeda, D. Ducrot, Land cover mapping of wetland areas in an agricultural landscape using SAR and Landsat imagery, Journal of Environmental Management 90 (7) (2009) 2270–2277.

[27] Y. Ban, H. Hu, I. M. Rangel, Fusion of QuickBird MS and RADARSAT SAR data for urban land-cover mapping: Object-based and knowledge-based approach, International Journal of Remote Sensing 31 (6) (2010) 1391–1410.

[28] G. V. Laurin, V. Liesenberg, Q. Chen, L. Guerriero, F. Del Frate, A. Bartolini, D. Coomes, B. Wilebore, J. Lindsell, R. Valentini, Optical and SAR sensor synergies for forest and land cover mapping in a tropical site in West Africa, International Journal of Applied Earth Observation and Geoinformation 21 (2013) 7–16.

[29] R. Khatami, G. Mountrakis, S. V. Stehman, A meta-analysis of remote sensing research on supervised pixel-based land-cover image classification processes: General guidelines for practitioners and future research, Remote Sensing of Environment 177 (2016) 89–100.

[30] B. Waske, M. Braun, Classifier ensembles for land cover mapping using multitemporal SAR imagery, ISPRS Journal of Photogrammetry and Remote Sensing 64 (5) (2009) 450–457.

[31] H. Balzter, B. Cole, C. Thiel, C. Schmullius, Mapping CORINE land cover from Sentinel-1A SAR and SRTM digital elevation model data using random forests, Remote Sensing 7 (11) (2015) 14876–14898. doi:10.3390/rs71114876.

[32] S.-E. Park, Variations of microwave scattering properties by seasonal freeze/thaw transition in the permafrost active layer observed by ALOS PALSAR polarimetric data, Remote Sensing 7 (12) (2015) 17135–17148. doi:10.3390/rs71215874.

[33] M. C. Dobson, L. E. Pierce, F. T. Ulaby, Knowledge-based land-cover classification using ERS-1/JERS-1 SAR composites, IEEE Transactions on Geoscience and Remote Sensing 34 (1) (1996) 83–99. doi:10.1109/36.481896.

[34] L. Sirro, T. Häme, Y. Rauste, J. Kilpi, J. Hämäläinen, K. Gunia, B. de Jong, F. Paz Pellat, Potential of different optical and SAR data in forest and land cover classification to support REDD+ MRV, Remote Sensing 10 (6) (2018). doi:10.3390/rs10060942.

[35] J. D. T. De Alban, G. M. Connette, P. Oswald, E. L. Webb, Combined Landsat and L-band SAR data improves land cover classification and change detection in dynamic tropical landscapes, Remote Sensing 10 (2) (2018). doi:10.3390/rs10020306.

[36] N. Longepe, P. Rakwatin, O. Isoguchi, M. Shimada, Y. Uryu, K. Yulianto, Assessment of ALOS PALSAR 50 m orthorectified FBD data for regional land cover classification by support vector machines, IEEE Transactions on Geoscience and Remote Sensing 49 (6) (2011) 2135–2150. doi:10.1109/TGRS.2010.2102041.

[37] T. Esch, A. Schenk, T. Ullmann, M. Thiel, A. Roth, S. Dech, Characterization of land cover types in TerraSAR-X images by combined analysis of speckle statistics and intensity information, IEEE Transactions on Geoscience and Remote Sensing 49 (6) (2011) 1911–1925. doi:10.1109/TGRS.2010.2091644.

[38] J. W. Cable, J. M. Kovacs, J. Shang, X. Jiao, Multi-temporal polarimetric RADARSAT-2 for land cover monitoring in northeastern Ontario, Canada, Remote Sensing 6 (3) (2014) 2372–2392. doi:10.3390/rs6032372. URL http://www.mdpi.com/2072-4292/6/3/2372

[39] X. Niu, Y. Ban, Multi-temporal RADARSAT-2 polarimetric SAR data for urban land-cover classification using an object-based support vector machine and a rule-based approach, International Journal of Remote Sensing 34 (1) (2013) 1–26. doi:10.1080/01431161.2012.700133.

[40] T. L. Evans, M. Costa, K. Telmer, T. S. F. Silva, Using ALOS/PALSAR and RADARSAT-2 to map land cover and seasonal inundation in the Brazilian Pantanal, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 3 (4) (2010) 560–575. doi:10.1109/JSTARS.2010.2089042.

[41] P. Lumsdon, S. R. Cloude, G. Wright, Polarimetric classification of land cover for Glen Affric radar project, IEE Proceedings – Radar, Sonar and Navigation 152 (6) (2005) 404–412. doi:10.1049/ip-rsn:20041313.

[42] C. da Costa Freitas, L. de Souza Soler, S. J. S. Sant'Anna, L. V. Dutra, J. R. Dos Santos, J. C. Mura, A. H. Correia, Land use and land cover mapping in the Brazilian Amazon using polarimetric airborne P-band SAR data, IEEE Transactions on Geoscience and Remote Sensing 46 (10) (2008) 2956–2970.

[43] G Li D Lu E Moran L Dutra M Batistella A comparative analysis ofalos palsar l-band and radarsat-2 c-band data for land-cover classificationin a tropical moist region ISPRS Journal of Photogrammetry and RemoteSensing 70 (2012) 26ndash38 doi101016jisprsjprs201203010

[44] N Park K Chi Integration of multitemporalpolarization cband sar datasets for landcover classification International Journal of Remote Sensing29 (16) (2008) 4667ndash4688 doi10108001431160801947341

[45] E Tomppo O Antropov J Praks Cropland classification using Sentinel-1 time series Methodological performance and prediction uncertainty as-sessment Remote Sensing 11 (21) (2019) doi103390rs11212480

[46] D B Nguyen A Gruber W Wagner Mapping rice extent and croppingscheme in the mekong delta using sentinel-1a data Remote Sensing Letters7 (12) (2016) 1209ndash1218

[47] A Veloso S Mermoz A Bouvet T Le Toan M Planells J-F DejouxE Ceschia Understanding the temporal behavior of crops using Sentinel-1 and Sentinel-2-like data for agricultural applications Remote Sensing ofEnvironment 199 (2017) 415ndash426

[48] G Satalino A Balenzano F Mattia M W Davidson C-band SARdata for mapping crops dominated by surface or volume scattering IEEEGeoscience and Remote Sensing Letters 11 (2) (2013) 384ndash388

[49] F Vicente-Guijalba A Jacob J M Lopez-Sanchez C Lopez-MartinezJ Duro C Notarnicola D Ziolkowski A Mestre-Quereda E PottierJ J Mallorqu M Lavalle M Engdahl Sincohmap Land-cover and veg-etation mapping using multi-temporal Sentinel-1 interferometric coher-ence in IGARSS 2018 - 2018 IEEE International Geoscience and RemoteSensing Symposium 2018 pp 6631ndash6634 doi101109IGARSS2018

8517926

34

[50] S Ge O Antropov W Su H Gu J Praks Deep recurrent neural net-works for land-cover classification using Sentinel-1 InSAR time series inIGARSS 2019 - 2019 IEEE International Geoscience and Remote SensingSymposium 2019 pp 473ndash476 doi101109IGARSS20198900088

[51] Y LeCun Y Bengio et al Convolutional networks for images speechand time series The handbook of brain theory and neural networks3361 (10) (1995) 1995

[52] X X Zhu D Tuia L Mou G-S Xia L Zhang F Xu F Fraun-dorfer Deep learning in remote sensing a review arXiv preprintarXiv171003959 (2017)

[53] M Mahdianpari B Salehi M Rezaee F Mohammadimanesh Y ZhangVery deep convolutional neural networks for complex land cover mappingusing multispectral remote sensing imagery Remote Sensing 10 (7) (2018)1119

[54] L Zhang L Zhang B Du Deep learning for remote sensing data Atechnical tutorial on the state of the art IEEE Geoscience and RemoteSensing Magazine 4 (2) (2016) 22ndash40

[55] J Zhang P Zhong Y Chen S Li l 12-regularized deconvolutionnetwork for the representation and restoration of optical remote sensingimages IEEE Transactions on Geoscience and Remote Sensing 52 (5)(2014) 2617ndash2627

[56] X Chen S Xiang C-L Liu C-H Pan Aircraft detection by deep beliefnets in Pattern Recognition (ACPR) 2013 2nd IAPR Asian Conferenceon IEEE 2013 pp 54ndash58

[57] X Chen S Xiang C-L Liu C-H Pan Vehicle detection in satelliteimages by hybrid deep convolutional neural networks IEEE Geoscienceand remote sensing letters 11 (10) (2014) 1797ndash1801

[58] Y Liu G Cao Q Sun M Siegel Hyperspectral classification viadeep networks and superpixel segmentation International Journal of Re-mote Sensing 36 (13) (2015) 3459ndash3482 doi101080014311612015

1055607

[59] J Wang Q Qin Z Li X Ye J Wang X Yang X Qin Deep hierarchicalrepresentation and segmentation of high resolution remote sensing imagesin Geoscience and Remote Sensing Symposium (IGARSS) 2015 IEEEInternational IEEE 2015 pp 4320ndash4323

[60] D Tuia R Flamary N Courty Multiclass feature learning for hyperspec-tral image classification Sparse and hierarchical solutions ISPRS Journalof Photogrammetry and Remote Sensing 105 (2015) 272ndash285

35

[61] F Hu G-S Xia J Hu L Zhang Transferring deep convolutional neu-ral networks for the scene classification of high-resolution remote sensingimagery Remote Sensing 7 (11) (2015) 14680ndash14707

[62] O A B Penatti K Nogueira J A dos Santos Do deep features gener-alize from everyday objects to remote sensing and aerial scenes domainsin 2015 IEEE Conference on Computer Vision and Pattern Recogni-tion Workshops (CVPRW) 2015 pp 44ndash51 doi101109CVPRW20157301382

[63] F P Luus B P Salmon F Van den Bergh B T J Maharaj Multiviewdeep learning for land-use classification IEEE Geoscience and RemoteSensing Letters 12 (12) (2015) 2448ndash2452

[64] F Zhang B Du L Zhang Scene classification via a gradient boostingrandom convolutional network framework IEEE Transactions on Geo-science and Remote Sensing 54 (3) (2016) 1793ndash1802

[65] T Ishii R Nakamura H Nakada Y Mochizuki H Ishikawa Surfaceobject recognition with cnn and svm in landsat 8 images in 2015 14thIAPR International Conference on Machine Vision Applications (MVA)2015 pp 341ndash344 doi101109MVA20157153200

[66] N Kussul M Lavreniuk S Skakun A Shelestov Deep learning clas-sification of land cover and crop types using remote sensing data IEEEGeoscience and Remote Sensing Letters 14 (5) (2017) 778ndash782

[67] Y Chen Z Lin X Zhao G Wang Y Gu Deep learning-based classifi-cation of hyperspectral data IEEE Journal of Selected topics in appliedearth observations and remote sensing 7 (6) (2014) 2094ndash2107

[68] G Wu X Shao Z Guo Q Chen W Yuan X Shi Y Xu R ShibasakiAutomatic building segmentation of aerial imagery using multi-constraintfully convolutional networks Remote Sensing 10 (3) (2018) 407

[69] Y Duan F Liu L Jiao P Zhao L Zhang Sar image segmentationbased on convolutional-wavelet neural network and markov random fieldPattern Recognition 64 (2017) 255ndash267

[70] F Mohammadimanesh B Salehi M Mahdianpari E Gill M MolinierA new fully convolutional neural network for semantic segmentation of po-larimetric SAR imagery in complex land cover ecosystem ISPRS Journalof Photogrammetry and Remote Sensing 151 (2019) 223 ndash 236

[71] L Wang X Xu H Dong R Gui F Pu Multi-pixel simultaneous classifi-cation of polsar image using convolutional neural networks Sensors 18 (3)(2018) 769

36

[72] M Ahishali S Kiranyaz T Ince M Gabbouj Dual and single polarizedSAR image classification using compact convolutional neural networksRemote Sensing 11 (11) (2019) 1340

[73] Z Li Z Yang H Xiong Homogeneous region segmentation for SARimages based on two steps segmentation algorithm in Computers Com-munications and Systems (ICCCS) International Conference on IEEE2015 pp 196ndash200

[74] V Badrinarayanan A Kendall R Cipolla Segnet A deep convolutionalencoder-decoder architecture for image segmentation IEEE transactionson pattern analysis and machine intelligence 39 (12) (2017) 2481ndash2495

[75] H Zhao J Shi X Qi X Wang J Jia Pyramid scene parsing networkin Proceedings of the IEEE conference on computer vision and patternrecognition 2017 pp 2881ndash2890

[76] C Yu J Wang C Peng C Gao G Yu N Sang Bisenet Bilateralsegmentation network for real-time semantic segmentation in Proceed-ings of the European Conference on Computer Vision (ECCV) 2018 pp325ndash341

[77] L-C Chen Y Zhu G Papandreou F Schroff H Adam Encoder-decoder with atrous separable convolution for semantic image segmen-tation arXiv preprint arXiv180202611 (2018)

[78] L-C Chen G Papandreou I Kokkinos K Murphy A L YuilleDeeplab Semantic image segmentation with deep convolutional netsatrous convolution and fully connected CRFs IEEE transactions on pat-tern analysis and machine intelligence 40 (4) (2018) 834ndash848

[79] O Ronneberger PFischer T Brox U-net Convolutional networksfor biomedical image segmentation in Medical Image Computing andComputer-Assisted Intervention (MICCAI) Vol 9351 of LNCS Springer2015 pp 234ndash241 (available on arXiv150504597 [csCV])URL httplmbinformatikuni-freiburgdePublications2015

RFB15a

[80] A G Howard M Zhu B Chen D Kalenichenko W Wang T WeyandM Andreetto H Adam Mobilenets Efficient convolutional neural net-works for mobile vision applications arXiv preprint arXiv170404861(2017)

[81] T Pohlen A Hermans M Mathias B Leibe Full-resolution residualnetworks for semantic segmentation in street scenes in Proceedings ofthe IEEE Conference on Computer Vision and Pattern Recognition 2017pp 4151ndash4160

37

[82] S Jegou M Drozdzal D Vazquez A Romero Y Bengio The onehundred layers tiramisu Fully convolutional densenets for semantic seg-mentation in Computer Vision and Pattern Recognition Workshops(CVPRW) 2017 IEEE Conference on IEEE 2017 pp 1175ndash1183

[83] O Antropov Y Rauste A Lonnqvist T Hame PolSAR mosaic normal-ization for improved land-cover mapping IEEE Geoscience and RemoteSensing Letters 9 (6) (2012) 1074ndash1078

[84] J Long E Shelhamer T Darrell Fully convolutional networks for se-mantic segmentation in Proceedings of the IEEE conference on computervision and pattern recognition 2015 pp 3431ndash3440

[85] D H Hubel T N Wiesel Receptive fields binocular interaction andfunctional architecture in the catrsquos visual cortex The Journal of physiol-ogy 160 (1) (1962) 106ndash154

[86] O Russakovsky J Deng H Su J Krause S Satheesh S Ma Z HuangA Karpathy A Khosla M Bernstein et al Imagenet large scale visualrecognition challenge International journal of computer vision 115 (3)(2015) 211ndash252

[87] K He X Zhang S Ren J Sun Deep residual learning for image recog-nition in Proceedings of the IEEE conference on computer vision andpattern recognition 2016 pp 770ndash778

[88] G Huang Z Liu L Van Der Maaten K Q Weinberger Densely con-nected convolutional networks in CVPR Vol 1 2017 p 3

[89] C Szegedy W Liu Y Jia P Sermanet S Reed D Anguelov D Er-han V Vanhoucke A Rabinovich Going deeper with convolutions inProceedings of the IEEE conference on computer vision and pattern recog-nition 2015 pp 1ndash9

[90] S Ji W Xu M Yang K Yu 3D convolutional neural networks for humanaction recognition IEEE transactions on pattern analysis and machineintelligence 35 (1) (2013) 221ndash231

[91] T N Sainath A-r Mohamed B Kingsbury B Ramabhadran Deepconvolutional neural networks for LVCSR in Acoustics speech and signalprocessing (ICASSP) 2013 IEEE international conference on IEEE 2013pp 8614ndash8618

[92] D Small L Zuberbuhler A Schubert E Meier Terrain-flattened gammanought Radarsat-2 backscatter Canadian Journal of Remote Sensing37 (5) (2012) 493ndash499

[93] P Hrm R Teiniranta M Trm R Repo E Jrvenp M Kallio Theproduction of finnish corine land cover 2000 classification XXth ISPRSCongress Istanbul Turkey (2004)

38

[94] P Hrm R Teiniranta M Trm R Repo E Jrvenp M Kallio Finnishcorine land cover 2000 classification XXth ISPRS Congress AnchorageUS (2004)

[95] A Garcia-Garcia S Orts-Escolano S Oprea V Villena-MartinezJ Garcia-Rodriguez A review on deep learning techniques applied tosemantic segmentation arXiv preprint arXiv170406857 (2017)

[96] F Chollet Xception Deep learning with depthwise separable convolu-tions in Proceedings of the IEEE conference on computer vision andpattern recognition 2017 pp 1251ndash1258

[97] O Ronneberger P Fischer T Brox U-net Convolutional networks forbiomedical image segmentation in International Conference on Medicalimage computing and computer-assisted intervention Springer 2015 pp234ndash241

[98] L-C Chen G Papandreou F Schroff H Adam Rethinkingatrous convolution for semantic image segmentation arXiv preprintarXiv170605587 (2017)

[99] Y Bengio Deep learning of representations for unsupervised and trans-fer learning in Proceedings of ICML Workshop on Unsupervised andTransfer Learning 2012 pp 17ndash36

[100] M Abadi P Barham J Chen Z Chen A Davis J Dean M DevinS Ghemawat G Irving M Isard et al Tensorflow A system for large-scale machine learning in 12th USENIX Symposium on OperatingSystems Design and Implementation (OSDI 16) 2016 pp 265ndash283

[101] H Costa G M Foody D S Boyd Supervised methods of image seg-mentation accuracy assessment in land cover mapping Remote sensing ofenvironment 205 (2018) 338ndash351

[102] G Csurka D Larlus F Perronnin F Meylan What is a good evaluationmeasure for semantic segmentation in BMVC Vol 27 Citeseer 2013p 2013

[103] J Cohen A coefficient of agreement for nominal scales Educational andPsychological Measurement 20 (1) (1960) 37 ndash 46

[104] O Antropov Y Rauste T Hame Volume scattering modeling in PolSARdecompositions Study of ALOS PALSAR data over boreal forest IEEETransactions on Geoscience and Remote Sensing 49 (10) (2011) 3838ndash3848

39


Figure 8: The architecture of FRRN-B. RU_n and FRRU_n stand for residual units and full-resolution residual units with n-channel convolutions, respectively. FRRUs simultaneously operate on the two streams [81].

approaches, which is to take advantage of the learned weights from architectures pre-trained for the classification task. Nevertheless, one disadvantage of the FCN approach is that the outputs of the encoder part (particularly after the pooling operations) are at a lower resolution, which deteriorates the localization performance of the overall segmentation model. Pohlen et al. [81] proposed to tackle this by having two parallel network streams processing the input image: a pooling stream and a residual stream (Figure 8). As the name says, the pooling stream performs successive pooling and then unpooling operations, and it serves to obtain good recognition of objects and classes. The residual stream computes residuals at the full image resolution, which enables low-level features, i.e., object pixel-level locations, to be propagated to the network output. The name of the model comes from its building blocks, i.e., full-resolution residual units; each such unit simultaneously operates on the pooling and the residual stream. In the original paper [81], the authors propose two alternative architectures, FRRN-A and FRRN-B, and show that FRRN-B achieves superior performance on the Cityscapes benchmark dataset. Hence, we employ the FRRN-B architecture.

3.4.6. PSPNet (Pyramid Scene Parsing Network)

Figure 9: The architecture of PSPNet [75].

Zhao et al. [75] propose the pyramid scene parsing network as a solution to the challenge of local predictions being made from local context alone, without considering the global image scene. In remote sensing, an example of this challenge is a model wrongly predicting water with waves in it as the dry vegetation class: the two appear similar locally, and the model does not take into account that these pixels are part of a larger water surface, i.e., it misses the global context. Similarly to the other FCN-based approaches, PSPNet uses a pre-trained classification architecture to extract the feature map, in this case ResNet. The main module of this network is the pyramid pooling module, enclosed by a square in Figure 9. As can be seen in the figure, this module fuses features at four scales, from coarse (red) to fine (green); the output of each level in the pyramid pooling module thus contains a feature map of a different resolution. In the end, the different features are stacked together, yielding the final pyramid pooling global feature used for prediction.
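For concreteness, the following minimal sketch shows how such a pyramid pooling module can be assembled in Keras. It is our illustration under stated assumptions rather than the authors' implementation: the function name is hypothetical, while the bin sizes (1, 2, 3, 6) and the per-level channel reduction follow the original PSPNet design [75].

```python
import tensorflow as tf
from tensorflow.keras import layers

def pyramid_pooling_module(feature_map, bin_sizes=(1, 2, 3, 6)):
    # `feature_map` is the backbone (e.g., ResNet) output of shape
    # (batch, H, W, C); H and W must be divisible by each bin size here.
    _, h, w, c = feature_map.shape
    levels = [feature_map]
    for bins in bin_sizes:
        # Average-pool into a bins x bins grid: coarse (1x1) to fine (6x6).
        x = layers.AveragePooling2D(pool_size=(h // bins, w // bins))(feature_map)
        # A 1x1 convolution reduces the channel depth of each pyramid level.
        x = layers.Conv2D(c // len(bin_sizes), 1, activation='relu')(x)
        # Upsample each level back to the input feature-map resolution.
        x = tf.image.resize(x, (h, w))
        levels.append(x)
    # Stack the original features with all pyramid levels into the
    # global feature used for the final prediction.
    return layers.Concatenate(axis=-1)(levels)
```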

3.4.7. FC-DenseNet (Fully Convolutional DenseNets)

This semantic segmentation algorithm is built using the DenseNet CNN [88] as a basis for the encoder, followed by applying the FCN approach [82]. The specificity of the DenseNet architecture is the presence of blocks in which each layer is connected to all the other layers in a feed-forward manner. Figure 10 shows the architecture of FC-DenseNet, where the blocks are represented by the Dense Block units. According to [88], such an architecture scales well to hundreds of layers without any optimization issues, while yielding excellent results in classification tasks. In order to efficiently upsample the DenseNet feature maps, Jégou et al. [82] substitute the upsampling convolutions of FCNs with Dense Blocks and Transition Up modules. The Transition Up modules consist of transposed convolutions, which are then concatenated with the outputs from the input skip connection (the dashed lines in Figure 10).
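The two building blocks can be sketched as follows. This is a simplified illustration under assumptions (the layer counts, growth rate and function names are ours; the exact configurations of [82, 88] are not reproduced):

```python
import tensorflow as tf
from tensorflow.keras import layers

def dense_block(x, num_layers=4, growth_rate=16):
    # Each layer sees the concatenation of all preceding feature maps
    # and contributes `growth_rate` new ones (DenseNet connectivity [88]).
    new_features = []
    for _ in range(num_layers):
        y = layers.BatchNormalization()(x)
        y = layers.ReLU()(y)
        y = layers.Conv2D(growth_rate, 3, padding='same')(y)
        new_features.append(y)
        x = layers.Concatenate(axis=-1)([x, y])
    # On the upsampling path, FC-DenseNet keeps only the new features.
    return layers.Concatenate(axis=-1)(new_features)

def transition_up(block_output, skip_connection):
    # Transition Up [82]: a transposed convolution, then concatenation
    # with the skip connection (the dashed lines in Figure 10).
    up = layers.Conv2DTranspose(block_output.shape[-1], 3,
                                strides=2, padding='same')(block_output)
    return layers.Concatenate(axis=-1)([up, skip_connection])
```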


Figure 10: The architecture of FC-DenseNet [82].

3.5. Training approach

To accomplish better segmentation performance, there is an option to pre-train the semantic segmentation models (in particular, their encoder modules) using a larger set of available images of another type, such as natural images. When a model pre-trained on natural images is then further trained with the limited set of SAR images, the knowledge becomes effectively transferred from the natural-image task to the SAR task [99]. To accomplish such transfer, we used models whose encoders were pre-trained for the ImageNet classification task and fine-tuned them using our SAR dataset (described next).
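In Keras terms, the transfer amounts to instantiating an ImageNet-pretrained encoder and training it further, together with a decoder, on the SAR data. The sketch below is illustrative only: it uses a ResNet101 encoder and a deliberately trivial decoder head as a stand-in for the various model-specific decoders, and the dataset name is hypothetical.

```python
import tensorflow as tf

NUM_CLASSES = 5  # the five Level-1 CORINE classes

# Encoder pre-trained on ImageNet: these weights carry the transferred knowledge.
encoder = tf.keras.applications.ResNet101(
    include_top=False, weights='imagenet', input_shape=(512, 512, 3))

# Placeholder decoder head; the actual decoders are those of the
# respective segmentation models (e.g., Figures 8-10).
logits = tf.keras.layers.Conv2D(NUM_CLASSES, 1)(encoder.output)
logits = tf.keras.layers.UpSampling2D(32, interpolation='bilinear')(logits)
model = tf.keras.Model(encoder.input, logits)

# Fine-tuning then updates the pre-trained encoder weights on the SAR imagelets:
# model.compile(optimizer='rmsprop',
#               loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
# model.fit(sar_rgb_dem_dataset, epochs=500)
```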

3.6. Experimental Setup

In this section, we first describe how we prepared the SAR images for training with the deep learning models, which are designed for natural images. We then provide the details of our implementation.

3.6.1. SAR Data Preprocessing for Deep Learning

Sentinel-1 imagery comes in two polarization channels, each of them being informative about certain types of land cover. Hence, using their combination is expected to yield better land cover mapping results than using either of them independently. Moreover, previous work has shown the benefits of also using a DEM for land cover mapping [9]. Hence, as the third layer, we used the DEM of Finland from the National Land Survey.

SAR backscatter for both polarizations was converted to decibels by applying the 10 · log10 transformation. In addition, for the deep learning models, each band should be normalized so that the distribution of the pixel values resembles a Gaussian distribution centered at zero; this yields faster convergence during training. To normalize the data, the mean of all pixels is subtracted from each pixel value, and the result is divided by the standard deviation. Furthermore, given that the semantic segmentation models expect pixel values in the range (0, 255), we scaled the normalized data, and also the DEM values, to this range. The preprocessed layers are then used to create the image dataset for training.

We named the created dataset SAR RGB-DEM. The naming comes from the process used to create its images: one of the two channels of a Sentinel-1 image is assigned to the R channel and the other to the G channel, while for the third, B, channel we use the DEM layer.
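A compact sketch of this composition step follows; it is illustrative only. In particular, the per-image normalization, the small epsilon guarding the logarithm, and the assignment of VH and VV to the R and G channels are our assumptions, not specified by the text above.

```python
import numpy as np

def to_sar_rgb_dem(vh, vv, dem, eps=1e-6):
    # Convert linear SAR backscatter to decibels.
    bands = [10.0 * np.log10(vh + eps),
             10.0 * np.log10(vv + eps),
             dem.astype(float)]
    out = []
    for band in bands:
        # Zero-mean, unit-variance normalization for faster convergence ...
        band = (band - band.mean()) / (band.std() + eps)
        # ... then rescale into the (0, 255) range expected by the models.
        band = (band - band.min()) / (band.max() - band.min() + eps) * 255.0
        out.append(band)
    # R = one polarization, G = the other, B = DEM.
    return np.dstack(out).astype(np.uint8)
```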

3.6.2. Train/Development and Test (Accuracy Assessment) Dataset

The images from the SAR RGB-DEM dataset needed to be split into 512 px × 512 px partial images (further in the text called imagelets) for training; each imagelet thus represented an area of roughly 10 × 10 km². The first reason for this preprocessing is the square shape: some of the selected models required square-shaped images, and while the other models were flexible with respect to image shape and size, we wanted the setups for all the models to be the same so that their results are comparable. The second reason is computational capacity: with our hardware setup (described below), this was the largest image size we could work with.

Upon splitting the SAR RGB-DEM images, we discarded those imagelets that fell completely outside the land mass area, as well as those for which we did not have a complete CORINE label (for instance, if they fell partly outside the Finnish borders). This resulted in more than 7,000 imagelets of size 512 px × 512 px.
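The sketch below illustrates such a tiling and filtering step under simple assumptions: the labels are stored as a class raster aligned with the scene, and a negative value marks pixels without a valid CORINE label.

```python
import numpy as np

def split_into_imagelets(scene, labels, tile=512):
    # `scene` is an (H, W, 3) SAR RGB-DEM array; `labels` is the aligned
    # (H, W) CORINE raster, with negative values meaning "no valid label".
    imagelets = []
    h, w = labels.shape
    for top in range(0, h - tile + 1, tile):
        for left in range(0, w - tile + 1, tile):
            m = labels[top:top + tile, left:left + tile]
            # Keep only tiles with complete CORINE label coverage.
            if (m >= 0).all():
                imagelets.append((scene[top:top + tile, left:left + tile], m))
    return imagelets
```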

Given the geography of Finland, representative training data should include imagelets from both the northern and the southern (including the large cities) parts of the country. On the other hand, noticeable differences are also found in the east-west gradient across the country. To achieve a representative dataset, we selected all imagelets between the longitudes of 24° and 28° for the accuracy assessment (model testing), and all the other imagelets for the model training (that is, training and development in the computer vision terminology). In this way, we prevented the situation in which two images of the same area, acquired at different times, are used one for training and the other for testing. Images overlapping any border of the introduced strip were discarded. The procedure resulted in 3,104 images in the training and development set and 3,784 images in the test (accuracy assessment) set. Finally, we used 60% of the training and development set for training and the rest for development of the deep learning models.


Table 3: The properties of the examined semantic segmentation architectures.

Architecture    Base model       Parameters
BiSeNet         ResNet101        24.75M
SegNet          VGG16            34.97M
Mobile U-Net    Not applicable   8.87M
DeepLabV3+      ResNet101        47.96M
FRRN-B          ResNet101        24.75M
PSPNet          ResNet101        56M
FC-DenseNet     ResNet101        9.27M

3.6.3. Data Augmentation

Further, we employed data augmentation. The main idea behind data augmentation is to enable improved learning by reusing original images with slight transformations, such as rotation, flipping, adding Gaussian noise, or slightly changing the brightness. This provides additional information to the model, and the dataset size is effectively increased. Moreover, an additional benefit of data augmentation is in helping the model learn some invariant data properties for which no examples are present in the original dataset. Given the sensitivity of the SAR backscatter, we did not want to augment the images in terms of color or brightness, or by adding noise. However, we could safely employ rotations and flipping. For rotations, we only used 90° increments, giving three possible rotated versions of an image. For image flipping, we applied horizontal and vertical flipping, or both at the same time, giving another three possible versions of the original image (the vertical flip switches between a top-left and a bottom-left image origin, i.e., a reflection along the central horizontal axis, while the horizontal flip switches between a top-left and a top-right origin, i.e., a reflection along the central vertical axis). Note that our images are square, so these transformations did not change the image dimensions. Finally, we applied online augmentation, as opposed to the offline version: in the online process, each augmented image is seen only once, and so this process yields a network that generalises better.
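An online version of these geometric-only augmentations can be sketched as follows; the 50% flip probabilities are our choice for illustration.

```python
import random
import numpy as np

def augment(image, mask):
    # Random 90-degree rotation (0, 90, 180 or 270 degrees), applied
    # identically to the image and its label mask.
    k = random.randint(0, 3)
    image, mask = np.rot90(image, k), np.rot90(mask, k)
    if random.random() < 0.5:                 # horizontal flip
        image, mask = np.fliplr(image), np.fliplr(mask)
    if random.random() < 0.5:                 # vertical flip
        image, mask = np.flipud(image), np.flipud(mask)
    # No radiometric changes, to respect the SAR backscatter statistics.
    return image, mask
```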

3.6.4. Implementation

To apply the described semantic segmentation models, we adapted the open-source Semantic Segmentation Suite. We used Python with the TensorFlow [100] backend.

3.6.5. Hardware and Training Setup

We trained and tested each of the deep learning models separately, on a single GPU (NVIDIA GeForce GTX 1080) in a machine with 32 GB of RAM.

We used the RMSProp optimisation algorithm with a learning rate of 0.0001 and a learning rate decay of 0.9954. Each model was trained for an equal number of epochs (500), and during the process the checkpoint for the best model was saved. We then used that model for evaluation on the test set, and we report those results.
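In Keras, this training setup corresponds roughly to the following configuration sketch; the decay is assumed here to be applied once per epoch, so `steps_per_epoch` is an illustrative placeholder.

```python
import tensorflow as tf

steps_per_epoch = 100  # placeholder: training-set size / batch size

# RMSProp with initial learning rate 1e-4, decayed by a factor of 0.9954.
schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-4,
    decay_steps=steps_per_epoch,   # assumed per-epoch decay
    decay_rate=0.9954,
    staircase=True)
optimizer = tf.keras.optimizers.RMSprop(learning_rate=schedule)

# Keep only the best checkpoint seen during the 500 epochs.
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    'best_model.h5', monitor='val_loss', save_best_only=True)
# model.compile(optimizer=optimizer, loss=...)
# model.fit(train_ds, validation_data=dev_ds, epochs=500, callbacks=[checkpoint])
```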

3.7. Evaluation Metrics

In their review of the metrics used in land cover classification, Costa et al. [101] found a lack of consistency, complicating the intercomparison of different studies. To avoid such issues and ensure that our results are easily comparable with the literature, we evaluated our models thoroughly. For each model and class, we report the following measures of accuracy: precision, also known as user's accuracy (UA); recall, also known as producer's accuracy (PA); the overall accuracy; and the Kappa coefficient. The formulas are as follows.

For each segmentation class (land cover type) c, we calculate precision (user's accuracy)

$$P_c = \frac{T_{p_c}}{T_{p_c} + F_{p_c}},$$

and recall (producer's accuracy)

$$R_c = \frac{T_{p_c}}{T_{p_c} + F_{n_c}},$$

where $T_{p_c}$ represents true positive, $F_{p_c}$ false positive, and $F_{n_c}$ false negative pixels for the class c.

When it comes to accuracy [102], we calculate the per-class accuracy (effectively, the recall obtained on each class)

$$Acc_c = \frac{C_{ii}}{G_i},$$

and the overall pixel accuracy

$$Acc_{OP} = \frac{\sum_{i=1}^{L} C_{ii}}{\sum_{i=1}^{L} G_i},$$

where $C_{ij}$ is the number of pixels having ground truth label i and being classified/predicted as j, $G_i$ is the total number of pixels labelled with i, and L is the number of classes. All these metrics can take values from 0 to 1.

Finally, we also use the Kappa statistic (Cohen's measure of agreement), indicating how the classification results compare to values assigned by chance [103]. The Kappa statistic can take values from 0 to 1. Starting from a k-by-k confusion matrix with elements $f_{ij}$, the following calculations are done:

$$P_o = \frac{1}{N}\sum_{j=1}^{k} f_{jj}, \quad (1)$$

$$r_i = \sum_{j=1}^{k} f_{ij} \;\forall i, \qquad c_j = \sum_{i=1}^{k} f_{ij} \;\forall j, \quad (2)$$

$$P_e = \frac{1}{N^2}\sum_{i=1}^{k} r_i c_i, \quad (3)$$

where $P_o$ is the observed proportional agreement (effectively, the overall accuracy), $r_i$ and $c_j$ are the row and column totals for classes i and j, N is the total number of pixels, and $P_e$ is the expected proportion of agreement. The final measure of agreement is given by the statistic [103]

$$\kappa = \frac{P_o - P_e}{1 - P_e}. \quad (4)$$

Depending on the value of Kappa, the observed agreement is considered either poor (0.0 to 0.2), fair (0.2 to 0.4), moderate (0.4 to 0.6), good (0.6 to 0.8), or very good (0.8 to 1.0).
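All of the above metrics can be computed directly from a confusion matrix; a minimal sketch (ours, not part of the original study) follows. Applied to the matrix in Table 5, it reproduces the 90.7% overall accuracy and κ ≈ 0.785 reported for FC-DenseNet.

```python
import numpy as np

def classification_metrics(f):
    # `f` is a k x k confusion matrix: rows are ground-truth classes,
    # columns are predicted classes.
    f = f.astype(np.float64)
    n = f.sum()                                  # N: total number of pixels
    tp = np.diag(f)                              # correctly classified pixels
    precision = tp / f.sum(axis=0)               # user's accuracy, per class
    recall = tp / f.sum(axis=1)                  # producer's accuracy, per class
    p_o = tp.sum() / n                           # observed agreement (overall accuracy)
    p_e = (f.sum(axis=1) * f.sum(axis=0)).sum() / n**2   # chance agreement
    kappa = (p_o - p_e) / (1.0 - p_e)
    return precision, recall, p_o, kappa
```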

4. Results and Discussion

Using the experimental setup described in the previous section, we evaluated the seven selected semantic segmentation models: SegNet [74], PSPNet [75], BiSeNet [76], DeepLabV3+ [77, 78], U-Net [79, 80], FRRN-B [81], and FC-DenseNet [82]. The overall classification performance statistics for all studied models are gathered in Table 4. Figure 11 shows maps produced for several imagelets with the best performing model, FC-DenseNet. The obtained results are compared to prior work, and the classification performance for the different land cover classes is discussed further below.

4.1. Classification Performance

All the models performed relatively well in terms of classification, achieving an overall accuracy above 83%. Three models performed particularly well, achieving an accuracy score above 89%: SegNet, FRRN-B, and the best performing model, FC-DenseNet, which achieved an accuracy of 90.7%.

Before further analysis, let us recall that CORINE is not exclusively a land cover map, but rather a land cover and land use map; thus, specific classes can differ from the ecological classes observed by Sentinel-1. Also, the aggregation to Level-1 is sometimes not strictly "ecological" and does not always comply with surface scattering physics. For example, roads, airports, major industrial areas and the road network often exhibit areas similar to fields; the presence of trees and green vegetation near summer cottages can cause them to exhibit signatures closer to forest than to urban; sometimes forest on rocky terrain can be misclassified as urban due to the presence of very bright targets and strong disruptive features; and confusion between peatland and field areas is also quite common. Finally, the accuracy of the CORINE data itself is only somewhat higher than 90%.

Table 4: Summary of the classification performance and efficiency of the various deep learning models (UA = user's accuracy, PA = producer's accuracy, both in %; the average inference time is per 512 px × 512 px image in the dataset). Each class cell gives UA/PA.

LC class (code)                   Test scale (km²)  BiSeNet  DeepLabV3+  SegNet  FRRN-B  U-Net  PSPNet  FC-DenseNet
Urban fabric (100)                1,081.6           26/21    15/14       36/31   38/30   45/25  38/18   62/27
Agricultural areas (200)          2,516.0           49/51    50/49       69/66   68/68   66/66  53/48   72/71
Forested areas (300)              28,546.2          90/91    88/96       93/94   92/95   92/95  89/95   93/96
Peatland, bogs and marshes (400)  2,099.0           54/43    56/13       67/57   71/55   70/52  65/31   74/58
Water bodies (500)                5,356.4           85/91    94/92       96/96   95/96   96/96  94/94   96/96
Overall accuracy (%)                                83.86    85.49       89.03   89.27   89.25  86.51   90.66
Kappa                                               0.641    0.649       0.754   0.758   0.754  0.680   0.785
Average inference time (s)                          0.0389   0.0267      0.0761  0.1424  0.0848 0.0495  0.1930

Table 5: Confusion matrix for classification with the FC-DenseNet (FC-DenseNet103) model. Rows: CLC2012 reference classes; columns: Sentinel-1 classification.

CLC2012    urban       water        forest       field       peatland    total        PA (%)
urban      7,301,999   413,073      15,892,771   3,212,839   221,476     27,042,158   27.0
water      78,331      128,294,872  3,457,634    171,029     1,935,276   133,937,142  95.8
forest     3,663,698   2,703,632    686,788,977  12,795,703  7,730,444   713,682,454  96.2
field      766,200     121,609      16,527,970   44,866,048  620,934     62,902,761   71.3
peatland   56,097      1,866,020    19,164,137   1,091,008   30,309,189  52,486,451   57.8
total      11,866,325  133,399,206  741,831,489  62,136,627  40,817,319  990,050,966
UA (%)     61.5        96.2         92.6         72.2        74.3        90.7

Figure 11: Illustration of the FC-DenseNet model performance: a selection of classification results, i.e., the direct output of the network without any post-processing (bottom row), versus the reference CORINE data (upper row).

As for the results across the different land cover classes, all the models performed particularly well in recognising water bodies and forested areas, while urban fabric represented the most challenging class for all the models. We expect that the inclusion of the DEM as one layer of the training images helped achieve the good results on the water bodies class for most of the models (except for BiSeNet, all the models achieved both user's and producer's accuracy above 90%). The urban class was particularly challenging for the following main reasons. First, this class changes the most, as new houses, roads and urban areas are built. While we took the most suitable available CORINE class in terms of timing for our Sentinel-1 images, there are almost certain differences between the urban class as it was in 2012 and in 2015-2016. Second, the CORINE map itself does not have perfect accuracy, nor are its aggregation rules perfect. As a matter of fact, in the majority of studies where SAR-based classification was compared against CLC or similar data, a poor or modest overall agreement was observed for this class [21, 41, 83, 20], while the user's accuracy was strongly higher than the producer's [104]. The latter is exactly because radar senses sharp boundaries and bright targets very well, whereas such bright targets often do not dominate the whole CORINE Level-1 urban class. We argue that any inaccuracies present will be particularly attenuated in our models for the urban class, because of the sharp and sudden boundary changes in this class, unlike for others such as forest and water. The top performing model, FC-DenseNet, performed the best across all the classes. It is particularly notable that it achieved a user's accuracy, i.e., precision, of 62% for the urban class, improving on all the other models significantly. Nevertheless, its producer's accuracy, i.e., recall, of 27% on this class is outperformed by the two other top models, SegNet and FRRN-B.

We mentioned the issue of SAR backscattering sensitivity to several ground factors, whereby the same classes might appear differently in images between countries or between distant areas within a country. An interesting indication of our study, however, is that deep learning models might be able to deal with this issue. Namely, we used models pre-trained on ImageNet and fine-tuned them with a relatively small number (14) of Sentinel-1 scenes, and the models learned to recognize the varying types of backscattering signal across the country of Finland. This indicates that, with similar fine-tuning, the present models could be relatively easily adapted to other areas and countries with different SAR backscattering patterns. Such robustness and adaptability of deep learning models comes from their automatic learning of feature representations, without the need for a human to pre-define those features.

4.2. Computational Performance

With our hardware configuration, training took from 6 days up to 2 weeks, depending on the model. This could be significantly improved by training each model on a multi-GPU system instead of the single-GPU setup we used.

In terms of inference time, we also saw differences in performance. In Table 4, we present the average inference time per 512 px × 512 px imagelet. The results show that there is a trade-off between classification and computational performance: the best models in terms of classification results (i.e., FC-DenseNet and FRRN-B) take several times longer at inference than the rest. Depending on the application, this might not be of particular importance.

4.3. Comparison to Similar Work

The obtained results compare favourably to previous similar studies on land cover classification with SAR data [83, 20, 21, 41, 28, 31]. Depending on the level of class aggregation (4-5 major classes or more), and using mostly statistical or classical machine learning approaches, the reported classification accuracies ranged from as high as 80-87% to as low as 30% when only SAR imagery was used.

Two recent studies that employed neural networks on SAR imagery (albeit in combination with satellite optical data) for land cover mapping were [28] and [66], with reported classification accuracies of up to 97.5% and 94.6%, respectively.

The best model in our experiments achieved an overall accuracy of 90.7%. However, our results were obtained using solely SAR imagery. In contrast, SAR imagery (PALSAR) alone yielded an overall accuracy of 78.1% in [28]. The types of classes they studied are also different from ours (crops versus vegetation versus land cover types), and our study was performed on a larger area. Importantly, the previous studies applied different types of models (regular NNs versus CNNs versus semantic segmentation). In particular, their CNN models work on 7 × 7 resolution windows, while we applied more advanced semantic segmentation models, which work at the level of a pixel. Keeping in mind the finding from [28] that the addition of optical images on top of SAR improved the results by over 10%, we expect that our models would perform comparably well or outperform these previous works if applied to combined SAR and optical imagery.

In terms of the deep learning setup, the studies most similar to ours are [53] and [70]. However, RapidEye optical imagery at 5 m spatial resolution was used in [53], and the test site was considerably smaller. Study [70], similar to our research, relied exclusively on SAR imagery, however on fully polarimetric images acquired by RADARSAT-2 at considerably better resolution. The authors developed an FCN-type semantic segmentation model 'specifically designed for the classification of wetland complexes using PolSAR imagery'. Using this model to classify eight wetland map classes, they achieved an overall accuracy of 93%. However, because their model is designed specifically for wetland complexes, it is not clear whether it would generalize to other types of areas. Compared to our study, they focused on a considerably smaller area (nearly the size of a single imagelet we used) and on a very specific task (wetland type mapping). Thus, it is not readily clear how general their approach is and how it compares to our presented approach.

4.4. Outlook and Future Work

Based on the results of this study, there are several lines of potential improvement, as well as directions for future work.

First, using an even larger set of Sentinel-1 images can be recommended, since large amounts of data are crucial for supervised deep learning models. Here we processed only 6,888 imagelets altogether, whereas deep learning algorithms typically become efficient only once they are trained with hundreds of thousands or millions of images.

Second, if SAR images and reference data of higher resolution are used, we expect better classification performance too, as smaller details could potentially be captured. Better agreement in the acquisition timing of the reference and SAR imagery can also be recommended: the reference and training data should come from the same months or year if possible, and the reference maps should represent the reality as accurately as possible. The models in our experiments were certainly limited by CORINE's own limited accuracy.

Third, in this study we tested the effectiveness of off-the-shelf deep learning models for land cover mapping from SAR data. While the results show their effectiveness, it is also likely that novel types of models developed specifically for radar data (such as [70]) will yield even better results. Based on our results, we suggest DenseNet-based models as a starting point. In particular, one could develop deep learning models that handle directly the SLC data, which preserve the phase information.

Focusing on a single season is both an advantage and a limitation. Importantly, we avoided confusion between SAR signatures that vary seasonally for several land cover classes. However, the multitemporal dynamics itself can potentially be used as an additional, useful class-discriminating parameter. Incorporating the seasonal dynamics of each land cover pixel (as a time series) is left for future work, perhaps with the additional need to incorporate recurrent neural networks into the approach.

As discussed in Section 3.1.1, it could be suitable to use more detailed (specific) land cover classes, as the aggregation of smaller LC classes into Level-1 CORINE classes is not exactly ecological, leading to the mixing of several distinct SAR signatures in one class and thus causing additional confusion for the classifier. The classified specific classes can later be aggregated into larger classes, potentially showing improved performance [19].

Finally, we used only SAR images and a freely available DEM model for the presented large-scale land cover mapping. If one were to combine other types of remote sensing images, in particular optical images, we expect that the results would improve significantly. This holds for those areas where such imagery can be collected, given the cloud coverage, while an operational scenario would potentially require the use of at least two models (with and without optical satellite imagery). It is also important to assess the added value of SAR imagery with deep learning models when optical satellite images are available, as well as possible data fusion and decision fusion scenarios, before a decision on the mapping approach is made [19].

5. Conclusion

Our study demonstrated the potential for applying state-of-the-art semantic segmentation models to SAR image classification with high accuracy. Several models were benchmarked in a countrywide classification experiment using Sentinel-1 IW-mode SAR data, reaching nearly 91% overall classification accuracy with the best performing model (FC-DenseNet). Given that the 14 used Sentinel-1 scenes resulted in roughly 7,000 training images, this indicates strong potential for using pre-trained CNNs for further fine-tuning, and seems particularly suitable when the number of training images is limited (to thousands or tens of thousands instead of millions). In addition to suggesting the best candidate semantic segmentation models for land cover mapping with SAR data (that is, the DenseNet-based models), our study offers baseline results against which newly proposed models should be evaluated. Several possible improvements were identified, including the necessity of testing multitemporal approaches, data fusion and very high-resolution SAR imagery, as well as developing models specifically for SAR; these will be addressed in future work.

References

[1] S. Bojinski, M. Verstraete, T. C. Peterson, C. Richter, A. Simmons, M. Zemp, The concept of essential climate variables in support of climate research, applications, and policy, Bulletin of the American Meteorological Society 95 (9) (2014) 1431–1443.

[2] G. Büttner, J. Feranec, G. Jaffrain, L. Mari, G. Maucha, T. Soukup, The CORINE land cover 2000 project, EARSeL eProceedings 3 (3) (2004) 331–346.

[3] M. Bossard, J. Feranec, J. Otahel, et al., CORINE land cover technical guide: Addendum 2000 (2000).

[4] G. Büttner, CORINE land cover and land cover change products, in: Land Use and Land Cover Mapping in Europe, Springer, 2014, pp. 55–74.

[5] M. Törmä, T. Markkanen, S. Hatunen, P. Härmä, O.-P. Mattila, A. Arslan, Assessment of land-cover data for land-surface modelling in regional climate studies, Boreal Environment Research 20 (2) (2015) 243–260.

[6] J. Chen, J. Chen, A. Liao, X. Cao, L. Chen, X. Chen, C. He, G. Han, S. Peng, M. Lu, et al., Global land cover mapping at 30 m resolution: A POK-based operational approach, ISPRS Journal of Photogrammetry and Remote Sensing 103 (2015) 7–27.

[7] C. A. d. Almeida, A. C. Coutinho, J. C. D. M. Esquerdo, M. Adami, A. Venturieri, C. G. Diniz, N. Dessay, L. Durieux, A. R. Gomes, High spatial resolution land use and land cover mapping of the Brazilian Legal Amazon in 2008 using Landsat-5/TM and MODIS data, Acta Amazonica 46 (3) (2016) 291–302.

[8] C. Homer, J. Dewitz, L. Yang, S. Jin, P. Danielson, G. Xian, J. Coulston, N. Herold, J. Wickham, K. Megown, Completion of the 2011 National Land Cover Database for the conterminous United States – representing a decade of land cover change information, Photogrammetric Engineering & Remote Sensing 81 (5) (2015) 345–354.

[9] Y. Zhao, D. Feng, L. Yu, X. Wang, Y. Chen, Y. Bai, H. J. Hernández, M. Galleguillos, C. Estades, G. S. Biging, et al., Detailed dynamic land cover mapping of Chile: Accuracy improvement by integrating multi-temporal data, Remote Sensing of Environment 183 (2016) 170–185.

[10] P. Griffiths, C. Nendel, P. Hostert, Intra-annual reflectance composites from Sentinel-2 and Landsat for national-scale crop and land cover mapping, Remote Sensing of Environment 220 (2019) 135–151.

[11] R Torres P Snoeij D Geudtner D Bibby M Davidson E AttemaP Potin B Rommen N Floury M Brown I Traver P DeghayeB Duesmann B Rosich N Miranda C Bruno M LrsquoAbbate R CrociA Pietropaolo M Huchler F Rostan GMES Sentinel-1 mission RemoteSensing of Environment 120 (2012) 9ndash24 doi101016jrse201105028

[12] Y LeCun L Bottou Y Bengio P Haffner Gradient-based learning ap-plied to document recognition Proceedings of the IEEE 86 (11) (1998)2278ndash2324

[13] A Krizhevsky I Sutskever G E Hinton Imagenet classification withdeep convolutional neural networks in Advances in neural informationprocessing systems 2012 pp 1097ndash1105

[14] K Simonyan A Zisserman Very deep convolutional networks for large-scale image recognition CoRR abs14091556 (2014)

[15] I Goodfellow Y Bengio A Courville Deep learning (2016)

[16] W Cohen S Goward Landsatrsquos role in ecological applications ofremote sensing BioScience 54 (6) (2004) 535ndash545 doi101641

0006-3568(2004)054[0535LRIEAO]20CO2

[17] S Goetz A Baccini N Laporte T Johns W Walker J Kellndor-fer R Houghton M Sun Mapping and monitoring carbon stocks withsatellite observations A comparison of methods Carbon Balance andManagement 4 (2009) doi1011861750-0680-4-2

[18] C Atzberger Advances in remote sensing of agriculture Context de-scription existing operational monitoring systems and major informationneeds Remote Sensing 5 (2) (2013) 949ndash981 doi103390rs5020949

[19] T Hame J Kilpi H A Ahola Y Rauste O Antropov M RautiainenL Sirro S Bounpone Improved mapping of tropical forests with opticaland SAR imagery part I Forest cover and accuracy assessment usingmulti-resolution data IEEE Journal of Selected Topics in Applied EarthObservations and Remote Sensing 6 (1) (2013) 74ndash91

[20] O Antropov Y Rauste H Astola J Praks T Hame M HallikainenLand cover and soil type mapping from spaceborne PolSAR data at L-band with probabilistic neural network IEEE Transactions on Geoscienceand Remote Sensing 52 (9) (2014) 5256ndash5270 doi101109TGRS20132287712

31

[21] A Lonnqvist Y Rauste M Molinier T Hame Polarimetric SAR datain land cover mapping in boreal zone IEEE Transactions on Geoscienceand Remote Sensing 48 (10) (2010) 3652ndash3662 doi101109TGRS20102048115

[22] B Waske M Braun Classifier ensembles for land cover mapping usingmultitemporal sar imagery ISPRS Journal of Photogrammetry and Re-mote Sensing 64 (5) (2009) 450ndash457 doi101016jisprsjprs2009

01003

[23] L Bruzzone M Marconcini U Wegmller A Wiesmann An advancedsystem for the automatic classification of multitemporal sar images IEEETransactions on Geoscience and Remote Sensing 42 (6) (2004) 1321ndash1334doi101109TGRS2004826821

[24] T Ullmann A Schmitt A Roth J Duffe S Dech H-W HubbertenR Baumhauer Land cover characterization and classification of arctictundra environments by means of polarized synthetic aperture x- andc-band radar (polsar) and landsat 8 multispectral imagery richards is-land canada Remote Sensing 6 (9) (2014) 8565ndash8593 doi103390

rs6098565URL httpwwwmdpicom2072-4292698565

[25] N Clerici C A V Caldern J M Posada Fusion of sentinel-1a andsentinel-2a data for land cover mapping a case study in the lowermagdalena region colombia Journal of Maps 13 (2) (2017) 718ndash726doi1010801744564720171372316

[26] C Castaneda D Ducrot Land cover mapping of wetland areas in anagricultural landscape using sar and landsat imagery Journal of Environ-mental Management 90 (7) (2009) 2270ndash2277

[27] Y Ban H Hu I M Rangel Fusion of quickbird ms and radarsat sardata for urban land-cover mapping Object-based and knowledge-basedapproach International Journal of Remote Sensing 31 (6) (2010) 1391ndash1410

[28] G V Laurin V Liesenberg Q Chen L Guerriero F Del Frate A Bar-tolini D Coomes B Wilebore J Lindsell R Valentini Optical andsar sensor synergies for forest and land cover mapping in a tropical sitein west africa International Journal of Applied Earth Observation andGeoinformation 21 (2013) 7ndash16

[29] R Khatami G Mountrakis S V Stehman A meta-analysis of remotesensing research on supervised pixel-based land-cover image classificationprocesses General guidelines for practitioners and future research Re-mote Sensing of Environment 177 (2016) 89ndash100

32

[30] B Waske M Braun Classifier ensembles for land cover mapping usingmultitemporal sar imagery ISPRS Journal of Photogrammetry and Re-mote Sensing 64 (5) (2009) 450ndash457

[31] H Balzter B Cole C Thiel C Schmullius Mapping CORINE landcover from Sentinel-1A SAR and SRTM digital elevation model data usingrandom forests Remote Sensing 7 (11) (2015) 14876ndash14898 doi10

3390rs71114876

[32] S-E Park Variations of microwave scattering properties by seasonalfreezethaw transition in the permafrost active layer observed by ALOSPALSAR polarimetric data Remote Sensing 7 (12) (2015) 17135ndash17148doi103390rs71215874

[33] M C Dobson L E Pierce F T Ulaby Knowledge-based land-coverclassification using ERS-1JERS-1 SAR composites IEEE Transactionson Geoscience and Remote Sensing 34 (1) (1996) 83ndash99 doi101109

36481896

[34] L Sirro T Hme Y Rauste J Kilpi J Hmlinen K Gunia B de JongF Paz Pellat Potential of different optical and SAR data in forest andland cover classification to support redd+ mrv Remote Sensing 10 (6)(2018) doi103390rs10060942

[35] J D T De Alban G M Connette P Oswald E L Webb CombinedLandsat and L-band SAR data improves land cover classification andchange detection in dynamic tropical landscapes Remote Sensing 10 (2)(2018) doi103390rs10020306

[36] N Longepe P Rakwatin O Isoguchi M Shimada Y Uryu K YuliantoAssessment of alos palsar 50 m orthorectified fbd data for regional landcover classification by support vector machines IEEE Transactions onGeoscience and Remote Sensing 49 (6) (2011) 2135ndash2150 doi101109

TGRS20102102041

[37] T Esch A Schenk T Ullmann M Thiel A Roth S Dech Char-acterization of land cover types in terrasar-x images by combined anal-ysis of speckle statistics and intensity information IEEE Transactionson Geoscience and Remote Sensing 49 (6) (2011) 1911ndash1925 doi

101109TGRS20102091644

[38] J W Cable J M Kovacs J Shang X Jiao Multi-temporal polarimet-ric radarsat-2 for land cover monitoring in northeastern ontario canadaRemote Sensing 6 (3) (2014) 2372ndash2392 doi103390rs6032372URL httpwwwmdpicom2072-4292632372

[39] X Niu Y Ban Multi-temporal radarsat-2 polarimetric sar data for urbanland-cover classification using an object-based support vector machine anda rule-based approach International Journal of Remote Sensing 34 (1)(2013) 1ndash26 doi101080014311612012700133

33

[40] T L Evans M Costa K Telmer T S F Silva Using alospalsar andradarsat-2 to map land cover and seasonal inundation in the brazilianpantanal IEEE Journal of Selected Topics in Applied Earth Observationsand Remote Sensing 3 (4) (2010) 560ndash575 doi101109JSTARS2010

2089042

[41] P Lumsdon S R Cloude G Wright Polarimetric classification of landcover for Glen Affric radar project IEE Proceedings - Radar Sonar andNavigation 152 (6) (2005) 404ndash412 doi101049ip-rsn20041313

[42] C da Costa Freitas L de Souza Soler S J S SantrsquoAnna L V DutraJ R Dos Santos J C Mura A H Correia Land use and land cover map-ping in the brazilian amazon using polarimetric airborne p-band sar dataIEEE Transactions on Geoscience and Remote Sensing 46 (10) (2008)2956ndash2970

[43] G Li D Lu E Moran L Dutra M Batistella A comparative analysis ofalos palsar l-band and radarsat-2 c-band data for land-cover classificationin a tropical moist region ISPRS Journal of Photogrammetry and RemoteSensing 70 (2012) 26ndash38 doi101016jisprsjprs201203010

[44] N Park K Chi Integration of multitemporalpolarization cband sar datasets for landcover classification International Journal of Remote Sensing29 (16) (2008) 4667ndash4688 doi10108001431160801947341

[45] E Tomppo O Antropov J Praks Cropland classification using Sentinel-1 time series Methodological performance and prediction uncertainty as-sessment Remote Sensing 11 (21) (2019) doi103390rs11212480

[46] D B Nguyen A Gruber W Wagner Mapping rice extent and croppingscheme in the mekong delta using sentinel-1a data Remote Sensing Letters7 (12) (2016) 1209ndash1218

[47] A Veloso S Mermoz A Bouvet T Le Toan M Planells J-F DejouxE Ceschia Understanding the temporal behavior of crops using Sentinel-1 and Sentinel-2-like data for agricultural applications Remote Sensing ofEnvironment 199 (2017) 415ndash426

[48] G Satalino A Balenzano F Mattia M W Davidson C-band SARdata for mapping crops dominated by surface or volume scattering IEEEGeoscience and Remote Sensing Letters 11 (2) (2013) 384ndash388

[49] F Vicente-Guijalba A Jacob J M Lopez-Sanchez C Lopez-MartinezJ Duro C Notarnicola D Ziolkowski A Mestre-Quereda E PottierJ J Mallorqu M Lavalle M Engdahl Sincohmap Land-cover and veg-etation mapping using multi-temporal Sentinel-1 interferometric coher-ence in IGARSS 2018 - 2018 IEEE International Geoscience and RemoteSensing Symposium 2018 pp 6631ndash6634 doi101109IGARSS2018

8517926

34

[50] S. Ge, O. Antropov, W. Su, H. Gu, J. Praks, Deep recurrent neural networks for land-cover classification using Sentinel-1 InSAR time series, in: IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium, 2019, pp. 473–476. doi:10.1109/IGARSS.2019.8900088.

[51] Y. LeCun, Y. Bengio, et al., Convolutional networks for images, speech, and time series, The handbook of brain theory and neural networks 3361 (10) (1995) 1995.

[52] X. X. Zhu, D. Tuia, L. Mou, G.-S. Xia, L. Zhang, F. Xu, F. Fraundorfer, Deep learning in remote sensing: a review, arXiv preprint arXiv:1710.03959 (2017).

[53] M. Mahdianpari, B. Salehi, M. Rezaee, F. Mohammadimanesh, Y. Zhang, Very deep convolutional neural networks for complex land cover mapping using multispectral remote sensing imagery, Remote Sensing 10 (7) (2018) 1119.

[54] L. Zhang, L. Zhang, B. Du, Deep learning for remote sensing data: A technical tutorial on the state of the art, IEEE Geoscience and Remote Sensing Magazine 4 (2) (2016) 22–40.

[55] J. Zhang, P. Zhong, Y. Chen, S. Li, l1/2-regularized deconvolution network for the representation and restoration of optical remote sensing images, IEEE Transactions on Geoscience and Remote Sensing 52 (5) (2014) 2617–2627.

[56] X. Chen, S. Xiang, C.-L. Liu, C.-H. Pan, Aircraft detection by deep belief nets, in: Pattern Recognition (ACPR), 2013 2nd IAPR Asian Conference on, IEEE, 2013, pp. 54–58.

[57] X. Chen, S. Xiang, C.-L. Liu, C.-H. Pan, Vehicle detection in satellite images by hybrid deep convolutional neural networks, IEEE Geoscience and Remote Sensing Letters 11 (10) (2014) 1797–1801.

[58] Y. Liu, G. Cao, Q. Sun, M. Siegel, Hyperspectral classification via deep networks and superpixel segmentation, International Journal of Remote Sensing 36 (13) (2015) 3459–3482. doi:10.1080/01431161.2015.1055607.

[59] J. Wang, Q. Qin, Z. Li, X. Ye, J. Wang, X. Yang, X. Qin, Deep hierarchical representation and segmentation of high resolution remote sensing images, in: Geoscience and Remote Sensing Symposium (IGARSS), 2015 IEEE International, IEEE, 2015, pp. 4320–4323.

[60] D. Tuia, R. Flamary, N. Courty, Multiclass feature learning for hyperspectral image classification: Sparse and hierarchical solutions, ISPRS Journal of Photogrammetry and Remote Sensing 105 (2015) 272–285.


[61] F. Hu, G.-S. Xia, J. Hu, L. Zhang, Transferring deep convolutional neural networks for the scene classification of high-resolution remote sensing imagery, Remote Sensing 7 (11) (2015) 14680–14707.

[62] O. A. B. Penatti, K. Nogueira, J. A. dos Santos, Do deep features generalize from everyday objects to remote sensing and aerial scenes domains?, in: 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2015, pp. 44–51. doi:10.1109/CVPRW.2015.7301382.

[63] F. P. Luus, B. P. Salmon, F. Van den Bergh, B. T. J. Maharaj, Multiview deep learning for land-use classification, IEEE Geoscience and Remote Sensing Letters 12 (12) (2015) 2448–2452.

[64] F. Zhang, B. Du, L. Zhang, Scene classification via a gradient boosting random convolutional network framework, IEEE Transactions on Geoscience and Remote Sensing 54 (3) (2016) 1793–1802.

[65] T. Ishii, R. Nakamura, H. Nakada, Y. Mochizuki, H. Ishikawa, Surface object recognition with CNN and SVM in Landsat 8 images, in: 2015 14th IAPR International Conference on Machine Vision Applications (MVA), 2015, pp. 341–344. doi:10.1109/MVA.2015.7153200.

[66] N. Kussul, M. Lavreniuk, S. Skakun, A. Shelestov, Deep learning classification of land cover and crop types using remote sensing data, IEEE Geoscience and Remote Sensing Letters 14 (5) (2017) 778–782.

[67] Y. Chen, Z. Lin, X. Zhao, G. Wang, Y. Gu, Deep learning-based classification of hyperspectral data, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 7 (6) (2014) 2094–2107.

[68] G. Wu, X. Shao, Z. Guo, Q. Chen, W. Yuan, X. Shi, Y. Xu, R. Shibasaki, Automatic building segmentation of aerial imagery using multi-constraint fully convolutional networks, Remote Sensing 10 (3) (2018) 407.

[69] Y. Duan, F. Liu, L. Jiao, P. Zhao, L. Zhang, SAR image segmentation based on convolutional-wavelet neural network and Markov random field, Pattern Recognition 64 (2017) 255–267.

[70] F. Mohammadimanesh, B. Salehi, M. Mahdianpari, E. Gill, M. Molinier, A new fully convolutional neural network for semantic segmentation of polarimetric SAR imagery in complex land cover ecosystem, ISPRS Journal of Photogrammetry and Remote Sensing 151 (2019) 223–236.

[71] L. Wang, X. Xu, H. Dong, R. Gui, F. Pu, Multi-pixel simultaneous classification of PolSAR image using convolutional neural networks, Sensors 18 (3) (2018) 769.


[72] M. Ahishali, S. Kiranyaz, T. Ince, M. Gabbouj, Dual and single polarized SAR image classification using compact convolutional neural networks, Remote Sensing 11 (11) (2019) 1340.

[73] Z. Li, Z. Yang, H. Xiong, Homogeneous region segmentation for SAR images based on two steps segmentation algorithm, in: Computers, Communications and Systems (ICCCS), International Conference on, IEEE, 2015, pp. 196–200.

[74] V. Badrinarayanan, A. Kendall, R. Cipolla, SegNet: A deep convolutional encoder-decoder architecture for image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence 39 (12) (2017) 2481–2495.

[75] H. Zhao, J. Shi, X. Qi, X. Wang, J. Jia, Pyramid scene parsing network, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2881–2890.

[76] C. Yu, J. Wang, C. Peng, C. Gao, G. Yu, N. Sang, BiSeNet: Bilateral segmentation network for real-time semantic segmentation, in: Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 325–341.

[77] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, H. Adam, Encoder-decoder with atrous separable convolution for semantic image segmentation, arXiv preprint arXiv:1802.02611 (2018).

[78] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, A. L. Yuille, DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs, IEEE Transactions on Pattern Analysis and Machine Intelligence 40 (4) (2018) 834–848.

[79] O. Ronneberger, P. Fischer, T. Brox, U-Net: Convolutional networks for biomedical image segmentation, in: Medical Image Computing and Computer-Assisted Intervention (MICCAI), Vol. 9351 of LNCS, Springer, 2015, pp. 234–241, (available on arXiv:1505.04597 [cs.CV]). URL http://lmb.informatik.uni-freiburg.de/Publications/2015/RFB15a

[80] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, H. Adam, MobileNets: Efficient convolutional neural networks for mobile vision applications, arXiv preprint arXiv:1704.04861 (2017).

[81] T. Pohlen, A. Hermans, M. Mathias, B. Leibe, Full-resolution residual networks for semantic segmentation in street scenes, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4151–4160.


[82] S. Jegou, M. Drozdzal, D. Vazquez, A. Romero, Y. Bengio, The one hundred layers tiramisu: Fully convolutional DenseNets for semantic segmentation, in: Computer Vision and Pattern Recognition Workshops (CVPRW), 2017 IEEE Conference on, IEEE, 2017, pp. 1175–1183.

[83] O. Antropov, Y. Rauste, A. Lonnqvist, T. Hame, PolSAR mosaic normalization for improved land-cover mapping, IEEE Geoscience and Remote Sensing Letters 9 (6) (2012) 1074–1078.

[84] J. Long, E. Shelhamer, T. Darrell, Fully convolutional networks for semantic segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431–3440.

[85] D. H. Hubel, T. N. Wiesel, Receptive fields, binocular interaction and functional architecture in the cat's visual cortex, The Journal of Physiology 160 (1) (1962) 106–154.

[86] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al., ImageNet large scale visual recognition challenge, International Journal of Computer Vision 115 (3) (2015) 211–252.

[87] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.

[88] G. Huang, Z. Liu, L. Van Der Maaten, K. Q. Weinberger, Densely connected convolutional networks, in: CVPR, Vol. 1, 2017, p. 3.

[89] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, Going deeper with convolutions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1–9.

[90] S. Ji, W. Xu, M. Yang, K. Yu, 3D convolutional neural networks for human action recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (1) (2013) 221–231.

[91] T. N. Sainath, A.-r. Mohamed, B. Kingsbury, B. Ramabhadran, Deep convolutional neural networks for LVCSR, in: Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, IEEE, 2013, pp. 8614–8618.

[92] D. Small, L. Zuberbuhler, A. Schubert, E. Meier, Terrain-flattened gamma nought Radarsat-2 backscatter, Canadian Journal of Remote Sensing 37 (5) (2012) 493–499.

[93] P. Härmä, R. Teiniranta, M. Törmä, R. Repo, E. Järvenpää, M. Kallio, The production of Finnish CORINE land cover 2000 classification, XXth ISPRS Congress, Istanbul, Turkey (2004).


[94] P. Härmä, R. Teiniranta, M. Törmä, R. Repo, E. Järvenpää, M. Kallio, Finnish CORINE land cover 2000 classification, XXth ISPRS Congress, Anchorage, US (2004).

[95] A. Garcia-Garcia, S. Orts-Escolano, S. Oprea, V. Villena-Martinez, J. Garcia-Rodriguez, A review on deep learning techniques applied to semantic segmentation, arXiv preprint arXiv:1704.06857 (2017).

[96] F. Chollet, Xception: Deep learning with depthwise separable convolutions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1251–1258.

[97] O. Ronneberger, P. Fischer, T. Brox, U-Net: Convolutional networks for biomedical image segmentation, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2015, pp. 234–241.

[98] L.-C. Chen, G. Papandreou, F. Schroff, H. Adam, Rethinking atrous convolution for semantic image segmentation, arXiv preprint arXiv:1706.05587 (2017).

[99] Y. Bengio, Deep learning of representations for unsupervised and transfer learning, in: Proceedings of ICML Workshop on Unsupervised and Transfer Learning, 2012, pp. 17–36.

[100] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, et al., TensorFlow: A system for large-scale machine learning, in: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), 2016, pp. 265–283.

[101] H. Costa, G. M. Foody, D. S. Boyd, Supervised methods of image segmentation accuracy assessment in land cover mapping, Remote Sensing of Environment 205 (2018) 338–351.

[102] G. Csurka, D. Larlus, F. Perronnin, F. Meylan, What is a good evaluation measure for semantic segmentation?, in: BMVC, Vol. 27, Citeseer, 2013, p. 2013.

[103] J. Cohen, A coefficient of agreement for nominal scales, Educational and Psychological Measurement 20 (1) (1960) 37–46.

[104] O. Antropov, Y. Rauste, T. Hame, Volume scattering modeling in PolSAR decompositions: Study of ALOS PALSAR data over boreal forest, IEEE Transactions on Geoscience and Remote Sensing 49 (10) (2011) 3838–3848.


• 1 Introduction
  • 1.1 Land Cover Mapping with SAR Imagery
  • 1.2 Deep Learning in Remote Sensing
  • 1.3 Study goals
• 2 Deep Learning Terminology
• 3 Materials and methods
  • 3.1 Study site
  • 3.2 SAR data
  • 3.3 Reference data
  • 3.4 Semantic Segmentation Models
    • 3.4.1 BiSeNet (Bilateral Segmentation Network)
    • 3.4.2 SegNet (Encoder-Decoder-Skip)
    • 3.4.3 Mobile U-Net
    • 3.4.4 DeepLab-V3+
    • 3.4.5 FRRN-B (Full-Resolution Residual Networks)
    • 3.4.6 PSPNet (Pyramid Scene Parsing Network)
    • 3.4.7 FC-DenseNet (Fully Convolutional DenseNets)
  • 3.5 Training approach
  • 3.6 Experimental Setup
    • 3.6.1 SAR Data Preprocessing for Deep Learning
    • 3.6.2 Train/Development and Test (Accuracy Assessment) Dataset
    • 3.6.3 Data Augmentation
    • 3.6.4 Implementation
    • 3.6.5 Hardware and Training Setup
  • 3.7 Evaluation Metrics
• 4 Results and Discussion
  • 4.1 Classification Performance
  • 4.2 Computational Performance
  • 4.3 Comparison to Similar Work
  • 4.4 Outlook and Future Work
• 5 Conclusion

FRRN-B achieves superior performance on the Cityscapes benchmark dataset. Hence, we employ the FRRN-B architecture.

3.4.6 PSPNet (Pyramid Scene Parsing Network)

Figure 9: The architecture of PSPNet [75].

Zhao et al. [75] propose the Pyramid Scene Parsing Network as a solution to the challenge of making local predictions based on the local context only, without considering the global image scene. In remote sensing, an example of this challenge is a model wrongly predicting water with waves in it as the dry vegetation class: the two appear similar, and the model does not take into account that those pixels are part of a larger water surface, i.e., it misses the global context. Similar to the other FCN-based approaches, PSPNet uses a pre-trained classification architecture to extract the feature map, in this case ResNet. The main module of this network is the pyramid pooling, which is enclosed by a square in Figure 9. As can be seen in the figure, this module fuses features at four scales, from the coarse (red) to the fine (green). Hence, the output of each level in the pyramid pooling module contains the feature map of a different resolution. In the end, the different features are stacked together, yielding the final pyramid pooling global feature used for the predictions.
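As an illustration, the pyramid pooling module can be sketched in Keras-style TensorFlow (the framework used in this study). The bin sizes (1, 2, 3, 6) follow [75]; the channel reduction and the exact layer arrangement are illustrative assumptions rather than the configuration of the benchmarked implementation:

    import tensorflow as tf
    from tensorflow.keras import layers

    def pyramid_pooling_module(feature_map, bin_sizes=(1, 2, 3, 6)):
        """Pool the backbone feature map at several grid sizes, project each
        pooled map to fewer channels, resize back, and stack all maps."""
        h, w, c = feature_map.shape[1], feature_map.shape[2], feature_map.shape[3]
        branches = [feature_map]
        for bins in bin_sizes:
            x = layers.AveragePooling2D(pool_size=(h // bins, w // bins))(feature_map)
            x = layers.Conv2D(c // len(bin_sizes), 1, use_bias=False)(x)
            x = layers.BatchNormalization()(x)
            x = layers.Activation("relu")(x)
            # bring the coarse (e.g. 1x1) to fine (e.g. 6x6) maps back to h x w
            x = layers.Resizing(h, w, interpolation="bilinear")(x)
            branches.append(x)
        return layers.Concatenate()(branches)  # global pyramid feature for prediction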

3.4.7 FC-DenseNet (Fully Convolutional DenseNets)

This semantic segmentation algorithm is built using the DenseNet CNN [88] as a basis for the encoder, followed by applying the FCN approach [82]. The specificity of the DenseNet architecture is the presence of blocks in which each layer is connected to all other layers in a feed-forward manner. Figure 10 shows the architecture of FC-DenseNet, where the blocks are represented by the Dense Block units. According to [88], such an architecture scales well to hundreds of layers without any optimization issues, while yielding excellent results in classification tasks. In order to efficiently upsample the DenseNet feature maps, Jegou et al. [82] substitute the upsampling convolutions of FCNs with Dense Blocks and Transitions Up. The Transition Up modules consist of transposed convolutions, which are then concatenated with the outputs from the input skip connection (the dashed lines in Figure 10).
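The two building blocks can be sketched as follows (a minimal Keras-style sketch; the growth rate and the number of layers per block are assumed hyperparameters, not the exact FC-DenseNet103 values):

    import tensorflow as tf
    from tensorflow.keras import layers

    def dense_block(x, num_layers=4, growth_rate=16):
        """Each layer receives the concatenation of all previous layers' outputs."""
        new_maps = []
        for _ in range(num_layers):
            y = layers.BatchNormalization()(x)
            y = layers.Activation("relu")(y)
            y = layers.Conv2D(growth_rate, 3, padding="same")(y)
            new_maps.append(y)
            x = layers.Concatenate()([x, y])    # feed-forward dense connectivity
        return layers.Concatenate()(new_maps)

    def transition_up(x, skip):
        """Transposed convolution, then concatenation with the skip connection
        (the dashed lines in Figure 10)."""
        x = layers.Conv2DTranspose(x.shape[-1], 3, strides=2, padding="same")(x)
        return layers.Concatenate()([x, skip])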


Figure 10: The architecture of FC-DenseNet [82].

3.5 Training approach

To accomplish better segmentation performance, there is an option to pre-train the semantic segmentation models (in particular, their encoder modules) using a larger set of available images of another type (such as natural images). By taking the model pre-trained with natural images and continuing its training with the limited set of SAR images, the knowledge becomes effectively transferred from the natural-image task to the SAR task [99]. To accomplish such a transfer, we used the models whose encoders were pre-trained on the ImageNet classification task and fine-tuned them using our SAR dataset (described next).
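In Keras-style TensorFlow, this transfer amounts to instantiating an encoder with ImageNet weights and training the full network, decoder included, on the SAR data. The sketch below assumes a ResNet-101 encoder and a deliberately simplified single-step decoder; the actual decoders are those described in Section 3.4:

    import tensorflow as tf
    from tensorflow.keras import layers, Model

    # Encoder pre-trained on ImageNet; its weights are not frozen, so training
    # on the SAR dataset fine-tunes the encoder together with the new decoder.
    encoder = tf.keras.applications.ResNet101(
        include_top=False, weights="imagenet", input_shape=(512, 512, 3))
    x = layers.Conv2DTranspose(64, 3, strides=32, padding="same")(encoder.output)
    outputs = layers.Conv2D(5, 1, activation="softmax")(x)  # 5 Level-1 classes
    model = Model(encoder.input, outputs)
    model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=1e-4),
                  loss="categorical_crossentropy")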

3.6 Experimental Setup

In this section, we first describe how we prepared the SAR images for training with the deep learning models, which were designed for natural images. Then, we provide the details of our implementation.

3.6.1 SAR Data Preprocessing for Deep Learning

Sentinel-1 imagery comes in two polarization channels, each of them being informative about certain types of land cover. Hence, using their combination is expected to yield better land cover mapping results than using either of them independently. Moreover, previous work has shown the benefits of also using


the DEM for land cover mapping [9]. Hence, as the third layer, we used the DEM of Finland from the National Land Survey.

SAR backscatter for both polarizations was converted to decibels by applying the 10 · log10 transformation. In addition, for the deep learning models, each band should be normalized so that the distribution of the pixel values resembles a Gaussian distribution centered at zero; this is done to yield faster convergence during training. To normalize the data, from each pixel value we subtracted the mean of all pixels and then divided by their standard deviation. In addition, given that the semantic segmentation models expect pixel values in the range (0, 255), we scaled the normalized data, and also the DEM values, to this range. Such preprocessed layers were then used to create the image dataset for training.
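The per-band chain can be summarised with the following minimal NumPy sketch; the final min-max mapping into (0, 255) is an assumption, as the exact rescaling used is not specified here:

    import numpy as np

    def preprocess_band(band_linear):
        """Backscatter to dB, zero-mean/unit-variance normalisation, then
        rescaling into the (0, 255) range the segmentation models expect."""
        band_db = 10.0 * np.log10(np.maximum(band_linear, 1e-6))  # avoid log(0)
        standardized = (band_db - band_db.mean()) / band_db.std()
        lo, hi = standardized.min(), standardized.max()
        return 255.0 * (standardized - lo) / (hi - lo)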

We named the created dataset SAR RGB-DEM. The naming comes from the process used to create the images in this dataset: one of the two channels of a Sentinel-1 image is assigned to the R and the other to the G channel, while for the third, B, channel we use the DEM layer.
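The composition then reduces to stacking the two preprocessed polarization bands and the rescaled DEM into a three-channel array. In this sketch (which reuses the preprocess_band function above), assigning VV to R and VH to G is an arbitrary assumption, since the text only fixes the DEM to the B channel:

    import numpy as np

    def make_sar_rgb_dem(vv, vh, dem):
        """Stack two polarization channels and the DEM as an RGB-like image."""
        dem_scaled = 255.0 * (dem - dem.min()) / (dem.max() - dem.min())
        return np.dstack([preprocess_band(vv),   # R: first polarization channel
                          preprocess_band(vh),   # G: second polarization channel
                          dem_scaled])           # B: DEM layer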

3.6.2 Train/Development and Test (Accuracy Assessment) Dataset

The images from the SAR RGB-DEM dataset needed to be split into 512 px × 512 px partial images (further in the text called imagelets) for training. Thus, each imagelet represented an area of roughly 10 × 10 km². The first reason for this preprocessing is the square shape: some of the selected models required square-shaped images. Other models were flexible with regard to the image shape and size, but we wanted to make the setups for all the models the same so that their results would be comparable. The second reason for the preprocessing is the computational capacity: with our hardware setup (described below), this was the largest image size that we could work with.
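The tiling step itself is straightforward; the sketch below assumes a simple non-overlapping grid, with incomplete tiles at the right and bottom edges dropped:

    def split_into_imagelets(scene, tile=512):
        """Cut a preprocessed (H, W, 3) array into tile x tile imagelets."""
        h, w = scene.shape[0], scene.shape[1]
        return [scene[r:r + tile, c:c + tile]
                for r in range(0, h - tile + 1, tile)
                for c in range(0, w - tile + 1, tile)]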

Upon splitting the SAR RGB-DEM images, we discarded those imagelets that were completely outside the land mass area, as well as those for which we did not have a complete CORINE label (e.g., if they fell partly outside the Finnish borders). This resulted in more than 7,000 imagelets of size 512 px × 512 px.

Given the geography of Finland, to have representative training data it is useful to include imagelets from both the northern and southern (including the large cities) parts of the country in the model training. On the other hand, noticeable differences are found also in the gradient from the east to the west of the country. To achieve a representative training dataset, we selected all imagelets between the longitudes of 24° and 28° for the accuracy assessment (model testing), and all the other imagelets for the model training (that is, training and development in the computer vision terminology). In this way, we prevented the situation in which two images of the same area, but acquired at different times, are used one for training and the other one for testing. Images overlapping any border of the introduced strip were discarded. The procedure resulted in 3,104 images in the training and development set and 3,784 images in the test (accuracy assessment) set. Finally, we used 60% of the training and development set for training and the rest for the development of the deep learning models.


Table 3: The properties of the examined semantic segmentation architectures.

Architecture   Base model       Parameters
BiSeNet        ResNet101        24.75M
SegNet         VGG16            34.97M
Mobile U-Net   Not applicable   8.87M
DeepLabV3+     ResNet101        47.96M
FRRN-B         ResNet101        24.75M
PSPNet         ResNet101        56M
FC-DenseNet    ResNet101        9.27M

3.6.3 Data Augmentation

Further, we employed the data augmentation technique. The main idea behind data augmentation is to enable improved learning by reusing the original images with slight transformations, such as rotation, flipping, adding Gaussian noise, or slightly changing the brightness. This provides additional information to the model, and the dataset size is effectively increased. Moreover, an additional benefit of data augmentation is in helping the model to learn some invariant data properties for which no examples are present in the original dataset. Given the sensitivity of the SAR backscatter, we did not want to augment the images in terms of the color or brightness, or by adding noise. However, we could safely employ rotations and flipping. For rotations, we only used 90° increments, giving three possible rotated versions of an image. For image flipping, we applied horizontal and vertical flipping, or both at the same time, giving another three possible versions of the original image (see footnote 4). Notice that our images are square, so the transformations did not change the image dimensions. Finally, we applied the online augmentation, as opposed to the offline version. In the online process, each augmented image is seen only once, and so this process yields a network that generalises better.
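The geometric augmentation can be sketched as below; it is applied online, i.e., drawn anew at batch-assembly time, and identically to the imagelet and its label mask:

    import numpy as np

    def augment(image, mask, rng=np.random.default_rng()):
        """Random multiple-of-90-degree rotation plus optional flips."""
        k = int(rng.integers(0, 4))            # rotate by 0, 90, 180 or 270 degrees
        image, mask = np.rot90(image, k), np.rot90(mask, k)
        if rng.integers(0, 2):                 # horizontal flip
            image, mask = image[:, ::-1], mask[:, ::-1]
        if rng.integers(0, 2):                 # vertical flip
            image, mask = image[::-1, :], mask[::-1, :]
        return image, mask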

3.6.4 Implementation

To apply the described semantic segmentation models, we adapted the open-source Semantic Segmentation Suite. We used Python with the TensorFlow [100] backend.

3.6.5 Hardware and Training Setup

We trained and tested each of the deep learning models separately on a single GPU (NVIDIA GeForce GTX 1080), on a machine with 32 GB of RAM.

We used the RMSProp optimisation algorithm, a learning rate of 0.0001, and a learning rate decay of 0.9954. Each model was trained for an equal number of epochs (500), and during the process the checkpoint for the best model was saved. We then used that model for the evaluation on the test set, and we report those results.
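In TensorFlow terms, this corresponds to a configuration along the following lines; STEPS_PER_EPOCH is a placeholder, and applying the 0.9954 decay once per epoch is our assumption, as the decay interval is not stated:

    import tensorflow as tf

    STEPS_PER_EPOCH = 1000  # placeholder: training-set size divided by batch size

    schedule = tf.keras.optimizers.schedules.ExponentialDecay(
        initial_learning_rate=1e-4,      # learning rate of 0.0001
        decay_steps=STEPS_PER_EPOCH,     # assumed: decay applied once per epoch
        decay_rate=0.9954)
    optimizer = tf.keras.optimizers.RMSprop(learning_rate=schedule)
    # Each model: 500 epochs, keeping the checkpoint with the best score.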

4 The vertical flip operation switches between top-left and bottom-left image origin (reflection along the central horizontal axis), and the horizontal flip switches between top-left and top-right image origin (reflection along the central vertical axis).



3.7 Evaluation Metrics

In a review of the metrics used in land cover classification, Costa et al. [101] found a lack of consistency, complicating the intercomparison of different studies. To avoid such issues and to ensure that our results are easily comparable with the literature, we thoroughly evaluated our models. For each model and class, we report the following measures of accuracy: precision, also known as user's accuracy (UA); recall, also known as producer's accuracy (PA); the overall accuracy; and the Kappa coefficient. The formulas are as follows.

For each segmentation class (land cover type) c, we calculate the precision (user's accuracy):

\[ P_c = \frac{T_{p_c}}{T_{p_c} + F_{p_c}}, \]

and the recall (producer's accuracy):

\[ R_c = \frac{T_{p_c}}{T_{p_c} + F_{n_c}}, \]

where \(T_{p_c}\) represents the true positive, \(F_{p_c}\) the false positive, and \(F_{n_c}\) the false negative pixels for the class \(c\).

When it comes to accuracy [102], we calculate the per-class accuracy (see footnote 5):

\[ Acc_c = \frac{C_{ii}}{G_i}, \]

and the overall pixel accuracy:

\[ Acc_{OP} = \frac{\sum_{i=1}^{L} C_{ii}}{\sum_{i=1}^{L} G_i}, \]

where \(C_{ij}\) is the number of pixels having ground truth label \(i\) and being classified/predicted as \(j\), \(G_i\) is the total number of pixels labelled with \(i\), and \(L\) is the number of classes. All these metrics can take values from 0 to 1.

Finally, we also use the Kappa statistic (Cohen's measure of agreement), which indicates how the classification results compare to values assigned by chance [103]. The Kappa statistic can take values from 0 to 1. Starting from a \(k \times k\) confusion matrix with elements \(f_{ij}\), the following calculations are done:

5 Effectively, the per-class accuracy is defined as the recall obtained on each class.


\[ P_o = \frac{1}{N} \sum_{j=1}^{k} f_{jj} \tag{1} \]

\[ r_i = \sum_{j=1}^{k} f_{ij} \;\; \forall i, \qquad c_j = \sum_{i=1}^{k} f_{ij} \;\; \forall j \tag{2} \]

\[ P_e = \frac{1}{N^2} \sum_{i=1}^{k} r_i c_i \tag{3} \]

where \(P_o\) is the observed proportional agreement (effectively the overall accuracy), \(r_i\) and \(c_j\) are the row and column totals for classes \(i\) and \(j\), \(N\) is the total number of pixels, and \(P_e\) is the expected proportion of agreement. The final measure of agreement is given by the statistic [103]:

\[ \kappa = \frac{P_o - P_e}{1 - P_e} \tag{4} \]

Depending on the value of Kappa, the observed agreement is considered either poor (0.0 to 0.2), fair (0.2 to 0.4), moderate (0.4 to 0.6), good (0.6 to 0.8), or very good (0.8 to 1.0).
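All of the reported metrics follow mechanically from a confusion matrix laid out as in Table 5 (rows: reference classes, columns: predictions); the following sketch computes them with NumPy:

    import numpy as np

    def evaluate(confusion):
        """User's/producer's accuracies, overall accuracy and Cohen's kappa."""
        confusion = np.asarray(confusion, dtype=float)
        n = confusion.sum()
        diag = np.diag(confusion)
        ua = diag / confusion.sum(axis=0)      # precision (user's accuracy)
        pa = diag / confusion.sum(axis=1)      # recall (producer's accuracy)
        p_o = diag.sum() / n                   # overall accuracy, Eq. (1)
        p_e = (confusion.sum(axis=0) * confusion.sum(axis=1)).sum() / n**2  # Eqs. (2)-(3)
        kappa = (p_o - p_e) / (1.0 - p_e)      # Eq. (4)
        return ua, pa, p_o, kappa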

4 Results and Discussion

Using the experimental setup described in the previous section, we evaluated the seven selected semantic segmentation models: SegNet [74], PSPNet [75], BiSeNet [76], DeepLabV3+ [77, 78], U-Net [79, 80], FRRN-B [81], and FC-DenseNet [82]. The overall classification performance statistics for all studied models are gathered in Table 4. Figure 11 shows maps produced for several imagelets with the best performing model, FC-DenseNet. The obtained results are compared to prior work, and the classification performance for different land cover classes is discussed further.

4.1 Classification Performance

All the models performed relatively well in terms of classification, achieving an overall accuracy above 83%. Three models performed particularly well, achieving an accuracy score above 89%: SegNet, FRRN-B, and the best performing model, FC-DenseNet, which achieved an accuracy of 90.7%.

Before further analysis, let us recall that CORINE is not exclusively a land cover map, but rather a land cover and land use map; thus, specific classes can differ from the ecological classes observed by Sentinel-1. Also, the aggregation to Level-1 is sometimes not strictly "ecological", nor does it comply with physical surface-scattering considerations.


Table 4: Summary of the classification performance and efficiency of the various deep learning models (UA - user's accuracy, PA - producer's accuracy; the average inference time is per image in the dataset). Per-class entries are given as UA/PA, in %.

LC class (CORINE code)            Test scale (km²)  BiSeNet  DeepLabV3+  SegNet  FRRN-B  U-Net  PSPNet  FC-DenseNet
Urban fabric (100)                1081.6            26/21    15/14       36/31   38/30   45/25  38/18   62/27
Agricultural areas (200)          2516.0            49/51    50/49       69/66   68/68   66/66  53/48   72/71
Forested areas (300)              28546.2           90/91    88/96       93/94   92/95   92/95  89/95   93/96
Peatland, bogs and marshes (400)  2099.0            54/43    56/13       67/57   71/55   70/52  65/31   74/58
Water bodies (500)                5356.4            85/91    94/92       96/96   95/96   96/96  94/94   96/96
Overall Accuracy (%)                                83.86    85.49       89.03   89.27   89.25  86.51   90.66
Kappa                                               0.641    0.649       0.754   0.758   0.754  0.680   0.785
Average inference time (s)                          0.0389   0.0267      0.0761  0.1424  0.0848 0.0495  0.1930

Table 5: Confusion matrix for the classification with the FC-DenseNet model (FC-DenseNet103). Rows are the CLC2012 reference classes; columns are the Sentinel-1 classification results. The bottom-right cell is the overall accuracy (%).

CLC2012 class   urban       water        forest       field       peatland    total        PA (%)
1 urban         7301999     413073       15892771     3212839     221476      27042158     27.0
2 water         78331       128294872    3457634      171029      1935276     133937142    95.8
3 forest        3663698     2703632      686788977    12795703    7730444     713682454    96.2
4 field         766200      121609       16527970     44866048    620934      62902761     71.3
5 peatland      56097       1866020      19164137     1091008     30309189    52486451     57.8
total           11866325    133399206    741831489    62136627    40817319    990050966
UA (%)          61.5        96.2         92.6         72.2        74.3                     90.7

Figure 11: Illustration of the FC-DenseNet model performance: a selection of classification results, i.e., the direct output of the network without any post-processing (bottom row), versus the reference CORINE data (upper row).

For example, roads, airports, major industrial areas and the road network often exhibit areas similar to fields; the presence of trees and green vegetation near summer cottages can cause them to exhibit signatures closer to forest than to urban; sometimes, forest on rocky terrain can be misclassified as urban due to the presence of very bright targets and strong disruptive features; and confusion between peatland and field areas is also commonplace. Finally, the accuracy of the CORINE data itself is only somewhat higher than 90%.

As for the results across the different land cover classes, all the models performed particularly well in recognising the water bodies and forested areas, while the urban fabric represented the most challenging class for all the models. We expect that the inclusion of the DEM as one layer in the training images helped to achieve good results on the water bodies class for most of the models (except for BiSeNet, all the models achieved both the user's and producer's accuracy above 90%). The urban class was particularly challenging for the following main reasons. First, this class changes the most, as new houses, roads, and urban areas are built. While we took the most suitable available CORINE class in terms of time for our Sentinel-1 images, there are almost certain


differences between the urban class as it was in 2012 and in 2015-2016. Second, the CORINE map itself does not have perfect accuracy, nor are the aggregation rules perfect. As a matter of fact, in the majority of studies where SAR-based classification was compared against CLC or similar data, a poor or modest overall agreement was observed for this class [21, 41, 83, 20], while the user's accuracy was notably higher than the producer's [104]. The latter is exactly due to radar being able to sense sharp boundaries and bright targets very well, whereas such bright targets often do not dominate the whole CORINE Level-1 urban class. We argue that any inaccuracies present will be particularly attenuated in our models for the urban class because of the sharp and sudden boundary changes in this class, unlike for the others, such as forest and water. The top performing model, i.e., FC-DenseNet, performed the best across all the classes. It is particularly notable that it achieved a user's accuracy, i.e., precision, for the urban class of 62%, improving significantly on all the other models. Nevertheless, its score on the producer's accuracy, i.e., recall, on this class (27%) is outperformed by two other top models, i.e., SegNet and FRRN-B.

We mentioned the issue of the sensitivity of SAR backscattering to several ground factors, such that the same classes might appear differently in the images between countries or between distant areas within a country. An interesting indication of our study, however, is that the deep learning models might be able to deal with this issue. Namely, we used models pre-trained on ImageNet and fine-tuned them with a relatively small number (14) of Sentinel-1 scenes. The models learned to recognize the varying types of the backscattering signal across the country of Finland. This indicates that, with a similar type of fine-tuning, the present models could be relatively easily adapted to other areas and countries with different SAR backscattering patterns. Such robustness and adaptability of the deep learning models come from their automatic learning of feature representations, without the need for a human to pre-define those features.

4.2 Computational Performance

The training times with our hardware configuration ranged from 6 days up to 2 weeks for the different models. This could be significantly improved by training each model on a multi-GPU system instead of the single GPU we used.

In terms of inference time, we also observed differences in performance. In Table 4, we present the average inference time per 512 px × 512 px imagelet. The results show that there is a trade-off between classification and computational performance: the best models in terms of classification results (i.e., FC-DenseNet and FRRN-B) take a several times longer inference time compared to the rest. Depending on the application, this might not be of particular importance.

4.3 Comparison to Similar Work

The obtained results compare favourably to previous similar studies on land cover classification with SAR data [83, 20, 21, 41, 28, 31]. Depending on the level of class aggregation (4-5 major classes or more), and using mostly statistical


or classical machine learning approaches, the reported classification accuracies ranged from as high as 80-87% down to as low as 30% when only SAR imagery was used.

Two recent studies that employed neural networks for SAR imagery classification (albeit in combination with satellite optical data) for land cover mapping were [28] and [66], with reported classification accuracies of up to 97.5% and 94.6%, respectively.

The best model in our experiments achieved an overall accuracy of 90.7%. However, our results were obtained using solely the SAR imagery. In contrast, SAR imagery (PALSAR) alone yielded an overall accuracy of 78.1% in [28]. The types of classes they studied are also different from ours (crops versus vegetation versus land cover types), and our study was performed on a larger area. Importantly, the previous studies applied different types of models (regular NNs versus CNNs versus semantic segmentation). In particular, the CNN models work on 7 × 7 resolution windows, while we applied more advanced semantic segmentation models, which work at the level of a pixel. Keeping in mind the finding from [28] that the addition of optical images on top of SAR improved the results by over 10%, we expect that our models would perform comparably well or outperform these previous works if applied to combined SAR and optical imagery.

In terms of the deep learning setup, the studies most similar to ours are [53] and [70]. However, RapidEye optical imagery at 5 m spatial resolution was used in [53], and the test site was considerably smaller. Study [70], similar to our research, relied exclusively on SAR imagery, however on fully polarimetric images acquired by RADARSAT-2 at a considerably better resolution. They developed an FCN-type semantic segmentation model "specifically designed for the classification of wetland complexes using PolSAR imagery". Using this model to classify eight wetland map classes, they achieved an overall accuracy of 93%. However, because their model is designed specifically for wetland complexes, it is not clear whether such a model would generalize to other types of areas. Compared to our study, they focused on a considerably smaller area (nearly the size of a single imagelet we used) and on a very specific task (wetland type mapping). Thus, it is not readily clear how general their approach is and how it compares to our presented approach.

4.4 Outlook and Future Work

There are several lines of potential improvement based on the results of this study, as well as future work directions.

First, using an even larger set of Sentinel-1 images can be recommended, since for supervised deep learning models large amounts of data are crucial. Here, we processed only 6,888 imagelets altogether, but deep learning algorithms typically become efficient only once they are trained with hundreds of thousands or millions of images.

Second, if SAR images and reference data of a higher resolution are used, we expect better classification performance too, as smaller details could potentially be captured. Also, better agreement in the acquisition timing of the reference and


SAR imagery can be recommended. The reference and training data should come from the same months or year if possible, and the reference maps should represent the reality as accurately as possible. The models in our experiments were certainly limited by CORINE's own limited accuracy.

Third, in this study we tested the effectiveness of off-the-shelf deep learning models for land cover mapping from SAR data. While the results show their effectiveness, it is also likely that novel types of models developed specifically for radar data (such as [70]) will yield even better results. Based on our results, we suggest DenseNet-based models as a starting point. In particular, one could develop deep learning models that handle the SLC data directly, which preserve the phase information.

Focusing on a single season is both an advantage and a limitation. Importantly, we avoided confusion between SAR signatures that vary seasonally for several land cover classes. However, the multitemporal dynamics itself can potentially be used as an additional, useful class-discriminating parameter. Incorporating the seasonal dynamics of each land cover pixel (as a time series) is left for future work, perhaps with the additional need to incorporate recurrent neural networks into the approach.

As discussed in Section 3.1.1, it could be suitable to use more detailed (specific) land cover classes, as the aggregation of smaller LC classes into Level-1 CORINE classes is not exactly ecological, leading to the mixing of several distinct SAR signatures in one class and thus causing additional confusion for the classifier. The classified specific classes can later be aggregated into larger classes, potentially showing improved performance [19].

Finally, we used only SAR images and a freely available DEM for the presented large-scale land cover mapping. If one were to also combine other types of remote sensing images, in particular optical images, we expect that the results would significantly improve. This holds for those areas where such imagery can be collected, given the cloud coverage, while in an operational scenario it would potentially require the use of at least two models (with and without optical satellite imagery). It is also important to assess the added value of SAR imagery with deep learning models when optical satellite images are available, as well as possible data fusion and decision fusion scenarios, before a decision on the mapping approach is made [19].

5 Conclusion

Our study demonstrated the potential for applying state-of-the-art semantic segmentation models to SAR image classification with high accuracy. Several models were benchmarked in a countrywide classification experiment using Sentinel-1 IW-mode SAR data, reaching nearly 91% overall classification accuracy with the best performing model (FC-DenseNet). Given that the 14 used Sentinel-1 scenes resulted in about 7,000 training images, this indicates a strong potential for using pre-trained CNNs for further fine-tuning, and it seems particularly suitable when the number of training images is limited (to thousands or tens of thousands instead of millions). In addition to suggesting the best candidate


semantic segmentation models for land cover mapping with SAR data (that is, the DenseNet-based models), our study offers baseline results against which newly proposed models should be evaluated. Several possible improvements were identified, including the necessity of testing multitemporal approaches, data fusion, and very high-resolution SAR imagery, as well as developing models specifically for SAR data; these will be addressed in future work.

References

[1] S Bojinski M Verstraete T C Peterson C Richter A SimmonsM Zemp The concept of essential climate variables in support of climateresearch applications and policy Bulletin of the American MeteorologicalSociety 95 (9) (2014) 1431ndash1443

[2] G Buttner J Feranec G Jaffrain L Mari G Maucha T Soukup TheCORINE land cover 2000 project EARSeL eProceedings 3 (3) (2004)331ndash346

[3] M Bossard J Feranec J Otahel et al CORINE land cover technicalguide Addendum 2000 (2000)

[4] G Buttner CORINE land cover and land cover change products in LandUse and Land Cover Mapping in Europe Springer 2014 pp 55ndash74

[5] M Trm T Markkanen S Hatunen P Hrm O-P Mattila A ArslanAssessment of land-cover data for land-surface modelling in regional cli-mate studies Boreal Environment Research 20 (2) (2015) 243ndash260

[6] J Chen J Chen A Liao X Cao L Chen X Chen C He G HanS Peng M Lu et al Global land cover mapping at 30 m resolution Apok-based operational approach ISPRS Journal of Photogrammetry andRemote Sensing 103 (2015) 7ndash27

[7] C A d Almeida A C Coutinho J C D M Esquerdo M AdamiA Venturieri C G Diniz N Dessay L Durieux A R Gomes Highspatial resolution land use and land cover mapping of the brazilian legalamazon in 2008 using landsat-5tm and modis data Acta Amazonica46 (3) (2016) 291ndash302

[8] C Homer J Dewitz L Yang S Jin P Danielson G Xian J CoulstonN Herold J Wickham K Megown Completion of the 2011 national landcover database for the conterminous united statesndashrepresenting a decade ofland cover change information Photogrammetric Engineering amp RemoteSensing 81 (5) (2015) 345ndash354

[9] Y Zhao D Feng L Yu X Wang Y Chen Y Bai H J HernandezM Galleguillos C Estades G S Biging et al Detailed dynamic landcover mapping of chile Accuracy improvement by integrating multi-temporal data Remote Sensing of Environment 183 (2016) 170ndash185

30

[10] P Griffiths C Nendel P Hostert Intra-annual reflectance compositesfrom sentinel-2 and landsat for national-scale crop and land cover map-ping Remote sensing of environment 220 (2019) 135ndash151

[11] R Torres P Snoeij D Geudtner D Bibby M Davidson E AttemaP Potin B Rommen N Floury M Brown I Traver P DeghayeB Duesmann B Rosich N Miranda C Bruno M LrsquoAbbate R CrociA Pietropaolo M Huchler F Rostan GMES Sentinel-1 mission RemoteSensing of Environment 120 (2012) 9ndash24 doi101016jrse201105028

[12] Y LeCun L Bottou Y Bengio P Haffner Gradient-based learning ap-plied to document recognition Proceedings of the IEEE 86 (11) (1998)2278ndash2324

[13] A Krizhevsky I Sutskever G E Hinton Imagenet classification withdeep convolutional neural networks in Advances in neural informationprocessing systems 2012 pp 1097ndash1105

[14] K Simonyan A Zisserman Very deep convolutional networks for large-scale image recognition CoRR abs14091556 (2014)

[15] I Goodfellow Y Bengio A Courville Deep learning (2016)

[16] W Cohen S Goward Landsatrsquos role in ecological applications ofremote sensing BioScience 54 (6) (2004) 535ndash545 doi101641

0006-3568(2004)054[0535LRIEAO]20CO2

[17] S Goetz A Baccini N Laporte T Johns W Walker J Kellndor-fer R Houghton M Sun Mapping and monitoring carbon stocks withsatellite observations A comparison of methods Carbon Balance andManagement 4 (2009) doi1011861750-0680-4-2

[18] C Atzberger Advances in remote sensing of agriculture Context de-scription existing operational monitoring systems and major informationneeds Remote Sensing 5 (2) (2013) 949ndash981 doi103390rs5020949

[19] T Hame J Kilpi H A Ahola Y Rauste O Antropov M RautiainenL Sirro S Bounpone Improved mapping of tropical forests with opticaland SAR imagery part I Forest cover and accuracy assessment usingmulti-resolution data IEEE Journal of Selected Topics in Applied EarthObservations and Remote Sensing 6 (1) (2013) 74ndash91

[20] O Antropov Y Rauste H Astola J Praks T Hame M HallikainenLand cover and soil type mapping from spaceborne PolSAR data at L-band with probabilistic neural network IEEE Transactions on Geoscienceand Remote Sensing 52 (9) (2014) 5256ndash5270 doi101109TGRS20132287712

31

[21] A Lonnqvist Y Rauste M Molinier T Hame Polarimetric SAR datain land cover mapping in boreal zone IEEE Transactions on Geoscienceand Remote Sensing 48 (10) (2010) 3652ndash3662 doi101109TGRS20102048115

[22] B Waske M Braun Classifier ensembles for land cover mapping usingmultitemporal sar imagery ISPRS Journal of Photogrammetry and Re-mote Sensing 64 (5) (2009) 450ndash457 doi101016jisprsjprs2009

01003

[23] L Bruzzone M Marconcini U Wegmller A Wiesmann An advancedsystem for the automatic classification of multitemporal sar images IEEETransactions on Geoscience and Remote Sensing 42 (6) (2004) 1321ndash1334doi101109TGRS2004826821

[24] T Ullmann A Schmitt A Roth J Duffe S Dech H-W HubbertenR Baumhauer Land cover characterization and classification of arctictundra environments by means of polarized synthetic aperture x- andc-band radar (polsar) and landsat 8 multispectral imagery richards is-land canada Remote Sensing 6 (9) (2014) 8565ndash8593 doi103390

rs6098565URL httpwwwmdpicom2072-4292698565

[25] N Clerici C A V Caldern J M Posada Fusion of sentinel-1a andsentinel-2a data for land cover mapping a case study in the lowermagdalena region colombia Journal of Maps 13 (2) (2017) 718ndash726doi1010801744564720171372316

[26] C Castaneda D Ducrot Land cover mapping of wetland areas in anagricultural landscape using sar and landsat imagery Journal of Environ-mental Management 90 (7) (2009) 2270ndash2277

[27] Y Ban H Hu I M Rangel Fusion of quickbird ms and radarsat sardata for urban land-cover mapping Object-based and knowledge-basedapproach International Journal of Remote Sensing 31 (6) (2010) 1391ndash1410

[28] G V Laurin V Liesenberg Q Chen L Guerriero F Del Frate A Bar-tolini D Coomes B Wilebore J Lindsell R Valentini Optical andsar sensor synergies for forest and land cover mapping in a tropical sitein west africa International Journal of Applied Earth Observation andGeoinformation 21 (2013) 7ndash16

[29] R Khatami G Mountrakis S V Stehman A meta-analysis of remotesensing research on supervised pixel-based land-cover image classificationprocesses General guidelines for practitioners and future research Re-mote Sensing of Environment 177 (2016) 89ndash100

32

[30] B Waske M Braun Classifier ensembles for land cover mapping usingmultitemporal sar imagery ISPRS Journal of Photogrammetry and Re-mote Sensing 64 (5) (2009) 450ndash457

[31] H Balzter B Cole C Thiel C Schmullius Mapping CORINE landcover from Sentinel-1A SAR and SRTM digital elevation model data usingrandom forests Remote Sensing 7 (11) (2015) 14876ndash14898 doi10

3390rs71114876

[32] S-E Park Variations of microwave scattering properties by seasonalfreezethaw transition in the permafrost active layer observed by ALOSPALSAR polarimetric data Remote Sensing 7 (12) (2015) 17135ndash17148doi103390rs71215874

[33] M C Dobson L E Pierce F T Ulaby Knowledge-based land-coverclassification using ERS-1JERS-1 SAR composites IEEE Transactionson Geoscience and Remote Sensing 34 (1) (1996) 83ndash99 doi101109

36481896

[34] L Sirro T Hme Y Rauste J Kilpi J Hmlinen K Gunia B de JongF Paz Pellat Potential of different optical and SAR data in forest andland cover classification to support redd+ mrv Remote Sensing 10 (6)(2018) doi103390rs10060942

[35] J D T De Alban G M Connette P Oswald E L Webb CombinedLandsat and L-band SAR data improves land cover classification andchange detection in dynamic tropical landscapes Remote Sensing 10 (2)(2018) doi103390rs10020306

[36] N Longepe P Rakwatin O Isoguchi M Shimada Y Uryu K YuliantoAssessment of alos palsar 50 m orthorectified fbd data for regional landcover classification by support vector machines IEEE Transactions onGeoscience and Remote Sensing 49 (6) (2011) 2135ndash2150 doi101109

TGRS20102102041

[37] T Esch A Schenk T Ullmann M Thiel A Roth S Dech Char-acterization of land cover types in terrasar-x images by combined anal-ysis of speckle statistics and intensity information IEEE Transactionson Geoscience and Remote Sensing 49 (6) (2011) 1911ndash1925 doi

101109TGRS20102091644

[38] J W Cable J M Kovacs J Shang X Jiao Multi-temporal polarimet-ric radarsat-2 for land cover monitoring in northeastern ontario canadaRemote Sensing 6 (3) (2014) 2372ndash2392 doi103390rs6032372URL httpwwwmdpicom2072-4292632372

[39] X Niu Y Ban Multi-temporal radarsat-2 polarimetric sar data for urbanland-cover classification using an object-based support vector machine anda rule-based approach International Journal of Remote Sensing 34 (1)(2013) 1ndash26 doi101080014311612012700133

33

[40] T L Evans M Costa K Telmer T S F Silva Using alospalsar andradarsat-2 to map land cover and seasonal inundation in the brazilianpantanal IEEE Journal of Selected Topics in Applied Earth Observationsand Remote Sensing 3 (4) (2010) 560ndash575 doi101109JSTARS2010

2089042

[41] P Lumsdon S R Cloude G Wright Polarimetric classification of landcover for Glen Affric radar project IEE Proceedings - Radar Sonar andNavigation 152 (6) (2005) 404ndash412 doi101049ip-rsn20041313

[42] C da Costa Freitas L de Souza Soler S J S SantrsquoAnna L V DutraJ R Dos Santos J C Mura A H Correia Land use and land cover map-ping in the brazilian amazon using polarimetric airborne p-band sar dataIEEE Transactions on Geoscience and Remote Sensing 46 (10) (2008)2956ndash2970

[43] G Li D Lu E Moran L Dutra M Batistella A comparative analysis ofalos palsar l-band and radarsat-2 c-band data for land-cover classificationin a tropical moist region ISPRS Journal of Photogrammetry and RemoteSensing 70 (2012) 26ndash38 doi101016jisprsjprs201203010

[44] N Park K Chi Integration of multitemporalpolarization cband sar datasets for landcover classification International Journal of Remote Sensing29 (16) (2008) 4667ndash4688 doi10108001431160801947341

[45] E Tomppo O Antropov J Praks Cropland classification using Sentinel-1 time series Methodological performance and prediction uncertainty as-sessment Remote Sensing 11 (21) (2019) doi103390rs11212480

[46] D B Nguyen A Gruber W Wagner Mapping rice extent and croppingscheme in the mekong delta using sentinel-1a data Remote Sensing Letters7 (12) (2016) 1209ndash1218

[47] A Veloso S Mermoz A Bouvet T Le Toan M Planells J-F DejouxE Ceschia Understanding the temporal behavior of crops using Sentinel-1 and Sentinel-2-like data for agricultural applications Remote Sensing ofEnvironment 199 (2017) 415ndash426

[48] G Satalino A Balenzano F Mattia M W Davidson C-band SARdata for mapping crops dominated by surface or volume scattering IEEEGeoscience and Remote Sensing Letters 11 (2) (2013) 384ndash388

[49] F Vicente-Guijalba A Jacob J M Lopez-Sanchez C Lopez-MartinezJ Duro C Notarnicola D Ziolkowski A Mestre-Quereda E PottierJ J Mallorqu M Lavalle M Engdahl Sincohmap Land-cover and veg-etation mapping using multi-temporal Sentinel-1 interferometric coher-ence in IGARSS 2018 - 2018 IEEE International Geoscience and RemoteSensing Symposium 2018 pp 6631ndash6634 doi101109IGARSS2018

8517926

34

[50] S Ge O Antropov W Su H Gu J Praks Deep recurrent neural net-works for land-cover classification using Sentinel-1 InSAR time series inIGARSS 2019 - 2019 IEEE International Geoscience and Remote SensingSymposium 2019 pp 473ndash476 doi101109IGARSS20198900088

[51] Y LeCun Y Bengio et al Convolutional networks for images speechand time series The handbook of brain theory and neural networks3361 (10) (1995) 1995

[52] X X Zhu D Tuia L Mou G-S Xia L Zhang F Xu F Fraun-dorfer Deep learning in remote sensing a review arXiv preprintarXiv171003959 (2017)

[53] M Mahdianpari B Salehi M Rezaee F Mohammadimanesh Y ZhangVery deep convolutional neural networks for complex land cover mappingusing multispectral remote sensing imagery Remote Sensing 10 (7) (2018)1119

[54] L Zhang L Zhang B Du Deep learning for remote sensing data Atechnical tutorial on the state of the art IEEE Geoscience and RemoteSensing Magazine 4 (2) (2016) 22ndash40

[55] J Zhang P Zhong Y Chen S Li l 12-regularized deconvolutionnetwork for the representation and restoration of optical remote sensingimages IEEE Transactions on Geoscience and Remote Sensing 52 (5)(2014) 2617ndash2627

[56] X Chen S Xiang C-L Liu C-H Pan Aircraft detection by deep beliefnets in Pattern Recognition (ACPR) 2013 2nd IAPR Asian Conferenceon IEEE 2013 pp 54ndash58

[57] X Chen S Xiang C-L Liu C-H Pan Vehicle detection in satelliteimages by hybrid deep convolutional neural networks IEEE Geoscienceand remote sensing letters 11 (10) (2014) 1797ndash1801

[58] Y Liu G Cao Q Sun M Siegel Hyperspectral classification viadeep networks and superpixel segmentation International Journal of Re-mote Sensing 36 (13) (2015) 3459ndash3482 doi101080014311612015

1055607

[59] J Wang Q Qin Z Li X Ye J Wang X Yang X Qin Deep hierarchicalrepresentation and segmentation of high resolution remote sensing imagesin Geoscience and Remote Sensing Symposium (IGARSS) 2015 IEEEInternational IEEE 2015 pp 4320ndash4323

[60] D Tuia R Flamary N Courty Multiclass feature learning for hyperspec-tral image classification Sparse and hierarchical solutions ISPRS Journalof Photogrammetry and Remote Sensing 105 (2015) 272ndash285

35

[61] F Hu G-S Xia J Hu L Zhang Transferring deep convolutional neu-ral networks for the scene classification of high-resolution remote sensingimagery Remote Sensing 7 (11) (2015) 14680ndash14707

[62] O A B Penatti K Nogueira J A dos Santos Do deep features gener-alize from everyday objects to remote sensing and aerial scenes domainsin 2015 IEEE Conference on Computer Vision and Pattern Recogni-tion Workshops (CVPRW) 2015 pp 44ndash51 doi101109CVPRW20157301382

[63] F P Luus B P Salmon F Van den Bergh B T J Maharaj Multiviewdeep learning for land-use classification IEEE Geoscience and RemoteSensing Letters 12 (12) (2015) 2448ndash2452

[64] F Zhang B Du L Zhang Scene classification via a gradient boostingrandom convolutional network framework IEEE Transactions on Geo-science and Remote Sensing 54 (3) (2016) 1793ndash1802

[65] T Ishii R Nakamura H Nakada Y Mochizuki H Ishikawa Surfaceobject recognition with cnn and svm in landsat 8 images in 2015 14thIAPR International Conference on Machine Vision Applications (MVA)2015 pp 341ndash344 doi101109MVA20157153200

[66] N Kussul M Lavreniuk S Skakun A Shelestov Deep learning clas-sification of land cover and crop types using remote sensing data IEEEGeoscience and Remote Sensing Letters 14 (5) (2017) 778ndash782

[67] Y Chen Z Lin X Zhao G Wang Y Gu Deep learning-based classifi-cation of hyperspectral data IEEE Journal of Selected topics in appliedearth observations and remote sensing 7 (6) (2014) 2094ndash2107

[68] G Wu X Shao Z Guo Q Chen W Yuan X Shi Y Xu R ShibasakiAutomatic building segmentation of aerial imagery using multi-constraintfully convolutional networks Remote Sensing 10 (3) (2018) 407

[69] Y Duan F Liu L Jiao P Zhao L Zhang Sar image segmentationbased on convolutional-wavelet neural network and markov random fieldPattern Recognition 64 (2017) 255ndash267

[70] F Mohammadimanesh B Salehi M Mahdianpari E Gill M MolinierA new fully convolutional neural network for semantic segmentation of po-larimetric SAR imagery in complex land cover ecosystem ISPRS Journalof Photogrammetry and Remote Sensing 151 (2019) 223 ndash 236

[71] L Wang X Xu H Dong R Gui F Pu Multi-pixel simultaneous classifi-cation of polsar image using convolutional neural networks Sensors 18 (3)(2018) 769

36

[72] M Ahishali S Kiranyaz T Ince M Gabbouj Dual and single polarizedSAR image classification using compact convolutional neural networksRemote Sensing 11 (11) (2019) 1340

[73] Z Li Z Yang H Xiong Homogeneous region segmentation for SARimages based on two steps segmentation algorithm in Computers Com-munications and Systems (ICCCS) International Conference on IEEE2015 pp 196ndash200

[74] V Badrinarayanan A Kendall R Cipolla Segnet A deep convolutionalencoder-decoder architecture for image segmentation IEEE transactionson pattern analysis and machine intelligence 39 (12) (2017) 2481ndash2495

[75] H Zhao J Shi X Qi X Wang J Jia Pyramid scene parsing networkin Proceedings of the IEEE conference on computer vision and patternrecognition 2017 pp 2881ndash2890

[76] C Yu J Wang C Peng C Gao G Yu N Sang Bisenet Bilateralsegmentation network for real-time semantic segmentation in Proceed-ings of the European Conference on Computer Vision (ECCV) 2018 pp325ndash341

[77] L-C Chen Y Zhu G Papandreou F Schroff H Adam Encoder-decoder with atrous separable convolution for semantic image segmen-tation arXiv preprint arXiv180202611 (2018)

[78] L-C Chen G Papandreou I Kokkinos K Murphy A L YuilleDeeplab Semantic image segmentation with deep convolutional netsatrous convolution and fully connected CRFs IEEE transactions on pat-tern analysis and machine intelligence 40 (4) (2018) 834ndash848

[79] O Ronneberger PFischer T Brox U-net Convolutional networksfor biomedical image segmentation in Medical Image Computing andComputer-Assisted Intervention (MICCAI) Vol 9351 of LNCS Springer2015 pp 234ndash241 (available on arXiv150504597 [csCV])URL httplmbinformatikuni-freiburgdePublications2015

RFB15a

[80] A G Howard M Zhu B Chen D Kalenichenko W Wang T WeyandM Andreetto H Adam Mobilenets Efficient convolutional neural net-works for mobile vision applications arXiv preprint arXiv170404861(2017)


Figure 10: The architecture of FC-DenseNet [82].

3.5 Training approach

To achieve better segmentation performance, one can pre-train the semantic segmentation models (in particular, their encoder modules) using a larger set of available images of another type, such as natural images. By taking a model pre-trained on natural images and continuing its training with the limited set of SAR images, the knowledge is effectively transferred from the natural-image task to the SAR task [99]. To accomplish such a transfer, we used models whose encoders were pre-trained for the ImageNet classification task, and fine-tuned them using our SAR dataset (described next).
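For illustration, the following is a minimal sketch of such an encoder transfer in TensorFlow/Keras. The single-convolution decoder head and all identifiers here are illustrative stand-ins, not the architectures examined in this study, which attach considerably more elaborate decoders to the pre-trained encoder.

import tensorflow as tf

NUM_CLASSES = 5  # the five Level-1 CORINE classes

# Encoder pre-trained on ImageNet; its weights serve as initialization.
encoder = tf.keras.applications.ResNet101(
    include_top=False, weights="imagenet", input_shape=(512, 512, 3))

# A deliberately minimal decoder head: a 1x1 convolution to class logits,
# then bilinear upsampling back to the 512x512 input resolution.
x = tf.keras.layers.Conv2D(NUM_CLASSES, 1)(encoder.output)
x = tf.keras.layers.UpSampling2D(32, interpolation="bilinear")(x)
model = tf.keras.Model(encoder.input, x)

# Fine-tune the whole network end-to-end on the SAR imagelets.
model.compile(
    optimizer=tf.keras.optimizers.RMSprop(learning_rate=1e-4),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))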

3.6 Experimental Setup

In this section, we first describe how we prepared the SAR images for training with the deep learning models, which are designed for natural images. Then we provide the details of our implementation.

3.6.1 SAR Data Preprocessing for Deep Learning

Sentinel-1 imagery comes in two polarization channels, each of which is informative about certain types of land cover. Hence, using their combination is expected to yield better land cover mapping results than using either of them independently. Moreover, previous work has shown the benefits of also using a DEM for land cover mapping [9]. Hence, as the third layer, we used the DEM of Finland from the National Land Survey.

The SAR backscatter for both polarizations was converted to decibels by applying the 10 · log10 transformation. In addition, for the deep learning models, each band should be normalized so that the distribution of the pixel values resembles a Gaussian distribution centered at zero; this yields faster convergence during training. To normalize the data, the mean of all pixels is subtracted from each pixel value, which is then divided by the standard deviation. Furthermore, given that the semantic segmentation models expect pixel values in the range (0, 255), we scaled the normalized data, and also the DEM values, to this range. The preprocessed layers are then used to create the image dataset for training.

We named the created dataset SAR RGB-DEM. The name comes from the process used to create the images in this dataset: one of the two channels of a Sentinel-1 image is assigned to the R channel and the other to the G channel, while the DEM layer is used for the third, B, channel.
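The preprocessing chain described above can be summarized with the following NumPy sketch. The function name, the assignment of VV to R and VH to G, and the computation of the statistics per image are our illustrative assumptions; the text above does not fix these details.

import numpy as np

def to_sar_rgb_dem(vv, vh, dem):
    """Stack two Sentinel-1 bands and a DEM into one SAR RGB-DEM image."""
    def prepare(band):
        db = 10.0 * np.log10(band + 1e-10)      # backscatter to decibels
        z = (db - db.mean()) / db.std()         # zero-centered, unit variance
        return 255.0 * (z - z.min()) / (z.max() - z.min())  # rescale to (0, 255)

    dem_255 = 255.0 * (dem - dem.min()) / (dem.max() - dem.min())
    return np.dstack([prepare(vv), prepare(vh), dem_255])   # R, G, B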

3.6.2 Train/Development and Test (Accuracy Assessment) Dataset

The images from the SAR RGB-DEM dataset needed to be split into 512 px × 512 px partial images (called imagelets further in the text) for training. Thus, each imagelet represented an area of roughly 10 × 10 km². The first reason for this preprocessing concerns the square shape: some of the selected models required square-shaped images. Other models were flexible with the image shape and size, but we wanted identical setups for all the models so that their results are comparable. The second reason concerns computational capacity: with our hardware setup (described below), this was the largest image size we could work with.

Upon splitting the SAR RGB-DEM images, we discarded the imagelets that were completely outside the land mass area, as well as those for which we did not have a complete CORINE label (for example, if they fell partly outside the Finnish borders). This resulted in more than 7,000 imagelets of size 512 px × 512 px.

Given the geography of Finland, representative training data should include imagelets from both the northern and the southern (including the large cities) parts of the country. On the other hand, some noticeable differences are also found along the gradient from the east to the west of the country. To achieve a representative training dataset, we selected all imagelets between the longitudes of 24° and 28° for the accuracy assessment (model testing), and all the other imagelets for the model training (that is, training and development in the computer vision terminology). In this way, we prevented the situation in which two images of the same area, but acquired at different times, are used one for training and the other for testing. Images that overlapped any border of the introduced strip were discarded. The procedure resulted in 3,104 images in the training and development set and 3,784 images in the test (accuracy assessment) set. Finally, we used 60% of the training and development set for training and the rest for the development of the deep learning models.
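A sketch of this longitude-based hold-out is given below; the helper name and the availability of per-imagelet longitude bounds are assumptions made for illustration.

def split_by_longitude(imagelets, strip=(24.0, 28.0)):
    """Imagelets fully inside the 24-28 degree strip go to the test set,
    imagelets fully outside go to training/development, and imagelets
    overlapping a strip border are discarded."""
    train_dev, test = [], []
    for im in imagelets:  # im.lon_min and im.lon_max are assumed attributes
        if im.lon_max < strip[0] or im.lon_min > strip[1]:
            train_dev.append(im)
        elif strip[0] <= im.lon_min and im.lon_max <= strip[1]:
            test.append(im)
    return train_dev, test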


Table 3: The properties of the examined semantic segmentation architectures.

Architecture    Base model      Parameters
BiSeNet         ResNet101       24.75M
SegNet          VGG16           34.97M
Mobile U-Net    Not applicable  8.87M
DeepLabV3+      ResNet101       47.96M
FRRN-B          ResNet101       24.75M
PSPNet          ResNet101       56M
FC-DenseNet     ResNet101       9.27M

3.6.3 Data Augmentation

Further, we employed data augmentation. The main idea behind data augmentation is to enable improved learning by reusing the original images with slight transformations, such as rotation, flipping, adding Gaussian noise, or slightly changing the brightness. This provides additional information to the model, and the dataset size is effectively increased. Moreover, an additional benefit of data augmentation is that it helps the model learn some invariant data properties for which no examples are present in the original dataset. Given the sensitivity of the SAR backscatter, we did not want to augment the images in terms of color or brightness, or by adding noise. However, we could safely employ rotations and flipping. For rotations, we used only 90° increments, giving three possible rotated versions of an image. For image flipping, we applied horizontal and vertical flipping, or both at the same time, giving another three possible versions of the original image (a vertical flip switches between a top-left and a bottom-left image origin, i.e., reflection along the central horizontal axis, while a horizontal flip switches between a top-left and a top-right image origin, i.e., reflection along the central vertical axis). Notice that our images are square, so the transformations did not change the image dimensions. Finally, we applied online augmentation, as opposed to the offline version. In the online process, each augmented image is seen only once, so this process yields a network that generalises better.
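The described scheme can be implemented as in the following NumPy sketch; the function name is ours, and the label mask is transformed together with the image so that the two stay aligned.

import numpy as np

def random_augment(image, label, rng=np.random.default_rng()):
    """Apply a random 90-degree rotation and, independently, a random
    horizontal and/or vertical flip to an image and its label mask."""
    k = int(rng.integers(0, 4))          # 0, 90, 180 or 270 degrees
    image, label = np.rot90(image, k), np.rot90(label, k)
    flip = int(rng.integers(0, 4))       # none, horizontal, vertical, or both
    if flip in (1, 3):
        image, label = image[:, ::-1], label[:, ::-1]   # horizontal flip
    if flip in (2, 3):
        image, label = image[::-1, :], label[::-1, :]   # vertical flip
    return image, label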

3.6.4 Implementation

To apply the described semantic segmentation models, we adapted the open-source Semantic Segmentation Suite. We used Python with the TensorFlow [100] backend.

3.6.5 Hardware and Training Setup

We trained and tested each of the deep learning models separately on a single GPU (NVIDIA GeForce GTX 1080) on a machine with 32 GB of RAM.

We used the RMSProp optimisation algorithm with a learning rate of 0.0001 and a learning rate decay of 0.9954. Each model was trained for an equal number of epochs (500), and during the process the checkpoint for the best model was saved. We then used that model for evaluation on the test set, and we report those results.
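In Keras-style TensorFlow, this training schedule corresponds roughly to the sketch below. We read the decay as a per-epoch multiplicative factor, and the checkpoint selection criterion (development-set accuracy) as well as all identifiers are our assumptions.

import tensorflow as tf

EPOCHS, BASE_LR, DECAY = 500, 1e-4, 0.9954

def lr_schedule(epoch):
    return BASE_LR * DECAY ** epoch    # multiplicative decay per epoch

callbacks = [
    tf.keras.callbacks.LearningRateScheduler(lr_schedule),
    tf.keras.callbacks.ModelCheckpoint(
        "best_model.h5", monitor="val_accuracy", save_best_only=True),
]
# model.fit(train_ds, validation_data=dev_ds, epochs=EPOCHS, callbacks=callbacks)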

3.7 Evaluation Metrics

In their review of the metrics used in land cover classification, Costa et al. [101] found a lack of consistency, which complicates the intercomparison of different studies. To avoid such issues and to ensure that our results are easily comparable with the literature, we evaluated our models thoroughly. For each model and class, we report the following accuracy measures: precision, also known as user's accuracy (UA); recall, also known as producer's accuracy (PA); the overall accuracy; and the Kappa coefficient. The formulas are as follows.

For each segmentation class (land cover type) c, we calculate the precision (user's accuracy):

P_c = \frac{Tp_c}{Tp_c + Fp_c},

and the recall (producer's accuracy):

R_c = \frac{Tp_c}{Tp_c + Fn_c},

where Tp_c represents the true positive, Fp_c the false positive, and Fn_c the false negative pixels for the class c.

When it comes to accuracy [102], we calculate the per-class accuracy:

Acc_c = \frac{C_{ii}}{G_i},

and the overall pixel accuracy:

Acc_{OP} = \frac{\sum_{i=1}^{L} C_{ii}}{\sum_{i=1}^{L} G_i},

where C_{ij} is the number of pixels having the ground truth label i and being classified (predicted) as j, G_i is the total number of pixels labelled with i, and L is the number of classes. All these metrics can take values from 0 to 1. (Effectively, the per-class accuracy is the recall obtained on each class.)

Finally, we also use the Kappa statistic (Cohen's measure of agreement), which indicates how the classification results compare to values assigned by chance [103]. The Kappa statistic can take values from 0 to 1. Starting from a k × k confusion matrix with elements f_{ij}, the following calculations are done:

P_o = \frac{1}{N} \sum_{j=1}^{k} f_{jj}    (1)

r_i = \sum_{j=1}^{k} f_{ij} \quad \forall i, \qquad c_j = \sum_{i=1}^{k} f_{ij} \quad \forall j    (2)

P_e = \frac{1}{N^2} \sum_{i=1}^{k} r_i c_i    (3)

where N is the total number of pixels, P_o is the observed proportional agreement (effectively, the overall accuracy), r_i and c_j are the row and column totals for classes i and j, and P_e is the expected proportion of agreement. The final measure of agreement is given by the statistic [103]:

\kappa = \frac{P_o - P_e}{1 - P_e}    (4)

Depending on the value of Kappa, the observed agreement is considered either poor (0.0 to 0.2), fair (0.2 to 0.4), moderate (0.4 to 0.6), good (0.6 to 0.8), or very good (0.8 to 1.0).
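All of the above measures can be computed from a single k × k confusion matrix; a NumPy sketch (the function name is ours) follows.

import numpy as np

def evaluate(f):
    """Metrics from a confusion matrix f, where f[i, j] counts pixels with
    ground truth class i predicted as class j."""
    n = f.sum()
    tp = np.diag(f)
    ua = tp / f.sum(axis=0)        # user's accuracy (precision), per class
    pa = tp / f.sum(axis=1)        # producer's accuracy (recall), per class
    po = tp.sum() / n              # overall accuracy, Eq. (1)
    pe = (f.sum(axis=1) * f.sum(axis=0)).sum() / n ** 2   # Eqs. (2)-(3)
    kappa = (po - pe) / (1 - pe)   # Eq. (4)
    return ua, pa, po, kappa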

4 Results and Discussion

Using the experimental setup described in the previous section, we evaluated the seven selected semantic segmentation models: SegNet [74], PSPNet [75], BiSeNet [76], DeepLabV3+ [77, 78], U-Net [79, 80], FRRN-B [81], and FC-DenseNet [82]. The overall classification performance statistics for all studied models are gathered in Table 4. Figure 11 shows maps produced for several imagelets with the best performing model, FC-DenseNet. The obtained results are compared to prior work, and the classification performance for the different land cover classes is discussed further below.

4.1 Classification Performance

All the models performed relatively well in terms of classification, achieving an overall accuracy above 83%. Three models performed particularly well, achieving an accuracy score above 89%: SegNet, FRRN-B, and the best performing model, FC-DenseNet, which achieved an accuracy of 90.7%.

Before further analysis, let us recall that CORINE is not exclusively a land cover map, but rather a land cover and land use map; thus, specific classes can differ from the ecological classes observed by Sentinel-1. Also, the aggregation to Level-1 is sometimes not strictly "ecological" and does not always comply with physical surface scattering considerations. For example, roads, airports, major industrial areas, and the road network often exhibit areas similar to fields; the presence of trees and green vegetation near summer cottages can cause them to exhibit signatures closer to forest than to urban; forest on rocky terrain can sometimes be misclassified as urban due to the presence of very bright targets and strong disruptive features; and confusion between peatland and field areas is also commonplace. Finally, the accuracy of the CORINE data itself is only somewhat higher than 90%.


Table 4: Summary of the classification performance and efficiency of the various deep learning models (UA: user's accuracy, PA: producer's accuracy, reported in % as UA/PA; the average inference time is per image in the dataset).

LC class (CLC code)               Test scale (km²)  BiSeNet  DeepLabV3+  SegNet  FRRN-B  U-Net   PSPNet  FC-DenseNet
Urban fabric (100)                10816             26/21    15/14       36/31   38/30   45/25   38/18   62/27
Agricultural areas (200)          25160             49/51    50/49       69/66   68/68   66/66   53/48   72/71
Forested areas (300)              285462            90/91    88/96       93/94   92/95   92/95   89/95   93/96
Peatland, bogs and marshes (400)  20990             54/43    56/13       67/57   71/55   70/52   65/31   74/58
Water bodies (500)                53564             85/91    94/92       96/96   95/96   96/96   94/94   96/96
Overall accuracy (%)                                83.86    85.49       89.03   89.27   89.25   86.51   90.66
Kappa                                               0.641    0.649       0.754   0.758   0.754   0.680   0.785
Average inference time (s)                          0.0389   0.0267      0.0761  0.1424  0.0848  0.0495  0.1930

Table 5: Confusion matrix for classification with the FC-DenseNet103 model (pixel counts; rows: CLC2012 reference, columns: Sentinel-1 classification; the bottom-right cell gives the overall accuracy).

CLC2012     urban       water        forest       field       peatland    total        PA (%)
urban       7301999     413073       15892771     3212839     221476      27042158     27.0
water       78331       128294872    3457634      171029      1935276     133937142    95.8
forest      3663698     2703632      686788977    12795703    7730444     713682454    96.2
field       766200      121609       16527970     44866048    620934      62902761     71.3
peatland    56097       1866020      19164137     1091008     30309189    52486451     57.8
total       11866325    133399206    741831489    62136627    40817319    990050966
UA (%)      61.5        96.2         92.6         72.2        74.3                     90.7

Figure 11: Illustration of the FC-DenseNet model performance: a selection of classification results, i.e., direct output of the network without any post-processing (bottom row), versus the reference CORINE data (upper row).


As for the results across the different land cover classes, all the models performed particularly well in recognising water bodies and forested areas, while urban fabric proved the most challenging class for all the models. We expect that the inclusion of the DEM as one of the layers in the training images helped to achieve good results on the water bodies class for most of the models (except for BiSeNet, all the models achieved both a user's and a producer's accuracy above 90%). The urban class was particularly challenging for the following main reasons. First, this class changes the most, as new houses, roads, and urban areas are built. While we took the most suitable available CORINE class in terms of timing for our Sentinel-1 images, there are almost certainly differences between the urban class as it was in 2012 and in 2015-2016. Second, the CORINE map itself does not have perfect accuracy, nor are the aggregation rules perfect. As a matter of fact, in the majority of studies where a SAR-based classification was compared against CLC or similar data, a poor or modest overall agreement was observed for this class [21, 41, 83, 20], while the user's accuracy was markedly higher than the producer's [104]. The latter is exactly because radar senses sharp boundaries and bright targets very well, whereas such bright targets often do not dominate the whole CORINE Level-1 urban class. We argue that any inaccuracies present will be particularly pronounced in our models for the urban class because of the sharp and sudden boundary changes in this class, unlike for others such as forest and water. The top performing model, FC-DenseNet, performed the best across all the classes. It is particularly notable that it achieved a user's accuracy, i.e., precision, of 62% for the urban class, improving on all the other models significantly. Nevertheless, its producer's accuracy, i.e., recall, of 27% on this class is outperformed by the two other top models, SegNet and FRRN-B.

We mentioned the sensitivity of SAR backscattering to several ground factors, such that the same classes might appear differently in images between countries or between distant areas within a country. An interesting indication of our study, however, is that deep learning models might be able to deal with this issue. Namely, we used models pre-trained on ImageNet and fine-tuned them with a relatively small number (14) of Sentinel-1 scenes. The models learned to recognize the varying types of backscattering signal across the country of Finland. This indicates that, with a similar type of fine-tuning, the present models could be relatively easily adapted to other areas and countries with different SAR backscattering patterns. Such robustness and adaptability of deep learning models come from their automatic learning of feature representations, without the need for a human to pre-define those features.

4.2 Computational Performance

With our hardware configuration, training times ranged from 6 days up to 2 weeks for the different models. This could be significantly improved by training each model on a multi-GPU system instead of the single GPU that we used.

In terms of inference time, we also observed differences in performance. In Table 4, we report the average inference time per 512 px × 512 px imagelet. The results show a trade-off between classification and computational performance: the best models in terms of classification results (i.e., FC-DenseNet and FRRN-B) take a several times longer inference time compared to the rest. Depending on the application, this might not be of particular importance.

4.3 Comparison to Similar Work

The obtained results compare favourably to previous similar studies on land cover classification with SAR data [83, 20, 21, 41, 28, 31]. Depending on the level of class aggregation (4-5 major classes or more), and using mostly statistical or classical machine learning approaches, the reported classification accuracies ranged from as high as 80-87% down to as low as 30% when only SAR imagery was used.

Two recent studies that employed neural networks on SAR imagery (albeit in combination with satellite optical data) for land cover mapping were [28] and [66], with reported classification accuracies of up to 97.5% and 94.6%, respectively.

The best model in our experiments achieved an overall accuracy of 90.7%. However, our results were obtained using solely the SAR imagery. In contrast, SAR imagery (PALSAR) alone yielded an overall accuracy of 78.1% in [28]. The types of classes they studied are also different from ours (crops versus vegetation versus land cover types), and our study was performed on a larger area. Importantly, the previous studies applied different types of models (regular NNs versus CNNs versus semantic segmentation). In particular, the CNN models work on 7 × 7 resolution windows, while we applied more advanced semantic segmentation models, which work at the level of a pixel. Keeping in mind the finding from [28] that the addition of optical images on top of SAR improved the results by over 10%, we expect that our models would perform comparably well or outperform these previous works if applied to combined SAR and optical imagery.

In terms of the deep learning setup, the studies most similar to ours are [53] and [70]. However, RapidEye optical imagery at 5 m spatial resolution was used in [53], and the test site was considerably smaller. Study [70], similarly to our research, relied exclusively on SAR imagery; however, it used fully polarimetric images, acquired by RADARSAT-2 at a considerably better resolution. They developed an FCN-type semantic segmentation model 'specifically designed for the classification of wetland complexes using PolSAR imagery'. Using this model to classify eight wetland map classes, they achieved an overall accuracy of 93%. However, because their model is designed specifically for wetland complexes, it is not clear whether it would generalize to other types of areas. Compared to our study, they focused on a considerably smaller area (nearly the size of a single imagelet we used) and on a very specific task (wetland type mapping). Thus, it is not readily clear how general their approach is and how it compares to our presented approach.

4.4 Outlook and Future Work

There are several lines of potential improvement based on the results of this study, as well as directions for future work.

First, using an even larger set of Sentinel-1 images can be recommended, since large amounts of data are crucial for supervised deep learning models. Here, we processed only 6,888 imagelets altogether, whereas deep learning algorithms typically become efficient only once trained with hundreds of thousands or millions of images.

Second, if SAR images and reference data of a higher resolution are used, we expect better classification performance, too, as smaller details could potentially be captured. Better agreement in the acquisition timing of the reference and SAR imagery can also be recommended: the reference and training data should come from the same months or year if possible, and the reference maps should represent reality as accurately as possible. The models in our experiments were certainly limited by CORINE's own limited accuracy.

Third, in this study we tested the effectiveness of off-the-shelf deep learning models for land cover mapping from SAR data. While the results show their effectiveness, it is also likely that novel types of models developed specifically for radar data (such as [70]) will yield even better results. Based on our results, we suggest DenseNet-based models as a starting point. In particular, one could develop deep learning models that handle SLC data directly, which preserve the phase information.

Focusing on a single season is both an advantage and a limitation. Importantly, we avoided confusion between SAR signatures that vary seasonally for several land cover classes. However, the multitemporal dynamics itself can potentially be used as an additional, useful class-discriminating parameter. Incorporating the seasonal dynamics of each land cover pixel (as a time series) is left for future work, perhaps with the additional need to incorporate recurrent neural networks into the approach.

As discussed in Section 3.3, it could be suitable to use more detailed (specific) land cover classes, as the aggregation of smaller LC classes into Level-1 CORINE classes is not exactly ecological, leading to several distinct SAR signatures being mixed in one class and thus causing additional confusion for the classifier. The classified specific classes can later be aggregated into larger classes, potentially showing improved performance [19].

Finally, we used only SAR images and a freely available DEM for the presented large-scale land cover mapping. If one were to add other types of remote sensing images, in particular optical images, we expect that the results would improve significantly. This holds for those areas where such imagery can be collected given the cloud coverage, while an operational scenario would potentially require at least two models (with and without optical satellite imagery). It is also important to assess the added value of SAR imagery with deep learning models when optical satellite images are available, as well as possible data fusion and decision fusion scenarios, before deciding on the mapping approach [19].

5 Conclusion

Our study demonstrated the potential for applying state-of-the-art semantic segmentation models to SAR image classification with high accuracy. Several models were benchmarked in a countrywide classification experiment using Sentinel-1 IW-mode SAR data, reaching nearly 91% overall classification accuracy with the best performing model (FC-DenseNet). Given that the 14 used Sentinel-1 scenes resulted in about 7,000 training images, this indicates strong potential for using pre-trained CNNs for further fine-tuning, and seems particularly suitable when the number of training images is limited (to thousands or tens of thousands instead of millions). In addition to suggesting the best candidate semantic segmentation models for land cover mapping with SAR data (namely, the DenseNet-based models), our study offers baseline results against which newly proposed models should be evaluated. Several possible improvements for future work were identified, including the need to test multitemporal approaches, data fusion, and very high-resolution SAR imagery, as well as to develop models specifically for SAR; these will be addressed in future work.

References

[1] S. Bojinski, M. Verstraete, T. C. Peterson, C. Richter, A. Simmons, M. Zemp, The concept of essential climate variables in support of climate research, applications, and policy, Bulletin of the American Meteorological Society 95 (9) (2014) 1431–1443.

[2] G. Büttner, J. Feranec, G. Jaffrain, L. Mari, G. Maucha, T. Soukup, The CORINE land cover 2000 project, EARSeL eProceedings 3 (3) (2004) 331–346.

[3] M. Bossard, J. Feranec, J. Otahel, et al., CORINE land cover technical guide: Addendum 2000 (2000).

[4] G. Büttner, CORINE land cover and land cover change products, in: Land Use and Land Cover Mapping in Europe, Springer, 2014, pp. 55–74.

[5] M. Törmä, T. Markkanen, S. Hatunen, P. Härmä, O.-P. Mattila, A. Arslan, Assessment of land-cover data for land-surface modelling in regional climate studies, Boreal Environment Research 20 (2) (2015) 243–260.

[6] J. Chen, J. Chen, A. Liao, X. Cao, L. Chen, X. Chen, C. He, G. Han, S. Peng, M. Lu, et al., Global land cover mapping at 30 m resolution: A POK-based operational approach, ISPRS Journal of Photogrammetry and Remote Sensing 103 (2015) 7–27.

[7] C. A. d. Almeida, A. C. Coutinho, J. C. D. M. Esquerdo, M. Adami, A. Venturieri, C. G. Diniz, N. Dessay, L. Durieux, A. R. Gomes, High spatial resolution land use and land cover mapping of the Brazilian Legal Amazon in 2008 using Landsat-5/TM and MODIS data, Acta Amazonica 46 (3) (2016) 291–302.

[8] C. Homer, J. Dewitz, L. Yang, S. Jin, P. Danielson, G. Xian, J. Coulston, N. Herold, J. Wickham, K. Megown, Completion of the 2011 National Land Cover Database for the conterminous United States: representing a decade of land cover change information, Photogrammetric Engineering & Remote Sensing 81 (5) (2015) 345–354.

[9] Y. Zhao, D. Feng, L. Yu, X. Wang, Y. Chen, Y. Bai, H. J. Hernández, M. Galleguillos, C. Estades, G. S. Biging, et al., Detailed dynamic land cover mapping of Chile: Accuracy improvement by integrating multi-temporal data, Remote Sensing of Environment 183 (2016) 170–185.

[10] P. Griffiths, C. Nendel, P. Hostert, Intra-annual reflectance composites from Sentinel-2 and Landsat for national-scale crop and land cover mapping, Remote Sensing of Environment 220 (2019) 135–151.

[11] R. Torres, P. Snoeij, D. Geudtner, D. Bibby, M. Davidson, E. Attema, P. Potin, B. Rommen, N. Floury, M. Brown, I. Traver, P. Deghaye, B. Duesmann, B. Rosich, N. Miranda, C. Bruno, M. L'Abbate, R. Croci, A. Pietropaolo, M. Huchler, F. Rostan, GMES Sentinel-1 mission, Remote Sensing of Environment 120 (2012) 9–24. doi:10.1016/j.rse.2011.05.028.

[12] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE 86 (11) (1998) 2278–2324.

[13] A. Krizhevsky, I. Sutskever, G. E. Hinton, ImageNet classification with deep convolutional neural networks, in: Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.

[14] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, CoRR abs/1409.1556 (2014).

[15] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning (2016).

[16] W. Cohen, S. Goward, Landsat's role in ecological applications of remote sensing, BioScience 54 (6) (2004) 535–545. doi:10.1641/0006-3568(2004)054[0535:LRIEAO]2.0.CO;2.

[17] S. Goetz, A. Baccini, N. Laporte, T. Johns, W. Walker, J. Kellndorfer, R. Houghton, M. Sun, Mapping and monitoring carbon stocks with satellite observations: A comparison of methods, Carbon Balance and Management 4 (2009). doi:10.1186/1750-0680-4-2.

[18] C. Atzberger, Advances in remote sensing of agriculture: Context description, existing operational monitoring systems and major information needs, Remote Sensing 5 (2) (2013) 949–981. doi:10.3390/rs5020949.

[19] T. Häme, J. Kilpi, H. A. Ahola, Y. Rauste, O. Antropov, M. Rautiainen, L. Sirro, S. Bounpone, Improved mapping of tropical forests with optical and SAR imagery, part I: Forest cover and accuracy assessment using multi-resolution data, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 6 (1) (2013) 74–91.

[20] O. Antropov, Y. Rauste, H. Astola, J. Praks, T. Häme, M. Hallikainen, Land cover and soil type mapping from spaceborne PolSAR data at L-band with probabilistic neural network, IEEE Transactions on Geoscience and Remote Sensing 52 (9) (2014) 5256–5270. doi:10.1109/TGRS.2013.2287712.

[21] A. Lönnqvist, Y. Rauste, M. Molinier, T. Häme, Polarimetric SAR data in land cover mapping in boreal zone, IEEE Transactions on Geoscience and Remote Sensing 48 (10) (2010) 3652–3662. doi:10.1109/TGRS.2010.2048115.

[22] B. Waske, M. Braun, Classifier ensembles for land cover mapping using multitemporal SAR imagery, ISPRS Journal of Photogrammetry and Remote Sensing 64 (5) (2009) 450–457. doi:10.1016/j.isprsjprs.2009.01.003.

[23] L. Bruzzone, M. Marconcini, U. Wegmüller, A. Wiesmann, An advanced system for the automatic classification of multitemporal SAR images, IEEE Transactions on Geoscience and Remote Sensing 42 (6) (2004) 1321–1334. doi:10.1109/TGRS.2004.826821.

[24] T. Ullmann, A. Schmitt, A. Roth, J. Duffe, S. Dech, H.-W. Hubberten, R. Baumhauer, Land cover characterization and classification of arctic tundra environments by means of polarized synthetic aperture X- and C-band radar (PolSAR) and Landsat 8 multispectral imagery, Richards Island, Canada, Remote Sensing 6 (9) (2014) 8565–8593. doi:10.3390/rs6098565. URL http://www.mdpi.com/2072-4292/6/9/8565

[25] N. Clerici, C. A. V. Calderón, J. M. Posada, Fusion of Sentinel-1A and Sentinel-2A data for land cover mapping: a case study in the lower Magdalena region, Colombia, Journal of Maps 13 (2) (2017) 718–726. doi:10.1080/17445647.2017.1372316.

[26] C. Castañeda, D. Ducrot, Land cover mapping of wetland areas in an agricultural landscape using SAR and Landsat imagery, Journal of Environmental Management 90 (7) (2009) 2270–2277.

[27] Y. Ban, H. Hu, I. M. Rangel, Fusion of QuickBird MS and RADARSAT SAR data for urban land-cover mapping: Object-based and knowledge-based approach, International Journal of Remote Sensing 31 (6) (2010) 1391–1410.

[28] G. V. Laurin, V. Liesenberg, Q. Chen, L. Guerriero, F. Del Frate, A. Bartolini, D. Coomes, B. Wilebore, J. Lindsell, R. Valentini, Optical and SAR sensor synergies for forest and land cover mapping in a tropical site in West Africa, International Journal of Applied Earth Observation and Geoinformation 21 (2013) 7–16.

[29] R. Khatami, G. Mountrakis, S. V. Stehman, A meta-analysis of remote sensing research on supervised pixel-based land-cover image classification processes: General guidelines for practitioners and future research, Remote Sensing of Environment 177 (2016) 89–100.

[30] B. Waske, M. Braun, Classifier ensembles for land cover mapping using multitemporal SAR imagery, ISPRS Journal of Photogrammetry and Remote Sensing 64 (5) (2009) 450–457.

[31] H. Balzter, B. Cole, C. Thiel, C. Schmullius, Mapping CORINE land cover from Sentinel-1A SAR and SRTM digital elevation model data using random forests, Remote Sensing 7 (11) (2015) 14876–14898. doi:10.3390/rs71114876.

[32] S.-E. Park, Variations of microwave scattering properties by seasonal freeze/thaw transition in the permafrost active layer observed by ALOS PALSAR polarimetric data, Remote Sensing 7 (12) (2015) 17135–17148. doi:10.3390/rs71215874.

[33] M. C. Dobson, L. E. Pierce, F. T. Ulaby, Knowledge-based land-cover classification using ERS-1/JERS-1 SAR composites, IEEE Transactions on Geoscience and Remote Sensing 34 (1) (1996) 83–99. doi:10.1109/36.481896.

[34] L. Sirro, T. Häme, Y. Rauste, J. Kilpi, J. Hämäläinen, K. Gunia, B. de Jong, F. Paz Pellat, Potential of different optical and SAR data in forest and land cover classification to support REDD+ MRV, Remote Sensing 10 (6) (2018). doi:10.3390/rs10060942.

[35] J. D. T. De Alban, G. M. Connette, P. Oswald, E. L. Webb, Combined Landsat and L-band SAR data improves land cover classification and change detection in dynamic tropical landscapes, Remote Sensing 10 (2) (2018). doi:10.3390/rs10020306.

[36] N. Longépé, P. Rakwatin, O. Isoguchi, M. Shimada, Y. Uryu, K. Yulianto, Assessment of ALOS PALSAR 50 m orthorectified FBD data for regional land cover classification by support vector machines, IEEE Transactions on Geoscience and Remote Sensing 49 (6) (2011) 2135–2150. doi:10.1109/TGRS.2010.2102041.

[37] T. Esch, A. Schenk, T. Ullmann, M. Thiel, A. Roth, S. Dech, Characterization of land cover types in TerraSAR-X images by combined analysis of speckle statistics and intensity information, IEEE Transactions on Geoscience and Remote Sensing 49 (6) (2011) 1911–1925. doi:10.1109/TGRS.2010.2091644.

[38] J. W. Cable, J. M. Kovacs, J. Shang, X. Jiao, Multi-temporal polarimetric RADARSAT-2 for land cover monitoring in northeastern Ontario, Canada, Remote Sensing 6 (3) (2014) 2372–2392. doi:10.3390/rs6032372. URL http://www.mdpi.com/2072-4292/6/3/2372

[39] X. Niu, Y. Ban, Multi-temporal RADARSAT-2 polarimetric SAR data for urban land-cover classification using an object-based support vector machine and a rule-based approach, International Journal of Remote Sensing 34 (1) (2013) 1–26. doi:10.1080/01431161.2012.700133.

[40] T. L. Evans, M. Costa, K. Telmer, T. S. F. Silva, Using ALOS/PALSAR and RADARSAT-2 to map land cover and seasonal inundation in the Brazilian Pantanal, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 3 (4) (2010) 560–575. doi:10.1109/JSTARS.2010.2089042.

[41] P. Lumsdon, S. R. Cloude, G. Wright, Polarimetric classification of land cover for Glen Affric radar project, IEE Proceedings - Radar, Sonar and Navigation 152 (6) (2005) 404–412. doi:10.1049/ip-rsn:20041313.

[42] C. da Costa Freitas, L. de Souza Soler, S. J. S. Sant'Anna, L. V. Dutra, J. R. Dos Santos, J. C. Mura, A. H. Correia, Land use and land cover mapping in the Brazilian Amazon using polarimetric airborne P-band SAR data, IEEE Transactions on Geoscience and Remote Sensing 46 (10) (2008) 2956–2970.

[43] G. Li, D. Lu, E. Moran, L. Dutra, M. Batistella, A comparative analysis of ALOS PALSAR L-band and RADARSAT-2 C-band data for land-cover classification in a tropical moist region, ISPRS Journal of Photogrammetry and Remote Sensing 70 (2012) 26–38. doi:10.1016/j.isprsjprs.2012.03.010.

[44] N. Park, K. Chi, Integration of multitemporal/polarization C-band SAR data sets for land-cover classification, International Journal of Remote Sensing 29 (16) (2008) 4667–4688. doi:10.1080/01431160801947341.

[45] E. Tomppo, O. Antropov, J. Praks, Cropland classification using Sentinel-1 time series: Methodological performance and prediction uncertainty assessment, Remote Sensing 11 (21) (2019). doi:10.3390/rs11212480.

[46] D. B. Nguyen, A. Gruber, W. Wagner, Mapping rice extent and cropping scheme in the Mekong Delta using Sentinel-1A data, Remote Sensing Letters 7 (12) (2016) 1209–1218.

[47] A. Veloso, S. Mermoz, A. Bouvet, T. Le Toan, M. Planells, J.-F. Dejoux, E. Ceschia, Understanding the temporal behavior of crops using Sentinel-1 and Sentinel-2-like data for agricultural applications, Remote Sensing of Environment 199 (2017) 415–426.

[48] G. Satalino, A. Balenzano, F. Mattia, M. W. Davidson, C-band SAR data for mapping crops dominated by surface or volume scattering, IEEE Geoscience and Remote Sensing Letters 11 (2) (2013) 384–388.

[49] F. Vicente-Guijalba, A. Jacob, J. M. Lopez-Sanchez, C. Lopez-Martinez, J. Duro, C. Notarnicola, D. Ziolkowski, A. Mestre-Quereda, E. Pottier, J. J. Mallorquí, M. Lavalle, M. Engdahl, SinCohMap: Land-cover and vegetation mapping using multi-temporal Sentinel-1 interferometric coherence, in: IGARSS 2018 - 2018 IEEE International Geoscience and Remote Sensing Symposium, 2018, pp. 6631–6634. doi:10.1109/IGARSS.2018.8517926.

[50] S. Ge, O. Antropov, W. Su, H. Gu, J. Praks, Deep recurrent neural networks for land-cover classification using Sentinel-1 InSAR time series, in: IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium, 2019, pp. 473–476. doi:10.1109/IGARSS.2019.8900088.

[51] Y. LeCun, Y. Bengio, et al., Convolutional networks for images, speech, and time series, The Handbook of Brain Theory and Neural Networks 3361 (10) (1995) 1995.

[52] X. X. Zhu, D. Tuia, L. Mou, G.-S. Xia, L. Zhang, F. Xu, F. Fraundorfer, Deep learning in remote sensing: a review, arXiv preprint arXiv:1710.03959 (2017).

[53] M. Mahdianpari, B. Salehi, M. Rezaee, F. Mohammadimanesh, Y. Zhang, Very deep convolutional neural networks for complex land cover mapping using multispectral remote sensing imagery, Remote Sensing 10 (7) (2018) 1119.

[54] L. Zhang, L. Zhang, B. Du, Deep learning for remote sensing data: A technical tutorial on the state of the art, IEEE Geoscience and Remote Sensing Magazine 4 (2) (2016) 22–40.

[55] J. Zhang, P. Zhong, Y. Chen, S. Li, L1/2-regularized deconvolution network for the representation and restoration of optical remote sensing images, IEEE Transactions on Geoscience and Remote Sensing 52 (5) (2014) 2617–2627.

[56] X. Chen, S. Xiang, C.-L. Liu, C.-H. Pan, Aircraft detection by deep belief nets, in: Pattern Recognition (ACPR), 2013 2nd IAPR Asian Conference on, IEEE, 2013, pp. 54–58.

[57] X. Chen, S. Xiang, C.-L. Liu, C.-H. Pan, Vehicle detection in satellite images by hybrid deep convolutional neural networks, IEEE Geoscience and Remote Sensing Letters 11 (10) (2014) 1797–1801.

[58] Y. Liu, G. Cao, Q. Sun, M. Siegel, Hyperspectral classification via deep networks and superpixel segmentation, International Journal of Remote Sensing 36 (13) (2015) 3459–3482. doi:10.1080/01431161.2015.1055607.

[59] J. Wang, Q. Qin, Z. Li, X. Ye, J. Wang, X. Yang, X. Qin, Deep hierarchical representation and segmentation of high resolution remote sensing images, in: Geoscience and Remote Sensing Symposium (IGARSS), 2015 IEEE International, IEEE, 2015, pp. 4320–4323.

[60] D. Tuia, R. Flamary, N. Courty, Multiclass feature learning for hyperspectral image classification: Sparse and hierarchical solutions, ISPRS Journal of Photogrammetry and Remote Sensing 105 (2015) 272–285.

[61] F. Hu, G.-S. Xia, J. Hu, L. Zhang, Transferring deep convolutional neural networks for the scene classification of high-resolution remote sensing imagery, Remote Sensing 7 (11) (2015) 14680–14707.

[62] O. A. B. Penatti, K. Nogueira, J. A. dos Santos, Do deep features generalize from everyday objects to remote sensing and aerial scenes domains?, in: 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2015, pp. 44–51. doi:10.1109/CVPRW.2015.7301382.

[63] F. P. Luus, B. P. Salmon, F. Van den Bergh, B. T. J. Maharaj, Multiview deep learning for land-use classification, IEEE Geoscience and Remote Sensing Letters 12 (12) (2015) 2448–2452.

[64] F. Zhang, B. Du, L. Zhang, Scene classification via a gradient boosting random convolutional network framework, IEEE Transactions on Geoscience and Remote Sensing 54 (3) (2016) 1793–1802.

[65] T. Ishii, R. Nakamura, H. Nakada, Y. Mochizuki, H. Ishikawa, Surface object recognition with CNN and SVM in Landsat 8 images, in: 2015 14th IAPR International Conference on Machine Vision Applications (MVA), 2015, pp. 341–344. doi:10.1109/MVA.2015.7153200.

[66] N. Kussul, M. Lavreniuk, S. Skakun, A. Shelestov, Deep learning classification of land cover and crop types using remote sensing data, IEEE Geoscience and Remote Sensing Letters 14 (5) (2017) 778–782.

[67] Y. Chen, Z. Lin, X. Zhao, G. Wang, Y. Gu, Deep learning-based classification of hyperspectral data, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 7 (6) (2014) 2094–2107.

[68] G. Wu, X. Shao, Z. Guo, Q. Chen, W. Yuan, X. Shi, Y. Xu, R. Shibasaki, Automatic building segmentation of aerial imagery using multi-constraint fully convolutional networks, Remote Sensing 10 (3) (2018) 407.

[69] Y. Duan, F. Liu, L. Jiao, P. Zhao, L. Zhang, SAR image segmentation based on convolutional-wavelet neural network and Markov random field, Pattern Recognition 64 (2017) 255–267.

[70] F. Mohammadimanesh, B. Salehi, M. Mahdianpari, E. Gill, M. Molinier, A new fully convolutional neural network for semantic segmentation of polarimetric SAR imagery in complex land cover ecosystem, ISPRS Journal of Photogrammetry and Remote Sensing 151 (2019) 223–236.

[71] L. Wang, X. Xu, H. Dong, R. Gui, F. Pu, Multi-pixel simultaneous classification of PolSAR image using convolutional neural networks, Sensors 18 (3) (2018) 769.

[72] M. Ahishali, S. Kiranyaz, T. Ince, M. Gabbouj, Dual and single polarized SAR image classification using compact convolutional neural networks, Remote Sensing 11 (11) (2019) 1340.

[73] Z. Li, Z. Yang, H. Xiong, Homogeneous region segmentation for SAR images based on two steps segmentation algorithm, in: Computers, Communications and Systems (ICCCS), International Conference on, IEEE, 2015, pp. 196–200.

[74] V. Badrinarayanan, A. Kendall, R. Cipolla, SegNet: A deep convolutional encoder-decoder architecture for image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence 39 (12) (2017) 2481–2495.

[75] H. Zhao, J. Shi, X. Qi, X. Wang, J. Jia, Pyramid scene parsing network, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2881–2890.

[76] C. Yu, J. Wang, C. Peng, C. Gao, G. Yu, N. Sang, BiSeNet: Bilateral segmentation network for real-time semantic segmentation, in: Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 325–341.

[77] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, H. Adam, Encoder-decoder with atrous separable convolution for semantic image segmentation, arXiv preprint arXiv:1802.02611 (2018).

[78] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, A. L. Yuille, DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs, IEEE Transactions on Pattern Analysis and Machine Intelligence 40 (4) (2018) 834–848.

[79] O. Ronneberger, P. Fischer, T. Brox, U-Net: Convolutional networks for biomedical image segmentation, in: Medical Image Computing and Computer-Assisted Intervention (MICCAI), Vol. 9351 of LNCS, Springer, 2015, pp. 234–241 (available on arXiv:1505.04597 [cs.CV]). URL http://lmb.informatik.uni-freiburg.de/Publications/2015/RFB15a

[80] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, H. Adam, MobileNets: Efficient convolutional neural networks for mobile vision applications, arXiv preprint arXiv:1704.04861 (2017).

[81] T. Pohlen, A. Hermans, M. Mathias, B. Leibe, Full-resolution residual networks for semantic segmentation in street scenes, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4151–4160.

[82] S. Jégou, M. Drozdzal, D. Vázquez, A. Romero, Y. Bengio, The one hundred layers tiramisu: Fully convolutional DenseNets for semantic segmentation, in: Computer Vision and Pattern Recognition Workshops (CVPRW), 2017 IEEE Conference on, IEEE, 2017, pp. 1175–1183.

[83] O. Antropov, Y. Rauste, A. Lönnqvist, T. Häme, PolSAR mosaic normalization for improved land-cover mapping, IEEE Geoscience and Remote Sensing Letters 9 (6) (2012) 1074–1078.

[84] J. Long, E. Shelhamer, T. Darrell, Fully convolutional networks for semantic segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431–3440.

[85] D. H. Hubel, T. N. Wiesel, Receptive fields, binocular interaction and functional architecture in the cat's visual cortex, The Journal of Physiology 160 (1) (1962) 106–154.

[86] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al., ImageNet large scale visual recognition challenge, International Journal of Computer Vision 115 (3) (2015) 211–252.

[87] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.

[88] G. Huang, Z. Liu, L. Van Der Maaten, K. Q. Weinberger, Densely connected convolutional networks, in: CVPR, Vol. 1, 2017, p. 3.

[89] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, Going deeper with convolutions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1–9.

[90] S. Ji, W. Xu, M. Yang, K. Yu, 3D convolutional neural networks for human action recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (1) (2013) 221–231.

[91] T. N. Sainath, A.-r. Mohamed, B. Kingsbury, B. Ramabhadran, Deep convolutional neural networks for LVCSR, in: Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, IEEE, 2013, pp. 8614–8618.

[92] D. Small, L. Zuberbühler, A. Schubert, E. Meier, Terrain-flattened gamma nought Radarsat-2 backscatter, Canadian Journal of Remote Sensing 37 (5) (2012) 493–499.

[93] P. Härmä, R. Teiniranta, M. Törmä, R. Repo, E. Järvenpää, M. Kallio, The production of Finnish CORINE land cover 2000 classification, XXth ISPRS Congress, Istanbul, Turkey (2004).

[94] P. Härmä, R. Teiniranta, M. Törmä, R. Repo, E. Järvenpää, M. Kallio, Finnish CORINE land cover 2000 classification, XXth ISPRS Congress, Anchorage, US (2004).

[95] A. Garcia-Garcia, S. Orts-Escolano, S. Oprea, V. Villena-Martinez, J. Garcia-Rodriguez, A review on deep learning techniques applied to semantic segmentation, arXiv preprint arXiv:1704.06857 (2017).

[96] F. Chollet, Xception: Deep learning with depthwise separable convolutions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1251–1258.

[97] O. Ronneberger, P. Fischer, T. Brox, U-Net: Convolutional networks for biomedical image segmentation, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2015, pp. 234–241.

[98] L.-C. Chen, G. Papandreou, F. Schroff, H. Adam, Rethinking atrous convolution for semantic image segmentation, arXiv preprint arXiv:1706.05587 (2017).

[99] Y. Bengio, Deep learning of representations for unsupervised and transfer learning, in: Proceedings of ICML Workshop on Unsupervised and Transfer Learning, 2012, pp. 17–36.

[100] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, et al., TensorFlow: A system for large-scale machine learning, in: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), 2016, pp. 265–283.

[101] H. Costa, G. M. Foody, D. S. Boyd, Supervised methods of image segmentation accuracy assessment in land cover mapping, Remote Sensing of Environment 205 (2018) 338–351.

[102] G. Csurka, D. Larlus, F. Perronnin, F. Meylan, What is a good evaluation measure for semantic segmentation?, in: BMVC, Vol. 27, Citeseer, 2013, p. 2013.

[103] J. Cohen, A coefficient of agreement for nominal scales, Educational and Psychological Measurement 20 (1) (1960) 37–46.

[104] O. Antropov, Y. Rauste, T. Häme, Volume scattering modeling in PolSAR decompositions: Study of ALOS PALSAR data over boreal forest, IEEE Transactions on Geoscience and Remote Sensing 49 (10) (2011) 3838–3848.

  • 1 Introduction
    • 11 Land Cover Mapping with SAR Imagery
    • 12 Deep Learning in Remote Sensing
    • 13 Study goals
      • 2 Deep Learning Terminology
      • 3 Materials and methods
        • 31 Study site
        • 32 SAR data
        • 33 Reference data
        • 34 Semantic Segmentation Models
          • 341 BiSeNet (Bilateral Segmentation Network)
          • 342 SegNet (Encoder-Decoder-Skip)
          • 343 Mobile U-Net
          • 344 DeepLab-V3+
          • 345 FRRN-B (Full-Resolution Residual Networks)
          • 346 PSPNet (Pyramid Scene Parsing Network)
          • 347 FC-DenseNet (Fully Convolutional DenseNets)
            • 35 Training approach
            • 36 Experimental Setup
              • 361 SAR Data Preprocessing for Deep Learning
              • 362 TrainDevelopment and Test (Accuracy Assessment) Dataset
              • 363 Data Augmentation
              • 364 Implementation
              • 365 Hardware and Training Setup
                • 37 Evaluation Metrics
                  • 4 Results and Discussion
                    • 41 Classification Performance
                    • 42 Computational Performance
                    • 43 Comparison to Similar Work
                    • 44 Outlook and Future Work
                      • 5 Conclusion
Page 21: Wide-Area Land Cover Mapping with Sentinel-1 Imagery using … · 2020-02-27 · Wide-Area Land Cover Mapping with Sentinel-1 Imagery using Deep Learning Semantic Segmentation Models

the DEM model for land cover mapping [9] Hence as the third layer we usedthe DEM of Finland from the National Land Survey

SAR backscatter for both polarizations was converted to decibels by applyingthe 10 middot log10 transformation In addition for the deep learning models eachband should be normalized so that the distribution of the pixel values wouldresemble a Gaussian distribution centered at zero This is done to yield afaster convergence during the training To normalize the data each pixel valueis subtracted by the mean of all pixels and then divided by their standarddeviation In addition given that the semantic segmentation models expectpixel values in the range (0255) we scaled the normalized data and also theDEM values to this range Such preprocessed layers are then used to create theimage dataset for training

We named the created dataset SAR RGB-DEM The naming comes fromthe process used to create the images in this dataset Namely one of the twochannels of a Sentinel-1 image is assigned to R and the other to G channel Forthe third B channel we use the DEM layer

362 TrainDevelopment and Test (Accuracy Assessment) Dataset

The images from the SAR RGB-DEM dataset needed to be split into 512pxtimes512px partial images (further in the text called imagelets) for training Thuseach imagelet represented an area of roughly 10times10km2 The first reason for thispreprocessing is about the squared shape some of the selected models requiredsquared-shaped images Some other of the models were flexible with the imageshape and size but we wanted to make the setups for all the models the sameso that their results are comparable The second reason for the preprocessing isabout the computational capacity with our hardware setup (described below)this was the largest image size that we could work with

Upon splitting the SAR RGB-DEM images we discarded those imageletsthat were completely outside the land mass area as well as those for which wedid not have a complete CORINE label (such as if they fell in part outside theFinnish borders) This resulted in more than 7K imagelets of size 512pxtimes512px

Given the geography of Finland to have representative training data itseems useful to include imagelets from both northern and southern (includingthe large cities) parts of the country into the model training On the otherhand some noticeable differences are found also in the gradient from east towest of the country To achieve representative training dataset we selectedall imagelets between the longitudes of 24and 28for the accuracy assessment(model testing) and all the other imagelets for the model training (that istraining and development in the computer vision terminology) In this way weprevented the situation in which two images of the same area but acquired atdifferent times are used one for training and the other one for testing Imagesthat were overlapping any border of the introduced strip were discarded Theprocedure resulted in 3104 images in the training and development set and 3784images in the test (accuracy assessment) set Finally we used 60 from thetraining and development set for training and the rest for development of thedeep learning models

21

Table 3 The properties of the examined semantic segmentation architectures

Architecture Base model ParametersBiSeNet ResNet101 2475MSegNet VGG16 3497MMobile U-Net Not applicable 887MDeepLabV3+ ResNet101 4796MFRRN-B ResNet101 2475MPSPNet ResNet101 56MFC-DenseNet ResNet101 927M

363 Data Augmentation

Further we have employed the data augmentation technique The mainidea behind the data augmentation is to enable improved learning by reusingoriginal images with slight transformations such as rotation flipping addingGaussian noise or slightly changing the brightness This provides additionalinformation to the model and the dataset size is effectively increased More-over an additional benefit of the data augmentation is in helping the model tolearn some invariant data properties for which no examples are present in theoriginal dataset Given the sensitivity of the SAR backscatter we did not wantto augment the images in terms of the color brightness or by adding noiseHowever we could safely employ rotations and flipping For rotations we onlyused the 90increments giving three possible rotated versions of an image Forimage flipping we applied horizontal and vertical flipping or both at the sametime giving another three possible versions of the original image4 Notice thatour images are square so the transformations did not change the image dimen-sions Finally we applied the online augmentation as opposite to the offlineversion In the online process each augmented image is seen only once and sothis process yields a network that generalises better

364 Implementation

To apply the described semantic segmentation models we adapted the open-source Semantic Segmentation Suite We used Python with TensorFlow [100]backend

365 Hardware and Training Setup

We trained and tested separately each of the deep learning models on a singleGPU (NVIDIA GeForce GTX 1080) on a machine with 32GB of RAM

We used the RMSProp optimisation algorithm learning rate of 00001 anddecay of the learning rate of 09954 Each model was trained for an equalnumber of epochs = 500 and during the process the checkpoint for the best

4Vertical flip operation switches between top-left and bottom-left image origin (reflectionalong the central horizontal axis) and horizontal flip switches between top-left and top-rightimage origin (reflection along the central vertical axis)

22

model was saved Then we used that model for evaluation on the test set andwe report those results

37 Evaluation Metrics

In the review on the metrics used in land cover classification Costa et al[101] have found a lack of consistency complicating intercomparison of differentstudies To avoid such issues and ensure that our results are easily compara-ble with the literature we thoroughly evaluated our models For each modeland class we report the following measures of accuracy precision also knownas producerrsquos accuracy (PA) recall also known as userrsquos accuracy (UA) andoverall accuracy and Kappa coefficient The formulas are as follows

For each segmentation class (land cover type) c we calculate precision (pro-ducerrsquos accuracy)

Pc =Tpc

Tpc + Fpc

and recall (userrsquos accuracy)

Rc =Tpc

Tpc + Fnc

where Tpc represents true positive Fpc false positive and Fnc false negativepixels for the class c

When it comes to accuracy [102] we calculate per class accuracy 5

Accc =Cii

Gi

and overall pixel accuracy

AccOP =ΣL

i=1Cii

ΣLi=1Gi

where Cij is the number of pixels having a ground truth label i and beingclassifiedpredicted as j Gi is the total number of pixels labelled with i and Lis the number of classes All these metrics can take values from 0 to 1

Finally we also use a Kappa statistic (Cohenrsquos measure of agreement) indi-cating how the classification results compare to the values assigned by chance[103] Kappa statistics can take values from 0 to 1 Starting from a k by kconfusion matrix with elements fij following calculations are done

5Effectively per class accuracy is defined as the recall obtained on each class

23

Po =1

N

ksumj=1

fjj (1)

ri =

ksumj=1

fij foralli and cj =

ksumi=1

fij forallj (2)

Pe =1

N2

ksumi=1

rici (3)

where Po the observed proportional agreement (effectively the overall accuracy)ri and cj are the row and column totals for classes i and j and Pe is theexpected proportion of agreement The final measure of agreement is given bysuch statistic [103]

κ =Po minus Pe

1minus Pe (4)

Depending on the value of Kappa the observed agreement is considered aseither poor (00 to 02) fair (02 to 04) moderate (04 to 06) good (06 to 08)or very good (08 to 10)

4 Results and Discussion

Using the experimental setup described in previous section we evaluatedthe seven selected semantic segmentation models SegNet [74] PSPNet [75]BiSeNet [76] DeepLabV3+ [77 78] U-Net [79 80] FRRN-B [81] and FC-DenseNet [82] The overall classification performance statistics for all studiedmodels is gathered in Table 4 Figure 11 shows maps produced for severalimagelets with the best performing model FC-DenseNet Obtained results arecompared to prior work and classification performance for different land coverclasses is discussed further

4.1. Classification Performance

All the models performed relatively well in terms of classification, achieving an overall accuracy above 83%. Three models performed particularly well, achieving an accuracy score above 89%: SegNet, FRRN-B, and the best performing model, FC-DenseNet, which achieved an accuracy of 90.7%.

Before further analysis, let us recall that CORINE is not exclusively a land cover map, but rather a land cover and land use map; thus, specific classes can differ from the ecological classes observed by Sentinel-1. Also, the aggregation to Level-1 is sometimes not strictly "ecological", nor does it always comply with physical surface-scattering considerations. For example, roads, airports, major industrial areas and the road network often exhibit areas similar to fields; the presence of trees and green


Table 4: Summary of the classification performance and efficiency of the various deep learning models (UA = user's accuracy, PA = producer's accuracy, both in %; the average inference time is per image in the dataset).

LC class (code)                    Test scale (km²)   BiSeNet   DeepLabV3+   SegNet   FRRN-B   U-Net   PSPNet   FC-DenseNet
                                                      UA/PA     UA/PA        UA/PA    UA/PA    UA/PA   UA/PA    UA/PA
Urban fabric (100)                 1081.6             26/21     15/14        36/31    38/30    45/25   38/18    62/27
Agricultural areas (200)           2516.0             49/51     50/49        69/66    68/68    66/66   53/48    72/71
Forested areas (300)               28546.2            90/91     88/96        93/94    92/95    92/95   89/95    93/96
Peatland, bogs and marshes (400)   2099.0             54/43     56/13        67/57    71/55    70/52   65/31    74/58
Water bodies (500)                 5356.4             85/91     94/92        96/96    95/96    96/96   94/94    96/96
Overall accuracy (%)                                  83.86     85.49        89.03    89.27    89.25   86.51    90.66
Kappa                                                 0.641     0.649        0.754    0.758    0.754   0.680    0.785
Average inference time (s)                            0.0389    0.0267       0.0761   0.1424   0.0848  0.0495   0.1930

Table 5: Confusion matrix for classification with the FC-DenseNet103 model.

CLC2012 \ Sentinel-1 class   urban        water        forest       field       peatland    total        PA (%)
1 urban                      7301999      413073       15892771     3212839     221476      27042158     27.0
2 water                      78331        128294872    3457634      171029      1935276     133937142    95.8
3 forest                     3663698      2703632      686788977    12795703    7730444     713682454    96.2
4 field                      766200       121609       16527970     44866048    620934      62902761     71.3
5 peatland                   56097        1866020      19164137     1091008     30309189    52486451     57.8
total                        11866325     133399206    741831489    62136627    40817319    990050966
UA (%)                       61.5         96.2         92.6         72.2        74.3                     90.7
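Plugging the Table 5 counts into the metric definitions of Section 3.7 reproduces the reported scores (e.g., an urban UA of about 61.5%, an overall accuracy of about 90.7%, and a Kappa of about 0.785). A small check, reusing the `evaluation_metrics` sketch given in Section 3.7:

    import numpy as np

    # Confusion matrix from Table 5 (rows: CLC2012 reference, columns: predicted).
    f = np.array([
        [ 7301999,    413073,  15892771,  3212839,   221476],  # urban
        [   78331, 128294872,   3457634,   171029,  1935276],  # water
        [ 3663698,   2703632, 686788977, 12795703,  7730444],  # forest
        [  766200,    121609,  16527970, 44866048,   620934],  # field
        [   56097,   1866020,  19164137,  1091008, 30309189],  # peatland
    ])

    precision, recall, oa, kappa = evaluation_metrics(f)
    print(np.round(precision * 100, 1))          # user's accuracies, cf. the UA row of Table 5
    print(np.round(recall * 100, 1))             # producer's accuracies, cf. the PA column
    print(round(oa * 100, 1), round(kappa, 3))   # 90.7 and 0.785, as in Table 4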

Figure 11: Illustration of the FC-DenseNet model performance: a selection of classification results, i.e., direct output of the network without any post-processing (bottom row), versus the reference CORINE data (upper row).

vegetation near summer cottages can cause them to exhibit signatures closer to forest than to urban; sometimes forest on rocky terrain can be misclassified as urban due to the presence of very bright targets and strong disruptive features; and confusion between peatland and field areas is also common. Finally, the accuracy of the CORINE data itself is only somewhat higher than 90%.

As for the results across the different land cover classes, all the models performed particularly well in recognising the water bodies and forested areas, while the urban fabric represented the most challenging class for all the models. We expect that the inclusion of the DEM as one layer in the training images helped to achieve good results on the water bodies class for most of the models (except for BiSeNet, all the models achieved both user and producer accuracy above 90%). The urban class was particularly challenging for the following main reasons. First, this class changes the most, as new houses, roads and urban areas are built. While we took the most suitable available CORINE class in terms of time for our Sentinel-1 images, there are almost certain differences between the urban class as it was in 2012 and in 2015-2016. Second, the CORINE map itself does not have perfect accuracy, nor are its aggregation rules perfect. As a matter of fact, in the majority of studies where SAR-based classification was done versus CLC or similar data, a poor or modest overall agreement was observed for this class [21, 41, 83, 20], while the user's accuracy was strongly higher than the producer's [104]. The latter is exactly due to radar being able to sense sharp boundaries and bright targets very well, whereas such bright targets often don't dominate the whole CORINE Level-1 urban class. We argue that any inaccuracies present will be particularly accentuated in our models for the urban class, because of the sharp and sudden boundary changes in this class, unlike for the others such as forest and water. The top performing model, i.e., FC-DenseNet, performed the best across all the classes. It is particularly notable that it achieved a user accuracy, i.e., precision, for the urban class of 62%, improving significantly on all the other models. Nevertheless, its score on the producer accuracy, i.e., recall, on this class (27%) is outperformed by the two other top models, i.e., SegNet and FRRN-B.

We mentioned the issues of SAR backscattering sensitivity to several ground factors, such that the same classes might appear differently in images between countries, or between distant areas within a country. An interesting indication of our study, however, is that deep learning models might be able to deal with this issue. Namely, we used the models pre-trained on ImageNet and fine-tuned them with a relatively small number (14) of Sentinel-1 scenes. The models learned to recognise the varying types of backscattering signal across the country of Finland. This indicates that, with a similar type of fine-tuning, the present models could be relatively easily adapted to other areas and countries with different SAR backscattering patterns. Such robustness and adaptability of deep learning models come from their automatic learning of feature representations, without the need for a human to pre-define those features.
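As an illustration of this transfer-learning pattern, a Keras-style sketch follows. It is a sketch under our assumptions, not the Semantic Segmentation Suite code: `build_decoder` is a hypothetical placeholder for any of the decoders in Table 3, and a three-channel input composite is assumed.

    import tensorflow as tf

    # Encoder initialised with ImageNet weights; the SAR bands and the DEM are
    # stacked into a 3-channel composite so the pretrained RGB filters apply.
    encoder = tf.keras.applications.ResNet101(
        include_top=False, weights="imagenet", input_shape=(512, 512, 3))

    # Hypothetical decoder head producing per-pixel logits for the 5 classes.
    logits = build_decoder(encoder.output, n_classes=5)
    model = tf.keras.Model(encoder.input, logits)

    # All layers stay trainable: the whole network is fine-tuned end-to-end
    # on the Sentinel-1 imagelets at the low learning rate of Section 3.6.5.
    model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=1e-4),
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))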

4.2. Computational Performance

The training times with our hardware configuration ranged from 6 days up to 2 weeks for the different models. This could be significantly improved by training each model on a multi-GPU system instead of the single GPU we used.

In terms of inference time, we also saw differences in performance. In Table 4, we present the average inference time per 512 px × 512 px imagelet that we worked with. The results show that there is a trade-off between classification and computational performance: the best models in terms of classification results (i.e., FC-DenseNet and FRRN-B) take a several times longer inference time than the rest. Depending on the application, this might not be of particular importance.
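The per-imagelet timings in Table 4 can be measured along the following lines (a sketch; `model` and `imagelets` are placeholders, and a short warm-up excludes one-off graph-compilation cost from the average):

    import time
    import numpy as np

    def average_inference_time(model, imagelets, warmup=3):
        """Mean prediction time per 512 x 512 imagelet, in seconds."""
        for x in imagelets[:warmup]:                     # warm-up passes
            model.predict(x[np.newaxis, ...], verbose=0)
        start = time.perf_counter()
        for x in imagelets:
            model.predict(x[np.newaxis, ...], verbose=0)
        return (time.perf_counter() - start) / len(imagelets)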

4.3. Comparison to Similar Work

The obtained results compare favourably to previous similar studies on land cover classification with SAR data [83, 20, 21, 41, 28, 31]. Depending on the level of class aggregation (4-5 major classes or more), and using mostly statistical or classical machine learning approaches, the reported classification accuracies ranged from as high as 80-87% to as low as 30% when only SAR imagery was used.

Two recent studies that applied neural networks to SAR imagery classification (albeit in combination with satellite optical data) for land cover mapping are [28] and [66], with reported classification accuracies of up to 97.5% and 94.6%, respectively.

The best model in our experiments achieved an overall accuracy of 90.7%. However, our results were obtained using solely the SAR imagery. In contrast, SAR imagery (PALSAR) alone yielded an overall accuracy of 78.1% in [28]. The types of classes they studied also differ from ours (crops versus vegetation versus land cover types), and our study is performed on a larger area. Importantly, the previous studies applied different types of models (regular NNs versus CNNs versus semantic segmentation). In particular, the CNN models work on 7 × 7 pixel windows, while we applied more advanced semantic segmentation models, which work on the level of a single pixel. Keeping in mind the finding from [28] that the addition of optical images on top of SAR improved the results by over 10%, we expect that our models would perform comparably well or outperform these previous works if applied to combined SAR and optical imagery.

In terms of the deep learning setup, the studies most similar to ours are [53] and [70]. However, RapidEye optical imagery at 5 m spatial resolution was used in [53], and the test site was considerably smaller. Study [70], similarly to our research, relied exclusively on SAR imagery, however on fully polarimetric images acquired by RADARSAT-2 at a considerably better resolution. They developed an FCN-type semantic segmentation model 'specifically designed for the classification of wetland complexes using PolSAR imagery'. Using this model to classify eight wetland map classes, they achieved an overall accuracy of 93%. However, because their model is designed specifically for wetland complexes, it is not clear whether such a model would generalise to other types of areas. Compared to our study, they focused on a considerably smaller area (nearly the size of a single imagelet we used) and on a very specific task (wetland type mapping). Thus, it is not readily clear how general their approach is and how it compares to our presented approach.

4.4. Outlook and Future Work

There are several lines for potential improvement based on the results of this study, as well as future work directions.

First, using an even larger set of Sentinel-1 images can be recommended, since large amounts of data are crucial for supervised deep learning models. Here, we processed only 6,888 imagelets altogether, but deep learning algorithms typically become efficient only once they are trained with hundreds of thousands or millions of images.

Second, if SAR images and reference data of a higher resolution are used, we expect better classification performance too, as smaller details could potentially be captured. Better agreement in the acquisition timing of the reference and SAR imagery can also be recommended: the reference and training data should come from the same months or year if possible, and the reference maps should represent the reality as accurately as possible. The models in our experiments were certainly limited by CORINE's own limited accuracy.

Third, in this study we tested the effectiveness of off-the-shelf deep learning models for land cover mapping from SAR data. While the results show their effectiveness, it is also likely that novel types of models developed specifically for radar data (such as [70]) will yield even better results. Based on our results, we suggest DenseNet-based models as a starting point. In particular, one could develop deep learning models that directly handle SLC data, which preserve the phase information.

Focusing on a single season is both an advantage and a limitation. Importantly, we avoided confusion between SAR signatures that vary seasonally for several land cover classes. However, multitemporal dynamics can itself potentially be used as an additional, useful class-discriminating parameter. Incorporating the seasonal dynamics of each land cover pixel (as a time series) is left for future work, perhaps with the additional need to incorporate recurrent neural networks into the approach.

As discussed in Section 3.1.1, it could be suitable to use more detailed (specific) land cover classes, as the aggregation of smaller LC classes into Level-1 CORINE classes is not exactly ecological, leading to the mixing of several distinct SAR signatures in one class and thus causing additional confusion for the classifier. The classified specific classes can later be aggregated into larger classes, potentially showing improved performance [19].

Finally, we used only SAR images and a freely available DEM for the presented large-scale land cover mapping. If one were to combine other types of remote sensing images, in particular optical images, we expect that the results would improve significantly. This holds for those areas where such imagery can be collected, given the cloud coverage, while an operational scenario would potentially require the use of at least two models (with and without optical satellite imagery). It is also important to assess the added value of SAR imagery with deep learning models when optical satellite images are available, as well as possible data fusion and decision fusion scenarios, before a decision on the mapping approach is made [19].

5. Conclusion

Our study demonstrated the potential for applying state-of-the-art semantic segmentation models to SAR image classification with high accuracy. Several models were benchmarked in a countrywide classification experiment using Sentinel-1 IW-mode SAR data, reaching nearly 91% overall classification accuracy with the best performing model (FC-DenseNet). Given that the 14 used Sentinel-1 scenes resulted in 7K training images, this indicates a strong potential for using pre-trained CNNs for further fine-tuning, and it seems particularly suitable when the number of training images is limited (to thousands or tens of thousands instead of millions). In addition to suggesting the best candidate semantic segmentation models for land cover mapping with SAR data (that is, the DenseNet-based models), our study offers baseline results against which newly proposed models should be evaluated. Several possible improvements were identified for future work, including the necessity of testing multitemporal approaches, data fusion, and very high-resolution SAR imagery, as well as developing models specifically for SAR; these will be addressed in future work.

References

[1] S. Bojinski, M. Verstraete, T. C. Peterson, C. Richter, A. Simmons, M. Zemp, The concept of essential climate variables in support of climate research, applications, and policy, Bulletin of the American Meteorological Society 95 (9) (2014) 1431–1443.

[2] G. Büttner, J. Feranec, G. Jaffrain, L. Mari, G. Maucha, T. Soukup, The CORINE land cover 2000 project, EARSeL eProceedings 3 (3) (2004) 331–346.

[3] M. Bossard, J. Feranec, J. Otahel, et al., CORINE land cover technical guide: Addendum 2000 (2000).

[4] G. Büttner, CORINE land cover and land cover change products, in: Land Use and Land Cover Mapping in Europe, Springer, 2014, pp. 55–74.

[5] M. Törmä, T. Markkanen, S. Hatunen, P. Härmä, O.-P. Mattila, A. Arslan, Assessment of land-cover data for land-surface modelling in regional climate studies, Boreal Environment Research 20 (2) (2015) 243–260.

[6] J. Chen, J. Chen, A. Liao, X. Cao, L. Chen, X. Chen, C. He, G. Han, S. Peng, M. Lu, et al., Global land cover mapping at 30 m resolution: A POK-based operational approach, ISPRS Journal of Photogrammetry and Remote Sensing 103 (2015) 7–27.

[7] C. A. d. Almeida, A. C. Coutinho, J. C. D. M. Esquerdo, M. Adami, A. Venturieri, C. G. Diniz, N. Dessay, L. Durieux, A. R. Gomes, High spatial resolution land use and land cover mapping of the Brazilian Legal Amazon in 2008 using Landsat-5/TM and MODIS data, Acta Amazonica 46 (3) (2016) 291–302.

[8] C. Homer, J. Dewitz, L. Yang, S. Jin, P. Danielson, G. Xian, J. Coulston, N. Herold, J. Wickham, K. Megown, Completion of the 2011 National Land Cover Database for the conterminous United States – representing a decade of land cover change information, Photogrammetric Engineering & Remote Sensing 81 (5) (2015) 345–354.

[9] Y. Zhao, D. Feng, L. Yu, X. Wang, Y. Chen, Y. Bai, H. J. Hernandez, M. Galleguillos, C. Estades, G. S. Biging, et al., Detailed dynamic land cover mapping of Chile: Accuracy improvement by integrating multi-temporal data, Remote Sensing of Environment 183 (2016) 170–185.

[10] P. Griffiths, C. Nendel, P. Hostert, Intra-annual reflectance composites from Sentinel-2 and Landsat for national-scale crop and land cover mapping, Remote Sensing of Environment 220 (2019) 135–151.

[11] R. Torres, P. Snoeij, D. Geudtner, D. Bibby, M. Davidson, E. Attema, P. Potin, B. Rommen, N. Floury, M. Brown, I. Traver, P. Deghaye, B. Duesmann, B. Rosich, N. Miranda, C. Bruno, M. L'Abbate, R. Croci, A. Pietropaolo, M. Huchler, F. Rostan, GMES Sentinel-1 mission, Remote Sensing of Environment 120 (2012) 9–24. doi:10.1016/j.rse.2011.05.028.

[12] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE 86 (11) (1998) 2278–2324.

[13] A. Krizhevsky, I. Sutskever, G. E. Hinton, ImageNet classification with deep convolutional neural networks, in: Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.

[14] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, CoRR abs/1409.1556 (2014).

[15] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning (2016).

[16] W. Cohen, S. Goward, Landsat's role in ecological applications of remote sensing, BioScience 54 (6) (2004) 535–545. doi:10.1641/0006-3568(2004)054[0535:LRIEAO]2.0.CO;2.

[17] S. Goetz, A. Baccini, N. Laporte, T. Johns, W. Walker, J. Kellndorfer, R. Houghton, M. Sun, Mapping and monitoring carbon stocks with satellite observations: A comparison of methods, Carbon Balance and Management 4 (2009). doi:10.1186/1750-0680-4-2.

[18] C. Atzberger, Advances in remote sensing of agriculture: Context description, existing operational monitoring systems and major information needs, Remote Sensing 5 (2) (2013) 949–981. doi:10.3390/rs5020949.

[19] T. Häme, J. Kilpi, H. A. Ahola, Y. Rauste, O. Antropov, M. Rautiainen, L. Sirro, S. Bounpone, Improved mapping of tropical forests with optical and SAR imagery, part I: Forest cover and accuracy assessment using multi-resolution data, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 6 (1) (2013) 74–91.

[20] O. Antropov, Y. Rauste, H. Astola, J. Praks, T. Häme, M. Hallikainen, Land cover and soil type mapping from spaceborne PolSAR data at L-band with probabilistic neural network, IEEE Transactions on Geoscience and Remote Sensing 52 (9) (2014) 5256–5270. doi:10.1109/TGRS.2013.2287712.

[21] A. Lönnqvist, Y. Rauste, M. Molinier, T. Häme, Polarimetric SAR data in land cover mapping in boreal zone, IEEE Transactions on Geoscience and Remote Sensing 48 (10) (2010) 3652–3662. doi:10.1109/TGRS.2010.2048115.

[22] B. Waske, M. Braun, Classifier ensembles for land cover mapping using multitemporal SAR imagery, ISPRS Journal of Photogrammetry and Remote Sensing 64 (5) (2009) 450–457. doi:10.1016/j.isprsjprs.2009.01.003.

[23] L. Bruzzone, M. Marconcini, U. Wegmüller, A. Wiesmann, An advanced system for the automatic classification of multitemporal SAR images, IEEE Transactions on Geoscience and Remote Sensing 42 (6) (2004) 1321–1334. doi:10.1109/TGRS.2004.826821.

[24] T. Ullmann, A. Schmitt, A. Roth, J. Duffe, S. Dech, H.-W. Hubberten, R. Baumhauer, Land cover characterization and classification of arctic tundra environments by means of polarized synthetic aperture X- and C-band radar (PolSAR) and Landsat 8 multispectral imagery: Richards Island, Canada, Remote Sensing 6 (9) (2014) 8565–8593. doi:10.3390/rs6098565. URL http://www.mdpi.com/2072-4292/6/9/8565

[25] N. Clerici, C. A. V. Calderón, J. M. Posada, Fusion of Sentinel-1A and Sentinel-2A data for land cover mapping: a case study in the lower Magdalena region, Colombia, Journal of Maps 13 (2) (2017) 718–726. doi:10.1080/17445647.2017.1372316.

[26] C. Castaneda, D. Ducrot, Land cover mapping of wetland areas in an agricultural landscape using SAR and Landsat imagery, Journal of Environmental Management 90 (7) (2009) 2270–2277.

[27] Y. Ban, H. Hu, I. M. Rangel, Fusion of QuickBird MS and RADARSAT SAR data for urban land-cover mapping: Object-based and knowledge-based approach, International Journal of Remote Sensing 31 (6) (2010) 1391–1410.

[28] G. V. Laurin, V. Liesenberg, Q. Chen, L. Guerriero, F. Del Frate, A. Bartolini, D. Coomes, B. Wilebore, J. Lindsell, R. Valentini, Optical and SAR sensor synergies for forest and land cover mapping in a tropical site in West Africa, International Journal of Applied Earth Observation and Geoinformation 21 (2013) 7–16.

[29] R. Khatami, G. Mountrakis, S. V. Stehman, A meta-analysis of remote sensing research on supervised pixel-based land-cover image classification processes: General guidelines for practitioners and future research, Remote Sensing of Environment 177 (2016) 89–100.

[30] B. Waske, M. Braun, Classifier ensembles for land cover mapping using multitemporal SAR imagery, ISPRS Journal of Photogrammetry and Remote Sensing 64 (5) (2009) 450–457.

[31] H. Balzter, B. Cole, C. Thiel, C. Schmullius, Mapping CORINE land cover from Sentinel-1A SAR and SRTM digital elevation model data using random forests, Remote Sensing 7 (11) (2015) 14876–14898. doi:10.3390/rs71114876.

[32] S.-E. Park, Variations of microwave scattering properties by seasonal freeze/thaw transition in the permafrost active layer observed by ALOS PALSAR polarimetric data, Remote Sensing 7 (12) (2015) 17135–17148. doi:10.3390/rs71215874.

[33] M. C. Dobson, L. E. Pierce, F. T. Ulaby, Knowledge-based land-cover classification using ERS-1/JERS-1 SAR composites, IEEE Transactions on Geoscience and Remote Sensing 34 (1) (1996) 83–99. doi:10.1109/36.481896.

[34] L. Sirro, T. Häme, Y. Rauste, J. Kilpi, J. Hämäläinen, K. Gunia, B. de Jong, F. Paz Pellat, Potential of different optical and SAR data in forest and land cover classification to support REDD+ MRV, Remote Sensing 10 (6) (2018). doi:10.3390/rs10060942.

[35] J. D. T. De Alban, G. M. Connette, P. Oswald, E. L. Webb, Combined Landsat and L-band SAR data improves land cover classification and change detection in dynamic tropical landscapes, Remote Sensing 10 (2) (2018). doi:10.3390/rs10020306.

[36] N. Longepe, P. Rakwatin, O. Isoguchi, M. Shimada, Y. Uryu, K. Yulianto, Assessment of ALOS PALSAR 50 m orthorectified FBD data for regional land cover classification by support vector machines, IEEE Transactions on Geoscience and Remote Sensing 49 (6) (2011) 2135–2150. doi:10.1109/TGRS.2010.2102041.

[37] T. Esch, A. Schenk, T. Ullmann, M. Thiel, A. Roth, S. Dech, Characterization of land cover types in TerraSAR-X images by combined analysis of speckle statistics and intensity information, IEEE Transactions on Geoscience and Remote Sensing 49 (6) (2011) 1911–1925. doi:10.1109/TGRS.2010.2091644.

[38] J. W. Cable, J. M. Kovacs, J. Shang, X. Jiao, Multi-temporal polarimetric RADARSAT-2 for land cover monitoring in northeastern Ontario, Canada, Remote Sensing 6 (3) (2014) 2372–2392. doi:10.3390/rs6032372. URL http://www.mdpi.com/2072-4292/6/3/2372

[39] X. Niu, Y. Ban, Multi-temporal RADARSAT-2 polarimetric SAR data for urban land-cover classification using an object-based support vector machine and a rule-based approach, International Journal of Remote Sensing 34 (1) (2013) 1–26. doi:10.1080/01431161.2012.700133.

[40] T. L. Evans, M. Costa, K. Telmer, T. S. F. Silva, Using ALOS/PALSAR and RADARSAT-2 to map land cover and seasonal inundation in the Brazilian Pantanal, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 3 (4) (2010) 560–575. doi:10.1109/JSTARS.2010.2089042.

[41] P. Lumsdon, S. R. Cloude, G. Wright, Polarimetric classification of land cover for Glen Affric radar project, IEE Proceedings - Radar, Sonar and Navigation 152 (6) (2005) 404–412. doi:10.1049/ip-rsn:20041313.

[42] C. da Costa Freitas, L. de Souza Soler, S. J. S. Sant'Anna, L. V. Dutra, J. R. Dos Santos, J. C. Mura, A. H. Correia, Land use and land cover mapping in the Brazilian Amazon using polarimetric airborne P-band SAR data, IEEE Transactions on Geoscience and Remote Sensing 46 (10) (2008) 2956–2970.

[43] G. Li, D. Lu, E. Moran, L. Dutra, M. Batistella, A comparative analysis of ALOS PALSAR L-band and RADARSAT-2 C-band data for land-cover classification in a tropical moist region, ISPRS Journal of Photogrammetry and Remote Sensing 70 (2012) 26–38. doi:10.1016/j.isprsjprs.2012.03.010.

[44] N. Park, K. Chi, Integration of multitemporal/polarization C-band SAR data sets for land-cover classification, International Journal of Remote Sensing 29 (16) (2008) 4667–4688. doi:10.1080/01431160801947341.

[45] E. Tomppo, O. Antropov, J. Praks, Cropland classification using Sentinel-1 time series: Methodological performance and prediction uncertainty assessment, Remote Sensing 11 (21) (2019). doi:10.3390/rs11212480.

[46] D. B. Nguyen, A. Gruber, W. Wagner, Mapping rice extent and cropping scheme in the Mekong Delta using Sentinel-1A data, Remote Sensing Letters 7 (12) (2016) 1209–1218.

[47] A. Veloso, S. Mermoz, A. Bouvet, T. Le Toan, M. Planells, J.-F. Dejoux, E. Ceschia, Understanding the temporal behavior of crops using Sentinel-1 and Sentinel-2-like data for agricultural applications, Remote Sensing of Environment 199 (2017) 415–426.

[48] G. Satalino, A. Balenzano, F. Mattia, M. W. Davidson, C-band SAR data for mapping crops dominated by surface or volume scattering, IEEE Geoscience and Remote Sensing Letters 11 (2) (2013) 384–388.

[49] F. Vicente-Guijalba, A. Jacob, J. M. Lopez-Sanchez, C. Lopez-Martinez, J. Duro, C. Notarnicola, D. Ziolkowski, A. Mestre-Quereda, E. Pottier, J. J. Mallorquí, M. Lavalle, M. Engdahl, SinCohMap: Land-cover and vegetation mapping using multi-temporal Sentinel-1 interferometric coherence, in: IGARSS 2018 - 2018 IEEE International Geoscience and Remote Sensing Symposium, 2018, pp. 6631–6634. doi:10.1109/IGARSS.2018.8517926.

[50] S. Ge, O. Antropov, W. Su, H. Gu, J. Praks, Deep recurrent neural networks for land-cover classification using Sentinel-1 InSAR time series, in: IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium, 2019, pp. 473–476. doi:10.1109/IGARSS.2019.8900088.

[51] Y. LeCun, Y. Bengio, et al., Convolutional networks for images, speech, and time series, The Handbook of Brain Theory and Neural Networks 3361 (10) (1995) 1995.

[52] X. X. Zhu, D. Tuia, L. Mou, G.-S. Xia, L. Zhang, F. Xu, F. Fraundorfer, Deep learning in remote sensing: a review, arXiv preprint arXiv:1710.03959 (2017).

[53] M. Mahdianpari, B. Salehi, M. Rezaee, F. Mohammadimanesh, Y. Zhang, Very deep convolutional neural networks for complex land cover mapping using multispectral remote sensing imagery, Remote Sensing 10 (7) (2018) 1119.

[54] L. Zhang, L. Zhang, B. Du, Deep learning for remote sensing data: A technical tutorial on the state of the art, IEEE Geoscience and Remote Sensing Magazine 4 (2) (2016) 22–40.

[55] J. Zhang, P. Zhong, Y. Chen, S. Li, ℓ1/2-regularized deconvolution network for the representation and restoration of optical remote sensing images, IEEE Transactions on Geoscience and Remote Sensing 52 (5) (2014) 2617–2627.

[56] X. Chen, S. Xiang, C.-L. Liu, C.-H. Pan, Aircraft detection by deep belief nets, in: Pattern Recognition (ACPR), 2013 2nd IAPR Asian Conference on, IEEE, 2013, pp. 54–58.

[57] X. Chen, S. Xiang, C.-L. Liu, C.-H. Pan, Vehicle detection in satellite images by hybrid deep convolutional neural networks, IEEE Geoscience and Remote Sensing Letters 11 (10) (2014) 1797–1801.

[58] Y. Liu, G. Cao, Q. Sun, M. Siegel, Hyperspectral classification via deep networks and superpixel segmentation, International Journal of Remote Sensing 36 (13) (2015) 3459–3482. doi:10.1080/01431161.2015.1055607.

[59] J. Wang, Q. Qin, Z. Li, X. Ye, J. Wang, X. Yang, X. Qin, Deep hierarchical representation and segmentation of high resolution remote sensing images, in: Geoscience and Remote Sensing Symposium (IGARSS), 2015 IEEE International, IEEE, 2015, pp. 4320–4323.

[60] D. Tuia, R. Flamary, N. Courty, Multiclass feature learning for hyperspectral image classification: Sparse and hierarchical solutions, ISPRS Journal of Photogrammetry and Remote Sensing 105 (2015) 272–285.

[61] F. Hu, G.-S. Xia, J. Hu, L. Zhang, Transferring deep convolutional neural networks for the scene classification of high-resolution remote sensing imagery, Remote Sensing 7 (11) (2015) 14680–14707.

[62] O. A. B. Penatti, K. Nogueira, J. A. dos Santos, Do deep features generalize from everyday objects to remote sensing and aerial scenes domains?, in: 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2015, pp. 44–51. doi:10.1109/CVPRW.2015.7301382.

[63] F. P. Luus, B. P. Salmon, F. Van den Bergh, B. T. J. Maharaj, Multiview deep learning for land-use classification, IEEE Geoscience and Remote Sensing Letters 12 (12) (2015) 2448–2452.

[64] F. Zhang, B. Du, L. Zhang, Scene classification via a gradient boosting random convolutional network framework, IEEE Transactions on Geoscience and Remote Sensing 54 (3) (2016) 1793–1802.

[65] T. Ishii, R. Nakamura, H. Nakada, Y. Mochizuki, H. Ishikawa, Surface object recognition with CNN and SVM in Landsat 8 images, in: 2015 14th IAPR International Conference on Machine Vision Applications (MVA), 2015, pp. 341–344. doi:10.1109/MVA.2015.7153200.

[66] N. Kussul, M. Lavreniuk, S. Skakun, A. Shelestov, Deep learning classification of land cover and crop types using remote sensing data, IEEE Geoscience and Remote Sensing Letters 14 (5) (2017) 778–782.

[67] Y. Chen, Z. Lin, X. Zhao, G. Wang, Y. Gu, Deep learning-based classification of hyperspectral data, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 7 (6) (2014) 2094–2107.

[68] G. Wu, X. Shao, Z. Guo, Q. Chen, W. Yuan, X. Shi, Y. Xu, R. Shibasaki, Automatic building segmentation of aerial imagery using multi-constraint fully convolutional networks, Remote Sensing 10 (3) (2018) 407.

[69] Y. Duan, F. Liu, L. Jiao, P. Zhao, L. Zhang, SAR image segmentation based on convolutional-wavelet neural network and Markov random field, Pattern Recognition 64 (2017) 255–267.

[70] F. Mohammadimanesh, B. Salehi, M. Mahdianpari, E. Gill, M. Molinier, A new fully convolutional neural network for semantic segmentation of polarimetric SAR imagery in complex land cover ecosystem, ISPRS Journal of Photogrammetry and Remote Sensing 151 (2019) 223–236.

[71] L. Wang, X. Xu, H. Dong, R. Gui, F. Pu, Multi-pixel simultaneous classification of PolSAR image using convolutional neural networks, Sensors 18 (3) (2018) 769.

[72] M. Ahishali, S. Kiranyaz, T. Ince, M. Gabbouj, Dual and single polarized SAR image classification using compact convolutional neural networks, Remote Sensing 11 (11) (2019) 1340.

[73] Z. Li, Z. Yang, H. Xiong, Homogeneous region segmentation for SAR images based on two steps segmentation algorithm, in: Computers, Communications, and Systems (ICCCS), International Conference on, IEEE, 2015, pp. 196–200.

[74] V. Badrinarayanan, A. Kendall, R. Cipolla, SegNet: A deep convolutional encoder-decoder architecture for image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence 39 (12) (2017) 2481–2495.

[75] H. Zhao, J. Shi, X. Qi, X. Wang, J. Jia, Pyramid scene parsing network, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2881–2890.

[76] C. Yu, J. Wang, C. Peng, C. Gao, G. Yu, N. Sang, BiSeNet: Bilateral segmentation network for real-time semantic segmentation, in: Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 325–341.

[77] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, H. Adam, Encoder-decoder with atrous separable convolution for semantic image segmentation, arXiv preprint arXiv:1802.02611 (2018).

[78] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, A. L. Yuille, DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs, IEEE Transactions on Pattern Analysis and Machine Intelligence 40 (4) (2018) 834–848.

[79] O. Ronneberger, P. Fischer, T. Brox, U-Net: Convolutional networks for biomedical image segmentation, in: Medical Image Computing and Computer-Assisted Intervention (MICCAI), Vol. 9351 of LNCS, Springer, 2015, pp. 234–241 (available on arXiv:1505.04597 [cs.CV]). URL http://lmb.informatik.uni-freiburg.de/Publications/2015/RFB15a

[80] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, H. Adam, MobileNets: Efficient convolutional neural networks for mobile vision applications, arXiv preprint arXiv:1704.04861 (2017).

[81] T. Pohlen, A. Hermans, M. Mathias, B. Leibe, Full-resolution residual networks for semantic segmentation in street scenes, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4151–4160.

[82] S. Jégou, M. Drozdzal, D. Vazquez, A. Romero, Y. Bengio, The one hundred layers tiramisu: Fully convolutional DenseNets for semantic segmentation, in: Computer Vision and Pattern Recognition Workshops (CVPRW), 2017 IEEE Conference on, IEEE, 2017, pp. 1175–1183.

[83] O. Antropov, Y. Rauste, A. Lönnqvist, T. Häme, PolSAR mosaic normalization for improved land-cover mapping, IEEE Geoscience and Remote Sensing Letters 9 (6) (2012) 1074–1078.

[84] J. Long, E. Shelhamer, T. Darrell, Fully convolutional networks for semantic segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431–3440.

[85] D. H. Hubel, T. N. Wiesel, Receptive fields, binocular interaction and functional architecture in the cat's visual cortex, The Journal of Physiology 160 (1) (1962) 106–154.

[86] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al., ImageNet large scale visual recognition challenge, International Journal of Computer Vision 115 (3) (2015) 211–252.

[87] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.

[88] G. Huang, Z. Liu, L. Van Der Maaten, K. Q. Weinberger, Densely connected convolutional networks, in: CVPR, Vol. 1, 2017, p. 3.

[89] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, Going deeper with convolutions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1–9.

[90] S. Ji, W. Xu, M. Yang, K. Yu, 3D convolutional neural networks for human action recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (1) (2013) 221–231.

[91] T. N. Sainath, A.-r. Mohamed, B. Kingsbury, B. Ramabhadran, Deep convolutional neural networks for LVCSR, in: Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, IEEE, 2013, pp. 8614–8618.

[92] D. Small, L. Zuberbühler, A. Schubert, E. Meier, Terrain-flattened gamma nought Radarsat-2 backscatter, Canadian Journal of Remote Sensing 37 (5) (2012) 493–499.

[93] P. Härmä, R. Teiniranta, M. Törmä, R. Repo, E. Järvenpää, M. Kallio, The production of Finnish CORINE land cover 2000 classification, in: XXth ISPRS Congress, Istanbul, Turkey, 2004.

[94] P. Härmä, R. Teiniranta, M. Törmä, R. Repo, E. Järvenpää, M. Kallio, Finnish CORINE land cover 2000 classification, in: XXth ISPRS Congress, Anchorage, US, 2004.

[95] A. Garcia-Garcia, S. Orts-Escolano, S. Oprea, V. Villena-Martinez, J. Garcia-Rodriguez, A review on deep learning techniques applied to semantic segmentation, arXiv preprint arXiv:1704.06857 (2017).

[96] F. Chollet, Xception: Deep learning with depthwise separable convolutions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1251–1258.

[97] O. Ronneberger, P. Fischer, T. Brox, U-Net: Convolutional networks for biomedical image segmentation, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2015, pp. 234–241.

[98] L.-C. Chen, G. Papandreou, F. Schroff, H. Adam, Rethinking atrous convolution for semantic image segmentation, arXiv preprint arXiv:1706.05587 (2017).

[99] Y. Bengio, Deep learning of representations for unsupervised and transfer learning, in: Proceedings of ICML Workshop on Unsupervised and Transfer Learning, 2012, pp. 17–36.

[100] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, et al., TensorFlow: A system for large-scale machine learning, in: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), 2016, pp. 265–283.

[101] H. Costa, G. M. Foody, D. S. Boyd, Supervised methods of image segmentation accuracy assessment in land cover mapping, Remote Sensing of Environment 205 (2018) 338–351.

[102] G. Csurka, D. Larlus, F. Perronnin, F. Meylan, What is a good evaluation measure for semantic segmentation?, in: BMVC, Vol. 27, Citeseer, 2013, p. 2013.

[103] J. Cohen, A coefficient of agreement for nominal scales, Educational and Psychological Measurement 20 (1) (1960) 37–46.

[104] O. Antropov, Y. Rauste, T. Häme, Volume scattering modeling in PolSAR decompositions: Study of ALOS PALSAR data over boreal forest, IEEE Transactions on Geoscience and Remote Sensing 49 (10) (2011) 3838–3848.


[26] C Castaneda D Ducrot Land cover mapping of wetland areas in anagricultural landscape using sar and landsat imagery Journal of Environ-mental Management 90 (7) (2009) 2270ndash2277

[27] Y Ban H Hu I M Rangel Fusion of quickbird ms and radarsat sardata for urban land-cover mapping Object-based and knowledge-basedapproach International Journal of Remote Sensing 31 (6) (2010) 1391ndash1410

[28] G V Laurin V Liesenberg Q Chen L Guerriero F Del Frate A Bar-tolini D Coomes B Wilebore J Lindsell R Valentini Optical andsar sensor synergies for forest and land cover mapping in a tropical sitein west africa International Journal of Applied Earth Observation andGeoinformation 21 (2013) 7ndash16

[29] R Khatami G Mountrakis S V Stehman A meta-analysis of remotesensing research on supervised pixel-based land-cover image classificationprocesses General guidelines for practitioners and future research Re-mote Sensing of Environment 177 (2016) 89ndash100

32

[30] B Waske M Braun Classifier ensembles for land cover mapping usingmultitemporal sar imagery ISPRS Journal of Photogrammetry and Re-mote Sensing 64 (5) (2009) 450ndash457

[31] H Balzter B Cole C Thiel C Schmullius Mapping CORINE landcover from Sentinel-1A SAR and SRTM digital elevation model data usingrandom forests Remote Sensing 7 (11) (2015) 14876ndash14898 doi10

3390rs71114876

[32] S-E Park Variations of microwave scattering properties by seasonalfreezethaw transition in the permafrost active layer observed by ALOSPALSAR polarimetric data Remote Sensing 7 (12) (2015) 17135ndash17148doi103390rs71215874

[33] M C Dobson L E Pierce F T Ulaby Knowledge-based land-coverclassification using ERS-1JERS-1 SAR composites IEEE Transactionson Geoscience and Remote Sensing 34 (1) (1996) 83ndash99 doi101109

36481896

[34] L Sirro T Hme Y Rauste J Kilpi J Hmlinen K Gunia B de JongF Paz Pellat Potential of different optical and SAR data in forest andland cover classification to support redd+ mrv Remote Sensing 10 (6)(2018) doi103390rs10060942

[35] J D T De Alban G M Connette P Oswald E L Webb CombinedLandsat and L-band SAR data improves land cover classification andchange detection in dynamic tropical landscapes Remote Sensing 10 (2)(2018) doi103390rs10020306

[36] N Longepe P Rakwatin O Isoguchi M Shimada Y Uryu K YuliantoAssessment of alos palsar 50 m orthorectified fbd data for regional landcover classification by support vector machines IEEE Transactions onGeoscience and Remote Sensing 49 (6) (2011) 2135ndash2150 doi101109

TGRS20102102041

[37] T Esch A Schenk T Ullmann M Thiel A Roth S Dech Char-acterization of land cover types in terrasar-x images by combined anal-ysis of speckle statistics and intensity information IEEE Transactionson Geoscience and Remote Sensing 49 (6) (2011) 1911ndash1925 doi

101109TGRS20102091644

[38] J W Cable J M Kovacs J Shang X Jiao Multi-temporal polarimet-ric radarsat-2 for land cover monitoring in northeastern ontario canadaRemote Sensing 6 (3) (2014) 2372ndash2392 doi103390rs6032372URL httpwwwmdpicom2072-4292632372

[39] X Niu Y Ban Multi-temporal radarsat-2 polarimetric sar data for urbanland-cover classification using an object-based support vector machine anda rule-based approach International Journal of Remote Sensing 34 (1)(2013) 1ndash26 doi101080014311612012700133

33

[40] T L Evans M Costa K Telmer T S F Silva Using alospalsar andradarsat-2 to map land cover and seasonal inundation in the brazilianpantanal IEEE Journal of Selected Topics in Applied Earth Observationsand Remote Sensing 3 (4) (2010) 560ndash575 doi101109JSTARS2010

2089042

[41] P Lumsdon S R Cloude G Wright Polarimetric classification of landcover for Glen Affric radar project IEE Proceedings - Radar Sonar andNavigation 152 (6) (2005) 404ndash412 doi101049ip-rsn20041313

[42] C da Costa Freitas L de Souza Soler S J S SantrsquoAnna L V DutraJ R Dos Santos J C Mura A H Correia Land use and land cover map-ping in the brazilian amazon using polarimetric airborne p-band sar dataIEEE Transactions on Geoscience and Remote Sensing 46 (10) (2008)2956ndash2970

[43] G Li D Lu E Moran L Dutra M Batistella A comparative analysis ofalos palsar l-band and radarsat-2 c-band data for land-cover classificationin a tropical moist region ISPRS Journal of Photogrammetry and RemoteSensing 70 (2012) 26ndash38 doi101016jisprsjprs201203010

[44] N Park K Chi Integration of multitemporalpolarization cband sar datasets for landcover classification International Journal of Remote Sensing29 (16) (2008) 4667ndash4688 doi10108001431160801947341

[45] E Tomppo O Antropov J Praks Cropland classification using Sentinel-1 time series Methodological performance and prediction uncertainty as-sessment Remote Sensing 11 (21) (2019) doi103390rs11212480

[46] D B Nguyen A Gruber W Wagner Mapping rice extent and croppingscheme in the mekong delta using sentinel-1a data Remote Sensing Letters7 (12) (2016) 1209ndash1218

[47] A Veloso S Mermoz A Bouvet T Le Toan M Planells J-F DejouxE Ceschia Understanding the temporal behavior of crops using Sentinel-1 and Sentinel-2-like data for agricultural applications Remote Sensing ofEnvironment 199 (2017) 415ndash426

[48] G Satalino A Balenzano F Mattia M W Davidson C-band SARdata for mapping crops dominated by surface or volume scattering IEEEGeoscience and Remote Sensing Letters 11 (2) (2013) 384ndash388

[49] F Vicente-Guijalba A Jacob J M Lopez-Sanchez C Lopez-MartinezJ Duro C Notarnicola D Ziolkowski A Mestre-Quereda E PottierJ J Mallorqu M Lavalle M Engdahl Sincohmap Land-cover and veg-etation mapping using multi-temporal Sentinel-1 interferometric coher-ence in IGARSS 2018 - 2018 IEEE International Geoscience and RemoteSensing Symposium 2018 pp 6631ndash6634 doi101109IGARSS2018

8517926

34

[50] S Ge O Antropov W Su H Gu J Praks Deep recurrent neural net-works for land-cover classification using Sentinel-1 InSAR time series inIGARSS 2019 - 2019 IEEE International Geoscience and Remote SensingSymposium 2019 pp 473ndash476 doi101109IGARSS20198900088

[51] Y LeCun Y Bengio et al Convolutional networks for images speechand time series The handbook of brain theory and neural networks3361 (10) (1995) 1995

[52] X X Zhu D Tuia L Mou G-S Xia L Zhang F Xu F Fraun-dorfer Deep learning in remote sensing a review arXiv preprintarXiv171003959 (2017)

[53] M Mahdianpari B Salehi M Rezaee F Mohammadimanesh Y ZhangVery deep convolutional neural networks for complex land cover mappingusing multispectral remote sensing imagery Remote Sensing 10 (7) (2018)1119

[54] L Zhang L Zhang B Du Deep learning for remote sensing data Atechnical tutorial on the state of the art IEEE Geoscience and RemoteSensing Magazine 4 (2) (2016) 22ndash40

[55] J Zhang P Zhong Y Chen S Li l 12-regularized deconvolutionnetwork for the representation and restoration of optical remote sensingimages IEEE Transactions on Geoscience and Remote Sensing 52 (5)(2014) 2617ndash2627

[56] X Chen S Xiang C-L Liu C-H Pan Aircraft detection by deep beliefnets in Pattern Recognition (ACPR) 2013 2nd IAPR Asian Conferenceon IEEE 2013 pp 54ndash58

[57] X Chen S Xiang C-L Liu C-H Pan Vehicle detection in satelliteimages by hybrid deep convolutional neural networks IEEE Geoscienceand remote sensing letters 11 (10) (2014) 1797ndash1801

[58] Y Liu G Cao Q Sun M Siegel Hyperspectral classification viadeep networks and superpixel segmentation International Journal of Re-mote Sensing 36 (13) (2015) 3459ndash3482 doi101080014311612015

1055607

[59] J Wang Q Qin Z Li X Ye J Wang X Yang X Qin Deep hierarchicalrepresentation and segmentation of high resolution remote sensing imagesin Geoscience and Remote Sensing Symposium (IGARSS) 2015 IEEEInternational IEEE 2015 pp 4320ndash4323

[60] D Tuia R Flamary N Courty Multiclass feature learning for hyperspec-tral image classification Sparse and hierarchical solutions ISPRS Journalof Photogrammetry and Remote Sensing 105 (2015) 272ndash285

35

[61] F Hu G-S Xia J Hu L Zhang Transferring deep convolutional neu-ral networks for the scene classification of high-resolution remote sensingimagery Remote Sensing 7 (11) (2015) 14680ndash14707

[62] O A B Penatti K Nogueira J A dos Santos Do deep features gener-alize from everyday objects to remote sensing and aerial scenes domainsin 2015 IEEE Conference on Computer Vision and Pattern Recogni-tion Workshops (CVPRW) 2015 pp 44ndash51 doi101109CVPRW20157301382

[63] F P Luus B P Salmon F Van den Bergh B T J Maharaj Multiviewdeep learning for land-use classification IEEE Geoscience and RemoteSensing Letters 12 (12) (2015) 2448ndash2452

[64] F Zhang B Du L Zhang Scene classification via a gradient boostingrandom convolutional network framework IEEE Transactions on Geo-science and Remote Sensing 54 (3) (2016) 1793ndash1802

[65] T Ishii R Nakamura H Nakada Y Mochizuki H Ishikawa Surfaceobject recognition with cnn and svm in landsat 8 images in 2015 14thIAPR International Conference on Machine Vision Applications (MVA)2015 pp 341ndash344 doi101109MVA20157153200

[66] N Kussul M Lavreniuk S Skakun A Shelestov Deep learning clas-sification of land cover and crop types using remote sensing data IEEEGeoscience and Remote Sensing Letters 14 (5) (2017) 778ndash782

[67] Y Chen Z Lin X Zhao G Wang Y Gu Deep learning-based classifi-cation of hyperspectral data IEEE Journal of Selected topics in appliedearth observations and remote sensing 7 (6) (2014) 2094ndash2107

[68] G Wu X Shao Z Guo Q Chen W Yuan X Shi Y Xu R ShibasakiAutomatic building segmentation of aerial imagery using multi-constraintfully convolutional networks Remote Sensing 10 (3) (2018) 407

[69] Y Duan F Liu L Jiao P Zhao L Zhang Sar image segmentationbased on convolutional-wavelet neural network and markov random fieldPattern Recognition 64 (2017) 255ndash267

[70] F Mohammadimanesh B Salehi M Mahdianpari E Gill M MolinierA new fully convolutional neural network for semantic segmentation of po-larimetric SAR imagery in complex land cover ecosystem ISPRS Journalof Photogrammetry and Remote Sensing 151 (2019) 223 ndash 236

[71] L Wang X Xu H Dong R Gui F Pu Multi-pixel simultaneous classifi-cation of polsar image using convolutional neural networks Sensors 18 (3)(2018) 769

36

[72] M Ahishali S Kiranyaz T Ince M Gabbouj Dual and single polarizedSAR image classification using compact convolutional neural networksRemote Sensing 11 (11) (2019) 1340

[73] Z Li Z Yang H Xiong Homogeneous region segmentation for SARimages based on two steps segmentation algorithm in Computers Com-munications and Systems (ICCCS) International Conference on IEEE2015 pp 196ndash200

[74] V Badrinarayanan A Kendall R Cipolla Segnet A deep convolutionalencoder-decoder architecture for image segmentation IEEE transactionson pattern analysis and machine intelligence 39 (12) (2017) 2481ndash2495

[75] H Zhao J Shi X Qi X Wang J Jia Pyramid scene parsing networkin Proceedings of the IEEE conference on computer vision and patternrecognition 2017 pp 2881ndash2890

[76] C Yu J Wang C Peng C Gao G Yu N Sang Bisenet Bilateralsegmentation network for real-time semantic segmentation in Proceed-ings of the European Conference on Computer Vision (ECCV) 2018 pp325ndash341

[77] L-C Chen Y Zhu G Papandreou F Schroff H Adam Encoder-decoder with atrous separable convolution for semantic image segmen-tation arXiv preprint arXiv180202611 (2018)

[78] L-C Chen G Papandreou I Kokkinos K Murphy A L YuilleDeeplab Semantic image segmentation with deep convolutional netsatrous convolution and fully connected CRFs IEEE transactions on pat-tern analysis and machine intelligence 40 (4) (2018) 834ndash848

[79] O Ronneberger PFischer T Brox U-net Convolutional networksfor biomedical image segmentation in Medical Image Computing andComputer-Assisted Intervention (MICCAI) Vol 9351 of LNCS Springer2015 pp 234ndash241 (available on arXiv150504597 [csCV])URL httplmbinformatikuni-freiburgdePublications2015

RFB15a

[80] A G Howard M Zhu B Chen D Kalenichenko W Wang T WeyandM Andreetto H Adam Mobilenets Efficient convolutional neural net-works for mobile vision applications arXiv preprint arXiv170404861(2017)

[81] T Pohlen A Hermans M Mathias B Leibe Full-resolution residualnetworks for semantic segmentation in street scenes in Proceedings ofthe IEEE Conference on Computer Vision and Pattern Recognition 2017pp 4151ndash4160

37

[82] S Jegou M Drozdzal D Vazquez A Romero Y Bengio The onehundred layers tiramisu Fully convolutional densenets for semantic seg-mentation in Computer Vision and Pattern Recognition Workshops(CVPRW) 2017 IEEE Conference on IEEE 2017 pp 1175ndash1183

[83] O Antropov Y Rauste A Lonnqvist T Hame PolSAR mosaic normal-ization for improved land-cover mapping IEEE Geoscience and RemoteSensing Letters 9 (6) (2012) 1074ndash1078

[84] J Long E Shelhamer T Darrell Fully convolutional networks for se-mantic segmentation in Proceedings of the IEEE conference on computervision and pattern recognition 2015 pp 3431ndash3440

[85] D H Hubel T N Wiesel Receptive fields binocular interaction andfunctional architecture in the catrsquos visual cortex The Journal of physiol-ogy 160 (1) (1962) 106ndash154

[86] O Russakovsky J Deng H Su J Krause S Satheesh S Ma Z HuangA Karpathy A Khosla M Bernstein et al Imagenet large scale visualrecognition challenge International journal of computer vision 115 (3)(2015) 211ndash252

[87] K He X Zhang S Ren J Sun Deep residual learning for image recog-nition in Proceedings of the IEEE conference on computer vision andpattern recognition 2016 pp 770ndash778

[88] G Huang Z Liu L Van Der Maaten K Q Weinberger Densely con-nected convolutional networks in CVPR Vol 1 2017 p 3

[89] C Szegedy W Liu Y Jia P Sermanet S Reed D Anguelov D Er-han V Vanhoucke A Rabinovich Going deeper with convolutions inProceedings of the IEEE conference on computer vision and pattern recog-nition 2015 pp 1ndash9

[90] S Ji W Xu M Yang K Yu 3D convolutional neural networks for humanaction recognition IEEE transactions on pattern analysis and machineintelligence 35 (1) (2013) 221ndash231

[91] T N Sainath A-r Mohamed B Kingsbury B Ramabhadran Deepconvolutional neural networks for LVCSR in Acoustics speech and signalprocessing (ICASSP) 2013 IEEE international conference on IEEE 2013pp 8614ndash8618

[92] D Small L Zuberbuhler A Schubert E Meier Terrain-flattened gammanought Radarsat-2 backscatter Canadian Journal of Remote Sensing37 (5) (2012) 493ndash499

[93] P Hrm R Teiniranta M Trm R Repo E Jrvenp M Kallio Theproduction of finnish corine land cover 2000 classification XXth ISPRSCongress Istanbul Turkey (2004)

38

[94] P Hrm R Teiniranta M Trm R Repo E Jrvenp M Kallio Finnishcorine land cover 2000 classification XXth ISPRS Congress AnchorageUS (2004)

[95] A Garcia-Garcia S Orts-Escolano S Oprea V Villena-MartinezJ Garcia-Rodriguez A review on deep learning techniques applied tosemantic segmentation arXiv preprint arXiv170406857 (2017)

[96] F Chollet Xception Deep learning with depthwise separable convolu-tions in Proceedings of the IEEE conference on computer vision andpattern recognition 2017 pp 1251ndash1258

[97] O Ronneberger P Fischer T Brox U-net Convolutional networks forbiomedical image segmentation in International Conference on Medicalimage computing and computer-assisted intervention Springer 2015 pp234ndash241

[98] L-C Chen G Papandreou F Schroff H Adam Rethinkingatrous convolution for semantic image segmentation arXiv preprintarXiv170605587 (2017)

[99] Y Bengio Deep learning of representations for unsupervised and trans-fer learning in Proceedings of ICML Workshop on Unsupervised andTransfer Learning 2012 pp 17ndash36

[100] M Abadi P Barham J Chen Z Chen A Davis J Dean M DevinS Ghemawat G Irving M Isard et al Tensorflow A system for large-scale machine learning in 12th USENIX Symposium on OperatingSystems Design and Implementation (OSDI 16) 2016 pp 265ndash283

[101] H Costa G M Foody D S Boyd Supervised methods of image seg-mentation accuracy assessment in land cover mapping Remote sensing ofenvironment 205 (2018) 338ndash351

[102] G Csurka D Larlus F Perronnin F Meylan What is a good evaluationmeasure for semantic segmentation in BMVC Vol 27 Citeseer 2013p 2013

[103] J Cohen A coefficient of agreement for nominal scales Educational andPsychological Measurement 20 (1) (1960) 37 ndash 46

[104] O Antropov Y Rauste T Hame Volume scattering modeling in PolSARdecompositions Study of ALOS PALSAR data over boreal forest IEEETransactions on Geoscience and Remote Sensing 49 (10) (2011) 3838ndash3848

39

  • 1 Introduction
    • 11 Land Cover Mapping with SAR Imagery
    • 12 Deep Learning in Remote Sensing
    • 13 Study goals
      • 2 Deep Learning Terminology
      • 3 Materials and methods
        • 31 Study site
        • 32 SAR data
        • 33 Reference data
        • 34 Semantic Segmentation Models
          • 341 BiSeNet (Bilateral Segmentation Network)
          • 342 SegNet (Encoder-Decoder-Skip)
          • 343 Mobile U-Net
          • 344 DeepLab-V3+
          • 345 FRRN-B (Full-Resolution Residual Networks)
          • 346 PSPNet (Pyramid Scene Parsing Network)
          • 347 FC-DenseNet (Fully Convolutional DenseNets)
            • 35 Training approach
            • 36 Experimental Setup
              • 361 SAR Data Preprocessing for Deep Learning
              • 362 TrainDevelopment and Test (Accuracy Assessment) Dataset
              • 363 Data Augmentation
              • 364 Implementation
              • 365 Hardware and Training Setup
                • 37 Evaluation Metrics
                  • 4 Results and Discussion
                    • 41 Classification Performance
                    • 42 Computational Performance
                    • 43 Comparison to Similar Work
                    • 44 Outlook and Future Work
                      • 5 Conclusion
Page 23: Wide-Area Land Cover Mapping with Sentinel-1 Imagery using … · 2020-02-27 · Wide-Area Land Cover Mapping with Sentinel-1 Imagery using Deep Learning Semantic Segmentation Models

model was saved. Then we used that model for evaluation on the test set, and we report those results.

3.7. Evaluation Metrics

In their review of the metrics used in land cover classification, Costa et al. [101] found a lack of consistency, which complicates the intercomparison of different studies. To avoid such issues and to ensure that our results are easily comparable with the literature, we evaluated our models thoroughly. For each model and class, we report the following measures of accuracy: precision, also known as user's accuracy (UA); recall, also known as producer's accuracy (PA); the overall accuracy; and the Kappa coefficient. The formulas are as follows.

For each segmentation class (land cover type) $c$ we calculate precision (user's accuracy)

$$P_c = \frac{T_{p_c}}{T_{p_c} + F_{p_c}}$$

and recall (producer's accuracy)

$$R_c = \frac{T_{p_c}}{T_{p_c} + F_{n_c}},$$

where $T_{p_c}$ represents the true positive, $F_{p_c}$ the false positive, and $F_{n_c}$ the false negative pixels for the class $c$.

When it comes to accuracy [102], we calculate the per-class accuracy, which effectively equals the recall obtained on each class,

$$Acc_c = \frac{C_{cc}}{G_c}$$

and the overall pixel accuracy

$$Acc_{OP} = \frac{\sum_{i=1}^{L} C_{ii}}{\sum_{i=1}^{L} G_i},$$

where $C_{ij}$ is the number of pixels having a ground truth label $i$ and being classified/predicted as $j$, $G_i$ is the total number of pixels labelled with $i$, and $L$ is the number of classes. All these metrics can take values from 0 to 1.
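These per-class and overall measures follow directly from the confusion matrix. As a minimal illustrative sketch (not the actual evaluation code of this study; NumPy only), they can be computed as:

    import numpy as np

    def class_metrics(cm):
        # cm[i, j]: number of pixels with ground-truth class i predicted as class j
        cm = np.asarray(cm, dtype=float)
        tp = np.diag(cm)                  # true positives T_p per class
        fp = cm.sum(axis=0) - tp          # false positives F_p: column total minus diagonal
        fn = cm.sum(axis=1) - tp          # false negatives F_n: row total minus diagonal
        precision = tp / (tp + fp)        # P_c, user's accuracy
        recall = tp / (tp + fn)           # R_c, producer's accuracy (= per-class accuracy)
        overall = tp.sum() / cm.sum()     # Acc_OP, overall pixel accuracy
        return precision, recall, overall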

Finally, we also use the Kappa statistic (Cohen's measure of agreement), which indicates how the classification results compare to values that would be assigned by chance [103]. The Kappa statistic can take values from 0 to 1. Starting from a $k \times k$ confusion matrix with elements $f_{ij}$, the following calculations are done:


$$P_o = \frac{1}{N} \sum_{j=1}^{k} f_{jj} \qquad (1)$$

$$r_i = \sum_{j=1}^{k} f_{ij} \;\; \forall i \qquad \text{and} \qquad c_j = \sum_{i=1}^{k} f_{ij} \;\; \forall j \qquad (2)$$

$$P_e = \frac{1}{N^2} \sum_{i=1}^{k} r_i c_i \qquad (3)$$

where $N$ is the total number of pixels, $P_o$ is the observed proportional agreement (effectively the overall accuracy), $r_i$ and $c_j$ are the row and column totals for classes $i$ and $j$, and $P_e$ is the expected proportion of agreement by chance. The final measure of agreement is given by the statistic [103]

$$\kappa = \frac{P_o - P_e}{1 - P_e}. \qquad (4)$$

Depending on the value of Kappa, the observed agreement is considered as either poor (0.0 to 0.2), fair (0.2 to 0.4), moderate (0.4 to 0.6), good (0.6 to 0.8), or very good (0.8 to 1.0).
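Continuing the sketch above, the Kappa statistic can be computed from the same confusion matrix (a hand-rolled version of Eqs. (1)-(4); scikit-learn's cohen_kappa_score offers equivalent functionality for label vectors):

    import numpy as np

    def kappa(cm):
        # cm: k x k confusion matrix with elements f_ij
        cm = np.asarray(cm, dtype=float)
        n = cm.sum()                        # total number of pixels N
        p_o = np.trace(cm) / n              # observed agreement P_o, Eq. (1)
        r = cm.sum(axis=1)                  # row totals r_i, Eq. (2)
        c = cm.sum(axis=0)                  # column totals c_j, Eq. (2)
        p_e = (r * c).sum() / n**2          # chance agreement P_e, Eq. (3)
        return (p_o - p_e) / (1 - p_e)      # kappa, Eq. (4)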

4. Results and Discussion

Using the experimental setup described in the previous section, we evaluated the seven selected semantic segmentation models: SegNet [74], PSPNet [75], BiSeNet [76], DeepLabV3+ [77, 78], U-Net [79, 80], FRRN-B [81], and FC-DenseNet [82]. The overall classification performance statistics for all the studied models are gathered in Table 4. Figure 11 shows maps produced for several imagelets with the best performing model, FC-DenseNet. The obtained results are compared to prior work, and the classification performance for the different land cover classes is discussed further below.

4.1. Classification Performance

All the models performed relatively well in terms of classification, achieving an overall accuracy above 83%. Three models performed particularly well, achieving an accuracy score above 89%: SegNet, FRRN-B, and the best performing model, FC-DenseNet, which achieved an accuracy of 90.7%.

Before further analysis, let us recall that CORINE is not exclusively a land cover map, but rather a land cover and land use map; specific classes can therefore differ from the ecological classes observed by Sentinel-1. Also, the aggregation to Level-1 is sometimes not strictly "ecological", nor does it fully comply with physical surface-scattering considerations. For example, roads, airports, major industrial areas and the road network often exhibit areas similar to fields, while the presence of trees and green


Table 4: Summary of the classification performance and efficiency of the various deep learning models. Per-class entries are accuracies given as UA/PA in % (UA = user's accuracy, PA = producer's accuracy); the average inference time is per image in the dataset.

LC classes (CORINE code)          Test scale (km²)  BiSeNet  DeepLabV3+  SegNet  FRRN-B  U-Net  PSPNet  FC-DenseNet
Urban fabric (100)                1081.6            26/21    15/14       36/31   38/30   45/25  38/18   62/27
Agricultural areas (200)          2516.0            49/51    50/49       69/66   68/68   66/66  53/48   72/71
Forested areas (300)              28546.2           90/91    88/96       93/94   92/95   92/95  89/95   93/96
Peatland, bogs and marshes (400)  2099.0            54/43    56/13       67/57   71/55   70/52  65/31   74/58
Water bodies (500)                5356.4            85/91    94/92       96/96   95/96   96/96  94/94   96/96
Overall accuracy (%)                                83.86    85.49       89.03   89.27   89.25  86.51   90.66
Kappa                                               0.641    0.649       0.754   0.758   0.754  0.680   0.785
Average inference time (s)                          0.0389   0.0267      0.0761  0.1424  0.0848 0.0495  0.1930

Table 5: Confusion matrix for classification with the FC-DenseNet model (FC-DenseNet103). Rows are the CLC2012 reference classes, columns the Sentinel-1 classification; the bottom-right value is the overall accuracy (%).

CLC2012 \ classified   urban       water        forest       field       peatland    total        PA (%)
1 urban                7,301,999   413,073      15,892,771   3,212,839   221,476     27,042,158   27.0
2 water                78,331      128,294,872  3,457,634    171,029     1,935,276   133,937,142  95.8
3 forest               3,663,698   2,703,632    686,788,977  12,795,703  7,730,444   713,682,454  96.2
4 field                766,200     121,609      16,527,970   44,866,048  620,934     62,902,761   71.3
5 peatland             56,097      1,866,020    19,164,137   1,091,008   30,309,189  52,486,451   57.8
total                  11,866,325  133,399,206  741,831,489  62,136,627  40,817,319  990,050,966
UA (%)                 61.5        96.2         92.6         72.2        74.3        90.7
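For example, the urban-class accuracies reported in Table 5 follow directly from the matrix entries:

$$\mathrm{PA}_{\mathrm{urban}} = \frac{7\,301\,999}{27\,042\,158} \approx 27.0\%, \qquad \mathrm{UA}_{\mathrm{urban}} = \frac{7\,301\,999}{11\,866\,325} \approx 61.5\%.$$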

Figure 11: Illustration of the FC-DenseNet model performance: a selection of classification results, i.e., direct output of the network without any post-processing (bottom row), versus the reference CORINE data (upper row).

vegetation near summer cottages can cause them to exhibit signatures closer to forest than to urban; sometimes forest on rocky terrain can be misclassified as urban due to the presence of very bright targets and strong disruptive features; and confusion between peatland and field areas is also common. Finally, the accuracy of the CORINE data itself is only somewhat higher than 90%.

As for the results across the different land cover classes, all the models performed particularly well in recognising the water bodies and forested areas, while the urban fabric represented the most challenging class for all the models. We expect that the inclusion of the DEM as one layer in the training images helped to achieve the good results on the water bodies class for most of the models (except for BiSeNet, all the models achieved both the user's and producer's accuracy above 90%). The urban class was particularly challenging for the following main reasons. First, this class changes the most, as new houses, roads and urban areas are built. While we took the most suitable available CORINE class in terms of timing for our Sentinel-1 images, there are almost certain differences between the urban class as it was in 2012 and in 2015-2016. Second, the CORINE map itself does not have perfect accuracy, nor are the aggregation rules perfect. As a matter of fact, in the majority of studies where SAR-based classification was assessed against CLC or similar data, a poor or modest overall agreement was observed for this class [21, 41, 83, 20], while the user's accuracy was notably higher than the producer's [104]. The latter is exactly because radar senses sharp boundaries and bright targets very well, whereas such bright targets often do not dominate the whole CORINE Level-1 urban class. We argue that any inaccuracies present will be particularly pronounced in our models for the urban class, because of the sharp and sudden boundary changes in this class, unlike for others such as forest and water. The top performing model, FC-DenseNet, performed the best across all the classes. It is particularly notable that it achieved a user's accuracy (i.e., precision) of 62% for the urban class, improving significantly on all the other models. Nevertheless, its producer's accuracy (i.e., recall) of 27% on this class is outperformed by the two other top models, SegNet and FRRN-B.

We mentioned the issue of SAR backscattering sensitivity to several ground factors, such that the same classes might appear differently in the images between countries or between distant areas within a country. An interesting indication of our study, however, is that the deep learning models might be able to deal with this issue. Namely, we used the models pre-trained on ImageNet and fine-tuned them with a relatively small number (14) of Sentinel-1 scenes. The models learned to recognise the varying types of backscattering signal across the country of Finland. This indicates that, with a similar type of fine-tuning, the present models could be relatively easily adapted to other areas and countries with different SAR backscattering patterns. Such robustness and adaptability of the deep learning models come from their automatic learning of feature representations, without the need for a human to pre-define those features.
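As a schematic illustration of such a fine-tuning setup (a minimal sketch, not the exact training code used in this study: a generic ResNet-50 backbone stands in for the encoders of the benchmarked models, the one-layer decoder is deliberately simplistic, and train_ds is a placeholder for a pipeline of (imagelet, label-mask) pairs):

    import tensorflow as tf

    # ImageNet-pre-trained encoder; inputs assumed packed into 3 channels
    # (e.g., SAR bands plus DEM), matching the ImageNet input format.
    base = tf.keras.applications.ResNet50(
        include_top=False, weights="imagenet", input_shape=(512, 512, 3))

    x = tf.keras.layers.Conv2D(5, 1, activation="softmax")(base.output)  # 5 Level-1 classes
    x = tf.keras.layers.UpSampling2D(32, interpolation="bilinear")(x)    # back to 512 x 512
    model = tf.keras.Model(base.input, x)

    model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
                  loss="sparse_categorical_crossentropy")
    # model.fit(train_ds, epochs=...)  # train_ds: placeholder imagelet/mask pipeline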

4.2. Computational Performance

The training times with our hardware configuration took from 6 days up to 2 weeks for the different models. This could be significantly improved by training each model on a multi-GPU system instead of the single GPU we used.

In terms of the inference time, we also saw differences in performance. In Table 4 we present the average inference time per 512 px × 512 px imagelet that we worked with. The results show that there is a trade-off between classification and computational performance: the best models in terms of classification results (i.e., FC-DenseNet and FRRN-B) take a several times longer inference time compared to the rest. Depending on the application, this might not be of particular importance.
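For reference, such a per-imagelet average can be measured with a simple loop of this kind (a minimal sketch; model is assumed to be a Keras-style model and imagelets a sequence of HxWxC arrays, and warm-up passes are included so that one-off GPU initialisation does not skew the mean):

    import time
    import numpy as np

    def mean_inference_time(model, imagelets, warmup=3):
        # Average forward-pass time in seconds per imagelet.
        for x in imagelets[:warmup]:                      # warm-up passes
            model.predict(x[np.newaxis, ...], verbose=0)
        start = time.perf_counter()
        for x in imagelets:
            model.predict(x[np.newaxis, ...], verbose=0)
        return (time.perf_counter() - start) / len(imagelets)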

4.3. Comparison to Similar Work

The obtained results compare favourably to previous similar studies on land cover classification with SAR data [83, 20, 21, 41, 28, 31]. Depending on the level of class aggregation (4-5 major classes or more), and using mostly statistical or classical machine learning approaches, the reported classification accuracies ranged from as high as 80-87% down to as low as 30% when only SAR imagery was used.

Two recent studies that employed neural networks for SAR imagery classification (albeit in combination with satellite optical data) for land cover mapping were [28] and [66], with reported classification accuracies of up to 97.5% and 94.6%, respectively.

The best model in our experiments achieved an overall accuracy of 90.7%. However, our results were obtained using solely the SAR imagery. In contrast, SAR imagery (PALSAR) alone yielded an overall accuracy of 78.1% in [28]. The types of classes they studied are also different from ours (crops versus vegetation versus land cover types), and our study was performed over a larger area. Importantly, the previous studies applied different types of models (regular NNs versus CNNs versus semantic segmentation). In particular, the CNN models work on 7 × 7 resolution windows, while we applied more advanced semantic segmentation models, which work at the level of a pixel. Keeping in mind the finding from [28] that the addition of optical images on top of SAR improved the results by over 10%, we expect that our models would perform comparably well or outperform these previous works if applied to combined SAR and optical imagery.

In terms of the deep learning setup, the studies most similar to ours are [53] and [70]. However, RapidEye optical imagery at 5 m spatial resolution was used in [53], and the test site was considerably smaller. Study [70], similarly to our research, relied exclusively on SAR imagery, however on fully polarimetric images acquired by RADARSAT-2 at a considerably better resolution. They developed an FCN-type semantic segmentation model "specifically designed for the classification of wetland complexes using PolSAR imagery". Using this model to classify eight wetland map classes, they achieved an overall accuracy of 93%. However, because their model is designed specifically for wetland complexes, it is not clear whether it would generalise to other types of areas. Compared to our study, they focused on a considerably smaller area (nearly the size of a single imagelet we used) and on a very specific task (wetland type mapping). Thus, it is not readily clear how general their approach is and how it compares to our presented approach.

4.4. Outlook and Future Work

There are several lines of potential improvement based on the results of this study, as well as directions for future work.

First using even a larger set of Sentinel-1 images can be recommended sincefor the supervised deep learning models large amounts of data are crucial Herewe processed only 6888 imagelets altogether but deep learning algorithms be-come efficient typically only once they are trained with hundreds of thousandsor millions of images

Second, if SAR images and reference data of a higher resolution are used, we expect better classification performance too, as smaller details could potentially be captured. Better agreement in the acquisition timing of the reference and SAR imagery can also be recommended: the reference and training data should come from the same months or year if possible, and the reference maps should represent the reality as accurately as possible. The models in our experiments were certainly limited by CORINE's own limited accuracy.

Third, in this study we tested the effectiveness of off-the-shelf deep learning models for land cover mapping from SAR data. While the results show their effectiveness, it is also likely that novel types of models developed specifically for radar data (such as [70]) will yield even better results. Based on our results, we suggest DenseNet-based models as a starting point. In particular, one could develop deep learning models that directly handle the SLC data, which preserve the phase information.

Focusing on a single season is both an advantage and a limitation. Importantly, we have avoided confusion between SAR signatures that vary seasonally for several land cover classes. However, the multitemporal dynamics itself could potentially be used as an additional, useful class-discriminating parameter. Incorporating the seasonal dynamics of each land cover pixel (as a time series) is left for future work, perhaps with the additional need to incorporate recurrent neural networks into the approach.

As discussed in Section 3.1.1, it could be suitable to use more detailed (specific) land cover classes, as the aggregation of smaller LC classes into Level-1 CORINE classes is not exactly ecological, leading to several distinct SAR signatures being mixed in one class and thus causing additional confusion for the classifier. The classified specific classes can later be aggregated into larger classes, potentially showing improved performance [19].

Finally, we used only SAR images and a freely available DEM model for the presented large-scale land cover mapping. If one were to combine other types of remote sensing images, in particular optical images, we expect that the results would significantly improve. This holds for those areas where such imagery can be collected (given cloud coverage), while in an operational scenario it would potentially require the use of at least two models (with and without optical satellite imagery). It is also important to assess the added value of SAR imagery with deep learning models when optical satellite images are available, as well as possible data fusion and decision fusion scenarios, before a decision on the mapping approach is made [19].

5. Conclusion

Our study demonstrated the potential for applying state-of-the-art semantic segmentation models to SAR image classification with high accuracy. Several models were benchmarked in a countrywide classification experiment using Sentinel-1 IW-mode SAR data, reaching nearly 91% overall classification accuracy with the best performing model (FC-DenseNet). Given that the 14 used Sentinel-1 scenes resulted in approximately 7,000 training images, this indicates a strong potential for using pre-trained CNNs for further fine-tuning, which seems particularly suitable when the number of training images is limited (to thousands or tens of thousands instead of millions). In addition to suggesting the best candidate semantic segmentation models for land cover mapping with SAR data (that is, the DenseNet-based models), our study offers baseline results against which newly proposed models should be evaluated. Several possible improvements for future work were identified, including the necessity of testing multitemporal approaches, data fusion and very high-resolution SAR imagery, as well as developing models specifically for SAR; these will be addressed in future work.

References

[1] S. Bojinski, M. Verstraete, T. C. Peterson, C. Richter, A. Simmons, M. Zemp, The concept of essential climate variables in support of climate research, applications and policy, Bulletin of the American Meteorological Society 95 (9) (2014) 1431–1443.

[2] G. Büttner, J. Feranec, G. Jaffrain, L. Mari, G. Maucha, T. Soukup, The CORINE land cover 2000 project, EARSeL eProceedings 3 (3) (2004) 331–346.

[3] M. Bossard, J. Feranec, J. Otahel, et al., CORINE land cover technical guide: Addendum 2000 (2000).

[4] G. Büttner, CORINE land cover and land cover change products, in: Land Use and Land Cover Mapping in Europe, Springer, 2014, pp. 55–74.

[5] M. Törmä, T. Markkanen, S. Hatunen, P. Härmä, O.-P. Mattila, A. Arslan, Assessment of land-cover data for land-surface modelling in regional climate studies, Boreal Environment Research 20 (2) (2015) 243–260.

[6] J. Chen, J. Chen, A. Liao, X. Cao, L. Chen, X. Chen, C. He, G. Han, S. Peng, M. Lu, et al., Global land cover mapping at 30 m resolution: A POK-based operational approach, ISPRS Journal of Photogrammetry and Remote Sensing 103 (2015) 7–27.

[7] C. A. d. Almeida, A. C. Coutinho, J. C. D. M. Esquerdo, M. Adami, A. Venturieri, C. G. Diniz, N. Dessay, L. Durieux, A. R. Gomes, High spatial resolution land use and land cover mapping of the Brazilian Legal Amazon in 2008 using Landsat-5/TM and MODIS data, Acta Amazonica 46 (3) (2016) 291–302.

[8] C. Homer, J. Dewitz, L. Yang, S. Jin, P. Danielson, G. Xian, J. Coulston, N. Herold, J. Wickham, K. Megown, Completion of the 2011 National Land Cover Database for the conterminous United States: representing a decade of land cover change information, Photogrammetric Engineering & Remote Sensing 81 (5) (2015) 345–354.

[9] Y. Zhao, D. Feng, L. Yu, X. Wang, Y. Chen, Y. Bai, H. J. Hernandez, M. Galleguillos, C. Estades, G. S. Biging, et al., Detailed dynamic land cover mapping of Chile: Accuracy improvement by integrating multi-temporal data, Remote Sensing of Environment 183 (2016) 170–185.

[10] P. Griffiths, C. Nendel, P. Hostert, Intra-annual reflectance composites from Sentinel-2 and Landsat for national-scale crop and land cover mapping, Remote Sensing of Environment 220 (2019) 135–151.

[11] R. Torres, P. Snoeij, D. Geudtner, D. Bibby, M. Davidson, E. Attema, P. Potin, B. Rommen, N. Floury, M. Brown, I. Traver, P. Deghaye, B. Duesmann, B. Rosich, N. Miranda, C. Bruno, M. L'Abbate, R. Croci, A. Pietropaolo, M. Huchler, F. Rostan, GMES Sentinel-1 mission, Remote Sensing of Environment 120 (2012) 9–24. doi:10.1016/j.rse.2011.05.028.

[12] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE 86 (11) (1998) 2278–2324.

[13] A. Krizhevsky, I. Sutskever, G. E. Hinton, ImageNet classification with deep convolutional neural networks, in: Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.

[14] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, CoRR abs/1409.1556 (2014).

[15] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning (2016).

[16] W. Cohen, S. Goward, Landsat's role in ecological applications of remote sensing, BioScience 54 (6) (2004) 535–545. doi:10.1641/0006-3568(2004)054[0535:LRIEAO]2.0.CO;2.

[17] S. Goetz, A. Baccini, N. Laporte, T. Johns, W. Walker, J. Kellndorfer, R. Houghton, M. Sun, Mapping and monitoring carbon stocks with satellite observations: A comparison of methods, Carbon Balance and Management 4 (2009). doi:10.1186/1750-0680-4-2.

[18] C. Atzberger, Advances in remote sensing of agriculture: Context description, existing operational monitoring systems and major information needs, Remote Sensing 5 (2) (2013) 949–981. doi:10.3390/rs5020949.

[19] T. Häme, J. Kilpi, H. A. Ahola, Y. Rauste, O. Antropov, M. Rautiainen, L. Sirro, S. Bounpone, Improved mapping of tropical forests with optical and SAR imagery, part I: Forest cover and accuracy assessment using multi-resolution data, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 6 (1) (2013) 74–91.

[20] O. Antropov, Y. Rauste, H. Astola, J. Praks, T. Häme, M. Hallikainen, Land cover and soil type mapping from spaceborne PolSAR data at L-band with probabilistic neural network, IEEE Transactions on Geoscience and Remote Sensing 52 (9) (2014) 5256–5270. doi:10.1109/TGRS.2013.2287712.

[21] A. Lönnqvist, Y. Rauste, M. Molinier, T. Häme, Polarimetric SAR data in land cover mapping in boreal zone, IEEE Transactions on Geoscience and Remote Sensing 48 (10) (2010) 3652–3662. doi:10.1109/TGRS.2010.2048115.

[22] B. Waske, M. Braun, Classifier ensembles for land cover mapping using multitemporal SAR imagery, ISPRS Journal of Photogrammetry and Remote Sensing 64 (5) (2009) 450–457. doi:10.1016/j.isprsjprs.2009.01.003.

[23] L. Bruzzone, M. Marconcini, U. Wegmüller, A. Wiesmann, An advanced system for the automatic classification of multitemporal SAR images, IEEE Transactions on Geoscience and Remote Sensing 42 (6) (2004) 1321–1334. doi:10.1109/TGRS.2004.826821.

[24] T. Ullmann, A. Schmitt, A. Roth, J. Duffe, S. Dech, H.-W. Hubberten, R. Baumhauer, Land cover characterization and classification of arctic tundra environments by means of polarized synthetic aperture X- and C-band radar (PolSAR) and Landsat 8 multispectral imagery, Richards Island, Canada, Remote Sensing 6 (9) (2014) 8565–8593. doi:10.3390/rs6098565. URL http://www.mdpi.com/2072-4292/6/9/8565

[25] N. Clerici, C. A. V. Calderón, J. M. Posada, Fusion of Sentinel-1A and Sentinel-2A data for land cover mapping: a case study in the lower Magdalena region, Colombia, Journal of Maps 13 (2) (2017) 718–726. doi:10.1080/17445647.2017.1372316.

[26] C. Castañeda, D. Ducrot, Land cover mapping of wetland areas in an agricultural landscape using SAR and Landsat imagery, Journal of Environmental Management 90 (7) (2009) 2270–2277.

[27] Y. Ban, H. Hu, I. M. Rangel, Fusion of QuickBird MS and RADARSAT SAR data for urban land-cover mapping: Object-based and knowledge-based approach, International Journal of Remote Sensing 31 (6) (2010) 1391–1410.

[28] G. V. Laurin, V. Liesenberg, Q. Chen, L. Guerriero, F. Del Frate, A. Bartolini, D. Coomes, B. Wilebore, J. Lindsell, R. Valentini, Optical and SAR sensor synergies for forest and land cover mapping in a tropical site in West Africa, International Journal of Applied Earth Observation and Geoinformation 21 (2013) 7–16.

[29] R. Khatami, G. Mountrakis, S. V. Stehman, A meta-analysis of remote sensing research on supervised pixel-based land-cover image classification processes: General guidelines for practitioners and future research, Remote Sensing of Environment 177 (2016) 89–100.

[30] B. Waske, M. Braun, Classifier ensembles for land cover mapping using multitemporal SAR imagery, ISPRS Journal of Photogrammetry and Remote Sensing 64 (5) (2009) 450–457.

[31] H. Balzter, B. Cole, C. Thiel, C. Schmullius, Mapping CORINE land cover from Sentinel-1A SAR and SRTM digital elevation model data using random forests, Remote Sensing 7 (11) (2015) 14876–14898. doi:10.3390/rs71114876.

[32] S.-E. Park, Variations of microwave scattering properties by seasonal freeze/thaw transition in the permafrost active layer observed by ALOS PALSAR polarimetric data, Remote Sensing 7 (12) (2015) 17135–17148. doi:10.3390/rs71215874.

[33] M. C. Dobson, L. E. Pierce, F. T. Ulaby, Knowledge-based land-cover classification using ERS-1/JERS-1 SAR composites, IEEE Transactions on Geoscience and Remote Sensing 34 (1) (1996) 83–99. doi:10.1109/36.481896.

[34] L. Sirro, T. Häme, Y. Rauste, J. Kilpi, J. Hämäläinen, K. Gunia, B. de Jong, F. Paz Pellat, Potential of different optical and SAR data in forest and land cover classification to support REDD+ MRV, Remote Sensing 10 (6) (2018). doi:10.3390/rs10060942.

[35] J. D. T. De Alban, G. M. Connette, P. Oswald, E. L. Webb, Combined Landsat and L-band SAR data improves land cover classification and change detection in dynamic tropical landscapes, Remote Sensing 10 (2) (2018). doi:10.3390/rs10020306.

[36] N. Longepe, P. Rakwatin, O. Isoguchi, M. Shimada, Y. Uryu, K. Yulianto, Assessment of ALOS PALSAR 50 m orthorectified FBD data for regional land cover classification by support vector machines, IEEE Transactions on Geoscience and Remote Sensing 49 (6) (2011) 2135–2150. doi:10.1109/TGRS.2010.2102041.

[37] T. Esch, A. Schenk, T. Ullmann, M. Thiel, A. Roth, S. Dech, Characterization of land cover types in TerraSAR-X images by combined analysis of speckle statistics and intensity information, IEEE Transactions on Geoscience and Remote Sensing 49 (6) (2011) 1911–1925. doi:10.1109/TGRS.2010.2091644.

[38] J. W. Cable, J. M. Kovacs, J. Shang, X. Jiao, Multi-temporal polarimetric RADARSAT-2 for land cover monitoring in northeastern Ontario, Canada, Remote Sensing 6 (3) (2014) 2372–2392. doi:10.3390/rs6032372. URL http://www.mdpi.com/2072-4292/6/3/2372

[39] X. Niu, Y. Ban, Multi-temporal RADARSAT-2 polarimetric SAR data for urban land-cover classification using an object-based support vector machine and a rule-based approach, International Journal of Remote Sensing 34 (1) (2013) 1–26. doi:10.1080/01431161.2012.700133.

[40] T. L. Evans, M. Costa, K. Telmer, T. S. F. Silva, Using ALOS/PALSAR and RADARSAT-2 to map land cover and seasonal inundation in the Brazilian Pantanal, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 3 (4) (2010) 560–575. doi:10.1109/JSTARS.2010.2089042.

[41] P. Lumsdon, S. R. Cloude, G. Wright, Polarimetric classification of land cover for Glen Affric radar project, IEE Proceedings - Radar, Sonar and Navigation 152 (6) (2005) 404–412. doi:10.1049/ip-rsn:20041313.

[42] C. da Costa Freitas, L. de Souza Soler, S. J. S. Sant'Anna, L. V. Dutra, J. R. Dos Santos, J. C. Mura, A. H. Correia, Land use and land cover mapping in the Brazilian Amazon using polarimetric airborne P-band SAR data, IEEE Transactions on Geoscience and Remote Sensing 46 (10) (2008) 2956–2970.

[43] G. Li, D. Lu, E. Moran, L. Dutra, M. Batistella, A comparative analysis of ALOS PALSAR L-band and RADARSAT-2 C-band data for land-cover classification in a tropical moist region, ISPRS Journal of Photogrammetry and Remote Sensing 70 (2012) 26–38. doi:10.1016/j.isprsjprs.2012.03.010.

[44] N. Park, K. Chi, Integration of multitemporal/polarization C-band SAR data sets for land-cover classification, International Journal of Remote Sensing 29 (16) (2008) 4667–4688. doi:10.1080/01431160801947341.

[45] E. Tomppo, O. Antropov, J. Praks, Cropland classification using Sentinel-1 time series: Methodological performance and prediction uncertainty assessment, Remote Sensing 11 (21) (2019). doi:10.3390/rs11212480.

[46] D. B. Nguyen, A. Gruber, W. Wagner, Mapping rice extent and cropping scheme in the Mekong Delta using Sentinel-1A data, Remote Sensing Letters 7 (12) (2016) 1209–1218.

[47] A. Veloso, S. Mermoz, A. Bouvet, T. Le Toan, M. Planells, J.-F. Dejoux, E. Ceschia, Understanding the temporal behavior of crops using Sentinel-1 and Sentinel-2-like data for agricultural applications, Remote Sensing of Environment 199 (2017) 415–426.

[48] G. Satalino, A. Balenzano, F. Mattia, M. W. Davidson, C-band SAR data for mapping crops dominated by surface or volume scattering, IEEE Geoscience and Remote Sensing Letters 11 (2) (2013) 384–388.

[49] F. Vicente-Guijalba, A. Jacob, J. M. Lopez-Sanchez, C. Lopez-Martinez, J. Duro, C. Notarnicola, D. Ziolkowski, A. Mestre-Quereda, E. Pottier, J. J. Mallorquí, M. Lavalle, M. Engdahl, Sincohmap: Land-cover and vegetation mapping using multi-temporal Sentinel-1 interferometric coherence, in: IGARSS 2018 - 2018 IEEE International Geoscience and Remote Sensing Symposium, 2018, pp. 6631–6634. doi:10.1109/IGARSS.2018.8517926.

[50] S. Ge, O. Antropov, W. Su, H. Gu, J. Praks, Deep recurrent neural networks for land-cover classification using Sentinel-1 InSAR time series, in: IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium, 2019, pp. 473–476. doi:10.1109/IGARSS.2019.8900088.

[51] Y. LeCun, Y. Bengio, et al., Convolutional networks for images, speech, and time series, The Handbook of Brain Theory and Neural Networks 3361 (10) (1995) 1995.

[52] X. X. Zhu, D. Tuia, L. Mou, G.-S. Xia, L. Zhang, F. Xu, F. Fraundorfer, Deep learning in remote sensing: a review, arXiv preprint arXiv:1710.03959 (2017).

[53] M. Mahdianpari, B. Salehi, M. Rezaee, F. Mohammadimanesh, Y. Zhang, Very deep convolutional neural networks for complex land cover mapping using multispectral remote sensing imagery, Remote Sensing 10 (7) (2018) 1119.

[54] L. Zhang, L. Zhang, B. Du, Deep learning for remote sensing data: A technical tutorial on the state of the art, IEEE Geoscience and Remote Sensing Magazine 4 (2) (2016) 22–40.

[55] J. Zhang, P. Zhong, Y. Chen, S. Li, L1/2-regularized deconvolution network for the representation and restoration of optical remote sensing images, IEEE Transactions on Geoscience and Remote Sensing 52 (5) (2014) 2617–2627.

[56] X. Chen, S. Xiang, C.-L. Liu, C.-H. Pan, Aircraft detection by deep belief nets, in: Pattern Recognition (ACPR), 2013 2nd IAPR Asian Conference on, IEEE, 2013, pp. 54–58.

[57] X. Chen, S. Xiang, C.-L. Liu, C.-H. Pan, Vehicle detection in satellite images by hybrid deep convolutional neural networks, IEEE Geoscience and Remote Sensing Letters 11 (10) (2014) 1797–1801.

[58] Y. Liu, G. Cao, Q. Sun, M. Siegel, Hyperspectral classification via deep networks and superpixel segmentation, International Journal of Remote Sensing 36 (13) (2015) 3459–3482. doi:10.1080/01431161.2015.1055607.

[59] J. Wang, Q. Qin, Z. Li, X. Ye, J. Wang, X. Yang, X. Qin, Deep hierarchical representation and segmentation of high resolution remote sensing images, in: Geoscience and Remote Sensing Symposium (IGARSS), 2015 IEEE International, IEEE, 2015, pp. 4320–4323.

[60] D. Tuia, R. Flamary, N. Courty, Multiclass feature learning for hyperspectral image classification: Sparse and hierarchical solutions, ISPRS Journal of Photogrammetry and Remote Sensing 105 (2015) 272–285.

[61] F. Hu, G.-S. Xia, J. Hu, L. Zhang, Transferring deep convolutional neural networks for the scene classification of high-resolution remote sensing imagery, Remote Sensing 7 (11) (2015) 14680–14707.

[62] O. A. B. Penatti, K. Nogueira, J. A. dos Santos, Do deep features generalize from everyday objects to remote sensing and aerial scenes domains?, in: 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2015, pp. 44–51. doi:10.1109/CVPRW.2015.7301382.

[63] F. P. Luus, B. P. Salmon, F. Van den Bergh, B. T. J. Maharaj, Multiview deep learning for land-use classification, IEEE Geoscience and Remote Sensing Letters 12 (12) (2015) 2448–2452.

[64] F. Zhang, B. Du, L. Zhang, Scene classification via a gradient boosting random convolutional network framework, IEEE Transactions on Geoscience and Remote Sensing 54 (3) (2016) 1793–1802.

[65] T. Ishii, R. Nakamura, H. Nakada, Y. Mochizuki, H. Ishikawa, Surface object recognition with CNN and SVM in Landsat 8 images, in: 2015 14th IAPR International Conference on Machine Vision Applications (MVA), 2015, pp. 341–344. doi:10.1109/MVA.2015.7153200.

[66] N. Kussul, M. Lavreniuk, S. Skakun, A. Shelestov, Deep learning classification of land cover and crop types using remote sensing data, IEEE Geoscience and Remote Sensing Letters 14 (5) (2017) 778–782.

[67] Y. Chen, Z. Lin, X. Zhao, G. Wang, Y. Gu, Deep learning-based classification of hyperspectral data, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 7 (6) (2014) 2094–2107.

[68] G. Wu, X. Shao, Z. Guo, Q. Chen, W. Yuan, X. Shi, Y. Xu, R. Shibasaki, Automatic building segmentation of aerial imagery using multi-constraint fully convolutional networks, Remote Sensing 10 (3) (2018) 407.

[69] Y. Duan, F. Liu, L. Jiao, P. Zhao, L. Zhang, SAR image segmentation based on convolutional-wavelet neural network and Markov random field, Pattern Recognition 64 (2017) 255–267.

[70] F. Mohammadimanesh, B. Salehi, M. Mahdianpari, E. Gill, M. Molinier, A new fully convolutional neural network for semantic segmentation of polarimetric SAR imagery in complex land cover ecosystem, ISPRS Journal of Photogrammetry and Remote Sensing 151 (2019) 223–236.

[71] L. Wang, X. Xu, H. Dong, R. Gui, F. Pu, Multi-pixel simultaneous classification of PolSAR image using convolutional neural networks, Sensors 18 (3) (2018) 769.

[72] M. Ahishali, S. Kiranyaz, T. Ince, M. Gabbouj, Dual and single polarized SAR image classification using compact convolutional neural networks, Remote Sensing 11 (11) (2019) 1340.

[73] Z. Li, Z. Yang, H. Xiong, Homogeneous region segmentation for SAR images based on two steps segmentation algorithm, in: Computers, Communications, and Systems (ICCCS), International Conference on, IEEE, 2015, pp. 196–200.

[74] V. Badrinarayanan, A. Kendall, R. Cipolla, SegNet: A deep convolutional encoder-decoder architecture for image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence 39 (12) (2017) 2481–2495.

[75] H. Zhao, J. Shi, X. Qi, X. Wang, J. Jia, Pyramid scene parsing network, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2881–2890.

[76] C. Yu, J. Wang, C. Peng, C. Gao, G. Yu, N. Sang, BiSeNet: Bilateral segmentation network for real-time semantic segmentation, in: Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 325–341.

[77] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, H. Adam, Encoder-decoder with atrous separable convolution for semantic image segmentation, arXiv preprint arXiv:1802.02611 (2018).

[78] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, A. L. Yuille, DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs, IEEE Transactions on Pattern Analysis and Machine Intelligence 40 (4) (2018) 834–848.

[79] O. Ronneberger, P. Fischer, T. Brox, U-Net: Convolutional networks for biomedical image segmentation, in: Medical Image Computing and Computer-Assisted Intervention (MICCAI), Vol. 9351 of LNCS, Springer, 2015, pp. 234–241, (available on arXiv:1505.04597 [cs.CV]). URL http://lmb.informatik.uni-freiburg.de/Publications/2015/RFB15a

[80] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, H. Adam, MobileNets: Efficient convolutional neural networks for mobile vision applications, arXiv preprint arXiv:1704.04861 (2017).

[81] T. Pohlen, A. Hermans, M. Mathias, B. Leibe, Full-resolution residual networks for semantic segmentation in street scenes, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4151–4160.

[82] S. Jégou, M. Drozdzal, D. Vazquez, A. Romero, Y. Bengio, The one hundred layers tiramisu: Fully convolutional DenseNets for semantic segmentation, in: Computer Vision and Pattern Recognition Workshops (CVPRW), 2017 IEEE Conference on, IEEE, 2017, pp. 1175–1183.

[83] O. Antropov, Y. Rauste, A. Lönnqvist, T. Häme, PolSAR mosaic normalization for improved land-cover mapping, IEEE Geoscience and Remote Sensing Letters 9 (6) (2012) 1074–1078.

[84] J. Long, E. Shelhamer, T. Darrell, Fully convolutional networks for semantic segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431–3440.

[85] D. H. Hubel, T. N. Wiesel, Receptive fields, binocular interaction and functional architecture in the cat's visual cortex, The Journal of Physiology 160 (1) (1962) 106–154.

[86] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al., ImageNet large scale visual recognition challenge, International Journal of Computer Vision 115 (3) (2015) 211–252.

[87] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.

[88] G. Huang, Z. Liu, L. Van Der Maaten, K. Q. Weinberger, Densely connected convolutional networks, in: CVPR, Vol. 1, 2017, p. 3.

[89] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, Going deeper with convolutions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1–9.

[90] S. Ji, W. Xu, M. Yang, K. Yu, 3D convolutional neural networks for human action recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (1) (2013) 221–231.

[91] T. N. Sainath, A.-r. Mohamed, B. Kingsbury, B. Ramabhadran, Deep convolutional neural networks for LVCSR, in: Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, IEEE, 2013, pp. 8614–8618.

[92] D. Small, L. Zuberbühler, A. Schubert, E. Meier, Terrain-flattened gamma nought Radarsat-2 backscatter, Canadian Journal of Remote Sensing 37 (5) (2012) 493–499.

[93] P. Härmä, R. Teiniranta, M. Törmä, R. Repo, E. Järvenpää, M. Kallio, The production of Finnish CORINE land cover 2000 classification, XXth ISPRS Congress, Istanbul, Turkey (2004).

[94] P Hrm R Teiniranta M Trm R Repo E Jrvenp M Kallio Finnishcorine land cover 2000 classification XXth ISPRS Congress AnchorageUS (2004)

[95] A Garcia-Garcia S Orts-Escolano S Oprea V Villena-MartinezJ Garcia-Rodriguez A review on deep learning techniques applied tosemantic segmentation arXiv preprint arXiv170406857 (2017)

[96] F Chollet Xception Deep learning with depthwise separable convolu-tions in Proceedings of the IEEE conference on computer vision andpattern recognition 2017 pp 1251ndash1258

[97] O Ronneberger P Fischer T Brox U-net Convolutional networks forbiomedical image segmentation in International Conference on Medicalimage computing and computer-assisted intervention Springer 2015 pp234ndash241

[98] L-C Chen G Papandreou F Schroff H Adam Rethinkingatrous convolution for semantic image segmentation arXiv preprintarXiv170605587 (2017)

[99] Y Bengio Deep learning of representations for unsupervised and trans-fer learning in Proceedings of ICML Workshop on Unsupervised andTransfer Learning 2012 pp 17ndash36

[100] M Abadi P Barham J Chen Z Chen A Davis J Dean M DevinS Ghemawat G Irving M Isard et al Tensorflow A system for large-scale machine learning in 12th USENIX Symposium on OperatingSystems Design and Implementation (OSDI 16) 2016 pp 265ndash283

[101] H Costa G M Foody D S Boyd Supervised methods of image seg-mentation accuracy assessment in land cover mapping Remote sensing ofenvironment 205 (2018) 338ndash351

[102] G Csurka D Larlus F Perronnin F Meylan What is a good evaluationmeasure for semantic segmentation in BMVC Vol 27 Citeseer 2013p 2013

[103] J Cohen A coefficient of agreement for nominal scales Educational andPsychological Measurement 20 (1) (1960) 37 ndash 46

[104] O Antropov Y Rauste T Hame Volume scattering modeling in PolSARdecompositions Study of ALOS PALSAR data over boreal forest IEEETransactions on Geoscience and Remote Sensing 49 (10) (2011) 3838ndash3848

39

  • 1 Introduction
    • 11 Land Cover Mapping with SAR Imagery
    • 12 Deep Learning in Remote Sensing
    • 13 Study goals
      • 2 Deep Learning Terminology
      • 3 Materials and methods
        • 31 Study site
        • 32 SAR data
        • 33 Reference data
        • 34 Semantic Segmentation Models
          • 341 BiSeNet (Bilateral Segmentation Network)
          • 342 SegNet (Encoder-Decoder-Skip)
          • 343 Mobile U-Net
          • 344 DeepLab-V3+
          • 345 FRRN-B (Full-Resolution Residual Networks)
          • 346 PSPNet (Pyramid Scene Parsing Network)
          • 347 FC-DenseNet (Fully Convolutional DenseNets)
            • 35 Training approach
            • 36 Experimental Setup
              • 361 SAR Data Preprocessing for Deep Learning
              • 362 TrainDevelopment and Test (Accuracy Assessment) Dataset
              • 363 Data Augmentation
              • 364 Implementation
              • 365 Hardware and Training Setup
                • 37 Evaluation Metrics
                  • 4 Results and Discussion
                    • 41 Classification Performance
                    • 42 Computational Performance
                    • 43 Comparison to Similar Work
                    • 44 Outlook and Future Work
                      • 5 Conclusion
Page 24: Wide-Area Land Cover Mapping with Sentinel-1 Imagery using … · 2020-02-27 · Wide-Area Land Cover Mapping with Sentinel-1 Imagery using Deep Learning Semantic Segmentation Models

P_o = \frac{1}{N} \sum_{j=1}^{k} f_{jj}  (1)

r_i = \sum_{j=1}^{k} f_{ij} \;\; \forall i \qquad \text{and} \qquad c_j = \sum_{i=1}^{k} f_{ij} \;\; \forall j  (2)

P_e = \frac{1}{N^2} \sum_{i=1}^{k} r_i c_i  (3)

where P_o is the observed proportional agreement (effectively the overall accuracy), r_i and c_j are the row and column totals for classes i and j, and P_e is the expected proportion of agreement. The final measure of agreement is given by the statistic [103]

\kappa = \frac{P_o - P_e}{1 - P_e}  (4)

Depending on the value of Kappa, the observed agreement is considered either poor (0.0 to 0.2), fair (0.2 to 0.4), moderate (0.4 to 0.6), good (0.6 to 0.8), or very good (0.8 to 1.0).
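For concreteness, these statistics are straightforward to compute from a confusion matrix. The following minimal NumPy sketch (our own illustration, with names of our choosing, not code from the implementation used in this study) evaluates Eqs. (1)-(4):

import numpy as np

def cohens_kappa(f):
    # f is a k x k confusion matrix: f[i, j] counts test pixels of
    # reference class i assigned by the classifier to class j.
    f = np.asarray(f, dtype=np.float64)
    n = f.sum()                        # total number of test pixels N
    p_o = np.trace(f) / n              # observed agreement, Eq. (1)
    r = f.sum(axis=1)                  # row totals r_i, Eq. (2)
    c = f.sum(axis=0)                  # column totals c_j, Eq. (2)
    p_e = (r * c).sum() / n ** 2       # expected agreement, Eq. (3)
    return (p_o - p_e) / (1.0 - p_e)   # kappa, Eq. (4)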

4. Results and Discussion

Using the experimental setup described in the previous section, we evaluated the seven selected semantic segmentation models: SegNet [74], PSPNet [75], BiSeNet [76], DeepLabV3+ [77, 78], U-Net [79, 80], FRRN-B [81], and FC-DenseNet [82]. The overall classification performance statistics for all studied models are gathered in Table 4. Figure 11 shows maps produced for several imagelets with the best performing model, FC-DenseNet. The obtained results are compared to prior work, and the classification performance for the different land cover classes is discussed further below.

4.1. Classification Performance

All the models performed relatively well in terms of classification, achieving an overall accuracy above 83%. Three models performed particularly well, achieving an accuracy score above 89%: SegNet, FRRN-B, and the best performing model, FC-DenseNet, which achieved an accuracy of 90.7%.

Before further analysis, let us recall that CORINE is not exclusively a land cover map, but rather a land cover and land use map; thus, specific classes can differ from the ecological classes observed by Sentinel-1. Also, the aggregation to Level-1 classes is sometimes not strictly "ecological" and does not always comply with physical surface scattering considerations. For example, roads, airports, major industrial areas and the road network often exhibit areas similar to fields; the presence of trees and green vegetation near summer cottages can cause them to exhibit signatures closer to forest than to urban fabric; forest on rocky terrain can be misclassified as urban due to the presence of very bright targets and strong disruptive features; and confusion between peatland and field areas is also commonplace. Finally, the accuracy of the CORINE data itself is only somewhat higher than 90%.

Table 4: Summary of the classification performance and efficiency of the studied deep learning models (UA = user's accuracy, PA = producer's accuracy, both in %; the average inference time is per image in the dataset). Class-wise entries are UA/PA; a dash marks a value not recoverable from the source.

LC class (CLC code), test scale (km2)        BiSeNet   DeepLabV3+   SegNet   FRRN-B   U-Net    PSPNet   FC-DenseNet
Urban fabric (100), 1081.626                 21/15     14/36        31/38    30/45    25/38    18/-     62/27
Agricultural areas (200), 2516.049           51/50     49/69        66/68    68/66    66/-     53/48    72/71
Forested areas (300), 28546.2                90/91     88/96        93/94    92/95    92/95    89/95    93/96
Peatland, bogs and marshes (400), 2099.054   43/56     13/67        57/71    55/70    52/65    31/-     74/58
Water bodies (500), 5356.4                   85/91     94/92        96/96    95/96    96/96    94/94    96/96
Overall accuracy (%)                         83.86     85.49        89.03    89.27    89.25    86.51    90.66
Kappa                                        0.641     0.649        0.754    0.758    0.754    0.680    0.785
Average inference time (s)                   0.0389    0.0267       0.0761   0.1424   0.0848   0.0495   0.1930

Table 5: Confusion matrix for classification with the FC-DenseNet (FC-DenseNet103) model. Rows are CLC2012 reference classes, columns are Sentinel-1 classification results; the bottom-right entry (90.7%) is the overall accuracy.

CLC2012 class    urban         water          forest         field         peatland      total          PA (%)
1 urban          7,301,999     413,073        15,892,771     3,212,839     221,476       27,042,158     27.0
2 water          78,331        128,294,872    3,457,634      171,029       1,935,276     133,937,142    95.8
3 forest         3,663,698     2,703,632      686,788,977    12,795,703    7,730,444     713,682,454    96.2
4 field          766,200       121,609        16,527,970     44,866,048    620,934       62,902,761     71.3
5 peatland       56,097        1,866,020      19,164,137     1,091,008     30,309,189    52,486,451     57.8
total            11,866,325    133,399,206    741,831,489    62,136,627    40,817,319    990,050,966
UA (%)           61.5          96.2           92.6           72.2          74.3                         90.7
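The per-class accuracies in Table 5 follow directly from the matrix: the producer's accuracy is the diagonal element divided by the row (reference) total, and the user's accuracy is the diagonal element divided by the column (prediction) total. A short NumPy sketch (ours, for illustration) reproduces the reported figures:

import numpy as np

# FC-DenseNet confusion matrix from Table 5; rows are CLC2012 reference
# classes, columns are predictions (urban, water, forest, field, peatland).
f = np.array([
    [ 7301999,    413073,  15892771,  3212839,    221476],  # urban
    [   78331, 128294872,   3457634,   171029,   1935276],  # water
    [ 3663698,   2703632, 686788977, 12795703,   7730444],  # forest
    [  766200,    121609,  16527970, 44866048,    620934],  # field
    [   56097,   1866020,  19164137,  1091008,  30309189],  # peatland
])

pa = np.diag(f) / f.sum(axis=1)   # producer's accuracy: 27.0, 95.8, 96.2, 71.3, 57.8 (%)
ua = np.diag(f) / f.sum(axis=0)   # user's accuracy: 61.5, 96.2, 92.6, 72.2, 74.3 (%)
oa = np.trace(f) / f.sum()        # overall accuracy, approx. 0.907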

Figure 11: Illustration of the FC-DenseNet model performance: a selection of classification results, i.e., direct output of the network without any post-processing (bottom row), versus the reference CORINE data (upper row).

As for the results across the different land cover classes, all the models performed particularly well in recognising water bodies and forested areas, while urban fabric was the most challenging class for all the models. We expect that the inclusion of the DEM as one layer in the training images helped to achieve good results on the water bodies class for most of the models (except for BiSeNet, all the models achieved both user's and producer's accuracy above 90%). The urban class was particularly challenging for the following main reasons. First, this class changes the most, as new houses, roads and urban areas are built; while we took the CORINE class most suitable in time for our Sentinel-1 images, there are almost certainly differences between the urban class as it was in 2012 and in 2015-2016. Second, neither the accuracy of the CORINE map itself nor its aggregation rules are perfect. As a matter of fact, in the majority of studies where SAR-based classification was compared against CLC or similar data, a poor or modest overall agreement was observed for this class [21, 41, 83, 20], while the user's accuracy was strongly higher than the producer's [104]. The latter is exactly because radar senses sharp boundaries and bright targets very well, whereas such bright targets often do not dominate the whole CORINE Level-1 urban class. We argue that any inaccuracies present will be particularly accentuated in our models for the urban class, because of the sharp and sudden boundary changes in this class, unlike for others such as forest and water. The top performing model, FC-DenseNet, performed the best across all the classes. It is particularly notable that it achieved a user's accuracy, i.e., precision, of 62% for the urban class, improving significantly on all the other models. Nevertheless, its producer's accuracy, i.e., recall, of 27% on this class is outperformed by two other top models, SegNet and FRRN-B.

We mentioned the issue of SAR backscattering sensitivity to several ground factors, such that the same classes might appear differently in images from different countries or from distant areas within a country. An interesting indication of our study, however, is that deep learning models might be able to deal with this issue. Namely, we used models pre-trained on ImageNet and fine-tuned them with a relatively small number (14) of Sentinel-1 scenes. The models learned to recognise the varying types of backscattering signal across the country of Finland. This indicates that, with a similar type of fine-tuning, the present models could be relatively easily adapted to other areas and countries with different SAR backscattering patterns. Such robustness and adaptability of deep learning models come from their automatic learning of feature representations, without the need for a human to pre-define those features.
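As an illustration of this transfer-learning recipe, the sketch below fine-tunes an ImageNet-pretrained backbone for per-pixel classification in Keras. This is a simplified FCN-style example under our own assumptions (a MobileNetV2 encoder, a bilinear upsampling head, inputs stacked into 3 channels), not the exact architectures or training code used in this study:

import tensorflow as tf

NUM_CLASSES = 5  # the five Level-1 CORINE classes

# Encoder pre-trained on ImageNet; the SAR channels are assumed stacked
# into a 3-band image (e.g. two polarizations plus the DEM layer).
backbone = tf.keras.applications.MobileNetV2(
    input_shape=(512, 512, 3), include_top=False, weights="imagenet")

x = tf.keras.layers.Conv2D(NUM_CLASSES, 1)(backbone.output)        # class scores
x = tf.keras.layers.UpSampling2D(32, interpolation="bilinear")(x)  # back to 512 x 512
outputs = tf.keras.layers.Softmax()(x)                             # per-pixel probabilities

model = tf.keras.Model(backbone.input, outputs)
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="sparse_categorical_crossentropy")
# model.fit(train_imagelets, train_masks, ...)  # fine-tune end-to-end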

4.2. Computational Performance

With our hardware configuration, training took from 6 days up to 2 weeks for the different models. This could be significantly reduced by training each model on a multi-GPU system instead of the single GPU we used.

In terms of inference time, we also observed differences in performance. In Table 4, we present the average inference time per 512 px × 512 px imagelet. The results show that there is a trade-off between classification and computational performance: the best models in terms of classification results (i.e., FC-DenseNet and FRRN-B) take several times longer inference time than the rest. Depending on the application, this might not be of particular importance.
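Per-imagelet timings of this kind can be measured with a simple harness along the following lines; this is a generic sketch under our own assumptions (a Keras-style model with a predict method), and details such as warm-up handling and batching will shift the absolute numbers:

import time
import numpy as np

def mean_inference_time(model, imagelets, warmup=5):
    # Exclude the first few calls, which include graph building and
    # GPU initialisation, then average over all 512 x 512 imagelets.
    for x in imagelets[:warmup]:
        model.predict(x[np.newaxis, ...], verbose=0)
    t0 = time.perf_counter()
    for x in imagelets:
        model.predict(x[np.newaxis, ...], verbose=0)
    return (time.perf_counter() - t0) / len(imagelets)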

4.3. Comparison to Similar Work

The obtained results compare favourably to previous similar studies on land cover classification with SAR data [83, 20, 21, 41, 28, 31]. Depending on the level of class aggregation (4-5 major classes or more), and using mostly statistical or classical machine learning approaches, reported classification accuracies ranged from as high as 80-87% down to as low as 30% when only SAR imagery was used.

Two recent studies that applied neural networks to SAR imagery classification for land cover mapping (albeit in combination with satellite optical data) were [28] and [66], with reported classification accuracies of up to 97.5% and 94.6%, respectively.

The best model in our experiments achieved an overall accuracy of 90.7%. However, our results were obtained using solely SAR imagery. In contrast, SAR imagery (PALSAR) alone yielded an overall accuracy of 78.1% in [28]. The types of classes they studied are also different from ours (crops versus vegetation versus land cover types), and our study was performed over a larger area. Importantly, the previous studies applied different types of models (regular NNs versus CNNs versus semantic segmentation). In particular, their CNN models work on 7 × 7 resolution windows, while we applied more advanced semantic segmentation models that operate at the level of a pixel. Keeping in mind the finding from [28] that adding optical images on top of SAR improved the results by over 10%, we expect that our models would perform comparably well or outperform these previous works if applied to combined SAR and optical imagery.

In terms of the deep learning setup, the studies most similar to ours are [53] and [70]. However, RapidEye optical imagery at 5 m spatial resolution was used in [53], and the test site was considerably smaller. Study [70], similarly to our research, relied exclusively on SAR imagery, however on fully polarimetric images acquired by RADARSAT-2 at considerably better resolution. They developed an FCN-type semantic segmentation model 'specifically designed for the classification of wetland complexes using PolSAR imagery'. Using this model to classify eight wetland map classes, they achieved an overall accuracy of 93%. However, because their model is designed specifically for wetland complexes, it is not clear whether it would generalise to other types of areas. Compared to our study, they focused on a considerably smaller area (nearly the size of a single imagelet we used) and on a very specific task (wetland type mapping). Thus, it is not readily clear how general their approach is and how it compares to the approach presented here.

4.4. Outlook and Future Work

There are several lines of potential improvement based on the results of this study, as well as directions for future work.

First, using an even larger set of Sentinel-1 images can be recommended, since large amounts of data are crucial for supervised deep learning models. Here we processed only 6,888 imagelets altogether, whereas deep learning algorithms typically become efficient only once they are trained with hundreds of thousands or millions of images.

Second, if SAR images and reference data of a higher resolution are used, we expect better classification performance as well, since smaller details could potentially be captured. Better agreement in the acquisition timing of the reference and SAR imagery can also be recommended: the reference and training data should come from the same months, or at least the same year, if possible, and the reference maps should represent reality as accurately as possible. The models in our experiments were certainly limited by CORINE's own limited accuracy.

Third, in this study we tested the effectiveness of off-the-shelf deep learning models for land cover mapping from SAR data. While the results show their effectiveness, it is also likely that novel types of models developed specifically for radar data (such as [70]) will yield even better results. Based on our results, we suggest DenseNet-based models as a starting point. In particular, one could develop deep learning models that directly handle SLC data, which preserve the phase information.

Focusing on a single season is both an advantage and a limitation. Importantly, we have avoided confusion between SAR signatures that vary seasonally for several land cover classes. However, the multitemporal dynamics itself can potentially be used as an additional, useful class-discriminating parameter. Incorporating the seasonal dynamics of each land cover pixel (as a time series) is left for future work, perhaps with the additional need to incorporate recurrent neural networks into the approach.

As discussed in Section 3.1.1, it could be suitable to use more detailed (specific) land cover classes, as the aggregation of smaller LC classes into Level-1 CORINE classes is not exactly ecological, leading to the mixing of several distinct SAR signatures in one class and thus causing additional confusion for the classifier. The classified specific classes can later be aggregated into larger classes, potentially showing improved performance [19].

Finally, we used only SAR images and a freely available DEM for the presented large-scale land cover mapping. If one were to add other types of remote sensing images, in particular optical images, we expect that the results would improve significantly. This holds for those areas where such imagery can be collected given the cloud coverage, while an operational scenario would potentially require the use of at least two models (with and without optical satellite imagery). It is also important to assess the added value of SAR imagery with deep learning models when optical satellite images are available, as well as possible data fusion and decision fusion scenarios, before a decision on the mapping approach is made [19].

5. Conclusion

Our study demonstrated the potential for applying state-of-the-art semantic segmentation models to SAR image classification with high accuracy. Several models were benchmarked in a countrywide classification experiment using Sentinel-1 IW-mode SAR data, reaching nearly 91% overall classification accuracy with the best performing model (FC-DenseNet). Given that the 14 used Sentinel-1 scenes resulted in about 7,000 training images, this indicates strong potential for using pre-trained CNNs for further fine-tuning, and it seems particularly suitable when the number of training images is limited (to thousands or tens of thousands instead of millions). In addition to suggesting the best candidate semantic segmentation models for land cover mapping with SAR data (that is, the DenseNet-based models), our study offers baseline results against which newly proposed models should be evaluated. Several possible improvements were identified, including testing multitemporal approaches, data fusion and very high-resolution SAR imagery, as well as developing models specifically for SAR; these will be addressed in future work.

References

[1] S. Bojinski, M. Verstraete, T. C. Peterson, C. Richter, A. Simmons, M. Zemp, The concept of essential climate variables in support of climate research, applications, and policy, Bulletin of the American Meteorological Society 95 (9) (2014) 1431-1443.

[2] G. Büttner, J. Feranec, G. Jaffrain, L. Mari, G. Maucha, T. Soukup, The CORINE land cover 2000 project, EARSeL eProceedings 3 (3) (2004) 331-346.

[3] M. Bossard, J. Feranec, J. Otahel, et al., CORINE land cover technical guide: Addendum 2000 (2000).

[4] G. Büttner, CORINE land cover and land cover change products, in: Land Use and Land Cover Mapping in Europe, Springer, 2014, pp. 55-74.

[5] M. Törmä, T. Markkanen, S. Hatunen, P. Härmä, O.-P. Mattila, A. Arslan, Assessment of land-cover data for land-surface modelling in regional climate studies, Boreal Environment Research 20 (2) (2015) 243-260.

[6] J. Chen, J. Chen, A. Liao, X. Cao, L. Chen, X. Chen, C. He, G. Han, S. Peng, M. Lu, et al., Global land cover mapping at 30 m resolution: A POK-based operational approach, ISPRS Journal of Photogrammetry and Remote Sensing 103 (2015) 7-27.

[7] C. A. d. Almeida, A. C. Coutinho, J. C. D. M. Esquerdo, M. Adami, A. Venturieri, C. G. Diniz, N. Dessay, L. Durieux, A. R. Gomes, High spatial resolution land use and land cover mapping of the Brazilian Legal Amazon in 2008 using Landsat-5/TM and MODIS data, Acta Amazonica 46 (3) (2016) 291-302.

[8] C. Homer, J. Dewitz, L. Yang, S. Jin, P. Danielson, G. Xian, J. Coulston, N. Herold, J. Wickham, K. Megown, Completion of the 2011 National Land Cover Database for the conterminous United States - representing a decade of land cover change information, Photogrammetric Engineering & Remote Sensing 81 (5) (2015) 345-354.

[9] Y. Zhao, D. Feng, L. Yu, X. Wang, Y. Chen, Y. Bai, H. J. Hernández, M. Galleguillos, C. Estades, G. S. Biging, et al., Detailed dynamic land cover mapping of Chile: Accuracy improvement by integrating multi-temporal data, Remote Sensing of Environment 183 (2016) 170-185.

[10] P. Griffiths, C. Nendel, P. Hostert, Intra-annual reflectance composites from Sentinel-2 and Landsat for national-scale crop and land cover mapping, Remote Sensing of Environment 220 (2019) 135-151.

[11] R. Torres, P. Snoeij, D. Geudtner, D. Bibby, M. Davidson, E. Attema, P. Potin, B. Rommen, N. Floury, M. Brown, I. Traver, P. Deghaye, B. Duesmann, B. Rosich, N. Miranda, C. Bruno, M. L'Abbate, R. Croci, A. Pietropaolo, M. Huchler, F. Rostan, GMES Sentinel-1 mission, Remote Sensing of Environment 120 (2012) 9-24. doi:10.1016/j.rse.2011.05.028.

[12] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE 86 (11) (1998) 2278-2324.

[13] A. Krizhevsky, I. Sutskever, G. E. Hinton, ImageNet classification with deep convolutional neural networks, in: Advances in Neural Information Processing Systems, 2012, pp. 1097-1105.

[14] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, CoRR abs/1409.1556 (2014).

[15] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning (2016).

[16] W. Cohen, S. Goward, Landsat's role in ecological applications of remote sensing, BioScience 54 (6) (2004) 535-545. doi:10.1641/0006-3568(2004)054[0535:LRIEAO]2.0.CO;2.

[17] S. Goetz, A. Baccini, N. Laporte, T. Johns, W. Walker, J. Kellndorfer, R. Houghton, M. Sun, Mapping and monitoring carbon stocks with satellite observations: A comparison of methods, Carbon Balance and Management 4 (2009). doi:10.1186/1750-0680-4-2.

[18] C. Atzberger, Advances in remote sensing of agriculture: Context description, existing operational monitoring systems and major information needs, Remote Sensing 5 (2) (2013) 949-981. doi:10.3390/rs5020949.

[19] T. Häme, J. Kilpi, H. A. Ahola, Y. Rauste, O. Antropov, M. Rautiainen, L. Sirro, S. Bounpone, Improved mapping of tropical forests with optical and SAR imagery, part I: Forest cover and accuracy assessment using multi-resolution data, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 6 (1) (2013) 74-91.

[20] O. Antropov, Y. Rauste, H. Astola, J. Praks, T. Häme, M. Hallikainen, Land cover and soil type mapping from spaceborne PolSAR data at L-band with probabilistic neural network, IEEE Transactions on Geoscience and Remote Sensing 52 (9) (2014) 5256-5270. doi:10.1109/TGRS.2013.2287712.

[21] A. Lönnqvist, Y. Rauste, M. Molinier, T. Häme, Polarimetric SAR data in land cover mapping in boreal zone, IEEE Transactions on Geoscience and Remote Sensing 48 (10) (2010) 3652-3662. doi:10.1109/TGRS.2010.2048115.

[22] B. Waske, M. Braun, Classifier ensembles for land cover mapping using multitemporal SAR imagery, ISPRS Journal of Photogrammetry and Remote Sensing 64 (5) (2009) 450-457. doi:10.1016/j.isprsjprs.2009.01.003.

[23] L. Bruzzone, M. Marconcini, U. Wegmüller, A. Wiesmann, An advanced system for the automatic classification of multitemporal SAR images, IEEE Transactions on Geoscience and Remote Sensing 42 (6) (2004) 1321-1334. doi:10.1109/TGRS.2004.826821.

[24] T. Ullmann, A. Schmitt, A. Roth, J. Duffe, S. Dech, H.-W. Hubberten, R. Baumhauer, Land cover characterization and classification of arctic tundra environments by means of polarized synthetic aperture X- and C-band radar (PolSAR) and Landsat 8 multispectral imagery - Richards Island, Canada, Remote Sensing 6 (9) (2014) 8565-8593. doi:10.3390/rs6098565. URL http://www.mdpi.com/2072-4292/6/9/8565

[25] N. Clerici, C. A. V. Calderón, J. M. Posada, Fusion of Sentinel-1A and Sentinel-2A data for land cover mapping: a case study in the lower Magdalena region, Colombia, Journal of Maps 13 (2) (2017) 718-726. doi:10.1080/17445647.2017.1372316.

[26] C. Castañeda, D. Ducrot, Land cover mapping of wetland areas in an agricultural landscape using SAR and Landsat imagery, Journal of Environmental Management 90 (7) (2009) 2270-2277.

[27] Y. Ban, H. Hu, I. M. Rangel, Fusion of QuickBird MS and RADARSAT SAR data for urban land-cover mapping: Object-based and knowledge-based approach, International Journal of Remote Sensing 31 (6) (2010) 1391-1410.

[28] G. V. Laurin, V. Liesenberg, Q. Chen, L. Guerriero, F. Del Frate, A. Bartolini, D. Coomes, B. Wilebore, J. Lindsell, R. Valentini, Optical and SAR sensor synergies for forest and land cover mapping in a tropical site in West Africa, International Journal of Applied Earth Observation and Geoinformation 21 (2013) 7-16.

[29] R. Khatami, G. Mountrakis, S. V. Stehman, A meta-analysis of remote sensing research on supervised pixel-based land-cover image classification processes: General guidelines for practitioners and future research, Remote Sensing of Environment 177 (2016) 89-100.

[30] B. Waske, M. Braun, Classifier ensembles for land cover mapping using multitemporal SAR imagery, ISPRS Journal of Photogrammetry and Remote Sensing 64 (5) (2009) 450-457.

[31] H. Balzter, B. Cole, C. Thiel, C. Schmullius, Mapping CORINE land cover from Sentinel-1A SAR and SRTM digital elevation model data using random forests, Remote Sensing 7 (11) (2015) 14876-14898. doi:10.3390/rs71114876.

[32] S.-E. Park, Variations of microwave scattering properties by seasonal freeze/thaw transition in the permafrost active layer observed by ALOS PALSAR polarimetric data, Remote Sensing 7 (12) (2015) 17135-17148. doi:10.3390/rs71215874.

[33] M. C. Dobson, L. E. Pierce, F. T. Ulaby, Knowledge-based land-cover classification using ERS-1/JERS-1 SAR composites, IEEE Transactions on Geoscience and Remote Sensing 34 (1) (1996) 83-99. doi:10.1109/36.481896.

[34] L. Sirro, T. Häme, Y. Rauste, J. Kilpi, J. Hämäläinen, K. Gunia, B. de Jong, F. Paz Pellat, Potential of different optical and SAR data in forest and land cover classification to support REDD+ MRV, Remote Sensing 10 (6) (2018). doi:10.3390/rs10060942.

[35] J. D. T. De Alban, G. M. Connette, P. Oswald, E. L. Webb, Combined Landsat and L-band SAR data improves land cover classification and change detection in dynamic tropical landscapes, Remote Sensing 10 (2) (2018). doi:10.3390/rs10020306.

[36] N. Longepe, P. Rakwatin, O. Isoguchi, M. Shimada, Y. Uryu, K. Yulianto, Assessment of ALOS PALSAR 50 m orthorectified FBD data for regional land cover classification by support vector machines, IEEE Transactions on Geoscience and Remote Sensing 49 (6) (2011) 2135-2150. doi:10.1109/TGRS.2010.2102041.

[37] T. Esch, A. Schenk, T. Ullmann, M. Thiel, A. Roth, S. Dech, Characterization of land cover types in TerraSAR-X images by combined analysis of speckle statistics and intensity information, IEEE Transactions on Geoscience and Remote Sensing 49 (6) (2011) 1911-1925. doi:10.1109/TGRS.2010.2091644.

[38] J. W. Cable, J. M. Kovacs, J. Shang, X. Jiao, Multi-temporal polarimetric RADARSAT-2 for land cover monitoring in northeastern Ontario, Canada, Remote Sensing 6 (3) (2014) 2372-2392. doi:10.3390/rs6032372. URL http://www.mdpi.com/2072-4292/6/3/2372

[39] X. Niu, Y. Ban, Multi-temporal RADARSAT-2 polarimetric SAR data for urban land-cover classification using an object-based support vector machine and a rule-based approach, International Journal of Remote Sensing 34 (1) (2013) 1-26. doi:10.1080/01431161.2012.700133.

[40] T. L. Evans, M. Costa, K. Telmer, T. S. F. Silva, Using ALOS/PALSAR and RADARSAT-2 to map land cover and seasonal inundation in the Brazilian Pantanal, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 3 (4) (2010) 560-575. doi:10.1109/JSTARS.2010.2089042.

[41] P. Lumsdon, S. R. Cloude, G. Wright, Polarimetric classification of land cover for Glen Affric radar project, IEE Proceedings - Radar, Sonar and Navigation 152 (6) (2005) 404-412. doi:10.1049/ip-rsn:20041313.

[42] C. da Costa Freitas, L. de Souza Soler, S. J. S. Sant'Anna, L. V. Dutra, J. R. Dos Santos, J. C. Mura, A. H. Correia, Land use and land cover mapping in the Brazilian Amazon using polarimetric airborne P-band SAR data, IEEE Transactions on Geoscience and Remote Sensing 46 (10) (2008) 2956-2970.

[43] G. Li, D. Lu, E. Moran, L. Dutra, M. Batistella, A comparative analysis of ALOS PALSAR L-band and RADARSAT-2 C-band data for land-cover classification in a tropical moist region, ISPRS Journal of Photogrammetry and Remote Sensing 70 (2012) 26-38. doi:10.1016/j.isprsjprs.2012.03.010.

[44] N. Park, K. Chi, Integration of multitemporal/polarization C-band SAR data sets for land-cover classification, International Journal of Remote Sensing 29 (16) (2008) 4667-4688. doi:10.1080/01431160801947341.

[45] E. Tomppo, O. Antropov, J. Praks, Cropland classification using Sentinel-1 time series: Methodological performance and prediction uncertainty assessment, Remote Sensing 11 (21) (2019). doi:10.3390/rs11212480.

[46] D. B. Nguyen, A. Gruber, W. Wagner, Mapping rice extent and cropping scheme in the Mekong Delta using Sentinel-1A data, Remote Sensing Letters 7 (12) (2016) 1209-1218.

[47] A. Veloso, S. Mermoz, A. Bouvet, T. Le Toan, M. Planells, J.-F. Dejoux, E. Ceschia, Understanding the temporal behavior of crops using Sentinel-1 and Sentinel-2-like data for agricultural applications, Remote Sensing of Environment 199 (2017) 415-426.

[48] G. Satalino, A. Balenzano, F. Mattia, M. W. Davidson, C-band SAR data for mapping crops dominated by surface or volume scattering, IEEE Geoscience and Remote Sensing Letters 11 (2) (2013) 384-388.

[49] F. Vicente-Guijalba, A. Jacob, J. M. Lopez-Sanchez, C. Lopez-Martinez, J. Duro, C. Notarnicola, D. Ziolkowski, A. Mestre-Quereda, E. Pottier, J. J. Mallorquí, M. Lavalle, M. Engdahl, SInCohMap: Land-cover and vegetation mapping using multi-temporal Sentinel-1 interferometric coherence, in: IGARSS 2018 - 2018 IEEE International Geoscience and Remote Sensing Symposium, 2018, pp. 6631-6634. doi:10.1109/IGARSS.2018.8517926.

[50] S. Ge, O. Antropov, W. Su, H. Gu, J. Praks, Deep recurrent neural networks for land-cover classification using Sentinel-1 InSAR time series, in: IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium, 2019, pp. 473-476. doi:10.1109/IGARSS.2019.8900088.

[51] Y. LeCun, Y. Bengio, et al., Convolutional networks for images, speech, and time series, The Handbook of Brain Theory and Neural Networks 3361 (10) (1995) 1995.

[52] X. X. Zhu, D. Tuia, L. Mou, G.-S. Xia, L. Zhang, F. Xu, F. Fraundorfer, Deep learning in remote sensing: a review, arXiv preprint arXiv:1710.03959 (2017).

[53] M. Mahdianpari, B. Salehi, M. Rezaee, F. Mohammadimanesh, Y. Zhang, Very deep convolutional neural networks for complex land cover mapping using multispectral remote sensing imagery, Remote Sensing 10 (7) (2018) 1119.

[54] L. Zhang, L. Zhang, B. Du, Deep learning for remote sensing data: A technical tutorial on the state of the art, IEEE Geoscience and Remote Sensing Magazine 4 (2) (2016) 22-40.

[55] J. Zhang, P. Zhong, Y. Chen, S. Li, l1/2-regularized deconvolution network for the representation and restoration of optical remote sensing images, IEEE Transactions on Geoscience and Remote Sensing 52 (5) (2014) 2617-2627.

[56] X. Chen, S. Xiang, C.-L. Liu, C.-H. Pan, Aircraft detection by deep belief nets, in: Pattern Recognition (ACPR), 2013 2nd IAPR Asian Conference on, IEEE, 2013, pp. 54-58.

[57] X. Chen, S. Xiang, C.-L. Liu, C.-H. Pan, Vehicle detection in satellite images by hybrid deep convolutional neural networks, IEEE Geoscience and Remote Sensing Letters 11 (10) (2014) 1797-1801.

[58] Y. Liu, G. Cao, Q. Sun, M. Siegel, Hyperspectral classification via deep networks and superpixel segmentation, International Journal of Remote Sensing 36 (13) (2015) 3459-3482. doi:10.1080/01431161.2015.1055607.

[59] J. Wang, Q. Qin, Z. Li, X. Ye, J. Wang, X. Yang, X. Qin, Deep hierarchical representation and segmentation of high resolution remote sensing images, in: Geoscience and Remote Sensing Symposium (IGARSS), 2015 IEEE International, IEEE, 2015, pp. 4320-4323.

[60] D. Tuia, R. Flamary, N. Courty, Multiclass feature learning for hyperspectral image classification: Sparse and hierarchical solutions, ISPRS Journal of Photogrammetry and Remote Sensing 105 (2015) 272-285.

[61] F. Hu, G.-S. Xia, J. Hu, L. Zhang, Transferring deep convolutional neural networks for the scene classification of high-resolution remote sensing imagery, Remote Sensing 7 (11) (2015) 14680-14707.

[62] O. A. B. Penatti, K. Nogueira, J. A. dos Santos, Do deep features generalize from everyday objects to remote sensing and aerial scenes domains?, in: 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2015, pp. 44-51. doi:10.1109/CVPRW.2015.7301382.

[63] F. P. Luus, B. P. Salmon, F. Van den Bergh, B. T. J. Maharaj, Multiview deep learning for land-use classification, IEEE Geoscience and Remote Sensing Letters 12 (12) (2015) 2448-2452.

[64] F. Zhang, B. Du, L. Zhang, Scene classification via a gradient boosting random convolutional network framework, IEEE Transactions on Geoscience and Remote Sensing 54 (3) (2016) 1793-1802.

[65] T. Ishii, R. Nakamura, H. Nakada, Y. Mochizuki, H. Ishikawa, Surface object recognition with CNN and SVM in Landsat 8 images, in: 2015 14th IAPR International Conference on Machine Vision Applications (MVA), 2015, pp. 341-344. doi:10.1109/MVA.2015.7153200.

[66] N. Kussul, M. Lavreniuk, S. Skakun, A. Shelestov, Deep learning classification of land cover and crop types using remote sensing data, IEEE Geoscience and Remote Sensing Letters 14 (5) (2017) 778-782.

[67] Y. Chen, Z. Lin, X. Zhao, G. Wang, Y. Gu, Deep learning-based classification of hyperspectral data, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 7 (6) (2014) 2094-2107.

[68] G. Wu, X. Shao, Z. Guo, Q. Chen, W. Yuan, X. Shi, Y. Xu, R. Shibasaki, Automatic building segmentation of aerial imagery using multi-constraint fully convolutional networks, Remote Sensing 10 (3) (2018) 407.

[69] Y. Duan, F. Liu, L. Jiao, P. Zhao, L. Zhang, SAR image segmentation based on convolutional-wavelet neural network and Markov random field, Pattern Recognition 64 (2017) 255-267.

[70] F. Mohammadimanesh, B. Salehi, M. Mahdianpari, E. Gill, M. Molinier, A new fully convolutional neural network for semantic segmentation of polarimetric SAR imagery in complex land cover ecosystem, ISPRS Journal of Photogrammetry and Remote Sensing 151 (2019) 223-236.

[71] L. Wang, X. Xu, H. Dong, R. Gui, F. Pu, Multi-pixel simultaneous classification of PolSAR image using convolutional neural networks, Sensors 18 (3) (2018) 769.

[72] M. Ahishali, S. Kiranyaz, T. Ince, M. Gabbouj, Dual and single polarized SAR image classification using compact convolutional neural networks, Remote Sensing 11 (11) (2019) 1340.

[73] Z. Li, Z. Yang, H. Xiong, Homogeneous region segmentation for SAR images based on two steps segmentation algorithm, in: Computers, Communications and Systems (ICCCS), International Conference on, IEEE, 2015, pp. 196-200.

[74] V. Badrinarayanan, A. Kendall, R. Cipolla, SegNet: A deep convolutional encoder-decoder architecture for image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence 39 (12) (2017) 2481-2495.

[75] H. Zhao, J. Shi, X. Qi, X. Wang, J. Jia, Pyramid scene parsing network, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2881-2890.

[76] C. Yu, J. Wang, C. Peng, C. Gao, G. Yu, N. Sang, BiSeNet: Bilateral segmentation network for real-time semantic segmentation, in: Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 325-341.

[77] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, H. Adam, Encoder-decoder with atrous separable convolution for semantic image segmentation, arXiv preprint arXiv:1802.02611 (2018).

[78] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, A. L. Yuille, DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs, IEEE Transactions on Pattern Analysis and Machine Intelligence 40 (4) (2018) 834-848.

[79] O. Ronneberger, P. Fischer, T. Brox, U-Net: Convolutional networks for biomedical image segmentation, in: Medical Image Computing and Computer-Assisted Intervention (MICCAI), Vol. 9351 of LNCS, Springer, 2015, pp. 234-241, (available on arXiv:1505.04597 [cs.CV]). URL http://lmb.informatik.uni-freiburg.de/Publications/2015/RFB15a

[80] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, H. Adam, MobileNets: Efficient convolutional neural networks for mobile vision applications, arXiv preprint arXiv:1704.04861 (2017).

[81] T. Pohlen, A. Hermans, M. Mathias, B. Leibe, Full-resolution residual networks for semantic segmentation in street scenes, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4151-4160.

[82] S. Jégou, M. Drozdzal, D. Vazquez, A. Romero, Y. Bengio, The one hundred layers tiramisu: Fully convolutional DenseNets for semantic segmentation, in: Computer Vision and Pattern Recognition Workshops (CVPRW), 2017 IEEE Conference on, IEEE, 2017, pp. 1175-1183.

[83] O. Antropov, Y. Rauste, A. Lönnqvist, T. Häme, PolSAR mosaic normalization for improved land-cover mapping, IEEE Geoscience and Remote Sensing Letters 9 (6) (2012) 1074-1078.

[84] J. Long, E. Shelhamer, T. Darrell, Fully convolutional networks for semantic segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431-3440.

[85] D. H. Hubel, T. N. Wiesel, Receptive fields, binocular interaction and functional architecture in the cat's visual cortex, The Journal of Physiology 160 (1) (1962) 106-154.

[86] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al., ImageNet large scale visual recognition challenge, International Journal of Computer Vision 115 (3) (2015) 211-252.

[87] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770-778.

[88] G. Huang, Z. Liu, L. Van Der Maaten, K. Q. Weinberger, Densely connected convolutional networks, in: CVPR, Vol. 1, 2017, p. 3.

[89] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, Going deeper with convolutions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1-9.

[90] S. Ji, W. Xu, M. Yang, K. Yu, 3D convolutional neural networks for human action recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (1) (2013) 221-231.

[91] T. N. Sainath, A.-r. Mohamed, B. Kingsbury, B. Ramabhadran, Deep convolutional neural networks for LVCSR, in: Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, IEEE, 2013, pp. 8614-8618.

[92] D. Small, L. Zuberbühler, A. Schubert, E. Meier, Terrain-flattened gamma nought Radarsat-2 backscatter, Canadian Journal of Remote Sensing 37 (5) (2012) 493-499.

[93] P. Härmä, R. Teiniranta, M. Törmä, R. Repo, E. Järvenpää, M. Kallio, The production of Finnish CORINE land cover 2000 classification, XXth ISPRS Congress, Istanbul, Turkey (2004).

[94] P. Härmä, R. Teiniranta, M. Törmä, R. Repo, E. Järvenpää, M. Kallio, Finnish CORINE land cover 2000 classification, XXth ISPRS Congress, Anchorage, US (2004).

[95] A. Garcia-Garcia, S. Orts-Escolano, S. Oprea, V. Villena-Martinez, J. Garcia-Rodriguez, A review on deep learning techniques applied to semantic segmentation, arXiv preprint arXiv:1704.06857 (2017).

[96] F. Chollet, Xception: Deep learning with depthwise separable convolutions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1251-1258.

[97] O. Ronneberger, P. Fischer, T. Brox, U-Net: Convolutional networks for biomedical image segmentation, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2015, pp. 234-241.

[98] L.-C. Chen, G. Papandreou, F. Schroff, H. Adam, Rethinking atrous convolution for semantic image segmentation, arXiv preprint arXiv:1706.05587 (2017).

[99] Y. Bengio, Deep learning of representations for unsupervised and transfer learning, in: Proceedings of ICML Workshop on Unsupervised and Transfer Learning, 2012, pp. 17-36.

[100] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, et al., TensorFlow: A system for large-scale machine learning, in: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), 2016, pp. 265-283.

[101] H. Costa, G. M. Foody, D. S. Boyd, Supervised methods of image segmentation accuracy assessment in land cover mapping, Remote Sensing of Environment 205 (2018) 338-351.

[102] G. Csurka, D. Larlus, F. Perronnin, F. Meylan, What is a good evaluation measure for semantic segmentation?, in: BMVC, Vol. 27, Citeseer, 2013, p. 2013.

[103] J. Cohen, A coefficient of agreement for nominal scales, Educational and Psychological Measurement 20 (1) (1960) 37-46.

[104] O. Antropov, Y. Rauste, T. Häme, Volume scattering modeling in PolSAR decompositions: Study of ALOS PALSAR data over boreal forest, IEEE Transactions on Geoscience and Remote Sensing 49 (10) (2011) 3838-3848.

[26] C Castaneda D Ducrot Land cover mapping of wetland areas in anagricultural landscape using sar and landsat imagery Journal of Environ-mental Management 90 (7) (2009) 2270ndash2277

[27] Y Ban H Hu I M Rangel Fusion of quickbird ms and radarsat sardata for urban land-cover mapping Object-based and knowledge-basedapproach International Journal of Remote Sensing 31 (6) (2010) 1391ndash1410

[28] G V Laurin V Liesenberg Q Chen L Guerriero F Del Frate A Bar-tolini D Coomes B Wilebore J Lindsell R Valentini Optical andsar sensor synergies for forest and land cover mapping in a tropical sitein west africa International Journal of Applied Earth Observation andGeoinformation 21 (2013) 7ndash16

[29] R Khatami G Mountrakis S V Stehman A meta-analysis of remotesensing research on supervised pixel-based land-cover image classificationprocesses General guidelines for practitioners and future research Re-mote Sensing of Environment 177 (2016) 89ndash100

32

[30] B Waske M Braun Classifier ensembles for land cover mapping usingmultitemporal sar imagery ISPRS Journal of Photogrammetry and Re-mote Sensing 64 (5) (2009) 450ndash457

[31] H Balzter B Cole C Thiel C Schmullius Mapping CORINE landcover from Sentinel-1A SAR and SRTM digital elevation model data usingrandom forests Remote Sensing 7 (11) (2015) 14876ndash14898 doi10

3390rs71114876

[32] S-E Park Variations of microwave scattering properties by seasonalfreezethaw transition in the permafrost active layer observed by ALOSPALSAR polarimetric data Remote Sensing 7 (12) (2015) 17135ndash17148doi103390rs71215874

[33] M C Dobson L E Pierce F T Ulaby Knowledge-based land-coverclassification using ERS-1JERS-1 SAR composites IEEE Transactionson Geoscience and Remote Sensing 34 (1) (1996) 83ndash99 doi101109

36481896

[34] L Sirro T Hme Y Rauste J Kilpi J Hmlinen K Gunia B de JongF Paz Pellat Potential of different optical and SAR data in forest andland cover classification to support redd+ mrv Remote Sensing 10 (6)(2018) doi103390rs10060942

[35] J D T De Alban G M Connette P Oswald E L Webb CombinedLandsat and L-band SAR data improves land cover classification andchange detection in dynamic tropical landscapes Remote Sensing 10 (2)(2018) doi103390rs10020306

[36] N Longepe P Rakwatin O Isoguchi M Shimada Y Uryu K YuliantoAssessment of alos palsar 50 m orthorectified fbd data for regional landcover classification by support vector machines IEEE Transactions onGeoscience and Remote Sensing 49 (6) (2011) 2135ndash2150 doi101109

TGRS20102102041

[37] T Esch A Schenk T Ullmann M Thiel A Roth S Dech Char-acterization of land cover types in terrasar-x images by combined anal-ysis of speckle statistics and intensity information IEEE Transactionson Geoscience and Remote Sensing 49 (6) (2011) 1911ndash1925 doi

101109TGRS20102091644

[38] J W Cable J M Kovacs J Shang X Jiao Multi-temporal polarimet-ric radarsat-2 for land cover monitoring in northeastern ontario canadaRemote Sensing 6 (3) (2014) 2372ndash2392 doi103390rs6032372URL httpwwwmdpicom2072-4292632372

[39] X Niu Y Ban Multi-temporal radarsat-2 polarimetric sar data for urbanland-cover classification using an object-based support vector machine anda rule-based approach International Journal of Remote Sensing 34 (1)(2013) 1ndash26 doi101080014311612012700133

33

[40] T L Evans M Costa K Telmer T S F Silva Using alospalsar andradarsat-2 to map land cover and seasonal inundation in the brazilianpantanal IEEE Journal of Selected Topics in Applied Earth Observationsand Remote Sensing 3 (4) (2010) 560ndash575 doi101109JSTARS2010

2089042

[41] P Lumsdon S R Cloude G Wright Polarimetric classification of landcover for Glen Affric radar project IEE Proceedings - Radar Sonar andNavigation 152 (6) (2005) 404ndash412 doi101049ip-rsn20041313

[42] C da Costa Freitas L de Souza Soler S J S SantrsquoAnna L V DutraJ R Dos Santos J C Mura A H Correia Land use and land cover map-ping in the brazilian amazon using polarimetric airborne p-band sar dataIEEE Transactions on Geoscience and Remote Sensing 46 (10) (2008)2956ndash2970

[43] G Li D Lu E Moran L Dutra M Batistella A comparative analysis ofalos palsar l-band and radarsat-2 c-band data for land-cover classificationin a tropical moist region ISPRS Journal of Photogrammetry and RemoteSensing 70 (2012) 26ndash38 doi101016jisprsjprs201203010

[44] N Park K Chi Integration of multitemporalpolarization cband sar datasets for landcover classification International Journal of Remote Sensing29 (16) (2008) 4667ndash4688 doi10108001431160801947341

[45] E Tomppo O Antropov J Praks Cropland classification using Sentinel-1 time series Methodological performance and prediction uncertainty as-sessment Remote Sensing 11 (21) (2019) doi103390rs11212480

[46] D B Nguyen A Gruber W Wagner Mapping rice extent and croppingscheme in the mekong delta using sentinel-1a data Remote Sensing Letters7 (12) (2016) 1209ndash1218

[47] A Veloso S Mermoz A Bouvet T Le Toan M Planells J-F DejouxE Ceschia Understanding the temporal behavior of crops using Sentinel-1 and Sentinel-2-like data for agricultural applications Remote Sensing ofEnvironment 199 (2017) 415ndash426

[48] G Satalino A Balenzano F Mattia M W Davidson C-band SARdata for mapping crops dominated by surface or volume scattering IEEEGeoscience and Remote Sensing Letters 11 (2) (2013) 384ndash388

[49] F Vicente-Guijalba A Jacob J M Lopez-Sanchez C Lopez-MartinezJ Duro C Notarnicola D Ziolkowski A Mestre-Quereda E PottierJ J Mallorqu M Lavalle M Engdahl Sincohmap Land-cover and veg-etation mapping using multi-temporal Sentinel-1 interferometric coher-ence in IGARSS 2018 - 2018 IEEE International Geoscience and RemoteSensing Symposium 2018 pp 6631ndash6634 doi101109IGARSS2018

8517926

34

[50] S Ge O Antropov W Su H Gu J Praks Deep recurrent neural net-works for land-cover classification using Sentinel-1 InSAR time series inIGARSS 2019 - 2019 IEEE International Geoscience and Remote SensingSymposium 2019 pp 473ndash476 doi101109IGARSS20198900088

[51] Y LeCun Y Bengio et al Convolutional networks for images speechand time series The handbook of brain theory and neural networks3361 (10) (1995) 1995

[52] X X Zhu D Tuia L Mou G-S Xia L Zhang F Xu F Fraun-dorfer Deep learning in remote sensing a review arXiv preprintarXiv171003959 (2017)

[53] M Mahdianpari B Salehi M Rezaee F Mohammadimanesh Y ZhangVery deep convolutional neural networks for complex land cover mappingusing multispectral remote sensing imagery Remote Sensing 10 (7) (2018)1119

[54] L Zhang L Zhang B Du Deep learning for remote sensing data Atechnical tutorial on the state of the art IEEE Geoscience and RemoteSensing Magazine 4 (2) (2016) 22ndash40

[55] J Zhang P Zhong Y Chen S Li l 12-regularized deconvolutionnetwork for the representation and restoration of optical remote sensingimages IEEE Transactions on Geoscience and Remote Sensing 52 (5)(2014) 2617ndash2627

[56] X Chen S Xiang C-L Liu C-H Pan Aircraft detection by deep beliefnets in Pattern Recognition (ACPR) 2013 2nd IAPR Asian Conferenceon IEEE 2013 pp 54ndash58

[57] X Chen S Xiang C-L Liu C-H Pan Vehicle detection in satelliteimages by hybrid deep convolutional neural networks IEEE Geoscienceand remote sensing letters 11 (10) (2014) 1797ndash1801

[58] Y Liu G Cao Q Sun M Siegel Hyperspectral classification viadeep networks and superpixel segmentation International Journal of Re-mote Sensing 36 (13) (2015) 3459ndash3482 doi101080014311612015

1055607

[59] J Wang Q Qin Z Li X Ye J Wang X Yang X Qin Deep hierarchicalrepresentation and segmentation of high resolution remote sensing imagesin Geoscience and Remote Sensing Symposium (IGARSS) 2015 IEEEInternational IEEE 2015 pp 4320ndash4323

[60] D Tuia R Flamary N Courty Multiclass feature learning for hyperspec-tral image classification Sparse and hierarchical solutions ISPRS Journalof Photogrammetry and Remote Sensing 105 (2015) 272ndash285

35

[61] F Hu G-S Xia J Hu L Zhang Transferring deep convolutional neu-ral networks for the scene classification of high-resolution remote sensingimagery Remote Sensing 7 (11) (2015) 14680ndash14707

[62] O A B Penatti K Nogueira J A dos Santos Do deep features gener-alize from everyday objects to remote sensing and aerial scenes domainsin 2015 IEEE Conference on Computer Vision and Pattern Recogni-tion Workshops (CVPRW) 2015 pp 44ndash51 doi101109CVPRW20157301382

[63] F P Luus B P Salmon F Van den Bergh B T J Maharaj Multiviewdeep learning for land-use classification IEEE Geoscience and RemoteSensing Letters 12 (12) (2015) 2448ndash2452

[64] F Zhang B Du L Zhang Scene classification via a gradient boostingrandom convolutional network framework IEEE Transactions on Geo-science and Remote Sensing 54 (3) (2016) 1793ndash1802

[65] T Ishii R Nakamura H Nakada Y Mochizuki H Ishikawa Surfaceobject recognition with cnn and svm in landsat 8 images in 2015 14thIAPR International Conference on Machine Vision Applications (MVA)2015 pp 341ndash344 doi101109MVA20157153200

[66] N Kussul M Lavreniuk S Skakun A Shelestov Deep learning clas-sification of land cover and crop types using remote sensing data IEEEGeoscience and Remote Sensing Letters 14 (5) (2017) 778ndash782

[67] Y Chen Z Lin X Zhao G Wang Y Gu Deep learning-based classifi-cation of hyperspectral data IEEE Journal of Selected topics in appliedearth observations and remote sensing 7 (6) (2014) 2094ndash2107

[68] G Wu X Shao Z Guo Q Chen W Yuan X Shi Y Xu R ShibasakiAutomatic building segmentation of aerial imagery using multi-constraintfully convolutional networks Remote Sensing 10 (3) (2018) 407

[69] Y Duan F Liu L Jiao P Zhao L Zhang Sar image segmentationbased on convolutional-wavelet neural network and markov random fieldPattern Recognition 64 (2017) 255ndash267

[70] F Mohammadimanesh B Salehi M Mahdianpari E Gill M MolinierA new fully convolutional neural network for semantic segmentation of po-larimetric SAR imagery in complex land cover ecosystem ISPRS Journalof Photogrammetry and Remote Sensing 151 (2019) 223 ndash 236

[71] L Wang X Xu H Dong R Gui F Pu Multi-pixel simultaneous classifi-cation of polsar image using convolutional neural networks Sensors 18 (3)(2018) 769

36

[72] M Ahishali S Kiranyaz T Ince M Gabbouj Dual and single polarizedSAR image classification using compact convolutional neural networksRemote Sensing 11 (11) (2019) 1340

[73] Z Li Z Yang H Xiong Homogeneous region segmentation for SARimages based on two steps segmentation algorithm in Computers Com-munications and Systems (ICCCS) International Conference on IEEE2015 pp 196ndash200

[74] V Badrinarayanan A Kendall R Cipolla Segnet A deep convolutionalencoder-decoder architecture for image segmentation IEEE transactionson pattern analysis and machine intelligence 39 (12) (2017) 2481ndash2495

[75] H Zhao J Shi X Qi X Wang J Jia Pyramid scene parsing networkin Proceedings of the IEEE conference on computer vision and patternrecognition 2017 pp 2881ndash2890

[76] C Yu J Wang C Peng C Gao G Yu N Sang Bisenet Bilateralsegmentation network for real-time semantic segmentation in Proceed-ings of the European Conference on Computer Vision (ECCV) 2018 pp325ndash341

[77] L-C Chen Y Zhu G Papandreou F Schroff H Adam Encoder-decoder with atrous separable convolution for semantic image segmen-tation arXiv preprint arXiv180202611 (2018)

[78] L-C Chen G Papandreou I Kokkinos K Murphy A L YuilleDeeplab Semantic image segmentation with deep convolutional netsatrous convolution and fully connected CRFs IEEE transactions on pat-tern analysis and machine intelligence 40 (4) (2018) 834ndash848

[79] O Ronneberger PFischer T Brox U-net Convolutional networksfor biomedical image segmentation in Medical Image Computing andComputer-Assisted Intervention (MICCAI) Vol 9351 of LNCS Springer2015 pp 234ndash241 (available on arXiv150504597 [csCV])URL httplmbinformatikuni-freiburgdePublications2015

RFB15a

[80] A G Howard M Zhu B Chen D Kalenichenko W Wang T WeyandM Andreetto H Adam Mobilenets Efficient convolutional neural net-works for mobile vision applications arXiv preprint arXiv170404861(2017)

[81] T Pohlen A Hermans M Mathias B Leibe Full-resolution residualnetworks for semantic segmentation in street scenes in Proceedings ofthe IEEE Conference on Computer Vision and Pattern Recognition 2017pp 4151ndash4160

37

[82] S Jegou M Drozdzal D Vazquez A Romero Y Bengio The onehundred layers tiramisu Fully convolutional densenets for semantic seg-mentation in Computer Vision and Pattern Recognition Workshops(CVPRW) 2017 IEEE Conference on IEEE 2017 pp 1175ndash1183

[83] O Antropov Y Rauste A Lonnqvist T Hame PolSAR mosaic normal-ization for improved land-cover mapping IEEE Geoscience and RemoteSensing Letters 9 (6) (2012) 1074ndash1078

[84] J Long E Shelhamer T Darrell Fully convolutional networks for se-mantic segmentation in Proceedings of the IEEE conference on computervision and pattern recognition 2015 pp 3431ndash3440

[85] D H Hubel T N Wiesel Receptive fields binocular interaction andfunctional architecture in the catrsquos visual cortex The Journal of physiol-ogy 160 (1) (1962) 106ndash154

[86] O Russakovsky J Deng H Su J Krause S Satheesh S Ma Z HuangA Karpathy A Khosla M Bernstein et al Imagenet large scale visualrecognition challenge International journal of computer vision 115 (3)(2015) 211ndash252

[87] K He X Zhang S Ren J Sun Deep residual learning for image recog-nition in Proceedings of the IEEE conference on computer vision andpattern recognition 2016 pp 770ndash778

[88] G Huang Z Liu L Van Der Maaten K Q Weinberger Densely con-nected convolutional networks in CVPR Vol 1 2017 p 3

[89] C Szegedy W Liu Y Jia P Sermanet S Reed D Anguelov D Er-han V Vanhoucke A Rabinovich Going deeper with convolutions inProceedings of the IEEE conference on computer vision and pattern recog-nition 2015 pp 1ndash9

[90] S Ji W Xu M Yang K Yu 3D convolutional neural networks for humanaction recognition IEEE transactions on pattern analysis and machineintelligence 35 (1) (2013) 221ndash231

[91] T N Sainath A-r Mohamed B Kingsbury B Ramabhadran Deepconvolutional neural networks for LVCSR in Acoustics speech and signalprocessing (ICASSP) 2013 IEEE international conference on IEEE 2013pp 8614ndash8618

[92] D Small L Zuberbuhler A Schubert E Meier Terrain-flattened gammanought Radarsat-2 backscatter Canadian Journal of Remote Sensing37 (5) (2012) 493ndash499

[93] P Hrm R Teiniranta M Trm R Repo E Jrvenp M Kallio Theproduction of finnish corine land cover 2000 classification XXth ISPRSCongress Istanbul Turkey (2004)

38

[94] P Hrm R Teiniranta M Trm R Repo E Jrvenp M Kallio Finnishcorine land cover 2000 classification XXth ISPRS Congress AnchorageUS (2004)

[95] A Garcia-Garcia S Orts-Escolano S Oprea V Villena-MartinezJ Garcia-Rodriguez A review on deep learning techniques applied tosemantic segmentation arXiv preprint arXiv170406857 (2017)

[96] F Chollet Xception Deep learning with depthwise separable convolu-tions in Proceedings of the IEEE conference on computer vision andpattern recognition 2017 pp 1251ndash1258

[97] O Ronneberger P Fischer T Brox U-net Convolutional networks forbiomedical image segmentation in International Conference on Medicalimage computing and computer-assisted intervention Springer 2015 pp234ndash241

[98] L-C Chen G Papandreou F Schroff H Adam Rethinkingatrous convolution for semantic image segmentation arXiv preprintarXiv170605587 (2017)

[99] Y Bengio Deep learning of representations for unsupervised and trans-fer learning in Proceedings of ICML Workshop on Unsupervised andTransfer Learning 2012 pp 17ndash36

[100] M Abadi P Barham J Chen Z Chen A Davis J Dean M DevinS Ghemawat G Irving M Isard et al Tensorflow A system for large-scale machine learning in 12th USENIX Symposium on OperatingSystems Design and Implementation (OSDI 16) 2016 pp 265ndash283

[101] H Costa G M Foody D S Boyd Supervised methods of image seg-mentation accuracy assessment in land cover mapping Remote sensing ofenvironment 205 (2018) 338ndash351

[102] G Csurka D Larlus F Perronnin F Meylan What is a good evaluationmeasure for semantic segmentation in BMVC Vol 27 Citeseer 2013p 2013

[103] J Cohen A coefficient of agreement for nominal scales Educational andPsychological Measurement 20 (1) (1960) 37 ndash 46

[104] O Antropov Y Rauste T Hame Volume scattering modeling in PolSARdecompositions Study of ALOS PALSAR data over boreal forest IEEETransactions on Geoscience and Remote Sensing 49 (10) (2011) 3838ndash3848

39


Page 27: Wide-Area Land Cover Mapping with Sentinel-1 Imagery using … · 2020-02-27 · Wide-Area Land Cover Mapping with Sentinel-1 Imagery using Deep Learning Semantic Segmentation Models

ferences between the urban class as it was in 2012 and in 2015-2016 Secondthe CORINE map itself does not have a perfect accuracy neither aggregationrules are perfect As a matter of fact in majority of studies where SAR basedclassification was done versus CLC or similar data a poor or modest overallagreement was observed for this class [21 41 83 20] while the userrsquos accuracywas strongly higher than producerrsquos [104] The latter is exactly due to radarbeing able to sense sharp boundaries and bright targets very well whereas suchbright targets often donrsquot dominate the whole CORINE Level-1 urban classWe argue that any inaccuracies present will be particularly attenuated in ourmodels for the urban class because of the sharp and sudden boundary changesin this class unlike for the others such as forest and water The top performingmodel ie FC-DenseNet performed the best across all the classes It is par-ticularly notable that it achieved the user accuracy ie precision for the urbanclass of 62 improving on it significantly compared to all the other modelsNevertheless its score on the producer accuracy ie recall on this class of 27is outperformed by the two other top models ie SegNet and FRRN-B

We mentioned the issue of SAR backscattering sensitivity to several ground factors, so that the same classes might appear differently in the images between countries or between distant areas within a country. An interesting indication of our study, however, is that the deep learning models might be able to deal with this issue. Namely, we used the models pre-trained on ImageNet and fine-tuned them with a relatively small number (14) of Sentinel-1 scenes. The models learned to recognize varying types of the backscattered signal across the country of Finland. This indicates that, with a similar type of fine-tuning, the present models could be relatively easily adapted to other areas and countries with different SAR backscattering patterns. Such robustness and adaptability of the deep learning models come from their automatic learning of feature representations, without the need for a human to pre-define those features.
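To make the fine-tuning step concrete, the following minimal Python/TensorFlow sketch attaches a simple learned-upsampling decoder to an ImageNet-pretrained ResNet50 encoder and fine-tunes it for the five Level-1 classes. This is an illustration of the transfer-learning idea only, not the implementation used in this study (our architectures are those described in Section 3.4), and it assumes the SAR bands have been composited into three channels so that the pretrained weights can be reused:

import tensorflow as tf

NUM_CLASSES = 5  # Level-1 CORINE classes

# ImageNet-pretrained encoder; assumes a 3-channel SAR composite.
encoder = tf.keras.applications.ResNet50(
    include_top=False, weights="imagenet", input_shape=(512, 512, 3))

x = encoder.output  # 16 x 16 x 2048 feature map for a 512 x 512 input
for filters in (512, 256, 128, 64, 32):
    # Simple decoder: transposed convolutions back to full resolution.
    x = tf.keras.layers.Conv2DTranspose(
        filters, 3, strides=2, padding="same", activation="relu")(x)
probs = tf.keras.layers.Conv2D(NUM_CLASSES, 1, activation="softmax")(x)
model = tf.keras.Model(encoder.input, probs)

encoder.trainable = False  # warm up the decoder on frozen features first
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, validation_data=dev_ds, epochs=5)
# encoder.trainable = True  # then unfreeze and fine-tune end-to-end

Freezing the encoder at first and unfreezing it later is a common safeguard against destroying the pretrained features with large early gradients; with only 14 scenes of training data, such staging matters more than usual.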

4.2. Computational Performance

The training times with our hardware configuration ranged from 6 days up to 2 weeks for the different models. This could be significantly improved by training each model on a multi-GPU system instead of the single GPU we used.
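As a sketch of how such multi-GPU training could look with the TensorFlow framework we used [100], the snippet below wraps model construction in a MirroredStrategy scope; build_model here is a trivial placeholder standing in for any of the benchmarked architectures:

import tensorflow as tf

def build_model():
    # Placeholder for any of the benchmarked segmentation models.
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(16, 3, padding="same", activation="relu",
                               input_shape=(512, 512, 3)),
        tf.keras.layers.Conv2D(5, 1, activation="softmax"),
    ])

# Data-parallel training across all visible GPUs; gradients are
# averaged across replicas automatically.
strategy = tf.distribute.MirroredStrategy()
print("Replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    model = build_model()
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy")

# Scale the global batch size with the number of replicas:
# model.fit(train_ds.batch(8 * strategy.num_replicas_in_sync), epochs=...)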

In terms of inference time, we also observed differences in performance. In Table 4, we present the average inference time over the 512 px x 512 px imagelets that we worked with. The results show that there is a trade-off between classification and computational performance: the best models in terms of classification results (i.e., FC-DenseNet and FRRN-B) take several times longer at inference than the rest. Depending on the application, this might not be of particular importance.
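Per-imagelet timings of the kind reported in Table 4 can be reproduced with a simple loop such as the Python sketch below (our own measurement code may differ; the warm-up convention and tile shapes are assumptions):

import time
import numpy as np

def mean_inference_time(model, imagelets, warmup=3):
    """Average prediction time per 512 x 512 imagelet, in seconds."""
    for tile in imagelets[:warmup]:
        model.predict(tile[np.newaxis, ...], verbose=0)  # exclude startup cost
    start = time.perf_counter()
    for tile in imagelets:
        model.predict(tile[np.newaxis, ...], verbose=0)
    return (time.perf_counter() - start) / len(imagelets)

# Example:
# tiles = [np.random.rand(512, 512, 3).astype("float32") for _ in range(10)]
# print(mean_inference_time(model, tiles))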

4.3. Comparison to Similar Work

The obtained results compare favourably to previous similar studies on land cover classification with SAR data [83, 20, 21, 41, 28, 31]. Depending on the level of class aggregation (4-5 major classes or more), and using mostly statistical or classical machine learning approaches, the reported classification accuracies ranged from as high as 80-87% down to as low as 30% when only SAR imagery was used.

Two recent studies that applied neural networks to SAR imagery classification (albeit in combination with satellite optical data) for land cover mapping are [28] and [66], with reported classification accuracies of up to 97.5% and 94.6%, respectively.

The best model in our experiments achieved an overall accuracy of 90.7%. However, our results were obtained using solely SAR imagery. In contrast, SAR imagery (PALSAR) alone yielded an overall accuracy of 78.1% in [28]. The types of classes they studied also differ from ours (crops versus vegetation versus land cover types), and our study was performed over a larger area. Importantly, the previous studies applied different types of models (regular NNs versus CNNs versus semantic segmentation). In particular, the CNN models work on 7 x 7 resolution windows, whereas we applied more advanced semantic segmentation models that work at the level of a single pixel. Keeping in mind the finding from [28] that adding optical images on top of SAR improved the results by over 10%, we expect that our models would perform comparably well or outperform these previous works if applied to combined SAR and optical imagery.

In terms of the deep learning setup, the studies most similar to ours are [53] and [70]. However, [53] used RapidEye optical imagery at 5 m spatial resolution, and its test site was considerably smaller. Study [70], like our research, relied exclusively on SAR imagery, albeit fully polarimetric images acquired by RADARSAT-2 at a considerably better resolution. The authors developed an FCN-type semantic segmentation model 'specifically designed for the classification of wetland complexes using PolSAR imagery'. Using this model to classify eight wetland map classes, they achieved an overall accuracy of 93%. However, because their model is designed specifically for wetland complexes, it is not clear whether it would generalize to other types of areas. Compared to our study, they focused on a considerably smaller area (nearly the size of a single imagelet we used) and on a very specific task (mapping wetland types). Thus, it is not readily clear how general their approach is and how it compares to the approach presented here.

4.4. Outlook and Future Work

There are several lines of potential improvement based on the results of this study, as well as future work directions.

First, using an even larger set of Sentinel-1 images can be recommended, since large amounts of data are crucial for supervised deep learning models. Here, we processed only 6,888 imagelets altogether, whereas deep learning algorithms typically become efficient only once they are trained with hundreds of thousands or millions of images.

Second, if SAR images and reference data of a higher resolution are used, we expect better classification performance as well, since smaller details could potentially be captured. Better agreement in the acquisition timing of reference and SAR imagery can also be recommended: the reference and training data should come from the same months or year if possible, and the reference maps should represent reality as accurately as possible. The models in our experiments were certainly limited by CORINE's own limited accuracy.

Third, in this study we tested the effectiveness of off-the-shelf deep learning models for land cover mapping from SAR data. While the results show their effectiveness, it is also likely that novel types of models developed specifically for radar data (such as [70]) will yield even better results. Based on our results, we suggest DenseNet-based models as a starting point. In particular, one could develop deep learning models that handle SLC data directly, as these preserve the phase information.

Focusing on a single season is both an advantage and a limitation. Importantly, we avoided confusion between SAR signatures that vary seasonally for several land cover classes. However, multitemporal dynamics can itself potentially be used as an additional, useful class-discriminating parameter. Incorporating the seasonal dynamics of each land cover pixel (as a time series) is left for future work, perhaps with the additional need to incorporate recurrent neural networks into the approach; a sketch of what such an extension might look like is given below.
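As a rough illustration (not a design we have evaluated), a recurrent extension could feed a stack of co-registered seasonal SAR composites into a convolutional-recurrent layer. The following Python/TensorFlow sketch is illustrative only; the input shape, layer sizes, and two-band (VV/VH) assumption are ours:

import tensorflow as tf

# Input: a time series of T co-registered SAR composites per tile,
# shaped (batch, T, height, width, channels), e.g. VV and VH bands.
inputs = tf.keras.Input(shape=(None, 512, 512, 2))
# ConvLSTM2D summarizes the temporal dynamics of each pixel neighbourhood.
x = tf.keras.layers.ConvLSTM2D(32, 3, padding="same",
                               return_sequences=False)(inputs)
# Per-pixel probabilities over the five Level-1 classes.
outputs = tf.keras.layers.Conv2D(5, 1, activation="softmax")(x)
temporal_model = tf.keras.Model(inputs, outputs)
temporal_model.summary()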

As discussed in Section 3.3, it could be suitable to use more detailed (specific) land cover classes, as the aggregation of smaller LC classes into Level-1 CORINE classes is not exactly ecological, mixing several distinct SAR signatures within one class and thus causing additional confusion for the classifier. The classified specific classes can later be aggregated into larger classes, potentially showing improved performance [19]; a sketch of such post-classification aggregation is given below.
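As a simple illustration of such post-classification aggregation, the following Python sketch remaps a per-pixel map of detailed class codes to Level-1 labels. The two-digit codes follow the CORINE Level-2 numbering, but the mapping shown here is abbreviated and illustrative rather than the full CORINE nomenclature:

import numpy as np

# Illustrative mapping from detailed class codes to the five
# Level-1 CORINE classes (labelled 0-4 here).
DETAILED_TO_LEVEL1 = {
    11: 0, 12: 0, 13: 0, 14: 0,  # artificial surfaces
    21: 1, 22: 1, 23: 1, 24: 1,  # agricultural areas
    31: 2, 32: 2, 33: 2,         # forests and semi-natural areas
    41: 3, 42: 3,                # wetlands
    51: 4, 52: 4,                # water bodies
}

def aggregate_to_level1(pred_map):
    """Remap a per-pixel array of detailed class codes to Level-1 labels."""
    out = np.full_like(pred_map, -1)  # -1 marks codes left unmapped
    for detailed, level1 in DETAILED_TO_LEVEL1.items():
        out[pred_map == detailed] = level1
    return out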

Finally, we used only SAR images and a freely available DEM model for the presented large-scale land cover mapping. If one were to combine other types of remote sensing images, in particular optical images, we expect that the results would significantly improve. This holds for those areas where such imagery can be collected despite cloud coverage, while an operational scenario would potentially require the use of at least two models (with and without optical satellite imagery). It is also important to assess the added value of SAR imagery with deep learning models when optical satellite images are available, as well as possible data fusion and decision fusion scenarios, before a decision on the mapping approach is made [19].

5. Conclusion

Our study demonstrated the potential of applying state-of-the-art semantic segmentation models to SAR image classification with high accuracy. Several models were benchmarked in a countrywide classification experiment using Sentinel-1 IW-mode SAR data, reaching nearly 91% overall classification accuracy with the best-performing model (FC-DenseNet). Given that the 14 used Sentinel-1 scenes resulted in about 7,000 training images, this indicates strong potential for using pre-trained CNNs for further fine-tuning, and seems particularly suitable when the number of training images is limited (to thousands or tens of thousands instead of millions). In addition to suggesting the best candidate semantic segmentation models for land cover mapping with SAR data (namely, the DenseNet-based models), our study offers baseline results against which newly proposed models should be evaluated. Several possible improvements were identified for future work, including the testing of multitemporal approaches, data fusion, and very high-resolution SAR imagery, as well as the development of models designed specifically for SAR; these will be addressed in future work.

References

[1] S. Bojinski, M. Verstraete, T. C. Peterson, C. Richter, A. Simmons, M. Zemp, The concept of essential climate variables in support of climate research, applications, and policy, Bulletin of the American Meteorological Society 95 (9) (2014) 1431-1443.

[2] G. Büttner, J. Feranec, G. Jaffrain, L. Mari, G. Maucha, T. Soukup, The CORINE land cover 2000 project, EARSeL eProceedings 3 (3) (2004) 331-346.

[3] M. Bossard, J. Feranec, J. Otahel, et al., CORINE land cover technical guide: Addendum 2000 (2000).

[4] G. Büttner, CORINE land cover and land cover change products, in: Land Use and Land Cover Mapping in Europe, Springer, 2014, pp. 55-74.

[5] M. Törmä, T. Markkanen, S. Hatunen, P. Härmä, O.-P. Mattila, A. Arslan, Assessment of land-cover data for land-surface modelling in regional climate studies, Boreal Environment Research 20 (2) (2015) 243-260.

[6] J. Chen, J. Chen, A. Liao, X. Cao, L. Chen, X. Chen, C. He, G. Han, S. Peng, M. Lu, et al., Global land cover mapping at 30 m resolution: A POK-based operational approach, ISPRS Journal of Photogrammetry and Remote Sensing 103 (2015) 7-27.

[7] C. A. d. Almeida, A. C. Coutinho, J. C. D. M. Esquerdo, M. Adami, A. Venturieri, C. G. Diniz, N. Dessay, L. Durieux, A. R. Gomes, High spatial resolution land use and land cover mapping of the Brazilian Legal Amazon in 2008 using Landsat-5/TM and MODIS data, Acta Amazonica 46 (3) (2016) 291-302.

[8] C. Homer, J. Dewitz, L. Yang, S. Jin, P. Danielson, G. Xian, J. Coulston, N. Herold, J. Wickham, K. Megown, Completion of the 2011 National Land Cover Database for the conterminous United States - representing a decade of land cover change information, Photogrammetric Engineering & Remote Sensing 81 (5) (2015) 345-354.

[9] Y. Zhao, D. Feng, L. Yu, X. Wang, Y. Chen, Y. Bai, H. J. Hernández, M. Galleguillos, C. Estades, G. S. Biging, et al., Detailed dynamic land cover mapping of Chile: Accuracy improvement by integrating multi-temporal data, Remote Sensing of Environment 183 (2016) 170-185.

[10] P. Griffiths, C. Nendel, P. Hostert, Intra-annual reflectance composites from Sentinel-2 and Landsat for national-scale crop and land cover mapping, Remote Sensing of Environment 220 (2019) 135-151.

[11] R. Torres, P. Snoeij, D. Geudtner, D. Bibby, M. Davidson, E. Attema, P. Potin, B. Rommen, N. Floury, M. Brown, I. Traver, P. Deghaye, B. Duesmann, B. Rosich, N. Miranda, C. Bruno, M. L'Abbate, R. Croci, A. Pietropaolo, M. Huchler, F. Rostan, GMES Sentinel-1 mission, Remote Sensing of Environment 120 (2012) 9-24. doi:10.1016/j.rse.2011.05.028.

[12] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE 86 (11) (1998) 2278-2324.

[13] A. Krizhevsky, I. Sutskever, G. E. Hinton, ImageNet classification with deep convolutional neural networks, in: Advances in Neural Information Processing Systems, 2012, pp. 1097-1105.

[14] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, CoRR abs/1409.1556 (2014).

[15] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning (2016).

[16] W. Cohen, S. Goward, Landsat's role in ecological applications of remote sensing, BioScience 54 (6) (2004) 535-545. doi:10.1641/0006-3568(2004)054[0535:LRIEAO]2.0.CO;2.

[17] S. Goetz, A. Baccini, N. Laporte, T. Johns, W. Walker, J. Kellndorfer, R. Houghton, M. Sun, Mapping and monitoring carbon stocks with satellite observations: A comparison of methods, Carbon Balance and Management 4 (2009). doi:10.1186/1750-0680-4-2.

[18] C. Atzberger, Advances in remote sensing of agriculture: Context description, existing operational monitoring systems and major information needs, Remote Sensing 5 (2) (2013) 949-981. doi:10.3390/rs5020949.

[19] T. Häme, J. Kilpi, H. A. Ahola, Y. Rauste, O. Antropov, M. Rautiainen, L. Sirro, S. Bounpone, Improved mapping of tropical forests with optical and SAR imagery, part I: Forest cover and accuracy assessment using multi-resolution data, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 6 (1) (2013) 74-91.

[20] O. Antropov, Y. Rauste, H. Astola, J. Praks, T. Häme, M. Hallikainen, Land cover and soil type mapping from spaceborne PolSAR data at L-band with probabilistic neural network, IEEE Transactions on Geoscience and Remote Sensing 52 (9) (2014) 5256-5270. doi:10.1109/TGRS.2013.2287712.

[21] A. Lönnqvist, Y. Rauste, M. Molinier, T. Häme, Polarimetric SAR data in land cover mapping in boreal zone, IEEE Transactions on Geoscience and Remote Sensing 48 (10) (2010) 3652-3662. doi:10.1109/TGRS.2010.2048115.

[22] B. Waske, M. Braun, Classifier ensembles for land cover mapping using multitemporal SAR imagery, ISPRS Journal of Photogrammetry and Remote Sensing 64 (5) (2009) 450-457. doi:10.1016/j.isprsjprs.2009.01.003.

[23] L. Bruzzone, M. Marconcini, U. Wegmüller, A. Wiesmann, An advanced system for the automatic classification of multitemporal SAR images, IEEE Transactions on Geoscience and Remote Sensing 42 (6) (2004) 1321-1334. doi:10.1109/TGRS.2004.826821.

[24] T. Ullmann, A. Schmitt, A. Roth, J. Duffe, S. Dech, H.-W. Hubberten, R. Baumhauer, Land cover characterization and classification of Arctic tundra environments by means of polarized synthetic aperture X- and C-band radar (PolSAR) and Landsat 8 multispectral imagery - Richards Island, Canada, Remote Sensing 6 (9) (2014) 8565-8593. doi:10.3390/rs6098565. URL http://www.mdpi.com/2072-4292/6/9/8565

[25] N. Clerici, C. A. V. Calderón, J. M. Posada, Fusion of Sentinel-1A and Sentinel-2A data for land cover mapping: a case study in the lower Magdalena region, Colombia, Journal of Maps 13 (2) (2017) 718-726. doi:10.1080/17445647.2017.1372316.

[26] C. Castañeda, D. Ducrot, Land cover mapping of wetland areas in an agricultural landscape using SAR and Landsat imagery, Journal of Environmental Management 90 (7) (2009) 2270-2277.

[27] Y. Ban, H. Hu, I. M. Rangel, Fusion of QuickBird MS and RADARSAT SAR data for urban land-cover mapping: Object-based and knowledge-based approach, International Journal of Remote Sensing 31 (6) (2010) 1391-1410.

[28] G. V. Laurin, V. Liesenberg, Q. Chen, L. Guerriero, F. Del Frate, A. Bartolini, D. Coomes, B. Wilebore, J. Lindsell, R. Valentini, Optical and SAR sensor synergies for forest and land cover mapping in a tropical site in West Africa, International Journal of Applied Earth Observation and Geoinformation 21 (2013) 7-16.

[29] R. Khatami, G. Mountrakis, S. V. Stehman, A meta-analysis of remote sensing research on supervised pixel-based land-cover image classification processes: General guidelines for practitioners and future research, Remote Sensing of Environment 177 (2016) 89-100.

[30] B. Waske, M. Braun, Classifier ensembles for land cover mapping using multitemporal SAR imagery, ISPRS Journal of Photogrammetry and Remote Sensing 64 (5) (2009) 450-457.

[31] H. Balzter, B. Cole, C. Thiel, C. Schmullius, Mapping CORINE land cover from Sentinel-1A SAR and SRTM digital elevation model data using random forests, Remote Sensing 7 (11) (2015) 14876-14898. doi:10.3390/rs71114876.

[32] S.-E. Park, Variations of microwave scattering properties by seasonal freeze/thaw transition in the permafrost active layer observed by ALOS PALSAR polarimetric data, Remote Sensing 7 (12) (2015) 17135-17148. doi:10.3390/rs71215874.

[33] M. C. Dobson, L. E. Pierce, F. T. Ulaby, Knowledge-based land-cover classification using ERS-1/JERS-1 SAR composites, IEEE Transactions on Geoscience and Remote Sensing 34 (1) (1996) 83-99. doi:10.1109/36.481896.

[34] L. Sirro, T. Häme, Y. Rauste, J. Kilpi, J. Hämäläinen, K. Gunia, B. de Jong, F. Paz Pellat, Potential of different optical and SAR data in forest and land cover classification to support REDD+ MRV, Remote Sensing 10 (6) (2018). doi:10.3390/rs10060942.

[35] J. D. T. De Alban, G. M. Connette, P. Oswald, E. L. Webb, Combined Landsat and L-band SAR data improves land cover classification and change detection in dynamic tropical landscapes, Remote Sensing 10 (2) (2018). doi:10.3390/rs10020306.

[36] N. Longepe, P. Rakwatin, O. Isoguchi, M. Shimada, Y. Uryu, K. Yulianto, Assessment of ALOS PALSAR 50 m orthorectified FBD data for regional land cover classification by support vector machines, IEEE Transactions on Geoscience and Remote Sensing 49 (6) (2011) 2135-2150. doi:10.1109/TGRS.2010.2102041.

[37] T. Esch, A. Schenk, T. Ullmann, M. Thiel, A. Roth, S. Dech, Characterization of land cover types in TerraSAR-X images by combined analysis of speckle statistics and intensity information, IEEE Transactions on Geoscience and Remote Sensing 49 (6) (2011) 1911-1925. doi:10.1109/TGRS.2010.2091644.

[38] J. W. Cable, J. M. Kovacs, J. Shang, X. Jiao, Multi-temporal polarimetric RADARSAT-2 for land cover monitoring in northeastern Ontario, Canada, Remote Sensing 6 (3) (2014) 2372-2392. doi:10.3390/rs6032372. URL http://www.mdpi.com/2072-4292/6/3/2372

[39] X. Niu, Y. Ban, Multi-temporal RADARSAT-2 polarimetric SAR data for urban land-cover classification using an object-based support vector machine and a rule-based approach, International Journal of Remote Sensing 34 (1) (2013) 1-26. doi:10.1080/01431161.2012.700133.

[40] T. L. Evans, M. Costa, K. Telmer, T. S. F. Silva, Using ALOS/PALSAR and RADARSAT-2 to map land cover and seasonal inundation in the Brazilian Pantanal, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 3 (4) (2010) 560-575. doi:10.1109/JSTARS.2010.2089042.

[41] P. Lumsdon, S. R. Cloude, G. Wright, Polarimetric classification of land cover for Glen Affric radar project, IEE Proceedings - Radar, Sonar and Navigation 152 (6) (2005) 404-412. doi:10.1049/ip-rsn:20041313.

[42] C. da Costa Freitas, L. de Souza Soler, S. J. S. Sant'Anna, L. V. Dutra, J. R. Dos Santos, J. C. Mura, A. H. Correia, Land use and land cover mapping in the Brazilian Amazon using polarimetric airborne P-band SAR data, IEEE Transactions on Geoscience and Remote Sensing 46 (10) (2008) 2956-2970.

[43] G. Li, D. Lu, E. Moran, L. Dutra, M. Batistella, A comparative analysis of ALOS PALSAR L-band and RADARSAT-2 C-band data for land-cover classification in a tropical moist region, ISPRS Journal of Photogrammetry and Remote Sensing 70 (2012) 26-38. doi:10.1016/j.isprsjprs.2012.03.010.

[44] N. Park, K. Chi, Integration of multitemporal/polarization C-band SAR data sets for land-cover classification, International Journal of Remote Sensing 29 (16) (2008) 4667-4688. doi:10.1080/01431160801947341.

[45] E. Tomppo, O. Antropov, J. Praks, Cropland classification using Sentinel-1 time series: Methodological performance and prediction uncertainty assessment, Remote Sensing 11 (21) (2019). doi:10.3390/rs11212480.

[46] D. B. Nguyen, A. Gruber, W. Wagner, Mapping rice extent and cropping scheme in the Mekong Delta using Sentinel-1A data, Remote Sensing Letters 7 (12) (2016) 1209-1218.

[47] A. Veloso, S. Mermoz, A. Bouvet, T. Le Toan, M. Planells, J.-F. Dejoux, E. Ceschia, Understanding the temporal behavior of crops using Sentinel-1 and Sentinel-2-like data for agricultural applications, Remote Sensing of Environment 199 (2017) 415-426.

[48] G. Satalino, A. Balenzano, F. Mattia, M. W. Davidson, C-band SAR data for mapping crops dominated by surface or volume scattering, IEEE Geoscience and Remote Sensing Letters 11 (2) (2013) 384-388.

[49] F. Vicente-Guijalba, A. Jacob, J. M. Lopez-Sanchez, C. Lopez-Martinez, J. Duro, C. Notarnicola, D. Ziolkowski, A. Mestre-Quereda, E. Pottier, J. J. Mallorquí, M. Lavalle, M. Engdahl, SInCohMap: Land-cover and vegetation mapping using multi-temporal Sentinel-1 interferometric coherence, in: IGARSS 2018 - 2018 IEEE International Geoscience and Remote Sensing Symposium, 2018, pp. 6631-6634. doi:10.1109/IGARSS.2018.8517926.

[50] S. Ge, O. Antropov, W. Su, H. Gu, J. Praks, Deep recurrent neural networks for land-cover classification using Sentinel-1 InSAR time series, in: IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium, 2019, pp. 473-476. doi:10.1109/IGARSS.2019.8900088.

[51] Y. LeCun, Y. Bengio, et al., Convolutional networks for images, speech, and time series, The Handbook of Brain Theory and Neural Networks 3361 (10) (1995) 1995.

[52] X. X. Zhu, D. Tuia, L. Mou, G.-S. Xia, L. Zhang, F. Xu, F. Fraundorfer, Deep learning in remote sensing: a review, arXiv preprint arXiv:1710.03959 (2017).

[53] M. Mahdianpari, B. Salehi, M. Rezaee, F. Mohammadimanesh, Y. Zhang, Very deep convolutional neural networks for complex land cover mapping using multispectral remote sensing imagery, Remote Sensing 10 (7) (2018) 1119.

[54] L. Zhang, L. Zhang, B. Du, Deep learning for remote sensing data: A technical tutorial on the state of the art, IEEE Geoscience and Remote Sensing Magazine 4 (2) (2016) 22-40.

[55] J. Zhang, P. Zhong, Y. Chen, S. Li, L1/2-regularized deconvolution network for the representation and restoration of optical remote sensing images, IEEE Transactions on Geoscience and Remote Sensing 52 (5) (2014) 2617-2627.

[56] X. Chen, S. Xiang, C.-L. Liu, C.-H. Pan, Aircraft detection by deep belief nets, in: Pattern Recognition (ACPR), 2013 2nd IAPR Asian Conference on, IEEE, 2013, pp. 54-58.

[57] X. Chen, S. Xiang, C.-L. Liu, C.-H. Pan, Vehicle detection in satellite images by hybrid deep convolutional neural networks, IEEE Geoscience and Remote Sensing Letters 11 (10) (2014) 1797-1801.

[58] Y. Liu, G. Cao, Q. Sun, M. Siegel, Hyperspectral classification via deep networks and superpixel segmentation, International Journal of Remote Sensing 36 (13) (2015) 3459-3482. doi:10.1080/01431161.2015.1055607.

[59] J. Wang, Q. Qin, Z. Li, X. Ye, J. Wang, X. Yang, X. Qin, Deep hierarchical representation and segmentation of high resolution remote sensing images, in: Geoscience and Remote Sensing Symposium (IGARSS), 2015 IEEE International, IEEE, 2015, pp. 4320-4323.

[60] D. Tuia, R. Flamary, N. Courty, Multiclass feature learning for hyperspectral image classification: Sparse and hierarchical solutions, ISPRS Journal of Photogrammetry and Remote Sensing 105 (2015) 272-285.

[61] F. Hu, G.-S. Xia, J. Hu, L. Zhang, Transferring deep convolutional neural networks for the scene classification of high-resolution remote sensing imagery, Remote Sensing 7 (11) (2015) 14680-14707.

[62] O. A. B. Penatti, K. Nogueira, J. A. dos Santos, Do deep features generalize from everyday objects to remote sensing and aerial scenes domains?, in: 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2015, pp. 44-51. doi:10.1109/CVPRW.2015.7301382.

[63] F. P. Luus, B. P. Salmon, F. Van den Bergh, B. T. J. Maharaj, Multiview deep learning for land-use classification, IEEE Geoscience and Remote Sensing Letters 12 (12) (2015) 2448-2452.

[64] F. Zhang, B. Du, L. Zhang, Scene classification via a gradient boosting random convolutional network framework, IEEE Transactions on Geoscience and Remote Sensing 54 (3) (2016) 1793-1802.

[65] T. Ishii, R. Nakamura, H. Nakada, Y. Mochizuki, H. Ishikawa, Surface object recognition with CNN and SVM in Landsat 8 images, in: 2015 14th IAPR International Conference on Machine Vision Applications (MVA), 2015, pp. 341-344. doi:10.1109/MVA.2015.7153200.

[66] N. Kussul, M. Lavreniuk, S. Skakun, A. Shelestov, Deep learning classification of land cover and crop types using remote sensing data, IEEE Geoscience and Remote Sensing Letters 14 (5) (2017) 778-782.

[67] Y. Chen, Z. Lin, X. Zhao, G. Wang, Y. Gu, Deep learning-based classification of hyperspectral data, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 7 (6) (2014) 2094-2107.

[68] G. Wu, X. Shao, Z. Guo, Q. Chen, W. Yuan, X. Shi, Y. Xu, R. Shibasaki, Automatic building segmentation of aerial imagery using multi-constraint fully convolutional networks, Remote Sensing 10 (3) (2018) 407.

[69] Y. Duan, F. Liu, L. Jiao, P. Zhao, L. Zhang, SAR image segmentation based on convolutional-wavelet neural network and Markov random field, Pattern Recognition 64 (2017) 255-267.

[70] F. Mohammadimanesh, B. Salehi, M. Mahdianpari, E. Gill, M. Molinier, A new fully convolutional neural network for semantic segmentation of polarimetric SAR imagery in complex land cover ecosystem, ISPRS Journal of Photogrammetry and Remote Sensing 151 (2019) 223-236.

[71] L. Wang, X. Xu, H. Dong, R. Gui, F. Pu, Multi-pixel simultaneous classification of PolSAR image using convolutional neural networks, Sensors 18 (3) (2018) 769.

[72] M. Ahishali, S. Kiranyaz, T. Ince, M. Gabbouj, Dual and single polarized SAR image classification using compact convolutional neural networks, Remote Sensing 11 (11) (2019) 1340.

[73] Z. Li, Z. Yang, H. Xiong, Homogeneous region segmentation for SAR images based on two steps segmentation algorithm, in: Computers, Communications and Systems (ICCCS), International Conference on, IEEE, 2015, pp. 196-200.

[74] V. Badrinarayanan, A. Kendall, R. Cipolla, SegNet: A deep convolutional encoder-decoder architecture for image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence 39 (12) (2017) 2481-2495.

[75] H. Zhao, J. Shi, X. Qi, X. Wang, J. Jia, Pyramid scene parsing network, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2881-2890.

[76] C. Yu, J. Wang, C. Peng, C. Gao, G. Yu, N. Sang, BiSeNet: Bilateral segmentation network for real-time semantic segmentation, in: Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 325-341.

[77] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, H. Adam, Encoder-decoder with atrous separable convolution for semantic image segmentation, arXiv preprint arXiv:1802.02611 (2018).

[78] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, A. L. Yuille, DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs, IEEE Transactions on Pattern Analysis and Machine Intelligence 40 (4) (2018) 834-848.

[79] O. Ronneberger, P. Fischer, T. Brox, U-Net: Convolutional networks for biomedical image segmentation, in: Medical Image Computing and Computer-Assisted Intervention (MICCAI), Vol. 9351 of LNCS, Springer, 2015, pp. 234-241. (available on arXiv:1505.04597 [cs.CV]) URL http://lmb.informatik.uni-freiburg.de/Publications/2015/RFB15a

[80] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, H. Adam, MobileNets: Efficient convolutional neural networks for mobile vision applications, arXiv preprint arXiv:1704.04861 (2017).

[81] T. Pohlen, A. Hermans, M. Mathias, B. Leibe, Full-resolution residual networks for semantic segmentation in street scenes, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4151-4160.

[82] S. Jégou, M. Drozdzal, D. Vazquez, A. Romero, Y. Bengio, The one hundred layers tiramisu: Fully convolutional DenseNets for semantic segmentation, in: Computer Vision and Pattern Recognition Workshops (CVPRW), 2017 IEEE Conference on, IEEE, 2017, pp. 1175-1183.

[83] O. Antropov, Y. Rauste, A. Lönnqvist, T. Häme, PolSAR mosaic normalization for improved land-cover mapping, IEEE Geoscience and Remote Sensing Letters 9 (6) (2012) 1074-1078.

[84] J. Long, E. Shelhamer, T. Darrell, Fully convolutional networks for semantic segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431-3440.

[85] D. H. Hubel, T. N. Wiesel, Receptive fields, binocular interaction and functional architecture in the cat's visual cortex, The Journal of Physiology 160 (1) (1962) 106-154.

[86] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al., ImageNet large scale visual recognition challenge, International Journal of Computer Vision 115 (3) (2015) 211-252.

[87] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770-778.

[88] G. Huang, Z. Liu, L. Van Der Maaten, K. Q. Weinberger, Densely connected convolutional networks, in: CVPR, Vol. 1, 2017, p. 3.

[89] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, Going deeper with convolutions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1-9.

[90] S. Ji, W. Xu, M. Yang, K. Yu, 3D convolutional neural networks for human action recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (1) (2013) 221-231.

[91] T. N. Sainath, A.-r. Mohamed, B. Kingsbury, B. Ramabhadran, Deep convolutional neural networks for LVCSR, in: Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, IEEE, 2013, pp. 8614-8618.

[92] D. Small, L. Zuberbühler, A. Schubert, E. Meier, Terrain-flattened gamma nought Radarsat-2 backscatter, Canadian Journal of Remote Sensing 37 (5) (2012) 493-499.

[93] P. Härmä, R. Teiniranta, M. Törmä, R. Repo, E. Järvenpää, M. Kallio, The production of Finnish CORINE land cover 2000 classification, XXth ISPRS Congress, Istanbul, Turkey (2004).

[94] P. Härmä, R. Teiniranta, M. Törmä, R. Repo, E. Järvenpää, M. Kallio, Finnish CORINE land cover 2000 classification, XXth ISPRS Congress, Anchorage, US (2004).

[95] A. Garcia-Garcia, S. Orts-Escolano, S. Oprea, V. Villena-Martinez, J. Garcia-Rodriguez, A review on deep learning techniques applied to semantic segmentation, arXiv preprint arXiv:1704.06857 (2017).

[96] F. Chollet, Xception: Deep learning with depthwise separable convolutions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1251-1258.

[97] O. Ronneberger, P. Fischer, T. Brox, U-Net: Convolutional networks for biomedical image segmentation, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2015, pp. 234-241.

[98] L.-C. Chen, G. Papandreou, F. Schroff, H. Adam, Rethinking atrous convolution for semantic image segmentation, arXiv preprint arXiv:1706.05587 (2017).

[99] Y. Bengio, Deep learning of representations for unsupervised and transfer learning, in: Proceedings of ICML Workshop on Unsupervised and Transfer Learning, 2012, pp. 17-36.

[100] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, et al., TensorFlow: A system for large-scale machine learning, in: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), 2016, pp. 265-283.

[101] H. Costa, G. M. Foody, D. S. Boyd, Supervised methods of image segmentation accuracy assessment in land cover mapping, Remote Sensing of Environment 205 (2018) 338-351.

[102] G. Csurka, D. Larlus, F. Perronnin, F. Meylan, What is a good evaluation measure for semantic segmentation?, in: BMVC, Vol. 27, Citeseer, 2013, p. 2013.

[103] J. Cohen, A coefficient of agreement for nominal scales, Educational and Psychological Measurement 20 (1) (1960) 37-46.

[104] O. Antropov, Y. Rauste, T. Häme, Volume scattering modeling in PolSAR decompositions: Study of ALOS PALSAR data over boreal forest, IEEE Transactions on Geoscience and Remote Sensing 49 (10) (2011) 3838-3848.

        • 31 Study site
        • 32 SAR data
        • 33 Reference data
        • 34 Semantic Segmentation Models
          • 341 BiSeNet (Bilateral Segmentation Network)
          • 342 SegNet (Encoder-Decoder-Skip)
          • 343 Mobile U-Net
          • 344 DeepLab-V3+
          • 345 FRRN-B (Full-Resolution Residual Networks)
          • 346 PSPNet (Pyramid Scene Parsing Network)
          • 347 FC-DenseNet (Fully Convolutional DenseNets)
            • 35 Training approach
            • 36 Experimental Setup
              • 361 SAR Data Preprocessing for Deep Learning
              • 362 TrainDevelopment and Test (Accuracy Assessment) Dataset
              • 363 Data Augmentation
              • 364 Implementation
              • 365 Hardware and Training Setup
                • 37 Evaluation Metrics
                  • 4 Results and Discussion
                    • 41 Classification Performance
                    • 42 Computational Performance
                    • 43 Comparison to Similar Work
                    • 44 Outlook and Future Work
                      • 5 Conclusion
Page 29: Wide-Area Land Cover Mapping with Sentinel-1 Imagery using … · 2020-02-27 · Wide-Area Land Cover Mapping with Sentinel-1 Imagery using Deep Learning Semantic Segmentation Models

SAR imagery can be recommended. The reference and training data should come from the same months, or at least the same year, whenever possible, and the reference maps should represent reality as accurately as possible. The models in our experiments were certainly limited by CORINE's own limited accuracy.

Third, in this study we have tested the effectiveness of off-the-shelf deep learning models for land cover mapping from SAR data. While the results demonstrate their effectiveness, it is also likely that novel types of models developed specifically for radar data (such as [70]) will yield even better results. Based on our results, we suggest DenseNet-based models as a starting point. In particular, one could develop deep learning models that directly handle SLC data, which preserve the phase information.
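As a purely illustrative sketch of this direction (not part of the study), phase can be retained by stacking the real and imaginary parts of the SLC pixels as input channels of a CNN; the architecture, tile size, and channel layout below are assumptions.

```python
# Minimal sketch (assumed details, not the paper's implementation): keep SLC
# phase by feeding real/imaginary parts as two input channels of a CNN.
import numpy as np
import tensorflow as tf

def slc_to_tensor(slc):
    """slc: complex64 array (H, W) -> float32 tensor (H, W, 2)."""
    return np.stack([slc.real, slc.imag], axis=-1).astype("float32")

# Toy fully convolutional classifier; 5 outputs for the Level-1 CORINE classes.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(None, None, 2)),
    tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu"),
    tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu"),
    tf.keras.layers.Conv2D(5, 1, activation="softmax"),
])

# Dummy complex scene, just to show the expected shapes.
dummy = (np.random.randn(256, 256) + 1j * np.random.randn(256, 256)).astype(np.complex64)
probs = model(slc_to_tensor(dummy)[np.newaxis])  # shape (1, 256, 256, 5)
```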

Focusing on a single season is both an advantage and a limitation. Importantly, we have avoided confusion between SAR signatures that vary seasonally for several land cover classes. However, the multitemporal dynamics itself can potentially be used as an additional, useful class-discriminating parameter. Incorporating the seasonal dynamics of each land cover pixel (as a time series) is left for future work, perhaps with the additional need to incorporate recurrent neural networks into the approach.
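A hedged sketch of what such a multitemporal extension might look like is a convolutional-recurrent model consuming a per-pixel time series; the number of epochs in the series, tile size, and channels below are assumptions for illustration.

```python
# Sketch only: per-pixel classification from a stack of T seasonal SAR mosaics
# with a convolutional LSTM; T, tile size, and channel count are assumptions.
import tensorflow as tf

T, H, W, C = 6, 256, 256, 2  # e.g., six monthly mosaics with VV/VH channels
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(T, H, W, C)),
    tf.keras.layers.ConvLSTM2D(32, 3, padding="same", return_sequences=False),
    tf.keras.layers.Conv2D(5, 1, activation="softmax"),  # Level-1 classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```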

As discussed in Section 3.1.1, it could be suitable to use more detailed (specific) land cover classes, as the aggregation of smaller LC classes into Level-1 CORINE classes is not exactly ecological: it mixes several distinct SAR signatures in one class and thus causes additional confusion for the classifier. The classified specific classes can later be aggregated into larger classes, potentially showing improved performance [19].
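The aggregation step could be as simple as summing the predicted class probabilities within each group before taking the argmax; the fine-to-coarse grouping in the sketch below is hypothetical, not CORINE's actual nomenclature.

```python
# Illustrative only: aggregate fine-class probabilities into coarse classes.
# The mapping of fine class indices to coarse indices is a made-up example.
import numpy as np

FINE_TO_COARSE = {0: 0, 1: 0, 2: 1, 3: 1, 4: 1, 5: 2, 6: 2, 7: 3, 8: 4}

def aggregate_predictions(probs_fine):
    """probs_fine: (H, W, n_fine) softmax output -> (H, W) coarse labels."""
    n_coarse = max(FINE_TO_COARSE.values()) + 1
    probs_coarse = np.zeros(probs_fine.shape[:2] + (n_coarse,),
                            dtype=probs_fine.dtype)
    for fine, coarse in FINE_TO_COARSE.items():
        probs_coarse[..., coarse] += probs_fine[..., fine]
    return probs_coarse.argmax(axis=-1)
```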

Finally, we have used only SAR images and a freely available DEM for the presented large-scale land cover mapping. If one were to combine other types of remote sensing images, in particular optical images, we expect that the results would improve significantly. This holds only for those areas where such imagery can be collected, given the cloud coverage, while an operational scenario would potentially require the use of at least two models (with and without optical satellite imagery). It is also important to assess the added value of SAR imagery with deep learning models when optical satellite images are available, as well as possible data fusion and decision fusion scenarios, before a decision on the mapping approach is made [19].
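Such a two-model operational setup could be dispatched per tile along the following lines; the interfaces and the cloud-fraction threshold are hypothetical.

```python
# Hypothetical dispatch for the two-model scenario described above: prefer the
# SAR+optical fusion model where a sufficiently cloud-free optical image
# exists, and fall back to the SAR-only model elsewhere.
def classify_tile(sar_tile, sar_only_model, fusion_model,
                  optical_tile=None, cloud_fraction=1.0, max_cloud=0.05):
    if optical_tile is not None and cloud_fraction <= max_cloud:
        return fusion_model.predict([sar_tile, optical_tile])
    return sar_only_model.predict(sar_tile)
```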

5. Conclusion

Our study demonstrated the potential of applying state-of-the-art semantic segmentation models to SAR image classification with high accuracy. Several models were benchmarked in a countrywide classification experiment using Sentinel-1 IW-mode SAR data, reaching nearly 91% overall classification accuracy with the best performing model (FC-DenseNet). Given that the 14 used Sentinel-1 scenes resulted in roughly 7,000 training images, this indicates strong potential for using pre-trained CNNs for further fine-tuning, and the approach seems particularly suitable when the number of training images is limited (to thousands or tens of thousands instead of millions). In addition to suggesting the best candidate semantic segmentation models for land cover mapping with SAR data (that is, the DenseNet-based models), our study offers baseline results against which newly proposed models should be evaluated. Several possible improvements for future work were identified, including the necessity of testing multitemporal approaches, data fusion, and very high-resolution SAR imagery, as well as developing models specifically for SAR data; these will be addressed in future work.
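For completeness, the transfer-learning recipe summarized above can be sketched as follows; the encoder choice, tile size, and hyperparameters are assumptions for illustration, not the exact setup of this study.

```python
# Sketch of fine-tuning an ImageNet-pretrained encoder for 5-class semantic
# segmentation; details (ResNet-50, 352x352 tiles, learning rate) are assumed.
import tensorflow as tf

backbone = tf.keras.applications.ResNet50(
    include_top=False, weights="imagenet", input_shape=(352, 352, 3))
backbone.trainable = True  # fine-tune; freezing early layers is an alternative

# SAR inputs would need to be arranged into 3 channels to reuse the RGB weights.
logits = tf.keras.layers.Conv2D(5, 1)(backbone.output)      # per-pixel logits
upsampled = tf.keras.layers.UpSampling2D(
    size=32, interpolation="bilinear")(logits)              # back to 352x352
model = tf.keras.Model(backbone.input, tf.keras.layers.Softmax()(upsampled))

model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```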

References

[1] S. Bojinski, M. Verstraete, T. C. Peterson, C. Richter, A. Simmons, M. Zemp, The concept of essential climate variables in support of climate research, applications and policy, Bulletin of the American Meteorological Society 95 (9) (2014) 1431–1443.

[2] G. Büttner, J. Feranec, G. Jaffrain, L. Mari, G. Maucha, T. Soukup, The CORINE land cover 2000 project, EARSeL eProceedings 3 (3) (2004) 331–346.

[3] M. Bossard, J. Feranec, J. Otahel, et al., CORINE land cover technical guide: Addendum 2000 (2000).

[4] G. Büttner, CORINE land cover and land cover change products, in: Land Use and Land Cover Mapping in Europe, Springer, 2014, pp. 55–74.

[5] M. Törmä, T. Markkanen, S. Hatunen, P. Härmä, O.-P. Mattila, A. Arslan, Assessment of land-cover data for land-surface modelling in regional climate studies, Boreal Environment Research 20 (2) (2015) 243–260.

[6] J. Chen, J. Chen, A. Liao, X. Cao, L. Chen, X. Chen, C. He, G. Han, S. Peng, M. Lu, et al., Global land cover mapping at 30 m resolution: A POK-based operational approach, ISPRS Journal of Photogrammetry and Remote Sensing 103 (2015) 7–27.

[7] C. A. d. Almeida, A. C. Coutinho, J. C. D. M. Esquerdo, M. Adami, A. Venturieri, C. G. Diniz, N. Dessay, L. Durieux, A. R. Gomes, High spatial resolution land use and land cover mapping of the Brazilian Legal Amazon in 2008 using Landsat-5/TM and MODIS data, Acta Amazonica 46 (3) (2016) 291–302.

[8] C. Homer, J. Dewitz, L. Yang, S. Jin, P. Danielson, G. Xian, J. Coulston, N. Herold, J. Wickham, K. Megown, Completion of the 2011 national land cover database for the conterminous United States – representing a decade of land cover change information, Photogrammetric Engineering & Remote Sensing 81 (5) (2015) 345–354.

[9] Y. Zhao, D. Feng, L. Yu, X. Wang, Y. Chen, Y. Bai, H. J. Hernandez, M. Galleguillos, C. Estades, G. S. Biging, et al., Detailed dynamic land cover mapping of Chile: Accuracy improvement by integrating multi-temporal data, Remote Sensing of Environment 183 (2016) 170–185.

[10] P. Griffiths, C. Nendel, P. Hostert, Intra-annual reflectance composites from Sentinel-2 and Landsat for national-scale crop and land cover mapping, Remote Sensing of Environment 220 (2019) 135–151.

[11] R. Torres, P. Snoeij, D. Geudtner, D. Bibby, M. Davidson, E. Attema, P. Potin, B. Rommen, N. Floury, M. Brown, I. Traver, P. Deghaye, B. Duesmann, B. Rosich, N. Miranda, C. Bruno, M. L'Abbate, R. Croci, A. Pietropaolo, M. Huchler, F. Rostan, GMES Sentinel-1 mission, Remote Sensing of Environment 120 (2012) 9–24. doi:10.1016/j.rse.2011.05.028.

[12] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE 86 (11) (1998) 2278–2324.

[13] A. Krizhevsky, I. Sutskever, G. E. Hinton, ImageNet classification with deep convolutional neural networks, in: Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.

[14] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, CoRR abs/1409.1556 (2014).

[15] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning (2016).

[16] W. Cohen, S. Goward, Landsat's role in ecological applications of remote sensing, BioScience 54 (6) (2004) 535–545. doi:10.1641/0006-3568(2004)054[0535:LRIEAO]2.0.CO;2.

[17] S. Goetz, A. Baccini, N. Laporte, T. Johns, W. Walker, J. Kellndorfer, R. Houghton, M. Sun, Mapping and monitoring carbon stocks with satellite observations: A comparison of methods, Carbon Balance and Management 4 (2009). doi:10.1186/1750-0680-4-2.

[18] C. Atzberger, Advances in remote sensing of agriculture: Context description, existing operational monitoring systems and major information needs, Remote Sensing 5 (2) (2013) 949–981. doi:10.3390/rs5020949.

[19] T. Häme, J. Kilpi, H. A. Ahola, Y. Rauste, O. Antropov, M. Rautiainen, L. Sirro, S. Bounpone, Improved mapping of tropical forests with optical and SAR imagery, part I: Forest cover and accuracy assessment using multi-resolution data, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 6 (1) (2013) 74–91.

[20] O. Antropov, Y. Rauste, H. Astola, J. Praks, T. Häme, M. Hallikainen, Land cover and soil type mapping from spaceborne PolSAR data at L-band with probabilistic neural network, IEEE Transactions on Geoscience and Remote Sensing 52 (9) (2014) 5256–5270. doi:10.1109/TGRS.2013.2287712.

[21] A. Lönnqvist, Y. Rauste, M. Molinier, T. Häme, Polarimetric SAR data in land cover mapping in boreal zone, IEEE Transactions on Geoscience and Remote Sensing 48 (10) (2010) 3652–3662. doi:10.1109/TGRS.2010.2048115.

[22] B. Waske, M. Braun, Classifier ensembles for land cover mapping using multitemporal SAR imagery, ISPRS Journal of Photogrammetry and Remote Sensing 64 (5) (2009) 450–457. doi:10.1016/j.isprsjprs.2009.01.003.

[23] L. Bruzzone, M. Marconcini, U. Wegmüller, A. Wiesmann, An advanced system for the automatic classification of multitemporal SAR images, IEEE Transactions on Geoscience and Remote Sensing 42 (6) (2004) 1321–1334. doi:10.1109/TGRS.2004.826821.

[24] T. Ullmann, A. Schmitt, A. Roth, J. Duffe, S. Dech, H.-W. Hubberten, R. Baumhauer, Land cover characterization and classification of Arctic tundra environments by means of polarized synthetic aperture X- and C-band radar (PolSAR) and Landsat 8 multispectral imagery – Richards Island, Canada, Remote Sensing 6 (9) (2014) 8565–8593. doi:10.3390/rs6098565. URL http://www.mdpi.com/2072-4292/6/9/8565

[25] N. Clerici, C. A. V. Calderón, J. M. Posada, Fusion of Sentinel-1A and Sentinel-2A data for land cover mapping: a case study in the lower Magdalena region, Colombia, Journal of Maps 13 (2) (2017) 718–726. doi:10.1080/17445647.2017.1372316.

[26] C. Castaneda, D. Ducrot, Land cover mapping of wetland areas in an agricultural landscape using SAR and Landsat imagery, Journal of Environmental Management 90 (7) (2009) 2270–2277.

[27] Y. Ban, H. Hu, I. M. Rangel, Fusion of QuickBird MS and RADARSAT SAR data for urban land-cover mapping: Object-based and knowledge-based approach, International Journal of Remote Sensing 31 (6) (2010) 1391–1410.

[28] G. V. Laurin, V. Liesenberg, Q. Chen, L. Guerriero, F. Del Frate, A. Bartolini, D. Coomes, B. Wilebore, J. Lindsell, R. Valentini, Optical and SAR sensor synergies for forest and land cover mapping in a tropical site in West Africa, International Journal of Applied Earth Observation and Geoinformation 21 (2013) 7–16.

[29] R. Khatami, G. Mountrakis, S. V. Stehman, A meta-analysis of remote sensing research on supervised pixel-based land-cover image classification processes: General guidelines for practitioners and future research, Remote Sensing of Environment 177 (2016) 89–100.

[30] B. Waske, M. Braun, Classifier ensembles for land cover mapping using multitemporal SAR imagery, ISPRS Journal of Photogrammetry and Remote Sensing 64 (5) (2009) 450–457.

[31] H. Balzter, B. Cole, C. Thiel, C. Schmullius, Mapping CORINE land cover from Sentinel-1A SAR and SRTM digital elevation model data using random forests, Remote Sensing 7 (11) (2015) 14876–14898. doi:10.3390/rs71114876.

[32] S.-E. Park, Variations of microwave scattering properties by seasonal freeze/thaw transition in the permafrost active layer observed by ALOS PALSAR polarimetric data, Remote Sensing 7 (12) (2015) 17135–17148. doi:10.3390/rs71215874.

[33] M. C. Dobson, L. E. Pierce, F. T. Ulaby, Knowledge-based land-cover classification using ERS-1/JERS-1 SAR composites, IEEE Transactions on Geoscience and Remote Sensing 34 (1) (1996) 83–99. doi:10.1109/36.481896.

[34] L. Sirro, T. Häme, Y. Rauste, J. Kilpi, J. Hämäläinen, K. Gunia, B. de Jong, F. Paz Pellat, Potential of different optical and SAR data in forest and land cover classification to support REDD+ MRV, Remote Sensing 10 (6) (2018). doi:10.3390/rs10060942.

[35] J. D. T. De Alban, G. M. Connette, P. Oswald, E. L. Webb, Combined Landsat and L-band SAR data improves land cover classification and change detection in dynamic tropical landscapes, Remote Sensing 10 (2) (2018). doi:10.3390/rs10020306.

[36] N. Longepe, P. Rakwatin, O. Isoguchi, M. Shimada, Y. Uryu, K. Yulianto, Assessment of ALOS PALSAR 50 m orthorectified FBD data for regional land cover classification by support vector machines, IEEE Transactions on Geoscience and Remote Sensing 49 (6) (2011) 2135–2150. doi:10.1109/TGRS.2010.2102041.

[37] T. Esch, A. Schenk, T. Ullmann, M. Thiel, A. Roth, S. Dech, Characterization of land cover types in TerraSAR-X images by combined analysis of speckle statistics and intensity information, IEEE Transactions on Geoscience and Remote Sensing 49 (6) (2011) 1911–1925. doi:10.1109/TGRS.2010.2091644.

[38] J. W. Cable, J. M. Kovacs, J. Shang, X. Jiao, Multi-temporal polarimetric RADARSAT-2 for land cover monitoring in northeastern Ontario, Canada, Remote Sensing 6 (3) (2014) 2372–2392. doi:10.3390/rs6032372. URL http://www.mdpi.com/2072-4292/6/3/2372

[39] X. Niu, Y. Ban, Multi-temporal RADARSAT-2 polarimetric SAR data for urban land-cover classification using an object-based support vector machine and a rule-based approach, International Journal of Remote Sensing 34 (1) (2013) 1–26. doi:10.1080/01431161.2012.700133.

[40] T. L. Evans, M. Costa, K. Telmer, T. S. F. Silva, Using ALOS/PALSAR and RADARSAT-2 to map land cover and seasonal inundation in the Brazilian Pantanal, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 3 (4) (2010) 560–575. doi:10.1109/JSTARS.2010.2089042.

[41] P. Lumsdon, S. R. Cloude, G. Wright, Polarimetric classification of land cover for Glen Affric radar project, IEE Proceedings - Radar, Sonar and Navigation 152 (6) (2005) 404–412. doi:10.1049/ip-rsn:20041313.

[42] C. da Costa Freitas, L. de Souza Soler, S. J. S. Sant'Anna, L. V. Dutra, J. R. Dos Santos, J. C. Mura, A. H. Correia, Land use and land cover mapping in the Brazilian Amazon using polarimetric airborne P-band SAR data, IEEE Transactions on Geoscience and Remote Sensing 46 (10) (2008) 2956–2970.

[43] G. Li, D. Lu, E. Moran, L. Dutra, M. Batistella, A comparative analysis of ALOS PALSAR L-band and RADARSAT-2 C-band data for land-cover classification in a tropical moist region, ISPRS Journal of Photogrammetry and Remote Sensing 70 (2012) 26–38. doi:10.1016/j.isprsjprs.2012.03.010.

[44] N. Park, K. Chi, Integration of multitemporal/polarization C-band SAR data sets for land-cover classification, International Journal of Remote Sensing 29 (16) (2008) 4667–4688. doi:10.1080/01431160801947341.

[45] E. Tomppo, O. Antropov, J. Praks, Cropland classification using Sentinel-1 time series: Methodological performance and prediction uncertainty assessment, Remote Sensing 11 (21) (2019). doi:10.3390/rs11212480.

[46] D. B. Nguyen, A. Gruber, W. Wagner, Mapping rice extent and cropping scheme in the Mekong Delta using Sentinel-1A data, Remote Sensing Letters 7 (12) (2016) 1209–1218.

[47] A. Veloso, S. Mermoz, A. Bouvet, T. Le Toan, M. Planells, J.-F. Dejoux, E. Ceschia, Understanding the temporal behavior of crops using Sentinel-1 and Sentinel-2-like data for agricultural applications, Remote Sensing of Environment 199 (2017) 415–426.

[48] G. Satalino, A. Balenzano, F. Mattia, M. W. Davidson, C-band SAR data for mapping crops dominated by surface or volume scattering, IEEE Geoscience and Remote Sensing Letters 11 (2) (2013) 384–388.

[49] F. Vicente-Guijalba, A. Jacob, J. M. Lopez-Sanchez, C. Lopez-Martinez, J. Duro, C. Notarnicola, D. Ziolkowski, A. Mestre-Quereda, E. Pottier, J. J. Mallorquí, M. Lavalle, M. Engdahl, SinCohMap: Land-cover and vegetation mapping using multi-temporal Sentinel-1 interferometric coherence, in: IGARSS 2018 - 2018 IEEE International Geoscience and Remote Sensing Symposium, 2018, pp. 6631–6634. doi:10.1109/IGARSS.2018.8517926.

[50] S. Ge, O. Antropov, W. Su, H. Gu, J. Praks, Deep recurrent neural networks for land-cover classification using Sentinel-1 InSAR time series, in: IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium, 2019, pp. 473–476. doi:10.1109/IGARSS.2019.8900088.

[51] Y. LeCun, Y. Bengio, et al., Convolutional networks for images, speech, and time series, The Handbook of Brain Theory and Neural Networks 3361 (10) (1995) 1995.

[52] X. X. Zhu, D. Tuia, L. Mou, G.-S. Xia, L. Zhang, F. Xu, F. Fraundorfer, Deep learning in remote sensing: a review, arXiv preprint arXiv:1710.03959 (2017).

[53] M. Mahdianpari, B. Salehi, M. Rezaee, F. Mohammadimanesh, Y. Zhang, Very deep convolutional neural networks for complex land cover mapping using multispectral remote sensing imagery, Remote Sensing 10 (7) (2018) 1119.

[54] L. Zhang, L. Zhang, B. Du, Deep learning for remote sensing data: A technical tutorial on the state of the art, IEEE Geoscience and Remote Sensing Magazine 4 (2) (2016) 22–40.

[55] J. Zhang, P. Zhong, Y. Chen, S. Li, L1/2-regularized deconvolution network for the representation and restoration of optical remote sensing images, IEEE Transactions on Geoscience and Remote Sensing 52 (5) (2014) 2617–2627.

[56] X. Chen, S. Xiang, C.-L. Liu, C.-H. Pan, Aircraft detection by deep belief nets, in: Pattern Recognition (ACPR), 2013 2nd IAPR Asian Conference on, IEEE, 2013, pp. 54–58.

[57] X. Chen, S. Xiang, C.-L. Liu, C.-H. Pan, Vehicle detection in satellite images by hybrid deep convolutional neural networks, IEEE Geoscience and Remote Sensing Letters 11 (10) (2014) 1797–1801.

[58] Y. Liu, G. Cao, Q. Sun, M. Siegel, Hyperspectral classification via deep networks and superpixel segmentation, International Journal of Remote Sensing 36 (13) (2015) 3459–3482. doi:10.1080/01431161.2015.1055607.

[59] J. Wang, Q. Qin, Z. Li, X. Ye, J. Wang, X. Yang, X. Qin, Deep hierarchical representation and segmentation of high resolution remote sensing images, in: Geoscience and Remote Sensing Symposium (IGARSS), 2015 IEEE International, IEEE, 2015, pp. 4320–4323.

[60] D. Tuia, R. Flamary, N. Courty, Multiclass feature learning for hyperspectral image classification: Sparse and hierarchical solutions, ISPRS Journal of Photogrammetry and Remote Sensing 105 (2015) 272–285.

[61] F. Hu, G.-S. Xia, J. Hu, L. Zhang, Transferring deep convolutional neural networks for the scene classification of high-resolution remote sensing imagery, Remote Sensing 7 (11) (2015) 14680–14707.

[62] O. A. B. Penatti, K. Nogueira, J. A. dos Santos, Do deep features generalize from everyday objects to remote sensing and aerial scenes domains?, in: 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2015, pp. 44–51. doi:10.1109/CVPRW.2015.7301382.

[63] F. P. Luus, B. P. Salmon, F. Van den Bergh, B. T. J. Maharaj, Multiview deep learning for land-use classification, IEEE Geoscience and Remote Sensing Letters 12 (12) (2015) 2448–2452.

[64] F. Zhang, B. Du, L. Zhang, Scene classification via a gradient boosting random convolutional network framework, IEEE Transactions on Geoscience and Remote Sensing 54 (3) (2016) 1793–1802.

[65] T. Ishii, R. Nakamura, H. Nakada, Y. Mochizuki, H. Ishikawa, Surface object recognition with CNN and SVM in Landsat 8 images, in: 2015 14th IAPR International Conference on Machine Vision Applications (MVA), 2015, pp. 341–344. doi:10.1109/MVA.2015.7153200.

[66] N. Kussul, M. Lavreniuk, S. Skakun, A. Shelestov, Deep learning classification of land cover and crop types using remote sensing data, IEEE Geoscience and Remote Sensing Letters 14 (5) (2017) 778–782.

[67] Y. Chen, Z. Lin, X. Zhao, G. Wang, Y. Gu, Deep learning-based classification of hyperspectral data, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 7 (6) (2014) 2094–2107.

[68] G. Wu, X. Shao, Z. Guo, Q. Chen, W. Yuan, X. Shi, Y. Xu, R. Shibasaki, Automatic building segmentation of aerial imagery using multi-constraint fully convolutional networks, Remote Sensing 10 (3) (2018) 407.

[69] Y. Duan, F. Liu, L. Jiao, P. Zhao, L. Zhang, SAR image segmentation based on convolutional-wavelet neural network and Markov random field, Pattern Recognition 64 (2017) 255–267.

[70] F. Mohammadimanesh, B. Salehi, M. Mahdianpari, E. Gill, M. Molinier, A new fully convolutional neural network for semantic segmentation of polarimetric SAR imagery in complex land cover ecosystem, ISPRS Journal of Photogrammetry and Remote Sensing 151 (2019) 223–236.

[71] L. Wang, X. Xu, H. Dong, R. Gui, F. Pu, Multi-pixel simultaneous classification of PolSAR image using convolutional neural networks, Sensors 18 (3) (2018) 769.

[72] M. Ahishali, S. Kiranyaz, T. Ince, M. Gabbouj, Dual and single polarized SAR image classification using compact convolutional neural networks, Remote Sensing 11 (11) (2019) 1340.

[73] Z. Li, Z. Yang, H. Xiong, Homogeneous region segmentation for SAR images based on two steps segmentation algorithm, in: Computers, Communications, and Systems (ICCCS), International Conference on, IEEE, 2015, pp. 196–200.

[74] V. Badrinarayanan, A. Kendall, R. Cipolla, SegNet: A deep convolutional encoder-decoder architecture for image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence 39 (12) (2017) 2481–2495.

[75] H. Zhao, J. Shi, X. Qi, X. Wang, J. Jia, Pyramid scene parsing network, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2881–2890.

[76] C. Yu, J. Wang, C. Peng, C. Gao, G. Yu, N. Sang, BiSeNet: Bilateral segmentation network for real-time semantic segmentation, in: Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 325–341.

[77] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, H. Adam, Encoder-decoder with atrous separable convolution for semantic image segmentation, arXiv preprint arXiv:1802.02611 (2018).

[78] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, A. L. Yuille, DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs, IEEE Transactions on Pattern Analysis and Machine Intelligence 40 (4) (2018) 834–848.

[79] O. Ronneberger, P. Fischer, T. Brox, U-Net: Convolutional networks for biomedical image segmentation, in: Medical Image Computing and Computer-Assisted Intervention (MICCAI), Vol. 9351 of LNCS, Springer, 2015, pp. 234–241, (available on arXiv:1505.04597 [cs.CV]). URL http://lmb.informatik.uni-freiburg.de/Publications/2015/RFB15a

[80] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, H. Adam, MobileNets: Efficient convolutional neural networks for mobile vision applications, arXiv preprint arXiv:1704.04861 (2017).

[81] T. Pohlen, A. Hermans, M. Mathias, B. Leibe, Full-resolution residual networks for semantic segmentation in street scenes, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4151–4160.

[82] S. Jégou, M. Drozdzal, D. Vazquez, A. Romero, Y. Bengio, The one hundred layers tiramisu: Fully convolutional DenseNets for semantic segmentation, in: Computer Vision and Pattern Recognition Workshops (CVPRW), 2017 IEEE Conference on, IEEE, 2017, pp. 1175–1183.

[83] O. Antropov, Y. Rauste, A. Lönnqvist, T. Häme, PolSAR mosaic normalization for improved land-cover mapping, IEEE Geoscience and Remote Sensing Letters 9 (6) (2012) 1074–1078.

[84] J. Long, E. Shelhamer, T. Darrell, Fully convolutional networks for semantic segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431–3440.

[85] D. H. Hubel, T. N. Wiesel, Receptive fields, binocular interaction and functional architecture in the cat's visual cortex, The Journal of Physiology 160 (1) (1962) 106–154.

[86] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al., ImageNet large scale visual recognition challenge, International Journal of Computer Vision 115 (3) (2015) 211–252.

[87] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.

[88] G. Huang, Z. Liu, L. Van Der Maaten, K. Q. Weinberger, Densely connected convolutional networks, in: CVPR, Vol. 1, 2017, p. 3.

[89] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, Going deeper with convolutions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1–9.

[90] S. Ji, W. Xu, M. Yang, K. Yu, 3D convolutional neural networks for human action recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (1) (2013) 221–231.

[91] T. N. Sainath, A.-r. Mohamed, B. Kingsbury, B. Ramabhadran, Deep convolutional neural networks for LVCSR, in: Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, IEEE, 2013, pp. 8614–8618.

[92] D. Small, L. Zuberbühler, A. Schubert, E. Meier, Terrain-flattened gamma nought Radarsat-2 backscatter, Canadian Journal of Remote Sensing 37 (5) (2012) 493–499.

[93] P. Härmä, R. Teiniranta, M. Törmä, R. Repo, E. Järvenpää, M. Kallio, The production of Finnish CORINE land cover 2000 classification, XXth ISPRS Congress, Istanbul, Turkey (2004).

[94] P. Härmä, R. Teiniranta, M. Törmä, R. Repo, E. Järvenpää, M. Kallio, Finnish CORINE land cover 2000 classification, XXth ISPRS Congress, Anchorage, US (2004).

[95] A. Garcia-Garcia, S. Orts-Escolano, S. Oprea, V. Villena-Martinez, J. Garcia-Rodriguez, A review on deep learning techniques applied to semantic segmentation, arXiv preprint arXiv:1704.06857 (2017).

[96] F. Chollet, Xception: Deep learning with depthwise separable convolutions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1251–1258.

[97] O. Ronneberger, P. Fischer, T. Brox, U-Net: Convolutional networks for biomedical image segmentation, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2015, pp. 234–241.

[98] L.-C. Chen, G. Papandreou, F. Schroff, H. Adam, Rethinking atrous convolution for semantic image segmentation, arXiv preprint arXiv:1706.05587 (2017).

[99] Y. Bengio, Deep learning of representations for unsupervised and transfer learning, in: Proceedings of ICML Workshop on Unsupervised and Transfer Learning, 2012, pp. 17–36.

[100] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, et al., TensorFlow: A system for large-scale machine learning, in: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), 2016, pp. 265–283.

[101] H. Costa, G. M. Foody, D. S. Boyd, Supervised methods of image segmentation accuracy assessment in land cover mapping, Remote Sensing of Environment 205 (2018) 338–351.

[102] G. Csurka, D. Larlus, F. Perronnin, F. Meylan, What is a good evaluation measure for semantic segmentation?, in: BMVC, Vol. 27, Citeseer, 2013, p. 2013.

[103] J. Cohen, A coefficient of agreement for nominal scales, Educational and Psychological Measurement 20 (1) (1960) 37–46.

[104] O. Antropov, Y. Rauste, T. Häme, Volume scattering modeling in PolSAR decompositions: Study of ALOS PALSAR data over boreal forest, IEEE Transactions on Geoscience and Remote Sensing 49 (10) (2011) 3838–3848.

  • 1 Introduction
    • 11 Land Cover Mapping with SAR Imagery
    • 12 Deep Learning in Remote Sensing
    • 13 Study goals
      • 2 Deep Learning Terminology
      • 3 Materials and methods
        • 31 Study site
        • 32 SAR data
        • 33 Reference data
        • 34 Semantic Segmentation Models
          • 341 BiSeNet (Bilateral Segmentation Network)
          • 342 SegNet (Encoder-Decoder-Skip)
          • 343 Mobile U-Net
          • 344 DeepLab-V3+
          • 345 FRRN-B (Full-Resolution Residual Networks)
          • 346 PSPNet (Pyramid Scene Parsing Network)
          • 347 FC-DenseNet (Fully Convolutional DenseNets)
            • 35 Training approach
            • 36 Experimental Setup
              • 361 SAR Data Preprocessing for Deep Learning
              • 362 TrainDevelopment and Test (Accuracy Assessment) Dataset
              • 363 Data Augmentation
              • 364 Implementation
              • 365 Hardware and Training Setup
                • 37 Evaluation Metrics
                  • 4 Results and Discussion
                    • 41 Classification Performance
                    • 42 Computational Performance
                    • 43 Comparison to Similar Work
                    • 44 Outlook and Future Work
                      • 5 Conclusion
Page 30: Wide-Area Land Cover Mapping with Sentinel-1 Imagery using … · 2020-02-27 · Wide-Area Land Cover Mapping with Sentinel-1 Imagery using Deep Learning Semantic Segmentation Models

semantic segmentation models for land cover mapping with SAR data (that isthe DenseNet-based models) our study offers baseline results against which thenewly proposed models should be evaluated Several possible improvements forthe future work were identified including the necessity for testing multitempo-ral approaches data fusion and very high-resolution SAR imagery as well asdeveloping models specifically for SAR and will be addressed in future work

References

[1] S Bojinski M Verstraete T C Peterson C Richter A SimmonsM Zemp The concept of essential climate variables in support of climateresearch applications and policy Bulletin of the American MeteorologicalSociety 95 (9) (2014) 1431ndash1443

[2] G Buttner J Feranec G Jaffrain L Mari G Maucha T Soukup TheCORINE land cover 2000 project EARSeL eProceedings 3 (3) (2004)331ndash346

[3] M Bossard J Feranec J Otahel et al CORINE land cover technicalguide Addendum 2000 (2000)

[4] G Buttner CORINE land cover and land cover change products in LandUse and Land Cover Mapping in Europe Springer 2014 pp 55ndash74

[5] M Trm T Markkanen S Hatunen P Hrm O-P Mattila A ArslanAssessment of land-cover data for land-surface modelling in regional cli-mate studies Boreal Environment Research 20 (2) (2015) 243ndash260

[6] J Chen J Chen A Liao X Cao L Chen X Chen C He G HanS Peng M Lu et al Global land cover mapping at 30 m resolution Apok-based operational approach ISPRS Journal of Photogrammetry andRemote Sensing 103 (2015) 7ndash27

[7] C A d Almeida A C Coutinho J C D M Esquerdo M AdamiA Venturieri C G Diniz N Dessay L Durieux A R Gomes Highspatial resolution land use and land cover mapping of the brazilian legalamazon in 2008 using landsat-5tm and modis data Acta Amazonica46 (3) (2016) 291ndash302

[8] C Homer J Dewitz L Yang S Jin P Danielson G Xian J CoulstonN Herold J Wickham K Megown Completion of the 2011 national landcover database for the conterminous united statesndashrepresenting a decade ofland cover change information Photogrammetric Engineering amp RemoteSensing 81 (5) (2015) 345ndash354

[9] Y Zhao D Feng L Yu X Wang Y Chen Y Bai H J HernandezM Galleguillos C Estades G S Biging et al Detailed dynamic landcover mapping of chile Accuracy improvement by integrating multi-temporal data Remote Sensing of Environment 183 (2016) 170ndash185

30

[10] P Griffiths C Nendel P Hostert Intra-annual reflectance compositesfrom sentinel-2 and landsat for national-scale crop and land cover map-ping Remote sensing of environment 220 (2019) 135ndash151

[11] R Torres P Snoeij D Geudtner D Bibby M Davidson E AttemaP Potin B Rommen N Floury M Brown I Traver P DeghayeB Duesmann B Rosich N Miranda C Bruno M LrsquoAbbate R CrociA Pietropaolo M Huchler F Rostan GMES Sentinel-1 mission RemoteSensing of Environment 120 (2012) 9ndash24 doi101016jrse201105028

[12] Y LeCun L Bottou Y Bengio P Haffner Gradient-based learning ap-plied to document recognition Proceedings of the IEEE 86 (11) (1998)2278ndash2324

[13] A Krizhevsky I Sutskever G E Hinton Imagenet classification withdeep convolutional neural networks in Advances in neural informationprocessing systems 2012 pp 1097ndash1105

[14] K Simonyan A Zisserman Very deep convolutional networks for large-scale image recognition CoRR abs14091556 (2014)

[15] I Goodfellow Y Bengio A Courville Deep learning (2016)

[16] W Cohen S Goward Landsatrsquos role in ecological applications ofremote sensing BioScience 54 (6) (2004) 535ndash545 doi101641

0006-3568(2004)054[0535LRIEAO]20CO2

[17] S Goetz A Baccini N Laporte T Johns W Walker J Kellndor-fer R Houghton M Sun Mapping and monitoring carbon stocks withsatellite observations A comparison of methods Carbon Balance andManagement 4 (2009) doi1011861750-0680-4-2

[18] C Atzberger Advances in remote sensing of agriculture Context de-scription existing operational monitoring systems and major informationneeds Remote Sensing 5 (2) (2013) 949ndash981 doi103390rs5020949

[19] T Hame J Kilpi H A Ahola Y Rauste O Antropov M RautiainenL Sirro S Bounpone Improved mapping of tropical forests with opticaland SAR imagery part I Forest cover and accuracy assessment usingmulti-resolution data IEEE Journal of Selected Topics in Applied EarthObservations and Remote Sensing 6 (1) (2013) 74ndash91

[20] O Antropov Y Rauste H Astola J Praks T Hame M HallikainenLand cover and soil type mapping from spaceborne PolSAR data at L-band with probabilistic neural network IEEE Transactions on Geoscienceand Remote Sensing 52 (9) (2014) 5256ndash5270 doi101109TGRS20132287712

31

[21] A Lonnqvist Y Rauste M Molinier T Hame Polarimetric SAR datain land cover mapping in boreal zone IEEE Transactions on Geoscienceand Remote Sensing 48 (10) (2010) 3652ndash3662 doi101109TGRS20102048115

[22] B Waske M Braun Classifier ensembles for land cover mapping usingmultitemporal sar imagery ISPRS Journal of Photogrammetry and Re-mote Sensing 64 (5) (2009) 450ndash457 doi101016jisprsjprs2009

01003

[23] L Bruzzone M Marconcini U Wegmller A Wiesmann An advancedsystem for the automatic classification of multitemporal sar images IEEETransactions on Geoscience and Remote Sensing 42 (6) (2004) 1321ndash1334doi101109TGRS2004826821

[24] T Ullmann A Schmitt A Roth J Duffe S Dech H-W HubbertenR Baumhauer Land cover characterization and classification of arctictundra environments by means of polarized synthetic aperture x- andc-band radar (polsar) and landsat 8 multispectral imagery richards is-land canada Remote Sensing 6 (9) (2014) 8565ndash8593 doi103390

rs6098565URL httpwwwmdpicom2072-4292698565

[25] N Clerici C A V Caldern J M Posada Fusion of sentinel-1a andsentinel-2a data for land cover mapping a case study in the lowermagdalena region colombia Journal of Maps 13 (2) (2017) 718ndash726doi1010801744564720171372316

[26] C Castaneda D Ducrot Land cover mapping of wetland areas in anagricultural landscape using sar and landsat imagery Journal of Environ-mental Management 90 (7) (2009) 2270ndash2277

[27] Y Ban H Hu I M Rangel Fusion of quickbird ms and radarsat sardata for urban land-cover mapping Object-based and knowledge-basedapproach International Journal of Remote Sensing 31 (6) (2010) 1391ndash1410

[28] G V Laurin V Liesenberg Q Chen L Guerriero F Del Frate A Bar-tolini D Coomes B Wilebore J Lindsell R Valentini Optical andsar sensor synergies for forest and land cover mapping in a tropical sitein west africa International Journal of Applied Earth Observation andGeoinformation 21 (2013) 7ndash16

[29] R Khatami G Mountrakis S V Stehman A meta-analysis of remotesensing research on supervised pixel-based land-cover image classificationprocesses General guidelines for practitioners and future research Re-mote Sensing of Environment 177 (2016) 89ndash100

32

[30] B Waske M Braun Classifier ensembles for land cover mapping usingmultitemporal sar imagery ISPRS Journal of Photogrammetry and Re-mote Sensing 64 (5) (2009) 450ndash457

[31] H Balzter B Cole C Thiel C Schmullius Mapping CORINE landcover from Sentinel-1A SAR and SRTM digital elevation model data usingrandom forests Remote Sensing 7 (11) (2015) 14876ndash14898 doi10

3390rs71114876

[32] S-E Park Variations of microwave scattering properties by seasonalfreezethaw transition in the permafrost active layer observed by ALOSPALSAR polarimetric data Remote Sensing 7 (12) (2015) 17135ndash17148doi103390rs71215874

[33] M C Dobson L E Pierce F T Ulaby Knowledge-based land-coverclassification using ERS-1JERS-1 SAR composites IEEE Transactionson Geoscience and Remote Sensing 34 (1) (1996) 83ndash99 doi101109

36481896

[34] L Sirro T Hme Y Rauste J Kilpi J Hmlinen K Gunia B de JongF Paz Pellat Potential of different optical and SAR data in forest andland cover classification to support redd+ mrv Remote Sensing 10 (6)(2018) doi103390rs10060942

[35] J D T De Alban G M Connette P Oswald E L Webb CombinedLandsat and L-band SAR data improves land cover classification andchange detection in dynamic tropical landscapes Remote Sensing 10 (2)(2018) doi103390rs10020306

[36] N Longepe P Rakwatin O Isoguchi M Shimada Y Uryu K YuliantoAssessment of alos palsar 50 m orthorectified fbd data for regional landcover classification by support vector machines IEEE Transactions onGeoscience and Remote Sensing 49 (6) (2011) 2135ndash2150 doi101109

TGRS20102102041

[37] T Esch A Schenk T Ullmann M Thiel A Roth S Dech Char-acterization of land cover types in terrasar-x images by combined anal-ysis of speckle statistics and intensity information IEEE Transactionson Geoscience and Remote Sensing 49 (6) (2011) 1911ndash1925 doi

101109TGRS20102091644

[38] J W Cable J M Kovacs J Shang X Jiao Multi-temporal polarimet-ric radarsat-2 for land cover monitoring in northeastern ontario canadaRemote Sensing 6 (3) (2014) 2372ndash2392 doi103390rs6032372URL httpwwwmdpicom2072-4292632372

[39] X Niu Y Ban Multi-temporal radarsat-2 polarimetric sar data for urbanland-cover classification using an object-based support vector machine anda rule-based approach International Journal of Remote Sensing 34 (1)(2013) 1ndash26 doi101080014311612012700133

33

[40] T L Evans M Costa K Telmer T S F Silva Using alospalsar andradarsat-2 to map land cover and seasonal inundation in the brazilianpantanal IEEE Journal of Selected Topics in Applied Earth Observationsand Remote Sensing 3 (4) (2010) 560ndash575 doi101109JSTARS2010

2089042

[41] P Lumsdon S R Cloude G Wright Polarimetric classification of landcover for Glen Affric radar project IEE Proceedings - Radar Sonar andNavigation 152 (6) (2005) 404ndash412 doi101049ip-rsn20041313

[42] C da Costa Freitas L de Souza Soler S J S SantrsquoAnna L V DutraJ R Dos Santos J C Mura A H Correia Land use and land cover map-ping in the brazilian amazon using polarimetric airborne p-band sar dataIEEE Transactions on Geoscience and Remote Sensing 46 (10) (2008)2956ndash2970

[43] G Li D Lu E Moran L Dutra M Batistella A comparative analysis ofalos palsar l-band and radarsat-2 c-band data for land-cover classificationin a tropical moist region ISPRS Journal of Photogrammetry and RemoteSensing 70 (2012) 26ndash38 doi101016jisprsjprs201203010

[44] N Park K Chi Integration of multitemporalpolarization cband sar datasets for landcover classification International Journal of Remote Sensing29 (16) (2008) 4667ndash4688 doi10108001431160801947341

[45] E Tomppo O Antropov J Praks Cropland classification using Sentinel-1 time series Methodological performance and prediction uncertainty as-sessment Remote Sensing 11 (21) (2019) doi103390rs11212480

[46] D B Nguyen A Gruber W Wagner Mapping rice extent and croppingscheme in the mekong delta using sentinel-1a data Remote Sensing Letters7 (12) (2016) 1209ndash1218

[47] A Veloso S Mermoz A Bouvet T Le Toan M Planells J-F DejouxE Ceschia Understanding the temporal behavior of crops using Sentinel-1 and Sentinel-2-like data for agricultural applications Remote Sensing ofEnvironment 199 (2017) 415ndash426

[48] G Satalino A Balenzano F Mattia M W Davidson C-band SARdata for mapping crops dominated by surface or volume scattering IEEEGeoscience and Remote Sensing Letters 11 (2) (2013) 384ndash388

[49] F Vicente-Guijalba A Jacob J M Lopez-Sanchez C Lopez-MartinezJ Duro C Notarnicola D Ziolkowski A Mestre-Quereda E PottierJ J Mallorqu M Lavalle M Engdahl Sincohmap Land-cover and veg-etation mapping using multi-temporal Sentinel-1 interferometric coher-ence in IGARSS 2018 - 2018 IEEE International Geoscience and RemoteSensing Symposium 2018 pp 6631ndash6634 doi101109IGARSS2018

8517926

34

[50] S Ge O Antropov W Su H Gu J Praks Deep recurrent neural net-works for land-cover classification using Sentinel-1 InSAR time series inIGARSS 2019 - 2019 IEEE International Geoscience and Remote SensingSymposium 2019 pp 473ndash476 doi101109IGARSS20198900088

[51] Y LeCun Y Bengio et al Convolutional networks for images speechand time series The handbook of brain theory and neural networks3361 (10) (1995) 1995

[52] X X Zhu D Tuia L Mou G-S Xia L Zhang F Xu F Fraun-dorfer Deep learning in remote sensing a review arXiv preprintarXiv171003959 (2017)

[53] M Mahdianpari B Salehi M Rezaee F Mohammadimanesh Y ZhangVery deep convolutional neural networks for complex land cover mappingusing multispectral remote sensing imagery Remote Sensing 10 (7) (2018)1119

[54] L Zhang L Zhang B Du Deep learning for remote sensing data Atechnical tutorial on the state of the art IEEE Geoscience and RemoteSensing Magazine 4 (2) (2016) 22ndash40

[55] J Zhang P Zhong Y Chen S Li l 12-regularized deconvolutionnetwork for the representation and restoration of optical remote sensingimages IEEE Transactions on Geoscience and Remote Sensing 52 (5)(2014) 2617ndash2627

[56] X Chen S Xiang C-L Liu C-H Pan Aircraft detection by deep beliefnets in Pattern Recognition (ACPR) 2013 2nd IAPR Asian Conferenceon IEEE 2013 pp 54ndash58

[57] X Chen S Xiang C-L Liu C-H Pan Vehicle detection in satelliteimages by hybrid deep convolutional neural networks IEEE Geoscienceand remote sensing letters 11 (10) (2014) 1797ndash1801

[58] Y Liu G Cao Q Sun M Siegel Hyperspectral classification viadeep networks and superpixel segmentation International Journal of Re-mote Sensing 36 (13) (2015) 3459ndash3482 doi101080014311612015

1055607

[59] J Wang Q Qin Z Li X Ye J Wang X Yang X Qin Deep hierarchicalrepresentation and segmentation of high resolution remote sensing imagesin Geoscience and Remote Sensing Symposium (IGARSS) 2015 IEEEInternational IEEE 2015 pp 4320ndash4323

[60] D Tuia R Flamary N Courty Multiclass feature learning for hyperspec-tral image classification Sparse and hierarchical solutions ISPRS Journalof Photogrammetry and Remote Sensing 105 (2015) 272ndash285

35

[61] F Hu G-S Xia J Hu L Zhang Transferring deep convolutional neu-ral networks for the scene classification of high-resolution remote sensingimagery Remote Sensing 7 (11) (2015) 14680ndash14707

[62] O A B Penatti K Nogueira J A dos Santos Do deep features gener-alize from everyday objects to remote sensing and aerial scenes domainsin 2015 IEEE Conference on Computer Vision and Pattern Recogni-tion Workshops (CVPRW) 2015 pp 44ndash51 doi101109CVPRW20157301382

[63] F P Luus B P Salmon F Van den Bergh B T J Maharaj Multiviewdeep learning for land-use classification IEEE Geoscience and RemoteSensing Letters 12 (12) (2015) 2448ndash2452

[64] F Zhang B Du L Zhang Scene classification via a gradient boostingrandom convolutional network framework IEEE Transactions on Geo-science and Remote Sensing 54 (3) (2016) 1793ndash1802

[65] T Ishii R Nakamura H Nakada Y Mochizuki H Ishikawa Surfaceobject recognition with cnn and svm in landsat 8 images in 2015 14thIAPR International Conference on Machine Vision Applications (MVA)2015 pp 341ndash344 doi101109MVA20157153200

[66] N Kussul M Lavreniuk S Skakun A Shelestov Deep learning clas-sification of land cover and crop types using remote sensing data IEEEGeoscience and Remote Sensing Letters 14 (5) (2017) 778ndash782

[67] Y Chen Z Lin X Zhao G Wang Y Gu Deep learning-based classifi-cation of hyperspectral data IEEE Journal of Selected topics in appliedearth observations and remote sensing 7 (6) (2014) 2094ndash2107

[68] G Wu X Shao Z Guo Q Chen W Yuan X Shi Y Xu R ShibasakiAutomatic building segmentation of aerial imagery using multi-constraintfully convolutional networks Remote Sensing 10 (3) (2018) 407

[69] Y Duan F Liu L Jiao P Zhao L Zhang Sar image segmentationbased on convolutional-wavelet neural network and markov random fieldPattern Recognition 64 (2017) 255ndash267

[70] F Mohammadimanesh B Salehi M Mahdianpari E Gill M MolinierA new fully convolutional neural network for semantic segmentation of po-larimetric SAR imagery in complex land cover ecosystem ISPRS Journalof Photogrammetry and Remote Sensing 151 (2019) 223 ndash 236

[71] L Wang X Xu H Dong R Gui F Pu Multi-pixel simultaneous classifi-cation of polsar image using convolutional neural networks Sensors 18 (3)(2018) 769

36

[72] M Ahishali S Kiranyaz T Ince M Gabbouj Dual and single polarizedSAR image classification using compact convolutional neural networksRemote Sensing 11 (11) (2019) 1340

[73] Z Li Z Yang H Xiong Homogeneous region segmentation for SARimages based on two steps segmentation algorithm in Computers Com-munications and Systems (ICCCS) International Conference on IEEE2015 pp 196ndash200

[74] V Badrinarayanan A Kendall R Cipolla Segnet A deep convolutionalencoder-decoder architecture for image segmentation IEEE transactionson pattern analysis and machine intelligence 39 (12) (2017) 2481ndash2495

[75] H Zhao J Shi X Qi X Wang J Jia Pyramid scene parsing networkin Proceedings of the IEEE conference on computer vision and patternrecognition 2017 pp 2881ndash2890

[76] C Yu J Wang C Peng C Gao G Yu N Sang Bisenet Bilateralsegmentation network for real-time semantic segmentation in Proceed-ings of the European Conference on Computer Vision (ECCV) 2018 pp325ndash341

[77] L-C Chen Y Zhu G Papandreou F Schroff H Adam Encoder-decoder with atrous separable convolution for semantic image segmen-tation arXiv preprint arXiv180202611 (2018)

[78] L-C Chen G Papandreou I Kokkinos K Murphy A L YuilleDeeplab Semantic image segmentation with deep convolutional netsatrous convolution and fully connected CRFs IEEE transactions on pat-tern analysis and machine intelligence 40 (4) (2018) 834ndash848

[79] O Ronneberger PFischer T Brox U-net Convolutional networksfor biomedical image segmentation in Medical Image Computing andComputer-Assisted Intervention (MICCAI) Vol 9351 of LNCS Springer2015 pp 234ndash241 (available on arXiv150504597 [csCV])URL httplmbinformatikuni-freiburgdePublications2015

RFB15a

[80] A G Howard M Zhu B Chen D Kalenichenko W Wang T WeyandM Andreetto H Adam Mobilenets Efficient convolutional neural net-works for mobile vision applications arXiv preprint arXiv170404861(2017)

[81] T Pohlen A Hermans M Mathias B Leibe Full-resolution residualnetworks for semantic segmentation in street scenes in Proceedings ofthe IEEE Conference on Computer Vision and Pattern Recognition 2017pp 4151ndash4160

37

[82] S Jegou M Drozdzal D Vazquez A Romero Y Bengio The onehundred layers tiramisu Fully convolutional densenets for semantic seg-mentation in Computer Vision and Pattern Recognition Workshops(CVPRW) 2017 IEEE Conference on IEEE 2017 pp 1175ndash1183

[83] O Antropov Y Rauste A Lonnqvist T Hame PolSAR mosaic normal-ization for improved land-cover mapping IEEE Geoscience and RemoteSensing Letters 9 (6) (2012) 1074ndash1078

[84] J Long E Shelhamer T Darrell Fully convolutional networks for se-mantic segmentation in Proceedings of the IEEE conference on computervision and pattern recognition 2015 pp 3431ndash3440

[85] D H Hubel T N Wiesel Receptive fields binocular interaction andfunctional architecture in the catrsquos visual cortex The Journal of physiol-ogy 160 (1) (1962) 106ndash154

[86] O Russakovsky J Deng H Su J Krause S Satheesh S Ma Z HuangA Karpathy A Khosla M Bernstein et al Imagenet large scale visualrecognition challenge International journal of computer vision 115 (3)(2015) 211ndash252

[87] K He X Zhang S Ren J Sun Deep residual learning for image recog-nition in Proceedings of the IEEE conference on computer vision andpattern recognition 2016 pp 770ndash778

[88] G Huang Z Liu L Van Der Maaten K Q Weinberger Densely con-nected convolutional networks in CVPR Vol 1 2017 p 3

[89] C Szegedy W Liu Y Jia P Sermanet S Reed D Anguelov D Er-han V Vanhoucke A Rabinovich Going deeper with convolutions inProceedings of the IEEE conference on computer vision and pattern recog-nition 2015 pp 1ndash9

[90] S Ji W Xu M Yang K Yu 3D convolutional neural networks for humanaction recognition IEEE transactions on pattern analysis and machineintelligence 35 (1) (2013) 221ndash231

[91] T N Sainath A-r Mohamed B Kingsbury B Ramabhadran Deepconvolutional neural networks for LVCSR in Acoustics speech and signalprocessing (ICASSP) 2013 IEEE international conference on IEEE 2013pp 8614ndash8618

[92] D Small L Zuberbuhler A Schubert E Meier Terrain-flattened gammanought Radarsat-2 backscatter Canadian Journal of Remote Sensing37 (5) (2012) 493ndash499

[93] P Hrm R Teiniranta M Trm R Repo E Jrvenp M Kallio Theproduction of finnish corine land cover 2000 classification XXth ISPRSCongress Istanbul Turkey (2004)

38

[94] P Hrm R Teiniranta M Trm R Repo E Jrvenp M Kallio Finnishcorine land cover 2000 classification XXth ISPRS Congress AnchorageUS (2004)

[95] A Garcia-Garcia S Orts-Escolano S Oprea V Villena-MartinezJ Garcia-Rodriguez A review on deep learning techniques applied tosemantic segmentation arXiv preprint arXiv170406857 (2017)

[96] F Chollet Xception Deep learning with depthwise separable convolu-tions in Proceedings of the IEEE conference on computer vision andpattern recognition 2017 pp 1251ndash1258

[97] O Ronneberger P Fischer T Brox U-net Convolutional networks forbiomedical image segmentation in International Conference on Medicalimage computing and computer-assisted intervention Springer 2015 pp234ndash241

[98] L-C Chen G Papandreou F Schroff H Adam Rethinkingatrous convolution for semantic image segmentation arXiv preprintarXiv170605587 (2017)

[99] Y Bengio Deep learning of representations for unsupervised and trans-fer learning in Proceedings of ICML Workshop on Unsupervised andTransfer Learning 2012 pp 17ndash36

[100] M Abadi P Barham J Chen Z Chen A Davis J Dean M DevinS Ghemawat G Irving M Isard et al Tensorflow A system for large-scale machine learning in 12th USENIX Symposium on OperatingSystems Design and Implementation (OSDI 16) 2016 pp 265ndash283

[101] H Costa G M Foody D S Boyd Supervised methods of image seg-mentation accuracy assessment in land cover mapping Remote sensing ofenvironment 205 (2018) 338ndash351

[102] G Csurka D Larlus F Perronnin F Meylan What is a good evaluationmeasure for semantic segmentation in BMVC Vol 27 Citeseer 2013p 2013

[103] J Cohen A coefficient of agreement for nominal scales Educational andPsychological Measurement 20 (1) (1960) 37 ndash 46

[104] O Antropov Y Rauste T Hame Volume scattering modeling in PolSARdecompositions Study of ALOS PALSAR data over boreal forest IEEETransactions on Geoscience and Remote Sensing 49 (10) (2011) 3838ndash3848

39

  • 1 Introduction
    • 11 Land Cover Mapping with SAR Imagery
    • 12 Deep Learning in Remote Sensing
    • 13 Study goals
      • 2 Deep Learning Terminology
      • 3 Materials and methods
        • 31 Study site
        • 32 SAR data
        • 33 Reference data
        • 34 Semantic Segmentation Models
          • 341 BiSeNet (Bilateral Segmentation Network)
          • 342 SegNet (Encoder-Decoder-Skip)
          • 343 Mobile U-Net
          • 344 DeepLab-V3+
          • 345 FRRN-B (Full-Resolution Residual Networks)
          • 346 PSPNet (Pyramid Scene Parsing Network)
          • 347 FC-DenseNet (Fully Convolutional DenseNets)
            • 35 Training approach
            • 36 Experimental Setup
              • 361 SAR Data Preprocessing for Deep Learning
              • 362 TrainDevelopment and Test (Accuracy Assessment) Dataset
              • 363 Data Augmentation
              • 364 Implementation
              • 365 Hardware and Training Setup
                • 37 Evaluation Metrics
                  • 4 Results and Discussion
                    • 41 Classification Performance
                    • 42 Computational Performance
                    • 43 Comparison to Similar Work
                    • 44 Outlook and Future Work
                      • 5 Conclusion
Page 31: Wide-Area Land Cover Mapping with Sentinel-1 Imagery using … · 2020-02-27 · Wide-Area Land Cover Mapping with Sentinel-1 Imagery using Deep Learning Semantic Segmentation Models
