Kun-Chi Lai: Applying the DBSCAN Algorithm and Google Maps to the Study of Large-Scale Species Occurrence Data



Applying the DBSCAN Algorithm and Google Maps to the Study of Large-Scale Species Occurrence Data

Kun-Chi Lai, Elie Chen, You-Sheng Li, Kwang-Tsao Shao

1 PhD Student, Department of Computer Science, National Chengchi University; Project Manager, Taiwan Biodiversity Information Facility, Biodiversity Research Center, Academia Sinica
2 PhD Student, Institute of Marine Biology, National Taiwan Ocean University
3 Software Engineer, Taiwan Biodiversity Information Facility, Biodiversity Research Center, Academia Sinica
4 Software Engineer, Research Center for Information Technology Innovation, Academia Sinica
5 Research Fellow and Executive Officer, Systematics and Biodiversity Information Division, Biodiversity Research Center, Academia Sinica

Abstract

Primary species occurrence data comprise animal and plant specimen records held in museums and herbaria, together with species observations. The TaiBIF data portal has so far integrated 26 datasets containing more than 1.5 million species occurrence records, 85% of which are georeferenced. Geospatial clustering is an important method of geospatial knowledge discovery, the exploration of spatial data. In this paper we present a density-based clustering method. It uses the DBSCAN algorithm to draw distribution maps of arbitrary shape from two parameters: Eps, the radius of the neighborhood, and MinPts, the minimum number of points in that neighborhood. The resulting clusters are visualized on Google Maps, which helps in understanding and discovering the knowledge embedded in species' geographical distributions and can support better conservation efforts.

Keywords: species occurrence data, cluster analysis, biodiversity informatics

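1.

The Global Biodiversity Information Facility (GBIF) was established in 2001. GBIF exchanges species occurrence data using the Darwin Core data standard and the DiGIR, TAPIR, and BioCASE protocols (Hill et al., 2009). Data at the scale of GBIF (2.7 [...]) are examined with GIS. Geospatial knowledge discovery applies data mining, and cluster analysis in particular, to spatial data (Miller et al., 2009). This study applies it to the 1.5 million records held by TaiBIF, as a contribution to biodiversity informatics.

Peterson et al. (2010) set out the major open questions for biodiversity informatics. Google Maps is widely used for building mashups (Zang et al., 2008), and this study uses Google Maps to display the clustering results.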

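2.

2.1

Primary species occurrence data have many uses (Chapman, 2005), and mapping them with GIS is among the most common. The GBIF data portal maps occurrences on grids of 1-degree and 0.1-degree cells (GBIF Data Portal, 2011), and the Encyclopedia of Life (EOL) also presents species maps (Encyclopedia of Life, 2011). Hijmans analyzed species distributions on a 50*50 grid with DIVA-GIS (Hijmans, 2001). Flemons built GBIF-MAPA, a mapping and analysis application for GBIF data based on a Services Oriented Architecture (SOA) (Flemons et al., 2007).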

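2.2

2002 [...] Dublin Core [...]

The Taiwan Biodiversity Information Facility (TaiBIF) publishes data through TAPIR (TDWG Access Protocol for Information Retrieval). TAPIR supports customization, so that GBIF extensions can be described in an XML file alongside Darwin Core (2010). The 1.5 million records in TaiBIF are displayed on grids at three scales, 40*40, 10*10, and 2*2 (2010).

Figure 1. (a) 40 (b) 10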
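Grid maps of this kind count the records that fall in each cell. The following is a minimal sketch, assuming records are (longitude, latitude) pairs in decimal degrees and that the cell size is also given in degrees. The function name `grid_counts` and the sample coordinates are illustrative and are not taken from TaiBIF.

```python
import math
from collections import Counter

def grid_counts(points, cell_size):
    """Count points per square grid cell.

    points: iterable of (lon, lat) pairs in decimal degrees (assumed input).
    cell_size: cell edge length in degrees (assumed unit).
    Returns a Counter keyed by the (column, row) index of each cell.
    """
    counts = Counter()
    for lon, lat in points:
        col = math.floor(lon / cell_size)
        row = math.floor(lat / cell_size)
        counts[(col, row)] += 1
    return counts

# Illustrative coordinates, not real occurrence records.
sample = [(121.01, 24.02), (121.04, 24.03), (121.26, 24.02), (120.55, 23.45)]
coarse = grid_counts(sample, 1.0)    # few large cells
fine = grid_counts(sample, 0.25)     # more, smaller cells
```

A coarser grid merges nearby records into one cell, while a finer grid separates them. The cell boundaries are fixed in advance, which is the limitation that density-based clustering avoids.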

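3.

3.1 DBSCAN

Cluster analysis groups similar objects into clusters. Clustering methods fall into four families: (1) partitioning methods, which split the data into K groups; (2) hierarchical methods; (3) density-based methods; and (4) grid-based methods. Density-based methods can find clusters of arbitrary shape and can set aside noise and outliers. Examples are OPTICS, DENCLUE, and DBSCAN.

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) takes two parameters: the neighborhood radius (Eps) and the minimum number of points (MinPts). A point whose Eps-neighborhood contains at least MinPts points is a core point. Clusters are built from three relations:

(1) Directly density-reachable: a point lies within the Eps-neighborhood of a core point. In Figure 3 this holds for the pairs D, E and E, F.
(2) Density-reachable: two points are linked by a chain of directly density-reachable points. In Figure 3 this holds for D and F by way of E.
(3) Density-connected: two points are both density-reachable from a common point. In Figure 3, B and C are density-connected through A (Han et al., 2007).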
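The definitions above can be turned into a short program. The following is a minimal sketch, assuming planar points and Euclidean distance, with a brute-force neighbor search. The names `dbscan`, `region_query`, and `NOISE` are illustrative, and this is not the implementation used in the study.

```python
import math

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE.

    Plain O(n^2) version; a spatial index would be needed for
    large record sets.
    """
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE            # may later become a border point
            continue
        cluster += 1                     # i is a core point: start a cluster
        labels[i] = cluster
        seeds = [j for j in neighbors if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster      # border point: reachable, not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:   # j is also core: keep expanding
                seeds.extend(j_neighbors)
    return labels
```

For example, two well-separated squares of four points plus one distant point, run with `eps=1.5` and `min_pts=3`, yield two clusters and one noise point.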

Figure 3. DBSCAN

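3.2

Species occurrence records are published in Darwin Core through TAPIR and gathered by TaiBIF. The georeferenced records are clustered with DBSCAN, and the result is displayed on Google Maps (Figure 4).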
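The last step of this flow, handing a cluster outline to the map, can be illustrated with GeoJSON. The source does not say how polygons are passed to Google Maps, so the following is a sketch under that assumption. The function name `hull_to_geojson` and the coordinates are illustrative.

```python
import json

def hull_to_geojson(hull, properties=None):
    """Wrap a hull (list of (lon, lat) vertices) as a GeoJSON Polygon Feature."""
    ring = [[lon, lat] for lon, lat in hull]
    if ring and ring[0] != ring[-1]:
        ring.append(ring[0])             # GeoJSON linear rings must be closed
    return {
        "type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "properties": properties or {},
    }

# Illustrative triangle, not a real cluster outline.
feature = hull_to_geojson(
    [(121.0, 24.0), (121.5, 24.0), (121.5, 24.5)], {"cluster": 0}
)
text = json.dumps({"type": "FeatureCollection", "features": [feature]})
```

A FeatureCollection of this form is one of the inputs that the Data layer of the Google Maps JavaScript API can load.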

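Figure 4.

Once DBSCAN has grouped the records for a given MinPts, a convex hull is computed for each cluster. Convex hull algorithms include Incremental, Jarvis's March (Gift Wrap), Divide and Conquer, and Quick hull. Quick hull is the one considered here (2010).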
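The Quick hull idea named above can be sketched briefly: take the leftmost and rightmost points, find the point farthest from the line joining them, and recurse on the points that remain outside. The version below is a minimal sketch for planar (x, y) tuples. The function names are illustrative, and it is not the implementation used in the study.

```python
def cross(o, a, b):
    """Twice the signed area of triangle o-a-b; > 0 when b is left of o->a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def _hull_side(a, b, points):
    """Hull vertices strictly left of the directed line a->b, ordered a to b."""
    left = [p for p in points if cross(a, b, p) > 0]
    if not left:
        return []
    far = max(left, key=lambda p: cross(a, b, p))   # farthest from line a-b
    return _hull_side(a, far, left) + [far] + _hull_side(far, b, left)

def quickhull(points):
    """Convex hull vertices in clockwise order, starting at the leftmost point.

    Points lying on a hull edge (collinear) are not reported as vertices.
    """
    pts = sorted(set(points))
    if len(pts) < 3:
        return pts
    a, b = pts[0], pts[-1]               # leftmost and rightmost points
    return [a] + _hull_side(a, b, pts) + [b] + _hull_side(b, a, pts)
```

Applied to the points of one DBSCAN cluster, the returned vertices outline the polygon that is drawn for that cluster.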

Figure 5. MinPts and the convex hull

In Figure 6, Eps ranges from 3 to 5 and MinPts from 5 to 15 (Figure 6).

Eps = 5; MinPts = 15; [...] = 3
Eps = 4; MinPts = 15; [...] = 7
Eps = 3; MinPts = 15; [...] = 6

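Figure 6. DBSCAN results for different Eps and MinPts settings

The choice of MinPts determines how many records are treated as outliers. MinPts and Eps can be estimated with the k-dist heuristic: for a point p, k-dist is the distance dist(p, q) to q, the k-th nearest neighbor of p, with k tied to MinPts (Xu et al., 1998). Sorting the k-dist values reveals a threshold point, and the k-dist value at that point is taken as Eps for the chosen MinPts.

Polygon clipping: the available algorithms include Sutherland-Hodgman clipping, Weiler-Atherton clipping, Vatti clipping, and the Greiner algorithm. Each takes a subject polygon and a clipping polygon (Figure 7).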
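The k-dist values discussed above are simple to compute. The following is a minimal sketch, assuming planar points and Euclidean distance. The function name `k_dist` is illustrative, and picking the threshold point from the sorted values is left to the reader, as it is usually done by eye on a plot.

```python
import math

def k_dist(points, k):
    """Distance from each point to its k-th nearest neighbor, sorted descending.

    Requires more than k points. Brute force, O(n^2 log n).
    """
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result, reverse=True)

# Four points on a unit square plus one distant point (illustrative data).
curve = k_dist([(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)], 2)
```

In this small example the first value of `curve` is far larger than the rest. That jump is the kind of threshold point the heuristic looks for: points before it are candidate outliers, and the value just after it is a candidate Eps.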

Figure 7.
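Of the clipping algorithms listed, Sutherland-Hodgman is the shortest to write down, so it is used here for illustration; the source does not state which algorithm the study adopted. The sketch assumes the clipping polygon is convex and its vertices are listed counter-clockwise. The function name `clip_polygon` is illustrative.

```python
def clip_polygon(subject, clip):
    """Clip `subject` against a convex `clip` polygon (Sutherland-Hodgman).

    Both polygons are lists of (x, y) vertices; `clip` must be convex and
    listed counter-clockwise.
    """
    def inside(p, a, b):
        # True when p is on or to the left of the directed edge a->b.
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def intersection(p, q, a, b):
        # Point where segment p-q crosses the infinite line through a and b.
        dx1, dy1 = q[0] - p[0], q[1] - p[1]
        dx2, dy2 = b[0] - a[0], b[1] - a[1]
        t = ((a[0] - p[0]) * dy2 - (a[1] - p[1]) * dx2) / (dx1 * dy2 - dy1 * dx2)
        return (p[0] + t * dx1, p[1] + t * dy1)

    output = list(subject)
    for i in range(len(clip)):
        a, b = clip[i], clip[(i + 1) % len(clip)]
        current, output = output, []
        for j in range(len(current)):
            p, q = current[j - 1], current[j]    # edge from previous vertex to q
            if inside(q, a, b):
                if not inside(p, a, b):
                    output.append(intersection(p, q, a, b))   # entering
                output.append(q)
            elif inside(p, a, b):
                output.append(intersection(p, q, a, b))       # leaving
        if not output:
            break                                # nothing left to clip
    return output
```

Clipping the square with corners (0, 0) and (2, 2) against the square with corners (1, 1) and (3, 3) returns the overlap, the unit square with corners (1, 1) and (2, 2).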

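4.

[...] 1999 [...] (Figure 8).

Figure 8. (1999)

DBSCAN was run with MinPts set to 11, and the result is shown in Figures 9a and 9b for comparison with Figure 8. [...] DBSCAN [...] 4 [...]

(a) (b) DBSCAN (Eps = 0.04, MinPts = 11)
Figure 9. DBSCAN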

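5.

This study clusters species occurrence records with DBSCAN and displays the results on Google Maps. Google Maps brings functions that once required GIS software to the Web 2.0 setting, and a map built on Google Maps can become one step in a scientific workflow.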

Chapman, A.D. (2005) Uses of primary species-occurrence data, version 1.0. Global Biodiversity Information Facility.
Encyclopedia of Life (2011) Retrieved from http://www.eol.org
Ester, M., Kriegel, H.P., Sander, J., Xu, X. (1998) Clustering for Mining in Large Spatial Databases. Special Issue on Data Mining, KI-Journal, ScienTec Publishing, 1, 1-7.
Finley, D.R. (2007) Point-In-Polygon Algorithm - Determining Whether A Point Is Inside A Complex Polygon. http://www.alienryderflex.com/polygon/
Flemons, P., Guralnick, R., Krieger, J., Ranipeta, A., Neufeld, D. (2007) A web-based GIS tool for exploring the world's biodiversity: The Global Biodiversity Information Facility Mapping and Analysis Portal Application (GBIF-MAPA). Ecological Informatics, 2(1), 49-60.
GBIF Data Portal (2011) Retrieved from http://data.gbif.org
Han, J., Kamber, M. (2006) Data Mining: Concepts and Techniques (2nd ed.). Morgan Kaufmann.
Hijmans, R.J., Spooner, D.M. (2001) Geographic distribution of wild potato species. American Journal of Botany, 88(11), 2101-2112.
Hill, A.W., Guralnick, R., Flemons, P., Beaman, R., Wieczorek, J., Ranipeta, A. et al. (2009) Location, location, location: utilizing pipelines and services to more effectively georeference the world's biodiversity data. BMC Bioinformatics, 10, S3.
Jaffe, A., Naaman, M., Tassa, T., Davis, M. (2006) Generating summaries and visualization for large collections of geo-referenced photographs. In: Proceedings of the 8th ACM international workshop on Multimedia information retrieval, 89-98.
Li, W., Ong, E., Xu, S., Hung, T. (2005) A Point Inclusion Test Algorithm for Simple Polygons. In: Computational Science and Its Applications - ICCSA 2005, 3480, 769-775.
Miller, H.J., Han, J. (2009) Geographic Data Mining and Knowledge Discovery (2nd ed.). CRC Press.
Mücke, E. (2009) Computing Prescriptions: Quickhull: Computing Convex Hulls Quickly. Computing in Science & Engineering, 11(5), 54-57.
Peterson, A.T., Knapp, S., Guralnick, R., Soberón, J., Holder, M.T. (2010) The big questions for biodiversity informatics. Systematics and Biodiversity, 8(2), 159-168.
Shao, K.T., Peng, C.I., Yen, E., Lai, K.C., Wang, M.C., Lin, J. et al. (2007) Integration of Biodiversity Databases in Taiwan and Linkage to Global Databases. Data Science Journal, 6, S2-S10.
Zang, N., Rosson, M.B., Nasser, V. (2008) Mashups: who? what? why? CHI '08 extended abstracts on Human factors in computing systems, 3171-3176.
[...] (1999) [...] 288.
[...] (2010) [...] Biodiversity Science, 18(5), 444-453. ISSN 1005-0094.
[...] (2010) [...] Google Map [...]. ISBN 9789860258349.