data discovery and reuse
DESCRIPTION
Slides for workshop led by Friedrich Lindenberg and Jonathan Gray at "Use of Information and Data for Enhanced Communication and Advocacy" workshop in Budapest, 17th March 2011.
TRANSCRIPT
Data discovery and reuse
TTF-IP Workshop, 18.3.
Monday, March 21, 2011
Data processes 1
•Need: machine-readable, openly licensed.
•Re-publish derived data
Data processes 2
•Goal: reproducible results, ecosystems:
•Tools to regularly extract data, share, transform and load it.
•Catalogues, documentation.
•“Data-the-process”, not “data-the-file”
•no “Excel Afternoons”
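The "extract, transform and load" idea above can be sketched as a small script that is re-run whenever the source changes, instead of being redone by hand in a spreadsheet. This is a minimal illustration using only the Python standard library; the sample data, column names and function names are assumptions made for the example, not part of the original slides.

```python
import csv
import io
import json

# Hypothetical raw input; in practice this would be fetched on a schedule.
RAW_CSV = """country,year,amount
Hungary,2010,"1,200"
Hungary,2011,"1,500"
Poland,2010,900
"""


def extract(text):
    """Parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Convert strings to proper types and strip thousands separators."""
    return [
        {
            "country": row["country"].strip(),
            "year": int(row["year"]),
            "amount": int(row["amount"].replace(",", "")),
        }
        for row in rows
    ]


def load(rows):
    """Serialise to JSON; a real pipeline might write to a file or database."""
    return json.dumps(rows, indent=2, sort_keys=True)


def run_pipeline(text):
    """Run all three steps so the result can be reproduced at any time."""
    return load(transform(extract(text)))


if __name__ == "__main__":
    print(run_pipeline(RAW_CSV))
```

Because every step is a function, the derived data can be regenerated and re-published whenever the source is updated.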
Data Acquisition
                      Voluntary release    Involuntary release
Active acquisition    FoI                  Scraping
Passive acquisition   PSI/Open Data        Leaks
Basic tools
•Language “convention”: Python
•Screen scraping: ScraperWiki
•Semi-structured storage: MongoDB
•Keeping an overview: CKAN
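As an illustration of what screen scraping in Python involves, the sketch below pulls the rows out of an HTML table. It uses only the standard library's `html.parser` rather than ScraperWiki's own hosted environment; the class name, function name and sample HTML are hypothetical and chosen for this example.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_table(html):
    """Return the table contents as a list of rows (lists of strings)."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical page fragment standing in for a real website.
SAMPLE_HTML = """
<table>
  <tr><th>Donor</th><th>Amount</th></tr>
  <tr><td>Agency A</td><td>100</td></tr>
  <tr><td>Agency B</td><td>250</td></tr>
</table>
"""

if __name__ == "__main__":
    for row in scrape_table(SAMPLE_HTML):
        print(row)
```

Real pages are usually messier than this fragment, so a production scraper would also need to handle nested tables, missing cells and changing layouts.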
Textual data
•De-PDF (Acrobat Pro, pdf2text)
•Index & Search (Apache Solr)
•Basic NLP: word counts/frequencies, named-entity extraction (NEE), etc.: NLTK
•Publish: Co-ment, AnnotateIt, Scribd
•Soon: DocumentClouds for all!
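The word-count step mentioned above can be sketched without any NLP library at all. This example uses the standard library's `collections.Counter` in place of NLTK, with a deliberately naive tokeniser; the function name and sample sentence are assumptions for illustration only.

```python
import re
from collections import Counter


def word_frequencies(text, top=3):
    """Return the `top` most frequent lower-cased words with their counts."""
    # Naive tokenisation: runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(top)


# Hypothetical text, standing in for the output of a PDF-to-text step.
SAMPLE = "Open data needs open licences. Data without a licence is not open data."

if __name__ == "__main__":
    for word, count in word_frequencies(SAMPLE):
        print(word, count)
```

For anything beyond counting (stemming, stop words, entity extraction), a dedicated toolkit such as NLTK is the more appropriate choice.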
Numeric data I
•De-PDF: ABBYY FineReader
•Munge & Massage: Google Refine
•Share & Extend: Google Spreadsheets
•R/Stata/SPSS: more suited to internal processes.
•BI/Analytics/Aggregation: custom?
Numeric data II
•Visualization, first go: Google Vis Toolkit, IBM ManyEyes
•Visualization, interactive: Protovis, Raphael
•Flash considered harmful :-(
Network data
•Can be derived from other types
•Think about structure: nodes, edges, weights, directions
•Analysis: find central actors, mediators, ...: MCI, NetworkX
•Visualization: Gephi, GraphViz
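To make the points above concrete, the sketch below derives a network from tabular records and computes a very simple centrality measure. It is written in plain Python rather than NetworkX so that it runs without extra dependencies; the records, names and the choice of "number of distinct neighbours" as the measure are assumptions for this example.

```python
from collections import defaultdict

# Hypothetical funding records: (donor, recipient, amount).
RECORDS = [
    ("Donor A", "Org X", 100),
    ("Donor A", "Org Y", 50),
    ("Donor B", "Org X", 70),
    ("Donor B", "Org X", 30),
]


def build_edges(records):
    """Aggregate records into directed edges with summed weights."""
    edges = defaultdict(int)
    for source, target, weight in records:
        edges[(source, target)] += weight
    return dict(edges)


def degree(edges):
    """Count distinct neighbours per node, ignoring edge direction."""
    neighbours = defaultdict(set)
    for source, target in edges:
        neighbours[source].add(target)
        neighbours[target].add(source)
    return {node: len(others) for node, others in neighbours.items()}


if __name__ == "__main__":
    edges = build_edges(RECORDS)
    for node, count in sorted(degree(edges).items()):
        print(node, count)
```

Nodes with a high degree are candidates for "central actors"; finding mediators needs path-based measures such as betweenness, which a library like NetworkX provides.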
FTS “Afghanistan”, Ronny Patz
Geo-data
•There is more than Google Maps markers :-)
•Talk to your neighborhood OSM crowd.