[databeers] 27-11-2014 “githubland”. luis osa
TRANSCRIPT
Motivation: No code rule
- “Data beers” rules do not allow showing code during talks.
- But data is the same as code!
(cf. Lisp homoiconicity, Unix “rule of representation”)
1 / 16
Where is there a lot of code? GitHub
- Git is the VCS developed by the Linux kernel team
- GitHub is a Git repository web-based hosting service
2 / 16
Inspiration: Blatt maps
- U.S. Census Bureau data on second languages in American households [1]
[1] http://gizmodo.com/the-most-common-languages-spoken-in-the-u-s-state-by-1575719698
3 / 16
Processing GitHub information
- GitHub offers a REST API, but it has rate limits
- GitHub Archive publishes all public commits in hourly archives
- Google BigQuery has the GitHub timeline as public data
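As a rough illustration of the hourly-archive route, the sketch below counts repository languages in one gzipped archive. It is not the code used for the talk. The URL pattern, the one-JSON-object-per-line layout, and the `repository.language` field are assumptions about the archive format of that period, and `archive_url` and `count_languages` are hypothetical helper names. The sample events are made up so that the block runs without network access.

```python
import gzip
import io
import json
from collections import Counter


def archive_url(year, month, day, hour):
    """Build the URL of one hourly archive (assumed naming pattern)."""
    return "https://data.gharchive.org/%04d-%02d-%02d-%d.json.gz" % (
        year, month, day, hour)


def count_languages(stream):
    """Count events per repository language in a stream of JSON lines."""
    counts = Counter()
    for line in stream:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        # Assumed field layout: event["repository"]["language"]
        language = (event.get("repository") or {}).get("language")
        if language:
            counts[language] += 1
    return counts


# Made-up events standing in for a downloaded hourly archive.
sample_events = [
    {"type": "PushEvent", "repository": {"language": "Python"}},
    {"type": "PushEvent", "repository": {"language": "OCaml"}},
    {"type": "WatchEvent", "repository": {"language": "Python"}},
    {"type": "ForkEvent", "repository": {}},
]
raw = gzip.compress(
    "\n".join(json.dumps(e) for e in sample_events).encode("utf-8"))

with gzip.open(io.BytesIO(raw), "rt", encoding="utf-8") as stream:
    counts = count_languages(stream)

print(counts.most_common())
```

With a real archive, the bytes downloaded from `archive_url(...)` would replace `raw`. Grouping the counts by country would additionally need the location of each event's author, which this sketch leaves out.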
9 / 16
Which countries are there in Europe?
- There may be new countries:
- There may be fewer countries:
- A solution: DBpedia and SPARQL
DBpedia has a SPARQL endpoint to receive queries. There are wrapper libraries.
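A query for the European countries could look roughly like the sketch below. The query text is an assumption (the talk does not show the one it used): it relies on the `dbo:Country` class and the `Countries_in_Europe` category being present in DBpedia. The `format` request parameter is specific to the endpoint software and is also an assumption. `build_url`, `extract_values` and `fetch_countries` are hypothetical helper names. Only the standard library is used here; a wrapper library would hide the URL handling.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ENDPOINT = "http://dbpedia.org/sparql"

# Assumed query: resources typed as countries and filed under the
# "Countries in Europe" category.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dbc: <http://dbpedia.org/resource/Category:>
SELECT ?country WHERE {
  ?country a dbo:Country ;
           dct:subject dbc:Countries_in_Europe .
}
"""


def build_url(query, endpoint=ENDPOINT):
    """Encode a SPARQL query as a GET request URL asking for JSON results."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return endpoint + "?" + urlencode(params)


def extract_values(results, variable):
    """Pull the values bound to one variable out of SPARQL JSON results."""
    return [row[variable]["value"] for row in results["results"]["bindings"]]


def fetch_countries():
    """Run the query against the live endpoint (needs network access)."""
    with urlopen(build_url(QUERY)) as response:
        return extract_values(json.load(response), "country")
```

Calling `fetch_countries()` returns a list of resource URIs. Because the list comes from the endpoint at query time, new or vanished countries show up without touching the code.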
10 / 16
No Twitter
- Quite tired of people categorizing tweets. There are many APIs out there!
- Do not worry, we are still going to get rich! → using World Bank macroeconomic data [2]
[2] Sherouse, Oliver (2014). Wbdata. Arlington, VA. Available from http://github.com/OliverSherouse/wbdata.
11 / 16
corr(GDP, language)
Figure: Pearson correlation of GDP with language preference [3]
[3] Negative values denote a language used in richer countries; a low value in the language precedence means a higher place in the language preference list for a country.
13 / 16
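The sign convention in the figures is easier to see on a toy case. The sketch below computes a Pearson correlation by hand between a GDP figure and the rank of one language in each country's preference list (1 = most used). The four data points are made up for illustration and `pearson` is a hypothetical helper, not the talk's code.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Made-up data: four countries, ordered from poorest to richest.
gdp = [10, 20, 30, 40]
# Rank of one language in each country (1 = most preferred).
language_rank = [4, 3, 2, 1]

r = pearson(gdp, language_rank)
print(r)
```

Here the language climbs the preference list as GDP grows, so its rank number falls and the coefficient comes out close to -1: a negative value marks a language favoured in richer countries.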
corr(unemployment, language)
Figure: Pearson correlation of unemployment with language preference [4]
[4] Positive values show preferred languages in countries with low unemployment
14 / 16
corr(debt, language)
Figure: Pearson correlation of total government debt as % of GDP with language preference [5]
[5] Positive values show preferred languages in countries with low debt
15 / 16
Take away messages
- Data talk about code!
- SPARQL and other APIs: all data is on your laptop
- BigQuery and other tools: your laptop controls clusters
- All languages are beautiful
- but do not program in OCaml if you can avoid it
16 / 16