[databeers] 27-11-2014 “githubland”. luis osa

Githubland

Luis Osa <[email protected]>

November 2014

0 / 16

Motivation: No code rule

- “Data Beers” rules do not allow showing code during talks.
- But data is the same as code!

(cf. Lisp homoiconicity, Unix “rule of representation”)

1 / 16

Where is there a lot of code? GitHub

Git is the version control system (VCS) developed by the Linux kernel team

GitHub is a web-based hosting service for Git repositories

2 / 16

Inspiration: Blatt maps

- U.S. Census Bureau data on second languages in American households [1]

[1] http://gizmodo.com/the-most-common-languages-spoken-in-the-u-s-state-by-1575719698

3 / 16

European (programming) languages

Figure: Most popular languages

4 / 16

European (programming) languages

Figure: The problem is Octopress

5 / 16

European (programming) languages

Figure: Most popular languages excluding JavaScript

6 / 16

European (programming) languages

Figure: The problem is the web

7 / 16

European (programming) languages

Figure: Most popular languages excluding JavaScript and PHP

8 / 16
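The maps above rank, for each country, the languages of its GitHub repositories and then drop JavaScript and PHP to see what lies underneath. Below is a minimal Python sketch of that ranking step. It assumes per-country repository counts by language have already been collected. The function names and the input shape are hypothetical, not taken from the talk. The 1-based rank it produces corresponds to the "language preference list" position used in the correlation slides.

```python
from __future__ import annotations

from collections import Counter

# Sketch only: function names and the input shape (country -> Counter of
# repository counts per language) are hypothetical.


def preference_list(counts: Counter, exclude: frozenset = frozenset()) -> list[str]:
    """Languages from most to least repositories, skipping excluded ones.

    Ties are broken alphabetically so the ordering is deterministic.
    """
    kept = [(lang, n) for lang, n in counts.items() if lang not in exclude]
    kept.sort(key=lambda item: (-item[1], item[0]))
    return [lang for lang, _ in kept]


def preference_ranks(counts: Counter, exclude: frozenset = frozenset()) -> dict[str, int]:
    """Map each language to its 1-based position (1 = most popular)."""
    return {lang: pos for pos, lang in enumerate(preference_list(counts, exclude), start=1)}


def top_language_by_country(
    counts_by_country: dict[str, Counter], exclude: frozenset = frozenset()
) -> dict[str, str]:
    """Most popular remaining language per country: what each map colours."""
    top = {}
    for country, counts in counts_by_country.items():
        ranked = preference_list(counts, exclude)
        if ranked:
            top[country] = ranked[0]
    return top


# Invented toy numbers, only to show the effect of exclusions.
toy = {
    "country_a": Counter({"JavaScript": 120, "PHP": 80, "Java": 60, "Python": 40}),
    "country_b": Counter({"JavaScript": 200, "PHP": 150, "Ruby": 90, "Python": 95}),
}
print(top_language_by_country(toy))
# {'country_a': 'JavaScript', 'country_b': 'JavaScript'}
print(top_language_by_country(toy, frozenset({"JavaScript", "PHP"})))
# {'country_a': 'Java', 'country_b': 'Python'}
```

Excluding a language only removes it from the ranking, and the remaining languages move up. This is why a single exclusion can change the top language of many countries at once.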

Processing Github information

- GitHub offers a REST API, but it has rate limits
- GitHub Archive publishes the public GitHub timeline (pushes, forks, issues and other events) in hourly archives
- Google BigQuery has the GitHub timeline as a public dataset

9 / 16
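GitHub Archive files are gzip-compressed, with one JSON event per line. Below is a minimal sketch of reading one hourly file. It assumes the archive's current `data.gharchive.org` naming scheme; the 2014 service lived at a different address, and older events follow a different schema. `archive_url`, `iter_events` and `count_event_types` are hypothetical helpers.

```python
import gzip
import json
from collections import Counter

# Sketch only: helper names are hypothetical and the URL scheme is assumed.


def archive_url(year: int, month: int, day: int, hour: int) -> str:
    """URL of one hourly GH Archive file; hours are not zero-padded (0..23)."""
    return f"https://data.gharchive.org/{year:04d}-{month:02d}-{day:02d}-{hour}.json.gz"


def iter_events(gz_bytes: bytes):
    """Yield one decoded event per non-empty line of a gzipped JSON-lines file."""
    for line in gzip.decompress(gz_bytes).splitlines():
        if line.strip():
            yield json.loads(line)


def count_event_types(gz_bytes: bytes) -> Counter:
    """Count events by their "type" field, e.g. "PushEvent" or "ForkEvent"."""
    return Counter(event.get("type", "unknown") for event in iter_events(gz_bytes))
```

Fetching one hour is then `urllib.request.urlopen(archive_url(2014, 11, 27, 12)).read()`. Working on bytes keeps the parsing testable offline. For whole months, the BigQuery dataset avoids downloading hundreds of files.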

Which countries are there in Europe?

- There may be new countries:

- There may be fewer countries:

- A solution: DBpedia and SPARQL

DBpedia has a SPARQL endpoint that accepts queries, and there are wrapper libraries for querying it.

10 / 16
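A SPARQL endpoint is an HTTP service. The query travels in the `query` parameter, and the answer can come back in the standard SPARQL 1.1 JSON results format. Below is a minimal sketch that uses only the standard library. It rests on three assumptions:

- the `https://dbpedia.org/sparql` endpoint;
- the Virtuoso-specific `format` parameter (the standard alternative is an HTTP `Accept` header);
- the `dbc:Countries_in_Europe` category. DBpedia's modelling changes between releases, so the query may need adjusting.

The helper names are hypothetical.

```python
from __future__ import annotations

from urllib.parse import urlencode

# Sketch only: the endpoint, the "format" parameter and the category used in
# the query are assumptions; helper names are hypothetical.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"

EUROPEAN_COUNTRIES_QUERY = """
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dbc: <http://dbpedia.org/resource/Category:>
SELECT DISTINCT ?country WHERE {
  ?country dct:subject dbc:Countries_in_Europe .
}
"""


def query_url(endpoint: str, query: str) -> str:
    """GET URL carrying a SPARQL query and asking for JSON results."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return f"{endpoint}?{urlencode(params)}"


def binding_values(results: dict, var: str) -> list[str]:
    """Values of one variable from a SPARQL 1.1 JSON results document."""
    return [row[var]["value"] for row in results["results"]["bindings"] if var in row]
```

A live call would pass `json.load(urllib.request.urlopen(query_url(DBPEDIA_ENDPOINT, EUROPEAN_COUNTRIES_QUERY)))` to `binding_values(..., "country")`. Because the country list comes from a query, new or vanished countries show up without editing a hard-coded list.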

No Twitter

- Quite tired of people categorizing tweets. There are many other APIs out there!

- Do not worry, we are still going to get rich! → using World Bank macroeconomic data [2]

[2] Sherouse, Oliver (2014). Wbdata. Arlington, VA. Available from http://github.com/OliverSherouse/wbdata.

11 / 16
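The talk used the `wbdata` Python library for this. As a swapped-in alternative, here is a minimal sketch that calls the World Bank API v2 directly with the standard library. The indicator codes are assumptions about which series match the next slides (GDP, unemployment, and government debt as % of GDP). The helper names are hypothetical.

```python
from __future__ import annotations

from urllib.parse import urlencode

# Sketch only: indicator codes are assumed to match the series on the next
# slides; helper names are hypothetical.
INDICATORS = {
    "gdp": "NY.GDP.MKTP.CD",           # GDP (current US$)
    "unemployment": "SL.UEM.TOTL.ZS",  # unemployment, total (% of labour force)
    "debt": "GC.DOD.TOTL.GD.ZS",       # central government debt, total (% of GDP)
}


def indicator_url(countries: list[str], indicator: str, year: int) -> str:
    """World Bank API v2 URL for one indicator, several countries, one year."""
    codes = ";".join(countries)  # the API separates several countries with ';'
    params = urlencode({"format": "json", "date": year, "per_page": 1000})
    return f"https://api.worldbank.org/v2/country/{codes}/indicator/{indicator}?{params}"


def values_by_country(response: list) -> dict[str, float]:
    """Map ISO3 code -> value, skipping countries without data for that year.

    A normal answer is a two-element list [paging metadata, rows]; an error
    answer is a one-element list with a message, treated here as "no data".
    """
    rows = response[1] if len(response) > 1 else None
    return {
        row["countryiso3code"]: row["value"]
        for row in rows or []
        if row.get("value") is not None
    }
```

For example, you could pass `json.load(urllib.request.urlopen(indicator_url(["DEU", "ESP"], INDICATORS["gdp"], 2013)))` to `values_by_country`. This yields one number per country, ready to be paired with language ranks.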

Google Correlations

Figure: “Clojure programming destroys jobs”, Del Cacho, Carlos, 2014

12 / 16

corr(GDP, language)

Figure: Pearson correlation of GDP with language preference [3]

[3] Negative values denote a language used in richer countries; a low precedence value means a higher place in a country's language preference list.

13 / 16
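Each of these correlation slides is a Pearson correlation between a per-country indicator and the position of one language in that country's preference list. Below is a minimal sketch. It assumes indicator values keyed by country and per-country ranks where 1 means most preferred. The input shapes and function names are hypothetical.

```python
from __future__ import annotations

from math import sqrt

# Sketch only: function names and input shapes are hypothetical.


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length, at least 2 points")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (sx * sy)


def corr_with_language(
    indicator: dict[str, float], ranks: dict[str, dict[str, int]], language: str
) -> float:
    """Correlate a per-country indicator with one language's rank (1 = most preferred).

    Only countries that have an indicator value and rank the language are used.
    """
    countries = sorted(c for c in indicator if language in ranks.get(c, {}))
    return pearson(
        [indicator[c] for c in countries],
        [ranks[c][language] for c in countries],
    )
```

Rank 1 means most preferred. A language that ranks higher in richer countries therefore has small rank numbers where GDP is large, which gives a negative coefficient. This matches the sign convention in the footnote on this slide. The same function serves the unemployment and debt slides; only the indicator changes.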

corr(unemployment, language)

Figure: Pearson correlation of unemployment with language preference [4]

[4] Positive values show preferred languages in countries with low unemployment.

14 / 16

corr(debt, language)

Figure: Pearson correlation of total government debt as % of GDP with language preference [5]

[5] Positive values show preferred languages in countries with low debt.

15 / 16

Take away messages

- Data talk about code!

- SPARQL and other APIs: all data is on your laptop

- BigQuery and other tools: your laptop controls clusters

- All languages are beautiful

- but do not program in OCaml if you can avoid it

16 / 16
