pysqldf

Ryoji Ishii

pandas

A library that provides DataFrame, a tabular data type similar to R's data frame. Manipulating the data, however, gets fiddly.

I want to join this table with that table into a single table... merge? concat? Wait, which axis... argh, the index...

Um, I want to group and aggregate, but which object was that method on again...
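For comparison, here is a minimal sketch of those operations in plain pandas. The frames `left` and `right` and their column names are made up for illustration and are not from this talk's data. The point is only that each operation lives on a different function or object.

```python
import pandas as pd

# Hypothetical example tables, invented for this sketch.
left = pd.DataFrame({"key": [1, 10, 100], "value": [2, 20, 200]})
right = pd.DataFrame({"key": [1, 11, 111], "value": [2, 22, 222]})

# Stack rows: a module-level function; axis=0 means "append rows".
# ignore_index=True rebuilds the index so labels are not repeated.
stacked = pd.concat([left, right], axis=0, ignore_index=True)

# Remove duplicated rows: a DataFrame method, plus another index reset.
deduped = stacked.drop_duplicates().reset_index(drop=True)

# Join on a column: a DataFrame method with its own set of options.
joined = left.merge(right, on="key", how="inner", suffixes=("_left", "_right"))

# Group and aggregate: the aggregation method lives on the groupby object.
totals = stacked.groupby("key", as_index=False)["value"].sum()

print(stacked)
print(deduped)
print(joined)
print(totals)
```

Four operations, four different entry points (`pd.concat`, `drop_duplicates`, `merge`, `groupby(...).sum()`), which is the friction being complained about here.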

Everybody loves SQL

yhat/pandasql

pandasql

Runs basic queries (group-by, join, sub-query, etc.) against DataFrames. UDFs are not supported.

I sent a pull request adding UDF support, but got no response...

It looks unmaintained.

airtoxin/pysqldf

pysqldf

A Python version of R's sqldf, which seems to be where the idea originally came from.

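Besides basic SQL, it adds user-defined functions (UDFs) and user-defined aggregate functions (UDAFs), and extends the set of data types that can be queried.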
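The slides do not show how pysqldf registers these functions, so the sketch below is not pysqldf's API. It uses the standard-library `sqlite3` module directly, only to illustrate what a UDF (called once per row) and a UDAF (accumulates over a group) are. The names `twice`, `ValueRange`, and `value_range` are hypothetical.

```python
import sqlite3


def twice(x):
    """Scalar UDF: called once for every row."""
    return x * 2


class ValueRange:
    """UDAF: sqlite3 calls step() for each row and finalize() once per group."""

    def __init__(self):
        self.lo = None
        self.hi = None

    def step(self, value):
        if value is None:  # skip NULLs, as built-in aggregates do
            return
        self.lo = value if self.lo is None else min(self.lo, value)
        self.hi = value if self.hi is None else max(self.hi, value)

    def finalize(self):
        if self.lo is None:  # no non-NULL rows were seen
            return None
        return self.hi - self.lo


conn = sqlite3.connect(":memory:")
conn.create_function("twice", 1, twice)             # SQL name, arg count, callable
conn.create_aggregate("value_range", 1, ValueRange)  # SQL name, arg count, class

conn.execute("create table df1 (a integer, b integer)")
conn.executemany("insert into df1 values (?, ?)", [(1, 2), (10, 20), (100, 200)])

udf_rows = conn.execute("select a, twice(b) from df1 order by a").fetchall()
udaf_row = conn.execute("select value_range(a) from df1").fetchone()

print(udf_rows)
print(udaf_row)
```

Once registered, both functions are called from SQL exactly like built-ins such as `avg`.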

I want to UNION

    from pysqldf import SQLDF

    sqldf = SQLDF(globals())
    # Plain dicts, not DataFrames, are being queried here.
    df1 = {"a": [1, 10, 100], "b": [2, 20, 200]}
    df2 = {"a": [1, 11, 111], "b": [2, 22, 222]}

    # UNION drops the duplicated row (1, 2).
    print(sqldf.execute("select * from df1 union select * from df2;"))
    # [output]
    #      a    b
    # 0    1    2
    # 1   10   20
    # 2   11   22
    # 3  100  200
    # 4  111  222

    # UNION ALL keeps every row, duplicates included.
    print(sqldf.execute("select * from df1 union all select * from df2;"))
    # [output]
    #      a    b
    # 0    1    2
    # 1   10   20
    # 2  100  200
    # 3    1    2
    # 4   11   22
    # 5  111  222

I want to aggregate

    from pysqldf import SQLDF, load_iris

    sqldf = SQLDF(globals())
    iris = load_iris()
    # Mean sepal length and width per species.
    print(sqldf.execute("""
        select
            species,
            avg(sepal_length),
            avg(sepal_width)
        from iris
        group by species
        ;"""
    ))
    # [output]
    #            species  avg(sepal_length)  avg(sepal_width)
    # 0      Iris-setosa              5.006             3.418
    # 1  Iris-versicolor              5.936             2.770
    # 2   Iris-virginica              6.588             2.974

(*^^*)

ipython notebook

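A web-based interactive shell. When you generate plots, the figures are embedded in the page, which makes them easy to read.

The input history and the output are saved together, so notebooks are also easy to share.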

data science stack

• ipython

• pandas + pysqldf

• matplotlib

• numpy + scipy

• statsmodels

• scikit-learn + scikit-image

Please give it a try, everyone!

The end
