workshop 20140522 bigquery implementation
DESCRIPTION
A BigQuery starter guide to loading data in CSV or JSON format, plus a query guide...
TRANSCRIPT
MiTAC MiCloud - Google Cloud Platform Partner @ APAC
2014 Q2 BigQuery Workshop
Google BigQuery: big data with SQL-like queries, but fast...
Google BigQuery
http://goo.gl/XZmqgN
RESTful
GCE LB
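Preface:
● This is a hands-on session. If you'd like to follow along, please open your laptop...
● Have you created a GCP project?
● Have you enabled billing?
● Have you installed google_cloud_sdk?
● Wireless AP here:
○ Account:
○ Password: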
Data Access
Big Data Access
Frontend Services
Backend Services
BigQuery is...
● TB-level data analysis
● Fast mining response
● SQL-like query language
● Multi-dataset interactive support
● Cheap, pay-per-use
● Offline job support
Getting Started
BigQuery Web UI
https://bigquery.cloud.google.com/
BigQuery structure
● Project
● Dataset
● Table
● Job
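The Project / Dataset / Table hierarchy is also how tables are named in the commands and queries later in this deck, for example [publicdata:samples.shakespeare] or testbq.test. A minimal Python sketch of splitting such a reference into its parts; the helper name parse_table_ref and the default project "my-project" are made up for illustration, and project IDs that themselves contain a colon are not handled.

```python
def parse_table_ref(ref, default_project=None):
    """Split 'project:dataset.table' (optionally wrapped in [ ]) into parts.

    Hypothetical helper for illustration only, not part of any Google library.
    """
    ref = ref.strip("[]")
    project, sep, rest = ref.partition(":")
    if not sep:
        # No "project:" prefix, so fall back to the caller's default project.
        project, rest = default_project, ref
    dataset, _, table = rest.partition(".")
    return {"project": project, "dataset": dataset, "table": table}


public_ref = parse_table_ref("[publicdata:samples.shakespeare]")
local_ref = parse_table_ref("testbq.test", default_project="my-project")
print(public_ref)
print(local_ref)
```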
Hands-on - Import
The easy way - Import Wizard
JCMB_2014.csv Schema
date_time:String,atmospheric_pressure:float,rainfall:float,wind_speed:float,wind_direction:float,surface_temperature:float,relative_humidity:float,solar_flux:float,battery:float
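The schema above uses the inline name:type,name:type form. The JSON load examples in this deck pass a schema file instead, which is a JSON list of objects with "name" and "type" keys. A rough Python sketch of converting one form to the other; the function name parse_inline_schema is an assumption for illustration, and defaulting a missing type to STRING is a choice made by this sketch.

```python
import json


def parse_inline_schema(schema):
    """Turn 'name:type,name:type' into a list of schema field dicts."""
    fields = []
    for item in schema.split(","):
        name, _, field_type = item.strip().partition(":")
        # A missing type falls back to STRING (assumption of this sketch).
        fields.append({"name": name, "type": (field_type or "string").upper()})
    return fields


JCMB_SCHEMA = (
    "date_time:String,atmospheric_pressure:float,rainfall:float,"
    "wind_speed:float,wind_direction:float,surface_temperature:float,"
    "relative_humidity:float,solar_flux:float,battery:float"
)

fields = parse_inline_schema(JCMB_SCHEMA)
# Print the first two fields in schema-file form.
print(json.dumps(fields[:2], indent=2))
```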
Load Data to BigQuery from the Command Line
CSV / JSON → Cloud Storage → BigQuery
Load CSV to BigQuery
gsutil cp [source] gs://[bucket-name]
# gsutil cp ~/Desktop/log.csv gs://your-bucket/
Copying file:///Users/simonsu/Desktop/log.csv [Content-Type=text/csv]...
Uploading: 4.59 MB/36.76 MB
bq load [project]:[dataset].[table] gs://[bucket]/[csv path] [schema]
# bq load project.dataset gs://your-bucket/log.csv IP:STRING,DNS:STRING,TS:STRING,URL:STRING
Waiting on bqjob_rf4f3f1d9e2366a6_00000142c1bdd36f_1 ... (24s) Current status: DONE
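When the load step needs to be scripted, the same command can be assembled as an argument list. The Python sketch below only builds and prints the command; the helper build_bq_load_command and the table name testbq.log are hypothetical, and actually running it assumes an installed, authenticated Cloud SDK.

```python
import shlex


def build_bq_load_command(table, source_uri, schema, source_format=None):
    """Assemble the argv for `bq load` (hypothetical helper for illustration)."""
    cmd = ["bq", "load"]
    if source_format:
        cmd.append("--source_format=" + source_format)
    cmd += [table, source_uri, schema]
    return cmd


cmd = build_bq_load_command(
    "testbq.log",  # made-up dataset.table name
    "gs://your-bucket/log.csv",
    "IP:STRING,DNS:STRING,TS:STRING,URL:STRING",
)
# Show the command as it would be typed in a shell.
print(" ".join(shlex.quote(part) for part in cmd))

# To execute it for real (not done here):
#   import subprocess
#   subprocess.run(cmd, check=True)
```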
Load JSON to BigQuery
bq load --source_format NEWLINE_DELIMITED_JSON \
  [project]:[dataset].[table] [json file] [schema file]
# bq load --source_format NEWLINE_DELIMITED_JSON testbq.jsonTest ./sample.json ./schema.json
Waiting on bqjob_r7182196a0278f1c6_00000145f940517b_1 ... (39s) Current status: DONE
# bq load --source_format NEWLINE_DELIMITED_JSON testbq.jsonTest gs://your-bucket/sample.json ./schema.json
Waiting on bqjob_r7182196a0278f1c6_00000145f940517b_1 ... (39s) Current status: DONE
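NEWLINE_DELIMITED_JSON means one complete JSON object per line, with no enclosing array and no commas between records. A small Python sketch of producing such content; the two sample records are invented, reusing column names that appear in this deck's queries.

```python
import json


def to_ndjson(records):
    """Serialize records as one JSON object per line."""
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


# Invented sample records for illustration.
records = [
    {"charge_unit": "M", "one_charge": 0},
    {"charge_unit": "SS", "one_charge": 1},
]

ndjson = to_ndjson(records)
print(ndjson, end="")

# Writing it out as a file to pass to `bq load` would look like:
#   with open("sample.json", "w", encoding="utf-8") as f:
#       f.write(ndjson)
```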
Hands-on - Query
Web way - Query Console
Install google_cloud_sdk (https://developers.google.com/cloud/sdk/)
Shell way - bq command
bq query <sql_query>
# bq query 'select charge_unit,charge_desc,one_charge from testbq.test'
BigQuery - Query Language
Query syntax
● SELECT
● WITHIN
● FROM
● FLATTEN
● JOIN
● WHERE
● GROUP BY
● HAVING
● ORDER BY
● LIMIT
Query support
Supported functions and operators
● Aggregate functions
● Arithmetic operators
● Bitwise operators
● Casting functions
● Comparison functions
● Date and time functions
● IP functions
● JSON functions
● Logical operators
● Mathematical functions
● Regular expression functions
● String functions
● Table wildcard functions
● URL functions
● Window functions
● Other functions
select charge_unit,charge_desc,one_charge from testbq.test
Select
+-------------+-------------+------------+
| charge_unit | charge_desc | one_charge |
+-------------+-------------+------------+
| M           | 按月計費    | 0          |
| D           | 按日計費    | 0          |
| HH          | 小時計費    | 0          |
| T           | 分計費      | 0          |
| SS          | 按次計費    | 1          |
+-------------+-------------+------------+
SELECT a.order_id,a.sales,b.begin_use_date FROM testbq.order_master a LEFT JOIN testbq.order_detail b ON a.order_id = b.order_id
Join
+------------+---------+-------------------------+
| a_order_id | a_sales | b_begin_use_date        |
+------------+---------+-------------------------+
| OM2003     | D589    | 2011-11-01 17:43:00 UTC |
| OM2004     | D589    | 2011-11-01 09:43:00 UTC |
| OM2005     | D589    | 2011-11-01 17:55:00 UTC |
| OM2006     | D589    | 2011-11-01 17:54:00 UTC |
| OM2007     | D589    | 2011-11-03 16:31:00 UTC |
+------------+---------+-------------------------+
SELECT
fullName,
age,
gender,
citiesLived.place
FROM (FLATTEN([dataset.tableId], children))
WHERE
(citiesLived.yearsLived > 1995) AND
(children.age > 3)
GROUP BY fullName, age, gender, citiesLived.place
Flatten
+------------+-----+--------+--------------------+
| fullName | age | gender | citiesLived_place |
+------------+-----+--------+--------------------+
| John Doe | 22 | Male | Stockholm |
| Mike Jones | 35 | Male | Los Angeles |
| Mike Jones | 35 | Male | Washington DC |
| Mike Jones | 35 | Male | Portland |
| Mike Jones | 35 | Male | Austin |
+------------+-----+--------+--------------------+
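FLATTEN turns a repeated (nested) field into one output row per repeated element, which is why one person can appear on several rows. A rough Python illustration of the idea on a single in-memory record; the children values are invented, and in this sketch a record with an empty list yields no rows, which may differ from how BigQuery treats that edge case.

```python
def flatten(record, repeated_field):
    """Yield one copy of `record` per element of `record[repeated_field]`."""
    for element in record.get(repeated_field, []):
        row = dict(record)             # shallow copy of the parent fields
        row[repeated_field] = element  # replace the list with a single element
        yield row


# Invented sample record; only fullName and age echo the result table.
person = {
    "fullName": "John Doe",
    "age": 22,
    "children": [{"name": "Jane", "age": 6}, {"name": "John", "age": 15}],
}

rows = list(flatten(person, "children"))
for row in rows:
    print(row["fullName"], row["children"]["name"], row["children"]["age"])
```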
SELECT word, COUNT(word) AS count
FROM publicdata:samples.shakespeare
WHERE (REGEXP_MATCH(word,r'\w\w\'\w\w'))
GROUP BY word
ORDER BY count DESC
LIMIT 3;
Regular Expression
+-------+-------+
| word  | count |
+-------+-------+
| ne'er |    42 |
| we'll |    35 |
| We'll |    33 |
+-------+-------+
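The pattern \w\w'\w\w asks for two word characters, an apostrophe, then two more word characters; the backslash before the apostrophe in the query only escapes it inside the single-quoted SQL string. The same pattern can be tried locally in Python. This sketch assumes REGEXP_MATCH succeeds on a match anywhere in the string, so it uses re.search; "o'er" and "never" are extra words added here as non-matching examples.

```python
import re

# Same pattern as in the query; no escape is needed inside a double-quoted string.
PATTERN = re.compile(r"\w\w'\w\w")


def regexp_match(word):
    """Return True if the pattern is found anywhere in `word`."""
    return PATTERN.search(word) is not None


words = ["ne'er", "we'll", "We'll", "o'er", "never"]
matches = [w for w in words if regexp_match(w)]
print(matches)
```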
SELECT
  TOP (FORMAT_UTC_USEC(timestamp * 1000000), 5) AS top_revision_time,
  COUNT (*) AS revision_count
FROM [publicdata:samples.wikipedia];
+----------------------------+----------------+
| top_revision_time          | revision_count |
+----------------------------+----------------+
| 2002-02-25 15:51:15.000000 |          20971 |
| 2002-02-25 15:43:11.000000 |          15955 |
| 2010-01-14 15:52:34.000000 |              3 |
| 2009-12-31 19:29:19.000000 |              3 |
| 2009-12-28 18:55:12.000000 |              3 |
+----------------------------+----------------+
Time Function
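In the query, timestamp holds seconds since the Unix epoch, so it is multiplied by 1000000 to get the microseconds that FORMAT_UTC_USEC expects. A Python sketch of the same conversion, offered as an approximation of the function's output format rather than a reference implementation; the helper name format_utc_usec is chosen here to mirror the SQL function.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def format_utc_usec(usec):
    """Format microseconds since the Unix epoch as 'YYYY-MM-DD HH:MM:SS.uuuuuu' (UTC)."""
    return (EPOCH + timedelta(microseconds=usec)).strftime("%Y-%m-%d %H:%M:%S.%f")


# Seconds since epoch, scaled to microseconds as in the query.
seconds = 1014652275
formatted = format_utc_usec(seconds * 1000000)
print(formatted)
```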
SELECT DOMAIN(repository_homepage) AS user_domain, COUNT(*) AS activity_count
FROM [publicdata:samples.github_timeline]
GROUP BY user_domain
HAVING user_domain IS NOT NULL AND user_domain != ''
ORDER BY activity_count DESC
LIMIT 5;
URL Function
+-----------------+----------------+
| user_domain     | activity_count |
+-----------------+----------------+
| github.com      |         281879 |
| google.com      |          34769 |
| khanacademy.org |          17316 |
| sourceforge.net |          15103 |
| mozilla.org     |          14091 |
+-----------------+----------------+
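The query groups homepage URLs by domain, drops empty values and keeps the most frequent ones. A rough local analogue in Python: naive_domain simply keeps the last two labels of the host name, which is a deliberate simplification (it is wrong for suffixes such as co.uk) and not how BigQuery's DOMAIN function is implemented. The sample URLs are invented.

```python
from collections import Counter
from urllib.parse import urlparse


def naive_domain(url):
    """Return the last two labels of the URL's host, or None if there is no host."""
    host = urlparse(url).hostname
    if not host:
        return None
    return ".".join(host.split(".")[-2:])


# Invented sample homepages, including unusable values.
homepages = [
    "https://github.com/a/b",
    "http://www.google.com/",
    "http://code.google.com/p/x",
    "",
    "not a url",
]

# Group by domain, skip missing values, keep the top 5 (mirrors the query).
counts = Counter(d for d in map(naive_domain, homepages) if d)
top = counts.most_common(5)
print(top)
```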
Hands-on - Programming
● Prepare a Google Cloud Platform project
● Create a Service Account
● Generate a key from the Service Account p12 key
Prepare
Google Service Account
web server application
vs.
service account
Prepare Authentication
Convert p12 key → pem key
$ openssl pkcs12 -in privatekey.p12 -out privatekey.pem -nocerts
$ openssl rsa -in privatekey.pem -out key.pem
Node.js - bigquery module
var bq = require('bigquery')
  , prjId = 'your-bigquery-project-id';

bq.init({
  client_secret: '/path/to/client_secret.json',
  key_pem: '/path/to/key.pem'
});

bq.job.listds(prjId, function(e,r,d){
  if(e) console.log(e);
  console.log(JSON.stringify(d));
});
To perform operations, call the functions under bq.job.
For the bigquery module, see: https://github.com/peihsinsu/bigquery
/* Ref: https://developers.google.com/apps-script/advanced/bigquery */
var request = {
  query: 'SELECT TOP(word, 30) AS word, COUNT(*) AS word_count ' +
         'FROM publicdata:samples.shakespeare WHERE LENGTH(word) > 10;'
};
var queryResults = BigQuery.Jobs.query(request, projectId);
var jobId = queryResults.jobReference.jobId;
queryResults = BigQuery.Jobs.getQueryResults(projectId, jobId);
var rows = queryResults.rows;
while (queryResults.pageToken) {
  queryResults = BigQuery.Jobs.getQueryResults(projectId, jobId, {
    pageToken: queryResults.pageToken
  });
  rows = rows.concat(queryResults.rows);
}
Google Drive way - Apps Script
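The Apps Script snippet pages through results: it keeps requesting the next page while the response carries a pageToken and concatenates the rows. The same loop, sketched in Python against a stand-in for the API; fetch_all_rows, FAKE_PAGES and the token values are invented for illustration, and no real BigQuery call is made.

```python
def fetch_all_rows(get_page):
    """Collect rows from every page; `get_page(token)` returns one page dict."""
    rows = []
    page_token = None
    while True:
        page = get_page(page_token)
        rows.extend(page.get("rows", []))
        page_token = page.get("pageToken")
        if not page_token:
            # No token means this was the last page.
            return rows


# Stand-in for the real API: three fake pages keyed by the token that fetches them.
FAKE_PAGES = {
    None: {"rows": [1, 2], "pageToken": "p2"},
    "p2": {"rows": [3, 4], "pageToken": "p3"},
    "p3": {"rows": [5]},
}

all_rows = fetch_all_rows(lambda token: FAKE_PAGES[token])
print(all_rows)
```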
● Features: https://cloud.google.com/products/bigquery#features
● Case Studies: https://cloud.google.com/products/bigquery#case-studies
● Pricing: https://cloud.google.com/products/bigquery#pricing
● Documentation: https://cloud.google.com/products/bigquery#documentation
● Query Reference: https://developers.google.com/bigquery/query-reference
References