Practical Applications (Part 1), v0.20
TRANSCRIPT
Examples of Hadoop in Use (1)
- Freestylers: image retrieval engine; uses Hadoop for image processing.
- Hosting Habitat: collects software inventory from all clients, analyzes it, and notifies clients of missing or outdated software.
- Journey Dynamics: uses Hadoop MapReduce to analyze billions of lines of GPS data and generate traffic route information.
- Krugle: uses Hadoop and Nutch to build a source-code search engine.
Examples of Hadoop in Use (2)
- SEDNS (Security Enhanced DNS Group): collects DNS data worldwide to explore distributed content on the network.
- Technical analysis and Stock Research: analyzes stock information.
- University of Nebraska-Lincoln, Research Computing Facility: uses Hadoop to run analyses over roughly 200 TB of CMS data. The Compact Muon Solenoid (CMS) is one of the two general-purpose particle detectors of the Large Hadron Collider project at CERN in Switzerland.
Examples of Hadoop in Use (3)
- Yahoo!: used to support research for Ad Systems and Web Search; also uses the Hadoop platform to discover spam-sending botnets.
- Trend Micro: filters web content such as phishing sites and malicious links.
Hadoop Provides
- Automatic parallelization & distribution
- Fault tolerance
- Status and monitoring tools
- A clean abstraction and API for programmers
Overview Diagram of Hadoop Computation
Hadoop Programming
- Basic I/O operations
- MapReduce programs
- Compiling and running on Hadoop
- Conclusions
Hadoop Programming: Basic I/O Operations
Uploading a File to HDFS

// Upload a file from local storage to HDFS.
// src is the local source; dst is the HDFS destination.
public class PutToHdfs {
  static boolean putToHdfs(String src, String dst, Configuration conf) {
    Path dstPath = new Path(dst);
    try {
      // Obtain an object for operating on HDFS
      FileSystem hdfs = dstPath.getFileSystem(conf);
      // Upload
      hdfs.copyFromLocalFile(false, new Path(src), new Path(dst));
    } catch (IOException e) {
      e.printStackTrace();
      return false;
    }
    return true;
  }
}
Retrieving a File from HDFS

// Download a file from HDFS to local storage.
// src is the HDFS source; dst is the local destination.
public class GetFromHdfs {
  static boolean getFromHdfs(String src, String dst, Configuration conf) {
    Path srcPath = new Path(src);
    try {
      // Obtain an object for operating on HDFS
      FileSystem hdfs = srcPath.getFileSystem(conf);
      // Download
      hdfs.copyToLocalFile(false, new Path(src), new Path(dst));
    } catch (IOException e) {
      e.printStackTrace();
      return false;
    }
    return true;
  }
}
Checking and Deleting a File

// checkAndDelete: checks whether the given path exists, and deletes it if so.
public class CheckAndDelete {
  static boolean checkAndDelete(final String path, Configuration conf) {
    Path dst_path = new Path(path);
    try {
      // Obtain an object for operating on HDFS
      FileSystem hdfs = dst_path.getFileSystem(conf);
      // Check existence
      if (hdfs.exists(dst_path)) {
        // Delete if present
        hdfs.delete(dst_path, true);
      }
    } catch (IOException e) {
      e.printStackTrace();
      return false;
    }
    return true;
  }
}
Hadoop Programming: MapReduce Programs
Program Prototype (v0.20)

Class MR {
  // Map section: Mapper code
  static public class MyMapper extends Mapper ... { }

  // Reduce section: Reducer code
  static public class MyReducer extends Reducer ... { }

  // Configuration section: other job-setup code
  main() {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "job name");
    job.setJarByClass(thisMainClass.class);
    job.setMapperClass(MyMapper.class);
    job.setReducerClass(MyReducer.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    job.waitForCompletion(true);
  }
}
Class Mapper (v0.20)

import org.apache.hadoop.mapreduce.Mapper;

class MyMap extends Mapper<INPUT_KEY_CLASS, INPUT_VALUE_CLASS,
                           OUTPUT_KEY_CLASS, OUTPUT_VALUE_CLASS> {
  // global variables
  public void map(INPUT_KEY_CLASS key, INPUT_VALUE_CLASS value, Context context)
      throws IOException, InterruptedException {
    // local variables and program logic
    context.write(NewKey, NewValue);
  }
}

(The four type parameters are, in order, the input key class, input value class,
output key class, and output value class; the map() arguments use the input key
and input value classes.)
Class Reducer (v0.20)

import org.apache.hadoop.mapreduce.Reducer;

class MyRed extends Reducer<INPUT_KEY_CLASS, INPUT_VALUE_CLASS,
                            OUTPUT_KEY_CLASS, OUTPUT_VALUE_CLASS> {
  // global variables
  public void reduce(INPUT_KEY_CLASS key, Iterable<INPUT_VALUE_CLASS> values,
      Context context) throws IOException, InterruptedException {
    // local variables and program logic
    context.write(NewKey, NewValue);
  }
}

(As with the Mapper, the four type parameters are the input key, input value,
output key, and output value classes; reduce() receives one key together with
an Iterable over all values emitted for that key.)
Example (1)

public static void main(String[] args) throws Exception {
  Configuration conf = new Configuration();
  String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
  if (otherArgs.length != 2) {
    System.err.println("Usage: hadoop jar WordCount.jar <input> <output>");
    System.exit(2);
  }
  Job job = new Job(conf, "Word Count");
  job.setJarByClass(WordCount.class);
  job.setMapperClass(TokenizerMapper.class);
  job.setCombinerClass(IntSumReducer.class);
  job.setReducerClass(IntSumReducer.class);
  job.setOutputKeyClass(Text.class);
  job.setOutputValueClass(IntWritable.class);
  FileInputFormat.addInputPath(job, new Path(args[0]));
  FileOutputFormat.setOutputPath(job, new Path(args[1]));
  CheckAndDelete.checkAndDelete(args[1], conf);
  System.exit(job.waitForCompletion(true) ? 0 : 1);
}
Example (2)

class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
  private final static IntWritable one = new IntWritable(1);
  private Text word = new Text();

  public void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    String line = value.toString();
    StringTokenizer itr = new StringTokenizer(line);
    while (itr.hasMoreTokens()) {
      word.set(itr.nextToken());
      context.write(word, one);
    }
  }
}

(Slide diagram: reading /user/hadooper/input/a.txt, which contains the line
"No news is a good news.", each map() call receives one line as its input
value; the StringTokenizer splits the line into tokens, and each
context.write(word, one) emits a pair, giving <no,1>, <news,1>, <is,1>,
<a,1>, <good,1>, <news,1>.)
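The map step above can be simulated without Hadoop at all. The following is a minimal plain-Java sketch of what one map() call does; the class name MapPhaseSketch is invented here, and lower-casing the tokens is an assumption made only so the output matches the slide's diagram (the original mapper does not lower-case):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class MapPhaseSketch {
    // Tokenize one input line and emit a (word, "1") pair per token,
    // mirroring context.write(word, one) in the Hadoop mapper.
    static List<String[]> map(String line) {
        List<String[]> pairs = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            pairs.add(new String[] { itr.nextToken().toLowerCase(), "1" });
        }
        return pairs;
    }

    public static void main(String[] args) {
        for (String[] p : map("No news is a good news.")) {
            System.out.println("< " + p[0] + " , " + p[1] + " >");
        }
    }
}
```

Note that StringTokenizer splits on whitespace only, so punctuation stays attached to tokens ("news." keeps its period); a production word count would normalize this.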
Example (3)

class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
  IntWritable result = new IntWritable();

  public void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable val : values)
      sum += val.get();
    result.set(sum);
    context.write(key, result);
  }
}

(Slide diagram: for the key "news" the reducer receives the grouped values
<news, [1, 1]> and writes <news, 2>; the final sorted output is <a,1>,
<good,1>, <is,1>, <news,2>, <no,1>. The for-each loop is equivalent to an
indexed loop accumulating values[i].get() into sum.)
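The reduce step can likewise be simulated in plain Java. This sketch (class name ReducePhaseSketch is invented) shows the contract the framework provides: the reducer gets one key plus every value emitted for that key, and sums them:

```java
import java.util.Arrays;
import java.util.List;

public class ReducePhaseSketch {
    // Sum all values grouped under one key, mirroring the
    // "for (IntWritable val : values) sum += val.get();" loop above.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int val : values) {
            sum += val;
        }
        return sum;
    }

    public static void main(String[] args) {
        // ("news", [1, 1]) -> 2, as in the slide's diagram
        System.out.println("< news , " + reduce("news", Arrays.asList(1, 1)) + " >");
    }
}
```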
Compiling and Running on Hadoop

1. Edit WordCount.java
   http://trac.nchc.org.tw/cloud/attachment/wiki/jazz/Hadoop_Lab6/WordCount.java?format=raw
2. mkdir MyJava
3. javac -classpath hadoop-*-core.jar -d MyJava WordCount.java
4. jar -cvf wordcount.jar -C MyJava .
5. bin/hadoop jar wordcount.jar WordCount [program arguments]

Notes:
- Run these commands from Hadoop_Home (so that hadoop-*-core.jar is found).
- javac needs the classpath; "hadoop jar" does not.
- wordcount.jar is the packaged build output, but at run time you must still name the main class.
- When Hadoop runs a job, only the input files need to be on HDFS for analysis; the executable (wordcount.jar) does not need to be uploaded, nor copied to every node. Class loading is handled by Java.
Conclusions

- The example code above covers Hadoop's key/value model, HDFS file operations, and the MapReduce computation style.
- When running a Hadoop job, the program file does not need to be uploaded to Hadoop, but the data must be in HDFS.
- Example 7's program can be used to achieve chained computation.
- Some APIs differ between Hadoop 0.20 and Hadoop 0.18; rewrite old code for the new API completely where possible.
Introduction to HBase

This part introduces HBase concepts.
HBase

- HBase is a storage system with the following characteristics:
  - A table-like data structure (a multi-dimensional map)
  - Distributed
  - Highly available, high performance
  - Easy to scale in both capacity and performance
- HBase is designed to run on thousands of commodity servers and store petabytes of data.
- HBase is built on the Hadoop Distributed File System (HDFS) and provides Bigtable-like functionality.
- HBase also supports Hadoop MapReduce programming.
Why HBase?

- It is not a relational database system:
  - A table has only one primary index, the row key; no joins; no SQL.
- It provides a Java library, plus REST and Thrift interfaces.
- Data access is via getRow() and Scan():
  - getRow() retrieves the data of one row and can also specify a version (timestamp).
  - Scan() retrieves the whole table or a row range (by setting a start key and an end key).
- Limited atomicity and transaction support.
- Only one data type: bytes.
Why HBase? (cont.)

- Relational databases suit transactional workloads: C.R.U.D. (create, retrieve, update, delete) operations, largely because these run in memory.
- For analyzing very large datasets spread across many nodes, relational database systems no longer fit.
- Relational database software capable of analyzing data at this scale is expensive, and still suits only specific uses.
- "Very large data analysis" here means:
  - Big queries: scans over an entire table
  - Big databases: more than 100 terabytes of data
- Bigtable can work with the MapReduce framework to perform complex analyses and queries.
Who Uses HBase

- Adobe: internal use (structured data)
- Kalooga: image search engine - http://www.kalooga.com/
- Meetup: social gathering site - http://www.meetup.com/
- Streamy: successfully migrated from MySQL to HBase - http://www.streamy.com/
- Trend Micro: cloud-based malware scanning architecture - http://trendmicro.com/
- Yahoo!: stores document fingerprints to avoid duplicates - http://www.yahoo.com/
- More: http://wiki.apache.org/hadoop/Hbase/PoweredBy
Data Model

- A table is automatically sorted by row key.
- The table schema defines only column families:
  - Each column family may hold an unlimited number of columns.
  - Each column value may have an unlimited number of timestamped versions.
  - Columns can be added dynamically, and each row may have a different number of columns.
  - Columns in the same column family are grouped in one physical storage unit, sorted by column name.
- byte[] is the only data type.
- Conceptually the store is a map: (row, family:column, timestamp) -> value
Data Model (cont.)

When HBase actually stores a table, the data is stored per column family.
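The multi-dimensional map described above can be sketched in plain Java with nested sorted maps. This is a conceptual illustration only (the class name and example keys are invented), not HBase's actual storage format; note the timestamp level sorts descending so the newest version is read first:

```java
import java.util.TreeMap;

public class DataModelSketch {
    // row key -> (family:qualifier -> (timestamp -> value)), all levels sorted
    TreeMap<String, TreeMap<String, TreeMap<Long, byte[]>>> table = new TreeMap<>();

    void put(String row, String column, long ts, byte[] value) {
        table.computeIfAbsent(row, r -> new TreeMap<>())
             // timestamps sorted newest-first
             .computeIfAbsent(column, c -> new TreeMap<>((a, b) -> Long.compare(b, a)))
             .put(ts, value);
    }

    // Return the latest version of (row, family:column)
    byte[] get(String row, String column) {
        return table.get(row).get(column).firstEntry().getValue();
    }

    public static void main(String[] args) {
        DataModelSketch t = new DataModelSketch();
        t.put("com.yahoo.news.tw", "Contents:news", 1L, "old".getBytes());
        t.put("com.yahoo.news.tw", "Contents:news", 2L, "new".getBytes());
        // prints "new": the higher timestamp wins
        System.out.println(new String(t.get("com.yahoo.news.tw", "Contents:news")));
    }
}
```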
HTable Members

Row key             Timestamp  Contents family                                  Department family
com.yahoo.news.tw   t1         "I built a robot that dives 6,000 m underwater"  news: "tech"
                    t2         "How mosquitoes hunt for human flesh"            news: "tech"
                    t3         "'Speaking' with brain waves"                    news: "tech"
com.yahoo.bid.tw    t1         "... ipad ..."                                   bid: "3C"
com.yahoo.sport.tw  t1         "... Wang 40..."                                 sport: "MBA"

Concepts: Table, Family, Column, Qualifier, Row, TimeStamp
Regions

- A table consists of one or more regions.
- A region is identified by its startKey and endKey.
- Each region may exist on several different nodes and is composed of several HDFS files and blocks; replication of these regions is handled by Hadoop.
Architecture of HBase Working with Hadoop (diagram)
HBase Programming
- Overview of common HBase APIs
- Implementing I/O operations
- Combining with MapReduce
1. Compiling and Running Java

1. Copy all .jar files from hbase_home into the hadoop_home/lib/ directory.
2. Compile:
   javac -classpath hadoop-*-core.jar:hbase-*.jar -d MyJava MyCode.java
3. Package:
   jar -cvf MyJar.jar -C MyJava .
4. Run:
   bin/hadoop jar MyJar.jar MyCode {Input/ Output/}

Notes:
- Run these commands from Hadoop_Home.
- ./MyJava is the directory for compiled classes; MyJar.jar is the packaged build.
- Put some documents into the input directory on HDFS first; ./input and ./output are not necessarily HDFS input/output directories.
HTable Members (the example table from the earlier "HTable Members" slide is shown again here)
Common HBase Classes

HBaseAdmin, HBaseConfiguration, HTable, HTableDescriptor, Put, Get, Scanner

These map roughly onto database concepts: HBaseAdmin and HBaseConfiguration operate at the database level; HTable and HTableDescriptor at the table level; HColumnDescriptor at the column family level; and Put, Get, and Scanner work with columns and qualifiers.
HBaseConfiguration

Adds HBase configuration files to a Configuration. Inherits from
org.apache.hadoop.conf.Configuration.

Constructors:
  = new HBaseConfiguration()
  = new HBaseConfiguration(Configuration c)

Return value  Method       Parameters
void          addResource  (Path file)
void          clear        ()
String        get          (String name)
boolean       getBoolean   (String name, boolean defaultValue)
void          set          (String name, String value)
void          setBoolean   (String name, boolean value)

Configuration entries have the form:
  <property>
    <name> name </name>
    <value> value </value>
  </property>
HBaseAdmin

The administrative interface of HBase.
  = new HBaseAdmin(HBaseConfiguration conf)

Return value        Method               Parameters
void                addColumn            (String tableName, HColumnDescriptor column)
static void         checkHBaseAvailable  (HBaseConfiguration conf)
void                createTable          (HTableDescriptor desc)
void                deleteTable          (byte[] tableName)
void                deleteColumn         (String tableName, String columnName)
void                enableTable          (byte[] tableName)
void                disableTable         (String tableName)
HTableDescriptor[]  listTables           ()
void                modifyTable          (byte[] tableName, HTableDescriptor htd)
boolean             tableExists          (String tableName)

Example:
  HBaseAdmin admin = new HBaseAdmin(config);
  admin.disableTable("tablename");
HTableDescriptor

HTableDescriptor contains the name of an HTable and its column families.
  = new HTableDescriptor()
  = new HTableDescriptor(String name)

Constant values:
  org.apache.hadoop.hbase.HTableDescriptor.TABLE_DESCRIPTOR_VERSION

Return value       Method        Parameters
void               addFamily     (HColumnDescriptor family)
HColumnDescriptor  removeFamily  (byte[] column)
byte[]             getName       ()  -- the table name
byte[]             getValue      (byte[] key)  -- the value for the given key
void               setValue      (String key, String value)

Example:
  HTableDescriptor htd = new HTableDescriptor(tablename);
  htd.addFamily(new HColumnDescriptor("Family"));
HColumnDescriptor

An HColumnDescriptor contains information about a column family.
  = new HColumnDescriptor(String familyname)

Return value  Method    Parameters
byte[]        getName   ()  -- the family name
byte[]        getValue  (byte[] key)  -- the value for the given key
void          setValue  (String key, String value)

Example:
  HTableDescriptor htd = new HTableDescriptor(tablename);
  HColumnDescriptor col = new HColumnDescriptor("content:");
  htd.addFamily(col);
HTable

Used to communicate with a single HBase table.
  = new HTable(HBaseConfiguration conf, String tableName)

Return value      Method              Parameters
boolean           checkAndPut         (byte[] row, byte[] family, byte[] qualifier, byte[] value, Put put)
void              close               ()
boolean           exists              (Get get)
Result            get                 (Get get)
byte[][]          getEndKeys          ()
ResultScanner     getScanner          (byte[] family)
HTableDescriptor  getTableDescriptor  ()
byte[]            getTableName        ()
static boolean    isTableEnabled      (HBaseConfiguration conf, String tableName)
void              put                 (Put put)

Example:
  HTable table = new HTable(conf, Bytes.toBytes(tablename));
  ResultScanner scanner = table.getScanner(family);
Put

Used to perform Put operations for a single row.
  = new Put(byte[] row)
  = new Put(byte[] row, RowLock rowLock)

Return value  Method        Parameters
Put           add           (byte[] family, byte[] qualifier, byte[] value)
Put           add           (byte[] column, long ts, byte[] value)
byte[]        getRow        ()
RowLock       getRowLock    ()
long          getTimeStamp  ()
boolean       isEmpty       ()
Put           setTimeStamp  (long timestamp)

Example:
  HTable table = new HTable(conf, Bytes.toBytes(tablename));
  Put p = new Put(brow);
  p.add(family, qualifier, value);
  table.put(p);
Get

Used to perform Get operations on a single row.
  = new Get(byte[] row)
  = new Get(byte[] row, RowLock rowLock)

Return value  Method        Parameters
Get           addColumn     (byte[] column)
Get           addColumn     (byte[] family, byte[] qualifier)
Get           addColumns    (byte[][] columns)
Get           addFamily     (byte[] family)
TimeRange     getTimeRange  ()
Get           setTimeRange  (long minStamp, long maxStamp)
Get           setFilter     (Filter filter)

Example:
  HTable table = new HTable(conf, Bytes.toBytes(tablename));
  Get g = new Get(Bytes.toBytes(row));
Result

Single-row result of a Get or Scan query.
  = new Result()

Return value                 Method          Parameters
boolean                      containsColumn  (byte[] family, byte[] qualifier)
NavigableMap<byte[],byte[]>  getFamilyMap    (byte[] family)
byte[]                       getValue        (byte[] column)
byte[]                       getValue        (byte[] family, byte[] qualifier)
int                          size            ()

Example:
  HTable table = new HTable(conf, Bytes.toBytes(tablename));
  Get g = new Get(Bytes.toBytes(row));
  Result rowResult = table.get(g);
  byte[] ret = rowResult.getValue(Bytes.toBytes(family + ":" + column));
Scanner

All operations are identical to Get, except that rather than specifying a
single row, an optional startRow and stopRow may be defined. If rows are not
specified, the Scanner will iterate over all rows.
  = new Scan()
  = new Scan(byte[] startRow, byte[] stopRow)
  = new Scan(byte[] startRow, Filter filter)

Return value  Method        Parameters
Scan          addColumn     (byte[] column)
Scan          addColumn     (byte[] family, byte[] qualifier)
Scan          addColumns    (byte[][] columns)
Scan          addFamily     (byte[] family)
TimeRange     getTimeRange  ()
Scan          setTimeRange  (long minStamp, long maxStamp)
Scan          setFilter     (Filter filter)
Interface ResultScanner

Interface for client-side scanning. Obtain instances from HTable:
  HTable.getScanner(Bytes.toBytes(family));

Return value  Method  Parameters
void          close   ()
Result        next    ()

Example:
  ResultScanner scanner = table.getScanner(Bytes.toBytes(family));
  for (Result rowResult : scanner) {
    byte[] str = rowResult.getValue(family, column);
  }
HBase Key/Value Format

org.apache.hadoop.hbase.KeyValue provides getRow(), getFamily(),
getQualifier(), getTimestamp(), and getValue().

The KeyValue blob format inside the byte array is:
  <keylength> <valuelength> <key> <value>

The key format is:
  <row-length> <row> <column-family-length> <column-family>
  <column-qualifier> <time-stamp> <key-type>

Row length is at most Short.MAX_VALUE, column family length at most
Byte.MAX_VALUE, and column qualifier plus key length must be less than
Integer.MAX_VALUE.
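To make the key layout concrete, here is a small plain-Java sketch that packs the fields in the documented order. The class and method names are invented for illustration; real HBase uses its own KeyValue serialization code, so treat this only as a picture of the byte layout:

```java
import java.nio.ByteBuffer;

public class KeyLayoutSketch {
    // Build the <key> portion: <row-length><row><column-family-length>
    // <column-family><column-qualifier><time-stamp><key-type>
    static byte[] encodeKey(byte[] row, byte[] family, byte[] qualifier,
                            long timestamp, byte keyType) {
        ByteBuffer buf = ByteBuffer.allocate(
                2 + row.length + 1 + family.length + qualifier.length + 8 + 1);
        buf.putShort((short) row.length); // row-length: 2 bytes, max Short.MAX_VALUE
        buf.put(row);
        buf.put((byte) family.length);    // column-family-length: 1 byte, max Byte.MAX_VALUE
        buf.put(family);
        buf.put(qualifier);               // qualifier runs up to the fixed-size tail
        buf.putLong(timestamp);           // time-stamp: 8 bytes
        buf.put(keyType);                 // key-type: 1 byte
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] key = encodeKey("row1".getBytes(), "f".getBytes(),
                               "q1".getBytes(), 1265169495385L, (byte) 4);
        // 2 + 4 + 1 + 1 + 2 + 8 + 1 = 19 bytes
        System.out.println(key.length);
    }
}
```

Because only the row and family carry explicit lengths, the qualifier's length is whatever remains after subtracting the fixed-size timestamp and key-type tail, which is why the total key length must stay below Integer.MAX_VALUE.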
HBase Programming: Implementing I/O Operations
Example 1: Creating a Table

Shell command:
  $ hbase shell
  > create 'tablename', 'family1', 'family2', 'family3'
  0 row(s) in 4.0810 seconds
  > list
  tablename
  1 row(s) in 0.0190 seconds

Syntax: create <table name>, {<family>, ...}
Example 1: Creating a Table (code)

public static void createHBaseTable(String tablename, String familyname)
    throws IOException {
  HBaseConfiguration config = new HBaseConfiguration();
  HBaseAdmin admin = new HBaseAdmin(config);
  HTableDescriptor htd = new HTableDescriptor(tablename);
  HColumnDescriptor col = new HColumnDescriptor(familyname);
  htd.addFamily(col);
  if (admin.tableExists(tablename)) {
    return;
  }
  admin.createTable(htd);
}
Example 2: Putting Data into a Column

Shell command:
  > put 'tablename', 'row1', 'family1:qua1', 'value'
  0 row(s) in 0.0030 seconds

Syntax: put <table name>, <row>, <column>, <value>, [<timestamp>]
Example 2: Putting Data into a Column (code)

static public void putData(String tablename, String row, String family,
    String column, String value) throws IOException {
  HBaseConfiguration config = new HBaseConfiguration();
  HTable table = new HTable(config, tablename);
  byte[] brow = Bytes.toBytes(row);
  byte[] bfamily = Bytes.toBytes(family);
  byte[] bcolumn = Bytes.toBytes(column);
  byte[] bvalue = Bytes.toBytes(value);
  Put p = new Put(brow);
  p.add(bfamily, bcolumn, bvalue);
  table.put(p);
  table.close();
}
Example 3: Getting a Column Value

Shell command:
  > get 'tablename', 'row1'
  COLUMN           CELL
  family1:column1  timestamp=1265169495385, value=value
  1 row(s) in 0.0100 seconds

Syntax: get <table name>, <row>
Example 3: Getting a Column Value (code)

static String getColumn(String tablename, String row, String family,
    String column) throws IOException {
  HBaseConfiguration conf = new HBaseConfiguration();
  HTable table = new HTable(conf, Bytes.toBytes(tablename));
  Get g = new Get(Bytes.toBytes(row));
  Result rowResult = table.get(g);
  return Bytes.toString(
      rowResult.getValue(Bytes.toBytes(family + ":" + column)));
}
Example 4: Scanning All Columns

Shell command:
  > scan 'tablename'
  ROW   COLUMN+CELL
  row1  column=family1:column1, timestamp=1265169415385, value=value1
  row2  column=family1:column1, timestamp=1263534411333, value=value2
  row3  column=family1:column1, timestamp=1263645465388, value=value3
  row4  column=family1:column1, timestamp=1264654615301, value=value4
  row5  column=family1:column1, timestamp=1265146569567, value=value5
  5 row(s) in 0.0100 seconds

Syntax: scan <table name>
Example 4: Scanning All Columns (code)

static void ScanColumn(String tablename, String family, String column)
    throws IOException {
  HBaseConfiguration conf = new HBaseConfiguration();
  HTable table = new HTable(conf, Bytes.toBytes(tablename));
  ResultScanner scanner = table.getScanner(Bytes.toBytes(family));
  int i = 1;
  for (Result rowResult : scanner) {
    byte[] by = rowResult.getValue(Bytes.toBytes(family), Bytes.toBytes(column));
    String str = Bytes.toString(by);
    System.out.println("row " + i + " is \"" + str + "\"");
    i++;
  }
}
Example 5: Deleting a Table

Shell command:
  > disable 'tablename'
  0 row(s) in 6.0890 seconds
  > drop 'tablename'
  0 row(s) in 0.0090 seconds
  0 row(s) in 0.0090 seconds
  0 row(s) in 0.0710 seconds

Syntax: disable <table name>, then drop <table name>
Example 5: Deleting a Table (code)

static void drop(String tablename) throws IOException {
  HBaseConfiguration conf = new HBaseConfiguration();
  HBaseAdmin admin = new HBaseAdmin(conf);
  if (admin.tableExists(tablename)) {
    admin.disableTable(tablename);
    admin.deleteTable(tablename);
  } else {
    System.out.println(" [" + tablename + "] not found!");
  }
}
HBase Programming: Combining MapReduce with HBase
Example 6: WordCountHBase

Description:
  This program reads the strings in the files under the input path, computes
  word counts, and writes the results back into an HTable.

How to run:
  Run it on the Hadoop 0.20 platform; add the HBase parameters using the method
  from reference (2), then package the code into a jar.

Result:
  > scan 'wordcount'
  ROW   COLUMN+CELL
  am    column=content:count, timestamp=1264406245488, value=1
  chen  column=content:count, timestamp=1264406245488, value=1
  hi,   column=content:count, timestamp=1264406245488, value=2

Notes:
1. The source files on HDFS live under "/user/$YOUR_NAME/input". Put the data
   into this HDFS directory first; it may contain only files, not subdirectories.
2. After the computation, the program stores the results in the HBase table
   "wordcount".
Example 6: WordCountHBase <1>

public class WordCountHBase {
  public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
    private IntWritable i = new IntWritable(1);

    public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      String s[] = value.toString().trim().split(" ");
      for (String m : s) {
        context.write(new Text(m), i);
      }
    }
  }

  public static class Reduce extends TableReducer<Text, IntWritable, NullWritable> {
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable i : values) {
        sum += i.get();
      }
      Put put = new Put(Bytes.toBytes(key.toString()));
      put.add(Bytes.toBytes("content"), Bytes.toBytes("count"),
              Bytes.toBytes(String.valueOf(sum)));
      context.write(NullWritable.get(), put);
    }
  }
Example 6: WordCountHBase <2>

  public static void createHBaseTable(String tablename) throws IOException {
    HTableDescriptor htd = new HTableDescriptor(tablename);
    HColumnDescriptor col = new HColumnDescriptor("content:");
    htd.addFamily(col);
    HBaseConfiguration config = new HBaseConfiguration();
    HBaseAdmin admin = new HBaseAdmin(config);
    if (admin.tableExists(tablename)) {
      admin.disableTable(tablename);
      admin.deleteTable(tablename);
    }
    System.out.println("create new table: " + tablename);
    admin.createTable(htd);
  }

  public static void main(String args[]) throws Exception {
    String tablename = "wordcount";
    Configuration conf = new Configuration();
    conf.set(TableOutputFormat.OUTPUT_TABLE, tablename);
    createHBaseTable(tablename);
    String input = args[0];
    Job job = new Job(conf, "WordCount " + input);
    job.setJarByClass(WordCountHBase.class);
    job.setNumReduceTasks(3);
    job.setMapperClass(Map.class);
    job.setReducerClass(Reduce.class);
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(IntWritable.class);
    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TableOutputFormat.class);
    FileInputFormat.addInputPath(job, new Path(input));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
Example 7: LoadHBaseMapper

Description:
  This program reads data out of HBase and writes the results to HDFS.

How to run:
  Run it on the Hadoop 0.20 platform; add the HBase parameters using the method
  from reference (2), then package the code into a jar.

Result:
  $ hadoop fs -cat <hdfs_output>/part-r-00000
  ---------------------------
  54 30 31 GunLong
  54 30 32 Esing
  54 30 33 SunDon
  54 30 34 StarBucks
  ---------------------------

Notes:
1. The HBase table must already exist and contain data.
2. The program writes its results into the <hdfs_output> directory you specify
   on HDFS; make sure <hdfs_output> does not already exist.
Example 7: LoadHBaseMapper <1>

public class LoadHBaseMapper {
  public static class HtMap extends TableMapper<Text, Text> {
    public void map(ImmutableBytesWritable key, Result value, Context context)
        throws IOException, InterruptedException {
      String res = Bytes.toString(value.getValue(Bytes.toBytes("Detail"),
          Bytes.toBytes("Name")));
      context.write(new Text(key.toString()), new Text(res));
    }
  }

  public static class HtReduce extends Reducer<Text, Text, Text, Text> {
    public void reduce(Text key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
      String str = new String("");
      Text final_key = new Text(key);
      Text final_value = new Text();
      for (Text tmp : values) {
        str += tmp.toString();
      }
      final_value.set(str);
      context.write(final_key, final_value);
    }
  }
Example 7: LoadHBaseMapper <2>

  public static void main(String args[]) throws Exception {
    String input = args[0];
    String tablename = "tsmc";
    Configuration conf = new Configuration();
    Job job = new Job(conf, tablename + " hbase data to hdfs");
    job.setJarByClass(LoadHBaseMapper.class);
    // The slide uses myScan without declaring it; a full-table Scan is added here.
    Scan myScan = new Scan();
    TableMapReduceUtil.initTableMapperJob(tablename, myScan, HtMap.class,
        Text.class, Text.class, job);
    job.setMapperClass(HtMap.class);
    job.setReducerClass(HtReduce.class);
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(Text.class);
    job.setInputFormatClass(TableInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    FileOutputFormat.setOutputPath(job, new Path(input));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}