fantin big data (2)
Hadoop programming course
Download the latest stable release of Hadoop:
http://hadoop.apache.org/common/releases.html
Configuration:
File conf/hadoop-env.sh
Set the environment variables:
LINUX: export JAVA_HOME=/usr/local/lib/...
MAC OS: export JAVA_HOME=/Library/Java/Home
Note: always check these paths against the local settings of your own machine.
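One way to check the local setting is to ask a running JVM where it is installed. The following is a minimal sketch, not part of Hadoop: the class name JavaHomeCheck is hypothetical, and the value it prints is only a starting point (on JDK 8 and earlier, the java.home property may point at the jre subdirectory of the JDK rather than at the JDK itself).

```java
// Hypothetical helper, not part of Hadoop: reports where the running JVM is installed.
class JavaHomeCheck {

    /** Returns the standard "java.home" system property of the running JVM. */
    static String javaHome() {
        return System.getProperty("java.home");
    }

    public static void main(String[] args) {
        // Compare this path with the JAVA_HOME value written in conf/hadoop-env.sh.
        System.out.println("java.home = " + javaHome());
    }
}
```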
Configuration:
File conf/hadoop-env.sh
Optional: set the maximum amount of memory that can be assigned to the Java heap:
# The maximum amount of heap to use, in MB. Default is 1000.
export HADOOP_HEAPSIZE=2000
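HADOOP_HEAPSIZE applies to the JVMs launched by the Hadoop scripts. To see what a heap limit looks like from inside a JVM, the sketch below reports the maximum heap of the JVM that runs it. It does not read the Hadoop setting, and the class name HeapCheck is hypothetical.

```java
// Hypothetical helper, not part of Hadoop: reports the heap limit of the current JVM.
class HeapCheck {

    /** Maximum heap the current JVM will attempt to use, in whole megabytes. */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MB): " + maxHeapMb());
    }
}
```

Starting the JVM with the option -Xmx2000m should make it report a value close to 2000.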
Configuration:
File core-site.xml
Optional: specify the directory where Hadoop will write its temporary output
<property>
  <name>hadoop.tmp.dir</name>
  <!-- Replace this value with the directory of your choice -->
  <value>/tmp/hadoop-tmp-${user.name}</value>
  <description>A base for other temporary directories.</description>
</property>
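The value above contains ${user.name}, which Hadoop expands when it reads the configuration, so each user gets a separate temporary directory. The following is a minimal sketch of that kind of substitution in plain Java, assuming only Java system properties are consulted. The class TmpDirExpansion and its expand method are hypothetical names for illustration; this is not Hadoop's own implementation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of ${name} expansion; not Hadoop's Configuration class.
class TmpDirExpansion {

    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Replaces each ${name} with the Java system property of that name, if it is set. */
    static String expand(String value) {
        Matcher m = VAR.matcher(value);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = System.getProperty(m.group(1));
            if (replacement == null) {
                replacement = m.group(0); // unknown variable: leave the reference untouched
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints the path with the current user's name substituted in.
        System.out.println(expand("/tmp/hadoop-tmp-${user.name}"));
    }
}
```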
Mapper
Word count example
    package mapred;

    import java.io.IOException;
    import java.util.Iterator;

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;
    import org.apache.hadoop.mapred.TextInputFormat;
    import org.apache.hadoop.mapred.TextOutputFormat;

    public class WordCount {

        // Mapper: emits (word, 1) for every token of each input line.
        public static class Map extends MapReduceBase
                implements Mapper<LongWritable, Text, Text, IntWritable> {

            private final Text word = new Text();
            private final IntWritable one = new IntWritable(1);

            public void map(LongWritable key, Text value,
                            OutputCollector<Text, IntWritable> output, Reporter reporter)
                    throws IOException {
                String line = value.toString();
                String[] words = line.split(" ");
                for (String term : words) {
                    if (term.isEmpty()) {
                        continue; // consecutive spaces yield empty tokens; skip them
                    }
                    word.set(term);
                    output.collect(word, one);
                }
            }
        }

        // Reducer (also used as combiner): sums the counts collected for each word.
        public static class Reduce extends MapReduceBase
                implements Reducer<Text, IntWritable, Text, IntWritable> {

            public void reduce(Text key, Iterator<IntWritable> values,
                               OutputCollector<Text, IntWritable> output, Reporter reporter)
                    throws IOException {
                int sum = 0;
                while (values.hasNext()) {
                    sum += values.next().get();
                }
                output.collect(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws IOException {
            if (args.length != 2) {
                System.err.println("Usage:");
                System.err.println("inputPath outputPath");
                System.exit(1);
            }
            String inputPath = args[0];
            String outputPath = args[1];

            JobConf conf = new JobConf(WordCount.class);
            conf.setJobName("WordCount");

            // Key/value types emitted by the mapper and by the reducer.
            conf.setMapOutputKeyClass(Text.class);
            conf.setMapOutputValueClass(IntWritable.class);
            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(IntWritable.class);

            // Summation is associative, so the reducer can also act as combiner.
            conf.setMapperClass(Map.class);
            conf.setCombinerClass(Reduce.class);
            conf.setReducerClass(Reduce.class);

            conf.setInputFormat(TextInputFormat.class);
            conf.setOutputFormat(TextOutputFormat.class);

            FileInputFormat.setInputPaths(conf, new Path(inputPath));
            FileOutputFormat.setOutputPath(conf, new Path(outputPath));

            JobClient.runJob(conf);
        }
    }
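To follow the data flow of the job without a Hadoop installation, the same map, group-by-key and reduce steps can be imitated in memory. This is a minimal sketch in plain Java, assuming the input fits in a list of strings; the class WordCountSketch and its methods are hypothetical and use no Hadoop API.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory imitation of the word count job; no Hadoop classes involved.
class WordCountSketch {

    /** Map phase: emit a (word, 1) pair for every non-empty token of a line. */
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String term : line.split(" ")) {
            if (!term.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(term, 1));
            }
        }
        return pairs;
    }

    /** Reduce phase: sum all values collected for one key. */
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    /** Runs map on every line, groups the pairs by key, then reduces each group. */
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            counts.put(group.getKey(), reduce(group.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {club=1, has=1, players=1, sale=1, the=3}
        System.out.println(run(Arrays.asList("the club has the players", "the sale")));
    }
}
```

In the real job the grouping step is done by the framework between the map and reduce phases; here a sorted map stands in for it.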
Sorted result:
the 26, to 15, of 14, in 9, and 9, a 9, that 7, on 7, is 7, he 6, has 6,
had 6, for 6, at 6, are 6, who 5, players 5, have 5, club 5, been 5, The 5, not 4,
Ashley 4, was 3, sale 3, said 3, new 3, his 3, be 3, as 3, ...
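The listing above is ordered by descending count. A minimal sketch of such a sort in plain Java follows; the class SortCounts is hypothetical. Breaking ties by reverse lexicographic word order is an assumption that appears consistent with the listing, not something the slides state.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: orders word counts the way the sorted listing appears to be ordered.
class SortCounts {

    /** Returns the entries by count descending; ties by word descending (assumed). */
    static List<Map.Entry<String, Integer>> sortByCount(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : b.getKey().compareTo(a.getKey());
        });
        return entries;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("a", 9);
        counts.put("the", 26);
        counts.put("in", 9);
        counts.put("and", 9);
        // Prints one "word count" pair per line: the 26, in 9, and 9, a 9.
        for (Map.Entry<String, Integer> e : sortByCount(counts)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```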
Example using the GATE software
Standard deviation example
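A minimal sketch of one common way to compute a standard deviation in map/reduce style, not necessarily the approach taken in the course: each chunk of the input is summarised as a partial aggregate (count, sum, sum of squares), partial aggregates are merged, and the population standard deviation is derived at the end as sqrt(sumSq/n - mean^2). The class StdDevSketch and its nested Partial type are hypothetical and use no Hadoop API. This formula can lose precision when the mean is large relative to the spread.

```java
// Hypothetical illustration of a mergeable standard deviation; no Hadoop classes involved.
class StdDevSketch {

    /** Partial aggregate that a mapper or combiner could emit for one chunk of values. */
    static final class Partial {
        final long count;
        final double sum;
        final double sumSq;

        Partial(long count, double sum, double sumSq) {
            this.count = count;
            this.sum = sum;
            this.sumSq = sumSq;
        }

        /** "Map" step: summarise one chunk of raw values. */
        static Partial of(double[] values) {
            double sum = 0.0;
            double sumSq = 0.0;
            for (double v : values) {
                sum += v;
                sumSq += v * v;
            }
            return new Partial(values.length, sum, sumSq);
        }

        /** "Reduce" step: merging is plain addition, so it is associative. */
        Partial merge(Partial other) {
            return new Partial(count + other.count, sum + other.sum, sumSq + other.sumSq);
        }

        /** Population standard deviation of everything merged so far. */
        double stdDev() {
            double mean = sum / count;
            // Clamp at zero to guard against tiny negative values from rounding.
            return Math.sqrt(Math.max(0.0, sumSq / count - mean * mean));
        }
    }

    public static void main(String[] args) {
        Partial left = Partial.of(new double[] {2, 4, 4, 4});
        Partial right = Partial.of(new double[] {5, 5, 7, 9});
        System.out.println(left.merge(right).stdDev()); // prints 2.0
    }
}
```

Because merging only adds three numbers, the same class could serve as both combiner and reducer logic, in the same way the word count example reuses its reducer as a combiner.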
Continued at: http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples/