A first taste of installing and testing hadoop-1.0.1
ITWeb/Hadoop일반 2012. 2. 29. 15:26

[Reference sites]
http://blog.softwaregeeks.org/archives/category/develop/hadoop
http://www.ibm.com/developerworks/kr/library/l-hadoop-1/
http://www.ibm.com/developerworks/kr/library/l-hadoop-2/
http://hadoop.apache.org/common/docs/r0.20.2/cluster_setup.html
Let's install and test hadoop-1.0.1. (This is a single-machine setup.)
The installation and test procedure is the same as for the 0.20.x releases.
First, download the files you need:
- JDK : http://www.oracle.com/technetwork/java/javase/downloads/jdk-6u31-download-1501634.html
- Hadoop : http://mirror.apache-kr.org//hadoop/common/hadoop-1.0.1/
- Download the archives, extract them, and lay them out following the directory structure below.
Basic directory structure
- I just installed under my regular user account.
- Typically you would create a dedicated hadoop account and install there instead.
/home
    /henry
        /app
            /jdk
            /hadoop
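Getting to that layout takes something like the following (a minimal sketch; it assumes the two archives were already extracted under ~/Downloads, and the extracted directory names are illustrative). Symlinking the versioned directories to jdk and hadoop keeps the environment variables below stable across upgrades; the run log further down resolves to /home/henry/app/hadoop-1.0.1/..., which is consistent with such a symlink:

mkdir -p ~/app
mv ~/Downloads/jdk1.6.0_31 ~/app/     # extracted JDK (directory name is an assumption)
mv ~/Downloads/hadoop-1.0.1 ~/app/    # extracted Hadoop tarball
cd ~/app
ln -s jdk1.6.0_31 jdk                 # version-independent paths
ln -s hadoop-1.0.1 hadoop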
Environment setup
- This is a bash configuration.
- Add the following to .bash_profile or .bashrc (e.g. with vi).
export JAVA_HOME=/home/henry/app/jdk
export HADOOP_HOME=/home/henry/app/hadoop
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$PATH
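A quick way to verify the settings took effect:

source ~/.bashrc    # or log in again
java -version       # should report the JDK under ~/app/jdk
hadoop version      # should print Hadoop 1.0.1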
Running the sample
- As with 0.20.x, an examples jar ships with the distribution:
- hadoop-examples-1.0.1.jar
The commands below are run from $HADOOP_HOME, in order:
mkdir temp
cp hadoop-examples-1.0.1.jar ./temp/
cd temp
jar xvf hadoop-examples-1.0.1.jar
rm -rf META-INF
rm -f hadoop-examples-1.0.1.jar
jar cvf ../hadoop-examples-1.0.1.0.jar .
cd ..
vi input.txt # enter a handful of words; the example is a word count
hadoop jar hadoop-examples-1.0.1.0.jar org.apache.hadoop.examples.WordCount input.txt output
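As an aside, the repackaging above (unpacking the jar and deleting META-INF) is presumably only needed so the class can be invoked by its fully qualified name; the stock jar declares a driver program as its manifest main class, which would otherwise swallow the class-name argument. If you are happy to go through that driver, the unmodified jar should also work directly:

hadoop jar hadoop-examples-1.0.1.jar wordcount input.txt output

Here is the input.txt I used for the test: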
a
aa
b
bb
a
aaa
b
bb
c
cc
ccc
dd
cc
# This is what I put in.
# Execution result
henry@ubuntu:~/app/hadoop$ hadoop jar hadoop-examples-1.0.1.0.jar org.apache.hadoop.examples.WordCount input.txt output
Warning: $HADOOP_HOME is deprecated.
12/02/29 15:16:06 INFO util.NativeCodeLoader: Loaded the native-hadoop library
****file:/home/henry/app/hadoop-1.0.1/input.txt
12/02/29 15:16:06 INFO input.FileInputFormat: Total input paths to process : 1
12/02/29 15:16:07 INFO mapred.JobClient: Running job: job_local_0001
12/02/29 15:16:07 INFO util.ProcessTree: setsid exited with exit code 0
12/02/29 15:16:07 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@ae533a
12/02/29 15:16:07 INFO mapred.MapTask: io.sort.mb = 100
12/02/29 15:16:07 INFO mapred.MapTask: data buffer = 79691776/99614720
12/02/29 15:16:07 INFO mapred.MapTask: record buffer = 262144/327680
12/02/29 15:16:07 INFO mapred.MapTask: Starting flush of map output
12/02/29 15:16:07 INFO mapred.MapTask: Finished spill 0
12/02/29 15:16:07 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
12/02/29 15:16:08 INFO mapred.JobClient: map 0% reduce 0%
12/02/29 15:16:10 INFO mapred.LocalJobRunner:
12/02/29 15:16:10 INFO mapred.Task: Task 'attempt_local_0001_m_000000_0' done.
12/02/29 15:16:10 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@6782a9
12/02/29 15:16:10 INFO mapred.LocalJobRunner:
12/02/29 15:16:10 INFO mapred.Merger: Merging 1 sorted segments
12/02/29 15:16:10 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 82 bytes
12/02/29 15:16:10 INFO mapred.LocalJobRunner:
12/02/29 15:16:10 INFO mapred.Task: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
12/02/29 15:16:10 INFO mapred.LocalJobRunner:
12/02/29 15:16:10 INFO mapred.Task: Task attempt_local_0001_r_000000_0 is allowed to commit now
12/02/29 15:16:13 INFO mapred.LocalJobRunner: reduce > reduce
12/02/29 15:16:13 INFO mapred.Task: Task 'attempt_local_0001_r_000000_0' done.
12/02/29 15:16:14 INFO mapred.JobClient: map 100% reduce 100%
12/02/29 15:16:14 INFO mapred.JobClient: Job complete: job_local_0001
12/02/29 15:16:14 INFO mapred.JobClient: Counters: 20
12/02/29 15:16:14 INFO mapred.JobClient: File Output Format Counters
12/02/29 15:16:14 INFO mapred.JobClient: Bytes Written=56
12/02/29 15:16:14 INFO mapred.JobClient: FileSystemCounters
12/02/29 15:16:14 INFO mapred.JobClient: FILE_BYTES_READ=288058
12/02/29 15:16:14 INFO mapred.JobClient: FILE_BYTES_WRITTEN=354898
12/02/29 15:16:14 INFO mapred.JobClient: File Input Format Counters
12/02/29 15:16:14 INFO mapred.JobClient: Bytes Read=36
12/02/29 15:16:14 INFO mapred.JobClient: Map-Reduce Framework
12/02/29 15:16:14 INFO mapred.JobClient: Map output materialized bytes=86
12/02/29 15:16:14 INFO mapred.JobClient: Map input records=13
12/02/29 15:16:14 INFO mapred.JobClient: Reduce shuffle bytes=0
12/02/29 15:16:14 INFO mapred.JobClient: Spilled Records=18
12/02/29 15:16:14 INFO mapred.JobClient: Map output bytes=88
12/02/29 15:16:14 INFO mapred.JobClient: Total committed heap usage (bytes)=324665344
12/02/29 15:16:14 INFO mapred.JobClient: CPU time spent (ms)=0
12/02/29 15:16:14 INFO mapred.JobClient: SPLIT_RAW_BYTES=115
12/02/29 15:16:14 INFO mapred.JobClient: Combine input records=13
12/02/29 15:16:14 INFO mapred.JobClient: Reduce input records=9
12/02/29 15:16:14 INFO mapred.JobClient: Reduce input groups=9
12/02/29 15:16:14 INFO mapred.JobClient: Combine output records=9
12/02/29 15:16:14 INFO mapred.JobClient: Physical memory (bytes) snapshot=0
12/02/29 15:16:14 INFO mapred.JobClient: Reduce output records=9
12/02/29 15:16:14 INFO mapred.JobClient: Virtual memory (bytes) snapshot=0
12/02/29 15:16:14 INFO mapred.JobClient: Map output records=13
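Note the job_local_0001 job ID and the LocalJobRunner entries above: with an untouched conf/ directory, Hadoop runs in standalone (local) mode, so the job reads input.txt and writes output/ straight on the local filesystem, with no daemons and no HDFS involved. That is exactly what we want for a first smoke test.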
henry@ubuntu:~/app/hadoop$ cat output/*
a 2
aa 1
aaa 1
b 2
bb 2
c 1
cc 2
ccc 1
dd 1
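As a quick cross-check, since the input has one word per line, the same counts can be reproduced with plain Unix tools (nothing Hadoop-specific, just validating the output above):

sort input.txt | uniq -c    # counts should match cat output/*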
Pretty easy, right? ^^;
Now, shall we take a look at the Hadoop example code?
http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples/
/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.hadoop.examples;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  public static class IntSumReducer
       extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 2) {
      System.err.println("Usage: wordcount <in> <out>");
      System.exit(2);
    }
    Job job = new Job(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
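If you want to experiment with a modified copy of this class, the classic compile-and-run sequence looks roughly like this (a sketch: hadoop-core-1.0.1.jar ships at the top level of the hadoop-1.0.1 distribution, and output2 is simply a fresh directory name, since Hadoop refuses to write into an existing output path):

mkdir wordcount_classes
javac -classpath $HADOOP_HOME/hadoop-core-1.0.1.jar -d wordcount_classes WordCount.java
jar cvf wordcount.jar -C wordcount_classes .
hadoop jar wordcount.jar org.apache.hadoop.examples.WordCount input.txt output2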
This is very basic material, so let's study how to put it to real use together. ^^