Installing and Testing hadoop-1.0.1: A Quick Taste

[Reference Sites]
http://blog.softwaregeeks.org/archives/category/develop/hadoop 

http://www.ibm.com/developerworks/kr/library/l-hadoop-1/
http://www.ibm.com/developerworks/kr/library/l-hadoop-2/
http://hadoop.apache.org/common/docs/r0.20.2/cluster_setup.html



Let's install and test hadoop-1.0.1. (This is a single-machine setup.)
The installation and test procedure is the same as for the earlier 0.20.X releases.

First, we need to download the required files.
- JDK : http://www.oracle.com/technetwork/java/javase/downloads/jdk-6u31-download-1501634.html
- Hadoop : http://mirror.apache-kr.org//hadoop/common/hadoop-1.0.1/
- Download the archives, extract them, and arrange them according to the directory layout below (a scripted version of this step is sketched next).
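If you prefer to do this from the terminal, the step can be scripted roughly as follows. This is only a sketch: the exact archive names are assumptions based on the 6u31 and 1.0.1 releases, so check what the download pages actually serve.

cd /home/henry/app
# The JDK comes as a self-extracting .bin (file name assumed; 32-bit Linux shown)
sh jdk-6u31-linux-i586.bin
# Hadoop ships as a tarball from the mirror linked above
wget http://mirror.apache-kr.org/hadoop/common/hadoop-1.0.1/hadoop-1.0.1.tar.gz
tar xvzf hadoop-1.0.1.tar.gz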

Basic directory layout
- I simply installed under my regular user account.
- More typically, you would create a dedicated hadoop account and install there.
- One way to wire the extracted directories into this layout is shown right after the tree.
/home
    /henry
        /app
            /jdk
            /hadoop
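
The job log later in this post shows paths like /home/henry/app/hadoop-1.0.1, so a simple approach is to keep the versioned directories from the archives and point jdk and hadoop at them with symlinks (the versioned directory names are assumptions based on how those archives usually extract):

cd /home/henry/app
ln -s jdk1.6.0_31 jdk        # JDK 6u31 extracts to jdk1.6.0_31
ln -s hadoop-1.0.1 hadoop    # the tarball extracts to hadoop-1.0.1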

Environment setup
- This is a bash configuration.
- Add the following to .bash_profile or .bashrc (e.g. vi ~/.bash_profile):
export JAVA_HOME=/home/henry/app/jdk
export HADOOP_HOME=/home/henry/app/hadoop
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$PATH
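
After saving, reload the profile and confirm that both binaries resolve:

source ~/.bash_profile
java -version      # should report 1.6.0_31
hadoop version     # should report Hadoop 1.0.1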

Running a sample
- As with 0.20.X, an examples jar ships with the release: hadoop-examples-1.0.1.jar.
- Its manifest names ExampleDriver as the main class, so to invoke WordCount by its fully qualified class name we first repack the jar without the manifest.
These commands are run from $HADOOP_HOME:
mkdir temp
cp hadoop-examples-1.0.1.jar ./temp/
cd temp
jar xvf hadoop-examples-1.0.1.jar
rm -rf META-INF
rm -f hadoop-examples-1.0.1.jar
jar cvf ../hadoop-examples-1.0.1.0.jar .
cd ..
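
If you would rather skip the repacking, the stock jar should also work as-is by going through ExampleDriver, using the program name wordcount that it registers (run this after creating input.txt below):

hadoop jar hadoop-examples-1.0.1.jar wordcount input.txt output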
vi input.txt # enter a handful of words; the example is a word count, after all
a
aa
b
bb
a
aaa
b
bb
c
cc
ccc
dd
cc
# this is what I entered
hadoop jar hadoop-examples-1.0.1.0.jar org.apache.hadoop.examples.WordCount input.txt output
# execution output
henry@ubuntu:~/app/hadoop$ hadoop jar hadoop-examples-1.0.1.0.jar org.apache.hadoop.examples.WordCount input.txt output
Warning: $HADOOP_HOME is deprecated.

12/02/29 15:16:06 INFO util.NativeCodeLoader: Loaded the native-hadoop library
****file:/home/henry/app/hadoop-1.0.1/input.txt
12/02/29 15:16:06 INFO input.FileInputFormat: Total input paths to process : 1
12/02/29 15:16:07 INFO mapred.JobClient: Running job: job_local_0001
12/02/29 15:16:07 INFO util.ProcessTree: setsid exited with exit code 0
12/02/29 15:16:07 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@ae533a
12/02/29 15:16:07 INFO mapred.MapTask: io.sort.mb = 100
12/02/29 15:16:07 INFO mapred.MapTask: data buffer = 79691776/99614720
12/02/29 15:16:07 INFO mapred.MapTask: record buffer = 262144/327680
12/02/29 15:16:07 INFO mapred.MapTask: Starting flush of map output
12/02/29 15:16:07 INFO mapred.MapTask: Finished spill 0
12/02/29 15:16:07 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
12/02/29 15:16:08 INFO mapred.JobClient:  map 0% reduce 0%
12/02/29 15:16:10 INFO mapred.LocalJobRunner:
12/02/29 15:16:10 INFO mapred.Task: Task 'attempt_local_0001_m_000000_0' done.
12/02/29 15:16:10 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@6782a9
12/02/29 15:16:10 INFO mapred.LocalJobRunner:
12/02/29 15:16:10 INFO mapred.Merger: Merging 1 sorted segments
12/02/29 15:16:10 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 82 bytes
12/02/29 15:16:10 INFO mapred.LocalJobRunner:
12/02/29 15:16:10 INFO mapred.Task: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
12/02/29 15:16:10 INFO mapred.LocalJobRunner:
12/02/29 15:16:10 INFO mapred.Task: Task attempt_local_0001_r_000000_0 is allowed to commit now
12/02/29 15:16:13 INFO mapred.LocalJobRunner: reduce > reduce
12/02/29 15:16:13 INFO mapred.Task: Task 'attempt_local_0001_r_000000_0' done.
12/02/29 15:16:14 INFO mapred.JobClient:  map 100% reduce 100%
12/02/29 15:16:14 INFO mapred.JobClient: Job complete: job_local_0001
12/02/29 15:16:14 INFO mapred.JobClient: Counters: 20
12/02/29 15:16:14 INFO mapred.JobClient:   File Output Format Counters
12/02/29 15:16:14 INFO mapred.JobClient:     Bytes Written=56
12/02/29 15:16:14 INFO mapred.JobClient:   FileSystemCounters
12/02/29 15:16:14 INFO mapred.JobClient:     FILE_BYTES_READ=288058
12/02/29 15:16:14 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=354898
12/02/29 15:16:14 INFO mapred.JobClient:   File Input Format Counters
12/02/29 15:16:14 INFO mapred.JobClient:     Bytes Read=36
12/02/29 15:16:14 INFO mapred.JobClient:   Map-Reduce Framework
12/02/29 15:16:14 INFO mapred.JobClient:     Map output materialized bytes=86
12/02/29 15:16:14 INFO mapred.JobClient:     Map input records=13
12/02/29 15:16:14 INFO mapred.JobClient:     Reduce shuffle bytes=0
12/02/29 15:16:14 INFO mapred.JobClient:     Spilled Records=18
12/02/29 15:16:14 INFO mapred.JobClient:     Map output bytes=88
12/02/29 15:16:14 INFO mapred.JobClient:     Total committed heap usage (bytes)=324665344
12/02/29 15:16:14 INFO mapred.JobClient:     CPU time spent (ms)=0
12/02/29 15:16:14 INFO mapred.JobClient:     SPLIT_RAW_BYTES=115
12/02/29 15:16:14 INFO mapred.JobClient:     Combine input records=13
12/02/29 15:16:14 INFO mapred.JobClient:     Reduce input records=9
12/02/29 15:16:14 INFO mapred.JobClient:     Reduce input groups=9
12/02/29 15:16:14 INFO mapred.JobClient:     Combine output records=9
12/02/29 15:16:14 INFO mapred.JobClient:     Physical memory (bytes) snapshot=0
12/02/29 15:16:14 INFO mapred.JobClient:     Reduce output records=9
12/02/29 15:16:14 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=0
12/02/29 15:16:14 INFO mapred.JobClient:     Map output records=13
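
About the "Warning: $HADOOP_HOME is deprecated." line at the top: the 1.0.x launcher scripts print this whenever HADOOP_HOME is set in the environment. It is harmless here, and as far as I know the same scripts honor a suppress flag:

export HADOOP_HOME_WARN_SUPPRESS=1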


henry@ubuntu:~/app/hadoop$ cat output/*
a    2
aa    1
aaa    1
b    2
bb    2
c    1
cc    2
ccc    1
dd    1
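
One thing to watch when re-running: the job fails if the output directory already exists, because FileOutputFormat refuses to overwrite it. In this standalone setup the paths live on the local filesystem, so clearing it is simply:

rm -rf output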


Pretty easy, right? ^^;
Now, shall we take a look at the Hadoop example code?
http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples/

/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.hadoop.examples;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

  public static class TokenizerMapper 
       extends Mapper<Object, Text, Text, IntWritable>{
    
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();
      
    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }
  
  public static class IntSumReducer 
       extends Reducer<Text,IntWritable,Text,IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, 
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 2) {
      System.err.println("Usage: wordcount <in> <out>");
      System.exit(2);
    }
    Job job = new Job(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
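
If you want to experiment with the code yourself, it can be compiled against the hadoop-core jar that ships at the top of $HADOOP_HOME and run the same way. A minimal sketch (jar name assumed from the 1.0.1 tarball; add jars from $HADOOP_HOME/lib to the classpath if javac complains):

mkdir wordcount_classes
javac -classpath $HADOOP_HOME/hadoop-core-1.0.1.jar -d wordcount_classes WordCount.java
jar cvf wordcount.jar -C wordcount_classes .
hadoop jar wordcount.jar org.apache.hadoop.examples.WordCount input.txt output2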

This is all very basic material, so let's study how to put it to use together.. ^^