Hadoop: just following along and testing it out...

ITWeb/Hadoop General 2012. 3. 6. 12:01
[References]
http://apache.mirror.cdnetworks.com//hadoop/common/
http://wiki.apache.org/hadoop/GettingStartedWithHadoop
http://wiki.apache.org/hadoop/HowToConfigure
http://wiki.apache.org/hadoop/QuickStart
http://hadoop.apache.org/common/docs/current/cluster_setup.html
http://hadoop.apache.org/common/docs/current/single_node_setup.html


[Prepare to Start the Hadoop Cluster]
Unpack the downloaded Hadoop distribution. In the distribution, edit the file conf/hadoop-env.sh to define at least JAVA_HOME to be the root of your Java installation.

Try the following command:
$ bin/hadoop
This will display the usage documentation for the hadoop script.


[Standalone Operation]
[hadoop-0.21.0]
cd $HADOOP_HOME
mkdir input
cp conf/*.xml input
bin/hadoop jar hadoop-mapred-examples-0.21.0.jar grep input output 'dfs[a-z.]+'
cat output/*

[hadoop-0.22.0]
ubuntu:~/app/hadoop-0.22.0$ mkdir input
ubuntu:~/app/hadoop-0.22.0$ cp conf/*.xml input
ubuntu:~/app/hadoop-0.22.0$ bin/hadoop jar hadoop-mapred-examples-0.22.0.jar grep input output 'dfs[a-z.]+'
ubuntu:~/app/hadoop-0.22.0$ cat output/*

[hadoop-1.0.1]
ubuntu:~/app/hadoop-1.0.1$ mkdir input
ubuntu:~/app/hadoop-1.0.1$ cp conf/*.xml input
ubuntu:~/app/hadoop-1.0.1$ bin/hadoop jar hadoop-examples-1.0.1.jar grep input output 'dfs[a-z.]+'
ubuntu:~/app/hadoop-1.0.1$ cat output/*
- As you will see if you try it yourself, every version behaves the same way and produces identical results.
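What the `grep` example job computes can be sketched with plain POSIX tools: count every match of the regex across the input files. This is only an illustration of the job's logic, not how Hadoop runs it; the sample XML below is a stand-in for `conf/*.xml`, not a file Hadoop ships.

```shell
#!/bin/sh
# Sketch of the Hadoop "grep" example using plain tools: find every match
# of dfs[a-z.]+ in a set of XML files and count occurrences per match.
tmp=$(mktemp -d)
cat > "$tmp/sample-site.xml" <<'EOF'
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF
# Same regex the MapReduce job above uses; `uniq -c` plays the role of the
# reducer that sums counts per matched string.
grep -Eho 'dfs[a-z.]+' "$tmp"/*.xml | sort | uniq -c | sort -rn
```

On the sample file this prints a count of 1 for `dfs.replication`, which is the same kind of (count, match) pair the real job writes into `output/part-*`.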


[Pseudo-Distributed Operation]
[hadoop-0.21.0]
{conf/core-site.xml}
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

{conf/hdfs-site.xml}
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

{conf/mapred-site.xml}
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>localhost:9001</value>
    </property>
</configuration>
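A quick way to confirm these files say what you think they do is to pull a property back out with `grep`/`sed`. The `get_prop` helper below is hypothetical (my own illustration, not a Hadoop tool) and is line-oriented, so it assumes the pretty-printed layout shown above.

```shell
#!/bin/sh
# Hypothetical helper: read one <value> back out of a *-site.xml file.
# Line-oriented, so it assumes <name> and <value> sit on adjacent lines
# exactly as in the pretty-printed configs above.
get_prop() {
  grep -A1 "<name>$2</name>" "$1" | sed -n 's:.*<value>\(.*\)</value>.*:\1:p'
}

tmp=$(mktemp -d)
cat > "$tmp/core-site.xml" <<'EOF'
<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
EOF
get_prop "$tmp/core-site.xml" fs.default.name   # hdfs://localhost:9000
```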

{Setup passphraseless ssh}
Now check that you can ssh to the localhost without a passphrase:
$ ssh localhost

If you cannot ssh to localhost without a passphrase, execute the following commands:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
(On recent OpenSSH releases, which disable DSA keys by default, use -t rsa and ~/.ssh/id_rsa instead.)
- If you get "ssh: connect to host localhost port 22: Connection refused", first check that ssh is actually installed; if it is, check that the Port setting in /etc/ssh/sshd_config is correct, restart sshd, and try again. (For anything more detailed, a quick Google search should sort it out.)
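Another frequent cause of key-based login silently failing is file permissions: with StrictModes (the sshd default), keys are ignored when ~/.ssh or authorized_keys is writable by group or others. A sketch of tightening them, done on a temp directory rather than your real ~/.ssh so it is safe to run anywhere (`stat -c` here is the GNU form):

```shell
#!/bin/sh
# Permission sketch: sshd with StrictModes ignores authorized_keys when it
# or ~/.ssh is too open. Demonstrated on a temp dir; apply the same two
# chmods to your real ~/.ssh if ssh still prompts for a password.
tmp=$(mktemp -d)
mkdir "$tmp/.ssh"
: > "$tmp/.ssh/authorized_keys"
chmod 700 "$tmp/.ssh"
chmod 600 "$tmp/.ssh/authorized_keys"
stat -c '%a %n' "$tmp/.ssh" "$tmp/.ssh/authorized_keys"
```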

{Execution}
ubuntu:~/app/hadoop-0.21.0$ bin/hadoop namenode -format
ubuntu:~/app/hadoop-0.21.0$ bin/start-all.sh

The hadoop daemon log output is written to the ${HADOOP_LOG_DIR} directory (defaults to ${HADOOP_HOME}/logs).

Browse the web interface for the NameNode and the JobTracker; by default they are available at:

    NameNode - http://localhost:50070/
    JobTracker - http://localhost:50030/

ubuntu:~/app/hadoop-0.21.0$ bin/hadoop fs -put conf input
ubuntu:~/app/hadoop-0.21.0$ bin/hadoop jar hadoop-mapred-examples-0.21.0.jar grep input output 'dfs[a-z.]+'
ubuntu:~/app/hadoop-0.21.0$ bin/hadoop fs -get output output
ubuntu:~/app/hadoop-0.21.0$ cat output/*
or
ubuntu:~/app/hadoop-0.21.0$ bin/hadoop fs -cat output/*
ubuntu:~/app/hadoop-0.21.0$ cat output/*


The other versions can be tested the same way as above.
Next time I plan to test reading from and writing to HDFS.