Configuring Hadoop 1.0.x

Notes from setting up Hadoop 1.0.x on Solaris 11.

Downloading the source

hadoop-1.0.4.tar.gz

Installation (copying the directory)

Extract the downloaded archive and copy it to /usr/local. Create a symbolic link so that switching to a newer version later only requires repointing the link.

# cp -r hadoop-1.0.4 /usr/local
# cd /usr/local
# ln -s hadoop-1.0.4 hadoop
# ls -ld /usr/local/hadoop
lrwxrwxrwx   1 root     root          12 Jan  4 11:47 /usr/local/hadoop -> hadoop-1.0.4

Setting environment variables

Adding to PATH

Add the bin directory directly under the Hadoop install directory to PATH:

$ export PATH=$PATH:/usr/local/hadoop/bin
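The export above only lasts for the current shell. As a sketch (assuming a Bourne-style login shell that reads ~/.profile), the setting can be persisted like this:

```shell
# Persist the PATH addition in ~/.profile so new login shells pick it up.
# The grep guard keeps the line from being appended more than once.
PROFILE="${HOME}/.profile"
LINE='export PATH=$PATH:/usr/local/hadoop/bin'
grep -qxF "$LINE" "$PROFILE" 2>/dev/null || echo "$LINE" >> "$PROFILE"
```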

Unsetting HADOOP_HOME

Older releases required the HADOOP_HOME environment variable, but it no longer appears to be used. Unset it just in case:

$ unset HADOOP_HOME

Configuration

Edit the following four configuration files under /usr/local/hadoop/conf:

  • hadoop-env.sh
  • core-site.xml
  • hdfs-site.xml
  • mapred-site.xml

In the settings below, the hostname of the machine where Hadoop is installed is sol11; substitute your own hostname as appropriate.

hadoop-env.sh

Set JAVA_HOME:

 ...
export JAVA_HOME=/usr/java
 ...

core-site.xml

Set the file system name (fs.default.name):

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
  <name>fs.default.name</name>
  <value>hdfs://sol11:9000</value>
</property>
</configuration>
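Since only the hostname differs between environments, the file can also be generated from a variable. A small sketch (MASTER and the output path are placeholders; on a real installation the file lives in /usr/local/hadoop/conf):

```shell
# Generate core-site.xml with the master hostname substituted in.
# MASTER and OUT are placeholders; point OUT at /usr/local/hadoop/conf
# on a real installation.
MASTER=sol11
OUT=/tmp/core-site.xml
cat > "$OUT" <<EOF
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
  <name>fs.default.name</name>
  <value>hdfs://${MASTER}:9000</value>
</property>
</configuration>
EOF
```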

hdfs-site.xml

Set the directory where the NameNode stores the namespace and transaction log (dfs.name.dir), and the directory where the DataNode stores its blocks (dfs.data.dir):

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
<property>
  <name>dfs.name.dir</name>
  <value>/var/tmp/hadoop/dfs/name</value>
</property>

<property>
  <name>dfs.data.dir</name>
  <value>/var/tmp/hadoop/dfs/data</value>
</property>

</configuration>
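The paths above live under /var/tmp, which is fine for a test setup but may not survive cleanup; use a dedicated filesystem for anything durable. A sketch for pre-creating the directories (assuming the paths from hdfs-site.xml above):

```shell
# Pre-create the NameNode and DataNode directories from hdfs-site.xml.
# /var/tmp is used here only for a test setup; data under it may be
# removed by system cleanup, so durable clusters should use a
# dedicated filesystem instead.
mkdir -p /var/tmp/hadoop/dfs/name /var/tmp/hadoop/dfs/data
chmod 755 /var/tmp/hadoop/dfs/name /var/tmp/hadoop/dfs/data
```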

mapred-site.xml

Set the hostname of the JobTracker host (mapred.job.tracker):

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
<property>
   <name>mapred.job.tracker</name>
   <value>sol11:9001</value>
</property>
</configuration>

Formatting HDFS

$ hadoop namenode -format
13/01/04 09:48:03 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = sol11/127.0.0.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 1.0.4
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1393290; compiled by 'hortonfo' on Wed Oct  3 05:13:58 UTC 2012
************************************************************/
13/01/08 09:48:03 INFO util.GSet: VM type       = 32-bit
13/01/08 09:48:03 INFO util.GSet: 2% max memory = 17.81375 MB
13/01/08 09:48:03 INFO util.GSet: capacity      = 2^22 = 4194304 entries
13/01/08 09:48:03 INFO util.GSet: recommended=4194304, actual=4194304
13/01/08 09:48:03 INFO namenode.FSNamesystem: fsOwner=kaizawa
13/01/08 09:48:03 INFO namenode.FSNamesystem: supergroup=supergroup
13/01/08 09:48:03 INFO namenode.FSNamesystem: isPermissionEnabled=true
13/01/08 09:48:03 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
13/01/08 09:48:03 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
13/01/08 09:48:03 INFO namenode.NameNode: Caching file names occuring more than 10 times
13/01/08 09:48:04 INFO common.Storage: Image file of size 113 saved in 0 seconds.
13/01/08 09:48:04 INFO common.Storage: Storage directory /var/tmp/hadoop/dfs/name has been successfully formatted.
13/01/08 09:48:04 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at sol11/127.0.0.1
************************************************************/

Startup

Start all of the Hadoop daemons:

sol11 % /usr/local/hadoop/bin/start-all.sh
starting namenode, logging to /usr/local/hadoop-1.0.4/libexec/../logs/hadoop-kaizawa-namenode-sol11.out
localhost: starting datanode, logging to /usr/local/hadoop-1.0.4/libexec/../logs/hadoop-kaizawa-datanode-sol11.out
localhost: starting secondarynamenode, logging to /usr/local/hadoop-1.0.4/libexec/../logs/hadoop-kaizawa-secondarynamenode-sol11.out
starting jobtracker, logging to /usr/local/hadoop-1.0.4/libexec/../logs/hadoop-kaizawa-jobtracker-sol11.out
localhost: starting tasktracker, logging to /usr/local/hadoop-1.0.4/libexec/../logs/hadoop-kaizawa-tasktracker-sol11.out

Testing

Testing HDFS

Copy the messages file into HDFS:

$ hadoop dfs -copyFromLocal /var/adm/messages /

Read the messages file back out of HDFS:

$ hadoop dfs -cat /messages
Apr 15 13:18:49 sol11 sendmail[1102]: [ID 702911 mail.crit] My unqualified host name (sol11) unknown; sleeping for retry
Apr 15 13:19:49 sol11 sendmail[1102]: [ID 702911 mail.alert] unable to qualify my own domain name (sol11) -- using short name

Testing Hadoop

Estimate the value of pi with one of the example programs bundled with Hadoop:

$ cd /usr/local/hadoop
$ hadoop jar hadoop-examples-1.0.4.jar pi 1 10
Number of Maps  = 1
Samples per Map = 10
Wrote input for Map #0
Starting Job
13/01/08 09:57:20 INFO mapred.FileInputFormat: Total input paths to process : 1
13/01/04 09:57:20 INFO mapred.JobClient: Running job: job_201301080948_0001
13/01/04 09:57:21 INFO mapred.JobClient:  map 0% reduce 0%
13/01/04 09:57:40 INFO mapred.JobClient:  map 100% reduce 0%
13/01/04 09:57:56 INFO mapred.JobClient:  map 100% reduce 100%
13/01/04 09:58:01 INFO mapred.JobClient: Job complete: job_201301080948_0001
13/01/04 09:58:01 INFO mapred.JobClient: Counters: 27
13/01/04 09:58:01 INFO mapred.JobClient:   Job Counters
13/01/04 09:58:01 INFO mapred.JobClient:     Launched reduce tasks=1
13/01/04 09:58:01 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=19266
13/01/04 09:58:01 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/01/04 09:58:01 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/01/04 09:58:01 INFO mapred.JobClient:     Launched map tasks=1
13/01/04 09:58:01 INFO mapred.JobClient:     Data-local map tasks=1
13/01/04 09:58:01 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=14567
13/01/04 09:58:01 INFO mapred.JobClient:   File Input Format Counters
13/01/04 09:58:01 INFO mapred.JobClient:     Bytes Read=118
13/01/04 09:58:01 INFO mapred.JobClient:   File Output Format Counters
13/01/04 09:58:01 INFO mapred.JobClient:     Bytes Written=97
13/01/04 09:58:01 INFO mapred.JobClient:   FileSystemCounters
13/01/04 09:58:01 INFO mapred.JobClient:     FILE_BYTES_READ=28
13/01/04 09:58:01 INFO mapred.JobClient:     HDFS_BYTES_READ=238
13/01/04 09:58:01 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=43307
13/01/04 09:58:01 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=215
13/01/04 09:58:01 INFO mapred.JobClient:   Map-Reduce Framework
13/01/04 09:58:01 INFO mapred.JobClient:     Map output materialized bytes=28
13/01/04 09:58:01 INFO mapred.JobClient:     Map input records=1
13/01/04 09:58:01 INFO mapred.JobClient:     Reduce shuffle bytes=0
13/01/04 09:58:01 INFO mapred.JobClient:     Spilled Records=4
13/01/04 09:58:01 INFO mapred.JobClient:     Map output bytes=18
13/01/04 09:58:01 INFO mapred.JobClient:     Total committed heap usage (bytes)=245891072
13/01/04 09:58:01 INFO mapred.JobClient:     Map input bytes=24
13/01/04 09:58:01 INFO mapred.JobClient:     Combine input records=0
13/01/04 09:58:01 INFO mapred.JobClient:     SPLIT_RAW_BYTES=120
13/01/04 09:58:01 INFO mapred.JobClient:     Reduce input records=2
13/01/04 09:58:01 INFO mapred.JobClient:     Reduce input groups=2
13/01/04 09:58:01 INFO mapred.JobClient:     Combine output records=0
13/01/04 09:58:01 INFO mapred.JobClient:     Reduce output records=0
13/01/04 09:58:01 INFO mapred.JobClient:     Map output records=2
Job Finished in 41.727 seconds
Estimated value of Pi is 3.60000000000000000000
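The estimate of 3.6 is coarse because the job used only 1 map with 10 samples: pi is approximated as 4 × (points inside the quarter circle / total points), so accuracy grows with the sample count. The awk sketch below illustrates the effect with plain pseudo-random sampling (the bundled Hadoop example actually uses a quasi-random Halton sequence, but the scaling behavior is the same):

```shell
# Illustration of why 1 map x 10 samples gives a poor estimate: pi is
# approximated as 4 * (points inside the quarter circle / total points),
# so the estimate tightens as the sample count grows.
awk 'BEGIN {
  srand(1)
  for (n = 10; n <= 100000; n *= 100) {
    inside = 0
    for (i = 0; i < n; i++) {
      x = rand(); y = rand()
      if (x * x + y * y <= 1) inside++
    }
    printf "n=%-6d pi ~ %.4f\n", n, 4 * inside / n
  }
}'
```

Re-running the example with more maps and samples per map (e.g. `hadoop jar hadoop-examples-1.0.4.jar pi 10 1000`) gives a correspondingly better estimate.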

Official setup page

Cluster Setup