Setting up Hadoop 1.0.x
Notes from setting up Hadoop 1.0.x on Solaris 11.
Downloading the source
Installation (copying the directory)
Extract the downloaded archive and copy it to /usr/local. Set up a symbolic link so that it can simply be repointed when switching to a newer version later.
# cp -r hadoop-1.0.4 /usr/local
# cd /usr/local
# ln -s hadoop-1.0.4 hadoop
# ls -ld /usr/local/hadoop
lrwxrwxrwx   1 root     root          12 Jan  4 11:47 /usr/local/hadoop -> hadoop-1.0.4
Setting environment variables
Adding to PATH
Add the bin directory directly under the Hadoop installation directory to PATH.
$ export PATH=$PATH:/usr/local/hadoop/bin
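Hadoop also needs to know where the JDK is. On Hadoop 1.x this is normally done by setting JAVA_HOME in conf/hadoop-env.sh; a minimal sketch, assuming the JDK lives under /usr/java (a common location on Solaris, but adjust for your system):

```shell
# In /usr/local/hadoop/conf/hadoop-env.sh
# /usr/java is an assumption; point this at your actual JDK location.
export JAVA_HOME=/usr/java
```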
Configuration
Edit the following three configuration files under /usr/local/hadoop/conf.
Note that in the configuration below, sol11 is the hostname of the host where Hadoop is installed; substitute your own hostname as appropriate.
core-site.xml
Set the filesystem name (fs.default.name).
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://sol11:9000</value>
  </property>
</configuration>
hdfs-site.xml
Set the directory where the NameNode stores the namespace and transaction logs (dfs.name.dir), and the directory where the DataNode stores its blocks (dfs.data.dir).
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/var/tmp/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/var/tmp/hadoop/dfs/data</value>
  </property>
</configuration>
mapred-site.xml
Set the hostname of the JobTracker host (mapred.job.tracker).
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>sol11:9001</value>
  </property>
</configuration>
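One more piece of preparation worth mentioning: the start-all.sh script used below connects to localhost over ssh to start the DataNode, SecondaryNameNode, and TaskTracker, so passphrase-less ssh login to localhost should work first. A common way to set this up (a sketch, assuming ~/.ssh exists and no key is already in place):

```shell
# Generate a passphrase-less key and authorize it for local login
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
```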
Formatting HDFS
$ hadoop namenode -format
13/01/04 09:48:03 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = sol11/127.0.0.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 1.0.4
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1393290; compiled by 'hortonfo' on Wed Oct  3 05:13:58 UTC 2012
************************************************************/
13/01/08 09:48:03 INFO util.GSet: VM type       = 32-bit
13/01/08 09:48:03 INFO util.GSet: 2% max memory = 17.81375 MB
13/01/08 09:48:03 INFO util.GSet: capacity      = 2^22 = 4194304 entries
13/01/08 09:48:03 INFO util.GSet: recommended=4194304, actual=4194304
13/01/08 09:48:03 INFO namenode.FSNamesystem: fsOwner=kaizawa
13/01/08 09:48:03 INFO namenode.FSNamesystem: supergroup=supergroup
13/01/08 09:48:03 INFO namenode.FSNamesystem: isPermissionEnabled=true
13/01/08 09:48:03 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
13/01/08 09:48:03 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
13/01/08 09:48:03 INFO namenode.NameNode: Caching file names occuring more than 10 times
13/01/08 09:48:04 INFO common.Storage: Image file of size 113 saved in 0 seconds.
13/01/08 09:48:04 INFO common.Storage: Storage directory /var/tmp/hadoop/dfs/name has been successfully formatted.
13/01/08 09:48:04 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at sol11/127.0.0.1
************************************************************/
Startup
Starting the Hadoop daemons
sol11 % /usr/local/hadoop/bin/start-all.sh
starting namenode, logging to /usr/local/hadoop-1.0.4/libexec/../logs/hadoop-kaizawa-namenode-sol11.out
localhost: starting datanode, logging to /usr/local/hadoop-1.0.4/libexec/../logs/hadoop-kaizawa-datanode-sol11.out
localhost: starting secondarynamenode, logging to /usr/local/hadoop-1.0.4/libexec/../logs/hadoop-kaizawa-secondarynamenode-sol11.out
starting jobtracker, logging to /usr/local/hadoop-1.0.4/libexec/../logs/hadoop-kaizawa-jobtracker-sol11.out
localhost: starting tasktracker, logging to /usr/local/hadoop-1.0.4/libexec/../logs/hadoop-kaizawa-tasktracker-sol11.out
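Before running jobs, it is worth confirming that all five daemons actually came up. One simple check uses the JDK's jps tool (assuming the JDK's bin directory is on PATH); the Hadoop 1.x NameNode and JobTracker also serve status pages on ports 50070 and 50030 respectively.

```shell
# List running Java processes; on a healthy pseudo-distributed setup,
# NameNode, DataNode, SecondaryNameNode, JobTracker, and TaskTracker
# should all appear (PIDs will differ).
jps
```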
Testing
Testing HDFS
Try storing the messages file in HDFS.
$ hadoop dfs -copyFromLocal /var/adm/messages /
Read back the messages file on HDFS.
$ hadoop dfs -cat /messages
Apr 15 13:18:49 sol11 sendmail[1102]: [ID 702911 mail.crit] My unqualified host name (sol11) unknown; sleeping for retry
Apr 15 13:19:49 sol11 sendmail[1102]: [ID 702911 mail.alert] unable to qualify my own domain name (sol11) -- using short name
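A couple of other dfs subcommands are handy for checking and cleaning up after this test; a sketch:

```shell
# List the HDFS root; the copied /messages file should appear
hadoop dfs -ls /
# Remove the test file once done
hadoop dfs -rm /messages
```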
Testing Hadoop
Estimate the value of pi using the example program bundled with Hadoop.
$ cd /usr/local/hadoop
$ hadoop jar hadoop-examples-1.0.4.jar pi 1 10
Number of Maps  = 1
Samples per Map = 10
Wrote input for Map #0
Starting Job
13/01/08 09:57:20 INFO mapred.FileInputFormat: Total input paths to process : 1
13/01/04 09:57:20 INFO mapred.JobClient: Running job: job_201301080948_0001
13/01/04 09:57:21 INFO mapred.JobClient:  map 0% reduce 0%
13/01/04 09:57:40 INFO mapred.JobClient:  map 100% reduce 0%
13/01/04 09:57:56 INFO mapred.JobClient:  map 100% reduce 100%
13/01/04 09:58:01 INFO mapred.JobClient: Job complete: job_201301080948_0001
13/01/04 09:58:01 INFO mapred.JobClient: Counters: 27
13/01/04 09:58:01 INFO mapred.JobClient:   Job Counters
13/01/04 09:58:01 INFO mapred.JobClient:     Launched reduce tasks=1
13/01/04 09:58:01 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=19266
13/01/04 09:58:01 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/01/04 09:58:01 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/01/04 09:58:01 INFO mapred.JobClient:     Launched map tasks=1
13/01/04 09:58:01 INFO mapred.JobClient:     Data-local map tasks=1
13/01/04 09:58:01 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=14567
13/01/04 09:58:01 INFO mapred.JobClient:   File Input Format Counters
13/01/04 09:58:01 INFO mapred.JobClient:     Bytes Read=118
13/01/04 09:58:01 INFO mapred.JobClient:   File Output Format Counters
13/01/04 09:58:01 INFO mapred.JobClient:     Bytes Written=97
13/01/04 09:58:01 INFO mapred.JobClient:   FileSystemCounters
13/01/04 09:58:01 INFO mapred.JobClient:     FILE_BYTES_READ=28
13/01/04 09:58:01 INFO mapred.JobClient:     HDFS_BYTES_READ=238
13/01/04 09:58:01 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=43307
13/01/04 09:58:01 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=215
13/01/04 09:58:01 INFO mapred.JobClient:   Map-Reduce Framework
13/01/04 09:58:01 INFO mapred.JobClient:     Map output materialized bytes=28
13/01/04 09:58:01 INFO mapred.JobClient:     Map input records=1
13/01/04 09:58:01 INFO mapred.JobClient:     Reduce shuffle bytes=0
13/01/04 09:58:01 INFO mapred.JobClient:     Spilled Records=4
13/01/04 09:58:01 INFO mapred.JobClient:     Map output bytes=18
13/01/04 09:58:01 INFO mapred.JobClient:     Total committed heap usage (bytes)=245891072
13/01/04 09:58:01 INFO mapred.JobClient:     Map input bytes=24
13/01/04 09:58:01 INFO mapred.JobClient:     Combine input records=0
13/01/04 09:58:01 INFO mapred.JobClient:     SPLIT_RAW_BYTES=120
13/01/04 09:58:01 INFO mapred.JobClient:     Reduce input records=2
13/01/04 09:58:01 INFO mapred.JobClient:     Reduce input groups=2
13/01/04 09:58:01 INFO mapred.JobClient:     Combine output records=0
13/01/04 09:58:01 INFO mapred.JobClient:     Reduce output records=0
13/01/04 09:58:01 INFO mapred.JobClient:     Map output records=2
Job Finished in 41.727 seconds
Estimated value of Pi is 3.60000000000000000000
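When finished, the cluster can be shut down with the companion script to start-all.sh:

```shell
# Stops the JobTracker, TaskTracker, NameNode, DataNode, and
# SecondaryNameNode started by start-all.sh
/usr/local/hadoop/bin/stop-all.sh
```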