Amazon EMR: Cannot run program "<path>/./mapper.js": error=2, No such file or directory

I am running an Amazon EMR job with Node.js. I tried converting the files to UNIX line endings, but it still does not work. This is the error:

2016-11-27 09:16:53,794 INFO [main] org.apache.hadoop.streaming.PipeMapRed: PipeMapRed exec [/mnt1/yarn/usercache/hadoop/appcache/application_1480232881564_0005/container_1480232881564_0005_01_000002/./mapper.js]
2016-11-27 09:16:53,803 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.work.output.dir is deprecated. Instead, use mapreduce.task.output.dir
2016-11-27 09:16:53,804 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: map.input.start is deprecated. Instead, use mapreduce.map.input.start
2016-11-27 09:16:53,804 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: job.local.dir is deprecated. Instead, use mapreduce.job.local.dir
2016-11-27 09:16:53,804 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
2016-11-27 09:16:53,805 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
2016-11-27 09:16:53,805 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
2016-11-27 09:16:53,805 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.local.dir is deprecated. Instead, use mapreduce.cluster.local.dir
2016-11-27 09:16:53,806 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: map.input.file is deprecated. Instead, use mapreduce.map.input.file
2016-11-27 09:16:53,806 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
2016-11-27 09:16:53,806 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: map.input.length is deprecated. Instead, use mapreduce.map.input.length
2016-11-27 09:16:53,806 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.cache.localFiles is deprecated. Instead, use mapreduce.job.cache.local.files
2016-11-27 09:16:53,807 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
2016-11-27 09:16:53,807 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
2016-11-27 09:16:53,816 ERROR [main] org.apache.hadoop.streaming.PipeMapRed: configuration exception
java.io.IOException: Cannot run program "/mnt1/yarn/usercache/hadoop/appcache/application_1480232881564_0005/container_1480232881564_0005_01_000002/./mapper.js": error=2, No such file or directory
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
    at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209)

This is my cluster command:

 aws emr create-cluster --auto-scaling-role EMR_AutoScaling_DefaultRole --applications Name=Hadoop --bootstrap-actions '[{"Path":"s3://ccvikas/installNode.sh","Name":"Custom action"}]' --ec2-attributes '{"InstanceProfile":"EMR_EC2_DefaultRole","SubnetId":"subnet-9db906c6","EmrManagedSlaveSecurityGroup":"sg-d9ee70a4","EmrManagedMasterSecurityGroup":"sg-deee70a3"}' --service-role EMR_DefaultRole --release-label emr-5.2.0 --steps '[{"Args":["hadoop-streaming","-files","s3://ccvikas/js/mapper.js","-mapper","mapper.js","-reducer","mapper.js","-input","s3://commoncrawl/crawl-data/CC-MAIN-2016-40/segments/1474738659496.36/warc/CC-MAIN-20160924173739-00000-ip-10-143-35-109.ec2.internal.warc.gz","-output","s3://ccvikas/out8"],"Type":"CUSTOM_JAR","ActionOnFailure":"CANCEL_AND_WAIT","Jar":"command-runner.jar","Properties":"","Name":"Streaming program"}]' --name 'My cluster' --instance-groups '[{"InstanceCount":1,"InstanceGroupType":"MASTER","InstanceType":"m1.xlarge","Name":"Master - 1"},{"InstanceCount":1,"InstanceGroupType":"CORE","InstanceType":"m1.xlarge","Name":"Core - 2"}]' --scale-down-behavior TERMINATE_AT_INSTANCE_HOUR --region us-east-1 

This is my step command:

 hadoop-streaming -files s3://ccvikas/js/mapper.js,s3://ccvikas/js/reducer.js -mapper mapper.js -reducer reducer.js -input s3://commoncrawl/crawl-data/CC-MAIN-2016-40/segments/1474738659496.36/warc/CC-MAIN-20160924173739-00000-ip-10-143-35-109.ec2.internal.warc.gz -output s3://ccvikas/out 

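The problem was that my bootstrap action was not installing Node.js correctly. I changed the bootstrap action as follows so that it installs the latest Node.js: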

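    #!/bin/bash
    # Count the kernel release strings that identify Amazon Linux (amzn1, x86_64).
    is_aml=`uname -r | grep amzn1.x86_64 | wc -l`
    if [ "$is_aml" -eq 1 ]; then
        # Register the NodeSource 7.x repository, then install Node.js from it.
        sudo curl --silent --location https://rpm.nodesource.com/setup_7.x | sudo bash -
        sudo yum -y install nodejs
    else
        echo "Unsupported OS"
        exit 1
    fi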
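A failed installation like this can be caught earlier if the bootstrap action itself checks that the interpreter is on the PATH once it has finished. The following is only a minimal sketch of that idea; the helper name `require_cmd` is hypothetical and is not part of the script above.

```shell
#!/bin/bash
# Hypothetical helper (not from the original script):
# succeed only if the named command can be found on the PATH.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1" >&2
        return 1
    fi
}

# At the end of a bootstrap action one could call, for example:
#   require_cmd node || exit 1
```

A non-zero exit status from a bootstrap action should make the failure visible at cluster start-up instead of later, inside a streaming step.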

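Other possible causes of this error:
- The mapper and reducer files do not start with a correct shebang line.
- The mapper and reducer files were saved in a Windows environment (with Windows line endings). Converting them to UNIX line endings fixes this.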
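Both of these causes can be checked locally before uploading the scripts to S3. The sketch below is one possible way to do that; the function name `check_script` is hypothetical, and it only tests that the first line begins with `#!` and that the file contains no carriage returns.

```shell
#!/bin/bash
# Hypothetical helper: report a missing shebang and Windows line endings.
check_script() {
    cs_file="$1"
    cs_status=0
    # Read the first line; it must begin with "#!" to name an interpreter.
    cs_first=""
    IFS= read -r cs_first < "$cs_file" || true
    case "$cs_first" in
        '#!'*) ;;
        *) echo "no shebang"; cs_status=1 ;;
    esac
    # A carriage return anywhere in the file indicates CRLF line endings.
    if grep -q "$(printf '\r')" "$cs_file"; then
        echo "CRLF line endings"
        cs_status=1
    fi
    return "$cs_status"
}

# Example usage:
#   check_script mapper.js && check_script reducer.js
```

For a Node.js script the shebang is commonly `#!/usr/bin/env node`. If CRLF endings are reported, a tool such as `dos2unix`, or `sed -i 's/\r$//' mapper.js` with GNU sed, can convert the file.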
