How to submit a job to a remote JobTracker
One of the first things you start wondering after your very first Hadoop jobs is: what should I do to submit a job to a remote JobTracker? Otherwise you're restricted to running your jobs as the same user and on the same machine where your JobTracker lives.
So here are a few hints on how to do it.
The JobTracker should not be listening only to localhost connections. To change that, we'll update the mapred-site.xml file like this:
<property>
  <name>mapred.job.tracker</name>
  <value>192.168.1.1:54311</value>
</property>
After that, the JobTracker at this location starts accepting connections on this port.
You obviously have to update all references to localhost as well, for example the ones in the masters and slaves files.
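A leftover localhost reference is easy to miss, so it can help to check an address before restarting the daemons. The following is only a sketch under my own assumptions: the class and method names are hypothetical (they are not part of Hadoop), and it uses nothing but the JDK to decide whether a host:port value, like the one set in mapred.job.tracker, points at a loopback address.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper, not part of Hadoop: flags JobTracker addresses
// that would only be reachable from the local machine.
final class JobTrackerAddressCheck {

    // Returns true when the host part of "host:port" is a loopback address.
    static boolean isLocalOnly(String hostPort) {
        int colon = hostPort.lastIndexOf(':');
        String host = colon >= 0 ? hostPort.substring(0, colon) : hostPort;
        if (host.equalsIgnoreCase("localhost")) {
            return true;
        }
        try {
            // Literal IPs are parsed without a DNS lookup; host names are resolved.
            return InetAddress.getByName(host).isLoopbackAddress();
        } catch (UnknownHostException e) {
            // An unresolvable name is a different problem, not a loopback binding.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isLocalOnly("localhost:54311"));
        System.out.println(isLocalOnly("192.168.1.1:54311"));
    }
}
```

Running the same check over the entries of the masters and slaves files is a cheap way to find any host that still says localhost.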
The next thing to update is the core-site.xml file, because the Hadoop distributed file system should not be listening only to localhost either.
After these small changes we must tell our client to submit the job to the desired JobTracker. To do that we have to:
Configuration config = new Configuration();
config.set("mapred.job.tracker", "192.168.1.1:54311"); // same address as in mapred-site.xml
Then our job is ready to be submitted to the desired JobTracker, but there are still two other little things we have to take care of.
First of all, we have to make sure that all classes related to the job are properly shipped to the tracker. To do that we can, for example, package the job as a jar and run it from the console or with the hadoop command.
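If the job's main class is launched through ToolRunner (and therefore GenericOptionsParser), the remote address can also be passed on the command line with -D instead of being hard-coded. As a rough sketch, the command line could be assembled like this. The jar name, the main class and the helper itself are made-up placeholders; only the JobTracker address comes from the configuration shown earlier.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: builds a "hadoop jar" command line with -D overrides.
final class HadoopJarCommand {

    static List<String> build(String jar, String mainClass, Map<String, String> overrides) {
        List<String> cmd = new ArrayList<>();
        cmd.add("hadoop");
        cmd.add("jar");
        cmd.add(jar);
        cmd.add(mainClass);
        // Generic options go right after the main class, before job-specific arguments.
        for (Map.Entry<String, String> e : overrides.entrySet()) {
            cmd.add("-D");
            cmd.add(e.getKey() + "=" + e.getValue());
        }
        return cmd;
    }

    public static void main(String[] args) {
        Map<String, String> overrides = new LinkedHashMap<>();
        overrides.put("mapred.job.tracker", "192.168.1.1:54311");
        // "myjob.jar" and "com.example.MyJob" are placeholders for your own job.
        System.out.println(String.join(" ", build("myjob.jar", "com.example.MyJob", overrides)));
    }
}
```

The -D overrides only take effect when the job actually parses the generic options; a plain main method that ignores its arguments will not see them.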
And finally we only have to make sure that we have access to the right HDFS location. To get that up and working we have to:
Tell our job to connect to the right hdfs location:
Be sure that the user we're using to run the job exists and has the right permissions to access the desired location. In order to achieve this we should:
- Be sure the user can write to the staging directory. If we have to change its location, we can do that by adding the corresponding property to the mapred-site.xml file.
- Be sure the user can access and write to the right HDFS directories. The easiest way to achieve that might be to add our user to the supergroup or to any other group used inside our system. If we want to change the group configuration, we can do it by adding the corresponding property to the hdfs-site.xml file.
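To make the two properties above a bit more concrete, here is a small sketch that prints them in the usual Hadoop property format. Take it with care: the property names (mapreduce.jobtracker.staging.root.dir and dfs.permissions.supergroup) are my assumption based on the Hadoop 1.x defaults, the values are only examples, and the helper class is hypothetical.

```java
// Hypothetical helper: renders one entry for a Hadoop *-site.xml file.
final class HadoopPropertyXml {

    static String property(String name, String value) {
        return "<property>\n"
             + "  <name>" + escape(name) + "</name>\n"
             + "  <value>" + escape(value) + "</value>\n"
             + "</property>";
    }

    // Minimal escaping so that &, < and > do not break the XML.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        // mapred-site.xml: staging directory root (assumed property name, example value).
        System.out.println(property("mapreduce.jobtracker.staging.root.dir", "/user"));
        // hdfs-site.xml: name of the HDFS super-user group (assumed property name, example value).
        System.out.println(property("dfs.permissions.supergroup", "supergroup"));
    }
}
```

Check the property names against the default configuration files of your own Hadoop version before copying them, since they changed between releases.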
And run jobs from “anywhere”!!. ;-)