How to submit a job to a remote JobTracker

One of the first things you start wondering after running your very first Hadoop jobs is: how do I submit a job to a remote JobTracker? Otherwise you’re restricted to running your jobs as the same user and on the same machine where your JobTracker lives.

So here are a few hints on how to do it.

The JobTracker should not be listening only to localhost connections. To change that, we’ll update the mapred-site.xml file like this:

<property>
  <name>mapred.job.tracker</name>
  <value></value>
</property>

After that, the JobTracker will start accepting connections at this address and port.

Obviously, you also have to update every other reference to localhost, for example the ones in the masters and slaves files.
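A quick way to catch a leftover localhost reference is to check the address the client is about to use. The sketch below is a hypothetical helper, not part of Hadoop: the class name JobTrackerAddressCheck, the port 9001 and the 192.0.2.10 documentation address are all made up for illustration. It uses only the JDK and reports whether the host part of a host:port value is a loopback address.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical helper (not a Hadoop class): flags host:port values that only work locally. */
class JobTrackerAddressCheck {

    /** Returns true when the host part of "host:port" is a loopback address. */
    static boolean pointsToLoopback(String hostAndPort) {
        int colon = hostAndPort.lastIndexOf(':');
        if (colon <= 0) {
            throw new IllegalArgumentException("Expected host:port but got: " + hostAndPort);
        }
        String host = hostAndPort.substring(0, colon);
        try {
            // IP literals are parsed directly; host names go through the local resolver.
            return InetAddress.getByName(host).isLoopbackAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("Cannot resolve host: " + host, e);
        }
    }

    public static void main(String[] args) {
        // Both values are placeholders chosen for this example.
        System.out.println(pointsToLoopback("127.0.0.1:9001"));
        System.out.println(pointsToLoopback("192.0.2.10:9001"));
    }
}
```

If the check returns true for the value you are about to put in mapred.job.tracker, a client on another machine will end up talking to itself instead of the cluster.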

The next thing to update is the core-site.xml file, since the Hadoop Distributed File System should not be listening only to localhost either.

After these small changes we must tell our clients to submit the job to the desired JobTracker. To do that we have to:

Configuration config = new Configuration();
config.set("mapred.job.tracker", "");

Then our job is ready to be submitted to the desired JobTracker, but there are still two other little things we have to take care of.

First of all, we have to make sure that all the classes related to the job are properly shipped to the JobTracker. To do that we can, for example, package the job as a jar and run it from the console using the hadoop command.

And finally, we have to make sure that we have access to the right HDFS location. To get that up and working we have to:
Tell our job to connect to the right HDFS location:

config.set("", "hdfs://");
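Before handing an HDFS location to the job, it can help to confirm that the value really is an hdfs:// URI with a non-local host. The following is only a sketch under assumptions: HdfsUriCheck is a hypothetical class, not a Hadoop API, and the host, port and path in the usage note are placeholders. It relies on java.net.URI from the JDK to split the value into its parts.

```java
import java.net.InetAddress;
import java.net.URI;
import java.net.UnknownHostException;

/** Hypothetical helper (not a Hadoop class): sanity-checks an HDFS location string. */
class HdfsUriCheck {

    /** Returns true only for a parseable hdfs:// URI whose host is not a loopback address. */
    static boolean isRemoteHdfsUri(String value) {
        URI uri;
        try {
            uri = URI.create(value);
        } catch (IllegalArgumentException e) {
            return false; // not a well-formed URI
        }
        if (!"hdfs".equalsIgnoreCase(uri.getScheme()) || uri.getHost() == null) {
            return false; // wrong scheme, or no host to connect to
        }
        try {
            return !InetAddress.getByName(uri.getHost()).isLoopbackAddress();
        } catch (UnknownHostException e) {
            return false; // a host we cannot resolve is not usable either
        }
    }
}
```

For example, HdfsUriCheck.isRemoteHdfsUri("hdfs://192.0.2.10:8020/user/alice") passes the check, while a value pointing at 127.0.0.1 or using another scheme such as file:// does not.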

Be sure that the user we’re using to run the job exists and has the right permissions to access the desired location. In order to achieve this we should



And then we can run jobs from “anywhere”! ;-)

