Thursday, February 5, 2015

Hadoop Job Submission Errors

Ugh, for the life of me I couldn't figure this out today, until I hit the "Duh ... I didn't do ..." moment. We all have those days. Hopefully this helps someone out there get to the answer a little quicker.

I couldn't submit a small Hadoop job today and was repeatedly getting errors like this:


Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class WordCount$WordMapper not found
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1961)
        at org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:186)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:722)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.ClassNotFoundException: Class WordCount$WordMapper not found
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1867)
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1959)
        ... 8 more 

The key line in the output, though, was this warning:

WARN mapreduce.JobSubmitter: No job jar file set. User classes may not be found. See Job or Job#setJar(String).

After Googling, I tried all the usual suggestions.

1) Make sure the jar has all the classes in it that it's supposed to have.

2) Make sure you're calling job.setJarByClass() in your code.

3) Make sure the permissions on the jar file are correct.
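Check 1 is easy to get wrong by eye, especially with nested classes like WordCount$WordMapper. Here is a minimal sketch of a checker that uses only the standard java.util.jar API. The class name JarClassCheck and its two helper methods are made up for illustration and are not part of Hadoop.

```java
import java.io.IOException;
import java.util.jar.JarFile;

class JarClassCheck {

    // Convert a binary class name such as "WordCount$WordMapper" or
    // "com.example.WordCount" into the entry path used inside a jar.
    static String entryNameFor(String className) {
        return className.replace('.', '/') + ".class";
    }

    // Return true if the jar at jarPath has a .class entry for className.
    static boolean jarContainsClass(String jarPath, String className) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getEntry(entryNameFor(className)) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java JarClassCheck <jar> <class name>");
            System.exit(2);
        }
        boolean found = jarContainsClass(args[0], args[1]);
        System.out.println(args[1] + (found ? " found in " : " NOT found in ") + args[0]);
    }
}
```

Running it against the job jar with the class name from the stack trace should confirm whether the mapper class actually made it into the jar. When typing the name in a shell, quote it so the `$` isn't expanded.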

But I still couldn't get it to work. Then a Stack Overflow post suggested calling job.setJar() to manually set the jar by its full path.

This worked.  Why?

Duh ... Hadoop couldn't find the jar I was submitting. So here's step 4 to try.

4) Make sure your jar can be found in the environment variable HADOOP_CLASSPATH.

An alternative may be to ensure your jar is on the classpath configured in yarn.application.classpath, but I didn't try this.
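As for why job.setJarByClass() alone wasn't enough: as far as I understand it, that call asks the class loader where the given class was loaded from and only sets a job jar if the answer is a jar file. The sketch below illustrates that idea with plain JDK calls. It is an approximation under that assumption, not Hadoop's actual code, and the name FindContainingJar is hypothetical.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Enumeration;

class FindContainingJar {

    // Return the path of the jar that clazz was loaded from, or null if no
    // jar could be identified (for example, the class came from a directory).
    static String findContainingJar(Class<?> clazz) throws IOException {
        ClassLoader loader = clazz.getClassLoader();
        if (loader == null) {
            return null; // bootstrap classes such as java.lang.String
        }
        String classFile = clazz.getName().replace('.', '/') + ".class";
        Enumeration<URL> urls = loader.getResources(classFile);
        while (urls.hasMoreElements()) {
            URL url = urls.nextElement();
            if ("jar".equals(url.getProtocol())) {
                // URL looks like jar:file:/path/to/app.jar!/WordCount.class
                String path = url.getPath();
                if (path.startsWith("file:")) {
                    path = path.substring("file:".length());
                }
                // Keep everything before the "!" separator; no URL decoding here.
                return path.substring(0, path.indexOf('!'));
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        String jar = findContainingJar(FindContainingJar.class);
        System.out.println(jar == null
                ? "No containing jar found (class probably loaded from a plain directory)"
                : "Loaded from jar: " + jar);
    }
}
```

If a lookup like this comes back empty for the driver class, no jar gets attached to the job, which would match the "No job jar file set" warning. Putting the jar on the classpath the client sees, or naming it explicitly with job.setJar(), sidesteps the lookup.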
