Hadoop components need to have Hadoop libraries accessible from CloverETL. The libraries are needed by HadoopReader, HadoopWriter, ExecuteMapReduce, HDFS and Hive.
The Hadoop libraries are necessary to establish a Hadoop connection; see Hadoop connection.
There are two officially supported versions of Hadoop: Cloudera 4 (version 4.1.2) and Cloudera 5 (version 5.6.0).
Other versions close to these might work, but compatibility is not guaranteed.
The following libraries are needed to connect to Cloudera 4.
hadoop-common-2.0.0-cdh4.1.2.jar
hadoop-auth-2.0.0-cdh4.1.2.jar
guava-11.0.2.jar
avro-1.7.1.cloudera.2.jar
commons-cli-1.2.jar
commons-configuration-1.6.jar
commons-lang-2.5.jar
hadoop-hdfs-2.0.0-cdh4.1.2.jar
protobuf-java-2.4.0a.jar
aopalliance-1.0.jar
asm-3.2.jar
commons-io-2.1.jar
guice-3.0.jar
guice-servlet-3.0.jar
hadoop-annotations-2.0.0-cdh4.1.2.jar
hadoop-mapreduce-client-app-2.0.0-cdh4.1.2.jar
hadoop-mapreduce-client-common-2.0.0-cdh4.1.2.jar
hadoop-mapreduce-client-core-2.0.0-cdh4.1.2.jar
hadoop-mapreduce-client-hs-2.0.0-cdh4.1.2.jar
hadoop-mapreduce-client-jobclient-2.0.0-cdh4.1.2.jar
hadoop-mapreduce-client-shuffle-2.0.0-cdh4.1.2.jar
jackson-core-asl-1.8.8.jar
jackson-mapper-asl-1.8.8.jar
javax.inject-1.jar
jersey-core-1.8.jar
jersey-guice-1.8.jar
jersey-server-1.8.jar
log4j-1.2.17.jar
netty-3.2.4.Final.jar
paranamer-2.3.jar
snappy-java-1.0.4.1.jar
hadoop-yarn-common-2.0.0-cdh4.1.2.jar
hadoop-yarn-api-2.0.0-cdh4.1.2.jar
hive-jdbc-0.8.1.jar
hadoop-core-0.20.205.jar
hive-exec-0.8.1.jar
hive-metastore-0.8.1.jar
hive-service-0.8.1.jar
libfb303-0.7.0.jar
slf4j-api-1.6.1.jar
slf4j-log4j12-1.6.1.jar
The following libraries are needed to connect to Cloudera 5.
hadoop-common-2.6.0-cdh5.6.0.jar
hadoop-auth-2.6.0-cdh5.6.0.jar
guava-15.0.jar
avro-1.7.6-cdh5.6.0.jar
htrace-core4-4.0.1-incubating.jar
servlet-api-3.0.jar
hadoop-hdfs-2.6.0-cdh5.6.0.jar
protobuf-java-2.5.0.jar
hadoop-annotations-2.6.0-cdh5.6.0.jar
hadoop-mapreduce-client-app-2.6.0-cdh5.6.0.jar
hadoop-mapreduce-client-common-2.6.0-cdh5.6.0.jar
hadoop-mapreduce-client-core-2.6.0-cdh5.6.0.jar
hadoop-mapreduce-client-hs-2.6.0-cdh5.6.0.jar
hadoop-mapreduce-client-jobclient-2.6.0-cdh5.6.0.jar
hadoop-mapreduce-client-shuffle-2.6.0-cdh5.6.0.jar
jackson-core-asl-1.9.2.jar
jackson-mapper-asl-1.9.12.jar
hadoop-yarn-api-2.6.0-cdh5.6.0.jar
hadoop-yarn-client-2.6.0-cdh5.6.0.jar
hadoop-yarn-common-2.6.0-cdh5.6.0.jar
hive-jdbc-1.1.0-cdh5.6.0.jar
hive-exec-1.1.0-cdh5.6.0.jar
hive-metastore-1.1.0-cdh5.6.0.jar
hive-service-1.1.0-cdh5.6.0.jar
libfb303-0.9.2.jar
slf4j-api-1.7.5.jar
slf4j-log4j12-1.7.5.jar
The libraries can be found in your CDH installation or in a package downloaded from Cloudera.
In a CDH installation, the required libraries reside in the following directories.
/usr/lib/hadoop
/usr/lib/hadoop-hdfs
/usr/lib/hadoop-mapreduce
/usr/lib/hadoop-yarn
Third-party libraries are located in the lib subdirectories of these directories.
The files can also be found in a package downloaded from Cloudera, in the following locations.
share/hadoop/common
share/hadoop/hdfs
share/hadoop/mapreduce2
share/hadoop/yarn
Third-party libraries are located in the lib subdirectories of these locations.
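Collecting several dozen jars by hand is error-prone, so it may help to check which of the required files are present in the locations listed above. The following is a minimal sketch, not part of CloverETL: the function name locate_jars and the short sample list are illustrative assumptions, and the directories are the CDH installation paths given above. It searches each directory, including its lib subdirectories, and reports any jar it cannot find.

```python
import os


def locate_jars(required, roots):
    """Search the given root directories (recursively) for the required jars.

    Returns (found, missing): `found` maps each jar name to the first path
    where it was seen, `missing` is a sorted list of names not found.
    Note: locate_jars is a hypothetical helper, not a CloverETL function.
    """
    wanted = set(required)
    found = {}
    for root in roots:
        # os.walk descends into subdirectories, so lib/ folders are covered.
        # Roots that do not exist are silently skipped.
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                if name in wanted and name not in found:
                    found[name] = os.path.join(dirpath, name)
    missing = sorted(wanted - set(found))
    return found, missing


if __name__ == "__main__":
    # Directories of a CDH installation, as listed above.
    cdh_dirs = [
        "/usr/lib/hadoop",
        "/usr/lib/hadoop-hdfs",
        "/usr/lib/hadoop-mapreduce",
        "/usr/lib/hadoop-yarn",
    ]
    # A short sample taken from the Cloudera 5 list; extend it with the full list.
    sample = [
        "hadoop-common-2.6.0-cdh5.6.0.jar",
        "hadoop-hdfs-2.6.0-cdh5.6.0.jar",
        "guava-15.0.jar",
    ]
    found, missing = locate_jars(sample, cdh_dirs)
    for name, path in sorted(found.items()):
        print(name, "->", path)
    for name in missing:
        print("MISSING:", name)
```

For a package downloaded from Cloudera, the same function can be pointed at the share/hadoop/common, share/hadoop/hdfs, share/hadoop/mapreduce2 and share/hadoop/yarn directories inside the unpacked package.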