Hadoop: How To Include Third Party Library In Python Mapreduce
I am writing a MapReduce job in Python and want to use some third-party libraries such as chardet. I know that we can use the option -libjars=... to include them for Java MapReduce. But how to i
Solution 1:
The problem has been solved with zipimport. I zipped chardet into the file module.mod and used it like this:
import zipimport
importer = zipimport.zipimporter('module.mod')
chardet = importer.load_module('chardet')
Add -file module.mod to the Hadoop streaming command. Now chardet can be used in the script.
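As a rough illustration of the zip-then-import round trip described above, here is a minimal sketch that can be run locally without Hadoop. It assumes a stand-in package named demo_pkg (hypothetical, used in place of chardet so that the example has no external dependency), and the helper names build_archive and load_from_archive are also made up for this example. Because load_module is deprecated since Python 3.10, the sketch uses find_spec and exec_module when they are available and falls back to load_module otherwise.

```python
import importlib.util
import os
import sys
import tempfile
import zipfile
import zipimport


def build_archive(archive_path, package_name, source):
    """Write a one-file package into a zip archive (any file extension works)."""
    with zipfile.ZipFile(archive_path, "w") as zf:
        # The package directory must sit at the top level of the archive.
        zf.writestr(package_name + "/__init__.py", source)


def load_from_archive(archive_path, package_name):
    """Import a top-level package from the archive, as the mapper script would."""
    importer = zipimport.zipimporter(archive_path)
    if hasattr(importer, "exec_module"):
        # Python 3.10+: the non-deprecated import path.
        spec = importer.find_spec(package_name)
        if spec is None:
            raise ImportError("%s not found in %s" % (package_name, archive_path))
        module = importlib.util.module_from_spec(spec)
        sys.modules[package_name] = module
        importer.exec_module(module)
        return module
    # Older interpreters: the call used in the answer above.
    return importer.load_module(package_name)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # "module.mod" mirrors the file name used in the answer.
        archive = os.path.join(tmp, "module.mod")
        # demo_pkg is a hypothetical stand-in for chardet.
        build_archive(
            archive,
            "demo_pkg",
            "def detect(data):\n    return {'length': len(data)}\n",
        )
        demo_pkg = load_from_archive(archive, "demo_pkg")
        print(demo_pkg.detect(b"hello"))
```

In a real job the archive would hold the chardet package directory at its top level, and the mapper would only need the loading half, since -file places module.mod in the task's working directory.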
More details are given in: How can I include a python package with Hadoop streaming job?