Hadoop: How To Include Third Party Library In Python Mapreduce
I am writing a MapReduce job in Python and want to use some third-party libraries like chardet. I know that we can use the option -libjars=... to include them for Java MapReduce. But how to i
Solution 1:
I solved the problem with zipimport.
I zipped chardet into a file named module.mod and loaded it like this:
import zipimport
# module.mod is a zip archive with the chardet package at its root
importer = zipimport.zipimporter('module.mod')
chardet = importer.load_module('chardet')
Add -file module.mod to the hadoop streaming command so that the archive is shipped with the job.
Now chardet can be used in the script.
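For anyone who wants to try the packaging step locally before submitting a job, here is a minimal sketch. It assumes a stand-in package called demo_pkg instead of chardet, and the helper names build_archive and import_from_archive are made up for illustration. Instead of calling zipimporter directly, it puts the archive on sys.path, which lets the normal import machinery read the zip file through zipimport.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def build_archive(package_dir, archive_path):
    """Zip package_dir so that the package folder sits at the archive root."""
    parent = os.path.dirname(os.path.abspath(package_dir))
    with zipfile.ZipFile(archive_path, "w") as zf:
        for root, _dirs, files in os.walk(package_dir):
            for name in files:
                full = os.path.join(root, name)
                # Store paths relative to the parent, e.g. demo_pkg/__init__.py
                zf.write(full, os.path.relpath(full, parent))


def import_from_archive(archive_path, module_name):
    """Import module_name from a zip archive by adding the archive to sys.path."""
    sys.path.insert(0, archive_path)
    return importlib.import_module(module_name)


# Hypothetical stand-in for chardet: a tiny package created in a temp directory.
work = tempfile.mkdtemp()
pkg = os.path.join(work, "demo_pkg")
os.makedirs(pkg)
with open(os.path.join(pkg, "__init__.py"), "w") as f:
    f.write("def detect(data):\n    return {'length': len(data)}\n")

# The archive name does not need a .zip extension.
archive = os.path.join(work, "module.mod")
build_archive(pkg, archive)

demo_pkg = import_from_archive(archive, "demo_pkg")
result = demo_pkg.detect(b"hello")
```

In a streaming job, the mapper would only need the import part, with the archive path being just 'module.mod', since the file passed with -file ends up in the task's working directory.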
More details are in: How can I include a python package with Hadoop streaming job?