Flatmap Over List Of Custom Objects In Pyspark
I'm getting an error when running flatMap() on a list of objects of a class. It works fine for regular Python data types like int, list, etc., but I'm facing an error when the list c…
Solution 1:
The error you get is completely unrelated to flatMap. If you define the node class in your main script it is accessible on the driver, but it is not distributed to the workers. To make it work you should place the node definition inside a separate module and make sure it is distributed to the workers.
- Create a separate module with the node definition, let's call it node.py.
- Import the node class inside your main script: from node import node
- Make sure the module is distributed to the workers: sc.addPyFile("node.py")

Now everything should work as expected.
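Putting the steps above together, here is a minimal sketch of the two-file layout. The module part is runnable as-is; the driver part is shown as comments because it needs a live SparkContext, and the attribute name `value` and the sample flatMap lambda are assumptions for illustration:

```python
# node.py -- the class lives in its own importable module
class node(object):
    def __init__(self, value):
        self.value = value

    def __repr__(self):
        return "node({0})".format(repr(self.value))

# main.py -- driver script (sketch only; requires pyspark):
#
#     from pyspark import SparkContext
#     from node import node
#
#     sc = SparkContext()
#     sc.addPyFile("node.py")   # ship the module to every worker
#     rdd = sc.parallelize([node(1), node(2)])
#     print(rdd.flatMap(lambda n: [n.value, n.value * 10]).collect())

print(repr(node(1)))  # node(1)
```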
On a side note:

- PEP 8 recommends CapWords for class names. It is not a hard requirement, but it makes life easier.
- The __repr__ method should return a string representation of an object. At the very least make sure it is a string, but a proper representation is even better: def __repr__(self): return "node({0})".format(repr(self.value))
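To illustrate the __repr__ point: CPython raises a TypeError when __repr__ returns a non-string, while a proper representation reads back as the expression that would rebuild the object. A short sketch (class names here are illustrative):

```python
class Bad(object):
    def __repr__(self):
        return 42  # wrong: __repr__ must return a str

class node(object):
    def __init__(self, value):
        self.value = value

    def __repr__(self):
        # proper repr: eval(repr(n)) would reconstruct an equivalent node
        return "node({0})".format(repr(self.value))

try:
    repr(Bad())
except TypeError as err:
    print("repr failed:", err)

print(repr(node([1, 2])))  # node([1, 2])
```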