Python Multiprocessing - OverflowError('cannot Serialize A Bytes Object Larger Than 4GiB')
Solution 1:
Apparently is there an open issue about this topic , and there is a few related initiatives described on this particular answer. I Found a way to change the default pickle
protocol that is used in the multiprocessing
library based on this answer. As was pointed out in the comments this solution Only works with Linux and OS multiprocessing lib
Answer :
You first create a new separated module
pickle4reducer.py
from multiprocessing.reduction import ForkingPickler, AbstractReducer
class ForkingPickler4(ForkingPickler):
def __init__(self, *args):
if len(args) > 1:
args[1] = 2
else:
args.append(2)
super().__init__(*args)
@classmethod
def dumps(cls, obj, protocol=4):
return ForkingPickler.dumps(obj, protocol)
def dump(obj, file, protocol=4):
ForkingPickler4(file, protocol).dump(obj)
class Pickle4Reducer(AbstractReducer):
ForkingPickler = ForkingPickler4
register = ForkingPickler4.register
dump = dump
And then, in your main script you need to add the following:
import pickle4reducer
import multiprocessing as mp
ctx = mp.get_context()
ctx.reducer = pickle4reducer.Pickle4Reducer()
with mp.Pool(4) as p:
# do something
That will probably solve the problem of the overflow.
But, warning, you might consider reading this before doing anything or you might reach the same error as me:
'i' format requires -2147483648 <= number <= 2147483647
(the reason of this error is well explained in the link above). Long story short, multiprocessing
send data through all its process using the pickle
protocol, if you are already reaching the 4gb
limit, that probably means that you might consider redefining your functions more as "void" methods rather than input/output methods. All this inbound/outbound data increase the RAM usage, is probably inefficient by construction (my case) and it might be better to point all process to the same object rather than create a new copy for each call.
hope this helps.
Solution 2:
Supplementing answer from Pablo
The following problem can be resolved be Python3.8, if you are okay to use this version of python:
'i' format requires -2147483648 <= number <= 2147483647
Post a Comment for "Python Multiprocessing - OverflowError('cannot Serialize A Bytes Object Larger Than 4GiB')"