Skip to content Skip to sidebar Skip to footer

How Can I Convert An Absolutely Massive Number To A String In A Reasonable Amount Of Time?

This is quite an odd problem I know, but I'm trying to get a copy of the current largest prime number in a file. Getting the number in integer form is fairly easy. I just run this.

Solution 1:

Repeated string concatenation is notoriously inefficient since Python strings are immutable. I would go for

strprime = str(prime)

In my benchmarks, this is consistently the fastest solution. Here's my little benchmark program:

import decimal

def f1(x):
    ''' Definition by OP '''
    strprime = ""
    while x > 0:
        strprime += str(x%10)
        x //= 10
    return strprime

def digits(x):
    while x > 0:
        yield x % 10
        x //= 10

def f2(x):
    ''' Using string.join() to avoid repeated string concatenation '''
    return "".join((chr(48 + d) for d in digits(x)))

def f3(x):
    ''' Plain str() '''
    return str(x)

def f4(x):
    ''' Using Decimal class'''
    return decimal.Decimal(x).to_eng_string()

x = 2**100

if __name__ == '__main__':
    import timeit
    for i in range(1,5):
        funcName = "f" + str(i)
        print(funcName+ ": " + str(timeit.timeit(funcName + "(x)", setup="from __main__ import " + funcName + ", x")))

For me, this prints (using Python 2.7.10):

f1: 15.3430171013
f2: 20.8928260803
f3: 0.310356140137
f4: 2.80087995529

Solution 2:

Python's integer to string conversion algorithm uses a simplistic algorithm with a running of O(n**2). As the length of the number doubles, the conversion time quadruples.

Some simple tests on my computer show the increase in running time:

$ time py35 -c "n=str(2**1000000)"
user    0m1.808s
$ time py35 -c "n=str(2**2000000)"
user    0m7.128s
$ time py35 -c "n=str(2**4000000)"
user    0m28.444s
$ time py35 -c "n=str(2**8000000)"
user    1m54.164s

Since the actual exponent is about 10 times larger than my last test value, it should take about 100 times longer. Or just over 3 hours.

Can it be done faster? Yes. There are several methods that are faster.

Method 1

It is faster to divide the very large number by a power-of-10 into two roughly equal-sized but smaller numbers. The process is repeated until the numbers are relatively small. Then str() is used on each number and leading zeroes are used to pad the result to the same length as the last power-of-10. Then the strings are joined to form the final result. This method is used by the mpmath library and the documentation implies it should be about 3x faster.

Method 2

Python's integers are stored in binary format. Binary is great for calculations but binary-to-decimal conversion is the bottleneck. It is possible to define your own integer type that stores the value in blocks of 100 (or some similar value) decimal digits. Operations (exponentiation, multiplication, division) will be slower but conversion to a string will be very fast.

Many years ago, I implemented such a class and used efficient algorithms for multiplication and division. The code is no longer available on the Internet but I did find a backup copy that I tested. The running time was reduced to ~14 seconds.

Update

I updated the DecInt code referenced above and it is now available at https://github.com/casevh/DecInt.

If Python's native integer type is used, the total running time is less than 14 seconds on my computer. If gmpy2's integer type is used instead, the running time is ~3.5 seconds.

$ py35 DecInt.py
Calculating 2^74207281
Exponentiation time: 3.236
Conversion to decimal format: 0.304
Total elapsed time: 3.540
Length of result: 22338618 digits

Method 3

I maintain the gmpy2 library that provide easy access to the GMP library for fast integer arithmetic. GMP implements Method 1 in highly optimized C and assembly code and calculates the prime number and the string representation in ~5 seconds.

Method 4

The decimal module in Python stores values as decimal digits. Recent versions of Python 3 include a C implementation of the decimal library that is much faster that the pure-Python implementation include with Python 2. The C implementation run in just over 3 seconds on my computer.

from decimal import *
getcontext().prec = 23000000
getcontext().Emin = -999999999
getcontext().Emax = 999999999
x=Decimal(2)**74207281 - 1
s=str(x)

Solution 3:

Took about 32 seconds to output the file using WinGhci (Haskell language):

import System.IO

main = writeFile "prime.txt" (show (2^74207281 - 1))

The file was 21 megabytes; the last four digits, 6351.


Solution 4:

There is gmp, the GNU Multiple Precision Arithmetic Library. It is particularly designed at handling huge numbers fast.


Post a Comment for "How Can I Convert An Absolutely Massive Number To A String In A Reasonable Amount Of Time?"