Reading Lines From A File Using Python
Solution 1:
I would highly suggest operating line-by-line instead of reading in the entire file all at once (in other words, don't use .read()).
with open('tweets.txt', 'r') as fileinput:
    for line in fileinput:
        line = line.lower()
        # ... do something with line ...
        # (for example, write the line to a new file, or print it)

This will automatically take advantage of Python's built-in buffering capabilities.
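One way to make the line-by-line approach reusable is to wrap the loop in a generator, so the caller decides what happens to each line. The following is only a minimal sketch: the helper name `lowered_lines`, the `demo` function, and the sample file contents are assumptions made for illustration and are not part of the original answer.

```python
import os
import tempfile


def lowered_lines(path, encoding="utf-8"):
    """Yield each line of the file lowercased, one at a time.

    Hypothetical helper: only the current line (plus Python's read
    buffer) is held in memory, never the whole file.
    """
    with open(path, "r", encoding=encoding) as fileinput:
        for line in fileinput:
            yield line.lower()


def demo():
    # Build a small throwaway input file so the sketch is self-contained.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "tweets.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write("Hello World\nSECOND Line\n")
        # Each line keeps its trailing newline, so strip it before collecting.
        return [line.rstrip("\n") for line in lowered_lines(path)]


if __name__ == "__main__":
    print(demo())
```

Because the generator is lazy, nothing is read until the caller starts iterating, and the file is closed once the loop finishes.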
Solution 2:
Try to work on the file one line at a time:
lowered = []
with open('tweets.txt', 'r') as handle:
    for line in handle:
        # keep accumulating the results ...
        lowered.append(line.lower())
        # ... or just dump the line to stdout right away
        print(line)

for line in lowered:
    # print or write to file or whatever you require
    pass

That way you reduce the memory overhead, which in the case of large files might otherwise lead to swapping and kill performance.
Here are some benchmarks on a file with about 1M lines:
# (1) real 0.223 user 0.195 sys 0.026 pcpu 98.71
with open('medium.txt') as handle:
    for line in handle:
        pass

# (2) real 0.295 user 0.262 sys 0.025 pcpu 97.21
with open('medium.txt') as handle:
    for i, line in enumerate(handle):
        pass
    print(i) # 1031124

# (3) real 21.561 user 5.072 sys 3.530 pcpu 39.89
with open('medium.txt') as handle:
    for i, line in enumerate(handle):
        print(line.lower())

# (4) real 1.702 user 1.605 sys 0.089 pcpu 99.50
lowered = []
with open('medium.txt') as handle:
    for i, line in enumerate(handle):
        lowered.append(line.lower())

# (5) real 2.307 user 1.983 sys 0.159 pcpu 92.89
lowered = []
with open('medium.txt', 'r') as handle:
    for i, line in enumerate(handle):
        lowered.append(line.lower())
with open('lowered.txt', 'w') as handle:
    for line in lowered:
        handle.write(line)

You can also iterate over two files at once:
# (6) real 1.944 user 1.666 sys 0.115 pcpu 91.59
with open('medium.txt', 'r') as src, open('lowered.txt', 'w') as sink:
    for i, line in enumerate(src):
        sink.write(line.lower())
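
The real/user/sys columns above look like output from an external timer. For readers who want to try a comparison themselves, here is a rough sketch that times variants (5) and (6) from inside Python. It is not the harness used for the numbers in this answer: the function names, the generated sample file, and the use of `time.perf_counter` (wall-clock time only) are assumptions, so absolute numbers will differ.

```python
import os
import tempfile
import time


def via_list_buffer(src_path, dst_path):
    """Variant (5): collect all lowercased lines in a list, then write them."""
    lowered = []
    with open(src_path, "r") as handle:
        for line in handle:
            lowered.append(line.lower())
    with open(dst_path, "w") as handle:
        for line in lowered:
            handle.write(line)


def on_the_fly(src_path, dst_path):
    """Variant (6): write each lowercased line as soon as it is read."""
    with open(src_path, "r") as src, open(dst_path, "w") as sink:
        for line in src:
            sink.write(line.lower())


def timed(func, *args):
    """Return the wall-clock seconds taken by func(*args)."""
    start = time.perf_counter()
    func(*args)
    return time.perf_counter() - start


def run_comparison(num_lines=100_000):
    """Time both variants on a generated file; return (timings, outputs_match)."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "medium.txt")
        # Hypothetical sample data standing in for the real input file.
        with open(src, "w") as f:
            for i in range(num_lines):
                f.write(f"Sample TWEET number {i}\n")
        out5 = os.path.join(tmp, "lowered5.txt")
        out6 = os.path.join(tmp, "lowered6.txt")
        timings = {
            "list buffer": timed(via_list_buffer, src, out5),
            "on-the-fly": timed(on_the_fly, src, out6),
        }
        # Both variants should produce byte-for-byte identical output.
        with open(out5) as a, open(out6) as b:
            outputs_match = a.read() == b.read()
    return timings, outputs_match


if __name__ == "__main__":
    print(run_comparison())
```

A single run is noisy (disk cache, other processes), so repeating each measurement a few times gives a fairer picture.
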
Results as table:
# (1) noop                      0.223
# (2) w/ enumerate              0.295
# (4) list buffer               1.702
# (6) on-the-fly                1.944
# (5) r -> list buffer -> w     2.307
# (3) stdout print             21.561

Solution 3:
Change your script as follows:
with open('tweets.txt', 'r') as fileinput:
    for line in fileinput:
        # do what you need to do with each line
        line = line.lower()

So, basically, don't read the whole file into memory using read(); just iterate over the lines of the opened file. When you read a huge file into memory, your process may grow to a point where the system needs to swap out parts of it, and that will make it very slow.
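
To see the memory difference for yourself, a sketch along the following lines may help. It uses the standard-library `tracemalloc` module to compare the peak memory allocated by Python when reading a file with read() versus iterating over its lines. The function names and the generated sample file are assumptions for illustration, and `tracemalloc` only tracks allocations made through Python's allocators, so treat the numbers as indicative rather than exact.

```python
import os
import tempfile
import tracemalloc


def peak_memory(func, *args):
    """Return the peak number of bytes traced while running func(*args)."""
    tracemalloc.start()
    try:
        func(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def read_whole_file(path):
    with open(path, "r") as handle:
        data = handle.read()  # the entire file becomes one string
    return len(data.lower())


def iterate_lines(path):
    total = 0
    with open(path, "r") as handle:
        for line in handle:  # one line (plus a small buffer) at a time
            total += len(line.lower())
    return total


def compare(num_lines=200_000):
    """Return (peak bytes with read(), peak bytes with line iteration)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "tweets.txt")
        # Hypothetical sample data standing in for a large input file.
        with open(path, "w") as f:
            for i in range(num_lines):
                f.write(f"Sample TWEET number {i}\n")
        return peak_memory(read_whole_file, path), peak_memory(iterate_lines, path)


if __name__ == "__main__":
    whole, streamed = compare()
    print(f"read(): {whole} bytes peak, iteration: {streamed} bytes peak")
```

The peak for read() grows with the file size, while the peak for iteration stays roughly constant no matter how many lines the file has.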