Parsing Data From Text File

I have a text file that has content like this:

******** ENTRY 01 ********
ID: 01
Data1: 0.1834869385E-002
Data2: 10.9598489301
Data3:

Solution 1:

It is very far from CSV, actually.

You can use the file as an iterator; the following generator function yields complete sections:

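def load_sections(filename):
    with open(filename, 'r') as infile:
        line = ''
        while True:
            # Skip lines until the next '****' header is found
            while not line.startswith('****'):
                # next(infile, None) returns None at end of file; letting
                # StopIteration escape a generator is a RuntimeError on Python 3.7+
                line = next(infile, None)
                if line is None:
                    return  # no more sections: end the generator

            # Collect "key: value" lines until a blank line ends the section
            entry = {}
            for line in infile:
                line = line.strip()
                if not line:
                    break

                key, value = map(str.strip, line.split(':', 1))
                entry[key] = value

            yield entry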

This treats the file as an iterator, so every loop that pulls from it advances the file to the next line. The outer while True loop only moves from section to section; the inner while and for loops do the real work. The while loop skips lines until a **** header is found (the header itself is discarded), and the for loop then reads the non-empty lines that follow to build one section.
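A minimal sketch of that shared-iterator behavior, using io.StringIO as a stand-in for an open file (an assumption made only so the snippet runs without a file on disk; a text file object iterates over its lines the same way): next() and a for loop pull from the same position, so a line consumed by one is never seen by the other.

```python
import io

# io.StringIO stands in for an open text file; both iterate line by line
stream = io.StringIO("header\nfirst\nsecond\nthird\n")

# next() consumes the header line
header = next(stream)

# The for loop resumes right after the header, not from the start
rest = []
for line in stream:
    rest.append(line.strip())
    if line.strip() == "second":
        break  # stop early; "third" stays unread

# Whatever is left is still available to the next consumer
remaining = [line.strip() for line in stream]

print(header.strip())  # header
print(rest)            # ['first', 'second']
print(remaining)       # ['third']
```

This is the same mechanism load_sections relies on: the header-skipping while loop and the section-reading for loop hand the file position back and forth.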

Use the function in a loop:

for section in load_sections(filename):
    print(section)

Repeating your sample data three times in a text file results in:

>>> for section in load_sections('/tmp/test.txt'):
...     print(section)
... 
{'ID': '01', 'Data1': '0.1834869385E-002', 'Data2': '10.9598489301', 'Data3': '-0.1091356549E+001', 'Data4': '715'}
{'ID': '01', 'Data1': '0.1834869385E-002', 'Data2': '10.9598489301', 'Data3': '-0.1091356549E+001', 'Data4': '715'}
{'ID': '01', 'Data1': '0.1834869385E-002', 'Data2': '10.9598489301', 'Data3': '-0.1091356549E+001', 'Data4': '715'}

You can add some data converters to that if you want to; a mapping of key to callable would do:

converters = {'ID': int, 'Data1': float, 'Data2': float, 'Data3': float, 'Data4': int}

Then, in the generator function, replace entry[key] = value with entry[key] = converters.get(key, lambda v: v)(value). Keys without a converter keep their original string value.
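As a sketch of the converter idea, the hypothetical helper convert_entry below applies the same converters.get(key, lambda v: v) lookup to a finished entry instead of inside the generator. The sample dict mirrors the first section above; the extra 'Note' key is made up to show that keys without a converter pass through unchanged.

```python
# Map each known key to a callable that converts its string value
converters = {'ID': int, 'Data1': float, 'Data2': float,
              'Data3': float, 'Data4': int}

def convert_entry(entry, converters):
    """Return a new dict with each value passed through its converter.

    Keys without a converter keep their original string value
    (the identity lambda is the fallback).
    """
    return {key: converters.get(key, lambda v: v)(value)
            for key, value in entry.items()}

# 'Note' is a hypothetical extra key with no converter
raw = {'ID': '01', 'Data1': '0.1834869385E-002', 'Data2': '10.9598489301',
       'Data3': '-0.1091356549E+001', 'Data4': '715', 'Note': 'n/a'}
typed = convert_entry(raw, converters)
print(typed)
```

Doing the conversion in a separate step keeps load_sections a pure text parser; converting inside the generator, as described above, avoids building the string dict first. Either way, int('01') yields 1, so leading zeros in IDs are lost.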

Solution 2:

Contents of my_file:

******** ENTRY 01 ********
ID: 01
Data1: 0.1834869385E-002
Data2: 10.9598489301
Data3: -0.1091356549E+001
Data4: 715
ID: 02
Data1: 0.18348674325E-012
Data2: 10.9598489301
Data3: 0.0
Data4: 5748
ID: 03
Data1: 20.1834869385E-002
Data2: 10.954576354
Data3: 10.13476858762435E+001
Data4: 7456

Python script:

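import re

with open('my_file', 'r') as f:
    data = list()
    group = dict()
    # Each match is a (key, value) pair; [-+.\dE] covers digits, signs,
    # the decimal point and the exponent marker
    for key, value in re.findall(r'(.*):\s*([-+.\dE]+)', f.read()):
        # A key seen twice means a new record has started
        if key in group:
            data.append(group)
            group = dict()
        group[key] = value
    data.append(group)

print(data)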

Printed output (reformatted for readability):

[
    {
        'ID': '01',
        'Data1': '0.1834869385E-002',
        'Data2': '10.9598489301',
        'Data3': '-0.1091356549E+001',
        'Data4': '715'
    },
    {
        'ID': '02',
        'Data1': '0.18348674325E-012',
        'Data2': '10.9598489301',
        'Data3': '0.0',
        'Data4': '5748'
    },
    {
        'ID': '03',
        'Data1': '20.1834869385E-002',
        'Data2': '10.954576354',
        'Data3': '10.13476858762435E+001',
        'Data4': '7456'
    }
]
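The grouping rule in this script is that a key seen a second time marks the start of a new record. The sketch below wraps that rule in a hypothetical parse_groups function that takes a string, so it can be tried without my_file. Its regex is a tightened variant (an assumption, not the answer's pattern): it is anchored per line with re.MULTILINE and only accepts word-character keys. Unlike the script, it does not append an empty final group, so empty input gives [].

```python
import re

# Tightened pattern (assumption): one "Key: number" pair per line,
# anchored with re.MULTILINE so a match never spans a line break
PAIR = re.compile(r'^\s*(\w+):\s*([-+.\dE]+)\s*$', re.MULTILINE)

def parse_groups(text):
    """Split key/value pairs into records; a repeated key starts a new one."""
    data, group = [], {}
    for key, value in PAIR.findall(text):
        if key in group:          # seen this key already -> new record
            data.append(group)
            group = {}
        group[key] = value
    if group:                     # flush the last record (skip if empty)
        data.append(group)
    return data

sample = """******** ENTRY 01 ********
ID: 01
Data1: 0.1834869385E-002
Data4: 715
ID: 02
Data1: 0.18348674325E-012
Data4: 5748
"""
records = parse_groups(sample)
print(len(records))        # 2
print(records[1]['ID'])    # 02
```

The repeated-key rule only works if every record starts with the same key set in the same order; a record missing its first key would be merged into the previous one.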

Solution 3:

A very simple approach could be:

all_objects = []

with open("datafile") as f:
    for line in f:
        if line.startswith("***"):
            # Line starts with asterisks: create a new object
            all_objects.append({})
        elif ":" in line:
            # Line is a key/value field: update the current object
            k, v = map(str.strip, line.split(":", 1))
            all_objects[-1][k] = v
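One caveat with this approach: a key/value line that appears before the first *** header makes all_objects[-1] raise IndexError. A minimal sketch, wrapped in a hypothetical parse_entries function that accepts any iterable of lines (a file, io.StringIO, or a list), skips such lines instead:

```python
import io

def parse_entries(lines):
    """Group "key: value" lines under the most recent '***' header.

    Accepts any iterable of lines. Key/value lines that appear before
    the first header are ignored instead of raising IndexError.
    """
    all_objects = []
    for line in lines:
        if line.startswith("***"):
            all_objects.append({})       # header: start a new object
        elif ":" in line and all_objects:
            k, v = map(str.strip, line.split(":", 1))
            all_objects[-1][k] = v       # field: update current object
    return all_objects

# "stray: ignored" is a made-up line placed before the first header
text = """stray: ignored
******** ENTRY 01 ********
ID: 01
Data4: 715

******** ENTRY 02 ********
ID: 02
Data4: 5748
"""
entries = parse_entries(io.StringIO(text))
print(entries)
# [{'ID': '01', 'Data4': '715'}, {'ID': '02', 'Data4': '5748'}]
```

Taking an iterable rather than a filename also makes the parser easy to test with in-memory data; pass open("datafile") to use it on the real file.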
