Parsing Parenthesized List In Python's Imaplib

I am looking for a simple way to split the parenthesized lists that come out of IMAP responses into Python lists or tuples. I want to go from '(BODYSTRUCTURE ('text' 'plain' ('charset' '

Solution 1:

pyparsing's nestedExpr parser function parses nested parentheses by default:

from pyparsing import nestedExpr

text = '(BODYSTRUCTURE ("text" "plain" ("charset" "ISO-8859-1") NIL NIL "quoted-printable" 1207 50 NIL NIL NIL NIL))'

print(nestedExpr().parseString(text))

prints:

[['BODYSTRUCTURE', ['"text"', '"plain"', ['"charset"', '"ISO-8859-1"'], 'NIL', 'NIL', '"quoted-printable"', '1207', '50', 'NIL', 'NIL', 'NIL', 'NIL']]]

Here is a slightly modified parser that does parse-time conversion: integer strings become integers, "NIL" becomes None, and quoted strings have their quotes stripped:

from pyparsing import (nestedExpr, Literal, Word, alphanums,
    quotedString, replaceWith, nums, removeQuotes)

# "NIL" becomes None, runs of digits become ints
NIL = Literal("NIL").setParseAction(replaceWith(None))
integer = Word(nums).setParseAction(lambda t: int(t[0]))
# Work on a copy so the module-level quotedString keeps its default behaviour
quoted = quotedString.copy().setParseAction(removeQuotes)
content = (NIL | integer | Word(alphanums))

print(nestedExpr(content=content, ignoreExpr=quoted).parseString(text))

Prints:

[['BODYSTRUCTURE', ['text', 'plain', ['charset', 'ISO-8859-1'], None, None, 'quoted-printable', 1207, 50, None, None, None, None]]]

Solution 2:

Because the tuples are nested, a regex alone can't do this: Python's re module has no way to match balanced parentheses. You'll have to write a parser that keeps track of whether you're inside a parenthesis or not.
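A minimal sketch of such a parser, using only the standard library: a stack holds the list currently being filled, and each "(" or ")" pushes or pops one level. The function names parse_imap_list and convert_atom are hypothetical, and the sketch assumes the response contains no IMAP literals ({n} followed by raw bytes). Like the pyparsing version above, it turns NIL into None, digit strings into ints, and unquotes quoted strings.

```python
import re

# Tokens: "(", ")", a double-quoted string (allowing \" and \\ escapes), or a bare atom.
# Assumption: IMAP literals of the form {n}\r\n... are NOT handled here.
TOKEN_RE = re.compile(r'\(|\)|"(?:[^"\\]|\\.)*"|[^\s()"]+')


def convert_atom(token):
    """Turn one non-parenthesis token into a Python value."""
    if token.startswith('"'):
        # Drop the surrounding quotes and undo backslash escapes
        return re.sub(r'\\(.)', r'\1', token[1:-1])
    if token == 'NIL':
        return None
    if re.fullmatch(r'[0-9]+', token):
        return int(token)
    return token


def parse_imap_list(text):
    """Parse an IMAP parenthesized list into nested Python tuples."""
    stack = [[]]  # stack[-1] is the list currently being filled
    for token in TOKEN_RE.findall(text):
        if token == '(':
            stack.append([])  # entering a nested list
        elif token == ')':
            if len(stack) == 1:
                raise ValueError('unbalanced ")"')
            finished = tuple(stack.pop())  # leaving a nested list
            stack[-1].append(finished)
        else:
            stack[-1].append(convert_atom(token))
    if len(stack) != 1:
        raise ValueError('unbalanced "("')
    top = stack[0]
    # A single top-level list is returned on its own, not wrapped in a tuple
    return top[0] if len(top) == 1 else tuple(top)
```

For the string in the question this returns ('BODYSTRUCTURE', ('text', 'plain', ('charset', 'ISO-8859-1'), None, None, 'quoted-printable', 1207, 50, None, None, None, None)). Note that it is not a validating parser: characters the token pattern does not recognize, such as a stray unmatched quote, are silently skipped.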

You could try the following, although it only gives you a flat tuple of string tokens:

tuple('(BODYSTRUCTURE ("text" "plain" ("charset" "ISO-8859-1") NIL NIL "quoted-printable" 1207 50 NIL NIL NIL NIL))'.replace("NIL", "None").split(' '))

Edit: Well, I got something that works with your example, though I'm not sure it's what you want.

BODYSTRUCTURE needs to be defined somewhere before the eval() call, for example:

BODYSTRUCTURE = 'BODYSTRUCTURE'
eval(",".join('(BODYSTRUCTURE ("text" "plain" ("charset" "ISO-8859-1") NIL NIL "quoted-printable" 1207 50 NIL NIL NIL NIL))'.replace("NIL", "None").split(' ')))
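Since eval() executes whatever string it is given, including text that came from a server, a safer variant of the same comma-join trick uses ast.literal_eval, which only accepts Python literals. Quoting the bare BODYSTRUCTURE keyword also removes the need to define it. This is a sketch for the sample string only:

```python
import ast

raw = ('(BODYSTRUCTURE ("text" "plain" ("charset" "ISO-8859-1") NIL NIL '
       '"quoted-printable" 1207 50 NIL NIL NIL NIL))')

# Quote the bare keyword and map NIL to None so every token is a Python literal
prepared = raw.replace('BODYSTRUCTURE', '"BODYSTRUCTURE"').replace('NIL', 'None')

# Same split-on-spaces / join-with-commas trick as the eval() version
prepared = ','.join(prepared.split(' '))

# literal_eval only builds literals (tuples, strings, ints, None); it runs no code
result = ast.literal_eval(prepared)
print(result)
```

It inherits the limits of splitting on spaces: a quoted string that contains a space or the text NIL gets mangled, and a one-element list such as ("BASE64") evaluates to a plain string rather than a tuple.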

Solution 3:

Take only the inner part of the server response, the part that actually contains the body structure:

struct = ('(((("TEXT" "PLAIN" ("CHARSET" "ISO-8859-1") NIL NIL "7BIT" 16 2)'
         '("TEXT" "HTML" ("CHARSET" "ISO-8859-1") NIL NIL "QUOTED-PRINTABLE"'
         ' 392 6) "ALTERNATIVE")("IMAGE" "GIF" ("NAME" "538.gif") '
         '"<538@goomoji.gmail>" NIL "BASE64" 172)("IMAGE" "PNG" ("NAME" '
         '"4F4.png") "<gtalk.4F4@goomoji.gmail>" NIL "BASE64" 754) "RELATED")'
         '("IMAGE" "JPEG" ("NAME" "avatar_airbender.jpg") NIL NIL "BASE64"'
         ' 157924) "MIXED")')
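To cut that inner part out of a real imaplib response, here is a hedged sketch. It assumes fetch() was called with '(BODYSTRUCTURE)' and that the server sent no literals, so a response item is a single bytes line such as b'1 (BODYSTRUCTURE (...))'; the helper name extract_bodystructure is hypothetical. It scans from the BODYSTRUCTURE keyword to the matching closing parenthesis, ignoring parentheses that appear inside quoted strings:

```python
def extract_bodystructure(response_line):
    """Return the parenthesized value that follows 'BODYSTRUCTURE '.

    Assumption: response_line is one bytes item from imaplib's fetch() data,
    e.g. b'1 (BODYSTRUCTURE (...))', with no IMAP literals ({n}) in it.
    """
    text = response_line.decode('ascii', errors='replace')
    start = text.index('BODYSTRUCTURE ') + len('BODYSTRUCTURE ')
    depth = 0
    in_quote = False
    i = start
    while i < len(text):
        ch = text[i]
        if in_quote:
            if ch == '\\':
                i += 1  # skip the escaped character
            elif ch == '"':
                in_quote = False
        elif ch == '"':
            in_quote = True
        elif ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth == 0:
                # Matching close paren of the structure found
                return text[start:i + 1]
        i += 1
    raise ValueError('unbalanced parentheses in BODYSTRUCTURE response')
```

The returned string has the same shape as struct above, so it can go straight into the token-replacement step.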

The next step is to replace some tokens so that the string can be parsed as a Python expression: spaces become commas, and ")(" between adjacent lists becomes "),(":

struct = struct.replace(' ', ',').replace(')(', '),(')

Use the built-in compiler module (Python 2 only; it was removed in Python 3) to parse the prepared structure as an expression:

import compiler
expr = compiler.parse(struct, 'eval')

A simple recursive function then transforms the expression:

def transform(expression):
    # Walk the syntax tree, turning tuples, constants and names into Python values
    if isinstance(expression, compiler.transformer.Expression):
        return transform(expression.node)
    elif isinstance(expression, compiler.transformer.Tuple):
        return tuple(transform(item) for item in expression.nodes)
    elif isinstance(expression, compiler.transformer.Const):
        return expression.value
    elif isinstance(expression, compiler.transformer.Name):
        # Bare atoms: NIL becomes None, anything else stays a string
        return None if expression.name == 'NIL' else expression.name

And finally we get the desired result as nested Python tuples:

result = transform(expr)
print(result)

(((('TEXT', 'PLAIN', ('CHARSET', 'ISO-8859-1'), None, None, '7BIT', 16, 2), ('TEXT', 'HTML', ('CHARSET', 'ISO-8859-1'), None, None, 'QUOTED-PRINTABLE', 392, 6), 'ALTERNATIVE'), ('IMAGE', 'GIF', ('NAME', '538.gif'), '<538@goomoji.gmail>', None, 'BASE64', 172), ('IMAGE', 'PNG', ('NAME', '4F4.png'), '<gtalk.4F4@goomoji.gmail>', None, 'BASE64', 754), 'RELATED'), ('IMAGE', 'JPEG', ('NAME', 'avatar_airbender.jpg'), None, None, 'BASE64', 157924), 'MIXED')
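To see how these nested tuples map onto MIME parts, a small recursive walk can list each leaf part together with its IMAP part number (the numbering used in requests like FETCH ... BODY[1.2], as defined in RFC 3501). This is a sketch: list_parts is a hypothetical helper, it assumes a multipart node starts with its child tuples followed by the subtype string, and it does not descend into parts nested inside MESSAGE/RFC822 parts.

```python
from itertools import takewhile


def list_parts(structure, prefix=''):
    """Yield (part_number, 'TYPE/SUBTYPE') for every leaf (non-multipart) part."""
    if isinstance(structure[0], tuple):
        # Multipart: the children are the leading run of tuples,
        # stopping at the subtype string
        children = takewhile(lambda item: isinstance(item, tuple), structure)
        for index, child in enumerate(children, start=1):
            number = '%s.%d' % (prefix, index) if prefix else str(index)
            yield from list_parts(child, number)
    else:
        # A non-multipart message has a single part numbered "1"
        yield (prefix or '1', '%s/%s' % (structure[0], structure[1]))


# The nested tuples printed above
result = (
    (
        (
            ('TEXT', 'PLAIN', ('CHARSET', 'ISO-8859-1'), None, None, '7BIT', 16, 2),
            ('TEXT', 'HTML', ('CHARSET', 'ISO-8859-1'), None, None, 'QUOTED-PRINTABLE', 392, 6),
            'ALTERNATIVE',
        ),
        ('IMAGE', 'GIF', ('NAME', '538.gif'), '<538@goomoji.gmail>', None, 'BASE64', 172),
        ('IMAGE', 'PNG', ('NAME', '4F4.png'), '<gtalk.4F4@goomoji.gmail>', None, 'BASE64', 754),
        'RELATED',
    ),
    ('IMAGE', 'JPEG', ('NAME', 'avatar_airbender.jpg'), None, None, 'BASE64', 157924),
    'MIXED',
)

for number, mime_type in list_parts(result):
    print(number, mime_type)
```

For this structure it prints 1.1.1 TEXT/PLAIN, 1.1.2 TEXT/HTML, 1.2 IMAGE/GIF, 1.3 IMAGE/PNG and 2 IMAGE/JPEG. Using takewhile rather than collecting every tuple means extension data after the subtype (BODYSTRUCTURE can include a parameter list such as the boundary there) is not mistaken for a child part.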

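From there we can pick out the different parts of the body structure: the first element is the multipart/related part (the text alternatives plus the inline images), and the rest holds the attached JPEG followed by the "MIXED" subtype string: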

text, attachments = (result[0], result[1:])
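The compiler module only exists in Python 2. On Python 3 the same idea can be sketched with the ast module: parse the prepared string in 'eval' mode and map Tuple, Constant and Name nodes to Python values. This assumes Python 3.8 or later (where string and number literals are ast.Constant nodes); to_python is a hypothetical name, and the sample is just the multipart/alternative branch of the structure above:

```python
import ast


def to_python(node):
    """Recursively turn an ast node into nested tuples of str, int and None."""
    if isinstance(node, ast.Expression):
        return to_python(node.body)
    if isinstance(node, ast.Tuple):
        return tuple(to_python(item) for item in node.elts)
    if isinstance(node, ast.Constant):
        # Quoted strings and numbers (Python 3.8+ represents both as Constant)
        return node.value
    if isinstance(node, ast.Name):
        # Bare atoms: NIL becomes None, anything else stays a string
        return None if node.id == 'NIL' else node.id
    raise ValueError('unexpected syntax: %s' % ast.dump(node))


# The multipart/alternative branch of the structure above
struct = ('(("TEXT" "PLAIN" ("CHARSET" "ISO-8859-1") NIL NIL "7BIT" 16 2)'
          '("TEXT" "HTML" ("CHARSET" "ISO-8859-1") NIL NIL "QUOTED-PRINTABLE"'
          ' 392 6) "ALTERNATIVE")')

# Same token replacement as in the Python 2 version
prepared = struct.replace(' ', ',').replace(')(', '),(')
result = to_python(ast.parse(prepared, mode='eval'))
print(result)
```

As with the compiler version, a one-element list such as ("BASE64") parses as a plain string rather than a tuple, and an unquoted atom containing a hyphen would parse as a subtraction, which to_python rejects with ValueError.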
