How to Take the Preceding Element When Iterating Over XML in Python?

Question

I have an XML structured like this:

Solution 1:

If I understand your needs correctly, you want to select the text nodes that satisfy the following condition:

the bbox value of the text node minus the bbox value of the preceding text node is not greater than 10.

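You could try XSLT and XPath. First, the XSL stylesheet. This step is required so that the bbox values can be compared in the XPath expression of the next step; it keeps only the first three characters of each bbox attribute: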

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="no" indent="yes"/>
  <xsl:template match="@bbox">
    <xsl:attribute name="{name()}">
      <xsl:value-of select="substring(.,1,3)" />
    </xsl:attribute>
  </xsl:template>
  <xsl:template match="@font">
    <xsl:attribute name="{name()}">
      <xsl:text>NUMPTY+ImprintMTnum</xsl:text>
    </xsl:attribute>
  </xsl:template>
  <xsl:template match="*[not(node())]"/>
  <xsl:strip-space elements="*"/>
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

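Then load both files and build the transform:

import lxml.etree as IP

# Parse the source document and the stylesheet
xml = IP.parse(xml_filename)
xslt = IP.parse(xsl_filename)
# Compile the stylesheet into a callable transform
transform = IP.XSLT(xslt)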

Then query the transformed document with:

# Apply the stylesheet to the parsed document; the result tree supports xpath()
tree = transform(xml)
for node in tree.xpath("//text[@bbox<preceding::text[1]/@bbox+11]"):
    print(node)

To test with your sample data, replace //text[@bbox<preceding::text[1]/@bbox+11] with //text[@bbox>preceding::text[1]/@bbox]. This selects the text nodes whose bbox value is greater than the bbox value of the preceding text node.
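If you would rather stay in plain Python, the same comparison can be sketched by remembering the previous element while iterating. This is a different technique from the XSLT/XPath approach above, using only the standard library's xml.etree.ElementTree. The sample XML, the helper names first_bbox_value and close_to_preceding, and the assumption that the first comma-separated component of bbox is the value to compare are all hypothetical, since the original XML is not shown here.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the real document layout is not shown, so this is assumed.
SAMPLE = """
<page>
  <text bbox="100,20,110,30">a</text>
  <text bbox="105,20,115,30">b</text>
  <text bbox="130,20,140,30">c</text>
  <text bbox="138,20,148,30">d</text>
</page>
"""


def first_bbox_value(element):
    """Return the first comma-separated bbox component as a float (assumed format)."""
    return float(element.get("bbox").split(",")[0])


def close_to_preceding(root, max_gap=10):
    """Return text elements whose bbox value minus the preceding one is at most max_gap."""
    selected = []
    previous = None  # no preceding element exists for the first text node
    for current in root.iter("text"):  # iter() walks the tree in document order
        if previous is not None:
            gap = first_bbox_value(current) - first_bbox_value(previous)
            if gap <= max_gap:
                selected.append(current)
        previous = current
    return selected


root = ET.fromstring(SAMPLE)
for node in close_to_preceding(root):
    print(node.text, node.get("bbox"))
```

With the sample above, this selects the second and fourth text elements. Unlike the XSLT step, it parses the whole first bbox component instead of truncating it to three characters, so adjust first_bbox_value if your bbox format differs.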
