memory efficient way to change and parse a large XML file in python -
i want parse large xml file (25 gb) in python, , change of elements.
i tried elementtree xml.etree takes time @ first step (elementtree.parse).
i read somewhere sax fast , not load entire file memory parsing not modifying.
'iterparse' should parsing not modifying.
is there other option fast , memory efficient?
what important here need streaming parser, sax is. (there built in sax implementation in python , lxml provides one.) problem since trying modify xml file, have rewrite xml file read it.
an xml file text file, can't go , change data in middle of text file without rewriting entire text file (unless data exact same size unlikely)
you can use sax read in each element , register event write each element after been read , modified. if changes simple may faster not bother xml parsing , match text looking for.
if doing signinficant work large of xml file, shouldn't using xml file, should using database.
the problem have run here same issue cobol programmers on mainframes had when working file based data
Comments
Post a Comment