| name | description | author | version | date |
|---|---|---|---|---|
| lxml-iterparse-recover | Handle malformed XML in lxml iterparse with recover mode. Use when: (1) lxml.etree.XMLSyntaxError "AttValue: ' expected" or similar parse errors on large XML files like Apple Health exports, (2) need streaming XML parsing that tolerates broken attributes, (3) tried passing parser=XMLParser(recover=True) to iterparse and got "unexpected keyword argument 'parser'". The recover flag is a direct parameter of iterparse, not passed via a parser object. | Claude Code | 1.0.0 | 2026-02-08 |
lxml iterparse: Recovering from Malformed XML
Problem
Large XML files (e.g., Apple Health exports) sometimes contain malformed attribute
values with unescaped characters. lxml's iterparse raises XMLSyntaxError and
aborts parsing, losing all data after the corrupt element.
Context / Trigger Conditions
- Parsing large XML files with lxml.etree.iterparse
- Error: lxml.etree.XMLSyntaxError: AttValue: ' expected, line NNNN, column NNN
- The malformed XML is from an external source you can't control (Apple Health, etc.)
- You want to skip corrupt elements and continue parsing
Solution
Pass recover=True directly to iterparse — it's a first-class parameter:
from lxml import etree

context = etree.iterparse(
    file_path,
    events=("end",),
    tag=("Record", "Workout"),
    recover=True,  # Skip malformed elements instead of aborting
)
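To show how the resulting iterator might be consumed, here is a minimal sketch. The sample document, the `iter_records` helper, and the attribute names are hypothetical stand-ins for a real export; the only lxml calls used are `iterparse`, `elem.attrib`, and `elem.clear()`.

```python
import io
from lxml import etree

# Hypothetical sample standing in for a large export file.
# The second Record has an unquoted attribute value (malformed XML).
SAMPLE = b"""<HealthData>
<Record type="HeartRate" value="61"/>
<Record type="HeartRate" value=62/>
<Record type="HeartRate" value="63"/>
</HealthData>"""


def iter_records(source, tags=("Record", "Workout")):
    """Yield (tag, attributes) pairs, clearing each element after use."""
    context = etree.iterparse(source, events=("end",), tag=tags, recover=True)
    for _event, elem in context:
        # Copy the attributes out before clearing the element.
        yield elem.tag, dict(elem.attrib)
        elem.clear()  # release the element's attributes, text and children


records = list(iter_records(io.BytesIO(SAMPLE)))
for tag, attrib in records:
    print(tag, attrib)
```

What the parser salvages from the corrupt element is not guaranteed, so the sketch makes no assumption about how many records come back after the first one.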
Common mistake: Trying to pass a parser object:
# WRONG — iterparse does NOT accept a parser= keyword
parser = etree.XMLParser(recover=True)
context = etree.iterparse(file_path, parser=parser)
# TypeError: __init__() got an unexpected keyword argument 'parser'
Verification
The full file should now parse without raising XMLSyntaxError. After parsing,
check context.error_log to see the errors the parser recovered from, including
their line and column numbers.
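A rough sketch of that check, using a small hypothetical in-memory document rather than a real export. It assumes the log entries expose `line`, `column`, and `message`, as lxml's error log entries do.

```python
import io
from lxml import etree

# Hypothetical input: the second Record has an unquoted attribute value.
BROKEN = b'<root><Record value="1"/><Record value=2/><Record value="3"/></root>'

context = etree.iterparse(
    io.BytesIO(BROKEN), events=("end",), tag="Record", recover=True
)
parsed = [dict(elem.attrib) for _event, elem in context]

# Each log entry records where the parser hit a problem and what it was.
problems = [(entry.line, entry.column, entry.message) for entry in context.error_log]
for line, column, message in problems:
    print(f"line {line}, column {column}: {message}")
```

An empty log suggests the file was well-formed. A non-empty log tells you which line numbers to inspect in the source file.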
Notes
- recover defaults to True for HTML mode, False for XML mode
- Recovered elements may have missing or truncated attributes, so always validate parsed values before using them
- Other useful iterparse flags: huge_tree=True for very deep/large documents
- The full list of iterparse parameters can be viewed with help(etree.iterparse)
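One way to validate recovered attributes is sketched below. The `parse_record` helper and the `type`/`value` attribute names are assumptions chosen for illustration; adapt them to the attributes your records actually carry.

```python
def parse_record(attrib):
    """Return (type, value) as (str, float), or None if the attributes are unusable."""
    record_type = attrib.get("type")
    raw_value = attrib.get("value")
    # A recovered element may be missing attributes entirely.
    if not record_type or raw_value is None:
        return None
    try:
        return record_type, float(raw_value)
    except ValueError:
        # A truncated or garbled value that is not a number.
        return None


good = parse_record({"type": "HeartRate", "value": "61"})
missing = parse_record({"type": "HeartRate"})
garbled = parse_record({"type": "HeartRate", "value": "61 cou"})
```

Returning None instead of raising lets the caller count and skip bad records while the stream keeps going.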