Hello, I would like to post an update. I managed to find some kind of solution with regex, but I have one problem. In the code below, I can only capture as many arguments as the number of times I wrote "(?:(<COMMA>)([^:]+):(.+?))?", so the regex expects up to 2 <COMMA> instances per "record", which produces up to 3 Attribute elements. However, my file may contain records with 100+ arguments separated by <COMMA>. Do you have any idea how to make it find every argument without writing a mile-long regex?
    import re
    from lxml import etree

    root = etree.Element("Module")

    with open("datafile.data", "r") as f:
        df = f.read()

    result = re.finditer(r'(?s)\n?(\d{1,5})=>(?:([^:]+):(.+?))(?:(<COMMA>)([^:]+):(.+?))?(?:(<COMMA>)([^:]+):(.+?))?(<END_OF_OBJECT_LINE>\n)', df)

    for m in result:
        obj = etree.SubElement(root, "Object")
        obj.set("id", m.groups()[0])
        at = etree.SubElement(obj, "Attribute")
        at.set("name", m.groups()[1])
        at.set("value", m.groups()[2])
        for idx in range(len(m.groups())):
            if m.groups()[idx] == '<COMMA>':
                at = etree.SubElement(obj, "Attribute")
                at.set("name", m.groups()[idx + 1])
                at.set("value", m.groups()[idx + 2])

    print(etree.tostring(root, pretty_print=True).decode("utf-8"))
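One possible way around the fixed number of groups, offered only as a sketch: let the regex capture the whole record body once, then split that body on <COMMA> in plain Python, so the number of arguments no longer depends on the pattern. The names `RECORD_RE` and `build_tree` and the sample data are made up for illustration, and the record layout is inferred from the regex above rather than from a real file. The standard-library `xml.etree.ElementTree` is used here as a stand-in for `lxml.etree` so the snippet runs without extra installs; the calls used (`Element`, `SubElement`, `set`, `tostring`) exist in both.

```python
import re
import xml.etree.ElementTree as etree  # stand-in for lxml.etree (assumption)

# One record: an id of 1-5 digits, "=>", then everything up to the end marker.
# (?s) lets "." span newlines, as in the original pattern.
RECORD_RE = re.compile(r'(?s)(\d{1,5})=>(.+?)<END_OF_OBJECT_LINE>')


def build_tree(text):
    """Build a Module tree with one Object per record (hypothetical helper)."""
    root = etree.Element("Module")
    for m in RECORD_RE.finditer(text):
        obj = etree.SubElement(root, "Object")
        obj.set("id", m.group(1))
        # Split the record body into "name:value" pairs, however many there are.
        for pair in m.group(2).split("<COMMA>"):
            # Cut at the first ":" only, so a value may itself contain colons.
            name, sep, value = pair.partition(":")
            if not sep:
                continue  # skip fragments that have no "name:value" shape
            at = etree.SubElement(obj, "Attribute")
            at.set("name", name)
            at.set("value", value)
    return root


# Made-up sample data in the layout the regex above implies.
sample = (
    "1=>a:1<COMMA>b:2<COMMA>c:3<COMMA>d:4<END_OF_OBJECT_LINE>\n"
    "2=>x:hello<END_OF_OBJECT_LINE>\n"
)
root = build_tree(sample)
print(etree.tostring(root, encoding="unicode"))
```

With lxml, the import line would be swapped back to `from lxml import etree`, and the final print could keep `pretty_print=True` as in the code above. This sketch assumes a value never contains the literal text <COMMA> or <END_OF_OBJECT_LINE>.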