Apr-13-2022, 05:04 PM
I have some generated data files I want to format to XML:
1234=>item1:something11: something11<COMMA>item4:something12: 12something<END_OF_OBJECT_LINE>
1238=>item8:something12: something11:<END_OF_OBJECT_LINE>
2345=>item2:something12: something11:<END_OF_OBJECT_LINE>
123=>item1:something1: something11<COMMA>item2:something: 11something<COMMA>item4:something: 11something<END_OF_OBJECT_LINE>

What I tried to do is replace certain fixed strings to make it look like XML:
with open("OGfile.data", "r") as f:
    with open("tempfile.data", "w") as fo:
        # formatting file to XML format
        contents = f.readlines()
        contents.insert(0, "<?xml version='1.0' encoding='UTF-8'?>\n<Module>\n<Object id='")
        contents = [w.replace("<END_OF_OBJECT_LINE>\n", "'/>\n</Object>\n<Object id='") for w in contents]
        contents = [w.replace("=>", "'>\n <Attribute name='") for w in contents]
        contents = [w.replace('<COMMA>', "'/>\n <Attribute name='") for w in contents]
        contents = [w.replace(':something', "' value='something") for w in contents]
        # saving formatted file to new file
        contents = "".join(contents)
        fo.write(contents)

# fixing invalid last line from formatted file
with open("tempfile.data", "r") as f2:
    with open("finalfile.data", "w") as fo2:
        contents2 = f2.readlines()
        contents2 = [w.replace("<END_OF_OBJECT_LINE>", "'/>\n</Object>\n</Module>") for w in contents2]
        contents2 = "".join(contents2)
        fo2.write(contents2)

It works fine, and it turns the data into:
<?xml version='1.0' encoding='UTF-8'?>
<Module>
<Object id='1234'>
 <Attribute name='item1' value='something11: something11'/>
 <Attribute name='item4' value='something12: 12something'/>
</Object>
<Object id='1238'>
 <Attribute name='item8' value='something12: something11:'/>
</Object>
<Object id='2345'>
 <Attribute name='item2' value='something12: something11:'/>
</Object>
<Object id='123'>
 <Attribute name='item1' value='something1: something11'/>
 <Attribute name='item2' value='something: 11something'/>
 <Attribute name='item4' value='something: 11something'/>
</Object>
</Module>

But there is one problem. I am changing
contents =[w.replace(':something', "' value='something") for w in contents]
by matching that literal value, so if a value started with anything other than "something" I would be doomed. I have been thinking about using a regex to capture the string between "Attribute name:" and "<COMMA>" or "<END_OF_OBJECT_LINE>", but my attempts failed miserably because I am quite new to programming and Python. It could also be done much better if I could somehow convert this .data file into a dictionary and then build the XML from it properly, but I have no idea how to split it correctly into a dictionary. Do you have any suggestions?
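
To make the dictionary idea concrete, here is a minimal sketch of how it might look. It is not a tested solution for the real files. It assumes that object ids are unique, that attribute names are unique within an object and never contain ":", and that the "<COMMA>" and "<END_OF_OBJECT_LINE>" markers never appear inside a value. The function names parse_objects and to_xml are made up for this example. Splitting on the first ":" only means the value can start with anything, not just "something".

```python
import xml.etree.ElementTree as ET

END = "<END_OF_OBJECT_LINE>"  # marker that closes each object record
SEP = "<COMMA>"               # marker between attributes of one object


def parse_objects(text):
    """Return {object_id: {attribute_name: value}} (hypothetical helper)."""
    objects = {}
    for record in text.split(END):
        record = record.strip()
        if not record:
            continue  # skip the empty piece after the last marker
        obj_id, _, body = record.partition("=>")
        attrs = {}
        for pair in body.split(SEP):
            # Split on the FIRST colon only, so later colons stay in the value.
            name, _, value = pair.partition(":")
            attrs[name.strip()] = value
        objects[obj_id.strip()] = attrs
    return objects


def to_xml(objects):
    """Build a <Module> element tree from the parsed dictionary."""
    root = ET.Element("Module")
    for obj_id, attrs in objects.items():
        obj = ET.SubElement(root, "Object", id=obj_id)
        for name, value in attrs.items():
            ET.SubElement(obj, "Attribute", name=name, value=value)
    return root


if __name__ == "__main__":
    sample = (
        "1234=>item1:something11: something11<COMMA>"
        "item4:something12: 12something<END_OF_OBJECT_LINE>\n"
        "1238=>item8:something12: something11:<END_OF_OBJECT_LINE>"
    )
    root = to_xml(parse_objects(sample))
    if hasattr(ET, "indent"):  # ET.indent only exists on Python 3.9+
        ET.indent(root)
    print(ET.tostring(root, encoding="unicode"))
```

For the real file, the text would come from open("OGfile.data").read(), and ET.ElementTree(root).write("finalfile.data", encoding="UTF-8", xml_declaration=True) should write the result with the XML declaration. Building the tree with ElementTree rather than string replacement also means characters such as "<", "&" or quotes inside a value get escaped automatically.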