Apr-13-2022, 08:29 PM
See section 3.3.3 (Attribute-Value Normalization) of the XML specification. Be aware that it says that newlines in attribute values are replaced by spaces, and then, for attributes whose declared type is not CDATA, that sequences of spaces are reduced to a single space. So, if I have read the spec correctly, you may not end up with what you expect. See the example table right before section 3.4.
(Apr-13-2022, 05:04 PM)malcoverc Wrote: I have some generated data files I want to format to XML:
1234=>item1:something11: something11<COMMA>item4:something12: 12something<END_OF_OBJECT_LINE>
1238=>item8:something12: something11:<END_OF_OBJECT_LINE>
2345=>item2:something12: something11:<END_OF_OBJECT_LINE>
123=>item1:something1: something11<COMMA>item2:something: 11something<COMMA>item4:something: 11something<END_OF_OBJECT_LINE>
What I tried to do is replace some specific marker strings to make it look like XML:
with open("OGfile.data", "r") as f:
    with open("tempfile.data", "w") as fo:
        # formatting file to XML format
        contents = f.readlines()
        contents.insert(0, "<?xml version='1.0' encoding='UTF-8'?>\n<Module>\n<Object id='")
        contents = [w.replace("<END_OF_OBJECT_LINE>\n", "'/>\n</Object>\n<Object id='") for w in contents]
        contents = [w.replace("=>", "'>\n <Attribute name='") for w in contents]
        contents = [w.replace('<COMMA>', "'/>\n <Attribute name='") for w in contents]
        contents = [w.replace(':something', "' value='something") for w in contents]
        # saving formatted file to new file
        contents = "".join(contents)
        fo.write(contents)

# fixing invalid last line from formatted file
with open("tempfile.data", "r") as f2:
    with open("finalfile.data", "w") as fo2:
        contents2 = f2.readlines()
        contents2 = [w.replace("<END_OF_OBJECT_LINE>", "'/>\n</Object>\n</Module>") for w in contents2]
        contents2 = "".join(contents2)
        fo2.write(contents2)
and it works fine. It produces:
<?xml version='1.0' encoding='UTF-8'?>
<Module>
<Object id='1234'>
 <Attribute name='item1' value='something11: something11'/>
 <Attribute name='item4' value='something12: 12something'/>
</Object>
<Object id='1238'>
 <Attribute name='item8' value='something12: something11:'/>
</Object>
<Object id='2345'>
 <Attribute name='item2' value='something12: something11:'/>
</Object>
<Object id='123'>
 <Attribute name='item1' value='something1: something11'/>
 <Attribute name='item2' value='something: 11something'/>
 <Attribute name='item4' value='something: 11something'/>
</Object>
</Module>
BUT there is one problem: I am changing
contents = [w.replace(':something', "' value='something") for w in contents]
just by matching this literal value, so if a value started with anything other than "something" I would be doomed. I have been thinking about using a regex to capture the string between "Attribute name:" and "<COMMA>" or "<END_OF_OBJECT_LINE>", but my attempts failed miserably because I am quite new to programming and Python. It could also be done much better if I could somehow convert this .data file into a dictionary and then turn it into XML in a proper way, but I have no idea how to separate it correctly into a dictionary. Do you have any suggestions?
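For what it is worth, here is a rough sketch of the dictionary idea from the post above, not a definitive solution. It assumes every line has the shape id=>name:value<COMMA>name:value<END_OF_OBJECT_LINE>, that attribute names never contain a colon, and that names are unique within one object. The function names parse_objects and to_xml and the SAMPLE constant are made up for illustration. The XML is built with the standard library's xml.etree.ElementTree, which also escapes special characters in attribute values.

```python
import xml.etree.ElementTree as ET

END_MARKER = "<END_OF_OBJECT_LINE>"
COMMA_MARKER = "<COMMA>"

# Two lines taken from the sample data in the post.
SAMPLE = (
    "1234=>item1:something11: something11<COMMA>"
    "item4:something12: 12something<END_OF_OBJECT_LINE>\n"
    "1238=>item8:something12: something11:<END_OF_OBJECT_LINE>\n"
)


def parse_objects(text):
    """Return {object_id: {attribute_name: value}} for the assumed format."""
    objects = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Drop the end marker if present.
        if line.endswith(END_MARKER):
            line = line[: -len(END_MARKER)]
        # Everything before the first "=>" is the object id.
        object_id, _, attribute_text = line.partition("=>")
        attributes = {}
        for pair in attribute_text.split(COMMA_MARKER):
            # Split on the FIRST colon only, because values may contain colons.
            name, _, value = pair.partition(":")
            attributes[name] = value
        objects[object_id] = attributes
    return objects


def to_xml(objects):
    """Build a <Module> element tree from the parsed dictionary."""
    module = ET.Element("Module")
    for object_id, attributes in objects.items():
        obj = ET.SubElement(module, "Object", id=object_id)
        for name, value in attributes.items():
            ET.SubElement(obj, "Attribute", name=name, value=value)
    return module
```

To get the text, something like ET.tostring(to_xml(parse_objects(SAMPLE)), encoding="unicode") should work; for a file, wrap the root in ET.ElementTree(root) and call its write method with xml_declaration=True. Because the value is whatever follows the first colon, it no longer matters whether it starts with "something".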
You have not shown an example of the regular expressions you tried. Regular expression syntax is fairly straightforward; the key is using parentheses (capture groups) to mark the part of the pattern you are looking for. What that pattern should be is unclear, though, since you refer to :something as your desired pattern, but I do not see :something appearing anywhere in the input. If you could show the string before the replace and after the replace (print statements are very good for this), as well as the pattern you are using, it would make things a lot clearer.
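To illustrate what is meant by parentheses, here is a minimal sketch using the standard re module. The pattern is only a guess at the intended format: it assumes an attribute name runs up to the first colon and the value runs up to the next <COMMA> or <END_OF_OBJECT_LINE> marker. The names ATTRIBUTE and extract_attributes are hypothetical.

```python
import re

# Two named capture groups: "name" stops at the first colon, "value" is matched
# lazily up to the next marker. The markers sit in a non-capturing group.
ATTRIBUTE = re.compile(
    r"(?P<name>[^:<>=]+):(?P<value>.*?)(?:<COMMA>|<END_OF_OBJECT_LINE>)"
)


def extract_attributes(line):
    """Return (object_id, [(name, value), ...]) for one input line."""
    object_id, _, rest = line.partition("=>")
    pairs = [(m.group("name"), m.group("value")) for m in ATTRIBUTE.finditer(rest)]
    return object_id, pairs


if __name__ == "__main__":
    line = (
        "123=>item1:something1: something11<COMMA>"
        "item2:something: 11something<END_OF_OBJECT_LINE>"
    )
    print("before:", line)
    print("after: ", extract_attributes(line))
```

Printing the line before and the extracted pairs after, as done here, makes it easy to see whether the pattern captures what you expect.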