Dec-05-2019, 12:50 AM
I'm not sure there is a more efficient way to do this than using a loop. First, you need to define a processor: a function that consumes an XML string and returns the value you want (it extracts some value(s) from the XML string, converts them, etc.).
def xml_processor(xml_string):
    # do processing
    return "The value you want"

There are different ways to write such a function. If the XML string has a relatively simple structure, you can try to build a regular expression that does the work. For example, if you want to extract the text within the tag "InsuredSignatureOK" ('Yes' in the example above), you can define a regular expression for this. No special XML-parsing libraries are needed in this case. However, this approach works only in simple cases. Otherwise, you will need a library for parsing XML documents. You can use the xml package, which is part of Python's standard library, or install lxml (for example). Here is a minimal working example:
import pandas as pd
import xml.dom.minidom  # a bare "import xml" does not load the minidom submodule

df = pd.DataFrame({"yourColumn": [
    """<?xml version="1.0" encoding="UTF-8" standalone="yes"?><ns2:application xmlns:ns2="http://www.abc.com/rules/"><InsuredSignatureOK>Yes</InsuredSignatureOK></ns2:application> """,
    """<?xml version="1.0" encoding="UTF-8" standalone="yes"?><ns2:application xmlns:ns2="http://www.abc.com/rules/"><InsuredSignatureOK>Yes</InsuredSignatureOK></ns2:application>""",
]})

def xml_processor(s):
    # Parse the XML string and return the text of the first <InsuredSignatureOK> element
    doc = xml.dom.minidom.parseString(s)
    tag = doc.getElementsByTagName("InsuredSignatureOK")[0]
    return tag.childNodes[0].data

# Replace each XML string in the column with the extracted value
df.yourColumn = df.yourColumn.apply(xml_processor)

Note that the xml_processor I just wrote is very specific; you will probably need to write your own and use try/except blocks to handle cases where the data/XML string is corrupted (or has an unexpected structure).
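
As a rough sketch of the two ideas mentioned above (a regular expression for simple cases, and try/except around the parser for corrupted data), something like the following might work. The names regex_processor, safe_xml_processor and TAG_RE are made up for illustration, and returning None for bad rows is just one possible choice; you may prefer to log or re-raise instead.

```python
import re
import xml.dom.minidom
from xml.parsers.expat import ExpatError

# Hypothetical pattern: captures the text between the opening and closing
# InsuredSignatureOK tags. Only suitable when the tag has no attributes.
TAG_RE = re.compile(r"<InsuredSignatureOK>(.*?)</InsuredSignatureOK>", re.DOTALL)


def regex_processor(s):
    """Extract the tag text with a regular expression; None if not found."""
    if not isinstance(s, str):
        return None
    match = TAG_RE.search(s)
    return match.group(1) if match else None


def safe_xml_processor(s):
    """Extract the tag text with minidom; None if the XML is unusable."""
    try:
        doc = xml.dom.minidom.parseString(s)
        tag = doc.getElementsByTagName("InsuredSignatureOK")[0]
        return tag.childNodes[0].data
    except (ExpatError, IndexError, TypeError, AttributeError):
        # ExpatError: malformed XML; IndexError: tag missing or empty;
        # TypeError: input is not a string (e.g. None or NaN);
        # AttributeError: first child is not a text node.
        return None


if __name__ == "__main__":
    samples = [
        "<root><InsuredSignatureOK>Yes</InsuredSignatureOK></root>",
        "<root><InsuredSignatureOK>Yes</Insured",  # truncated on purpose
    ]
    for sample in samples:
        print(regex_processor(sample), safe_xml_processor(sample))
```

With the DataFrame from the example, either function can be passed to apply in place of xml_processor, e.g. df.yourColumn.apply(safe_xml_processor); rows that cannot be parsed then come back as None instead of raising an exception.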