 Parse XML String in Pandas Dataframe
Here is my situation:

I have a pandas dataframe that contains one column with an xml string for each row. I need to be able to parse the xml string for each row to see the data elements of the xml file. All the code I have been able to find is code to parse an actual xml file. I do not have the xml file, rather just the xml string (below is an example):

<?xml version="1.0" encoding="UTF-8" standalone="yes"?><ns2:application xmlns:ns2=""><InsuredSignatureOK>Yes</InsuredSignatureOK></ns2:application>

How could I return the 'Yes' if I wanted to see InsuredSignatureOK? My only thought was to use a loop, but I have heard that is not the best approach for large dataframes. I have never worked with XML before and am fairly new to Python, so any help is greatly appreciated!
I'm not sure there is a more efficient way to do this than a loop. First, you need to define a processor: a function that takes an XML string and returns the value you want (it extracts one or more values from the XML string, converts them, and so on).

def xml_processor(xml_string): 
    # do processing
    return "the value you want"
There are different ways to write such a function. If the XML string has a relatively simple structure, you can try to build a regular expression that does the work. For example, to extract the text inside the "InsuredSignatureOK" tag ('Yes' in the example above), you can define a regular expression for exactly that tag, and no XML-parsing library is needed. However, this approach only works in simple cases. Otherwise, you will need a library for parsing XML documents: either the xml package, which is part of Python's standard library, or a third-party package such as lxml.
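As a rough illustration of the regular-expression route, here is a minimal sketch. The names `PATTERN` and `regex_processor` are made up for this example, and the pattern assumes the tag carries no attributes and is not nested inside another tag of the same name.

```python
import re

# Sample string copied from the question (a regex does not care about namespaces).
SAMPLE = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<ns2:application xmlns:ns2="">'
    "<InsuredSignatureOK>Yes</InsuredSignatureOK>"
    "</ns2:application>"
)

# Capture whatever sits between the opening and closing tag (non-greedy).
PATTERN = r"<InsuredSignatureOK>(.*?)</InsuredSignatureOK>"


def regex_processor(xml_string):
    """Return the text inside <InsuredSignatureOK>, or None if the tag is absent."""
    match = re.search(PATTERN, xml_string, flags=re.DOTALL)
    return match.group(1) if match else None


print(regex_processor(SAMPLE))
```

On a dataframe this could be applied with `df["yourColumn"].apply(regex_processor)`. pandas can also run a similar pattern directly with `df["yourColumn"].str.extract(PATTERN, expand=False)`.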
Here is a minimal working example:

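import pandas as pd
import xml.dom.minidom  # a bare "import xml" does not load the minidom submodule

# Sample XML copied from the question. Its namespace URI is empty as posted,
# which a namespace-aware parser may reject; real data presumably has a URI there.
df = pd.DataFrame({"yourColumn": ["""<?xml version="1.0" encoding="UTF-8" standalone="yes"?><ns2:application xmlns:ns2=""><InsuredSignatureOK>Yes</InsuredSignatureOK></ns2:application> """, """<?xml version="1.0" encoding="UTF-8" standalone="yes"?><ns2:application xmlns:ns2=""><InsuredSignatureOK>Yes</InsuredSignatureOK></ns2:application>"""]})

def xml_processor(s):
    # Parse the XML string into a DOM document
    doc = xml.dom.minidom.parseString(s)
    # Take the first <InsuredSignatureOK> element and return its text
    tag = doc.getElementsByTagName("InsuredSignatureOK")[0]
    return tag.childNodes[0].data

df.yourColumn = df.yourColumn.apply(xml_processor)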
Note that the xml_processor I just wrote is very specific. You will probably need to write your own, and use try/except blocks to handle cases where the XML string is corrupted or has an unexpected structure.
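A processor with that kind of error handling might look like the sketch below. It uses the standard library's `xml.etree.ElementTree` rather than minidom. The function name `safe_xml_processor` and the choice to return `None` on failure are assumptions for illustration, not part of the original answer.

```python
import xml.etree.ElementTree as ET


def safe_xml_processor(xml_string, tag="InsuredSignatureOK"):
    """Return the text of the first descendant element named `tag`.

    Returns None when the string cannot be parsed or the tag is missing.
    """
    try:
        root = ET.fromstring(xml_string)
    except (ET.ParseError, TypeError):
        # ParseError: malformed XML. TypeError: a missing value such as None or NaN.
        return None
    # ".//tag" searches all descendants of the root, not the root itself.
    element = root.find(f".//{tag}")
    return element.text if element is not None else None
```

Returning `None` keeps `df.yourColumn.apply(safe_xml_processor)` from stopping at the first bad row, and the failures can then be found with `isna()`.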
Thanks for your reply! I ended up finding a simpler approach and thought I would share it for anyone dealing with XML strings (although it does use a loop):

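import pandas as pd
import xml.etree.ElementTree as ET

# establish dataframe (myTable is the poster's own data)
df = pd.DataFrame(myTable)

for x, row in df.iterrows():
    myroot = ET.fromstring(row['myColumn'])
    for InsuredSignatureOK in myroot.iter('InsuredSignatureOK'):
        # The post is cut off here; printing the element text is an assumed loop body
        print(InsuredSignatureOK.text)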
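For larger dataframes, one possible variation on the loop above stores the result in a new column instead of walking rows with `iterrows()`. This is only a sketch: the data is hypothetical and stands in for `myTable`, the namespace URI `urn:example` is a placeholder (the URI in the posted sample is empty), and `first_text` is a helper name made up here.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Hypothetical rows standing in for myTable; "urn:example" is a placeholder URI.
TEMPLATE = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<ns2:application xmlns:ns2="urn:example">'
    "<InsuredSignatureOK>{answer}</InsuredSignatureOK>"
    "</ns2:application>"
)
df = pd.DataFrame(
    {"myColumn": [TEMPLATE.format(answer="Yes"), TEMPLATE.format(answer="No")]}
)


def first_text(xml_string, tag):
    """Return the text of the first element named `tag`, or None if absent."""
    root = ET.fromstring(xml_string)
    return next((el.text for el in root.iter(tag)), None)


# Iterating over the column values skips the per-row Series objects that
# iterrows() builds, and the extracted text is kept as a new column.
df["InsuredSignatureOK"] = [
    first_text(s, "InsuredSignatureOK") for s in df["myColumn"]
]
print(df["InsuredSignatureOK"])
```

Every string still has to be parsed once, so this is not truly vectorized. It mainly avoids the overhead of `iterrows()` and keeps the values available for filtering.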
