Python Forum
Parse XML String in Pandas Dataframe
I'm not sure there is a more efficient way to do this than looping over the rows. First, you need to define a processor: a function that takes an XML string and returns the value you want (extracts some value(s) from the XML string, converts them, etc.).

def xml_processor(xml_string): 
    # do processing
    return "The value you want"  # e.g. text extracted from the XML, converted as needed
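Because a processor may need to extract several values and convert them, one option is to have it return a dict and turn the results into new columns. The sketch below assumes the sample XML from the question; the function name multi_value_processor, the column names signature_ok and root_tag, and the 'Yes'-to-bool conversion are hypothetical illustrations. It uses xml.etree.ElementTree from the standard library and still calls the processor once per row.

```python
import xml.etree.ElementTree as ET

import pandas as pd


def multi_value_processor(xml_string):
    """Hypothetical processor returning several converted values as a dict."""
    root = ET.fromstring(xml_string)  # str input with an XML declaration is fine here
    # findtext() returns the text of the first matching element, or None
    signature = root.findtext(".//InsuredSignatureOK")
    return {
        "signature_ok": signature == "Yes",  # convert 'Yes' text to a bool
        "root_tag": root.tag,  # namespaced tag, in ElementTree's {uri}local form
    }


xml_string = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<ns2:application xmlns:ns2="http://www.abc.com/rules/">'
    "<InsuredSignatureOK>Yes</InsuredSignatureOK></ns2:application>"
)
df = pd.DataFrame({"yourColumn": [xml_string, xml_string]})

# map() calls the processor once per row; the returned dicts become new columns
parsed = pd.DataFrame(df["yourColumn"].map(multi_value_processor).tolist(), index=df.index)
df = df.join(parsed)
print(df[["signature_ok", "root_tag"]])
```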
There are different ways to write such a function. If the XML strings have a relatively simple structure, you can build a regular expression that does the work. For example, if you want to extract the text inside the InsuredSignatureOK tag ('Yes' in the example above), you can write a regular expression for it, and no XML-parsing library is needed. However, this approach only works in simple cases; otherwise, you will need a library for parsing XML documents. You can use the xml package, which is part of the Python standard library, or install a third-party one such as lxml.
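The regular-expression approach can be sketched like this for the simple case above. pandas' Series.str.extract applies the pattern to every row without an explicit Python loop; the plain re version is shown for comparison. The pattern is an assumption that the tag appears once, has no attributes, and holds plain text: it will not cope with attributes, CDATA sections, or nested markup, which is where a real XML parser is needed. extract_signature is a hypothetical helper name.

```python
import re

import pandas as pd

xml_string = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<ns2:application xmlns:ns2="http://www.abc.com/rules/">'
    "<InsuredSignatureOK>Yes</InsuredSignatureOK></ns2:application>"
)
df = pd.DataFrame({"yourColumn": [xml_string, "<root/>"]})

# Non-greedy group captures the text between the opening and closing tag
pattern = r"<InsuredSignatureOK>(.*?)</InsuredSignatureOK>"

# Vectorized over the column; rows without a match get NaN
df["signature"] = df["yourColumn"].str.extract(pattern, expand=False)


def extract_signature(s):
    """Per-string equivalent using the standard re module (hypothetical helper)."""
    match = re.search(pattern, s)
    return match.group(1) if match else None


print(df["signature"].tolist())
```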
Here is a minimal working example using the xml package:

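import pandas as pd
import xml.dom.minidom  # a bare "import xml" does not load the minidom submodule

df = pd.DataFrame({"yourColumn": [
    """<?xml version="1.0" encoding="UTF-8" standalone="yes"?><ns2:application xmlns:ns2="http://www.abc.com/rules/"><InsuredSignatureOK>Yes</InsuredSignatureOK></ns2:application>""",
    """<?xml version="1.0" encoding="UTF-8" standalone="yes"?><ns2:application xmlns:ns2="http://www.abc.com/rules/"><InsuredSignatureOK>Yes</InsuredSignatureOK></ns2:application>""",
]})

def xml_processor(s):
    # Parse the XML string into a DOM document
    doc = xml.dom.minidom.parseString(s)
    # Take the first <InsuredSignatureOK> element (the method is getElementsByTagName)
    tag = doc.getElementsByTagName("InsuredSignatureOK")[0]
    # Its first child is the text node holding the value ('Yes' here)
    return tag.childNodes[0].data

# Replace each XML string with the extracted value
df["yourColumn"] = df["yourColumn"].apply(xml_processor)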
Note that the xml_processor above is very specific: you will probably need to write your own, and use try/except blocks to handle cases where the data or XML string is corrupted (or has an unexpected structure).
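A more defensive processor along those lines might catch parse errors and missing tags and return None for those rows, so one bad value does not stop the whole apply. This sketch swaps minidom for xml.etree.ElementTree (also in the standard library); returning None rather than raising or logging is an assumed policy, and safe_xml_processor is a hypothetical name.

```python
import xml.etree.ElementTree as ET

import pandas as pd


def safe_xml_processor(s):
    """Return the text of <InsuredSignatureOK>, or None if it can't be obtained."""
    try:
        root = ET.fromstring(s)
    except (ET.ParseError, TypeError):
        # Corrupted XML, or a non-string cell such as None/NaN
        return None
    element = root.find(".//InsuredSignatureOK")
    if element is None:
        # Well-formed XML, but not the structure we expected
        return None
    return element.text


good = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<ns2:application xmlns:ns2="http://www.abc.com/rules/">'
    "<InsuredSignatureOK>Yes</InsuredSignatureOK></ns2:application>"
)
df = pd.DataFrame({"yourColumn": [good, "<broken", "<other/>", None]})
df["signature"] = df["yourColumn"].apply(safe_xml_processor)
print(df["signature"].tolist())
```

Rows that come back as None can then be inspected or filtered with df["signature"].isna().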


Messages In This Thread
Parse XML String in Pandas Dataframe - by creedX - Dec-04-2019, 03:19 PM
RE: Parse XML String in Pandas Dataframe - by scidam - Dec-05-2019, 12:50 AM
