Python Forum
HTML Decoder pandas dataframe column
#1
I am receiving HTML-encoded text that I want to decode. html.unescape() works on a standalone example string, but not on my pandas DataFrame column. Any suggestions?

#!/usr/bin/env python
# coding: utf-8

# import statements
import requests
import pandas as pd
import html

# constants
url = "https://chartexp1.sha.maryland.gov/CHARTExportClientService/getDMSMapDataJSON.do"

# getting response
response = requests.request("GET", url).json()

# converting to dataframe
df = pd.DataFrame(response['data'])

#adding new column/converting msgHTML Encoded to decoded
df['decodedHtml'] = html.unescape(df['msgHTML'])

# saving dataframe to csv
df.to_csv('output/response_python.csv')


##TESTING ONLY##
myHtml = "<body><h1> How to use html.unescape() in Python </h1></body>"
encodedHtml = html.escape(myHtml)
print("Encoded HTML: ", encodedHtml)
decodedHtml = html.unescape(encodedHtml)

print("Decoded HTML: ", decodedHtml)

print(html.unescape('&copy; 2023'))

 
#2
Hi,

the information provided is a bit thin... Is `response['data']` really HTML? I tried to make an API call with the URL from your post, but I received a time-out error...

What do you get instead when exporting your dataframe to CSV?

Regards, noisefloor
#3
Thanks for the reply. I apologize for the lack of information.

The issue is with the following line:

#adding new column/converting msgHTML Encoded to decoded
df['decodedHtml'] = html.unescape(df['msgHTML'])
The issue is that df['msgHTML'] has content similar to the following:

&lt;table class='dmsMsg'&gt;&lt;tr class='dmsMsgRow'&gt;&lt;td class='dmsMsgTextCenter'&gt;I-695       15 MILES&lt;/td&gt;&lt;/tr&gt;&lt;tr class='dmsMsgRow'&gt;&lt;td class='dmsMsgTextCenter'&gt;&amp;nbsp;&lt;/td&gt;&lt;/tr&gt;&lt;tr class='dmsMsgRow'&gt;&lt;td class='dmsMsgTextCenter'&gt; 14 MINUTES&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
What I am attempting to do is convert that to the following format:

<table class='dmsMsg'><tr class='dmsMsgRow'><td class='dmsMsgTextCenter'>I-695       15 MILES</td></tr><tr class='dmsMsgRow'><td class='dmsMsgTextCenter'>&nbsp;</td></tr><tr class='dmsMsgRow'><td class='dmsMsgTextCenter'> 14 MINUTES</td></tr></table>
#4
html.unescape(str) works on a single string and cannot be used in a vectorized way on a whole column. You have to fall back to Series.apply(func), which calls it once per value:
df["msgHTML"] = df["msgHTML"].apply(html.unescape)
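For anyone who cannot reach the API, here is a minimal self-contained sketch of the same idea. The sample rows are made up (shortened from the snippet posted above), and the helper name unescape_or_keep and the None row are my own assumptions, added only to show one way to skip missing or non-string values. It writes to a new decodedHtml column, as in the original script, instead of overwriting msgHTML.

```python
import html

import pandas as pd

# Hypothetical sample rows standing in for the API response,
# which is not reproduced here.
df = pd.DataFrame(
    {
        "msgHTML": [
            "&lt;td class='dmsMsgTextCenter'&gt; 14 MINUTES&lt;/td&gt;",
            "&lt;td class='dmsMsgTextCenter'&gt;&amp;nbsp;&lt;/td&gt;",
            None,  # assumption: some rows may have no message
        ]
    }
)


def unescape_or_keep(value):
    """Decode HTML entities in strings; return anything else unchanged."""
    return html.unescape(value) if isinstance(value, str) else value


# apply() calls the function once per element of the column.
df["decodedHtml"] = df["msgHTML"].apply(unescape_or_keep)

print(df["decodedHtml"].tolist())
```

Note that a single pass turns &amp;nbsp; into &nbsp; and stops there, which matches the target format posted in #3.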