Python Forum
How to remove html content from a column of the dataframe in Python3.6?




How to remove html content from a column of the dataframe in Python3.6? - PrateekG - Jul-25-2018

Hi,

I have a CSV file with a column named merchant product id.
Some values in this column contain unexpected HTML, for example: <p class=""MsoNormal"" style=""border:

I need to remove this HTML from my dataframe before further processing.
How can I remove these unexpected HTML values from the column? Please find attached a sheet containing some test data.


RE: How to remove html content from a column of the dataframe in Python3.6? - Larz60+ - Jul-25-2018

The file is not CSV, it's XLSX.
Row 210 has HTML code embedded in it.
It's probably some sort of formatting, but if it's not needed, open the file in a spreadsheet and delete that row, then save it as a CSV file, which will be easier to read.

I have attached the modified file.


RE: How to remove html content from a column of the dataframe in Python3.6? - PrateekG - Jul-26-2018

(Jul-25-2018, 09:38 PM)Larz60+ Wrote: The file is not CSV, it's XLSX.
Row 210 has HTML code embedded in it.
It's probably some sort of formatting, but if it's not needed, open the file in a spreadsheet and delete that row, then save it as a CSV file, which will be easier to read.

I have attached the modified file.

I thought there must be a Python way to deal with this.
What I attached is only a sample. The full data set has about 4000 records, and many of the product codes contain HTML that should be removed.
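
With thousands of records, a first step might be to find which rows are affected rather than inspecting them by hand. The following is a minimal sketch using only the standard library. The column name merchant_product_id, the sample rows, and the helper rows_with_html are assumptions made for illustration, not taken from the attached file.

```python
import csv
import io
import re

# Matches an opening or closing tag such as <p class="MsoNormal"> or </p>.
TAG_RE = re.compile(r"</?[A-Za-z][^>]*>")


def rows_with_html(rows, column):
    """Return the indices of rows whose `column` value contains an HTML tag."""
    return [
        i for i, row in enumerate(rows)
        if TAG_RE.search(row.get(column) or "")
    ]


# Hypothetical sample data; the real file and column name may differ.
sample = io.StringIO(
    "merchant_product_id,price\n"
    "ABC-123,10\n"
    '"<p class=""MsoNormal"">XYZ-9</p>",20\n'
)
rows = list(csv.DictReader(sample))

# Indices are zero-based and count data rows only (the header is excluded).
print(rows_with_html(rows, "merchant_product_id"))
```

A regular expression like this is only a rough detector. It is enough to flag suspicious rows, but it is not a full HTML parser.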


RE: How to remove html content from a column of the dataframe in Python3.6? - Larz60+ - Jul-26-2018

This appears to be a script you can modify to do what you wish:
https://stackoverflow.com/questions/20105118/convert-xlsx-to-csv-correctly-using-python


Don't run this code as is, because it only writes a new file, and that's not what you want to do.
I'll check this post later today, and if no one has answered it by then, I'll see what I can suggest.
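
One possible Python approach to the original question is to strip the tags from each value and keep only the text. The sketch below uses the standard library's html.parser for this. The helper name strip_html and the sample values are assumptions made for illustration, and collapsing runs of whitespace is a design choice, not a requirement.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called for text between tags; the tags themselves are skipped.
        self.parts.append(data)


def strip_html(value):
    """Return `value` without HTML tags; non-strings pass through unchanged."""
    if not isinstance(value, str):
        return value
    parser = _TextExtractor()
    parser.feed(value)
    parser.close()
    # Join the text pieces and collapse runs of whitespace into single spaces.
    return " ".join("".join(parser.parts).split())


# Hypothetical values resembling the column described in this thread.
print(strip_html('<p class="MsoNormal" style="border:none">ABC-123</p>'))
print(strip_html("PLAIN-42"))
```

Assuming the data is loaded into a pandas DataFrame named df with a column called merchant_product_id (both names are hypothetical), the helper could be applied with df["merchant_product_id"] = df["merchant_product_id"].map(strip_html), which cleans the column without deleting rows by hand.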