Python Forum
How to remove HTML content from a column of the dataframe in Python 3.6?
#1
Hi,

I have a CSV file that has a column named merchant product id.
This column contains unexpected data in the form of HTML, like: <p class=""MsoNormal"" style=""border:

For further processing, I need to remove this from my dataframe.
How can I remove these unexpected HTML values from this column? Please find attached a sheet containing some data for testing.

Attached Files

.xlsx   df.xlsx (Size: 11.92 KB / Downloads: 318)
Reply
#2
The file is not CSV, it's XLSX.
Row 210 has HTML code embedded in it.
It's probably some sort of formatting, but if it's not needed, open the file in a spreadsheet and delete that row, then save it as a CSV file, which will be easier to read.
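
For more than a handful of rows, the same "delete the row, save as CSV" step could be done in Python instead of by hand. This is only a minimal sketch: the helper names `has_html` and `drop_html_rows` are made up for illustration, the column name in the usage comment is a guess based on the question, and the tag pattern assumes that a real product id never contains `<` directly followed by a letter.

```python
import re

# Assumption: anything that looks like the start of a tag, e.g. "<p" or "</div",
# counts as HTML. Matching only the tag start also catches tags that were cut
# off before the closing ">".
HTML_RE = re.compile(r"</?[A-Za-z][^>]*")


def has_html(value):
    """Return True if value is a string that contains something tag-like."""
    return isinstance(value, str) and HTML_RE.search(value) is not None


def drop_html_rows(df, column):
    """Return only the rows of a pandas DataFrame whose column has no HTML."""
    mask = df[column].map(has_html)
    return df[~mask]


# Hypothetical usage (requires pandas plus an Excel reader such as openpyxl):
#   import pandas as pd
#   df = pd.read_excel("df.xlsx")
#   drop_html_rows(df, "merchant product id").to_csv("df.csv", index=False)
```

Note that this throws the affected rows away entirely, which is what deleting them in a spreadsheet does too.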

I have attached the modified file.

Attached Files

.csv   df.csv (Size: 2.43 KB / Downloads: 145)
Reply
#3
(Jul-25-2018, 09:38 PM)Larz60+ Wrote: The file is not CSV, it's XLSX.
Row 210 has HTML code embedded in it.
It's probably some sort of formatting, but if it's not needed, open the file in a spreadsheet and delete that row, then save it as a CSV file, which will be easier to read.

I have attached the modified file.

I thought there must be a Python way to deal with this.
I have given only a sample. In total I have 4000 records, and many of them have product codes containing HTML data that should be removed.
Reply
#4
This appears to be a script you can modify to do what you wish:
https://stackoverflow.com/questions/2010...ing-python


Don't run this code as-is, since it only writes a new file, and that's not what you want to do.
I'll check this post later today, and if someone hasn't answered it by then, I'll see what I can suggest.
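
In the meantime, here is a minimal sketch of one way to strip the markup while keeping the rows. It is not the script from the link above. The names `strip_html` and `clean_column` are hypothetical, and the regular expression assumes the cell values are simple tag soup like the `<p class=""MsoNormal"" ...` sample, including tags that are cut off before the closing `>`. For messier HTML, a real parser would probably be a safer choice.

```python
import html
import re

# Matches a complete tag such as <p ...> or </p>, and also a tag that is
# truncated before its closing ">" (it then runs to the end of the string).
TAG_RE = re.compile(r"<[^>]*(?:>|$)")


def strip_html(value):
    """Return value with HTML tags removed; non-strings pass through unchanged."""
    if not isinstance(value, str):
        return value  # e.g. NaN or numeric ids
    text = TAG_RE.sub(" ", value)   # replace each tag with a space
    text = html.unescape(text)      # turn entities like &amp; back into characters
    return " ".join(text.split())   # collapse runs of whitespace


def clean_column(df, column):
    """Return a copy of a pandas DataFrame with HTML stripped from one column."""
    out = df.copy()
    out[column] = out[column].map(strip_html)
    return out


# Hypothetical usage (the column name is assumed from the question):
#   import pandas as pd
#   df = pd.read_excel("df.xlsx")
#   df = clean_column(df, "merchant product id")
```

Working on a copy leaves the original dataframe untouched, so the cleaned column can be compared against the raw one before anything is overwritten.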
Reply