How to remove html content from a column of the dataframe in Python3.6? - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: General Coding Help (https://python-forum.io/forum-8.html)
+--- Thread: How to remove html content from a column of the dataframe in Python3.6? (/thread-11788.html)
How to remove html content from a column of the dataframe in Python3.6? - PrateekG - Jul-25-2018

Hi,
I have a csv file which has a column, merchant product id. This column has unexpected data in the form of html like -
<p class=""MsoNormal"" style=""border:
For further processing I need to remove this from my dataframe. How can I remove this unexpected html value from this column?
PFA sheet containing some data for testing also.

RE: How to remove html content from a column of the dataframe in Python3.6? - Larz60+ - Jul-25-2018

The file is not csv, it's xlsx. Row 210 has html code embedded in it, probably some sort of formatting. If it's not needed, open the file in a spreadsheet and delete that line, then save it as a csv file, which will be easier to read.
I have attached the modified file.

RE: How to remove html content from a column of the dataframe in Python3.6? - PrateekG - Jul-26-2018

(Jul-25-2018, 09:38 PM)Larz60+ Wrote: the file is not csv, it's xlsx

I thought there must be a Python way to deal with this. I have given a sample only; I have 4000 records in total, many of which have a product code with html data, and that html should be removed.

RE: How to remove html content from a column of the dataframe in Python3.6? - Larz60+ - Jul-26-2018

This appears to be a script you can modify to do what you wish: https://stackoverflow.com/questions/20105118/convert-xlsx-to-csv-correctly-using-python
Don't run this code, as it only writes a new file and that's not what you want to do.
I'll check this post later today, and if someone hasn't answered it by then, I'll see what I can suggest.
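
For the "Python way" asked about above, one possible approach is to clean each cell with a small helper and apply it to the column. This is only a minimal sketch, not an answer given in the thread: `strip_html` is a hypothetical helper name, and it assumes the html is plain tag markup like the truncated `<p class=...` fragment quoted in the first post, so that any text from `<` up to the next `>` (or up to the end of the cell when the tag is cut off) can be dropped. A product id that legitimately contains `<` would be damaged by this rule.

```python
import html
import re

# Matches a complete tag such as <p ...> or </p>, and also a tag that is
# cut off before its closing '>' (it then runs to the end of the string).
_TAG_RE = re.compile(r"<[^>]*(?:>|$)")


def strip_html(value):
    """Return value with HTML tags removed (hypothetical helper).

    Non-string values (numbers, NaN) are returned unchanged so the
    function can be mapped over a mixed-type column.
    """
    if not isinstance(value, str):
        return value
    text = _TAG_RE.sub(" ", value)   # replace each tag with a space
    text = html.unescape(text)       # turn entities like &amp; into characters
    return " ".join(text.split())    # collapse leftover whitespace
```

With pandas, the helper could then be applied along the lines of `df["merchant product id"] = df["merchant product id"].map(strip_html)`, after loading the sheet with `pd.read_excel` since the attachment is an xlsx file. The column label used here is an assumption based on the first post; the real header in the sheet may differ.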