Python Forum
Reading a html file
#3
I have saved a website to my hard drive as an HTML file and I am trying to scrape it with BeautifulSoup.
The problem is that, for some reason, every < is replaced with &lt; and every > with &gt; in the output.


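Example:
from bs4 import BeautifulSoup
# Open the saved page and close the file handle automatically when done
with open('test2.html', 'r') as html_file:
    page_soup = BeautifulSoup(html_file, "html.parser")
# Printing the soup is where &lt; and &gt; show up instead of < and >
print(page_soup)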
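
One possible cause (an assumption, since the contents of test2.html are not shown here) is that the saved file already contains escaped markup, for example because a "view source" page was saved instead of the page itself. In that case the parser sees plain text, not tags. A minimal sketch for checking and undoing this with the standard library follows; looks_escaped and load_markup are hypothetical helper names, and the heuristic is deliberately crude:

```python
import html


def looks_escaped(text: str) -> bool:
    """Crude heuristic: more &lt; entities than real < characters."""
    return "&lt;" in text and text.count("&lt;") > text.count("<")


def load_markup(text: str) -> str:
    """Decode entities only when the whole document appears to be escaped."""
    return html.unescape(text) if looks_escaped(text) else text


# Hypothetical stand-in for the contents of the saved file
sample = "&lt;html&gt;&lt;body&gt;&lt;p&gt;Hello&lt;/p&gt;&lt;/body&gt;&lt;/html&gt;"
fixed = load_markup(sample)
print(fixed)  # <html><body><p>Hello</p></body></html>
```

If the file does turn out to be escaped, the decoded string can be handed to BeautifulSoup(fixed, "html.parser") in place of the file object. If the file is not escaped, this sketch leaves it untouched and the cause lies elsewhere.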


Messages In This Thread
Reading a html file - by peterl - Aug-20-2018, 01:37 PM
RE: Reading a html file - by metulburr - Aug-20-2018, 01:48 PM
RE: Reading a html file - by peterl - Aug-20-2018, 02:25 PM
RE: Reading a html file - by snippsat - Aug-20-2018, 02:57 PM
RE: Reading a html file - by peterl - Aug-20-2018, 03:16 PM

