Jun-07-2021, 03:04 AM
Hi all,
As a newbie, I've been practising web scraping by extracting different elements from an HTML page. Rather than keep hitting an actual website for data (as a newbie I have a fair few failings!), I decided to temporarily save the HTML to a local file, so that I can load it locally and keep practising.
I've spent more than 5 hours trying to get Python to read my HTML file and use it with BeautifulSoup. After reading about it in different places and still failing, I thought it was time to reach out for some advice.
Here's the last code I tried:
from bs4 import BeautifulSoup

with open("C:\Users\[UserName]\Desktop\localhtml.html") as fp:
    soup = BeautifulSoup(fp, 'html.parser')

And here's the error I'm getting:
Error:
"C:\Users\[UserName]\PycharmProjects\PracticeProject\venv\Scripts\python.exe" "C:/Users/[UserName]/PycharmProjects/PracticeProject/testscrape.py"
  File "C:\Users\[UserName]\PycharmProjects\PracticeProject\testscrape.py", line 29
    with open("C:\Users\[UserName]\Desktop\localhtml.html") as fp:
              ^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
As I type the file path into the code, the editor suggests it, so I assume I'm referencing it properly. I've tried searching YouTube and Google, but perhaps I'm looking up the wrong stuff, as I can't seem to find something that works.
The file I'm trying to refer to is saved as an HTML file. I've attached it, just in case there's a problem there.
Can anyone show me how to get Python to open a local HTML file and use it with BeautifulSoup, please?
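For anyone landing here with the same error, here is a minimal sketch of the likely cause and a few ways around it. In a normal Python string literal, `\U` starts an eight-hex-digit Unicode escape, which is why `"C:\Users..."` fails at position 2-3 before the file is ever opened. The helper name `load_soup` and the `utf-8` encoding are assumptions made for illustration, and `[UserName]` is the same placeholder as in the code above.

```python
# Three ways to write the same Windows path without triggering "\U".

# A raw string (r"...") keeps every backslash literal.
raw_path = r"C:\Users\[UserName]\Desktop\localhtml.html"

# Doubling each backslash spells the same string without the r prefix.
escaped_path = "C:\\Users\\[UserName]\\Desktop\\localhtml.html"

# Forward slashes are also accepted by Windows when opening files.
slash_path = "C:/Users/[UserName]/Desktop/localhtml.html"


def load_soup(path):
    """Parse a local HTML file with BeautifulSoup (hypothetical helper)."""
    # Imported here so the path examples above run even without bs4 installed.
    from bs4 import BeautifulSoup

    # Assumption: the page was saved as UTF-8. An explicit encoding avoids
    # depending on the Windows default code page.
    with open(path, encoding="utf-8") as fp:
        return BeautifulSoup(fp, "html.parser")
```

With the real user name filled in, `soup = load_soup(raw_path)` followed by `print(soup.title)` would be one way to check that the file loaded.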