Python Forum

Full Version: Using Local Html Data
Hi all,

As a newbie, I've been practising web scraping by trying to extract different elements from an HTML page. Rather than keep hitting an actual website for data (as a newbie I have a fair few failings!), I decided to temporarily save the HTML into a local file, so that I can load it locally and keep practising.

I've spent more than 5 hours trying to get Python to read my HTML file and use it with BeautifulSoup. After reading about it in different places and still failing, I thought it was time to reach out for some advice.

Here's the last code I tried:

from bs4 import BeautifulSoup

with open("C:\Users\[UserName]\Desktop\localhtml.html") as fp:
    soup = BeautifulSoup(fp, 'html.parser')
And here's the error I'm getting:

Error:
"C:\Users\[UserName\PycharmProjects\PracticeProject\venv\Scripts\python.exe" "C:/Users/[UserName]/PycharmProjects/PracticeProject/testscrape.py"
  File "C:\Users\[UserName]\PycharmProjects\PracticeProject\testscrape.py", line 29
    with open("C:\Users\[UserName\Desktop\localhtml.html") as fp:
    ^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
As I type the file path into the code, the editor seems to suggest it, so I assume I'm referencing it properly.

I've tried looking on YouTube and searching Google, but perhaps I'm looking up the wrong stuff, as I can't seem to find something that works.



The file I'm trying to refer to is saved as an HTML file. I've attached it, just in case there's a problem there.

Can anyone instruct me how to instruct python to open a local html file and use it with BeautifulSoup please?
Don't use single backslashes in Windows paths inside a normal string literal. A backslash followed by certain characters (in this case \U) is an escape sequence. Use a raw string, doubled backslashes, or forward slashes.
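To make those three options concrete, here is a minimal sketch. The folder name "SomeUser" is a made-up placeholder, not the real path from the question, and PureWindowsPath is used only to compare the spellings without touching the disk. The last part reproduces the error from the question by compiling a small piece of source text, so the script itself doesn't crash.

```python
from pathlib import PureWindowsPath

# "SomeUser" is a hypothetical placeholder; substitute your own folder name.
raw_path = r"C:\Users\SomeUser\Desktop\localhtml.html"         # raw string
doubled_path = "C:\\Users\\SomeUser\\Desktop\\localhtml.html"  # doubled backslashes
forward_path = "C:/Users/SomeUser/Desktop/localhtml.html"      # forward slashes

# The raw and doubled spellings produce exactly the same string.
same_string = raw_path == doubled_path

# Windows path rules treat / and \ as the same separator,
# so the forward-slash spelling names the same file.
same_file = PureWindowsPath(raw_path) == PureWindowsPath(forward_path)

# Reproduce the original error: in a normal (non-raw) string literal,
# \U must be followed by eight hex digits, and "sers" is not that.
bad_source = r'p = "C:\Users"'
try:
    compile(bad_source, "<example>", "exec")
    error_message = None
except SyntaxError as exc:
    error_message = str(exc)

print(same_string, same_file)
print(error_message)
```

With any of the three working spellings, the `with open(...) as fp:` line from the question should parse, and the BeautifulSoup call can stay as it is. If the saved page is UTF-8 (an assumption about the file), passing `encoding="utf-8"` to `open` may also help avoid decoding problems.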
(Jun-07-2021, 03:11 AM)buran Wrote: Don't use single backslashes in Windows paths inside a normal string literal. A backslash followed by certain characters (in this case \U) is an escape sequence. Use a raw string, doubled backslashes, or forward slashes.

Thanks buran.

How annoyingly simple that was!

Appreciate you taking the time to explain. Lots to learn!