Python Forum
Using Local Html Data - Printable Version




Using Local Html Data - knight2000 - Jun-07-2021

Hi all,

As a newbie, I've been practising web scraping by trying to extract different elements from an HTML page. Rather than keep hitting an actual website for data (as a newbie I have a fair few failures!), I decided to temporarily save the HTML to a local file, so that I can load it locally and keep practising.

I've spent more than 5 hours trying to get Python to read my HTML file and use it with BeautifulSoup. After reading about it in different places and still failing, I thought it was time to reach out for some advice.

Here's the last code I tried:

from bs4 import BeautifulSoup

with open("C:\Users\[UserName]\Desktop\localhtml.html") as fp:
    soup = BeautifulSoup(fp, 'html.parser')

And here's the error I'm getting:

Error:
"C:\Users\[UserName\PycharmProjects\PracticeProject\venv\Scripts\python.exe" "C:/Users/[UserName]/PycharmProjects/PracticeProject/testscrape.py"
  File "C:\Users\[UserName]\PycharmProjects\PracticeProject\testscrape.py", line 29
    with open("C:\Users\[UserName\Desktop\localhtml.html") as fp:
              ^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape

As I type the file path into the code, the editor seems to suggest it, so I assume I'm referencing it properly.

I've tried searching YouTube and Google, but perhaps I'm searching for the wrong thing, as I can't seem to find anything that works.



The file I'm trying to refer to is saved as an HTML file. I've attached it, just in case the problem is there.

Can anyone show me how to get Python to open a local HTML file and use it with BeautifulSoup, please?


RE: Using Local Html Data - buran - Jun-07-2021

Don't use single backslashes in path strings on Windows. In a normal string literal, a backslash followed by certain characters (in this case \U) is an escape sequence. Use a raw string, double backslashes, or forward slashes.
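
To illustrate the three options, here is a minimal sketch using the path from the original post. The helper name `read_html` and the UTF-8 encoding are assumptions made for this example, not something stated in the thread.

```python
from pathlib import Path

# Three ways to spell the same Windows path without triggering the \U escape.
raw_path = r"C:\Users\[UserName]\Desktop\localhtml.html"          # raw string: backslashes stay literal
doubled_path = "C:\\Users\\[UserName]\\Desktop\\localhtml.html"   # every backslash escaped
forward_path = "C:/Users/[UserName]/Desktop/localhtml.html"       # forward slashes are accepted by Windows

# The raw and doubled spellings produce exactly the same string.
assert raw_path == doubled_path


def read_html(path, encoding="utf-8"):
    """Return the text of a local HTML file.

    Illustrative helper; the utf-8 default is an assumption about
    how the file was saved.
    """
    return Path(path).read_text(encoding=encoding)
```

The text returned by `read_html` can then be handed to `BeautifulSoup(html, 'html.parser')`, just as the file object was in the original code.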


RE: Using Local Html Data - knight2000 - Jun-07-2021

(Jun-07-2021, 03:11 AM)buran Wrote: Don't use single backslashes in path strings on Windows. In a normal string literal, a backslash followed by certain characters (in this case \U) is an escape sequence. Use a raw string, double backslashes, or forward slashes.

Thanks buran.

How annoyingly simple that was!

I appreciate you taking the time to explain. Lots to learn!