Python Forum
Using Local Html Data
#1
Hi all,

As a newbie, I've been practising web scraping by extracting different elements from an HTML page. Rather than keep hitting a live website for data (as a newbie I make a fair few mistakes!), I decided to save the HTML to a local file so that I can load it locally and keep practising.

I've spent more than 5 hours trying to get Python to read my HTML file and use it with BeautifulSoup. After reading about it in different places and still failing, I thought it was time to reach out for some advice.

Here's the last code I tried:

from bs4 import BeautifulSoup

with open("C:\Users\[UserName]\Desktop\localhtml.html") as fp:
    soup = BeautifulSoup(fp, 'html.parser')
And here's the error I'm getting:

Error:
"C:\Users\[UserName]\PycharmProjects\PracticeProject\venv\Scripts\python.exe" "C:/Users/[UserName]/PycharmProjects/PracticeProject/testscrape.py"
  File "C:\Users\[UserName]\PycharmProjects\PracticeProject\testscrape.py", line 29
    with open("C:\Users\[UserName]\Desktop\localhtml.html") as fp:
              ^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
As I type the file path into the code, the editor suggests it, so I assume I'm referencing it properly.

I've tried YouTube and Google searches, but perhaps I'm searching for the wrong thing, as I can't find anything that works.



The file I'm trying to refer to is saved as an HTML file. I've attached it, just in case there's a problem there.

Can anyone tell me how to get Python to open a local HTML file and use it with BeautifulSoup, please?

Attached Files

.html   localhtml.html (Size: 32.79 KB / Downloads: 331)
#2
Don't use single backslashes in ordinary string literals for Windows paths. A backslash followed by certain characters (in this case \U) starts an escape sequence. Use a raw string, double the backslashes, or use forward slashes.
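For anyone landing here later, a minimal sketch of the three options. The path is a placeholder ("example" stands in for the real user name), and the load_soup helper and encoding="utf-8" are my own assumptions, not something from the original post:

```python
from pathlib import PureWindowsPath

# Placeholder path: "example" stands in for the real user name.
raw = r"C:\Users\example\Desktop\localhtml.html"         # raw string
doubled = "C:\\Users\\example\\Desktop\\localhtml.html"  # doubled backslashes
forward = "C:/Users/example/Desktop/localhtml.html"      # forward slashes

# The first two spellings produce the identical string;
# the third is a different string that names the same Windows path.
same_text = raw == doubled
same_path = PureWindowsPath(raw) == PureWindowsPath(forward)


def load_soup(path):
    """Hypothetical helper: parse a local HTML file with BeautifulSoup."""
    # Imported inside the function so the path demo above runs without bs4.
    from bs4 import BeautifulSoup

    # encoding="utf-8" is an assumption; use whatever the file was saved in.
    with open(path, encoding="utf-8") as fp:
        return BeautifulSoup(fp, "html.parser")
```

Any of the three spellings can be passed to load_soup (or straight to open()). The original line failed before the script even ran because "\U" in a normal string literal has to be followed by eight hex digits.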
#3
(Jun-07-2021, 03:11 AM)buran Wrote: Don't use single backslashes in ordinary string literals for Windows paths. A backslash followed by certain characters (in this case \U) starts an escape sequence. Use a raw string, double the backslashes, or use forward slashes.

Thanks buran.

How annoyingly simple that was!

Appreciate you taking the time to explain. Lots to learn!