Python Forum
Downloading Page Source From URL List
#12
I don't think you will find \ in hyperlinks; web addresses use / as the separator.

If your web address looks like this, with a backslash before the path:

import re

URL = r"http://gibberish-rubbish-trash.com\more_gibberish?page="
and your regex is like this:

e = re.compile(r"(\w+\.\w+)\\(.*)")
You will not find the whole web address, because - is not matched by \w (which covers only letters, digits and _), so the match can only start after the last hyphen:

Output:
e.search(URL)
<re.Match object; span=(25, 55), match='trash.com\\more_gibberish?page='>
I had trouble with web addresses containing -
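
To see the hyphen problem on its own, here is a small sketch using just the host part of the example address. The variable names are mine, and the second pattern is only one possible way to let hyphens through, not necessarily what the original poster needs.

```python
import re

host = "gibberish-rubbish-trash.com"

# \w covers letters, digits and underscore only, so each hyphen ends the run
plain = re.findall(r"\w+", host)
print(plain)        # ['gibberish', 'rubbish', 'trash', 'com']

# Putting "-" inside a character class lets hyphenated names match in one piece
with_hyphen = re.findall(r"[\w-]+", host)
print(with_hyphen)  # ['gibberish-rubbish-trash', 'com']
```

The hyphen sits at the end of the character class so that it is read as a literal character and not as a range.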

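This finds the base web address (the host name), hyphens included:

import re

URL = r"http://gibberish-rubbish-trash.com/more_gibberish?page="
# \S+ accepts any non-whitespace characters, including -;
# group 1 ends at the last ".word" found in the string
e = re.compile(r"//(\S+\.\w+)")
res = e.search(URL)
res.group(1)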
Output:
'gibberish-rubbish-trash.com'
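
One caveat with the pattern above: \S+ is greedy, so if the path itself contains a dot, the capture runs past the host. As a possible alternative (my suggestion, not something from earlier in the thread), the standard library's urllib.parse.urlsplit can pull the host out without a regex. The second URL below, with /index.html, is a made-up example to show the difference.

```python
import re
from urllib.parse import urlsplit

URL = "http://gibberish-rubbish-trash.com/more_gibberish?page="

# urlsplit breaks the address into scheme, host, path and query
parts = urlsplit(URL)
host = parts.hostname   # 'gibberish-rubbish-trash.com'
path = parts.path       # '/more_gibberish'
query = parts.query     # 'page='

# Hypothetical URL with a dot in the path, to show where the regex overshoots
other = "http://gibberish-rubbish-trash.com/index.html"
greedy = re.search(r"//(\S+\.\w+)", other).group(1)
# greedy is 'gibberish-rubbish-trash.com/index.html', not just the host
other_host = urlsplit(other).hostname  # 'gibberish-rubbish-trash.com'

print(host, path, query)
print(greedy, other_host)
```

If you loop over a list of URLs, calling urlsplit on each one should give you the host consistently, whatever the path looks like.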
Messages In This Thread
RE: Downloading Page Source From URL List - by Pedroski55 - Jun-08-2024, 06:40 AM
