Python Forum
Scraping external URLs from pages
#1
I'd like to start this off by saying I know nothing at all about Python. If what I want to do is possible, I'll need to find someone to help me to do it.

I have a list of a few million URLs that I gathered using ScrapeBox. I would like to scrape all of those pages for external domains. My end goal is to find available domains that can be registered. I'd like to find a more cost-effective method of scraping external URLs. ScrapeBox runs on Windows hosting, and scaling it to the volume I want isn't cost-effective.

How difficult would this be to do in Python? Also, would it be costly to churn through millions of URLs?
Reply
#2
I'm not sure I understood the idea, but consider this:

First you have to read the list of domains, modify them, and after that ping each new address.
If the ping returns OK, you know that the modified domain is taken.
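
A rough sketch of that check, with one swap: it uses a DNS lookup instead of an ICMP ping, since many hosts ignore pings. The function name `resolves` and the injectable `resolver` parameter are made up for illustration. Note that a domain that fails to resolve is not necessarily free to register; it may be registered without any DNS records, so a WHOIS or registrar lookup would still be needed.

```python
import socket


def resolves(domain, resolver=socket.gethostbyname):
    """Return True if the domain name resolves to an address.

    `resolver` defaults to a real DNS lookup; it is a parameter only so the
    function can be exercised without network access (an assumption of
    this sketch, not something from the thread).
    """
    try:
        resolver(domain)
        return True
    except (OSError, UnicodeError):
        # socket.gaierror is a subclass of OSError; UnicodeError covers
        # names that cannot be encoded for a lookup.
        return False
```

A name for which `resolves` returns True is in use; one that returns False is only a candidate to look up further.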
Reply
#3
I do a little scraping, maybe I can help you.
Let me know.
Thank you
Renny
Reply
#4
@Blue Dog: This is not a post in the Jobs section. If you can and want to contribute to the discussion, please share your thoughts in a public post in the thread.
@Apook, let me know if you want to move it to the Jobs section, that is, if you want to hire someone to do it for you.
Reply
#5
This isn't a post seeking someone to do it. I just want to know if it's possible to be done. I have a text file with millions of URLs. Each URL is a web page. I need to have all the links on each of the pages scraped and put into a text file. I was wondering if this is possible. At this point I don't have the ability to hire anyone to do it. I just want to know if this is possible.
Reply
#6
Yes, it is possible, and probably easier than you expect :)

The modules requests (to fetch the page at the url) and BeautifulSoup (to quickly and easily parse that page) will make this very easy for you.
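
Here is a minimal sketch of how that might look, assuming `requests` and `beautifulsoup4` are installed. The function names (`external_domains`, `scrape_file`) and the file arguments are placeholders for illustration, not anything from the thread. "External" here simply means the link's hostname differs from the page's hostname, so `www.example.com` and `example.com` count as different hosts.

```python
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup


def external_domains(html, page_url):
    """Return the set of hostnames linked from `html` that differ from the page's host."""
    page_host = urlparse(page_url).hostname
    soup = BeautifulSoup(html, "html.parser")
    found = set()
    for anchor in soup.find_all("a", href=True):
        # Resolve relative links against the page URL before comparing hosts.
        target = urlparse(urljoin(page_url, anchor["href"]))
        if (
            target.scheme in ("http", "https")
            and target.hostname
            and target.hostname != page_host
        ):
            found.add(target.hostname)
    return found


def scrape_file(url_file, out_file, timeout=10):
    """Read one URL per line from `url_file`; write external hostnames to `out_file`."""
    with open(url_file) as urls, open(out_file, "w") as out:
        for line in urls:
            url = line.strip()
            if not url:
                continue
            try:
                response = requests.get(url, timeout=timeout)
                response.raise_for_status()
            except requests.RequestException:
                continue  # skip pages that fail to load
            for host in sorted(external_domains(response.text, url)):
                out.write(host + "\n")
```

For a few million URLs, a sequential loop like this would be slow; running requests concurrently and deduplicating the output would be the next things to look at.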
Reply