Python Forum
Need To Scrape Some Links
(Oct-09-2018, 12:30 AM)micseydel Wrote: For future reference, providing that in output tags instead of linking off-site would be appreciated. I'm in between tasks at work at the moment but I usually ignore these kinds of posts. (I say this purely to be helpful, not trying to call you out or anything.)

Having looked at your link... Regular expressions aren't good for parsing arbitrary HTML, but they should be great for handling regular links. Have you tried writing a simple regex? Detecting the end in the general case is the tricky part (though I'm sure there's a Stack Overflow answer to that), but in your case it might be as simple as looking for quotes.

I linked that way because I noticed a few of the links I'm scraping are connected to porn, spammy ads, and malware. I haven't really vetted them; they're completely random. What exactly do output tags do? How can I use them?

I should have included some code that I tried:

import re

from bs4 import BeautifulSoup

html = """
<html><head></head>
<body>
<div id="nx646" data-advertentie-url="https://service.sportsads.nl/www/delivery/asyncjs.php"></div>
</body>
</html>
"""

soup = BeautifulSoup(html, 'lxml')

# text= only matches text nodes, so a URL stored in an attribute value is not found here
result = soup.body.find_all(text=re.compile(r'www/delivery', re.IGNORECASE))

if result:
    result = "Link Found"

print(result)
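A note on why the attempt above finds nothing: `text=` only looks at text nodes, while the URL sits in an attribute value. Below is a minimal sketch of scanning attribute values instead. It swaps BeautifulSoup for the standard library's `html.parser` so it runs without extra installs. The class name `AttrFootprintFinder` and the `FOOTPRINT` constant are made up for illustration; the markup is the sample from this post.

```python
from html.parser import HTMLParser

FOOTPRINT = "www/delivery"  # footprint from the post

sample_html = """
<html><head></head>
<body>
<div id="nx646" data-advertentie-url="https://service.sportsads.nl/www/delivery/asyncjs.php"></div>
</body>
</html>
"""


class AttrFootprintFinder(HTMLParser):
    """Hypothetical helper: collect attribute values containing the footprint."""

    def __init__(self, footprint):
        super().__init__()
        self.footprint = footprint.lower()
        self.matches = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes
        for _name, value in attrs:
            if value and self.footprint in value.lower():
                self.matches.append(value)


finder = AttrFootprintFinder(FOOTPRINT)
finder.feed(sample_html)
print(finder.matches)
# ['https://service.sportsads.nl/www/delivery/asyncjs.php']
```

This checks every attribute on every tag, so it does not depend on the attribute being named `data-advertentie-url`.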
I've never written any regex; it's completely foreign to me. I tried using a generator, but it was confusing. I need to somehow combine this code:

re.findall(r'"([^"]*)"', inputString)
with a query that finds the other footprint: "www/delivery"

So I want ONLY the text inside quotation marks that ALSO contains the footprint.
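One possible way to combine the two, as a rough sketch rather than a vetted solution: put the footprint in the middle of the quoted-string pattern, so a match must start at a quote, contain the footprint, and end at the next quote. The function name `quoted_with_footprint` and the variable `sample_html` are made up for illustration.

```python
import re

FOOTPRINT = "www/delivery"  # footprint from the post

sample_html = """
<div id="nx646" data-advertentie-url="https://service.sportsads.nl/www/delivery/asyncjs.php"></div>
"""


def quoted_with_footprint(text, footprint=FOOTPRINT):
    """Return the contents of double-quoted strings that contain the footprint."""
    # [^"]* on both sides keeps the match from running past a double quote;
    # re.escape makes the footprint match literally.
    pattern = re.compile(
        r'"([^"]*' + re.escape(footprint) + r'[^"]*)"',
        re.IGNORECASE,
    )
    return pattern.findall(text)


print(quoted_with_footprint(sample_html))
# ['https://service.sportsads.nl/www/delivery/asyncjs.php']
```

One caveat: the pattern does not know which quote opens a string and which closes one, so if the footprint appears in plain text between two quoted values, that in-between text would match too. It also ignores single-quoted attributes.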

Messages In This Thread
Need To Scrape Some Links - by digitalmatic7 - Oct-09-2018, 12:18 AM
RE: Need To Scrape Some Links - by micseydel - Oct-09-2018, 12:30 AM
RE: Need To Scrape Some Links - by digitalmatic7 - Oct-09-2018, 02:33 AM
