Python Forum
Two part scraping?
#1
Hi,

I'm trying to figure out two things:
1) Scrape a page for all its news links.
2) Using those scraped links, open each link and scrape that page for its title, article and image.

I'm totally new to Python and trying to pick up little things here and there.

I have some code.

This bit of code gets the title and content, but I can't figure out how to get the image URL, so there is no working image code in this example.

from bs4 import BeautifulSoup
import requests, openpyxl

excel = openpyxl.Workbook()
print(excel.sheetnames)
sheet = excel.active
sheet.title = "News"
print(excel.sheetnames)
sheet.append(['title', 'body', 'image'])

source = requests.get('https://portswigger.net/daily-swig/couple-charged-with-laundering-proceeds-from-4-5bn-bitfinex-cryptocurrency-hack')
source.raise_for_status()

soup = BeautifulSoup(source.text, 'html.parser')
text = soup.find_all(class_="post-card")
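for news in text:
    title = news.find('h1').text
    body = news.find(class_="post-content").text
    # Placeholder until the image URL lookup is worked out
    image = None

    # Print and store each article inside the loop so no row is lost
    print(title, body, image)
    sheet.append([title, body, image])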

excel.save('news1.xlsx')
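For the image URL, one possible approach is sketched below. It assumes the article image is the first <img> inside the "post-card" element and falls back to the page's Open Graph preview image. Neither assumption has been checked against the live site, and the helper name extract_image_url is made up for illustration.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_image_url(soup, page_url):
    """Return an absolute image URL for the article, or None if none is found."""
    # Assumption: the article image is the first <img> inside the post card.
    card = soup.find(class_="post-card")
    img = card.find("img") if card else None

    src = None
    if img is not None:
        # Lazy-loading pages sometimes keep the real URL in data-src.
        src = img.get("src") or img.get("data-src")

    if not src:
        # Fallback assumption: the page exposes an Open Graph preview image.
        meta = soup.find("meta", attrs={"property": "og:image"})
        src = meta.get("content") if meta else None

    # urljoin turns relative paths such as /img/a.png into absolute URLs.
    return urljoin(page_url, src) if src else None
```

In the script above this would be called as image = extract_image_url(soup, source.url) once the page has been parsed.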
As for scraping the URLs, I'm not having much luck.

The site is https://portswigger.net/daily-swig/dark-web

The code is below, though there is probably little point in adding it since nothing is working:
from bs4 import BeautifulSoup
import requests, openpyxl


source = requests.get('https://portswigger.net/daily-swig/dark-web')
source.raise_for_status()

soup = BeautifulSoup(source.text, 'html.parser')
text = soup.find_all('div', class_="tile-container is-absolute dailyswig size0 style1 textstyle7")

print(text)
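One thing that may be going wrong here: passing a multi-class string to class_ only matches when the element's class attribute is exactly that string. A looser way to collect the links, shown as a rough sketch, is to take every <a href> on the page and keep those under the /daily-swig/ path. The function name collect_article_links and the path filter are assumptions, and the filter may also pick up category pages that then need to be weeded out.

```python
from urllib.parse import urljoin, urlparse

from bs4 import BeautifulSoup


def collect_article_links(html, page_url, path_prefix="/daily-swig/"):
    """Return unique absolute URLs on the page whose path starts with path_prefix."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    seen = set()
    for anchor in soup.find_all("a", href=True):
        # Resolve relative hrefs against the listing page URL.
        url = urljoin(page_url, anchor["href"])
        path = urlparse(url).path
        # Skip links outside the news section and the listing page itself.
        if not path.startswith(path_prefix) or url == page_url:
            continue
        if url not in seen:
            seen.add(url)
            links.append(url)
    return links
```

With the request above, this would be used as links = collect_article_links(source.text, source.url). If the list comes back empty, the tiles may be filled in by JavaScript, which requests does not run.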
But even if this were working, I still don't know how to take those links and feed them into the script to scrape the title/content/image and put that into an .xlsx file.
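Feeding the links into the article scraper could look roughly like this: loop over the list, fetch each page, parse it with the same selectors as the single-page script, and append one row per article. This is only a sketch. The names parse_article, scrape_links and main are invented, the selectors are assumed to hold on every article page, and the links list in main() contains just the one article URL from this post.

```python
from bs4 import BeautifulSoup


def parse_article(html):
    """Return (title, body, image) for one article page; missing parts are None."""
    soup = BeautifulSoup(html, "html.parser")
    # Assumption: every article uses the same "post-card" layout.
    card = soup.find(class_="post-card") or soup
    h1 = card.find("h1")
    content = card.find(class_="post-content")
    img = card.find("img")
    return (
        h1.get_text(strip=True) if h1 else None,
        content.get_text(" ", strip=True) if content else None,
        img.get("src") if img else None,
    )


def scrape_links(links, fetch):
    """Call fetch(url) for each link and return one parsed row per page."""
    rows = []
    for url in links:
        try:
            html = fetch(url)
        except Exception as exc:
            # One broken link should not stop the whole run.
            print(f"skipping {url}: {exc}")
            continue
        rows.append(parse_article(html))
    return rows


def main():
    # Imported here so the parsing helpers work without these packages.
    import openpyxl
    import requests

    def fetch(url):
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        return response.text

    # In practice this list would come from scraping the listing page.
    links = [
        "https://portswigger.net/daily-swig/couple-charged-with-laundering-proceeds-from-4-5bn-bitfinex-cryptocurrency-hack",
    ]

    excel = openpyxl.Workbook()
    sheet = excel.active
    sheet.title = "News"
    sheet.append(["title", "body", "image"])
    for row in scrape_links(links, fetch):
        sheet.append(list(row))
    excel.save("news1.xlsx")
```

Calling main() runs the whole thing. Passing fetch in as a parameter keeps the parsing separate from the network code, so the parsing can be tried on saved HTML first.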

Any help would be great.
Thanks.
Messages In This Thread
Two part scraping? - by never5000 - Feb-22-2022, 02:36 PM
RE: Two part scraping? - by snippsat - Feb-23-2022, 03:49 PM
