Feb-22-2022, 02:36 PM
Hi,
I'm trying to figure out two things:
1) Scrape a page for all of its news links.
2) Open each scraped link and scrape that page for the title, article, and image.
I'm totally new to Python and trying to pick up little things here and there.
I have some code.
This bit of code gets the title and content, but I can't figure out how to get the image URL - I haven't put any code for that into this example.
from bs4 import BeautifulSoup
import requests, openpyxl

excel = openpyxl.Workbook()
print(excel.sheetnames)
sheet = excel.active
sheet.title = "News"
print(excel.sheetnames)
sheet.append(['title', 'body', 'image'])

source = requests.get('https://portswigger.net/daily-swig/couple-charged-with-laundering-proceeds-from-4-5bn-bitfinex-cryptocurrency-hack')
source.raise_for_status()
soup = BeautifulSoup(source.text, 'html.parser')

for news in soup.find_all(class_="post-card"):
    title = news.find('h1').text
    body = news.find(class_="post-content").text
    # The original line here was `image = image.find()`, which raises a
    # NameError; grabbing the first <img> tag's src is one way to fill it in
    img = news.find('img')
    image = img['src'] if img else ''
    print(title, body, image)
    sheet.append([title, body, image])

excel.save('news1.xlsx')

As for scraping the URLs, I'm not having much luck.
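Going back to the image URL question above: a minimal sketch of one way to find it, assuming the page exposes an Open Graph `og:image` meta tag (common on news sites) with a plain `<img>` tag as a fallback - both selectors are guesses about the markup, not something confirmed from the site:

```python
from bs4 import BeautifulSoup

def extract_image_url(soup):
    """Try a couple of common places for an article image.
    The selectors here are assumptions about the page markup."""
    # Open Graph meta tag, widely used by news sites
    og = soup.find('meta', property='og:image')
    if og and og.get('content'):
        return og['content']
    # Fall back to the first <img> tag on the page
    img = soup.find('img')
    if img and img.get('src'):
        return img['src']
    return ''

# Small inline sample to show the idea without a network request
html = '''<html><head>
<meta property="og:image" content="https://example.com/pic.jpg">
</head><body><img src="/other.png"></body></html>'''
soup = BeautifulSoup(html, 'html.parser')
print(extract_image_url(soup))  # https://example.com/pic.jpg
```

If the og:image tag isn't there, the function falls back to the first image tag's src instead of crashing.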
The site is https://portswigger.net/daily-swig/dark-web
The code is this - though nothing is working, so there's probably not much point in adding it:
from bs4 import BeautifulSoup
import requests

source = requests.get('https://portswigger.net/daily-swig/dark-web')
source.raise_for_status()
soup = BeautifulSoup(source.text, 'html.parser')

# Note: passing a multi-class string to class_ only matches tags whose
# class attribute is exactly this string, in this order, which is fragile
text = soup.find_all('div', class_="tile-container is-absolute dailyswig size0 style1 textstyle7")
print(text)

But even if this were working, I still don't know how to take those links, feed them into the first script, and put the results into an .xlsx file.
Any help would be great.
Thanks.
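For the second part - collecting the links from the listing page and feeding each one into the article scraper - a sketch of the overall pipeline might look like this. The `/daily-swig/` href filter and the `h1`/`post-content`/`img` selectors are assumptions about how the site structures its pages, so they may need adjusting:

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

BASE = 'https://portswigger.net'

def collect_links(html):
    """Pull article links out of a listing page.
    Filtering on '/daily-swig/' in the href is an assumption
    about how the site structures its article URLs."""
    links = set()
    soup = BeautifulSoup(html, 'html.parser')
    for a in soup.find_all('a', href=True):
        if '/daily-swig/' in a['href']:
            links.add(urljoin(BASE, a['href']))  # make relative hrefs absolute
    return sorted(links)

def scrape_article(html):
    """Title/body/image from one article page; selectors are guesses."""
    soup = BeautifulSoup(html, 'html.parser')
    title = soup.find('h1')
    body = soup.find(class_='post-content')
    img = soup.find('img')
    return (title.text.strip() if title else '',
            body.text.strip() if body else '',
            img['src'] if img and img.get('src') else '')

if __name__ == '__main__':
    # Network and spreadsheet work only when run as a script
    import requests, openpyxl

    excel = openpyxl.Workbook()
    sheet = excel.active
    sheet.title = 'News'
    sheet.append(['title', 'body', 'image'])

    listing = requests.get('https://portswigger.net/daily-swig/dark-web')
    listing.raise_for_status()
    for url in collect_links(listing.text):
        page = requests.get(url)
        page.raise_for_status()
        sheet.append(list(scrape_article(page.text)))

    excel.save('news1.xlsx')
```

Splitting the work into two small functions means each piece can be tested on a saved HTML snippet before pointing the whole thing at the live site.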