Aug-13-2024, 06:02 PM
(This post was last modified: Aug-13-2024, 06:06 PM by scraperwannaB.)
Hi,
I am new to Python but trying to learn. I am working with the scraper code below:
Given:
```python
# import required modules
from bs4 import BeautifulSoup

# read the saved feed; the with-block closes the file when done
with open("output.xml", "r") as file:
    contents = file.read()

# parse as XML (the 'xml' parser needs the lxml package installed)
soup = BeautifulSoup(contents, 'xml')
titles = soup.find_all('title')

# display the text of each <title> tag
for data in titles:
    print(data.get_text())
```

I would like to extract data from a webpage, and I only want the contents of the title tags. The problem is that the page uses infinite scrolling. I can get the XML from the page, but I only seem to get the titles from the default (first) page. Let me be a bit more detailed:
When I load the page, there is an RSS feed link at the top, which is what I use to get the XML file I will eventually extract the titles from. When I scroll down far enough for the page 2 URL to show in the address bar, I can no longer see the RSS button that I used at the top of the page. Does this need more code in the Python script, or can it be resolved by changing the RSS feed URL? I actually tried changing the URL and did not get any RSS results, so I am assuming I need something additional in my Python code.
So my main concern is this: if my source for the XML is a link at the top of a webpage that uses infinite scrolling, how can I get the XML for the entire page into that one XML file? Is that even possible? If not, how should I change my Python code to make it possible?
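To make the question concrete, here is a rough sketch of the kind of loop I imagine, not something I have working. It assumes the site serves one RSS 2.0 document per page and that later pages can be requested with a page number in the URL. The `https://example.com/feed/` address and the `paged` query parameter are placeholders I made up, and the function names are my own. I used the standard library's `xml.etree.ElementTree` instead of BeautifulSoup here only so the sketch has no extra dependencies.

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET


def titles_from_feed(xml_text):
    """Return the <title> text of every <item> in one RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    titles = []
    for item in root.iter("item"):
        title = item.findtext("title")
        if title:
            titles.append(title.strip())
    return titles


def collect_titles(fetch_page, max_pages=50):
    """Gather item titles from page 1, 2, ... until a page is missing or empty.

    fetch_page is any callable that takes a page number and returns the
    feed XML (str or bytes), or None when that page does not exist.
    """
    all_titles = []
    for page in range(1, max_pages + 1):
        xml_text = fetch_page(page)
        if not xml_text:
            break
        titles = titles_from_feed(xml_text)
        if not titles:
            break
        all_titles.extend(titles)
    return all_titles


def fetch_feed_page(page, base_url="https://example.com/feed/"):
    """Download one page of the feed, or return None if the request fails.

    The URL pattern is hypothetical; the real site may paginate its
    feed differently, or not at all.
    """
    url = f"{base_url}?paged={page}"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.read()
    except urllib.error.URLError:
        return None
```

If the feed really can be paged like this, `collect_titles(fetch_feed_page)` would return all the titles as one list, which I could then print or write to a single file. If the feed cannot be paged, I assume the scrolled content would have to come from somewhere other than the RSS link.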
Any help greatly appreciated!
swb