Python Forum

Full Version: Scraping Websites to post on Telegram
I am new to Python but understand the basics. I am trying to create a script that will scrape news articles or RSS feeds from a list of websites to find a keyword. When it finds any of the keywords from the list I provide, it should send a message with a link to the article to a group I created on Telegram.

At the end of the day, I am trying to stay up to date with all new gaming articles that have to do with 8K gaming. It's something that interests me, so I am searching for all PlayStation, Xbox, and computer topics that also have to do with 8K, gaming, VR, or TV. I am putting together a list of websites that tend to talk about this type of tech so I don't have to check daily for something that might only come up once or twice a month if I'm lucky.

I have found a lot of really useful guides out there that have given me some good info, but I just don't know how to put it all together.
  • I know that I need to use BeautifulSoup to parse the HTML out of the feeds.
  • I created the bot with BotFather on Telegram and have the API token.
  • I assume I will need to create arrays (lists) with all the RSS feeds/websites that I want to pull information from, so that I can loop through each of them.
  • Inside that first loop, it would also need to go through the keywords that I want to search for on each page. Using a second loop, I would check each keyword on a page before moving on to the next feed in the list above. If it finds any of the keywords, I want it to skip the rest so I don't get the same link 4 times. I assume I could make it leave the loop once a variable becomes true?
  • I have been able to find the information on a single page, but I have not been able to do it across multiple pages and get a link to the article.
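The loop described in the list above could be sketched roughly as follows. This is only a minimal sketch, not a finished scraper: the article dictionaries, the `KEYWORDS` list, and the helper names `first_matching_keyword`, `find_matching_links`, and `build_send_message_url` are all made up for illustration, and downloading and parsing the pages is left out. The one real external detail is the Telegram Bot API `sendMessage` method; the token and chat id you pass in are placeholders for your own values.

```python
from urllib.parse import urlencode

# Hypothetical keyword list; replace with your own.
KEYWORDS = ["8k", "vr"]


def first_matching_keyword(text, keywords):
    """Return the first keyword found in text (case-insensitive), or None."""
    lowered = text.lower()
    for keyword in keywords:
        if keyword.lower() in lowered:
            return keyword  # returning here leaves the loop at the first hit
    return None


def find_matching_links(articles, keywords):
    """Return each matching article link once, in the order found.

    `articles` is assumed to be a list of dicts with "title",
    "summary" and "link" keys (an assumption for this sketch).
    """
    seen = set()
    links = []
    for article in articles:
        text = article["title"] + " " + article["summary"]
        if first_matching_keyword(text, keywords) is None:
            continue  # no keyword on this article, move to the next one
        if article["link"] not in seen:
            seen.add(article["link"])
            links.append(article["link"])
    return links


def build_send_message_url(token, chat_id, text):
    """Build a Telegram Bot API sendMessage URL (nothing is sent here)."""
    query = urlencode({"chat_id": chat_id, "text": text})
    return f"https://api.telegram.org/bot{token}/sendMessage?{query}"


# Made-up sample data standing in for parsed feed entries.
sample_articles = [
    {"title": "New 8K TV announced", "summary": "8K and VR gaming",
     "link": "https://example.com/a"},
    {"title": "Cooking tips", "summary": "Pasta",
     "link": "https://example.com/b"},
    {"title": "New 8K TV announced", "summary": "Repost",
     "link": "https://example.com/a"},
]
matching_links = find_matching_links(sample_articles, KEYWORDS)
```

Returning from the inner function as soon as one keyword matches does the same job as setting a flag and leaving the loop, so a link is added at most once per article even when several keywords match. To actually send a message, the built URL could be passed to `urllib.request.urlopen`. Note that plain substring matching can give false positives for short keywords (for example "tv" inside "activity"), so a stricter match may be worth adding later.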

I posted this previously but somehow missed this part of the forum, so maybe this will help a little more.
This is about as far as I have gotten, and I have more than enough desire to learn how to do this if you know of any place that can show me more. It's kind of an odd request, so googling a how-to was a little hard. I had to take it piece by piece.

Thank you.
A simple way to get unique elements is to use a set. You just put your links in a list and wrap it in set(), and it removes the duplicates.
>>> set(['link','link','link','link2','link2'])
{'link', 'link2'}
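One caveat: a set does not keep the original order of the items. If the order matters (for example, to post links in the order they were found), a small variation using `dict.fromkeys` keeps the first occurrence of each item in order. The `links` list below is just sample data.

```python
# Sample data with duplicates.
links = ["link", "link", "link", "link2", "link2"]

# dict keys are unique and keep insertion order (Python 3.7+),
# so this removes duplicates without reordering.
unique_links = list(dict.fromkeys(links))
print(unique_links)
```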
Feed-parsing libraries already exist, so you may not need to parse the RSS yourself.
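One widely used third-party option is `feedparser`. To stay self-contained, the sketch below does not use it and instead reads a simple RSS 2.0 document with the standard library's `xml.etree.ElementTree`. The sample feed and the `parse_rss_items` helper are made up for illustration, and the sketch assumes plain RSS 2.0 `<item>` elements; Atom feeds and namespaced tags would need extra handling.

```python
import xml.etree.ElementTree as ET

# Made-up sample feed; a real script would download the XML first.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>8K gaming on consoles</title>
      <link>https://example.com/8k-gaming</link>
      <description>A look at 8K output.</description>
    </item>
    <item>
      <title>Unrelated news</title>
      <link>https://example.com/other</link>
      <description>Nothing to see.</description>
    </item>
  </channel>
</rss>"""


def parse_rss_items(xml_text):
    """Return a list of dicts (title, link, summary), one per <item>."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "summary": item.findtext("description", default=""),
        })
    return items


items = parse_rss_items(SAMPLE_RSS)
for entry in items:
    print(entry["title"], entry["link"])
```

Each dictionary carries the title, link, and summary of one entry, which is enough to check for keywords and to build the message text with the article link.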