Python Forum

Full Version: NewsFeed Ready To Use Script
Hi,

The reason I am posting my question here is that my research on Google turned up many sources saying that what I want can be done in Python, and I am hoping there is already a completed, ready-to-use tool (script) for it: a tool that would continuously track exactly defined URLs whenever the computer is on and has a working internet connection.

Please, if you have any suggestion, I would be forever grateful and would politely ask you to reply.

I have several websites where news is posted daily, on average around 800 items per day per website. Some of these websites DON'T support RSS; when I tried free services like TheOldReader, I got an error message saying the website(s) don't support feeds at all. I really want those news items delivered to my Gmail account IMMEDIATELY (this is the most important part!) as they occur. I tried various website-update delivery services, but none of them worked. What else could I do, so I wouldn't need to check every single website every single minute for new posts? Why every minute? Because timing is critical: I must get updates as fast as possible. Examples of such websites:

https://uk.reuters.com/news/archive/bondsNews
http://www.wallstreetreporter.com/catego...ness-news/
https://www.fxstreet.com/news/latest?q=&...dexPro&p=0
http://www.netdania.com/Products/News/Re...lysis.aspx

and many more. Whenever something new is posted to that exact page, and NOT to the main uk.reuters.com site, I would like to get an email to my Gmail with an appropriate title and the content, including a link to the post, ideally within one second. As I said, the main problem is that the majority of the websites I am interested in do NOT support RSS or anything similar, so I wonder what else can be done. There must be some way to get information about each new post to my email address.

I tried many website-change tracking services, such as:

https://visualping.io

But there are several problems. I cannot afford a paid service (otherwise I would have hired a developer before writing this post). Even the few free services I found don't deliver email alerts about new posts immediately, and another possible problem is that they limit the number of alerts for new posts. I also tried the well-known free service TheOldReader, but when I added a URL, I got an error message saying either that there are no feeds or that news feeds are not supported.

Could anyone please tell me where I can get such a script, already completed and ready to use once the wanted URLs are defined? It is important that the exact webpage (including multiple pages, if any) is tracked and alerted on, rather than the entire website. For example, if I define one of the URLs as www.website.com/page1, then this exact URL must be tracked and new posts alerted by email, NOT www.website.com.

I believe and hope that such a tool (script) already exists; I am just asking where I could find it. I would love to hire a developer, but my budget is close to zero and I wouldn't be able to pay for the development work.

Thank you in advance.
Moved to general coding help, since the completed scripts section is for already-completed scripts, not requests.

This tool does not already exist. To build it would require some effort, so you're going to have to either learn Python (and enough web to reverse-engineer those pages) or pay someone. If neither is an option, you're unlikely to get this tool.
Could anyone please help with this? I think my post was overlooked...
Did you not see my reply?
I am not aware of an existing tool where you just input your URLs and get back the data you are looking for. Most people hard-code their own URLs into their site-parsing code, because for a programmer it is a small task. It can be done quite easily by someone who knows Python/Selenium or basic web parsing; in fact, I have done this type of thing before. However, building the tool can be daunting depending on how many sites and pages you are parsing: each site's HTML has to be inspected manually to work out how to extract the correct data. That is fine for a couple of sites, but time-consuming for many. This is why I doubt such tools exist: unlike RSS feeds, each site requires a different path (and thus different code) to obtain the same data, such as the latest posts. You also get the added bonus of having to update the tool whenever a website changes its markup from year to year.

If you want to build the tool yourself, we have a great tutorial to get you started, and if you get stuck along the way we can help you. I wouldn't mind showing you how to extract the data from a couple of sites; maybe you would get the hang of it after that and do the rest.
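To give an idea of what that per-site extraction step looks like, here is a minimal sketch using only the standard library. Note that the `link_class` value is made up for illustration: the real class name (and often a very different structure) has to be found by inspecting each site's HTML in the browser's dev tools, which is exactly the manual per-site work described above.

```python
from html.parser import HTMLParser

class HeadlineParser(HTMLParser):
    """Collect (text, href) pairs from <a> tags carrying a given class.

    The class name to match is site-specific; there is no universal
    selector that works across different news sites.
    """
    def __init__(self, link_class):
        super().__init__()
        self.link_class = link_class
        self.links = []        # collected (headline, url) pairs
        self._href = None      # href of the <a> currently being read
        self._text = []        # text fragments inside that <a>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and self.link_class in attrs.get("class", "").split():
            self._href = attrs.get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None

def extract_headlines(html, link_class):
    """Parse a page's HTML and return its headline links."""
    parser = HeadlineParser(link_class)
    parser.feed(html)
    return parser.links
```

Fetching the page itself is one `urllib.request.urlopen` (or `requests.get`) call; the part that differs per site, and breaks when the site redesigns, is the parsing above.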

If you would rather pay someone to do it for you, we can move this thread to the jobs section. You would have to give more information, such as exactly what kinds of posts you want extracted from those sites, as well as how much you would pay the programmer.
(Dec-30-2018, 04:17 AM)metulburr Wrote: [ -> ]In fact i have done this type of stuff before.

Some are https, some are http; some are with www and some without.

If you have done it before, and assuming you worked with websites that don't support RSS or similar feed technology, then I see no reason why your tool, or whatever you built, wouldn't work on the websites I am interested in. How much would you charge me for the tool you made? I will see if I can borrow a few euros to be able to pay you. A few sample URLs are in my first message.
You have to write it yourself or pay someone. Why doesn't such a script already exist? Because slight changes to a website can break the script; a page's layout is not a protocol. That said, I use some web-scraping scripts myself and I see that such changes don't happen very often.
(Dec-30-2018, 12:58 PM)YourFriend0 Wrote: [ -> ]assuming that you worked with the websites that don't support RSS or such feeds extracting technology then i see no reason why your tool or whatever you have done, wouldn't work on the websites i am interested for.
Websites have different layouts, different html structure, different url structure...
That means you would need separate code to handle each website.

Cases where you can have a single scraper work on multiple websites are very rare, unless you're scraping some very generic metadata.
(Dec-31-2018, 08:57 AM)stranac Wrote: [ -> ]
Websites have different layouts, different html structure, different url structure...
That means you would need separate code to handle each website.

Cases where you can have a single scraper work on multiple websites are very rare, unless you're scraping some very generic metadata.

I thought there was some tool where you define which element sits in which AREA of the page (e.g. picture, title, body of the news item, ...), and then the web-scraping tool, running 24/7 for as long as it is kept on, keeps checking whether a new post appears on that webpage...?