Python Forum
Find today's RSS entries with feedparser
#1
I'm trying to create the world's most basic RSS reader. I just want to know the number (len) of entries that were created TODAY. I don't care if anything older was modified today, just entries created.

Should I even be using feedparser to do this? It seems I could just findAll <datepub> tags with BeautifulSoup and match them against today's date, but everyone insists I should be using feedparser. Note: I'm still new at this, so it's hours of struggling either way.

Here's where I'm at - this shows that I have 50 entries:

import feedparser
dgtw = feedparser.parse('https://investorshub.advfn.com/boards/rss.aspx?board_id=22658')
print(len(dgtw['entries']))
This just shows the published date of the first entry:

print(dgtw.entries[0].published)
I just want to find all published dates that match today's date and get back the count (len).

I don't see anything in the docs about this specifically: https://pythonhosted.org/feedparser/date-parsing.html
#2
(Jun-11-2019, 11:40 PM) Biks wrote: It seems I could just findAll <datepub> tags with BeautifulSoup and match them against today's date
You could do that.
The task also works fine with feedparser.
Just loop over a range(), e.g. the first 10 entries, and look at the published dates.
>>> for n in range(10):
...     dgtw['entries'][n]['published']
...     
'Tue, 11 Jun 2019 16:07:41 GMT'
'Tue, 11 Jun 2019 15:03:52 GMT'
'Tue, 11 Jun 2019 14:43:21 GMT'
'Tue, 11 Jun 2019 13:58:31 GMT'
'Tue, 11 Jun 2019 12:24:25 GMT'
'Tue, 11 Jun 2019 03:13:45 GMT'
'Mon, 10 Jun 2019 21:56:47 GMT'
'Mon, 10 Jun 2019 19:58:24 GMT'
'Mon, 10 Jun 2019 19:38:59 GMT'
'Mon, 10 Jun 2019 14:07:38 GMT'
So you can see that 6 of these entries are from the most recent day (Jun 11).
Then you can just do a quick hack.
>>> from datetime import datetime
>>> 
>>> today = datetime.today().day
>>> today
12
>>> today_match =  f', {today}'
>>> today_match
', 12'
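Once you have ', 12' you can use the in operator to check whether it appears in the published date string. Here I set today_match to ', 11' by hand, because the sample entries above are from the 11th.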
>>> today_match =  ', 11'
>>> count = 0
>>> for n in range(10):
...     p_date = dgtw['entries'][n]['published']     
...     if today_match in p_date:
...         count += 1       
...         
>>> count
6
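A possible refinement of the quick hack above, as a minimal sketch only: a substring such as ', 1' would also match ', 11' or ', 12', and a bare day number ignores the month and year. This version parses each published string into a date and compares whole dates. It assumes the strings follow the RFC 822 style shown above and compares in the time zone the feed states (GMT here). count_published_on is a hypothetical helper name, and the sample list is copied from the output above.

```python
from datetime import date
from email.utils import parsedate_to_datetime


def count_published_on(published_strings, day):
    """Count RFC 822 style date strings that fall on the given calendar day."""
    count = 0
    for text in published_strings:
        try:
            published = parsedate_to_datetime(text)
        except (TypeError, ValueError):
            # Skip strings that cannot be parsed as a date
            continue
        if published.date() == day:
            count += 1
    return count


# Sample published strings copied from the interpreter output above
sample = [
    'Tue, 11 Jun 2019 16:07:41 GMT',
    'Tue, 11 Jun 2019 15:03:52 GMT',
    'Tue, 11 Jun 2019 14:43:21 GMT',
    'Tue, 11 Jun 2019 13:58:31 GMT',
    'Tue, 11 Jun 2019 12:24:25 GMT',
    'Tue, 11 Jun 2019 03:13:45 GMT',
    'Mon, 10 Jun 2019 21:56:47 GMT',
    'Mon, 10 Jun 2019 19:58:24 GMT',
    'Mon, 10 Jun 2019 19:38:59 GMT',
    'Mon, 10 Jun 2019 14:07:38 GMT',
]

print(count_published_on(sample, date(2019, 6, 11)))
```

With a live feed, the list could be built with something like [e['published'] for e in dgtw['entries']], and date.today() could be passed as the day.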
#3
OK, my grasp of Python code is tenuous. :) I'm having a hard time building the final code from your examples. (sorry)

How do I even see the list of dates and times for:

for n in range(10):
    dgtw['entries'][n]['published']
If I print(dgtw), I see all entries.

I noticed you have today_match listed 3 times. Do I need all three?

today_match =  f', {today}'
    today_match
', 12'
    today_match =  ', 11'
What's the final output supposed to look like?
#4
Here it is put together.
import feedparser
from datetime import datetime

dgtw = feedparser.parse('https://investorshub.advfn.com/boards/rss.aspx?board_id=22658')
today = datetime.today().day
today_match = f', {today}'
count = 0
for n in range(50):
    p_date = dgtw['entries'][n]['published']
    if today_match in p_date:
        count += 1

print(f'published entries {datetime.today().ctime()},til now is <{count}>')
Output:
published entries Wed Jun 12 17:50:04 2019,til now is <2>
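You can set the range to a higher number, such as 50 here, as long as it is more than the number of entries that will ever be published in a day.
Then the count will be correct. Note that the feed must actually contain at least that many entries, otherwise indexing raises an IndexError.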
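An alternative worth considering, sketched under a few assumptions: feedparser also exposes a parsed form of each date as published_parsed, which to my understanding is a time tuple normalized to UTC. Looping over the entries directly also avoids the fixed range(50). The helper name count_entries_on and the main function are hypothetical, and "today" here means today in UTC, which may differ from your local date near midnight.

```python
from datetime import date, datetime, timezone


def count_entries_on(entries, day):
    """Count entries whose parsed publication date falls on `day`."""
    count = 0
    for entry in entries:
        # Assumed: a 9-item time tuple in UTC, or absent if the date did not parse
        parsed = entry.get('published_parsed')
        if parsed is None:
            continue
        # The first three items of the tuple are year, month, day
        if date(*parsed[:3]) == day:
            count += 1
    return count


def main():
    # Imported here so the helper above can be used without feedparser installed
    import feedparser

    dgtw = feedparser.parse('https://investorshub.advfn.com/boards/rss.aspx?board_id=22658')
    today_utc = datetime.now(timezone.utc).date()
    print(count_entries_on(dgtw.entries, today_utc))
```

Calling main() fetches the feed and prints the count; count_entries_on can also be tried on plain dictionaries without any network access.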
#5
Hey this is great! Thanks! The last thing I did was toss the final number into a variable:

total = (f'{count}')
print(total)


I'm tossing that number into a Google sheet. (I managed to figure out how to do that by myself) :P

Thanks again!
#6
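# One option is to filter on the entry dates using the system date

from datetime import datetime, timezone
import feedparser

# Today's date in the local system time zone
today = datetime.now().date()
count = 0

dgtw = feedparser.parse('https://investorshub.advfn.com/boards/rss.aspx?board_id=22658')

for entry in dgtw['entries']:
    # published_parsed is a UTC time tuple; it may be missing if the date did not parse
    parsed = entry.get('published_parsed')
    if parsed is None:
        continue
    # Convert the UTC tuple to a local date before comparing with today
    published = datetime(*parsed[:6], tzinfo=timezone.utc).astimezone().date()
    if published == today:
        count += 1
print(count)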