Dec-31-2021, 02:06 AM
I need to get a list of forks for a GitHub project showing ahead and behind stats as well as the last commit date. I've written a Python script, and everything works great except for the date.
For some reason, while the script runs, GitHub intermittently returns a page with the error "Failed to load latest commit information.", and I can't retrieve the date from that page, although the ahead and behind stats come through fine every time. I've tried adding time.sleep() with various delays, but it doesn't seem to make a difference.
Here's the script:
import requests, re, os, sys, time, datetime, browser_cookie3

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:95.0) Gecko/20100101 Firefox/95.0"
}

def text_from_url(url):
    cookiejar = browser_cookie3.load()
    response = requests.get(url, headers=headers, cookies=cookiejar)
    return response.text

forklist_url = sys.argv[1].strip()+"/network/members"
forklist_htm = text_from_url(forklist_url)

is_root = True
for match in re.finditer('<a (class=""|class="Link--secondary") href="(/([^/"]*)/[^/"]*)">', forklist_htm):
    fork_url = 'https://github.com'+match.group(2)
    fork_owner_login = match.group(3)
    fork_htm = text_from_url(fork_url)

    #with open("download.htm", "w") as text_file:
    #    text_file.write(fork_htm)

    #while "Failed to load latest commit information." in fork_htm:
    #    time.sleep(1)
    #    fork_htm = text_from_url(fork_url)

    match_ahead = re.search('([0-9]+) commits? ahead', fork_htm)
    match_behind = re.search('([0-9]+) commits? behind', fork_htm)
    match_branch = re.search('This branch is (.*) (up to date with|ahead|behind) (.*)[.]', fork_htm)
    match_date = re.search('<relative-time(.*?)datetime="(.*?)"(.*?)>', fork_htm)

    items = []
    if match_ahead:
        items.append('+'+match_ahead.group(1))
    if match_behind:
        items.append('-'+match_behind.group(1))
    if match_branch:
        items.append(match_branch.group(3))
    if "This branch is up to date with master." in fork_htm:
        items = ['up-to-date']
    if match_date:
        items.append(datetime.datetime.strptime(match_date.group(2), '%Y-%m-%dT%H:%M:%SZ').strftime('%Y-%m-%d'))

    if is_root:
        print(fork_url+' (root)')
    else:
        print(fork_url+' ('+' '.join(items)+')')

    is_root = False
    time.sleep(1)

Usage Example:
python3 list-forks.py https://github.com/itinance/react-native-fs
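The commented-out retry loop in the script would spin forever if the error never clears. A bounded version could look something like the sketch below. The names fetch_with_retry, max_attempts, delay and sleep are made up for illustration and are not part of the script above; the fetch argument stands in for text_from_url. Whether retrying ever clears the error is not established here.

```python
import time

ERROR_TEXT = "Failed to load latest commit information."

def fetch_with_retry(fetch, url, max_attempts=5, delay=1.0, sleep=time.sleep):
    """Call fetch(url) until the page lacks ERROR_TEXT or attempts run out.

    fetch is any callable taking a URL and returning the page text
    (hypothetical stand-in for text_from_url). Returns (html, ok), where
    ok is False if every attempt still contained the error text.
    """
    html = fetch(url)
    for _ in range(max_attempts - 1):
        if ERROR_TEXT not in html:
            return html, True
        sleep(delay)       # wait before asking again
        html = fetch(url)  # retry the same URL
    return html, ERROR_TEXT not in html
```

In the script this would replace the direct call, e.g. fork_htm, ok = fetch_with_retry(text_from_url, fork_url), and the date would only be parsed when ok is True.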
Does anyone have any ideas why this is happening, and/or a potential workaround?