Python Forum
Getting a list of forks on Github
#1
I need to get a list of the forks of a GitHub project, showing how many commits each fork is ahead of and behind the upstream, as well as the date of its latest commit. I've written a Python script, and everything works except the date.
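The fork list itself can also be fetched from GitHub's REST API instead of scraping the `/network/members` page. Below is a minimal sketch that assumes the `GET /repos/{owner}/{repo}/forks` endpoint with its `per_page`/`page` pagination parameters. `iter_forks` is a hypothetical helper. `get_json(url, params)` stands for any function that performs a GET and returns the decoded JSON; it is passed in so the pagination logic can be tried without network access. As far as I know, this endpoint returns direct forks only, while the members page also lists forks of forks.

```python
API = "https://api.github.com"


def iter_forks(owner, repo, get_json, per_page=100):
    """Yield (full_name, default_branch, pushed_at) for each fork of owner/repo.

    get_json(url, params) must perform a GET and return the decoded JSON body.
    """
    page = 1
    while True:
        forks = get_json(
            f"{API}/repos/{owner}/{repo}/forks",
            {"per_page": per_page, "page": page},
        )
        for fork in forks:
            # pushed_at is the time of the last push, which is not necessarily
            # the date of the newest commit on the default branch.
            yield fork["full_name"], fork["default_branch"], fork["pushed_at"]
        if len(forks) < per_page:
            break  # a short (or empty) page means there are no more pages
        page += 1
```

With `requests`, `get_json` could be as simple as `lambda url, params: session.get(url, params=params, timeout=30).json()` on a `requests.Session`. Anonymous API calls are limited to 60 per hour, so a project with many forks would probably need a personal access token sent in the `Authorization` header.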

For some reason, the fork page randomly comes back with the message "Failed to load latest commit information." in place of the latest commit, so I can't retrieve the date from it, even though the ahead and behind stats are present every time. I've tried adding time.sleep() with various delays, but it doesn't seem to make a difference.
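Since the failure is intermittent, one thing worth trying is re-fetching only the affected page, with a cap on the number of attempts and a growing delay between them, rather than a fixed pause or an open-ended loop. This is only a sketch: `fetch_with_retry` is a hypothetical helper, `fetch` can be the script's `text_from_url`, and whether retrying helps at all depends on why GitHub fails to render that part of the page, which is not known here.

```python
import time

FAILURE_MARKER = "Failed to load latest commit information."


def fetch_with_retry(fetch, url, attempts=4, base_delay=2.0, sleep=time.sleep):
    """Fetch url, retrying with growing delays while the failure text is present.

    Returns (html, loaded); loaded is False if every attempt still showed
    the failure message, so the caller can report it instead of looping forever.
    """
    html = ""
    for attempt in range(attempts):
        html = fetch(url)
        if FAILURE_MARKER not in html:
            return html, True
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 2 s, 4 s, 8 s, ... with the defaults
    return html, False
```

The `sleep` parameter exists only so the backoff can be checked without actually waiting.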

Here's the script:
import datetime
import re
import sys
import time

import browser_cookie3
import requests

HEADERS = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:95.0) Gecko/20100101 Firefox/95.0"}

# Load the browser's cookies once instead of on every request.
COOKIES = browser_cookie3.load()


def text_from_url(url):
    """Download a page with the browser's cookies and return its HTML."""
    response = requests.get(url, headers=HEADERS, cookies=COOKIES, timeout=30)
    response.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error page
    return response.text


forklist_url = sys.argv[1].strip() + "/network/members"
forklist_htm = text_from_url(forklist_url)

is_root = True  # the first repository on the members page is the upstream one
for match in re.finditer(r'<a (class=""|class="Link--secondary") href="(/([^/"]*)/[^/"]*)">', forklist_htm):
    fork_url = "https://github.com" + match.group(2)
    fork_owner_login = match.group(3)
    fork_htm = text_from_url(fork_url)

    # with open("download.htm", "w") as text_file:
    #     text_file.write(fork_htm)

    # while "Failed to load latest commit information." in fork_htm:
    #     time.sleep(1)
    #     fork_htm = text_from_url(fork_url)

    match_ahead = re.search(r"([0-9]+) commits? ahead", fork_htm)
    match_behind = re.search(r"([0-9]+) commits? behind", fork_htm)
    match_branch = re.search(r"This branch is (.*) (up to date with|ahead|behind) (.*)[.]", fork_htm)
    # The latest-commit date is taken from the datetime attribute of <relative-time>.
    match_date = re.search(r'<relative-time(.*?)datetime="(.*?)"(.*?)>', fork_htm)

    items = []
    if match_ahead:
        items.append("+" + match_ahead.group(1))
    if match_behind:
        items.append("-" + match_behind.group(1))
    if match_branch:
        items.append(match_branch.group(3))  # the part of the sentence naming the upstream branch
    if "This branch is up to date with master." in fork_htm:
        items = ["up-to-date"]
    if match_date:
        commit_time = datetime.datetime.strptime(match_date.group(2), "%Y-%m-%dT%H:%M:%SZ")
        items.append(commit_time.strftime("%Y-%m-%d"))

    if is_root:
        print(fork_url + " (root)")
    else:
        print(fork_url + " (" + " ".join(items) + ")")
    is_root = False

    time.sleep(1)  # pause between forks
Usage Example:
python3 list-forks.py https://github.com/itinance/react-native-fs
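As written, the script silently drops the date whenever `match_date` fails, so affected forks are easy to miss in the output. A sketch that reports the cases separately is below. `extract_commit_date` is a hypothetical replacement for the `match_date` lines. It assumes, as the script's regex does, that the date sits in a `<relative-time datetime="...">` element. It checks for the failure message first, because another `<relative-time>` elsewhere on the page could otherwise be mistaken for the commit date.

```python
import re
from datetime import datetime

FAILURE_MARKER = "Failed to load latest commit information."
DATE_RE = re.compile(r'<relative-time[^>]*?\bdatetime="([^"]+)"')


def extract_commit_date(html):
    """Return the latest-commit date as 'YYYY-MM-DD', 'unavailable', or None.

    'unavailable' means GitHub served its failure message instead of the date;
    None means no <relative-time> element was found at all.
    """
    if FAILURE_MARKER in html:
        return "unavailable"
    match = DATE_RE.search(html)
    if match is None:
        return None
    stamp = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%SZ")
    return stamp.strftime("%Y-%m-%d")
```

In the loop, `items.append(extract_commit_date(fork_htm) or "no date")` would then show which forks hit the problem.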

Does anyone have any idea why this is happening, or a potential workaround?
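One workaround that avoids parsing HTML altogether is GitHub's REST API. The compare endpoint (`GET /repos/{owner}/{repo}/compare/{base}...{head}`) returns `status`, `ahead_by` and `behind_by`, and accepts `username:branch` as the head to refer to a fork in the same network. The commits endpoint with `sha` and `per_page=1` returns the newest commit on a branch. The sketch below is not a drop-in replacement for the script. `fork_summary` and `get_json(url, params)` are hypothetical names, with `get_json` being any function that performs a GET and returns decoded JSON. The branch names have to be supplied, for instance from the `default_branch` field the API reports for each repository.

```python
API = "https://api.github.com"


def fork_summary(upstream, base_branch, fork_owner, fork_repo, fork_branch, get_json):
    """Return a summary such as '+3 -12 2021-12-30' for one fork.

    upstream is 'owner/repo'; get_json(url, params) must GET a URL and return
    the decoded JSON body.
    """
    # Compare upstream's branch (base) with the fork's branch (head).
    # The "owner:branch" form lets head refer to a fork in the same network.
    compare = get_json(
        f"{API}/repos/{upstream}/compare/{base_branch}...{fork_owner}:{fork_branch}",
        None,
    )
    # Newest commit on the fork's branch; per_page=1 keeps the response small.
    commits = get_json(
        f"{API}/repos/{fork_owner}/{fork_repo}/commits",
        {"sha": fork_branch, "per_page": 1},
    )

    items = []
    if compare["status"] == "identical":
        items.append("up-to-date")
    else:
        if compare["ahead_by"]:
            items.append(f"+{compare['ahead_by']}")  # commits only the fork has
        if compare["behind_by"]:
            items.append(f"-{compare['behind_by']}")  # upstream commits the fork lacks
    if commits:
        # committer.date is an ISO 8601 timestamp such as 2021-12-30T10:11:12Z.
        items.append(commits[0]["commit"]["committer"]["date"][:10])
    return " ".join(items)
```

That is two API calls per fork. Anonymous requests are limited to 60 per hour, so for more than a handful of forks an access token would almost certainly be needed.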


Messages In This Thread
Getting a list of forks on Github - by headkaze - Dec-31-2021, 02:06 AM
RE: Getting a list of forks on Github - by headkaze - Jan-05-2022, 09:27 PM
RE: Getting a list of forks on Github - by DeaD_EyE - Jan-05-2022, 10:04 PM
RE: Getting a list of forks on Github - by headkaze - Jan-05-2022, 10:29 PM
