Python Forum
Help needed - UN voting records
#1
Hi, I haven't done any programming in 20 years (my last work was in DB2 and Teradata), but an urgent task has come up and I'm not sure how to tackle it.

Basically, I want to create a spreadsheet listing the results of every United Nations General Assembly and United Nations Security Council resolution that was put to a vote since 2022. Each row would have the title of the resolution, the resolution number, the date of the vote, and how each of the 193 member states voted (yes, no, abstain or absent).

The data is available through the search page of the UN Digital Library (accessible here: https://digitallibrary.un.org/search?cc=...oting+Data). However, I can only view one resolution at a time, and there were 288 votes in 2023 alone.

A typical page for a vote is here: https://digitallibrary.un.org/record/4045078?ln=en. It offers several export formats, including BibTeX, MARC and MARCXML.

Could a Python script solve this problem? All ideas welcome!

Many thanks in advance.
#2
The MARCXML version looks very easy to process: just use Python's lxml module. There are also specialized packages on PyPI, such as marcxml2csv and marcxml_parser.
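To illustrate, here is a minimal sketch of pulling a subfield out of a MARCXML record. It uses the stdlib's xml.etree (lxml's API is nearly identical for this) and a tiny hand-made fragment; field 245 $a is the standard MARC title field, but which field carries the per-country votes would need to be checked against a real export from the site.

```python
import xml.etree.ElementTree as ET

NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# A tiny hand-made MARCXML fragment for demonstration only; a real export
# contains many more datafields.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">Resolution adopted by the General Assembly</subfield>
  </datafield>
</record>"""

def datafield(record, tag, code):
    """Return the text of the first matching datafield/subfield, or None."""
    for df in record.findall("marc:datafield", NS):
        if df.get("tag") == tag:
            for sf in df.findall("marc:subfield", NS):
                if sf.get("code") == code:
                    return sf.text
    return None

record = ET.fromstring(SAMPLE)
print(datafield(record, "245", "a"))
```

Once you know which tag holds the voting data, the same helper extracts it.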
« We can solve any problem by introducing an extra level of indirection »
#3
I am not an expert like some of the folks here, but maybe this will help:

A page with voting results looks like this:

Quote:https://digitallibrary.un.org/record/4045078

You can get all these record pages with the code below.

After that, you need to process each page to get the actual voting results. Haven't tried that yet. Maybe you can indicate how you want the data presented.

import re
import requests
from bs4 import BeautifulSoup

# this search page only seems to return the first 50 voting records
url = 'https://digitallibrary.un.org/search?cc=Voting+Data&ln=en&c=Voting+Data'
response = requests.get(url)
response.raise_for_status()
soup = BeautifulSoup(response.content, 'html.parser')

# in the links we are looking for this kind of pattern: /record/4047224
base = "https://digitallibrary.un.org"
pattern = re.compile(r'/record/\d+')
# a list to take all the addresses of pages with voting results
addresses = []
for link in soup.find_all('a'):
    href = link.get('href', '')    # some <a> tags have no href
    match = pattern.match(href)
    if match:
        print(match.group())
        addresses.append(base + match.group())
Now you need to feed the addresses you have into requests and extract the voting data for each address in addresses.

Shouldn't be too difficult!
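The fetching loop might look like the sketch below. The fetch function is passed in as a parameter (my own convention, not anything from the site) so the loop can be tried out without network access, and a delay keeps the requests polite.

```python
import time

def fetch_all(addresses, fetch, delay=1.0):
    """Fetch each record page, pausing between requests to be polite.

    Returns a dict mapping each address to whatever `fetch` returned for it.
    """
    pages = {}
    for url in addresses:
        pages[url] = fetch(url)
        time.sleep(delay)
    return pages
```

Against the real site you would pass something like `fetch=lambda u: requests.get(u).text`; for a quick dry run, any function that takes a URL and returns a string will do.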

The voting results look like this:

Quote:AFGHANISTAN
Y ALBANIA
Y ALGERIA
Y ANDORRA
A ANGOLA
Y ANTIGUA AND BARBUDA
Y ARGENTINA
Y ARMENIA
Y AUSTRALIA
Y AUSTRIA
AZERBAIJAN
Y BAHAMAS
Y BAHRAIN

Y must be yes and A is probably abstain. There doesn't seem to be an N for no; perhaps this particular resolution simply had no "no" votes, and a country listed without any letter was absent.
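Parsing those lines could look like the sketch below. The letter meanings are my guess from the sample above (Y/N/A prefix, or no prefix at all), so treat the mapping as an assumption to verify against a few real records.

```python
import re

# A vote line is an optional single-letter prefix (Y, N or A), then the
# country name; a line with no prefix is recorded as None (absent?).
VOTE_LINE = re.compile(r'^([YNA])\s+(.+)$')

def parse_votes(lines):
    """Map country name -> vote letter ('Y', 'N', 'A') or None."""
    votes = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        m = VOTE_LINE.match(line)
        if m:
            votes[m.group(2)] = m.group(1)
        else:
            votes[line] = None
    return votes

sample = ["AFGHANISTAN", "Y ALBANIA", "A ANGOLA", "Y ANTIGUA AND BARBUDA"]
print(parse_votes(sample))
```

Note the regex requires whitespace after the letter, so a prefix-less country such as AZERBAIJAN is not mistaken for an "A" vote.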