Help needed - UN voting records

Help needed - UN voting records - Mardy - May-18-2024

Hi, I haven't done any programming in 20 years (I last worked with DB2 and Teradata), but an urgent task has come up and I am not sure how to complete it.

Basically, I want to create a spreadsheet listing the results of every United Nations General Assembly and United Nations Security Council resolution put to a vote since 2022. Each row would have the title of the resolution, the resolution number, the date of the vote and how each of the 193 members voted (yes, no, abstain or absent).

The data is available through the search page of the UN digital library (accessible here: https://digitallibrary.un.org/search?cc=Voting+Data&ln=en&c=Voting+Data). However, I can only read one resolution at a time, and there were 288 votes in 2023 alone.

A typical page for a vote is here: https://digitallibrary.un.org/record/4045078?ln=en. It is available in various formats like BibTeX, MARC and MARCXML.

Could a Python script solve this problem? Any ideas are welcome!

Many thanks in advance.

RE: Help needed - UN voting records - Gribouillis - May-18-2024

The MARCXML version looks very easy to process. Just use the lxml module (a third-party package). There are also specialized packages on PyPI such as marcxml2csv and marcxml_parser.
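
A minimal sketch of the MARCXML route, using the standard library's xml.etree.ElementTree so that it runs without extra installs (lxml.etree offers a very similar interface). The sample record below is made up, and the tag 967 and the subfield codes d and e are assumptions for illustration only; look at a real MARCXML export to see which field actually carries the votes. `datafields` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# MARCXML elements live in the Library of Congress "slim" namespace
NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Made-up sample record: the tag and subfield codes are assumptions,
# not the confirmed layout of the UN records
SAMPLE = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <controlfield tag="001">0000001</controlfield>
    <datafield tag="967" ind1=" " ind2=" ">
      <subfield code="d">Y</subfield>
      <subfield code="e">EXAMPLELAND</subfield>
    </datafield>
    <datafield tag="967" ind1=" " ind2=" ">
      <subfield code="e">SAMPLESTAN</subfield>
    </datafield>
  </record>
</collection>"""


def datafields(xml_text, tag):
    """Return one {subfield code: text} dict per datafield with the given tag."""
    root = ET.fromstring(xml_text)
    rows = []
    for field in root.iterfind(f".//marc:datafield[@tag='{tag}']", NS):
        rows.append({sub.get("code"): (sub.text or "").strip()
                     for sub in field.iterfind("marc:subfield", NS)})
    return rows


for row in datafields(SAMPLE, "967"):
    print(row)
```

Returning plain dicts keeps the parsing separate from whatever spreadsheet layout is chosen later.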

RE: Help needed - UN voting records - Pedroski55 - May-19-2024

I am not an expert, like some of the guys here, but maybe this will help:

A page with voting results looks like this:

Quote: https://digitallibrary.un.org/record/4045078

You can get all these record pages with the code below.

After that, you need to process each page to get the actual voting results. Haven't tried that yet. Maybe you can indicate how you want the data presented.

import re
import requests
from bs4 import BeautifulSoup

# this search page only seems to list 50 voting records
search_url = 'https://digitallibrary.un.org/search?cc=Voting+Data&ln=en&c=Voting+Data'
response = requests.get(search_url, timeout=30)
response.raise_for_status()  # stop early if the page could not be fetched
soup = BeautifulSoup(response.content, 'html.parser')

# in the links we are looking for this kind of pattern: /record/4047224
base = "https://digitallibrary.un.org"
pattern = re.compile(r'/record/\d+')
# a list to take all the addresses of pages with voting results
addresses = []
# only look at <a> tags that actually have an href attribute
for link in soup.find_all('a', href=True):
    res = pattern.match(link['href'])
    if res:
        address = base + res.group()
        # a record may be linked more than once on the page, so keep each address once
        if address not in addresses:
            print(address)
            addresses.append(address)

Now you need to feed the addresses you have into requests and extract the voting data for each address in addresses.
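
A rough sketch of that loop. The download function is passed in as an argument so the loop can be tried out without touching the network. The names `fetch_with_requests` and `collect_pages` and the one-second pause are my own choices for illustration, not anything the UN site documents.

```python
import time


def fetch_with_requests(address):
    """Download one record page and return its HTML (needs the requests package)."""
    import requests  # imported here so the rest of the sketch runs without it
    response = requests.get(address, timeout=30)
    response.raise_for_status()
    return response.text


def collect_pages(addresses, fetch, pause=1.0):
    """Return {address: page text}, skipping pages that fail to download."""
    pages = {}
    for address in addresses:
        try:
            pages[address] = fetch(address)
        except Exception as err:  # keep going if one record fails
            print(f"skipped {address}: {err}")
        time.sleep(pause)  # short pause between requests to go easy on the server
    return pages


# real use, with the list built earlier:
# pages = collect_pages(addresses, fetch_with_requests)
```

Each page text in the returned dict can then be handed to whatever extracts the votes.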

Shouldn't be too difficult!

The voting results look like this:

Quote:
AFGHANISTAN
Y ALBANIA
Y ALGERIA
Y ANDORRA
A ANGOLA
Y ANTIGUA AND BARBUDA
Y ARGENTINA
Y ARMENIA
Y AUSTRALIA
Y AUSTRIA
AZERBAIJAN
Y BAHAMAS
Y BAHRAIN

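Y must be yes, and A is probably abstain. There is no N for no in this sample, so maybe a missing code means no. It could also mean the member was absent, since absent is one of the four outcomes listed in the first post, so check a resolution that is known to have no votes before relying on this.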
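
If the results come out as plain lines like the quote above, a small parser can split the vote code from the country name. This is only a sketch. It assumes N is the code for a no vote (it does not appear in the sample) and it labels a line without a code as "no vote recorded" rather than guessing between no and absent. `parse_votes` is a hypothetical helper name.

```python
import re

# optional one-letter vote code followed by whitespace, then the country name
LINE = re.compile(r"^(?:([YNA])\s+)?(\S.*)$")

# the meaning of N and of a missing code are assumptions, see the note above
MEANING = {"Y": "yes", "N": "no", "A": "abstain", None: "no vote recorded"}


def parse_votes(text):
    """Map each country name to its vote, one input line per country."""
    votes = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        code, country = LINE.match(line).groups()
        votes[country] = MEANING[code]
    return votes


sample = """AFGHANISTAN
Y ALBANIA
A ANGOLA
Y ANTIGUA AND BARBUDA"""

for country, vote in parse_votes(sample).items():
    print(country, "->", vote)
```

The code letter is only treated as a vote when it is followed by a space, so names such as AFGHANISTAN or ANGOLA are not mistaken for an abstention.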