Python Forum

Updating column name with translation - bobbydave - Sep-17-2024

I have a column of countries. However, mixed in with the English names in this column are some Spanish country names.
I have found the Spanish names and translated them to English. Now I want to either create another column next to the original that holds the translated country name, or just replace the original entry.

Quote: Original column of countries
France
Portugal
Spain
México
Suiza
Perú
Alemania
Suecia
Reino Unido

I have my translated list:

Quote:
Mexico
Switzerland
Peru
Germany
Sweden
United Kingdom

Now I want to add another column and insert the English translations there wherever I found one (or, if that fails, just update the original column with the translation), so that I end up with a list of countries that is all in English.

import pycountry
import pandas as pd
from pathlib import Path
from langdetect import detect 
from googletrans import Translator


FILENAME = r"POS.xlsx"
COUNTRYNAME = 'Country'
df = pd.read_excel(FILENAME)

def all_names() -> set[str]:
    # all Country objects have a "name" attribute
    return {country.name for country in pycountry.countries} # type: ignore

def all_official_names() -> set[str]:
    s: set[str] = set()
    for country in pycountry.countries:
        # not all Country objects have an "official_name" attribute
        try:
            s.add(country.official_name) # type: ignore
        except AttributeError:
            pass
    return s

def get_df_countries(filename: Path) -> set[str]:
    # construct a set because country names may be duplicated in the spreadsheet column
    # this potentially improves runtime performance when parsing the country names later
    # read from the filename argument (not the global) and drop blank cells
    return set(pd.read_excel(filename)[COUNTRYNAME].dropna())

# Function to detect language of a word
def detect_language(word):
    try:
        return detect(word)
    except Exception:
        # langdetect raises when it cannot find any language features in the text
        return 'unknown'

translator = Translator()

if __name__ == "__main__":
    names = all_names() | all_official_names()
    for country in get_df_countries(Path(FILENAME)):
        # anything pycountry does not recognise is assumed to need translating
        if country not in names:
            # Translate country back into English
            translated_country = translator.translate(country).text
            print(translated_country)

Any ideas how I can show the new translations in place of the Spanish names while still showing all the other countries that are already in English?
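
One possible approach, as a rough sketch only: build a dictionary from every distinct country string to its English name, leaving names that are already valid unchanged, and then use that dictionary to fill the new column. The names `build_english_map`, `fake_translate` and `SPANISH_TO_ENGLISH` are made up for this illustration. `fake_translate` is a hand-written stand-in for `translator.translate(country).text` so the example runs offline, and the small `valid` set stands in for `all_names() | all_official_names()`.

```python
from typing import Callable, Iterable


def build_english_map(
    countries: Iterable[object],
    valid_names: set[str],
    translate: Callable[[str], str],
) -> dict[str, str]:
    """Map each distinct country string to an English name.

    Names already in valid_names map to themselves; every other name is
    passed through translate() exactly once.
    """
    mapping: dict[str, str] = {}
    for name in set(countries):
        if not isinstance(name, str):
            continue  # skip blank cells (NaN) read from a spreadsheet
        mapping[name] = name if name in valid_names else translate(name)
    return mapping


# Assumption: hand-written lookup built from the lists in this post.
# In the real script this would be the googletrans call instead.
SPANISH_TO_ENGLISH = {
    "México": "Mexico",
    "Suiza": "Switzerland",
    "Perú": "Peru",
    "Alemania": "Germany",
    "Suecia": "Sweden",
    "Reino Unido": "United Kingdom",
}


def fake_translate(name: str) -> str:
    # Fall back to the original text when no translation is known.
    return SPANISH_TO_ENGLISH.get(name, name)


column = ["France", "Portugal", "Spain", "México", "Suiza",
          "Perú", "Alemania", "Suecia", "Reino Unido"]
valid = {"France", "Portugal", "Spain"}  # stand-in for the pycountry names

mapping = build_english_map(column, valid, fake_translate)
english_column = [mapping[name] for name in column]
print(english_column)
```

With the mapping in hand, something like `df["Country_EN"] = df[COUNTRYNAME].map(mapping)` should add the translated names as a new column next to the original (`Country_EN` is just a placeholder name), while `df[COUNTRYNAME] = df[COUNTRYNAME].replace(mapping)` would overwrite the original column instead. Translating only the distinct names also keeps the number of translator calls small.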