Sep-17-2024, 03:40 PM
I have a column of countries. However, mixed in with the English names in this column are Spanish country names.
I have found the list of names in Spanish and translated them to English, but now I want to either create another column next to the original one that holds the translated country name, or just replace the entry.
Quote:
Original column of countries
France
Portugal
Spain
México
Suiza
Perú
Alemania
Suecia
Reino Unido
I have my translated list
Quote:
Mexico
Swiss
Peru
Germany
Sweden
United Kingdom
Now I want to add another column and insert the English translations there wherever I have found one (or, if that fails, just update the original column with the translation), so that I end up with a list of all-English country names.
import pycountry
import pandas as pd
from pathlib import Path
from langdetect import detect
from googletrans import Translator

FILENAME = r"POS.xlsx"
COUNTRYNAME = 'Country'

df = pd.read_excel(FILENAME)


def all_names() -> set[str]:
    # all Country objects have a "name" attribute
    return {country.name for country in pycountry.countries}  # type: ignore


def all_official_names() -> set[str]:
    s: set[str] = set()
    for country in pycountry.countries:
        # not all Country objects have an "official_name" attribute
        try:
            s.add(country.official_name)  # type: ignore
        except AttributeError:
            pass
    return s


def get_df_countries(filename: Path) -> set[str]:
    # construct a set because country names may be duplicated in the spreadsheet column
    # this potentially improves runtime performance when parsing the country names later
    return set(pd.read_excel(filename)[COUNTRYNAME])


# Function to detect language of a word
def detect_language(word):
    try:
        return detect(word)
    except Exception:
        return 'unknown'


translator = Translator()

if __name__ == "__main__":
    names = all_names() | all_official_names()
    for country in get_df_countries(Path(FILENAME)):
        status = "valid" if country in names else "invalid"
        if status == "invalid":
            # print(f"{country} is {status}")
            # Translate country back into English
            translated_country = translator.translate(country).text
            print(translated_country)
Any ideas how I can show the new translations in place of the Spanish names while still showing all the other countries that are already in English?
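One way this could be approached, as a minimal sketch rather than a definitive answer: collect the translations in a plain dict and let pandas do the lookup. The names `translations` and `Country_EN` are made up for illustration, and the sample frame stands in for the `Country` column of POS.xlsx. `Series.map` returns NaN for names that are not in the dict, so `fillna` with the original column keeps the entries that are already in English.

```python
import pandas as pd

# Sample data copied from the post; in practice this would be
# pd.read_excel("POS.xlsx") with its 'Country' column.
df = pd.DataFrame({"Country": ["France", "Portugal", "Spain", "México", "Suiza",
                               "Perú", "Alemania", "Suecia", "Reino Unido"]})

# Hypothetical lookup table: Spanish name -> English translation as listed in the post.
# (The post gives "Swiss" for Suiza; the usual English country name is "Switzerland".)
translations = {
    "México": "Mexico",
    "Suiza": "Swiss",
    "Perú": "Peru",
    "Alemania": "Germany",
    "Suecia": "Sweden",
    "Reino Unido": "United Kingdom",
}

# Option 1: new column. Names missing from the dict map to NaN,
# and fillna() falls back to the original value for those rows.
df["Country_EN"] = df["Country"].map(translations).fillna(df["Country"])

# Option 2: a replaced copy of the original column. Values that are not
# keys of the dict are left untouched.
replaced = df["Country"].replace(translations)

print(df)
```

To overwrite the original column instead of adding a new one, the result of option 2 could be assigned back with `df["Country"] = replaced`. The dict itself could also be filled inside the existing loop, by storing `translator.translate(country).text` under the key `country` rather than printing it.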