Python Forum
Block of code, scope of variables and surprising exception
#1
Hi

I recently decided to learn Python, working from the Python documentation at https://docs.python.org/3. As a learning exercise, I am writing a script whose aim is to build a translation table for collating Ancient Greek.

My question concerns what counts as a block of code, and how a block determines the scope of the variables defined in it.

Beyond those definitions, I am seeing very strange behavior: a single line of code both produces the correct output and raises an exception.

Below is an extract from the script.

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
def lireUnicode (bloc):
	symbol=None
	titre=None
	unicd=None
	if symbol==None:
		print('symbol is None')
	from html.parser import HTMLParser
	import requests
	class MyHTMLParser(HTMLParser):
	################# handling of start tags (<tag …)
		def handle_starttag(self, tag, attrs):
			global detone
			nonlocal deb_car
			if tag == 'html':
				deb_car=False
			elif tag == 'section' and 'symbols-block__grid' in attrs[0]:
				deb_car=True
			elif tag == 'div' and deb_car: #'symbols-grid__item u0000 symbol-copy' in attrs[0] :
				print(attrs)
				for x in attrs:
					if x[0]== "data-symbol":
						symbol=x[1]
					elif x[0]=="title":
						titre=x[1]
					elif x[0]=="onclick":
						unicd='U+' + x[1].split("/")[2]
					else:
						pass
#				for x in attrs[1:4]:
#					print (x[0].replace("data-","") + " = '" + x[1] + "'")
#				for x in attrs[1:4]:
#					exec(x[0].replace("data-","") + " = '" + x[1] + "'")
				print(symbol)
				print(unicd)
				print(titre)
			else:
				pass
	######################### handling of end tags (/> or </tag>)
		def handle_endtag(self, tag):
			nonlocal deb_car
			if tag == 'section':
				deb_car = False

	try:
		html=requests.get(bloc, timeout=2 )
	except requests.exceptions.ConnectionError:
		print("La connexion au site de l'Unicode n'a pas pu être établie")
		return False
	except requests.exceptions.Timeout:
		print("Le serveur de l'Unicode ne répond pas")
		return False
	else:
		pass
	deb_car=False
	parser=MyHTMLParser(convert_charrefs=True)
	parser.feed(html.text)
	parser.close()
To reproduce the issue, I ran:
>>> import sys
>>> sys.path[0]='/home/remi/Documents/programmation/python' 
>>> import exemple
>>> exemple.lireUnicode("https://unicode-table.com/fr/blocks/greek-extended/")
symbol is None
[('class', 'symbols-grid__item u1c00 symbol-copy'), ('data-symbol', 'ἀ'), ('title', 'Lettre minuscule grecque alpha esprit doux'), ('onclick', "location.href='/fr/1F00/'")]
ἀ
U+1F00
Lettre minuscule grecque alpha esprit doux
[('class', 'symbols-grid__symbol')]
Traceback (most recent call last):
Here is the whole error message
Error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/remi/Documents/programmation/python/exemple.py", line 58, in lireUnicode
    parser.feed(html.text)
  File "/usr/lib/python3.8/html/parser.py", line 111, in feed
    self.goahead(0)
  File "/usr/lib/python3.8/html/parser.py", line 171, in goahead
    k = self.parse_starttag(i)
  File "/usr/lib/python3.8/html/parser.py", line 345, in parse_starttag
    self.handle_starttag(tag, attrs)
  File "/home/remi/Documents/programmation/python/exemple.py", line 35, in handle_starttag
    print(symbol)
UnboundLocalError: local variable 'symbol' referenced before assignment
Error in sys.excepthook:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/apport_python_hook.py", line 63, in apport_excepthook
    from apport.fileutils import likely_packaged, get_recent_crashes
  File "/usr/lib/python3/dist-packages/apport/__init__.py", line 5, in <module>
    from apport.report import Report
  File "/usr/lib/python3/dist-packages/apport/report.py", line 30, in <module>
    import apport.fileutils
  File "/usr/lib/python3/dist-packages/apport/fileutils.py", line 23, in <module>
    from apport.packaging_impl import impl as packaging
  File "/usr/lib/python3/dist-packages/apport/packaging_impl.py", line 24, in <module>
    import apt
  File "/usr/lib/python3/dist-packages/apt/__init__.py", line 23, in <module>
    import apt_pkg
ModuleNotFoundError: No module named 'apt_pkg'

Original exception was:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/remi/Documents/programmation/python/exemple.py", line 58, in lireUnicode
    parser.feed(html.text)
  File "/usr/lib/python3.8/html/parser.py", line 111, in feed
    self.goahead(0)
  File "/usr/lib/python3.8/html/parser.py", line 171, in goahead
    k = self.parse_starttag(i)
  File "/usr/lib/python3.8/html/parser.py", line 345, in parse_starttag
    self.handle_starttag(tag, attrs)
  File "/home/remi/Documents/programmation/python/exemple.py", line 35, in handle_starttag
    print(symbol)
UnboundLocalError: local variable 'symbol' referenced before assignment
>>>
What do you make of all this?

Arbiel
Reply
#2
The error traceback begins at line 50.
However, in the code you posted, the failing line is at line 47.
Please post the actual code that causes the error; the problem could be in the missing lines.
The line numbers in the error must match the code.
Reply
#3
Hi Larz60+

Thank you for your input.

I updated my first post and restored the four comment lines 25 to 28, which are old lines of code. I reran the test, and the first post now shows the output of that rerun.

Arbiel

P.S.: A new test has just shown that the error arises on the second run.
Reply
#4
Immediately after line 3, add:
    symbol = None
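
For readers following along, here is a minimal sketch of why the exception can appear even when the name is set to None in the enclosing function. The names outer, broken and fixed are made up for the illustration and are not part of the posted script. An assignment to a name anywhere inside a function makes that name local to the whole function body, so the value set in the enclosing function is not seen unless the name is declared nonlocal.

```python
def outer():
    symbol = None  # belongs to outer(); the nested functions below never see it

    def broken(attrs):
        for name, value in attrs:
            if name == "data-symbol":
                symbol = value  # this assignment makes symbol local to broken()
        return symbol  # unbound if the loop never assigned it

    def fixed(attrs):
        symbol = None  # initialise the local name on every call
        for name, value in attrs:
            if name == "data-symbol":
                symbol = value
        return symbol

    return broken, fixed


broken, fixed = outer()
print(fixed([("class", "symbols-grid__symbol")]))  # None
print(broken([("data-symbol", "x")]))              # x
try:
    broken([("class", "symbols-grid__symbol")])
except UnboundLocalError as exc:
    print("UnboundLocalError:", exc)
```

This matches the behaviour reported in the first post: the call that finds a data-symbol attribute prints correctly, and the next call, which finds none, fails.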
Reply
#5
Hi Larz60+

I beg your pardon, Larz60+: the script does in fact run correctly, and I was just being stupid. As you can see further down in this post, every second line of the HTML section I am analyzing differs from what I expected.

Don't waste any more of your time. I'll come back if I run into more difficulties.

Arbiel

I added the lines you suggested, along with some print() calls, and updated my first post accordingly.

I ran a new test, which shows that:
1) the error arises on the second run;
2) on this second run, attrs contains only one item, and the second element of its tuple is quite surprising, as can be seen in the beginning of the section I am analyzing, shown below.
Quote: <section class="symbols-block__grid">
<div
class="symbols-grid__item u1c00 symbol-copy"
data-symbol="ἀ"
title="Lettre minuscule grecque alpha esprit doux"
onclick="location.href='/fr/1F00/'" >
<div class="symbols-grid__symbol">ἀ</div> <div class="symbols-grid__symbol-code">U+1F00</div>
</div>
<div
class="symbols-grid__item u1c00 symbol-copy"
data-symbol="ἁ"
title="Lettre minuscule grecque alpha esprit rude"
onclick="location.href='/fr/1F01/'" >
<div class="symbols-grid__symbol">ἁ</div> <div class="symbols-grid__symbol-code">U+1F01</div>
</div>
<div
class="symbols-grid__item u1c00 symbol-copy"
data-symbol="ἂ"
title="Lettre minuscule grecque alpha esprit doux et accent grave"
onclick="location.href='/fr/1F02/'" >
<div class="symbols-grid__symbol">ἂ</div> <div class="symbols-grid__symbol-code">U+1F02</div>
</div>

This changes the question: why, on the second run, does attrs not contain the three items it should?

Arbiel
Reply
#6
Why are you declaring the class inside your function? It's unnecessary (for a start, that means that class will be redefined every time that function is called) and perhaps more importantly here, interrupts the readability of your function - seeing all the details takes your focus away from what the function is actually for.
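
A small sketch of the point about redefinition, using made-up names (make_parser_inside, make_parser_outside, LocalParser, ModuleParser) rather than the ones in the posted script. A class statement inside a function body runs again on every call and produces a new class object each time, whereas a module-level class is created once at import.

```python
from html.parser import HTMLParser


def make_parser_inside():
    # The class statement below is executed again on every call
    class LocalParser(HTMLParser):
        pass
    return LocalParser


class ModuleParser(HTMLParser):
    # Defined once, when the module is imported
    pass


def make_parser_outside():
    return ModuleParser


print(make_parser_inside() is make_parser_inside())    # False: two distinct classes
print(make_parser_outside() is make_parser_outside())  # True: the same class
```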
Reply
#7
Hi ndc85430

Thank you for your input. I am currently learning Python, and I obviously make mistakes.

I perfectly understand your point, and will modify my code, and my way of coding in the future.

In the meantime, I realized my mistake: I had not read thoroughly the HTML pages that this script analyzes. The subsequent <div> elements do not all have the four attributes I had imagined.
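
The observation above can be checked with a short sketch. It assumes a shortened copy of the markup quoted earlier in the thread, and a hypothetical DivLister class that is not part of the original script. HTMLParser calls handle_starttag for every start tag, including the nested <div> elements that carry only a class attribute, so looking attributes up by name is safer than relying on their position or count.

```python
from html.parser import HTMLParser

# Shortened copy of the markup quoted earlier in the thread
SAMPLE = """
<section class="symbols-block__grid">
<div class="symbols-grid__item u1c00 symbol-copy" data-symbol="ἀ"
     title="Lettre minuscule grecque alpha esprit doux"
     onclick="location.href='/fr/1F00/'">
<div class="symbols-grid__symbol">ἀ</div>
<div class="symbols-grid__symbol-code">U+1F00</div>
</div>
</section>
"""


class DivLister(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.found = []  # one dict of attributes per <div> start tag

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.found.append(dict(attrs))


lister = DivLister()
lister.feed(SAMPLE)
lister.close()
for attributes in lister.found:
    # .get() returns None instead of failing when an attribute is absent
    print(len(attributes), attributes.get("data-symbol"))
```

The outer <div> reports four attributes, while each inner <div> reports a single class attribute and no data-symbol.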
Reply
#8
(Apr-04-2020, 04:19 PM)arbiel Wrote: will modify my code, and my way of coding in the future.

While you are at it, also refactor your (current and future) code so that it does not use global or nonlocal.
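
One possible way to do that, sketched under assumptions: the class name GridParser and the attributes in_grid and symbols are invented for the illustration, and the markup fed to the parser is a toy string rather than the real page. The flag that the first post kept in a nonlocal variable becomes an instance attribute, so the parser carries its own state.

```python
from html.parser import HTMLParser


class GridParser(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.in_grid = False  # plays the role of the nonlocal flag deb_car
        self.symbols = []     # results are collected on the instance

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        css_class = attributes.get("class") or ""
        if tag == "section" and "symbols-block__grid" in css_class:
            self.in_grid = True
        elif tag == "div" and self.in_grid and "data-symbol" in attributes:
            self.symbols.append(attributes["data-symbol"])

    def handle_endtag(self, tag):
        if tag == "section":
            self.in_grid = False


parser = GridParser()
parser.feed('<div data-symbol="x"></div>'
            '<section class="symbols-block__grid">'
            '<div class="item" data-symbol="a"><div class="inner">a</div></div>'
            '</section>')
parser.close()
print(parser.symbols)  # ['a']
```

Because the state lives on the instance, the class can sit at module level and be reused or tested without any enclosing function.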
Reply
#9
Hi

I think I have understood at last, and I hope this new code follows your advice:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# to sort the words of a sentence alphabetically:
# liste=str.split(phrase)
# liste.sort(key=grec_vocab.atone)

def les_dia():
	"""English names of the diacritics."""
	return {'DASIA', 'DIALYTIKA', 'MACRON', 'OXIA', 'PERISPOMENI', 'PROSGEGRAMMENI', 'PSILI', 'TONOS', 'VARIA', 'VRACHY', 'YPOGEGRAMMENI'}
def les_symb():
	"""Symbols that bear the names of letters."""
	return {"BETA", "EPSILON", "KAPPA", "PHI", "PI", "RHO", "SIGMA", "THETA", "UPSILON"}
def les_grecques():
	"""File of the letters and their characteristics."""
	return '/home/remi/Documents/programmation/python/grec_vocab/grec.xhtml'
def les_unicodes():
	"""URLs of the pages that define the letters."""
	return "https://unicode-table.com/fr/blocks/greek-coptic/", "https://unicode-table.com/fr/blocks/greek-extended/"
def la_table():
	"""Translation table."""
	return '/home/remi/Documents/programmation/python/grec_vocab/tabTrans.py'
def les_determ():
	"""Elements of the name that determine the translation."""
	return {"CAPITAL", "WITH", "SYMBOL"}
def les_minus():
	"""Beginning of the names of lowercase letters."""
	return {"GREEK SMALL LETTER " }
def le_rep():
	"""Directory of the script."""
	return '/home/remi/Documents/programmation/python/grec_vocab'

from html.parser import HTMLParser

class MyHTMLParser(HTMLParser):
################# handling of start tags (<tag …)
	dic=dict()
	def handle_starttag(self, tag, attrs):
		if tag == 'div' and len(attrs) > 2 and "disabled" not in attrs[0][1] and attrs[1][0] == 'data-symbol':
			symbol=attrs[1][1]
			titre=attrs[2][1]
			unicd=attrs[3][1]
			atone=atonique(symbol)
			if atone[0]!=symbol:
				self.dic[symbol]=atone[0]
	def fin (self):
		return self.dic

def lire_tabTrans():
	import pickle
	try:
		with open(la_table(),"rb") as traduc:
			données=pickle.Unpickler(traduc)
			dictionnaire=données.load()
	except FileNotFoundError:
		# str.maketrans() in atone() needs a dict, so the empty default is a dict too
		dictionnaire={}
	return dictionnaire

def enreg_tabTrans(dictionnaire):
	import pickle
	with open(la_table(),"wb") as les_données:
		données=pickle.Pickler(les_données)
		données.dump(dictionnaire)
	return 0

def atonique (polytonique: str) -> tuple:
	"""Character converted to lowercase and stripped of diacritics.

		Looks in the standard (English) Unicode name for words that call for special handling.
		Returns the tuple (character, letter, set of diacritics, case)."""
	import unicodedata
	nom_Unicode=unicodedata.name(polytonique)
	liste_termes=set(nom_Unicode.split())
	liste_dia=set()
	nat="minus"
	lettre=""
	if len(les_determ().intersection(liste_termes)) > 0:
		if "SYMBOL" in nom_Unicode:
			symbole=les_symb().intersection (liste_termes)
			if len(symbole) > 0:
				nom_Unicode="GREEK SMALL LETTER " + symbole.pop()
		else:
			if "WITH" in nom_Unicode:
				nom_Unicode=nom_Unicode[: nom_Unicode.find("WITH")-1]
				liste_dia=liste_termes.intersection(les_dia())
			if "CAPITAL" in nom_Unicode:
				nom_Unicode=nom_Unicode.replace("CAPITAL", "SMALL")
				nat="majus"
	try:
		LeCaract=unicodedata.lookup(nom_Unicode)
	except KeyError:
		# the name of the lowercase letter yot ('GREEK LETTER YOT') does not contain the word SMALL
		LeCaract="ϳ"
	return LeCaract, lettre, liste_dia, nat

def gen_xml (nom, symbole, code, dia, nat):
	xml="<carac "
	xml=xml+"id='" + symbole + "' "
	xml=xml +"titre='" + nom + "' "
	print("<carac id='" + symbole +"' titre='" + nom + "' unicode='" + code.split("/")[2] +"' />")

def gen_dic_trad ():
	import requests
	parser=MyHTMLParser(convert_charrefs=True)
	for bloc in les_unicodes():
		try:
			html=requests.get(bloc, timeout=2 )
		except requests.exceptions.ConnectionError:
			print("La connexion au site de l'Unicode n'a pas pu être établie")
			return False
		except requests.exceptions.Timeout:
			print("Le serveur de l'Unicode ne répond pas")
			return False
		else:
			pass
		parser.feed(html.text)
	table=parser.fin()
	parser.close()
	enreg_tabTrans(table)
	return table

def atone(mot_polytone: str)  -> str:
	dictrans=lire_tabTrans()
	if len(dictrans) == 0:
		dictrans=gen_dic_trad()
	table=str.maketrans(dictrans)
	return mot_polytone.translate(table)

import importlib
import sys

if len(sys.path[0]) == 0:
	sys.path[0]=le_rep()
import grec_vocab as gr

def charge():
	# importlib.reload() expects a module object, not a module name
	importlib.reload(gr)

if __name__ == "__main__":
	liste=sys.argv[1:]
	liste.sort(key=atone)
	print(liste)
Arbiel
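
As a reading aid, here is a minimal sketch of the translate step that atone() relies on. It assumes a hand-written three-entry mapping in place of the table the script builds from unicode-table.com, and the name sans_accent is made up for the example. The table is built once, outside the key function, so sorting does not rebuild it for every word.

```python
# Hypothetical three-entry mapping; the script builds the real one in gen_dic_trad()
dictrans = {
    "\u1f00": "\u03b1",  # alpha with psili -> alpha
    "\u1f01": "\u03b1",  # alpha with dasia -> alpha
    "\u1f10": "\u03b5",  # epsilon with psili -> epsilon
}

# Build the table once; str.maketrans accepts a dict with 1-character keys
table = str.maketrans(dictrans)


def sans_accent(mot: str) -> str:
    """Return mot with every mapped character replaced."""
    return mot.translate(table)


mots = ["\u03b2", "\u1f00"]           # beta, alpha with psili
print(sorted(mots))                   # plain code-point order puts beta first
print(sorted(mots, key=sans_accent))  # with the key, alpha comes first
```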
Reply

