Hello,
Some rows might have an empty column.
How can I check it's filled with data?
file = open("output.txt", 'at',encoding='utf-8')
for row in cur.execute("select id,introtext,fulltext from content"):
soup = BeautifulSoup(row["introtext"], 'lxml')
file.write(soup.body.text)
#Not null?
#AttributeError: 'NoneType' object has no attribute 'text'
#if row["fulltext"]:
if len(row["fulltext"]):
soup = BeautifulSoup(row["fulltext"], 'lxml')
file.write(soup.body.text)
file.close()
Thank you.
Untested, but what about checking that
row
returns
True
?
for row in cur.execute("select id,introtext,fulltext from content"):
if row:
soup = BeautifulSoup(row["introtext"], 'lxml')
file.write(soup.body.text)
Checking "if row:" is useless. If row was falsey, you wouldn't be in the body of the loop and you couldn't perform the test. If you are performing the test you know that row is not falsey, so performing the test will be truey.
The solution depends on what you want to do. If all you care about is fulltext, you could write a check into your query.
for row in cur.exececute("SELECT fulltext FROM content WHERE fulltext <> '' AND fulltext IS NOT NULL"):
Now, if you are in the body of the loop, you know that fulltext is not empty.
But your query selects id, introtext and fulltext, which probably means you want the query to return introtext even if fulltext is empty. In this case you have to test the retuned values.
if row["fulltext"]:
do_something_with_full_text(row["fulltext"])
You can check if the column is filled with data by using the built-in python function len() to check the length of the column value. If the length is greater than zero, it means that the column is not empty and has data.
In the current code, you can see that the if statement if len(row["fulltext"]): is checking whether the fulltext column is filled with data before parsing it with BeautifulSoup and writing it to the output file.
You can also use if row["fulltext"] is not None: or if row["fulltext"] != None: or if row["fulltext"] != "": or if bool(row["fulltext"])
Note that if the column is filled with a whitespace or any whitespace characters, the above conditions will still consider it as filled.
Option 1: you can use the len() function to check if the fulltext column is filled with data before parsing it with BeautifulSoup
file = open("output.txt", 'at',encoding='utf-8')
for row in cur.execute("select id,introtext,fulltext from content"):
soup = BeautifulSoup(row["introtext"], 'lxml')
file.write(soup.body.text)
if len(row["fulltext"]) > 0:
soup = BeautifulSoup(row["fulltext"], 'lxml')
file.write(soup.body.text)
file.close()
Option 2: you can use if row["fulltext"] is not None
file = open("output.txt", 'at',encoding='utf-8')
for row in cur.execute("select id,introtext,fulltext from content"):
soup = BeautifulSoup(row["introtext"], 'lxml')
file.write(soup.body.text)
if row["fulltext"] is not None:
soup = BeautifulSoup(row["fulltext"], 'lxml')
file.write(soup.body.text)
file.close()
Thanks everyone.
It turns out that in some rows, the "fulltext" column holds either just a space or carriage return (can't tell on the CLI). As a result, sqlite considers it's not empty… but BS isn't happy since it's not legit HTML:
#Not null?
#BAD if row["fulltext"]:
#BAD if len(row["fulltext"]):
#AttributeError: 'NoneType' object has no attribute 'text'
#if row["fulltext"] is not None:
#AttributeError: 'NoneType' object has no attribute 'text'
#fulltext holds just an empty space or carriage return?
#| |
#sqlite> select fulltext from content where id=117;
if len(row["fulltext"]) > 0:
print("FULL",row["fulltext"])
soup = BeautifulSoup(row["fulltext"], 'lxml')
#file.write(soup.prettify()+"\n")
file.write(soup.body.text)
Found it: It contained a space
sqlite> select hex(fulltext) from content where id=117;
20
if row["fulltext"] and not row["fulltext"].isspace():
soup = BeautifulSoup(row["fulltext"], 'lxml')
file.write(soup.body.text)
Thank you.