Python Forum
Python3 + BeautifulSoup4 + lxml (HTML -> CSV) - How to write 3 Columns to MariaDB? - Printable Version



RE: Python3 + BeautifulSoup4 + lxml (HTML -> CSV) - How to write 3 Columns to MariaDB? - BrandonKastning - Mar-22-2020

(Mar-22-2020, 09:04 AM)ndc85430 Wrote: I'm confused. You wrote a function that you don't understand? It looks like you have the data in the variables allOpinion, allTitle and allURL, so why can't you call your function, passing in the values?

ndc85430,

I modified an existing reference function. Its bottom two lines look like calls to the function with manually entered data.

When parsing the HTML to CSV in the code above this segment, I used the same variables, but I don't know how to call the function with each of them.

Would I go about it something like this?

CSV variables for Parse HTML Cycle #1:

allOpinion[0]
allTitle[0]
allURL

Reference Code Calling the Function w/ Custom Data:

insertVariblesIntoTable(2, 'Area 51M', 6999, '2019-04-14')
insertVariblesIntoTable(3, 'MacBook Pro', 2499, '2019-06-20')

Attempt #1 at calling the function with the Parse HTML Cycle #1 variables that the CSV step uses:

insertVariablesIntoTable('allTitle[0]', 'allOpinion[0]', 'allURL')

After adding the above code as the bottom line of my .py, I receive the following error:

Quote:Traceback (most recent call last):
File "HTML2CSV-NoWrite3Variables-to-MySQL.Python.Variable.Passoff.MySQL.INSERT.py", line 61, in <module>
insertVariablesIntoTable('allTitle[0]', 'allOpinion[0]', 'allURL')
NameError: name 'insertVariablesIntoTable' is not defined

Do you have any suggestions as to how to remedy this?


RE: Python3 + BeautifulSoup4 + lxml (HTML -> CSV) - How to write 3 Columns to MariaDB? - ndc85430 - Mar-22-2020

(Mar-22-2020, 09:19 AM)BrandonKastning Wrote:
insertVariablesIntoTable('allTitle[0]', 'allOpinion[0]', 'allURL')

Why the quotes? That's not how you refer to variables, is it?
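
To illustrate the point about quotes, here is a minimal sketch. The function show_args and the sample values are hypothetical stand-ins, not the poster's real function or data. Quoting a name passes the literal text of that name, not the value the variable holds.

```python
# Hypothetical stand-in values; the real ones come from the BeautifulSoup parse.
allTitle = ["Example case title"]
allURL = "http://example.com/case"


def show_args(title, url):
    """Return the arguments unchanged so we can inspect what was passed."""
    return (title, url)


# With quotes: the function receives the literal strings 'allTitle[0]' and 'allURL'.
quoted = show_args('allTitle[0]', 'allURL')

# Without quotes: the function receives the values the variables refer to.
unquoted = show_args(allTitle[0], allURL)

print(quoted)
print(unquoted)
```

With the quotes in place, the database would at best receive the text "allTitle[0]" rather than the scraped title.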


Quote:Adding the above code as the bottom line in my .py I receive the following error:

Quote:Traceback (most recent call last):
File "HTML2CSV-NoWrite3Variables-to-MySQL.Python.Variable.Passoff.MySQL.INSERT.py", line 61, in <module>
insertVariablesIntoTable('allTitle[0]', 'allOpinion[0]', 'allURL')
NameError: name 'insertVariablesIntoTable' is not defined

Do you have any suggestions as to how to remedy this?

Does the function live in another file? You need to import it if so.


RE: Python3 + BeautifulSoup4 + lxml (HTML -> CSV) - How to write 3 Columns to MariaDB? - BrandonKastning - Mar-22-2020

(Mar-22-2020, 09:23 AM)ndc85430 Wrote:
(Mar-22-2020, 09:19 AM)BrandonKastning Wrote:
insertVariablesIntoTable('allTitle[0]', 'allOpinion[0]', 'allURL')

Why the quotes? That's not how you refer to variables, is it?


Quote:Adding the above code as the bottom line in my .py I receive the following error:


Do you have any suggestions as to how to remedy this?

Does the function live in another file? You need to import it if so.

Line 37:

def insertVariblesIntoTable(allTitle, allOpinion, allURL):
Does this line not define that function?

To me it reads as "define insertVariablesIntoTable(variable1, variable2, variable3)".

Do I need to add [0] to them for it to run properly? (Except allURL, since it is just allURL = url rather than a BeautifulSoup4 HTML parse result.)


RE: Python3 + BeautifulSoup4 + lxml (HTML -> CSV) - How to write 3 Columns to MariaDB? - ndc85430 - Mar-22-2020

Yes, line 37 defines the function. Not sure what you mean by the last question. Remember that you need to define the function before it's called, so if your calls were before that line, then they need to be after.
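
Two things can produce this NameError even when a def is present: calling the function before the def statement has run, or calling it under a different spelling. Note that the def quoted in this thread is spelled insertVariblesIntoTable (no second "a"), while the failing call is spelled insertVariablesIntoTable, so a spelling mismatch may be the actual cause here. A minimal sketch of both cases, using hypothetical function names:

```python
def insert_varibles(title):  # misspelled on purpose, mirroring "Varibles"
    return "inserted " + title


# 1) Calling under a different spelling: Python looks up the exact name.
try:
    insert_variables("a title")  # correctly spelled, but never defined
except NameError as error:
    spelling_error = str(error)

# 2) Calling before the def statement has executed.
try:
    defined_later("a title")
except NameError as error:
    order_error = str(error)


def defined_later(title):
    return "inserted " + title


# Calls that match the definition's spelling, and come after it, succeed.
print(spelling_error)
print(order_error)
print(insert_varibles("a title"))
print(defined_later("a title"))
```

In both failing cases the fix is the same kind of check: the name at the call site must match a def that has already executed.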


RE: Python3 + BeautifulSoup4 + lxml (HTML -> CSV) - How to write 3 Columns to MariaDB? - BrandonKastning - Mar-22-2020

(Mar-22-2020, 09:37 AM)ndc85430 Wrote: Yes, line 37 defines the function. Not sure what you mean by the last question. Remember that you need to define the function before it's called, so if your calls were before that line, then they need to be after.

ndc85430,

I have since changed the last line of my .py to call the function with no arguments:

insertVariablesIntoTable()
This then results in the following error:

Quote:Traceback (most recent call last):
File "HTML2CSV-NoWrite3Variables-to-MySQL.Python.Variable.Passoff.MySQL.INSERT.py", line 62, in <module>
insertVariablesIntoTable()
NameError: name 'insertVariablesIntoTable' is not defined


I then went back into the .py and changed line 37 from:

def insertVariblesIntoTable(allTitle, allOpinion, allURL):
to the following (now line 38, with line 37 commented out):

def insertVariablesIntoTable():
Then re-run the .py and receive the following error:

Quote:Failed to insert into MySQL table Failed processing format-parameters; Python 'resultset' cannot be converted to a MySQL type
MySQL connection is closed

Perhaps we are getting somewhere with this!


RE: Python3 + BeautifulSoup4 + lxml (HTML -> CSV) - How to write 3 Columns to MariaDB? - ndc85430 - Mar-22-2020

Can you post all the code please? It's really hard to help without seeing it.


RE: Python3 + BeautifulSoup4 + lxml (HTML -> CSV) - How to write 3 Columns to MariaDB? - BrandonKastning - Mar-22-2020

Current Code for "HTML2CSV-NoWrite3Variables-to-MySQL.Python.Variable.Passoff.MySQL.INSERT.py":

from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("http://law.justia.com/cases/federal/appellate-courts/F2/999/663/308588/")
bsObj = BeautifulSoup(html.read())
allOpinion = bsObj.findAll(id="opinion")
import requests
from bs4 import BeautifulSoup

url = "http://law.justia.com/cases/federal/appellate-courts/F2/999/663/308588/"
allTitle = bsObj.findAll({"title"})
allURL = url

print(allOpinion)
print(allTitle)
print(allURL)

import csv
csvRow = [allOpinion,allTitle,allURL]
csvfile = "current_F2_opinion_with_tags_current.csv"
with open(csvfile, "a") as fp:
    wr = csv.writer(fp, dialect='excel')
    wr.writerow(csvRow)

print(allOpinion[0].get_text(),url)
 
import csv
csvRow = [allOpinion[0].get_text(),allTitle[0].get_text(),allURL]
csvfile = "current_F2_opinion_without_tags_current.csv"
with open(csvfile, "a") as fp:
    wr = csv.writer(fp, dialect='excel')
    wr.writerow(csvRow)


import mysql.connector
from mysql.connector import Error

#def insertVariblesIntoTable(allTitle, allOpinion, allURL):
def insertVariablesIntoTable():
    try:
        connection = mysql.connector.connect(host='localhost',
                                             database='PythonMariaDB1',
                                             user='PythonMariaDB1',
                                             password='password1234')
        cursor = connection.cursor()
        mySql_insert_query = """INSERT INTO Single_No_Loop (all_Title, all_Opinion, all_URL) 
                                VALUES (%s, %s, %s) """

        recordTuple = (allTitle, allOpinion, allURL)
        cursor.execute(mySql_insert_query, recordTuple)
        connection.commit()
        print("Record inserted successfully into Single_No_Loop table")

    except mysql.connector.Error as error:
        print("Failed to insert into MySQL table {}".format(error))

    finally:
        if (connection.is_connected()):
            cursor.close()
            connection.close()
            print("MySQL connection is closed")

insertVariablesIntoTable()
Current Error:

Quote:Failed to insert into MySQL table Failed processing format-parameters; Python 'resultset' cannot be converted to a MySQL type
MySQL connection is closed



RE: Python3 + BeautifulSoup4 + lxml (HTML -> CSV) - How to write 3 Columns to MariaDB? - ndc85430 - Mar-22-2020

On line 48, why aren't you extracting the text from allTitle and allOpinion in the same way that you're doing on line 27?

Also, remember to post the full traceback in future, as it contains important info about the error - like the line number it occurs on.

Also avoid using globals and pass the values into your function - the signature was sensible when you originally wrote it; I don't know why you changed that.
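
A minimal sketch of both suggestions together: extract plain text first, then pass the values into the function as arguments. To keep it runnable without a database server, it swaps in the standard library's sqlite3 (which uses ? placeholders where mysql.connector uses %s) and a hypothetical FakeTag class standing in for a BeautifulSoup tag. The function name insert_record and the sample values are likewise assumptions, not the poster's code.

```python
import sqlite3


class FakeTag:
    """Hypothetical stand-in for a BeautifulSoup tag: wraps text, exposes get_text()."""

    def __init__(self, text):
        self._text = text

    def get_text(self):
        return self._text


def insert_record(connection, title, opinion, url):
    """Insert one row; every value arrives as a parameter, none as a global."""
    connection.execute(
        "INSERT INTO Single_No_Loop (all_Title, all_Opinion, all_URL) VALUES (?, ?, ?)",
        (title, opinion, url),
    )
    connection.commit()


connection = sqlite3.connect(":memory:")  # in-memory database, for illustration only
connection.execute(
    "CREATE TABLE Single_No_Loop (all_Title TEXT, all_Opinion TEXT, all_URL TEXT)"
)

title_tag = FakeTag("Example title")
opinion_tag = FakeTag("Example opinion text")
url = "http://example.com/case"

# Passing the tag objects themselves fails: the driver only binds plain types.
try:
    insert_record(connection, title_tag, opinion_tag, url)
    raw_insert_failed = False
except sqlite3.Error:
    raw_insert_failed = True

# Passing the extracted text (plain str values) works.
insert_record(connection, title_tag.get_text(), opinion_tag.get_text(), url)

rows = connection.execute("SELECT * FROM Single_No_Loop").fetchall()
print(raw_insert_failed)
print(rows)
```

Because the function receives everything it needs as arguments, it can be called once per parsed page without depending on module-level names.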


RE: Python3 + BeautifulSoup4 + lxml (HTML -> CSV) - How to write 3 Columns to MariaDB? - BrandonKastning - Mar-22-2020

(Mar-22-2020, 10:38 AM)ndc85430 Wrote: On line 48, why aren't you extracting the text from allTitle and allOpinion in the same way that you're doing on line 27?

Also, remember to post the full traceback in future, as it contains important info about the error - like the line number it occurs on.

Also avoid using globals and pass the values into your function - the signature was sensible when you originally wrote it; I don't know why you changed that.

ndc85430,

I have since commented out line #48, and line #49 now reads as follows:

        recordTuple = (allTitle[0], allOpinion[0], allURL)

How do I post a full traceback? Is it a command-line switch I can append to my usual "python3 *.py" invocation?

Regarding globals: are you referring to "allURL = url"? Does this qualify as a global variable?

How would I specifically avoid using a global for the use of the function itself?

The current .py looks like the following:

from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("http://law.justia.com/cases/federal/appellate-courts/F2/999/663/308588/")
bsObj = BeautifulSoup(html.read())
allOpinion = bsObj.findAll(id="opinion")
import requests
from bs4 import BeautifulSoup

url = "http://law.justia.com/cases/federal/appellate-courts/F2/999/663/308588/"
allTitle = bsObj.findAll({"title"})
allURL = url

print(allOpinion)
print(allTitle)
print(allURL)

import csv
csvRow = [allOpinion,allTitle,allURL]
csvfile = "current_F2_opinion_with_tags_current.csv"
with open(csvfile, "a") as fp:
    wr = csv.writer(fp, dialect='excel')
    wr.writerow(csvRow)

print(allOpinion[0].get_text(),url)
 
import csv
csvRow = [allOpinion[0].get_text(),allTitle[0].get_text(),allURL]
csvfile = "current_F2_opinion_without_tags_current.csv"
with open(csvfile, "a") as fp:
    wr = csv.writer(fp, dialect='excel')
    wr.writerow(csvRow)


import mysql.connector
from mysql.connector import Error

#def insertVariblesIntoTable(allTitle, allOpinion, allURL):
def insertVariablesIntoTable():
    try:
        connection = mysql.connector.connect(host='localhost',
                                             database='PythonMariaDB1',
                                             user='PythonMariaDB1',
                                             password='password1234')
        cursor = connection.cursor()
        mySql_insert_query = """INSERT INTO Single_No_Loop (all_Title, all_Opinion, all_URL) 
                                VALUES (%s, %s, %s) """

#        recordTuple = (allTitle, allOpinion, allURL)
        recordTuple = (allTitle[0], allOpinion[0], allURL)
        cursor.execute(mySql_insert_query, recordTuple)
        connection.commit()
        print("Record inserted successfully into Single_No_Loop table")

    except mysql.connector.Error as error:
        print("Failed to insert into MySQL table {}".format(error))

    finally:
        if (connection.is_connected()):
            cursor.close()
            connection.close()
            print("MySQL connection is closed")

insertVariablesIntoTable()
The current error is as follows:

Quote:Failed to insert into MySQL table Failed processing format-parameters; Python 'tag' cannot be converted to a MySQL type
MySQL connection is closed



RE: Python3 + BeautifulSoup4 + lxml (HTML -> CSV) - How to write 3 Columns to MariaDB? - ndc85430 - Mar-22-2020

Sigh. Is there a reason you aren't calling get_text on your variables on line 49 now, exactly like you're doing on line 27?

Yes, the variables declared on lines 9-11 are all global. You should declare your function to take parameters, like you had done in post 5 and then pass the values in. You can find much information on the internet about why using globals is bad.

Ah, you're handling the exception. I wasn't reading the code properly, so disregard my comment about the traceback. If you're interested, the standard library traceback module is useful when you want to print tracebacks during exception handling.
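
As a sketch of that last point, traceback.format_exc() returns the full traceback of the exception currently being handled as a string, so an except block can report where the failure happened and not only the message. The function risky_insert below is a hypothetical stand-in that simply raises.

```python
import traceback


def risky_insert():
    """Hypothetical stand-in for a database call that fails."""
    raise ValueError("Python 'tag' cannot be converted")


try:
    risky_insert()
except ValueError as error:
    # The message alone says what failed but not where.
    message_only = "Failed to insert: {}".format(error)
    # format_exc() includes the call stack with file names and line numbers.
    full_traceback = traceback.format_exc()

print(message_only)
print(full_traceback)
```

traceback.print_exc() does the same thing but writes directly to standard error instead of returning a string.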