Python Forum
Python3 + BeautifulSoup4 + lxml (HTML -> CSV) - How to write 3 Columns to MariaDB?
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Python3 + BeautifulSoup4 + lxml (HTML -> CSV) - How to write 3 Columns to MariaDB?
#1
Hello Python Web Scraping Coders,

I am trying to determine how to write the 3 columns that I parse from HTML URL and send to a CSV; also / or directly into a MySQL (MariaDB) Database.

I have successfully loaded a working mysql.connector into my existing Python3 script and it connects to my Database Successfully! I have never gotten this far until now.

Below is my code and any tips in the right direction would be greatly appreciated!

Current Python3 Code:

from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("http://law.justia.com/cases/federal/appellate-courts/F2/999/663/308588/")
bsObj = BeautifulSoup(html.read())
allOpinion = bsObj.findAll(id="opinion")
import requests
from bs4 import BeautifulSoup

url = "http://law.justia.com/cases/federal/appellate-courts/F2/999/663/308588/"
allTitle = bsObj.findAll({"title"})
allURL = url

print(allOpinion)
print(allTitle)
print(allURL)

import csv
csvRow = [allOpinion,allTitle,allURL]
csvfile = "current_F2_opinion_with_tags_current.csv"
with open(csvfile, "a") as fp:
    wr = csv.writer(fp, dialect='excel')
    wr.writerow(csvRow)

print(allOpinion[0].get_text(),url)
 
import csv
csvRow = [allOpinion[0].get_text(),allTitle[0].get_text(),allURL]
csvfile = "current_F2_opinion_without_tags_current.csv"
with open(csvfile, "a") as fp:
    wr = csv.writer(fp, dialect='excel')
    wr.writerow(csvRow)


import mysql.connector
from mysql.connector import Error

try:
    connection = mysql.connector.connect(host='localhost',
                                         database='PythonMariaDB1',
                                         user='PythonMariaDB1',
                                         password='password1234')
    if connection.is_connected():
        db_Info = connection.get_server_info()
        print("Connected to MySQL Server version ", db_Info)
        cursor = connection.cursor()
        cursor.execute("select database();")
        record = cursor.fetchone()
        print("You're connected to database: ", record)

except Error as e:
    print("Error while connecting to MySQL", e)
finally:
    if (connection.is_connected()):
        cursor.close()
        connection.close()
        print("MySQL connection is closed")
Thank you!

Best Regards,

Brandon Kastning

P.S. - May your families be safe! God bless!
“And one of the elders saith unto me, Weep not: behold, the Lion of the tribe of Juda, the Root of David, hath prevailed to open the book,...” - Revelation 5:5 (KJV)

“And oppress not the widow, nor the fatherless, the stranger, nor the poor; and ...” - Zechariah 7:10 (KJV)

#LetHISPeopleGo

Reply
#2
Do you know any SQL? If not, you'll want to at least learn the basics of creating tables and inserting rows. Seems like your question is more of an SQL one than a Python one at this point.
Reply
#3
(Mar-22-2020, 07:08 AM)ndc85430 Wrote: Do you know any SQL? If not, you'll want to at least learn the basics of creating tables and inserting rows. Seems like your question is more of an SQL one than a Python one at this point.

ndc85430,

I do know some SQL. I found some descent reference material that I will share below; however it doesn't demonstrate in ways that I understand how to pass the current Python3 variables that I am working with. (allOpinion, allTitle and allURL) to a MySQL database table.

Referring back to the documentation I see that you are right on looking at the code better for MySQL. I am setting up to do an INSERT Python Variable pass off to a MySQL Table.

The Instructions are found here:

The code used to demonstrate this process is below:

import mysql.connector
from mysql.connector import Error

def insertVariblesIntoTable(id, name, price, purchase_date):
    try:
        connection = mysql.connector.connect(host='localhost',
                                             database='Electronics',
                                             user='pynative',
                                             password='pynative@#29')
        cursor = connection.cursor()
        mySql_insert_query = """INSERT INTO Laptop (Id, Name, Price, Purchase_date) 
                                VALUES (%s, %s, %s, %s) """

        recordTuple = (id, name, price, purchase_date)
        cursor.execute(mySql_insert_query, recordTuple)
        connection.commit()
        print("Record inserted successfully into Laptop table")

    except mysql.connector.Error as error:
        print("Failed to insert into MySQL table {}".format(error))

    finally:
        if (connection.is_connected()):
            cursor.close()
            connection.close()
            print("MySQL connection is closed")

insertVariblesIntoTable(2, 'Area 51M', 6999, '2019-04-14')
insertVariblesIntoTable(3, 'MacBook Pro', 2499, '2019-06-20')
I am setting up my Database as follows for this script:

PythonMariaDB1 (Database)
Single_No_Loop (Table)

allTitle - VARCHAR - 250 (Column #1)
allOpinion - LONGTEXT (Column #2)
allURL - VARCHAR - 250 (Column #3)

I will use the above example code as reference and hope I can integrate it properly into this existing one. I will update either way (work or fail).

Thank you for your response sir!

Best Regards,

Brandon Kastning
“And one of the elders saith unto me, Weep not: behold, the Lion of the tribe of Juda, the Root of David, hath prevailed to open the book,...” - Revelation 5:5 (KJV)

“And oppress not the widow, nor the fatherless, the stranger, nor the poor; and ...” - Zechariah 7:10 (KJV)

#LetHISPeopleGo

Reply
#4
Good! Trying things out is the best way to learn.
Reply
#5
Update is as follows:

I have setup my MariaDB Database as follows (Couple changes):

Primary Key - opinion_ID / BIGINT / 8 / Unsigned / AUTO_INCREMENT
all_Title - VARCHAR / 250
all_Opinion - LONGTEXT
all_URL - VARCHAR / 250

and Modified the Python3 Variable -> MySQL INSERT to look like the following:

import mysql.connector
from mysql.connector import Error

def insertVariblesIntoTable(allTitle, allOpinion, allURL):
    try:
        connection = mysql.connector.connect(host='localhost',
                                             database='PythonMariaDB1',
                                             user='PythonMariaDB1',
                                             password='password1234')
        cursor = connection.cursor()
        mySql_insert_query = """INSERT INTO Single_No_Loop (all_Title, all_Opinion, all_URL) 
                                VALUES (%s, %s, %s) """

        recordTuple = (allTitle, allOpinion, allURL)
        cursor.execute(mySql_insert_query, recordTuple)
        connection.commit()
        print("Record inserted successfully into Single_No_Loop table")

    except mysql.connector.Error as error:
        print("Failed to insert into MySQL table {}".format(error))

    finally:
        if (connection.is_connected()):
            cursor.close()
            connection.close()
            print("MySQL connection is closed")

insertVariblesIntoTable(2, 'Area 51M', 6999, '2019-04-14')
insertVariblesIntoTable(3, 'MacBook Pro', 2499, '2019-06-20')
What I am unsure of is the bottom lines:

insertVariblesIntoTable(2, 'Area 51M', 6999, '2019-04-14')
insertVariblesIntoTable(3, 'MacBook Pro', 2499, '2019-06-20')
My question is; can I disregard those lines as they do not pertain to my original script that I am implementing this into? Will the above be sufficient or are other modifications required?
“And one of the elders saith unto me, Weep not: behold, the Lion of the tribe of Juda, the Root of David, hath prevailed to open the book,...” - Revelation 5:5 (KJV)

“And oppress not the widow, nor the fatherless, the stranger, nor the poor; and ...” - Zechariah 7:10 (KJV)

#LetHISPeopleGo

Reply
#6
I don't understand the question. You need to call the function somewhere for its code to execute.
Reply
#7
Here is my full HTML2CSV w/ Python3 Variable Passoff -> MySQL INSERT

(However it's not working when it gets to the database) and I left the bottom 2 lines out of the reference code because I cannot figure out how they apply to my variables.

Full .py as of now:

from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("http://law.justia.com/cases/federal/appellate-courts/F2/999/663/308588/")
bsObj = BeautifulSoup(html.read())
allOpinion = bsObj.findAll(id="opinion")
import requests
from bs4 import BeautifulSoup

url = "http://law.justia.com/cases/federal/appellate-courts/F2/999/663/308588/"
allTitle = bsObj.findAll({"title"})
allURL = url

print(allOpinion)
print(allTitle)
print(allURL)

import csv
csvRow = [allOpinion,allTitle,allURL]
csvfile = "current_F2_opinion_with_tags_current.csv"
with open(csvfile, "a") as fp:
    wr = csv.writer(fp, dialect='excel')
    wr.writerow(csvRow)

print(allOpinion[0].get_text(),url)
 
import csv
csvRow = [allOpinion[0].get_text(),allTitle[0].get_text(),allURL]
csvfile = "current_F2_opinion_without_tags_current.csv"
with open(csvfile, "a") as fp:
    wr = csv.writer(fp, dialect='excel')
    wr.writerow(csvRow)


import mysql.connector
from mysql.connector import Error

def insertVariblesIntoTable(allTitle, allOpinion, allURL):
    try:
        connection = mysql.connector.connect(host='localhost',
                                             database='PythonMariaDB1',
                                             user='PythonMariaDB1',
                                             password='password1234')
        cursor = connection.cursor()
        mySql_insert_query = """INSERT INTO Single_No_Loop (all_Title, all_Opinion, all_URL) 
                                VALUES (%s, %s, %s) """

        recordTuple = (allTitle, allOpinion, allURL)
        cursor.execute(mySql_insert_query, recordTuple)
        connection.commit()
        print("Record inserted successfully into Single_No_Loop table")

    except mysql.connector.Error as error:
        print("Failed to insert into MySQL table {}".format(error))

    finally:
        if (connection.is_connected()):
            cursor.close()
            connection.close()
            print("MySQL connection is closed")
Just a heads up:

The Python3 Script Variables:

- allOpinion
- allTitle
- allURL

The MySQL Database Column Names:

- opinion_ID
- all_Title
- all_Opinion
- all_URL

How would I go about debugging this? Are you able to determine the error by looking at this post?
“And one of the elders saith unto me, Weep not: behold, the Lion of the tribe of Juda, the Root of David, hath prevailed to open the book,...” - Revelation 5:5 (KJV)

“And oppress not the widow, nor the fatherless, the stranger, nor the poor; and ...” - Zechariah 7:10 (KJV)

#LetHISPeopleGo

Reply
#8
What happens when you run the program? Are you getting errors or something? I don't see anywhere you're calling the insertVariablesIntoTable function.
Reply
#9
(Mar-22-2020, 08:49 AM)ndc85430 Wrote: What happens when you run the program? Are you getting errors or something? I don't see anywhere you're calling the insertVariablesIntoTable function.

ndc85430,

When I run the program it runs without error. The CSV files populate and all is well. No mention of MySQL connection closing and the Database doesn't populate.

You mention that you do not see my program calling the "insertVariablesIntoTable" function. Same function that was on the last 2 lines that I didn't understand.

Since that's the case; how would I go about using "insertVariablesIntoTable" to generate the same data CSV is getting from those main variables?

Thanks ndc85430!
“And one of the elders saith unto me, Weep not: behold, the Lion of the tribe of Juda, the Root of David, hath prevailed to open the book,...” - Revelation 5:5 (KJV)

“And oppress not the widow, nor the fatherless, the stranger, nor the poor; and ...” - Zechariah 7:10 (KJV)

#LetHISPeopleGo

Reply
#10
I'm confused. You wrote a function that you don't understand? It looks like you have the data in the variables allOpinion, allTitle and allURL, so why can't you call your function, passing in the values?
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
Question Python Obstacles | Jeet-Kune-Do | BS4 (Tags > MariaDB) [URL/Local HTML] BrandonKastning 0 1,400 Feb-08-2022, 08:55 PM
Last Post: BrandonKastning
  Beautifulsoup4 help samuelbachorik 1 1,326 Feb-05-2022, 10:44 PM
Last Post: snippsat
Question Securing State Constitutions (USA) from University of Maryland > MariaDB .sql BrandonKastning 1 1,495 Jan-21-2022, 06:34 PM
Last Post: BrandonKastning
Question Scraping Columns with Pandas (Column Entries w/ more than 1 word writes two columns) BrandonKastning 7 3,103 Jan-13-2022, 10:52 PM
Last Post: BrandonKastning
Exclamation Debian 10 Buster Environment - Python 3.x (MariaDB 10.4.21) | Working Connector? BrandonKastning 9 4,117 Jan-04-2022, 08:27 PM
Last Post: BrandonKastning
Lightbulb Python Obstacles | Kung-Fu | Full File HTML Document Scrape and Store it in MariaDB BrandonKastning 5 2,817 Dec-29-2021, 02:26 AM
Last Post: BrandonKastning
  Python Obstacles | Karate | HTML/Scrape Specific Tag and Store it in MariaDB BrandonKastning 8 3,090 Nov-22-2021, 01:38 AM
Last Post: BrandonKastning
  cleaning HTML pages using lxml and XPath wenkos 2 2,318 Aug-25-2021, 10:54 AM
Last Post: wenkos
  HTML multi select HTML listbox with Flask/Python rfeyer 0 4,529 Mar-14-2021, 12:23 PM
Last Post: rfeyer
  Build a simple Webapp with Python Flask and mariaDB newbie1 3 3,317 Jun-04-2020, 09:34 PM
Last Post: lmolter54

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020