Bottom Page

Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
 Python3 + BeautifulSoup4 + lxml (HTML -> CSV) - How to write 3 Columns to MariaDB?
#1
Hello Python Web Scraping Coders,

I am trying to determine how to write the 3 columns that I parse from HTML URL and send to a CSV; also / or directly into a MySQL (MariaDB) Database.

I have successfully loaded a working mysql.connector into my existing Python3 script and it connects to my Database Successfully! I have never gotten this far until now.

Below is my code and any tips in the right direction would be greatly appreciated!

Current Python3 Code:

from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("http://law.justia.com/cases/federal/appellate-courts/F2/999/663/308588/")
bsObj = BeautifulSoup(html.read())
allOpinion = bsObj.findAll(id="opinion")
import requests
from bs4 import BeautifulSoup

url = "http://law.justia.com/cases/federal/appellate-courts/F2/999/663/308588/"
allTitle = bsObj.findAll({"title"})
allURL = url

print(allOpinion)
print(allTitle)
print(allURL)

import csv
csvRow = [allOpinion,allTitle,allURL]
csvfile = "current_F2_opinion_with_tags_current.csv"
with open(csvfile, "a") as fp:
    wr = csv.writer(fp, dialect='excel')
    wr.writerow(csvRow)

print(allOpinion[0].get_text(),url)
 
import csv
csvRow = [allOpinion[0].get_text(),allTitle[0].get_text(),allURL]
csvfile = "current_F2_opinion_without_tags_current.csv"
with open(csvfile, "a") as fp:
    wr = csv.writer(fp, dialect='excel')
    wr.writerow(csvRow)


import mysql.connector
from mysql.connector import Error

try:
    connection = mysql.connector.connect(host='localhost',
                                         database='PythonMariaDB1',
                                         user='PythonMariaDB1',
                                         password='password1234')
    if connection.is_connected():
        db_Info = connection.get_server_info()
        print("Connected to MySQL Server version ", db_Info)
        cursor = connection.cursor()
        cursor.execute("select database();")
        record = cursor.fetchone()
        print("You're connected to database: ", record)

except Error as e:
    print("Error while connecting to MySQL", e)
finally:
    if (connection.is_connected()):
        cursor.close()
        connection.close()
        print("MySQL connection is closed")

Thank you!

Best Regards,

Brandon Kastning

P.S. - May your families be safe! God bless!
Quote
#2
Do you know any SQL? If not, you'll want to at least learn the basics of creating tables and inserting rows. Seems like your question is more of an SQL one than a Python one at this point.
BrandonKastning likes this post
Quote
#3
(Mar-22-2020, 07:08 AM)ndc85430 Wrote: Do you know any SQL? If not, you'll want to at least learn the basics of creating tables and inserting rows. Seems like your question is more of an SQL one than a Python one at this point.

ndc85430,

I do know some SQL. I found some descent reference material that I will share below; however it doesn't demonstrate in ways that I understand how to pass the current Python3 variables that I am working with. (allOpinion, allTitle and allURL) to a MySQL database table.

Referring back to the documentation I see that you are right on looking at the code better for MySQL. I am setting up to do an INSERT Python Variable pass off to a MySQL Table.

The Instructions are found here:

The code used to demonstrate this process is below:

import mysql.connector
from mysql.connector import Error

def insertVariblesIntoTable(id, name, price, purchase_date):
    try:
        connection = mysql.connector.connect(host='localhost',
                                             database='Electronics',
                                             user='pynative',
                                             password='pynative@#29')
        cursor = connection.cursor()
        mySql_insert_query = """INSERT INTO Laptop (Id, Name, Price, Purchase_date) 
                                VALUES (%s, %s, %s, %s) """

        recordTuple = (id, name, price, purchase_date)
        cursor.execute(mySql_insert_query, recordTuple)
        connection.commit()
        print("Record inserted successfully into Laptop table")

    except mysql.connector.Error as error:
        print("Failed to insert into MySQL table {}".format(error))

    finally:
        if (connection.is_connected()):
            cursor.close()
            connection.close()
            print("MySQL connection is closed")

insertVariblesIntoTable(2, 'Area 51M', 6999, '2019-04-14')
insertVariblesIntoTable(3, 'MacBook Pro', 2499, '2019-06-20')
I am setting up my Database as follows for this script:

PythonMariaDB1 (Database)
Single_No_Loop (Table)

allTitle - VARCHAR - 250 (Column #1)
allOpinion - LONGTEXT (Column #2)
allURL - VARCHAR - 250 (Column #3)

I will use the above example code as reference and hope I can integrate it properly into this existing one. I will update either way (work or fail).

Thank you for your response sir!

Best Regards,

Brandon Kastning
ndc85430 likes this post
Quote
#4
Good! Trying things out is the best way to learn.
BrandonKastning likes this post
Quote
#5
Update is as follows:

I have setup my MariaDB Database as follows (Couple changes):

Primary Key - opinion_ID / BIGINT / 8 / Unsigned / AUTO_INCREMENT
all_Title - VARCHAR / 250
all_Opinion - LONGTEXT
all_URL - VARCHAR / 250

and Modified the Python3 Variable -> MySQL INSERT to look like the following:

import mysql.connector
from mysql.connector import Error

def insertVariblesIntoTable(allTitle, allOpinion, allURL):
    try:
        connection = mysql.connector.connect(host='localhost',
                                             database='PythonMariaDB1',
                                             user='PythonMariaDB1',
                                             password='password1234')
        cursor = connection.cursor()
        mySql_insert_query = """INSERT INTO Single_No_Loop (all_Title, all_Opinion, all_URL) 
                                VALUES (%s, %s, %s) """

        recordTuple = (allTitle, allOpinion, allURL)
        cursor.execute(mySql_insert_query, recordTuple)
        connection.commit()
        print("Record inserted successfully into Single_No_Loop table")

    except mysql.connector.Error as error:
        print("Failed to insert into MySQL table {}".format(error))

    finally:
        if (connection.is_connected()):
            cursor.close()
            connection.close()
            print("MySQL connection is closed")

insertVariblesIntoTable(2, 'Area 51M', 6999, '2019-04-14')
insertVariblesIntoTable(3, 'MacBook Pro', 2499, '2019-06-20')
What I am unsure of is the bottom lines:

insertVariblesIntoTable(2, 'Area 51M', 6999, '2019-04-14')
insertVariblesIntoTable(3, 'MacBook Pro', 2499, '2019-06-20')
My question is; can I disregard those lines as they do not pertain to my original script that I am implementing this into? Will the above be sufficient or are other modifications required?
Quote
#6
I don't understand the question. You need to call the function somewhere for its code to execute.
Quote
#7
Here is my full HTML2CSV w/ Python3 Variable Passoff -> MySQL INSERT

(However it's not working when it gets to the database) and I left the bottom 2 lines out of the reference code because I cannot figure out how they apply to my variables.

Full .py as of now:

from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("http://law.justia.com/cases/federal/appellate-courts/F2/999/663/308588/")
bsObj = BeautifulSoup(html.read())
allOpinion = bsObj.findAll(id="opinion")
import requests
from bs4 import BeautifulSoup

url = "http://law.justia.com/cases/federal/appellate-courts/F2/999/663/308588/"
allTitle = bsObj.findAll({"title"})
allURL = url

print(allOpinion)
print(allTitle)
print(allURL)

import csv
csvRow = [allOpinion,allTitle,allURL]
csvfile = "current_F2_opinion_with_tags_current.csv"
with open(csvfile, "a") as fp:
    wr = csv.writer(fp, dialect='excel')
    wr.writerow(csvRow)

print(allOpinion[0].get_text(),url)
 
import csv
csvRow = [allOpinion[0].get_text(),allTitle[0].get_text(),allURL]
csvfile = "current_F2_opinion_without_tags_current.csv"
with open(csvfile, "a") as fp:
    wr = csv.writer(fp, dialect='excel')
    wr.writerow(csvRow)


import mysql.connector
from mysql.connector import Error

def insertVariblesIntoTable(allTitle, allOpinion, allURL):
    try:
        connection = mysql.connector.connect(host='localhost',
                                             database='PythonMariaDB1',
                                             user='PythonMariaDB1',
                                             password='password1234')
        cursor = connection.cursor()
        mySql_insert_query = """INSERT INTO Single_No_Loop (all_Title, all_Opinion, all_URL) 
                                VALUES (%s, %s, %s) """

        recordTuple = (allTitle, allOpinion, allURL)
        cursor.execute(mySql_insert_query, recordTuple)
        connection.commit()
        print("Record inserted successfully into Single_No_Loop table")

    except mysql.connector.Error as error:
        print("Failed to insert into MySQL table {}".format(error))

    finally:
        if (connection.is_connected()):
            cursor.close()
            connection.close()
            print("MySQL connection is closed")

Just a heads up:

The Python3 Script Variables:

- allOpinion
- allTitle
- allURL

The MySQL Database Column Names:

- opinion_ID
- all_Title
- all_Opinion
- all_URL

How would I go about debugging this? Are you able to determine the error by looking at this post?
Quote
#8
What happens when you run the program? Are you getting errors or something? I don't see anywhere you're calling the insertVariablesIntoTable function.
BrandonKastning likes this post
Quote
#9
(Mar-22-2020, 08:49 AM)ndc85430 Wrote: What happens when you run the program? Are you getting errors or something? I don't see anywhere you're calling the insertVariablesIntoTable function.

ndc85430,

When I run the program it runs without error. The CSV files populate and all is well. No mention of MySQL connection closing and the Database doesn't populate.

You mention that you do not see my program calling the "insertVariablesIntoTable" function. Same function that was on the last 2 lines that I didn't understand.

Since that's the case; how would I go about using "insertVariablesIntoTable" to generate the same data CSV is getting from those main variables?

Thanks ndc85430!
Quote
#10
I'm confused. You wrote a function that you don't understand? It looks like you have the data in the variables allOpinion, allTitle and allURL, so why can't you call your function, passing in the values?
BrandonKastning likes this post
Quote

Top Page

Possibly Related Threads...
Thread Author Replies Views Last Post
  Python3 + BeautifulSoup4 + lxml (HTML -> CSV) - How to loop to next HTML/new CSV Row BrandonKastning 0 79 Mar-22-2020, 06:10 AM
Last Post: BrandonKastning
  BeautifulSoup4 plugin help Lathem01 2 174 Feb-16-2020, 11:56 AM
Last Post: snippsat
  Cannot import BeautifulSoup4 as bs4 and BeautifulSoup4 not in same directory B5473829 1 713 Jul-05-2019, 06:55 PM
Last Post: snippsat
  BeautifulSoup4, How to get an HTML tag with specific class. Broadsworde 6 2,842 Nov-22-2018, 05:25 PM
Last Post: snippsat
  How to use BeautifulSoup4 with pandas series type of html data? PrateekG 4 1,562 Apr-26-2018, 07:33 AM
Last Post: PrateekG
  What's a good practice project for learning BeautifulSoup4, which has a real use case league55 2 948 Jan-27-2018, 11:29 PM
Last Post: league55

Forum Jump:


Users browsing this thread: 1 Guest(s)