Python Forum
Python Obstacles | Karate | HTML/Scrape Specific Tag and Store it in MariaDB
#1

I figure the best way to learn is to make a thread for every obstacle I am up against. These may be basics for others, but they are difficult for someone like me.

I will be using Beautiful Soup 4 to tackle this. I will scrape basic HTML and then once successful move up to a dataset of SGML files as difficulty increases.

There are two types of data inserts to MariaDB that I want to learn:

a) Single-column insert: read 1 target HTML tag and carry it over with 1 INSERT into a specific table and column.

b) Row-by-row inserts into a new, unique table representing the file (each row holds 1 line of the scraped file, or 1 desired value).

Plus both a and b again with full tag extraction and carry-over INSERTs (i.e. the actual tags with the contents inside them), for fetching from the database and building HTML pages.
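A minimal sketch of the row-by-row idea in (b), assuming a hypothetical table `KarateLines` with a single text column `line_text` (neither name comes from this thread). It only prepares the parameter rows; the insert itself takes any open pymysql connection and uses the cursor's `executemany()`.

```python
# Hypothetical table and column names, used only for illustration.
INSERT_SQL = "INSERT INTO `KarateLines` (`line_text`) VALUES (%s)"


def rows_from_text(text):
    """Turn scraped text into one parameter tuple per non-empty line."""
    return [(line.strip(),) for line in text.splitlines() if line.strip()]


def insert_rows(connection, rows):
    """Insert every row with one executemany() call, then commit."""
    with connection.cursor() as cursor:
        cursor.executemany(INSERT_SQL, rows)
    connection.commit()


sample = "We the People\n\nof the United States\n"
rows = rows_from_text(sample)
print(rows)  # [('We the People',), ('of the United States',)]
```

Each tuple becomes one row, so a scraped file of N non-empty lines would produce N rows in the table.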

I am a Disabled American Constitutional Law Student... most of my threads will be related to Law, Court Opinions, Court Rules, etc. (.gov) (American Government web URL's) or 3rd Party Resources for Legal Resources.

Thank you all for having me!

Best Regards,

Brandon
Progress so far:

01_Karate:

Source Blogs/Tutorials:

https://www.geeksforgeeks.org/beautifuls...from-html/


Goal: Scrape specific HTML

Target URL: https://law.justia.com/constitution/us/preamble.html

Target Paragraph: "Preamble"

Target Datastore: MariaDB

As a regular user (Linux/BSD; in my case: brandon):

# pip install bs4

# pip install urllib (this fails for me on Debian 9.13 Stretch w/ Python 3.9.9; it is not needed, because urllib is part of Python's standard library)
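A quick way to confirm that nothing needs installing for urllib: it is part of Python's standard library, so the check below should succeed on a plain Python 3 install. It uses only `importlib` and is a sketch, not a required step.

```python
import importlib.util

# urllib.request ships with Python 3, so a module spec should be found
# without any pip install.
spec = importlib.util.find_spec("urllib.request")
urllib_available = spec is not None
print("urllib.request available:", urllib_available)
```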
and then Code so far:

# importing modules
import urllib.request 
from bs4 import BeautifulSoup
  
# providing url
#url = "https://www.geeksforgeeks.org/how-to-automate-an-excel-sheet-in-python/?ref=feed"
url = "https://law.justia.com/constitution/us/preamble.html" 

# opening the url for reading
html = urllib.request.urlopen(url)
  
# parsing the html file
htmlParse = BeautifulSoup(html, 'html.parser')
  
# getting all the paragraphs
for para in htmlParse.find_all("p"):
    print(para.get_text())
Results so far:

This pulls all the paragraph tags, using htmlParse.find_all("p"). I am guessing there is also an htmlParse.find("p"), hopefully with the ability to select the 1st paragraph, 2nd paragraph, etc. using [1], [2], [3] type element selectors.

And no pass-through of data to a datastore yet!
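On the guess above: `find("p")` does exist and returns only the first match, while `find_all("p")` returns a list-like result that can be indexed. Indexes start at 0, so the 1st paragraph is `[0]`. A small sketch on a made-up HTML string (not the Justia page):

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for a real page.
html = "<html><body><p>First</p><p>Second</p><p>Third</p></body></html>"
soup = BeautifulSoup(html, "html.parser")

first = soup.find("p")            # first <p> tag only
paragraphs = soup.find_all("p")   # every <p> tag, in document order

print(first.get_text())           # First
print(paragraphs[0].get_text())   # First (indexes start at 0)
print(paragraphs[1].get_text())   # Second
print(len(paragraphs))            # 3
```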
Update with Partial Success:

Source Documentation: https://www.crummy.com/software/BeautifulSoup/bs4/doc/

Updated Karate.py:

# importing modules
import urllib.request 
from bs4 import BeautifulSoup
  
# providing url
#url = "https://www.geeksforgeeks.org/how-to-automate-an-excel-sheet-in-python/?ref=feed"
url = "https://law.justia.com/constitution/us/preamble.html" 

# opening the url for reading
html = urllib.request.urlopen(url)
  
# parsing the html file
htmlParse = BeautifulSoup(html, 'html.parser')
  
# getting all the paragraphs
#for para in htmlParse.find_all("p"):
#    print(para.get_text())
# htmlParse.p is the first <p> tag; looping over it yields its child nodes
for para in htmlParse.p:
    print(para)
Now it's extracting the 1st paragraph (the one I wanted): the Preamble of the U.S. Federal Constitution of September 17, 1787 (the same Constitution that is the Supreme Law of the Land, and that my countrymen and women have deviated from in error).

I would not know how to extract paragraph #2 using this code. I just know that changing the code to htmlParse.p is now picking up the 1st Paragraph. I would like to know how to individually select paragraph #2 so I better understand what I am doing.
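Some background on why this works, as a sketch on a made-up HTML string: `htmlParse.p` is shorthand for `htmlParse.find("p")`, so it is the first `<p>` tag, and looping over a tag walks its child nodes rather than the other paragraphs. One way to reach paragraph #2 from there is `find_next_sibling("p")`, assuming both paragraphs share the same parent element.

```python
from bs4 import BeautifulSoup

# Made-up HTML: two paragraphs with a heading between them.
html = "<div><p>We the <b>People</b></p><h2>Purpose</h2><p>Second one</p></div>"
soup = BeautifulSoup(html, "html.parser")

first = soup.p  # same as soup.find("p")

# Looping over a tag yields its children: text nodes and nested tags.
children = [str(child) for child in first]
print(children)  # ['We the ', '<b>People</b>']

# The next <p> at the same level, skipping the <h2> in between.
second = first.find_next_sibling("p")
print(second.get_text())  # Second one
```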

We are now here:

brandon@FireDragon:~/Python/01_Karate$ python3 karate.py
We the People of the United States, in Order to form a more perfect Union, establish Justice, insure domestic Tranquility, provide for the common defence, promote the general Welfare, and secure the Blessings of Liberty to ourselves and our Posterity, do ordain and establish this Constitution for the United States of America.
brandon@FireDragon:~/Python/01_Karate$

Update/Progress:

Sources (Tutorials/Blogs/etc):

https://stackoverflow.com/questions/4954...into-mysql

Install pymysql :

brandon@FireDragon:~/Python/01_Karate$ pip install pymysql
Defaulting to user installation because normal site-packages is not writeable
Collecting pymysql
  Downloading PyMySQL-1.0.2-py3-none-any.whl (43 kB)
     |████████████████████████████████| 43 kB 251 kB/s 
Installing collected packages: pymysql
Successfully installed pymysql-1.0.2
WARNING: You are using pip version 21.2.4; however, version 21.3.1 is available.
You should consider upgrading via the '/usr/local/bin/python3.9 -m pip install --upgrade pip' command.
brandon@FireDragon:~/Python/01_Karate$
Create a database called Battle_Python1 in MariaDB (10.4.22 is my version, anyhow). I couldn't get the following code to work; if you know why, please let me know. I currently use HeidiSQL Portable 9.5.0.5196 in Wine32 on Debian 9.13 Stretch. It is a very nice, free and lightweight GUI that manages MySQL/MariaDB, and it has not crashed under the heavy tests I have put it through so far in my pursuit of Big Data creation and management skills.

CREATE TABLE `Karate` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `bs4_paragraph_1` text COLLATE utf8mb4_unicode_ci NULL,
  `bs4_python_timestamp` timestamp COLLATE utf8_unicode_ci CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`)
  ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8_unicode_ci
  AUTO_INCREMENT=1;
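Two parts of the statement above look likely to be rejected by MariaDB: the timestamp column carries a COLLATE clause and a bare CURRENT_TIMESTAMP with no DEFAULT keyword, and the table pairs CHARSET=utf8mb4 with the utf8 collation utf8_unicode_ci. The sketch below holds a version with those two things changed and runs it through pymysql. `create_karate_table` is a hypothetical helper name, and the statement is untested against this exact server setup.

```python
# Same table as above with two changes (both are assumptions about what
# MariaDB objected to): DEFAULT CURRENT_TIMESTAMP without a COLLATE on the
# timestamp column, and a utf8mb4 collation to match CHARSET=utf8mb4.
CREATE_KARATE = """
CREATE TABLE IF NOT EXISTS `Karate` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `bs4_paragraph_1` text COLLATE utf8mb4_unicode_ci NULL,
  `bs4_python_timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
"""


def create_karate_table(connection):
    """Run the CREATE TABLE on an already open pymysql connection."""
    with connection.cursor() as cursor:
        cursor.execute(CREATE_KARATE)
    connection.commit()


# Usage (needs a running server and real credentials):
#   connection = pymysql.connect(host='localhost', user='brandon',
#                                password='password', db='Battle_Python1')
#   create_karate_table(connection)
print(CREATE_KARATE.strip().splitlines()[0])
```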
Screenshots of my database built in HeidiSQL (I took a standard CREATE TABLE example and changed the values to match the ones I have set up). I should eventually hone my manual SQL query skills. I ran the query on Karate2 and it didn't do anything; I am not sure why it failed.

#1:

[Image: 1-2021-11-20-08-35-17.png]

#2:

[Image: 2-2021-11-20-08-36-20.png]


#3 (I added a new column and renamed another; this is an update after posting the above 2 links):


[Image: 3-2021-11-20-09-42-24.png]

Now to update our Python script, Karate.py:

# importing modules
import urllib.request 
import pymysql
from bs4 import BeautifulSoup
  
# providing url
#url = "https://www.geeksforgeeks.org/how-to-automate-an-excel-sheet-in-python/?ref=feed"
url = "https://law.justia.com/constitution/us/preamble.html" 

# opening the url for reading
html = urllib.request.urlopen(url)
  
# parsing the html file
htmlParse = BeautifulSoup(html, 'html.parser')
  
# getting all the paragraphs
#for para in htmlParse.find_all("p"):
#    print(para.get_text())
for para in htmlParse.p:
    print(para)

# Connection to database
# I use the utf8mb4_unicode_ci collation when creating a new MariaDB database; I have read that it allows special characters and fulltext search together
connection = pymysql.connect(host='localhost',
                 user='brandon',
                 password='password',
                 db='Battle_Python1',
#                 charset='latin1',
                 charset='utf8mb4',
                 cursorclass=pymysql.cursors.DictCursor)

# Checking Code / Error Free
print ("The code is Error free to this line!")
Continuing on...


Let's run the update of Karate.py:

brandon@FireDragon:~/Python/01_Karate$ python3 karate.py
We the People of the United States, in Order to form a more perfect Union, establish Justice, insure domestic Tranquility, provide for the common defence, promote the general Welfare, and secure the Blessings of Liberty to ourselves and our Posterity, do ordain and establish this Constitution for the United States of America.
The code is Error free to this line!
brandon@FireDragon:~/Python/01_Karate$
“And one of the elders saith unto me, Weep not: behold, the Lion of the tribe of Juda, the Root of David, hath prevailed to open the book,...” - Revelation 5:5 (KJV)

“And oppress not the widow, nor the fatherless, the stranger, nor the poor; and ...” - Zechariah 7:10 (KJV)

#LetHISPeopleGo

#2
Update / Progress:

Source (Documentation): https://pymysql.readthedocs.io/en/latest...mples.html

- Add import pymysql.cursors to Karate.py
- Add the following code that does not work, throws an error; yet has potential if remedied!

#Added
import pymysql.cursors
#Added
with connection:
    with connection.cursor() as cursor:
        # Attempt to pass over the data from para (the defined variable that holds the beautiful soup 4 scraped paragraph #1 tag) to MariaDB using pymysql
       # sql = "INSERT INTO `users` (`email`, `password`) VALUES (%s, %s)"
#        sql = "INSERT INTO `Karate` (`bs4_paragraph_1_text_only`) VALUES (%s)"
        sql = "INSERT INTO `Karate` (`bs4_paragraph_1_text_only) VALUES (%s)"
        cursor.execute(sql, ('bs4_paragraph_1_text_only'))

    # connection is not autocommit by default. So you must commit to save
    # your changes.
    connection.commit()
I do not know what is causing this problem! I am getting the following errors:

brandon@FireDragon:~/Python/01_Karate$ python3 karate.py
We the People of the United States, in Order to form a more perfect Union, establish Justice, insure domestic Tranquility, provide for the common defence, promote the general Welfare, and secure the Blessings of Liberty to ourselves and our Posterity, do ordain and establish this Constitution for the United States of America.
Traceback (most recent call last):
  File "karate.py", line 43, in <module>
    cursor.execute(sql, ('bs4_paragraph_1_text_only'))
  File "/usr/lib/python3/dist-packages/pymysql/cursors.py", line 166, in execute
    result = self._query(query)
  File "/usr/lib/python3/dist-packages/pymysql/cursors.py", line 322, in _query
    conn.query(q)
  File "/usr/lib/python3/dist-packages/pymysql/connections.py", line 852, in query
    self._affected_rows = self._read_query_result(unbuffered=unbuffered)
  File "/usr/lib/python3/dist-packages/pymysql/connections.py", line 1053, in _read_query_result
    result.read()
  File "/usr/lib/python3/dist-packages/pymysql/connections.py", line 1336, in read
    first_packet = self.connection._read_packet()
  File "/usr/lib/python3/dist-packages/pymysql/connections.py", line 1010, in _read_packet
    packet.check_error()
  File "/usr/lib/python3/dist-packages/pymysql/connections.py", line 393, in check_error
    err.raise_mysql_exception(self._data)
  File "/usr/lib/python3/dist-packages/pymysql/err.py", line 107, in raise_mysql_exception
    raise errorclass(errno, errval)
pymysql.err.ProgrammingError: (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near '`bs4_paragraph_1_text_only) VALUES ('bs4_paragraph_1_text_only')' at line 1")
brandon@FireDragon:~/Python/01_Karate$
What could possibly be wrong with my syntax?
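Reading the error message closely, the part MariaDB quotes starts at the column name, which opens with a backtick that is never closed. Separately, the value passed is the literal string 'bs4_paragraph_1_text_only' rather than the scraped text, and ('x') without a trailing comma is a plain string, not a tuple. The sketch below only illustrates those points on plain strings; `backticks_balanced` is a hypothetical helper, not part of pymysql.

```python
broken_sql = "INSERT INTO `Karate` (`bs4_paragraph_1_text_only) VALUES (%s)"
fixed_sql = "INSERT INTO `Karate` (`bs4_paragraph_1_text_only`) VALUES (%s)"


def backticks_balanced(sql):
    """Every quoted identifier needs an opening and a closing backtick."""
    return sql.count("`") % 2 == 0


print(backticks_balanced(broken_sql))  # False
print(backticks_balanced(fixed_sql))   # True

# Parentheses alone do not make a tuple; the trailing comma does.
not_a_tuple = ('bs4_paragraph_1_text_only')
one_tuple = ('bs4_paragraph_1_text_only',)
print(type(not_a_tuple).__name__)  # str
print(type(one_tuple).__name__)    # tuple
```

With the backtick closed, the statement should parse; to store the paragraph itself, the second argument to `cursor.execute()` would be the variable holding the scraped text instead of a quoted name.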
#3
After troubleshooting, I was able to come up with working code. Moving forward!

I assigned the scraped paragraph (para) to a variable named after the column I am inserting into, updated the insert code, and it all works!

Source: (Blogs/Tutorials):

https://www.quora.com/How-do-I-insert-sc...ing-Python

Working code of karate.py:

# importing modules
import urllib.request 
import pymysql
import pymysql.cursors
from bs4 import BeautifulSoup
  
# providing url
#url = "https://www.geeksforgeeks.org/how-to-automate-an-excel-sheet-in-python/?ref=feed"
url = "https://law.justia.com/constitution/us/preamble.html" 

# opening the url for reading
html = urllib.request.urlopen(url)
  
# parsing the html file
htmlParse = BeautifulSoup(html, 'html.parser')
  
# getting all the paragraphs
#for para in htmlParse.find_all("p"):
#    print(para.get_text())
for para in htmlParse.p:
    print(para)


# Connection to database
# I use the utf8mb4_unicode_ci collation when creating a new MariaDB database; I have read that it allows special characters and fulltext search together
connection = pymysql.connect(host='localhost',
                 user='brandon',
                 password='password',
                 db='Battle_Python1',
#                 charset='latin1',
                 charset='utf8mb4',
                 cursorclass=pymysql.cursors.DictCursor)


# BeautifulSoup4 tag text grab + pymysql mariadb insert

bs4_paragraph_1_text_only = para

try:
    with connection.cursor() as cursor:
        sql = "INSERT INTO `Karate` (`bs4_paragraph_1_text_only`) VALUES (%s)"
        # the trailing comma makes the parameters a one-element tuple
        cursor.execute(sql, (bs4_paragraph_1_text_only,))
    connection.commit()
finally:
    connection.close()

# Checking Code / Error Free
print ("The code is Error free to this line!")
Next up: grabbing the text WITH the tags and storing it in MariaDB (Python 3.9.9 w/ pymysql).

Working Output:

brandon@FireDragon:~/Python/01_Karate$ python karate.py
We the People of the United States, in Order to form a more perfect Union, establish Justice, insure domestic Tranquility, provide for the common defence, promote the general Welfare, and secure the Blessings of Liberty to ourselves and our Posterity, do ordain and establish this Constitution for the United States of America.
The code is Error free to this line!
brandon@FireDragon:~/Python/01_Karate$
MariaDB Evidence:

[Image: 4-2021-11-20-14-57-43.png]
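For the next step (storing the text with its tags), a sketch of the difference on a made-up HTML string: `get_text()` returns only the text, while `str()` on a tag returns the markup including the tag itself. Converting to `str` before handing the value to pymysql keeps the parameter a plain string.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for the scraped page.
html = '<div><p class="intro">We the <em>People</em></p></div>'
soup = BeautifulSoup(html, "html.parser")
tag = soup.find("p")

text_only = tag.get_text()            # tags stripped
text_with_tag = str(tag)              # markup including the <p> itself
inner_markup = tag.decode_contents()  # markup inside the <p> only

print(text_only)      # We the People
print(text_with_tag)  # <p class="intro">We the <em>People</em></p>
print(inner_markup)   # We the <em>People</em>
```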

Thank you everyone for this Forum! :)
#4
Good progress on the update👍

Here is some advice on the parsing part: use Requests to fetch the page and lxml as the parser.
Requests is generally a better choice than urllib (it is more up to date for the modern web) and can save you problems later, e.g. with encoding (usually utf-8).
lxml is also the fastest parser BeautifulSoup can use.
In the example, find_all('p') would return unwanted results, so I go in with a CSS selector to get the paragraph I want.
import requests
from bs4 import BeautifulSoup

url = "https://law.justia.com/constitution/us/preamble.html"
response = requests.get(url)
soup = BeautifulSoup(response.content, 'lxml')
# find() returns only the first matching tag
print(soup.find("p").get_text())
print('-' * 30)
# BS has select() and select_one() for choosing elements with CSS selectors
purpose = soup.select_one('div.us-constitution > p:nth-child(5)')
print(purpose.text)
Output:
We the People of the United States, in Order to form a more perfect Union, establish Justice, insure domestic Tranquility, provide for the common defence, promote the general Welfare, and secure the Blessings of Liberty to ourselves and our Posterity, do ordain and establish this Constitution for the United States of America.
------------------------------
Although the preamble is not a source of power for any department of the Federal Government,1 the Supreme Court has often referred to it as evidence of the origin, scope, and purpose of the Constitution.2 “Its true office,” wrote Joseph Story in his Commentaries, “is to expound the nature and extent and application of the powers actually conferred by the Constitution, and not substantively to create them. For example, the preamble declares one object to be, ‘provide for the common defense.’ No one can doubt that this does not enlarge the powers of Congress to pass any measures which they deem useful for the common defence. But suppose the terms of a given power admit of two constructions, the one more restrictive, the other more liberal, and each of them is consistent with the words, but is, and ought to be, governed by the intent of the power; if one could promote and the other defeat the common defence, ought not the former, upon the soundest principles of interpretation, to be adopted?”3
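A note on the selector, sketched on a made-up HTML string: `:nth-child(n)` counts every child element of the parent (headings included), which is why the wanted paragraph can be child 5 without being the 5th `<p>`. `:nth-of-type(n)` counts only elements with the same tag name. The class name is copied from the selector above; the rest of the markup is invented.

```python
from bs4 import BeautifulSoup

# Invented markup; only the class name comes from the selector above.
html = (
    '<div class="us-constitution">'
    "<h1>Preamble</h1><p>First</p><h2>Purpose</h2><p>Second</p>"
    "</div>"
)
soup = BeautifulSoup(html, "html.parser")

# Child elements in order: h1 (1), p (2), h2 (3), p (4).
by_child = soup.select_one("div.us-constitution > p:nth-child(4)")
by_type = soup.select_one("div.us-constitution > p:nth-of-type(2)")
missing = soup.select_one("div.us-constitution > p:nth-child(3)")

print(by_child.get_text())  # Second
print(by_type.get_text())   # Second
print(missing)              # None (child 3 is the <h2>, not a <p>)
```

Because `select_one()` returns None when nothing matches, it is worth checking the result before calling `.text` on it if the page layout might change.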
#5
snippsat! Thank you for this and I appreciate the good progress remark! :)

You have made the forum welcoming last time(s) I have been here and continuing this time. Much appreciated!

I believe lxml will be needed for a future thread for parsing SGML unless BS4 can handle it.

Thanks for the tips!

Best Regards,

Brandon Kastning

(In reply to snippsat's post #4 of Nov-21-2021, 12:28 PM.)

#6
Database 1/4: [Karate-AP1]

CREATE TABLE `KarateAP1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `bs4_paragraph_1_text_only` text COLLATE utf8mb4_unicode_ci NULL,
  `bs4_paragraph_1_text_with_tag` text COLLATE utf8mb4_unicode_ci NULL,
  `bs4_python_entry_timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`)
  ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
  AUTO_INCREMENT=1;

Python Script 1/4: [Karate-AP1]


# Finalized on Python-Forum.io
# Disabled American Constitutional Pre-Law Student: BrandonKastning
# Date: 11/20/2021
# Script: Karate-A-TextOnly-P1.py
# Purpose: Building Block for BeautifulSoup4 + MariaDB 10.4.x
# Thread URL with Sources of Learning (Cited on Board)
# https://python-forum.io/thread-35593-post-150043.html
# Zechariah 7:10(KJV)
# LetHISPeopleGo

# Import Python 3.9.9 compatible OpenSource Libraries
import urllib.request 
import pymysql
import pymysql.cursors
from bs4 import BeautifulSoup
  
# Website HTTP URL to Grab Data From
url = "https://law.justia.com/constitution/us/preamble.html" 

# Assign a Python Variable to urllib.request URL to work with
html = urllib.request.urlopen(url)
  
# Assign a Python Variable to BeautifulSoup4's HTML Parser
htmlParse = BeautifulSoup(html, 'html.parser')
  
# Connect to MariaDB 10.4.x with a Database selected using pymysql
connection = pymysql.connect(host='localhost',
                 user='brandon',
                 password='__password',
                 db='Battle_Python1',
                 charset='utf8mb4',
                 cursorclass=pymysql.cursors.DictCursor)

# find_all("p") returns every <p> tag on the page; [0] picks the first one
# (paragraph #1) and get_text() strips the tags, leaving only the text
# to pass to MariaDB for storage
bs4_paragraph_1_text_only = htmlParse.find_all("p")[0].get_text()

try:
    with connection.cursor() as cursor:
        sql = "INSERT INTO `KarateAP1` (`bs4_paragraph_1_text_only`) VALUES (%s)"
        # The trailing comma makes the parameters a one-element tuple
        cursor.execute(sql, (bs4_paragraph_1_text_only,))
    connection.commit()
finally:
    connection.close()

# Checking Code / Error Free
print ("The code is Error free to this line!")
Run Successfully 1/4 - KarateAP1:

brandon@FireDragon:~/Python/01_Karate/final$ python Karate-A-TextOnly-P1.py
The code is Error free to this line!
brandon@FireDragon:~/Python/01_Karate/final$
Screenshot Evidence 1/4 - KarateAP1:

[Image: 1-2021-11-21-16-53-06.png]

[Image: 2-2021-11-21-16-53-51.png]

[Image: 3-2021-11-21-16-54-11.png]

[Image: 4-2021-11-21-16-56-09.png]

This concludes Karate A - Paragraph 1 Journey here on python-forum.io (Thank you everyone!)
#7
Database 2/4: [Karate-AP2]

CREATE TABLE `KarateAP2` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `bs4_paragraph_2_text_only` text COLLATE utf8mb4_unicode_ci NULL,
  `bs4_paragraph_2_text_with_tag` text COLLATE utf8mb4_unicode_ci NULL,
  `bs4_python_entry_timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`)
  ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
  AUTO_INCREMENT=1;
Python Script 2/4: [Karate-AP2]

# Finalized on Python-Forum.io
# Disabled American Constitutional Pre-Law Student: BrandonKastning
# Date: 11/20/2021
# Script: Karate-A-TextOnly-P2.py
# Purpose: Building Block for BeautifulSoup4 + MariaDB 10.4.x
# Thread URL with Sources of Learning (Cited on Board)
# https://python-forum.io/thread-35593-post-150043.html
# Zechariah 7:10(KJV)
# LetHISPeopleGo

# Import Python 3.9.9 compatible OpenSource Libraries
import urllib.request 
import pymysql
import pymysql.cursors
from bs4 import BeautifulSoup
  
# Website HTTP URL to Grab Data From
url = "https://law.justia.com/constitution/us/preamble.html" 

# Assign a Python Variable to urllib.request URL to work with
html = urllib.request.urlopen(url)
  
# Assign a Python Variable to BeautifulSoup4's HTML Parser
htmlParse = BeautifulSoup(html, 'html.parser')
  
# Connect to MariaDB 10.4.x with a Database selected using pymysql
connection = pymysql.connect(host='localhost',
                 user='brandon',
                 password='password',
                 db='Battle_Python1',
                 charset='utf8mb4',
                 cursorclass=pymysql.cursors.DictCursor)

# find_all("p") returns every <p> tag on the page; indexes start at zero,
# so [2] is the third <p> tag, which I am calling paragraph #2 here.
# get_text() strips the tags, leaving only the text to pass to MariaDB
bs4_paragraph_2_text_only = htmlParse.find_all("p")[2].get_text()

try:
    with connection.cursor() as cursor:
        sql = "INSERT INTO `KarateAP2` (`bs4_paragraph_2_text_only`) VALUES (%s)"
        # The trailing comma makes the parameters a one-element tuple
        cursor.execute(sql, (bs4_paragraph_2_text_only,))
    connection.commit()
finally:
    connection.close()

# Checking Code / Error Free
print ("The code is Error free to this line!")
Run Successfully 2/4 - KarateAP2:

brandon@FireDragon:~/Python/01_Karate/final$ python Karate-A-TextOnly-P2.py
The code is Error free to this line!
brandon@FireDragon:~/Python/01_Karate/final$
Screenshot Evidence 2/4 - KarateAP2:


[Image: 1-2021-11-21-17-10-46.png]

[Image: 2-2021-11-21-17-11-23.png]

[Image: 3-2021-11-21-17-11-41.png]

[Image: 4-2021-11-21-17-12-39.png]
#8
Database 3/4: [Karate-BP1]

CREATE TABLE `KarateBP1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `bs4_paragraph_1_text_only` text COLLATE utf8mb4_unicode_ci NULL,
  `bs4_paragraph_1_text_with_tag` text COLLATE utf8mb4_unicode_ci NULL,
  `bs4_python_entry_timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`)
  ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
  AUTO_INCREMENT=1;
Python Script 3/4: [Karate-BP1]

# Finalized on Python-Forum.io
# Disabled American Constitutional Pre-Law Student: BrandonKastning
# Date: 11/20/2021
# Script: Karate-B-TextWithTag-P1.py
# Purpose: Building Block for BeautifulSoup4 + MariaDB 10.4.x
# Thread URL with Sources of Learning (Cited on Board)
# https://python-forum.io/thread-35593-post-150043.html
# Zechariah 7:10(KJV)
# LetHISPeopleGo

# Import Python 3.9.9 compatible OpenSource Libraries
import urllib.request 
import pymysql
import pymysql.cursors
from bs4 import BeautifulSoup
  
# Website HTTP URL to Grab Data From
url = "https://law.justia.com/constitution/us/preamble.html" 

# Assign a Python Variable to urllib.request URL to work with
html = urllib.request.urlopen(url)
  
# Assign a Python Variable to BeautifulSoup4's HTML Parser
htmlParse = BeautifulSoup(html, 'html.parser')
  
# Connect to MariaDB 10.4.x with a Database selected using pymysql
connection = pymysql.connect(host='localhost',
                 user='brandon',
                 password='__password',
                 db='Battle_Python1',
                 charset='utf8mb4',
                 cursorclass=pymysql.cursors.DictCursor)

# find_all("p") returns every <p> tag on the page; [0] picks the first one
# (paragraph #1). str() turns the tag into its HTML markup, so the <p> tag
# itself is stored along with its contents
bs4_paragraph_1_text_with_tag = str(htmlParse.find_all("p")[0])

try:
    with connection.cursor() as cursor:
        sql = "INSERT INTO `KarateBP1` (`bs4_paragraph_1_text_with_tag`) VALUES (%s)"
        # The trailing comma makes the parameters a one-element tuple
        cursor.execute(sql, (bs4_paragraph_1_text_with_tag,))
    connection.commit()
finally:
    connection.close()

# Checking Code / Error Free
print ("The code is Error free to this line!")
Run Successfully 3/4 - KarateBP1:

brandon@FireDragon:~/Python/01_Karate/final$ python Karate-B-TextWithTag-P1.py
The code is Error free to this line!
brandon@FireDragon:~/Python/01_Karate/final$
Screenshot Evidence 3/4 - KarateBP1:

[Image: 1-2021-11-21-17-22-11.png]

[Image: 2-2021-11-21-17-22-39.png]

[Image: 3-2021-11-21-17-22-57.png]

[Image: 4-2021-11-21-17-24-12.png]
#9
Database 4/4: [Karate-BP2]

CREATE TABLE `KarateBP2` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `bs4_paragraph_2_text_only` text COLLATE utf8mb4_unicode_ci NULL,
  `bs4_paragraph_2_text_with_tag` text COLLATE utf8mb4_unicode_ci NULL,
  `bs4_python_entry_timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`)
  ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
  AUTO_INCREMENT=1;
Python Script 4/4: [Karate-BP2]

# Finalized on Python-Forum.io
# Disabled American Constitutional Pre-Law Student: BrandonKastning
# Date: 11/20/2021
# Script: Karate-B-TextWithTag-P2.py
# Purpose: Building Block for BeautifulSoup4 + MariaDB 10.4.x
# Thread URL with Sources of Learning (Cited on Board)
# https://python-forum.io/thread-35593-post-150043.html
# Zechariah 7:10(KJV)
# LetHISPeopleGo

# Import Python 3.9.9 compatible OpenSource Libraries
import urllib.request 
import pymysql
import pymysql.cursors
from bs4 import BeautifulSoup
  
# Website HTTP URL to Grab Data From
url = "https://law.justia.com/constitution/us/preamble.html" 

# Assign a Python Variable to urllib.request URL to work with
html = urllib.request.urlopen(url)
  
# Assign a Python Variable to BeautifulSoup4's HTML Parser
htmlParse = BeautifulSoup(html, 'html.parser')
  
# Connect to MariaDB 10.4.x with a Database selected using pymysql
connection = pymysql.connect(host='localhost',
                 user='brandon',
                 password='password',
                 db='Battle_Python1',
                 charset='utf8mb4',
                 cursorclass=pymysql.cursors.DictCursor)

# find_all("p") returns every <p> tag on the page; indexes start at zero,
# so [2] is the third <p> tag, which I am calling paragraph #2 here.
# str() turns the tag into its HTML markup, tags included
bs4_paragraph_2_text_with_tag = str(htmlParse.find_all("p")[2])

try:
    with connection.cursor() as cursor:
        sql = "INSERT INTO `KarateBP2` (`bs4_paragraph_2_text_with_tag`) VALUES (%s)"
        # The trailing comma makes the parameters a one-element tuple
        cursor.execute(sql, (bs4_paragraph_2_text_with_tag,))
    connection.commit()
finally:
    connection.close()

# Checking Code / Error Free
print ("The code is Error free to this line!")
Run Successfully 4/4 - KarateBP2:

brandon@FireDragon:~/Python/01_Karate/final$ python Karate-B-TextWithTag-P2.py
The code is Error free to this line!
brandon@FireDragon:~/Python/01_Karate/final$
Screenshot Evidence 4/4 - KarateBP2:


[Image: 1-2021-11-21-17-29-58.png]

[Image: 2-2021-11-21-17-30-24.png]

[Image: 3-2021-11-21-17-30-43.png]

[Image: 4-2021-11-21-17-35-59.png]
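
The four scripts differ only in the paragraph index, whether the tag is kept, and the table and column names, so they could be folded into one helper. A hedged sketch: `extract_paragraph`, `build_insert`, `store` and the `TARGETS` mapping are hypothetical names, the HTML is made up, and the table and column names are the ones from the four posts above. Table and column names cannot be passed as `%s` parameters, so they are looked up from a fixed mapping rather than taken from user input.

```python
from bs4 import BeautifulSoup

# (table, column) pairs from the four scripts, keyed by a short label.
TARGETS = {
    "AP1": ("KarateAP1", "bs4_paragraph_1_text_only"),
    "AP2": ("KarateAP2", "bs4_paragraph_2_text_only"),
    "BP1": ("KarateBP1", "bs4_paragraph_1_text_with_tag"),
    "BP2": ("KarateBP2", "bs4_paragraph_2_text_with_tag"),
}


def extract_paragraph(html, index, with_tag):
    """Return the <p> at the given zero-based index, with or without tags."""
    tag = BeautifulSoup(html, "html.parser").find_all("p")[index]
    return str(tag) if with_tag else tag.get_text()


def build_insert(label):
    """Build the INSERT statement for one of the known targets."""
    table, column = TARGETS[label]
    return f"INSERT INTO `{table}` (`{column}`) VALUES (%s)"


def store(connection, label, value):
    """Insert one value using an open pymysql connection."""
    with connection.cursor() as cursor:
        cursor.execute(build_insert(label), (value,))
    connection.commit()


html = "<p>One</p><p>Two</p><p>Three</p>"
print(extract_paragraph(html, 0, with_tag=False))  # One
print(extract_paragraph(html, 2, with_tag=True))   # <p>Three</p>
print(build_insert("BP2"))
```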

That was fun! Let's keep going! Until my next Thread! :) Thank you everyone!