
How do I write conditionals for XPath/BS4 tag scrapes?
My program parses JSON files, requests a remote URL, scrapes the page with lxml XPath, stores the results in Python variables, and then inserts those variables into MariaDB using PyMySQL. Each record in my dataset has either 5 columns (the page has no PDF links) or 7 columns (the page has both PDF links). How do I make the insert conditional, so that if the XPath for the PDF links fails, a different block of code runs instead?
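Here is a minimal sketch of one way to branch, assuming `dom` is the lxml element used in the code below. lxml's `xpath()` does not raise when a path matches nothing. It returns an empty list, so the condition can simply test whether that list is empty. The helper `build_insert` and its plain-list inputs are hypothetical, chosen so the branching can be shown without a live page or database. The table and column names are the ones from this post.

```python
TABLE = "Current_JSON_Courtlistener_Dataset_Exodus_Table4_RemoteServer"

# The five columns every page has.
BASE_COLUMNS = [
    "courtlistener_case_name",
    "courtlistener_jurisdiction",
    "courtlistener_filed",
    "courtlistener_precedential_status",
    "courtlistener_docket_number",
]

# The two extra columns that only pages with PDF links have.
PDF_COLUMNS = [
    "courtlistener_pdf_opinion_url_storage",
    "courtlistener_pdf_opinion_url_gov",
]


def build_insert(base_values, pdf_storage_matches, pdf_gov_matches):
    """Return (sql, params) for either the 5- or the 7-column INSERT.

    pdf_storage_matches / pdf_gov_matches are the lists returned by
    dom.xpath(...); they are empty when the page has no PDF link.
    """
    columns = list(BASE_COLUMNS)
    params = list(base_values)
    if pdf_storage_matches and pdf_gov_matches:
        # Both XPaths matched: use the 7-column version (first match of each).
        columns += PDF_COLUMNS
        params += [pdf_storage_matches[0], pdf_gov_matches[0]]
    # Otherwise (no match, or only one of the two) keep the 5-column version.
    column_sql = ", ".join(f"`{c}`" for c in columns)
    placeholders = ", ".join(["%s"] * len(columns))
    sql = f"INSERT INTO `{TABLE}` ({column_sql}) VALUES ({placeholders})"
    return sql, tuple(params)


# Usage with PyMySQL (sketch, not run here; GOV_XPATH is a hypothetical name,
# since the post does not show the XPath for the .gov link):
#     storage = dom.xpath('/html/body/div[1]/div[1]/article/div[2]/ul/li[1]/a/@href')
#     gov = dom.xpath(GOV_XPATH)
#     sql, params = build_insert(five_values, storage, gov)
#     with connection.cursor() as cursor:
#         cursor.execute(sql, params)
#     connection.commit()
```

Building one statement from a column list keeps a single code path for both cases instead of two near-identical `with connection:` blocks.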
Here are the code blocks I am working with right now:
Any pointers would be greatly appreciated. Thank you everyone for this Python Forum! :)
# How to make Conditional? If XPATH exists -> use block of code for payload here...
# vs. If XPATH doesn't exist -> use block of code for payload here...
# ... using same db & table
# XPATH Scraping pdf URL (not on all urls; hit / miss)
# https://www.courtlistener.com/opinion/4631414/mcdonough-v-smith/ (has it)
# https://www.courtlistener.com/opinion/141474/urban-v-hurley/ (does not)

# Assign Table4 Column 6 / 12 - MariaDB Column Name: courtlistener_pdf_opinion_url_storage
print("Dragon Breath [F.03] - [Table 4/6] - Table4_RemoteServer - [CL Jurisdiction Dataset] - Now Assigning Table4 Column Python Variable 6 out of 12...")
pvar_dom_xpath_courtlistener_pdf_opinion_url_storage = dom.xpath('/html/body/div[1]/div[1]/article/div[2]/ul/li[1]/a/@href')
print(pvar_dom_xpath_courtlistener_pdf_opinion_url_storage)

# MariaDB Python Variable Payload
print("Dragon Breath [F.03] - [Table 4/6] - Table4_RemoteServer - [CL Jurisdiction Dataset] - Now Injecting XPATH Python Variable Payload to MariaDB...")
import pymysql
import pymysql.cursors

connection = pymysql.connect(host='localhost',
                             user="brandon",
                             passwd="password",
                             db="EXODUS_CL_DragonBreath_F03_ICEDRAGON3")
print("PyMySQL Connected Successfully!")

# MariaDB Python Variable Payload Part 2 [5 Columns] (Xpath for pdf storage & pdf gov doesn't exist):
##### JURISDICTION: U.S. FEDERAL SUPREME COURT OF THE UNITED STATES #####
(5 Columns) w/o pdf storage & pdf gov
with connection:
    with connection.cursor() as cursor:
        sql = ("INSERT INTO `Current_JSON_Courtlistener_Dataset_Exodus_Table4_RemoteServer` "
               "(`courtlistener_case_name`, `courtlistener_jurisdiction`, `courtlistener_filed`, "
               "`courtlistener_precedential_status`, `courtlistener_docket_number`) "
               "VALUES (%s, %s, %s, %s, %s)")
        cursor.execute(sql, (bs4_xpath_courtlistener_case_name,
                             bs4_xpath_courtlistener_jurisdiction,
                             bs4_xpath_courtlistener_filed,
                             bs4_xpath_courtlistener_precedential_status,
                             bs4_xpath_courtlistener_docket_number))
    connection.commit()
##### JURISDICTION: U.S. FEDERAL SUPREME COURT OF THE UNITED STATES #####
(7 Columns) w/ pdf storage & pdf gov
with connection:
    with connection.cursor() as cursor:
        sql = ("INSERT INTO `Current_JSON_Courtlistener_Dataset_Exodus_Table4_RemoteServer` "
               "(`courtlistener_case_name`, `courtlistener_jurisdiction`, `courtlistener_filed`, "
               "`courtlistener_precedential_status`, `courtlistener_docket_number`, "
               "`courtlistener_pdf_opinion_url_storage`, `courtlistener_pdf_opinion_url_gov`) "
               "VALUES (%s, %s, %s, %s, %s, %s, %s)")
        cursor.execute(sql, (bs4_xpath_courtlistener_case_name,
                             bs4_xpath_courtlistener_jurisdiction,
                             bs4_xpath_courtlistener_filed,
                             bs4_xpath_courtlistener_precedential_status,
                             bs4_xpath_courtlistener_docket_number,
                             bs4_xpath_courtlistener_pdf_opinion_url_storage,
                             bs4_xpath_courtlistener_pdf_opinion_url_gov))
    connection.commit()

Thank you in advance for any pointers!
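An alternative to branching, sketched under the assumption that the two PDF columns in the MariaDB table allow NULL: always run the 7-column INSERT and pass `None` for a missing link, which PyMySQL sends as SQL NULL. Because `dom.xpath('.../@href')` returns a list of strings, this sketch takes the first match explicitly rather than passing the list itself. The names `first_or_none`, `SQL_7`, and `seven_column_params` are hypothetical.

```python
def first_or_none(matches):
    """Return the first XPath match, or None when the list is empty."""
    return matches[0] if matches else None


# The 7-column statement from the post, split across lines for readability.
SQL_7 = (
    "INSERT INTO `Current_JSON_Courtlistener_Dataset_Exodus_Table4_RemoteServer` "
    "(`courtlistener_case_name`, `courtlistener_jurisdiction`, `courtlistener_filed`, "
    "`courtlistener_precedential_status`, `courtlistener_docket_number`, "
    "`courtlistener_pdf_opinion_url_storage`, `courtlistener_pdf_opinion_url_gov`) "
    "VALUES (%s, %s, %s, %s, %s, %s, %s)"
)


def seven_column_params(base_values, pdf_storage_matches, pdf_gov_matches):
    """Always build 7 parameters; a missing PDF link becomes None (NULL)."""
    return tuple(base_values) + (
        first_or_none(pdf_storage_matches),
        first_or_none(pdf_gov_matches),
    )


# Usage with PyMySQL (sketch, not run here):
#     params = seven_column_params(five_values, storage_matches, gov_matches)
#     with connection.cursor() as cursor:
#         cursor.execute(SQL_7, params)
#     connection.commit()
```

This trades the two code paths for one, at the cost of storing NULL rather than omitting the columns, so queries later need `IS NULL` checks to find pages without PDFs.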
Best Regards,
Brandon Kastning
“And one of the elders saith unto me, Weep not: behold, the Lion of the tribe of Juda, the Root of David, hath prevailed to open the book,...” - Revelation 5:5 (KJV)
“And oppress not the widow, nor the fatherless, the stranger, nor the poor; and ...” - Zechariah 7:10 (KJV)
#LetHISPeopleGo