Python Forum
Court Opinion Scraper in Python w/ BS4 (Currently exports to CSV) need help with SQL
# Justia Court Opinion Scraper
# Works - Scrapes opinion with HTML tags
# Works - Scrapes opinion with HTML tags stripped
# Works - Write to CSV with HTML tags
# Works - Write to CSV without HTML tags
# July 14, 2019
# localhost and law.justia.com are interchangeable!

from urllib.request import urlopen
from bs4 import BeautifulSoup
#html = urlopen("http://localhost/cases/federal/appellate-courts/F2/1/18/1506993/")
html = urlopen("http://localhost/cases/federal/appellate-courts/F2/999/663/308588/")
#html = urlopen("http://localhost/cases/federal/appellate-courts/F3/491/1/510017/")
#html = urlopen("http://localhost/cases/federal/us/385/206/case.html") <--- DOES NOT WORK with id="opinion"
bsObj = BeautifulSoup(html.read(), "html.parser")  # name the parser explicitly to avoid the bs4 warning
#bsObj.findAll(id="opinion")
allOpinion = bsObj.findAll(id="opinion")

# Want the TITLE of the Page in a Variable

import pymysql

url = "http://localhost/cases/federal/appellate-courts/F2/999/663/308588/"

allTitle = bsObj.findAll("title")

allURL = url

#print(allOpinion[0].get_text())
# ^ Will Strip HTML tags and only store plain-text

# Column 1 [ ]
# / / of URL (third to last) (i.e /1/)
# Column 2 [ ]
# / / of URL (second to last) (i.e /18)
# Column 3 [ ]
# / / of URL (last) (i.e /1506993/)

# Column 4 [ allOpinion w/ HTML Tags ]

# Column 5 [ allOpinion w/ Stripped HTML Tags - Plaintext lump ]

# Store allOpinion to CSV File w/ Tags

db = pymysql.connect(host="localhost",
                     user="brandon",
                     password="YOUR_PASSWORD",  # placeholder; never post a real password
                     db="JustiaPython",
                     charset='utf8')



print(allOpinion)
print(allTitle)
print(allURL)

import csv
csvRow = [allOpinion,allTitle,allURL]
csvfile = "current_F2_opinion_with_tags_current.csv"
with open(csvfile, "a", newline="", encoding="utf-8") as fp:  # newline="" stops csv writing blank rows on Windows
    wr = csv.writer(fp, dialect='excel')
    wr.writerow(csvRow)
#    wr.writerow(['1'])
# ^ Works with retaining all the HTML tags; NEXT - Store allOpinion to a CSV, then MySQL.


# Loop w/ Stripping HTML Tags for allOpinion and its CSV output

print(allOpinion[0].get_text(),url)

csvRow = [allOpinion[0].get_text(),allTitle[0].get_text(),allURL]
csvfile = "current_F2_opinion_without_tags_current.csv"
with open(csvfile, "a", newline="", encoding="utf-8") as fp:
    wr = csv.writer(fp, dialect='excel')
    wr.writerow(csvRow)
#    wr.writerow(['1'])
I am trying to figure out a few things to make this a functional script. I would like to learn how to make pymysql work correctly so that I can insert a row containing allTitle, allURL, and allOpinion into MariaDB, appending a new row for each result.
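For reference, here is a minimal sketch of what that insert might look like with pymysql. The table name `opinions` and its columns `title`, `url`, and `opinion` are assumptions (the post does not give a schema), as are the helper names `insert_opinion` and `main`. The `%s` placeholders let the driver escape the values, which matters for opinion text full of quotes.

```python
def insert_opinion(conn, title, url, opinion):
    """Insert one scraped opinion as a new row, then commit.

    `conn` is an open pymysql connection. The table `opinions` and its
    column names are hypothetical; adjust them to the real schema.
    """
    sql = "INSERT INTO opinions (title, url, opinion) VALUES (%s, %s, %s)"
    with conn.cursor() as cur:
        # Values are passed separately so the driver handles the quoting.
        cur.execute(sql, (title, url, opinion))
    conn.commit()


def main():
    # Imported here so insert_opinion can be reused without a database driver.
    import pymysql

    conn = pymysql.connect(
        host="localhost",
        user="brandon",
        password="YOUR_PASSWORD",  # placeholder, not a real credential
        database="JustiaPython",
        charset="utf8mb4",
    )
    try:
        insert_opinion(
            conn,
            "Example title",
            "http://localhost/cases/federal/appellate-courts/F2/999/663/308588/",
            "Opinion text goes here",
        )
    finally:
        conn.close()
```

Calling `main()` assumes the table already exists. In the script above, the three arguments would be `allTitle[0].get_text()`, `allURL`, and `allOpinion[0].get_text()`.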

I am also trying to figure out how to store certain parts of the URL as variables, such as "999", "663", and "308588".
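One way to pull those pieces out is to split the URL path on "/" and take the last three non-empty segments. This is only a sketch: the function name `citation_parts` and the variable names `volume`, `page`, and `case_id` are guesses at what the three numbers mean, not something the site confirms.

```python
from urllib.parse import urlparse


def citation_parts(url):
    """Return the last three path segments of a case URL as strings."""
    path = urlparse(url).path
    # Drop the empty strings produced by the leading and trailing slashes.
    segments = [s for s in path.split("/") if s]
    volume, page, case_id = segments[-3:]
    return volume, page, case_id


volume, page, case_id = citation_parts(
    "http://localhost/cases/federal/appellate-courts/F2/999/663/308588/"
)
print(volume, page, case_id)  # 999 663 308588
```

The values come back as strings; wrap them in `int()` if the database columns are numeric.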

My long-term goal: I have a couple of folders of these opinions that I would like to scrape and store properly with these variables. How can I call html = urlopen() on each URL in a link list rather than on a single URL? I am guessing that at the end of this script I will want a loop that moves on to the next court opinion.
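A rough sketch of that loop, assuming the link list is a plain text file with one URL per line. The helper names `read_urls`, `scrape_one`, and `scrape_all` are made up for illustration, and `scrape_one` simply repeats the `id="opinion"` lookup from the script above, so it will return an empty opinion for pages that lack that id.

```python
from urllib.error import URLError
from urllib.request import urlopen


def read_urls(path):
    """Return the non-blank lines of a text file, one URL per line."""
    with open(path, encoding="utf-8") as fp:
        return [line.strip() for line in fp if line.strip()]


def scrape_one(url):
    """Fetch one page and return (title, url, opinion_text)."""
    # Imported here so the loop helpers can be used without bs4 installed.
    from bs4 import BeautifulSoup

    with urlopen(url) as resp:
        soup = BeautifulSoup(resp.read(), "html.parser")
    opinion = soup.find(id="opinion")
    title = soup.title.get_text(strip=True) if soup.title else ""
    text = opinion.get_text() if opinion else ""
    return title, url, text


def scrape_all(urls, scrape=scrape_one):
    """Scrape every URL in turn, skipping pages that fail to load."""
    rows = []
    for url in urls:
        try:
            rows.append(scrape(url))
        except (URLError, ValueError) as exc:
            # One bad link should not stop the whole run.
            print(f"skipped {url}: {exc}")
    return rows
```

Usage would be something like `rows = scrape_all(read_urls("links.txt"))`, where `links.txt` is a hypothetical file name. Each row can then be written to the CSV or inserted into MariaDB inside the same loop.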

Thanks for any help!


Messages In This Thread
Court Opinion Scraper in Python w/ BS4 (Currently exports to CSV) need help with SQL - by MidnightDreamer - Jul-15-2019, 01:50 AM
