Bottom Page

Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
 SqlAlchemy Tutorial - Basic Model, database Creation and Data Load
#1
Introduction

This tutorial is a very basic Introduction to SqlAlchemy.

Although written for sqlite, the same model can be used for other DBMS with a one line code change.
The tutorial will use the city data from the Connecticut State Library website at https://ctstatelibrary.org/cttowns/counties
to create a database containing just one table 'ConnecticutCity', which is defined as:
Output:
'ConnecticutCity' ( `id` INTEGER NOT NULL, `TownName` VARCHAR ( 40 ), `County` VARCHAR ( 40 ), `YearEstablished` VARCHAR ( 10 ), `ParentTown` VARCHAR ( 40 ), `HistoryOfIncorporation` VARCHAR ( 500 ), PRIMARY KEY(`id`) );
Will cover the following:
  • Download the data to a CSV file
  • Define a 'paths' module named SqlPaths.py which will create the necessary data directory structure
  • Define a database model (which for this tutorial will only include a single table 'ConnecticutCity')
  • Create initial database from the Model
  • Populate the table from created CSV file
  • Query 'ConnecticutCity' table

To set up the project environment, please perform the following operations (internet access is required):
  • create a new directory named SqlAlchemyExample
  • run following commands:
    • Open a terminal window
    • cd SqlAlchemyExample
    • python -m venv venv
    • . ./venv/bin/activate
      if entered correctly you should see (venv) before prompt
    • pip install --upgrade pip
    • Copy the requirements.txt data below to requirements.txt in the new SqlAlchemyExample directory
    • pip install -r requirements.txt
    • pip list
      compare installed results to requirements.txt
    • mkdir utils
    • download latest version of gekodriver from here: https://github.com/mozilla/geckodriver/releases to utils directory
      make sure you select the 64 bit version (unless your computer is 32 bit)
    • Extract the tar file and copy file named gekodriver to venv/bin
    • if you use chrome, download the latest version of chromedriver to utils from: https://chromedriver.chromium.org/downloads
      make sure version matches your chrome installation
    • Extract the driver and copy to venv/bin
    • mkdir src

Path definitions

The first module will define and create (if not already available) the data directory structure using pathlib. This script also contains filenames and URL's that are used for the data collection.

cut and paste the SqlPaths.py code below to the src directory:
from pathlib import Path
import os


class SqlPaths:
    def __init__(self):
        os.chdir(os.path.abspath(os.path.dirname(__file__)))
        homepath = Path('.')
        rootpath = homepath / '..'

        self.datapath = rootpath / 'data'
        self.datapath.mkdir(exist_ok=True)

        self.cachepath = self.datapath / 'cache'
        self.cachepath.mkdir(exist_ok=True)

        self.csvpath = self.datapath / 'csv'
        self.csvpath.mkdir(exist_ok=True)

        self.dbpath = self.datapath / 'database'
        self.dbpath.mkdir(exist_ok=True)

        self.tmppath = self.datapath / 'tmp'
        self.tmppath.mkdir(exist_ok=True)
        
        self.sample_db = self.dbpath / 'sample.db'

        self.city_list_url = 'https://ctstatelibrary.org/cttowns/counties'
        self.citycsv = self.csvpath / 'city.csv'


if __name__ == '__main__':
    SqlPaths()
when done run the code from command line: using following commands
  • cd .../SqlAlchemyExample
    replace ... with your path
  • python src/SqlPaths.py
this will create a new folder in the SqlAlchemyExample directory and add all sub directories needed for the project.
project structure should now look like:

SqlAlchemyExample/
├── data
│   ├── cache
│   ├── csv
│   ├── database
│   └── tmp
├── requirements.txt
├── src
│   └── SqlPaths.py
├── utils
│   ├── geckodriver
│   └── geckodriver-v0.26.0-linux64.tar.g
└── venv
...

Capture Connecticut Data

next in src directory create a program that will use selenium to download the data from ctstatelibrary.org and create a .csv file in the data/csv directory. This will be used to populate the database table that will be created later.

This code used selenium to scrape the data. I will not get into details as it does not apply to SqlAlchemy.
If you would like to know more about how selenium works, i refer you to the following tutorials (which were written by Snippsat, and on this forum)

web scraping part 1
web scraping part 2

Cut and paste CreateCityFile.py to src directory
from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
from SqlPaths import SqlPaths
import csv
import time
import sys


class CreateCityFile:
    def __init__(self):
        self.spath = SqlPaths()
        self.get_city_info()

    def start_browser(self):
        caps = webdriver.DesiredCapabilities().FIREFOX
        caps["marionette"] = True
        self.browser = webdriver.Firefox(capabilities=caps)

    def stop_browser(self):
        self.browser.close()

    def get_city_info(self, remove_savefile=False):
        savefile = self.spath.cachepath / "citypage.html"
        if remove_savefile:
            savefile.unlink(missing_ok=True)

        if savefile.exists():
            with savefile.open() as fp:
                page = fp.read()
        else:
            self.start_browser()
            self.browser.get(self.spath.city_list_url)
            time.sleep(2)
            page = self.browser.page_source
            with savefile.open("w") as fp:
                fp.write(page)
            self.stop_browser()

        with self.spath.citycsv.open("w") as fp:
            cwrt = csv.writer(fp, delimiter="|")
            soup = BeautifulSoup(page, "lxml")
            table = soup.find(
                "table",
                {
                    "summary": "This table displays Connecticut towns and the year of their establishment."
                },
            )
            trs = table.tbody.find_all("tr")
            for n, tr in enumerate(trs):
                contigname = "Unspecified"
                if n == 0:
                    header = []
                    for td in self.get_td(tr):
                        header.append(td.p.b.i.text.strip())
                    cwrt.writerow(header)
                else:
                    row = []
                    for n1, td in enumerate(self.get_td(tr)):
                        if td.p and len(td.p):
                            value = td.p.text.strip()
                            if value == '—-':
                                value = 'Unspecified'
                        else:
                            value = 'Unspecified'
                        value = value.replace('"', "'")
                        row.append(value)
                    cwrt.writerow(row)

    def get_td(self, tr):
        tds = tr.find_all("td")
        for td in tds:
            yield td


if __name__ == "__main__":
    CreateCityFile()
once done:
  • Make sure you have internet access
  • change directory to SqlAlchemyExample
  • python src/CreateCityFile.py
  • Check /data/csv for file named: city.csv

Database Model

In SqlAlchemy ORM databases are defined in a Model.
Within the module, the database is described with a create_engine statement.
The following description is taken from sqlalchemy.org:

The Engine is the starting point for any SQLAlchemy application. It’s “home base” for the actual database and its DBAPI, delivered to the SQLAlchemy application through a connection pool and a Dialect, which describes how to talk to a specific kind of database/DBAPI combination.

(click to expand)
   

Creating an engine is just a matter of issuing a single call, create_engine()

Please refer to https://docs.sqlalchemy.org/en/13/core/engines.html for a complete description.

in this application, the engine is created below on line 17.
tables will be created using declarative_base

The new base class will be given a metaclass that produces appropriate Table objects and makes the appropriate mapper() calls based on the information provided declaratively in the class and any subclasses of the class.

For complete description see see: https://docs.sqlalchemy.org/en/13/orm/ex...e/api.html

Each table is constructed using an individual class.

This model will only have one table (with several indexes) but more tables can be added by
adding additional classes, one for each table.

Things to be included in each class:
  • the ORM definition of the table itself.
    this is done on lines 23 through 31
    • always starts with the __tablename__ definition
      for this example table will be named 'ConnecticutCity'
      line 23: __tablename__ = "ConnecticutCity"
    • Next is an id used internally by SqlAlchemy:
      id = Column(Integer, primary_key=True)
    • And then the definitions of each column within the table.
      see: https://docs.sqlalchemy.org/en/13/orm/ma...lumns.html
      NOTE Indexes are added to column definitions see script.
    • Further reading: https://docs.sqlalchemy.org/en/13/orm/ex...c_use.html

    • Next you need an __init__ method (lines 33-39) which defined each field for external interface.
    • Though not necessary, I usually include an __repr__ method for printing the class
    • any modules for common table tasks (vary by application)

    the statement on line 51: Base.metadata.create_all(engine)
    creates the new database

    Save the following code in the src directory as DatabaseModel.py:
    from sqlalchemy import (
        Column,
        String,
        Integer,
        Date,
        create_engine,
        DateTime,
        ForeignKey,
    )
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy.orm import relationship, backref
    from datetime import datetime
    from SqlPaths import SqlPaths
    
    
    spath = SqlPaths()
    engine = create_engine(f"sqlite:///{spath.sample_db}")
    Base = declarative_base()
    
    
    class ConnecticutCity(Base):
    
        __tablename__ = "ConnecticutCity"
    
        id = Column(Integer, primary_key=True)
    
        TownName = Column(String(40), index=True, unique=True)
        County = Column(String(40), index=True)
        YearEstablished = Column(String(10), index=True)
        ParentTown = Column(String(40), index=True)
        HistoryOfIncorporation = Column(String(500))
    
        def __init__(self, TownName, County, YearEstablished, 
            ParentTown, HistoryOfIncorporation):
            self.TownName = TownName
            self.County = County
            self.YearEstablished = YearEstablished
            self.ParentTown = ParentTown
            self.HistoryOfIncorporation = HistoryOfIncorporation
    
    
        def __repr__(self):
            return (
                f"<ConnecticutCity {self.TownName}, {self.County}, " \
                f"{self.YearEstablished}, {self.ParentTown}, " \
                f"{self.HistoryOfIncorporation}>"
            )
    
    
    def main():
        Base.metadata.create_all(engine)
    
    
    if __name__ == "__main__":
        main()
    
    Once you have created this module, run as follows to create the database in data/database
    • cd SqlAlchemyExample
    • python src/DatabaseModel.py
    • Check data/database you should find sample.db

    The database should look like (click to expand)
       

    Populate Table

    The following code will add the data created by CreateCityFile.py (city.csv) to the database.

    The steps required are quite simple:
    • import SqlPaths
      defines where files are located
    • from sqlalchemy.orm import sessionmaker
      used to create session with database
    • import DatabaseModel
      defines the database table(s) only one table for this example but usually will have many table classes.
    • import csv
      to process city.csv
    • Define database model
    • link model to sqlalchmy engine
    • create database session
    • read csv data and add to database
    • commit session
    • close session

    Save the following code in the src directory as LoadTables.py
    from SqlPaths import SqlPaths
    from sqlalchemy.orm import sessionmaker
    import DatabaseModel
    import csv
    import sys
     
     
    class LoadTables:
        def __init__(self):
            self.spath = SqlPaths()
            self.dmap = DatabaseModel
            self.db = self.dmap.engine
             
        def load_db(self):
            db = self.db
            Session = sessionmaker(bind=db)
            Session.configure(bind=db)
            session = Session()
     
            with self.spath.citycsv.open() as fp:
                crdr = csv.reader(fp, delimiter='|')
                next(crdr) # Skip header
                for row in crdr:
                    cityrec = self.dmap.ConnecticutCity(
                        TownName = row[0],
                        County = row[1],
                        YearEstablished = row[2],
                        ParentTown = row[3],
                        HistoryOfIncorporation = row[4])
                    session.add(cityrec)
            session.commit()
            session.close()
     
     
    def main():
        lt = LoadTables()
        lt.load_db()
     
     
    if __name__ == '__main__':
        main()
    
    Once you have created this module, run as follows to add the city data to database
    • cd SqlAlchemyExample
    • python src/LoadTables.py
    • with dbbrowser or other sqlite database tool, examine sample.db
      from 'Execute SQL' tab run query: select * from ConnecticutCity;
      results should be like image below.

    Query
    The following script uses SqlAlchemy ORM for two simple queries
    The first is a simple list of all cities and towns in Connecticut
    The second extracts all cities and towns in any county that begins with 'New' (2 in Connecticut) ordered by County, TownName

    import DatabaseModel
    from sqlalchemy.orm import sessionmaker
    from SqlPaths import SqlPaths
    
    
    class QueryCityTable:
        def __init__(self):
            self.spath = SqlPaths()
            self.dmap = DatabaseModel
            db = self.dmap.engine
            self.Session = sessionmaker(bind=db)
            self.Session.configure(bind=db)
            self.session = self.Session()
            self.city_table = self.dmap.ConnecticutCity
    
        def all_towns(self):
            citieslist = self.session.query(self.city_table).all()
            print(f"\n=================================")
            print(f"List of all cities in Connecticut")
            for row in citieslist:
                print(f"{row.TownName}")
    
        def filter_by_county(self):
            ctab = self.city_table
    
            citylist = self.session.query(ctab).filter(ctab.County.like('New%')).order_by(ctab.County).all()
            print(f"\n===================================================================================")
            print(f"List of all cities in all counties where county name starts with New alphabetically")
            for element in citylist:
                print(element.County, element.TownName)
    
    def main():
        qct = QueryCityTable()
        qct.all_towns()
        qct.filter_by_county()
    
    if __name__ == '__main__':
        main()
    
    Partial results:
    Output:
    ================================= List of all cities in Connecticut Andover Ansonia Ashford Avon Barkhamsted Beacon Falls Berlin Bethany ... =================================================================================== List of all cities in all counties where county name starts with New alphabetically New Haven Ansonia New Haven Beacon Falls New Haven Bethany New Haven Branford New Haven Cheshire New Haven Derby New Haven East Haven New Haven Guilford New Haven Hamden New Haven Madison New Haven Meriden New Haven Middlebury New Haven Milford New Haven Naugatuck New Haven New Haven New Haven North Branford New Haven North Haven New Haven Orange New Haven Oxford New Haven Prospect New Haven Seymour New Haven Southbury New Haven Wallingford New Haven Waterbury New Haven West Haven New Haven Wolcott New Haven Woodbridge New London Bozrah New London Colchester New London East Lyme New London Franklin New London Griswold New London Groton New London Lebanon New London Ledyard New London Lisbon New London Lyme New London Montville New London New London New London North Stonington New London Norwich New London Old Lyme New London Preston New London Salem New London Sprague New London Stonington New London Voluntown New London Waterford
    Requirements

    Python 3.6 or newer
    Tutorial was created on Linux, not tested on other Operating Systems, but should work with possibly some minor adjustments.

    requirements.txt:
    Output:
    appdirs==1.4.3 attrs==19.3.0 beautifulsoup4==4.8.2 certifi==2019.11.28 chardet==3.0.4 Click==7.0 idna==2.8 lxml==4.5.0 pathspec==0.7.0 regex==2020.1.8 requests==2.22.0 selenium==3.141.0 soupsieve==1.9.5 SQLAlchemy==1.3.13 toml==0.10.0 typed-ast==1.4.1 urllib3==1.25.8
snippsat, Marbelous, j.crater like this post


Attached Files Thumbnail(s)
   
Quote

Top Page

Forum Jump:


Users browsing this thread: 1 Guest(s)