Python Forum
SQLAlchemy Dynamic model generation and Database load directly from CSV
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
SQLAlchemy Dynamic model generation and Database load directly from CSV
#1
Create an Airport Database using SQLAlchemy, with Dynamic Model Generation, Directly from CSV data

EDIT Nov 22, 2021 - Found some errors due to format changes. all code has been replaced.

Overview

I wanted to create a package that could dynamically create an SQLAlchemy Database Model from CSV files. This is best done if the files contained headers. But, I thought, even without headers, If I could find a data dictionary and relate the contents to the available files, it would be sufficient enough to build a model. Once the Model was available, creating a database, and populating it with the available data would be a snap. I was successful in doing all of this using the OurAirports.com download page. As it turned out, OurAirports.com, was perfect, and all of the data files have headers.

This tutorial will describe the process in great detail, making reproduction of the methods used available for use with other datasets.

The steps needed are:

  1. Download all csv files, updating only those that are older than number of hours specified by the user (defaults to 24)
  2. Determine data types and maximum column sizes for each column in each file.
  3. Create an SQLAlchemy Data Model (a python file) based on above information.
  4. Using the model that's created, create a new database (currently only SQLite).
  5. Populate the database with the downloaded data.
  6. When finished, you will have a very useful database for use if you are an aircraft pilot, or like to track aircraft in flight,
    or just have a general interest in the subject.

Although written for sqlite, the same model can be used for other DBMS with a one line code change (in most instances).
I will be add code necessary for other DBMS including PostgreSQL, and perhaps MySQL at a future date.

The system I used to create the tutorial:
  • Linux Mint 20 Ulyana
  • SQLAlchemy 1.4.17
  • Python 3.9.5

Although I have not tested on other OS, code should work on MS Windows and OS X with little or no modification, also other versions of python 3 should be OK.

Code includes downloader that scans the host download site, and includes all CSV files.


-- CREDITS --

OurAirports.com gets credit for the wonderful set of data that they supply to the public, available here

Visit their website, and see what they have done with this data, best described by themselves:
Quote:OurAirports is a free site where visitors can explore the world's airports, read other people's comments, and leave their own. The help pages have information to get you started.

The site is dedicated to both passengers and pilots. You can create a map of the airports you've visited and share that map with friends. You can find the closest airports to you, and discover the ones that you haven't visited yet.

Behind the fun and features, OurAirports exists primarily as a public good. When Australia forced the US government to shut down public access to its Digital Aeronautical Flight Information File (DAFIF) service in 2006, there was no longer a good source of global aviation data. OurAirports started in 2007 primarily to fill that gap: we encourage members to create and maintain data records for airports around the world, and they manage over 40,000 of them. Many web sites, smartphone apps, and other services rely on OurAirport's data, which is all in the Public Domain (no permission required).

Also take a look at the other folks that have contributed many hours to the OurAirports here: here.

-- MIT LICENSE --

Copyright (c) <2021> <Larz60+>

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
Part 1 - Setup

Please execute the following steps in order presented.
This will set up everything that is required to successfully complete the tutorial:

  1. create a new directory named OurAirports
  2. run following commands:
    1. Open a terminal window
    2. run command cd OurAirports
    3. Disconnect from internet
    4. run command python -m venv venv
    5. run command . ./venv/bin/activate
      if entered correctly you should see (venv) before prompt
    6. Save the following text to a file named requirements.txt
      requirements.txt
      Output:
      beautifulsoup4==4.9.3 certifi==2020.12.5 chardet==4.0.0 flightradar24==0.3.1 greenlet==1.0.0 html5lib==1.1 html5print==0.1.2 idna==2.10 lxml==4.6.3 numpy==1.20.3 pandas==1.2.4 ply==3.4 python-dateutil==2.8.1 pytz==2021.1 requests==2.25.1 six==1.16.0 slimit==0.8.1 soupsieve==2.2.1 SQLAlchemy==1.4.17 tinycss2==1.1.0 urllib3==1.26.4 webencodings==0.5.1
    7. Activate internet connection
    8. run command pip install --upgrade pip
    9. run command pip install -r requirements.txt
    10. run command pip list
      compare installed results to requirements.txt
    11. Add the following module __init__.py to the OurAirports directory
      __init__.py
          __init__.py
          src
              __init__.py
              BuildDbModel.py
              CreateDict.py
              DbModel.py
              FindColumnMinMaxAndType.py
              FlightPaths.py
              LoadDatabase.py
              OurAirports.py
              UpdateDataFiles.py
              # License.txt
         
    12. run command mkdir src
    13. run command cd src
    14. Add an empty __init__.py file to OurAirports/src
      Linux: run command touch __init__.py
      Windows: run command fsutil file create new __init__.py 0
      OS X: run command echo $null >> __init__.py


Part 2 - Define Path Structure

Add the following script to the src directory, name this FlightPaths.py.

This script contains posix paths to data files, databases, URLs, And all major files used by the project.
A directory will be created for all paths defined within, if, and only if, one doesn't already exist.
This module is imported by every other script in the project, thus assuring modules all use same paths.
Because of the common use amoungst scripts, a new path can be added at any time, simply by adding an entry to FlightPaths.py

FlightPaths.py
import os
from pathlib import Path


class FlightPaths:
    def __init__(self, depth=0):
        os.chdir(os.path.abspath(os.path.dirname(__file__)))
        dir_depth = abs(depth)

        HomePath = Path(".")

        while dir_depth:
            HomePath = HomePath / ".."
            dir_depth -= 1

        rootpath = HomePath / ".."

        self.datapath = rootpath / "data"
        self.datapath.mkdir(exist_ok=True)

        self.csvpath = self.datapath / "csv"
        self.csvpath.mkdir(exist_ok=True)

        self.ourairport_datafiles = self.csvpath / "OurAirports"
        self.ourairport_datafiles.mkdir(exist_ok=True)

        self.ourairport_staging = self.ourairport_datafiles / 'Staging'
        self.ourairport_staging.mkdir(exist_ok=True)

        self.databasepath = self.datapath / "database"
        self.databasepath.mkdir(exist_ok=True)

        self.docpath = rootpath / "docs"
        self.docpath.mkdir(exist_ok=True)

        self.jsonpath = self.datapath / "json"
        self.jsonpath.mkdir(exist_ok=True)

        self.htmlpath = self.datapath / "html"
        self.htmlpath.mkdir(exist_ok=True)

        self.prettypath = self.datapath / "pretty"
        self.prettypath.mkdir(exist_ok=True)

        self.skeletons = self.datapath / 'skeletons'
        self.skeletons.mkdir(exist_ok=True)

        self.srcpath = rootpath / 'src'
        self.srcpath.mkdir(exist_ok=True)

        self.tmppath = self.datapath / "tmp"
        self.tmppath.mkdir(exist_ok=True)

        # Database
        self.FlightAidsDb = self.databasepath / "FlightAids.db"

        # files
        self.filesizes = self.jsonpath / 'FileSizes.json'
        self.modelfile = self.srcpath / "DbModel.py"
        self.model_prefix = self.skeletons / "ModelPrefix.txt"
        self.model_table_skeleton = self.skeletons / "TableSkeleton.txt"
        self.model_class_skeleton = self.skeletons / "ModeClassAndCommandLineCapability.txt"

        # Up to date open source data
        self.our_airports_urls = {
            "mainpage": "https://ourairports.com/",
            "downloadsbase": "https://ourairports.com/data/",
            "contributors": "https://ourairports.com/stats/contributors.html",
        }


if __name__ == "__main__":
    FlightPaths()
when ready:

  1. From the terminal window (cmd window) already open run the following
    If terminal window was accidently closed:
    1. Open a new terminal window
    2. cd to OurAirports directory
    3. run command . ./venv/bin/activate
  2. cd to OurAirports directory
  3. run command python src/FlightPaths.py
[/list]

this will add all directories required by the project after running, the project structure should now look like:

├── OurAirports
│ ├── data
│ │ ├── csv
│ │ │ └── OurAirports
│ │ │ └── Staging
│ │ ├── database
│ │ ├── html
│ │ ├── json
│ │ ├── pretty
│ │ ├── skeletons
│ │ └── tmp
│ ├── docs
│ ├── requirements.txt
│ ├── src
│ │ └── FlightPaths.py
│ └── utils
│ ├── geckodriver
│ └── geckodriver-v0.29.1-linux64.tar.gz
└── tree.txt


Part 3 - Add Skeleton Files Required To Create Database Model

Add the following files to path: OurAirports/data/skeletons name each as indicated.

  1. run command cd data/skeletons.
  2. copy following to ModelPrefix.txt
    ModelPrefix.txt
    Output:
    # This is a software generated script. Do not make changes to This # file as they will vanish. Rather make changes to BuildDbModel.py # from FlightPaths import FlightPaths from datetime import datetime from sqlalchemy import Column, String, Integer, Date, \ create_engine, DateTime, ForeignKey from sqlalchemy.ext.declarative import declarative_base from sqlalchemy.orm import relationship, backref fpath = FlightPaths() engine = create_engine(f"sqlite:///{fpath.FlightAidsDb}") Base = declarative_base()
  3. copy the fllowing to TableSkeleton.txt
    TableSkeleton.txt
    Output:
    class tablename_placeholder(Base): __tablename__ = 'tablename_placeholder'
  4. copy the following to ModeClassAndCommandLineCapability.txt
    ModeClassAndCommandLineCapability.txt
    Output:
    class DbModel: def __init__(self): Base.metadata.create_all(engine) if __name__ == '__main__': DbModel()
  5. run command cd ../../src

Part 4 - Main Dispatch Module, OurAirports.py

What this module does:

  1. Import FlightPaths.py (this is the first module loaded in step 2, and contains paths to every data directory, URL, datafile, and database.
    This is imported by every module from here to end of project.
    If new paths, URLs, etc. are added to project, add them here.
  2. The remainder of the script simply dispatches all of the modules in the project.
    It can be run at any time, or could be put as a process in a scheduler.

Add following to the OurAirports/src directory

OurAirports.py
import importlib
from FlightPaths import FlightPaths
from FindColumnMinMaxAndType import FindColumnMinMaxAndType
from UpdateDataFiles import UpdateDataFiles
from BuildDbModel import BuildDbModel


class OurAirports:
    def __init__(self):
        self.fpath = FlightPaths()
        self.update_data = UpdateDataFiles()
        self.find_sizes = FindColumnMinMaxAndType()
        self.build_model = BuildDbModel()

        self.filelist = []

    def dispatcher(self):
        print(f"Running UpdateDataFiles")
        self.update_data.dispatch()
        print(f"Running FindColumnMinMaxAndType")
        self.find_sizes.dispatch()
        print(f"Running BuildDbModel")
        self.build_model.dispatch()

        # LoadDatabase must be imported dynamically because DbModel which
        # is imported in LoadDatabase doesn't exist at start when run the first time.
        loaddb = importlib.import_module("LoadDatabase")
        ldbclass = getattr(loaddb, 'LoadDatabase')
        Dbloader = ldbclass()
        Dbloader.dispatch()

def main():
    oap = OurAirports()
    oap.dispatcher()


if __name__ == '__main__':
    main()
Part 5 - Update Datafiles UpdateDataFiles.py(CSV)

What this module does:

  1. Import FlightPaths.py (this is the first module loaded in step 2, and contains paths to every data directory, URL, datafile, and database.
    This is imported by every module from here to end of project.
    If new paths, URLs, etc. are added to project, add them here.
  2. Extract list of files available for download at ourairports.com/data
  3. Check local files timestamp against available file timestamp.
    If age greater than backuphours (supplied as argument to UpdateDataFiles.__init__, defaults to 24 hours), mark file for update and
    Add to self.download_list if update needed
  4. Download each file using requests to buffer. If download successful (status_code is 200) move to staging area, which is located at:
    OurAirports/data/csv/OurAirport/Staging
  5. Once all files have been successfully downloaded to staging, they will be moved to the OurAirports/data/csv/OurAirport directory.
    Other classes in the project will use the files here for procesing.

UpdateDataFiles.py
from FlightPaths import FlightPaths
from CreateDict import CreateDict
from bs4 import BeautifulSoup
import requests
import datetime
import shutil
import time
import sys


class UpdateDataFiles:
    def __init__(self, backuphours=24):
        self.fpath = FlightPaths()
        self.cd = CreateDict()
        self.downloads = {}

        self.backuphours = backuphours
        self.debug = False

    def dispatch(self):
        savepath = self.fpath.ourairport_staging
        urlbase = self.fpath.our_airports_urls["downloadsbase"]
        if self.get_links(urlbase, savepath):
            self.get_data_files(savepath)

    def get_links(self, url, savepath):
        page = None
        todays_date = datetime.date.today()
        savepage = self.fpath.htmlpath / 'DownloadPage.html'

        if not savepage.exists() or (savepage.exists() and self.determine_file_age(savepage) > self.backuphours):
            response = requests.get(url)
            if response.status_code == 200:
                page = response.content
                with savepage.open('wb') as fp:
                    fp.write(page)                
            else:
                print(f"Unable to connect")
                sys.exit(-1)
                return False
        else:
            with savepage.open('rb') as fp:
                page = fp.read()

        # css_selector: #files
        soup = BeautifulSoup(page, 'lxml')
        dwnld_area = soup.find('section', {'id': "files"})
        flinks = dwnld_area.find_all('dt')

        for flink in flinks:
            filename = self.fpath.ourairport_datafiles / flink.a.text.strip()
            self.nnode = self.cd.add_node(self.downloads, filename.name)
            self.cd.add_cell(self.nnode, 'filename', filename)
            self.cd.add_cell(self.nnode, 'url', flink.a.get('href'))
        # self.cd.display_dict(self.downloads)
        return True
        
    def get_data_files(self, savepath):
        for fname, entries in self.downloads.items():
            filename = entries['filename']
            url = entries['url']
            # print(f"savepath: {savepath}, type savepath: {type(savepath)}, filename: {filename}")

            dest = savepath / filename.name
            if not filename.exists() or (filename.exists and self.determine_file_age(filename) > self.backuphours):
                print(f"downloading {filename.name}")
                response = requests.get(url)
                if response.status_code == 200:
                    with dest.open('w') as fp:
                        fp.write(response.text)
                else:
                    print(f"Unbale to download file {dest.name}")
        
            stagefiles = [fn for fn in savepath.iterdir() if fn.is_file() and fn.suffix == '.csv']
            for file in stagefiles:
                    dest = self.fpath.ourairport_datafiles / file.name
                    shutil.move(file, dest)


    def determine_file_age(self, filename):
        last_mod_time = filename.stat().st_mtime
        localtime = time.time()
        fileage = float(localtime - last_mod_time) / 3600.0
        return fileage


def main():
    udf = UpdateDataFiles()
    udf.dispatch()


if __name__ == '__main__':
    main()
Part 6 - Determine Datatypes and Maximum Size for Each Column of Each CSV File.

What this module does:

  1. Import FlightPaths.py (this is the first module loaded in step 2, and contains paths to every data directory, URL, datafile, and database.
    This is imported by every module from here to end of project.
    If new paths, URLs, etc. are added to project, add them here.
  2. Unconditionally opens each CSV file in the OurAirports/data/csv/ directory then for each file:
    1. Determine database column type (Integer or String)
    2. Determine column minimum and maximum size.
  3. This data is sorted in a dictionary which has the following structure:
    Output:
    column_sizes = { "tablename": { "ColumnName": { "minsize": size, "maxsize": size, "dtype": "datatype" # "Integer" or "String" }, ... }, ... }
  4. The dictionary is written to a JSON file here: OurAirports/data/json/FileSizes.json


FindColumnMinMaxAndType.py
from FlightPaths import FlightPaths
from CreateDict import CreateDict
import csv
import sys
import json


class FindColumnMinMaxAndType:
    def __init__(self):
        self.fpath = FlightPaths()
        self.cd = CreateDict()
        self.column_sizes = {}

    def dispatch(self):
        filelist = self.get_filelist()
        self.get_file_sizes(filelist)
    
    def get_file_sizes(self, filelist):        
        cd = self.cd

        for filename in filelist:
            csize = self.column_sizes
            header_needed = True
            with filename.open() as fp:
                fnode = cd.add_node(csize, filename.name)
                crdr = csv.reader(fp)
                for row in crdr:
                    if header_needed:
                        header = row
                        for name in header:
                            colnode = cd.add_node(fnode, name)
                            cd.add_cell(colnode, "minsize", 99999)
                            cd.add_cell(colnode, "maxsize", 0)
                            cd.add_cell(colnode, "dtype", "Integer")
                        header_needed = False
                    else:
                        for n, column in enumerate(row):
                            cptr = csize[filename.name][header[n]]
                            col_length  = len(column)

                            if cptr["minsize"] > col_length:
                                cptr["minsize"] = col_length
                            if cptr["maxsize"] < col_length:
                                cptr["maxsize"] = col_length
                            if not column.isnumeric():
                                cptr["dtype"] = 'String'

            with self.fpath.filesizes.open('w') as fp:
                json.dump(self.column_sizes, fp)

    def get_filelist(self):
        filepath = self.fpath.ourairport_datafiles

        filelist = [fn for fn in filepath.iterdir() if fn.is_file()]
        if len(filelist) == 0:
            print(f"\n*** Error ***" \
                f"\nThere are no data files in download directory.\n" \
                f"This script is meant to be run from WorldFlightInformation.py\n" \
                f"If run standalone, you must first run UpdateDataFiles.py.\n")
            sys.exit(-1)

        return filelist


def main():
    fcmm = FindColumnMinMaxAndType()
    fcmm.dispatch()


if __name__ == '__main__':
    main()
Part 7 - Build SQLAlchemy Model. BuildDbModel.py

What this module does:

  1. Import FlightPaths.py (this is the first module loaded in step 2, and contains paths to every data directory, URL, datafile, and database.
    This is imported by every module from here to end of project.
    If new paths, URLs, etc. are added to project, add them here.
  2. Using the json file created in the previous program, which is referenced by fpath.filesizes and located at:
    OurAirports/data/json/FileSizes.json, and the skeleton files created earlier in the tutorial, Create a SQLAlchemy Model named DbModel.py in the src directory.
  3. The database (fpath.FlightAidsDb -- named FlightAids.db will be created in the next module when DbModel is imported.
  4. It's important to note that changes should never be made to the Model itself, as it is recreated every time data is updated.
  5. The same holds true for the database itself. (I will probably chamge this before I add this code to GitHub).

BuildDbModel.py
from FlightPaths import FlightPaths
from copy import deepcopy
import json
import sys


class BuildDbModel:
    '''
    Creates an SQLAlchemy Database Model from
    information gathered from running FindColumnMinMax.py.
    
    The preamble, table colums,  __init__ structure and repr
    function will be syntaticly correct.

    A stab at column sizes will be made by adding a 50% buffer to
    calculated max column sizes.
    This may need some manual inspection and adjustment as the AI is not
    strong here.    
    '''
    def __init__(self):
        self.fpath = FlightPaths()

        self.FlightAidsDb = self.fpath.databasepath / "FlightAids.db"

        self.ModelFile = self.fpath.srcpath / "DbModel.py"
        self.ModelPrefix = self.fpath.model_prefix
        self.TableDef = self.fpath.model_table_skeleton

        self.filesizes = {}

    def dispatch(self):
        self.load_file_sizes()
        self.write_prefix()
        self.add_tables_to_model()


    def add_tables_to_model(self):
        with self.ModelFile.open('a') as mf:
            # Read the table template
            with  self.TableDef.open() as td:
                table_template = td.read()

            for tablename, columns in self.filesizes.items():
                # hack to allow filenames with hyphens which SQLAlchemy 
                # model doesn't like
                tablename = tablename.replace('-', '_')
                rtablename = tablename[:-4]
                # print(f"\nrtablename: {rtablename}\n")

                this_table_template = deepcopy(table_template)
                this_table_template = this_table_template.replace('tablename_placeholder', rtablename)

                InitText = f"    def __init__(self, "
                reprText = f'    def __repr__(self):\n        return f"<{rtablename} '

                for column_name, attributes in columns.items():
                    InitText = f"{InitText} {column_name}, "
                    reprText = f"{reprText}, {column_name}"
                    csize = (attributes['maxsize'])
                    if csize:
                        csize = int(csize + csize * .5)
                    if attributes['dtype'] == "Integer":
                        this_table_template = this_table_template + \
                            f"\n    {column_name} = Column(Integer)"
                    else:
                        this_table_template = this_table_template + \
                            f"\n    {column_name} = Column(String({csize}))"
                    if column_name == 'id':
                        this_table_template = this_table_template[:-1] + f", primary_key=True)\n" 

                mf.write(f"{this_table_template}\n")

                # finish __init__ definition
                InitText = f"\n{InitText[:-1]}):\n"
                mf.write(InitText)
                for column_name, attributes in columns.items():
                    text = f"        self.{column_name} = {column_name}\n"
                    mf.write(text)
                mf.write(f"\n")

                # finish repr definition
                reprText = f'\n{reprText}>"\n'
                mf.write(reprText)

            with self.fpath.model_class_skeleton.open() as mcs:
                mf.write(f"\n")
                mf.write(mcs.read())
                mf.write(f"\n")

    def load_file_sizes(self):
        with self.fpath.filesizes.open() as fp:
            self.filesizes = json.load(fp)

    def write_prefix(self):
        if self.ModelFile.exists():
            self.ModelFile.unlink()
        
        with self.ModelFile.open('w') as mf, \
            self.ModelPrefix.open() as mp:
            mf.write(mp.read())
            # requires two empty lines at end
            mf.write(f"\n\n")


def main():
    bdbs = BuildDbModel()
    bdbs.dispatch()


if __name__ == '__main__':
    main()
Part 8 -Create and Load Database. LoadDatabase.py

A note about this module. At first I tried to avoid using Pandas for the reading of CSV and writing to the database. When I finally tried this approach, I found it to be blazingly fast, so used it.

What this module does:

  1. Import FlightPaths.py (this is the first module loaded in step 2, and contains paths to every data directory, URL, datafile, and database.
    This is imported by every module from here to end of project.
    If new paths, URLs, etc. are added to project, add them here.
  2. The database is created by DbModel instanciation which causes Base.metadata.create_all(engine) to run, and thus create an empty database.
  3. Get a list of all CSV files located the OurAitports/data/csv/OurAirports
  4. For each file in list generated above:
    • Open CSV file using Pandas read_csv
    • Extract database table name from stem of filename, replace '-' with '_'
    • Display user message showing tablename
    • Load database table using pandas df.to_sql, specifing tablename, engine name,
      and 'replace' command which replaces the table if it already exists.

LoadDatabase.py
from FlightPaths import FlightPaths
import pandas as pd
import DbModel
import csv


class LoadDatabase:
    def __init__(self):
        self.fpath = FlightPaths()

        self.Model = DbModel

    def dispatch(self):
        self.load_all()

    def load_all(self):
        datafiles = self.fpath.ourairport_datafiles

        filelist = [fn for fn in datafiles.iterdir() if fn.is_file() and fn.suffix == '.csv']
        for fn in filelist:
            tablename = fn.stem.replace('-', '_')
            df = pd.read_csv(fn)
            print(f"engine: {self.Model.engine}")
            df.to_sql(tablename, con=self.Model.engine, if_exists='replace')


def main():
    lDb = LoadDatabase()
    lDb.load_all()


if __name__ == '__main__':
    main()
Part 9 - Dictionary creation tool. CreateDict.py

I created this module a few years back, and use it whenever creating nested dynamic dictionaries
It makes the process simpler to keep track of. It includes a sample restuarant dictionary creation
at bottom of script that can be used for testing.

What this module does:

  1. There are 4 methods in this class:
    1. new_dict(dictname) - Creates a new dictionary instance with
      the name contained in dictname
    2. add_node(parent, nodename) - Creates a new node (nested dictionary)
      named in nodename, in parent dictionary.
    3. add_cell(nodename, cellname, value) - Creates a leaf node within node
      named in nodename, with a cell name of cellname, and value of value.
    4. display_dict(dictname) - Recursively displays a nested dictionary.

CreateDict.py:
import os

class CreateDict:
    """
    Generic Software tools used by Trailmapper.

    CreateDict.py - Contains methods to simplify node and cell creation within
                    a dictionary

    Usage: 
    
        The best way to learn what can be done is to examine the testit function
        included in this module.

        new_dict(dictname) - Creates a new dictionary instance with the name
            contained in dictname

        add_node(parent, nodename) - Creates a new node (nested dictionary)
            named in nodename, in parent dictionary.

        add_cell(nodename, cellname, value) - Creates a leaf node within node
            named in nodename, with a cell name of cellname, and value of value.

        display_dict(dictname) - Recursively displays a nested dictionary.

    Requirements:

        Trailmapper software:
            None

        Python standard library:
            os
    
    Author: Larry McCaig  -- May 2019.
    """
    def __init__(self):
        os.chdir(os.path.abspath(os.path.dirname(__file__)))

    def new_dict(self, dictname):
        setattr(self, dictname, {})

    def add_node(self, parent, nodename):
        node = parent[nodename] = {}
        return node

    def add_cell(self, nodename, cellname, value):
        cell =  nodename[cellname] = value
        return cell

    def display_dict(self, dictname, level=0):
        indent = " " * (4 * level)
        for key, value in dictname.items():
            if isinstance(value, dict):
                print(f'\n{indent}{key}')
                level += 1
                self.display_dict(value, level)
            else:
                print(f'{indent}{key}: {value}')
            if level > 0:
                level -= 1


def testit():
    # instantiate class
    cd = CreateDict()

    # create new dictionary named CityList
    cd.new_dict('CityList')

    # add node Boston
    boston = cd.add_node(cd.CityList, 'Boston')
    # add sub node Resturants
    bos_resturants = cd.add_node(boston, 'Resturants')

    # Add subnode 'Spoke Wine Bar' to parent bos_resturants
    spoke = cd.add_node(bos_resturants, 'Spoke Wine Bar')
    cd.add_cell(spoke, 'Addr1', '89 Holland St')
    cd.add_cell(spoke, 'City', 'Sommerville')
    cd.add_cell(spoke, 'Addr1', '02144')
    cd.add_cell(spoke, 'Phone', '617-718-9463')

    # Add subnode 'Highland Kitchen' to parent bos_resturants
    highland = cd.add_node(bos_resturants, 'Highland Kitchen')
    cd.add_cell(highland, 'Addr1', '150 Highland Ave')
    cd.add_cell(highland, 'City', 'Sommerville')
    cd.add_cell(highland, 'ZipCode', '02144')
    cd.add_cell(highland, 'Phone', '617-625-1131')

    # display dictionary
    print(f'\nCityList Dictionary')
    cd.display_dict(cd.CityList)
    print(f'\nraw data: {cd.CityList}')

if __name__ == '__main__':
    testit()
Part 10 - Testing -- Running The Code

Now that all of the code is in place, it's time to see if everything works.

  1. Navigate to OurAirports -- run command cd ..
  2. check directory:
    run command pwd
  3. If your virtual environment is not running (you will see a promps similar to: (venv) OurAirports$
  4. If not, run command . ./venv/bin/activate to start the virtual environment.
  5. Make sure that you have an active Internet connection
  6. run command python3 src/OurAirports.py
  7. Wait for the program to finish

If all went well, you should now see a database named FlightAids.db in the OurAirports/data/database directory.
This is a fully functional sqlite3 database, with all tables populated.

****** Important Note ******
The database is replaced each time the aboue process is run.
If the datafiles were older that the default (24 hours) the database will be populated with the new data.
Also, if OurAirports.com adds new data files, this will be recognized by the package, and will be added to
the database automatically.
Knowing what is stated above, you can export the data from the FlightAids.db as .sql files, and reload into any other database that needs the data.
Also be aware that ourairports.com updates their data in the evening You can establish times by reading https://ourairports.com/data page.

If you don't have a copy of DB Browser for SQLite:
  1. Bring up website https://sqlitebrowser.org/dl/
  2. Find the version you need for your operating system.
  3. Download and install, following specific instructions for your Operating System.

  1. Open a file browser (I use nemo on Linux mint, however, use what you have available)
  2. Navigate to OurAirports/data/database
  3. Double-click on b]FlightAids.db[/b] this should should bring up DB Browser
  4. you should see a list showing all tables and indexes
  5. you can expand each table to see the columns
  6. To see contents of a table, click on the 'Browse Data' tab, and select the table name
    that you'd ike to explore.
  7. You can Query the database using SQL, Try this example:
    1. Let's say we want a list of all airports in Italy
    2. Click on 'Execute SQL' tab.
    3. In the edit box, type:

      select * from Airports where iso_country = 'IT' and elevation_ft > 1000 order by name;

    4. Highlight the query, and press the 'Execute All' button (icon above edit window, right pointing arrow).
      You should see a list of (as of this writing) 77 airports and all of their associated data.
      Note: You can have multiple SQL statements in the edit window, just highlight the one you wish to run
      before pressing the 'Execute All' button.

Image of FlightAids.db in DB Browser. Click to expand
   


Part 11 - An SQLAlchemy query example using new database

This script will show how to use SQLAlchemy to perform the same query in ORM format from python.

What this module does:

  1. Automates the query that was run from DB Browser above
  2. converts the query results into a dictionary
  3. Saves the dictionary to a JSON file in the OurAirports/data/json path named tabulate=False
  4. provides two data display options based on wether the tabulate attribute supplied to the dispatch() method is True or False.
    1. if tabulate=True, provides a tabulated list of all data in the dictionary (and JSON file)
      partial example:
      Output:
      QueryResults row0 airports.id: 308703 airports.ident: IT-0067 airports.type: small_airport airports.name: Altiporto Chamois airports.latitude_deg: 45.833793 airports.longitude_deg: 7.618504 airports.elevation_ft: 5600.0 airports.continent: EU airports.iso_country: IT airports.iso_region: IT-23 airports.municipality: Chamois (AO) airports.scheduled_service: no airports.gps_code: None airports.iata_code: None airports.local_code: AOCHA airports.home_link: None airports.wikipedia_link: None airports.keywords: None row1 airports.id: 29426 airports.ident: LILB airports.type: small_airport airports.name: Alzate Brianza Airport airports.latitude_deg: 45.76996 airports.longitude_deg: 9.162866 airports.elevation_ft: 1260.0 airports.continent: EU airports.iso_country: IT airports.iso_region: IT-25 airports.municipality: Como airports.scheduled_service: no airports.gps_code: LILB airports.iata_code: None airports.local_code: CO03 airports.home_link: http://www.avl.it/ airports.wikipedia_link: None airports.keywords: None
    2. If tabulated is false, provides a pandas dataframe output
      partial example:
      Output:
      Id Type LocalCd GPSCd AirportName Municipality Elevation Latitude Longitude row0 308703 small_airport AOCHA None Altiporto Chamois Chamois (AO) 5600.0 45.833793 7.618504 row1 29426 small_airport CO03 LILB Alzate Brianza Airport Como 1260.0 45.76996 9.162866 row2 4349 medium_airport AO01 LIMW Aosta Airport Aosta 1791.0 45.738499 7.36872 row3 4326 small_airport VI06 LIDA Asiago Airport Vicenza 3409.0 45.886902 11.5169 row4 319842 small_airport FR09 None Aviosuperficie "Amici del Volo" Alvito (FR) 1217.0 41.676997 13.74285 row5 319459 small_airport AQCAR None Aviosuperficie "Il Pratone" Carsoli (AQ) 1893.0 42.093703 13.042332 row6 308676 small_airport VCVAR None Aviosuperficie "Marc Ingegno" Varallo (VC) 1411.0 45.778056 8.268611 row7 315890 small_airport TNTON None Aviosuperficie "Passo del Tonale" Vermiglio (BS) 6550.0 46.264722 10.599167 row8 320158 small_airport ARCVP None Aviosuperficie Aero Club Valdarno S.C.U.V. Cavriglia (AR) 1131.0 43.53084 11.473086 row9 320468 small_airport TRSVS None Aviosuperficie Alfina Torre Alfina (TR) 1811.0 42.739667 11.983333 row10 298748 small_airport VVROM None Aviosuperficie Ali Calabria Rombiolo 1870.0 38.619603 15.955968 row11 316752 small_airport VTVEJ None Aviosuperficie Alituscia Vejano (VT) 1270.0 42.229167 12.120833 row12 308692 small_airport CNPIA None Aviosuperficie Alpi Marittime Pianfei (CN) 1457.0 44.391111 7.725556 row13 319808 small_airport None None Aviosuperficie Aretusa Fly Canicattini Bagni 1012.0 37.03741 15.09536 row14 319612 small_airport SI06 None Aviosuperficie Borgo Castelvecchio Contignano (SI) 1007.0 43.0093333 11.7334667
      • The JSON file can be used for any aditional reporting.

      SimpleQuery.py
      import DbModel
      from sqlalchemy.orm import sessionmaker
      from FlightPaths import FlightPaths
      from CreateDict import CreateDict
      import pandas as pd
      import json
      import sys
      
      
      class SimpleQuery:
          def __init__(self):
              self.fpath = FlightPaths()
              self.cd = CreateDict()
              self.dmap = DbModel
      
              db = self.dmap.engine
              
              self.Session = sessionmaker(bind=db)
              self.Session.configure(bind=db)
              self.session = self.Session()
      
              self.selection_dict = {}
              self.selnode = self.cd.add_node(self.selection_dict, 'QueryResults')
      
          def dispatch(self, tablename, tabulate=False):
              savefile = self.fpath.jsonpath / f"{tablename}_query.json"
              tabmap = getattr(self.dmap, tablename)
              self.simple_query(tabmap, savefile)
              if tabulate:
                  self.cd.display_dict(self.selection_dict)
              else:
                  self.show_results()
      
          def row2dict(self, rownum, row):
              rowname = f"row{rownum}"
              rnode = self.cd.add_node(self.selnode, rowname)
              for col in row.__table__.columns:
                  item = str(getattr(row, col.name))
                  self.cd.add_cell(rnode, str(col), item)
      
          def simple_query(self, tabmap, savefile):        
              # This is an example of how easy it is to create SQLAlchemy Query
              alist = self.session.query(tabmap).filter(tabmap.iso_country == 'IT', \
                  tabmap.elevation_ft > 1000).order_by(tabmap.name)
              for n, row in enumerate(alist):
                  self.row2dict(n, row)
              
              with savefile.open('w') as fp:
                  json.dump(self.selection_dict, fp)
      
          def show_results(self):
              pd.set_option('max_row', None)
              select_data = {}
      
              # Extract data for dataframe into separate dictionary
              for key in self.selection_dict['QueryResults'].keys():
                  node = self.cd.add_node(select_data, key)
                  self.cd.add_cell(node, 'Id', self.selection_dict['QueryResults'][key]['airports.id'])
                  self.cd.add_cell(node, 'Type', self.selection_dict['QueryResults'][key]['airports.type'])
                  self.cd.add_cell(node, 'LocalCd', self.selection_dict['QueryResults'][key]['airports.local_code'])
                  self.cd.add_cell(node, 'GPSCd', self.selection_dict['QueryResults'][key]['airports.gps_code'])
                  self.cd.add_cell(node, 'AirportName', self.selection_dict['QueryResults'][key]['airports.name'])
                  self.cd.add_cell(node, 'Municipality', self.selection_dict['QueryResults'][key]['airports.municipality'])
                  self.cd.add_cell(node, 'Elevation', self.selection_dict['QueryResults'][key]['airports.elevation_ft'])
                  self.cd.add_cell(node, 'Latitude', self.selection_dict['QueryResults'][key]['airports.latitude_deg'])
                  self.cd.add_cell(node, 'Longitude', self.selection_dict['QueryResults'][key]['airports.longitude_deg'])
      
              print()
      
              df = pd.DataFrame.from_dict(select_data, orient='index')
              df.head()
              print(df)
      
      
      def main():
          ex = SimpleQuery()
          ex.dispatch('airports', tabulate=False)
      
      
      if __name__ == '__main__':
          main()
      To run above script:
      1. From OurAirports directory start a terminal session (command window)
      2. Activate virtual environment -- run command . ./venv/bin/activate
      3. Run command: python src/SimpleQuery.py
      4. If you want a tabulated list, change (in the main function)
        ex.dispatch('airports', tabulate=False)
        to:
        ex.dispatch('airports', tabulate=True)

      Appendex A - Sample Generated SQLAlchemy database model DbModel.py

      DbModel.py
      # This is a software generated script. Do not make changes to This
      # file as they will vanish. Rather make changes to BuildDbModel.py
      #
      from FlightPaths import FlightPaths
      from datetime import datetime
      from sqlalchemy import Column, String, Integer, Date, \
          create_engine, DateTime, ForeignKey
      from sqlalchemy.ext.declarative import declarative_base
      from sqlalchemy.orm import relationship, backref
      
      
      fpath = FlightPaths()
      engine = create_engine(f"sqlite:///{fpath.FlightAidsDb}")
      Base = declarative_base()
      
      
      class airport_frequencies(Base):
      
          __tablename__ = 'airport_frequencies'
      
      
          id = Column(Integer, primary_key=True)
      
          airport_ref = Column(Integer)
          airport_ident = Column(String(10))
          type = Column(String(24))
          description = Column(String(96))
          frequency_mhz = Column(String(10))
      
          def __init__(self,  id,  airport_ref,  airport_ident,  type,  description,  frequency_mhz,):
              self.id = id
              self.airport_ref = airport_ref
              self.airport_ident = airport_ident
              self.type = type
              self.description = description
              self.frequency_mhz = frequency_mhz
      
      
          def __repr__(self):
              return f"<airport_frequencies , id, airport_ref, airport_ident, type, description, frequency_mhz>"
      
      class airports(Base):
      
          __tablename__ = 'airports'
      
      
          id = Column(Integer, primary_key=True)
      
          ident = Column(String(10))
          type = Column(String(21))
          name = Column(String(120))
          latitude_deg = Column(String(36))
          longitude_deg = Column(String(31))
          elevation_ft = Column(String(7))
          continent = Column(String(3))
          iso_country = Column(String(3))
          iso_region = Column(String(10))
          municipality = Column(String(72))
          scheduled_service = Column(String(4))
          gps_code = Column(String(6))
          iata_code = Column(String(4))
          local_code = Column(String(10))
          home_link = Column(String(192))
          wikipedia_link = Column(String(192))
          keywords = Column(String(384))
      
          def __init__(self,  id,  ident,  type,  name,  latitude_deg,  longitude_deg,  elevation_ft,  continent,  iso_country,  iso_region,  municipality,  scheduled_service,  gps_code,  iata_code,  local_code,  home_link,  wikipedia_link,  keywords,):
              self.id = id
              self.ident = ident
              self.type = type
              self.name = name
              self.latitude_deg = latitude_deg
              self.longitude_deg = longitude_deg
              self.elevation_ft = elevation_ft
              self.continent = continent
              self.iso_country = iso_country
              self.iso_region = iso_region
              self.municipality = municipality
              self.scheduled_service = scheduled_service
              self.gps_code = gps_code
              self.iata_code = iata_code
              self.local_code = local_code
              self.home_link = home_link
              self.wikipedia_link = wikipedia_link
              self.keywords = keywords
      
      
          def __repr__(self):
              return f"<airports , id, ident, type, name, latitude_deg, longitude_deg, elevation_ft, continent, iso_country, iso_region, municipality, scheduled_service, gps_code, iata_code, local_code, home_link, wikipedia_link, keywords>"
      
      class countries(Base):
      
          __tablename__ = 'countries'
      
      
          id = Column(Integer, primary_key=True)
      
          code = Column(String(3))
          name = Column(String(66))
          continent = Column(String(3))
          wikipedia_link = Column(String(111))
          keywords = Column(String(76))
      
          def __init__(self,  id,  code,  name,  continent,  wikipedia_link,  keywords,):
              self.id = id
              self.code = code
              self.name = name
              self.continent = continent
              self.wikipedia_link = wikipedia_link
              self.keywords = keywords
      
      
          def __repr__(self):
              return f"<countries , id, code, name, continent, wikipedia_link, keywords>"
      
      class navaids(Base):
      
          __tablename__ = 'navaids'
      
      
          id = Column(Integer, primary_key=True)
      
          filename = Column(String(60))
          ident = Column(String(12))
          name = Column(String(46))
          type = Column(String(10))
          frequency_khz = Column(String(9))
          latitude_deg = Column(String(31))
          longitude_deg = Column(String(31))
          elevation_ft = Column(String(7))
          iso_country = Column(String(3))
          dme_frequency_khz = Column(String(9))
          dme_channel = Column(String(6))
          dme_latitude_deg = Column(String(12))
          dme_longitude_deg = Column(String(13))
          dme_elevation_ft = Column(String(6))
          slaved_variation_deg = Column(String(12))
          magnetic_variation_deg = Column(String(12))
          usageType = Column(String(12))
          power = Column(String(10))
          associated_airport = Column(String(10))
      
          def __init__(self,  id,  filename,  ident,  name,  type,  frequency_khz,  latitude_deg,  longitude_deg,  elevation_ft,  iso_country,  dme_frequency_khz,  dme_channel,  dme_latitude_deg,  dme_longitude_deg,  dme_elevation_ft,  slaved_variation_deg,  magnetic_variation_deg,  usageType,  power,  associated_airport,):
              self.id = id
              self.filename = filename
              self.ident = ident
              self.name = name
              self.type = type
              self.frequency_khz = frequency_khz
              self.latitude_deg = latitude_deg
              self.longitude_deg = longitude_deg
              self.elevation_ft = elevation_ft
              self.iso_country = iso_country
              self.dme_frequency_khz = dme_frequency_khz
              self.dme_channel = dme_channel
              self.dme_latitude_deg = dme_latitude_deg
              self.dme_longitude_deg = dme_longitude_deg
              self.dme_elevation_ft = dme_elevation_ft
              self.slaved_variation_deg = slaved_variation_deg
              self.magnetic_variation_deg = magnetic_variation_deg
              self.usageType = usageType
              self.power = power
              self.associated_airport = associated_airport
      
      
          def __repr__(self):
              return f"<navaids , id, filename, ident, name, type, frequency_khz, latitude_deg, longitude_deg, elevation_ft, iso_country, dme_frequency_khz, dme_channel, dme_latitude_deg, dme_longitude_deg, dme_elevation_ft, slaved_variation_deg, magnetic_variation_deg, usageType, power, associated_airport>"
      
      class regions(Base):
      
          __tablename__ = 'regions'
      
      
          id = Column(Integer, primary_key=True)
      
          code = Column(String(10))
          local_code = Column(String(6))
          name = Column(String(78))
          continent = Column(String(3))
          iso_country = Column(String(3))
          wikipedia_link = Column(String(126))
          keywords = Column(String(139))
      
          def __init__(self,  id,  code,  local_code,  name,  continent,  iso_country,  wikipedia_link,  keywords,):
              self.id = id
              self.code = code
              self.local_code = local_code
              self.name = name
              self.continent = continent
              self.iso_country = iso_country
              self.wikipedia_link = wikipedia_link
              self.keywords = keywords
      
      
          def __repr__(self):
              return f"<regions , id, code, local_code, name, continent, iso_country, wikipedia_link, keywords>"
      
      class runways(Base):
      
          __tablename__ = 'runways'
      
      
          id = Column(Integer, primary_key=True)
      
          airport_ref = Column(Integer)
          airport_ident = Column(String(10))
          length_ft = Column(String(9))
          width_ft = Column(String(6))
          surface = Column(String(94))
          lighted = Column(Integer)
          closed = Column(Integer)
          le_ident = Column(String(10))
          le_latitude_deg = Column(String(13))
          le_longitude_deg = Column(String(13))
          le_elevation_ft = Column(String(7))
          le_heading_degT = Column(String(9))
          le_displaced_threshold_ft = Column(String(6))
          he_ident = Column(String(10))
          he_latitude_deg = Column(String(13))
          he_longitude_deg = Column(String(13))
          he_elevation_ft = Column(String(7))
          he_heading_degT = Column(String(9))
          he_displaced_threshold_ft = Column(String(6))
      
          def __init__(self,  id,  airport_ref,  airport_ident,  length_ft,  width_ft,  surface,  lighted,  closed,  le_ident,  le_latitude_deg,  le_longitude_deg,  le_elevation_ft,  le_heading_degT,  le_displaced_threshold_ft,  he_ident,  he_latitude_deg,  he_longitude_deg,  he_elevation_ft,  he_heading_degT,  he_displaced_threshold_ft,):
              self.id = id
              self.airport_ref = airport_ref
              self.airport_ident = airport_ident
              self.length_ft = length_ft
              self.width_ft = width_ft
              self.surface = surface
              self.lighted = lighted
              self.closed = closed
              self.le_ident = le_ident
              self.le_latitude_deg = le_latitude_deg
              self.le_longitude_deg = le_longitude_deg
              self.le_elevation_ft = le_elevation_ft
              self.le_heading_degT = le_heading_degT
              self.le_displaced_threshold_ft = le_displaced_threshold_ft
              self.he_ident = he_ident
              self.he_latitude_deg = he_latitude_deg
              self.he_longitude_deg = he_longitude_deg
              self.he_elevation_ft = he_elevation_ft
              self.he_heading_degT = he_heading_degT
              self.he_displaced_threshold_ft = he_displaced_threshold_ft
      
      
          def __repr__(self):
              return f"<runways , id, airport_ref, airport_ident, length_ft, width_ft, surface, lighted, closed, le_ident, le_latitude_deg, le_longitude_deg, le_elevation_ft, le_heading_degT, le_displaced_threshold_ft, he_ident, he_latitude_deg, he_longitude_deg, he_elevation_ft, he_heading_degT, he_displaced_threshold_ft>"
      
      class DbModel:
          def __init__(self):
              Base.metadata.create_all(engine)
      
      
      if __name__ == '__main__':
          DbModel()
Gribouillis and snippsat like this post
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  SqlAlchemy Tutorial - Basic Model, database Creation and Data Load Larz60+ 0 8,428 Feb-02-2020, 03:32 AM
Last Post: Larz60+

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020