Python Forum

Full Version: formatting string and returning as geojson
You're currently viewing a stripped down version of our content. View the full version with proper formatting.
Pages: 1 2
I have records like this (I cannot alter the origional dataset). The Primary key is always 64 characters, followed by a 3 letter code, three sets of WGS84 coordinates (lat,long,altitutde), then a site name, in this case De Vrijheid.

('b30e845d00993f2258900c4052471a6947c6b3f1a375438b6f4cd206397186a5', 'geo52.7642852014.9228121084000000000000De Vrijheid')

First I try to format this:

        
        cur.execute(
           "SELECT t.id, t.\"vendorField\" FROM transactions t WHERE t.\"vendorField\" LIKE 'geo%'")
           
        print("Rows returned: ", cur.rowcount)
        dashmap=(66, 3, 12, 12, 12, )
        row = cur.fetchone()

        while row is not None:
            #formatting each row
            data = row
            data = str(data).encode('latin1').decode('unicode_escape').replace("', '","")
            data = re.findall(''.join('(\S{{{}}})'.format(l) for l in dashmap), data)
            data = str(data)
            data = data.replace(r'"', '|').replace(r'|(', '').replace(r'|', '').replace(r'(', '').replace(r')', '').replace(r'[', '').replace(r']', '').replace(r"'", "").replace(" ","").split(",")
This returns
['b30e845d00993f2258900c4052471a6947c6b3f1a375438b6f4cd206397186a5', 'geo', '52.764285201', '4.9228121084', '000000000000']

I get geojson this way.
geodata = geojson.Point((float(data[3]), float(data[2])))
And write to file

with open('points.json', 'a') as outfile:
                geojson.dump(geodata, outfile, indent=2)
            outfile.close()
And finish of
        cur.close()
    except (Exception, psycopg2.DatabaseError) as error:
        print(error)

        # execute a statement
        print('PostgreSQL database version:')
        cur.execute('SELECT version()')

        # display the PostgreSQL database server version
        db_version = cur.fetchone()
        print(db_version)

        # close the communication with the PostgreSQL
        cur.close()
    except (Exception, psycopg2.DatabaseError) as error:
        print(error)
    finally:
        if conn is not None:
            conn.close()
            print('Database connection closed.')
This gives a geojson file formatted like this, which is not geojson, it seems like it is writing one line at a time as an induvidual point not a multipoint file. I'm a noob to python, so any amendments/alternative solutions are welcome.

{
  "type": "Point",
  "coordinates": [
    45.506527899,
    12.081455199
  ]
}{
  "type": "Point",
  "coordinates": [
    25.4094136,
    11.9158859
  ]
}
could you post your full code (at least part after executing the sql) as well as sample data (few points) as returned by the database.
The data straight from the db
read in to data
type(data) tuple
Connecting to the PostgreSQL database...
Rows returned:  17
('9fde26933b7196e3373ffe80e45e7f36e13bf3aba6290d65ce1a2f5360cb246b', 'geo12.08145519945.50652789900000.000000test')
('e46117fb58a13de5d59a6a4a897d3cfd9ae9d4b616278b5d87e254a1c1ea5728', 'geo011.9158859025.40941360000000.000000This is a test')
('eb0cd00281521f241de36a59487b8923c6b2e62894fd411ca3c511a25a958c6d', 'geo45.38584480011.89942250000000.000000Città di Padova')
('bcbc94f00db0cfb5e01b88703ab858ed58082845a4d00309c5d7920418a6aab9', 'geo11.91584370045.409410600000000000000I love geo')
('1008ab688b0fdab30a00e5805d8c774739383288a98787e4740fbee109ea0e8b', 'geo45.40944419911.915854900000000000000')
('5e114a3fa9910b306f880ddedb3a703da9de96464d6c87cd52b5138b7cc78084', 'geo45.40320450011.922245600000000000000')
('aa4bb5d192f58a7f72151d3f47c37917651667ab573b9339f0ab35188ae2b6a3', 'geo000000000000000000000000000000000000')
('813617825c70f30c73c30c8cd20ecde37609b2c137d2b0d1a8ef06d1e33315ee', 'geo44.40320450010.922245600000000000000Hi all!!!')
('b30e845d00993f2258900c4052471a6947c6b3f1a375438b6f4cd206397186a5', 'geo52.7642852014.9228121084000000000000De Vrijheid')
('dd7903d0cea1b90fddc60ff15e3312261018b5edfcfc60dcf31e189a784dc76b', 'geo52.7346136774.9371617604000000000000Mienakker')
('d5070e592bc468140090b881e61993ba291cf3f133c3559a067e28b0c98223dc', 'geo45.70526570012.259708200000000000000')
('17cb54f68c451d19b3b7490316e9a099d290cd8ef80cb65509a0a8aa9d97e956', 'geo45.40936250011.916058399000000000000This is a test')
('71e4c6bb1858fcc65fe35a0fdd38c2a310e65edc24a3dc4c78d5ff6315eaf9ec', 'geo45.40947760011.915883800000000000000')
('569f2c49f12f26676e2995ae58631c10500f4c596d18a142f1927f74f123b6df', 'geo000000000000000000000000000000000000')
('3c05130fa815f0f8a4c0560189839327c784381ba5f104b257ce708463259ee8', 'geo45.40943009911.915891000000000000000He man')
('346de63a2a44ef0ac5793942c0c5a05764f16923ca46a4c20bf4566fd5bae3bd', 'geo45.40946890011.915854000000000000000ITEC Padova')
('4d0db19237393cbe98d972def8a47aff99942a498540594ff4abdead2105dac5', 'geo45.49301060012.330827699000000000000This is a test')
Database connection closed.
The full code
#!/usr/bin/python
import psycopg2
from config import config
import re
import geojson
import tempfile

def get_vendorfield():
    """ Connect to the PostgreSQL database server """

    global cur, data
    conn = None
    try:
        # read connection parameters
        params = config()

        # connect to the PostgreSQL server
        print('Connecting to the PostgreSQL database...')
        conn = psycopg2.connect(**params)

        # create a cursor
        cur = conn.cursor()

        cur.execute(
           "SELECT t.id, t.\"vendorField\" FROM transactions t WHERE t.\"vendorField\" LIKE 'geo%'")
        print("Rows returned: ", cur.rowcount)
        dashmap=(66, 3, 12, 12, 12, ) #this is used for the chunks - but the last chuck is missed because it is of unknown length
        row = cur.fetchone()

        while row is not None:
            #formatting each row - this feels over complicated

            data = row
            data = str(data).encode('latin1').decode('unicode_escape').replace("', '","")
            data = re.findall(''.join('(\S{{{}}})'.format(l) for l in dashmap), data)
            data = str(data)
            data = data.replace(r'"', '|').replace(r'|(', '').replace(r'|', '').replace(r'(', '').replace(r')', '').replace(r'[', '').replace(r']', '').replace(r"'", "").replace(" ","").split(",")

            with open('points.json', 'a') as outfile:
                geojson.dump(geodata, outfile, indent=2)
            outfile.close()

        cur.close()
    except (Exception, psycopg2.DatabaseError) as error:
        print(error)

        # execute a statement
        print('PostgreSQL database version:')
        cur.execute('SELECT version()')

        # display the PostgreSQL database server version
        db_version = cur.fetchone()
        print(db_version)

        # close the communication with the PostgreSQL
        cur.close()
    except (Exception, psycopg2.DatabaseError) as error:
        print(error)
    finally:
        if conn is not None:
            conn.close()
            print('Database connection closed.')
something like this (note that I'm not able to test it)

#!/usr/bin/python
import psycopg2
from config import config
import re
import geojson
import tempfile

def parse_row(row):
    # extract lat, lon
    # you may need to round the float number to
    # correct number of decimal digits

    lon = float(row[3:15])
    lat = float(row[15:27])
    return (lon, lat)

 
def get_vendorfield():
    """ Connect to the PostgreSQL database server """


    # read connection parameters
    params = config()
    # connect to the PostgreSQL server
    print('Connecting to the PostgreSQL database...')
    with psycopg2.connect(**params) as conn:
        # create a cursor
        with conn.cursor() as cur:
            try:
                cur.execute("SELECT t.id, t.\"vendorField\" FROM transactions t WHERE t.\"vendorField\" LIKE 'geo%'")
                print("Rows returned: ", cur.rowcount)
                #dashmap=(66, 3, 12, 12, 12, ) #this is used for the chunks - but the last chuck is missed because it is of unknown length
                geo_data = [parse_row(row[1]) for row in cur] # you may need to replace cur with cur.fetchall()
            except psycopg2.DatabaseError as db_error:
                print(error)
                geo_data = []

    print('Database connection closed.')
            
    if geo_data:
        multipoint_json = geojson.Multipoint(geo_data)
        with open('points.json', 'w') as outfile:
            geojson.dump(multipoint_json, outfile, indent=2)
My understanding was you want geojson.Multipoint file
there are number of things you need to consider, e.g. N/S E/W of the coordinates (usually it is represented with -) does it affect the length? also decimal precision - you may need to round the float number for lon/lat to correct precision)
I'm trying to adapted your code a little to include properties.
I have added the altitude data to the geojson

Also there is an altitude, row[1] alt []

#!/usr/bin/python
import psycopg2
from config import config
import re
import geojson
from geojson import Feature, Point
import tempfile

def parse_row(row):
    # extract lat, lon, alt
    # you may need to round the float number to
    # correct number of decimal digits

    lon = float(row[3:15])
    lat = float(row[15:27])
    alt = float(row[28:39])
    
    return (lon, lat, alt)


def get_vendorfield():
    """ Connect to the PostgreSQL database server """

    # read connection parameters
    params = config()
    # connect to the PostgreSQL server
    print('Connecting to the PostgreSQL database...')
    with psycopg2.connect(**params) as conn:
        # create a cursor
        with conn.cursor() as cur:
            try:
                cur.execute("SELECT t.id, t.\"vendorField\" FROM transactions t WHERE t.\"vendorField\" LIKE 'geo%'")
                print("Rows returned: ", cur.rowcount)
                geo_data = [parse_row(row[1]) for row in cur] # you may need to replace cur with cur.fetchall()
            except psycopg2.DatabaseError as db_error:
                print(re.error)
                geo_data = []

    print('Database connection closed.')
    print(geo_data)
    if geo_data:
        feature_json = geojson.Feature(geometry = geo_data)
        #Feature(geometry=my_point, properties={"country": "Spain"})  
        #{"geometry": {"coordinates": [-3.68..., 40.4...], "type": "Point"}, "id": null, "properties": {"country": "Spain"}, "type": "Feature"}

        with open('points.json', 'w') as outfile:
            geojson.dump(feature_json, outfile, indent=2)
I'm having problems returning the geojson as a feature class, I get a TypeError: list indices must be integers or slices, not list.

Once this is achieved I plan the following:
The row[0] is a 64 character primary key, this can be the "id" of the geojson
In row[1] there are the code [0:2] and the site_name, this is the rest of the line from character 41 (to max 64) <-- I guess this might be solved through a (len(row) - 39) to establish the length of the remaining characters?

I'm thinking I will need to return the data like so:
def parse_key(row):
    #extract key
    key = str(row[0:64])
    return (key)

def parse_data(row):
    #extract three letter code i.e. geo and the site name
    code = str(row[0:2])
    site_name = str(row[3:4])
    return (code, site_name)
and them amend geo_data = [parse_row(row[1]) for row in cur] to

geo_data = [parse_row(row[1]) for row in cur]
data = [parse_data(row[1]) for row in cur]
key = [parse_key(row[0]) for row in cur]
Secondly I will try to incorporate the properties values including the key to the geojson id.
Thank you for your guidance.
key is just row[0]
parse row[1] and return dict (for readability). Or you may return Feature object
For Feature class I need to do some reading, but I will do tomorrow as it is late here. What properties shall Feature has?
Do you plan on using Feature collection?
By the way, I think Point does not have elevation property. What about Multipoint? did you drop the idea?
#!/usr/bin/python
import psycopg2
from config import config
import geojson


def parse_row(row):
    # extract data from row, return geojson.Feature

    key = row[0]
    lon = float(row[1][3:15])
    lat = float(row[1][15:27])
    if len(row[1]) > 39:
        elevation = float(row[1][27:39])
        descr = row[1][39:]
    else:
        elevation = float(row[1][27:])
        descr = None
    return geojson.Feature(geometry=geojson.Point((lon, lat)), id=key, properties={'elevation':elevation, 'description':descr})

 
def db_query():
    """ Connect to the PostgreSQL database server """
    # read connection parameters
    params = config()
    # connect to the PostgreSQL server
    print('Connecting to the PostgreSQL database...')
    with psycopg2.connect(**params) as conn:
        # create a cursor
        with conn.cursor() as cur:
            try:
                cur.execute("SELECT t.id, t.\"vendorField\" FROM transactions t WHERE t.\"vendorField\" LIKE 'geo%'")
                print("Rows returned: ", cur.rowcount)
                #dashmap=(66, 3, 12, 12, 12, ) #this is used for the chunks - but the last chuck is missed because it is of unknown length
                geo_data = [parse_row(row[1]) for row in cur] # you may need to replace cur with cur.fetchall()
            except psycopg2.DatabaseError as db_error:
                print(error)
                geo_data = []
            finally:
                print('Database connection closed.')
    return geo_data
            
def write_to_file(geo_data):
    with open('points.json', 'w') as outfile:
        geojson.dump(geo_data, outfile, indent=2)

def test_data():
    """Just to return test data"""

    data = (('9fde26933b7196e3373ffe80e45e7f36e13bf3aba6290d65ce1a2f5360cb246b', 'geo12.08145519945.50652789900000.000000test'),
            ('e46117fb58a13de5d59a6a4a897d3cfd9ae9d4b616278b5d87e254a1c1ea5728', 'geo011.9158859025.40941360000000.000000This is a test'),
            ('eb0cd00281521f241de36a59487b8923c6b2e62894fd411ca3c511a25a958c6d', 'geo45.38584480011.89942250000000.000000Città di Padova'),
            ('bcbc94f00db0cfb5e01b88703ab858ed58082845a4d00309c5d7920418a6aab9', 'geo11.91584370045.409410600000000000000I love geo'),
            ('1008ab688b0fdab30a00e5805d8c774739383288a98787e4740fbee109ea0e8b', 'geo45.40944419911.915854900000000000000'),
            ('5e114a3fa9910b306f880ddedb3a703da9de96464d6c87cd52b5138b7cc78084', 'geo45.40320450011.922245600000000000000'),
            ('aa4bb5d192f58a7f72151d3f47c37917651667ab573b9339f0ab35188ae2b6a3', 'geo000000000000000000000000000000000000'),
            ('813617825c70f30c73c30c8cd20ecde37609b2c137d2b0d1a8ef06d1e33315ee', 'geo44.40320450010.922245600000000000000Hi all!!!'),
            ('b30e845d00993f2258900c4052471a6947c6b3f1a375438b6f4cd206397186a5', 'geo52.7642852014.9228121084000000000000De Vrijheid'),
            ('dd7903d0cea1b90fddc60ff15e3312261018b5edfcfc60dcf31e189a784dc76b', 'geo52.7346136774.9371617604000000000000Mienakker'),
            ('d5070e592bc468140090b881e61993ba291cf3f133c3559a067e28b0c98223dc', 'geo45.70526570012.259708200000000000000'),
            ('17cb54f68c451d19b3b7490316e9a099d290cd8ef80cb65509a0a8aa9d97e956', 'geo45.40936250011.916058399000000000000This is a test'),
            ('71e4c6bb1858fcc65fe35a0fdd38c2a310e65edc24a3dc4c78d5ff6315eaf9ec', 'geo45.40947760011.915883800000000000000'),
            ('569f2c49f12f26676e2995ae58631c10500f4c596d18a142f1927f74f123b6df', 'geo000000000000000000000000000000000000'),
            ('3c05130fa815f0f8a4c0560189839327c784381ba5f104b257ce708463259ee8', 'geo45.40943009911.915891000000000000000He man'),
            ('346de63a2a44ef0ac5793942c0c5a05764f16923ca46a4c20bf4566fd5bae3bd', 'geo45.40946890011.915854000000000000000ITEC Padova'),
            ('4d0db19237393cbe98d972def8a47aff99942a498540594ff4abdead2105dac5', 'geo45.49301060012.330827699000000000000This is a test'))
    return data

if __name__ == '__main__':
    my_data = db_query()
    my_data = test_data() #just for test, comment out this line in production
    if my_data:
        my_feature_collection = geojson.FeatureCollection([parse_row(row) for row in my_data])
        write_to_file(my_feature_collection)
output with test data
I would like MultiPoint but with properties, I couldn't see MultiPoint with properties, so choose feature class http://python-geojson.readthedocs.io/en/...multipoint

Altitude can be included https://bl.ocks.org/fogonwater/340afe26d...0f73eb10f2

So mainly, I want to return the data row[0] (the key as id) and add the 'geo' (the code) and the site_name as two properties, e.g code: 'geo'. I'll look into dictionaries.
look into my last post, I implemented Feature and FeaturesCollection. You need to adjust parse_row function to include the elevation to Point and then code to properties. as you can see I already added elevation to properties, which is one of the approaches in your link
if you have trouble, I'll look into it tomorrow.
here is the new parse_row


def parse_row(row):
    # extract lat, lon
    # you may need to round the float number to
    # correct number of decimal digits
    key = row[0]
    code=row[1][:3]
    lon = float(row[1][3:15])
    lat = float(row[1][15:27])
    elevation = float(row[1][27:39])
    descr = row[1][39:]
    return geojson.Feature(geometry=geojson.Point((lon, lat, elevation)), id=key,
                           properties={'code':code, 'elevation':elevation, 'description':descr})
new output with test data

I tested using the test_data()
As you know that works fine, but when I use db_query() instead it throws an error
ValueError: could not convert string to float

I went through the debugging (step through) and in the def parse_row(row)
key: 'g'
code: 'e'
at geo_data = [parse_row(row[1]) for row in cur]
row: <class 'tuple'>: ('9fde ... 246b', 'geo12.08 ... 45.50 ... 0.000000test')

So it looks like something is going wrong in the following, for some reason the key (row[0])is being ignored and the row[1] is behaving strangely, perhaps tuples behave differently? I've spent today looking into this, but I'm struggling to find the solution, strange that it works for the test_data() but not the database data. Should I add I'm using Python 3?

def parse_row(row):
    # extract lat, lon
    # you may need to round the float number to
    # correct number of decimal digits
    key = row[0]
    code=row[1][:3]
    lat = float(row[1][3:15])
    lon = float(row[1][15:27])
    elevation = float(row[1][27:39])
    descr = row[1][39:]
    return geojson.Feature(geometry=geojson.Point((lon, lat, elevation)), id=key,
                           properties={'code':code, 'elevation':elevation, 'description':descr})
Pages: 1 2