Python Forum
formatting string and returning as geojson
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
formatting string and returning as geojson
#1
I have records like this (I cannot alter the origional dataset). The Primary key is always 64 characters, followed by a 3 letter code, three sets of WGS84 coordinates (lat,long,altitutde), then a site name, in this case De Vrijheid.

('b30e845d00993f2258900c4052471a6947c6b3f1a375438b6f4cd206397186a5', 'geo52.7642852014.9228121084000000000000De Vrijheid')

First I try to format this:

        
        cur.execute(
           "SELECT t.id, t.\"vendorField\" FROM transactions t WHERE t.\"vendorField\" LIKE 'geo%'")
           
        print("Rows returned: ", cur.rowcount)
        dashmap=(66, 3, 12, 12, 12, )
        row = cur.fetchone()

        while row is not None:
            #formatting each row
            data = row
            data = str(data).encode('latin1').decode('unicode_escape').replace("', '","")
            data = re.findall(''.join('(\S{{{}}})'.format(l) for l in dashmap), data)
            data = str(data)
            data = data.replace(r'"', '|').replace(r'|(', '').replace(r'|', '').replace(r'(', '').replace(r')', '').replace(r'[', '').replace(r']', '').replace(r"'", "").replace(" ","").split(",")
This returns
['b30e845d00993f2258900c4052471a6947c6b3f1a375438b6f4cd206397186a5', 'geo', '52.764285201', '4.9228121084', '000000000000']

I get geojson this way.
geodata = geojson.Point((float(data[3]), float(data[2])))
And write to file

with open('points.json', 'a') as outfile:
                geojson.dump(geodata, outfile, indent=2)
            outfile.close()
And finish of
        cur.close()
    except (Exception, psycopg2.DatabaseError) as error:
        print(error)

        # execute a statement
        print('PostgreSQL database version:')
        cur.execute('SELECT version()')

        # display the PostgreSQL database server version
        db_version = cur.fetchone()
        print(db_version)

        # close the communication with the PostgreSQL
        cur.close()
    except (Exception, psycopg2.DatabaseError) as error:
        print(error)
    finally:
        if conn is not None:
            conn.close()
            print('Database connection closed.')
This gives a geojson file formatted like this, which is not geojson, it seems like it is writing one line at a time as an induvidual point not a multipoint file. I'm a noob to python, so any amendments/alternative solutions are welcome.

{
  "type": "Point",
  "coordinates": [
    45.506527899,
    12.081455199
  ]
}{
  "type": "Point",
  "coordinates": [
    25.4094136,
    11.9158859
  ]
}
Reply
#2
could you post your full code (at least part after executing the sql) as well as sample data (few points) as returned by the database.
Reply
#3
The data straight from the db
read in to data
type(data) tuple
Connecting to the PostgreSQL database...
Rows returned:  17
('9fde26933b7196e3373ffe80e45e7f36e13bf3aba6290d65ce1a2f5360cb246b', 'geo12.08145519945.50652789900000.000000test')
('e46117fb58a13de5d59a6a4a897d3cfd9ae9d4b616278b5d87e254a1c1ea5728', 'geo011.9158859025.40941360000000.000000This is a test')
('eb0cd00281521f241de36a59487b8923c6b2e62894fd411ca3c511a25a958c6d', 'geo45.38584480011.89942250000000.000000Città di Padova')
('bcbc94f00db0cfb5e01b88703ab858ed58082845a4d00309c5d7920418a6aab9', 'geo11.91584370045.409410600000000000000I love geo')
('1008ab688b0fdab30a00e5805d8c774739383288a98787e4740fbee109ea0e8b', 'geo45.40944419911.915854900000000000000')
('5e114a3fa9910b306f880ddedb3a703da9de96464d6c87cd52b5138b7cc78084', 'geo45.40320450011.922245600000000000000')
('aa4bb5d192f58a7f72151d3f47c37917651667ab573b9339f0ab35188ae2b6a3', 'geo000000000000000000000000000000000000')
('813617825c70f30c73c30c8cd20ecde37609b2c137d2b0d1a8ef06d1e33315ee', 'geo44.40320450010.922245600000000000000Hi all!!!')
('b30e845d00993f2258900c4052471a6947c6b3f1a375438b6f4cd206397186a5', 'geo52.7642852014.9228121084000000000000De Vrijheid')
('dd7903d0cea1b90fddc60ff15e3312261018b5edfcfc60dcf31e189a784dc76b', 'geo52.7346136774.9371617604000000000000Mienakker')
('d5070e592bc468140090b881e61993ba291cf3f133c3559a067e28b0c98223dc', 'geo45.70526570012.259708200000000000000')
('17cb54f68c451d19b3b7490316e9a099d290cd8ef80cb65509a0a8aa9d97e956', 'geo45.40936250011.916058399000000000000This is a test')
('71e4c6bb1858fcc65fe35a0fdd38c2a310e65edc24a3dc4c78d5ff6315eaf9ec', 'geo45.40947760011.915883800000000000000')
('569f2c49f12f26676e2995ae58631c10500f4c596d18a142f1927f74f123b6df', 'geo000000000000000000000000000000000000')
('3c05130fa815f0f8a4c0560189839327c784381ba5f104b257ce708463259ee8', 'geo45.40943009911.915891000000000000000He man')
('346de63a2a44ef0ac5793942c0c5a05764f16923ca46a4c20bf4566fd5bae3bd', 'geo45.40946890011.915854000000000000000ITEC Padova')
('4d0db19237393cbe98d972def8a47aff99942a498540594ff4abdead2105dac5', 'geo45.49301060012.330827699000000000000This is a test')
Database connection closed.
The full code
#!/usr/bin/python
import psycopg2
from config import config
import re
import geojson
import tempfile

def get_vendorfield():
    """ Connect to the PostgreSQL database server """

    global cur, data
    conn = None
    try:
        # read connection parameters
        params = config()

        # connect to the PostgreSQL server
        print('Connecting to the PostgreSQL database...')
        conn = psycopg2.connect(**params)

        # create a cursor
        cur = conn.cursor()

        cur.execute(
           "SELECT t.id, t.\"vendorField\" FROM transactions t WHERE t.\"vendorField\" LIKE 'geo%'")
        print("Rows returned: ", cur.rowcount)
        dashmap=(66, 3, 12, 12, 12, ) #this is used for the chunks - but the last chuck is missed because it is of unknown length
        row = cur.fetchone()

        while row is not None:
            #formatting each row - this feels over complicated

            data = row
            data = str(data).encode('latin1').decode('unicode_escape').replace("', '","")
            data = re.findall(''.join('(\S{{{}}})'.format(l) for l in dashmap), data)
            data = str(data)
            data = data.replace(r'"', '|').replace(r'|(', '').replace(r'|', '').replace(r'(', '').replace(r')', '').replace(r'[', '').replace(r']', '').replace(r"'", "").replace(" ","").split(",")

            with open('points.json', 'a') as outfile:
                geojson.dump(geodata, outfile, indent=2)
            outfile.close()

        cur.close()
    except (Exception, psycopg2.DatabaseError) as error:
        print(error)

        # execute a statement
        print('PostgreSQL database version:')
        cur.execute('SELECT version()')

        # display the PostgreSQL database server version
        db_version = cur.fetchone()
        print(db_version)

        # close the communication with the PostgreSQL
        cur.close()
    except (Exception, psycopg2.DatabaseError) as error:
        print(error)
    finally:
        if conn is not None:
            conn.close()
            print('Database connection closed.')
Reply
#4
something like this (note that I'm not able to test it)

#!/usr/bin/python
import psycopg2
from config import config
import re
import geojson
import tempfile

def parse_row(row):
    # extract lat, lon
    # you may need to round the float number to
    # correct number of decimal digits

    lon = float(row[3:15])
    lat = float(row[15:27])
    return (lon, lat)

 
def get_vendorfield():
    """ Connect to the PostgreSQL database server """


    # read connection parameters
    params = config()
    # connect to the PostgreSQL server
    print('Connecting to the PostgreSQL database...')
    with psycopg2.connect(**params) as conn:
        # create a cursor
        with conn.cursor() as cur:
            try:
                cur.execute("SELECT t.id, t.\"vendorField\" FROM transactions t WHERE t.\"vendorField\" LIKE 'geo%'")
                print("Rows returned: ", cur.rowcount)
                #dashmap=(66, 3, 12, 12, 12, ) #this is used for the chunks - but the last chuck is missed because it is of unknown length
                geo_data = [parse_row(row[1]) for row in cur] # you may need to replace cur with cur.fetchall()
            except psycopg2.DatabaseError as db_error:
                print(error)
                geo_data = []

    print('Database connection closed.')
            
    if geo_data:
        multipoint_json = geojson.Multipoint(geo_data)
        with open('points.json', 'w') as outfile:
            geojson.dump(multipoint_json, outfile, indent=2)
My understanding was you want geojson.Multipoint file
there are number of things you need to consider, e.g. N/S E/W of the coordinates (usually it is represented with -) does it affect the length? also decimal precision - you may need to round the float number for lon/lat to correct precision)
Reply
#5
I'm trying to adapted your code a little to include properties.
I have added the altitude data to the geojson

Also there is an altitude, row[1] alt []

#!/usr/bin/python
import psycopg2
from config import config
import re
import geojson
from geojson import Feature, Point
import tempfile

def parse_row(row):
    # extract lat, lon, alt
    # you may need to round the float number to
    # correct number of decimal digits

    lon = float(row[3:15])
    lat = float(row[15:27])
    alt = float(row[28:39])
    
    return (lon, lat, alt)


def get_vendorfield():
    """ Connect to the PostgreSQL database server """

    # read connection parameters
    params = config()
    # connect to the PostgreSQL server
    print('Connecting to the PostgreSQL database...')
    with psycopg2.connect(**params) as conn:
        # create a cursor
        with conn.cursor() as cur:
            try:
                cur.execute("SELECT t.id, t.\"vendorField\" FROM transactions t WHERE t.\"vendorField\" LIKE 'geo%'")
                print("Rows returned: ", cur.rowcount)
                geo_data = [parse_row(row[1]) for row in cur] # you may need to replace cur with cur.fetchall()
            except psycopg2.DatabaseError as db_error:
                print(re.error)
                geo_data = []

    print('Database connection closed.')
    print(geo_data)
    if geo_data:
        feature_json = geojson.Feature(geometry = geo_data)
        #Feature(geometry=my_point, properties={"country": "Spain"})  
        #{"geometry": {"coordinates": [-3.68..., 40.4...], "type": "Point"}, "id": null, "properties": {"country": "Spain"}, "type": "Feature"}

        with open('points.json', 'w') as outfile:
            geojson.dump(feature_json, outfile, indent=2)
I'm having problems returning the geojson as a feature class, I get a TypeError: list indices must be integers or slices, not list.

Once this is achieved I plan the following:
The row[0] is a 64 character primary key, this can be the "id" of the geojson
In row[1] there are the code [0:2] and the site_name, this is the rest of the line from character 41 (to max 64) <-- I guess this might be solved through a (len(row) - 39) to establish the length of the remaining characters?

I'm thinking I will need to return the data like so:
def parse_key(row):
    #extract key
    key = str(row[0:64])
    return (key)

def parse_data(row):
    #extract three letter code i.e. geo and the site name
    code = str(row[0:2])
    site_name = str(row[3:4])
    return (code, site_name)
and them amend geo_data = [parse_row(row[1]) for row in cur] to

geo_data = [parse_row(row[1]) for row in cur]
data = [parse_data(row[1]) for row in cur]
key = [parse_key(row[0]) for row in cur]
Secondly I will try to incorporate the properties values including the key to the geojson id.
Thank you for your guidance.
Reply
#6
key is just row[0]
parse row[1] and return dict (for readability). Or you may return Feature object
For Feature class I need to do some reading, but I will do tomorrow as it is late here. What properties shall Feature has?
Do you plan on using Feature collection?
By the way, I think Point does not have elevation property. What about Multipoint? did you drop the idea?
Reply
#7
#!/usr/bin/python
import psycopg2
from config import config
import geojson


def parse_row(row):
    # extract data from row, return geojson.Feature

    key = row[0]
    lon = float(row[1][3:15])
    lat = float(row[1][15:27])
    if len(row[1]) > 39:
        elevation = float(row[1][27:39])
        descr = row[1][39:]
    else:
        elevation = float(row[1][27:])
        descr = None
    return geojson.Feature(geometry=geojson.Point((lon, lat)), id=key, properties={'elevation':elevation, 'description':descr})

 
def db_query():
    """ Connect to the PostgreSQL database server """
    # read connection parameters
    params = config()
    # connect to the PostgreSQL server
    print('Connecting to the PostgreSQL database...')
    with psycopg2.connect(**params) as conn:
        # create a cursor
        with conn.cursor() as cur:
            try:
                cur.execute("SELECT t.id, t.\"vendorField\" FROM transactions t WHERE t.\"vendorField\" LIKE 'geo%'")
                print("Rows returned: ", cur.rowcount)
                #dashmap=(66, 3, 12, 12, 12, ) #this is used for the chunks - but the last chuck is missed because it is of unknown length
                geo_data = [parse_row(row[1]) for row in cur] # you may need to replace cur with cur.fetchall()
            except psycopg2.DatabaseError as db_error:
                print(error)
                geo_data = []
            finally:
                print('Database connection closed.')
    return geo_data
            
def write_to_file(geo_data):
    with open('points.json', 'w') as outfile:
        geojson.dump(geo_data, outfile, indent=2)

def test_data():
    """Just to return test data"""

    data = (('9fde26933b7196e3373ffe80e45e7f36e13bf3aba6290d65ce1a2f5360cb246b', 'geo12.08145519945.50652789900000.000000test'),
            ('e46117fb58a13de5d59a6a4a897d3cfd9ae9d4b616278b5d87e254a1c1ea5728', 'geo011.9158859025.40941360000000.000000This is a test'),
            ('eb0cd00281521f241de36a59487b8923c6b2e62894fd411ca3c511a25a958c6d', 'geo45.38584480011.89942250000000.000000Città di Padova'),
            ('bcbc94f00db0cfb5e01b88703ab858ed58082845a4d00309c5d7920418a6aab9', 'geo11.91584370045.409410600000000000000I love geo'),
            ('1008ab688b0fdab30a00e5805d8c774739383288a98787e4740fbee109ea0e8b', 'geo45.40944419911.915854900000000000000'),
            ('5e114a3fa9910b306f880ddedb3a703da9de96464d6c87cd52b5138b7cc78084', 'geo45.40320450011.922245600000000000000'),
            ('aa4bb5d192f58a7f72151d3f47c37917651667ab573b9339f0ab35188ae2b6a3', 'geo000000000000000000000000000000000000'),
            ('813617825c70f30c73c30c8cd20ecde37609b2c137d2b0d1a8ef06d1e33315ee', 'geo44.40320450010.922245600000000000000Hi all!!!'),
            ('b30e845d00993f2258900c4052471a6947c6b3f1a375438b6f4cd206397186a5', 'geo52.7642852014.9228121084000000000000De Vrijheid'),
            ('dd7903d0cea1b90fddc60ff15e3312261018b5edfcfc60dcf31e189a784dc76b', 'geo52.7346136774.9371617604000000000000Mienakker'),
            ('d5070e592bc468140090b881e61993ba291cf3f133c3559a067e28b0c98223dc', 'geo45.70526570012.259708200000000000000'),
            ('17cb54f68c451d19b3b7490316e9a099d290cd8ef80cb65509a0a8aa9d97e956', 'geo45.40936250011.916058399000000000000This is a test'),
            ('71e4c6bb1858fcc65fe35a0fdd38c2a310e65edc24a3dc4c78d5ff6315eaf9ec', 'geo45.40947760011.915883800000000000000'),
            ('569f2c49f12f26676e2995ae58631c10500f4c596d18a142f1927f74f123b6df', 'geo000000000000000000000000000000000000'),
            ('3c05130fa815f0f8a4c0560189839327c784381ba5f104b257ce708463259ee8', 'geo45.40943009911.915891000000000000000He man'),
            ('346de63a2a44ef0ac5793942c0c5a05764f16923ca46a4c20bf4566fd5bae3bd', 'geo45.40946890011.915854000000000000000ITEC Padova'),
            ('4d0db19237393cbe98d972def8a47aff99942a498540594ff4abdead2105dac5', 'geo45.49301060012.330827699000000000000This is a test'))
    return data

if __name__ == '__main__':
    my_data = db_query()
    my_data = test_data() #just for test, comment out this line in production
    if my_data:
        my_feature_collection = geojson.FeatureCollection([parse_row(row) for row in my_data])
        write_to_file(my_feature_collection)
output with test data
Reply
#8
I would like MultiPoint but with properties, I couldn't see MultiPoint with properties, so choose feature class http://python-geojson.readthedocs.io/en/...multipoint

Altitude can be included https://bl.ocks.org/fogonwater/340afe26d...0f73eb10f2

So mainly, I want to return the data row[0] (the key as id) and add the 'geo' (the code) and the site_name as two properties, e.g code: 'geo'. I'll look into dictionaries.
Reply
#9
look into my last post, I implemented Feature and FeaturesCollection. You need to adjust parse_row function to include the elevation to Point and then code to properties. as you can see I already added elevation to properties, which is one of the approaches in your link
if you have trouble, I'll look into it tomorrow.
here is the new parse_row


def parse_row(row):
    # extract lat, lon
    # you may need to round the float number to
    # correct number of decimal digits
    key = row[0]
    code=row[1][:3]
    lon = float(row[1][3:15])
    lat = float(row[1][15:27])
    elevation = float(row[1][27:39])
    descr = row[1][39:]
    return geojson.Feature(geometry=geojson.Point((lon, lat, elevation)), id=key,
                           properties={'code':code, 'elevation':elevation, 'description':descr})
new output with test data

Reply
#10
I tested using the test_data()
As you know that works fine, but when I use db_query() instead it throws an error
ValueError: could not convert string to float

I went through the debugging (step through) and in the def parse_row(row)
key: 'g'
code: 'e'
at geo_data = [parse_row(row[1]) for row in cur]
row: <class 'tuple'>: ('9fde ... 246b', 'geo12.08 ... 45.50 ... 0.000000test')

So it looks like something is going wrong in the following, for some reason the key (row[0])is being ignored and the row[1] is behaving strangely, perhaps tuples behave differently? I've spent today looking into this, but I'm struggling to find the solution, strange that it works for the test_data() but not the database data. Should I add I'm using Python 3?

def parse_row(row):
    # extract lat, lon
    # you may need to round the float number to
    # correct number of decimal digits
    key = row[0]
    code=row[1][:3]
    lat = float(row[1][3:15])
    lon = float(row[1][15:27])
    elevation = float(row[1][27:39])
    descr = row[1][39:]
    return geojson.Feature(geometry=geojson.Point((lon, lat, elevation)), id=key,
                           properties={'code':code, 'elevation':elevation, 'description':descr})
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Formatting a date time string read from a csv file DosAtPython 5 1,296 Jun-19-2023, 02:12 PM
Last Post: DosAtPython
  String formatting (strptime) issues Henrio 2 843 Jan-06-2023, 06:57 PM
Last Post: deanhystad
  geojson to json --missing multiple row output yoshi 9 2,790 Mar-06-2022, 08:34 PM
Last Post: snippsat
  confused about string formatting barryjo 7 2,000 Mar-06-2022, 02:03 AM
Last Post: snippsat
  string formatting barryjo 7 2,073 Jan-02-2022, 02:08 AM
Last Post: snippsat
  Help with string formatting in classes brthurr 6 9,358 Dec-17-2021, 04:35 PM
Last Post: Jeff900
  Question on HTML formatting with set string in message Cknutson575 3 3,502 Mar-09-2021, 08:11 AM
Last Post: Cknutson575
  smtplib: string formatting not carrying over to email ClassicalSoul 1 2,661 Apr-22-2020, 09:58 PM
Last Post: bowlofred
  String formatting difficulties mmk1995 3 2,787 Aug-09-2019, 11:18 AM
Last Post: wavic
  string formatting Uchikago 1 1,933 Jun-28-2019, 03:28 PM
Last Post: buran

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020