Python Forum
parsing text for common factor
#1
I'm trying to alphabetize this list of addresses by city. I might just be air-headed today, but I'm not sure how to identify the city. There isn't really a consistent point in these examples that you could use to pick the city out of the text. Sometimes they are proper address formats and sometimes they are not. You can't use the commas to your advantage, nor the state abbreviations, as sometimes there are none, so you can't go from right to left to find the city. The street part of the address can contain a varying number of words, so you can't go from left to right either.


--------------------------------------------------
Sat April 29 8AM-4PM Blossburg PA Community Yard Sales
--------------------------------------------------
34 WEST WATER STREET WELLSBORO, PA.
FRIDAY AND SATURDAY APRIL 28-29 8:00 A.M.P- 3:00 P.M. 
--------------------------------------------------
728 W Broad St Horseheads
Saturday April 29, 2017 Time: 8:00AM
--------------------------------------------------
125 St Andrews Dr Horseheads, NY 14845
Saturday April 29, 2017 Time: 8:00AM
--------------------------------------------------
228 Leisure Ln Horseheads NY 14845
+Thursday April 27, 2017 - Friday April 28, 2017 Time: 8:30AM
--------------------------------------------------
26 Valley Ave Horseheads NY
+Thursday April 27, 2017 - Saturday April 29, 2017 Time: 8:00AM - 2
--------------------------------------------------
Orchard Drive Big Flats, NY 14814
Saturday April 29, 2017 Time: 8:00AM - 2:00PM
--------------------------------------------------
2083 College Ave, Elmira Heights NY (in the Gym) (high School Thomas A Edison)
Sat 8-2 (bag sale at 1) 
--------------------------------------------------
NY-427 & Palomino Manor, Elmira, NY
Saturday, April 29, 2017 ⏲8:30am
--------------------------------------------------
709 Grove Street, Elmira.
sat 4/29, 8am, 
--------------------------------------------------
602 Winsor Ave, Elmira, NY
Friday, April 28, 2017 - Saturday, April 29, 2017 ⏲9:00am
--------------------------------------------------
229 W 11th St Elmira, NY 14903
Saturday April 29, 2017 Time: 9:00AM
--------------------------------------------------
907 Grove St Elmira, NY 14901
Saturday April 29, 2017 Time: 8:00AM
--------------------------------------------------
458 West Water Elmira NY (Out back by garage)
Thurs, Fri, Sat April 27th-28th and 29th. 9 - 5 
--------------------------------------------------
1115 Abbott St Elmira, NY 14901
+Thursday April 27, 2017 - Saturday April 29, 2017 Time: 10:30AM
--------------------------------------------------
720 Kinyon St Elmira, NY 14904 (ESTATE)
+Friday April 28, 2017 - Sunday April 30, 2017
--------------------------------------------------
32 N Chemung St Waverly, NY 14892
+Thursday April 27, 2017 - Saturday April 29, 2017 Time: 10:00AM
--------------------------------------------------
313 Lincoln St Sayre, PA 18840
Saturday April 29, 2017 - Sunday April 30, 2017 Time: 10:00AM
--------------------------------------------------
164 Dryden Harford Rd Dryden, NY 13053
+Saturday April 29, 2017 Time: 11:00AM
--------------------------------------------------
1893 E Shore Dr Lansing, NY 14882
+Friday April 28, 2017 Time: 8:00AM
--------------------------------------------------
7500 Mitchellsville Hill Rd Bath, NY 14810
+Friday April 28, 2017 - Sunday April 30, 2017 Time: 9:00AM
--------------------------------------------------
157 Enfield Falls Rd Ithaca, NY 14850
+Saturday April 29, 2017 Time: 9:00AM
--------------------------------------------------
1245 NY-14 Millport, NY 14864
+Thursday April 27, 2017 - Friday April 28, 2017 Time: 9:00AM
--------------------------------------------------
65 Beckwith Rd Pine City, NY 14871
+Thursday April 27, 2017 - Saturday April 29, 2017 Time: 9:00AM - 2:00PM
--------------------------------------------------
3433 Co Rd 2 Addison, NY 14801
+Friday April 28, 2017 - Sunday April 30, 2017
--------------------------------------------------
3221 Co Rd 3 Addison, NY 14801
Monday April 24, 2017 - Thursday April 27, 2017 Time: 9:00AM - 6:00PM
--------------------------------------------------
210 Church St Breesport, NY 14816
+Friday April 28, 2017 - Saturday April 29, 2017 Time: 9:00AM
--------------------------------------------------
16 Goodrich Way Dryden, NY 13053
Friday April 28, 2017 - Saturday April 29, 2017 Time: 9:00AM
--------------------------------------------------
528 Bath St Watkins Glen, NY 14891
Saturday April 29, 2017 - Sunday April 30, 2017 Time: 8:00AM - 2:00PM
--------------------------------------------------
Grace Blvd & Keefe Blvd, Painted Post, NY
Friday, April 28, 2017 - Saturday, April 29, 2017 ⏲9:00am
--------------------------------------------------
4404 Meads Creek Rd Painted Post, NY 14870
Friday April 28, 2017 - Saturday April 29, 2017
--------------------------------------------------
918 Ithaca Rd Spencer, NY 14883
Saturday April 29, 2017 - Sunday April 30, 2017 Time: 8:00AM - 6:00PM
--------------------------------------------------
1238 Pennsylvania Ave Pine City, NY 14871
Thursday April 27, 2017 - Saturday April 29, 2017
--------------------------------------------------
Penna Ave & Country Ln, Pine City, NY
Saturday, April 29, 2017 10:00am
--------------------------------------------------
222 Washington St Corning, NY 14830
Saturday April 29, 2017 - Sunday April 30, 2017 Time: 10:00AM
--------------------------------------------------
11817 Theresa Dr, Corning, NY
Thursday, May 4, 2017 - Saturday, May 6, 2017 8:00am
--------------------------------------------------
115 Grace Blvd Painted Post, NY 14870
Friday April 28, 2017 - Saturday April 29, 2017 Time: 9:00AM
--------------------------------------------------
192 Park Station Rd Erin, NY 14838
+Friday April 28, 2017 - Saturday April 29, 2017
--------------------------------------------------
151 Enfield Falls Rd Ithaca, NY 14850
Saturday April 29, 2017 Time: 9:00AM
--------------------------------------------------
9911 Church Creek Rd, Lindley, NY
Thursday, May 4, 2017 - Saturday, May 6, 2017 7:00am to 4:00pm
--------------------------------------------------
1810 River Rd, Lindley
Friday, May 5, 2017 - Saturday, May 6, 2017 ⏲7:00am to 3:00pm
--------------------------------------------------
21 mill st in candor NY
Fri and sat 9am-?
--------------------------------------------------
210 W Honeoye St. Shinglehouse PA
fri, sat 9-4
--------------------------------------------------
28395 rt 220 in Milan pa
All week long 9-? April 22 to May 1st.
--------------------------------------------------
#2
Is there a reason the data format is so inconsistent? Can we possibly fix this issue at the source by making people input things in a proper manner or are you stuck with it?
Also, just spitballing here, but what about using something smarter than us? I'm fairly sure the Google Maps API will make short work of any of those, and then you can grab the city from that. I don't know the specifics of doing that, but it certainly sounds doable.
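A minimal sketch of that idea, assuming Google's Geocoding web service (`https://maps.googleapis.com/maps/api/geocode/json`, which needs your own API key) and assuming the city comes back as the address component whose `types` include `locality`. Small hamlets may not get a `locality` at all, so `None` has to be handled. The JSON parsing is kept separate from the HTTP call so it can be tried on a saved response:

```python
import json
import urllib.parse
import urllib.request

# Google's Geocoding web service endpoint; an API key is required.
GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"


def build_geocode_url(address, api_key):
    """Build the request URL for one free-form address string."""
    query = urllib.parse.urlencode({"address": address, "key": api_key})
    return f"{GEOCODE_URL}?{query}"


def city_from_geocode(response):
    """Return the 'locality' long_name from the first result, or None.

    `response` is the decoded JSON body of a Geocoding reply.
    """
    results = response.get("results") or []
    if response.get("status") != "OK" or not results:
        return None
    for component in results[0].get("address_components", []):
        if "locality" in component.get("types", []):
            return component.get("long_name")
    return None


def geocode_city(address, api_key):
    """Call the service over the network and extract the city."""
    with urllib.request.urlopen(build_geocode_url(address, api_key)) as resp:
        return city_from_geocode(json.load(resp))
```

Sending the whole messy first line (dates and all) may still geocode, but results are likely better if obvious date text is stripped first.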
#3
(Apr-28-2017, 01:06 AM) Mekire wrote: Can we possibly fix this issue at the source by making people input things in a proper manner or are you stuck with it?
Stuck with it. I grab posts from Facebook, and whatever they input is there, so it could literally be anything.

Quote: Fairly sure the Google Maps API will make short work of any of those and then you can grab the city from that.
Didn't think of that. I'll have to check. Thanks.
#4
You can try identifying which parts of the text are what; for example, you can probably recognize date formats with relatively high confidence. Then you can look at the rest of the text, for which you don't have a guess yet, in a divide-and-conquer fashion. States have known abbreviations, and cities probably come right before them, potentially with a comma. I don't know how you want to handle highly ambiguous cases: preferring false positives, preferring false negatives, or flagging them for human intervention. I imagine this would be a lot of write-something, try-it, then improve-it.
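A rough sketch of that divide-and-conquer idea, assuming only NY and PA appear and using a small, hypothetical list of street suffixes as the street/city boundary. It returns a confidence flag so ambiguous lines can be routed to a human instead of guessed:

```python
import re

# Assumed: only NY and PA occur in this data. Matches "NY", "ny", "PA." etc.
STATE_RE = re.compile(r"\b(?:NY|PA)\b", re.IGNORECASE)
# "in <city>", as in "21 mill st in candor NY".
IN_RE = re.compile(r"\bin\s+(.+)$", re.IGNORECASE)
# Hypothetical, deliberately short list of street suffixes.
STREET_SUFFIXES = {"st", "street", "ave", "rd", "dr", "drive", "ln", "blvd", "way"}


def guess_city(address):
    """Return (city, confident) for one address line.

    confident=False means a human should check the line; the returned
    text is then everything before the state (or None if no state).
    """
    matches = list(STATE_RE.finditer(address))
    if not matches:
        return None, False
    # The city usually sits right before the last state abbreviation.
    before = address[:matches[-1].start()].rstrip(" ,")
    if "," in before:
        return before.rsplit(",", 1)[1].strip(), True
    in_match = IN_RE.search(before)
    if in_match:
        return in_match.group(1).strip(), True
    words = before.split()
    suffixes = [i for i, w in enumerate(words)
                if w.rstrip(".").casefold() in STREET_SUFFIXES]
    if suffixes:
        rest = words[suffixes[-1] + 1:]
        while rest and rest[0].isdigit():
            rest = rest[1:]  # "Co Rd 2 Addison" -> "Addison"
        if rest:
            return " ".join(rest), True
    return before, False
```

Sorting could then use something like `sorted(address_lines, key=lambda a: (guess_city(a)[0] or "").casefold())`, where `address_lines` is a hypothetical list of first lines. Lines with no state, such as `728 W Broad St Horseheads`, come back as `(None, False)` and sort first.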
#5
Download a ZIP code file from census.gov.
See if you can find a ZIP code in the text, and separate out all entries that don't have one.
For those that do, determine where the city begins; everything up to that point can be
treated as the street address.

When an entry doesn't have a ZIP code, it looks as though the surrounding entries are from
the same city. See if a match can be made on the city; if so, fill in the missing information from
the previous or next record.
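A sketch of that ZIP-first approach. The post doesn't say which census.gov file to use or how it is laid out, so `zip_to_city` here is a hypothetical dict you would build from it. Membership in that dict is checked because five-digit house numbers such as `28395` or `11817` would otherwise look like ZIP codes:

```python
import re

# Five digits, optionally followed by a ZIP+4 suffix.
ZIP_RE = re.compile(r"\b(\d{5})(?:-\d{4})?\b")


def lookup_zip(address, zip_to_city):
    """Return (zip, city) for the first five-digit number that is a known ZIP."""
    for match in ZIP_RE.finditer(address):
        code = match.group(1)
        if code in zip_to_city:
            return code, zip_to_city[code]
    return None, None


def mentions(address, city):
    """True if `city` appears in `address` as whole words, ignoring case."""
    pattern = r"\b" + re.escape(city) + r"\b"
    return re.search(pattern, address, re.IGNORECASE) is not None


def fill_from_neighbors(addresses, zip_to_city):
    """Resolve (zip, city) per address.

    Records without a ZIP borrow from the previous or next record, but
    only if they mention that neighbor's city by name.
    """
    direct = [lookup_zip(a, zip_to_city) for a in addresses]
    resolved = list(direct)
    for i, address in enumerate(addresses):
        if resolved[i][0] is not None:
            continue
        for j in (i - 1, i + 1):  # previous record first, then next
            if 0 <= j < len(addresses) and direct[j][0] is not None:
                if mentions(address, direct[j][1]):
                    resolved[i] = direct[j]
                    break
    return resolved
```

Anything still `(None, None)` afterwards (e.g. `21 mill st in candor NY` with no ZIP-bearing neighbor) is a candidate for the state-abbreviation heuristic or for manual review.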