Python Forum
How to parse the data in python
#1
Hi,
I have very unstructured data in one column. Here is some sample data:
Output:
PURCHASE AUTHORIZED ON 09/28 UA UNION SQUARE 14 NEW YORK NY S388272071655085 CARD 0057
PURCHASE AUTHORIZED ON 09/28 7-ELEVEN 18594 ARVADA CO S588272206422481 CARD 7621
PURCHASE AUTHORIZED ON 09/30 JCPENNEY 1330 CORPUS CHRIST TX S468273721740671 CARD 8143
From each string I need only a specific name.

Ex: for the first string I need Union Square,
for the 2nd string I need 7-Eleven,
for the 3rd string I need JCPenney.

In SQL I could use a CASE statement on that column
with LIKE operators, e.g. '%Union Square%', '%7-Eleven%', '%JCPenney%'.

So the output should be:
Union Square
7-Eleven
JCPenney
...

I need only the merchant names and want to remove the unwanted data.
I have 800 rows like that, with different merchant names,

e.g. as shown below:
Output:
RECURRING PAYMENT AUTHORIZED ON 09/30 Netflix.com netflix.com CA S308273592350001 CARD 9296
RECURRING PAYMENT AUTHORIZED ON 09/30 GOOGLE *YouTube Pr 855-836-3987 CA S588274040027644 CARD 8785
RECURRING PAYMENT AUTHORIZED ON 09/30 GEICO *AUTO 800-841-3000 DC S388273506944188 CARD 7336
RECURRING PAYMENT AUTHORIZED ON 09/30 FARMERS INS BILLIN 877-327-6392 CA S308273417047853 CARD 1421
I want to implement this in Python. Can anyone please help with the code? Thanks in advance.
Reply
#2
Looks like a job for a regular expression (standard library module: re). However, it will be a tricky one with that data. For what you've provided, this will work:

import re

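# Raw string (r"...") so that \d reaches the regex engine without an invalid-escape warning
regex = re.compile(r"(?:PURCHASE|RECURRING PAYMENT) AUTHORIZED ON \d{2}/\d{2} (.+?) \d+?.+")
regex.match(data).groups()  # data is a single row (one string)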
Reply
#3
Thanks for answering my question, and sorry for providing too little data. I gave only a few examples. Not all of my data
starts with PURCHASE or RECURRING PAYMENT; some rows start with other names, such as:
mcdonalds payment
ross payment
venmo ach pay
That makes it tough for me to parse, but I will try.
Reply
#4
I ran your code and I am getting an error. Can you please explain what that syntax is doing? Thanks.
Reply
#5
What is the error? The regular expression is just being compared against the string. If it doesn't find a match in a row, it would return None for the match; that could cause an error.
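To illustrate both points, here is a minimal sketch of the same pattern written with re.VERBOSE, so that each part can carry a comment, plus a helper that checks for None before reading the captured group. The names MERCHANT_RE and extract_merchant and the three sample rows are only for illustration; they are not from the posts above.

```python
import re

# The pattern from post #2, spelled out piece by piece.
# In verbose mode plain whitespace is ignored, so literal spaces are written as "\ ".
MERCHANT_RE = re.compile(
    r"""
    (?:PURCHASE|RECURRING\ PAYMENT)   # the row must start with one of these phrases
    \ AUTHORIZED\ ON\                 # literal text, with surrounding spaces
    \d{2}/\d{2}\                      # a date such as 09/28, then a space
    (.+?)                             # captured group: as few characters as possible
    \ \d+?.+                          # a space, a digit, then the rest of the row
    """,
    re.VERBOSE,
)


def extract_merchant(row):
    """Return the captured text, or None when the row does not match."""
    match = MERCHANT_RE.match(row)
    if match is None:  # calling .groups() on None would raise AttributeError
        return None
    return match.group(1)


rows = [
    "PURCHASE AUTHORIZED ON 09/28 7-ELEVEN 18594 ARVADA CO S588272206422481 CARD 7621",
    "RECURRING PAYMENT AUTHORIZED ON 09/30 GEICO *AUTO 800-841-3000 DC S388273506944188 CARD 7336",
    "mcdonalds payment",
]
for row in rows:
    print(extract_merchant(row))
```

The last row does not start with either phrase, so the helper returns None for it instead of raising an error.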
Reply
#6
Is data originally in some sort of file (csv, txt etc)?
I'm not 'in'-sane. Indeed, I am so far 'out' of sane that you appear a tiny blip on the distant coast of sanity. Bucky Katt, Get Fuzzy

Da Bishop: There's a dead bishop on the landing. I don't know who keeps bringing them in here. ....but society is to blame.
Reply
#7
Here is the error:

regex.match(data).groups()
NameError: name 'data' is not defined

import re
data="PURCHASE AUTHORIZED ON 09/30 VENMO* Visa Direct NY S00588273644160550 CARD 7521"
regex = re.compile(r"(?:PURCHASE|RECURRING PAYMENT) AUTHORIZED ON \d{2}/\d{2} (.+?) \d+?.+")
regex.match(data).groups()

This is the code I am running.
Reply
#8
Shouldn't merchant names be like: UA UNION SQUARE 14, 7-ELEVEN 18594, JCPENNEY 1330, Netflix.com, GEICO *AUTO, FARMERS INS BILLIN?

Or should the merchants be UA, 7-ELEVEN, JCPENNEY, Netflix.com, GEICO, FARMERS (i.e. the first word after the date)?

One way to achieve these results:

>>> lst = [
       'PURCHASE                                AUTHORIZED ON   09/28 UA UNION SQUARE 14        NEW YORK  NY  S388272071655085   CARD 0057',
       'PURCHASE                                AUTHORIZED ON   09/28 7-ELEVEN 18594            ARVADA  CO  S588272206422481   CARD 7621',
       'PURCHASE                                AUTHORIZED ON   09/30 JCPENNEY 1330             CORPUS CHRIST TX  S468273721740671   CARD 8143'
         ]
>>> splitted = [[word.strip() for word in row.split('  ') if word] for row in lst]
>>> splitted
[['PURCHASE', 'AUTHORIZED ON', '09/28 UA UNION SQUARE 14', 'NEW YORK', 'NY', 'S388272071655085', 'CARD 0057'], ['PURCHASE', 'AUTHORIZED ON', '09/28 7-ELEVEN 18594', 'ARVADA', 'CO', 'S588272206422481', 'CARD 7621'], ['PURCHASE', 'AUTHORIZED ON', '09/30 JCPENNEY 1330', 'CORPUS CHRIST TX', 'S468273721740671', 'CARD 8143']]
>>> merchants = [' '.join(row[2].split()[1:]) for row in splitted]
>>> merchants
['UA UNION SQUARE 14', '7-ELEVEN 18594', 'JCPENNEY 1330']
>>> first_word = [row.split()[0] for row in merchants]
>>> first_word
['UA', '7-ELEVEN', 'JCPENNEY']
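A variation on the same idea, as a rough sketch only: splitting with re.split on runs of two or more whitespace characters means the exact amount of padding between columns no longer matters. It assumes the columns are always separated by at least two spaces and that the third field holds the date followed by the merchant; the function name merchant_from_padded_row is made up for this example.

```python
import re


def merchant_from_padded_row(row):
    # Split on runs of two or more whitespace characters, so the exact
    # number of padding spaces between columns does not matter.
    fields = re.split(r"\s{2,}", row.strip())
    # Assumption: the third field looks like "09/28 UA UNION SQUARE 14".
    date_and_merchant = fields[2]
    # Drop the leading date and keep everything after it.
    return date_and_merchant.split(maxsplit=1)[1]


row = ("PURCHASE      AUTHORIZED ON   09/28 UA UNION SQUARE 14        "
       "NEW YORK  NY  S388272071655085   CARD 0057")
print(merchant_from_padded_row(row))
```

A row that does not follow this layout (fewer than three fields, or no date in the third field) would raise an IndexError, so real data would need a check around the call.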
Reply
#9
Cool, brother. What if the string starts with JC-PENNY?
Ex: JCPenney CC JCPTELPAY 092818 1688660513N6008895915763857 (this is one string).
We can't write a separate function for every case. I have 800 merchants, so my thinking is to scan each row much as I would with the SQL LIKE operator:
case
when substr(upper(TRAN_STMT_DESC),1,16) IN ('ACORNS.COM ', 'ACORNS INVESTMEN') then 'ACORNS'
when substr(upper(TRAN_STMT_DESC),1,16) ='ACS EXPRESS PAY ' then 'ACS'
when substr(upper(TRAN_STMT_DESC),1,16) ='ACS' then 'ACS'
when substr(upper(TRAN_STMT_DESC),1,4) ='AD&D' then 'AD&D'
when substr(upper(TRAN_STMT_DESC),1,16) ='ADP PAYROLL FEES' then 'ADP'
when substr(upper(TRAN_STMT_DESC),1,16) ='ADP TX/FINCL SVC' then 'ADP'
when substr(upper(TRAN_STMT_DESC),1,12) ='ADT SECURITY' then 'ADT SECURITY'
when substr(upper(TRAN_STMT_DESC),1,16) ='AES ' then 'AES'
when substr(upper(TRAN_STMT_DESC),1,06) ='AFLAC ' then 'AFLAC'
when substr(upper(TRAN_STMT_DESC),1,15) ='ALLIED' then 'ALLIED'
when substr(upper(TRAN_STMT_DESC),1,8) = 'ALLSTATE' then 'ALLSTATE'
when substr(upper(TRAN_STMT_DESC),1,05) ='ALLY ' then 'ALLY Bank/Financial'
when substr(upper(TRAN_STMT_DESC),1,16) ='AM INCOME LIFE ' then 'AM INCOME LIFE'
when ( substr(upper(TRAN_STMT_DESC),1,6) = 'AMAZON' OR substr(upper(TRAN_STMT_DESC),1,15)= 'PAYMENT FOR AMZ' OR substr(upper(TRAN_STMT_DESC),9,6) = 'AMAZON') then 'AMAZON'
when substr(upper(TRAN_STMT_DESC),1,15) ='AMERICAN FAMILY' then 'AMERICAN FAMILY'
when substr(upper(TRAN_STMT_DESC),1,16) ='AMERICAN FUNDS ' then 'AMERICAN FUNDS'
when substr(upper(TRAN_STMT_DESC),1,16) ='AMERICAN GEN LIF' then 'AMERICAN GEN LIF INS'
when substr(upper(TRAN_STMT_DESC),1,16) ='AMERICAN GENERAL' then 'AMERICAN GENERAL'

Can we implement this in Python?
I am only five days into Python, so sorry for asking so many questions.

My goal is to bring a Teradata table into Python, scan each row, parse the column (remove the unwanted data), and send the parsed data into a table.
Reply
#10
'JC-PENNY' is a continuous string and will not be split with the default settings.

Regarding implementing the code you presented in Python... huh, how to put it. I am more into programming than manual labor. I let Python work for me; I don't use Python as a typewriter. If every row has to be treated by its own (arbitrary) rules, then it is manual and error-prone work. However, it can be done (and quite easily, if you don't count the rules you must define for every row).

The more general question is: what do you intend to do with this data in the end? Maybe there are easier ways to achieve your desired result than having specific rules for every row. You stated that you want to send the parsed data into a table. Then what? It seems to me that this can't be the end result.
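For what it is worth, here is a minimal sketch of how such rules could be kept as data instead of as hundreds of hand-written branches. The two rule tables, the classify function, and the choice to check prefixes before substrings are all assumptions made for illustration; only a handful of the rules from the SQL are copied in, and they mirror it only loosely.

```python
# Hypothetical rule tables: (text to look for, merchant name to return).
# Prefix rules loosely mirror the SQL substr(...) comparisons.
PREFIX_RULES = [
    ("ACORNS.COM", "ACORNS"),
    ("ACORNS INVESTMEN", "ACORNS"),
    ("ADP PAYROLL FEES", "ADP"),
    ("ADT SECURITY", "ADT SECURITY"),
    ("AFLAC ", "AFLAC"),
    ("ALLSTATE", "ALLSTATE"),
    ("AMAZON", "AMAZON"),
    ("PAYMENT FOR AMZ", "AMAZON"),
]

# "Contains" rules play the role of SQL LIKE '%...%'.
CONTAINS_RULES = [
    ("UNION SQUARE", "Union Square"),
    ("7-ELEVEN", "7-Eleven"),
    ("JCPENNEY", "JCPenney"),
]


def classify(description, default=None):
    """Return the merchant for the first rule that matches, else default."""
    text = description.upper()  # compare case-insensitively, like upper() in the SQL
    for prefix, merchant in PREFIX_RULES:
        if text.startswith(prefix):
            return merchant
    for needle, merchant in CONTAINS_RULES:
        if needle in text:
            return merchant
    return default


samples = [
    "PURCHASE AUTHORIZED ON 09/28 UA UNION SQUARE 14 NEW YORK NY S388272071655085 CARD 0057",
    "JCPenney CC JCPTELPAY 092818 1688660513N6008895915763857",
    "mcdonalds payment",
]
for sample in samples:
    print(classify(sample, default="UNKNOWN"))
```

Adding a merchant then means adding one tuple to a table rather than writing a new branch, and rows that match no rule are easy to list and review.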
Reply

