 How to parse the data in python
#1
Hi,
I have very unstructured data in one column. Here is some sample data:
Output:
PURCHASE AUTHORIZED ON 09/28 UA UNION SQUARE 14 NEW YORK NY S388272071655085 CARD 0057
PURCHASE AUTHORIZED ON 09/28 7-ELEVEN 18594 ARVADA CO S588272206422481 CARD 7621
PURCHASE AUTHORIZED ON 09/30 JCPENNEY 1330 CORPUS CHRIST TX S468273721740671 CARD 8143
From each string I need only the merchant name.

For example, from the 1st string I need Union Square,
from the 2nd string I need 7-Eleven,
from the 3rd string I need JCPenney.

In SQL I can use a CASE statement on that column
with LIKE operators, e.g. '%Union Square%', '%7-Eleven%', '%JCPenney%'

so the output should be:
Union Square
7-Eleven
JCPenney
...


I need only the merchant names and want to remove the unwanted data.
I have 800 rows like that with different merchant names,

e.g. as shown below:
Output:
RECURRING PAYMENT AUTHORIZED ON 09/30 Netflix.com netflix.com CA S308273592350001 CARD 9296
RECURRING PAYMENT AUTHORIZED ON 09/30 GOOGLE *YouTube Pr 855-836-3987 CA S588274040027644 CARD 8785
RECURRING PAYMENT AUTHORIZED ON 09/30 GEICO *AUTO 800-841-3000 DC S388273506944188 CARD 7336
RECURRING PAYMENT AUTHORIZED ON 09/30 FARMERS INS BILLIN 877-327-6392 CA S308273417047853 CARD 1421
I want to implement this in Python. Can anyone please help with the code? Thanks in advance.
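A rough Python counterpart of that SQL idea (a CASE over LIKE '%...%' patterns) is an ordered list of (keyword, display name) pairs checked against the upper-cased row, where the first hit wins just like the first matching WHEN. This is only a sketch: the names MERCHANTS and find_merchant and the keyword list are hypothetical, and with ~800 merchants you would load the pairs from a file or table instead of typing them.

```python
# Hypothetical keyword -> display-name table. Order matters: the first
# keyword found wins, like the first matching WHEN in a SQL CASE.
MERCHANTS = [
    ("UNION SQUARE", "Union Square"),
    ("7-ELEVEN", "7-Eleven"),
    ("JCPENNEY", "JCPenney"),
    ("NETFLIX", "Netflix"),
    ("YOUTUBE", "YouTube"),
    ("GEICO", "GEICO"),
    ("FARMERS INS", "Farmers"),
]


def find_merchant(description):
    """Return the display name of the first keyword contained in description, else None."""
    text = description.upper()  # case-insensitive, like UPPER(col) LIKE '%...%'
    for keyword, name in MERCHANTS:
        if keyword in text:
            return name
    return None


rows = [
    "PURCHASE AUTHORIZED ON 09/28 UA UNION SQUARE 14 NEW YORK NY S388272071655085 CARD 0057",
    "PURCHASE AUTHORIZED ON 09/28 7-ELEVEN 18594 ARVADA CO S588272206422481 CARD 7621",
    "PURCHASE AUTHORIZED ON 09/30 JCPENNEY 1330 CORPUS CHRIST TX S468273721740671 CARD 8143",
]
for row in rows:
    print(find_merchant(row))  # Union Square, then 7-Eleven, then JCPenney
```

Rows that contain none of the keywords come back as None, which makes it easy to list the descriptions that still need a rule.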
#2
Looks like a job for a regular expression (standard library module: re). However, it will be a tricky one with that data. For what you've provided, this will work:

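import re

# data is a single row, e.g. the first sample line
data = "PURCHASE AUTHORIZED ON 09/28 UA UNION SQUARE 14 NEW YORK NY S388272071655085 CARD 0057"
regex = re.compile(r"(?:PURCHASE|RECURRING PAYMENT) AUTHORIZED ON \d{2}/\d{2} (.+?) \d+?.+")
regex.match(data).groups()  # ('UA UNION SQUARE',)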
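To show what each part of that pattern does, here is the same regular expression written with re.VERBOSE, which ignores unescaped whitespace and allows comments inside the pattern (literal spaces therefore have to be escaped as "\ "). It should behave the same as the one-line version; the sample row is taken from the question.

```python
import re

# Same pattern as the one-liner, split up so each piece can carry a comment.
regex = re.compile(r"""
    (?:PURCHASE|RECURRING\ PAYMENT)   # non-capturing group: either prefix
    \ AUTHORIZED\ ON\                 # literal text (spaces escaped in VERBOSE mode)
    \d{2}/\d{2}\                      # the date, e.g. 09/28, then a space
    (.+?)                             # group 1: as few characters as possible...
    \ \d+?                            # ...up to the first space followed by a digit
    .+                                # the rest of the row
""", re.VERBOSE)

data = "PURCHASE AUTHORIZED ON 09/28 UA UNION SQUARE 14 NEW YORK NY S388272071655085 CARD 0057"
print(regex.match(data).group(1))  # UA UNION SQUARE
```

Because group 1 stops at the first "space + digit", it works for rows like the first one, but a row with no digits right after the merchant name will capture far more than the name.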
#3
Thanks for answering my question, and sorry for providing so little data. I gave only a few examples; my data does not always
start with PURCHASE or RECURRING PAYMENT. Some rows start with other names, like:
mcdonalds payment
ross payment
venmo ach pay
and so on. It's getting tough for me to parse, but I'll keep trying.
#4
I ran your code and I am getting an error. Can you please explain what that syntax is doing? Thanks.
#5
What is the error? The regular expression is just being matched against the string. If it doesn't find a match in a row, match() returns None instead of a match object, and calling .groups() on None raises an AttributeError.
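A minimal way to avoid that error is to check the match before calling .group() on it. In this sketch the helper name extract_merchant is hypothetical, and "VENMO ACH PAY" is a made-up row standing in for descriptions that don't start with PURCHASE or RECURRING PAYMENT.

```python
import re

regex = re.compile(r"(?:PURCHASE|RECURRING PAYMENT) AUTHORIZED ON \d{2}/\d{2} (.+?) \d+?.+")


def extract_merchant(row):
    """Return the captured merchant text, or None when the pattern does not match."""
    match = regex.match(row)
    if match is None:  # no match: calling .groups() here would raise AttributeError
        return None
    return match.group(1)


rows = [
    "PURCHASE AUTHORIZED ON 09/28 7-ELEVEN 18594 ARVADA CO S588272206422481 CARD 7621",
    "VENMO ACH PAY",  # made-up row with a different prefix
]
for row in rows:
    print(extract_merchant(row))  # 7-ELEVEN, then None
```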
#6
Is the data originally in some sort of file (CSV, TXT, etc.)?
I'm not 'in'-sane. Indeed, I am so far 'out' of sane that you appear a tiny blip on the distant coast of sanity. Bucky Katt, Get Fuzzy
#7
Here is the error:

regex.match(data).groups()
NameError: name 'data' is not defined

import re
data="PURCHASE AUTHORIZED ON 09/30 VENMO* Visa Direct NY S00588273644160550 CARD 7521"
regex = re.compile(r"(?:PURCHASE|RECURRING PAYMENT) AUTHORIZED ON \d{2}/\d{2} (.+?) \d+?.+")
regex.match(data).groups()

This is the code I am running.
#8
Shouldn't the merchant names be: UA UNION SQUARE 14, 7-ELEVEN 18594, JCPENNEY 1330, Netflix.com, GEICO *AUTO, FARMERS INS BILLIN?

Or should the merchants be UA, 7-ELEVEN, JCPENNEY, Netflix.com, GEICO, FARMERS (i.e. the first word after the date)?

One way to achieve these results:

>>> lst = [
       'PURCHASE                                AUTHORIZED ON   09/28 UA UNION SQUARE 14        NEW YORK  NY  S388272071655085   CARD 0057',
       'PURCHASE                                AUTHORIZED ON   09/28 7-ELEVEN 18594            ARVADA  CO  S588272206422481   CARD 7621',
       'PURCHASE                                AUTHORIZED ON   09/30 JCPENNEY 1330             CORPUS CHRIST TX  S468273721740671   CARD 8143'
         ]
>>> splitted = [[word.strip() for word in row.split('  ') if word] for row in lst]
>>> splitted
[['PURCHASE', 'AUTHORIZED ON', '09/28 UA UNION SQUARE 14', 'NEW YORK', 'NY', 'S388272071655085', 'CARD 0057'], ['PURCHASE', 'AUTHORIZED ON', '09/28 7-ELEVEN 18594', 'ARVADA', 'CO', 'S588272206422481', 'CARD 7621'], ['PURCHASE', 'AUTHORIZED ON', '09/30 JCPENNEY 1330', 'CORPUS CHRIST TX', 'S468273721740671', 'CARD 8143']]
>>> merchants = [' '.join(row[2].split()[1:]) for row in splitted]
>>> merchants
['UA UNION SQUARE 14', '7-ELEVEN 18594', 'JCPENNEY 1330']
>>> first_word = [row.split()[0] for row in merchants]
>>> first_word
['UA', '7-ELEVEN', 'JCPENNEY']
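Assuming the rows keep the padded layout shown above (fields separated by two or more spaces, single spaces only inside a field), the same split can also be written with re.split on runs of whitespace. This is an alternative sketch, not a change to the approach above; it breaks if a field is separated by only one space.

```python
import re

lst = [
    'PURCHASE        AUTHORIZED ON   09/28 UA UNION SQUARE 14        NEW YORK  NY  S388272071655085   CARD 0057',
    'PURCHASE        AUTHORIZED ON   09/30 JCPENNEY 1330             CORPUS CHRIST TX  S468273721740671   CARD 8143',
]
# Split each row on runs of two or more whitespace characters; single spaces
# inside a field (e.g. "UNION SQUARE") are kept.
splitted = [re.split(r'\s{2,}', row.strip()) for row in lst]
# Field 2 holds "date merchant"; drop the date.
merchants = [row[2].split(maxsplit=1)[1] for row in splitted]
print(merchants)  # ['UA UNION SQUARE 14', 'JCPENNEY 1330']
```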


#9
Cool, thanks. What if the string starts with JC-PENNY?
e.g.: JCPenney CC JCPTELPAY 092818 1688660513N6008895915763857 (this is one string)
We can't write a separate function for every case. My thinking is: I have 800 merchants, and I want to scan each row the way I do it in SQL with a CASE like this:
case
when substr(upper(TRAN_STMT_DESC),1,16) IN ('ACORNS.COM ', 'ACORNS INVESTMEN') then 'ACORNS'
when substr(upper(TRAN_STMT_DESC),1,16) ='ACS EXPRESS PAY ' then 'ACS'
when substr(upper(TRAN_STMT_DESC),1,16) ='ACS' then 'ACS'
when substr(upper(TRAN_STMT_DESC),1,4) ='AD&D' then 'AD&D'
when substr(upper(TRAN_STMT_DESC),1,16) ='ADP PAYROLL FEES' then 'ADP'
when substr(upper(TRAN_STMT_DESC),1,16) ='ADP TX/FINCL SVC' then 'ADP'
when substr(upper(TRAN_STMT_DESC),1,12) ='ADT SECURITY' then 'ADT SECURITY'
when substr(upper(TRAN_STMT_DESC),1,16) ='AES ' then 'AES'
when substr(upper(TRAN_STMT_DESC),1,06) ='AFLAC ' then 'AFLAC'
when substr(upper(TRAN_STMT_DESC),1,15) ='ALLIED' then 'ALLIED'
when substr(upper(TRAN_STMT_DESC),1,8) = 'ALLSTATE' then 'ALLSTATE'
when substr(upper(TRAN_STMT_DESC),1,05) ='ALLY 'then 'ALLY Bank/Financial'
when substr(upper(TRAN_STMT_DESC),1,16) ='AM INCOME LIFE ' then 'AM INCOME LIFE'
when ( substr(upper(TRAN_STMT_DESC),1,6) = 'AMAZON' OR substr(upper(TRAN_STMT_DESC),1,15)= 'PAYMENT FOR AMZ' OR substr(upper(TRAN_STMT_DESC),9,6) = 'AMAZON') then 'AMAZON'
when substr(upper(TRAN_STMT_DESC),1,15) ='AMERICAN FAMILY' then 'AMERICAN FAMILY'
when substr(upper(TRAN_STMT_DESC),1,16) ='AMERICAN FUNDS ' then 'AMERICAN FUNDS'
when substr(upper(TRAN_STMT_DESC),1,16) ='AMERICAN GEN LIF' then 'AMERICAN GEN LIF INS'
when substr(upper(TRAN_STMT_DESC),1,16) ='AMERICAN GENERAL' then 'AMERICAN GENERAL'
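Part of that CASE could be sketched in Python as an ordered list of (start position, prefix, merchant) rules checked with str.startswith(prefix, start). Note that Python positions are 0-based while SQL SUBSTR is 1-based, so SUBSTR(x, 9, 6) becomes start 8. The names RULES and classify are hypothetical, only some of the SQL rules are copied, and a plain "starts with" test is a simplification of the fixed-length SUBSTR comparisons, so edge cases (trailing spaces, lengths that don't match the literal) may behave differently.

```python
# Hypothetical translation of some of the SQL rules: (start, prefix, merchant).
# Checked in order, first match wins, like WHEN clauses in a CASE.
RULES = [
    (0, "ACORNS.COM", "ACORNS"),
    (0, "ACORNS INVESTMEN", "ACORNS"),
    (0, "ACS EXPRESS PAY", "ACS"),
    (0, "AD&D", "AD&D"),
    (0, "ADP PAYROLL FEES", "ADP"),
    (0, "ADP TX/FINCL SVC", "ADP"),
    (0, "ADT SECURITY", "ADT SECURITY"),
    (0, "AFLAC ", "AFLAC"),
    (0, "ALLSTATE", "ALLSTATE"),
    (0, "ALLY ", "ALLY Bank/Financial"),
    (0, "AMAZON", "AMAZON"),
    (0, "PAYMENT FOR AMZ", "AMAZON"),
    (8, "AMAZON", "AMAZON"),  # SQL: SUBSTR(..., 9, 6) = 'AMAZON'
    (0, "AMERICAN FAMILY", "AMERICAN FAMILY"),
]


def classify(description):
    """Return the merchant for the first rule whose prefix appears at its start position."""
    text = description.upper()  # like UPPER(TRAN_STMT_DESC)
    for start, prefix, merchant in RULES:
        if text.startswith(prefix, start):
            return merchant
    return None  # like a CASE with no ELSE


print(classify("ADP PAYROLL FEES ADP - FEES 12345"))  # ADP
```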

Can we implement this in Python?
I'm only five days into Python, so sorry for asking so many questions.

My goal is to bring a Teradata table into Python, scan each row, parse the column (removing the unwanted data), and send the parsed data into a table.
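The read, parse, write loop described above can be sketched with the standard library's sqlite3 module as a stand-in for Teradata, since that runs anywhere. Everything here is an assumption for illustration: the table names transactions and parsed, the column tran_stmt_desc, the made-up sample rows, and the placeholder classify rules. With Teradata you would use a connection from a Teradata DB-API driver instead, and its SQL dialect and placeholder style may differ.

```python
import sqlite3

# Stand-in for the Teradata table: in-memory SQLite with a hypothetical
# "transactions" table holding the raw description column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (id INTEGER PRIMARY KEY, tran_stmt_desc TEXT)")
conn.executemany(
    "INSERT INTO transactions (tran_stmt_desc) VALUES (?)",
    [
        ("PURCHASE AUTHORIZED ON 09/28 7-ELEVEN 18594 ARVADA CO S588272206422481 CARD 7621",),
        ("ALLSTATE INS CO PAYMENT",),  # made-up row
    ],
)


def classify(description):
    """Hypothetical placeholder for whichever merchant rules you settle on."""
    text = description.upper()
    for keyword, merchant in (("7-ELEVEN", "7-ELEVEN"), ("ALLSTATE", "ALLSTATE")):
        if keyword in text:
            return merchant
    return None


# Read the rows, parse the column, and write the results to a second table.
conn.execute("CREATE TABLE parsed (id INTEGER, merchant TEXT)")
rows = conn.execute("SELECT id, tran_stmt_desc FROM transactions").fetchall()
conn.executemany(
    "INSERT INTO parsed (id, merchant) VALUES (?, ?)",
    [(row_id, classify(desc)) for row_id, desc in rows],
)
conn.commit()
print(conn.execute("SELECT id, merchant FROM parsed ORDER BY id").fetchall())
# [(1, '7-ELEVEN'), (2, 'ALLSTATE')]
```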
#10
'JC-PENNY' is a continuous string and will not be split with the default settings.
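For illustration: str.split() with no argument splits only on runs of whitespace, so a hyphenated word stays in one piece unless you pass an explicit separator. The sample string is the one from the question.

```python
row = "JCPenney CC JCPTELPAY 092818 1688660513N6008895915763857"
print(row.split())          # ['JCPenney', 'CC', 'JCPTELPAY', '092818', '1688660513N6008895915763857']
print("JC-PENNY".split())   # no whitespace inside, so one item: ['JC-PENNY']
print("JC-PENNY".split("-"))  # explicit separator: ['JC', 'PENNY']
```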

Regarding implementing the code you presented in Python... huh, how to put it. I am more into programming than into manual labor: I let Python work for me, I don't use Python as a typewriter. If every row has to be treated by separate (arbitrary) rules, then it's manual and error-prone. However, it can be done (and quite easily, not counting the rules you must define for every row).

The more general question is: what do you intend to do with this data in the end? Maybe there are easier ways to achieve your desired result than having specific rules for every row. You stated that you want to send the parsed data into a table. Then what? It seems to me that this can't be the end result.