How to parse the data in python - Printable Version
+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: General Coding Help (https://python-forum.io/forum-8.html)
+--- Thread: How to parse the data in python (/thread-15296.html)
How to parse the data in python - sandy - Jan-12-2019

Hi,

I have very unstructured data in one column. Here is the sample data. From each string I need only a specific name. For example, from the first string I need Union Square, from the second 7-Eleven, and from the third JCPenney. In SQL I can use a CASE statement on that column with LIKE operators, e.g. '%Union Square%', '%7-Eleven%', '%JCPenney%', so the output should be Union Square, 7-Eleven, JCPenney... I need only the merchant names and want to remove the unwanted data. I have 800 rows like that with different merchant names, e.g. as shown below. I want to implement this in Python. Can anyone please help with the code? Thanks in advance.
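For readers coming from SQL: a `LIKE '%Union Square%'` test corresponds roughly to a case-insensitive substring check in Python. The sketch below is only an illustration of that idea. The list `KNOWN_MERCHANTS` and the helper `find_merchant` are hypothetical names that do not appear in the thread, and the sample row is borrowed from a later reply.

```python
# Hypothetical list of merchant names to search for (an assumption for
# this sketch; the real list would hold all 800 merchants).
KNOWN_MERCHANTS = ["Union Square", "7-Eleven", "JCPenney"]


def find_merchant(description, merchants=KNOWN_MERCHANTS):
    """Return the first merchant name contained in the description, else None.

    Similar in spirit to SQL `col LIKE '%name%'`, ignoring case.
    """
    text = description.casefold()
    for name in merchants:
        if name.casefold() in text:
            return name
    return None


row = "PURCHASE AUTHORIZED ON 09/28 UA UNION SQUARE 14 NEW YORK NY S388272071655085 CARD 0057"
print(find_merchant(row))  # Union Square
```

A substring check like this matches the name anywhere in the row, so a short name could also match inside an unrelated word; that trade-off is the same one the SQL `LIKE` pattern has.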
RE: How to parse the data in python - stullis - Jan-12-2019

Looks like a job for a regular expression (standard library module: re). However, it will be a tricky one with that data. For what you've provided, this will work:

    import re

    # Raw string, so that \d reaches the regex engine unchanged
    regex = re.compile(r"(?:PURCHASE|RECURRING PAYMENT) AUTHORIZED ON \d{2}/\d{2} (.+?) \d+?.+")
    regex.match(data).groups()  # data is a single row

RE: How to parse the data in python - sandy - Jan-13-2019

Thanks for answering my question, and sorry for providing so little data. I gave only a few examples; my data does not always start with PURCHASE or RECURRING PAYMENT. It also starts with other names, like mcdonalds payment, ross payment, venmo ach pay, and so on. It is becoming tough for me to parse, but I will try.

RE: How to parse the data in python - sandy - Jan-14-2019

I ran your code and I am getting an error. Can you please explain what that syntax is doing? Thanks.

RE: How to parse the data in python - stullis - Jan-14-2019

What is the error? The regular expression is just being compared against the string. If it doesn't find a match in a row, match() returns None, and calling .groups() on None raises an AttributeError.

RE: How to parse the data in python - perfringo - Jan-14-2019

Is the data originally in some sort of file (csv, txt, etc.)?

RE: How to parse the data in python - sandy - Jan-14-2019

Here is the error:

    regex.match(data).groups()
    NameError: name 'data' is not defined

I am running this code:

    import re
    data="PURCHASE AUTHORIZED ON 09/30 VENMO* Visa Direct NY S00588273644160550 CARD 7521"
    regex = re.compile("(?:PURCHASE|RECURRING PAYMENT) AUTHORIZED ON \d{2}/\d{2} (.+?) \d+?.+")
    regex.match(data).groups()

RE: How to parse the data in python - perfringo - Jan-14-2019

Shouldn't merchant names be like: UA UNION SQUARE 14, 7-ELEVEN 18594, JCPENNEY 1330, Netflix.com, GEICO *AUTO, FARMERS INS BILLIN? Or should merchants be UA, 7-ELEVEN, JCPENNEY, Netflix.com, GEICO, FARMERS (i.e. the first word after the date)?
One way to achieve these results:

    >>> lst = [
    ...     # fields within a row are separated by two spaces
    ...     'PURCHASE  AUTHORIZED ON  09/28 UA UNION SQUARE 14  NEW YORK  NY  S388272071655085  CARD 0057',
    ...     'PURCHASE  AUTHORIZED ON  09/28 7-ELEVEN 18594  ARVADA  CO  S588272206422481  CARD 7621',
    ...     'PURCHASE  AUTHORIZED ON  09/30 JCPENNEY 1330  CORPUS CHRIST TX  S468273721740671  CARD 8143',
    ... ]
    >>> splitted = [[word.strip() for word in row.split('  ') if word] for row in lst]
    >>> splitted
    [['PURCHASE', 'AUTHORIZED ON', '09/28 UA UNION SQUARE 14', 'NEW YORK', 'NY', 'S388272071655085', 'CARD 0057'],
     ['PURCHASE', 'AUTHORIZED ON', '09/28 7-ELEVEN 18594', 'ARVADA', 'CO', 'S588272206422481', 'CARD 7621'],
     ['PURCHASE', 'AUTHORIZED ON', '09/30 JCPENNEY 1330', 'CORPUS CHRIST TX', 'S468273721740671', 'CARD 8143']]
    >>> merchants = [' '.join(row[2].split()[1:]) for row in splitted]
    >>> merchants
    ['UA UNION SQUARE 14', '7-ELEVEN 18594', 'JCPENNEY 1330']
    >>> first_word = [row.split()[0] for row in merchants]
    >>> first_word
    ['UA', '7-ELEVEN', 'JCPENNEY']

RE: How to parse the data in python - sandy - Jan-14-2019

Cool, brother. What if the string starts with JC-PENNY? For example, this is one string:

    JCPenney CC JCPTELPAY 092818 1688660513N6008895915763857

We can't write a function for every instance. I have 800 merchants, so my thinking is to scan each row the way the SQL LIKE operator does:
    case
        when substr(upper(TRAN_STMT_DESC),1,16) IN ('ACORNS.COM ', 'ACORNS INVESTMEN') then 'ACORNS'
        when substr(upper(TRAN_STMT_DESC),1,16) = 'ACS EXPRESS PAY ' then 'ACS'
        when substr(upper(TRAN_STMT_DESC),1,16) = 'ACS' then 'ACS'
        when substr(upper(TRAN_STMT_DESC),1,4) = 'AD&D' then 'AD&D'
        when substr(upper(TRAN_STMT_DESC),1,16) = 'ADP PAYROLL FEES' then 'ADP'
        when substr(upper(TRAN_STMT_DESC),1,16) = 'ADP TX/FINCL SVC' then 'ADP'
        when substr(upper(TRAN_STMT_DESC),1,12) = 'ADT SECURITY' then 'ADT SECURITY'
        when substr(upper(TRAN_STMT_DESC),1,16) = 'AES ' then 'AES'
        when substr(upper(TRAN_STMT_DESC),1,06) = 'AFLAC ' then 'AFLAC'
        when substr(upper(TRAN_STMT_DESC),1,15) = 'ALLIED' then 'ALLIED'
        when substr(upper(TRAN_STMT_DESC),1,8) = 'ALLSTATE' then 'ALLSTATE'
        when substr(upper(TRAN_STMT_DESC),1,05) = 'ALLY ' then 'ALLY Bank/Financial'
        when substr(upper(TRAN_STMT_DESC),1,16) = 'AM INCOME LIFE ' then 'AM INCOME LIFE'
        when (substr(upper(TRAN_STMT_DESC),1,6) = 'AMAZON'
              OR substr(upper(TRAN_STMT_DESC),1,15) = 'PAYMENT FOR AMZ'
              OR substr(upper(TRAN_STMT_DESC),9,6) = 'AMAZON') then 'AMAZON'
        when substr(upper(TRAN_STMT_DESC),1,15) = 'AMERICAN FAMILY' then 'AMERICAN FAMILY'
        when substr(upper(TRAN_STMT_DESC),1,16) = 'AMERICAN FUNDS ' then 'AMERICAN FUNDS'
        when substr(upper(TRAN_STMT_DESC),1,16) = 'AMERICAN GEN LIF' then 'AMERICAN GEN LIF INS'
        when substr(upper(TRAN_STMT_DESC),1,16) = 'AMERICAN GENERAL' then 'AMERICAN GENERAL'

Can we implement this in Python? I am only five days into Python, so sorry for asking so many questions.

My goal is to bring a Teradata table into Python, scan each row, parse the column (remove the unwanted data), and send the parsed data into a table.

RE: How to parse the data in python - perfringo - Jan-15-2019

'JC-PENNY' is a continuous string and will not be split with the default settings.

Regarding implementing the code you presented in Python... huh, how to put it. I am more into programming than manual labor.
I let Python work for me; I don't use Python as a typewriter. If one needs to treat every row by separate (arbitrary) rules, then it's manual and error-prone. However, it can be done (and quite easily, not counting the rules you must define for every row).

The more general question is: what do you intend to do with this data in the end? Maybe there are easier ways to achieve your desired results than having specific rules for every row. You stated that you want to send the parsed data into a table. Then what? It seems to me that this can't be the end result.
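To illustrate how the per-merchant rules from the SQL CASE statement could be kept as data instead of hand-written branches, here is a minimal sketch. It is not taken from the thread: the table `PREFIX_RULES`, the function `merchant_for`, and the sample rows are hypothetical, and only a handful of the prefixes from the SQL are copied in.

```python
# Hypothetical rule table: (prefix to look for, merchant name to report).
# The entries are copied from a few branches of the SQL CASE statement;
# a full list could be loaded from a file or a database table instead.
PREFIX_RULES = [
    ("ACORNS.COM", "ACORNS"),
    ("ACORNS INVESTMEN", "ACORNS"),
    ("ACS", "ACS"),
    ("ADT SECURITY", "ADT SECURITY"),
    ("AMAZON", "AMAZON"),
    ("PAYMENT FOR AMZ", "AMAZON"),
]


def merchant_for(description, rules=PREFIX_RULES):
    """Return the merchant of the first rule whose prefix starts the text.

    Roughly mirrors `substr(upper(col), 1, n) = 'PREFIX'`: the text is
    upper-cased and compared from its first character. Returns None when
    no rule matches.
    """
    text = description.upper()
    for prefix, merchant in rules:
        if text.startswith(prefix):
            return merchant
    return None


# Made-up sample rows, for illustration only
rows = [
    "Amazon.com order 1234",
    "ACS EXPRESS PAY 0001",
    "Some unknown merchant",
]
print([merchant_for(row) for row in rows])  # ['AMAZON', 'ACS', None]
```

With this layout, adding a merchant means adding a row to the table and not another branch of code. Rules are tried in order, so when one prefix is the start of another, the more specific one should come first.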