Python Forum
df and regex, NaN and df.concat
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
df and regex, NaN and df.concat
#1
Hi Pandas experts,

I got some messy data with dates like:

04/20/2001; 04/20/11; 4/20/09; 4/3/01
Mar-20-2001; Dec 20, 2009; April 20, 2009; Mar. 20, 2009; Mar 20 2009;
20 Mar 2001; 20 April 2001; 20 Mar. 2007; 20 March, 2009
Mar 21th, 2009; Mar 21st, 2009; Feb 22nd, 2002
Feb 2009; Sep 2009; Nov 2016
6/2008; 12/2009
2003; 2012


I want to do this via regex to extract the year only but struggle to combine them into one big extract.

def dates():

    df1 = df['text'].str.extract('^[0,1]?\d{1}\/(([0-2]?\d{1})|([3][0,1]{1}))\/(([1]{1}[9]{1}[9]{1}\d{1})|([2-9]{1}\d{3}))$')
    df2 = df['text'].str.extract('(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\,*\s\s*\d{4}$|^(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\,*\s\d{4}$|^(January|February|March|April|May|June|July|August|September|October|November|December)\,*\s\d{4}$|^(january|february|march|april|may|june|july|august|september|october|november|december)\,*\s\d{4}$')
    df3 = df['text'].str.extract('^((0[1-9])|(1[0-2]))\/(\d{4})$')
    df4 = df['text'].str.extract('^(19[0-8][0-9]|199[0-9]|200[0-9]|201[0-8])$')   
    df5 = df['text'].str.extract('^(((0[1-9]|[12]\d|3[01])\/(0[13578]|1[02])\/(\d{2}))|((0[1-9]|[12]\d|30)\/(0[13456789]|1[012])\/(\d{2}))|((0[1-9]|1\d|2[0-8])\/02\/(\d{2}))|(29\/02\/((0[48]|[2468][048]|[13579][26])|(00))))$')
    
    df_all = pd.concat([df1, df2, df3, df4, df5])
    return df_all
dates()
The issue is that my data frame now has 16 columns and all are NaN.
I expected one column and..well, no NaN's :)

Does someone know what I am doing wrong?
Reply


Messages In This Thread
df and regex, NaN and df.concat - by metalray - Jan-04-2018, 10:04 AM
RE: df and regex, NaN and df.concat - by metalray - Jan-05-2018, 09:30 AM
RE: df and regex, NaN and df.concat - by buran - Jan-05-2018, 09:58 AM
RE: df and regex, NaN and df.concat - by metalray - Jan-10-2018, 09:29 AM
RE: df and regex, NaN and df.concat - by buran - Jan-10-2018, 07:02 PM
RE: df and regex, NaN and df.concat - by metalray - Jan-22-2018, 03:12 PM
RE: df and regex, NaN and df.concat - by buran - Jan-22-2018, 04:37 PM
RE: df and regex, NaN and df.concat - by metalray - Feb-06-2018, 03:11 PM
RE: df and regex, NaN and df.concat - by buran - Feb-06-2018, 03:29 PM

Possibly Related Threads…
Thread Author Replies Views Last Post
  concat two list based on array MeeranRizvi 9 7,682 Jan-03-2017, 06:28 PM
Last Post: wavic

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020