Python Forum
df.str.contains() Not Working
#1
I have some procedural if/else code that queries a data frame. In the outer else branch I filter the data frame with str.contains(), but the result, 'tempdf', comes back as an empty data frame (see below).

When I print the data frame that tempdf is built from, 'loct1i4Level24', it shows the values correctly. Can someone tell me whether my str.contains() call has an error?

The Code:
if "1.4.25" in df.values:
	if "1.4.24" in df.values:
		df = df.set_index("identifier")
		t1i4Level24 = df["1.4.24":"1.4.25"]
		t1i4Level24 = t1i4Level24[:-1]
		t1i4Level24 = t1i4Level24.reset_index(level=['identifier'])
		t1i4Level24 = t1i4Level24[~t1i4Level24.identifier.str.contains(fromMatch_Subjob)]
		countRows = t1i4Level24.identifier.count()
		countRows = countRows - 1
		countRows = countRows.astype(float)
		S_t1i4Level24 = t1i4Level24['%complete'].sum()
		P_t1i4Level24 = S_t1i4Level24 / countRows
		df.at['1.4.24', '%complete']=P_t1i4Level24
		df = df.reset_index(level=['identifier'])
	else:
		pass
else:
	if "1.4.24" in df.values:
		df = df.set_index("identifier")
		loct1i4Level24 = df.loc['1.4.24':, :]
		loct1i4Level24 = loct1i4Level24.reset_index(level=['identifier'])
		tempdf = loct1i4Level24[loct1i4Level24['identifier'].str.contains(r'^\d'+'.'+'\d'+'.'+'\d$')]
		print(tempdf)
Data Frame Results:

If I print loct1i4Level24, it correctly shows the following:
  identifier  taskname  %complete
0     1.4.24  Level 24       0.00
1   1.4.24.1     Job 1       0.56
2   1.4.24.2     Job 2       0.33
3   1.4.24.3     Job 3       0.28
4     1.4.30  Level 30       0.00
5   1.4.30.1     Job 1       0.26
6   1.4.30.2     Job 2       0.41
7   1.4.30.3     Job 3       0.66
8   1.4.30.4     Job 4       0.89
But tempdf prints as an empty data frame, while based on the code I expect it to print only:

  identifier  taskname  %complete
0     1.4.24  Level 24       0.00
4     1.4.30  Level 30       0.00
Can anyone help identify the problem with this?
#2
[Update] - Solved.

I realized that I just had to put * after each '\d' in the regex pattern. A bare '\d' matches exactly one digit, so a two-digit part such as the '24' in '1.4.24' could never match. For anyone wanting to know the solution, below is the working code:

if "1.4.25" in df.values:
    if "1.4.24" in df.values:
        df = df.set_index("identifier")
        t1i4Level24 = df["1.4.24":"1.4.25"]
        t1i4Level24 = t1i4Level24[:-1]
        t1i4Level24 = t1i4Level24.reset_index(level=['identifier'])
        t1i4Level24 = t1i4Level24[~t1i4Level24.identifier.str.contains(fromMatch_Subjob)]
        countRows = t1i4Level24.identifier.count()
        countRows = countRows - 1
        countRows = countRows.astype(float)
        S_t1i4Level24 = t1i4Level24['%complete'].sum()
        P_t1i4Level24 = S_t1i4Level24 / countRows
        df.at['1.4.24', '%complete']=P_t1i4Level24
        df = df.reset_index(level=['identifier'])
    else:
        pass
else:
    if "1.4.24" in df.values:
        df = df.set_index("identifier")
        loct1i4Level24 = df.loc['1.4.24':, :]
        loct1i4Level24 = loct1i4Level24.reset_index(level=['identifier'])
        tempdf = loct1i4Level24[loct1i4Level24['identifier'].str.contains(r'^\d*.\d*.\d*$')]
        print(tempdf)
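
For anyone comparing the two patterns, here is a small sketch using only the standard re module on the identifiers from the sample output. As far as I know, str.contains() does a regex search on each value, so re.search should behave the same way here. The helper name matching and the third, stricter pattern (\d+ with escaped dots) are my own additions for illustration and are not part of the code above.

```python
import re

# Identifiers copied from the sample data frame in the question.
identifiers = [
    "1.4.24", "1.4.24.1", "1.4.24.2", "1.4.24.3",
    "1.4.30", "1.4.30.1", "1.4.30.2", "1.4.30.3", "1.4.30.4",
]

# The patterns from the thread, written as single raw strings.
original = r'^\d.\d.\d$'     # each \d is exactly one digit
relaxed = r'^\d*.\d*.\d*$'   # \d* is zero or more digits; '.' is any character

# Stricter variant (an assumption, not from the thread):
# one or more digits per part, separated by literal dots.
strict = r'^\d+\.\d+\.\d+$'


def matching(pattern, values):
    """Return the values in which re.search finds the pattern."""
    return [v for v in values if re.search(pattern, v)]


print(matching(original, identifiers))  # []
print(matching(relaxed, identifiers))   # ['1.4.24', '1.4.30']
print(matching(strict, identifiers))    # ['1.4.24', '1.4.30']

# The unescaped '.' also lets the relaxed pattern accept strings with no dots.
print(matching(relaxed, ["12345"]))     # ['12345']
print(matching(strict, ["12345"]))      # []
```

On this data the relaxed and strict patterns select the same two rows, so the posted fix works as is. The strict form is only worth considering if the identifier column might contain values that are not in dotted form; it can be passed to str.contains() in the same way.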