Python Forum
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
str.endswith Output Query
#1
Hi there,

I have the following part of a Line of Code :-

(df3['DATE'].str.endswith('-6' or '-8'))
But when I execute the full code only the -6 Values are shown in a DataFrame output in Pandas, not the -8 ones.

But if I use just one or the other, the value used does give an Output.

Could someone tell me, what I need to change in that part of Code, so that both -6 and -8 Rows are shown in the resulting Output ?

I am filtering by day i.e. from the DATE Column, and day 6 is shown in the format 06 , seen as -6 and 08 i.e day 8 is seen as -8.

The Date's for the Data are in the format :-

%d-%b-%Y
I.e. Day-Month(Shown as a word i.e. June = Jun)-Year

Any help would be much appreciated

Regards

Eddie Winch Smile
Reply
#2
try
df3['DATE'].str.endswith(('-6','-8'))
If you can't explain it to a six year old, you don't understand it yourself, Albert Einstein
How to Ask Questions The Smart Way: link and another link
Create MCV example
Debug small programs

Reply
#3
Hi buran,

Thankyou for your suggestion, I tried your suggestion just now, but I am still only getting all the DataFrame Rows, where the Date begins with the 6th Day of the month, when I re run the code.

Eddie
Reply
#4
(Aug-18-2020, 07:30 PM)eddywinch82 Wrote: where the Date begins with the 6th Day of the month
(Aug-18-2020, 07:15 PM)eddywinch82 Wrote: The Date's for the Data are in the format :%d-%b-%Y

not sure what you mean by "begins" with 6th day of the month", when format ends with year, so endswith should return if year ends with '-6' or '-8'

spam = 'foo-6', 'foo-5', 'foo-8'
for egg in spam:
    if egg.endswith(('-6', '-8')):
        print(egg)
Output:
foo-6 foo-8
maybe give us runable example, including some sample data, that reproduce the problem.
If you can't explain it to a six year old, you don't understand it yourself, Albert Einstein
How to Ask Questions The Smart Way: link and another link
Create MCV example
Debug small programs

Reply
#5
Sorry, what I mean is I am trying, to filter the Data using the DATE Column, so that the DataFrame Rows Output, are from the 6th Day of any month, or the 8th Day of any month i.e. :-

06-Jun-2002

08-Jun-2002

06-Sep-2002

08-Aug-2002

etc

That is the format of the dates in the DataFrame.

Typing 06 doesn't give any DataFrame Row output, in that line Code, as it is seen as -6 and 08 as -8 etc,

Eddie
Reply
#6
I asked for runable example, but you don't provide one. I don't know your code, but
import pandas as pd
df = pd.DataFrame({'DATE':('06-Jun-2002', '08-Jun-2002', '09-Jun-2002')})
df2 = df[df['DATE'].str.startswith(('06', '08'))]
print(df2)
Output:
DATE 0 06-Jun-2002 1 08-Jun-2002
There is also question why you have dates as string, not as date objects
If you can't explain it to a six year old, you don't understand it yourself, Albert Einstein
How to Ask Questions The Smart Way: link and another link
Create MCV example
Debug small programs

Reply
#7
Many apologies buran, here is my Full Code :-

import pandas as pd
import requests
from bs4 import BeautifulSoup
from contextlib import suppress  

res1 = requests.get("http://web.archive.org/web/20020602133812/http://www.raf.mod.uk/bbmf/calendar.html")     
res2 = requests.get("http://web.archive.org/web/20020803081304/http://www.raf.mod.uk/bbmf/calendar.html")
soup1 = BeautifulSoup(res1.content,'lxml')
soup2 = BeautifulSoup(res2.content,'lxml')
table1 = soup1.find_all('table', align="CENTER")[0]
table2 = soup2.find_all('table', align="CENTER")[0]
df1 = pd.read_html(str(table1))
df2 = pd.read_html(str(table2))


   
df1 = df1[0]
df2 = df2[0]
df = pd.concat([df2, df1], axis=0)
  
  
##################
##################
##################
  
  
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)
 
 
#make df[0] to list
list=[]
for i in df[0]:
    list.append(i)
  
#reverse the list to make split to sublist easier
list.reverse()
  
#split list to sublist using condition len(val)> 2 
size = len(list) 
idx_list = [idx + 1 for idx, val in
            enumerate(list) if len(val) > 2] 
res = [list[i: j] for i, j in
        zip([0] + idx_list, idx_list + 
        ([size] if idx_list[-1] != size else []))] 
  
#make monthname to numbers and print
for i in res:
    for j in range(len(i)):
        if i[j].upper()=='JUNE':
            i[j]='6'
        elif i[j].upper() =='MAY':
            i[j]='5'
        elif i[j].upper() == 'APRIL':
            i[j]='4'
        elif i[j].upper() =='JANUARY':
            i[j]='1'
        elif i[j].upper() == 'FEBRUARY':
            i[j]='2'
        elif i[j].upper() =='MARCH':
            i[j]='3'
        elif i[j].upper() == 'JULY':
            i[j]='7'        
        elif i[j].upper() =='AUGUST':
            i[j]='8'
        elif i[j].upper() == 'SEPTEMBER':
            i[j]='9'
        elif i[j].upper() =='OCTOBER':
            i[j]='10'
        elif i[j].upper() == 'NOVEMBER':
            i[j]='11'
        elif i[j].upper() =='DECEMBER':
            i[j]='12'       
  
  
#append string and append to new list
finallist=[]
for i in res:
    for j in range(len(i)):
        if j < len(i) - 1:
            #print(f'2004-{i[-1]}-{i[j]}')
            finallist.append(f'2002-{i[-1]}-{i[j]}')
#print(finallist)
finallist.reverse()
  
  
  
  
#print("\n=== ORIGINAL DF ===\n")
#print(df)
  
  
  
#convert dataframe to list
listtemp1=df.values.tolist()
  
#replace found below values with 0000_removable
removelist=['LOCATION','LANCASTER','SPITFIRE','HURRICANE','DAKOTA','DATE','JUNE','JANUARY','FEBRUARY','MARCH','MAY','JULY','AUGUST','SEPTEMBER','OCTOBER','NOVEMBER','DECEMBER','APRIL']
for i in listtemp1:
    for j in range(len(i)):
        for place in removelist:
            if str(i[j]).upper()==place:
                i[j]='0000_removable'
            else:
                pass
  
                  
#remove sublists with the replaced values we redirected
dellist=['0000_removable', '0000_removable', '0000_removable', '0000_removable', '0000_removable', '0000_removable']
res = [i for i in listtemp1 if i != dellist]
  
#assign back to dataframe DF3
df3=pd.DataFrame()
df3=pd.DataFrame(res, columns=['Date','LOCATION','LANCASTER','SPITFIRE','HURRICANE','DAKOTA'])
#print("\n=== AFTER REMOVE month and column names from DF, assigned to new as DF3 ===\n")
#print(df3)
  
  
#now assign that sorted date list to dataframe DF3
idx = 0
df3.insert(loc=idx, column='DATE', value=finallist)
pd.options.display.max_rows = 500
  
#print("\n=== FINAL DF3 after joining the edited date format column list ===\n")
#print(df3)
  
  
#validation logic if needed compare processed date from new joined "edited_Date_format" column with already existing "Date" column
#df3['ED1']=  pd.to_datetime(df3['EDITED_DATE_FORMAT'],format='%Y-%m-%d').dt.day
#df3['validation of date'] = df3.apply(lambda x: str(x['ED1']) == x['Date'], axis=1)
  
  
#convert df3['EDITED_DATE_FORMAT'] column from object to datetime64 foramt
#df3['EDITED_DATE_FORMAT']= pd.to_datetime(df3['EDITED_DATE_FORMAT'],format='%Y-%m-%d')
  
  
  
##################
##################
##################
  
  
 
 
  
 
#df3 = df3.rename(columns=df.iloc[0])
#df3 = df.iloc[2:]
#df3.head(15)
pd.options.display.max_rows = 1000
pd.options.display.max_columns = 1000

df3['LANCASTER'] = df3['LANCASTER'].replace({'X': 'L'})
df3['HURRICANE'] = df3['HURRICANE'].replace({'X': 'H'})
df3['SPITFIRE'] = df3['SPITFIRE'].replace({'X': 'S'})
df3['SPITFIRE'] = df3['SPITFIRE'].replace({'X x 2': 'SS'})
df3['DAKOTA'] = df3['DAKOTA'].replace({'X': 'D'})


#display = df3[(df3['LOCATION'].str.contains('[a-zA-Z]')) & (df3['DAKOTA'].str.contains('X')) & (df3['SPITFIRE'].str.contains('X', na=True)) & (df3['LANCASTER'] != 'X')]    
#display = df3[(df3['LOCATION'].str.contains('[a-zA-Z]')) & (df3['DAKOTA'].str.contains('D')) & (df3['SPITFIRE'].str.contains('S', na=True)) & (df3['LANCASTER'] != 'L')]    

#display = df3[(df3['LOCATION'].str.contains('[a-zA-Z]')) & (df3['DATE'].str.contains('-6-')) & (df3['DAKOTA'].str.contains('D')) & (df3['SPITFIRE'].str.contains('S', na=True)) & (df3['LANCASTER'] != 'L')]    

#Use the above Line of Code when filtering DataFrame by Month

#Months = May Jun Jul Aug Sep              ('-6' or '-8'))
#Months = -5- -6- -7- -8- -9-

display = df3[(df3['LOCATION'].str.contains('[a-zA-Z]')) & (df3['DATE'].str.endswith('-6','-8')) & (df3['DAKOTA'].str.contains('D')) & (df3['SPITFIRE'].str.contains('S', na=True)) & (df3['LANCASTER'] != 'L')]    

#df3['DATE'].str.endswith('-6')) Use minus sign in front of the number when filtering the DataFrame by Days of Month starting with 0 i.e. 06 i.e. day 6

#print(display) 

#display['DATE']= pd.to_datetime(display['DATE'],format='%Y-%m-%d')
#display['DATE']= pd.to_datetime(display['DATE']).dt.strftime('%m-%d-%Y')
#display=display.sort_values(by=['DATE'])
##added two lines above to convert date format
#display['DATE']= pd.to_datetime(display['DATE']).dt.strftime('%d-%b-%Y')

display['DATE']= pd.to_datetime(display['DATE'],format='%Y-%m-%d')
display['DATE']= pd.to_datetime(display['DATE']).dt.strftime('%m-%d-%Y')

with suppress(Exception):
 display.drop([303], inplace=True) 
display.rename(index={6: '303', 20: '36'}, inplace=True)
#display.rename(index={6: '19'}, inplace=True)
display=display.rename_axis('MyIdx')
display=display.sort_values(by=['DATE','MyIdx'])
display['DATE']= pd.to_datetime(display['DATE']).dt.strftime('%d-%b-%Y')
display.drop_duplicates(subset=['LOCATION', 'DATE'], keep='last', inplace=True)

#display.drop('LANCASTER', axis=1, inplace=True)
#display.drop('Date', axis=1, inplace=True)
display=display.dropna(subset=['SPITFIRE', 'HURRICANE'], how='all')
display=display[['LOCATION','DATE','DAKOTA','HURRICANE','SPITFIRE']]
display=display.fillna('--')
display.reset_index(drop=True, inplace=True)
display.to_csv(r'C:\Users\Edward\Desktop\BBMF Schedules And Master Forum Thread Texts\BBMF-2002-Code (Dakota With Fighters).csv')

display

#print(display)
Eddie
Reply
#8
Hi buran,

I have solved the issues I was having :-

For filtering the Rows, by multiple matching days of the Month, from all Months :-

Instead of using this line of Code :-

(df3['DATE'].str.endswith('-6','-8'))
I now use the following :-

(df3['DATE'].str.contains('-6$|-8$'))
And for Filtering Rows, for the same Day of the Month from all Months.

I now use :-

(df3['DATE'].str.contains('-6$'))
'$' Is used to match the end of a String, you know that no doubt, I found a Stackoverflow Forum Post
that mentioned that.

And now I get the correct DataFrame Output ))

And for filtering by multiple Months I use :-

(df3['DATE'].str.contains('-6-|-8-'))
And that works too. That would show all Rows, from June and August for example.

Many thanks for your help.

Best Regards

Eddie Winch Smile
Reply
#9
Hi there,

I am using the following line of Code :-

display.sort_values(by=['DATE'])
In another code, it correctly outputs the earliest days in the DataFrame Output first i.e. 10 before 15, but not in the month order I want :-

Output:
LOCATION DATE DAKOTA HURRICANE SPITFIRE MyIdx 176 Duxford 10-Jul-2004 D H S 177 Cirencester 10-Jul-2004 D H S 178 Brize Norton 10-Jul-2004 D H S 74 Shrivenham 20:00 10-Jun-2004 D H S 257 Campbletown 15-Aug-2004 D -- S 258 Sunderland 15-Aug-2004 D -- S 261 Scampton 15-Aug-2004 D -- S 200 RIAT Fairford 15-Jul-2004 D -- SS 22 Tilford 15-May-2004 D -- S 23 Abingdon 15-May-2004 D -- S 24 Hyde Heath Village 15-May-2004 D -- S
I want 10th June 2004 first then the 10th of July/s then the 15th of May's then the 15th of August Rows. How do I modify that line of Code, so that I can filter to get that order, without changing the index position of the Rows via code, which I know how to do ?

I mean add something to that line of code, so that the Earlier month with a day, is shown 'favoured' before the later month with the same day ? i.e. 10-Jun-2004 is shown before 10-Jul-2004 , 15-May-2004 is shown before 15-Jul-2004 Rows etc. But still dates with day 10 , showing before day 15 Rows.

(df3['DATE'].str.contains('-10$|15$'))
Was how I filtered the days.

Update :-

My Code now ends :-

display['DATE'] = pd.to_datetime(display['DATE'])

display = display.sort_values(by='DATE', key=lambda col: 100 * col.dt.day + col.dt.month)

display['DATE']= pd.to_datetime(display['DATE']).dt.strftime('%d-%b-%Y')

display
I was getting an TypeError, as I was using an early version of Pandas, and unfortunately you cant use sort_values with a key Argument, in versions of Pandas below v1.1. So I upgraded to the latest version, and now I get the correct desired DataFrame Output, when using those added lines of Code, when I run my Full Code.

Regards

Eddie Winch Smile
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Query output in Tuple paulo79 2 1,958 Apr-07-2022, 12:54 PM
Last Post: paulo79
  Format SQL Query Output phillyfa 2 4,039 Apr-22-2020, 07:45 AM
Last Post: buran
  syntax for endswith a digit birdieman 2 5,632 Mar-04-2017, 07:21 PM
Last Post: birdieman

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020