Obtaining Correct Date In Pandas DataFrame

eddywinch82 · (This post was last modified: Jan-28-2020, 07:56 PM by eddywinch82.)

Thank you so much, Sandeep,

You sorted the problem out for me really well.

I very much appreciate your help.

Could you read post 21 in the following thread of mine and respond accordingly?

It is at the following link:

https://python-forum.io/Thread-Filtering...ues?page=3

Best Regards

Eddie Winch

eddywinch82 · (This post was last modified: Jan-31-2020, 07:02 PM by eddywinch82.)

I have modified the code on this thread for a BBMF year 2005 display schedule, which is broken down into separate URLs, one for each month. I am trying to get a DataFrame output for the whole year.

Here is the modified code:

import pandas as pd
import requests
from bs4 import BeautifulSoup
 
res = requests.get("http://web.archive.org/web/20050726230748/http://www.raf.mod.uk/bbmf/may05.html")
res = requests.get("http://web.archive.org/web/20050726230748/http://www.raf.mod.uk/bbmf/june05.html")
res = requests.get("http://web.archive.org/web/20050726230748/http://www.raf.mod.uk/bbmf/july05.html")
res = requests.get("http://web.archive.org/web/20050726230748/http://www.raf.mod.uk/bbmf/august05.html")
res = requests.get("http://web.archive.org/web/20050726230748/http://www.raf.mod.uk/bbmf/september05.html")
 
soup = BeautifulSoup(res.content,'lxml')
table = soup.find_all('table')[0]
df = pd.read_html(str(table))
 
df = df[0]
 
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)
  
  
#make df[0] to list
list=[]
for i in df[0]:
    list.append(i)
   
#reverse the list to make split to sublist easier
list.reverse()
   
#split list to sublist using condition len(val)> 2 
size = len(list) 
idx_list = [idx + 1 for idx, val in
            enumerate(list) if len(val) > 2] 
res = [list[i: j] for i, j in
        zip([0] + idx_list, idx_list + 
        ([size] if idx_list[-1] != size else []))] 
   
#make monthname to numbers and print
for i in res:
    for j in range(len(i)):
        if i[j].upper()=='JUNE':
            i[j]='6'
        elif i[j].upper() =='MAY':
            i[j]='5'
        elif i[j].upper() == 'APRIL':
            i[j]='4'
        elif i[j].upper() =='JANUARY':
            i[j]='1'
        elif i[j].upper() == 'FEBRUARY':
            i[j]='2'
        elif i[j].upper() =='MARCH':
            i[j]='3'
        elif i[j].upper() == 'JULY':
            i[j]='7'        
        elif i[j].upper() =='AUGUST':
            i[j]='8'
        elif i[j].upper() == 'SEPTEMBER':
            i[j]='9'
        elif i[j].upper() =='OCTOBER':
            i[j]='10'
        elif i[j].upper() == 'NOVEMBER':
            i[j]='11'
        elif i[j].upper() =='DECEMBER':
            i[j]='12'       
   
   
#append string and append to new list
finallist=[]
for i in res:
    for j in range(len(i)):
        if j < len(i) - 1:
            #print(f'2005-{i[-1]}-{i[j]}')
            finallist.append(f'2005-{i[-1]}-{i[j]}')
#print(finallist)
finallist.reverse()
   
#print("\n=== ORIGINAL DF ===\n")
#print(df)
   
#convert dataframe to list
listtemp1=df.values.tolist()
   
#replace found below values with 0000_removable
removelist=['LOCATION','LANCASTER','SPITFIRE','HURRICANE','DAKOTA','DATE','JUNE','JANUARY','FEBRUARY','MARCH','MAY','JULY','AUGUST','SEPTEMBER','OCTOBER','NOVEMBER','DECEMBER','APRIL']
for i in listtemp1:
    for j in range(len(i)):
        for place in removelist:
            if str(i[j]).upper()==place:
                i[j]='0000_removable'
            else:
                pass
   
                   
#remove sublists with the replaced values we redirected
dellist=['0000_removable', '0000_removable', '0000_removable', '0000_removable', '0000_removable', '0000_removable']
res = [i for i in listtemp1 if i != dellist]
   
#assign back to dataframe DF3
df3=pd.DataFrame()
df3=pd.DataFrame(res, columns=['Date','LOCATION','LANCASTER','SPITFIRE','HURRICANE','DAKOTA'])
#print("\n=== AFTER REMOVE month and column names from DF, assigned to new as DF3 ===\n")
#print(df3)
   
   
#now assign that sorted date list to dataframe DF3
idx = 0
df3.insert(loc=idx, column='DATE', value=finallist)
pd.options.display.max_rows = 500
 
df["DATE"].fillna(method='ffill', inplace = True)
 
display = df3[(df3['Location'].str.contains('- Display')) & (df3['Dakota'].str.contains('D')) & (df3['Spitfire'].str.contains('S', na=True)) & (df3['Lancaster'] != 'L')]  
display
 
display['DATE']= pd.to_datetime(display['DATE'],format='%Y-%m-%d')
display['DATE']= pd.to_datetime(display['DATE']).dt.strftime('%d-%m-%Y')
##added two lines above to convert date format
 
display.drop('Lancaster', axis=1, inplace=True)
display.dropna(subset=['Spitfire', 'Hurricane'], how='all')
 
#df[(df['Location'].str.contains('- Display'))
 
#df[(df['Dakota'].str.contains('D'))
 
#(df['Dakota'].str.contains('D'))
 
#(df['Spitfire'] == 'SSS')

I am trying to get a DataFrame output for the whole year 2005 from all the URL links in the code.
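One thing worth noting about that goal: every `requests.get(...)` call in the code above rebinds the same name `res`, so only the last page (September) is ever parsed. A minimal sketch of one way to keep every month, assuming a hypothetical `load_table` helper (not part of the original code) that turns one URL into one DataFrame, with the results stacked by `pd.concat`:

```python
import pandas as pd


def combine_monthly_tables(urls, load_table):
    """Load one DataFrame per URL and stack them into a single DataFrame.

    `load_table` is a hypothetical callable (an assumption, not from the
    original code) that takes a URL and returns that month's table.
    """
    frames = []
    for url in urls:
        # Store each result in a list instead of rebinding one variable,
        # so no month is overwritten by the next request.
        frames.append(load_table(url))
    return pd.concat(frames, ignore_index=True)


# Stand-in loader so the sketch runs offline. A real loader might wrap
# requests.get(url) followed by pd.read_html(...)[0].
def fake_load_table(url):
    return pd.DataFrame({"source": [url], "page": [url.split("/")[-1]]})


# Shortened placeholder paths, used only for illustration.
urls = ["bbmf/may05.html", "bbmf/june05.html", "bbmf/july05.html"]
year_df = combine_monthly_tables(urls, fake_load_table)
print(year_df)
```

The cleaning steps would then run either once per month inside the loader or once on the combined frame; which is better depends on whether every month's page has the same layout.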

But I get the following traceback error when I run the code in Jupyter Notebook:

Error:
TypeError                                 Traceback (most recent call last)
<ipython-input-1-ae00b7540e28> in <module>
     31 size = len(list)
     32 idx_list = [idx + 1 for idx, val in
---> 33             enumerate(list) if len(val) > 2] 
     34 res = [list[i: j] for i, j in
     35         zip([0] + idx_list, idx_list + 

<ipython-input-1-ae00b7540e28> in <listcomp>(.0)
     31 size = len(list)
     32 idx_list = [idx + 1 for idx, val in
---> 33             enumerate(list) if len(val) > 2] 
     34 res = [list[i: j] for i, j in
     35         zip([0] + idx_list, idx_list + 

TypeError: object of type 'float' has no len()

I can't work out what is causing the error. Any ideas?

Any help would be appreciated.

Best Regards

Eddie Winch
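A possible cause, offered as a hedged reading of the traceback rather than a confirmed diagnosis: `pd.read_html` represents empty table cells as `NaN`, which is a `float`, and calling `len()` on a float raises exactly this `TypeError`. If the first column of the September table contains an empty cell, the list comprehension on line 33 would fail this way. The sketch below uses a made-up column (the values are placeholders, not data from the real page) to reproduce the message and show one way to skip missing cells before measuring lengths:

```python
import math

# Stand-in for the first table column; float("nan") marks an empty cell.
column = ["MAY", "1", "2", float("nan"), "JUNE", "4"]

# Reproduce the failure: len() is not defined for floats such as NaN.
try:
    [idx for idx, val in enumerate(column) if len(val) > 2]
except TypeError as exc:
    error_message = str(exc)
    print(error_message)


def is_missing(value):
    """Return True for NaN floats, the marker used here for empty cells."""
    return isinstance(value, float) and math.isnan(value)


# Drop missing cells and force everything else to str before using len().
cleaned = [str(val) for val in column if not is_missing(val)]
month_positions = [idx for idx, val in enumerate(cleaned) if len(val) > 2]
print(cleaned, month_positions)
```

In pandas itself, `df[0].dropna()` is one way to discard such cells, though dropping them changes how many dates are generated, so the later `df3.insert(...)` step would need the lengths to match.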

eddywinch82 · Feb-07-2020, 04:45 PM

Can anyone help me?

I would really appreciate someone's help.

Regards

Eddie Winch

eddywinch82 · (This post was last modified: Feb-15-2020, 06:29 PM by eddywinch82.)

I have looked on the Internet for similar threads on forums, but I can't find a solution to the issue I am having.

Could someone help me out here, if that is okay?

Regards

Eddie Winch

eddywinch82 · (This post was last modified: Feb-17-2020, 11:45 AM by eddywinch82.)

Hi bitasiavi,

I am sorry, I didn't get your PM. Could you send it again?

Also, I have sent you a PM. Could you look at it and get back to me?

Regards

Eddie Winch
