Apr-16-2017, 03:15 AM
I have a CSV file containing a semi-structured dataset. A sample of the dataset is attached as a CSV file.
import numpy as np
import pandas as pd

# Sample of the raw data; every row has six values, one per column.
df = pd.DataFrame(
    [
        [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
        [np.nan, 'JOB NO : ', 'E1402CJ00001', np.nan, 'PROJECT HOTEL BLUE', np.nan],
        [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
        [np.nan, 'MRN', np.nan, np.nan, np.nan, np.nan],
        [np.nan, 'MREN1402/316', np.nan, '2014-02-28', 'EK00NEL047', 'PVC CASING 24 x 14'],
        [np.nan, 'MREN1402/316', np.nan, '2014-02-28', 'EK00NEL048', 'PVC CASING 40 x 16'],
        [np.nan, 'ISSUES', np.nan, np.nan, np.nan, np.nan],
        [np.nan, 'ISEN1402/340', np.nan, '2014-02-28', 'EK00NEL047', 'PVC CASING 24 x 14'],
        [np.nan, 'ISEN1402/340', np.nan, '2014-02-28', 'EK00NEL048', 'PVC CASING 40 x 16'],
    ],
    columns=['BlankColumn1', 'REFERENCE', 'BlankColumn2', 'DATE', 'ITEM CODE', 'ITEM NAME'],
)
I want to structure the dataset so that it can be used for further analysis.
Output:
JOB NO        JOB NAME            TYPE    REFERENCE     DATE      ITEM CODE   ITEM NAME
E1402CJ00001  PROJECT HOTEL BLUE  MRN     MREN1402/316  28-02-14  EK00NEL047  PVC CASING 24 x 14
E1402CJ00001  PROJECT HOTEL BLUE  MRN     MREN1402/316  28-02-14  EK00NEL048  PVC CASING 40 x 16
E1402CJ00001  PROJECT HOTEL BLUE  ISSUES  ISEN1402/340  28-02-14  EK00NEL047  PVC CASING 24 x 14
E1402CJ00001  PROJECT HOTEL BLUE  ISSUES  ISEN1402/340  28-02-14  EK00NEL048  PVC CASING 40 x 16
1. How can I copy the job no. and job name to each data row until the next job no. appears?
2. How can I copy the type (e.g. MRN, ISSUES) from the row that appears just before the data rows to each data row, until the next type row appears?
3. Then I want to delete the rows that do not contain data in the Reference, Date and Item Code columns.
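For reference, here is a minimal sketch of how these three steps might be done with pandas. It is only one possible approach and rests on assumptions taken from the sample above: a job row is any row whose REFERENCE starts with "JOB NO", with the job no. in BlankColumn2 and the job name in ITEM CODE; a type row is any other row that has a REFERENCE but no DATE and no ITEM CODE. The function name `structure` is made up for illustration, and the date reformatting at the end is an assumption based on the desired output.

```python
import numpy as np
import pandas as pd

# Sample frame from the post (first row padded to six values).
df = pd.DataFrame(
    [
        [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
        [np.nan, 'JOB NO : ', 'E1402CJ00001', np.nan, 'PROJECT HOTEL BLUE', np.nan],
        [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
        [np.nan, 'MRN', np.nan, np.nan, np.nan, np.nan],
        [np.nan, 'MREN1402/316', np.nan, '2014-02-28', 'EK00NEL047', 'PVC CASING 24 x 14'],
        [np.nan, 'MREN1402/316', np.nan, '2014-02-28', 'EK00NEL048', 'PVC CASING 40 x 16'],
        [np.nan, 'ISSUES', np.nan, np.nan, np.nan, np.nan],
        [np.nan, 'ISEN1402/340', np.nan, '2014-02-28', 'EK00NEL047', 'PVC CASING 24 x 14'],
        [np.nan, 'ISEN1402/340', np.nan, '2014-02-28', 'EK00NEL048', 'PVC CASING 40 x 16'],
    ],
    columns=['BlankColumn1', 'REFERENCE', 'BlankColumn2', 'DATE', 'ITEM CODE', 'ITEM NAME'],
)


def structure(raw):
    """Hypothetical helper: flatten job/type header rows into columns."""
    out = raw.copy()
    ref = out['REFERENCE'].str.strip()

    # Step 1: mark job rows, then carry job no. and job name downwards.
    is_job = ref.str.startswith('JOB NO', na=False)
    out['JOB NO'] = out['BlankColumn2'].where(is_job).ffill()
    out['JOB NAME'] = out['ITEM CODE'].where(is_job).ffill()

    # Step 2: mark type rows (REFERENCE only, assumed) and carry the type downwards.
    is_type = ref.notna() & ~is_job & out['DATE'].isna() & out['ITEM CODE'].isna()
    out['TYPE'] = ref.where(is_type).ffill()

    # Step 3: keep only rows with data in REFERENCE, DATE and ITEM CODE.
    out = out.dropna(subset=['REFERENCE', 'DATE', 'ITEM CODE'])

    # Assumed from the desired output: dates shown as DD-MM-YY.
    out['DATE'] = pd.to_datetime(out['DATE']).dt.strftime('%d-%m-%y')

    cols = ['JOB NO', 'JOB NAME', 'TYPE', 'REFERENCE', 'DATE', 'ITEM CODE', 'ITEM NAME']
    return out[cols].reset_index(drop=True)


result = structure(df)
print(result)
```

Note that in this sketch the type is carried forward across job rows as well, so it relies on every job being followed by its own type row before any data rows.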
Thank you very much for making an effort to help on this.