Mar-30-2023, 03:07 PM
Hi Dean, I inserted the code into my existing code for reading the file and deleting empty lines. I altered it somewhat because the file is an Excel file, and I am now getting the following error:
TypeError: read_excel() got an unexpected keyword argument 'delimiter'
Here is the whole block of code; some names are changed for confidentiality. I also changed the column names after checking my data.
# Read the Directory file and delete the empty lines
project = "V2_L"
folder = "File exports"
filename = "DIRECTORY.xlsx"
filePath = project + "/" + folder + "/" + filename
print(filePath)
s3 = boto3.client('s3')
obj = s3.get_object(Bucket=bucket, Key=ennovfPath)
data = obj['Body'].read()
directory = pd.read_excel(io.BytesIO(data), delimiter="-",names=["SAMPLEID", "VIS_ISOLATE_NUMBER"])
directory.sort_values(by=["SAMPLEID", "VIS_ISOLATE_NUMBER"], inplace=True)
directory.columns = map(lambda x: str(x).upper(), directory.sort_values.columns)
directory=directory.sort_values.columns[directory['SAMPLEID'].isna()!=True]
directory['SAMPLEID']=directory['SAMPLEID'].astype(str)
#Create a variable SITEID based on the SUBJID (run)
directory['SITEID'] = directory['SUBJID'].str.split('_').str[0]
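For reference, here is a minimal sketch of how the read and cleanup steps might look without the unsupported argument. `pd.read_excel` has no `delimiter` (or `sep`) parameter, because a worksheet is already split into cells, which is what the TypeError is reporting. The helper names `read_directory` and `clean_directory` are hypothetical, and the sketch assumes the sheet has a header row and exactly the two columns named above; it is not tested against the real file.

```python
import io

import pandas as pd


def read_directory(data: bytes) -> pd.DataFrame:
    """Parse raw .xlsx bytes (e.g. an S3 object body) into a DataFrame."""
    # No delimiter argument here: Excel cells are already separate columns.
    # Assumption: row 0 is a header that `names` replaces, and the sheet
    # holds exactly these two columns.
    return pd.read_excel(
        io.BytesIO(data),
        header=0,
        names=["SAMPLEID", "VIS_ISOLATE_NUMBER"],
    )


def clean_directory(directory: pd.DataFrame) -> pd.DataFrame:
    """Upper-case column names, drop rows without a SAMPLEID, then sort."""
    cleaned = directory.rename(columns=lambda c: str(c).upper())
    # Keep only rows that have a SAMPLEID (this removes the empty lines).
    cleaned = cleaned[cleaned["SAMPLEID"].notna()].copy()
    cleaned["SAMPLEID"] = cleaned["SAMPLEID"].astype(str)
    return cleaned.sort_values(
        by=["SAMPLEID", "VIS_ISOLATE_NUMBER"]
    ).reset_index(drop=True)
```

With the bytes already read from S3, this would be called as `directory = clean_directory(read_directory(data))`. If the two values actually share one cell separated by "-", they would have to be split after reading instead, for example with `.str.split("-", expand=True)` on that column.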