doing data treatment on a file import-parsing a variable

EmBeck87 · Mar-30-2023, 03:07 PM

Hi Dean, I inserted the code into my existing code for reading the file and deleting empty lines. I altered it some because the file is an excel file, and am getting the following error:

TypeError: read_excel() got an unexpected keyword argument 'delimiter'

Here is the whole block of code, some names are changed for confidentiality. I also changed the column names after checking my data.

# Read the Directory file and delete the empty lines
project = "V2_L"
folder = "File exports"
filename = "DIRECTORY.xlsx"
filePath = project + "/" + folder + "/" + filename
print(filePath)

s3 = boto3.client('s3')
obj = s3.get_object(Bucket=bucket, Key=ennovfPath)
data = obj['Body'].read()
directory = pd.read_excel(io.BytesIO(data), delimiter="-",names=["SAMPLEID", "VIS_ISOLATE_NUMBER"])

directory.sort_values(by=["SAMPLEID", "VIS_ISOLATE_NUMBER"], inplace=True)

directory.columns = map(lambda x: str(x).upper(), directory.sort_values.columns)

directory=directory.sort_values.columns[directory['SAMPLEID'].isna()!=True]

directory['SAMPLEID']=directory['SAMPLEID'].astype(str)

#Create a variable SITEID based on the SUBJID (run)

directory['SITEID'] = directory['SUBJID'].str.split('_').str[0]

buran write Mar-30-2023, 04:04 PM:
Please, use proper tags when post code, traceback, output, etc.
See BBcode help for more info.

**deanhystad** · Mar-30-2023, 04:18 PM

s3 = boto3.client('s3')
obj = s3.get_object(Bucket=bucket, Key=ennovfPath)
data = obj['Body'].read()
directory = pd.read_excel(io.BytesIO(data), delimiter="-",names=["SAMPLEID", "VIS_ISOLATE_NUMBER"])

What is "boto3"? More importantly, what is data? Is it actually an excel file?

Assuming data is an excel file (or the binary data from an excel file), you need to use the different approach.

Read the excel file/data.
You do not specify a separator for excel files, but you might have to provide a sheet name.

directory = pd.read_excel(io.BytesIO(data), sheet_name="sheet1")

If you don't provide a sheet name, the first sheet is used. You can provide multiple sheet names in a list or tuple ("sheet1", "sheet3").

One of the columns in the spreadsheet contains the sampleID's ("5269-1" for example). You need to find the name of that column. Let's assume it is called "Samples".

Create "SAMPLE ID" and "ISOLATE NUMBER" columns by splitting the Samples columns.

directory[["SAMPLE ID", "ISOLATE NUMBER"]] = directory["Sample"].str.split('-', expand=True)

The new columns are both strings. You might want to convert the ISOLATE NUMBER column to ints.

directory = directory.astype({"ISOLATE NUMBER": int})

And you might want to sort the dataframe to group all the SAMPLE ID's together.

df = df.sort_values((["SAMPLE ID", "ISOLATE NUMBER"])

EmBeck87 · Mar-30-2023, 04:50 PM

Let me clarify- this code has worked perfectly before adding the code for parsing. Files have been imported already earlier in the code and specified, that all works just fine. I do not need to repeat this step.
boto3.client is the official AWS SDK for pYTHON USED TO CREAT CONFIGURE AND MANAGER AWS SERVICES.

EmBeck87 · Mar-30-2023, 05:25 PM

Is there a way to do this so that two new columns are not made, just the one column, VIS_ISOLATE_NUMBERS?

**deanhystad** · (This post was last modified: Mar-31-2023, 05:14 PM by deanhystad.)

Sure. You could use apply() which gives you more freedom in what you can do.

dictionary["Number"] = dictionary["ID"].apply(lambda x: x.split("-")[1])

I think it is better to stay with the current solution and either ignore the extra column or drop it.

dictionary.drop(axis=1, labels=["SAMPLE ID"])

or

new_dictionary = dictionary[["names", "of", "columns", "i", "want", "to", "keep"]]

EmBeck87 · Apr-17-2023, 06:54 PM

data..........................

Possibly Related Threads…
Thread		Author	Replies	Views	Last Post
	Help with writing monitored data to mysql upon change of one particular variable	donottrackmymetadata	3	305	Apr-18-2024, 09:55 PM Last Post: deanhystad
	Plot a pandas data fram via pyqtgraph with an modul import and qt designer widget	Nietzsche	0	858	May-29-2023, 02:42 PM Last Post: Nietzsche
	Python 3.11 data import question	love0715	2	819	Mar-05-2023, 06:50 PM Last Post: snippsat
	Import XML file directly into Excel spreadsheet	demdej	0	860	Jan-24-2023, 02:48 PM Last Post: demdej
	Need help on how to include single quotes on data of variable string	hani_hms	5	2,054	Jan-10-2023, 11:26 AM Last Post: codinglearner
	USE string data as a variable NAME	rokorps	1	969	Sep-30-2022, 01:08 PM Last Post: deanhystad
	Can't import csv data	JonWayn	4	1,407	Sep-18-2022, 02:07 AM Last Post: JonWayn
	How can I import a variable from another script without executing it	ThomasFab	12	7,839	May-06-2022, 03:21 PM Last Post: bowlofred
	json api data parsing	elvis	0	936	Apr-21-2022, 11:59 PM Last Post: elvis
	Modify values in XML file by data from text file (without parsing)	Paqqno	2	1,696	Apr-13-2022, 06:02 AM Last Post: Paqqno

doing data treatment on a file import-parsing a variable

User Panel Messages

Announcements