Python Forum
doing data treatment on a file import-parsing a variable
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
doing data treatment on a file import-parsing a variable
#11
Hi Dean, I inserted the code into my existing code for reading the file and deleting empty lines. I altered it some because the file is an excel file, and am getting the following error:

TypeError: read_excel() got an unexpected keyword argument 'delimiter'

Here is the whole block of code, some names are changed for confidentiality. I also changed the column names after checking my data.

# Read the Directory file and delete the empty lines
project = "V2_L"
folder = "File exports"
filename = "DIRECTORY.xlsx"
filePath = project + "/" + folder + "/" + filename
print(filePath)

s3 = boto3.client('s3')
obj = s3.get_object(Bucket=bucket, Key=ennovfPath)
data = obj['Body'].read()
directory = pd.read_excel(io.BytesIO(data), delimiter="-",names=["SAMPLEID", "VIS_ISOLATE_NUMBER"])

directory.sort_values(by=["SAMPLEID", "VIS_ISOLATE_NUMBER"], inplace=True)

directory.columns = map(lambda x: str(x).upper(), directory.sort_values.columns)

directory=directory.sort_values.columns[directory['SAMPLEID'].isna()!=True]

directory['SAMPLEID']=directory['SAMPLEID'].astype(str)

#Create a variable SITEID based on the SUBJID (run)

directory['SITEID'] = directory['SUBJID'].str.split('_').str[0]
buran write Mar-30-2023, 04:04 PM:
Please, use proper tags when post code, traceback, output, etc.
See BBcode help for more info.
Reply
#12
s3 = boto3.client('s3')
obj = s3.get_object(Bucket=bucket, Key=ennovfPath)
data = obj['Body'].read()
directory = pd.read_excel(io.BytesIO(data), delimiter="-",names=["SAMPLEID", "VIS_ISOLATE_NUMBER"])
What is "boto3"? More importantly, what is data? Is it actually an excel file?

Assuming data is an excel file (or the binary data from an excel file), you need to use the different approach.

Read the excel file/data.
You do not specify a separator for excel files, but you might have to provide a sheet name.
directory = pd.read_excel(io.BytesIO(data), sheet_name="sheet1")
If you don't provide a sheet name, the first sheet is used. You can provide multiple sheet names in a list or tuple ("sheet1", "sheet3").

One of the columns in the spreadsheet contains the sampleID's ("5269-1" for example). You need to find the name of that column. Let's assume it is called "Samples".

Create "SAMPLE ID" and "ISOLATE NUMBER" columns by splitting the Samples columns.
directory[["SAMPLE ID", "ISOLATE NUMBER"]] = directory["Sample"].str.split('-', expand=True)
The new columns are both strings. You might want to convert the ISOLATE NUMBER column to ints.
directory = directory.astype({"ISOLATE NUMBER": int})
And you might want to sort the dataframe to group all the SAMPLE ID's together.
df = df.sort_values((["SAMPLE ID", "ISOLATE NUMBER"])
Reply
#13
Let me clarify- this code has worked perfectly before adding the code for parsing. Files have been imported already earlier in the code and specified, that all works just fine. I do not need to repeat this step.
boto3.client is the official AWS SDK for pYTHON USED TO CREAT CONFIGURE AND MANAGER AWS SERVICES.
Reply
#14
Is there a way to do this so that two new columns are not made, just the one column, VIS_ISOLATE_NUMBERS?
Reply
#15
Sure. You could use apply() which gives you more freedom in what you can do.
dictionary["Number"] = dictionary["ID"].apply(lambda x: x.split("-")[1])
I think it is better to stay with the current solution and either ignore the extra column or drop it.
dictionary.drop(axis=1, labels=["SAMPLE ID"])
or
new_dictionary = dictionary[["names", "of", "columns", "i", "want", "to", "keep"]]
Reply
#16
data..........................
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Help with writing monitored data to mysql upon change of one particular variable donottrackmymetadata 3 307 Apr-18-2024, 09:55 PM
Last Post: deanhystad
  Plot a pandas data fram via pyqtgraph with an modul import and qt designer widget Nietzsche 0 859 May-29-2023, 02:42 PM
Last Post: Nietzsche
  Python 3.11 data import question love0715 2 820 Mar-05-2023, 06:50 PM
Last Post: snippsat
  Import XML file directly into Excel spreadsheet demdej 0 862 Jan-24-2023, 02:48 PM
Last Post: demdej
  Need help on how to include single quotes on data of variable string hani_hms 5 2,059 Jan-10-2023, 11:26 AM
Last Post: codinglearner
  USE string data as a variable NAME rokorps 1 970 Sep-30-2022, 01:08 PM
Last Post: deanhystad
  Can't import csv data JonWayn 4 1,412 Sep-18-2022, 02:07 AM
Last Post: JonWayn
Question How can I import a variable from another script without executing it ThomasFab 12 7,844 May-06-2022, 03:21 PM
Last Post: bowlofred
  json api data parsing elvis 0 937 Apr-21-2022, 11:59 PM
Last Post: elvis
  Modify values in XML file by data from text file (without parsing) Paqqno 2 1,704 Apr-13-2022, 06:02 AM
Last Post: Paqqno

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020