Python Forum
CVS file to EXCEL
#1
I have a CVS file, published by the author of a scientific paper, with 700,000 rows. I want to split this file into smaller "units", preferably putting the results into Excel.

The limits of the "units" are set by the contents of column 9 (of 10). Column 9 keeps the same value for roughly 1,000 rows, and then the content of (row, 9) changes.

I understand that pandas will do this and I understand the general way to go.

BUT I got stuck on the detail: I just cannot figure out how to run down (row, 9) until the value changes from, say, E456 to BV789.

Please point me to a good descriptive reference because I haven't, so far, been able to find one.
#2
This looks like a good start, without the pomp: https://towardsdatascience.com/quick-div...1c1a80d9c4
#3
I think I can manage things now that I have read this. Thank you.
#4
I knew those CVS receipts were getting out of hand, but 700,000 rows? ;)
Craig "Ichabod" O'Brien - xenomind.com
I wish you happiness.
Recommended Tutorials: BBCode, functions, classes, text adventures
#5
I still make that mistake from time to time.
#6
To be honest, so do I.
Craig "Ichabod" O'Brien - xenomind.com
I wish you happiness.
Recommended Tutorials: BBCode, functions, classes, text adventures
#7
Do you want to group your data by the values in a certain column and write the content of each group to a file? If so, your code might look like this:

import pandas as pd

data = pd.read_csv('your big file.csv')

# iloc[:, 9] is the tenth column (pandas column positions are zero-based);
# groupby yields one sub-frame per distinct value in that column.
for gr, d in data.groupby(data.iloc[:, 9]):
    d.to_csv('output_%s.csv' % gr)
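A quick check on a toy frame shows what the grouping produces (the column names here are invented for illustration; "flight" plays the role of column 9):

```python
import pandas as pd

# Toy frame standing in for the big file.
data = pd.DataFrame({
    "lat": [1, 2, 3, 4],
    "flight": ["E456", "E456", "BV789", "BV789"],
})

# groupby yields one (value, sub-frame) pair per distinct value.
groups = {gr: d for gr, d in data.groupby("flight")}
print(sorted(groups))       # ['BV789', 'E456']
print(len(groups["E456"]))  # 2 rows in that unit
```

Each sub-frame keeps all ten columns and the original row order, so writing one CSV per group splits the file exactly along the "unit" boundaries.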
#8
"for gr, d in data.groupby(data.iloc[:, 9]):" - Please can you recommend a tutorial about the instructions for use with pandas ?

707,000 data rows. FYI these were the GPS locations of flights by a bird called the Manx Shearwater, gathered over a number of years by a group of people researching how the birds navigate. 700+ flight paths, with GPS readings every five minutes. (!!)
#9
(Oct-24-2019, 10:59 AM)DavidTheGrockle Wrote: "for gr, d in data.groupby(data.iloc[:, 9]):" - Please can you recommend a tutorial about the instructions for use with pandas ?
The official documentation will be enough, I think. Pandas can also read your data in chunks (though I've never used groupby with chunked data). Say you have 1 KB of data per row (a reasonable assumption), so the file is about 700 MB. It should not be a problem to process it in one go if your computer has at least 8 GB of memory.
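For reference, chunked reading looks like this — a minimal sketch using an in-memory CSV with invented column names in place of the real file:

```python
import io
import pandas as pd

# Tiny stand-in for the 700,000-row file (invented column names).
csv_text = "lat,lon,flight\n1,2,E456\n3,4,E456\n5,6,BV789\n7,8,BV789\n"

# chunksize makes read_csv yield DataFrames of at most that many rows,
# so the whole file never has to sit in memory at once.
pieces = [chunk for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2)]

print(len(pieces))                # 2 chunks of 2 rows each
data = pd.concat(pieces, ignore_index=True)
print(data["flight"].nunique())   # 2 distinct flight ids
```

Concatenating the chunks back, as above, defeats the purpose; in practice you would process each chunk as it arrives, and (as noted) grouping across chunk boundaries needs extra care, since one group can span two chunks.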
#10
I cannot get my CVS data into a variable. What I have is
import pandas as pd
print("Hello, World!")
# Read the data into a variable pacto
url = "F:\Carrier Bag F\NAV Padget\Padget.csv"
pacto = pd.read_csv(url)
It seems to be unable to find the file.
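A likely culprit is the path string itself: in an ordinary Python string literal, backslashes start escape sequences, and `\N` in particular raises a syntax error in Python 3. A raw string (`r"..."`) keeps every backslash literal, as this small check shows:

```python
# "\N" in an ordinary string literal is a malformed escape (SyntaxError
# in Python 3), so the path must either double the backslashes or use a
# raw string. Both spellings below produce the same Windows path.
plain = "F:\\Carrier Bag F\\NAV Padget\\Padget.csv"   # doubled backslashes
raw = r"F:\Carrier Bag F\NAV Padget\Padget.csv"       # raw string
print(plain == raw)  # True
# Forward slashes also work on Windows: "F:/Carrier Bag F/NAV Padget/Padget.csv"
```

Any of these forms can then be passed to `pd.read_csv`.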