Python Forum
How to extract different data groups from multiple CSV files using python
#1
Stock Additions and Returns
Business Day Outlet Product Currency Amount Type

Seller: RSS
Date: 22/05/2019

Closing Balances
Business Day Outlet Product Currency Amount Type
22/05/2019 9526 FX AED 1665 Close
22/05/2019 9526 FX AUD 480 Close
22/05/2019 9526 FX CNY 4220 Close
22/05/2019 9526 FX CZK 16500 Close
22/05/2019 9526 FX EUR 8986 Close
22/05/2019 9526 FX HRK 4210 Close
22/05/2019 9526 FX HUF 10000 Close
22/05/2019 9526 FX IDR 100000 Close
22/05/2019 9526 FX JPY 5000 Close
22/05/2019 9526 FX PLN 980 Close
22/05/2019 9526 FX TRY 5810 Close

Customer Sales and Purchases
Business Day Outlet Product Currency Amount Type
22/05/2019 9526 FX HRK 1600 Sell
22/05/2019 9526 FX USD 305 Sell
22/05/2019 9565 FX EUR 110 Sell
22/05/2019 9565 FX EUR 100 Buy
22/05/2019 9616 FX BGN 840 Sell
22/05/2019 9616 FX EUR 440 Sell
22/05/2019 9616 FX NOK 600 Sell
22/05/2019 9616 FX USD 147 Sell
22/05/2019 9646 FX EUR 110 Sell
22/05/2019 9646 FX NOK 2150 Sell
22/05/2019 9690 FX EUR 330 Sell
22/05/2019 9691 FX CHF 250 Sell


The CSV excerpt above is what I want to extract and build into a pandas DataFrame. The main problem is that there are hundreds of these CSV files (one saved per date), and the row numbers of the 'Closing Balances' and 'Customer Sales and Purchases' headings differ from file to file. I have been struggling to work out how to code this task and couldn't find anything that matched. The other problem is that some sections carry the same headings as the latter but contain no values, and these need to be ignored. Is there a way to solve this?

Your help with this is very much appreciated; this course has helped me tremendously with my current task.

Thanks again!

Rafi
#2
Is there anything in the title of the file that tells you what sort of data it is, so you could use that to get the row for the headers? If not, I would load the first few rows of each file, determine where the column headers are, and then read the whole file with pandas.read_csv with the appropriate header parameter.
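A minimal sketch of that second idea, for illustration only. It assumes the real files are comma-separated (the excerpt in post #1 is shown with spaces) and that a blank line ends each block of data rows. `SAMPLE`, `locate_section` and `read_section` are hypothetical names, not anything from the original files. It uses `skiprows` rather than `header`, because `header` ignores blank lines while `skiprows` counts every physical line.

```python
import io

import pandas as pd

# Hypothetical comma-separated stand-in for one daily file.
SAMPLE = """Stock Additions and Returns
Business Day,Outlet,Product,Currency,Amount,Type

Seller: RSS
Date: 22/05/2019

Closing Balances
Business Day,Outlet,Product,Currency,Amount,Type
22/05/2019,9526,FX,AED,1665,Close
22/05/2019,9526,FX,AUD,480,Close

Customer Sales and Purchases
Business Day,Outlet,Product,Currency,Amount,Type
22/05/2019,9526,FX,HRK,1600,Sell
22/05/2019,9526,FX,USD,305,Sell
22/05/2019,9565,FX,EUR,100,Buy
"""


def locate_section(lines, title):
    """Return (header_line_number, data_row_count) for a titled section.

    Sections whose header is followed by no data rows are skipped,
    so None is returned if the title only appears with empty sections.
    """
    for i, line in enumerate(lines):
        if line.strip() == title and i + 1 < len(lines):
            count = 0
            # Data rows start two lines below the title (title, header, data).
            for row in lines[i + 2:]:
                if not row.strip():
                    break  # assumed: a blank line ends the section
                count += 1
            if count:
                return i + 1, count
    return None


def read_section(text, title):
    """Read one section into a DataFrame; empty DataFrame if it has no rows."""
    found = locate_section(text.splitlines(), title)
    if found is None:
        return pd.DataFrame()
    header_line, count = found
    # Skip everything above the header line, then read only this section's rows.
    return pd.read_csv(io.StringIO(text), skiprows=header_line, nrows=count)


closing = read_section(SAMPLE, "Closing Balances")
print(closing)
```

For a file on disk, the text could be read first (for example with `pathlib.Path(path).read_text()`) and passed to `read_section` in the same way.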
Craig "Ichabod" O'Brien - xenomind.com
#3
The column headers are Business Day, Outlet, Product, Currency, Amount and Type, in that order. The key challenge is to read each row and select the rows that belong under those column headers. Two data frames should be built from this CSV file: Closing Balances and Customer Sales and Purchases.
Thanks!
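One possible way to get those two data frames, and to repeat it over many files, is sketched below. It is only an illustration under stated assumptions: the files are comma-separated, every section is a title line followed directly by the header line, and a blank line ends the data rows. `HEADER`, `DEMO`, `split_sections` and `load_folder` are hypothetical names, and the folder layout is assumed rather than taken from the thread.

```python
import io
from pathlib import Path

import pandas as pd

# Assumed: the real files are comma-separated with exactly this header line.
HEADER = "Business Day,Outlet,Product,Currency,Amount,Type"

# Hypothetical stand-in for the contents of one daily file.
DEMO = """Stock Additions and Returns
Business Day,Outlet,Product,Currency,Amount,Type

Seller: RSS
Date: 22/05/2019

Closing Balances
Business Day,Outlet,Product,Currency,Amount,Type
22/05/2019,9526,FX,AED,1665,Close
22/05/2019,9526,FX,AUD,480,Close

Customer Sales and Purchases
Business Day,Outlet,Product,Currency,Amount,Type
22/05/2019,9526,FX,HRK,1600,Sell
22/05/2019,9565,FX,EUR,100,Buy
"""


def split_sections(text):
    """Map each section title to a DataFrame, dropping sections with no rows."""
    lines = text.splitlines()
    sections = {}
    for i, line in enumerate(lines):
        if line.strip() != HEADER or i == 0:
            continue
        title = lines[i - 1].strip()  # the title sits directly above the header
        rows = []
        for row in lines[i + 1:]:
            if not row.strip():
                break  # assumed: a blank line ends the section
            rows.append(row)
        if rows:  # headings with no values underneath are ignored
            block = "\n".join([HEADER, *rows])
            sections[title] = pd.read_csv(io.StringIO(block))
    return sections


def load_folder(folder, pattern="*.csv"):
    """Stack same-titled sections from every matching file in a folder."""
    collected = {}
    for path in sorted(Path(folder).glob(pattern)):
        for title, frame in split_sections(path.read_text()).items():
            collected.setdefault(title, []).append(frame)
    return {
        title: pd.concat(frames, ignore_index=True)
        for title, frames in collected.items()
    }


frames = split_sections(DEMO)
print(frames["Closing Balances"])
print(frames["Customer Sales and Purchases"])
```

Because the sections are found by their text rather than by row number, it should not matter that the headings sit on different rows in each file. `load_folder("some_folder")` would then return one combined data frame per section title.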
#4
What code do you have so far?