Bottom Page

Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
 How to extract different data groups from multiple CSV files using python
#1
Stock Additions and Returns
Business Day Outlet Product Currency Amount Type

Seller: RSS
Date: 22/05/2019

Closing Balances
Business Day Outlet Product Currency Amount Type
22/05/2019 9526 FX AED 1665 Close
22/05/2019 9526 FX AUD 480 Close
22/05/2019 9526 FX CNY 4220 Close
22/05/2019 9526 FX CZK 16500 Close
22/05/2019 9526 FX EUR 8986 Close
22/05/2019 9526 FX HRK 4210 Close
22/05/2019 9526 FX HUF 10000 Close
22/05/2019 9526 FX IDR 100000 Close
22/05/2019 9526 FX JPY 5000 Close
22/05/2019 9526 FX PLN 980 Close
22/05/2019 9526 FX TRY 5810 Close[/b]

Customer Sales and Purchases
Business Day Outlet Product Currency Amount Type
22/05/2019 9526 FX HRK 1600 Sell
22/05/2019 9526 FX USD 305 Sell
22/05/2019 9565 FX EUR 110 Sell
22/05/2019 9565 FX EUR 100 Buy
22/05/2019 9616 FX BGN 840 Sell
22/05/2019 9616 FX EUR 440 Sell
22/05/2019 9616 FX NOK 600 Sell
22/05/2019 9616 FX USD 147 Sell
22/05/2019 9646 FX EUR 110 Sell
22/05/2019 9646 FX NOK 2150 Sell
22/05/2019 9690 FX EUR 330 Sell
22/05/2019 9691 FX CHF 250 Sell


The csv file excerpt that I want to extract and build into a panda data frame looks like the one above. The main problem is there is hundred of this csv files ( saved as different dates)of which their row numbers for the headings of 'Closing Balances' and 'Customer Sales and Purchases' names are not identical between each csv files. I have been struggling to find the answers on how to code this task but unfortunately couldn't find anything that matched. The other problem is that there is the same headings of the latter contain no values which need to be ignored. Is there any solution to solve this problem?

Your help in this is very much appreciated and this course has helped me tremendously going through my current task.

Thanks again!

Rafi
Quote
#2
Is there anything in the title of the file that tells you what sort of data it is, so you could use that to get the row for the headers? If not, I would load the first few rows of each file, determine where the column headers are, and then read the whole file with pandas.read_csv with the appropriate header parameter.
Craig "Ichabod" O'Brien - xenomind.com
I wish you happiness.
Recommended Tutorials: BBCode, functions, classes, text adventures

Quote
#3
The column headers are Business Day, Outlet, Product, Currency, Amount, Type respectively. The key challenge is to read each row and select the rows that are relevant to the column headers. There are 2 data frame should be made from this csv file: Closing Balances and Customer sales and Purchases.
Thanks!
Quote
#4
What code do you have so far?
Quote

Top Page

Possibly Related Threads...
Thread Author Replies Views Last Post
  how to extract financial data from photocopy of document angela1 6 211 Feb-15-2020, 05:50 PM
Last Post: jim2007
  pandas str.extract multiple regex groups with OR pythonidae 2 653 Dec-19-2019, 05:43 PM
Last Post: pythonidae
  Weighted average with multiple weights and groups amyd 0 291 Oct-11-2019, 10:30 AM
Last Post: amyd
  how to fit an exponential function to multiple data python_newbie09 1 362 Sep-24-2019, 08:12 AM
Last Post: scidam
  Ask for machine learning Python example with 2 data files user5566b 2 298 Sep-05-2019, 12:15 PM
Last Post: user5566b
  How to extract data between two strings SriMekala 2 396 Aug-08-2019, 01:54 PM
Last Post: SriMekala
  Need Help With Filtering Data For Excel Files Using Pandas eddywinch82 9 875 Aug-06-2019, 03:44 PM
Last Post: eddywinch82
  Append Multiple CSV files Nidhesh 2 572 Jul-03-2019, 11:55 AM
Last Post: Nidhesh
  Grab columns from multiple files, combine into one jon0852 0 501 Feb-12-2019, 02:53 AM
Last Post: jon0852
  Concatenate multiple csv files Oscarca 1 1,014 Nov-05-2018, 11:18 AM
Last Post: Larz60+

Forum Jump:


Users browsing this thread: 1 Guest(s)