Python Forum
How to extract different data groups from multiple CSV files using python - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: Data Science (https://python-forum.io/forum-44.html)
+--- Thread: How to extract different data groups from multiple CSV files using python (/thread-18754.html)



How to extract different data groups from multiple CSV files using python - Rafiz - May-30-2019

Stock Additions and Returns
Business Day Outlet Product Currency Amount Type

Seller: RSS
Date: 22/05/2019

Closing Balances
Business Day Outlet Product Currency Amount Type
22/05/2019 9526 FX AED 1665 Close
22/05/2019 9526 FX AUD 480 Close
22/05/2019 9526 FX CNY 4220 Close
22/05/2019 9526 FX CZK 16500 Close
22/05/2019 9526 FX EUR 8986 Close
22/05/2019 9526 FX HRK 4210 Close
22/05/2019 9526 FX HUF 10000 Close
22/05/2019 9526 FX IDR 100000 Close
22/05/2019 9526 FX JPY 5000 Close
22/05/2019 9526 FX PLN 980 Close
22/05/2019 9526 FX TRY 5810 Close[/b]

Customer Sales and Purchases
Business Day Outlet Product Currency Amount Type
22/05/2019 9526 FX HRK 1600 Sell
22/05/2019 9526 FX USD 305 Sell
22/05/2019 9565 FX EUR 110 Sell
22/05/2019 9565 FX EUR 100 Buy
22/05/2019 9616 FX BGN 840 Sell
22/05/2019 9616 FX EUR 440 Sell
22/05/2019 9616 FX NOK 600 Sell
22/05/2019 9616 FX USD 147 Sell
22/05/2019 9646 FX EUR 110 Sell
22/05/2019 9646 FX NOK 2150 Sell
22/05/2019 9690 FX EUR 330 Sell
22/05/2019 9691 FX CHF 250 Sell


The csv file excerpt that I want to extract and build into a panda data frame looks like the one above. The main problem is there is hundred of this csv files ( saved as different dates)of which their row numbers for the headings of 'Closing Balances' and 'Customer Sales and Purchases' names are not identical between each csv files. I have been struggling to find the answers on how to code this task but unfortunately couldn't find anything that matched. The other problem is that there is the same headings of the latter contain no values which need to be ignored. Is there any solution to solve this problem?

Your help in this is very much appreciated and this course has helped me tremendously going through my current task.

Thanks again!

Rafi


RE: How to extract different data groups from multiple CSV files using python - ichabod801 - May-30-2019

Is there anything in the title of the file that tells you what sort of data it is, so you could use that to get the row for the headers? If not, I would load the first few rows of each file, determine where the column headers are, and then read the whole file with pandas.read_csv with the appropriate header parameter.


RE: How to extract different data groups from multiple CSV files using python - Rafiz - May-30-2019

The column headers are Business Day, Outlet, Product, Currency, Amount, Type respectively. The key challenge is to read each row and select the rows that are relevant to the column headers. There are 2 data frame should be made from this csv file: Closing Balances and Customer sales and Purchases.
Thanks!


RE: How to extract different data groups from multiple CSV files using python - jefsummers - Jun-04-2019

What code do you have so far?