 Grab columns from multiple files, combine into one
#1
I have some files stored in a cloud storage bucket, and each file contains different variables. I would like to write a function that takes the names of the variables I am interested in and builds a master data set containing only those columns. The function should iterate through the files and, whenever a file contains one of the requested column names, grab that column (or columns) and join it to a master DataFrame. Below is what I have so far. Any help in developing this further would be very much appreciated.

---
import pandas as pd
from tensorflow.python.lib.io import file_io

# `storage` and `bucket_name` are assumed to be defined earlier in the session.

def get_my_data(list1):
  df = pd.DataFrame()
  files = [o.key for o in storage.Objects(bucket_name, '', '')]
  for l in list1:
    for f in files:
      file1 = "gs://%s/%s" % (bucket_name, f)
      # Read only the header row to check which columns this file has
      with file_io.FileIO(file1, 'r') as fh:
        columns = pd.read_csv(fh, nrows=0).columns
      if l in columns:
        # Reopen the file so the full read starts from the beginning
        with file_io.FileIO(file1, 'r') as fh:
          data = pd.read_csv(fh)
        print(file1, data[l])
        # TODO: append desired column to our new df
  return df

get_my_data(['var1', 'var2', 'var3'])
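
For reference, here is a rough sketch of how the missing append step might look. It is not a tested solution for the bucket setup: `collect_columns`, `open_fn` and the in-memory sample files are names made up for illustration, and it assumes every file has the same number of rows in the same order, since the columns are joined by row position rather than by a shared key. Each file is visited once; its header is checked against the requested names and only the matching columns are loaded.

```python
import io
import pandas as pd

def collect_columns(sources, wanted, open_fn=open):
    """Gather the `wanted` columns from several CSV sources into one DataFrame.

    sources: iterable of names/paths that open_fn can open
    open_fn: hypothetical hook returning a readable file object; for a bucket
             one could pass something like lambda p: file_io.FileIO(p, 'r')
    """
    pieces = []
    remaining = list(wanted)
    for src in sources:
        if not remaining:
            break  # every requested column has been found
        # Read only the header row to see which columns this file has
        with open_fn(src) as fh:
            header = pd.read_csv(fh, nrows=0).columns
        found = [c for c in remaining if c in header]
        if not found:
            continue
        # Reopen the file and load just the matching columns
        with open_fn(src) as fh:
            pieces.append(pd.read_csv(fh, usecols=found))
        remaining = [c for c in remaining if c not in found]
    if not pieces:
        return pd.DataFrame()
    # Join side by side, aligned on row position (assumes matching row order)
    return pd.concat(pieces, axis=1)

# Stand-in for the bucket: two small CSV "files" held in memory
texts = {
    "a.csv": "id,var1,x\n1,10,0\n2,20,0\n",
    "b.csv": "var2,var3\n5,7\n6,8\n",
}
master = collect_columns(texts, ["var1", "var2", "var3"],
                         open_fn=lambda name: io.StringIO(texts[name]))
print(master)
```

If the files share an ID column, merging on that key would probably be safer than relying on row order.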
