Python Forum

Full Version: Best way to append data frame?
I recently read somewhere online, by accident, that it is strongly discouraged to append to or insert into a data frame either row by row or cell by cell (with iloc). I cannot recall which one exactly, but to the point: what is the best way to append data to a data frame? Here is the snippet I have. I am doing my best to follow the PEP8 style guide while making changes, so feel free to mention any style discrepancies. I have a total of 12 columns, and I feel that appending each value individually is not a good approach. It looks too fragmented (for lack of a better word).

                    # Set the quantity
                    InvoiceDataFrame.loc[[len(InvoiceDataFrame) - 1],
                                         InvoiceDataFrame.columns[3]] = "1"

                    # Set the class
                    InvoiceDataFrame.loc[[len(InvoiceDataFrame) - 1],
                                         InvoiceDataFrame.columns[4]] = ClassIndex[LoadedSales.iloc[inx, 1]]

                    # Set the job name
                    InvoiceDataFrame.loc[[len(InvoiceDataFrame) - 1],
                                         InvoiceDataFrame.columns[5]] = HA

                    # Set the terms
                    InvoiceDataFrame.loc[[len(InvoiceDataFrame) - 1],
                                         InvoiceDataFrame.columns[6]] = "Redacted"
Note: With regard to the PEP8 style guide, I haven't yet taken the time to change my variable names to match it.
(Oct-04-2018, 05:11 PM)WuchaDoin Wrote: I recently read somewhere online, by accident, that it is strongly discouraged to append to or insert into a data frame either row by row or cell by cell (with iloc). I cannot recall which one exactly, but to the point: what is the best way to append data to a data frame?

The best way is to create the DataFrame from a list of lists, a list of dictionaries, or a dictionary of lists (see the pandas cookbook).
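As a rough sketch of those three options, the snippet below builds the same small frame each way. The column names and values are made-up placeholders, not taken from the original post.

```python
import pandas as pd

# Hypothetical sample data; the column names are placeholders.

# 1. List of dictionaries: one dict per row.
from_records = pd.DataFrame([
    {"item": "widget", "quantity": 1},
    {"item": "gadget", "quantity": 2},
])

# 2. Dictionary of lists: one list per column.
from_columns = pd.DataFrame({
    "item": ["widget", "gadget"],
    "quantity": [1, 2],
})

# 3. List of lists: rows as plain lists, column names passed separately.
from_rows = pd.DataFrame(
    [["widget", 1], ["gadget", 2]],
    columns=["item", "quantity"],
)

# All three constructions describe the same table.
print(from_records.equals(from_columns) and from_columns.equals(from_rows))
```

Which form is most convenient usually depends on how the data arrives: per-record data fits the list of dictionaries, per-column data fits the dictionary of lists.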

You can also read data from a CSV or Excel file, or from a SQL database, and you can load a JSON structure; see the pandas IO methods. But you should never build a DataFrame row by row. If you need to process data to create new data, adding columns is OK.
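To illustrate, here is a minimal sketch that loads CSV data in one call and then adds a derived column. The CSV text, the column names, and the `sales` variable are invented for the example; an in-memory string stands in for a real file so the snippet is self-contained.

```python
import io

import pandas as pd

# Stand-in for a real file; pd.read_csv also accepts a file path.
csv_text = "item,quantity,unit_price\nwidget,1,2.50\ngadget,2,4.00\n"
sales = pd.read_csv(io.StringIO(csv_text))

# Adding a derived column works on whole columns at once.
sales["total"] = sales["quantity"] * sales["unit_price"]
print(sales)
```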

Let me give you a simple demonstration.
Output:
In [17]: df = pd.DataFrame([{'a': 1, 'b':2, 'c': 3},
    ...:                    {'a': 4, 'b': 3, 'c': 10}])
    ...:

In [18]: df['d'] = df.apply(lambda r: r['a'] ** 3 + r['b'] ** 2 + r['c'], axis=1)

In [19]: df
Out[19]:
   a  b   c   d
0  1  2   3   8
1  4  3  10  83

In [20]: df.a**2 + df.b**2
Out[20]:
0     5
1    25
dtype: int64
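One side note on the session above: apply with axis=1 still calls the lambda once per row. As a sketch of an alternative (not something the original reply showed), the same column 'd' can be computed with plain column arithmetic, in the same style as the df.a**2 + df.b**2 expression:

```python
import pandas as pd

df = pd.DataFrame([{'a': 1, 'b': 2, 'c': 3},
                   {'a': 4, 'b': 3, 'c': 10}])

# Column arithmetic: no per-row Python function call.
df['d'] = df['a'] ** 3 + df['b'] ** 2 + df['c']
print(df)
```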
You can also change cell values in bulk. In this example, I replace all odd values with zeros.
Output:
In [23]: df[df % 2 == 1] = 0

In [24]: df
Out[24]:
   a  b   c  d
0  0  2   0  8
1  4  0  10  0
But processing row by row defeats the purpose of pandas and is inefficient.
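Applied to the snippet in the question, one possible approach is to collect each invoice line as a dictionary inside the loop and build the DataFrame once at the end. This is only a sketch under assumptions: class_index, loaded_sales, job_name, the column labels, and all sample values are hypothetical stand-ins for the poster's ClassIndex, LoadedSales, and HA, whose real contents are not shown.

```python
import pandas as pd

# Hypothetical stand-ins for the original poster's ClassIndex and LoadedSales.
class_index = {"A": "Retail", "B": "Wholesale"}
loaded_sales = pd.DataFrame({"customer": ["Smith", "Jones"],
                             "code": ["A", "B"]})
job_name = "HA"  # placeholder value for the poster's HA variable

rows = []
for inx in range(len(loaded_sales)):
    # One dict per invoice line, with all fields set together.
    rows.append({
        "Quantity": "1",
        "Class": class_index[loaded_sales.iloc[inx, 1]],
        "Job name": job_name,
        "Terms": "Redacted",
    })

# Build the frame once, after the loop, instead of growing it cell by cell.
invoice_df = pd.DataFrame(rows)
print(invoice_df)
```

Appending to a plain Python list is cheap, and keeping all fields of a row in one dictionary also addresses the "too separated" look of setting each cell in its own statement.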