Python Forum

I recently read somewhere online that it is strongly discouraged to append to or insert into a data frame either row by row or cell by cell (with iloc). I cannot recall which one exactly, but to the point: what is the best way to append data to a data frame? Here is the snippet I have. I am doing my best to follow the PEP8 style guide while making changes, so feel free to mention any style discrepancies. I have a total of 12 columns, and I feel that setting each value in its own statement is not a good approach. It looks too disjointed (for lack of a better word).

# Sets the quantity #
InvoiceDataFrame.loc[[len(InvoiceDataFrame) - 1],
                     InvoiceDataFrame.columns[3]] = "1"

# Sets the class #
InvoiceDataFrame.loc[[len(InvoiceDataFrame) - 1],
                     InvoiceDataFrame.columns[4]] = ClassIndex[LoadedSales.iloc[inx, 1]]

# Sets the Job name #
InvoiceDataFrame.loc[[len(InvoiceDataFrame) - 1],
                     InvoiceDataFrame.columns[5]] = HA

# Sets the Terms #
InvoiceDataFrame.loc[[len(InvoiceDataFrame) - 1],
                     InvoiceDataFrame.columns[6]] = "Redacted"

Note: With regard to the PEP8 style guide, I haven't yet taken the time to change my variable names to match it.

(Oct-04-2018, 05:11 PM)WuchaDoin Wrote: I recently read somewhere online that it is strongly discouraged to append to or insert into a data frame either row by row or cell by cell (with iloc). I cannot recall which one exactly, but to the point: what is the best way to append data to a data frame?

The best way is to create the DataFrame in one step from a list of lists, a list of dictionaries, or a dictionary of lists (see the Pandas cookbook).
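As a rough sketch of how that could look for the invoice case in the question: collect one dictionary per row in a plain Python list, then build the DataFrame once at the end. The column names, the `sales` list, and the `class_index` lookup below are made-up stand-ins, since the real data is not shown in the thread.

```python
import pandas as pd

# Hypothetical stand-ins for the loaded sales data and the class lookup.
sales = [("Job A", "widgets"), ("Job B", "gadgets")]
class_index = {"widgets": "Hardware", "gadgets": "Electronics"}

# Collect each row as a dictionary in an ordinary list (cheap to append to).
rows = []
for job_name, category in sales:
    rows.append({
        "quantity": "1",
        "class": class_index[category],
        "job": job_name,
        "terms": "Redacted",
    })

# Build the DataFrame once, after all rows are known.
invoice_df = pd.DataFrame(rows, columns=["quantity", "class", "job", "terms"])
print(invoice_df)
```

Appending to a list is cheap, while writing into a DataFrame cell by cell goes through pandas indexing on every assignment, so the single constructor call at the end is usually the tidier and faster option.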

You may also read data from a CSV or Excel file or from a SQL database, or load a JSON structure; see the pandas IO methods. But you should never build a DataFrame row by row. If you need to process data to create new data, adding columns is OK.
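A small sketch of the load-then-add-a-column pattern, assuming a CSV source. The CSV text and its column names are invented for illustration, and `io.StringIO` stands in for a real file path so the snippet runs on its own.

```python
import io

import pandas as pd

# Invented CSV content; in practice this would be a path to a real file.
csv_text = "item,quantity,unit_price\nbolt,10,0.25\nnut,4,0.10\n"

# Load the whole table in one call.
df_csv = pd.read_csv(io.StringIO(csv_text))

# Derive a new column from existing ones, for all rows at once.
df_csv["total"] = df_csv["quantity"] * df_csv["unit_price"]
print(df_csv)
```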

Let me give you a simple demonstration:

In [17]: df = pd.DataFrame([{'a': 1, 'b': 2, 'c': 3},
    ...:                    {'a': 4, 'b': 3, 'c': 10}])
    ...:

In [18]: df['d'] = df.apply(lambda r: r['a'] ** 3 + r['b'] ** 2 + r['c'], axis=1)

In [19]: df
Out[19]:
   a  b   c   d
0  1  2   3   8
1  4  3  10  83

In [20]: df.a**2 + df.b**2
Out[20]:
0     5
1    25
dtype: int64

You may bulk-change cell values too. In this example, I replace all odd values with zeroes:

In [23]: df[df % 2 == 1] = 0

In [24]: df
Out[24]:
   a  b   c  d
0  0  2   0  8
1  4  0  10  0

But processing row by row defeats the purpose of pandas and is inefficient.
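To illustrate the point with the same small frame as above: `apply` with `axis=1` calls the lambda once per row, whereas plain column arithmetic computes the whole column in one go. This is only a sketch; on two rows the difference is invisible, and the gain shows up on large frames.

```python
import pandas as pd

df = pd.DataFrame([{'a': 1, 'b': 2, 'c': 3},
                   {'a': 4, 'b': 3, 'c': 10}])

# Row-wise: the lambda is invoked once for every row.
row_wise = df.apply(lambda r: r['a'] ** 3 + r['b'] ** 2 + r['c'], axis=1)

# Column-wise: the same formula applied to whole columns at once.
vectorized = df['a'] ** 3 + df['b'] ** 2 + df['c']

# Both approaches give the same values.
print((row_wise == vectorized).all())
```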

WuchaDoin

volcano63