Oct-04-2018, 07:17 PM
(Oct-04-2018, 05:11 PM)WuchaDoin Wrote: I read by accident, somewhere online, recently that it is highly unsought to append/insert a data frame either row by row, or cell by cell (for iloc). I cannot recall which one exactly, but to the point; What is the best way to append data into a data frame?
The best way is to create a DataFrame from a list of lists, a list of dictionaries, or a dictionary of lists (see the Pandas cookbook). You may also read data from a CSV or Excel file or from a SQL database, or load a JSON structure; see the pandas IO methods. But you never build a DataFrame row by row. If you need some data processing to create new data, adding columns is OK. Let me give you a simple demonstration:
Output:
In [17]: df = pd.DataFrame([{'a': 1, 'b': 2, 'c': 3},
    ...:                    {'a': 4, 'b': 3, 'c': 10}])
    ...:
In [18]: df['d'] = df.apply(lambda r: r['a'] ** 3 + r['b'] ** 2 + r['c'], axis=1)
In [19]: df
Out[19]:
   a  b   c   d
0  1  2   3   8
1  4  3  10  83
In [20]: df.a**2 + df.b**2
Out[20]:
0 5
1 25
dtype: int64
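As a side note, the column d above is computed with apply and axis=1, which still calls the lambda once per row. A minimal sketch of the column-wise alternative, assuming the same two-row frame as in the session, is shown here; the names d_apply and d_vec are only for illustration.

```python
import pandas as pd

df = pd.DataFrame([{'a': 1, 'b': 2, 'c': 3},
                   {'a': 4, 'b': 3, 'c': 10}])

# Row-wise version, as in the session: the lambda runs once per row.
d_apply = df.apply(lambda r: r['a'] ** 3 + r['b'] ** 2 + r['c'], axis=1)

# Column-wise version: the same arithmetic applied to whole columns at once.
d_vec = df['a'] ** 3 + df['b'] ** 2 + df['c']

# Both give the same values; store the column-wise result as column 'd'.
df['d'] = d_vec
print(df)
```

On a frame this small the difference does not matter, but on large frames the column-wise form is usually the faster of the two. Time it on your own data before relying on that.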
You may bulk change cell values too. In this example, I replace all odd values with zeroes:
Output:
In [23]: df[df % 2 == 1] = 0
In [24]: df
Out[24]:
   a  b   c  d
0  0  2   0  8
1  4  0  10  0
But processing row by row defeats the purpose of pandas and is inefficient.
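If your data really does arrive one record at a time, one way to follow the advice above is to collect the records in a plain Python list and create the DataFrame once at the end. This is only a sketch: make_rows is a hypothetical stand-in for whatever produces your records, and the column names are made up.

```python
import pandas as pd

def make_rows(n):
    """Hypothetical record source: yields one dict per row."""
    for i in range(n):
        yield {'a': i, 'b': i * 2, 'c': i + 10}

# Appending to a list is cheap; the DataFrame is built a single time.
rows = []
for row in make_rows(5):
    rows.append(row)
df = pd.DataFrame(rows)  # list of dictionaries

# The same data passed as a dictionary of lists.
df_cols = pd.DataFrame({
    'a': [r['a'] for r in rows],
    'b': [r['b'] for r in rows],
    'c': [r['c'] for r in rows],
})

print(df)
print(df.equals(df_cols))
```

Both constructors produce the same frame here, so pick whichever shape your data already has.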
Test everything in a Python shell (IPython, Azure Notebook, etc.)
- Someone gave you advice you liked? Test it - maybe the advice was actually bad.
- Someone gave you advice you think is bad? Test it before arguing - maybe it was good.
- You posted a claim that something you did not test works? Be prepared to eat your hat.