Python Forum

Optimized way to rewrite this very slow code - liva28 - Jul-18-2021

Hey folks,

This is my first post, so please bear with me. I need some help optimizing the one-liner below.

pd_df.loc[flag, 'COL_{}'.format(col_number)] = pd_df.loc[flag, 'COL_{}'.format(col_number)].apply(lambda x: x + str(userid) + "@")
pd_df = pandas DataFrame containing 2M rows
flag = one-dimensional NumPy boolean array used to filter/update many rows at once in pd_df
'COL_{}'.format(col_number) = column name picked by the main for loop, e.g. COL_1, COL_5, up to COL_15 (string data type, 5000 characters long)

In general, this code first filters the rows to be updated according to flag, selects the column to be updated according to col_number, and then appends the user ID to that single column in all of the selected rows, with @ as the delimiter, so the values build up into a list of user IDs. For example: @userid1@userid2@userid2, and so on.
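
To make the operation concrete, here is a minimal sketch on a toy frame. The 4-row data, the column number and the user ID are made-up stand-ins for the real 2M-row frame. It runs the original apply form next to a plain Series + string concatenation, which should give the same values without calling a Python lambda per row. Whether that is actually faster on the real data has not been measured here.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for the real data (hypothetical values, 4 rows instead of 2M).
pd_df = pd.DataFrame({"COL_1": ["", "7@", "", "7@9@"]})
flag = np.array([True, False, True, True])
col_number = 1
userid = 42

col = "COL_{}".format(col_number)
suffix = str(userid) + "@"

# Original form: the lambda is called once per selected row.
via_apply = pd_df.loc[flag, col].apply(lambda x: x + suffix)

# Same values via element-wise Series + str concatenation (no lambda).
via_concat = pd_df.loc[flag, col] + suffix

# Write the updated values back into the selected rows only.
pd_df.loc[flag, col] = via_concat
```

Rows where flag is False are left untouched, and the selected rows get the user ID plus the @ delimiter appended.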

This line of code consumes 75% of my overall run time, due to the slow pandas DataFrame loc function and the large number of rows (2M).

Can someone please help me convert this piece into something more optimized, for example using a dictionary or a NumPy data type?
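
As a rough idea of what a NumPy-based version could look like, here is a sketch that assumes the column can be pulled out once as an object array, updated through the boolean mask, and written back once at the end. The 3-row data, column name and user ID are hypothetical, and this is untested against the real 2M-row frame, so any speed-up is an assumption.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for the real data (hypothetical values).
pd_df = pd.DataFrame({"COL_5": ["", "3@", "3@8@"]})
flag = np.array([False, True, True])
col = "COL_5"
userid = 11
suffix = str(userid) + "@"

# Pull the column out once as a NumPy object array of Python strings.
values = pd_df[col].to_numpy(dtype=object, copy=True)

# Boolean-mask update directly on the array, without going through .loc.
values[flag] = values[flag] + suffix

# Updates for further user IDs could be applied to `values` here.

# Write the column back to the DataFrame once, after all updates.
pd_df[col] = values
```

The idea is that the DataFrame is touched only twice per column (one read, one write), while the per-user updates in the main loop work on the plain array.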

Thanks in advance.

Regards,
Liva