I'm converting one chemical notation to another. My list has over 6,000 different names to convert, and it takes very long. How can I use multiprocessing? I tried to implement it myself, but I'm a noob. Other code optimisations are welcome too!
def resolve(str_input, representation):
    import cirpy
    return cirpy.resolve(str_input, representation)
compound_list = []
smiles_list = []
for index, row in df_Verteilung.iterrows():
    try:
        actual_smiles = resolve(row['Compound'], 'smiles')
    except:
        actual_smiles = 'Error'
    print('\r', row['Compound'], actual_smiles, end='')
    compound_list.append(row['Compound'])
    smiles_list.append(actual_smiles)
df_new = pd.DataFrame({'Compound' : compound_list, 'SmilesCode' : smiles_list})
df_new.to_csv(index=False)
Hi,
there is no multiprocessing in your posted code at all... What have you tried so far? Also, 6000 names doesn't sound that excessive. How long does one conversion take?
Is there any reason why you don't use the apply method of Pandas for converting the Compound column? This would make your code much easier.
Notes on your code:
* Never use a naked try... except, as this catches all errors, incl. programming errors. Errors should be caught explicitly.
* Having a postfix stating the data type of a variable doesn't make much sense. The data type should be clear from your code.
* The import statement in line 2 should be moved to the top of your code; it shouldn't be inside the function.
Regards, noisefloor
I know there is no multiprocessing in it. I didn't post it, because there was always an error.
A conversion took about 1 1/2 hours on my MBP.
I tried the apply function, but it didn't work. Could you show me how the apply method works?
try..except is the only thing I know. Is there any other option?
Thank you for the fast answer noisefloor!!
Hi,
Quote: I know there is no multiprocessing in it. I didn't post it, because there was always an error.
Let's put it like this: if there had been no error, there would have been no reason to post here ;-)
The point is: even wrong code is a better starting point than no code at all. So please post your code here.
Quote: I tried the apply function, but it didn't work. Could you show me how the apply method works?
What did you try? Show the code. "It did not work" is not helpful as an error message.
Quote: try..except is the only thing I know. Is there any other option?
except accepts an argument to catch certain exceptions, e.g. except IndexError would catch index errors only, no syntax errors or name errors or ...
These are Python basics, so I'd recommend rereading the official documentation or the corresponding section in the tutorial on this.
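To make that concrete, here is a minimal sketch of catching one specific exception. The helper safe_lookup and the sample list are made up for illustration and are not taken from the code posted above.

```python
def safe_lookup(items, index):
    """Return items[index], or None if the index is out of range."""
    try:
        return items[index]
    except IndexError:
        # Only an out-of-range index is handled here.
        return None


names = ["ethanol", "benzene"]
print(safe_lookup(names, 1))  # benzene
print(safe_lookup(names, 5))  # None

# A TypeError is a different exception class, so `except IndexError`
# does not swallow it and it reaches the caller.
try:
    safe_lookup(names, "one")
except TypeError as exc:
    print("not caught by `except IndexError`:", type(exc).__name__)
```

The point is that an unrelated mistake (here a string used as a list index) still shows up as an error instead of being silently turned into a default value.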
Regards, noisefloor
This was my attempt at implementing multiprocessing. The error was: unexpected indent.
from multiprocessing import Pool
def resolve(str_input, representation):
    try:
        import cirpy
        res = cirpy.resolve(str_input, representation)
    except:
        res = "Error"
    print('\r', row['Compound'], res, end='')
    return res
compound_list = [row['Compound'] for row in df_Verteilung.iterrows()]
n = 5
with Pool(processes=n) as pool:
    smiles_list = pool.starmap(resolve, [(row['Compound'], 'smiles') for row in df_Verteilung.iterrows()])
df_new = pd.DataFrame({'Compound' : compound_list, 'SmilesCode' : smiles_list})
df_new.to_csv(index=False)
This was my attempt with the apply function:
import CIRpy
def resolve(x):
    return cirpy.resolve(str_input, "smiles")
df2["Compound"] = df2["Compound"].apply(resolve)
Thank you again noisefloor!!
First I got: unexpected indent in line 3,
but then nothing happened: no error, but I can't see any results...
I'm gonna try map now and post the code afterwards.
Noisefloor, thank you very much. You're carrying my ass!!!
Hi,
well, I thought the code in the previous post had wrong indentation because of a C&P error. But if your code is REALLY like this, lines 3 to the end are indented 4 spaces too much. This won't run.
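For comparison, a version of the apply attempt with consistent indentation might look roughly like the sketch below. It makes a few assumptions: FAKE_SMILES is a made-up stand-in for the real cirpy.resolve(name, "smiles") call so the example runs without a network connection, and the sample compound names and the SmilesCode column are only illustrative. Note also that the parameter name has to match the name used in the function body (the attempt above mixed x and str_input), and that the first post imported the module as cirpy, in lower case.

```python
import pandas as pd

# Hypothetical stand-in for cirpy.resolve(name, "smiles"),
# so that the sketch runs offline.
FAKE_SMILES = {"ethanol": "CCO", "benzene": "c1ccccc1"}


def resolve(name):
    # The function body is indented one level (4 spaces) ...
    return FAKE_SMILES.get(name, "Error")


# ... and module-level code starts at column 0 again.
df2 = pd.DataFrame({"Compound": ["ethanol", "benzene", "unobtainium"]})

# apply() calls resolve once per value in the column; writing the result
# to a new column keeps the original names.
df2["SmilesCode"] = df2["Compound"].apply(resolve)
print(df2)
```

Assigning the result back to the same column, as in the attempt above, would also work but overwrites the compound names.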
Regards, noisefloor
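On the original question of running the conversions in parallel: the sketch below uses a thread pool from concurrent.futures rather than multiprocessing.Pool. This is a swapped-in technique, chosen on the assumption that most of the time is spent waiting for answers from the web service that cirpy queries, not on computation. The functions fake_resolve and safe_resolve, the FAKE_SMILES table, and the sample names are hypothetical; in real code the stand-in would be replaced by the actual cirpy.resolve call, and the except clause by whatever exceptions that call can raise.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for cirpy.resolve so the sketch runs offline.
FAKE_SMILES = {"ethanol": "CCO", "benzene": "c1ccccc1"}


def fake_resolve(name, representation):
    # Raises KeyError for unknown names (behaviour of the stand-in only).
    return FAKE_SMILES[name]


def safe_resolve(name):
    """Resolve one name, returning 'Error' if the lookup fails."""
    try:
        return fake_resolve(name, "smiles")
    except KeyError:
        return "Error"


compounds = ["ethanol", "benzene", "unobtainium"]

# Up to 5 lookups run at the same time; map() yields the results
# in the same order as the inputs.
with ThreadPoolExecutor(max_workers=5) as pool:
    smiles = list(pool.map(safe_resolve, compounds))

print(list(zip(compounds, smiles)))
```

Because the results come back in input order, they can be put straight into a DataFrame column next to the compound names. Whether 5 workers is a sensible number depends on how many parallel requests the service tolerates, which is not something this sketch can answer.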