Jul-20-2018, 06:26 AM
Hi Experts,
I have a CSV file containing more than 60,000 records at an FTP location.
I have written a Python script that searches for each product from the above CSV, translates the product data into different languages, and then stores it in a database.
I am trying to use multiprocessing to speed up the processing, and for that I am splitting my CSV dataframe into chunks. The problem is that multiprocessing produces only the first chunk's result; it does not give me the result of the second chunk.
Below is my code:
import multiprocessing
import time
from multiprocessing import Pool

# logger, start_time, generate_csv, search_product, translate_data_web and
# translate_data are assumed to be defined elsewhere in the script (not shown).

def process_product_data(product_df, language_code):
    logger.info("Processing of product data is started...")
    translated_product_combind = []
    file_path = "../output/output_" + language_code + ".csv"
    num_processes = multiprocessing.cpu_count()
    # calculate the chunk size as an integer
    chunk_size = int(product_df.shape[0] / num_processes)
    print('process is', num_processes)
    print('chunks are', chunk_size)
    # .iloc is used here because the .ix indexer is deprecated in pandas
    chunks = [product_df.iloc[i:i + chunk_size] for i in range(0, product_df.shape[0], chunk_size)]
    print('print shape is', product_df.shape[0])
    print("length chunks::", len(chunks))
    with Pool(num_processes) as p:
        logger.info("multiprocessing is started...")
        translated_product = p.map(fetch_product_details, [(chunks, language_code)])
        translated_product_combind.append(translated_product)
    logger.info("multiprocessing is done in {} seconds".format(time.time() - start_time))
    generate_csv(file_path, language_code, translated_product_combind)
    logger.info("Processing of product data is completed in {} seconds".format(time.time() - start_time))
    return translated_product_combind

def fetch_product_details(args):
    product_df, language_code = args
    translated_product_combind = []
    for productdf1 in product_df:
        print('chunk working::::::')
        for product in productdf1.itertuples():
            if product != 'Title':
                print('product is', product)
                print('language_code is', language_code)
                check_flg = False
                title = product.Title.replace('%', '')
                logger.info('Product Title is: {}'.format(title))
                desc = product.Description
                logger.info('Product Description is: {}'.format(desc))
                product_url = search_product(language_code, title)
                logger.info("Product Url is: {}".format(product_url))
                if product_url != 'Not Found':
                    translated_product, check_flg = translate_data_web(product_url)
                    if (check_flg == False):
                        translated_product = translate_data(language_code, title, desc)
                else:
                    translated_product = translate_data(language_code, title, desc)
                translated_product_combind.append(translated_product)
        return translated_product_combind

Can anyone please look at my code and let me know what I am doing wrong here?
Please find attached sample data for testing as well.
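For comparison, here is a minimal sketch of the one-task-per-chunk pattern that `Pool.map` works with. In the code above, `p.map(fetch_product_details, [(chunks, language_code)])` passes an iterable with a single element, so `map` schedules only one task that receives the whole list of chunks. The sketch builds one `(chunk, language_code)` tuple per chunk instead. The names `split_into_chunks`, `translate_chunk` and `process_all` are hypothetical, a plain list of titles stands in for the dataframe so the example runs without pandas, and the worker body is only a placeholder for the real search and translation calls.

```python
import math
from multiprocessing import Pool


def split_into_chunks(rows, num_chunks):
    """Split a list into at most num_chunks consecutive slices of near-equal size."""
    if not rows:
        return []
    # Round up so that no more than num_chunks slices are produced.
    chunk_size = max(1, math.ceil(len(rows) / num_chunks))
    return [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]


def translate_chunk(args):
    """Hypothetical worker: receives ONE chunk and returns one result per row."""
    chunk, language_code = args
    # Placeholder for the real lookup/translation work (assumption).
    return ["{}:{}".format(language_code, title) for title in chunk]


def process_all(rows, language_code, num_processes=2):
    chunks = split_into_chunks(rows, num_processes)
    # One (chunk, language_code) tuple per chunk, so map() gets one task per chunk.
    tasks = [(chunk, language_code) for chunk in chunks]
    with Pool(num_processes) as pool:
        per_chunk_results = pool.map(translate_chunk, tasks)
    # map() returns one list per chunk, in input order; flatten into a single list.
    return [item for chunk_result in per_chunk_results for item in chunk_result]


if __name__ == "__main__":
    titles = ["shirt", "shoes", "hat", "belt", "scarf"]
    print(process_all(titles, "de"))
```

With this shape the worker handles exactly one chunk per call and returns only after its own loop has finished, and the parent process is the one that combines the per-chunk lists. Rounding the chunk size up also avoids a chunk size of zero when there are fewer rows than processes.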