Python Forum

Full Version: Need help to correct my multiprocessing code
Hi Experts,

I have a CSV file containing more than 60,000 records at an FTP location.

I have written a Python script that searches for each product from this CSV, translates the product data into different languages, and then stores it in a database.

I am trying to use multiprocessing to speed up the processing. To do that, I split my CSV DataFrame into chunks, but the problem is that multiprocessing only produces the result for the first chunk; I get no result for the second chunk.

Below is my code:
import multiprocessing
import time
from multiprocessing import Pool
def process_product_data(product_df, language_code):
    logger.info("Processing of product data is started...")
    translated_product_combind = []
    file_path = "../output/output_" + language_code + ".csv"
    num_processes = multiprocessing.cpu_count()
    # calculate the chunk size as an integer
    chunk_size = int(product_df.shape[0] / num_processes)
    print('process is',num_processes)
    print('chunks are',chunk_size)
    chunks = [product_df.ix[product_df.index[i:i + chunk_size]] for i in range(0, product_df.shape[0], chunk_size)]
    print('print shape is',product_df.shape[0])
    print("length chunks::",len(chunks))
    with Pool(num_processes) as p:
        logger.info("multiprocessing is started...")
        translated_product = p.map(fetch_product_details, [(chunks,language_code)])
    translated_product_combind.append(translated_product)
    logger.info("multiprocessing is done in {} seconds".format(time.time() - start_time))
    generate_csv(file_path,language_code, translated_product_combind)
    logger.info("Processing of product data is completed in {} seconds".format(time.time() - start_time))
    return translated_product_combind
def fetch_product_details(args):
    product_df, language_code=args
    translated_product_combind=[]
    for productdf1 in product_df:
        print('chunk working::::::')
        for product in productdf1.itertuples():
            if product!='Title':
                print('product is',product)
                print('language_code is',language_code)
                check_flg = False
                title = product.Title.replace('%', '')
                logger.info('Product Title is: {}'.format(title))
                desc = product.Description
                logger.info('Product Description is: {}'.format(desc))
                product_url = search_product(language_code, title)
                logger.info("Product Url is: {}".format(product_url))
                if product_url != 'Not Found':
                    translated_product, check_flg = translate_data_web(product_url)
                if (check_flg == False):
                    translated_product = translate_data(language_code, title, desc)
                else:
                    translated_product = translate_data(language_code, title, desc)
        translated_product_combind.append(translated_product)
    return translated_product_combind
Can anyone please look at my code and let me know what I am doing wrong here?
I have also attached sample data for testing.
When posting code, make sure you have a complete, runnable snippet (one that needs no modification).
The code you present cannot be run without modification.
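That said, two things stand out when reading the posted code. First, p.map is given [(chunks, language_code)], which is a list with a single element, so the pool receives one task holding every chunk and only one worker does any work. Second, the append in fetch_product_details sits outside the inner row loop, so it appears to keep only the last product of each chunk. Below is a minimal sketch of one way to restructure this, not a drop-in replacement. It uses plain lists of dicts instead of a DataFrame so that it runs on its own, and translate_stub is a hypothetical stand-in for your translate_data; split_into_chunks is also a name made up for this sketch.

```python
import math
from multiprocessing import Pool, cpu_count


def split_into_chunks(rows, num_chunks):
    """Split rows into at most num_chunks consecutive slices."""
    if not rows:
        return []
    # Ceiling division keeps chunk_size >= 1 and never yields
    # more than num_chunks slices.
    chunk_size = math.ceil(len(rows) / num_chunks)
    return [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]


def translate_stub(language_code, title, desc):
    """Hypothetical stand-in for the poster's translate_data()."""
    return {"lang": language_code, "title": title.upper(), "desc": desc}


def fetch_product_details(args):
    """Worker: handles ONE chunk and returns one result per row."""
    chunk, language_code = args
    results = []
    for product in chunk:
        title = product["Title"].replace("%", "")
        translated = translate_stub(language_code, title, product["Description"])
        # Append inside the row loop so every row is kept, not just the last.
        results.append(translated)
    return results


def process_product_data(rows, language_code, num_processes=None):
    num_processes = num_processes or cpu_count()
    chunks = split_into_chunks(rows, num_processes)
    # One (chunk, language_code) tuple per chunk: map() turns each
    # tuple into a separate call of fetch_product_details.
    tasks = [(chunk, language_code) for chunk in chunks]
    with Pool(num_processes) as pool:
        per_chunk = pool.map(fetch_product_details, tasks)
    # Flatten the per-chunk lists into one list, preserving input order.
    return [row for chunk_result in per_chunk for row in chunk_result]


if __name__ == "__main__":
    sample = [{"Title": f"item{i}%", "Description": f"desc {i}"} for i in range(10)]
    translated = process_product_data(sample, "de", num_processes=4)
    print(len(translated))
```

With a real DataFrame the same idea should carry over by slicing with product_df.iloc[i:i + chunk_size] and iterating each chunk with itertuples(). The `if __name__ == "__main__":` guard matters on platforms that start workers by spawning a fresh interpreter (Windows, macOS). It is also worth checking the if/else after translate_data_web: as posted, both branches call translate_data, so a successful web translation looks like it gets overwritten.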