Python Forum
Need help to correct my multiprocessing code
#1
Hi Experts,

I have a CSV file containing more than 60,000 records at an FTP location.

I have written a Python script that looks up each product from that CSV, translates the product data into different languages, and stores the result in a database.

I am trying to use multiprocessing to speed up the processing, so I split the CSV DataFrame into chunks. The problem is that multiprocessing only produces the result for the first chunk; I get no result for the second chunk.

Below is my code:
import multiprocessing
import time
from multiprocessing import Pool
def process_product_data(product_df, language_code):
    logger.info("Processing of product data is started...")
    translated_product_combind = []
    file_path = "../output/output_" + language_code + ".csv"
    num_processes = multiprocessing.cpu_count()
    # calculate the chunk size as an integer (at least 1, so range() never gets a zero step)
    chunk_size = max(1, product_df.shape[0] // num_processes)
    print('process is',num_processes)
    print('chunks are',chunk_size)
    chunks = [product_df.iloc[i:i + chunk_size] for i in range(0, product_df.shape[0], chunk_size)]
    print('print shape is',product_df.shape[0])
    print("length chunks::",len(chunks))
    with Pool(num_processes) as p:
        logger.info("multiprocessing is started...")
        translated_product = p.map(fetch_product_details, [(chunks,language_code)])
    translated_product_combind.append(translated_product)
    logger.info("multiprocessing is done in {} seconds".format(time.time() - start_time))
    generate_csv(file_path,language_code, translated_product_combind)
    logger.info("Processing of product data is completed in {} seconds".format(time.time() - start_time))
    return translated_product_combind
def fetch_product_details(args):
    product_df, language_code=args
    translated_product_combind=[]
    for productdf1 in product_df:
        print('chunk working::::::')
        for product in productdf1.itertuples():
            if product!='Title':
                print('product is',product)
                print('language_code is',language_code)
                check_flg = False
                title = product.Title.replace('%', '')
                logger.info('Product Title is: {}'.format(title))
                desc = product.Description
                logger.info('Product Description is: {}'.format(desc))
                product_url = search_product(language_code, title)
                logger.info("Product Url is: {}".format(product_url))
                if product_url != 'Not Found':
                    translated_product, check_flg = translate_data_web(product_url)
                if (check_flg == False):
                    translated_product = translate_data(language_code, title, desc)
                else:
                    translated_product = translate_data(language_code, title, desc)
        translated_product_combind.append(translated_product)
    return translated_product_combind
Can anyone please look at my code and tell me what I am doing wrong?
Please find attached sample data for testing.

Attached Files

.csv   sample.csv (Size: 208.18 KB / Downloads: 441)
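
One likely cause, judging only from the code posted above: `p.map` is given a list with a single element, `[(chunks, language_code)]`, so the pool runs exactly one task and that task receives every chunk. A minimal sketch of dispatching one task per chunk follows. It uses plain lists of dicts instead of a DataFrame to stay dependency-free (with pandas, each slice would be `product_df.iloc[i:i + chunk_size]`), and the worker body is a hypothetical stand-in for the real search and translation calls, which are not shown in the post.

```python
from multiprocessing import Pool


def split_into_chunks(rows, n_chunks):
    """Split rows into at most n_chunks consecutive slices."""
    # Ceiling division, and never smaller than 1 so range() gets a valid step.
    chunk_size = max(1, -(-len(rows) // n_chunks))
    return [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]


def fetch_product_details(args):
    """Stand-in worker (assumption): receives ONE chunk, returns one result per row."""
    chunk, language_code = args
    return [f"{language_code}:{row['Title']}" for row in chunk]


def process_product_data(rows, language_code, num_processes=4):
    chunks = split_into_chunks(rows, num_processes)
    # One (chunk, language_code) tuple per chunk, so map() schedules one task per chunk.
    tasks = [(chunk, language_code) for chunk in chunks]
    with Pool(num_processes) as pool:
        per_chunk_results = pool.map(fetch_product_details, tasks)
    # map() returns one list per chunk; flatten them into a single list.
    return [item for chunk_result in per_chunk_results for item in chunk_result]


if __name__ == "__main__":
    sample_rows = [{"Title": f"product {i}"} for i in range(10)]
    print(process_product_data(sample_rows, "de"))
```

The `if __name__ == "__main__":` guard matters on platforms that start worker processes by re-importing the main module; without it, each worker would try to start its own pool.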
#2
When posting code, make sure it is a complete, runnable snippet (one that needs no modification).
The code you present cannot be run without modification.
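
As an illustration of what a self-contained snippet for the row loop could look like, here is a minimal sketch. It also shows two things that appear to go wrong in `fetch_product_details` as posted: the `append` sits outside the inner loop, so only the last product of each chunk is kept, and the `else` is attached to the `check_flg` test, so a successful web translation is overwritten by `translate_data`. The three helpers keep the names used in the post, but their bodies here are hypothetical stubs, since the real implementations were not shown.

```python
def search_product(language_code, title):
    """Hypothetical stub: only titles containing 'web' have a product page."""
    if "web" in title:
        return f"https://example.invalid/{language_code}/{title}"
    return "Not Found"


def translate_data_web(product_url):
    """Hypothetical stub: returns (translation, success_flag)."""
    return f"web:{product_url}", True


def translate_data(language_code, title, desc):
    """Hypothetical stub for the fallback translation."""
    return f"{language_code}:{title}"


def translate_rows(rows, language_code):
    """Translate every row of one chunk and return one result per row."""
    results = []
    for row in rows:
        title = row["Title"].replace("%", "")
        desc = row["Description"]
        translated, found = None, False
        product_url = search_product(language_code, title)
        if product_url != "Not Found":
            translated, found = translate_data_web(product_url)
        if not found:
            # Fall back only when the web lookup did not produce a translation.
            translated = translate_data(language_code, title, desc)
        # Append inside the loop so every row contributes a result.
        results.append(translated)
    return results


if __name__ == "__main__":
    sample = [
        {"Title": "web shoe 10%", "Description": "a"},
        {"Title": "hat", "Description": "b"},
    ]
    print(translate_rows(sample, "de"))
```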