Python Forum

Full Version: Errors using --processes parameter
Hello all,

I'm new to Python and not even a "real" programmer, so I apologize if any of my questions show the lack of expertise ;-)

I'm trying to create "bags" (a defined structure) using a Python script that's been provided. It's called "bagit.py" and can be found on GitHub (https://github.com/edsu/bagit/blob/master/bagit.py). The script works very well, but given the amount of data to be packed, it runs for many hours. In the README (readme.rst) I found:

Quote:Since calculating checksums can take a while when creating a bag, you may want to calculate them in parallel if you are on a multicore machine. You can do that with the --processes option:

bagit.py --processes 4 /directory/to/bag
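For readers unfamiliar with the idea behind the quoted option, here is a minimal sketch of computing checksums in parallel with Python's standard multiprocessing module. It is not taken from bagit.py; the names sha256_of_file and checksum_files are hypothetical and only illustrate the general technique.

```python
import hashlib
import multiprocessing
import sys


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return (path, hex digest), reading the file in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return path, digest.hexdigest()


def checksum_files(paths, processes=1):
    """Hash every file in paths; use a worker pool when processes > 1."""
    if processes > 1:
        with multiprocessing.Pool(processes=processes) as pool:
            return dict(pool.map(sha256_of_file, paths))
    return dict(map(sha256_of_file, paths))


if __name__ == "__main__":
    # On Windows each worker process re-imports this module, so anything
    # that starts a pool has to sit behind this guard.
    paths = sys.argv[1:]
    if paths:
        for path, digest in checksum_files(paths, processes=4).items():
            print(digest, path)
```

Hashing is largely I/O-bound, so the gain from extra processes depends on the disk as much as on the number of cores.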

Unfortunately following this approach leads to multiple errors (I don't know where to attach the screenshot, but I have one) and NO result at all.

As a possible fix I changed every occurrence of "processes=1" to "processes=4" in the script, but that didn't help; it just produced different error messages.

Could one of you point me to the correct usage (or syntax), please?

Thank you ever so much!
Michael
First of all, show the exact command you are using. Note that --processes is a CLI option; it takes no equals sign like the one you show.
Second, copy and paste the error inside error tags (see the BBCode help for more info).
sonhospa Wrote:Unfortunately following this approach leads to multiple errors (I don't know where to attach the screenshot, but I have one) and NO result at all.
These error messages that you have are the starting point to solving the problem. Find a way to add them to a post. You could perhaps redirect the command's output to a file and then copy and paste that file's content. Use error tags to post.
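One way to capture everything a command prints, sketched here in Python with the standard subprocess module. The helper name run_and_capture and the log file name are made up for illustration; a shell redirect would do the same job.

```python
import subprocess
import sys


def run_and_capture(command, log_path):
    """Run command, write its stdout and stderr to log_path, return the exit code."""
    with open(log_path, "wb") as log:
        completed = subprocess.run(command, stdout=log, stderr=subprocess.STDOUT)
    return completed.returncode


if __name__ == "__main__":
    # Hypothetical invocation; replace the script and bag path with your own.
    code = run_and_capture(
        [sys.executable, "bagit.py", "--processes", "4", r"path\to\bag"],
        "bagit_output.txt",
    )
    print("exit code:", code)
```

The resulting text file can then be opened in an editor and the relevant part pasted into a post.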
Hello Buran and Gribouillis,

thank you for your help. I hope adding a "New Reply" is the right way here, as I couldn't see an individual reply button.

Buran, I used the command line exactly as given in the quote above (just with my own path instead of the sample path). Only after this attempt failed did I, as an alternative experiment, change the Python script (bagit.py), where the equals sign is part of several Python statements; that's where I changed "processes=1" to "processes=4". You can see that in the GitHub link I provided.

Now the errors:
a) First attempt before changing the script: "bagit.py --processes 4 path\to\bag"
Error:
2020-07-01 14:34:35,122 - INFO - Creating bag for directory E:\bagit-master\test-data\loc
2020-07-01 14:34:35,124 - INFO - Creating data directory
2020-07-01 14:34:35,124 - INFO - Moving data to E:\bagit-master\test-data\loc\tmpkstm4gie\data
2020-07-01 14:34:35,125 - INFO - Moving E:\bagit-master\test-data\loc\tmpkstm4gie to data
2020-07-01 14:34:35,126 - INFO - Using 4 processes to generate manifests: sha256, sha512
Traceback (most recent call last):
  File "C:\Users\Michael\AppData\Local\Programs\Python\Python37\lib\site-packages\pkg_resources\_vendor\packaging\requirements.py", line 90, in __init__
    req = REQUIREMENT.parseString(requirement_string)
  File "C:\Users\Michael\AppData\Local\Programs\Python\Python37\lib\site-packages\pkg_resources\_vendor\pyparsing.py", line 1654, in parseString
    raise exc
  File "C:\Users\Michael\AppData\Local\Programs\Python\Python37\lib\site-packages\pkg_resources\_vendor\pyparsing.py", line 1644, in parseString
    loc, tokens = self._parse( instring, 0 )
This is only a small part, because the error messages run endlessly (i.e. hours!). I interrupted the process after a few seconds and have a file of 7000 lines already.

b) Second attempt: I changed the script (at 4 positions) from "processes=1" to "processes=4" and renamed it. Command: "bagitprocesses4.py path\to\bag"
Error:
I couldn't reproduce the error now!
BUT: after all the files were correctly moved to a 'data' directory and the first text files were written correctly, only one process is used. The relevant part of the output is below (the first 6,000 lines and everything after this excerpt are left out; the key message is the "Using 1 processes" line):
Quote:2020-07-01 15:08:16,775 - INFO - Moving reel_08.006042.jpg to E:\Alpha-Omega\Alpha-Omega Bilder für Tests\KAMERADSCHAFT_08_dF_Fertig_jpg\tmprkg_lbc4\reel_08.006042.jpg
2020-07-01 15:08:16,776 - INFO - Moving tagmanifest-sha256.txt to E:\Alpha-Omega\Alpha-Omega Bilder für Tests\KAMERADSCHAFT_08_dF_Fertig_jpg\tmprkg_lbc4\tagmanifest-sha256.txt
2020-07-01 15:08:16,777 - INFO - Moving tagmanifest-sha512.txt to E:\Alpha-Omega\Alpha-Omega Bilder für Tests\KAMERADSCHAFT_08_dF_Fertig_jpg\tmprkg_lbc4\tagmanifest-sha512.txt
2020-07-01 15:08:16,778 - INFO - Moving E:\Alpha-Omega\Alpha-Omega Bilder für Tests\KAMERADSCHAFT_08_dF_Fertig_jpg\tmprkg_lbc4 to data
2020-07-01 15:08:16,864 - INFO - Using 1 processes to generate manifests: sha256, sha512
2020-07-01 15:08:16,888 - INFO - Generating manifest lines for file data/reel_08.000043.jpg
2020-07-01 15:08:16,935 - INFO - Generating manifest lines for file data/reel_08.000044.jpg
2020-07-01 15:08:16,965 - INFO - Generating manifest lines for file data/reel_08.000045.jpg
2020-07-01 15:08:16,983 - INFO - Generating manifest lines for file data/reel_08.000046.jpg

So it seems that editing the script didn't produce errors like the CLI option did, but it also didn't change the number of processes and was therefore pointless.

Does that information help?
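One possible explanation for the "Using 1 processes" line, not verified against bagit.py itself: if the command-line parser supplies its own default and the script always passes that value on explicitly, editing the default in a function signature has no effect. The sketch below uses made-up names (make_manifests, build_parser) to show the pattern.

```python
import argparse


def make_manifests(paths, processes=4):
    """Hypothetical function whose default was edited from 1 to 4."""
    return "Using %d processes" % processes


def build_parser():
    """Hypothetical CLI parser that carries its own default of 1."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--processes", type=int, default=1)
    parser.add_argument("directory", nargs="+")
    return parser


def main(argv):
    args = build_parser().parse_args(argv)
    # The parsed value is always passed explicitly, so the function's own
    # default (4) is never used when the script runs from the command line.
    return make_manifests(args.directory, processes=args.processes)


if __name__ == "__main__":
    print(main(["some_bag"]))
    print(main(["--processes", "4", "some_bag"]))
```

If that is what happens here, the value would have to be changed where the parser defines its default, or supplied on the command line.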