Python Forum

Hey guys,

I have a script that basically modifies a file. In a nutshell, fileA contains some text, and the script reads fileA and uses that text to modify fileB. fileA is in a different directory than fileB, and I have many pairs like this. Right now I've hardcoded the paths to fileA and fileB in the script, but I need the script to iterate over multiple pairs of paths that contain fileA and fileB. Hopefully this makes sense.

Example:

/fruit/oranges/fileA /veggies/tomato/fileB
/fruit/banana/fileA /veggies/lettuce/fileB
/fruit/apple/fileA /veggies/carrot/fileB

Would os.walk come into play here? I'm sure this is a basic task, but I'm new to scripting and I don't know what technical terms to search for. Any guidance would be appreciated.


I don't think os.walk would work here. It could find all the fileA's and all the fileB's, but how would it know that the oranges fileA matches with the tomato fileB?

The matching pairs would need to be hard-coded or put into a data file that can be read. If your folder structure is consistent, you could loop over the pairs:

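pairs = [('oranges', 'tomato'), ('banana', 'lettuce'), ('apple', 'carrot')]

def file_mod(pathA, pathB):
    with open(pathA) as fileA:            # reference text
        ref_text = fileA.read()
    with open(pathB) as fileB:            # read first: 'w' truncates the file
        contents = fileB.read()
    # modify contents using ref_text...
    with open(pathB, 'w') as fileB:
        fileB.write(contents)

for nameA, nameB in pairs:
    file_mod('/fruit/{}/fileA'.format(nameA), '/veggies/{}/fileB'.format(nameB))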
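If you'd rather not hard-code the pairs, here is a minimal sketch of the data-file approach. It assumes a hypothetical two-column CSV file (called pairs.csv here) with one fruit/veggie folder pair per row, e.g. `oranges,tomato`; parse_pairs and load_pairs are made-up helper names.

```python
import csv

def parse_pairs(lines):
    """Turn CSV lines like 'oranges,tomato' into (nameA, nameB) tuples.

    `lines` can be an open file or any iterable of strings.
    """
    pairs = []
    for row in csv.reader(lines):
        # csv.reader yields [] for blank lines; skip those and short rows
        if len(row) >= 2:
            pairs.append((row[0].strip(), row[1].strip()))
    return pairs

def load_pairs(csv_path):
    # newline='' is what the csv module docs recommend when opening CSV files
    with open(csv_path, newline='') as f:
        return parse_pairs(f)
```

The loop then becomes `for nameA, nameB in load_pairs('pairs.csv'):` followed by the same `file_mod(...)` call, so adding a new pair means editing the CSV file rather than the script.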


Maybe you could post some of the contents of fileA and fileB, along with your script.

My line of thinking: the paths stored in fileA and fileB should be relative. Then keep one absolute base directory, stored in your script, in an environment variable, or in a config file, and join the two with os.path.join(base_path, path_from_text_file). This avoids building paths with string interpolation, and it is OS independent. But if the files already exist with absolute paths (and have, for example, 16k lines), you'll have to handle them the way ichabod801 has shown. My advice is always: don't put absolute paths in files that you want to process later. If you move the files to a different place and all the paths are absolute, your script has to handle that too, or you have to work around it with a symlink.
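A minimal sketch of the relative-path idea. The environment variable name DATA_BASE_DIR, its default value, and the resolve helper are all hypothetical; the point is that the text files only hold relative paths and a single base directory is joined on at runtime.

```python
import os

# Hypothetical: base directory from an environment variable named
# DATA_BASE_DIR, falling back to a default when it isn't set
BASE_DIR = os.environ.get('DATA_BASE_DIR', '/some/dir')

def resolve(relative_path, base_dir=BASE_DIR):
    """Join a relative path read from a text file onto the base directory."""
    # os.path.join silently discards base_dir if relative_path is absolute,
    # so reject absolute paths explicitly
    if os.path.isabs(relative_path):
        raise ValueError('expected a relative path, got {!r}'.format(relative_path))
    return os.path.join(base_dir, relative_path)
```

Each line read from a file would go through `resolve(line.strip())`. Moving the whole tree then only means changing DATA_BASE_DIR, not rewriting the files.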

This is my current code. The refFile will be in /some/dir/path(A-Z)/file, and the filethatneedsmodified will be in /some/other/dir/path(A-Z)/filethatneedsmodified. I'm not sure how to iterate or loop over these dirs, and creating an individual script for each file/dir obviously isn't the best way to do it. Can I use os.path.join or something similar to build the two paths, since both file names stay the same across the different directories?

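import fileinput
import sys

def insert_to_line(f_name, theList):
    # inplace=True redirects sys.stdout into f_name, so every write
    # below becomes the new content of that line
    for line in fileinput.input(f_name, inplace=True):
        # Comment out lines containing any reference item, unless already commented
        if any(item in line for item in theList) and not line.lstrip().startswith('#'):
            sys.stdout.write('# {}'.format(line))
        else:
            sys.stdout.write(line)

if __name__ == '__main__':
    with open("/some/dir/path/file", 'r') as refFile:
        # Skip blank lines: '' is a substring of every line and would match everything
        theList = [item for item in refFile.read().splitlines() if item.strip()]
    f_name = "/some/other/dir/path/filethatneedsmodified"
    insert_to_line(f_name, theList)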
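To answer the os.path.join question directly: yes, if only the folder name changes, you can build both paths from it. A minimal sketch, assuming the two roots from the example paths above and a hypothetical list of folder names (path_pairs is a made-up helper):

```python
import os

# Roots from the example paths above (assumed; adjust to your real layout)
REF_ROOT = '/some/dir'
TARGET_ROOT = '/some/other/dir'

def path_pairs(names, ref_root=REF_ROOT, target_root=TARGET_ROOT):
    """Yield (ref_path, target_path) for each subdirectory name.

    Both file names stay the same; only the folder name changes.
    """
    for name in names:
        yield (os.path.join(ref_root, name, 'file'),
               os.path.join(target_root, name, 'filethatneedsmodified'))

# Hypothetical folder names; os.listdir(REF_ROOT) could supply these instead
folders = ['pathA', 'pathB', 'pathC']
for ref_path, f_name in path_pairs(folders):
    print(ref_path, '->', f_name)
```

In the real script, the print would be replaced by the same two steps as in the `__main__` block: read ref_path into theList, then run insert_to_line on f_name.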

Given a refFile, how do you know what path to use for the modified file? Is it in any way predictable? Or is it like your original post, where the two are completely unrelated?

Sorry, I'm most likely not explaining it well. The name of the refFile will always be the same, but will be in different paths. There is a corresponding file that needs to be modified that will also be named the same, also in different paths. So /this/path/to/A/refFile will correspond to /this/other/different/path/to/A/filetomodify. Next will be /this/path/to/B/refFile and will correspond to /this/other/different/path/to/B/filetomodify, and so on. The reference file is always named the same, and that will correspond to a file that needs to be modified, which will also be named the same throughout the different paths.

/this/path/to/A/refFile --> /this/other/different/path/to/A/filetomodify
/this/path/to/B/refFile --> /this/other/different/path/to/B/filetomodify
/this/path/to/C/refFile --> /this/other/different/path/to/C/filetomodify
/this/path/to/D/refFile --> /this/other/different/path/to/D/filetomodify
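Since only the part between the root and the file name changes, the mapping can be written with pathlib's relative_to(). A minimal sketch, assuming the two roots shown in the example; target_for is a hypothetical helper name, and PurePosixPath is used only so the path arithmetic behaves the same on any OS without touching the disk:

```python
from pathlib import PurePosixPath

# Assumed roots, taken from the example mapping above
REF_ROOT = PurePosixPath('/this/path/to')
TARGET_ROOT = PurePosixPath('/this/other/different/path/to')

def target_for(ref_path, ref_root=REF_ROOT, target_root=TARGET_ROOT):
    """Map <ref_root>/<sub>/refFile to <target_root>/<sub>/filetomodify."""
    ref_path = PurePosixPath(ref_path)
    # relative_to() strips the root, leaving e.g. 'A'
    # (it raises ValueError if ref_path isn't under ref_root)
    sub_dir = ref_path.parent.relative_to(ref_root)
    return target_root / sub_dir / 'filetomodify'

print(target_for('/this/path/to/A/refFile'))
# /this/other/different/path/to/A/filetomodify
```

A side benefit of relative_to() is that deeper layouts such as /this/path/to/x/y/refFile map to .../x/y/filetomodify without any extra code.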

I don't think you need os.walk(), since you're not recursing into those directories (it would work, it's just more complicated than you need). glob.glob() could do this just fine.

import glob
import pathlib

destination = "/this/other/different/path/to/{0}/filetomodify"

for ref in glob.glob("/this/path/to/*/refFile"):
    path = pathlib.Path(ref)
    folder = path.parent.name  # e.g. "A" for /this/path/to/A/refFile
    with open(ref) as refFile:
        # Note: "w" truncates filetomodify before it is rewritten
        with open(destination.format(folder), "w") as file_to_modify:
            for line in refFile:
                # line formatting here
                # line already ends in "\n", so suppress print's own newline
                print(line, end="", file=file_to_modify)
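A sketch that puts the pieces together: the glob loop above, combined with the in-place commenting from the earlier insert_to_line script, instead of overwriting filetomodify with the refFile lines. comment_out_matches and process_all are hypothetical names, the two roots are passed in as parameters, and skipping folders that have no matching filetomodify is an assumption about what you'd want.

```python
import fileinput
import glob
import os

def comment_out_matches(target_path, items):
    """Prefix '# ' to lines of target_path that contain any item (in place)."""
    # With inplace=True, print() output is redirected into the file being read
    with fileinput.FileInput(target_path, inplace=True) as f:
        for line in f:
            if any(item in line for item in items) and not line.lstrip().startswith('#'):
                print('# ' + line, end='')
            else:
                print(line, end='')

def process_all(ref_root, target_root,
                ref_name='refFile', target_name='filetomodify'):
    """For every <ref_root>/<X>/refFile, edit <target_root>/<X>/filetomodify."""
    for ref in glob.glob(os.path.join(ref_root, '*', ref_name)):
        folder = os.path.basename(os.path.dirname(ref))  # e.g. 'A'
        target = os.path.join(target_root, folder, target_name)
        if not os.path.exists(target):
            continue  # no matching file to modify; skip this folder
        with open(ref) as ref_file:
            # Ignore blank lines: '' is a substring of every line
            items = [s for s in ref_file.read().splitlines() if s.strip()]
        comment_out_matches(target, items)
```

It would be called once as `process_all('/this/path/to', '/this/other/different/path/to')`. Using fileinput.FileInput directly (rather than fileinput.input) avoids the module-level global state, so the function can safely be called many times.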

Thanks for the help.

Maybe it's something I'm doing wrong, but it doesn't seem to work with destination = "/this/other/different/path/to/{0}/filetomodify". If I put that path into the glob with /*/ instead, I'm able to print the files. It seems it doesn't like the {0}.

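{0} is str.format() syntax, which is why this line contains .format(): with open(destination.format(folder), "w") as file_to_modify:. glob doesn't understand {0}; it only knows wildcards such as *, so passing the template straight to glob makes it look for a folder literally named {0}.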
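A small illustration of the difference, reusing the destination template from the code above: format() fills the {0} placeholder with a folder name, and formatting it with "*" turns the same template into a pattern glob can use.

```python
import glob

destination = "/this/other/different/path/to/{0}/filetomodify"

# str.format() substitutes the folder name into the {0} placeholder
print(destination.format("A"))
# /this/other/different/path/to/A/filetomodify

# glob only understands wildcards such as *, ? and [...], so substitute
# '*' for the placeholder rather than passing '{0}' to glob literally
pattern = destination.format("*")
matches = glob.glob(pattern)  # a list of matching paths (empty if none exist)
```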
