Python Forum
Search multiple CSV files for a string or strings
#1
I need to search multiple CSV files for a string or set of strings and then return the name of each file it was found in.

I want to keep it as general as possible so that the search can be performed for anything within the files.

All files will reside in one folder, because they are named with the dates in the file name, like so:
Future_Matched_2022-02-19 to 2022-02-23.csv
Future_UnMatched_2022-02-19 to 2022-02-23.csv

The folder contains 2 files per day, all the way back to November. Not that it matters, but it gives context on how many files there currently are, and there will be plenty more by the end of the year.

So my need is to build a script that I can plug a value into (numerical for this first iteration), a 6-digit number, for example:

130062

Using that number, I need to search each file in the folder and, if it's found in any file, provide a list of all the files the number was found in so we can pull those files out and investigate further.

So before I go down the rabbit hole of trying all sorts of things: what should I start with? Which methods should I consider and use?
What's most efficient for such a search? I'm not super concerned with speed now, but if I can script it so it's quick now and also later, when there is a year's worth of files to look through, that would be great.

Any guidance on where to start or what to consider would be great. Keep in mind that I don't have to do anything else with the files; I just need to find any files that contain the string provided.
Reply
#2
So here is my first attempt, and it seems to be working. I just need to tweak it a bit and get all the CSVs into the same folder for this to work like I expect.

import csv, os, glob

# keyword to search for (parentheses alone do not make a tuple, so this is a plain string)
keywords = '113006249'

path = r'F:\VS Projects\*.csv'


for Tname in glob.glob(path):
    # open the full path returned by glob, not just the base name,
    # so the script does not depend on the current working directory
    with open(Tname, "r", encoding='utf-8', newline='') as f:
        # read csv, splitting each line on ","
        csv_file = csv.reader(f, delimiter=",")

        # loop through the rows
        for row in csv_file:
            # if the row's first value equals the keyword, print the file name
            if row and keywords == row[0]:
                print(os.path.basename(Tname))
                break  # one match is enough to list the file
Reply
#3
A CSV is a text file.

Quote: So I have a need to be able to search multiple CSV files for a string or set of strings and then return the file name that it was found in.

Why bother with the csv module?

from pathlib import Path

mypath = '/home/pedro/Downloads/'
mydir = Path(mypath)
# collect every .csv file directly inside the folder
filelist = [filename for filename in mydir.iterdir() if filename.is_file() and filename.suffix == '.csv']
search4str = input('Enter the string you want to find ... ')
# in this case search4str = '魏敏'
files_with_search4str = []
for f in filelist:
    with open(f) as acsvfile:
        # read the whole file as one string and do a plain substring search
        datastring = acsvfile.read()
        if search4str in datastring:
            print('found search4str', search4str, 'in', f.name)
            # save the file path
            files_with_search4str.append(f.resolve())
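
The same plain-text idea extends to a set of strings, and reading line by line lets the search stop at the first hit instead of loading each whole file. A minimal sketch, assuming the files are UTF-8; the function name files_containing and its parameters are made up for illustration:

```python
from pathlib import Path


def files_containing(folder, needles, pattern="*.csv", encoding="utf-8"):
    """Return names of files in `folder` that contain at least one of `needles`."""
    matches = []
    for path in sorted(Path(folder).glob(pattern)):
        # errors="replace" keeps a stray badly encoded byte from aborting the scan
        with open(path, encoding=encoding, errors="replace") as handle:
            for line in handle:
                if any(needle in line for needle in needles):
                    matches.append(path.name)
                    break  # one hit is enough; move on to the next file
    return matches
```

Called as files_containing('/home/pedro/Downloads/', ['130062', '魏敏']), it returns a list of file names. A string that spans a line break would be missed by this line-by-line version.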
Reply
#4
Can you explain why not the csv module? It seems that if I remove all the empty lines, both solutions are about the same number of lines of code. Is the csv module outdated? Is it not efficient enough for a larger number of files in a folder?

I am open to all options and suggestions; I just want to understand why to go with one vs. the other.
Reply
#5
Does it have to be Python? It seems to me that it's simpler on the command line. In bash:

> grep -l "130062" *.csv
It returns the names of the files in the current directory (without recursion) that contain the searched string; these can be piped or written to a file. One can invoke this from Python using the subprocess module.
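
A rough sketch of the subprocess route, assuming grep is available on the PATH (on Windows that usually means Git Bash, WSL, or similar). The function name grep_files is made up for illustration. Python expands the *.csv pattern itself because no shell is involved:

```python
import glob
import os
import subprocess


def grep_files(needle, folder, pattern="*.csv"):
    """Return the files in `folder` matching `pattern` that contain `needle`."""
    files = sorted(glob.glob(os.path.join(folder, pattern)))
    if not files:
        return []
    # -l prints only the names of matching files; -F treats the needle as a
    # fixed string rather than a regular expression; -- ends option parsing
    result = subprocess.run(
        ["grep", "-l", "-F", "--", needle, *files],
        capture_output=True,
        text=True,
    )
    # grep exits with 0 on a match, 1 on no match, and 2 on an error
    if result.returncode > 1:
        raise RuntimeError(result.stderr.strip())
    return result.stdout.splitlines()
```

Passing the arguments as a list avoids quoting problems with file names that contain spaces, such as the "2022-02-19 to 2022-02-23" names in this thread.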
Reply
#6
So I finally got all 200+ attachments into a folder, and running the search there works. Now I'm trying to write the found results to a text file so I can send it to others for review.

The updated logic doesn't seem to run, or if it does, it never writes anything and raises no errors.

import csv, os, glob

# input keyword to search
keywords = '113006249'

path = r'F:\Attachments\*.csv'

for Tname in glob.glob(path):

    # read csv, and split each line on ","
    csv_file = csv.reader(open(Tname, "r", encoding='utf-8'), delimiter=",")

    # loop through the csv rows
    for row in csv_file:
        with open(r'F:\Attachments\Results\results.txt', 'w') as f:
            if keywords == row[0]:
                # print(row)  # if I just print the results to the console, I get what I need
                # print(os.path.basename(Tname))  # same here, printing works

                # but when I try to write the results to a file, I don't get anything
                f.write(' '.join(row))
                # this is also something I need in the file, so we know which attachment the number was found in
                # f.write(os.path.basename(Tname))
Reply
#7
Figured out the correct code. I have now set it up to save to a CSV with the appropriate data I need.
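
The final code was not posted, so the following is only a sketch of one way it could look, not the version actually used. The attempt in #6 opened results.txt in 'w' mode inside the row loop, which truncates the file again on every row; here the output file is opened once, before the loops. The function name write_matches and the column layout (file name first, then the matching row) are assumptions:

```python
import csv
import glob
import os


def write_matches(folder, keyword, out_path):
    """Write every row whose first field equals `keyword` to `out_path`,
    prefixed with the name of the file it came from. Returns the row count."""
    count = 0
    # open the results file once, outside the loops, so it is not truncated
    # again for every row
    with open(out_path, "w", encoding="utf-8", newline="") as out:
        writer = csv.writer(out)
        for name in sorted(glob.glob(os.path.join(folder, "*.csv"))):
            with open(name, "r", encoding="utf-8", newline="") as src:
                for row in csv.reader(src):
                    # `row and` skips blank lines, which csv.reader yields as []
                    if row and row[0] == keyword:
                        writer.writerow([os.path.basename(name), *row])
                        count += 1
    return count
```

Keeping the output in a separate folder, such as the Results subfolder used above, prevents the results file from being picked up by the *.csv scan.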
Reply
#8
No particular reason for not using csv; it just seems redundant to loop through each row. But there is more than one way to skin a cat and more than one way to Python a solution!

perfringo's grep seems much simpler! I like simple!
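
One practical difference between the approaches is worth keeping in mind: a plain substring search (and grep) also matches the number inside a longer number, while the csv approach compares whole fields. A small sketch with made-up sample data; the helper names are hypothetical:

```python
import csv
import io

# made-up sample data: the first data row only contains 130062 inside a longer number
sample = "id,amount\n1130062,10\n130062,20\n"


def substring_hit(text, needle):
    """Plain text search: true if `needle` appears anywhere in the text."""
    return needle in text


def field_hits(text, needle):
    """csv-based search: rows in which some field equals `needle` exactly."""
    return [row for row in csv.reader(io.StringIO(text)) if needle in row]


print(substring_hit("id,amount\n1130062,10\n", "130062"))  # True, a partial match
print(field_hits("id,amount\n1130062,10\n", "130062"))     # [], no field is exactly 130062
print(field_hits(sample, "130062"))                        # [['130062', '20']]
```

Which behaviour is wanted depends on whether partial matches count as hits for the investigation.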
Reply

