Python Forum
keyword matching part2
#1
My script needs to read an Excel file whose column A contains a set of keywords (let's say 100 of them). After reading the keywords from column A, the script should search for them under a particular path on the D: drive, in all folders and subfolders. These contain miscellaneous file types (.xml, .txt, .html, .yml, .sh, etc.).

Suppose the first keyword is found at a specific line in one file, and again at a different line in another file. In that case it was found 2 times in total, at different line numbers in different files. The same keyword may also occur more than once in the same file, either on the same line or on different lines. So for each keyword I finally need these statistics:
1) the names of the files where it was found;
2) the line numbers at which it was found in those files (a file can have several, depending on how often the keyword occurs in it);
3) the total count of that keyword across all files, folders, and subfolders under the given path on the drive.

After reading the Excel file, the script should also write these details to an HTML file, presented as a graph:
1) the file names matching each keyword;
2) all the line numbers where it was found, along with the file names;
3) the total count for each keyword across all files, folders, and subfolders on the D: drive (which holds different types of files).

Could anyone help with code to achieve this?
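Here is a minimal sketch of the search described above. It assumes the keywords are already loaded into a Python list of strings, and that every file can be read as UTF-8 text (undecodable bytes are skipped). The function name `find_keywords` and the shape of the returned dictionary are illustrative choices, not an existing API:

```python
import os
from collections import defaultdict


def find_keywords(keywords, root_dir):
    """Scan every file under root_dir for each keyword.

    Returns {keyword: {"files": {path: [line numbers]}, "total": count}}.
    A line number is listed once per occurrence, so a keyword that appears
    twice on line 3 contributes [3, 3] and adds 2 to the total.
    """
    # drop empty cells: an empty string would match at every position
    keywords = [kw for kw in keywords if kw]
    results = {kw: {"files": defaultdict(list), "total": 0} for kw in keywords}
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                # errors="ignore" drops bytes that are not valid UTF-8
                with open(path, encoding="utf-8", errors="ignore") as fh:
                    for line_no, line in enumerate(fh, start=1):
                        for kw, info in results.items():
                            hits = line.count(kw)  # case-sensitive substring count
                            if hits:
                                info["files"][path].extend([line_no] * hits)
                                info["total"] += hits
            except OSError:
                continue  # unreadable file (permissions, locks): skip it
    return results
```

With the keywords from column A in a list, a call such as `find_keywords(keywords, r"D:\mainfolder\subfolders")` returns the files, line numbers, and total count for each keyword. Matching is a plain substring test, so "pass" would also match inside "password". Whole-word matching would need a regular expression with word boundaries instead.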

Desired result: for each keyword, the code should produce a record like [keyword1, found in file name (at these lines), found in file name (at this line), (total count of keyword1)]. This applies to every keyword listed in column A of the Excel file, and the results should be shown as a graph in the HTML file. Clicking the pie chart or graph should tell us which file the keyword occurred in and at which line number. After the scan completes, selecting a file where the keyword was found should highlight that line.
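One possible way to produce the HTML page is sketched below using only the standard library. It assumes the matches are already collected in a dictionary of the form `{keyword: {file path: [line numbers]}}`, and `build_html_report` is a hypothetical name. The page draws a plain CSS bar chart rather than a pie chart, so no charting library is needed. Clicking a keyword expands its files and line numbers, and each file link jumps to a copy of that file with the matching lines highlighted:

```python
import html


def build_html_report(matches, out_path):
    """Write a stand-alone HTML report.

    matches: assumed shape {keyword: {file path: [line numbers]}}.
    """
    totals = {kw: sum(len(nums) for nums in files.values())
              for kw, files in matches.items()}
    biggest = max(totals.values(), default=0) or 1   # avoid division by zero
    file_ids = {}                                     # file path -> anchor id
    out = ["<!DOCTYPE html>",
           "<html><head><meta charset='utf-8'><title>Keyword report</title>",
           "<style>.bar{display:inline-block;height:1em;background:#4a90d9}</style>",
           "</head><body><h1>Keyword report</h1>"]
    # one collapsible entry per keyword: name, total and a bar scaled to the total
    for kw, files in matches.items():
        width = 300 * totals[kw] // biggest
        out.append(f"<details><summary>{html.escape(kw)} ({totals[kw]}) "
                   f"<span class='bar' style='width:{width}px'></span></summary><ul>")
        for path, nums in files.items():
            fid = file_ids.setdefault(path, f"file{len(file_ids)}")
            lines = ", ".join(map(str, nums))
            out.append(f"<li><a href='#{fid}'>{html.escape(path)}</a>: lines {lines}</li>")
        out.append("</ul></details>")
    # a copy of each matching file, with every matching line wrapped in <mark>
    for path, fid in file_ids.items():
        hot = {n for files in matches.values() for n in files.get(path, ())}
        out.append(f"<h2 id='{fid}'>{html.escape(path)}</h2><pre>")
        with open(path, encoding="utf-8", errors="ignore") as src:
            for n, line in enumerate(src, start=1):
                text = f"{n:5}: " + html.escape(line.rstrip("\n"))
                out.append(f"<mark>{text}</mark>" if n in hot else text)
        out.append("</pre>")
    out.append("</body></html>")
    with open(out_path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(out))
```

If a real pie chart is required, the same per-keyword totals could be passed to a JavaScript charting library instead of the CSS bars.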

Thanks
#2
What have you tried? We are glad to help, but we are not going to do it for you.
If you can't explain it to a six year old, you don't understand it yourself, Albert Einstein
How to Ask Questions The Smart Way: link and another link
Create MCV example
Debug small programs

#3
I tried the code below, but it doesn't seem to work:

import os
import glob

def search_words(keyword,target_dir):
    files = glob.glob(target_dir+'/**', recursive=True)
    python_files = []
    results = []
    line_no = []
    #Isolate target files from folders and everything else
    for f in files:
        if f.endswith('.py'):
            python_files.append(f)

    for pyf in python_files:
        with open(pyf,'rb') as f:
            lines = f.readlines()
        for i,line in enumerate(lines):
            line = str(line)
            if line.find(keyword) > -1:
                line_no.append(i)

        results.append({'keyword':keyword,'lines':line_no,'target_file':pyf,'total_found':len(line_no)})
    return results
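Below is a corrected sketch of the `search_words` function above. The changes are assumptions about what was intended:
- `line_no` is reset for every file. In the original, a single list is shared by all result entries and keeps growing across files.
- Files are opened in text mode, so lines are `str` rather than `b'...'` reprs.
- Line numbers start at 1.
- Several extensions are accepted. The `EXTENSIONS` tuple is an assumed list taken from the file types in post #1.
- Files without a match are left out.
- `total_found` counts occurrences rather than matching lines.

```python
import glob
import os

# Assumed list of extensions, taken from the file types mentioned in post #1.
EXTENSIONS = (".py", ".xml", ".txt", ".html", ".yml", ".sh")


def search_words(keyword, target_dir, extensions=EXTENSIONS):
    """Return one result dict per file under target_dir that contains keyword."""
    if not keyword:          # an empty keyword would "match" everywhere
        return []
    results = []
    for path in glob.glob(os.path.join(target_dir, "**"), recursive=True):
        # isolate target files from folders and everything else
        if not os.path.isfile(path) or not path.endswith(extensions):
            continue
        line_no = []         # fresh list for every file
        # text mode yields str lines instead of bytes, so no b'...' reprs
        with open(path, encoding="utf-8", errors="ignore") as f:
            for i, line in enumerate(f, start=1):          # 1-based line numbers
                line_no.extend([i] * line.count(keyword))  # once per occurrence
        if line_no:          # skip files without a match
            results.append({"keyword": keyword, "lines": line_no,
                            "target_file": path, "total_found": len(line_no)})
    return results
```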
#4
How exactly is it not working?
#5
# importing required packages

import errno
import glob
import os
import time
import warnings
from collections import Counter

# opening excel file
# note: xlrd 2.0 and later read only .xls files; reading .xlsx needs xlrd < 2.0 or openpyxl
import xlrd

warnings.filterwarnings("ignore")
timestr = time.strftime("%Y%m%d-%H%M%S")

# path where all the folders and sub folders need to be searched.

yourpath = "D:\\mainfolder\\subfolders"


# location of the Excel file containing all the keywords to be searched for in the above path.
loc = ("D:\\sample.xlsx")

cnt = Counter()
wb = xlrd.open_workbook(loc) 
sheet = wb.sheet_by_index(0)
rows = sheet.nrows

excel_word=[]
# loop to pick up all the keywords from one column of the sheet, one after another, skipping the header row (column index 0 = A, 1 = B).

for i in range(1,rows):
    excel_word.append(sheet.cell_value(i,1))

# report to be generated in this location.
report_txt="D:\\mainfolder\\report"+timestr+".txt"

# report is opened in write mode.

FO = open(report_txt, 'w')

# structure layout of text file where records will be written.

str3="|"+"Pattern"+" "*(20-len("Pattern"))+"|"+"Vulnerability in file"+" "*(200-len("Vulnerability in file"))+"|"+"Line No"+" "*(10-len("Line No"))+"|" +"\n"
FO.write(str3)
FO.write("--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------"+"\n")

# main logic 

for root, dirs, files in os.walk(yourpath, topdown=False):
    
    line=0
    for name in files:
        
        path=os.path.join(root, name)
        files = glob.glob(path)
        
        for name in files:
                
                
                try:
                    with open(name,encoding="utf8",errors='ignore') as f:
                        
                        text_string1 = f.read()
                        for i in range(0,len(excel_word)):
                            str2=''
                            for num, line in enumerate(name, 1):
                                if excel_word[i] in text_string1:
                                    cnt[excel_word[i]] += 1
                            
                                    str2="|"+excel_word[i]+" "*(20-len(excel_word[i]))+"|"+os.path.join(root, name)+" "*(200-len(os.path.join(root, name)))+"|"+str(num)+" "*(10-len(str(num)))+"|" +"\n"
                                
                            else:
                                cnt[excel_word[i]]+=0
                            FO.write(str2)
                            
                            
                                          
                except IOError as exc:
                    if exc.errno != errno.EISDIR:
                        raise

FO.close()


I have tried to add helpful comments to the code above.
I am not posting the output, because the line number column shows incorrect line numbers for the keywords found in the respective files.
Could you please advise how to correct the code above so that it reports the correct line number for each keyword found in the files under the given path?

Thanks
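The sketch below shows how the main loop of the script above could report real line numbers while keeping the same fixed-width report layout. It makes three changes:
- It enumerates the lines of the open file object instead of the file name.
- It tests each line rather than the whole file text.
- It writes a row inside the loop for every line where a keyword appears.

`write_report` is a hypothetical helper that accepts any writable text file object. Unlike the original, it skips every unreadable file instead of re-raising non-EISDIR errors.

```python
import os
from collections import Counter


def write_report(keywords, root_dir, fo):
    """Scan every file under root_dir and write one '|pattern|file|line|' row
    to the text file object fo for each line on which a keyword appears.

    Returns a Counter with the total number of occurrences of each keyword.
    """
    keywords = [kw for kw in keywords if kw]      # ignore empty Excel cells
    cnt = Counter({kw: 0 for kw in keywords})
    # same fixed-width layout as the original: 20 / 200 / 10 characters
    fo.write(f"|{'Pattern':<20}|{'Vulnerability in file':<200}|{'Line No':<10}|\n")
    fo.write("-" * 234 + "\n")
    for root, _dirs, files in os.walk(root_dir):
        for name in files:
            path = os.path.join(root, name)
            try:
                with open(path, encoding="utf8", errors="ignore") as f:
                    # enumerate the open file (its lines), not the file name
                    for num, line in enumerate(f, 1):
                        for word in keywords:
                            hits = line.count(word)
                            if hits:
                                cnt[word] += hits
                                # write inside the loop: one row per matching line
                                fo.write(f"|{word:<20}|{path:<200}|{num:<10}|\n")
            except OSError:
                continue                          # unreadable file: skip it
    return cnt
```

With the variables from the script, this would be called as `with open(report_txt, "w") as FO: cnt = write_report(excel_word, yourpath, FO)`. The format spec `{word:<20}` pads on the right just like `word+" "*(20-len(word))`.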
#6
Hi Experts,

Any updates on this, please?

Thanks
#7
Any updates, please?
#8
Is there any Python expert on this forum who could help with this, please?

Thanks
#9
This will write only the last record (str2), because the write statement is outside the for loop:

                            for num, line in enumerate(name, 1):
                                if excel_word[i] in text_string1:
                                    cnt[excel_word[i]] += 1
                             
                                    str2="|"+excel_word[i]+" "*(20-len(excel_word[i]))+"|"+os.path.join(root, name)+" "*(200-len(os.path.join(root, name)))+"|"+str(num)+" "*(10-len(str(num)))+"|" +"\n"
                                 
                            else:
                                cnt[excel_word[i]]+=0
                            FO.write(str2) 
#10
(Dec-28-2018, 02:56 AM) woooee wrote: This will write only the last record (str2), because the write statement is outside the for loop:

                            for num, line in enumerate(name, 1):
                                if excel_word[i] in text_string1:
                                    cnt[excel_word[i]] += 1
                             
                                    str2="|"+excel_word[i]+" "*(20-len(excel_word[i]))+"|"+os.path.join(root, name)+" "*(200-len(os.path.join(root, name)))+"|"+str(num)+" "*(10-len(str(num)))+"|" +"\n"
                                 
                            else:
                                cnt[excel_word[i]]+=0
                            FO.write(str2) 
If it were outside the for loop, it would not write anything, but it does write. The main problem is that the line numbers are not correct, and I can't understand why it picks up incorrect line numbers.

Is there any way to upload an output file (a text file) to this forum?
If yes, how do I upload it here, so that everyone can see that the text report generated by the code above contains wrong line numbers?


Thanks
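Tracing the inner loop of post #5 by hand suggests where the wrong numbers come from. `enumerate(name, 1)` walks the characters of the file path, not the lines of the file. Meanwhile, `excel_word[i] in text_string1` tests the whole file on every pass. So whenever a keyword occurs anywhere in a file, `str2` is rebuilt on every pass. The last version, which is the one written after the loop, carries `num == len(name)`. The counter is likewise incremented once per character of the path. Below is a reduced reproduction with hypothetical names and data:

```python
def buggy_line_number(name, text, keyword):
    """Mimic the inner loop of post #5 (simplified): it walks the file *name*."""
    found = None
    for num, _char in enumerate(name, 1):  # one pass per character of the path
        if keyword in text:                # same whole-file test on every pass
            found = num                    # overwritten each time: last one wins
    return found


def line_numbers(text, keyword):
    """Walk the lines of the file *contents* instead (1-based)."""
    return [num for num, line in enumerate(text.splitlines(), 1) if keyword in line]


# hypothetical path and file contents, just for the demonstration
name = "D:\\mainfolder\\subfolders\\config.xml"
text = "user=admin\npassword=secret\n"
print(buggy_line_number(name, text, "password"))  # 35, i.e. len(name)
print(line_numbers(text, "password"))             # [2]
```

Iterating over the opened file (or `text_string1.splitlines()`) and testing each `line` is what yields real line numbers.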