Python Forum
How can i combine these two functions so i only open the file once?
#1
So I have to process a few different text files. For each file I process, I need to capture the runtime duration, record count, and timestamp.

Since I'm already using the scan_files() function, is it possible to incorporate the mapcount() functionality into the main function?
I mean, if I'm already opening the first text file, why not get the count returned while it's open? So far, just adding the readline logic below the strip() in the main logic isn't working.

import mmap
import os

# ROUTINE TO OPEN THE APPROPRIATE TEXT FILES TO PROCESS THE IP LIST
def scan_files():
    directory = '.'
    for entry in os.scandir(directory):
        if entry.is_file() and entry.name.endswith('.txt'):
            if 'ip_list' in entry.name:
                pt = directory + '/' + entry.name
                with open(pt) as file:
                    for ip in file:
                        yield ip.strip()

# ROUTINE TO GET FILE COUNT                        
def mapcount(filename):
    with open(filename, "r+") as f:
        buf = mmap.mmap(f.fileno(), 0)
        lines = 0
        readline = buf.readline
        while readline():
            lines += 1
        return lines  
I want to open the file once, get the count and move on to the rest of the script.
#2
There is a problem with the interface: to client code, scan_files() is nothing but an iterable of IPs. Suppose it also computes the number of lines in the files; how is it going to output that number?
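One possible answer, sketched here under assumptions: wrap the generator in a small iterable class that keeps a side channel for the counts. The class name `IPScanner` and its `counts` attribute are hypothetical and not part of the original script. Client code still sees a plain iterable of IPs, but the counts are only complete once the iteration has finished.

```python
import tempfile
from pathlib import Path


class IPScanner:
    """Iterable of IPs that also records how many lines each file had.

    Hypothetical helper for illustration, not part of the original script.
    """

    def __init__(self, directory="."):
        self.directory = Path(directory)
        self.counts = {}  # file name -> number of lines read so far

    def __iter__(self):
        for path in sorted(self.directory.glob("*ip_list*.txt")):
            self.counts[path.name] = 0
            with open(path) as file:  # each file is opened exactly once
                for line in file:
                    self.counts[path.name] += 1
                    yield line.strip()


# Small demo with throwaway files so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "ip_list1.txt").write_text("10.0.0.1\n10.0.0.2\n10.0.0.3\n")
    Path(tmp, "ip_list2.txt").write_text("10.0.0.4\n")

    scanner = IPScanner(tmp)
    ips = list(scanner)            # client code consumes the IPs as before
    counts = dict(scanner.counts)  # only complete after the iteration ends

print(ips)
print(counts)
```

Reading `scanner.counts` halfway through the loop would give a partial number, which is exactly the interface problem described above.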

Also, the time the generator takes to run depends on the client code. If the client code requests the next IP only every 10 minutes, the generator will take a very long time to consume.
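A minimal sketch of that timing effect, with made-up names (`ips_with_timing`, `stats`) and `time.sleep` standing in for slow client code: the generator does almost no work itself, yet the duration it measures is dominated by the consumer.

```python
import time


def ips_with_timing(items, stats):
    """Yield items and store the generator's total lifetime in stats."""
    start = time.perf_counter()
    for item in items:
        yield item
    # Reached only after the consumer has asked for every item.
    stats["elapsed"] = time.perf_counter() - start


stats = {}
for ip in ips_with_timing(["10.0.0.1", "10.0.0.2", "10.0.0.3"], stats):
    time.sleep(0.05)  # stand-in for slow client code, e.g. a ping

print(f"generator was alive for {stats['elapsed']:.2f} s")
```

So a runtime measured inside the generator says more about the client code than about reading the file, which is why it is usually cleaner to time the whole per-file processing step from the outside.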
#3
(Aug-13-2023, 05:00 PM)cubangt Wrote: So I have to process a few different text files. For each file I process, I need to capture the runtime duration, record count, and timestamp.
You should link to the other thread or continue there, because I guess all these additions need to work with the code already written.
Quote: Since I'm already using the scan_files() function, is it possible to incorporate the mapcount() functionality into the main function?
You have to change the scan_files() function and also fit this in with the existing code.
So a start could be something like this.
import time, os
import subprocess
from concurrent.futures import ThreadPoolExecutor
import pandas as pd

def ping(ip):
    return (
        ip,
        subprocess.run(
            f"ping {ip} -n 1", stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        ).returncode,
    )

def scan_files():
    directory = 'G:/div_code/egg/ping'
    ip_files = []
    for entry in os.scandir(directory):
        if entry.is_file() and entry.name.endswith('.txt'):
            if 'ip_list' in entry.name:
                pt = directory + '/' + entry.name
                ip_files.append(pt)
    return ip_files

if __name__ == '__main__':
    df_lst = []
    for fname in scan_files():
        start = time.time()
        with open(fname) as file:
            park = [ip.strip() for ip in file]
            executor = ThreadPoolExecutor(12)
            df = pd.DataFrame(executor.map(ping, park))
            #df.to_csv(r'ip_output.csv', header=False, index=False, quoting=None)
            print(df)
            end = time.time()
            time_used = end - start
            df_lst.append(df)
            df_lst.append(f'File <{fname}> used {time_used:.2f} sec')
            print(f'File <{fname}> used {time_used:.2f} sec')
Output:
                   0  1
0    python-forum.io  0
1        youtube.com  0
2      youtube.com99  1
3          www.vg.no  0
4  python-forum.io99  1
File <G:/div_code/egg/ping/ip_list1.txt> used 0.13 sec
                   0  1
0    python-forum.io  0
1        youtube.com  0
2      youtube.com99  1
3          www.vg.no  0
4  python-forum.io99  1
File <G:/div_code/egg/ping/ip_list2.txt> used 0.07 sec
The output above is also stored in the list df_lst.
So the time is measured for each file, and the pandas index works as the line count for each file.
Because the DataFrames in df_lst are still pandas objects, you can, for example, use count() to count the lines in a file, or divide the length of the list by 2 to get the number of files.
>>> df_lst[0]
                   0  1
0    python-forum.io  0
1        youtube.com  0
2      youtube.com99  1
3          www.vg.no  0
4  python-forum.io99  1
>>> df_lst[0].count()
0    5
1    5
dtype: int64
>>> len(df_lst) // 2
2
>>> df_lst[0].count() + df_lst[2].count()
0    10
1    10
dtype: int64
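Since the file is already read into a list in the code above, the record count is also available as the length of that list, with no second open and no mmap. The following is a sketch under assumptions: `process_file` and the `check` callback are hypothetical names, `check` stands in for ping(), pandas is left out to keep it self-contained, and blank lines are skipped, which the original code does not do.

```python
import tempfile
import time
from datetime import datetime
from pathlib import Path


def process_file(path, check):
    """Open path once and return timestamp, record count, duration, results."""
    started_at = datetime.now()
    start = time.perf_counter()
    with open(path) as file:
        # Assumption: blank lines are not records, so they are skipped here.
        ips = [line.strip() for line in file if line.strip()]
    results = [(ip, check(ip)) for ip in ips]
    return {
        "file": Path(path).name,
        "timestamp": started_at,
        "records": len(ips),  # the count comes for free from the list
        "seconds": time.perf_counter() - start,
        "results": results,
    }


def fake_check(ip):
    """Stand-in for ping(): pretend addresses ending in 99 are unreachable."""
    return 1 if ip.endswith("99") else 0


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp, "ip_list1.txt")
    sample.write_text("python-forum.io\nyoutube.com\nyoutube.com99\n")
    summary = process_file(sample, fake_check)

print(summary["file"], summary["records"], f"{summary['seconds']:.2f} sec")
```

Returning one record per file also avoids mixing DataFrames and strings in the same list, so the number of files is simply the number of records.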
#4
I don't think it applies to this problem, but you can also pass information through the iterable and the user function. In this example, I pass along the filename and the line number of each IP address in the file. The user function returns all of this together with the ping status and a timestamp.
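import subprocess
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime
from pathlib import Path

import pandas as pd


def ping(args):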
    ip, file, counter = args
    returncode = subprocess.run(
        f"ping {ip} -n 1", stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    ).returncode
    return ip, file, counter, datetime.now(), returncode


def scan_files():
    directory = Path('.')
    for name in directory.glob("*ip_list*.txt"):
        count = 0
        with open(name, "r") as file:
            for ip in file:
                count += 1
                yield ip.strip(), name.name, count


executor = ThreadPoolExecutor(125)
df = pd.DataFrame(executor.map(ping, scan_files()))
print(df)
Output:
                   0            1  2                          3  4
0    python-forum.io  ip_list.txt  1 2023-08-13 21:53:42.935575  0
1        youtube.com  ip_list.txt  2 2023-08-13 21:53:42.929570  0
2      youtube.com99  ip_list.txt  3 2023-08-13 21:53:42.910397  1
3          www.vg.no  ip_list.txt  4 2023-08-13 21:53:43.117031  0
4  python-forum.io99  ip_list.txt  5 2023-08-13 21:53:42.911398  1
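Because every yielded row carries its file name, per-file record counts can be derived from the rows afterwards. Here is a sketch of that idea, assuming a variant of scan_files() that takes the directory as a parameter and uses enumerate() instead of a manual counter. The ping step is left out, and `records_per_file` is a made-up name.

```python
import tempfile
from collections import Counter
from pathlib import Path


def scan_files(directory):
    """Yield (ip, file name, line number) for each line of each ip_list file."""
    for path in sorted(Path(directory).glob("*ip_list*.txt")):
        with open(path) as file:
            # enumerate replaces the manual "count += 1" bookkeeping
            for line_number, line in enumerate(file, start=1):
                yield line.strip(), path.name, line_number


# Throwaway sample files so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "ip_list1.txt").write_text("10.0.0.1\n10.0.0.2\n")
    Path(tmp, "ip_list2.txt").write_text("10.0.0.3\n10.0.0.4\n10.0.0.5\n")
    rows = list(scan_files(tmp))

# Count how many rows belong to each file name.
records_per_file = Counter(file_name for _, file_name, _ in rows)
print(records_per_file)
```

With the DataFrame from the post above, the same totals could presumably be read from the file-name column, but that is not shown in the thread.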
#5
Here is code where I use Loguru, one of my favorite Python libraries.
Everything is now logged to the ip.log file.
The log also gets a date/time stamp for when each file is run, plus the line numbers, the file name, and the time used.
Just by comparing 19:00:07 with the start of the next file at 19:00:16, you can also see that the time used is about 9 seconds.
If you uncomment pd.set_option('display.max_rows', None), it will log all IPs and not just the head/tail shown now.
import time, os
import subprocess
from concurrent.futures import ThreadPoolExecutor
import pandas as pd
#pd.set_option('display.max_rows', None)
from loguru import logger
logger.remove() # Only info to file
logger.add("ip.log", rotation="2 day", format="{time:YYYY-MM-DD at HH:mm:ss}\n{message}")

def ping(ip):
    return (
        ip,
        subprocess.run(
            f"ping {ip} -n 1", stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        ).returncode,
    )

def scan_files():
    directory = 'G:/div_code/egg/ping'
    ip_files = []
    for entry in os.scandir(directory):
        if entry.is_file() and entry.name.endswith('.txt'):
            if 'ip_list' in entry.name:
                pt = directory + '/' + entry.name
                ip_files.append(pt)
    return ip_files

if __name__ == '__main__':
    for fname in scan_files():
        start = time.time()
        with open(fname) as file:
            # "* 100" repeats the list, so each 5-line file gives 500 addresses (see output)
            park = [ip.strip() for ip in file] * 100
            executor = ThreadPoolExecutor(12)
            df = pd.DataFrame(executor.map(ping, park), columns=["address", "state"])
            df.index += 1
            end = time.time()
            time_used = end - start
            logger.info(f'{df}\n File <{fname}> used {time_used:.2f} sec\n')
Output:
2023-08-14 at 19:00:07
               address  state
1      python-forum.io      0
2          youtube.com      0
3        youtube.com99      1
4            www.vg.no      0
5    python-forum.io99      1
..                 ...    ...
496    python-forum.io      0
497        youtube.com      0
498      youtube.com99      1
499          www.vg.no      0
500  python-forum.io99      1

[500 rows x 2 columns]
 File <G:/div_code/egg/ping/ip_list1.txt> used 9.04 sec

2023-08-14 at 19:00:16
               address  state
1      python-forum.io      0
2          youtube.com      0
3        youtube.com99      1
4            www.vg.no      0
5    python-forum.io99      1
..                 ...    ...
496    python-forum.io      0
497        youtube.com      0
498      youtube.com99      1
499          www.vg.no      0
500  python-forum.io99      1

[500 rows x 2 columns]
 File <G:/div_code/egg/ping/ip_list2.txt> used 8.53 sec
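If adding a third-party dependency is not an option, a similar timestamped log line can be sketched with the standard library logging module instead of Loguru. This is a swapped-in technique, not what the post above uses. The helper `make_logger`, the logger name "ip_scan", and the message layout are assumptions for illustration, and log rotation is not included.

```python
import logging
import tempfile
import time
from pathlib import Path


def make_logger(log_path):
    """Return a file logger whose format resembles the Loguru setup above."""
    logger = logging.getLogger("ip_scan")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False  # only write to the file, not the console
    handler = logging.FileHandler(log_path)
    handler.setFormatter(
        logging.Formatter("%(asctime)s\n%(message)s", datefmt="%Y-%m-%d at %H:%M:%S")
    )
    logger.addHandler(handler)
    return logger, handler


with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp, "ip.log")
    logger, handler = make_logger(log_path)

    start = time.perf_counter()
    ips = ["10.0.0.1", "10.0.0.2"]  # stand-in for the lines of one file
    time_used = time.perf_counter() - start
    logger.info(
        "File <%s>: %d records, used %.2f sec", "ip_list1.txt", len(ips), time_used
    )

    # Release the file before the temporary directory is cleaned up.
    logger.removeHandler(handler)
    handler.close()
    log_text = log_path.read_text()

print(log_text)
```

Something like `logging.handlers.TimedRotatingFileHandler` could take over the role of Loguru's rotation argument, though the two are not configured the same way.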