Posts: 6,767
Threads: 20
Joined: Feb 2020
Aug-12-2023, 03:20 AM
(This post was last modified: Aug-12-2023, 03:21 AM by deanhystad.)
One of these things is not like the others! The answer is easy to see if you really look and think about the results.
Maybe this will make it easier to see.
import time
import subprocess
from concurrent.futures import ThreadPoolExecutor
import pandas as pd

def ping(ip):
    return (
        ip,
        subprocess.run(
            f"ping {ip} -n 1", stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        ).returncode,
    )
start = time.time()
park = ["www.wallmart.com", "www.walmart.com\n"]
executor = ThreadPoolExecutor(4)
df = pd.DataFrame(executor.map(ping, park), columns=["address", "state"])

Output:
             address  state
0   www.wallmart.com      0
1  www.walmart.com\n      1
Posts: 170
Threads: 43
Joined: May 2019
lol I keep looking at your example and only see the clear difference in the string values..
But in my plain text file with all the IPs, nothing is in quotes.
I'm guessing that this line is part of the problem:
park = file.readlines()
import time
import subprocess
from concurrent.futures import ThreadPoolExecutor
import pandas as pd
def ping(ip):
    return (
        ip,
        subprocess.run(
            f"ping {ip} -n 1", stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        ).returncode,
    )

start = time.time()
with open("ip_list.txt") as file:
    park = file.readlines()
executor = ThreadPoolExecutor(5)
df = pd.DataFrame(executor.map(ping, park))
df.to_csv(r'ip_output.csv',header=False, index=False, quoting=None)
print(df)
The above print(df) returns this output; you can see it's the output.. but that's not the input:
0 www.google.com\n 1
1 www.abc13.com\n 1
2 www.yahoo.com\n 1
3 www.cnn.com\n 1
4 www.walmart.com 0
My text file that I'm reading in to run against is plain text, 1 IP per line. I just checked, and the raw data file has no quotes, no commas, and no extra spaces between the numbers...
I do understand that I'm overlooking something, but it's clearly not obvious to me. I'm not asking for the answer, just an explanation of what to look for or the area to focus on.
Posts: 7,306
Threads: 122
Joined: Sep 2016
readlines() splits the file into separate lines, but it also keeps the trailing newline \n on each line.
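As a quick illustration (a minimal sketch; the ip_list.txt contents shown in the comments are just examples), printing both forms makes the extra characters visible:

with open("ip_list.txt") as file:
    print(file.readlines())             # e.g. ['www.walmart.com\n', 'www.google.com\n']

with open("ip_list.txt") as file:
    print([ip.strip() for ip in file])  # e.g. ['www.walmart.com', 'www.google.com']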
Change to this.
with open("ip_list.txt") as file:
    park = [ip.strip() for ip in file]
Or, since in your first post you try to scan a folder for ip_list text files, it can be done something like this:
import time
import subprocess
from concurrent.futures import ThreadPoolExecutor
import pandas as pd
import os
def ping(ip):
    return (
        ip,
        subprocess.run(
            f"ping {ip} -n 1", stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        ).returncode,
    )

start = time.time()

def scan_files():
    # Yield stripped addresses from every .txt file whose name contains 'ip_list'
    directory = '.'
    for entry in os.scandir(directory):
        if entry.is_file() and entry.name.endswith('.txt'):
            if 'ip_list' in entry.name:
                pt = directory + '/' + entry.name
                with open(pt) as file:
                    for ip in file:
                        yield ip.strip()
executor = ThreadPoolExecutor(5)
df = pd.DataFrame(executor.map(ping, list(scan_files())))
df.to_csv(r'ip_output.csv',header=False, index=False, quoting=None)
print(df)
Posts: 6,767
Threads: 20
Joined: Feb 2020
I guess I should apologize for this:
Quote:You need an iterable. Any iterable will do.
My new advice is "Any iterable that does not add extra characters to the address strings will do."
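For instance, here is a minimal sketch reusing the ping() from the posts above (the stripped_addresses() helper and the ip_list.txt name are my own, assuming one address per line): a generator that yields stripped addresses is such an iterable.

def stripped_addresses(path="ip_list.txt"):
    # Hypothetical helper: yield each address without the trailing newline
    with open(path) as file:
        for line in file:
            yield line.strip()

executor = ThreadPoolExecutor(4)
df = pd.DataFrame(executor.map(ping, stripped_addresses()), columns=["address", "state"])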
Posts: 170
Threads: 43
Joined: May 2019
lol I really appreciate all the help and examples. I was reading that strip() does add extra time to the processing.. so I won't be able to do a real test until Monday.. but based on your examples and suggestions, will strip() have a huge impact on 4,000 IPs?
I kinda like the fact that earlier test runs this past week yielded a processing time of just under 60 seconds for 4,000 records.
I'll be testing all the options today, and then again once I get in on Monday with the true list..
Thank you again for the guidance.
Posts: 170
Threads: 43
Joined: May 2019
What options are there to consider that may help? Would changing the data source help in any way? Maybe it's possible to clean the data AFTER it's written to the CSV?
This is all a new project, so if the data source needs to change I can do that, as long as it helps streamline the overall process.
Posts: 7,306
Threads: 122
Joined: Sep 2016
Aug-12-2023, 05:03 PM
(This post was last modified: Aug-12-2023, 05:04 PM by snippsat.)
(Aug-12-2023, 02:28 PM)cubangt Wrote: I was reading that strip() does add extra time to the processing.. so I won't be able to do a real test until Monday.. but based on your examples and suggestions, will strip() have a huge impact on 4,000 IPs?
Reading ip_list.txt, stripping each line, and putting the results into the park list takes very little of the overall time.
What takes the time is the network (TCP/IP) ping echo request call, f"ping {ip} -n 1".
The echo request count is now set to the minimum of 1, and combined with deanhystad's asynchronous execution via ThreadPoolExecutor, that is what makes this a fast approach.
import time
import subprocess
from concurrent.futures import ThreadPoolExecutor
import pandas as pd
def ping(ip):
    return (
        ip,
        subprocess.run(
            f"ping {ip} -n 1", stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        ).returncode,
    )

with open("ip_list.txt") as file:
    # Testing call of 5000 ip's
    park = [ip.strip() for ip in file] * 1000
executor = ThreadPoolExecutor(12)
df = pd.DataFrame(executor.map(ping, park))
df.to_csv(r'ip_output.csv',header=False, index=False, quoting=None)
print(df)

G:\div_code\egg\ping
λ ptime python ping_th.py
=== python ping_th.py ===
0 1
0 python-forum.io 0
1 youtube.com 0
2 youtube.com99 1
3 www.vg.no 0
4 python-forum.io99 1
... ... ..
4995 python-forum.io 0
4996 youtube.com 0
4997 youtube.com99 1
4998 www.vg.no 0
4999 python-forum.io99 1
[5000 rows x 2 columns]
Execution time: 215.674 s
So for me it takes around 3.5 to 5 minutes to test 5,000 IPs.
This will vary with network speed; if I turn on my VPN it takes over 10 minutes.
Posts: 6,767
Threads: 20
Joined: Feb 2020
Aug-12-2023, 08:09 PM
(This post was last modified: Aug-12-2023, 08:10 PM by deanhystad.)
Calling strip() 4,000 times may take 1/100th of a second.
I take that back. I did an experiment where I read in 40,000 lines and called strip() on each one. That took 0.011 seconds. Just over 1/100th of a second, but that also includes reading the file, and it is 40,000 lines instead of 4,000. Using file.read().split() is faster: it took 0.0025 seconds to do the same thing.
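For reference, a rough sketch of that comparison (the ip_list.txt name and its size are assumptions; timings will vary by machine):

import time

# Approach 1: strip() each line while iterating the file
start = time.time()
with open("ip_list.txt") as file:
    stripped = [line.strip() for line in file]
print("strip per line:", time.time() - start)

# Approach 2: read the whole file once and split on whitespace
start = time.time()
with open("ip_list.txt") as file:
    split_once = file.read().split()
print("read().split():", time.time() - start)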
You pick odd things to worry about.
Posts: 170
Threads: 43
Joined: May 2019
Just wanted to report back: after a number of test runs and playing around with the number of threads and the code, here is the final script that works against 4,000 IPs with a run time of 2.8 minutes to write the results.
I am grateful for the assistance and direction on how to accomplish this. Monday morning I'll run this logic against the actual list of IPs on the office network to see what the runtime is.
import os
import time
import subprocess
from concurrent.futures import ThreadPoolExecutor
import pandas as pd
def ping(ip):
    return (
        ip,
        subprocess.run(
            f"ping {ip} -n 1", stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        ).returncode,
    )

def scan_files():
    # Yield stripped addresses from every .txt file whose name contains 'ip_list'
    directory = '.'
    for entry in os.scandir(directory):
        if entry.is_file() and entry.name.endswith('.txt'):
            if 'ip_list' in entry.name:
                pt = directory + '/' + entry.name
                with open(pt) as file:
                    for ip in file:
                        yield ip.strip()
start = time.time()
executor = ThreadPoolExecutor(125)
df = pd.DataFrame(executor.map(ping, list(scan_files())))
df.to_csv(r'ip_output.csv',header=False, index=False, quoting=None)
end = time.time()
print(end - start)