Python Forum
Machine Learning Antivirus [Urgent]


Machine Learning Antivirus [Urgent] - Echoo0o - Jul-27-2017

Bit of a long one, so apologies in advance. I'm at the last stage of my dissertation, and all that's left is to use a neural network I co-developed with my supervisor to scan PE files and check whether they are infected.

My current to-do list (not including testing and the report write-up):
1. Generate a master list of known system calls from a .txt file.
2. Scan a drive/directory for PE files.
3. Extract the system calls from each PE file found by the scan.
4. Build a list of the system calls each file makes, dropping any that are not on the master list.
5. Compare the two lists: a system call that appears on both produces a '1', and one that appears only on the master list produces a '0'.
6. Run the resulting list through the neural network (currently an .rda file, which I still need to convert to .pmml).
7. The end result is the file being flagged as a virus, or not.
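
Steps 1, 4 and 5 could be sketched roughly as below. This is only a minimal illustration, assuming the master list is a plain .txt file with one system call name per line; the function names and the sample call names are made up for the example and are not from the project.

```python
def load_master_list(path):
    """Step 1: read one system call name per line, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def to_feature_vector(master_list, file_calls):
    """Steps 4 and 5: return a 0/1 list aligned with master_list."""
    # A set gives fast membership tests; calls that are not on the
    # master list are simply never looked up, so they drop out.
    seen = set(file_calls)
    return [1 if name in seen else 0 for name in master_list]


# Hypothetical data, for illustration only.
master = ["CreateFileA", "WriteFile", "RegSetValueExA", "ExitProcess"]
calls_in_file = ["ExitProcess", "CreateFileA", "SomeCallNotOnTheList"]

vector = to_feature_vector(master, calls_in_file)
print(vector)  # [1, 0, 0, 1]
```

A list keeps the master order fixed, which matters if the network expects each input position to mean the same system call every time; the set is only used for the lookup.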

I'm not really a Python programmer (or much of a programmer at all), hence the question. Where should I start with all this? Should I be using lists or building a dictionary, and which libraries should I use for these tasks? As I understand it, the code won't be very long, but I genuinely have no clue how to approach this, or even how to begin.
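
For step 2, one possible starting point is the standard library's pathlib. The sketch below walks a directory tree and keeps files whose first two bytes are 'MZ', the signature every PE file starts with. The function names are hypothetical, and the check is only a cheap pre-filter: other files can start with 'MZ' too, so a real PE parser should still be allowed to reject a candidate.

```python
from pathlib import Path


def looks_like_pe(path):
    """Cheap pre-check: PE files start with the two bytes 'MZ'."""
    try:
        with open(path, "rb") as handle:
            return handle.read(2) == b"MZ"
    except OSError:
        return False  # unreadable file: treat it as not a candidate


def find_pe_candidates(root):
    """Yield every file under root that passes the 'MZ' check."""
    for path in Path(root).rglob("*"):
        if path.is_file() and looks_like_pe(path):
            yield path
```

Calling list(find_pe_candidates("C:/some/folder")) with a folder of your choice would give the paths to hand to the extraction step.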

I was given a small section of code to use for extracting the system calls:

import pefile
import sys

value = sys.argv[1]

pe = pefile.PE(value)

for entry in pe.DIRECTORY_ENTRY_IMPORT:
    for imp in entry.imports:
        print(hex(imp.address), imp.name)
It's supposed to extract system calls from PE files, but I have no idea how to get it to work. As mentioned before, any advice or help would be greatly appreciated: the deadline (17th August) is closing in, and I would hate to lose months of work over a tiny bit of code.


RE: Machine Learning Antivirus [Urgent] - nilamo - Jul-27-2017

Just by looking at this, isn't imp.name already the name of the system call that's used in the PE file?  What output do you currently get?
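
To collect the names instead of printing them, something like the sketch below might work. It assumes, as far as I know is the case with pefile on Python 3, that imp.name is a bytes value, that it is None for functions imported by ordinal, and that DIRECTORY_ENTRY_IMPORT is missing when a file has no import table. The helper names are made up, and the stand-in object at the bottom only mimics the shape of a parsed file so the example runs without a real PE file.

```python
from types import SimpleNamespace


def imported_names(pe):
    """Collect import names from a parsed pefile.PE object as text."""
    names = []
    # getattr with a default covers files that have no import table.
    for entry in getattr(pe, "DIRECTORY_ENTRY_IMPORT", []):
        for imp in entry.imports:
            if imp.name is None:  # imported by ordinal, no name stored
                continue
            names.append(imp.name.decode("ascii", errors="replace"))
    return names


def imported_names_from_file(path):
    """Parse path with pefile (third-party: pip install pefile)."""
    import pefile  # imported here so imported_names works without it
    return imported_names(pefile.PE(path))


# Stand-in shaped like pefile's result, for illustration only.
fake_pe = SimpleNamespace(DIRECTORY_ENTRY_IMPORT=[
    SimpleNamespace(imports=[
        SimpleNamespace(name=b"CreateFileA"),
        SimpleNamespace(name=None),
    ]),
])
print(imported_names(fake_pe))  # ['CreateFileA']
```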


RE: Machine Learning Antivirus [Urgent] - Echoo0o - Jul-28-2017

(Jul-27-2017, 07:08 PM)nilamo Wrote: Just by looking at this, isn't imp.name already the name of the system call that's used in the PE file?  What output do you currently get?

My current issue is that the code snippet doesn't produce any output; it just gives the following error:

Error:
Traceback (most recent call last):
  File "C:/.../extract_syscalls.py", line 4, in <module>
    value = sys.argv[1]
IndexError: list index out of range
I'm not entirely sure how to fix this issue. Should I not be using argv?


RE: Machine Learning Antivirus [Urgent] - buran - Jul-28-2017

This error means that you ran the script without supplying any command line arguments. It expects you to supply one when you run it, like this:
c:/>python extract_syscalls.py <value>
Of course, you need to replace <value> with the actual value you want to use in your script (I don't know what it is).

Most probably you are running the script from an IDE. In more advanced IDEs you can set up CLI arguments for run/test purposes.
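
If you want the script to explain itself instead of crashing with an IndexError, one option is to check the length of the argument list first. A minimal sketch (the function name and the usage text are just examples):

```python
def target_from_args(argv):
    """Return the first command line argument, or exit with a usage hint."""
    # argv[0] is the script name; argv[1] is the first real argument.
    if len(argv) < 2:
        raise SystemExit(f"usage: python {argv[0]} <value>")
    return argv[1]


# Simulated argument list, as sys.argv would look for
#   python extract_syscalls.py some_lib.dll
print(target_from_args(["extract_syscalls.py", "some_lib.dll"]))  # some_lib.dll
```

In the real script you would call target_from_args(sys.argv) after import sys, in place of value = sys.argv[1].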


RE: Machine Learning Antivirus [Urgent] - nilamo - Jul-28-2017

From what you have so far, it looks like the argument your script expects is the name of a PE file, so you'd call it like python extract_syscalls.py some_lib.dll. It also looks like the file would need to be in the folder you run the command from, unless you passed a fully qualified path. Which is fine for testing, right?
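
If it is unclear which file a bare name like some_lib.dll ends up pointing at, a small helper can show it. A relative name is resolved against the current working directory, which is the script's folder only if that is where you run the command. This is just a sketch; resolve_target and its base parameter are made-up names, and the base folder in the example is arbitrary.

```python
from pathlib import Path


def resolve_target(arg, base=None):
    """Show which file a command line argument actually refers to."""
    path = Path(arg)
    if not path.is_absolute():
        # Relative names are looked up from the working directory
        # (or from base, if one is given for illustration).
        path = Path(base or Path.cwd()) / path
    return path


print(resolve_target("some_lib.dll"))
print(resolve_target("some_lib.dll", base="/tmp/work"))
```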