Machine learning SQL injection detection
Thread on Python Forum (https://python-forum.io), Data Science sub-forum: /thread-6605.html
Machine learning SQL injection detection - agocomp - Nov-30-2017

Good day, I am a postgraduate student working on "Detecting and preventing SQL injection attacks on a database using a machine learning approach". My major challenges right now are generating the dataset and writing the appropriate code in Python. I would be highly grateful for any help you can give. Thanks a lot.

RE: Machine learning SQL injection detection - buran - Nov-30-2017

I moved the thread to the News and Discussions sub-forum, because it looks more appropriate for a general discussion of possible approaches. My understanding is that you don't have code or specific questions yet.

RE: Machine learning SQL injection detection - sparkz_alot - Nov-30-2017

First off, I hope you are using the latest version of Python (3.6.3). You might use Python's built-in sqlite3 to create your test database. Once it is created, make a backup copy so you always have a pristine copy and one you can attack. I've seen people working with databases of thousands of entries, when really all you need is a minimal amount. In your case, probably 2-5 entries would be enough to initially test the actual code for injection/detection. Once satisfied, you can always increase the size of the database or even try it against other databases.

If you run into problems, either with the database or the program, we are here to help. Be sure to read the section of our Help document on BBCode before you post your code, errors and output.

SQL injection detection code - agocomp - Dec-04-2017

I have code for malicious URL detection, but I want it rewritten for SQL injection detection. Please, can anyone help?
The code is below, thanks.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Labelled URLs; the CSV needs a "url" column and a "label" column
urls_data = pd.read_csv("data.csv")
print(urls_data.head())


def makeTokens(f):
    """Split a URL on '/', '-' and '.' and return its unique tokens."""
    # Work on the plain text of the URL
    tkns_BySlash = str(f).split('/')
    total_Tokens = []
    for i in tkns_BySlash:
        # Split each path segment on dashes, then each piece on dots
        tokens = str(i).split('-')
        tkns_ByDot = []
        for j in range(0, len(tokens)):
            temp_Tokens = str(tokens[j]).split('.')
            tkns_ByDot = tkns_ByDot + temp_Tokens
        total_Tokens = total_Tokens + tokens + tkns_ByDot
    # Remove duplicate tokens
    total_Tokens = list(set(total_Tokens))
    # 'com' appears in most URLs, so it is dropped
    if 'com' in total_Tokens:
        total_Tokens.remove('com')
    return total_Tokens


y = urls_data["label"]
url_list = urls_data["url"]

# Turn every URL into a TF-IDF vector using the custom tokenizer
vectorizer = TfidfVectorizer(tokenizer=makeTokens)
x = vectorizer.fit_transform(url_list)

# Hold out 20% of the rows for testing
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=42)

logit = LogisticRegression()
logit.fit(x_train, y_train)
print("Accuracy ", logit.score(x_test, y_test))

# Classify a few unseen URLs with the trained model
x_predict = ["http://www.psn.com.pk/",
             "google.com/search=faizanahmad",
             "www.radsport-voggel.de/wp-admin/includes/log.exe",
             "www.radsport-voggel.de/wp-admin/includes/an/log.exe",
             "www.google.com",
             "www.google-scholar.com/wp-good"]
x_predict = vectorizer.transform(x_predict)
New_predict = logit.predict(x_predict)
print(New_predict)
```
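On the sqlite3 suggestion earlier in the thread, here is a minimal sketch of a throwaway test database with three rows, plus one deliberately vulnerable query and one parameterized query to compare. The table layout, the sample rows and the function names (`make_test_db`, `login_vulnerable`, `login_safe`) are assumptions made up for illustration; nothing here comes from the original poster's project. The database lives in memory, so rebuilding it takes the place of restoring a backup copy.

```python
import sqlite3


def make_test_db():
    """Create a tiny in-memory database with a handful of rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, password TEXT)")
    conn.executemany(
        "INSERT INTO users (name, password) VALUES (?, ?)",
        [("alice", "wonderland"), ("bob", "builder"), ("carol", "singer")],
    )
    conn.commit()
    return conn


def login_vulnerable(conn, name, password):
    # Unsafe on purpose: user input is pasted straight into the SQL text.
    query = ("SELECT name FROM users WHERE name = '" + name +
             "' AND password = '" + password + "'")
    return conn.execute(query).fetchall()


def login_safe(conn, name, password):
    # Placeholders keep user input as data, so it is never parsed as SQL.
    query = "SELECT name FROM users WHERE name = ? AND password = ?"
    return conn.execute(query, (name, password)).fetchall()


conn = make_test_db()
payload = "' OR '1'='1"  # classic tautology payload, used as the password

vulnerable_rows = login_vulnerable(conn, "alice", payload)
safe_rows = login_safe(conn, "alice", payload)

print("vulnerable:", len(vulnerable_rows), "row(s) returned")
print("safe:", len(safe_rows), "row(s) returned")
```

With the payload as the password, the concatenated query matches every row in the table, while the parameterized one matches none. Pairs like this (input text plus whether it changed the query's behaviour) are one possible way to start labelling a small dataset by hand.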
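As for adapting the URL code to SQL injection, the part that is specific to URLs is `makeTokens`, which splits on '/', '-' and '.'. A rough sketch of a replacement tokenizer for SQL-like input could look like the following. The name `make_sql_tokens`, the regular expression and the three sample strings are all assumptions for illustration, not a tested detection scheme, and the labels (1 = injection attempt, 0 = harmless) are hand-assigned.

```python
import re

# Tried in order: SQL comment markers, runs of letters/digits/underscores,
# then any single non-space symbol such as a quote, '=' or ';'.
TOKEN_PATTERN = re.compile(r"--|/\*|\*/|\w+|[^\w\s]")


def make_sql_tokens(text):
    """Split a query string or user input into lowercase SQL-like tokens."""
    return TOKEN_PATTERN.findall(str(text).lower())


# Hypothetical hand-labelled samples: (input text, label)
samples = [
    ("alice", 0),
    ("' OR '1'='1", 1),
    ("1; DROP TABLE users --", 1),
]

for text, label in samples:
    print(label, make_sql_tokens(text))
```

The function can be passed to the vectorizer the same way as before, `TfidfVectorizer(tokenizer=make_sql_tokens)`, with the CSV holding query strings instead of URLs. Unlike `makeTokens`, this sketch keeps repeated tokens, on the assumption that how often a quote or keyword appears may be a useful signal for TF-IDF. Quotes, semicolons and comment markers are kept as tokens for the same reason.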