Python Forum
Read CSV data into Pandas DataSet From Variable?

Thread: https://python-forum.io/thread-8546.html (Python Coding > Data Science)



Read CSV data into Pandas DataSet From Variable? - Oliver - Feb-25-2018

import pandas

url = "http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
the_names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']
dataset = pandas.read_csv(url, names=the_names)
Sure, the code above works with the standard Pandas "read_csv".

But my issue is that I'm POSTing that CSV data to a Flask service. The data comes in as a variable, and I extract it from the request dict, but then I can't find a compatible method to load the data in that variable into the same pandas dataset.

I've tried read_clipboard, read_csv, read_table ... but they all error out.

Do I need to do some kind of IO step?
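Yes, a small IO step is all that is needed: read_csv accepts any file-like object, not only paths and URLs, so wrapping the string in io.StringIO is enough. A minimal sketch, assuming the POSTed CSV has already been extracted into a str; `csv_text` is a hypothetical name and its rows are the first lines of the iris file:

```python
import io

import pandas as pd

# Hypothetical stand-in for the CSV text extracted from the POST request
csv_text = (
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "4.7,3.2,1.3,0.2,Iris-setosa\n"
)
the_names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']

# io.StringIO wraps the string in a file-like object that read_csv can consume,
# the same way it consumes an open file or a URL
dataset = pd.read_csv(io.StringIO(csv_text), names=the_names)
print(dataset)
```

All the usual read_csv options (names, header, dtype, and so on) work unchanged, because only the source of the data differs.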

I'm sure I'm missing something easy, but I did not see the answer online, where everyone seems to be reading the data from a disk file or directly from a URL.

Thanks in advance,


RE: Read CSV data into Pandas DataSet From Variable? - snippsat - Feb-26-2018

In pandas versions after 0.19.2, read_csv can read directly from a URL.
Before that you could work around it with io.StringIO, but you should really upgrade.
If you use Anaconda:
conda update conda
conda update anaconda
>>> import pandas as pd
>>> pd.__version__
'0.20.3'
G:\Anaconda3
λ python -m ptpython
>>> import pandas as pd
...
... url = "http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
... the_names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']
... dataset = pd.read_csv(url, names=the_names)

>>> dataset
     sepal-length  sepal-width  petal-length  petal-width           class
0             5.1          3.5           1.4          0.2     Iris-setosa
1             4.9          3.0           1.4          0.2     Iris-setosa
2             4.7          3.2           1.3          0.2     Iris-setosa
3             4.6          3.1           1.5          0.2     Iris-setosa
4             5.0          3.6           1.4          0.2     Iris-setosa
5             5.4          3.9           1.7          0.4     Iris-setosa
6             4.6          3.4           1.4          0.3     Iris-setosa



RE: Read CSV data into Pandas DataSet From Variable? - Oliver - Feb-26-2018

(Feb-26-2018, 12:27 AM)snippsat Wrote: After version pandas 0.19.2 --> it can read directly from url.

Yes, as I said in my initial posting, reading from a URL works fine.

That was not the problem I posted I am trying to solve. :)

My issue is that our application POSTs the data from another application to a Flask web service. I need to get the POSTed data (in a variable) into the Pandas dataset. As I said in my original posting, I cannot find a compatible "read" method that can read a variable into a Pandas dataset.

So, how do you get CSV data, in a variable, (not in a URL, for example) into a Pandas dataset?

In the screenshot below, I tried to use the io.StringIO method, but that still throws 500 errors.
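A likely (but unconfirmed) cause of io.StringIO failing here is that the request body arrives as bytes, not str: io.StringIO raises TypeError when given bytes, and an unhandled exception in a Flask view is returned as a 500 error. A sketch of two ways to handle a bytes payload, where `raw` is a hypothetical name for the POSTed data:

```python
import io

import pandas as pd

# Hypothetical raw POST body, as bytes
raw = b"5.1,3.5,1.4,0.2,Iris-setosa\n4.9,3.0,1.4,0.2,Iris-setosa\n"
the_names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']

# Passing bytes straight to io.StringIO fails with TypeError
try:
    io.StringIO(raw)
except TypeError as exc:
    print("StringIO needs str:", exc)

# Option 1: decode the bytes to str first, then wrap them in StringIO
df_text = pd.read_csv(io.StringIO(raw.decode("utf-8")), names=the_names)

# Option 2: wrap the bytes in BytesIO and let pandas do the decoding
df_bytes = pd.read_csv(io.BytesIO(raw), names=the_names)

print(df_text.equals(df_bytes))
```

Option 2 is the same approach later used with Tornado in this thread; option 1 makes the text encoding explicit.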

I also tried to just read in the data like pd.DataFrame(.....), but couldn't get the syntax correct.
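For the pd.DataFrame(...) route, the constructor takes a list of rows plus a `columns` list. The catch is that csv.reader yields strings, so the numeric columns have to be converted afterwards. A sketch, again with a hypothetical `csv_text` variable holding the POSTed data:

```python
import csv
import io

import pandas as pd

# Hypothetical CSV text received in the request
csv_text = "5.1,3.5,1.4,0.2,Iris-setosa\n4.9,3.0,1.4,0.2,Iris-setosa\n"
the_names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']

# csv.reader turns the text into lists of strings, one list per row
rows = list(csv.reader(io.StringIO(csv_text)))

# DataFrame(data, columns=...) builds the frame from those rows
dataset = pd.DataFrame(rows, columns=the_names)

# Every cell is still a str here; convert the four measurement columns
numeric = the_names[:4]
dataset[numeric] = dataset[numeric].astype(float)
print(dataset.dtypes)
```

read_csv does this type inference automatically, which is why passing a StringIO to read_csv is usually the simpler option.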

Thanks,


RE: Read CSV data into Pandas DataSet From Variable? - snippsat - Feb-26-2018

To receive a whole file in a POST request, use the file-upload approach (Flask's "Uploading Files" pattern, via request.files).
The HTML form therefore needs enctype="multipart/form-data".
Here is a way to receive the whole .csv file; the server can also iterate over its rows.
Sample data.
Output:
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
from flask import Flask, make_response, request
import io
import csv

app = Flask(__name__)
def transform(text_file_contents):
    return text_file_contents.replace("=", ",")

@app.route('/')
def form():
    return """
        <html>
            <body>
                <h1>Transform a file demo</h1>
                <form action="/transform" method="post" enctype="multipart/form-data">
                    <input type="file" name="data_file" />
                    <input type="submit" />
                </form>
            </body>
        </html>
    """

@app.route('/transform', methods=["POST"])
def transform_view():
    f = request.files['data_file']
    if not f:
        return "No file"
    stream = io.StringIO(f.stream.read().decode("UTF8"), newline=None)
    csv_input = csv.reader(stream)
    for row in csv_input:
        print(row)

    stream.seek(0)
    result = transform(stream.read())
    response = make_response(result)
    response.headers["Content-Disposition"] = "attachment; filename=result.csv"
    return response

if __name__ == "__main__":
    app.run(debug=True)
So the whole file arrives, and here is what the server prints:
Output:
E:\1py_div\div_code\flask
λ python app.py
 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
 * Restarting with stat
 * Debugger is active!
 * Debugger PIN: 184-514-049
['5.1', '3.5', '1.4', '0.2', 'Iris-setosa']
['4.9', '3.0', '1.4', '0.2', 'Iris-setosa']
['4.7', '3.2', '1.3', '0.2', 'Iris-setosa']
['4.6', '3.1', '1.5', '0.2', 'Iris-setosa']
127.0.0.1 - - [26/Feb/2018 13:20:50] "POST /transform HTTP/1.1" 200 -
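In the transform view above, the decoded `stream` is an ordinary file-like object, so it could be handed straight to pandas instead of csv.reader. A sketch of that step, pulled out into a hypothetical helper `dataframe_from_upload` so it can be tried without Flask; io.BytesIO stands in for the uploaded file's binary stream (`f.stream` in the view):

```python
import io

import pandas as pd

IRIS_NAMES = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']


def dataframe_from_upload(file_stream, names=IRIS_NAMES):
    """Read an uploaded CSV (a binary file-like object) into a DataFrame.

    Hypothetical helper; in the Flask view it could be called as
    dataframe_from_upload(request.files['data_file'].stream).
    """
    # Same decoding step as the view: bytes -> str -> StringIO
    stream = io.StringIO(file_stream.read().decode("UTF8"), newline=None)
    # Passing names makes read_csv treat the first line as data (header=None)
    return pd.read_csv(stream, names=names)


# Stand-in for request.files['data_file'].stream, holding the sample data
fake_upload = io.BytesIO(
    b"5.1,3.5,1.4,0.2,Iris-setosa\n"
    b"4.9,3.0,1.4,0.2,Iris-setosa\n"
    b"4.7,3.2,1.3,0.2,Iris-setosa\n"
    b"4.6,3.1,1.5,0.2,Iris-setosa\n"
)
df = dataframe_from_upload(fake_upload)
print(df)
```

Keeping the parsing in a plain function like this also makes it easy to test without running the web server.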



RE: Read CSV data into Pandas DataSet From Variable? - Oliver - Feb-26-2018

There must be a simple way to read CSV "data" without writing an entire method like that. I'm a bit baffled that there isn't just a "pd.read(...type='csv',....)" method that takes "CSV", for example, as an argument but works the same way as pd.read_csv().

This omission seems glaring to me, yet, again, I'm probably missing something.

thanks,


RE: Read CSV data into Pandas DataSet From Variable? - snippsat - Feb-27-2018

(Feb-26-2018, 12:48 PM)Oliver Wrote: There must be a simple way to read csv "data" without writing an entire method like that.
You have to follow the HTTP protocol and the way the framework handles files sent over the network.
As you explain, you want to send the data as one variable (I guess this means the whole content of the CSV?), so the easiest way is to treat it like a file object.
Trying to recreate the data from request.values will be difficult.

The code above gives you the whole uploaded file back as result.csv.
You can then open it locally with pd.read_csv('result.csv').
If you want to send it back to a view, you could use tablib, which can render an HTML table.
Example:
from flask import Flask, make_response, request
import io, os
import csv
import tablib

app = Flask(__name__)
def transform(text_file_contents):
    return text_file_contents.replace("=", ",")

@app.route('/')
def form():
    return """
        <html>
            <body>
                <h1>Transfer a file demo</h1>
                <form action="/transform" method="post" enctype="multipart/form-data">
                    <input type="file" name="data_file" />
                    <input type="submit" />
                </form>
                <br>
                <a href="/read_cvs">Read csv</a>                
            </body>
        </html>
    """    

@app.route('/transform', methods=["POST"])
def transform_view():
    f = request.files['data_file']
    if not f:
        return "No file"
    stream = io.StringIO(f.stream.read().decode("UTF8"), newline=None)
    # Optional: print each row on the server
    # csv_input = csv.reader(stream)
    # for row in csv_input:
    #     print(row)
    stream.seek(0)
    result = transform(stream.read())
    response = make_response(result)
    response.headers["Content-Disposition"] = "attachment; filename=result.csv"   
    return response

@app.route('/read_cvs', methods=["GET"])
def read_csv():
    dataset = tablib.Dataset()
    # Absolute path to the downloaded result.csv on the poster's machine
    with open('C:/Users/Tom/Downloads/result.csv') as f:
        dataset.csv = f.read()
    # Render the rows as an HTML table
    return dataset.html

if __name__ == "__main__":
    app.run(debug=True)
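If pandas is already in use, an alternative to tablib (a swap, not what was used above) is DataFrame.to_html, which renders a DataFrame as an HTML table string that a Flask view could return directly. A minimal sketch with a hypothetical in-memory CSV:

```python
import io

import pandas as pd

# Hypothetical CSV content, e.g. what was read from the upload
csv_text = "sepal-length,class\n5.1,Iris-setosa\n4.9,Iris-setosa\n"

df = pd.read_csv(io.StringIO(csv_text))

# to_html returns the table markup as a str; index=False drops the row labels
html = df.to_html(index=False)
print(html)
```

In a view, `return df.to_html()` would then send the table back to the browser without writing result.csv to disk first.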
 



RE: Read CSV data into Pandas DataSet From Variable? - Oliver - Feb-27-2018

OK, I appreciate your help with this.

It really seems that, instead of POSTing the data, the database should probably just export the CSV to a temporary disk path so that read_csv works easily.

Thanks again! :)


RE: Read CSV data into Pandas DataSet From Variable? - answerquest - Jul-05-2018

Hi, I'm using the Tornado web server and have a similar situation. There's not much to do, actually: the file's contents come in as a bytestring. Assuming you've gotten the contents into a variable 'file1', as the OP mentioned,

df = pd.read_csv( io.BytesIO(file1) )
should do the job. Remember to import io at the top of your code.

As to how I got the file's contents into a variable, here's a shortened snippet; I'd guess the structure is similar in your framework.

On the HTML side:
<p><input type="file" name="file1"></p>

Python side:
import io

import pandas as pd
import tornado.web

# skipping the tornado specific code...

class hydGTFS(tornado.web.RequestHandler):
    def post(self):
        # request.files maps each form field name to a list of uploaded files
        upload = self.request.files['file1'][0]
        print(upload['filename'])

        # The file body arrives as bytes, so wrap it in BytesIO for read_csv
        df = pd.read_csv(io.BytesIO(upload['body']))
        print(df.head())
        self.write('ok got it bro')
Output:
Corridor 1 Week days detail..csv
   Run Id  Run Description  Trip Id  Regulation Period        Group  Line Id
0      49             4901     6144            Default  {new group}       47
1      49             4901     6144            Default  {new group}       47
2      49             4901     6144            Default  {new group}       47
3      49             4901     6144            Default  {new group}       47
4      49             4901     6141            Default  {new group}       37