Python Forum
how to edit data frames and convert to a list(pandas, read_html()) ? - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: Web Scraping & Web Development (https://python-forum.io/forum-13.html)
+--- Thread: how to edit data frames and convert to a list(pandas, read_html()) ? (/thread-7904.html)



how to edit data frames and convert to a list(pandas, read_html()) ? - donvirte - Jan-29-2018

I used library pandas, read_html() to import a table from a webpage.
I want to insert values from table read_html in ms msl table
but for this I must edit table read_html and convert to list.
This is difficult to do because .read_html() produces a list of dataframes.


my python code:
import requests
import pandas as pd
r = requests.get('URL')
pd.set_option('max_rows',10000) 
df = pd.read_html(r.content)
print(df)
result print(df) - dataframes:
Output:
[ 0 1 2 3 0 Number Name Plan NaN 1 NaN NaN not(selected) NaN 2 53494580 + (53)494580 NP_551 NaN 3 53494581 + (53)494581 NP_551 NaN 4 53494582 + (53)494582 NP_551 NaN 5 55110000 + (53)494583 NP_551 NaN]
I would like the following results to be written to the ms msl table:
Output:
[['1','NaN','NaN','not(selected)','NaN'], ['2','53494580','+ (53)494580','NP_551','NaN'], ['3','53494581','+ (53)494581','NP_551','NaN'], ['4','53494582','+ (53)494582','NP_551','NaN'], ['5','55110000','+ (53)494583','NP_551','NaN]']
how to edit data frames and convert to a list?
I would be grateful for any help.


RE: how to edit data frames and convert to a list(pandas, read_html()) ? - ka06059 - Feb-02-2018

we can use regex by importing re module
then find the string pattern
result = [ 0 1 2 3
0 Number Name Plan NaN
1 NaN NaN not(selected) NaN
2 53494580 + (53)494580 NP_551 NaN
3 53494581 + (53)494581 NP_551 NaN
4 53494582 + (53)494582 NP_551 NaN
5 55110000 + (53)494583 NP_551 NaN]

#make sure result is in string form

import re
convertedlist = re.compile('(\d) (\d*) (\+ \(\d*\)\d*) (NP_551) (NaN)').findall(result)
Output:
[('2', '53494580', '+ (53)494580', 'NP_551', 'NaN'), ('3', '53494581', '+ (53)494581', 'NP_551', 'NaN'), ('4', '53494582', '+ (53)494582', 'NP_551', 'NaN'), ('5', '55110000', '+ (53)494583', 'NP_551', 'NaN')]
convertedlist is in nested tuples instead of nested list. note that only last 4 items listed , the ['1','NaN','NaN','not(selected)','NaN'] left out due to unique patterns.


RE: how to edit data frames and convert to a list(pandas, read_html()) ? - buran - Feb-02-2018

if you want list of lists:
try
my_list = df.values.to_list()
or use BeautifulSoup to parse the site, not pandas.

@ka06059 - please, don't post regex solution for everything. in this case OP is using pandas and the package offers enough tools of its own


RE: how to edit data frames and convert to a list(pandas, read_html()) ? - snippsat - Feb-02-2018

Can skiprows and set name header.
Then use values.tolist() and insert header names.
Example:
>>> import pandas as pd
... from io import StringIO
...
... data = """\
... 1, 2, 3
... 3, 4, 6
... 6, 7, 8
... 9, 10, 11"""
...
... df = pd.read_csv(StringIO(data), skiprows=[0,1], names = ["foo", "bar", "spam"])

>>> df
   foo  bar  spam
0    6    7     8
1    9   10    11

>>> lst = df.values.tolist()
>>> first = list(df.columns)
>>> lst.insert(0, first)
>>> lst
[['foo', 'bar', 'spam'], [6, 7, 8], [9, 10, 11]]