Python Forum

I used library pandas, read_html() to import a table from a webpage.
I want to insert values from table read_html in ms msl table
but for this I must edit table read_html and convert to list.
This is difficult to do because .read_html() produces a list of dataframes.

my python code:

import requests
import pandas as pd
r = requests.get('URL')
pd.set_option('max_rows',10000) 
df = pd.read_html(r.content)
print(df)

result print(df) - dataframes:

Output:[             0                     1              2   3
0        Number                  Name           Plan NaN
1          NaN                   NaN   not(selected) NaN
2     53494580          + (53)494580         NP_551 NaN
3     53494581          + (53)494581         NP_551 NaN
4     53494582          + (53)494582         NP_551 NaN
5     55110000          + (53)494583         NP_551 NaN]

I would like the following results to be written to the ms msl table:

Output:[['1','NaN','NaN','not(selected)','NaN'],
['2','53494580','+ (53)494580','NP_551','NaN'],
['3','53494581','+ (53)494581','NP_551','NaN'],
['4','53494582','+ (53)494582','NP_551','NaN'],
['5','55110000','+ (53)494583','NP_551','NaN]']

how to edit data frames and convert to a list?
I would be grateful for any help.

we can use regex by importing re module
then find the string pattern

result = [ 0 1 2 3
0 Number Name Plan NaN
1 NaN NaN not(selected) NaN
2 53494580 + (53)494580 NP_551 NaN
3 53494581 + (53)494581 NP_551 NaN
4 53494582 + (53)494582 NP_551 NaN
5 55110000 + (53)494583 NP_551 NaN]

#make sure result is in string form

import re
convertedlist = re.compile('(\d) (\d*) (\+ \(\d*\)\d*) (NP_551) (NaN)').findall(result)

Output:[('2', '53494580', '+ (53)494580', 'NP_551', 'NaN'), 
('3', '53494581', '+ (53)494581', 'NP_551', 'NaN'), 
('4', '53494582', '+ (53)494582', 'NP_551', 'NaN'), 
('5', '55110000', '+ (53)494583', 'NP_551', 'NaN')]

convertedlist is in nested tuples instead of nested list. note that only last 4 items listed , the ['1','NaN','NaN','not(selected)','NaN'] left out due to unique patterns.

if you want list of lists:
try

my_list = df.values.to_list()

or use BeautifulSoup to parse the site, not pandas.

@ka06059 - please, don't post regex solution for everything. in this case OP is using pandas and the package offers enough tools of its own

Can skiprows and set name header.
Then use values.tolist() and insert header names.
Example:

>>> import pandas as pd
... from io import StringIO
...
... data = """\
... 1, 2, 3
... 3, 4, 6
... 6, 7, 8
... 9, 10, 11"""
...
... df = pd.read_csv(StringIO(data), skiprows=[0,1], names = ["foo", "bar", "spam"])

>>> df
   foo  bar  spam
0    6    7     8
1    9   10    11

>>> lst = df.values.tolist()
>>> first = list(df.columns)
>>> lst.insert(0, first)
>>> lst
[['foo', 'bar', 'spam'], [6, 7, 8], [9, 10, 11]]

donvirte

ka06059

buran

snippsat