Python Forum
how to edit data frames and convert to a list(pandas, read_html()) ?
#1
I used the pandas function read_html() to import a table from a web page.
I want to insert the values from that table into an MS SQL table,
but to do this I must first edit the table and convert it to a list.
This is difficult because read_html() returns a list of DataFrames.


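My Python code:
import requests
import pandas as pd
r = requests.get('URL')
pd.set_option('display.max_rows', 10000)  # full option name; a bare 'max_rows' can be ambiguous
df = pd.read_html(r.content)  # returns a list of DataFrames, one per table found
print(df)
Result of print(df), a list holding one DataFrame: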
Output:
[         0             1              2    3
0    Number          Name           Plan  NaN
1       NaN           NaN  not(selected)  NaN
2  53494580  + (53)494580         NP_551  NaN
3  53494581  + (53)494581         NP_551  NaN
4  53494582  + (53)494582         NP_551  NaN
5  55110000  + (53)494583         NP_551  NaN]
I would like the following result to be written to the MS SQL table:
Output:
[['1','NaN','NaN','not(selected)','NaN'], ['2','53494580','+ (53)494580','NP_551','NaN'], ['3','53494581','+ (53)494581','NP_551','NaN'], ['4','53494582','+ (53)494582','NP_551','NaN'], ['5','55110000','+ (53)494583','NP_551','NaN']]
How can I edit the DataFrames and convert them to a list?
I would be grateful for any help.
#2
You can use a regex by importing the re module
and then searching for the string pattern.
# result must be a string; this sample uses single spaces between fields,
# and the pattern below only matches that spacing
result = """[ 0 1 2 3
0 Number Name Plan NaN
1 NaN NaN not(selected) NaN
2 53494580 + (53)494580 NP_551 NaN
3 53494581 + (53)494581 NP_551 NaN
4 53494582 + (53)494582 NP_551 NaN
5 55110000 + (53)494583 NP_551 NaN]"""

import re
# raw string, so the backslashes reach the regex engine unchanged
convertedlist = re.compile(r'(\d) (\d*) (\+ \(\d*\)\d*) (NP_551) (NaN)').findall(result)
Output:
[('2', '53494580', '+ (53)494580', 'NP_551', 'NaN'), ('3', '53494581', '+ (53)494581', 'NP_551', 'NaN'), ('4', '53494582', '+ (53)494582', 'NP_551', 'NaN'), ('5', '55110000', '+ (53)494583', 'NP_551', 'NaN')]
convertedlist is a list of tuples rather than a nested list. Note that only the last 4 rows are matched; the ['1','NaN','NaN','not(selected)','NaN'] row is left out because it follows a different pattern.
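If a list of lists is needed rather than a list of tuples, each tuple can be converted with list(). A minimal sketch, using two of the tuples from the output above as sample data (the name nested is only illustrative):

```python
# Sample data copied from the findall() output shown in this post.
convertedlist = [
    ('2', '53494580', '+ (53)494580', 'NP_551', 'NaN'),
    ('3', '53494581', '+ (53)494581', 'NP_551', 'NaN'),
]

# Turn every tuple into a list, keeping the row order.
nested = [list(row) for row in convertedlist]
print(nested)
```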
#3
If you want a list of lists, try
my_list = df[0].values.tolist()  # df is the list returned by read_html; df[0] is its first table
or use BeautifulSoup to parse the site instead of pandas.

@ka06059 - please don't post a regex solution for everything. In this case the OP is using pandas, and the package offers enough tools of its own.
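A rough sketch of how the pandas-only route might produce the exact list asked for in post #1 (row index first, every cell as a string, header row dropped). The DataFrame here is built by hand as a stand-in for pd.read_html(r.content)[0], with values copied from the printed output; the names table, body, cell_to_text and rows are assumptions for illustration, not part of the original code.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for pd.read_html(r.content)[0] (assumption:
# the real table has the same shape as the output in post #1).
table = pd.DataFrame([
    ["Number", "Name", "Plan", np.nan],
    [np.nan, np.nan, "not(selected)", np.nan],
    ["53494580", "+ (53)494580", "NP_551", np.nan],
    ["53494581", "+ (53)494581", "NP_551", np.nan],
    ["53494582", "+ (53)494582", "NP_551", np.nan],
    ["55110000", "+ (53)494583", "NP_551", np.nan],
])

# Row 0 only holds the header text, so leave it out.
body = table.iloc[1:]

def cell_to_text(value):
    """Render a cell as a string, writing missing values as 'NaN'."""
    return "NaN" if pd.isna(value) else str(value)

# Prepend the row index to each row and convert every cell to text.
rows = [
    [str(idx)] + [cell_to_text(v) for v in values]
    for idx, values in zip(body.index, body.values.tolist())
]
print(rows)
```

With a real page, table would instead come from picking one element of the list that read_html() returns.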
#4
You can use skiprows and set the header names.
Then use values.tolist() and insert the header names.
Example:
>>> import pandas as pd
... from io import StringIO
...
... data = """\
... 1, 2, 3
... 3, 4, 6
... 6, 7, 8
... 9, 10, 11"""
...
... df = pd.read_csv(StringIO(data), skiprows=[0,1], names = ["foo", "bar", "spam"])

>>> df
   foo  bar  spam
0    6    7     8
1    9   10    11

>>> lst = df.values.tolist()
>>> first = list(df.columns)
>>> lst.insert(0, first)
>>> lst
[['foo', 'bar', 'spam'], [6, 7, 8], [9, 10, 11]]
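Once the rows are a list of lists, a database driver can usually take them in one call. The thread does not show the database code, so the sketch below uses the standard-library sqlite3 module purely as a stand-in; the table name numbers, its column names, and the sample rows are all hypothetical, and the real driver's connection call and placeholder style may differ.

```python
import sqlite3

# Sample rows in the list-of-lists shape requested in post #1.
rows = [
    ['1', 'NaN', 'NaN', 'not(selected)', 'NaN'],
    ['2', '53494580', '+ (53)494580', 'NP_551', 'NaN'],
]

# In-memory SQLite database as a stand-in for the real target (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE numbers "
    "(row_id TEXT, number TEXT, name TEXT, plan TEXT, extra TEXT)"
)

# executemany() runs the INSERT once per inner list.
conn.executemany("INSERT INTO numbers VALUES (?, ?, ?, ?, ?)", rows)
conn.commit()

stored = conn.execute("SELECT * FROM numbers ORDER BY row_id").fetchall()
print(stored)
conn.close()
```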