how to edit data frames and convert to a list(pandas, read_html()) ? - Printable Version +- Python Forum (https://python-forum.io) +-- Forum: Python Coding (https://python-forum.io/forum-7.html) +--- Forum: Web Scraping & Web Development (https://python-forum.io/forum-13.html) +--- Thread: how to edit data frames and convert to a list(pandas, read_html()) ? (/thread-7904.html) |
how to edit data frames and convert to a list(pandas, read_html()) ? - donvirte - Jan-29-2018 I used library pandas, read_html() to import a table from a webpage. I want to insert values from table read_html in ms msl table but for this I must edit table read_html and convert to list. This is difficult to do because .read_html() produces a list of dataframes. my python code: import requests import pandas as pd r = requests.get('URL') pd.set_option('max_rows',10000) df = pd.read_html(r.content) print(df)result print(df) - dataframes: I would like the following results to be written to the ms msl table: how to edit data frames and convert to a list?I would be grateful for any help. RE: how to edit data frames and convert to a list(pandas, read_html()) ? - ka06059 - Feb-02-2018 we can use regex by importing re module then find the string pattern result = [ 0 1 2 3 0 Number Name Plan NaN 1 NaN NaN not(selected) NaN 2 53494580 + (53)494580 NP_551 NaN 3 53494581 + (53)494581 NP_551 NaN 4 53494582 + (53)494582 NP_551 NaN 5 55110000 + (53)494583 NP_551 NaN] #make sure result is in string form import re convertedlist = re.compile('(\d) (\d*) (\+ \(\d*\)\d*) (NP_551) (NaN)').findall(result) convertedlist is in nested tuples instead of nested list. note that only last 4 items listed , the ['1','NaN','NaN','not(selected)','NaN'] left out due to unique patterns.
RE: how to edit data frames and convert to a list(pandas, read_html()) ? - buran - Feb-02-2018 if you want list of lists: try my_list = df.values.to_list()or use BeautifulSoup to parse the site, not pandas. @ka06059 - please, don't post regex solution for everything. in this case OP is using pandas and the package offers enough tools of its own RE: how to edit data frames and convert to a list(pandas, read_html()) ? - snippsat - Feb-02-2018 Can skiprows and set name header.Then use values.tolist() and insert header names.Example: >>> import pandas as pd ... from io import StringIO ... ... data = """\ ... 1, 2, 3 ... 3, 4, 6 ... 6, 7, 8 ... 9, 10, 11""" ... ... df = pd.read_csv(StringIO(data), skiprows=[0,1], names = ["foo", "bar", "spam"]) >>> df foo bar spam 0 6 7 8 1 9 10 11 >>> lst = df.values.tolist() >>> first = list(df.columns) >>> lst.insert(0, first) >>> lst [['foo', 'bar', 'spam'], [6, 7, 8], [9, 10, 11]] |