writing data to a csv-file - Printable Version

writing data to a csv-file - apollo - Jul-03-2020

Hi there, good day dear Python experts,

I am just trying to improve my Python skills.

Today I am working on saving data to a CSV file.

I am scraping unstructured data from a website, using BeautifulSoup to pull out the chunks of data that I need. Afterwards I want to structure the dataset. I think it is best to add the values to a list first and then write them to a CSV file.

I plan to do it this way because the list gets new values every time the loop runs.

The loop is meant to keep adding new values to the file, so that my CSV file ends up with the values from every turn of the loop.

(The fetching part is done with requests.)


for i in range(1, 100):
    url = "https://my-website.com/webid={}".format(i)
    s = session.get(url, headers=headers, cookies=cookies)

    soup = bs(s.text, 'html.parser')
    data = soup.find_all('td') 
    t = soup.find_all('td')
    a = t[0]
    b = t[1]
    c = t[2]
    d = t[3]
    e = t[4]
    f = t[5]
    g = t[4]
    info = [a, b, c, d, e, f, g]  # and so on and so forth: this is my list
    print(info)

df = pd.DataFrame(info)  # here I am writing the stuff to the csv-file
df.to_csv('a.csv', mode='a', index=False, header=False)

Can this be done like this?

Regards, apollo

RE: writing data to a csv-file - DeaD_EyE - Jul-03-2020

Code like this:

    a = t[0]
    b = t[1]
    c = t[2]
    d = t[3]
    e = t[4]
    f = t[5]
    g = t[4]

Code like this is an indication that the design is not right. Use data structures to represent your data.
You're assigning elements from a list to names and then building a new list from those names.

Your data is 2-dimensional. The first dimension is the index (rows) and the second dimension is the columns (td data).
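
A minimal sketch of that idea, using made-up cell values in place of scraped td text (the names `cells`, `row` and `dataset` are only for illustration, they are not from the original code). A slice replaces the one-name-per-element assignments, and a list of such rows gives the two dimensions:

```python
# Made-up values standing in for the text of the <td> cells on one page.
cells = ["id-1", "Alice", "Berlin", "42", "yes", "2020-07-03", "extra"]

# Instead of a = cells[0], b = cells[1], ... followed by [a, b, ...],
# a slice yields the same list in one step.
row = cells[:6]

# The whole dataset is a list of rows; each row is a list of columns.
dataset = []
dataset.append(row)
dataset.append(["id-2", "Bob", "Hamburg", "37", "no", "2020-07-03"])

# The first index selects the row, the second index selects the column.
print(dataset[1][2])  # Hamburg
```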

Create an empty list, which will later hold your whole dataset.
For each tag you need the text or an attribute. Putting a whole tag object into pandas will not work.

td_results = []  # one entry (row) per page
for i in range(1, 100):
    url = "https://my-website.com/webid={}".format(i)
    s = session.get(url, headers=headers, cookies=cookies)

    soup = bs(s.text, 'html.parser')
    data = soup.find_all('td')
    # This is the critical part: find_all may or may not find something,
    # and the number of td elements can differ from page to page.
    # Build a real list of texts; a bare generator expression would be
    # appended as a generator object, not as the cell values.
    td_results.append([column.text for column in data])


print(td_results)
df = pd.DataFrame(td_results)
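
The original question was about getting the values from every turn of the loop into the CSV file. One possible sketch of that step is below. It swaps pandas for the standard library `csv` module so that it runs on its own, and it uses made-up rows instead of scraped data. The helper `append_rows`, the sample values and the temporary directory are assumptions for illustration only.

```python
import csv
import os
import tempfile


def append_rows(path, rows):
    """Append each row (a list of strings) as one line of the CSV file."""
    # newline="" lets the csv module control the line endings itself.
    with open(path, "a", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerows(rows)


# Made-up rows standing in for the scraped td_results.
sample_rows = [["1", "Alice", "Berlin"], ["2", "Bob", "Hamburg"]]

# A temporary directory keeps the sketch from touching real files.
csv_path = os.path.join(tempfile.mkdtemp(), "a.csv")
append_rows(csv_path, sample_rows[:1])  # what the first loop turn would add
append_rows(csv_path, sample_rows[1:])  # what the second loop turn would add

# Read the file back to check that both turns ended up in it.
with open(csv_path, newline="", encoding="utf-8") as handle:
    written = list(csv.reader(handle))
print(written)
```

With pandas, it is usually simpler to collect everything in td_results first and write once after the loop, for example with pd.DataFrame(td_results).to_csv('a.csv', index=False, header=False).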

So if you know that all pages have the same structure and you know, for example, that you need the first 10 elements, then you can use slicing.

Example to get the first 10 elements:

td_results.append([column.text for column in soup.find_all('td')[:10]])
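
Note that a slice does not raise an error when a page has fewer td elements than requested; it just returns a shorter list, so rows can still end up with different lengths. If that matters, a small helper can force every row to the same width. This is only a sketch: `fixed_width` is a hypothetical helper, not part of BeautifulSoup or pandas, and the empty-string fill value is an assumption.

```python
def fixed_width(cells, width, fill=""):
    """Return exactly `width` items: truncate long rows, pad short ones."""
    row = list(cells[:width])
    # After the slice, len(row) <= width, so the padding count is never negative.
    row.extend([fill] * (width - len(row)))
    return row


print(fixed_width(["a", "b", "c"], 2))  # ['a', 'b']
print(fixed_width(["a"], 3))            # ['a', '', '']
print(fixed_width([], 2))               # ['', '']
```

In the loop this could be used as td_results.append(fixed_width([column.text for column in soup.find_all('td')], 10)).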