Python Forum
Running A Parser In VSCode - And Write The Results Into A Csv-File
#1
Hi there, and good day, dear Python experts.

I am running a parser in VS Code and writing the results into a CSV file.
I get a small error with the following script:

import requests
from bs4 import BeautifulSoup
import re
import csv
from tqdm import tqdm


first = "https://path ?page={}"
second = "https://path /{}_en"


def catch(url):
    with requests.Session() as req:
        pages = []
        print("Loading All IDS\n")
        for item in tqdm(range(0, 347)):
            r = req.get(url.format(item))
            soup = BeautifulSoup(r.content, 'html.parser')
            numbers = [item.get("href").split("/")[-1].split("_")[0] for item in soup.findAll(
                "a", href=re.compile("^path/"), class_="btn btn-default")]
            pages.append(numbers)
        return numbers


def parse(url):
    links = catch(first)
    with requests.Session() as req:
        with open("Data.csv", 'w', newline="", encoding="UTF-8") as f:
            writer = csv.writer(f)
            writer.writerow(["Name", "Address", "Site", "Phone",
                             "Description", "Scope", "Rec", "Send", "PIC", "OID", "Topic"])
            print("\nParsing Now... \n")
            for link in tqdm(links):
                r = req.get(url.format(link))
                soup = BeautifulSoup(r.content, 'html.parser')
                task = soup.find("section", class_="col-sm-12").contents
                name = task[1].text
                add = task[3].find(
                    "i", class_="fa fa-location-arrow fa-lg").parent.text.strip()
                try:
                    site = task[3].find("a", class_="link-default").get("href")
                except:
                    site = "N/A"
                try:
                    phone = task[3].find(
                        "i", class_="fa fa-phone").next_element.strip()
                except:
                    phone = "N/A"
                desc = task[3].find(
                    "h3", class_="eyp-project-heading underline").find_next("p").text
                scope = task[3].findAll("span", class_="pull-right")[1].text
                rec = task[3].select("tbody td")[1].text
                send = task[3].select("tbody td")[-1].text
                pic = task[3].select(
                    "span.vertical-space")[0].text.split(" ")[1]
                oid = task[3].select(
                    "span.vertical-space")[-1].text.split(" ")[1]
                topic = [item.next_element.strip() for item in task[3].select(
                    "i.fa.fa-check.fa-lg")]
                writer.writerow([name, add, site, phone, desc,
                                 scope, rec, send, pic, oid, "".join(topic)])


parse(second)
See the output:

python /home/martin/dev/vscode/euro.py
martin@mx:~
$ python /home/martin/dev/vscode/euro.py
Loading All IDS

100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 347/347 [08:01<00:00,  1.39s/it]
Traceback (most recent call last):
  File "/home/martin/dev/vscode/euro.py", line 65, in <module>
    parse(second)
  File "/home/martin/dev/vscode/euro.py", line 29, in parse
    with open("Data.csv", 'w', newline="", encoding="UTF-8") as f:
TypeError: file() takes at most 3 arguments (4 given)
martin@mx:~
Well, I think the error is here:

with open("Data.csv", 'w', newline="", encoding="UTF-8") as f:
I guess I need to take a closer look at the arguments of this call.
Reply
#2
It looks like you are passing Python 3 options to open(), but you are running the script under Python 2.

$ python3 -c 'open("in", "w", newline="", encoding="UTF-8")'
$ python2 -c 'open("in", "w", newline="", encoding="UTF-8")'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
TypeError: file() takes at most 3 arguments (4 given)
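If it helps, a guard like the following at the top of the script would fail early with a clearer message than the TypeError. This is only a sketch: require_python3 is a name made up here, not something from your script or from any library.

```python
import sys


def require_python3(version_info=sys.version_info):
    """Raise a readable error when the interpreter is older than Python 3."""
    if version_info[0] < 3:
        raise RuntimeError(
            "This script needs Python 3, but is running under Python %d.%d"
            % (version_info[0], version_info[1])
        )


# Call this before anything that relies on Python 3 only features,
# such as open(..., newline="", encoding="UTF-8").
require_python3()
```

Run with plain `python` on a system where that still means Python 2, this should stop immediately instead of after the 8 minute download loop.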
Reply
#3
Yes indeed. Drop the newline and encoding parameters and I bet it works fine.
Reply
#4
Good day, dear JeffSummers,
many thanks for the quick answer; great to hear from you. I did as you advised, but I get a new error when doing so.


I ran the code like so:

def parse(url):
    links = catch(first)
    with requests.Session() as req:
        with open("Data.csv", 'w') as f:
            writer = csv.writer(f)
            writer.writerow(["Name", "Address", "Site", "Phone",
                             "Description", "Scope", "Rec", "Send", "PIC", "OID", "Topic"])
            print("\nParsing Now... \n")
       
But now I have a new issue:



martin@mx:~
$ python /home/martin/dev/vscode/euro.py
Loading All IDS

100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:15<00:00,  1.56s/it]

Parsing Now... 

  5%|█████▎                                                                                                    | 1/20 [00:02<00:40,  2.14s/it]
Traceback (most recent call last):
  File "/home/martin/dev/vscode/euro.py", line 65, in <module>
    parse(second)
  File "/home/martin/dev/vscode/euro.py", line 62, in parse
    scope, rec, send, pic, oid, "".join(topic)])
UnicodeEncodeError: 'ascii' codec can't encode character u'\xed' in position 9: ordinal not in range(128)
martin@mx:~
$ 
Well, I guess I have a UnicodeEncodeError; it seems that my system default encoding isn't UTF-8.
Should I therefore do something extra to avoid issues here?


Should I try df.to_csv("data.csv", index=False, encoding="utf-8")?

But then it will not work again...
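
For comparison, here is a minimal sketch of how the write might look once the script runs under Python 3, where open() accepts newline and encoding. The helper names write_rows and read_rows and the sample row are made up for illustration; the sample simply contains "í" (u'\xed'), the character named in the traceback.

```python
import csv
import os
import tempfile


def write_rows(path, header, rows):
    """Write a header and rows to a UTF-8 encoded CSV file (Python 3)."""
    # newline="" lets the csv module handle line endings itself;
    # encoding="utf-8" avoids falling back to the system default codec.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


def read_rows(path):
    """Read the file back as a list of rows, using the same encoding."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.reader(f))


# Made-up sample data with a non-ASCII character.
sample = [["García", "Madrid"]]
path = os.path.join(tempfile.mkdtemp(), "Data.csv")
write_rows(path, ["Name", "Address"], sample)
print(read_rows(path))
```

With an explicit encoding the system default should no longer matter, so no pandas DataFrame is needed just for the encoding.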

Reply
#5
Hi again,

an update: I am running Python 2.7.1.

I will install updates so that the system runs a 3.x version.

I hope that I will be successful. I guess that I can then run this again:

$ python3 -c 'open("in", "w", newline="", encoding="UTF-8")'
$ python2 -c 'open("in", "w", newline="", encoding="UTF-8")'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
TypeError: file() takes at most 3 arguments (4 given)
since I need to take care of the encoding options; see the issues mentioned above:

a UnicodeEncodeError, because it seems that my system default encoding isn't UTF-8.

Should I therefore do something extra to avoid issues here?
Should I try df.to_csv("data.csv", index=False, encoding="utf-8")?
Step one: I will update Python to 3.x.
Step two: I will add all the arguments back, so that we have


$ python3 -c 'open("in", "w", newline="", encoding="UTF-8")'
$ python2 -c 'open("in", "w", newline="", encoding="UTF-8")'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
TypeError: file() takes at most 3 arguments (4 given)
I look forward to hearing from you.
Reply
#6
You should look at how to set up VS Code and how it works with Python.
It's not hard to see which version you are using, as it is always shown in the bottom-left corner.
VS Code from start
Overview image of my setup, with Python and Code Runner as the most important extensions.
[Image: vSxNpA.png]
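The interpreter can also be checked from inside the script itself. A small sketch, where describe_interpreter is just an illustrative name; running it once from VS Code and once from the terminal shows whether both use the same Python.

```python
import sys


def describe_interpreter():
    """Return the path and version of the interpreter running this code."""
    return "{} (Python {}.{}.{})".format(sys.executable, *sys.version_info[:3])


print(describe_interpreter())
```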
Reply

