Python Forum
Running A Parser In VSCode - And Write The Results Into A Csv-File - Printable Version


Running A Parser In VSCode - And Write The Results Into A Csv-File - apollo - Dec-17-2020

Hi there, good day dear Python experts.

I am running a parser in VSCode that should write its results into a CSV file, and I have got a small error with the following script:

import requests
from bs4 import BeautifulSoup
import re
import csv
from tqdm import tqdm


first = "https://path ?page={}"
second = "https://path /{}_en"


def catch(url):
    with requests.Session() as req:
        pages = []
        print("Loading All IDS\n")
        for item in tqdm(range(0, 347)):
            r = req.get(url.format(item))
            soup = BeautifulSoup(r.content, 'html.parser')
            numbers = [a.get("href").split("/")[-1].split("_")[0] for a in soup.findAll(
                "a", href=re.compile("^path/"), class_="btn btn-default")]
            pages.extend(numbers)  # keep the IDs from every page, not only the last one
        return pages


def parse(url):
    links = catch(first)
    with requests.Session() as req:
        with open("Data.csv", 'w', newline="", encoding="UTF-8") as f:
            writer = csv.writer(f)
            writer.writerow(["Name", "Address", "Site", "Phone",
                             "Description", "Scope", "Rec", "Send", "PIC", "OID", "Topic"])
            print("\nParsing Now... \n")
            for link in tqdm(links):
                r = req.get(url.format(link))
                soup = BeautifulSoup(r.content, 'html.parser')
                task = soup.find("section", class_="col-sm-12").contents
                name = task[1].text
                add = task[3].find(
                    "i", class_="fa fa-location-arrow fa-lg").parent.text.strip()
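                try:
                    site = task[3].find("a", class_="link-default").get("href")
                except AttributeError:  # find() returned None: no link on this page
                    site = "N/A"
                try:
                    phone = task[3].find(
                        "i", class_="fa fa-phone").next_element.strip()
                except (AttributeError, TypeError):  # no phone icon, or no text after it
                    phone = "N/A"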
                desc = task[3].find(
                    "h3", class_="eyp-project-heading underline").find_next("p").text
                scope = task[3].findAll("span", class_="pull-right")[1].text
                rec = task[3].select("tbody td")[1].text
                send = task[3].select("tbody td")[-1].text
                pic = task[3].select(
                    "span.vertical-space")[0].text.split(" ")[1]
                oid = task[3].select(
                    "span.vertical-space")[-1].text.split(" ")[1]
                topic = [item.next_element.strip() for item in task[3].select(
                    "i.fa.fa-check.fa-lg")]
                writer.writerow([name, add, site, phone, desc,
                                 scope, rec, send, pic, oid, "".join(topic)])


parse(second)
See the output:

python /home/martin/dev/vscode/euro.py
martin@mx:~
$ python /home/martin/dev/vscode/euro.py
Loading All IDS

100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 347/347 [08:01<00:00,  1.39s/it]
Traceback (most recent call last):
  File "/home/martin/dev/vscode/euro.py", line 65, in <module>
    parse(second)
  File "/home/martin/dev/vscode/euro.py", line 29, in parse
    with open("Data.csv", 'w', newline="", encoding="UTF-8") as f:
TypeError: file() takes at most 3 arguments (4 given)
martin@mx:~
Well, I think the error is here:

with open("Data.csv", 'w', newline="", encoding="UTF-8") as f:
I guess I need to take a closer look at the arguments here.


RE: Running A Parser In VSCode - And Write The Results Into A Csv-File - bowlofred - Dec-17-2020

Looks like you might be using python3 options but you are running it under python2.

$ python3 -c 'open("in", "w", newline="", encoding="UTF-8")'
$ python2 -c 'open("in", "w", newline="", encoding="UTF-8")'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
TypeError: file() takes at most 3 arguments (4 given)
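
One way to make this kind of mismatch obvious right away is a guard at the top of the script. This is only a minimal sketch; `require_python3` is a hypothetical helper name and not part of the script posted above.

```python
import sys


def require_python3(version_info=sys.version_info):
    """Raise early if the interpreter is older than Python 3."""
    if version_info[0] < 3:
        raise RuntimeError(
            "This script needs Python 3, but is running under %d.%d"
            % (version_info[0], version_info[1])
        )


require_python3()  # passes silently under Python 3
```

Run under Python 2, the guard stops the script with a clear message before the long download loop starts.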



RE: Running A Parser In VSCode - And Write The Results Into A Csv-File - jefsummers - Dec-17-2020

Yes indeed. Drop the newline and encoding parameters and I bet it works fine.


RE: Running A Parser In VSCode - And Write The Results Into A Csv-File - apollo - Jan-14-2021

Good day dear jefsummers,
many thanks for the quick answer, great to hear from you. I did as you advised, but I seem to have run into a new error doing so.


I ran the code like this:

def parse(url):
    links = catch(first)
    with requests.Session() as req:
        with open("Data.csv", 'w') as f:
            writer = csv.writer(f)
            writer.writerow(["Name", "Address", "Site", "Phone",
                             "Description", "Scope", "Rec", "Send", "PIC", "OID", "Topic"])
            print("\nParsing Now... \n")
       
but now i have some issues:



martin@mx:~
$ python /home/martin/dev/vscode/euro.py
Loading All IDS

100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:15<00:00,  1.56s/it]

Parsing Now... 

  5%|█████▎                                                                                                    | 1/20 [00:02<00:40,  2.14s/it]
Traceback (most recent call last):
  File "/home/martin/dev/vscode/euro.py", line 65, in <module>
    parse(second)
  File "/home/martin/dev/vscode/euro.py", line 62, in parse
    scope, rec, send, pic, oid, "".join(topic)])
UnicodeEncodeError: 'ascii' codec can't encode character u'\xed' in position 9: ordinal not in range(128)
martin@mx:~
$ 
Well, I guess I have a UnicodeEncodeError. It seems that my system default encoding isn't UTF-8.
Therefore, should I do something extra to avoid issues here?
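
The character named in the traceback, u'\xed', is "í". As far as I know, the csv module in Python 2 works on byte strings, so a unicode value ends up being encoded with the ascii codec, which cannot represent that character. A small sketch of the difference between the two codecs, written for Python 3 and using a made-up sample string:

```python
text = "Mar\u00eda"  # hypothetical sample value containing U+00ED

try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    # The ascii codec only covers code points 0..127.
    ascii_ok = False

# UTF-8 can represent every Unicode code point.
utf8_bytes = text.encode("utf-8")

print(ascii_ok)
print(utf8_bytes)
```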


Should I try df.to_csv("data.csv", index=False, encoding="utf-8")?

But then it will not work again...


RE: Running A Parser In VSCode - And Write The Results Into A Csv-File - apollo - Jan-14-2021

Hi again,

Update: I am running Python 2.7.1.

I will install updates so that the script runs under Python 3.x.

I hope that this will be successful. I guess I can then run this check again:

$ python3 -c 'open("in", "w", newline="", encoding="UTF-8")'
$ python2 -c 'open("in", "w", newline="", encoding="UTF-8")'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
TypeError: file() takes at most 3 arguments (4 given)
since I need to take care of the encoding options (see the issue mentioned above):

a UnicodeEncodeError; it seems that my system default encoding isn't UTF-8.

Therefore, should I do something extra to avoid issues here?
Should I try df.to_csv("data.csv", index=False, encoding="utf-8")?
Step one: I will update Python to 3.x.
Step two: I will put all the arguments back into open(), so that we have


$ python3 -c 'open("in", "w", newline="", encoding="UTF-8")'
$ python2 -c 'open("in", "w", newline="", encoding="UTF-8")'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
TypeError: file() takes at most 3 arguments (4 given)
Looking forward to hearing from you.
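
For reference, a minimal sketch of how the four-argument open() call behaves under Python 3 with non-ASCII text. The rows are made-up sample data and the header is shortened from the script above; the file is written to a temporary directory so nothing in the project is overwritten.

```python
import csv
import os
import tempfile

rows = [
    ["Name", "Address"],                       # shortened header
    ["Asociaci\u00f3n Mar\u00eda", "Madrid"],  # made-up row with non-ASCII text
]

path = os.path.join(tempfile.mkdtemp(), "Data.csv")

# newline="" lets the csv module control the line endings;
# encoding="UTF-8" makes the output encoding explicit.
with open(path, "w", newline="", encoding="UTF-8") as f:
    csv.writer(f).writerows(rows)

# Read the file back to confirm the text survived the round trip.
with open(path, newline="", encoding="UTF-8") as f:
    read_back = list(csv.reader(f))

print(read_back == rows)
```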


RE: Running A Parser In VSCode - And Write The Results Into A Csv-File - snippsat - Jan-14-2021

You should look at how to set up VS Code and how it works with Python.
It's not hard to see which version you are using, as it is always shown in the lower-left corner.
VS Code from start
Overview image of my setup with Python and Code Runner as the most important extensions.
[Image: vSxNpA.png]
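
To double-check from inside the script which interpreter is actually running it, something like the following can help. A script started from a terminal with `python` may use a different interpreter than the one the editor displays. This is only a sketch, and `interpreter_summary` is a hypothetical helper name.

```python
import sys


def interpreter_summary():
    """Return the path and version of the interpreter running this code."""
    version = "%d.%d.%d" % sys.version_info[:3]
    return "%s (Python %s)" % (sys.executable, version)


print(interpreter_summary())
```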