Python Forum
Running A Parser In VSCode - And Write The Results Into A Csv-File
#1
Hi there, good day dear Python experts.

I am running a parser in VS Code and writing the results into a CSV file, and I have run into a small error. Here is the script:

import requests
from bs4 import BeautifulSoup
import re
import csv
from tqdm import tqdm


first = "https://path ?page={}"
second = "https://path /{}_en"


def catch(url):
    with requests.Session() as req:
        pages = []
        print("Loading All IDS\n")
        for page in tqdm(range(0, 347)):
            r = req.get(url.format(page))
            soup = BeautifulSoup(r.content, 'html.parser')
            numbers = [a.get("href").split("/")[-1].split("_")[0] for a in soup.findAll(
                "a", href=re.compile("^path/"), class_="btn btn-default")]
            # extend, not append: keep one flat list with the IDs of every page
            pages.extend(numbers)
        # return all collected IDs, not only those of the last page
        return pages


def parse(url):
    links = catch(first)
    with requests.Session() as req:
        with open("Data.csv", 'w', newline="", encoding="UTF-8") as f:
            writer = csv.writer(f)
            writer.writerow(["Name", "Address", "Site", "Phone",
                             "Description", "Scope", "Rec", "Send", "PIC", "OID", "Topic"])
            print("\nParsing Now... \n")
            for link in tqdm(links):
                r = req.get(url.format(link))
                soup = BeautifulSoup(r.content, 'html.parser')
                task = soup.find("section", class_="col-sm-12").contents
                name = task[1].text
                add = task[3].find(
                    "i", class_="fa fa-location-arrow fa-lg").parent.text.strip()
                try:
                    site = task[3].find("a", class_="link-default").get("href")
                except:
                    site = "N/A"
                try:
                    phone = task[3].find(
                        "i", class_="fa fa-phone").next_element.strip()
                except:
                    phone = "N/A"
                desc = task[3].find(
                    "h3", class_="eyp-project-heading underline").find_next("p").text
                scope = task[3].findAll("span", class_="pull-right")[1].text
                rec = task[3].select("tbody td")[1].text
                send = task[3].select("tbody td")[-1].text
                pic = task[3].select(
                    "span.vertical-space")[0].text.split(" ")[1]
                oid = task[3].select(
                    "span.vertical-space")[-1].text.split(" ")[1]
                topic = [item.next_element.strip() for item in task[3].select(
                    "i.fa.fa-check.fa-lg")]
                writer.writerow([name, add, site, phone, desc,
                                 scope, rec, send, pic, oid, "".join(topic)])


parse(second)
Here is the output:

python /home/martin/dev/vscode/euro.py
[email protected]:~
$ python /home/martin/dev/vscode/euro.py
Loading All IDS

100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 347/347 [08:01<00:00,  1.39s/it]
Traceback (most recent call last):
  File "/home/martin/dev/vscode/euro.py", line 65, in <module>
    parse(second)
  File "/home/martin/dev/vscode/euro.py", line 29, in parse
    with open("Data.csv", 'w', newline="", encoding="UTF-8") as f:
TypeError: file() takes at most 3 arguments (4 given)
[email protected]:~
Well, I think the error is here:

with open("Data.csv", 'w', newline="", encoding="UTF-8") as f:
I guess I need to take a closer look at the arguments.
#2
Looks like you are using Python 3 options for open(), but you are running the script under Python 2.

$ python3 -c 'open("in", "w", newline="", encoding="UTF-8")'
$ python2 -c 'open("in", "w", newline="", encoding="UTF-8")'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
TypeError: file() takes at most 3 arguments (4 given)
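
A small guard at the top of the script can make this kind of mismatch obvious before any pages are downloaded. This is only a sketch: the helper name require_python3 is made up for illustration and is not part of the original script.

```python
import sys


def require_python3(version_info=sys.version_info):
    """Raise a readable error when the script is started under Python 2."""
    # open() only accepts newline= and encoding= on Python 3.
    if version_info[0] < 3:
        raise RuntimeError(
            "This script needs Python 3, but it is running under %d.%d"
            % (version_info[0], version_info[1])
        )
    return True


# Call this before doing any real work.
require_python3()
```

With a check like this, the script would stop immediately instead of failing after the eight-minute download shown in the traceback above.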
#3
Yes indeed. Drop the newline and encoding parameters and I bet it works fine.
#4
Good day dear JeffSummers,
many thanks for the quick answer, great to hear from you. I did as you advised, but I seem to have run into new errors doing so.


I ran the code like this:

def parse(url):
    links = catch(first)
    with requests.Session() as req:
        with open("Data.csv", 'w') as f:
            writer = csv.writer(f)
            writer.writerow(["Name", "Address", "Site", "Phone",
                             "Description", "Scope", "Rec", "Send", "PIC", "OID", "Topic"])
            print("\nParsing Now... \n")
       
But now I have a different issue:



[email protected]:~
$ python /home/martin/dev/vscode/euro.py
Loading All IDS

100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:15<00:00,  1.56s/it]

Parsing Now... 

  5%|█████▎                                                                                                    | 1/20 [00:02<00:40,  2.14s/it]
Traceback (most recent call last):
  File "/home/martin/dev/vscode/euro.py", line 65, in <module>
    parse(second)
  File "/home/martin/dev/vscode/euro.py", line 62, in parse
    scope, rec, send, pic, oid, "".join(topic)])
UnicodeEncodeError: 'ascii' codec can't encode character u'\xed' in position 9: ordinal not in range(128)
[email protected]:~
$ 
Well, I guess I have a UnicodeEncodeError. It seems that my system default encoding isn't UTF-8,
so should I do something extra to avoid this issue?


Should I try df.to_csv("data.csv", index=False, encoding="utf-8")?

But then it will probably fail again...
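
For comparison, here is a minimal sketch of how the same kind of row can be written under Python 3, where open() accepts both newline and encoding. The traceback above most likely comes from Python 2's csv module trying to encode unicode text as ASCII. The function name write_rows, the file name and the sample row are assumptions made up for this example, not data from the real site.

```python
import csv
import os
import tempfile


def write_rows(path, rows):
    # newline="" lets the csv module control the line endings;
    # encoding="utf-8" makes the file encoding explicit instead of
    # relying on the system default.
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)


# Hypothetical sample data; "\xed" is the character named in the traceback.
sample_path = os.path.join(tempfile.gettempdir(), "sample.csv")
sample_rows = [["Name", "Address"], ["Mar\xeda", "N/A"]]
write_rows(sample_path, sample_rows)
```

This only runs under Python 3, so it does not fix the script as long as the python command still points at Python 2.7.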

#5
Hi again,

an update: I am running Python 2.7.1.

I will update the system so that the script runs under Python 3.x.

I hope that this will be successful. I guess that I can then run this check again:

$ python3 -c 'open("in", "w", newline="", encoding="UTF-8")'
$ python2 -c 'open("in", "w", newline="", encoding="UTF-8")'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
TypeError: file() takes at most 3 arguments (4 given)
I need to take care of the encoding options, see the issue mentioned above:

a UnicodeEncodeError, which suggests that my system default encoding isn't UTF-8.

So should I do something extra to avoid this issue?
Should I try df.to_csv("data.csv", index=False, encoding="utf-8")?
Step one: I will update Python to 3.x.
Step two: I will add all the arguments back, so that we have


$ python3 -c 'open("in", "w", newline="", encoding="UTF-8")'
$ python2 -c 'open("in", "w", newline="", encoding="UTF-8")'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
TypeError: file() takes at most 3 arguments (4 given)
Looking forward to hearing from you.
#6
You should look at how to set up VS Code and how it works with Python.
It's not hard to see which version you are using, as VS Code always shows it down in the left corner.
VS Code from start
Overview image of my setup, with Python and Code Runner as the most important extensions.
[Image: vSxNpA.png]
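
The interpreter can also be checked from inside the script itself, which helps when the terminal and the editor disagree. A minimal sketch, where the function name interpreter_summary is made up for illustration:

```python
import platform
import sys


def interpreter_summary():
    """Return the version and path of the interpreter running this script."""
    return "Python {} at {}".format(platform.python_version(), sys.executable)


print(interpreter_summary())
```

Running this once with the python command and once from VS Code shows whether both use the same interpreter.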

