Python Forum
what is wrong with my code?
#1
Hello,
I am learning Python and tried the following code, which is posted here.
The code should scrape the details of the video URLs given in a CSV file.
The video URLs are in a tab-delimited CSV file with a header (urls). For testing purposes I put only 4 YouTube URLs in the file.
When I run the following code on my machine (Windows 7 64-bit, Python 3.8, Visual Studio Code), I get no results and no error. It is supposed to export the results to a CSV file, but no CSV file is created either.
I think the indentation is correct.
Does anyone have an idea why it does not work?
I would be grateful for your help.
Here is the code:

from fake_useragent import UserAgent
from bs4 import BeautifulSoup
import requests
import pandas as pd
import time
import re
import numpy as np


def get_youtube_info(url, ua, crawl_delay):
    header = {"user-agent": ua.random}
    request = requests.get(url, headers=header, verify=True)
    soup = BeautifulSoup(request.content, "html.parser")
    tags = soup.find_all("meta", property="og:video:tag")
    titles = soup.find("title").text

    try:
        getdesc = re.search('description":{"simpleText":".*"', request.text)
        desc = getdesc.group(0)
        desc = desc.replace('description":{"simpleText":"', "")
        desc = desc.replace('"', "")
        desc = desc.replace("\n", "")
    except:
        desc = "n/a"

    getdate = re.search("[a-zA-z]{3}\s[0-9]{1,2},\s[0-9]{4}", request.text)
    vid_date = getdate.group(0)

    return tags, titles, vid_date, desc


def tag_matches(desc, vid_tag_list):
    vid_tag_list = vid_tag_list.split(",")
    matches = ""

    for x in vid_tag_list:
        if desc.find(x) != -1:
            matches += x + ","

            return matches

        df = pd.read_csv("video-urls-4.csv")
        urls_list = df["urls"].to_list()

        ua = UserAgent()
        delays = [*range(10, 22, 1)]

        df2 = pd.DataFrame(
            columns=["URL", "Title", "Date", "Views", "Tags", "Tag Matches in Desc"]
        )

        for x in urls_list:
            crawl_delay = np.random.choice(delays)
            vid_tags, title, vid_date, desc, views = get_youtube_info(
                x, ua, crawl_delay
            )

            vid_tag_list = ""
            for i in vid_tags:
                vid_tag_list += i["content"] + ", "

                matches = tag_matches(desc, vid_tag_list)
                title = title.replace(" - YouTube", "")

                dict1 = {
                    "URL": x,
                    "Title": title,
                    "Date": vid_date,
                    "Views": views,
                    "Tags": vid_tag_list,
                    "Tag Matches in Desc": matches,
                }
                df2 = df2.append(dict1, ignore_index=True)
                df2.to_csv("vid-detail.csv")
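
Judging only from the indentation as posted, one likely reason for the silent run is that everything from `df = pd.read_csv(...)` onward sits inside `tag_matches`, after its `return`. The file therefore only defines two functions and never calls them, so nothing runs and no CSV is written. The `return matches` is also inside the `for` loop, so the function would stop at the first matching tag. Separately, `get_youtube_info` returns four values while the call unpacks five (`views`), which would raise a `ValueError` once the script body actually runs. The sketch below shows the corrected shape only. The `main` function and its sample strings are hypothetical stand-ins for the CSV and scraping steps, and the `strip()` call is an added assumption so that tags like " pandas" still match.

```python
def tag_matches(desc, vid_tag_list):
    """Return the comma-separated tags that also occur in the description."""
    matches = ""
    for tag in vid_tag_list.split(","):
        tag = tag.strip()
        # Skip empty entries left behind by a trailing comma.
        if tag and tag in desc:
            matches += tag + ","
    # The return is dedented to function level, so every tag is checked.
    return matches


def main():
    # Hypothetical stand-in data. The real script would read
    # video-urls-4.csv and call get_youtube_info() for each URL here.
    desc = "A python tutorial about pandas and requests"
    tags = "python, pandas, numpy, "
    return tag_matches(desc, tags)


# Script body at module level (not nested in a function), so it executes.
if __name__ == "__main__":
    print(main())
```

In the original script the same idea applies: dedent the block that starts with `df = pd.read_csv(...)` to column zero, and dedent the lines after the inner `for i in vid_tags:` loop so that one row is appended per URL rather than per tag.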