Python Forum
I am trying to scrape data to broadcast it on Telegram
#1
Hi all, I am one day old in Python development. I am using AI, which has already got me much further than I ever expected. I basically got the bot to work: it scrapes data and posts it to my Telegram channel. But it seems to post only one instance; after that I get errors I don't understand, and the loop shuts down.

I am looking for tips and a better understanding of the code/error.

import os
import re
import requests
import schedule
import time
from bs4 import BeautifulSoup
from dotenv import load_dotenv
from urllib.parse import urljoin, urlsplit
from telegram import Bot, InputFile
import asyncio

load_dotenv()

# Keep track of scraped data and downloaded images
scraped_data = set()
downloaded_images = set()

# Read desired symbols from a text file
with open("desired_symbols.txt", "r") as file:
    desired_symbols = [symbol.strip() for symbol in file.readlines() if symbol.strip()]

# Set up Telegram bot
telegram_token = "[hidden]"
telegram_channel_id = "[hidden]"
bot = Bot(token=telegram_token)

async def send_message_async(chat_id, text):
    await bot.send_message(chat_id=chat_id, text=text)


def scrape_website():
    global scraped_data

    cookies = {
    }

    headers = {
        'authority': 'www.tradingview.com',
    }

    response = requests.get('https://www.tradingview.com/ideas/followed-authors/', cookies=cookies, headers=headers)
    soup = BeautifulSoup(response.content, "html.parser")

    # Find all the idea titles and store them
    idea_titles = soup.find_all("a", class_="tv-widget-idea__title")
    title_list = [title.text.strip() for title in idea_titles]

    # Find all the idea symbols and store them
    idea_symbols = soup.find_all("a", class_="tv-widget-idea__symbol")
    symbol_list = [symbol.text.strip() for symbol in idea_symbols]

    # Find all the thumbnail image URLs and store them
    thumbnail_images = soup.find_all("img", class_="tv-widget-idea__cover")
    image_url_list = [image["data-src"] for image in thumbnail_images]

    # Create a directory to save the images
    directory = "thumbnails"
    if not os.path.exists(directory):
        os.makedirs(directory)

    # Download the thumbnail images and send them to Telegram
    for i, (image_url, title, symbol) in enumerate(zip(image_url_list, title_list, symbol_list)):
        if image_url in downloaded_images:
            print(f"Skipping image {i}. Already downloaded.")
            print()
            continue

        response = requests.get(image_url)
        if response.status_code == 200:
            # Get the file extension from the image URL
            file_extension = os.path.splitext(urlsplit(image_url).path)[1]
            # Generate a unique filename for each image
            filename = f"thumbnail_{i}{file_extension}"
            file_path = os.path.join(directory, filename)
            with open(file_path, "wb") as file:
                file.write(response.content)
                print(f"Thumbnail image {i} downloaded successfully.")
                # Add the image URL to the set of downloaded images
                downloaded_images.add(image_url)
        else:
            print(f"Failed to download thumbnail image {i}.")
        print()

        # Check if the symbol matches any desired symbol pattern
        if any(re.match(pattern, symbol) for pattern in desired_symbols):
            # Generate a unique identifier for the data
            data_id = f"{title}_{symbol}_{i}"
            # Check if the data has already been sent
            if data_id not in scraped_data:
                # Send the data to Telegram
                message = f"Title: {title}\nSymbol: {symbol}\nImage: {image_url}"
                asyncio.run(send_message_async(chat_id=telegram_channel_id, text=message))
                print(f"Data sent to Telegram: {message}")
                # Add the data ID to the set of scraped data
                scraped_data.add(data_id)
        print()


# Schedule the scraping job
schedule.every(1).minute.do(scrape_website)

while True:
    schedule.run_pending()
    time.sleep(1)
Error:
C:\Users\PC\PycharmProjects\scrape\venv\Scripts\python.exe C:\Users\PC\PycharmProjects\scrape\scrapingtradingview.py
Thumbnail image 0 downloaded successfully.
Thumbnail image 1 downloaded successfully.
Data sent to Telegram: Title: Bitcoin - Crash is ready! Bear Flag is confirmed (must see)
Symbol: BTCUSDT
Image: https://s3.tradingview.com/l/ltjOHdi6_mid.png
Thumbnail image 2 downloaded successfully.
Traceback (most recent call last):
  File "C:\Users\PC\PycharmProjects\scrape\venv\Lib\site-packages\httpcore\_async\http11.py", line 88, in handle_async_request
    await self._send_request_headers(**kwargs)
  File "C:\Users\PC\PycharmProjects\scrape\venv\Lib\site-packages\httpcore\_async\http11.py", line 134, in _send_request_headers
    await self._send_event(event, timeout=timeout)
  File "C:\Users\PC\PycharmProjects\scrape\venv\Lib\site-packages\httpcore\_async\http11.py", line 152, in _send_event
    await self._network_stream.write(bytes_to_send, timeout=timeout)
  File "C:\Users\PC\PycharmProjects\scrape\venv\Lib\site-packages\httpcore\backends\asyncio.py", line 51, in write
    await self._stream.send(item=buffer)
  File "C:\Users\PC\PycharmProjects\scrape\venv\Lib\site-packages\anyio\streams\tls.py", line 203, in send
    await self._call_sslobject_method(self._ssl_object.write, item)
  File "C:\Users\PC\PycharmProjects\scrape\venv\Lib\site-packages\anyio\streams\tls.py", line 169, in _call_sslobject_method
    await self.transport_stream.send(self._write_bio.read())
  File "C:\Users\PC\PycharmProjects\scrape\venv\Lib\site-packages\anyio\_backends\_asyncio.py", line 1238, in send
    self._transport.write(item)
  File "C:\Users\PC\AppData\Local\Programs\Python\Python311\Lib\asyncio\proactor_events.py", line 365, in write
    self._loop_writing(data=bytes(data))
  File "C:\Users\PC\AppData\Local\Programs\Python\Python311\Lib\asyncio\proactor_events.py", line 401, in _loop_writing
    self._write_fut = self._loop._proactor.send(self._sock, data)
AttributeError: 'NoneType' object has no attribute 'send'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\PC\PycharmProjects\scrape\venv\Lib\site-packages\telegram\request\_baserequest.py", line 277, in _request_wrapper
    code, payload = await self.do_request(
  File "C:\Users\PC\PycharmProjects\scrape\venv\Lib\site-packages\telegram\request\_httpxrequest.py", line 216, in do_request
    res = await self._client.request(
  File "C:\Users\PC\PycharmProjects\scrape\venv\Lib\site-packages\httpx\_client.py", line 1530, in request
    return await self.send(request, auth=auth, follow_redirects=follow_redirects)
  File "C:\Users\PC\PycharmProjects\scrape\venv\Lib\site-packages\httpx\_client.py", line 1617, in send
    response = await self._send_handling_auth(
  File "C:\Users\PC\PycharmProjects\scrape\venv\Lib\site-packages\httpx\_client.py", line 1645, in _send_handling_auth
    response = await self._send_handling_redirects(
  File "C:\Users\PC\PycharmProjects\scrape\venv\Lib\site-packages\httpx\_client.py", line 1682, in _send_handling_redirects
    response = await self._send_single_request(request)
  File "C:\Users\PC\PycharmProjects\scrape\venv\Lib\site-packages\httpx\_client.py", line 1719, in _send_single_request
    response = await transport.handle_async_request(request)
  File "C:\Users\PC\PycharmProjects\scrape\venv\Lib\site-packages\httpx\_transports\default.py", line 353, in handle_async_request
    resp = await self._pool.handle_async_request(req)
  File "C:\Users\PC\PycharmProjects\scrape\venv\Lib\site-packages\httpcore\_async\connection_pool.py", line 261, in handle_async_request
    raise exc
  File "C:\Users\PC\PycharmProjects\scrape\venv\Lib\site-packages\httpcore\_async\connection_pool.py", line 245, in handle_async_request
    response = await connection.handle_async_request(request)
  File "C:\Users\PC\PycharmProjects\scrape\venv\Lib\site-packages\httpcore\_async\connection.py", line 96, in handle_async_request
    return await self._connection.handle_async_request(request)
  File "C:\Users\PC\PycharmProjects\scrape\venv\Lib\site-packages\httpcore\_async\http11.py", line 119, in handle_async_request
    await self._response_closed()
  File "C:\Users\PC\PycharmProjects\scrape\venv\Lib\site-packages\httpcore\_async\http11.py", line 232, in _response_closed
    await self.aclose()
  File "C:\Users\PC\PycharmProjects\scrape\venv\Lib\site-packages\httpcore\_async\http11.py", line 240, in aclose
    await self._network_stream.aclose()
  File "C:\Users\PC\PycharmProjects\scrape\venv\Lib\site-packages\httpcore\backends\asyncio.py", line 54, in aclose
    await self._stream.aclose()
  File "C:\Users\PC\PycharmProjects\scrape\venv\Lib\site-packages\anyio\streams\tls.py", line 193, in aclose
    await self.transport_stream.aclose()
  File "C:\Users\PC\PycharmProjects\scrape\venv\Lib\site-packages\anyio\_backends\_asyncio.py", line 1261, in aclose
    self._transport.close()
  File "C:\Users\PC\AppData\Local\Programs\Python\Python311\Lib\asyncio\proactor_events.py", line 109, in close
    self._loop.call_soon(self._call_connection_lost, None)
  File "C:\Users\PC\AppData\Local\Programs\Python\Python311\Lib\asyncio\base_events.py", line 761, in call_soon
    self._check_closed()
  File "C:\Users\PC\AppData\Local\Programs\Python\Python311\Lib\asyncio\base_events.py", line 519, in _check_closed
    raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\PC\PycharmProjects\scrape\scrapingtradingview.py", line 129, in <module>
    schedule.run_pending()
  File "C:\Users\PC\PycharmProjects\scrape\venv\Lib\site-packages\schedule\__init__.py", line 822, in run_pending
    default_scheduler.run_pending()
  File "C:\Users\PC\PycharmProjects\scrape\venv\Lib\site-packages\schedule\__init__.py", line 100, in run_pending
    self._run_job(job)
  File "C:\Users\PC\PycharmProjects\scrape\venv\Lib\site-packages\schedule\__init__.py", line 172, in _run_job
    ret = job.run()
  File "C:\Users\PC\PycharmProjects\scrape\venv\Lib\site-packages\schedule\__init__.py", line 693, in run
    ret = self.job_func()
  File "C:\Users\PC\PycharmProjects\scrape\scrapingtradingview.py", line 118, in scrape_website
    asyncio.run(send_message_async(chat_id=telegram_channel_id, text=message))
  File "C:\Users\PC\AppData\Local\Programs\Python\Python311\Lib\asyncio\runners.py", line 190, in run
    return runner.run(main)
  File "C:\Users\PC\AppData\Local\Programs\Python\Python311\Lib\asyncio\runners.py", line 118, in run
    return self._loop.run_until_complete(task)
  File "C:\Users\PC\AppData\Local\Programs\Python\Python311\Lib\asyncio\base_events.py", line 653, in run_until_complete
    return future.result()
  File "C:\Users\PC\PycharmProjects\scrape\scrapingtradingview.py", line 28, in send_message_async
    await bot.send_message(chat_id=chat_id, text=text)
  File "C:\Users\PC\PycharmProjects\scrape\venv\Lib\site-packages\telegram\_bot.py", line 381, in decorator
    result = await func(self, *args, **kwargs)  # skipcq: PYL-E1102
  File "C:\Users\PC\PycharmProjects\scrape\venv\Lib\site-packages\telegram\_bot.py", line 807, in send_message
    return await self._send_message(
  File "C:\Users\PC\PycharmProjects\scrape\venv\Lib\site-packages\telegram\_bot.py", line 559, in _send_message
    result = await self._post(
  File "C:\Users\PC\PycharmProjects\scrape\venv\Lib\site-packages\telegram\_bot.py", line 469, in _post
    return await self._do_post(
  File "C:\Users\PC\PycharmProjects\scrape\venv\Lib\site-packages\telegram\_bot.py", line 497, in _do_post
    return await request.post(
  File "C:\Users\PC\PycharmProjects\scrape\venv\Lib\site-packages\telegram\request\_baserequest.py", line 168, in post
    result = await self._request_wrapper(
  File "C:\Users\PC\PycharmProjects\scrape\venv\Lib\site-packages\telegram\request\_baserequest.py", line 293, in _request_wrapper
    raise NetworkError(f"Unknown error in HTTP implementation: {repr(exc)}") from exc
telegram.error.NetworkError: Unknown error in HTTP implementation: RuntimeError('Event loop is closed')
Process finished with exit code 1
This is what I got already, very happy with the progress:
[Image: XhmoLvA.png]
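The traceback above boils down to RuntimeError: Event loop is closed. Every call to asyncio.run() creates a fresh event loop and closes it when the coroutine finishes, while the Bot object's underlying HTTP client keeps connection state bound to the first loop. On the second scheduled send, that client tries to use a loop that no longer exists. A minimal sketch of the usual fix is to create one long-lived loop and reuse it for every send; the coroutine below is a hypothetical stand-in for bot.send_message, not the real Telegram call:

```python
import asyncio

# Hypothetical stand-in for bot.send_message: any coroutine works here.
async def send_message_async(text):
    await asyncio.sleep(0)
    return f"sent: {text}"

# One long-lived loop for the whole process, instead of asyncio.run(),
# which creates and then *closes* a fresh loop on every call.
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)

def scheduled_job(text):
    # Reuse the same loop each time the schedule fires.
    return loop.run_until_complete(send_message_async(text))

print(scheduled_job("first"))
print(scheduled_job("second"))  # still works: the loop was never closed
```

With asyncio.run() in place of the shared loop, a client that caches loop-bound resources (as python-telegram-bot's httpx client does) fails exactly the way the traceback shows.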
#2
Shouldn't the URL rather be this?
https://www.tradingview.com/ideas/followed-authors/ # No
https://www.tradingview.com/ideas/ # Ok
You, or let's say the AI 🤯, did a good job up until the bot starts sending stuff.
I just commented things out so it works: it downloads images and runs fine up to line 95 (the Telegram part).
import os
import re
import requests
#import schedule
import time
from bs4 import BeautifulSoup
#from dotenv import load_dotenv
from urllib.parse import urljoin, urlsplit
#from telegram import Bot, InputFile
import asyncio

#load_dotenv()

# Keep track of scraped data and downloaded images
scraped_data = set()
downloaded_images = set()

'''
# Read desired symbols from a text file
with open("desired_symbols.txt", "r") as file:
    desired_symbols = [symbol.strip() for symbol in file.readlines() if symbol.strip()]

# Set up Telegram bot
telegram_token = "[hidden]"
telegram_channel_id = "[hidden]"
bot = Bot(token=telegram_token)

async def send_message_async(chat_id, text):
    await bot.send_message(chat_id=chat_id, text=text)'''

desired_symbols = ['XAUUSD', 'BTCUSDT']

def scrape_website():
    global scraped_data

    cookies = {
    }

    headers = {
        'authority': 'www.tradingview.com',
    }

    response = requests.get('https://www.tradingview.com/ideas/', cookies=cookies, headers=headers)
    soup = BeautifulSoup(response.content, "html.parser")

    # Find all the idea titles and store them
    idea_titles = soup.find_all("a", class_="tv-widget-idea__title")
    title_list = [title.text.strip() for title in idea_titles]

    # Find all the idea symbols and store them
    idea_symbols = soup.find_all("a", class_="tv-widget-idea__symbol")
    symbol_list = [symbol.text.strip() for symbol in idea_symbols]

    # Find all the thumbnail image URLs and store them
    thumbnail_images = soup.find_all("img", class_="tv-widget-idea__cover")
    image_url_list = [image["data-src"] for image in thumbnail_images]

    # Create a directory to save the images
    directory = "thumbnails"
    if not os.path.exists(directory):
        os.makedirs(directory)

    # Download the thumbnail images and send them to Telegram
    for i, (image_url, title, symbol) in enumerate(zip(image_url_list, title_list, symbol_list)):
        if image_url in downloaded_images:
            print(f"Skipping image {i}. Already downloaded.")
            print()
            continue

        response = requests.get(image_url)
        if response.status_code == 200:
            # Get the file extension from the image URL
            file_extension = os.path.splitext(urlsplit(image_url).path)[1]
            # Generate a unique filename for each image
            filename = f"thumbnail_{i}{file_extension}"
            file_path = os.path.join(directory, filename)
            with open(file_path, "wb") as file:
                file.write(response.content)
                #print(f"Thumbnail image {i} downloaded successfully.")
                # Add the image URL to the set of downloaded images
                downloaded_images.add(image_url)

        else:
            print(f"Failed to download thumbnail image {i}.")

        # Check if the symbol matches any desired symbol pattern
        if any(re.match(pattern, symbol) for pattern in desired_symbols):
            # Generate a unique identifier for the data
            data_id = f"{title}_{symbol}_{i}"
            #print(data_id)
            # Check if the data has already been sent
            if data_id not in scraped_data:
                # Send the data to Telegram
                message = f"Title: {title}\nSymbol: {symbol}\nImage: {image_url}"
                print(message) # Works
                '''
                asyncio.run(send_message_async(chat_id=telegram_channel_id, text=message))
                print(f"Data sent to Telegram: {message}")
                # Add the data ID to the set of scraped data
                scraped_data.add(data_id)'''

scrape_website()
Output:
Title: XAUUSD
Symbol: XAUUSD
Image: https://s3.tradingview.com/h/Hum0gcR9_mid.png
Title: Bitcoin - Crash is ready! Bear Flag is confirmed (must see)
Symbol: BTCUSDT
Image: https://s3.tradingview.com/l/ltjOHdi6_mid.png
Title: BTC: Both Moves Are Possible
Symbol: BTCUSDT
Image: https://s3.tradingview.com/8/8OSvPnZ0_mid.png
Title: GOLD → Gold continues to trade within the range
Symbol: XAUUSD
Image: https://s3.tradingview.com/l/LVZp77FL_mid.png
Title: Lingrid | GOLD correction to SUPPORT
Symbol: XAUUSD
Image: https://s3.tradingview.com/s/sMz8FLb3_mid.png
Title: GOLD Local Short From Resistance! Sell!
Symbol: XAUUSD
Image: https://s3.tradingview.com/2/2lKjU35s_mid.png
Title: ✏️ Gold will Reach $2000 Again ? READ THE CAPTION
Symbol: XAUUSD
Image: https://s3.tradingview.com/w/WTebCpOM_mid.png
Title: Bitcoin was created by the banks! (Proof)
Symbol: BTCUSDT
Image: https://s3.tradingview.com/l/l5xUSS4d_mid.png
Title: Bitcoin will go UP by Falling Wedge&Symmetrical Triangle(15-Min)
Symbol: BTCUSDT
Image: https://s3.tradingview.com/b/b89Z05M1_mid.png
Title: Gold (XAUUSD): Before The Market Closed... 🟡
Symbol: XAUUSD
Image: https://s3.tradingview.com/h/HRrkVYNP_mid.png
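One caveat with the symbol filter used in both versions: re.match treats each entry of desired_symbols as a regular-expression pattern and only anchors it at the start of the string, so a plain symbol also matches any longer symbol that begins with it. A quick stdlib-only illustration (the '.P' suffix is just a made-up example of a longer ticker):

```python
import re

# re.match anchors at the start only, so a plain symbol used as a
# pattern also matches longer tickers that begin with it.
assert re.match('BTCUSDT', 'BTCUSDT')
assert re.match('BTCUSDT', 'BTCUSDT.P')      # prefix match -- may be unwanted
assert not re.match('BTCUSDT', 'XBTCUSDT')   # no match: not at the start

# re.fullmatch requires the whole string to match the pattern.
assert re.fullmatch('BTCUSDT', 'BTCUSDT')
assert not re.fullmatch('BTCUSDT', 'BTCUSDT.P')
```

If the entries in desired_symbols.txt are meant as exact tickers rather than patterns, re.fullmatch (or a plain `symbol in desired_symbols` set lookup) is the safer check.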
So my advice is to take one message and try to get the bot to send it.
Testing stuff inside a loop is hard.
BarryBoos Wrote: Hi all, I am one day old in Python development.
That also makes it difficult to troubleshoot, since the basics are not there yet 👀
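One more thing worth checking once sending works: the dedup key in the original script is built as f"{title}_{symbol}_{i}", where i is the idea's position on the page. Positions shift between scrapes, so the same idea can get a new ID (and be sent again) just by moving down the list. A sketch that keys on content only, using the image URL as a stable identifier (my assumption, not something from the thread):

```python
scraped_data = set()

def make_data_id(title, symbol, image_url):
    # Key on the content, not the page position: the same idea keeps
    # the same ID even when it moves up or down the ideas list.
    return f"{title}_{symbol}_{image_url}"

# First scrape: idea seen at position 0.
first = make_data_id("Bitcoin crash", "BTCUSDT",
                     "https://s3.tradingview.com/l/ltjOHdi6_mid.png")
scraped_data.add(first)

# Next scrape: the same idea has moved to a different position,
# but its ID is unchanged, so it is still recognized as already sent.
again = make_data_id("Bitcoin crash", "BTCUSDT",
                     "https://s3.tradingview.com/l/ltjOHdi6_mid.png")
assert again in scraped_data
```

Note that scraped_data lives in memory only; restarting the script forgets everything and re-sends old ideas, so persisting the set to a file would be the next step.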


