Beautifulsoup and JavaScript Values - Python Forum (https://python-forum.io), Web Scraping & Web Development
Beautifulsoup and JavaScript Values - Reldaing - Feb-22-2020

Hi, I'm trying to get a value from the soup that I created. Here's what I found:

var next_chapter = "https://www.lelscan-vf.com/manga/vinland-saga/165"

The problem is that I would like to get the value of "var next_chapter". Here's what I tried (even if it's nonsense):

```python
a = soup.find('var next_chapter')
NextUrl = a['var next_chapter']
```

Thanks for helping.

RE: Beautifulsoup and JavaScript Values - Larz60+ - Feb-23-2020

Did you fetch the page and then create the soup from the HTML? Please show your complete code.

RE: Beautifulsoup and JavaScript Values - Reldaing - Feb-23-2020

Here it is:

```python
import requests
import urllib.request
import time
from bs4 import BeautifulSoup
import os

if __name__ == '__main__':
    def dossier():
        os.chdir("C://Users//Ridha//Desktop//Web-Scrapper")
    dossier()

moit = "https://www.lelscan-vf.com/manga/"
url = "https://www.lelscan-vf.com/manga/vinland-saga/164"

def Navigate(url):
    # Fetch the chapter page and parse it
    response = requests.get(url)
    soup = BeautifulSoup(response.text, "html.parser")
    L = Img(soup)
    return soup

def Img(soup):
    # Collect the data-src URL of every <img> whose class is exactly img-responsive
    L = []
    images = soup.findAll('img')
    for lien in images:
        if 'class' in lien.attrs:
            if lien['class'] == ['img-responsive']:
                a = lien['data-src']
                L.append(a)
    return L

def Next(soup):
    # find() searches for a *tag* named 'var next_chapter', so a is None here
    a = soup.find('var next_chapter')
    NextUrl = a['var next_chapter']
    if NextUrl == "":
        NextUrl = "End"
    else:
        NextUrl = a['var next_chapter']
    return NextUrl
```

If you want to try the code, just type Navigate("https://www.lelscan-vf.com/manga/vinland-saga/164") in the console. It'll give you the soup, and the variable is in the soup. I would like to get it and return it. Thanks for helping, mate.

RE: Beautifulsoup and JavaScript Values - Larz60+ - Feb-23-2020

I attempted to try your code. The URL https://www.lelscan-vf.com/manga/vinland-saga/164 doesn't appear to be valid, or at least it wasn't at 11:47 EST. I'll try again in the AM. Also, I'll reformat your code; it needs it.
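BeautifulSoup parses HTML but does not run JavaScript, so `var next_chapter` is not a tag or an attribute that `soup.find()` can look up; it is plain text inside a `<script>` element. `soup.find('var next_chapter')` searches for a tag with that name, finds none and returns `None`, which is why `a['var next_chapter']` fails. A minimal sketch of one alternative is to search the page source with a regular expression. The helper name `next_chapter_url` is hypothetical, and the pattern assumes the page writes the variable as `var next_chapter = "...";`.

```python
import re

# Matches:  var next_chapter = "...";  (double or single quotes, any spacing).
# Assumption: the page declares the variable in exactly this form.
NEXT_CHAPTER_RE = re.compile(r"""var\s+next_chapter\s*=\s*(["'])(.*?)\1""")


def next_chapter_url(page_source):
    """Return the value of `var next_chapter` in page_source, or "End".

    "End" is returned when the variable is missing or is an empty string,
    mirroring the convention used in Reldaing's Next(soup).
    """
    match = NEXT_CHAPTER_RE.search(page_source)
    if match is None:
        return "End"
    # group(1) is the opening quote, group(2) the text between the quotes
    return match.group(2).strip() or "End"


html = '<script>var next_chapter = "https://www.lelscan-vf.com/manga/vinland-saga/165";</script>'
print(next_chapter_url(html))  # https://www.lelscan-vf.com/manga/vinland-saga/165
```

Inside `Next(soup)` the body could become `return next_chapter_url(str(soup))`; passing `response.text` from `Navigate` instead should work equally well, since the regex only needs the raw page text.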
RE: Beautifulsoup and JavaScript Values - snippsat - Feb-23-2020

The URL works for me, Larz60+.

You are doing unnecessary stuff, Reldaing. In the Img(soup) function it could just be:

```python
image = soup.find('img', class_="img-responsive scan-page")
```

This goes directly to the img tag and takes the URL out of the src attribute. Example:

```python
import requests
from bs4 import BeautifulSoup

url = 'https://www.lelscan-vf.com/manga/vinland-saga/164'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'lxml')
# The page image is the <img> whose class attribute is "img-responsive scan-page"
image = soup.find('img', class_="img-responsive scan-page")
image_url = image.get('src').strip()
print(image_url)
```

For the next chapter, you can just build the URL and increment the chapter number by one:

```python
import requests
from bs4 import BeautifulSoup

def manga(start_img, stop_img):
    # range() stops before stop_img, so this visits chapters start_img .. stop_img - 1
    for page in range(start_img, stop_img):
        url = f'https://www.lelscan-vf.com/manga/vinland-saga/{page}'
        response = requests.get(url)
        soup = BeautifulSoup(response.content, 'lxml')
        image = soup.find('img', class_="img-responsive scan-page")
        image_url = image.get('src').strip()
        print(image_url)

if __name__ == '__main__':
    start_img = 164
    stop_img = 169
    manga(start_img, stop_img)
```

So maybe your task is to download these images; try that. If you just want the URLs, this should do it.
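The loop in snippsat's post hard-codes `vinland-saga` in the f-string. Since the URL can point at any manga on the site, a small helper can derive the next chapter URL from the current one instead. This is a minimal sketch: the name `increment_chapter` is hypothetical, and it assumes the last path segment is a plain integer chapter number.

```python
def increment_chapter(url):
    """Return url with its final path segment (a chapter number) increased by one.

    Assumes the URL ends in an integer chapter number, as in
    https://www.lelscan-vf.com/manga/vinland-saga/164
    """
    # Split "https://.../vinland-saga/164" into ("https://.../vinland-saga", "/", "164")
    base, _, chapter = url.rstrip("/").rpartition("/")
    # int() raises ValueError for a segment such as "164.5" or "extra"
    return f"{base}/{int(chapter) + 1}"


print(increment_chapter("https://www.lelscan-vf.com/manga/vinland-saga/164"))
# https://www.lelscan-vf.com/manga/vinland-saga/165
```

Incrementing alone cannot tell when the last published chapter has been reached, so a stop condition is still needed, such as snippsat's `stop_img`, or an empty `next_chapter` value on the page if the site really leaves it empty on the last chapter, as Reldaing's `Next()` assumes.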
RE: Beautifulsoup and JavaScript Values - Reldaing - Feb-23-2020

Hey, sorry for bothering you, but my teacher wants it to be like this. It will then be referenced in the main file "Download.py". He handled many manga websites, like mangareader etc., and used them in a class. The URL that I pasted is only an example; it could be any of the mangas on the website.

Here's the main code. He made other imports, but I'm only going to give you the one that I tried. If you need the others to try it, just tell me. Thanks a lot. (PS: you only have to instantiate the class and then call its InitSoup() method.)

```python
import requests
import urllib.request
import os
from PIL import Image
import shutil

if __name__ == '__main__':
    def dossier():
        os.chdir("C://Users//Ridha//Desktop//Web-Scrapper")
    dossier()

import lelscan as LS

path = r"C:\Users\Ridha\Desktop\MangaScrapper"

def Download(download_url, name):
    # Save one image as <name>.jpg; "End" means there is nothing to download
    if download_url != "End":
        req = urllib.request.Request(download_url, headers={'User-Agent': 'Mozilla/5.0'})
        web_byte = urllib.request.urlopen(req).read()
        with open(name + '.jpg', 'wb') as file:
            file.write(web_byte)

class Site:
    def __init__(self, url, Titre):
        self.url = url
        self.soup = ""
        self.ListeLiens = []
        self.Titre = Titre
        self.chapter = ""
        self.compteur = 0

    def Initialisation(self):
        # Create (if needed) and enter the folder named after the title
        os.chdir(path)
        if self.Titre not in os.listdir():
            os.mkdir(self.Titre)
        os.chdir(self.Titre)
        with open('Titre.txt', 'w+') as file:
            file.write(self.Titre)

    def Navigate(self):
        # The site module must return two values: the soup and the image links
        if 'lelscan' in self.url:
            [self.soup, self.ListeLiens] = LS.Navigate(self.url)

    def Next(self):
        # Remember the current URL, then ask the site module for the next one
        if self.url != "End":
            with open("LastUrl.txt", "w+") as file:
                file.write(self.url)
            if 'lelscan' in self.url:
                self.url = LS.Next(self.soup)

    def DownloadListe(self):
        for lien in self.ListeLiens:
            self.compteur += 1
            Download(lien, f"{self.compteur:05d}")

    def InitSoup(self):
        # Process chapters until the site module reports "End"
        while self.url != "End":
            self.Initialisation()
            self.Navigate()
            self.DownloadListe()
            self.Next()
```
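`Site.Navigate` unpacks `[self.soup, self.ListeLiens] = LS.Navigate(self.url)`, so `lelscan.Navigate` has to return two values, while the `Navigate` posted earlier returns only the soup and discards the list built by `Img`. A minimal sketch of `Navigate` and `Img` for lelscan.py that honours that contract follows. The function names come from Reldaing's module; the `img-responsive` class filter (taken from snippsat's post) and the `data-src`/`src` fallback are assumptions about the page markup.

```python
import requests
from bs4 import BeautifulSoup


def Img(soup):
    """Return the page-image URLs found on a parsed chapter page."""
    links = []
    # class_="img-responsive" matches any <img> whose class list contains it,
    # e.g. class="img-responsive scan-page".
    for img in soup.find_all("img", class_="img-responsive"):
        # Assumption: lazy-loaded images keep the URL in data-src, others in src.
        src = img.get("data-src") or img.get("src")
        if src:
            links.append(src.strip())
    return links


def Navigate(url):
    """Fetch a chapter page and return [soup, image_urls], as Site.Navigate expects."""
    response = requests.get(url)
    response.raise_for_status()  # stop on 404/500 instead of parsing an error page
    soup = BeautifulSoup(response.text, "html.parser")
    return [soup, Img(soup)]
```

With that in place, something like `Site("https://www.lelscan-vf.com/manga/vinland-saga/164", "Vinland Saga").InitSoup()` would drive the chapter loop (the title string is only an example). The loop ends only when `LS.Next` returns "End", so `lelscan.Next` still has to read `next_chapter` from the soup.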