from List to BeautifulSoup , Homework
Python Forum (https://python-forum.io), Forum: Web Scraping & Web Development (https://python-forum.io/forum-13.html), Thread: /thread-11290.html

from List to BeautifulSoup , Homework - RPC - Jul-02-2018

Hello everyone, I hope not to be annoying.

I am taking basic Python lectures and I have a homework assignment where I am asked to return a JSON dump of some information that I have to look up in 4 pages (HTML files). I need to write a function, but they give me code that I am not allowed to modify:

    with open('GivenPage.htm') as f:
        file1 = f.readlines()

With this code I have to write a function (in my Python notebook) that looks for the information and returns a JSON dump:

    def MyFunction(file1):
        import json
        import requests
        from bs4 import BeautifulSoup
        print(type(file1))
        print(file1)

First I printed to see what information I had:

    # I got a list type
    # I got the HTML as a list, for example...
    ####
    <class 'list'>
    ['<!DOCTYPE html><html><head><script>(function(){(function(){function e(a){this.t={};this.tick=function(a,c,b){var d=void 0!=b?b:(new Date).getTime();this.t[a]=[d,c];if(void 0==b)try{window.console.timeStamp("CSI/"+a)}catch(h){}};this.tick("start",null,a)}var a;if(window.performance)var d=(a=window.performance.timi
    #####

So I decided to convert my list into a BeautifulSoup object in order to be able to use .find_all and .find to look for the nested tags. As I did not find anything online about how to convert it, I decided to join the list into a string and then try to convert the string into BeautifulSoup:

    NewString = "".join(file1)

But right now I feel a bit stuck. I am searching online for a way, but meanwhile I would like to know if you know one. Thanks for your time. I really do not want to be annoying.

RE: from List to BeautifulSoup , Homework - buran - Jul-02-2018

I think you are on the right track in converting the list into one string. Now you need to pass it to BeautifulSoup and get the "soup", then use the tools available in bs4 to extract the info. We can't see what the info is...

RE: from List to BeautifulSoup , Homework - RPC - Jul-02-2018

I did not include the information in the forum because there are almost 1500 lines of code
in the HTML. But my question is how to convert either a list or a string into a BeautifulSoup object, in order to use find_all and be able to parse it. I really appreciate your help. I am looking for it on the web; if I find something I will post it. But if anyone here has previous experience with it, it would be very useful. Thanks in advance.

RE: from List to BeautifulSoup , Homework - buran - Jul-02-2018

But you have it in your post - the line with "".join():

    from bs4 import BeautifulSoup

    with open('GivenPage.htm') as f:
        file1 = f.readlines()

    html = ''.join(file1)
    soup = BeautifulSoup(html, 'html.parser')
    print(soup.prettify())

RE: from List to BeautifulSoup , Homework - RPC - Jul-02-2018

Thank you!! I just had not converted it into BeautifulSoup. I will continue working on it; there is still a long way to go before it is finished. I really appreciate your help.

RE: from List to BeautifulSoup , Homework - RPC - Jul-02-2018

I have, for example, this part of the HTML code: Link to html code

I am trying to find the words in bold (I made them bold to show which words I mean, but they are not in bold in the file). They have to end up in a JSON, like a dictionary. I am using:

    def my_function(file1):
        import json
        import requests
        from bs4 import BeautifulSoup
        dictionary = dict()
        html = "".join(file1)
        soup = BeautifulSoup(html, 'html.parser')
        div_container = soup.find_all("div", class_="id-secperf sfe-section-major")
        for element in div_container:
            try element.parent
            print(element.text)

but I have a question about how to look for nested tags; my attempt does not seem to work.

RE: from List to BeautifulSoup , Homework - snippsat - Jul-03-2018

I fixed your HTML code, as it was posted without code tags and indentation. It can be better to use an external site like CodePen or JSFiddle, which can display HTML code and also fix indentation.

You must find the a (link) tag and the span tag in div_container.

    >>> div_container[0].find('a').text
    'Energy'
    >>> div_container[0].find('a').find_next('span').text
    '-1.48%'

Looping over this gives the result. No code for that; look at the hint above and try to figure it out.
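
For readers following along, here is a minimal sketch of one possible reading of the hint above: join the lines, parse them, pair each link with the span that follows it, and dump the result as JSON. The actual homework page is not shown in the thread, so `sample_lines`, the table layout, the "Utilities" row and the function name `links_to_json` are all made up for illustration; only the div class and the 'Energy' / '-1.48%' pair come from the posts.

```python
import json

from bs4 import BeautifulSoup

# Hypothetical stand-in for the list returned by f.readlines().
# The real page layout is an assumption; only the div class and the
# "Energy" / "-1.48%" values appear in the thread.
sample_lines = [
    '<div class="id-secperf sfe-section-major">\n',
    '  <table>\n',
    '    <tr><td><a href="#">Energy</a></td><td><span>-1.48%</span></td></tr>\n',
    '    <tr><td><a href="#">Utilities</a></td><td><span>+0.25%</span></td></tr>\n',
    '  </table>\n',
    '</div>\n',
]


def links_to_json(lines):
    """Map each link's text to the text of the next span, as a JSON string."""
    # readlines() gives a list of strings, so join them before parsing.
    soup = BeautifulSoup("".join(lines), "html.parser")
    result = {}
    for div in soup.find_all("div", class_="id-secperf sfe-section-major"):
        # Nested search: look for tags inside each matching div.
        for link in div.find_all("a"):
            span = link.find_next("span")  # first span after this link
            if span is not None:
                result[link.text.strip()] = span.text.strip()
    return json.dumps(result)


print(links_to_json(sample_lines))
# {"Energy": "-1.48%", "Utilities": "+0.25%"}
```

Calling `find_all` or `find` on a tag rather than on the whole soup limits the search to that tag's children, which is what makes the nested lookup work. If the real page nests the values differently, the inner loop would need adjusting.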