Python Forum

Full Version: from List to BeautifulSoup , Homework
Hello everyone, I hope I am not being annoying.
I am taking basic Python lectures and I have a homework assignment where I am asked to return a JSON dump of some information that I have to look up in 4 pages (HTML files).

I need to write a function, but they give me some code that I am not allowed to modify:


with open('GivenPage.htm') as f:
    file1 = f.readlines()

With this code I have to write a function (in my Python notebook) that looks for the information and returns a JSON dump:
def MyFunction(file1):
    import json
    import requests
    from bs4 import BeautifulSoup

    print(type(file1))
    print(file1)
    
First I printed it to see what information I had:
# I got a list type
# I got the HTML as a list, for example...
####
<class 'list'>
['<!DOCTYPE html><html><head><script>(function(){(function(){function e(a){this.t={};this.tick=function(a,c,b){var d=void 0!=b?b:(new Date).getTime();this.t[a]=[d,c];if(void 0==b)try{window.console.timeStamp("CSI/"+a)}catch(h){}};this.tick("start",null,a)}var a;if(window.performance)var d=(a=window.performance.timi

#####


So I decided to convert my list into a BeautifulSoup object in order to be able to use .find_all and .find to look for the nested tags. As I did not find anything online about converting a list directly, I decided to join it into a string and then try to convert the string into BeautifulSoup:
NewString="".join(file1)
But right now I feel a bit stuck. I am searching online for a way, but meanwhile I would like to know if you know one.

Thanks for your time.
I really do not want to be annoying.

Wall
I think you are on the right track converting the list into one string; now you need to pass it to BeautifulSoup and get the "soup". Then use the tools available in bs4 to extract the info. We don't see what the info is....
I did not include the information in the post because there are almost 1500 lines of code in the HTML.

But my question is how to convert either from a list to a BeautifulSoup object,
or from a string to a BeautifulSoup object, in order to use find_all and be able to parse it.

I really appreciate your help.

I am looking for it on the web; if I find something I will post it. But if anyone here has previous experience with it, that would be very useful.

Thanks in advance
But you have it in your post - the line with "".join()
from bs4 import BeautifulSoup

with open('GivenPage.htm') as f:
    file1 = f.readlines()
    
html = ''.join(file1)
soup = BeautifulSoup(html, 'html.parser')
print(soup.prettify())
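
A small check of why the join works, as a sketch: readlines() keeps the line endings, so joining the list with an empty string gives back the original text unchanged. The sample markup and the use of io.StringIO as a stand-in for the homework file are assumptions made for illustration only.

```python
import io

# Stand-in for the homework file; this markup is made up for illustration.
sample = "<html>\n<body>\n<a>Energy</a><span>-1.48%</span>\n</body>\n</html>\n"

# readlines() returns a list of str, the same shape as file1
lines = io.StringIO(sample).readlines()

# Each line still ends with "\n", so an empty separator restores the text
html = "".join(lines)

print(type(lines), len(lines))
print(html == sample)
```

The joined string is what gets passed to BeautifulSoup(html, 'html.parser').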
 
Thank you!! I just had not converted it into BeautifulSoup. I will keep working on it; there is still a long way to go. I really appreciate your help.
I have, for example, this part of the HTML code:

Link to html code

I am trying to find the words in bold (I made them bold to show which words I mean, but they are not bold in the file). The result has to be JSON, like a dictionary:

Output:
{
  "result": {
    "Energy": { "change": -1.48 },
    "Basic Materials": { "change": -0.35 },
    "Industrials": { "change": -0.46 },
    "Cyclical Cons. Goods ...": { "change": 0.07 },
    .....  # and so on until the end
  }  # close result
}  # close the outer brace
I am using:

def my_function(file1):
    import json
    import requests
    from bs4 import BeautifulSoup
    
    dictionary= dict()
    
    html="".join(file1)
    soup = BeautifulSoup(html, 'html.parser')
    div_container=soup.find_all("div",class_="id-secperf sfe-section-major")
    
    for element in div_container:
        # print the text of each matching div
        print(element.text)

but I have a question about how to look for nested tags; mine does not seem to work.
I fixed your HTML code, as it was posted without code tags and indentation. It can be better to use an external site like CodePen or JSFiddle,
which can display HTML code and also fix indentation.

You must find the a (link) and span tags in div_container.
>>> div_container[0].find('a').text
'Energy'
>>> div_container[0].find('a').find_next('span').text
'-1.48%'
Looping over this gives the result below. No code is given; look at the hint above and try to figure it out.
Output:
Energy -1.48%
-----------
Basic Materials -0.35%
-----------
Industrials -0.46%
-----------
Cyclical Cons. Goods ... +0.07%
-----------
Non-Cyclical Cons. Goods... +0.08%
-----------
Financials -0.52%
-----------
Healthcare +1.24%
-----------
Technology +0.54%
-----------
Telecommunications Servi... -0.56%
-----------
Utilities -0.61%
-----------
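
Once the sector names and percentage strings have been pulled out of the soup, turning them into the requested JSON might look like the sketch below. The helper name build_result and the hard-coded pairs are assumptions for illustration; in the homework the pairs would come from the a and span lookups shown in the hint.

```python
import json


def build_result(pairs):
    """Turn (sector name, percentage text) pairs into the nested result dict."""
    result = {}
    for name, pct_text in pairs:
        # "-1.48%" -> -1.48; float() accepts a leading "+" or "-"
        change = float(pct_text.strip().rstrip("%"))
        result[name] = {"change": change}
    return {"result": result}


# Sample pairs copied from the output above
pairs = [
    ("Energy", "-1.48%"),
    ("Basic Materials", "-0.35%"),
    ("Cyclical Cons. Goods ...", "+0.07%"),
]

print(json.dumps(build_result(pairs), indent=2))
```

json.dumps returns a string, so the function for the homework could return json.dumps(...) directly if a JSON string is what is expected.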