Python Forum
from List to BeautifulSoup, Homework
#1
Hello everyone, I hope I'm not being annoying.
I am taking a basic Python course, and I have homework where I am asked to return a JSON dump of some information that I have to look up in 4 pages (HTML files).

I need to write a function, but they gave me some code that I am not allowed to modify:

with open('GivenPage.htm') as f:
    file1 = f.readlines()
Using this code, I have to write a function (in my Python notebook) that looks for the information and returns a JSON dump:
import json
import requests
from bs4 import BeautifulSoup

def MyFunction(file1):
    # Inspect what the given code passes in before parsing anything
    print(type(file1))
    print(file1)
    
First, I printed it to see what I had:
It was a list, with the HTML split into lines, for example:

Output:
<class 'list'>
['<!DOCTYPE html><html><head><script>(function(){(function(){function e(a){this.t={};this.tick=function(a,c,b){var d=void 0!=b?b:(new Date).getTime();this.t[a]=[d,c];if(void 0==b)try{window.console.timeStamp("CSI/"+a)}catch(h){}};this.tick("start",null,a)}var a;if(window.performance)var d=(a=window.performance.timi


So I decided to convert my list into a BeautifulSoup object, so that I could use .find_all and .find to look for the nested tags. Since I did not find anything online for converting a list directly, I decided to join it into a string first and then convert the string into BeautifulSoup:
NewString="".join(file1)
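Why "".join(file1) is enough: readlines() keeps the trailing newline on each line, so joining with an empty string rebuilds the original text exactly. A minimal check, using io.StringIO with a tiny made-up HTML string as a stand-in for GivenPage.htm:

```python
import io

# Stand-in for open('GivenPage.htm'); a real text file object behaves the same way
original = "<html>\n<body><p>Hi</p></body>\n</html>\n"
fake_file = io.StringIO(original)

file1 = fake_file.readlines()
print(file1)  # a list of strings, each still ending with '\n'

# Joining with "" (not "\n") avoids doubling the newlines
html = "".join(file1)
print(html == original)  # True
```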
But right now I feel a bit stuck. I am looking online for a way to do it, but in the meantime, I would like to know if you know how.

Thanks for your time.
I really do not want to be annoying.

Wall
#2
I think you are on the right track: convert the list into one string, then pass it to BeautifulSoup to get the "soup". Then use the tools BeautifulSoup provides to extract the information. We can't see what that information is, though.
If you can't explain it to a six year old, you don't understand it yourself, Albert Einstein
How to Ask Questions The Smart Way: link and another link
Create MCV example
Debug small programs

#3
I did not include the information in the forum because the HTML is almost 1500 lines long.

But my question is how to convert either a list or a string into BeautifulSoup, so that I can use find_all and parse it.

I really appreciate your help.

I am looking for it on the web; if I find something, I will post it. But if anyone here has previous experience with this, it would be very useful.

Thanks in advance
#4
But you already have it in your post: the line with "".join().
from bs4 import BeautifulSoup

with open('GivenPage.htm') as f:
    file1 = f.readlines()
    
html = ''.join(file1)
soup = BeautifulSoup(html, 'html.parser')
print(soup.prettify())
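With the soup in hand, find() and find_all() are the main search tools: find() returns the first matching tag (or None if nothing matches), and find_all() returns every match. A short sketch on a small made-up snippet (the class name is taken from later in this thread; the rest of the markup is hypothetical):

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; the real page's structure may differ
html = """
<div class="id-secperf sfe-section-major">
  <a href="#">Energy</a> <span>-1.48%</span>
</div>
<div class="other"><a href="#">Not a sector</a></div>
"""

soup = BeautifulSoup(html, "html.parser")

# find(): the first matching Tag, or None
first_link = soup.find("a")
print(first_link.text)

# find_all(): every matching Tag, in document order
all_links = soup.find_all("a")
print([a.text for a in all_links])

# class_ narrows the search by the tag's class attribute
sections = soup.find_all("div", class_="id-secperf sfe-section-major")
print(len(sections))
```

Passing class_ a string that contains a space, as above, matches only tags whose class attribute is exactly that string.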
 
#5
Thank you!! I just had not converted it into BeautifulSoup. I will keep working on it; there is still a long way to go before it's finished. I really appreciate your help.
#6
For example, I have this part of the HTML code:

Link to html code

I am trying to extract the words in bold. (I made them bold to show which words I mean; they are not bold in the file.) The result has to be JSON, structured like a dictionary:

Output:
{
  "result": {
    "Energy": { "change": -1.48 },
    "Basic Materials": { "change": -0.35 },
    "Industrials": { "change": -0.46 },
    "Cyclical Cons. Goods ...": { "change": 0.07 },
    ... (and so on until the last sector)
  }
}
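Once a sector name and its percentage text (such as "-1.48%") have been pulled out of the HTML, building the requested JSON is plain Python: strip the % sign, convert with float(), nest the dictionaries, and call json.dumps(). A sketch with a few hard-coded pairs standing in for the scraped values:

```python
import json

# Hard-coded stand-ins for (name, text) pairs scraped from the page
rows = [
    ("Energy", "-1.48%"),
    ("Basic Materials", "-0.35%"),
    ("Cyclical Cons. Goods ...", "+0.07%"),
]

result = {}
for name, text in rows:
    # "-1.48%" -> -1.48; float() accepts a leading '+' or '-'
    result[name] = {"change": float(text.strip().rstrip("%"))}

# Wrap in the outer "result" key and serialize; dicts keep insertion order
output = json.dumps({"result": result}, indent=2)
print(output)
```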
I am using:

import json
import requests
from bs4 import BeautifulSoup

def my_function(file1):
    dictionary = dict()

    html = "".join(file1)
    soup = BeautifulSoup(html, 'html.parser')
    # A space-separated class_ string matches that exact class attribute value
    div_container = soup.find_all("div", class_="id-secperf sfe-section-major")

    for element in div_container:
        # element is a Tag; .text is all the text nested inside it
        print(element.text)
But I have a question about how to search for nested tags; my attempt doesn't seem to work.
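To search nested tags, call find() or find_all() on the Tag you already have instead of on the whole soup; the search is then limited to that tag's descendants. A minimal sketch, assuming a simplified, hypothetical table layout inside the div (the real page may nest the a and span tags differently):

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; not the real page's structure
html = """
<div class="id-secperf sfe-section-major">
  <table>
    <tr><td><a href="#">Energy</a></td><td><span>-1.48%</span></td></tr>
    <tr><td><a href="#">Basic Materials</a></td><td><span>-0.35%</span></td></tr>
  </table>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
pairs = []
for container in soup.find_all("div", class_="id-secperf sfe-section-major"):
    # find_all() on a Tag only searches inside that Tag
    for row in container.find_all("tr"):
        name = row.find("a").text
        change = row.find("span").text
        pairs.append((name, change))

print(pairs)
```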
#7
I fixed your HTML code, as it was posted without code tags and indentation. It can be better to use an external site like CodePen or JSFiddle, which can display HTML code and also fix the indentation.

You must find the a (link) and span tags in div_container:
>>> div_container[0].find('a').text
'Energy'
>>> div_container[0].find('a').find_next('span').text
'-1.48%'
Looping over this gives the result below. I won't post the code: look at the hint above and try to figure it out.
Output:
Energy -1.48%
-----------
Basic Materials -0.35%
-----------
Industrials -0.46%
-----------
Cyclical Cons. Goods ... +0.07%
-----------
Non-Cyclical Cons. Goods... +0.08%
-----------
Financials -0.52%
-----------
Healthcare +1.24%
-----------
Technology +0.54%
-----------
Telecommunications Servi... -0.56%
-----------
Utilities -0.61%
-----------
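The hint above chains find('a') with find_next('span'). The two methods search in different places: find() looks only inside the tag it is called on, while find_next() looks at everything that comes after that tag in the document. A small sketch with made-up markup (not the real page's structure) shows the difference:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the <span> is not inside the <a>, it comes after it
html = "<p><a>Energy</a></p><p><span>-1.48%</span></p>"
soup = BeautifulSoup(html, "html.parser")
link = soup.find("a")

# find() searches only the descendants of <a>, so nothing is found
print(link.find("span"))

# find_next() searches everything after <a> in document order
print(link.find_next("span").text)
```

Because find_next() is not limited to the current tag, on a row with no span of its own it would silently pick up the next row's span, so it is worth checking the pairs it produces.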