Python Forum
Strange Python Error
#1
Good evening,

When I run my code, I get an error that I don't understand because it references lines I don't have. My code is only 234 lines long, yet the traceback points at code that is not in my file. How do I figure out what the problem is?

This is where it says the error is, but I am not sure why:
    def set_content_to_tag(self, tag, tag_id=None):
        """Changes _content to the text within a specific element of an HTML document.
            Keyword arguments:
                tag (str) -- Tag to read
                tag_id (str) -- ID of tag to read
            It's possible the HTML does not contain the tag being searched. 
            You should use exception handling to catch any errors."""
             
        soup = BeautifulSoup(self._content, 'html.parser')
        content = soup.find('{}'.format(tag),{'id':'{}'.format(tag_id)})
        if content == None:
            raise Exception ("Tag or attribute does not exist")
        self._content = content.getText()
        print(content)
This is the error it shows, and I am completely lost. Can someone help me understand why it is referencing lines of code I don't have, and how it got there?
Error:
TypeError                                 Traceback (most recent call last)
<ipython-input-3-6ba650ec7019> in <module>()
    218 print(ta._orig_content)
    219 print(ta._orig_content)
--> 220 ta.set_content_to_tag('div','content-main')
    221 ta.set_content_to_tag('div' ,'device-xs visible-xs')
    222 print(ta._content)

<ipython-input-3-6ba650ec7019> in set_content_to_tag(self, tag, tag_id)
     58             You should use exception handling to catch any errors."""
     59
---> 60         soup = BeautifulSoup(self._content, 'html.parser')
     61         content = soup.find('{}'.format(tag),{'id':'{}'.format(tag_id)})
     62         if content == None:

~\Anaconda3\lib\site-packages\bs4\__init__.py in __init__(self, markup, features, builder, parse_only, from_encoding, exclude_encodings, **kwargs)
    277                  self.contains_replacement_characters) in (
    278                      self.builder.prepare_markup(
--> 279                          markup, from_encoding, exclude_encodings=exclude_encodings)):
    280                 self.reset()
    281                 try:

~\Anaconda3\lib\site-packages\bs4\builder\_htmlparser.py in prepare_markup(self, markup, user_specified_encoding, document_declared_encoding, exclude_encodings)
    235         try_encodings = [user_specified_encoding, document_declared_encoding]
    236         dammit = UnicodeDammit(markup, try_encodings, is_html=True,
--> 237                                exclude_encodings=exclude_encodings)
    238         yield (dammit.markup, dammit.original_encoding,
    239                dammit.declared_html_encoding,

~\Anaconda3\lib\site-packages\bs4\dammit.py in __init__(self, markup, override_encodings, smart_quotes_to, is_html, exclude_encodings)
    364
    365         u = None
--> 366         for encoding in self.detector.encodings:
    367             markup = self.detector.markup
    368             u = self._convert_from(encoding)

~\Anaconda3\lib\site-packages\bs4\dammit.py in encodings(self)
    255         if self.declared_encoding is None:
    256             self.declared_encoding = self.find_declared_encoding(
--> 257                 self.markup, self.is_html)
    258         if self._usable(self.declared_encoding, tried):
    259             yield self.declared_encoding

~\Anaconda3\lib\site-packages\bs4\dammit.py in find_declared_encoding(cls, markup, is_html, search_entire_document)
    313
    314         declared_encoding = None
--> 315         declared_encoding_match = xml_encoding_re.search(markup, endpos=xml_endpos)
    316         if not declared_encoding_match and is_html:
    317             declared_encoding_match = html_meta_re.search(markup, endpos=html_endpos)

TypeError: expected string or bytes-like object
#2
Your error originates from this line:
        soup = BeautifulSoup(self._content, 'html.parser')
and the end result is basically
Error:
TypeError: expected string or bytes-like object
So my first guess is that self._content is not a string or bytes-like object, as you expect it to be.
My question, then, is: what is self._content?
Just before this line, add

print(type(self._content))
and report what it says.
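
The check suggested above can be tried without BeautifulSoup at all. The traceback ends in a regular-expression search inside bs4, and Python's `re` module raises this same TypeError whenever it is handed something that is not a str or bytes. This is only a rough sketch: the helper name `describe_markup` and the pattern are made up for illustration, and only the `re` behaviour is standard.

```python
import re

# Hypothetical stand-in for the encoding check that bs4 runs on the markup.
pattern = re.compile(r"<\?xml")

def describe_markup(markup):
    """Report the type of markup and whether a regex search accepts it."""
    kind = type(markup).__name__
    try:
        pattern.search(markup)
    except TypeError as err:
        return "{}: rejected ({})".format(kind, err)
    return "{}: accepted".format(kind)

print(describe_markup("<html></html>"))  # a str can be searched
print(describe_markup([]))               # a list raises the TypeError seen in the thread
```

Note that the print has to come before the BeautifulSoup call. If it is placed after that line, the exception is raised first and the print never runs.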
#3
metulburr - self._content is an attribute of the class I am writing. When I added print(type(self._content)) and ran the code, that line gave me no error.
I have never used BeautifulSoup before. After looking at posts on other websites for help, it seemed that this was the correct way to handle the different kinds of files/text that could be used.

- I am also having an issue with the discover_* methods. My test code is not recognizing the source type as what it should be. It sees it as 'discover' instead of the correct text/url/file type.

Here is the beginning of the code where self._content comes from:

import requests, re
from bs4 import BeautifulSoup
from collections import Counter
import statistics as stats
import string
import operator
import matplotlib.pyplot as plt
plt.rcdefaults()

class TextAnalyzer():
    "A Text Analyzer"
    def __init__(self, src, src_type='discover'):   
        """Creates a object for analyzing text
    
        Keyword arguments:
        src (str) -- text, path to file, or url
        src_type (str) -- The type of input (text, path, url, discover)"""

       
        if isinstance(src, str) == False or len(src) <= 0:
            raise exception('Source must be a valid string, filepath or a valid URL')

        self._src = src
        self._src_type = src_type
        self._content = []
        self._orig_content = []


    def discover_url(self):
        self._src.startswith('http')
        self._src_type = 'url'
        url = 'https://www.webucator.com/how-to/address-by-bill-clinton-1997.cfm'
        r = requests.get(self._src)
        res = r.content
        self._orig_content = r.text
        self._content = res

    def discover_path(self):
        self._src.endswith('.txt')
        src_type = 'path'
        with open('pride-and-prejudice.txt') as f:
            self._content = self._orig_content
     

    def discover_text(self):
        src_type = 'text'
        text = ("The outlook wasn't brilliant for the Mudville Nine that day;the score stood four to two, with but one inning more to play. And then when Cooney died at first, and Barrows did the same, a sickly silence fell upon the patrons of the game.")
        self._orig_content = self._src
        self._content = self._src

       
    def set_content_to_tag(self, tag, tag_id=None):
        """Changes _content to the text within a specific element of an HTML document.
            Keyword arguments:
                tag (str) -- Tag to read
                tag_id (str) -- ID of tag to read
            It's possible the HTML does not contain the tag being searched. 
            You should use exception handling to catch any errors."""
             
        soup = BeautifulSoup(self._content, 'html.parser')
        print(type(self._content))
        content = soup.find('{}'.format(tag),{'id':'{}'.format(tag_id)})
        if content == None:
            raise Exception ("Tag or attribute does not exist")
        self._content = content.getText()
        print(content)
#4
As I said in your other thread, you are never processing the inputs, at least not in __init__. So if you go straight to set_content_to_tag, self._content is still an empty list, which isn't HTML that can be processed.
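
One way to make sure `_content` is a string before `set_content_to_tag` runs is to load the source inside `__init__`. The following is only a sketch under assumptions, not necessarily the design your assignment expects: the `_discover` and `_load` helpers are hypothetical, and the discovery rules (an `http` prefix means URL, a `.txt` suffix means path, anything else is text) are guessed from the startswith/endswith checks in the posted code.

```python
class TextAnalyzer:
    """Sketch: load the source in __init__ so _content is always a str."""

    def __init__(self, src, src_type='discover'):
        if not isinstance(src, str) or len(src) == 0:
            raise ValueError('Source must be a non-empty string')
        self._src = src
        # Resolve 'discover' to a concrete type instead of storing it as-is.
        self._src_type = self._discover() if src_type == 'discover' else src_type
        self._orig_content = self._load()  # hypothetical helper, defined below
        self._content = self._orig_content

    def _discover(self):
        # Assumed rules, mirroring the checks in the posted discover_* methods.
        if self._src.startswith('http'):
            return 'url'
        if self._src.endswith('.txt'):
            return 'path'
        return 'text'

    def _load(self):
        if self._src_type == 'url':
            import requests  # imported here so text/path use does not need it
            return requests.get(self._src).text
        if self._src_type == 'path':
            with open(self._src) as f:
                return f.read()
        if self._src_type == 'text':
            return self._src
        raise ValueError('Unknown src_type: {}'.format(self._src_type))


ta = TextAnalyzer('The outlook was not brilliant for the Mudville Nine that day.')
print(ta._src_type, type(ta._content).__name__)
```

With the content loaded up front, `BeautifulSoup(self._content, 'html.parser')` receives a str rather than an empty list, and `_src_type` no longer stays at 'discover'.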
Craig "Ichabod" O'Brien - xenomind.com
I wish you happiness.