Jul-27-2018, 10:21 AM
Hi guys, since I didn't get any clear response in the Scrapy thread, I've switched over and am trying my luck with HTMLParser.
Here's the problem. Whenever I check for 'a', it seems to read the opening <a and also the closing </a>. Of course, I only asked for 'a' and didn't say to read only the opening a-tag. I tried adding "<" to my 'a', but then nothing matched and it printed nothing at all. At first it was a mystery to me why I was getting 4 prints for only the two hyperlinks I had created, until I finally figured that out.
For those who are very familiar with HTMLParser, I hope you can help me out. I tried finding a clear solution on the internet, with no luck.
Here's the code:
from html.parser import HTMLParser

class myHtmlParser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for (attribute, value) in attrs:
                if value == 'nofollow':
                    print(value)
                else:
                    print('dofollow')

finder = myHtmlParser()
finder.feed('<html><head></head><title>Test</title><body><h1>Parse me!</h1><a rel="nofollow" href="http://sampledomain.com">sample anchor text</a><a rel="author" href="/video-page.html"></a></body></html>')
Output:
nofollow
dofollow
dofollow
dofollow
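
For anyone comparing notes, here is a rough sketch of one way to get a single line per link. As far as I can tell from the html.parser docs, handle_starttag is only called for opening tags and attrs is a list of (name, value) pairs, so a loop over attrs prints once per attribute, which would explain 4 lines for two links with two attributes each. The class name RelFinder and the results list are made up for illustration, and it assumes anything without rel="nofollow" should count as 'dofollow'.

```python
from html.parser import HTMLParser


class RelFinder(HTMLParser):
    """Hypothetical helper: stores one label per <a> start tag."""

    def __init__(self):
        super().__init__()
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        # attrs is a list of (name, value) pairs; look up rel once per tag.
        # The value can be None for a bare attribute, hence the "or ''".
        rel = dict(attrs).get('rel') or ''
        # rel may hold several space-separated tokens, e.g. "nofollow noopener".
        is_nofollow = 'nofollow' in rel.lower().split()
        self.results.append('nofollow' if is_nofollow else 'dofollow')


finder = RelFinder()
finder.feed('<html><head></head><title>Test</title><body><h1>Parse me!</h1>'
            '<a rel="nofollow" href="http://sampledomain.com">sample anchor text</a>'
            '<a rel="author" href="/video-page.html"></a></body></html>')

for label in finder.results:
    print(label)
```

With the same sample HTML this should print one line per hyperlink. Collecting into a list instead of printing inside the handler is just a choice that makes the result easier to check.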