I would say that you are using the wrong tools; you can look at my tutorials here: part-1, part-2.
In part-2 I talk a little about concurrency.
lxml is one of the fastest parsers in any language (its core is the C library libxml2).
It can be used through BeautifulSoup:
BeautifulSoup(url, 'lxml')
or on its own. Use Requests; then you can remove the decode stuff, because it hands you the page back already decoded.
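To make that concrete, here is a minimal sketch of both ways to use lxml, plus a Requests download helper. It assumes `requests`, `beautifulsoup4` and `lxml` are installed; `SAMPLE_HTML`, `fetch`, `links_with_bs4` and `links_with_lxml` are names made up for this example, and the inline markup is only there so it runs without a network connection.

```python
import requests
from bs4 import BeautifulSoup
from lxml import html

# Hypothetical sample markup so the sketch runs without network access.
SAMPLE_HTML = """
<html><head><title>Example page</title></head>
<body>
  <a href="/first">First</a>
  <a href="/second">Second</a>
</body></html>
"""


def fetch(url):
    """Download a page; response.text is already a decoded str,
    so no manual .decode() call is needed."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


def links_with_bs4(markup):
    """lxml used as the parser behind BeautifulSoup."""
    soup = BeautifulSoup(markup, "lxml")
    return [a["href"] for a in soup.find_all("a", href=True)]


def links_with_lxml(markup):
    """lxml used on its own, queried with XPath."""
    tree = html.fromstring(markup)
    return tree.xpath("//a/@href")


bs4_links = links_with_bs4(SAMPLE_HTML)
lxml_links = links_with_lxml(SAMPLE_HTML)
print(bs4_links)
print(lxml_links)
```

With a real page you would pass `fetch(some_url)` instead of `SAMPLE_HTML`; note that BeautifulSoup gets the page content, not the URL string.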
Regex is a really bad tool for HTML; see a funny answer.
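A small sketch of why regex goes wrong on HTML. To keep it dependency-free it uses the standard library `html.parser` rather than lxml or BeautifulSoup; the sample markup and the `LinkCollector` class are made up for illustration, and the regex is just a typical naive pattern, not anyone's actual code.

```python
import re
from html.parser import HTMLParser

# Hypothetical markup: one link is commented out, another uses single
# quotes and puts a class attribute before href.
SAMPLE_HTML = """
<!-- <a href="/old">Old link</a> -->
<a class='nav' href='/home'>Home</a>
<a href="/about">About</a>
"""


class LinkCollector(HTMLParser):
    """Collect href values from <a> start tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


# Naive regex: picks up the commented-out link and misses '/home'.
regex_links = re.findall(r'<a href="([^"]*)"', SAMPLE_HTML)

# Real parser: skips the comment and handles both quoting styles.
collector = LinkCollector()
collector.feed(SAMPLE_HTML)
collector.close()
parser_links = collector.links

print(regex_links)   # ['/old', '/about']
print(parser_links)  # ['/home', '/about']
```

You can keep patching the pattern for each new case, but a parser already deals with comments, attribute order and quoting for you.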
Scrapy is fast; it has built-in concurrency with Twisted.