Python Forum

module...:

#! python 3
#  I wonder who the five most popular mathematicians are?

from requests import get
from requests.exceptions import RequestException
from contextlib import closing
from bs4 import BeautifulSoup


def simple_get(url):
    """
    Attempts to get the content at `url` by making an HTTP GET request.
    If the content-type of response is some kind of HTML/XML, return the
    text content, otherwise return None.
    """
    try:
        with closing(get(url, stream=True)) as resp:
            if is_good_response(resp):
                return resp.content
            else:
                return None

    except RequestException as e:
        log_error('Error during requests to {0} : {1}'.format(url, str(e)))
        return None


def is_good_response(resp):
    """
    Returns True if the response seems to be HTML, False otherwise.
    """
    # .get() avoids a KeyError when the Content-Type header is missing
    content_type = resp.headers.get('Content-Type', '').lower()
    return (resp.status_code == 200
            and 'html' in content_type)


def log_error(e):
    """
    It is always a good idea to log errors. 
    This function just prints them, but you can
    make it do anything.
    """
    print(e)

def get_names():
    """
    Downloads the page where the list of mathematicians is found
    and returns a list of strings, one per mathematician
    """
    url = 'http://www.fabpedigree.com/james/mathmen.htm'
    response = simple_get(url)

    if response is not None:
        html = BeautifulSoup(response, 'html.parser')
        names = set()       # set ensures that you don’t end up with duplicate names.
        for li in html.select('li'):
            for name in li.text.split('\n'):
                if len(name) > 0:
                    names.add(name.strip())
        return list(names)
    # Raise an exception if we failed to get any data from the url
    raise Exception(f'Error retrieving contents at {url}')

...used to run program...:

from bs4 import BeautifulSoup
from mathematicians import get_names

raw_html = get_names('http://www.fabpedigree.com/james/mathmen.htm')
html = BeautifulSoup(raw_html, 'html.parser')
for i, li in enumerate(html.select('li')):
	print(i, li.text)

...but error appears:

Error:
Traceback (most recent call last):
  File "C:\Python37\kodovi\mathlist1.py", line 4, in <module>
    raw_html = get_names('http://www.fabpedigree.com/james/mathmen.htm')
TypeError: get_names() takes 0 positional arguments but 1 was given

I tried adding an argument to the definition of get_names() in the module, but then a different error appears.
I don't know how to fix the code so that the program runs. The desired outcome is a list with one string per mathematician.

get_names() must take an argument.

def get_names(url):
    """
    Downloads the page where the list of mathematicians is found
    and returns a list of strings, one per mathematician
    """

    response = simple_get(url)
    if response is not None:
        html = BeautifulSoup(response, 'html.parser')
        names = set()       # set ensures that you don’t end up with duplicate names.
        for li in html.select('li'):
            for name in li.text.split('\n'):
                if len(name) > 0:
                    names.add(name.strip())
        return list(names)
    # Raise an exception if we failed to get any data from the url
    raise Exception(f'Error retrieving contents at {format(url)}')

The parsing is already done inside get_names(), which returns a list of names rather than HTML, so calling html.select('li') on its result will not work.

from bs4 import BeautifulSoup
from mathematicians import get_names

math_list = get_names('http://www.fabpedigree.com/james/mathmen.htm')
print(math_list[:5])

Output:
['Ernst E. Kummer', 'Siméon-Denis Poisson', 'Hipparchus  of Nicaea', 'Hermann Minkowski', 'F. Gotthold Eisenstein']
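
Since get_names() already returns a plain list of strings, printing one name per line only needs a loop over that list, with no second BeautifulSoup pass. A minimal sketch, assuming a hard-coded sample list in place of the real download (math_list below is a stand-in for the return value of get_names(url), not actual scraped data):

```python
# Stand-in for the list returned by get_names(url); hard-coded so the
# sketch runs without network access.
math_list = ['Isaac Newton', 'Archimedes', 'Carl F. Gauss']

# A list has no .select() method, so iterate over it directly.
lines = [f'{i} {name}' for i, name in enumerate(math_list)]
print('\n'.join(lines))
```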

Using set() is a problem for this task: a set is unordered, so the first five names it gives back are not the first five on the page.

Quote:# I wonder who the five most popular mathematicians are?

You can certainly wonder, because running it again gives a different answer:

Output:
['William R. Hamilton', 'Adrien M. Legendre', 'M. E. Camille Jordan', 'Hipparchus  of Nicaea', "Jean le Rond d'Alembert"]
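
The order changes between runs because a set has no defined order and string hashing is randomized per interpreter run by default. One possible way to drop duplicates while keeping the order in which names were first seen is dict.fromkeys, sketched here on a made-up sample list (unique_in_order and scraped are hypothetical names, not part of the original module):

```python
def unique_in_order(names):
    """Drop duplicates but keep first-seen order.

    Relies on dicts preserving insertion order (guaranteed in Python 3.7+).
    """
    return list(dict.fromkeys(names))


# Hypothetical scraped names, with duplicates, in page order.
scraped = ['Isaac Newton', 'Archimedes', 'Isaac Newton',
           'Carl F. Gauss', 'Archimedes']

first_two = unique_in_order(scraped)[:2]
print(first_two)
```

With this approach the slice is the same on every run, unlike slicing list(set(...)).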

Quote (snippsat): Sure can wonder if run again

Just to mention that I still haven't finished the whole program. The main work is still to be done!

Will use your suggestions. Thank you!

Quote (snippsat): The parsing is already done so (html.select('li')): will not work.

I don't understand this. I thought that parsing should make selectors available.

Also, I'm not quite satisfied with your solution (sorry for that, I'm not trying to be impolite!). The goal of this exercise is to list all the mathematicians, one per row. Limiting the output to 5 is not what should be done here.

I made some modifications...

def get_names(url):
    """
    Downloads the page where the list of mathematicians is found
    and returns a list of strings, one per mathematician
    """
    
    response = simple_get(url)

    if response is not None:
        html = BeautifulSoup(response, 'html.parser')
        names = set()       # set ensures that you don’t end up with duplicate names.
        for li in html.select('li'):
            for name in li.text.split('\n'):
                if len(name) > 0:
                    names.add(name.strip())
        return str(names)

    # Raise an exception if we failed to get any data from the url
    raise Exception(f'Error retrieving contents at {url}')

from bs4 import BeautifulSoup
from mathematicians import get_names
 
math_list = get_names('http://www.fabpedigree.com/james/mathmen.htm')
print((math_list.split('\n')))

Output:
['{\'Felix Christian Klein\', \'Felix Hausdorff\', \'Johannes Kepler\', \'Eudoxus  of Cnidus\', \'Leonhard Euler\', \'James J. Sylvester\', \'Gaspard Monge\', \'Blaise Pascal\', \'Joseph Liouville\', \'Pythagoras  of Samos\', "Jean le Rond d\'Alembert", \'Hipparchus  of Nicaea\', \'Christiaan Huygens\', \'Omar al-Khayyám\', \'Georg Cantor\', \'Joseph-Louis Lagrange\', \'Pierre-Simon Laplace\', \'Isaac Newton\', \'William R. Hamilton\', \'Bernhard Riemann\', \'Élie Cartan\', \'Girolamo Cardano\', \'Stefan Banach\', \'M. E. Camille Jordan\', "Leonardo `Fibonacci\'", \'Siméon-Denis Poisson\', \'Bonaventura Cavalieri\', \'Archytas  of Tarentum\', \'Gottfried W. Leibniz\', \'Johann Bernoulli\', \'Archimedes\', \'F. L. Gottlob Frege\', \'Giuseppe Peano\', \'Adrien M. Legendre\', \'Alfred Tarski\', \'Pappus  of Alexandria\', \'George Pólya\', \'Pafnuti Chebyshev\', \'Karl W. T. Weierstrass\', \'Pierre de Fermat\', \'Emmy Noether\', \'Albert Einstein\', \'Michael F. Atiyah\', \'Srinivasa Ramanujan\', \'Charles Hermite\', \'Ernst E. Kummer\', \'Galileo Galilei\', \'Diophantus  of Alexandria\', \'Arthur Cayley\', \'L.E.J. Brouwer\', \'Johann H. Lambert\', \'Richard Dedekind\', \'Carl F. Gauss\', \'Peter G. L. Dirichlet\', \'Alan M. Turing\', \'David Hilbert\', \'Aryabhata\', \'Carl G. J. Jacobi\', \'Panini  of Shalatula\', \'Brahmagupta\', \'Atle Selberg\', \'Hermann K. H. Weyl\', \'Jean-Pierre Serre\', \'François Viète\', \'Godfrey H. Hardy\', \'Henri Poincaré\', \'Apollonius  of Perga\', \'Hermann Minkowski\', \'Aristotle\', \'Carl Ludwig Siegel\', \'Évariste Galois\', \'George D. Birkhoff\', \'Jakob Steiner\', \'Marius Sophus Lie\', \'Bháscara (II) Áchárya\', \'Hermann G. Grassmann\', \'F. Gotthold Eisenstein\', \'Jacques Hadamard\', \'Euclid  of Alexandria\', \'Joseph Fourier\', \'John E. Littlewood\', \'Shiing-Shen Chern\', \'Muhammed al-Khowârizmi\', \'Jean-Victor Poncelet\', \'Jacob Bernoulli\', \'Alhazen ibn al-Haytham\', \'Henri Léon Lebesgue\', \'F.E.J. 
Émile Borel\', \'Alexandre Grothendieck\', \'Julius Plücker\', \'André Weil\', \'John von Neumann\', \'Liu Hui\', \'Thales  of Miletus\', \'Andrey N. Kolmogorov\', \'René Descartes\', \'Kurt Gödel\', \'Niels Abel\', \'Augustin Cauchy\', \'James C. Maxwell\'}']

Why doesn't math_list.split('\n') produce the desired outcome (one line per name)?
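
A likely explanation: str(names) produces the one-line repr of the set, which contains no newline characters, so split('\n') returns a single piece. A minimal sketch, assuming a small made-up set in place of the scraped one, of joining the names with newlines instead:

```python
# Small stand-in for the set built inside get_names().
names = {'Leonhard Euler', 'Emmy Noether', 'Alan M. Turing'}

as_text = str(names)          # one-line repr of the set, braces and quotes included
pieces = as_text.split('\n')  # no newline in the repr, so only one piece
print(len(pieces))

# Join the individual names with newlines (sorted only to get a stable order).
one_per_line = '\n'.join(sorted(names))
print(one_per_line)
```

Returning the list from get_names() and joining at print time keeps the function's documented return type (a list of strings) intact.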
