Python Forum

Full Version: Scraping - A tool?
Hi guys.

At my work, I was asked for a solution to this problem. Basically, I work at a newspaper that wishes to scrape some URLs for data, where the data lies in tables that you can download as CSV. It is so some of the journalists can keep track of certain numbers on certain subjects. The thing is, they are asking for some kind of tool which can scrape the data and where they can also see the data - basically, a tool which is quite user friendly.

I hope I explained myself well enough to be understood. Do you know of any solution to scrape the internet without writing a script in Python?

Have a super awesome day!

//Kasper
It depends a lot on the particular website (i.e. it could be as easy as using Data -> From Web in Excel), your goals, etc.
Also you can check https://scrapinghub.com/
I thought about using Excel, actually, because that is a program that everybody knows. But I was wondering if there were any other tools out there.

Thank you for your reply!
Writing a script to scrape some data from a site is not that hard. If you can download the data as CSV files, then you don't need to scrape the website at all. Actually, can't you import CSV files directly into Excel/OpenOffice Calc?

Of course, you can easily create a Python CSV reader and put a GUI on it to make it user friendly.
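To illustrate the CSV-reader idea, here is a minimal sketch using only the standard library. The sample data is hypothetical, standing in for a downloaded CSV file:

```python
import csv
import io

# Hypothetical CSV content, standing in for a file the journalists downloaded
raw = "subject,value\nhousing,42\ncrime,17\n"

# csv.DictReader maps each data row to a dict keyed by the header row
rows = list(csv.DictReader(io.StringIO(raw)))

for row in rows:
    print(row["subject"], row["value"])
```

In a real tool you would open the downloaded file with `open("data.csv", newline="")` instead of the inline string, and feed `rows` into whatever GUI widget you pick.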
(Oct-10-2019, 01:10 PM)kasper1903 Wrote: [ -> ]who wish to scrape some URLs for data, where the data lies in tables,
One of the simplest ways is to use Pandas.
It will find any table on a website, and you can easily write the result to Excel with df.to_excel().
import pandas as pd

# read_html returns a list of DataFrames, one per matching <table> on the page
wiki_timeline = pd.read_html('https://en.wikipedia.org/wiki/Timeline_of_programming_languages', match='Guido Van Rossum')
wiki_timeline[0].tail()
[Image: 1p48av.jpg]
Here I use JupyterLab; the display looks just like it would in Excel.
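To show the read_html-to-file round trip without depending on a live URL, here is a sketch that parses an inline HTML snippet (the table contents are made up) and writes the result to a file the journalists can open:

```python
import io

import pandas as pd

# Hypothetical HTML standing in for a downloaded page
html = """
<table>
  <tr><th>subject</th><th>value</th></tr>
  <tr><td>housing</td><td>42</td></tr>
  <tr><td>crime</td><td>17</td></tr>
</table>
"""

# read_html parses every <table> it finds and returns a list of DataFrames
tables = pd.read_html(io.StringIO(html))
df = tables[0]

# Write to a file; df.to_excel("scraped_data.xlsx") also works if openpyxl is installed
df.to_csv("scraped_data.csv", index=False)
```

The same code works on a real page by passing the URL (or `requests.get(url).text` wrapped in `StringIO`) to `read_html`.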
I totally agree! The problem is that the constraint was that they needed a tool they could change things in themselves. Hence, I can't write my own script or use pandas - that is why I will probably try out Excel, since that is something everybody knows.

I did not know of JupyterLab. I need to check that out for sure!

Thank you!
Hi guys! Me again. I hope you can help me with something. As I was trying to scrape a specific URL, it didn't recognize any tables. Looking into the HTML, I can see that divs have been used a lot, and it seems like it cannot understand the structure of the raw data. I also tried with pandas, where I get the error "no tables found".

I tried both of snippsat's suggestions, but also this:

import requests
import pandas as pd

url = "https://www.sdk.dk/sdkbrugt/#/"

# Attempt 1: let pandas fetch and parse the page itself
df = pd.read_html(url)

# Attempt 2: fetch the page with requests first, then parse the HTML text
df = pd.read_html(requests.get(url).text)
Do you know how to solve the problem?
The URL I am using is: http://www.traktor-hostspecialisten.dk/b...er.html/#/ (You probably won't be able to understand the language ;) )
That is because that site is using ul/li elements instead of tables.
But I am unsure of how to use pandas to obtain that.
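pandas' read_html only parses `<table>` elements, so for ul/li listings you can pull the items out with BeautifulSoup and build a DataFrame yourself. A minimal sketch with inline HTML; the class name `machines` and the " - " separator are made-up assumptions, not the real site's markup:

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for the page; the real site lists items in <ul>/<li>
html = """
<ul class="machines">
  <li>Tractor A - 2015 - 50000 kr</li>
  <li>Tractor B - 2018 - 90000 kr</li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# Take the text of each <li>, split it into fields, and build a DataFrame
rows = [li.get_text(strip=True).split(" - ") for li in soup.select("ul.machines li")]
df = pd.DataFrame(rows, columns=["model", "year", "price"])
print(df)
```

On the real page you would inspect the HTML first to find the right CSS selector and the right way to split each item into columns.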
Well, I got one step closer. I expected it to be something like that.
- Anybody who knows how to work around the ul/li structure?

But if you look at this URL: http://semleragro.dk/brugte-maskiner/brugte-maskiner/
- it is structured with tables. But the same error occurs.

Thank you for the help, guys. Each answer has given me new knowledge! :)
That site has numerous tables, so you might be getting a different one from the one you are expecting.
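When a page has several tables, read_html's `match` parameter can narrow the result down to the table you actually want. A sketch with inline, made-up HTML:

```python
from io import StringIO

import pandas as pd

# Two tables on one page; match= keeps only tables whose text matches the string
html = """
<table><tr><th>ad</th></tr><tr><td>banner</td></tr></table>
<table><tr><th>model</th><th>price</th></tr>
<tr><td>Tractor A</td><td>50000</td></tr></table>
"""

all_tables = pd.read_html(StringIO(html))              # both tables
wanted = pd.read_html(StringIO(html), match="price")   # only the one mentioning "price"

print(len(all_tables), len(wanted))
```

Alternatively, grab everything and pick the right DataFrame from the returned list by index or by checking its columns.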