Hyperlinks

DPaul · (This post was last modified: Feb-10-2023, 07:08 AM by DPaul.)

This time I have been given a truckload of pdfs.
They are scans of historical articles. Quality is good, OCR is OK.
I can search the texts programatically.

Here is where I could use some advice/ideas:
A researcher comes in and e.g. wants to find out which articles (pdfs)
contain the word "Napoleon" at least once. He wants to read the whole article for context.
I can do that, say, i find 10 'hits'.
Now I have to present the researcher with some type of media
with 10 hyperlinks to the 10 pdfs.
My initial thought is to make a html page on-the-fly,
start the default browser, somehow show the page if all that is possible?
When he/she wants to ask another question, get rid of the temporary html page, and do the same again.
We are NOT talking about an internet environment, just a desktop happening.
Any suggestions?
thx,
Paul

**Gribouillis** · (This post was last modified: Feb-10-2023, 08:26 AM by Gribouillis.)

(Feb-10-2023, 07:06 AM)DPaul Wrote: We are NOT talking about an internet environment, just a desktop happening.

You could start a Flask application serving html files on the local host for example (or Bottle, or whatever).

from bottle import route, run, template

@route('/hello/<name>')
def index(name):
    return template('<b>Hello {{name}}</b>!', name=name)

run(host='localhost', port=8080)

or

from flask import Flask

app = Flask(__name__)

@app.route("/")
def hello_world():
    return "<p>Hello, World!</p>"

DPaul · Feb-10-2023, 09:13 AM

(Feb-10-2023, 08:26 AM)Gribouillis Wrote: You could start a Flask application serving html files on the local host for example (or Bottle, or whatever).

Wow, need to explore that possibility.
Thanks.
Paul

DPaul · Feb-10-2023, 02:01 PM

(Feb-10-2023, 08:26 AM)Gribouillis Wrote: You could start a Flask application serving html files on the local host for example (or Bottle, or whatever).

Just to see if I get this right: (Internet Information Services are enabled)
1) I do a search with a python program and I come up with 10 "hits".
2) I create an on the fly webpage (e.g. index.html) with 10 hyperlinks.
3) This page is posted to ..\wwwroot\index.html, and obviously replaces the old one.
4) In python : webbrowser.open('http://localhost', new=2)

This might no be entirely what you suggested, but I'm trying to KISS.
thx,
Paul

**Gribouillis** · (This post was last modified: Feb-10-2023, 03:19 PM by Gribouillis.)

No I mean a UI in a web browser such as this one for example, created with the help of the bottle tutorial. Start the program in a terminal, then open http://localhost:8080 in the web browser

from bottle import route, request, run

def search_form():
    return '''
        <form action="/search" method="post">
            Search: <input name="search" type="text" />
            <input value="Search" type="submit" />
        </form>
    '''

@route('/')
@route('/search')
def search():
    return search_form()

@route('/search', method='POST')
def do_search():
    search = request.forms.get('search')
    return f'''
        {search_form()}
        <h3>Results</h3>
        <ul>
        <li>spam {search}</li>
        <li>ham {search}</li>
        <li>eggs {search}</li>
        </ul>
    '''

run(host='localhost', port=8080, debug=True)

DPaul · (This post was last modified: Feb-11-2023, 07:10 AM by DPaul.)

(Feb-10-2023, 03:19 PM)Gribouillis Wrote: No I mean a UI in a web browser

Ok, I got half way there iusing bottle, and I downloaded the user guide.
But I have no experience running searches through a html page.
The example makes it a lot clearer. Thanks.
I will look into it, and eventually it needs to happen when released to the public via internet.

Just F.Y.I. I discovered a stumbling block, a lot of the texts seem to be in German,
and they are using "special" fonts. Only tesseract is good enough to figure those out,
but tesseract means images and not pdf texts. So now it will be either hyperlinks to
pdfs or we could show pages as images like prayer cards.
Still trying to KISS.
Paul

Edit: users just told me they prefer hyperlinks.... :-(

DPaul · (This post was last modified: Feb-11-2023, 09:41 AM by DPaul.)

I think I have this covered in a very "raw" format,
but I'm getting links to pdfs.
One more thing though:
I'm sort of mimicing "Google books" with those pdfs.
Only, "google books" shows not only the pages of your "hits",
but also highlights the search-word in yellow. Every occurence !
Probably they are not showing text, but a picture.
Pinpointing a word on a picture containing text, is something else.

Are the algorithms the "Google Books" are using open source ? Smile

Paul

Hyperlinks

User Panel Messages

Announcements