Python Forum
Looking for good doc on Scraping coverage algorithms


Looking for good doc on Scraping coverage algorithms - Larz60+ - Jan-05-2019

I'm looking for documents (math and/or Python) describing how to query a site so that a set of queries covers all of its data.
My example is a search site for company names, with these conditions:
  • site limits results from any query to 1000 rows.
  • site allows '*' wildcard
  • query options:
    1. Exact words in exact word order.
    2. Exact words in any word order.
    3. Soundex words exact order.
    4. Soundex words any order.
    5. Extended Search in any word order.
  • site allows query by registry number (company id), but does not allow wildcards or ranges for this option.
If I use A*, I obviously exceed the query return limit.
AA* excludes the name A by itself.
How can I get the next 1000 records, and so on, for A*?

This should be relatively simple, but I can't wrap my mind around it.
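
To make the problem concrete, here is a rough sketch of one possible approach (prefix refinement), not a tested solution for the real site. The idea: query prefix*; if the result count hits the limit, assume it was truncated, fetch the bare prefix with an exact query, then retry with every one-character extension of the prefix. Everything here is an assumption on my part: the search callable, the (registry number, name) row shape, the ALPHABET of characters that can appear in names, and that prefix* also matches a name equal to the prefix. A fake in-memory search with a limit of 3 stands in for the site.

```python
import string
from typing import Callable, Dict, List, Tuple

Row = Tuple[str, str]  # assumed row shape: (registry_number, company_name)

LIMIT = 1000  # row cap per query, from the conditions above
# Assumption: every character that can follow a prefix in a name is listed here.
ALPHABET = string.ascii_uppercase + string.digits + " &-.'"


def harvest(search: Callable[[str], List[Row]], prefix: str,
            limit: int = LIMIT, alphabet: str = ALPHABET,
            max_depth: int = 12) -> Dict[str, str]:
    """Collect all rows whose name starts with `prefix`.

    `search` is a hypothetical callable: "XYZ*" means a prefix query,
    "XYZ" means an exact-name query; both return at most `limit` rows.
    """
    found: Dict[str, str] = {}  # keyed by registry number to drop duplicates
    stack = [prefix]
    while stack:
        p = stack.pop()
        rows = search(p + "*")
        if len(rows) < limit or len(p) >= max_depth:
            # Below the cap: the result is complete. At max_depth we give up
            # and keep what we got, which may be incomplete.
            found.update(rows)
            continue
        # Probably truncated: pick up the bare name, then split the prefix.
        found.update(search(p))
        stack.extend(p + ch for ch in alphabet)
    return found


def make_fake_search(names: List[str], limit: int) -> Callable[[str], List[Row]]:
    """In-memory stand-in for the site, for trying the idea out."""
    rows = [(str(i), name) for i, name in enumerate(sorted(names))]

    def search(pattern: str) -> List[Row]:
        if pattern.endswith("*"):
            hits = [r for r in rows if r[1].startswith(pattern[:-1])]
        else:
            hits = [r for r in rows if r[1] == pattern]
        return hits[:limit]

    return search


demo = make_fake_search(["A", "AA", "AB", "ABC", "ABD", "AC", "B"], limit=3)
companies = harvest(demo, "A", limit=3, alphabet="ABCD")
print(sorted(companies.values()))  # ['A', 'AA', 'AB', 'ABC', 'ABD', 'AC']
```

Known gaps in this sketch: a result of exactly `limit` rows is treated as truncated (costs extra queries but loses nothing), more than `limit` companies sharing one identical name cannot be separated this way, and any character missing from ALPHABET leaves a hole in the coverage.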