Python Forum

Looking for good doc on Scraping coverage algorithms
I'm looking for documents (math and/or Python) describing how to query a site so that the results cover all of its data.
My example search site for company names:
Conditions:
  • Site limits results from any query to 1000 rows.
  • Site allows the '*' wildcard.
  • query options:
    1. Exact words in exact word order.
    2. Exact words in any word order.
    3. Soundex words exact order.
    4. Soundex words any order.
    5. Extended Search in any word order.
  • Site allows query by registry number (company id), but does not allow wildcards or ranges for this option.
If I use A*, I obviously exceed the query return limit.
AA* excludes A by itself.
How can I get the next 1000 records, and so on, for A*?

Should be relatively simple, but can't wrap my mind around it.
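
One way to think about the A* / AA* problem is recursive prefix splitting: query a prefix with the wildcard, and if the result hits the row cap, query the bare prefix once (to catch "A by itself") and then repeat for every one-character extension of the prefix. The sketch below is only an illustration under assumptions, not a tested scraper for the site: `search` is a hypothetical callable standing in for whatever sends one query to the site and returns its rows, `crawl_prefix` is a made-up name, and the default alphabet (uppercase letters, digits, space) is a guess that has to be widened to every character company names can contain.

```python
import string
from typing import Callable, List, Set

LIMIT = 1000  # row cap stated in the conditions above

# Assumption: names only use these characters. Anything outside this set
# (punctuation, accented letters, ...) would be missed and must be added.
DEFAULT_ALPHABET = string.ascii_uppercase + string.digits + " "


def crawl_prefix(
    search: Callable[[str], List[str]],  # hypothetical: one query -> rows
    prefix: str,
    alphabet: str = DEFAULT_ALPHABET,
    limit: int = LIMIT,
    max_depth: int = 8,
) -> Set[str]:
    """Collect the rows matching prefix* by splitting capped queries."""
    rows = search(prefix + "*")
    if len(rows) < limit:
        # Under the cap, so this result set is complete for the prefix.
        return set(rows)

    # At the cap: the result may be truncated. Keep what came back.
    found = set(rows)

    # Longer prefixes never match a name equal to the prefix itself,
    # so ask for it explicitly (the "AA* excludes A" case).
    found.update(search(prefix))

    if len(prefix) >= max_depth:
        # Stop refining; coverage below this prefix may be incomplete.
        return found

    # Split the capped query into one narrower query per next character.
    for ch in alphabet:
        found |= crawl_prefix(search, prefix + ch, alphabet, limit, max_depth)
    return found
```

Calling `crawl_prefix(search, "A")` would then issue A*, and only where a query comes back full does it descend into AA*, AB*, and so on. Exactly `limit` rows is treated as "possibly truncated", which costs a few extra queries but avoids silently losing rows. If the bare-prefix query can itself return a full page, that case needs separate handling.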