getting unique values and counting amounts

**Larz60+** · (This post was last modified: Mar-02-2019, 02:27 AM by Larz60+.)

I ran your code (with a dictionary display added) and got a count of 6772.
The only modification I made was adding the display_dict function.

#  getting unique ngrams out of duplicates and counting how many times ngram appears
import requests
from bs4 import BeautifulSoup
import re
import string
from collections import OrderedDict
 
def cleanInput(input):
    # raw strings keep the regex escapes (\[ and \]) from being read as string escapes
    input = re.sub(r'\n', " ", input)
    input = re.sub(r'\[[0-9]*\]', "", input)
    input = re.sub(r' +', " ", input)
    input = bytes(input, "UTF-8")
    input = input.decode("ascii", "ignore")
    input = input.upper()
    cleanInput = []
    input = input.split(' ')
    for item in input:
        item = item.strip(string.punctuation)
        if len(item) > 1 or (item.lower() == 'a' or item.lower() == 'i'):
            cleanInput.append(item)
    return cleanInput
 
def getNgrams(input, n):
    input = cleanInput(input)
    output = dict()
    for i in range(len(input)-n+1):
        newNGram = " ".join(input[i:i+n])
        if newNGram in output:
            output[newNGram] += 1
        else:
            output[newNGram] = 1
    return output

def display_dict(thedict):
    for key, value in thedict.items():
        if isinstance(value, dict):
            print(f'{key}:')
            display_dict(value)
        else:
            print(f'    {key}: {value}')

html = requests.get("http://en.wikipedia.org/wiki/Python_(programming_language)")
bsObj = BeautifulSoup(html.content, 'html.parser')
content = bsObj.find("div", {"id": "mw-content-text"}).get_text()
ngrams = getNgrams(content, 2)
ngrams = OrderedDict(sorted(ngrams.items(), key=lambda t: t[1], reverse=True))
display_dict(ngrams)
# print(ngrams)
print("2-grams count is: "+str(len(ngrams)))
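As a side note, the counting and sorting steps could also be done with collections.Counter from the standard library. This is only a minimal sketch, not what the code above does: count_ngrams and the sample sentence are made up for illustration, and it assumes the words have already been cleaned (for example by cleanInput).

```python
from collections import Counter

def count_ngrams(words, n):
    """Count how often each n-gram occurs in a list of already-cleaned words."""
    # hypothetical helper: join every run of n consecutive words into one key
    grams = (" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return Counter(grams)

# made-up sample text standing in for the cleaned Wikipedia content
words = "THE CAT SAT ON THE MAT AND THE CAT RAN".split()
counts = count_ngrams(words, 2)

# most_common() returns (ngram, count) pairs, highest count first
for gram, count in counts.most_common(3):
    print(f"{gram}: {count}")

# len() gives the number of unique 2-grams, not the total number of 2-grams
print("unique 2-grams:", len(counts))
```

Counter.most_common() gives the same highest-first ordering as the OrderedDict(sorted(...)) line, and len(counts) is the number of unique n-grams, the same thing the last print above reports.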

The results are in the attached file, ngrams.txt (140.46 KB).

Sorry for so many edits; I had a bit of trouble attaching the file.
