Python Forum

What is the distinction between 'sent3' and 'set(sent3)', 'sent2' and 'set(sent2)', etc.? 'sent3' generates the tokens entailed in sentence 3, which makes sense, but 'set(sent3)' is unusually ordered. Here is the code (caveat, I am a novice at Python tags):

>>> sent3
[output]['In', 'the', 'beginning', 'God', 'created', 'the', 'heaven', 'and', 'the', 'earth', '.'][/output]
>>> set(sent3)
[output]{'created', '.', 'and', 'the', 'heaven', 'earth', 'God', 'In', 'beginning'}[/output]
>>> sent2
[output]['The', 'family', 'of', 'Dashwood', 'had', 'long', 'been', 'settled', 'in', 'Sussex', '.'][/output]
>>> set(sent2)
[output]{'settled', 'long', 'been', 'of', 'The', 'Dashwood', 'had', 'in', 'Sussex', '.', 'family'}[/output]

Sets are unordered. That is why they are not subscriptable. Try this:

a_set = {1, 2, 3, 4, 5}
a_set[0]
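
Indexing a set raises a TypeError, because a set has no positions to index. If what you want is the distinct tokens in a predictable order, here is a minimal sketch of two options. The `tokens` list is just a copy of the `sent3` output above, used so the example runs without NLTK, and the variable names are illustrative only.

```python
# Stand-in for sent3, copied from the output shown earlier in the thread.
tokens = ['In', 'the', 'beginning', 'God', 'created', 'the',
          'heaven', 'and', 'the', 'earth', '.']

a_set = {1, 2, 3, 4, 5}
try:
    a_set[0]  # sets do not support indexing
except TypeError as exc:
    error_message = str(exc)
    print("TypeError:", error_message)

# Option 1: keep first-seen order (dicts preserve insertion order in Python 3.7+).
distinct_in_order = list(dict.fromkeys(tokens))
print(distinct_in_order)

# Option 2: sort the distinct tokens (uppercase sorts before lowercase).
distinct_sorted = sorted(set(tokens))
print(distinct_sorted)
```

Either way you get a list, which can be indexed, while the set itself makes no promise about the order in which it prints.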

OK, understood. My question extends to the following comparisons. If 'set(sent3)' gives the distinct tokens of sentence 3 and 'set(text1)' gives the distinct tokens of text 1, then clearly the former is smaller than the latter. However, when I substitute 'sent2' for 'sent3' [i.e. set(sent2) < set(text1)], why is the output 'False' when the original comparison [i.e. set(sent3) < set(text1)] is 'True'? Whichever sentence I pick, its vocabulary (or that of any single sentence, as a general rule) will always be smaller than the vocabulary of an entire text [i.e. set(text1)]. The code is below:

>>> set(sent2) < set(text1)
[output]False[/output]
>>> set(sent3) < set(text1)
[output]True[/output]
>>> set(sent4) < set(text1)
[output]False[/output]
>>> set(sent5) < set(text1)
[output]False[/output]
>>> set(sent6) < set(text1)
[output]False[/output]
>>> set(sent7) < set(text1)
[output]False[/output]
>>> set(sent8) < set(text1)
[output]False[/output]
>>> set(sent9) < set(text1)
[output]False[/output]

What does set_a > set_b even mean? Lists and tuples compare element by element, but since a set has no order, it makes no sense to compare elements. You say "distinct token total", but what does that mean? Is set_a > set_b if set_a has more things? That is not the basis of comparison for any of the other collection types.

To be honest I am surprised that > and < don't throw an error when used with sets. The result is meaningless. Try this:

x = {'a', 'b', 'c', 'd', 'e'}
y = {'f', 'd', 'c', 'b', 'a'}
print(x > y)
print(y > x)
print(x == y)

Equal works if both sets have the same elements, but the code above returns:

Output:
False
False
False
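
For what it's worth, the three False results are consistent if `>` is read as "is a proper superset of" rather than "has more items than". A small sketch using the same `x` and `y` (the names `only_in_x` and `only_in_y` are just illustrative):

```python
x = {'a', 'b', 'c', 'd', 'e'}
y = {'f', 'd', 'c', 'b', 'a'}

# Set difference: the elements each set has that the other lacks.
only_in_x = x - y
only_in_y = y - x
print(only_in_x, only_in_y)

# Neither set contains the other, so neither is a superset,
# and the sets are not equal either.
print(x > y, y > x, x == y)

# The elements they share.
shared = x & y
print(sorted(shared))
```

Because `x` holds 'e' and `y` holds 'f', neither can contain the other, so both comparisons are False even though the two sets are the same size.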

Maybe you mean to compare the number of items:

list1 = [1, 2, 1, 3, 4]
set1 = set(list1)
print(len(list1), len(set1))
print(len(list1) > len(set1))

Output:
5 4
True

(Jul-09-2020, 03:51 AM) deanhystad wrote: To be honest I am surprised that > and < don't throw an error when used with sets. The result is meaningless.

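When used with sets, the comparison operators test for subset and superset: < and > test for a proper subset and proper superset, while <= and >= also allow the two sets to be equal. They do not compare sizes.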

>>> set([3]) < set([4, 5])  # first is not a subset of the second
False
>>> set([5]) < set([4, 5])  # first is a subset of the second
True
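
Applied to the original question, `set(sentN) < set(text1)` is True only when every token of the sentence also occurs somewhere in `text1`; how many distinct tokens each side has does not matter. Here is a rough sketch with made-up token lists (`text_tokens`, `sent_a` and `sent_b` are hypothetical stand-ins, not the NLTK data) showing how set difference reveals which tokens make the test fail:

```python
# Hypothetical stand-ins for a text and two sentences; not the NLTK data.
text_tokens = ['In', 'the', 'beginning', 'there', 'was', 'a', 'whale', '.']
sent_a = ['In', 'the', 'beginning', '.']
sent_b = ['The', 'family', 'of', 'Dashwood', '.']

text_vocab = set(text_tokens)

# Every token of sent_a occurs in the text, so it is a proper subset.
a_is_subset = set(sent_a) < text_vocab
# sent_b has tokens the text lacks, so it is not a subset,
# even though it has fewer distinct tokens than the text.
b_is_subset = set(sent_b) < text_vocab
print(a_is_subset, b_is_subset)

# The tokens responsible for the False result. Matching is case-sensitive,
# so 'The' is not the same token as 'the'.
missing_from_text = set(sent_b) - text_vocab
print(sorted(missing_from_text))

# If vocabulary size is what you want to compare, say so explicitly.
b_is_smaller = len(set(sent_b)) < len(text_vocab)
print(b_is_smaller)
```

Running `set(sent2) - set(text1)` in the original session would likewise list the tokens of sentence 2 that do not appear in text 1.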

AOCL1234

deanhystad

AOCL1234

deanhystad

Yoriz

bowlofred