Python Forum

What is the distinction between 'sent3' and 'set(sent3)'?
What is the distinction between 'sent3' and 'set(sent3)', 'sent2' and 'set(sent2)', etc.? 'sent3' returns the tokens contained in sentence 3, which makes sense, but the output of 'set(sent3)' is in an unusual order. Here is the code (caveat: I am a novice at using Python tags):

>>> sent3
[output]['In', 'the', 'beginning', 'God', 'created', 'the', 'heaven', 'and', 'the', 'earth', '.'][/output]
>>> set(sent3)
[output]{'created', '.', 'and', 'the', 'heaven', 'earth', 'God', 'In', 'beginning'}[/output]
>>> sent2
[output]['The', 'family', 'of', 'Dashwood', 'had', 'long', 'been', 'settled', 'in', 'Sussex', '.'][/output]
>>> set(sent2)
[output]{'settled', 'long', 'been', 'of', 'The', 'Dashwood', 'had', 'in', 'Sussex', '.', 'family'}[/output]
Sets are unordered. That is why they are not subscriptable. Try this:
a_set = {1, 2, 3, 4, 5}
a_set[0]  # raises TypeError: 'set' object is not subscriptable
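A minimal sketch of the same point using the sent3 tokens from the question (typed in by hand here rather than loaded from the corpus, so treat the list as an assumption). It shows that indexing the set fails, and two ways to get the distinct tokens in a predictable order:

```python
# Tokens copied by hand from the sent3 output in the question.
sent3 = ['In', 'the', 'beginning', 'God', 'created', 'the',
         'heaven', 'and', 'the', 'earth', '.']

unique_tokens = set(sent3)

# Sets do not support indexing, so this raises TypeError.
try:
    unique_tokens[0]
    subscript_error = None
except TypeError as exc:
    subscript_error = exc

print(type(subscript_error).__name__)

# Distinct tokens in first-seen order
# (dicts preserve insertion order in Python 3.7+).
in_order = list(dict.fromkeys(sent3))
print(in_order)

# Distinct tokens in sorted order.
print(sorted(unique_tokens))
```

The set itself still has no defined order; the list built with dict.fromkeys() and the sorted() result are ordinary lists, so they can be indexed.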
OK. Understood. The question extends to the following operations. If 'set(sent3)' gives the distinct tokens of sentence 3, and 'set(text1)' gives the distinct tokens of text 1, clearly the former is less than the latter. However, when I substitute 'sent2' for 'sent3' [i.e. set(sent2) < set(text1)], why is the output False, when the original comparison [i.e. set(sent3) < set(text1)] is True? Whichever sentence I pick, the vocabulary total for these two sentences (or for any single sentence, as a general rule) will always be less than the vocabulary total of an entire text [i.e. set(text1)]. The code is below:

>>> set(sent2) < set(text1)
[output]False[/output]
>>> set(sent3) < set(text1)
[output]True[/output]
>>> set(sent4) < set(text1)
[output]False[/output]
>>> set(sent5) < set(text1)
[output]False[/output]
>>> set(sent6) < set(text1)
[output]False[/output]
>>> set(sent7) < set(text1)
[output]False[/output]
>>> set(sent8) < set(text1)
[output]False[/output]
>>> set(sent9) < set(text1)
[output]False[/output]
What does set_a > set_b even mean? Lists and tuples compare element by element, but since a set has no order, it makes no sense to compare elements. You say "distinct token total", but what does that mean? Is setA > setB if setA has more things? That is not the basis of comparison for any of the other collection types.

To be honest I am surprised that > and < don't throw an error when used with sets. The result is meaningless. Try this:
x = {'a', 'b', 'c', 'd', 'e'}
y = {'f', 'd', 'c', 'b', 'a'}
print(x > y)
print(y > x)
print(x == y)
Equal works if both sets have the same elements, but the code above returns:
Output:
False
False
False
Maybe you mean to compare the number of items:
list1 = [1, 2, 1, 3, 4]
set1 = set(list1)
print(len(list1), len(set1))
print(len(list1) > len(set1))
Output:
5 4
True
(Jul-09-2020, 03:51 AM)deanhystad Wrote: To be honest I am surprised that > and < don't throw an error when used with sets. The result is meaningless.

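When used with sets, these operators test for subset and superset: a < b is True only if a is a proper subset of b (every element of a is in b, and the two sets are not equal), and a > b is the corresponding proper-superset test.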


>>> set([3]) < set([4, 5])  # first is not a subset of the second
False
>>> set([5]) < set([4, 5])  # first is a subset of the second
True