Python Forum
Computer science can you help me with the last part of the code after mentionedWords.
#1
My code currently:
def cleanedup(s):
    alphabet='abcdefghijklmnopqrstuvwxyz@_0123456789'
    cleantext= ''
    for character in s.lower():
           if character in alphabet:
                 cleantext+= character
           else:
                 cleantext+=''
    return cleantext

import os
counts={}
mentionedWords=[]
def contains(filename):
    with open(filename) as file:
        for word in cleanedup(line).split():
            if filename[2]=='@':
                if word in counts:
                    counts[word]+=1
                else:
                    counts[word]=1
                print("fn=",filename)
    
for word in counts:
    mentionedWords.append([counts[word],word])


for filename in os.listdir('.'):
    if filename[-3:]=='.tweets':
        contains (filename)
                
mentionedWords.sort()
print (mentionedWords[3:])
So far my code should be correct. I just need to write a loop after mentionedWords.sort(). This is the feedback from my professor:
Example:
To print the username followed by its count, remember that each item in mentionedWords in your program is a list, and the unit 4 "Learn" PDF starting on p. 8 explains that items in a list can be accessed by index. For example, suppose you have a list x = [123, 'abc']. Then, x[0] is 123 and x[1] is 'abc'. If you want to print the second item in the list followed by the first item, i.e. abc followed by 123, you would just do print(x[1], x[0]). So, in your program, write a loop that goes through each item of the slice that represents the 3 most frequently mentioned usernames in mentionedWords. (I explained how to get the slice in my feedback. The slice is just mentionedWords[-5:], but change the -5 to -3.) Then, in the body of the loop, write a print statement similar to the one I just mentioned.
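The indexing and slicing described in that feedback can be sketched roughly as follows. The sample data and the helper name top_three_lines are made up for illustration; only the [count, username] layout comes from the assignment.

```python
# Hypothetical sample data in the [count, username] layout used by mentionedWords.
mentionedWords = [[15, '@alice'], [20, '@bob'], [7, '@carol'], [3, '@dave']]

# Lists of [count, username] sort by count first, in ascending order,
# so the most frequently mentioned usernames end up at the end of the list.
mentionedWords.sort()

def top_three_lines(pairs):
    """Return 'username count' strings for the last three items of a sorted list."""
    lines = []
    for item in pairs[-3:]:          # slice holding the 3 largest counts
        lines.append(f"{item[1]} {item[0]}")  # username first, then its count
    return lines

for line in top_three_lines(mentionedWords):
    print(line)
```

With this sample data the loop prints @carol 7, @alice 15 and @bob 20, one per line.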
Reply
#2
(Nov-26-2020, 11:16 PM)shirleylam852 Wrote: So far my code should be correct.

As I have very little knowledge of what you want to accomplish, I can't say for certain that your statement is incorrect. However, it is safe to say that either split() on line # 16 is redundant or your code does something you didn't intend.

Here is how your function cleanedup() behaves: it eliminates all characters that are not in alphabet, including whitespace (spaces, newlines, carriage returns, etc.). So the result is this:


>>> cleanedup('Monty Python Flying Circus\n')
'montypythonflyingcircus'                                           # returns one word string
>>> cleanedup('Monty Python Flying Circus\n').split()
['montypythonflyingcircus']                                         # nothing to split on, so one item list
>>> for item in cleanedup('Monty Python Flying Circus\n').split():   
...     print(item)
...
montypythonflyingcircus
Even though I don't know what your objective is, I doubt that this can be called 'correct'. To get more useful help, you need to provide more information.
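For comparison, here is a minimal sketch of the two possible behaviours: dropping unwanted characters versus replacing them with a space. The function names cleanedup_drop and cleanedup_space are hypothetical, chosen only to tell the two variants apart.

```python
ALPHABET = 'abcdefghijklmnopqrstuvwxyz@_0123456789'

def cleanedup_drop(s):
    """Drop every character that is not in ALPHABET (word boundaries are lost)."""
    return ''.join(ch for ch in s.lower() if ch in ALPHABET)

def cleanedup_space(s):
    """Replace every character that is not in ALPHABET with a space."""
    return ''.join(ch if ch in ALPHABET else ' ' for ch in s.lower())

print(cleanedup_drop('Monty Python Flying Circus\n').split())
print(cleanedup_space('Monty Python Flying Circus\n').split())
```

The first call gives a one-item list, the second gives four separate words, so split() only has something to split on in the second variant.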
I'm not 'in'-sane. Indeed, I am so far 'out' of sane that you appear a tiny blip on the distant coast of sanity. Bucky Katt, Get Fuzzy

Da Bishop: There's a dead bishop on the landing. I don't know who keeps bringing them in here. ....but society is to blame.
Reply
#3
@perfringo

I assumed it was correct because my professor gave me changes and that was the only part left I needed to write. I guess you could be right.

Task Step-by-step:

Modify function cleanedup so that it keeps not only letters, but also digits 0123456789 and symbols @ and _

Write a new function findMentions that takes a filename as a parameter and reports 3 usernames most frequently mentioned in that file. The function should create a dictionary of counts for all username mentions (words starting with @). After reading through the file and accumulating the counts for all mentioned usernames, use the dictionary to create a list like this:

[[15, '@alice'], [20, '@bob'], [7, '@carol'], ... ]

Use sort to sort the above list and print out 3 most frequently mentioned usernames.

Check each file in the current folder (using os.listdir('.')), if the file name ends with .tweets, call findMentions on the file to find its most frequent mentions.
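One way these steps could fit together is sketched below. It is not the professor's reference solution: the helper countMentions is hypothetical, and the sketch assumes that each line of a file is one tweet and that a mention is any cleaned-up word starting with @.

```python
import os

def cleanedup(s):
    """Keep letters, digits, @ and _; turn everything else into a space."""
    alphabet = 'abcdefghijklmnopqrstuvwxyz@_0123456789'
    cleantext = ''
    for character in s.lower():
        if character in alphabet:
            cleantext += character
        else:
            cleantext += ' '
    return cleantext

def countMentions(lines):
    """Hypothetical helper: build a dictionary of counts for words starting with @."""
    counts = {}
    for line in lines:
        for word in cleanedup(line).split():
            if word.startswith('@'):
                counts[word] = counts.get(word, 0) + 1
    return counts

def findMentions(filename):
    """Print the 3 most frequently mentioned usernames in one file."""
    with open(filename) as file:
        counts = countMentions(file)
    # Turn the dictionary into a list like [[15, '@alice'], [20, '@bob'], ...]
    mentionedWords = [[counts[word], word] for word in counts]
    mentionedWords.sort()
    print(filename)
    for item in mentionedWords[-3:]:
        print(item[1], item[0])

if __name__ == '__main__':
    # Check each file in the current folder and process the .tweets files.
    for filename in os.listdir('.'):
        if filename.endswith('.tweets'):
            findMentions(filename)
```

Keeping counts and mentionedWords inside the function means every file starts with fresh counts instead of accumulating mentions from the previous files.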
Reply
#4
What do the files look like? A few sample rows would do.
Reply
#5
def cleanedup(s):
    alphabet='abcdefghijklmnopqrstuvwxyz@_0123456789'
    cleantext= ''
    for character in s.lower():
           if character in alphabet:
                 cleantext+= character
           else:
                 cleantext+=' '
    return cleantext

import os

def contains(filename):
    counts={}
    mentionedWords=[]
    with open(filename) as file:
             for word in cleanedup(line).split():
                if filename[2] =='@':
                    if word in counts:
                        counts[word]+=1
                    else:
                        counts[word]=1
                print("fn=",filename)

    for word in counts:
           mentionedWords.append([counts[word],word])

    for filename in os.listdir('.'):
        if filename[-3:]=='.tweets':
            contains (filename)
                
    mentionedWords.sort()

    
    for item in mentionedWords[-3:]:
        print ('  ', item [1], item[0])
@perfringo

Okay, this is my code as of right now. I'm missing something between lines 32 and 36.
The only hint I was given to fix my code was this:

To print the username followed by its count, remember that each item in mentionedWords in your program is a list, and the unit 4 "Learn" PDF starting on p. 8 explains that items in a list can be accessed by index. For example, suppose you have a list x = [123, 'abc']. Then, x[0] is 123 and x[1] is 'abc'. If you want to print the second item in the list followed by the first item, i.e. abc followed by 123, you would just do print(x[1], x[0]). So, in your program, write a loop that goes through each item of the slice that represents the 3 most frequently mentioned usernames in mentionedWords.

If the code is written correctly, the output should look like this:

nytimes.tweets
@caityweaver 3
@nytmag 5
@nytparenting 5

justinbieber.tweets
@applemusic 15
@theellenshow 15
@skrillex 20

aoc.tweets
@rashidatlaib 5
@ayannapressley 6
@ilhanmn 9

espn.tweets
@nba 21
@thecheckdown 29
@kingjames 32

rihanna.tweets
@rihanna 21
@savagexfenty 29
@fentybeauty 48

amyschumer.tweets
@bridgeteverett 14
@rachelfeinstein 15
@comedycentral 49

ladygaga.tweets
@ahsfx 10
@btwfoundation 11
@applemusic 13

BillGates.tweets
@theeconomist 11
@warrenbuffett 15
@melindagates 18

BarackObama.tweets
@ofa 5
@vp 5
@michelleobama 9

ID_AA_Carmack.tweets
@boztank 3
@JoeRogan 3
@elonmusk 5

Kaepernick7.tweets
@mikailsprice 26
@darthkaepernick 28
@kaepernick7 138

doctorow.tweets
@cbc 3
@doctorow 3
@sensanders 7
Reply
#6
I would like to understand what the input files look like (the ones you need to process).

In your code, on line # 18, why do you look for the @ sign in filename and not in a row or word?
Reply
#7
@perfringo

Because my professor gave us an all-tweets.zip file containing 12 .tweets files that we have to run through. We have to run through all of the files in one go rather than inputting them one at a time, so my professor said to use filename.
Use sort to sort the above list and print out 3 most frequently mentioned usernames.

Check each file in the current folder (using os.listdir('.')), if the file name ends with .tweets, call findMentions on the file to find its most frequent mentions.
Reply
#8
I get that. But if one wants to get mentions out of files, it is necessary to know how the data is structured in the file. Without that, it's a guessing game.
Reply
#9
@perfringo


This was an example mentioned in class, and I think it's similar to what I'm asked to do. I think that instead of returning something I need to print something. But I need to change the loop in between so that it matches my code. I got lost in my code and I don't know what to put in place of the code in bold.

Example from class:

We've seen that entries in a dictionary can be accessed using keys. In the same way, items in a list can be accessed by index, the numerical position of the item, counting the first position as 0, the next as 1 and so on. If papers is the list [17, 0, 8, 3, ..., 14], then papers[0] is 17, papers[1] is 0 and so on, up through papers[29] which is 14. In this example, the key fact is that papers[3] is 3, showing that Student 3 got his or her own paper.
Once papers has been shuffled, the following code does the checking we want:
[b]for student in range(classSize):
    if papers[student] == student:
        return 'warning'
return 'okay'[/b]
This checks if papers[0] (the paper given back to Student 0) is 0; then if papers[1] is 1 and so on. If a match is found, the function paperStatus immediately stops, without finishing the for statement, and returns the answer 'warning'. If the for statement completes, it must be because no match was found. In this case, paperStatus returns 'okay'.
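As a runnable sketch of that class example: the excerpt shows only the loop body, so the function signature here is an assumption, and len(papers) stands in for classSize.

```python
def paperStatus(papers):
    """Return 'warning' if any student got their own paper back, else 'okay'."""
    # Assumption: papers[i] is the number of the paper handed to student i,
    # and len(papers) plays the role of classSize from the excerpt.
    for student in range(len(papers)):
        if papers[student] == student:
            return 'warning'   # stops immediately at the first match
    return 'okay'              # reached only if the loop finds no match

print(paperStatus([1, 2, 0]))  # nobody holds their own paper
print(paperStatus([2, 1, 0]))  # student 1 holds paper 1
```

The first call prints okay and the second prints warning. The same index-based access (papers[student]) is what item[0] and item[1] do on each [count, username] pair.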
Reply
#10
@perfringo

https://ww3.hunter.cuny.edu/screencasts/...?video=5.5

Okay, he told me to remove the last line from that video: "You will need to replace that line with a loop that goes through the 3 most frequently mentioned usernames and prints the username followed by its count." I think that is the code above, which I need to edit to match my other code, but I forgot what to put in because looking at it for so long got me lost.

I think I understand your first question:
You are given a collection of text files, each containing 1000 recent tweets posted by several popular Twitter accounts. Each line in a file is one tweet, so if we read the files the usual way, each line will correspond to a separate tweet:

with open('nytimes.tweets') as lines:
    for line in lines:
        print(line)

But we don't want to open the 12 files one by one; instead we go through filename for @ and report the top 3 most mentioned.
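A note on picking out the files: a test like filename[-3:]=='.tweets' compares a three-character slice with a seven-character string, so it can never be true, and filename[2]=='@' looks at the third character of the file name rather than at a word in a tweet. A small sketch of a suffix check follows; the helper name is_tweets_file is hypothetical.

```python
def is_tweets_file(filename):
    """Return True if the file name ends with the .tweets extension."""
    return filename.endswith('.tweets')

print('nytimes.tweets'[-3:])               # only the last three characters
print('nytimes.tweets'[-3:] == '.tweets')  # a 3-character slice never matches
print(is_tweets_file('nytimes.tweets'))
print(is_tweets_file('notes.txt'))
```

An equivalent slice-based test would be filename[-7:] == '.tweets', since '.tweets' is seven characters long.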
Reply

