I have never worked with Python before, but I now have a task: there are several Python projects, and I need to extract the comments from their source code.
Comments in Python are written either with # or as strings that are not used anywhere. The # case is clear, but strings are not so simple, because I have to distinguish strings that are used (for example, assigned to variables or used in expressions) from unused ones.
After several experiments in an online compiler, I came up with the following algorithm:
1. IF there are no characters (except whitespace) before the opening quotes (on the line of code where these quotes are)
2. AND IF there are no characters (except whitespace) after the closing quotes (on the line where these quotes are)
3. AND IF the line is not between parentheses ()
4. AND IF the previous line of code does not end with \
then this is a comment string.
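
For comparison, the same four conditions can probably be checked without scanning lines by hand, by letting Python's standard library parse the file. The sketch below is only an illustration under some assumptions: Python 3.8 or newer, source files that parse without a SyntaxError, and a hypothetical helper name `extract_comments` that is not part of any library. It uses `tokenize` for # comments and `ast` for strings that form a whole statement on their own, which also means docstrings are reported.

```python
import ast
import io
import tokenize


def extract_comments(source):
    """Return sorted (line_number, text) pairs for # comments and unused strings.

    Hypothetical helper for illustration; assumes `source` is valid Python 3.8+.
    """
    found = []

    # '#' comments: the tokenizer reports each one as a COMMENT token.
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            found.append((tok.start[0], tok.string))

    # Unused strings: an expression statement whose whole value is a string
    # constant. Strings inside parentheses, assignments, or backslash
    # continuations are part of a larger expression, so they never show up
    # as a bare ast.Expr and are skipped.
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Expr)
                and isinstance(node.value, ast.Constant)
                and isinstance(node.value.value, str)):
            found.append((node.lineno, node.value.value))

    return sorted(found)


# Sample input modelled on the cases in this question, plus one '#' comment.
sample = (
    'a = "This is NOT a comment! "\n'
    'b = (a +\n'
    '    """ This is NOT a comment! """\n'
    '    )\n'
    'c = a + \\\n'
    '""" This is NOT a comment!! """\n'
    "''' And this is already a comment '''  # trailing note\n"
)
print(extract_comments(sample))
```

With this sample, only the last line should contribute results: the standalone triple-quoted string and the trailing # comment. One design consequence is that a string written after a semicolon on the same line as other code would also count as unused, which the line-based rules above would reject.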
Example:

    a = "This is NOT a comment! "
    b = (a +
        """ This is NOT a comment! """
        )
    c = a + \
    """ This is NOT a comment!! """
    ''' And this is already a comment '''

Please tell me, is this algorithm correct or not? Maybe it needs to be adjusted in some way?