Python Forum
RegExp: returning 2nd loop in new document
#1
Hello, I am new to this forum. I joined so that I could ask for help with this program. First, I will detail what it does.

I get these search files that look a bit like this:
!sfshine Robert Silverberg & Karen Haber (ed) - Science Fiction-The Best of 2001.epub ::INFO:: 344.3KB
!sfshine Robert Silverberg & Karen Haber (ed) - Science Fiction-The Best of 2001.zip ::INFO:: 373.7KB
!sfshine Robert Silverberg & Karen Haber (ed) - Science Fiction-The Best of 2002.epub ::INFO:: 493.9KB
!sfshine Robert Silverberg & Karen Haber (ed) - Science Fiction-The Best of 2002.mobi ::INFO:: 546.6KB
!sfshine Robert Silverberg & Karen Haber (ed) - Science Fiction-The Best of 2002.zip ::INFO:: 297.2KB

I only have a use for certain types of files. Now, I can get my script to filter one or several at a time by using a for in loop. I thought that if I added a second loop, I could get them sorted into different piles. So, here is the code:
*****************************
import re

# input the name of the file
input_file_name = input("Enter file name: ")
input_file_name = input_file_name + ".txt"
input_open = open(input_file_name)

# create new file name
file_type_removed = input_file_name[:-4] #  removes .txt from input file name
add_modified_to_new_file = "_modified.txt" #  creates the end part of the new file name
new_text = file_type_removed + add_modified_to_new_file #  puts the file extension and modified to the new file

new_file_create = open(new_text, "w+") # should create a new file with this name

#  finds all the lines with mobi in them and writes those lines to the new file
for line in input_open.readlines():
    if re.search('mobi|azw', line):
        new_file_create.write(line)

#  This second loop will not run or return the results as I expected. Only the first loop 
#  returns a text file with the expected results. I would like to understand why and how 
#  I might fix it.
for line in input_open.readlines():
    if re.search('pdf', line):
        new_file_create.write(line)


# Close the files
new_file_create.close()
input_open.close()
***********************************
I also have a second question that I have been thinking about. I would like to build a library of authors that I sometimes want to search for. So, I wondered whether I could use a similar approach to what I already have, but as a function in which the loop iterates through a dictionary, looking for and returning lines that contain the name of a given author. Is that a valid way of going about it, or should I be seeking a different path?
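
To make the idea concrete, here is a rough sketch of what I have in mind. It is only an illustration under assumptions: the AUTHORS dictionary, its keys, the function name lines_by_author and the second sample line are all made up, and matching by plain name is just one possible choice.

```python
import re

# Hypothetical author library; keys and names are placeholders for illustration.
AUTHORS = {
    "silverberg": "Robert Silverberg",
    "haber": "Karen Haber",
}


def lines_by_author(lines, authors):
    """Group lines by the author names they mention, in one pass over the input."""
    # One case-insensitive pattern per author; re.escape protects names
    # that contain characters with a special meaning in regular expressions.
    patterns = {key: re.compile(re.escape(name), re.IGNORECASE)
                for key, name in authors.items()}
    found = {key: [] for key in authors}
    for line in lines:
        for key, pattern in patterns.items():
            if pattern.search(line):
                found[key].append(line)
    return found


sample = [
    "!sfshine Robert Silverberg & Karen Haber (ed) - Science Fiction-The Best of 2001.epub ::INFO:: 344.3KB\n",
    # Invented line that mentions neither author
    "!sfshine Some Other Author - Another Book.epub ::INFO:: 100.0KB\n",
]
result = lines_by_author(sample, AUTHORS)
```

Here the list file is read only once and every line is checked against each author, instead of rereading the file once per author.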

Any help is appreciated,
Steven
#2
You keep rereading the same file, which does not scale. This version does (I did not provide file names):

import re

with open(<original_list>) as origin, \
        open(<pdfs_list>, 'w') as pdfs, \
        open(<mobile_list>, 'w') as mobile:
    # Map each wanted extension to the output file it belongs in
    target_files = {'mobi': mobile, 'azw': mobile, 'pdf': pdfs}
    for line in origin:
        # Capture the extension that precedes " ::INFO"
        target = re.search(r'\.(\w+) ::INFO', line)
        if target:
            target_file = target_files.get(target.group(1))
            if target_file is not None:
                target_file.write(line)
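
To see what that pattern extracts, here is a small check against one of the sample lines from the first post. It is only a sketch; the variable names are arbitrary. Note that group(0) is the whole matched text, while group(1) is just the part captured by the parentheses, which is the value a dictionary keyed by extension needs.

```python
import re

line = "!sfshine Robert Silverberg & Karen Haber (ed) - Science Fiction-The Best of 2002.mobi ::INFO:: 546.6KB\n"

match = re.search(r'\.(\w+) ::INFO', line)
whole = match.group(0)      # entire matched text, including the dot and " ::INFO"
extension = match.group(1)  # only the text captured by (\w+)
```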
Test everything in a Python shell (iPython, Azure Notebook, etc.)
  • Someone gave you advice you liked? Test it - maybe the advice was actually bad.
  • Someone gave you advice you think is bad? Test it before arguing - maybe it was good.
  • You posted a claim that something you did not test works? Be prepared to eat your hat.
#3
(May-02-2018, 07:25 AM)volcano63 Wrote: You keep rereading the same file, which does not scale. This version does (I did not provide file names):

I understood and expected that. What I expected:

read the file first time - output all lines containing mobi/azw
read the file second time - output all lines containing pdf

Quote:
with open(<original_list>) as origin, \
        open(<pdfs_list>, 'w') as pdfs, \
        open(<mobile_list>, 'w') as mobile:

Does that not open several files at once? I'm just trying to open one file and have the first results in one pile and the second results after.
#4
(May-02-2018, 09:06 AM)syoung Wrote: Does that not open several files at once? I'm just trying to open one file and have the first results in one pile and the second results after.

And that is a problem because...? Would you rather have inefficient, unscalable code?
#5
(May-02-2018, 09:06 AM)syoung Wrote: I understood and expected that. What I expected:

read the file first time - output all lines containing mobi/azw
read the file second time - output all lines containing pdf

...

Does that not open several files at once? I'm just trying to open one file and have the first results in one pile and the second results after.

It seems you are falling between two stools: on the one hand, you are not writing 'unix' style, where small 'functions' each do one small, specific job; on the other hand, you are not writing efficient code that scales well.

What's wrong with opening several files?

When you attempt to read the first file for the second time, you do not appear to have reset the cursor back to the beginning of the file.

Also, I'm curious why you add .txt to the file name, then strip it off. Why not retain the root name and add to it as required? You seem to have some redundant steps anyway.
I am trying to help you, really, even if it doesn't always seem that way
#6
(May-02-2018, 12:19 PM)gruntfutuk Wrote: It seems you are falling between two stools: on the one hand, you are not writing 'unix' style, where small 'functions' each do one small, specific job; on the other hand, you are not writing efficient code that scales well.

What's wrong with opening several files?

When you attempt to read the first file for the second time, you do not appear to have reset the cursor back to the beginning of the file.

Also, I'm curious why you add .txt to the file name, then strip it off. Why not retain the root name and add to it as required? You seem to have some redundant steps anyway.

"you do not appear to have reset the cursor back to the beginning of the file"

ok, so how would I do that?

For now, I'm not really into turning it into a function. I take your criticism of it having some redundancy. I will fiddle with it.

I was planning on turning it into a function as a kind of next stage: creating a dictionary of names to search for. But I didn't know about the 'reset cursor' thing, and I figured that if what I had already done wasn't working, my idea of building a library of authors to check against more comprehensive downloaded library lists wouldn't work either. I also wasn't sure whether that would be the right way to do it: using loops with regular expressions to go through an entire txt file looking for one author at a time. The comprehensive library txt files can be quite long. Anyway, I'll see if I can Google this 'reset cursor' thing.

Thanks for the information, btw. It's very useful for me.

(May-02-2018, 12:19 PM)gruntfutuk Wrote: It seems you are falling between two stools: on the one hand, you are not writing 'unix' style, where small 'functions' each do one small, specific job; on the other hand, you are not writing efficient code that scales well.

Your 'reset the cursor' worked like a charm.
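
For anyone who finds this thread later, here is a minimal sketch of what such a fix can look like. It assumes the reset is done with the file object's seek(0) method. The function name write_in_two_passes and the in-memory test data are made up for illustration, with io.StringIO standing in for the real files.

```python
import io
import re


def write_in_two_passes(source, dest):
    """Write mobi/azw lines first, then pdf lines, reading `source` twice."""
    for line in source:
        if re.search('mobi|azw', line):
            dest.write(line)
    # The first loop left the read position at the end of the file.
    # Without this seek, the second loop would find no lines to read.
    source.seek(0)
    for line in source:
        if re.search('pdf', line):
            dest.write(line)


# Invented sample lines; io.StringIO behaves like an open text file here.
source = io.StringIO(
    "a.pdf ::INFO:: 1KB\n"
    "b.mobi ::INFO:: 2KB\n"
    "c.epub ::INFO:: 3KB\n"
)
dest = io.StringIO()
write_in_two_passes(source, dest)
output = dest.getvalue()
```

The mobi line comes out first and the pdf line second, even though the pdf line appears earlier in the input, which is the "two piles" ordering I was after.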

I am fairly new to Python and programming. I started learning it last January.

Thanks again, and if you have advice concerning the library idea above, let me know.