Python Forum
read a text file, find all integers, append to list
#1
Hello everyone,

I have multiple text files with a lot of lines.
Inside, the numbers are separated by one or more spaces.
For example:

1 2 3 4 5 9
7 10 15 8 87
14 58 69 10 100

To simplify, let's say we have 3 files. I would like to do the following steps:
  1. Open the first file.
  2. Read it line by line and find all the numbers.
  3. Append these numbers to a list of integers, keeping the same order.
  4. Attach this list to a dictionary.
  5. Do the same for the 2 other files, then save the dictionary, which contains the 3 lists, to a text file.

Thank you for your help
#2
What have you tried so far? The forum will help you with your code but will not write it for you.
See the Homework and No Effort Questions link in my signature.
#3
Reading a file, line by line, is not too hard:

with open('file_one.txt', 'r') as reader:
    for line in reader:
        print(line, end='')
Why don't you try to code something up and post it back? That way people here can see what kind of skills you have and advise you based on what your skill level appears to be.
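As a hedged starting point, assuming the numbers on each line are separated only by whitespace as in the first post, the loop above can be extended to collect the integers in order. The helper name read_integers and the demo file name are made up for illustration.

```python
import os
import tempfile

def read_integers(filename):
    """Return every whitespace-separated integer in the file, in file order."""
    numbers = []
    with open(filename, "r") as reader:
        for line in reader:
            # split() with no argument copes with one or several spaces
            for token in line.split():
                try:
                    numbers.append(int(token))
                except ValueError:
                    # Not an integer (a word, a float, ...): skip it
                    pass
    return numbers

# Demo on a throwaway file holding the sample data from the first post
with tempfile.TemporaryDirectory() as folder:
    demo_path = os.path.join(folder, "demo_numbers.txt")
    with open(demo_path, "w") as writer:
        writer.write("1 2 3 4 5 9\n7 10 15 8 87\n14 58 69 10 100\n")
    demo_numbers = read_integers(demo_path)

print(demo_numbers)
```

Using int() inside try/except rather than str.isdigit() is a design choice here: it also accepts signed values such as -5.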
#4
import os, inquirer, glob, shutil, datetime, pandas, re
from subprocess import *
from typing import List
from inquirer.themes import GreenPassion
from pathlib import Path

Odb_File_Path = str(os.getcwd())

Path = {}
Inp_File_Selected_WOext = ['File_001.inp', 'File_002.inp', 'File_003.inp']
Inp_Short_Names_File = ["Path_1.inp","Path_2.inp","Path_2.inp" ]

for i in range(len(Inp_Short_Names_File)):
    Numbers = []
    inFile = open(Inp_File_Selected_WOext[i])
    
    outFile = open(Inp_Short_Names_File[i], "w")
    
    keepCurrentSet = False
    for line in inFile:
        if line.startswith("*"):
            keepCurrentSet = False

        if keepCurrentSet:
            outFile.write(line)

        if line.startswith("*Nset, nset=PATH, unsorted"):
            keepCurrentSet = True
    inFile.close()
    outFile.close()

    with open(Inp_Short_Names_File[i]) as f:
        lines = f.read()
        
    with open(Inp_Short_Names_File[i], "w") as f:
        for line in lines:
            f.write(re.sub(',', '', line))    

    with open(Inp_Short_Names_File[i]) as f:
        lines = f.read()
        for z in lines.split():
           if z.isdigit():
              Numbers.append(int(z))
          
    with open(Inp_Short_Names_File[i], "w") as f:
        for line in lines:
            f.write(str(Numbers))             
   
    with open(Inp_Short_Names_File[i],"r") as f:
        lines = f.readlines()            
        Path[i] = lines

          

     
OutputFile = open(r'output.inp',"w")
OutputFile.write(str(Path))
OutputFile.close

It seems that when I extract the numbers, the list is created correctly, but it gets written 3 times in each file.
The files such as "File_001.inp" in my code look like this, and I want to copy just the numbers between the two lines that start with *:

Text
..
..
...
..
..
*Nset, nset=PATH, unsorted
13, 721, 722, 723, 724, 725, 726, 727, 728, 729, 730, 731, 732, 733, 734, 735
736, 737, 738, 739, 740, 741, 742, 743, 744, 745, 746, 747, 748, 749, 750, 751
752, 753, 754, 755, 756, 757, 758, 759, 760, 761, 762, 763, 764, 765, 766, 767
3434, 3435, 3436, 3437, 128, 128, 3357, 3358, 3359, 3360, 3361, 3362, 3363, 122, 122, 3243
3244, 3245, 3246, 3247, 121, 121, 112, 112, 3099, 3100, 3101, 3102, 99, 99, 2831, 2832
2833, 2834, 2835, 2836, 2837, 2838, 2839, 2840, 2841, 2842, 2843, 2844, 2845, 2846, 2847, 2848
2849, 2850, 2851, 2852, 2853, 2854, 2855, 2856, 2857, 2858, 2859, 2860, 2861, 2862, 2863, 2864
2865, 2866, 2867, 2868, 2869, 2870, 2871, 2872, 2873, 2874, 2875, 2876, 2877, 2878, 2879, 2880
2881, 2882, 2883, 2884, 2885, 2886, 2887, 2888, 2889, 88, 88, 2506, 2507, 2508, 2509, 2510
2511, 2512, 2513, 2514, 2515, 2516, 2517, 2518, 2519, 2520, 2521, 2522, 2523, 2524, 2525, 2526
2527, 2528, 2529, 2530, 2531, 2532, 2533, 2534, 2535, 2536, 2537, 2538, 2539, 2540, 2541, 2542
2543, 2544, 2545, 2546, 2547, 2548, 2549, 2550, 2551, 2552, 2553, 2554, 2555, 2556, 2557, 2558
2559, 2560, 2561, 2562, 2563, 2564, 72
*Text
....
...
...
..
Text
#5
I don't understand what you mean by "save the dictionary in a text file"? What file format do you want to use? What are the keys in the dictionary?

This code saves the dictionary as a JSON file. For keys I use the filename of each input file.
import json
import re

integer_pattern = re.compile("[+-]?[0-9]+")

def get_numberes_from_file(filename):
    numbers = []
    with open(filename, "r") as file:
        for line in file:
            if line.startswith("*Nset"):
                break
        for line in file:
            if line.startswith("*Text"):
                break
            numbers += map(int, re.findall(integer_pattern, line))
    print(numbers)
    return numbers

input_files = ["test.txt", "test2.txt", "test3.txt"]
numbers = {}
for filename in input_files:
    numbers[filename] = get_numberes_from_file(filename)

with open("output.inp", "w") as file:
    json.dump(numbers, file, indent=4)
I don't understand what you were doing with the short name output files.
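On the repeated list in the original script: f.read() returns the whole file as a single string, so a loop like "for line in lines" visits one character at a time, and a write inside that loop runs once per character. A small sketch of the difference, using an in-memory io.StringIO as a stand-in for a real file (the sample text is made up):

```python
import io

text = "13, 721, 722\n736, 737, 738\n"

# read() gives a single str; iterating over a str yields characters
as_string = io.StringIO(text).read()
char_iterations = sum(1 for _ in as_string)

# readlines() (or iterating the file object itself) yields whole lines
as_lines = io.StringIO(text).readlines()
line_iterations = sum(1 for _ in as_lines)

print(char_iterations, line_iterations)
```

Writing the list once, outside any loop, avoids the repetition. Separately, OutputFile.close without parentheses only names the method and does not call it; OutputFile.close() or a with block closes the file.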
#6
@deanhystad I often copy stuff from you experts here and try it out at home. It's a good way to learn.

I made a text file with some text. Added some numbers on each line, then copied all the numbers from above in as well.

But when I try your code, it returns an empty list. Obviously, I'm doing something wrong, but I can't see what. Could you help?

Quote:>>> for filename in input_files:
numbers[filename] = get_numberes_from_file(filename)


[]

import json
import re

path2text = '/home/pedro/temp/'
myfile = 'test_number_finder.txt'
 
integer_pattern = re.compile("[+-]?[0-9]+")
 
def get_numberes_from_file(filename):
    numbers = []
    with open(path2text + filename, "r") as file:
        for line in file:
            if line.startswith("*Nset"):
                break
        for line in file:
            if line.startswith("*Text"):
                break
            numbers += map(int, re.findall(integer_pattern, line))
    print(numbers)
    return numbers
 
input_files = ['test_number_finder.txt']
numbers = {}
for filename in input_files:
    numbers[filename] = get_numberes_from_file(filename)
#7
The file needs to look like the example posted by oldtrafford.
Output:
Text
..
..
...
..
..
*Nset, nset=PATH, unsorted
13, 721, 722, 723, 724, 725, 726, 727, 728, 729, 730, 731, 732, 733, 734, 735
...
2559, 2560, 2561, 2562, 2563, 2564, 72
*Text
....
...
...
..
Text
I wouldn't be surprised if the starting mark should be Nset, not *Nset, and the ending mark Text, not *Text. Looking at it again, I think the numbers may start after the Nset line and continue until a line that contains text rather than numbers. Maybe this is a better fit:
import json
import re
 
integer_pattern = re.compile("[+-]?[0-9]+")
 
def get_numberes_from_file(filename):
    numbers = []
    with open(filename, "r") as file:
        for line in file:
            # Marks the beginning of int data
            if line.startswith("Nset"):
                break
        for line in file:
            # Read lines until we hit a line without numbers.
            # findall() returns a list, which is falsy when empty
            # (a map object would always be truthy).
            matches = re.findall(integer_pattern, line)
            if not matches:
                break
            numbers += map(int, matches)
    return numbers
 
input_files = ["test.txt", "test2.txt", "test3.txt"]
numbers = {}
for filename in input_files:
    numbers[filename] = get_numberes_from_file(filename)
 
with open("output.inp", "w") as file:
    json.dump(numbers, file, indent=4)
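Another possible variant, offered only as a sketch: keep "*Nset" as the start marker and stop at the next line beginning with "*", which is the rule the original script used. The function name numbers_in_section, the start_marker parameter and the sample file are assumptions for illustration. The demo also reads the JSON back to show that the dictionary survives the round trip.

```python
import json
import os
import re
import tempfile

INTEGER_PATTERN = re.compile(r"[+-]?[0-9]+")

def numbers_in_section(filename, start_marker="*Nset"):
    """Collect integers after the start_marker line, up to the next '*' line."""
    numbers = []
    with open(filename, "r") as file:
        # Skip everything up to and including the marker line
        for line in file:
            if line.startswith(start_marker):
                break
        # Continue from the line after the marker
        for line in file:
            if line.startswith("*"):
                break
            numbers += map(int, INTEGER_PATTERN.findall(line))
    return numbers

sample = """Text
..
*Nset, nset=PATH, unsorted
13, 721, 722
2563, 2564, 72
*Text
99
"""

with tempfile.TemporaryDirectory() as folder:
    in_path = os.path.join(folder, "sample.inp")
    out_path = os.path.join(folder, "output.json")
    with open(in_path, "w") as file:
        file.write(sample)
    result = {"sample.inp": numbers_in_section(in_path)}
    with open(out_path, "w") as file:
        json.dump(result, file, indent=4)
    with open(out_path) as file:
        loaded = json.load(file)

print(loaded)
```

If the start marker never appears, the first loop consumes the whole file and the function returns an empty list.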
#8
(Aug-07-2022, 09:45 PM)deanhystad Wrote: I don't understand what you mean by "save the dictionary in a text file"? What file format do you want to use? What are the keys in the dictionary?


Thank you very much :), it's exactly what I needed to make my program work. You saved my day :)
#9
Figured it out!
My test file did not contain the *Nset or *Text lines.

When you read the lines like this:

with open(path2text + filename, "r") as file:
    for line in file:
        #print(line)
        if line.startswith("*Nset"):
            break
you have one of those one-time-use things, like csv.reader(): use it, then lose it. (I don't know exactly why that happens. Maybe someone could explain?)

Because *Nset was never found, the first loop read the whole file, and after that the file object was exhausted.

The next loop had nothing to read.
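On the "use it then lose it" behaviour: a file object is an iterator over its own lines and keeps a single read position. A second for loop continues from wherever the first one stopped, so if the first loop ran to the end of the file, nothing is left. A sketch, with io.StringIO standing in for an open file and made-up contents:

```python
import io

file = io.StringIO("alpha\nbeta\ngamma\n")

# The marker is never found, so this loop consumes every line
for line in file:
    if line.startswith("*Nset"):
        break

leftover = list(file)        # the read position is already at the end

file.seek(0)                 # move the read position back to the start
after_rewind = list(file)

print(leftover, after_rewind)
```

Calling seek(0), or simply opening the file again, makes the lines available for another pass.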

This reduced function found all the numbers:

def get_numberes_from_file(filename):
    numbers = []
    with open(path2text + filename, "r") as file:        
        for line in file:
            print(line)
            if line.startswith("*Text"):
                break
            numbers += map(int, re.findall(integer_pattern, line))
    print(numbers)
    return numbers
#10
I don't know what you mean by
Quote:you have one of those 1 time use things, like csv.reader(), use it then lose it
Are you talking about the context manager?
with open(filename, "r") as file:
If so, you can read about context managers online.

https://book.pythontips.com/en/latest/co...agers.html

Essentially this:
with open(filename, "r") as file:
    # do stuff with file
is the same as
file = open(filename, "r")
try:
    # do stuff with file
finally:
    file.close()
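A small check of that equivalence, as a sketch on a temporary file (the file name and contents are made up): in both forms the file ends up closed.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "example.txt")

    # Form 1: the with statement closes the file when the block ends
    with open(path, "w") as file:
        file.write("1 2 3\n")
    closed_after_with = file.closed

    # Form 2: explicit try/finally with a manual close()
    file = open(path, "r")
    try:
        content = file.read()
    finally:
        file.close()
    closed_after_finally = file.closed

print(closed_after_with, closed_after_finally, content)
```

Note that the with block only guarantees the close. It does not reset the read position between loops that run inside the block.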