Python Forum

Full Version: read a text file, find all integers, append to list
Hello everyone,

I have multiple text files with a lot of lines.
Inside, I have numbers separated by one or more spaces.
For example :

1 2 3 4 5 9
7 10 15 8 87
14 58 69 10 100

To simplify, let's say we have 3 files. I would like to do the following steps :
  1. Open the first file
  2. Read line by line and find all numbers.
  3. Append these numbers to a list of integers, but keep the same order.
  4. Then attach this list to a dictionary

  5. Do the same for the 2 other files, then save the dictionary, which contains the 3 lists, to a text file.

Thank you for your help
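A minimal sketch of the five steps listed above, assuming the files contain only whitespace-separated integers and that JSON is an acceptable text format for the saved dictionary. The function names here are hypothetical, chosen only for the illustration.

```python
import json


def read_integers(filename):
    """Return every integer in the file, in the order it appears."""
    numbers = []
    with open(filename, "r") as reader:
        for line in reader:
            # split() with no argument treats any run of whitespace as one separator
            for token in line.split():
                numbers.append(int(token))
    return numbers


def collect(filenames):
    """Build a dictionary that maps each file name to its list of integers."""
    return {name: read_integers(name) for name in filenames}


def save(result, output_name):
    """Write the dictionary to a text file as JSON."""
    with open(output_name, "w") as writer:
        json.dump(result, writer, indent=4)
```

With three files this could be called as `save(collect(["file_one.txt", "file_two.txt", "file_three.txt"]), "output.json")`, where the file names are placeholders. Note that `int(token)` raises `ValueError` if a token is not an integer, so this only fits files that really contain nothing but numbers.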
What have you tried so far? The forum will help you with your code, but will not write it for you.
See the Homework and No Effort Questions link in my signature.
Reading a file, line by line, is not too hard:

with open('file_one.txt', 'r') as reader:
    for line in reader:
        print(line, end='')
Why don't you try to code something up and post it back? That way, people here can see what skills you have and pitch their advice at your level.
import os, inquirer, glob, shutil, datetime, pandas, re
from subprocess import *
from typing import List
from inquirer.themes import GreenPassion
from pathlib import Path

Odb_File_Path = str(os.getcwd())

Path = {}
Inp_File_Selected_WOext = ['File_001.inp', 'File_002.inp', 'File_003.inp']
Inp_Short_Names_File = ["Path_1.inp","Path_2.inp","Path_2.inp" ]

for i in range(len(Inp_Short_Names_File)):
    Numbers = []
    inFile = open(Inp_File_Selected_WOext[i])
    
    outFile = open(Inp_Short_Names_File[i], "w")
    
    keepCurrentSet = False
    for line in inFile:
        if line.startswith("*"):
            keepCurrentSet = False

        if keepCurrentSet:
            outFile.write(line)

        if line.startswith("*Nset, nset=PATH, unsorted"):
            keepCurrentSet = True
    inFile.close()
    outFile.close()

    with open(Inp_Short_Names_File[i]) as f:
        lines = f.read()
        
    with open(Inp_Short_Names_File[i], "w") as f:
        for line in lines:
            f.write(re.sub(',', '', line))    

    with open(Inp_Short_Names_File[i]) as f:
        lines = f.read()
        for z in lines.split():
           if z.isdigit():
              Numbers.append(int(z))
          
    with open(Inp_Short_Names_File[i], "w") as f:
        for line in lines:
            f.write(str(Numbers))             
   
    with open(Inp_Short_Names_File[i],"r") as f:
        lines = f.readlines()            
        Path[i] = lines

          

     
OutputFile = open(r'output.inp',"w")
OutputFile.write(str(Path))
OutputFile.close()

It seems that when I extract the numbers, the list is created correctly, but it is written 3 times to each file.
The "File_001.inp" files in my code look like this, and I want to copy only the numbers between the two lines that start with *.

Text
..
..
...
..
..
*Nset, nset=PATH, unsorted
13, 721, 722, 723, 724, 725, 726, 727, 728, 729, 730, 731, 732, 733, 734, 735
736, 737, 738, 739, 740, 741, 742, 743, 744, 745, 746, 747, 748, 749, 750, 751
752, 753, 754, 755, 756, 757, 758, 759, 760, 761, 762, 763, 764, 765, 766, 767
3434, 3435, 3436, 3437, 128, 128, 3357, 3358, 3359, 3360, 3361, 3362, 3363, 122, 122, 3243
3244, 3245, 3246, 3247, 121, 121, 112, 112, 3099, 3100, 3101, 3102, 99, 99, 2831, 2832
2833, 2834, 2835, 2836, 2837, 2838, 2839, 2840, 2841, 2842, 2843, 2844, 2845, 2846, 2847, 2848
2849, 2850, 2851, 2852, 2853, 2854, 2855, 2856, 2857, 2858, 2859, 2860, 2861, 2862, 2863, 2864
2865, 2866, 2867, 2868, 2869, 2870, 2871, 2872, 2873, 2874, 2875, 2876, 2877, 2878, 2879, 2880
2881, 2882, 2883, 2884, 2885, 2886, 2887, 2888, 2889, 88, 88, 2506, 2507, 2508, 2509, 2510
2511, 2512, 2513, 2514, 2515, 2516, 2517, 2518, 2519, 2520, 2521, 2522, 2523, 2524, 2525, 2526
2527, 2528, 2529, 2530, 2531, 2532, 2533, 2534, 2535, 2536, 2537, 2538, 2539, 2540, 2541, 2542
2543, 2544, 2545, 2546, 2547, 2548, 2549, 2550, 2551, 2552, 2553, 2554, 2555, 2556, 2557, 2558
2559, 2560, 2561, 2562, 2563, 2564, 72
*Text
....
...
...
..
Text
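One likely cause of the repeated list in the script above: `f.read()` returns a single string, so `for line in lines` loops once per character rather than once per line, and `f.write(str(Numbers))` runs on every pass. The short sketch below illustrates the difference with an in-memory string. The sample text is made up for the illustration.

```python
text = "13 721\n722\n"  # what f.read() would return for a two-line file

# Iterating over a string yields single characters, newlines included.
passes_over_string = sum(1 for _ in text)

# splitlines() yields one item per line instead.
passes_over_lines = sum(1 for _ in text.splitlines())

print(passes_over_string)  # 11
print(passes_over_lines)   # 2
```

Writing `str(Numbers)` once, outside any loop, would avoid the repetition.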
I don't understand what you mean by "save the dictionary in a text file"? What file format do you want to use? What are the keys in the dictionary?

This code saves the dictionary as a JSON file. For keys, I use the filename of each input file.
import json
import re

integer_pattern = re.compile("[+-]?[0-9]+")

def get_numberes_from_file(filename):
    numbers = []
    with open(filename, "r") as file:
        for line in file:
            if line.startswith("*Nset"):
                break
        for line in file:
            if line.startswith("*Text"):
                break
            numbers += map(int, re.findall(integer_pattern, line))
    print(numbers)
    return numbers

input_files = ["test.txt", "test2.txt", "test3.txt"]
numbers = {}
for filename in input_files:
    numbers[filename] = get_numberes_from_file(filename)

with open("output.inp", "w") as file:
    json.dump(numbers, file, indent=4)
I don't understand what you were doing with the short name output files.
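To see what the pattern in the code above does on one line of the sample file, here is a small illustration. The sample line is a shortened copy of one from the posted data, and nothing is assumed beyond the standard `re` module.

```python
import re

integer_pattern = re.compile(r"[+-]?[0-9]+")

line = "3434, 3435, 3436, 3437, 128, 128\n"

# findall returns the matched substrings as strings, in order of appearance
tokens = integer_pattern.findall(line)

# map(int, ...) converts them; list() makes the result easy to inspect
values = list(map(int, tokens))
print(values)  # [3434, 3435, 3436, 3437, 128, 128]
```

The commas and spaces are simply skipped because they never match the pattern. One caveat: the pattern would also pick digits out of non-integer text, so a value such as `1.5` would come back as `1` and `5`.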
@deanhystad I often copy stuff from you experts here and try it out at home. It's a good way to learn.

I made a text file with some text. Added some numbers on each line, then copied all the numbers from above in as well.

But when I try your code, it returns an empty list. Obviously, I'm doing something wrong, but I can't see what. Could you help?

Quote:
>>> for filename in input_files:
...     numbers[filename] = get_numberes_from_file(filename)
...
[]

import json
import re

path2text = '/home/pedro/temp/'
myfile = 'test_number_finder.txt'
 
integer_pattern = re.compile("[+-]?[0-9]+")
 
def get_numberes_from_file(filename):
    numbers = []
    with open(path2text + filename, "r") as file:
        for line in file:
            if line.startswith("*Nset"):
                break
        for line in file:
            if line.startswith("*Text"):
                break
            numbers += map(int, re.findall(integer_pattern, line))
    print(numbers)
    return numbers
 
input_files = ['test_number_finder.txt']
numbers = {}
for filename in input_files:
    numbers[filename] = get_numberes_from_file(filename)
The file needs to look like the example posted by oldtrafford.
Output:
Text
..
..
...
..
..
*Nset, nset=PATH, unsorted
13, 721, 722, 723, 724, 725, 726, 727, 728, 729, 730, 731, 732, 733, 734, 735
...
2559, 2560, 2561, 2562, 2563, 2564, 72
*Text
....
...
...
..
Text
I wouldn't be surprised if the starting mark should be Nset, not *Nset, and the ending mark Text, not *Text. Looking at it again I think maybe the numbers start after the Nset line and continue until there is a line that starts with text. Maybe this is a better fit:
import json
import re
 
integer_pattern = re.compile("[+-]?[0-9]+")
 
def get_numberes_from_file(filename):
    numbers = []
    with open(filename, "r") as file:
        for line in file:
            # Marks the beginning of int data
            if line.startswith("Nset"):
                break
        for line in file:
            # Read lines until encountering a line without numbers
            matches = list(map(int, re.findall(integer_pattern, line)))
            # A bare map object is always truthy, so test the list instead
            if matches:
                numbers += matches
            else:
                break
    return numbers
 
input_files = ["test.txt", "test2.txt", "test3.txt"]
numbers = {}
for filename in input_files:
    numbers[filename] = get_numberes_from_file(filename)
 
with open("output.inp", "w") as file:
    json.dump(numbers, file, indent=4)
(Aug-07-2022, 09:45 PM) deanhystad wrote: I don't understand what you mean by "save the dictionary in a text file"? What file format do you want to use? What are the keys in the dictionary?

This code saves the dictionary as a json format file. For keys I use the filename of the input file.

Thank you very much :) It's exactly what I needed to make my program work. You saved my day :)
Figured it out!
I did not have *Nset or *Text

When you read the lines like this:

with open(path2text + filename, "r") as file:
    for line in file:
        #print(line)
        if line.startswith("*Nset"):
            break
you have one of those 1 time use things, like csv.reader(), use it then lose it. (I don't know exactly why that happens; maybe someone could explain?)

Because *Nset was not found, the first loop read the whole file, and after that the file had no lines left to give.

The next loop had nothing to read.
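On the question of why the second loop had nothing to read: a file object is an iterator over its lines and keeps a position in the file. When the first loop ends without hitting `break`, the position is at the end, so a second loop over the same object finds no more lines. A small sketch, using `io.StringIO` as a stand-in for an opened file (an assumption made so the example needs no file on disk):

```python
import io

# StringIO behaves like an opened text file for the purpose of iteration
file = io.StringIO("no marker here\n1 2 3\n4 5 6\n")

for line in file:
    if line.startswith("*Nset"):
        break  # never reached, so the loop consumes every line

leftover = list(file)  # the iterator is exhausted: nothing left
print(leftover)        # []

file.seek(0)           # move the position back to the start
again = list(file)
print(len(again))      # 3
```

When the marker is found, `break` stops the first loop early and the second loop carries on from the following line, which is what the two-loop version relies on.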

This reduced function found all the numbers:

def get_numberes_from_file(filename):
    numbers = []
    with open(path2text + filename, "r") as file:        
        for line in file:
            print(line)
            if line.startswith("*Text"):
                break
            numbers += map(int, re.findall(integer_pattern, line))
    print(numbers)
    return numbers
I don't know what you mean by
Quote:you have one of those 1 time use things, like csv.reader(), use it then lose it
Are you talking about the context manager?
with open(filename, "r") as file:
If so, you can read about context managers online.

https://book.pythontips.com/en/latest/co...agers.html

Essentially this:
with open(filename, "r") as file:
    # do stuff with file
is the same as
file = open(filename, "r")
try:
    # do stuff with file
finally:
    file.close()
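A small sketch to check the equivalence described above: the file is closed when the `with` block ends, even if the block raises. The temporary file and the deliberately raised `ValueError` exist only for the demonstration.

```python
import os
import tempfile

# Create a throwaway file so the example is self-contained
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)

try:
    with open(path, "r") as file:
        raise ValueError("something went wrong inside the block")
except ValueError:
    pass

# The with statement closed the file on the way out, despite the exception
closed_after_with = file.closed

os.remove(path)
print(closed_after_with)  # True
```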