Posts: 9
Threads: 4
Joined: Feb 2022
Hello everyone,
I have multiple text files, each with a lot of lines.
Each line contains numbers separated by one or more spaces.
For example:
1 2 3 4 5 9
7 10 15 8 87
14 58 69 10 100
To simplify, let's say we have 3 files. I would like to do the following:
- Open the first file
- Read line by line and find all numbers.
- Append these numbers to a list of integers, but keep the same order.
- Then attach this list to a dictionary
- Do the same for the 2 other files, then save the dictionary, which contains the 3 lists, to a text file.
Thank you for your help
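The steps listed above can be sketched roughly as follows. This is only a minimal sketch under assumptions: the file names, the helper names read_numbers and collect, and the choice of JSON for the output file are made up for illustration and do not come from the original post.

```python
import json
import os
import tempfile


def read_numbers(path):
    """Return every whitespace-separated integer in the file, in file order."""
    numbers = []
    with open(path, "r") as reader:
        for line in reader:
            # split() with no argument treats any run of spaces as one separator
            numbers.extend(int(token) for token in line.split())
    return numbers


def collect(paths):
    """Build a dictionary that maps each file path to its list of numbers."""
    return {path: read_numbers(path) for path in paths}


if __name__ == "__main__":
    # Hypothetical demo data written to a temporary folder
    folder = tempfile.mkdtemp()
    samples = {"a.txt": "1 2  3 4 5 9\n7 10 15 8 87\n", "b.txt": "14 58   69 10 100\n"}
    paths = []
    for name, text in samples.items():
        path = os.path.join(folder, name)
        with open(path, "w") as writer:
            writer.write(text)
        paths.append(path)

    result = collect(paths)
    with open(os.path.join(folder, "output.json"), "w") as writer:
        json.dump(result, writer, indent=4)
    print(result)
```

Using the file path as the dictionary key is just one option; any label that identifies the file would work as well.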
Posts: 2,168
Threads: 35
Joined: Sep 2016
Aug-07-2022, 02:57 PM
(This post was last modified: Aug-07-2022, 03:00 PM by Yoriz.)
What have you tried so far? The forum will help you with your code, but will not write it for you.
See the Homework and No Effort Questions link in my signature.
Posts: 453
Threads: 16
Joined: Jun 2022
Reading a file, line by line, is not too hard:
with open('file_one.txt', 'r') as reader:
    for line in reader:
        print(line, end='')
Why don't you try to code something up and post it back? That way, people here can see where your skills are and tailor their advice to match.
Posts: 9
Threads: 4
Joined: Feb 2022
import os, inquirer, glob, shutil, datetime, pandas, re
from subprocess import *
from typing import List
from inquirer.themes import GreenPassion
from pathlib import Path

Odb_File_Path = str(os.getcwd())
Path = {}
Inp_File_Selected_WOext = ['File_001.inp', 'File_002.inp', 'File_003.inp']
Inp_Short_Names_File = ["Path_1.inp","Path_2.inp","Path_2.inp" ]
for i in range(len(Inp_Short_Names_File)):
    Numbers = []
    inFile = open(Inp_File_Selected_WOext[i])
    outFile = open(Inp_Short_Names_File[i], "w")
    keepCurrentSet = False
    for line in inFile:
        if line.startswith("*"):
            keepCurrentSet = False
        if keepCurrentSet:
            outFile.write(line)
        if line.startswith("*Nset, nset=PATH, unsorted"):
            keepCurrentSet = True
    inFile.close()
    outFile.close()
    with open(Inp_Short_Names_File[i]) as f:
        lines = f.read()
    with open(Inp_Short_Names_File[i], "w") as f:
        for line in lines:
            f.write(re.sub(',', '', line))
    with open(Inp_Short_Names_File[i]) as f:
        lines = f.read()
    for z in lines.split():
        if z.isdigit():
            Numbers.append(int(z))
    with open(Inp_Short_Names_File[i], "w") as f:
        for line in lines:
            f.write(str(Numbers))
    with open(Inp_Short_Names_File[i],"r") as f:
        lines = f.readlines()
    Path[i] = lines
OutputFile = open(r'output.inp',"w")
OutputFile.write(str(Path))
OutputFile.close
It seems that when I extract the numbers, the list is created correctly, but it is written 3 times to each file.
The input files (for example "File_001.inp") look like this, and I want to copy only the numbers between the two lines that start with *:
Text
..
..
...
..
..
*Nset, nset=PATH, unsorted
13, 721, 722, 723, 724, 725, 726, 727, 728, 729, 730, 731, 732, 733, 734, 735
736, 737, 738, 739, 740, 741, 742, 743, 744, 745, 746, 747, 748, 749, 750, 751
752, 753, 754, 755, 756, 757, 758, 759, 760, 761, 762, 763, 764, 765, 766, 767
3434, 3435, 3436, 3437, 128, 128, 3357, 3358, 3359, 3360, 3361, 3362, 3363, 122, 122, 3243
3244, 3245, 3246, 3247, 121, 121, 112, 112, 3099, 3100, 3101, 3102, 99, 99, 2831, 2832
2833, 2834, 2835, 2836, 2837, 2838, 2839, 2840, 2841, 2842, 2843, 2844, 2845, 2846, 2847, 2848
2849, 2850, 2851, 2852, 2853, 2854, 2855, 2856, 2857, 2858, 2859, 2860, 2861, 2862, 2863, 2864
2865, 2866, 2867, 2868, 2869, 2870, 2871, 2872, 2873, 2874, 2875, 2876, 2877, 2878, 2879, 2880
2881, 2882, 2883, 2884, 2885, 2886, 2887, 2888, 2889, 88, 88, 2506, 2507, 2508, 2509, 2510
2511, 2512, 2513, 2514, 2515, 2516, 2517, 2518, 2519, 2520, 2521, 2522, 2523, 2524, 2525, 2526
2527, 2528, 2529, 2530, 2531, 2532, 2533, 2534, 2535, 2536, 2537, 2538, 2539, 2540, 2541, 2542
2543, 2544, 2545, 2546, 2547, 2548, 2549, 2550, 2551, 2552, 2553, 2554, 2555, 2556, 2557, 2558
2559, 2560, 2561, 2562, 2563, 2564, 72
*Text
....
...
...
..
Text
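A likely cause of the repeated list, offered as a guess because only the code above is known: f.read() returns one single string, and a for loop over a string visits one character at a time, so f.write(str(Numbers)) runs once per character instead of once per file. The small sketch below uses made-up text, with io.StringIO standing in for the real files, to show the effect and one way to write the list only once.

```python
import io

text = "13 721\n722\n"   # made-up stand-in for the contents of a short-name file
numbers = [int(z) for z in text.split() if z.isdigit()]

# Looping over a string yields single characters, not lines
repeated = io.StringIO()
for line in text:
    repeated.write(str(numbers))
copies = repeated.getvalue().count(str(numbers))
print(copies == len(text))   # one copy of the list per character in the text

# Writing outside any loop stores the list exactly once
once = io.StringIO()
once.write(str(numbers))
print(once.getvalue())
```

If this is indeed the cause, dropping the loop around the final write would be enough.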
Posts: 6,778
Threads: 20
Joined: Feb 2020
Aug-07-2022, 09:45 PM
(This post was last modified: Aug-07-2022, 09:45 PM by deanhystad.)
I don't understand what you mean by "save the dictionary in a text file"? What file format do you want to use? What are the keys in the dictionary?
This code saves the dictionary as a json format file. For keys I use the filename of the input file.
import json
import re

integer_pattern = re.compile("[+-]?[0-9]+")

def get_numberes_from_file(filename):
    numbers = []
    with open(filename, "r") as file:
        for line in file:
            if line.startswith("*Nset"):
                break
        for line in file:
            if line.startswith("*Text"):
                break
            numbers += map(int, re.findall(integer_pattern, line))
    print(numbers)
    return numbers

input_files = ["test.txt", "test2.txt", "test3.txt"]
numbers = {}
for filename in input_files:
    numbers[filename] = get_numberes_from_file(filename)
with open("output.inp", "w") as file:
    json.dump(numbers, file, indent=4)
I don't understand what you were doing with the short name output files.
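For reference, a dictionary saved with json.dump can be read back with json.load. A minimal sketch, using a made-up dictionary and a temporary file in place of output.inp:

```python
import json
import os
import tempfile

numbers = {"test.txt": [13, 721, 722], "test2.txt": [736, 737]}  # made-up data

path = os.path.join(tempfile.mkdtemp(), "output.inp")
with open(path, "w") as file:
    json.dump(numbers, file, indent=4)

with open(path, "r") as file:
    restored = json.load(file)

print(restored == numbers)
```

One thing to keep in mind: JSON object keys are always strings, so integer keys such as a loop index would come back as "0", "1", "2".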
Posts: 1,088
Threads: 143
Joined: Jul 2017
Aug-07-2022, 11:24 PM
(This post was last modified: Aug-07-2022, 11:25 PM by Pedroski55.)
@ deanhystad I often copy stuff from you experts here and try it out at home. It's a good way to learn.
I made a text file with some text. Added some numbers on each line, then copied all the numbers from above in as well.
But when I try your code, it returns an empty list. Obviously, I'm doing something wrong, but I can't see what. Could you help?
Quote:
>>> for filename in input_files:
...     numbers[filename] = get_numberes_from_file(filename)
...
[]
import json
import re

path2text = '/home/pedro/temp/'
myfile = 'test_number_finder.txt'
integer_pattern = re.compile("[+-]?[0-9]+")

def get_numberes_from_file(filename):
    numbers = []
    with open(path2text + filename, "r") as file:
        for line in file:
            if line.startswith("*Nset"):
                break
        for line in file:
            if line.startswith("*Text"):
                break
            numbers += map(int, re.findall(integer_pattern, line))
    print(numbers)
    return numbers

input_files = ['test_number_finder.txt']
numbers = {}
for filename in input_files:
    numbers[filename] = get_numberes_from_file(filename)
Posts: 6,778
Threads: 20
Joined: Feb 2020
Aug-08-2022, 02:34 AM
(This post was last modified: Aug-08-2022, 03:03 AM by deanhystad.)
The file needs to look like the example posted by oldtrafford.
Output:
Text
..
..
...
..
..
*Nset, nset=PATH, unsorted
13, 721, 722, 723, 724, 725, 726, 727, 728, 729, 730, 731, 732, 733, 734, 735
...
2559, 2560, 2561, 2562, 2563, 2564, 72
*Text
....
...
...
..
Text
I wouldn't be surprised if the starting mark should be Nset, not *Nset, and the ending mark Text, not *Text. Looking at it again I think maybe the numbers start after the Nset line and continue until there is a line that starts with text. Maybe this is a better fit:
import json
import re

integer_pattern = re.compile("[+-]?[0-9]+")

def get_numberes_from_file(filename):
    numbers = []
    with open(filename, "r") as file:
        for line in file:
            # Marks the beginning of int data
            if line.startswith("Nset"):
                break
        for line in file:
            # Read lines until we encounter a line without numbers.
            # Build a list here: a bare map object is always truthy,
            # so testing it would never reach the else branch.
            matches = [int(match) for match in re.findall(integer_pattern, line)]
            if matches:
                numbers += matches
            else:
                break
    return numbers

input_files = ["test.txt", "test2.txt", "test3.txt"]
numbers = {}
for filename in input_files:
    numbers[filename] = get_numberes_from_file(filename)
with open("output.inp", "w") as file:
    json.dump(numbers, file, indent=4)
Posts: 9
Threads: 4
Joined: Feb 2022
(Aug-07-2022, 09:45 PM)deanhystad Wrote: I don't understand what you mean by "save the dictionary in a text file"? What file format do you want to use? What are the keys in the dictionary?
This code saves the dictionary as a json format file. For keys I use the filename of the input file.
Thank you very much :) It's exactly what I needed to make my program work. You saved my day :)
Posts: 1,088
Threads: 143
Joined: Jul 2017
Figured it out!
I did not have *Nset or *Text
When you read the lines like this:
with open(path2text + filename, "r") as file:
    for line in file:
        #print(line)
        if line.startswith("*Nset"):
            break
you have one of those one-time-use things, like csv.reader(): use it, then lose it. (I don't know exactly why that happens; maybe someone could explain?)
Because *Nset was never found, the first loop read the whole file, and the file object was used up.
The next loop had nothing left to read.
This reduced function found all the numbers:
def get_numberes_from_file(filename):
    numbers = []
    with open(path2text + filename, "r") as file:
        for line in file:
            print(line)
            if line.startswith("*Text"):
                break
            numbers += map(int, re.findall(integer_pattern, line))
    print(numbers)
    return numbers
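On the question of why the second loop had nothing to read: a file object is its own iterator and keeps a single read position, so a second for loop continues where the first one stopped instead of starting over. Once the first loop reaches the end of the file, nothing is left. A small sketch, with io.StringIO standing in for a real file and made-up contents:

```python
import io

file = io.StringIO("Text\n1, 2, 3\n4, 5\n")   # no *Nset line, as in the failing test file

for line in file:
    if line.startswith("*Nset"):
        break                                  # never reached, so the loop runs to the end

leftover = list(file)                          # what a second loop would see
print(leftover)                                # → []

file.seek(0)                                   # move the read position back to the start
print(len(list(file)))                         # → 3
```

When the *Nset line is present, the break leaves the read position just after that line, which is why the two-loop version picks up the numbers as intended.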
Posts: 6,778
Threads: 20
Joined: Feb 2020
I don't know what you mean by
Quote:you have one of those 1 time use things, like csv.reader(), use it then lose it
Are you talking about the context manager?
with open(filename, "r") as file:
If so, you can read about context managers online.
https://book.pythontips.com/en/latest/co...agers.html
Essentially this:
with open(filename, "r") as file:
    # do stuff with file
is the same as
file = open(filename, "r")
try:
    # do stuff with file
finally:
    file.close()
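A small sketch of that equivalence, using a temporary file with made-up contents: in both forms the file ends up closed, even when the body raises an exception.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")   # made-up file for the demo
with open(path, "w") as file:
    file.write("1 2 3\n")

# Form 1: the with statement closes the file when the block is left
try:
    with open(path, "r") as managed:
        raise ValueError("something went wrong")
except ValueError:
    pass
print(managed.closed)

# Form 2: try/finally does the same job by hand
manual = open(path, "r")
try:
    try:
        raise ValueError("something went wrong")
    finally:
        manual.close()
except ValueError:
    pass
print(manual.closed)
```

Neither form rewinds the file between two loops inside the block; the read position only moves forward unless seek() is called.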