Python for syntax conversion

**Gribouillis** · (This post was last modified: Dec-23-2019, 05:13 PM by Gribouillis.)

The __repr__ method gives a nice representation of the object when you display it, for example

>>> class Statement:
...     def __init__(self, data):
...         self.data = data
...     def __repr__(self):
...         return "{}({})".format(self.__class__.__name__, self.data)
... 
>>> class Storey(Statement):
...     pass
... 
>>> element = Storey(('STOREY', 'foo', 'BAR', 'baz'))
>>> 
>>> element
Storey(('STOREY', 'foo', 'BAR', 'baz'))

            

        
      

If I don't include __repr__ in the class definition, the element is displayed like this

>>> class Statement:
...     def __init__(self, data):
...         self.data = data
... 
>>> class Storey(Statement):
...     pass
... 
>>> element = Storey(('STOREY', 'foo', 'BAR', 'baz'))
>>> element
<__main__.Storey object at 0x7ff163ad3898>

Using classes for the various statements will help you in the second step, when you transform these statements into the second language, and in intermediary steps if you want to manipulate the data easily. To start with, you can simply define a subclass of Statement for each kind of statement that you meet in the input file. For example, if there are statements for doors, then use a class

class Door(Statement):
    pass

These classes are empty for now but you will be free to add features to them later.
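As a rough sketch of how such subclasses could be put to use later, the snippet below picks the class from the first word of a statement. The `STATEMENT_CLASSES` table, the `make_statement` helper and the `'DOOR'` and `'WINDOW'` keywords are hypothetical names introduced only for illustration; `Statement`, `Storey` and `Door` come from the posts above.

```python
class Statement:
    def __init__(self, data):
        self.data = data

    def __repr__(self):
        return "{}({})".format(self.__class__.__name__, self.data)


class Storey(Statement):
    pass


class Door(Statement):
    pass


# Hypothetical lookup table: first word of a statement -> class to use.
STATEMENT_CLASSES = {'STOREY': Storey, 'DOOR': Door}


def make_statement(data):
    """Wrap a tuple of words in the matching subclass (Statement if unknown)."""
    cls = STATEMENT_CLASSES.get(data[0], Statement)
    return cls(data)


print(make_statement(('STOREY', 'foo', 'BAR', 'baz')))
print(make_statement(('WINDOW', 'w1')))  # unknown keyword falls back to Statement
```

Keeping the mapping in one dictionary means that supporting a new kind of statement only requires a new empty subclass and one new entry.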

It would be a good idea to include an example of a typical input file if you can do that.

kingsman · (This post was last modified: Dec-24-2019, 11:55 AM by kingsman.)

(Dec-23-2019, 05:13 PM)Gribouillis Wrote: These classes are empty for now but you will be free to add features to them later.

It would be a good idea to include an example of a typical input file if you can do that.

There is an obstacle again. I would like to deal with the materials first.
Here are the material statements from the software.

TABLE: "MATERIAL PROPERTIES 01 - GENERAL"
Material=4000Psi Type=Concrete SymType=Isotropic TempDepend=No Color=Magenta Notes="Customary f'c 4000 psi 23/12/2019 2:17:43 pm"
Material=A615Gr60 Type=Rebar SymType=Uniaxial TempDepend=No Color=White Notes="ASTM A615 Grade 60 23/12/2019 2:18:28 pm"
Material=A992Fy50 Type=Steel SymType=Isotropic TempDepend=No Color=Red Notes="ASTM A992 Grade 50 23/12/2019 2:17:43 pm"
Material=C30 Type=Concrete SymType=Isotropic TempDepend=No Color=Blue Notes="Concrete added 23/12/2019 2:18:37 pm"
Material=C45 Type=Concrete SymType=Isotropic TempDepend=No Color=Blue Notes="Concrete added 23/12/2019 2:20:37 pm"
Material=C60 Type=Concrete SymType=Isotropic TempDepend=No Color=Blue Notes="Concrete added 23/12/2019 2:21:13 pm"

TABLE: "MATERIAL PROPERTIES 02 - BASIC MECHANICAL PROPERTIES"
Material=4000Psi UnitWeight=2.40276966513304E-06 UnitMass=2.45014307299925E-10 E1=2534.56354148831 G12=1056.0681422868 U12=0.2 A1=0.0000099
Material=A615Gr60 UnitWeight=7.84904757236607E-06 UnitMass=8.0038007068661E-10 E1=20389.0191580383 A1=0.0000117
Material=A992Fy50 UnitWeight=7.84904757236607E-06 UnitMass=8.0038007068661E-10 E1=20389.0191580383 G12=7841.93044539935 U12=0.3 A1=0.0000117
Material=C30 UnitWeight=2.49830467094493E-06 UnitMass=2.54756172606745E-10 E1=2263.76994673377 G12=943.237477805739 U12=0.2 A1=0.0000099
Material=C45 UnitWeight=2.49830467094493E-06 UnitMass=2.54756172606745E-10 E1=2692.05074746719 G12=1121.68781144466 U12=0.2 A1=0.0000099
Material=C60 UnitWeight=2.49830467094493E-06 UnitMass=2.54756172606745E-10 E1=3059.14857666726 G12=1274.64524027803 U12=0.2 A1=0.0000099

The first three materials are defaults built into the software, so I do not want to extract their information. Also, each material is split across two tables, so I would like to deal with 'material properties 01' first.

SAP = open('Frame SAP.$2k', 'r')
 
#Using the 'class' for the material statement 01
class SAP_Material_Statement01:
    def __init__(self, Name, Type, Symtype, Tempdepend, Colour, Notes):
        self.Name = Name
        self.Type = Type
        self.Symtype= Symtype
        self.Tempdepend = Tempdepend
        self.Colour = Colour
        self.Notes = Notes
 
    def __repr__(self):
        return '{}(   Material={}   Type={}   SymType={}   TempDepend={}   Color={}   Notes="{}")'.format(self.__class__.__name__, self.Name, self.Type, self.Symtype, self.Tempdepend, self.Colour, self.Notes)
 
#Try to apply the things in it and see whether it is same as the statement inside the file
test = SAP_Material_Statement01('C30', 'Concrete', 'Isotropic', 'No', 'Blue', 'Concrete added 23/12/2019 2:18:37 pm')
#print(test)
 
#empty class (In fact, I dont know how to use it)
class material_(SAP_Material_Statement01):
    pass
 
#Since there are many other kinds of statements, so I would like to extract the things in material prop 01
SAP_material_statement_01 = []
for line in SAP:
    split_01 = line.split('=')     #From the above statement, it start with Material=C30 with 3 space bar infront of it
    #print(split_01)               #Thus, I should split the '='
    split_02 = line.split('   ')   #Since I am dealing with material prop 01, the TempDepend is always =No in the majority cases
    #print(split_02)               #Thus, I split the '   ' and use TempDepend=No for an indication to let me extract the things only in material prop 01
    if split_01[0] == '   Material' and split_02[4] == 'TempDepend=No':
        SAP_material_statement_01.append(line)
print(SAP_material_statement_01)
for line in SAP_material_statement_01:
    print(line.split())

            

        
      

I am stuck at this step. I need everything that comes after each '=' (e.g. C30, Concrete, Isotropic).
However, I do not know what to do next.

**Gribouillis** · Dec-24-2019, 01:03 PM

Don't give too much structure to the classes at first. The priority is to parse the file. You will only add the necessary code to the classes when you want to actually do something with the data. Here the simple key=value form of the input makes it easy to parse with regular expressions. See this example

import io
import re
 
SAP = io.StringIO('''\
Material=4000Psi Type=Concrete SymType=Isotropic TempDepend=No Color=Magenta Notes="Customary f'c 4000 psi 23/12/2019 2:17:43 pm"
Material=A615Gr60 Type=Rebar SymType=Uniaxial TempDepend=No Color=White Notes="ASTM A615 Grade 60 23/12/2019 2:18:28 pm"
Material=A992Fy50 Type=Steel SymType=Isotropic TempDepend=No Color=Red Notes="ASTM A992 Grade 50 23/12/2019 2:17:43 pm"
Material=C30 Type=Concrete SymType=Isotropic TempDepend=No Color=Blue Notes="Concrete added 23/12/2019 2:18:37 pm"
Material=C45 Type=Concrete SymType=Isotropic TempDepend=No Color=Blue Notes="Concrete added 23/12/2019 2:20:37 pm"
Material=C60 Type=Concrete SymType=Isotropic TempDepend=No Color=Blue Notes="Concrete added 23/12/2019 2:21:13 pm"
''')
 
class Statement:
    def __init__(self, data):
        self.data = data
    def __repr__(self):
        return "{}({})".format(self.__class__.__name__, self.data)
 
class SAP_Material_01(Statement):
    pass
 
parsed_file = []
 
for line in SAP:
    L = re.split(r'(\w+)[=]', line.strip())
    assert L[0] == ''
    pairs = {}
    for i in range(1, len(L), 2):
        pairs[L[i]] = L[i+1].strip()
    item = SAP_Material_01(pairs)
    parsed_file.append(item)
     
for item in parsed_file:
    print(item)

            

        
      

Output:
SAP_Material_01({'TempDepend': 'No', 'Color': 'Magenta', 'SymType': 'Isotropic', 'Material': '4000Psi', 'Type': 'Concrete', 'Notes': '"Customary f\'c 4000 psi 23/12/2019 2:17:43 pm"'})
SAP_Material_01({'TempDepend': 'No', 'Color': 'White', 'SymType': 'Uniaxial', 'Material': 'A615Gr60', 'Type': 'Rebar', 'Notes': '"ASTM A615 Grade 60 23/12/2019 2:18:28 pm"'})
SAP_Material_01({'TempDepend': 'No', 'Color': 'Red', 'SymType': 'Isotropic', 'Material': 'A992Fy50', 'Type': 'Steel', 'Notes': '"ASTM A992 Grade 50 23/12/2019 2:17:43 pm"'})
SAP_Material_01({'TempDepend': 'No', 'Color': 'Blue', 'SymType': 'Isotropic', 'Material': 'C30', 'Type': 'Concrete', 'Notes': '"Concrete added 23/12/2019 2:18:37 pm"'})
SAP_Material_01({'TempDepend': 'No', 'Color': 'Blue', 'SymType': 'Isotropic', 'Material': 'C45', 'Type': 'Concrete', 'Notes': '"Concrete added 23/12/2019 2:20:37 pm"'})
SAP_Material_01({'TempDepend': 'No', 'Color': 'Blue', 'SymType': 'Isotropic', 'Material': 'C60', 'Type': 'Concrete', 'Notes': '"Concrete added 23/12/2019 2:21:13 pm"'})
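To see why the loop steps through `L` two items at a time, it may help to look at what `re.split` returns when the pattern contains a capturing group. The snippet below is only an illustration on a shortened row; the names `KEY`, `row` and `title` are introduced here and are not part of the code above.

```python
import re

KEY = r'(\w+)[=]'   # same pattern as in the post above

row = 'Material=C30 Type=Concrete Notes="Concrete added 23/12/2019 2:18:37 pm"'
parts = re.split(KEY, row)
# parts[0] is the (empty) text before the first key;
# after it, captured keys and the text that follows them alternate.
print(parts)

pairs = {parts[i]: parts[i + 1].strip() for i in range(1, len(parts), 2)}
print(pairs)

# A line without any key=value pair comes back unsplit, so its first item is not ''.
title = 'TABLE: "MATERIAL PROPERTIES 01 - GENERAL"'
print(re.split(KEY, title))
```

One caveat with this approach: a quoted value that itself contains something like `word=` would be split too, so it is worth checking the Notes fields of real files.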

kingsman · (This post was last modified: Dec-26-2019, 12:48 PM by kingsman.)

(Dec-24-2019, 01:03 PM)Gribouillis Wrote:

import io
import re

SAP = io.StringIO('''\
Material=4000Psi Type=Concrete SymType=Isotropic TempDepend=No Color=Magenta Notes="Customary f'c 4000 psi 23/12/2019 2:17:43 pm"
Material=A615Gr60 Type=Rebar SymType=Uniaxial TempDepend=No Color=White Notes="ASTM A615 Grade 60 23/12/2019 2:18:28 pm"
Material=A992Fy50 Type=Steel SymType=Isotropic TempDepend=No Color=Red Notes="ASTM A992 Grade 50 23/12/2019 2:17:43 pm"
Material=C30 Type=Concrete SymType=Isotropic TempDepend=No Color=Blue Notes="Concrete added 23/12/2019 2:18:37 pm"
Material=C45 Type=Concrete SymType=Isotropic TempDepend=No Color=Blue Notes="Concrete added 23/12/2019 2:20:37 pm"
Material=C60 Type=Concrete SymType=Isotropic TempDepend=No Color=Blue Notes="Concrete added 23/12/2019 2:21:13 pm"
''')

class Statement:
    def __init__(self, data):
        self.data = data
    def __repr__(self):
        return "{}({})".format(self.__class__.__name__, self.data)

class SAP_Material_01(Statement):
    pass

parsed_file = []

for line in SAP:
    L = re.split(r'(\w+)[=]', line.strip())
    assert L[0] == ''
    pairs = {}
    for i in range(1, len(L), 2):
        pairs[L[i]] = L[i+1].strip()
    item = SAP_Material_01(pairs)
    parsed_file.append(item)

for item in parsed_file:
    print(item)

I get what the code is doing here. However, there is a lot of other information in the file.
L[0] is not ''. L[0] is the whole title of each table, so I can't get the data inside it.
Maybe I can show you the whole file so that we can discuss it more clearly.
https://textuploader.com/1oyau

**Gribouillis** · (This post was last modified: Dec-26-2019, 08:26 PM by Gribouillis.)

As @buran said before, the input language is very structured. It is a list of tables whose rows consist of key/value pairs. Apparently, a table row that ends with an underscore _ is continued on the next line.
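To illustrate the continuation rule on its own, here is a minimal sketch that merges physical lines into logical rows. It assumes that a trailing underscore is the only continuation marker; the helper name `join_continued` and the sample lines are made up for the example and do not come from the real file.

```python
def join_continued(lines):
    """Yield logical rows, merging physical lines that end with '_'."""
    pending = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.endswith('_'):
            # Drop the marker and keep collecting the rest of the row.
            pending.append(line[:-1].rstrip())
        else:
            pending.append(line)
            yield ' '.join(pending)
            pending = []
    if pending:
        # Input ended in the middle of a continued row; emit what we have.
        yield ' '.join(pending)


# Made-up sample: the first row is spread over two physical lines.
sample = [
    'Material=C30 Type=Concrete _',
    '   SymType=Isotropic TempDepend=No',
    'Material=C45 Type=Concrete',
]
rows = list(join_continued(sample))
print(rows)
```

A value that legitimately ends with an underscore would be misread as a continuation, so this assumption should be checked against real input files.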

I wrote the code below, which transforms the input file into a Python script containing simple structured data: a list of items, each of which represents a table. The items are pairs of a table name and a list of rows. Each row is a list of pairs of Python strings representing a key and a value.

Here is the code that does this transformation. It is short but not yet well documented. I suggest that you try it with the input files that you have to see if it works. Its name is base_parsing.py. I execute it in a terminal with the command

Output:
python3 base_parsing.py PATH_TO_THE_INPUT_FILE

Currently it prints its output to the console, but you can redirect it to a file foo.py (I don't know how you do that in Windows).

from collections import namedtuple
from pprint import pformat
import re
import sys
 
TableRecord = namedtuple('TableRecord', ('table_name', 'rows'))
 
def table_data_lines(infile):
    for lineno, line in enumerate(infile, 1):
        if line.startswith('END TABLE DATA'):
            break
        line = line.strip()
        if line:
            yield lineno, line
 
def read_table(table_name, sequence, parsed_file):
    name = None
    rows = []
    last_row = []
    for lineno, line in sequence:
        if line.startswith('TABLE:'):
            name = line[6:].strip().strip('"')
            break
        continued = line.endswith('_')
        if continued:
            line = line[:-1].rstrip()
        L = re.split(r'(\w+)[=]', line)
        assert L[0] == ''
        last_row.extend((L[i], L[i+1].strip()) for i in range(1, len(L), 2))
        if not continued:
            rows.append(last_row)
            last_row = []
    parsed_file.append(TableRecord(table_name, rows))
    return name
 
def parse_file(infile):
    parsed_file = []
    sequence = table_data_lines(infile)
    try:
        lineno, line = next(sequence)
    except StopIteration:
        return parsed_file
    assert line.startswith('TABLE:')
    table_name = line[6:].strip().strip('"')
    while table_name:
        table_name = read_table(table_name, sequence, parsed_file)
    return parsed_file
 
def create_python_script(parsed_file, outfile=sys.stdout):
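    import builtins
    from functools import partial
    # Use builtins.print: __builtins__ may be a dict (not a module) when this file is imported
    print = partial(builtins.print, file=outfile)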
    print("from collections import namedtuple")
    print("\nTableRecord = namedtuple('TableRecord', ('table_name', 'rows'))")
    print('\nparsed_file = [')
    for tr in parsed_file:
        print('    TableRecord(table_name={}, rows=['.format(
                                            repr(tr.table_name)))
        for row in tr.rows:
            print('{},'.format(pformat(row)))
        print(']), # end of TableRecord')
    print('] # end of parsed_file')
     
 
def main():
    if len(sys.argv) != 2:
        print('Error: Usage: program filename.$2k')
        sys.exit(-1)
 
    with open(sys.argv[1]) as infile:
        parsed_file = parse_file(infile)
     
    create_python_script(parsed_file)
     
if __name__ == '__main__':
    main()

            

        
      

The output is a Python module that contains the data and can be imported directly. That way, you can transform your data file into intermediary Python files, which can be a huge step towards the translation.

The result looks like this

from collections import namedtuple
 
TableRecord = namedtuple('TableRecord', ('table_name', 'rows'))
 
parsed_file = [
    TableRecord(table_name='ACTIVE DEGREES OF FREEDOM', rows=[
[('UX', 'Yes'),
 ('UY', 'Yes'),
 ('UZ', 'Yes'),
 ('RX', 'Yes'),
 ('RY', 'Yes'),
 ('RZ', 'Yes')],
]), # end of TableRecord
    TableRecord(table_name='ANALYSIS OPTIONS', rows=[
[('Solver', 'Advanced'),
 ('SolverProc', 'Auto'),
 ('Force32Bit', 'No'),
 ('StiffCase', 'None'),
 ('GeomMod', 'No')],
]), # end of TableRecord
    TableRecord(table_name='AUTO WAVE 3 - WAVE CHARACTERISTICS - GENERAL', rows=[
[('WaveChar', 'Default'),
 ('WaveType', '"From Theory"'),
 ('KinFactor', '1'),
 ('SWaterDepth', '45000'),
 ('WaveHeight', '18000'),
 ('WavePeriod', '12'),
 ('WaveTheory', 'Linear')],
]), # end of TableRecord
...
...
...
[('RebarID', 'N24'),
 ('Area', '452.00001881498'),
 ('Diameter', '24.0000003604438')],
[('RebarID', 'N28'),
 ('Area', '616.000025641654'),
 ('Diameter', '28.0000004205178')],
[('RebarID', 'N32'),
 ('Area', '804.000033467353'),
 ('Diameter', '32.0000004805918')],
[('RebarID', 'N36'),
 ('Area', '1020.00004245858'),
 ('Diameter', '36.0000005406658')],
]), # end of TableRecord
] # end of parsed_file

            

        
      

If you save the result in a file foo.py, you can then import the data directly, like this

from foo import TableRecord, parsed_file

and you can then start working on producing the translated file.
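As one possible starting point, the sketch below indexes the tables by name and turns each row into a dictionary. To stay self-contained it defines a two-table excerpt inline instead of importing from foo.py. It assumes that table names are unique and that no key is repeated within a row; the names `tables`, `options` and `active` are introduced only for this example.

```python
from collections import namedtuple

TableRecord = namedtuple('TableRecord', ('table_name', 'rows'))

# Small excerpt in the same shape as the generated module.
parsed_file = [
    TableRecord(table_name='ACTIVE DEGREES OF FREEDOM', rows=[
        [('UX', 'Yes'), ('UY', 'Yes'), ('UZ', 'Yes'),
         ('RX', 'Yes'), ('RY', 'Yes'), ('RZ', 'Yes')],
    ]),
    TableRecord(table_name='ANALYSIS OPTIONS', rows=[
        [('Solver', 'Advanced'), ('SolverProc', 'Auto'),
         ('Force32Bit', 'No'), ('StiffCase', 'None'), ('GeomMod', 'No')],
    ]),
]

# Index the tables by name and turn every row into a dict.
# Assumption: names are unique and keys are not repeated inside a row.
tables = {tr.table_name: [dict(row) for row in tr.rows] for tr in parsed_file}

options = tables['ANALYSIS OPTIONS'][0]
print(options['Solver'])

# Keys of the first row whose value is 'Yes'.
active = [key for key, value in tables['ACTIVE DEGREES OF FREEDOM'][0].items()
          if value == 'Yes']
print(active)
```

All values are still strings at this stage; converting numbers with float() is best done per table, once you know which fields are numeric.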

kingsman · Dec-27-2019, 01:07 PM

(Dec-26-2019, 05:50 PM)Gribouillis Wrote:

from collections import namedtuple
from pprint import pformat
import re
import sys

TableRecord = namedtuple('TableRecord', ('table_name', 'rows'))

def table_data_lines(infile):
    for lineno, line in enumerate(infile, 1):
        if line.startswith('END TABLE DATA'):
            break
        line = line.strip()
        if line:
            yield lineno, line

def read_table(table_name, sequence, parsed_file):
    name = None
    rows = []
    last_row = []
    for lineno, line in sequence:
        if line.startswith('TABLE:'):
            name = line[6:].strip().strip('"')
            break
        continued = line.endswith('_')
        if continued:
            line = line[:-1].rstrip()
        L = re.split(r'(\w+)[=]', line)
        assert L[0] == ''
        last_row.extend((L[i], L[i+1].strip()) for i in range(1, len(L), 2))
        if not continued:
            rows.append(last_row)
            last_row = []
    parsed_file.append(TableRecord(table_name, rows))
    return name

I get what table_data_lines is doing, but I don't understand read_table.
What are the parameters (table_name, sequence, parsed_file)?

**Gribouillis** · (This post was last modified: Dec-27-2019, 01:47 PM by Gribouillis.)

kingsman Wrote:I get what the table_data_lines is doing but I don't know the things in read_table.
what is the parameter inside (table_name, sequence, parsed_file)?

Well, 'table_name' is the name of the table to read. It is known because this function is called immediately after the parser has read a line starting with TABLE:. The 'sequence' argument is the sequence produced by table_data_lines(), that is to say a sequence of pairs (lineno, line) read from the file. When the sequence is passed to read_table(), the lines that come next in the sequence are the rows of the table. The argument 'parsed_file' is the list to which we add the TableRecords that we produce. This list is created in the function parse_file().

The function read_table() reads lines from the file and creates a TableRecord whose rows are extracted from these lines. It stops when it meets a line starting with TABLE:, which indicates the beginning of a new table. In that case, it returns the name of the next table. If the input ends first, it returns None, which stops the loop in parse_file().
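The key point is that the same generator object is passed to every call, so each call resumes exactly where the previous one stopped. The following stripped-down sketch shows that mechanism on a toy list of lines; `numbered_lines` and `read_until_table` are simplified stand-ins written for this illustration, not the functions from base_parsing.py.

```python
def numbered_lines(lines):
    """Generator of (lineno, line) pairs, like table_data_lines() but on a list."""
    for lineno, line in enumerate(lines, 1):
        yield lineno, line


def read_until_table(sequence):
    """Collect lines until the next 'TABLE:' line; return (rows, next_name)."""
    rows = []
    for lineno, line in sequence:
        if line.startswith('TABLE:'):
            return rows, line[6:].strip().strip('"')
        rows.append(line)
    return rows, None  # sequence exhausted: there is no further table


sequence = numbered_lines([
    'TABLE: "A"', 'x=1', 'x=2',
    'TABLE: "B"', 'y=3',
])

lineno, first = next(sequence)          # consumes the first TABLE: line
name = first[6:].strip().strip('"')
tables = []
while name:
    # Each call continues from where the previous call left the generator.
    rows, next_name = read_until_table(sequence)
    tables.append((name, rows))
    name = next_name
print(tables)
```

Because the TABLE: line that ends one table has already been consumed, its name has to be handed back to the caller, which is why the function returns it.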

I ignored the line File C:... at the beginning of the file that you linked. Is this line a part of the file? In that case, we may need to modify parse_file() in order to ignore all the lines that come before the first TABLE:... line.

kingsman · Dec-27-2019, 02:05 PM

(Dec-27-2019, 01:47 PM)Gribouillis Wrote: The function read_table() read lines in the file and create a TableRecord which rows are extracted from these lines. It stops when it meets a line starting with TABLE:, which indicates the beginning of a new table. In that case, it returns the name of the next table.

I ignored the line File C:... at the beginning of the file that you linked. Is this line a part of the file? In that case, we may need to modify parse_file() in order to ignore all the lines that come before the first TABLE:... line.

Yes, the File C:... line at the beginning is part of the file.
I am a Python beginner, still absorbing the language.
This is my final year project. Why is it so difficult?

**Gribouillis** · (This post was last modified: Dec-27-2019, 02:15 PM by Gribouillis.)

Here is the modified parse_file() that skips the lines before the first TABLE: line

def parse_file(infile):
    parsed_file = []
    sequence = table_data_lines(infile)
    # find first line starting with TABLE:
    for lineno, line in sequence:
        if line.startswith('TABLE:'):
            break
    else:
        return parsed_file
    table_name = line[6:].strip().strip('"')
    while table_name:
        table_name = read_table(table_name, sequence, parsed_file)
    return parsed_file

            

        
      

It is important that you try to parse several input files in order to discover potential issues that we haven't seen yet in the parsing phase.
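One way to look for such issues before running the full parser is to list the lines that do not fit the expected layout. The sketch below is a hypothetical helper (`find_unparsable_lines` is not part of base_parsing.py) that reports every line which is neither blank, a TABLE: header, the END TABLE DATA marker, nor a row starting with key=value. The sample lines are made up.

```python
import re

KEY = re.compile(r'(\w+)[=]')


def find_unparsable_lines(lines):
    """Return (lineno, text) pairs for lines that do not fit the table layout."""
    problems = []
    for lineno, line in enumerate(lines, 1):
        text = line.strip()
        if not text or text.startswith('TABLE:') or text.startswith('END TABLE DATA'):
            continue
        # A well-formed row starts with a key, so the first split item is ''.
        if KEY.split(text)[0] != '':
            problems.append((lineno, text))
    return problems


# Made-up sample input; a real check would pass an open file object instead.
sample = [
    'header text that is not a table row',
    'TABLE: "ANALYSIS OPTIONS"',
    '   Solver=Advanced   SolverProc=Auto',
    'END TABLE DATA',
]
print(find_unparsable_lines(sample))
```

Since the function only iterates over its argument, it can be called on a file opened with `open(path)` for each input file in turn, and an empty result suggests that the assert in read_table() should not fail on that file.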

kingsman · Dec-27-2019, 03:49 PM

I have done something similar to yours and extracted the information as you said before. Is this also okay?

import re
SAP = open('Frame SAP.$2k', 'r')
 
class Statement:
    def __init__(self, data):
        self.data = data
    def __repr__(self):
        return "{}({})".format(self.__class__.__name__, self.data)
 
class SAP_Material_01 (Statement):
    pass
 
parsed_file = []
material_data_01 = []
 
for line in SAP:
    e = re.compile(r'\s{3}Material=\w+\s{3}Type=\w+\s{3}')
    d = re.match(e, line)
    if d != None:
        parsed_file.append(line)
 
for line in parsed_file:
    L = re.split(r'(\w+)[=]', line.strip())
    assert L[0] == ''
    pairs = {}
    for i in range(1, len(L), 2):
        pairs[L[i]] = L[i + 1].strip()
    item = SAP_Material_01(pairs)
    material_data_01.append(item)
 
for item in material_data_01:
    print(item)

            

        
      

Output:
SAP_Material_01({'Material': '4000Psi', 'Type': 'Concrete', 'SymType': 'Isotropic', 'TempDepend': 'No', 'Color': 'Magenta', 'Notes': '"Customary f\'c 4000 psi 23/12/2019 2:17:43 pm"'})
SAP_Material_01({'Material': 'A615Gr60', 'Type': 'Rebar', 'SymType': 'Uniaxial', 'TempDepend': 'No', 'Color': 'White', 'Notes': '"ASTM A615 Grade 60 23/12/2019 2:18:28 pm"'})
SAP_Material_01({'Material': 'A992Fy50', 'Type': 'Steel', 'SymType': 'Isotropic', 'TempDepend': 'No', 'Color': 'Red', 'Notes': '"ASTM A992 Grade 50 23/12/2019 2:17:43 pm"'})
SAP_Material_01({'Material': 'C30', 'Type': 'Concrete', 'SymType': 'Isotropic', 'TempDepend': 'No', 'Color': 'Blue', 'Notes': '"Concrete added 23/12/2019 2:18:37 pm"'})
SAP_Material_01({'Material': 'C45', 'Type': 'Concrete', 'SymType': 'Isotropic', 'TempDepend': 'No', 'Color': 'Blue', 'Notes': '"Concrete added 23/12/2019 2:20:37 pm"'})
SAP_Material_01({'Material': 'C60', 'Type': 'Concrete', 'SymType': 'Isotropic', 'TempDepend': 'No', 'Color': 'Blue', 'Notes': '"Concrete added 23/12/2019 2:21:13 pm"'})
