I would like to preprocess some very large OBJ files (see
https://en.wikipedia.org/wiki/Wavefront_.obj_file).
In Python, I would like to preprocess the file before using a C++ routine to process the vertex and face definitions.
Specifically, I would like to create a Python dictionary or array of the following information:
materials : usemtl {material} rest of line
objects : o {name} rest of line
groups : g {name} rest of line
Examples:
usemtl copper
o fredObj
g fredGroup
In the case of objects and groups, I would also like the associated material.
I tried something like
def processOBJ(doc, filename):
    import re
    print("import OBJ as GDML Tessellated")
    fp = pythonopen(filename)
    data = fp.read()
    materials = re.findall('usemtl', data)
    print(f"Materials {materials}")
    objects = re.findall('o', data)
    print(f"objects {objects}")
    groups = re.findall('g', data)
    print(f"groups {groups}")
    return
This finds matches okay, but I am not sure how I could access the rest of the line.
I could loop through and process every line, but these are potentially very large files and I think there must be a better way.
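For what it's worth, a capture group in the pattern can pull out the rest of the line in the same pass over the data. A minimal sketch, using the keywords from the examples above (the sample text here is invented):

```python
import re

# Sample OBJ fragment matching the examples above.
data = """\
usemtl copper
o fredObj
g fredGroup
"""

# One pass over the whole text: group 1 is the keyword, group 2 the rest
# of the line.  re.MULTILINE makes ^ and $ match at each line boundary.
pattern = re.compile(r"^(usemtl|o|g)[ \t]+(.*)$", re.MULTILINE)
print(pattern.findall(data))
# -> [('usemtl', 'copper'), ('o', 'fredObj'), ('g', 'fredGroup')]
```

With two groups, `findall` returns a list of `(keyword, rest)` tuples, so no per-line loop over the whole file is needed in Python code.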
Have you tried a search on PyPI? The first link is a Wavefront OBJ file parser that transforms a file into a Python dictionary.
Warning: install unknown PyPI packages with caution.
I probably did not explain very well.
I can call C++ code to perform most of the actions I need, but it does not handle materials, so I wish to quickly and efficiently scan the file for the material options and then process those after I have called the C++ code.
I do not want to use Python to process the whole Wavefront OBJ file.
Could you clarify the input that you have and the output that you want? It is very difficult to understand. An example OBJ file would help, together with the expected output.
Thanks for trying to help. The OBJ file is as per
https://en.wikipedia.org/wiki/Wavefront_.obj_file
My initial test file is 98.7 MB, but I would like to be able to process even larger ones.
The file has a few lines of comments, then a
mtllib (unknown)
line, which is a reference to a materials definition file.
Then comes a large number of vertex and face definitions:
v 6.806145 3.287188 34.507591
v 6.889372 3.314378 34.482964
...... repeat large number of times with different values
g groupName
usemtl groupMaterial
f 1 2 3
f 4 5 6
f 4 7 8
f 5 4 8
f 8 7 9
...... repeat a large number of times with different values
v -5.852165 2.924480 33.700558
v -5.861451 2.937897 33.763584
v -5.978782 2.973776 33.743969
v -5.635860 2.871064 33.685860
v -5.709496 2.899417 33.751705
v -5.442479 2.838525 33.676365
v -5.522288 2.871565 33.741238
v -5.261642 2.836869 33.678814
v -5.349662 2.853023 33.73287
.... repeat a large number with different values
g nextGroupName
usemtl nextGroupMaterial
... more v & f definitions
In total there are 173 different groups and material definitions.
I can call a Python function which uses a C++ function to process the file; it creates the 173 objects but does nothing
with the materials, so I wish to preprocess the file to get a list/dictionary of objects and their materials.
After loading I would then process the 173 objects and allocate the associated material to each.
It would also be useful to get a list of the materials in the preprocess, so that I could offer a mapping to
a different set of materials.
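The pairing described above can be sketched in a few lines: scan the matched lines in file order and, assuming each usemtl line applies to the most recent g or o line before it (as in the fragment shown), build the name-to-material dictionary and the material set together. The data here is a made-up fragment in that shape:

```python
import re

# Hypothetical fragment shaped like the file described above: each group
# line is followed by the usemtl line that applies to it.
data = """\
g groupName
usemtl groupMaterial
f 1 2 3
g nextGroupName
usemtl nextGroupMaterial
"""

pattern = re.compile(r"^(o|g|usemtl)[ \t]+(.+)$", re.MULTILINE)

group_material = {}  # object/group name -> material
materials = set()    # every material seen, for a later remapping step
current = None
for keyword, rest in pattern.findall(data):
    if keyword in ("o", "g"):
        current = rest          # remember the most recent o/g name
    elif current is not None:   # keyword == "usemtl"
        group_material[current] = rest
        materials.add(rest)

print(group_material)
# -> {'groupName': 'groupMaterial', 'nextGroupName': 'nextGroupMaterial'}
print(sorted(materials))
# -> ['groupMaterial', 'nextGroupMaterial']
```

If a file ever reuses the same group name, the dictionary keeps only the last material for it; a dict of lists would be needed in that case.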
You could extract all the relevant lines with one pattern:
import io
import re
file = io.StringIO('''\
v 6.806145 3.287188 34.507591
v 6.889372 3.314378 34.482964
...... repeat large number of times with different values
g groupName
usemtl groupMaterial
f 1 2 3
f 4 5 6
f 4 7 8
f 5 4 8
f 8 7 9
...... repeat a large number of times with different values
v -5.852165 2.924480 33.700558
v -5.861451 2.937897 33.763584
v -5.978782 2.973776 33.743969
v -5.635860 2.871064 33.685860
v -5.709496 2.899417 33.751705
v -5.442479 2.838525 33.676365
v -5.522288 2.871565 33.741238
v -5.261642 2.836869 33.678814
v -5.349662 2.853023 33.73287
.... repeat a large number with different values
g nextGroupName
usemtl nextGroupMaterial
... more v & f definitions''')
pattern = re.compile(r"^(?:[og]|usemtl)\s.*", re.MULTILINE)
print(pattern.findall(file.read()))
Output:
['g groupName', 'usemtl groupMaterial', 'g nextGroupName', 'usemtl nextGroupMaterial']
You could then split the result with more_itertools.bucket():
file.seek(0)  # rewind: the read() above already consumed the stream
lines = pattern.findall(file.read())
from more_itertools import bucket
from operator import itemgetter
s = bucket(lines, itemgetter(0))
print(list(s['o']))
print(list(s['u']))
print(list(s['g']))
Output:
[]
['usemtl groupMaterial', 'usemtl nextGroupMaterial']
['g groupName', 'g nextGroupName']
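For files too big to read into memory comfortably, the same idea also works line by line: iterating a file object streams it, and a cheap first-character test rejects the v/f lines that make up most of the file before the regex runs. A sketch, under the assumption that only o, g and usemtl lines matter:

```python
import io
import re

pattern = re.compile(r"^(o|g|usemtl)\s+(.+)$")

def scan(fp):
    """Yield (keyword, rest) pairs without loading the whole file."""
    for line in fp:
        # Cheap pre-filter: v/f lines (the vast majority) are skipped here
        # without ever invoking the regex engine.
        if line[:1] in ("o", "g", "u"):
            m = pattern.match(line)
            if m:
                yield m.groups()

fp = io.StringIO("v 1 2 3\ng groupName\nusemtl copper\nf 1 2 3\n")
print(list(scan(fp)))
# -> [('g', 'groupName'), ('usemtl', 'copper')]
```

Memory use stays constant regardless of file size, at the cost of one Python-level loop over the lines.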
(Mar-02-2024, 09:00 AM)Gribouillis Wrote: You could extract all the relevant lines with one pattern [...]
Thanks that is VERY helpful.