Preprocessing an OBJ file, what is the best way to?

KeithSloan · Mar-01-2024, 11:38 AM

I would like to preprocess some very large OBJ files; see https://en.wikipedia.org/wiki/Wavefront_.obj_file.

In Python, I would like to preprocess the file before using a C++ routine to process the vertex and face definitions.

Specifically, I would like to create a Python dictionary or array of the following information:

materials : usemtl {material} rest of line
objects : o {name} rest of line
groups : g {name} rest of line

examples

usemtl copper
o fredObj
g fredGroup

In the case of objects and groups, I also want the associated material.

I tried something like

def processOBJ(doc, filename):
    import re
    print("import OBJ as GDML Tessellated")     
    fp = pythonopen(filename)
    data = fp.read()
    materials = re.findall('usemtl',data)
    print(f"Materials {materials}")
    objects = re.findall('o',data)
    print(f"objects {objects}")
    groups = re.findall('g',data)
    print(f"groups {groups}")
    return

This finds things okay, but I am not sure how I could access the rest of the line.

I could loop through the file processing every line, but these are potentially very large files and I think there must be a better way.

**Gribouillis** · Mar-01-2024, 11:55 AM

Have you tried a search on PyPI? The first link is a Wavefront OBJ file parser that transforms a file into a Python dictionary.

Warning: install unknown PyPI packages with caution.

KeithSloan · Mar-01-2024, 03:23 PM

(Mar-01-2024, 11:38 AM)KeithSloan Wrote: I would like to preprocess some very large OBJ file see https://en.wikipedia.org/wiki/Wavefront_.obj_file.

KeithSloan · Mar-01-2024, 03:36 PM

Probably did not explain very well.

I can call C++ code to perform most of the action I need but it does not handle materials, so I wish to quickly and efficiently scan the file for the material options and then process those after I have called the C++ code.

I do not want to use Python to process the Wavefront OBJ file.

**Gribouillis** · Mar-01-2024, 04:55 PM

Could you clarify the input that you have and the output that you want? I think it is very difficult to understand. An example Obj file would help, together with the expected output.

KeithSloan · Mar-02-2024, 08:14 AM

Thanks for trying to help. The OBJ file follows the format described at https://en.wikipedia.org/wiki/Wavefront_.obj_file

My initial test file is 98.7 MB, but I would like to be able to process even larger files.

The file has a few lines of comments then a

mtllib {filename}

That is a reference to a materials definition file.

Then a large number of vertex and face definitions

v 6.806145 3.287188 34.507591
v 6.889372 3.314378 34.482964
...... repeat large number of times with different values
g groupName
usemtl groupMaterial
f 1 2 3
f 4 5 6
f 4 7 8
f 5 4 8
f 8 7 9
...... repeat a large number of times with different values
v -5.852165 2.924480 33.700558
v -5.861451 2.937897 33.763584
v -5.978782 2.973776 33.743969
v -5.635860 2.871064 33.685860
v -5.709496 2.899417 33.751705
v -5.442479 2.838525 33.676365
v -5.522288 2.871565 33.741238
v -5.261642 2.836869 33.678814
v -5.349662 2.853023 33.73287
.... repeat a large number with different values

g nextGroupName
usemtl nextGroupMaterial

... more v & f definitions

In total there are 173 different groups and material definitions.

I can call a Python function that wraps a C++ function to process the file. It creates the 173 objects but does nothing
with the materials, so I wish to preprocess the file to get a list/dictionary of objects and their materials.
After loading, I would then process the 173 objects and assign the associated material to each.

It would also be useful to get a list of the materials during the preprocessing step, so that I could offer a mapping to
a different set of materials.

**Gribouillis** · (This post was last modified: Mar-02-2024, 09:00 AM by Gribouillis.)

You could extract all the relevant lines with one pattern

import io
import re

file = io.StringIO('''\
v 6.806145 3.287188 34.507591
v 6.889372 3.314378 34.482964
...... repeat large number of times with different values
g groupName
usemtl groupMaterial
f 1 2 3
f 4 5 6
f 4 7 8
f 5 4 8
f 8 7 9
...... repeat a large number of times with different values
v -5.852165 2.924480 33.700558
v -5.861451 2.937897 33.763584
v -5.978782 2.973776 33.743969
v -5.635860 2.871064 33.685860
v -5.709496 2.899417 33.751705
v -5.442479 2.838525 33.676365
v -5.522288 2.871565 33.741238
v -5.261642 2.836869 33.678814
v -5.349662 2.853023 33.73287
.... repeat a large number with different values

g nextGroupName
usemtl nextGroupMaterial

... more v & f definitions''')

pattern = re.compile(r"^(?:[og]|usemtl)\s.*", re.MULTILINE)

print(pattern.findall(file.read()))

Output:
['g groupName', 'usemtl groupMaterial', 'g nextGroupName', 'usemtl nextGroupMaterial']
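
If the keyword and the rest of the line are wanted separately, capture groups are one option. This is only a minimal sketch: the sample text is made up for illustration, and it assumes the keyword starts at the beginning of the line and is followed by spaces or tabs.

```python
import re

# Hypothetical sample text; a real file would be read from disk instead.
sample = """\
v 6.806145 3.287188 34.507591
g groupName
usemtl groupMaterial
f 1 2 3
o fredObj
g nextGroupName
usemtl nextGroupMaterial
"""

# Group 1 is the keyword, group 2 is the rest of the line.
# [ \t]+ is used instead of \s+ so the separator can never swallow a newline.
pattern = re.compile(r"^(o|g|usemtl)[ \t]+(.*)$", re.MULTILINE)

# strip() removes trailing spaces and a stray '\r' from Windows line endings.
pairs = [(m.group(1), m.group(2).strip()) for m in pattern.finditer(sample)]
print(pairs)
```

Each element of pairs is a (keyword, name) tuple in file order, so the name after usemtl, o or g is available without any further splitting.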

You could then split the result with more_itertools.bucket()

file.seek(0)  # rewind in case the file was already read by the snippet above
lines = pattern.findall(file.read())

from more_itertools import bucket
from operator import itemgetter

s = bucket(lines, itemgetter(0))
print(list(s['o']))
print(list(s['u']))
print(list(s['g']))

Output:
[]
['usemtl groupMaterial', 'usemtl nextGroupMaterial']
['g groupName', 'g nextGroupName']
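
To go one step further and pair each group or object with its material, a single pass over the lines may be enough. The sketch below is not taken from any library: scan_obj is a hypothetical helper name, and it assumes the layout described in this thread, where each g or o line is followed by its usemtl line before the faces. Iterating over the file object reads one line at a time, so the whole file does not have to be held in memory; whether that is faster than the regex on a 100 MB file would need timing.

```python
import io

def scan_obj(lines):
    """Return (materials, owners) from an iterable of OBJ lines.

    materials: material names in order of first appearance.
    owners: dict mapping each group/object name to its material,
            or None if no usemtl line followed it.
    """
    materials = {}   # dict used as an ordered set
    owners = {}
    pending = []     # names declared since the last usemtl line
    for line in lines:
        if line[:1] in "vf":          # vertex or face data: skip quickly
            continue
        parts = line.split(None, 1)   # keyword, rest of line
        if len(parts) < 2:
            continue
        keyword, rest = parts[0], parts[1].strip()
        if keyword in ("o", "g"):
            owners.setdefault(rest, None)
            pending.append(rest)
        elif keyword == "usemtl":
            materials.setdefault(rest, None)
            for name in pending:      # assumption: usemtl follows its g/o line
                owners[name] = rest
            pending.clear()
    return list(materials), owners

# Hypothetical sample; for a real file, pass an open file object instead.
sample = io.StringIO("""\
v 6.806145 3.287188 34.507591
g groupName
usemtl groupMaterial
f 1 2 3
v -5.852165 2.924480 33.700558

g nextGroupName
usemtl nextGroupMaterial
f 4 5 6
""")

materials, owners = scan_obj(sample)
print(materials)
print(owners)
```

The materials list could then be used to build the mapping to a different set of materials, and owners looked up by name after the C++ loader has created the objects. If a group can switch material part-way through its faces, this simple mapping would not be enough.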

KeithSloan · Mar-02-2024, 04:48 PM

(Mar-02-2024, 09:00 AM)Gribouillis Wrote: You could extract all the relevant lines with one pattern

Thanks, that is VERY helpful.
