Python Forum
Preprocessing an OBJ file, what is the best way to?
#1
I would like to preprocess some very large OBJ files (see https://en.wikipedia.org/wiki/Wavefront_.obj_file).

In Python, I would like to preprocess the file before using a C++ routine to process the vertex and face definitions.

Specifically, I would like to create a Python dictionary or array of the following information:

materials : usemtl {material} rest of line
objects : o {name} rest of line
groups : g {name} rest of line

Examples:

usemtl copper
o fredObj
g fredGroup

For objects and groups, I would also like the associated material.

I tried something like this:

def processOBJ(doc, filename):
    import re
    print("import OBJ as GDML Tessellated")     
    fp = pythonopen(filename)
    data = fp.read()
    materials = re.findall('usemtl',data)
    print(f"Materials {materials}")
    objects = re.findall('o',data)
    print(f"objects {objects}")
    groups = re.findall('g',data)
    print(f"groups {groups}")
    return
This finds things okay, but I am not sure how I could access the rest of the line.

I could loop through the file and process every line, but these are potentially very large files and I think there must be a better way.
Reply
#2
Have you tried searching PyPI? The first result is a Wavefront OBJ file parser that turns a file into a Python dictionary.

Warning: install unknown PyPI packages with caution.
« We can solve any problem by introducing an extra level of indirection »
Reply
#3
(Mar-01-2024, 11:38 AM)KeithSloan Wrote: I would like to preprocess some very large OBJ files (see https://en.wikipedia.org/wiki/Wavefront_.obj_file).

Reply
#4
I probably did not explain this very well.

I can call C++ code to perform most of the actions I need, but it does not handle materials. I therefore want to scan the file quickly and efficiently for the material options, and then process those after I have called the C++ code.

I do not want to use Python to process the Wavefront OBJ file itself.
Reply
#5
Could you clarify the input that you have and the output that you want? I find it very difficult to understand. An example OBJ file would help, together with the expected output.
Reply
#6
Thanks for trying to help. The OBJ file is as described at https://en.wikipedia.org/wiki/Wavefront_.obj_file

My initial test file is 98.7 MB, but I would like to be able to process even larger files.

The file has a few lines of comments, then a line

mtllib {filename}
which is a reference to a material definition file.

Then comes a large number of vertex and face definitions:
v 6.806145 3.287188 34.507591
v 6.889372 3.314378 34.482964
...... repeat large number of times with different values
g groupName
usemtl groupMaterial
f 1 2 3
f 4 5 6
f 4 7 8
f 5 4 8
f 8 7 9
...... repeat a large number of times with different values
v -5.852165 2.924480 33.700558
v -5.861451 2.937897 33.763584
v -5.978782 2.973776 33.743969
v -5.635860 2.871064 33.685860
v -5.709496 2.899417 33.751705
v -5.442479 2.838525 33.676365
v -5.522288 2.871565 33.741238
v -5.261642 2.836869 33.678814
v -5.349662 2.853023 33.73287
.... repeat a large number with different values

g nextGroupName
usemtl nextGroupMaterial

... more v & f definitions
In total there are 173 different groups and material definitions.

I can call a Python function that uses a C++ function to process the file. It creates the 173 objects but does nothing with the materials, so I wish to preprocess the file to get a list/dictionary of objects and their materials. After loading, I would then process the 173 objects and assign the associated material to each.

It would also be useful to get a list of the materials during preprocessing, so that I could offer a mapping to a different set of materials.
Reply
#7
You could extract all the relevant lines with one pattern:
import io
import re

file = io.StringIO('''\
v 6.806145 3.287188 34.507591
v 6.889372 3.314378 34.482964
...... repeat large number of times with different values
g groupName
usemtl groupMaterial
f 1 2 3
f 4 5 6
f 4 7 8
f 5 4 8
f 8 7 9
...... repeat a large number of times with different values
v -5.852165 2.924480 33.700558
v -5.861451 2.937897 33.763584
v -5.978782 2.973776 33.743969
v -5.635860 2.871064 33.685860
v -5.709496 2.899417 33.751705
v -5.442479 2.838525 33.676365
v -5.522288 2.871565 33.741238
v -5.261642 2.836869 33.678814
v -5.349662 2.853023 33.73287
.... repeat a large number with different values

g nextGroupName
usemtl nextGroupMaterial

... more v & f definitions''')

pattern = re.compile(r"^(?:[og]|usemtl)\s.*", re.MULTILINE)

print(pattern.findall(file.read()))
Output:
['g groupName', 'usemtl groupMaterial', 'g nextGroupName', 'usemtl nextGroupMaterial']
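Because the matches come back in file order, each usemtl line can be paired with the g or o line that precedes it. Here is a minimal sketch of that idea, not a tested solution for real files. It assumes, as in the sample above, that usemtl follows the group or object it belongs to; if a group has several usemtl lines, the last one wins. The names scan_materials, PATTERN and assigned are made up for illustration.

```python
import re

# Capture the keyword and the rest of the line separately.
# [ \t]+ is used instead of \s so a bare keyword cannot swallow the next line.
PATTERN = re.compile(r"^(o|g|usemtl)[ \t]+(.*)$", re.MULTILINE)


def scan_materials(text):
    """Return ({group_or_object_name: material}, [unique materials])."""
    assigned = {}    # name -> material (None until a usemtl line is seen)
    materials = []   # unique material names, in order of first appearance
    current = None   # name from the most recent 'o' or 'g' line
    for keyword, rest in PATTERN.findall(text):
        rest = rest.strip()
        if keyword == "usemtl":
            if rest not in materials:
                materials.append(rest)
            if current is not None:
                assigned[current] = rest
        else:
            current = rest
            assigned.setdefault(current, None)
    return assigned, materials


sample = """\
v 6.806145 3.287188 34.507591
g groupName
usemtl groupMaterial
f 1 2 3
g nextGroupName
usemtl nextGroupMaterial
"""
groups, materials = scan_materials(sample)
print(groups)
# {'groupName': 'groupMaterial', 'nextGroupName': 'nextGroupMaterial'}
print(materials)
# ['groupMaterial', 'nextGroupMaterial']
```

The second return value is the list of distinct materials, which could serve as the starting point for a mapping to a different set of materials.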
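You could then split the result by keyword with more_itertools.bucket() (a third-party package):
file.seek(0)  # rewind: the earlier file.read() left the stream at its end
lines = pattern.findall(file.read())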

from more_itertools import bucket
from operator import itemgetter

s = bucket(lines, itemgetter(0))
print(list(s['o']))
print(list(s['u']))
print(list(s['g']))
Output:
[]
['usemtl groupMaterial', 'usemtl nextGroupMaterial']
['g groupName', 'g nextGroupName']
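For a real file on disk rather than a StringIO, one option is to memory-map the file and run a bytes version of the pattern over it. That avoids building one large Python string from the whole file. This is only a sketch under assumptions: scan_obj is a made-up name, the file is assumed to be non-empty (mmap refuses to map an empty file), and I have not measured whether this is faster than file.read() on a file of about 100 MB.

```python
import mmap
import os
import re
import tempfile

# Bytes pattern, because a memory-mapped file exposes bytes rather than str.
PATTERN = re.compile(rb"^(o|g|usemtl)[ \t]+([^\r\n]*)", re.MULTILINE)


def scan_obj(path):
    """Return a list of (keyword, rest_of_line) pairs for o, g and usemtl lines."""
    found = []
    with open(path, "rb") as fh:
        # Length 0 maps the whole file; ACCESS_READ keeps the mapping read-only.
        with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for match in PATTERN.finditer(mm):
                keyword, rest = match.groups()
                found.append((keyword.decode(), rest.decode(errors="replace").strip()))
    return found


# Small demonstration with a temporary file standing in for the real OBJ file.
with tempfile.NamedTemporaryFile("w", suffix=".obj", delete=False) as tmp:
    tmp.write("v 1 2 3\ng groupName\nusemtl groupMaterial\nf 1 2 3\n")
    demo_path = tmp.name
try:
    demo_result = scan_obj(demo_path)
    print(demo_result)
finally:
    os.unlink(demo_path)
```

The result is a short list even for a very large file, since only the o, g and usemtl lines are kept.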
Reply
#8
(Mar-02-2024, 09:00 AM)Gribouillis Wrote: You could extract all the relevant lines with one pattern

Thanks, that is VERY helpful.
Reply

