Generating Splines from JSON Data



  • Hello!

    I'm revisiting the Moana Island data set and I'm making great progress; I've got almost all of the assets converted into Redshift Proxies.

    The biggest problem I'm currently facing is a 3GB JSON file that defines renderable curves on the largest mountain asset. I don't know exactly how many curves are defined in this file, but based on the curve count and data size of other JSON files I think it's roughly 5.2 million curves. Each point of a curve is an array with 3 items, each curve is an array of N points, and the curves are stored inside a single top-level array.

    The built-in json module must load the entire file into memory before operating on it. I've experienced extremely poor behavior on any JSON file over 500MB with the json module, so I am instead parsing the files with ijson, which allows for iterative reading of the JSON files and offers a much faster C backend based on YAJL.

    Using ijson I was able to read an 11GB file that stored transform matrices for instanced assets on the beach. However, even using ijson I cannot seem to build a spline from the curves in this 3GB file (I gave up after letting the script run for 12 hours). I have a suspicion it has more to do with the way I'm building the spline object than parsing the data. So I have some questions. Is there a performance penalty for building a single spline with millions of segments? Should I instead build millions of splines with a single segment? Or would it be better to try and split the difference and build thousands of splines with 10,000 segments each?

    I've done a little performance testing with my current code and right now it takes 10 minutes 13 seconds to build a single spline out of the first 100,000 curves in the file. However, if I build just the first 10,000 curves it only takes 5 seconds.

    I'm leaning heavily toward chunking the splines into 10,000-segment batches, but I want to first see if my code could be further optimized. Here is the relevant portion:

        curves = ijson.items(json_file, 'item', buf_size=32*1024)
        # curves is a generator that yields the list of points for each segment in turn
    
        for i, curve in enumerate(curves):
            # For performance testing, limit the number of segments parsed and created.
            if i >= num:
                break
            point_count += len(curve)  # running total of points
            segment_count += 1  # running total of segments
            spline.ResizeObject(point_count, segment_count)  # grow the spline for this curve
            # The segment size only needs to be set once per curve.
            spline.SetSegment(segment_count-1, len(curve), False)
            for j, point in enumerate(reversed(curve)):
                spline.SetPoint(point_count-j-1, flipMat * c4d.Vector(*(float(p) for p in point)))
        
        spline.Message(c4d.MSG_UPDATE)
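
A quick check of the timings reported above suggests that the build time grows roughly quadratically with the curve count rather than linearly. The sketch below uses only the two measurements from the post (5 seconds for 10,000 curves, 10 minutes 13 seconds for 100,000). The power-law model and the variable names are assumptions made for illustration, and two data points cannot prove the cause.

```python
import math

# Timings quoted in the post above.
small_curves, small_seconds = 10_000, 5.0
large_curves, large_seconds = 100_000, 10 * 60 + 13  # 10 min 13 s = 613 s

# Assume time ~ curves ** k and solve for k from the two measurements.
size_ratio = large_curves / small_curves
time_ratio = large_seconds / small_seconds
exponent = math.log(time_ratio) / math.log(size_ratio)

print(f"{size_ratio:.0f}x the curves took {time_ratio:.1f}x the time")
print(f"estimated growth exponent: {exponent:.2f}")
```

An exponent near 2 would be consistent with something in the loop redoing work proportional to the current size of the spline on every iteration, for example the per-curve ResizeObject call, though that attribution is a guess rather than something measured here.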
    


  • Hi,

    1. I doubt that SplineObject is capable of handling such large curves, because it is a high-abstraction type that encapsulates all the common curve interpolation stuff and hides away the raw curve data. Cinema calls these LineData, which are used at render time.
    2. Your data seems to be intended for such raw curve data. But we cannot instantiate LineObject in Python (C++ can do a bit more, but what you are trying to do should also not be possible there, if I am not mistaken).
    3. Aside from instantiating the SplineObject in the first place, you would also have the problem that Cinema's splines are not static, i.e. they are dynamically cached. That would mean all this point data has to be reprocessed each time the cache for this spline is built.
    4. This is more of an academic point due to the mentioned problems, but setting each point individually seems very inefficient; you should push all points in at once, using PointObject.SetAllPoints.
    5. I haven't looked at the Disney Moana files, but I somehow doubt that they translate well into anything that isn't specifically built for such heavy scenes (i.e. something like RenderMan). As already stated, these curve files are very likely not intended for a high-abstraction spline type, but as raw data input, probably even being loaded just in time for rendering.
    6. The only way to go would be to reduce the dimensionality of the dataset, which for a curve would mean doing some curve fitting. Cinema has a curve fitting function in its API, and it's even accessible in Python in the c4d.utils module, but I somehow doubt that it is up to the task. You will probably have to use numpy and scipy for that.

    Cheers,
    zipit



  • @zipit said in Generating Splines from JSON Data:

    3. Aside from instantiating the SplineObject in the first place, you would also have the problem that Cinema's splines are not static, i.e. they are dynamically cached. That would mean all this point data has to be reprocessed each time the cache for this spline is built.

    Perhaps this is a reason it would be good to build splines made of fewer segments/points? I will not be modifying the curves after they're built the first time. Are you saying that the cache is built for the SplineObject even outside of the call to SplineObject.Message(c4d.MSG_UPDATE)? That's not the way it seems to behave based on my (limited) observations.

    4. This is more of an academic point due to the mentioned problems, but setting each point individually seems very inefficient; you should push all points in at once, using PointObject.SetAllPoints.

    I'd like to use this method but I haven't found any examples of using it on a SplineObject. My concern is that I would have to call this once per SplineObject (as opposed to once per segment), which would entail keeping a 10,000-element Python list, in which each entry is a 5-element Python list of c4d.Vector, alive until the SplineObject points are ready to be set. I might be able to use one of the methods from itertools to do this relatively quickly, but I'm just not sure if it'll actually be an improvement. I suppose it would be if the cache is in fact being built whenever a point is added.

    6. The only way to go would be to reduce the dimensionality of the dataset, which for a curve would mean doing some curve fitting. Cinema has a curve fitting function in its API, and it's even accessible in Python in the c4d.utils module, but I somehow doubt that it is up to the task. You will probably have to use numpy and scipy for that.

    Each curve is already very simple; just 5 points that Disney expects to be interpolated (I'm using B-Spline in this case). Perhaps lowering the number of intermediate points for this case would be beneficial, especially as these splines are very distant from the majority of the scene (occupy a small section of the frame).

    I've gone ahead and modified my code to chunk the curve data so that each SplineObject ends up with only 10,000 segments. Now processing 100,000 curves takes only 42s, down from the prior 10m13s. This seems to be a roughly linear increase in time from the 5s for 10,000 segments that I'd previously recorded... so I think I'm happy for now?

        point_count = 0
        segment_count = 0
        spline_count = 1
        num = 10000
    
        spline = srcSpline.GetClone()
        spline.SetName("{0}_{1:0>4d}".format(name, spline_count))
        curves = ijson.items(json_file, 'item', buf_size=32*1024)
        for i, curve in enumerate(curves):
            # Every num curves, finish the current spline and start a new one.
            if i > 0 and i % num == 0:
                spline.InsertTag(texture_tag.GetClone())
                spline.InsertTag(object_tag.GetClone())
                spline.SetLayerObject(layer)
                spline.Message(c4d.MSG_UPDATE)
                doc.InsertObject(spline, group)
                point_count = 0
                segment_count = 0
                spline_count += 1
                spline = srcSpline.GetClone()
                spline.SetName("{0}_{1:0>4d}".format(name, spline_count))
            # For testing, stop after 10 splines (100,000 curves).
            if spline_count > 10:
                break
            point_count += len(curve)
            segment_count += 1
            spline.ResizeObject(point_count, segment_count)
            # The segment size only needs to be set once per curve.
            spline.SetSegment(segment_count-1, len(curve), False)
            for j, point in enumerate(reversed(curve)):
                spline.SetPoint(point_count-j-1, flipMat * c4d.Vector(*(float(p) for p in point)))
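
One way to address the concern about holding nested lists, while still making a single bulk call per spline, is to consume the curve generator in fixed-size batches and flatten each batch just before it is used. The sketch below is pure Python so that it runs outside Cinema 4D: `batched_curves`, `flatten_batch` and `mock_curves` are hypothetical helper names, plain tuples stand in for `c4d.Vector`, and the mock generator stands in for the `ijson.items(...)` stream. The point reversal and the `flipMat` transform from the post are left out.

```python
from itertools import chain, islice


def batched_curves(curves, batch_size):
    """Yield lists of at most batch_size curves from any iterable."""
    iterator = iter(curves)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def flatten_batch(batch):
    """Return (flat point list, points-per-segment list) for one batch."""
    segment_lengths = [len(curve) for curve in batch]
    points = list(chain.from_iterable(batch))
    return points, segment_lengths


def mock_curves(count, points_per_curve=5):
    """Stand-in for the ijson stream: yields curves lazily."""
    for c in range(count):
        yield [(float(c), float(p), 0.0) for p in range(points_per_curve)]


if __name__ == "__main__":
    for number, batch in enumerate(batched_curves(mock_curves(25), 10), start=1):
        points, segment_lengths = flatten_batch(batch)
        # Inside Cinema 4D, this is where one ResizeObject call, one
        # SetAllPoints call and one SetSegment call per segment would go.
        print(number, len(segment_lengths), len(points))
```

Only one batch is alive at a time, so memory use is bounded by the batch size rather than by the size of the file. Whether this is actually faster than per-point SetPoint calls inside Cinema 4D would still need to be measured.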
    


  • Here are my results after 35 minutes of processing:

    [Screenshot: Cinema_4D_2020-06-21_17-15-13.png]

    Much better than the 12 hours and 0 results I got from my previous attempt 😂

    It looks like there are 5,183,087 segments. I have them split across 519 spline objects. The scene's viewport navigation is still fairly responsive, about 25fps when all the splines are visible. Higher if I zoom into a section.

    I'd still like to improve the method if anyone can provide more information on how I might use SetAllPoints for the splines here.
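
The reported numbers are consistent with the chunked build scaling roughly linearly. The arithmetic below uses only figures from the posts above (42 seconds per 100,000 curves, 5,183,087 segments, 10,000 segments per spline object). Treating the rate as constant over the whole file is an assumption.

```python
import math

total_segments = 5_183_087      # reported segment count
segments_per_spline = 10_000    # chunk size used in the script
seconds_per_100k = 42           # measured for 100,000 curves

# Number of spline objects needed if each holds at most 10,000 segments.
spline_objects = math.ceil(total_segments / segments_per_spline)

# Projected total build time at a constant rate, in minutes.
projected_minutes = total_segments / 100_000 * seconds_per_100k / 60

print(spline_objects)                # 519
print(round(projected_minutes, 1))   # 36.3
```

Both values line up with the 519 spline objects and the roughly 35 minutes of processing reported above.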



  • Hi,

    Oh, okay, I thought you were trying to push all these points into a single spline. In hindsight, I did not read your initial post properly ;) I do not know exactly which tests you ran regarding the caching or what you are doing in detail, but normally you have no direct control over when Cinema builds its caches. As for SetAllPoints, it is not that fancy; you just use it straight away. I assume you have no tangent data, just a bunch of line segments as your raw data, right? Here is a quick example:

    import c4d
    import random
    
    # Some mock curve data. A random number of curves with a random number of
    # points each. Each curve is just a line segment.
    CURVE_DATA = [tuple(c4d.Vector(x, y, 0)
                   for y in range(0, 100, random.randrange(10, 50, 10)))
                  for x in range(0, 100, random.randrange(10, 50, 10))]
    
    
    def main():
        """ Creates a spline with multiple segments.
        """
        # Some stats about our curve data. You probably want to gather these more
        # efficiently in a single loop, but I was lazy.
        curve_count = len(CURVE_DATA)
        curve_point_counts = tuple(len(curve) for curve in CURVE_DATA)
        points = [p for curve in CURVE_DATA for p in curve]
        total_point_count = len(points)
    
        # Create the SplineObject, resize it and push in all points.
        node = c4d.SplineObject(0, c4d.SPLINETYPE_LINEAR)
        node.ResizeObject(total_point_count, curve_count)
        # Nothing magical here, just like the method is documented.
        node.SetAllPoints(points)
    
        # Then go over the point counts of each curve in our curve data, 
        # and write these counts as the size of the segments of the spline.
        for i, length in enumerate(curve_point_counts):
            # Note: There was something wrong with closed spline segments, i.e.
            # it did not work or something like that. I do not remember exactly,
            # but you should find the post in the forum. It was a relatively 
            # recent post. Not sure if this has already been fixed.
            node.SetSegment(i, length, False)
    
        # Push the object into the scene.
        node.Message(c4d.MSG_UPDATE)
        doc.InsertObject(node)
        c4d.EventAdd()
    
    if __name__ == "__main__":
        main()
    

    Cheers,
    zipit



  • Hi @wuzelwazel,
    I think @zipit already did a great job and answered your questions.

    Another point that may be important for you is to turn off the interpolation of the SplineObject via splineObject[c4d.SPLINEOBJECT_INTERPOLATION] = c4d.SPLINEOBJECT_INTERPOLATION_NONE.

    Apart from that, I don't see anything more to optimize on the C4D side.
    Cheers,
    Maxime.

