Performance question: Switch / NodeMask / WriteToFile

Discussion:

Hartmut Leister

2012-03-02 12:34:50 UTC

Hello everyone,

I'm experiencing performance problems with my osg scene. I have many (up to 100k) geometries to display a tree.

I'm building my whole scene graph at the beginning. To view it only partially I inserted
root -> group -> PositionAttitudeTransform -> Switch -> (a/b)
(a) LOD -> Geode -> Geometry or LOD -> BillBoard -> Geometry
(b) Geode -> Geometry
Now when the scene is only partially visible (using osg::Switch), it's laggy nonetheless.

If I wanted to load and save my scene on demand, what would be the best option with regard to performance?
1) write unneeded subgraphs to a file
2) cache unneeded subgraphs (how would I do this?)
3) use NodeMask on osg::Group (would this also black out the respective subgraph?)

Looking forward to your input. Thanks in advance.
Hartmut

--
Don't ask - you might get an answer


Robert Osfield

2012-03-02 13:34:40 UTC

Hi Hartmut,

Using hundreds of thousands of scene graph objects to represent a tree
will always be a performance problem, no matter what LOD'ing or scene
graph structure you use. Such a fine-grained scene graph will
cause problems in cull, in draw dispatch, and in draw down on the GPU. It
very much sounds like you have engineered one of the worst possible
ways to tackle the task one could think of.

This really raises the question of what you are trying to do here. When
you say a tree, could you give us a screenshot so we know what you
are trying to achieve? If we know that, we can recommend how to
construct your scene graph for best performance.

Robert.

_______________________________________________
osg-users mailing list
http://lists.openscenegraph.org/listinfo.cgi/osg-users-openscenegraph.org

Christian Buchner

2012-03-06 11:07:34 UTC

I found that an osg::Switch object will still have all its children
run through the cull traversal. So we migrated away from osg::Switch
to conditionally display thousands of child nodes.

Node masks could be a better solution for what you are trying to
achieve. This will at least get rid of the bad cull performance.

Christian

Robert Osfield

2012-03-06 16:04:55 UTC

Hi Christian,

Post by Christian Buchner
I found that an osg::Switch object will still have all its children
run through the cull traversal. So we migrated away from osg::Switch
to conditionally display thousands of child nodes.

A rather curious statement.

If a switch node switches off a child, it won't traverse that child in the
update and cull traversals. It does, however, have to test each child to
see whether it is switched off, which is an O(n) operation where n is
the number of children of the switch.

If you just want to switch on one child out of thousands, then
osg::Switch will be relatively expensive. For this type of switching a
custom node would be best; osg::Switch is designed primarily for
flexibility rather than for the niche task of switching
between thousands of children.
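
[Editor's sketch] The cost difference described above can be illustrated with a toy model. This is not OSG code: FlagSwitch and SingleChildSwitch are hypothetical stand-ins written with the standard library only, and a "child" is just an index. It only shows why a per-child flag test is O(n) per traversal while a node that stores a single active index is O(1).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a general switch (NOT osg::Switch): one on/off flag per child.
struct FlagSwitch {
    std::vector<bool> enabled;

    // Tests every flag to find the enabled children: O(n) per traversal.
    // Appends the visited child indices and returns the number of tests made.
    std::size_t traverse(std::vector<std::size_t>& visited) const {
        std::size_t tests = 0;
        for (std::size_t i = 0; i < enabled.size(); ++i) {
            ++tests;
            if (enabled[i]) visited.push_back(i);
        }
        return tests;
    }
};

// Hypothetical custom node for the "one child out of thousands" case.
struct SingleChildSwitch {
    std::size_t childCount = 0;
    std::size_t active = 0;   // index of the only child that is shown

    // Jumps straight to the active child: O(1) per traversal.
    std::size_t traverse(std::vector<std::size_t>& visited) const {
        assert(active < childCount);
        visited.push_back(active);
        return 1;
    }
};
```

With 10000 children and one enabled, the first variant performs 10000 tests per traversal and the second performs one. Whether that difference matters in a real scene is something only a measurement can tell.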

Post by Christian Buchner
Node masks could be a better solution for what you are trying to
achieve. This will at least get rid of the bad cull performance.

Using NodeMask will be worse for performance: the osg::Switch
test is done before the child itself is touched, so less memory
needs to be read and fewer virtual functions need to be
called.
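
[Editor's sketch] A toy model of the argument above, assuming (as the post describes) that a switch keeps its on/off flags in the parent while a node mask lives inside each child. ToyNode, visitWithFlags and visitWithMasks are hypothetical names, not OSG API; the "touched" counter simply records how often a child object itself had to be read.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Toy node (NOT osg::Node): the mask is stored in the node itself.
struct ToyNode {
    unsigned int nodeMask = 0xffffffffu;
    mutable std::size_t touched = 0;   // how often this node was read
    virtual ~ToyNode() = default;

    // Per-node mask test: a virtual call that reads the node's own memory.
    virtual bool accept(unsigned int traversalMask) const {
        ++touched;
        return (nodeMask & traversalMask) != 0u;
    }
};

using Children = std::vector<std::unique_ptr<ToyNode>>;

Children makeChildren(std::size_t count) {
    Children children;
    for (std::size_t i = 0; i < count; ++i) children.emplace_back(new ToyNode);
    return children;
}

// Hiding via parent-side flags: disabled children are never dereferenced.
std::size_t visitWithFlags(const Children& children, const std::vector<bool>& enabled) {
    assert(children.size() == enabled.size());
    std::size_t shown = 0;
    for (std::size_t i = 0; i < children.size(); ++i)
        if (enabled[i] && children[i]->accept(0xffffffffu)) ++shown;
    return shown;
}

// Hiding via masks: every child is dereferenced just to read its mask.
std::size_t visitWithMasks(const Children& children, unsigned int traversalMask) {
    std::size_t shown = 0;
    for (const auto& child : children)
        if (child->accept(traversalMask)) ++shown;
    return shown;
}

// Sums the per-node read counters.
std::size_t totalTouched(const Children& children) {
    std::size_t total = 0;
    for (const auto& child : children) total += child->touched;
    return total;
}
```

In this model both approaches show the same single child, but the mask variant reads all 100 child objects while the flag variant reads only the enabled one.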

Robert.

Christian Buchner

2012-03-06 16:19:59 UTC

We had 10000 children in an osg::Switch, all osgText::Text objects
(which themselves were children of a PositionAttitudeTransform
object). Nearly all of them were disabled in the parent osg::Switch.

The culling pass would take forever (especially noticeable in debug
builds of the software). Besides this observation, these osgText::Text
objects had huge runtime memory requirements, so we had to refactor
the code so that we dynamically add these objects to the scene graph
as needed.
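
[Editor's sketch] The refactoring described above (create and attach a text only while it is needed) might look roughly like this in a toy form. The post does not show the actual code, so ActiveLabels and ToyLabel are hypothetical, standard-library-only stand-ins; a real version would add and remove nodes on a scene graph group instead of map entries.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Toy stand-in for a heavyweight text object (NOT osgText::Text).
struct ToyLabel {
    std::string text;
};

// Holds only the labels that are currently shown; hidden labels do not exist.
class ActiveLabels {
public:
    // Creates the label on demand, or updates it if it is already shown.
    void show(int id, const std::string& text) { attached_[id] = ToyLabel{text}; }

    // Destroys the label, so it costs neither memory nor traversal time.
    void hide(int id) { attached_.erase(id); }

    bool isShown(int id) const { return attached_.count(id) != 0; }
    std::size_t attachedCount() const { return attached_.size(); }

    // Access to a label that must currently be shown.
    const ToyLabel& get(int id) const {
        auto it = attached_.find(id);
        assert(it != attached_.end());
        return it->second;
    }

private:
    std::unordered_map<int, ToyLabel> attached_;
};
```

The trade-off is that showing a label now pays the construction cost each time, in exchange for memory and traversal work that scale with the number of visible labels rather than the total.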

Christian

Christian Buchner

2012-03-06 16:51:22 UTC

Here is my repro case for the 10000 children of a switch node causing
massive cull times.

Just enable the stats with the 's' key. Even though most children are
turned off, they seem to be processed in the cull traversal - the
culling dominates the render time.

I do not understand enough of OpenSceneGraph to exactly pinpoint the
cause, maybe Robert can shed a bit of light as to what is going on.

Christian

Sebastian Messerschmidt

2012-03-06 17:50:29 UTC

Hello Christian,

Don't forget that the cull traversal also collects all the draw calls
and objects to be rendered, so cull would better be called "cull & collect".
Also, from your example: you're creating one individual node per switch
child. This means the scene graph is not able to share any geometry etc. at
this level.
You could try changing your example to use the same text node over and
over as the child, and performance should improve.
Besides, try to cut down the individual transforms if possible, or set
them to static data variance, and possibly run an optimizer over the scene.
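
[Editor's sketch] The difference between one individual node per child and one shared node used many times can be shown with a toy model. ToyText and the helper functions are hypothetical and use std::shared_ptr in place of the scene graph's own reference counting; this only illustrates the object counts, not any rendering cost.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <set>
#include <string>
#include <vector>

// Toy stand-in for a text node (NOT an OSG class).
struct ToyText {
    std::string text;
};

using TextChildren = std::vector<std::shared_ptr<ToyText>>;

// One separate object per child slot, as in the repro case.
TextChildren makeIndividual(std::size_t count, const std::string& text) {
    TextChildren children;
    for (std::size_t i = 0; i < count; ++i)
        children.push_back(std::make_shared<ToyText>(ToyText{text}));
    return children;
}

// A single object referenced from every child slot.
TextChildren makeShared(std::size_t count, const std::string& text) {
    auto one = std::make_shared<ToyText>(ToyText{text});
    return TextChildren(count, one);
}

// Counts how many distinct objects the child list really holds.
std::size_t distinctObjects(const TextChildren& children) {
    std::set<const ToyText*> unique;
    for (const auto& child : children) {
        assert(child != nullptr);
        unique.insert(child.get());
    }
    return unique.size();
}
```

Sharing only applies when the children really are identical; texts that each show different content cannot be collapsed this way.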

cheers
Sebastian


Sebastian Messerschmidt

2012-03-06 19:18:59 UTC

Post by Christian Buchner
Well, in our production code each text node would show an individual
text (some radio related parameters).
The thing is: the cull traversal should not even be looking at objects
in the osg::Switch that are turned off.

If the visitor (in this case the cull visitor) is set to
TRAVERSE_ACTIVE_CHILDREN, it will only traverse those nodes which are
active.
The cull visitor uses exactly this setting, so it won't traverse
inactive children.
However, computeBound() will check every node and recompute the
bounds if a child has changed, so every child has to be touched (which is
not to say that all of them have to recompute their bounds).
So I can't quite support your statement.
Why don't you experiment with the number of active children in your
example?
Given your theory, the cull time should not increase linearly with an
increasing number of active nodes.
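
[Editor's sketch] The suggested experiment can be phrased as a toy model: count how many children a traversal visits as the number of active children changes. Mode, childrenVisited and makeFlags are hypothetical names that only mimic the idea of the two traversal modes; they are not the OSG visitor API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy counterparts of "traverse all children" and "traverse active children".
enum class Mode { AllChildren, ActiveChildren };

// Returns how many children a traversal in the given mode would visit.
std::size_t childrenVisited(const std::vector<bool>& enabled, Mode mode) {
    if (mode == Mode::AllChildren) return enabled.size();
    std::size_t visited = 0;
    for (bool on : enabled)
        if (on) ++visited;
    return visited;
}

// Builds a flag list with the first `active` of `total` children enabled.
std::vector<bool> makeFlags(std::size_t total, std::size_t active) {
    assert(active <= total);
    std::vector<bool> flags(total, false);
    for (std::size_t i = 0; i < active; ++i) flags[i] = true;
    return flags;
}
```

If only active children are traversed, the visit count follows the number of active children (10, 100, ...) and not the total of 10000. A real test would measure cull time in the viewer's stats while varying the active count in the same way.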

Post by Sebastian Messerschmidt
Besides, try to cut down the individual transforms if possible or set them
to static data variance and possibly run an optimizer over the scene.

Post by Christian Buchner
Same here, the positions of my texts would be dynamically updated each
frame. But only a few texts would be needed at any given time.
All is good now because we dumped osg::Switch.

How did you organize it then?
One question that arises: do you have to display all the active
children at the same time?

Christian

Christian Buchner

2012-03-09 16:09:09 UTC

Post by Christian Buchner
Here is my repro case for the 10000 children of a switch node causing
massive cull times.

Tried this piece of code on Linux with osg 3.1.1 developer release. I
do not observe any high cull times here, even when drastically
increasing the number of children of the switch node. Too bad, my
intention was to do code profiling to get to the bottom of this.

So either osg 3.1 fixes this, or some implementation detail is more
efficient on Linux than it is on Windows.

I would have profiled on Windows, except the Visual Studio versions
with this feature cost a fortune (you can purchase a car for that
price).

Christian

Sebastian Messerschmidt

2012-03-07 12:32:02 UTC

Post by Sebastian Messerschmidt
However, computeBound() will check every node and recompute the bounds if a
child has changed, so every child has to be touched (which is not to say that
all of them have to recompute their bounds).

Post by Christian Buchner
So you're essentially saying that by implementing a subclass of
osg::Switch and overriding computeBound() to only look at active
children I would be able to fix the performance issues? I shall try,
that is easy enough. ;)

That could be an option as far as I can see, but I don't think there is much
benefit. Essentially, computeBound() is called recursively for each
node that reports a dirty bound.
So you would maybe save one virtual call per unchanged child,
but you'd have to manage changes to the children on your own.
In my opinion the benefits wouldn't outweigh the effort, as the
recursive traversal here is the only way to hide the information at the
child level.
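
[Editor's sketch] The idea under discussion (a bound computed from active children only) can be shown with a toy axis-aligned box. Box, cube and boundOfActive are hypothetical, standard-library-only stand-ins, not the OSG bounding classes. Whether the OSG version in use already restricts the switch bound to active children is something to check in its source.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy axis-aligned box; lo > hi marks an empty (invalid) box.
struct Box {
    std::array<float, 3> lo{{1.f, 1.f, 1.f}};
    std::array<float, 3> hi{{-1.f, -1.f, -1.f}};

    bool valid() const { return lo[0] <= hi[0]; }

    // Grows this box so that it also contains `other`.
    void expandBy(const Box& other) {
        if (!other.valid()) return;
        if (!valid()) { *this = other; return; }
        for (std::size_t i = 0; i < 3; ++i) {
            lo[i] = std::min(lo[i], other.lo[i]);
            hi[i] = std::max(hi[i], other.hi[i]);
        }
    }
};

// Helper: a cube spanning [a, b] on every axis.
Box cube(float a, float b) {
    Box box;
    box.lo = {{a, a, a}};
    box.hi = {{b, b, b}};
    return box;
}

// Union of the bounds of enabled children only; disabled ones are skipped.
Box boundOfActive(const std::vector<Box>& childBounds, const std::vector<bool>& enabled) {
    assert(childBounds.size() == enabled.size());
    Box result;
    for (std::size_t i = 0; i < childBounds.size(); ++i)
        if (enabled[i]) result.expandBy(childBounds[i]);
    return result;
}
```

Note that this still loops over every flag, so it changes which children contribute to the bound, not the O(n) scan itself, which is in line with the doubt expressed above about the benefit.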

A more interesting point that I was thinking about, and which might
improve your performance, is to organize your graph in a different way.
Instead of putting all children under one big switch, I'd rather split them
up into smaller switch groups.
I'm not sure (because I never measured), but I guess that with vectors of
~10000 elements, things like cache coherency might start
to play a bigger role.
But as said before, anything but measuring (using a profiler or at least
doing some high-resolution time queries inside the code) is wild guessing,
and as you have the OSG source code you are totally free to check where
the bottlenecks are.
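
[Editor's sketch] A toy count of flag tests for one flat switch versus children split into smaller groups, where a group with nothing enabled is skipped after a single test. flatTests and bucketedTests are hypothetical helpers, not OSG code, and they count tests only; they say nothing about real cull time, which still has to be measured.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One big switch: every child flag is tested on each traversal.
std::size_t flatTests(const std::vector<bool>& enabled) {
    return enabled.size();
}

// Children split into groups of `bucketSize`: one test per group, plus one
// test per child only inside groups that contain an enabled child.
std::size_t bucketedTests(const std::vector<bool>& enabled, std::size_t bucketSize) {
    assert(bucketSize > 0);
    std::size_t tests = 0;
    for (std::size_t start = 0; start < enabled.size(); start += bucketSize) {
        const std::size_t end = std::min(start + bucketSize, enabled.size());
        ++tests;  // test of the group's own on/off state
        // A real structure would maintain this state when children are
        // toggled; it is recomputed here only to keep the sketch short.
        const bool anyOn = std::any_of(
            enabled.begin() + static_cast<std::ptrdiff_t>(start),
            enabled.begin() + static_cast<std::ptrdiff_t>(end),
            [](bool on) { return on; });
        if (anyOn) tests += end - start;
    }
    return tests;
}
```

For 10000 children in groups of 100 with a single child enabled, this gives 100 group tests plus 100 child tests, compared with 10000 tests for the flat layout.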

cheers
Sebastian

Christian