It looks like the Profiler is showing a very high timing value for RenderUI (which is definitely not right for a simple UI such as this), and 0 for ApplyFrameLimit. Is it possible that the idle time that should show under ApplyFrameLimit is getting accumulated under RenderUI by mistake?
The UI render time also includes the cost of the debug UI, which is rendered every frame. I can’t begin to interpret the UI render time, but I can keep it down by only drawing what I need and avoiding complex UI elements like graphs.
The profiler shown in this thread: Profiler rework and profiling tool has more realistic values: 0.48 ms for UI rendering and 11.9 ms for applying the frame limit.
That’s what the Urho profiler should show.
Have you narrowed it down to what could be the cause of the difference?
I figured out the problem: if you turn the VSync flag on, the ApplyFrameLimit profiling always shows 0. That is technically correct, but in practice it’s not very useful, since you want to see how much spare frame time you have left before the frame is presented.
IMO a more useful metric would be to rename ApplyFrameLimit to something like “FrameSpareTime” and have it start measuring just before the call to Graphics::EndFrame. Say I have a game that I want to run at 60 fps, and I want to know how much spare frame time I have, to decide whether it’s worth spending weeks/months implementing some extra effects. If I only have 2 ms left I might not bother, but if I have a spare 10 ms then I can definitely try to do more with it.
Looking at my sample above, I’m quite sure that my frame only takes about 2-3 ms to update/render (including the UI), and about 14 ms is spare time, but the Profiler is not showing me that.
You can still think of V-Sync as applying a frame limit; it’s just done by the graphics hardware rather than by the engine, so that time should still be profiled separately. It just feels wrong to show it under the UI render timing, as it’s not really UI render time, it’s something else.
The GPU can choke for a variety of reasons. You probably already know that V-Sync was intended to deliberately slow down our graphics pipeline to match the rate our display hardware can achieve (so we’re not “spinning our wheels” generating frames that will never be displayed). Another reason it was invented was to deal with “vertical tearing” in the days before we invented back-buffering… your code could try to draw a new frame while the old one was only halfway through being displayed on the hardware… it looked really bad.
But there are other reasons the GPU can cause issues with respect to graphics framerate - uploading too much data is the usual one, whether it be texture thrashing, dynamic vertices, or just trying to draw a bunch of triangles that could rightly have been culled earlier.
Let’s look at OpenGL for example.
It demands that we issue all graphics-related calls from our main thread, which owns the render context (yes there are ways to share the render context, and there are also costs associated with thread barriers).
Since only the main thread can do that, this places a big limit on how much work can be done by worker threads, and puts a heavy load on the main thread, which drives the engine update loop. The entire engine tick rate is then left at the mercy of the GPU.
Should we find that the GPU is responsible for spikes in the application update rate, there are more suitable tools for profiling graphics issues than the built-in profiling may offer.
[EDIT] The spare time you describe is likely the time it takes for the render commands we sent to actually get executed - the “flush time”, if you will. This is the GPU stalling the CPU main thread - something that newer graphics APIs like Vulkan hope to address. You are correct in assuming that you can actually do work during the time we’re waiting for the GPU to finish, but it can’t involve render commands. We can queue them, but not execute them. This is something I implemented in the past.
@Leith, having written both official and homebrew Nintendo 3DS ports of Urho3D, I assert that everything you’ve said is false even on the tightly resource-constrained 3DS platform, and thus is as false as can be.
This is likely an instrumenting error. Because of how Urho batches 2D UI drawing, it’s almost inconceivable that 2D drawing would be a real issue. Possibly some compatibility flag is being triggered, or the user named his executable poorly and his video drivers are doing some nonsense for an exe they think is Doom 3 or such.
Regardless, he’s getting different timings out of different instrumenting methods. That indicates the instruments have issues.
I’m not sure which part of GPU stalling on render back ends you disagree with. I am willing to be corrected by a true master. How the hell is everything I said false? I described the vsync mechanism and its intent, was that wrong? Man, not everything I said was wrong, I’ve been doing this a while. If you can correct me, do so.
Time is very important to me, and I tend to watch where it goes.
As for your Nintendo port, man, I predate Nintendo, I am impressed that you have a clue, not many of us seem to.
Would you like me to write a new white paper on render pipe stalls, or quote an existing one?