Just added sending profiling data over network. This is still unoptimized as it sends all hash strings over network with each interval, but a good start nonetheless. Works great on local machine. All you have to do is:
_profilerSink = new ProfilerNetworkSink(context_);
A better screenshot:
I am going to polish integration of this lib instead because honestly - no point in reinventing the wheel when we have a rocket engine here. I do not think i can build a rocket engine, i just want to get my work done / goals reached really.
As you can see it supports logging performance on all threads as well. Different colors represent different kinds of logged code paths. Orange items are events, blue would be resource loading, yellowish are normal code blocks. We can specify our own colors as well. It already supports sending data over network. I am not sure i can get profiler tree to be printed on debug hud though. Would it be ok to sacrifice that?
I’m a maintainer of easy_profiler. I’m glad to see it useful for you.
Because the documentation is small at the moment we can help you. Ask question about core and gui and we will answer to the best of our opportunity.
Thank you for kind offer, if i bump into any problems - i definitely will. So far it seems i have things figured out. I noticed there are windows-specific build options EASY_OPTION_EVENT_TRACING and EASY_OPTION_LOW_PRIORITY_EVENT_TRACING. What do they do exactly?
Is there any way to access logged events from within application? I cant find anything like it. Closest thing would be dumpBlocksToStream() function but its internal. Reason for this would be displaying currently frame info on screen in game.
EASY_OPTION_EVENT_TRACING turns on/off event tracing for Windows to be able to capture thread context switch events. As Windows event tracing requires creation of separate thread for events handling, you may wish to disable it at all - that’s for this option is used.
EASY_OPTION_LOW_PRIORITY_EVENT_TRACING sets priority for event tracing thread to Low to reduce it’s impact on the application performance, but in that case you can miss some context switch events as profiling session could be stopped before all these events are gathered.
By the way enabling/disabling event tracing and changing it’s priority can be done via profiler_gui after connecting to the profiled application. So using the above options affects only default state of the profiler.
As for the last question, there is no such possibility yet, but we are opened for ideas. Which information would you like to access during profiling? Maybe it would be better to display frames in profiler_gui at runtime instead of displaying them in the profiling application?
If you try to change the value of the “Frame time” at the tool-bar you would see different colors at the histogram where all frames are displayed (green columns on a screenshot below main blocks diagram). Red color indicates frames which were executed longer than expected.
If you set “Frame time” to 0 then histogram would be colored in next way: from green color (minimum duration) to red (maximum duration).
Also this histogram has two modes: overview mode (by default) - when all captured frames are displayed, and zoom mode - when histogram displays only frames which are currently visible on a blocks diagram. Click right mouse button on the histogram to change mode.
Currently Urho3D debug hud may display a tree of executed events and how long they took. That information is gathered in intervals and averaged out. Getting access to frame event trees of last T seconds would be sufficient. I definitely agree that viewing this information in profiler_gui live would be better, however if i were to try and get this merged maintainers might require to preserve old functionality as well. Besides displaying that simple tree on screen could also be useful for spotting quick and dirty slowdowns thus speeding up iteration cycles.
Getting a tree of events could be very difficult and expensive operation because all events for each thread are stored in one-dimensional (and one-directional) buffer without saving a hierarchy (hierarchy restored by reader and this is a quite expensive operation as you can see when opening a file). This is done to reduce memory and timing cost for store operation as much as possible for profiled application. I’m missing a point why you need to get full tree of events at run-time? Full tree could contain thousands or tens of thousands of events.
What can be simple: additionally store duration for top-level events (frames) in a separate container (this is not free of charge though) or store only last frame duration in atomic variable (this should not cost anything) - you could still draw a diagram like on this screenshot: https://global.discourse-cdn.com/standard17/uploads/urho3d/original/1X/221c99ba40e3ad5a03ddf9add16ee494100bf395.png
By the way, do you need to get durations for all threads or for one main thread only?
P.S. I think it would be better to continue this conversation in private (if it is possible here) as it has a lot of technical details related to easy_profiler and not to Urho3d
Current profiler shows just current interval (slowest time and averages of event times in certain timespan, say 1 second) information of main thread events. I am just aiming to support current behavior of urho3d with new profiler, thats why i am asking for this.
To be honest - i am not sure this is even needed. Lets make sure. @cadaver / @weitjong - is there a chance PR integrating easy_profiler would be accepted? It has great tool for inspecting data but does not currently easily expose data at runtime so debug hud would loose display of performance data. Is that acceptable?
@victorzs if maintainers decide this is PR-worthy but we still need to maintain display of said info at runtime i will make an issue on github so we can discuss possible options. I hope they give an OK to loosing performance data display at runtime as it is not of much practical use anyway.
Well, as I said getting a duration of frame would be easy. I have just been confused by “displaying a tree of events”. The profiler requires an additional regime though in which it measures only frame times and do not store any events until enabled.
@rku, there are some changes to profiler: you can now get slowest frame duration (local max since the moment of last query) and previous frame duration in cpu ticks or microseconds.
Use profiler::this_thread::frameTimeLocalMax() for getting frame time for current thread or profiler::main_thread::frameTimeLocalMax() for getting frame time for main thread (main thread have to be marked by using EASY_MAIN_THREAD macro to get profiler::main_thread functions work).
Just reminding that “frame” means every top-level block (block without parents).
You can find these changes in develop-HEAD at github repository. We would be thankful for getting a feedback before adding this feature into release version. Any suggestions are welcome.
P.S.: Another big update is that we have changed GPLv3 license to MIT. Now EasyProfiler could be licensed under either of MIT or Apache v2.0 at your option (as always, you do not need to put both LICENSE files with sources/binaries - you need to put only one license file you like). We are still using double licensing due to compatibility with larger variety of projects.
New release v1.1.0 available.
Whats new in a few words:
GPLv3 license changed to MIT.
Added API functions to get maximum and average duration of frame (for getting information for current thread use profiler::this_thread:: ; for getting information for main thread use profiler::main_thread::). Maximum and average duration are calculated always (even if profiler is disabled). You should mark main thread with EASY_MAIN_THREAD.
Added live FPS Monitor to GUI. You just need to connect to the profiled application and fps monitor will start to receive max/avg frame duration of main thread. You should mark main thread with EASY_MAIN_THREAD in the profiled application. There are several settings for FPS Monitor available (See Settings -> FPS Monitor). If you do not want GUI to send network requests to the profiled application then just hide FPS Monitor ([x] button or right click -> Hide).
Ugh, the forum says that I have reached the reply limit so I have to edit previous post
@TheComet , @sabotage3d Yes, it could have issues if the same thread would be executed at different cpu cores.
We have used rtdsc for *nix platforms because it was the only fast solution:
with clock_gettime we have got ~800ns per block
with std:: chrono - ~600ns per block (msvc2013 and gcc4)
with rtdsc - ~15ns per block
with QueryPerformanceCounter - ~15ns per block
In our project we have millions of blocks per several seconds profiling session - that’s because we have chosen the last two timers.
We know that std::chrono has been fixed in msvc2015, but we have not tested yet. Also we have not tested std::chrono for gcc5.
I think we would add an option to CMakeLists for choosing the timer in release v1.2.0
If you wish to make use of that profiler you should take code straight from Atomic. Porting it would be very easy. Take a look at my PR to get idea what changes were needed to be done. Be aware that since PR code received some tweaks and fixes. In short: i replaced Profiler.h and Profiler.cpp, tweaked Object::SendEvent(), added easy_profiler third party dependency and added few other minor changes here and there.