For application-specific reasons I must render to a texture and then read that texture back into system memory. This is currently painfully slow (~10 fps) and results in CPU and GPU idle bubbles. It would help if I could either use the GPU to handle the blitting or overlap the GetData with the beginning of the next frame. Is either of these options viable in Urho3D with D3D9 / D3D11 or OpenGL?
I noticed there is a project to implement this functionality as a plugin for Unity:
Your best possible situation is to split it up. You’ll end up with a stream of readback-tasks.
Fire off the readback like it was any other graphics-call.
Then do the actual map and read later, so you aren’t forcing the CPU to wait for the GPU to finish copying everything into the staging texture’s CPU-accessible memory until you truly must. You can pass DO_NOT_WAIT (D3D11_MAP_FLAG_DO_NOT_WAIT in D3D11) in the map call so that it returns an error (DXGI_ERROR_WAS_STILL_DRAWING) instead of blocking; you then try the map again later until it no longer returns that error.
If you can’t wait a frame or two you’re kind of sunk, so you’ll either need to rework your stuff to understand that there’s a delay, or swallow the wait and settle for at least not blocking the whole time.
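To make the "stream of readback-tasks" idea concrete, here is a rough sketch of just the scheduling logic, with no real graphics calls in it. Everything named here (FakeStagingTexture, ReadbackRing, the fixed frame latency) is made up for illustration and is not Urho3D or D3D API. In a real D3D11 build, issueCopy would be where you call CopyResource into a staging texture, and tryRead would be where you call Map with D3D11_MAP_READ and D3D11_MAP_FLAG_DO_NOT_WAIT, treating DXGI_ERROR_WAS_STILL_DRAWING as "not ready yet".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical stand-in for one staging texture. The "GPU" is simulated:
// a copy issued on frame F is assumed to be readable on frame F + latency.
struct FakeStagingTexture {
    std::uint64_t readyFrame = 0;  // frame on which the simulated copy finishes
    int payload = 0;               // stands in for the pixel data
};

// Fixed ring of staging slots plus a FIFO of copies still in flight.
class ReadbackRing {
public:
    ReadbackRing(std::size_t slotCount, std::uint64_t latencyFrames)
        : slots_(slotCount), latency_(latencyFrames), next_(0) {
        assert(slotCount > 0);
    }

    // Queue a copy into the next slot (real code: CopyResource here).
    // Returns false when every slot is still in flight, so the caller can
    // skip this frame's readback instead of stalling.
    bool issueCopy(std::uint64_t frame, int payload) {
        if (pending_.size() == slots_.size()) return false;
        slots_[next_].readyFrame = frame + latency_;
        slots_[next_].payload = payload;
        pending_.push_back(next_);
        next_ = (next_ + 1) % slots_.size();
        return true;
    }

    // Non-blocking poll of the oldest pending copy (real code: Map with
    // DO_NOT_WAIT here). Returns false if nothing is ready yet.
    bool tryRead(std::uint64_t frame, int& out) {
        if (pending_.empty()) return false;
        const FakeStagingTexture& slot = slots_[pending_.front()];
        if (frame < slot.readyFrame) return false;  // "still drawing"
        out = slot.payload;
        pending_.pop_front();  // slot is free for reuse from now on
        return true;
    }

    std::size_t inFlight() const { return pending_.size(); }

private:
    std::vector<FakeStagingTexture> slots_;
    std::deque<std::size_t> pending_;  // slot indices, oldest first
    std::uint64_t latency_;
    std::size_t next_;
};
```

Each frame you would call issueCopy once after rendering and tryRead once (or in a loop) before it, so the data you consume is a frame or two old. With three slots and a simulated two-frame latency, the first result shows up on frame 2 and the ring never has to block.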
Okay, that makes sense, and thanks for the clarification. As I understand it (I'm not a D3D person), the staging texture is required; certainly when I attempt to copy directly from the texture I get an empty buffer.
With reference to this I am going to try rotating the staging textures and of course keeping them in a queue rather than creating them on the fly:
Makes sense, and I think this aligns with the approach listed in the StackOverflow post: the CopyResource calls are necessary, but apparently they can overlap with other GPU work if the staging buffers are used in sequence?