Performance: Possible to do Texture GetData() Asynchronously?

For application-specific reasons, I must render to a texture and then retrieve that texture back into system memory. This is currently painfully slow (~10 fps) and leaves idle bubbles on both the CPU and the GPU. It would help if I could either use the GPU to handle the blitting or overlap the GetData call with the beginning of the next frame. Is either of these options viable in Urho3D with D3D9, D3D11, or OpenGL?

I noticed there is a project to implement this functionality as a plugin for Unity:

Not sure how well it works, but I was looking through the code to see if I could staple this into what I have in Urho3D as a starting point.

TL;DR: There is no async API for GAPI (graphics API) objects in Urho.

Try calling GetData at the beginning of the next frame, and remove or cache all redundant work inside it.

Thanks for your response. I didn’t quite follow this part: how would I identify which work is redundant, and how would I cache or remove it?

Also: is D3D11 the fastest way to run this on Windows, or should I try OpenGL or D3D9?

Your best option is to split it up. You’ll end up with a stream of readback tasks.

Fire off the readback (the copy into a staging texture) like any other graphics call.

Then do the actual Map and read later, so you aren’t forcing the CPU to wait for the GPU to finish copying everything into the staging texture’s CPU-accessible memory until you truly must. You can pass DO_NOT_WAIT (D3D11_MAP_FLAG_DO_NOT_WAIT in D3D11) to the Map call so it returns an error instead of blocking; then you simply try again later, once it no longer returns an error.
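A minimal sketch of the issue-now, map-later pattern, assuming a mock in place of real D3D11 objects so it runs anywhere. `MockStaging`, `IssueCopy`, `TryMap`, and the `gpuLatency` parameter are all hypothetical; the comments name the real D3D11 calls each step stands in for. The point is only the control flow: the copy is issued without waiting, and the map attempt never blocks.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Result of a non-blocking map attempt. In D3D11 the "still drawing" case is
// Map(..., D3D11_MAP_READ, D3D11_MAP_FLAG_DO_NOT_WAIT, ...) returning
// DXGI_ERROR_WAS_STILL_DRAWING.
enum class MapResult { Ok, WasStillDrawing };

// Hypothetical stand-in for a staging texture plus a simulated GPU that
// finishes the copy `gpuLatency` frames after it was issued.
struct MockStaging {
    std::vector<std::uint8_t> pixels;
    std::uint64_t readyFrame = 0;
    bool pending = false;
};

// Step 1: issue the copy and return immediately; the CPU does not wait.
// (Stands in for ID3D11DeviceContext::CopyResource into the staging texture.)
void IssueCopy(MockStaging& s, const std::vector<std::uint8_t>& src,
               std::uint64_t frame, std::uint64_t gpuLatency) {
    s.pixels = src;  // simulated GPU copy
    s.readyFrame = frame + gpuLatency;
    s.pending = true;
}

// Step 2: try to map. Never blocks; reports "still drawing" so the caller can
// retry on a later frame instead of stalling.
MapResult TryMap(MockStaging& s, std::uint64_t frame,
                 std::vector<std::uint8_t>& out) {
    assert(s.pending && "TryMap called without an issued copy");
    if (frame < s.readyFrame) return MapResult::WasStillDrawing;
    out = s.pixels;     // simulated memcpy out of the mapped memory
    s.pending = false;  // simulated Unmap; the staging texture can be reused
    return MapResult::Ok;
}
```

In a frame loop you would call TryMap once per frame and simply skip the readback when it reports WasStillDrawing, which is the "handle it again later" step described above.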

If you can’t wait a frame or two, you’re kind of sunk: you’ll either need to rework your code to tolerate the delay, or swallow the wait and settle for at least not blocking the whole time.

I linked the relevant line of code that causes a lot of delay (besides the GPU-CPU sync): CreateTexture2D, which creates a new staging texture on every GetData call.

I never tested it.

Okay, makes sense, and thanks for the clarification. As I understand it (I’m not a D3D person), the staging texture is necessary; certainly, when I attempt to copy directly from the texture I get an empty buffer.

With that in mind, I’m going to try rotating the staging textures and, of course, keeping them in a queue rather than creating them on the fly:

That makes sense, and I think it aligns with the approach described in the StackOverflow post: the resource copies are necessary, but apparently they can overlap with other GPU work if the staging buffers are used in sequence?

The simplest thing you can try is to create the staging texture once and keep it in the texture object itself, then call GetData at the beginning of the next frame. See if that makes things better.
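A minimal sketch of keeping the staging texture inside the texture object, assuming a mock instead of Urho3D's real Texture2D so it runs standalone. `StagingTexture`, `ReadableTexture`, `IssueCopy`, and the `g_stagingCreations` counter are hypothetical names; the mock only models object lifetime (one creation, many reuses), not GPU timing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Counts simulated CreateTexture2D calls (hypothetical instrumentation).
int g_stagingCreations = 0;

// Hypothetical mock of a CPU-readable staging texture (RGBA8).
struct StagingTexture {
    StagingTexture(int w, int h)
        : data(static_cast<std::size_t>(w) * static_cast<std::size_t>(h) * 4) {
        ++g_stagingCreations;  // stands in for the costly CreateTexture2D
    }
    std::vector<std::uint8_t> data;
};

// Hypothetical render-target wrapper that owns its staging texture, so it is
// created once and reused instead of being created and released per GetData.
class ReadableTexture {
public:
    ReadableTexture(int w, int h) : width_(w), height_(h) {}

    // End of frame N: stands in for CopyResource(render target -> staging).
    void IssueCopy(const std::vector<std::uint8_t>& rendered) {
        StagingTexture& s = Staging();
        assert(rendered.size() == s.data.size());
        s.data = rendered;
    }

    // Start of frame N+1: stands in for Map + memcpy + Unmap.
    void GetData(std::vector<std::uint8_t>& out) { out = Staging().data; }

private:
    // Lazily create the staging texture on first use, then reuse it.
    StagingTexture& Staging() {
        if (!staging_) staging_ = std::make_unique<StagingTexture>(width_, height_);
        return *staging_;
    }
    int width_, height_;
    std::unique_ptr<StagingTexture> staging_;
};
```

Calling GetData before IssueCopy in each frame means every read targets the copy issued one frame earlier, which gives the GPU a full frame to finish it.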

Alright, using a ring buffer of staging textures was good enough to get me from 60% to 98% GPU utilization. Not sure which of you to mark as the solution; thank you both for your help on this!
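The ring-buffer approach that resolved this thread can be sketched as follows, assuming a mock slot in place of a real D3D11 staging texture. `ReadbackRing`, `Readback`, and `Advance` are hypothetical names, and this is not the poster's actual code. Each frame the newest render is copied into slot `frame % N` while the oldest slot, written N-1 frames earlier, is read back, so readback data arrives N-1 frames late.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Pixels read back from a slot, tagged with the frame that produced them.
struct Readback {
    std::uint64_t sourceFrame;
    std::vector<std::uint8_t> pixels;
};

// Hypothetical ring of N staging slots; no textures are created per frame.
template <std::size_t N>
class ReadbackRing {
    static_assert(N >= 2, "need at least two slots to overlap copy and read");

public:
    // Returns the frame captured N-1 frames ago, or nothing while warming up.
    std::optional<Readback> Advance(std::uint64_t frame,
                                    const std::vector<std::uint8_t>& rendered) {
        // 1. Issue this frame's copy (stands in for CopyResource into slot).
        Slot& target = slots_[frame % N];
        target.pixels = rendered;
        target.frame = frame;
        target.valid = true;

        // 2. Read the oldest slot, (frame - (N - 1)) % N == (frame + 1) % N
        //    (stands in for Map/memcpy/Unmap on that staging texture).
        const Slot& oldest = slots_[(frame + 1) % N];
        if (!oldest.valid) return std::nullopt;
        return Readback{oldest.frame, oldest.pixels};
    }

private:
    struct Slot {
        std::vector<std::uint8_t> pixels;
        std::uint64_t frame = 0;
        bool valid = false;
    };
    std::array<Slot, N> slots_{};
};
```

With N = 3, frames 0 and 1 return nothing and frame f then returns frame f - 2. More slots give the GPU more time to finish each copy at the cost of more latency, which is the "wait a frame or two" trade-off mentioned earlier; on real hardware you would still map the oldest slot with DO_NOT_WAIT in case it is not done yet.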