The question here is: Does calling malloc twice have any measurable impact on performance? Or do other costs outweigh the allocation itself?
If this is a concern, then I have two ideas. The first is to inline refs_ and weakRefs_ into RefCounted and get rid of RefCount entirely. For weak refs to still work, the conditions for when the object is destructed and freed would have to change: call the destructor when refs_ reaches zero, and free the memory when weakRefs_ reaches zero. The advantages are a single malloc call and refcounts located right next to the object itself, which improves cache locality. The disadvantages are the additional delete logic, having to overload operator new, and the fact that the object's memory stays allocated as long as weak references point to it.
Another idea might be to have a memory pool for RefCount objects. This would improve allocation speed but the refcount would be located far away from the object in memory, which is bad news for the cache whenever you modify the refcounts.
I’d like to hear your thoughts. Maybe this whole thing is also not an issue.
Either way you go, something is lost: threading issues with memory pools, memory issues with moving the count into the class itself.
And if you try a shared-pointer-style approach, storing the counter and instance in the same structure so they sit near each other, you end up “leaking” memory: as long as a weak reference remains, you can’t release the counter, and therefore you can’t release the memory for the actual instance either. Not to mention a few other inconveniences.
Either way it gets nasty. That’s why shared_ptr and similar smart pointers use a counter that is allocated separately. And ReferenceCounted is no different.
If you do have a memory allocator that can allocate the counter near the instance, you could just override the new operator to replace the default behavior. That should reduce the number of changes needed in existing code.
Good point. Maybe it’s possible to do as @S.L.C said, somehow allocate RefCount near the object being refcounted. Or perhaps even allocate a larger block of memory to fit both, and construct them next to each other?
In the grand scheme of things, this probably won’t matter too much for two reasons: 1) weak references aren’t that common and 2) the refcounted objects all have a relatively small memory footprint. I can’t think of any single Urho class that requires over, say, 1kB of memory when allocated.
I’m just thinking out loud here. I’ll have to actually do some measurements.
I think cache misses and memory fragmentation can lead to performance issues and should not be neglected. On top of that, proper profiling of core mechanics is hard to achieve, so it’s very difficult to assess the impact.
There might be another way that avoids both the double allocation and the non-contiguous memory, without detaching the destructor from the deallocation.
Instead of a weak reference counter, we can have a weak reference doubly linked list (each weak_ptr carrying its node links), with the head kept in RefCounted. The penalty is a tail insert at weak reference creation, a node removal when the weak reference is broken, and a list traversal at object destruction (when strong refs go to zero) to invalidate the existing weak refs.
The assumption is that weak references are used just occasionally to break circular dependencies and that most objects don’t have weak refs to them.