Timespy is an apples-to-apples comparison where the only variable being changed is async compute. Some people are expecting 20-30% gains with async compute on GCN hardware because they are so used to seeing those gains in benchmarks that compare DX11 to DX12 performance. Typically, going from DX11 to DX12 alone (without even using async compute) is responsible for a lot of the performance gain to begin with on GCN. Hitman is one such example, where people are (not surprisingly) shocked to find out that async compute is only responsible for 5-10% of the performance gains. If anything, Timespy gets better async compute results out of GCN than the best real-world implementations so far.

Async compute, which has been used for SSAA (screen space anti-aliasing), SSAO (screen space ambient occlusion) and the calculation of light tiles in HITMAN, was also "super hard" to tune according to IO Interactive; too much async work can even turn into a penalty, and on top of that the PC has lots of different configurations that need tuning. It's quite surprising to read that even AMD cards merely got a 5-10% performance boost, especially after AMD endorsed HITMAN's implementation as the best one yet.

Even in Ashes of the Singularity, which has been at the center of the whole DirectX 12 and async compute debate for quite a few months, the developers have confirmed that async compute is a modest performance increase compared to other factors in their game. From an Oxide developer: "Saying that Multi-Engine (aka Async Compute) is the root of performance increases on Ashes between DX11 to DX12 on AMD is definitely not true. Most of the performance gains in AMD's case are due to CPU driver overhead reductions."

I also didn't see anywhere a discussion of the CUs being able to execute separate workloads at the same time. It also buries the notion that 3DMark isn't fully utilizing GCN hardware, when it is the hardware itself that decides which Shader Engine to use for which task (granted, 3DMark could mess around with queue priorities, but then that would impact Nvidia as well). Based on the white paper, you can take it one of two ways: 1 - two separate queues can access the resources of a single Shader Engine simultaneously, or 2 - two separate queues can work simultaneously using different Shader Engines (see the D3D12 two-queue sketch at the end of this post). I'm inclined to believe it's the latter (otherwise they would use the word "interlaced"), which explains why the white paper also discusses context switches and why they should not be done too often. Why would it have context switching in the first place if a CU were able to run render and compute workloads at once, as you claim? It's not able to, and that's why context switching exists. Point me to where it says, black on white, that a CU can execute render and compute workloads at the same time.

We are talking milliseconds, but that is all it takes to miss a frame refresh (at 60 Hz the entire frame budget is only about 16.7 ms).

You're talking a few tens of microseconds, actually, to send the commands for pre-emption, according to Nvidia's DX12 optimization guide for Maxwell. The context switching is only at the driver level for Maxwell, and there is a ton of CPU headroom left on Intel processors. Heck, for data marshaling there's a metric ton of headroom left: marshaling is currently done using SISD instructions. To put it into perspective, why move 4 bytes at a time when you could move 32 of them (AVX2)?
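As an aside on that marshaling remark, here is a minimal, self-contained sketch (not taken from any driver or engine mentioned here) contrasting a word-at-a-time scalar copy with a 32-byte-at-a-time AVX2 copy; the function names and the assumption that the element count is a multiple of 8 are purely illustrative.

```cpp
// Hypothetical illustration of SISD vs. AVX2 data movement.
// Requires a CPU and compiler with AVX2 support (e.g. -mavx2 or /arch:AVX2).
#include <cstdint>
#include <cstddef>
#include <immintrin.h>

// Scalar path: one 32-bit word (4 bytes) per iteration.
void copy_scalar(uint32_t* dst, const uint32_t* src, size_t count) {
    for (size_t i = 0; i < count; ++i)
        dst[i] = src[i];
}

// AVX2 path: one 256-bit register (32 bytes, i.e. eight 32-bit words) per iteration.
// Assumes count is a multiple of 8 for brevity; a real routine would handle the
// tail elements with a scalar loop.
void copy_avx2(uint32_t* dst, const uint32_t* src, size_t count) {
    for (size_t i = 0; i < count; i += 8) {
        __m256i v = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(src + i));
        _mm256_storeu_si256(reinterpret_cast<__m256i*>(dst + i), v);
    }
}
```

In practice a plain memcpy or the compiler's auto-vectorizer will often emit similarly wide moves on its own; the point of the exchange above is simply that scalar, 4-bytes-at-a-time marshaling leaves a lot of CPU throughput unused.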
However, I think you forget that with Maxwell and Pascal this should only apply in cases where there is sufficient CPU horsepower to overcome the extra strain that driver-level context switching and merging poses. An i3 paired with a 1060 (roughly a 970 equivalent) would see more latency in the rendering pipeline, due to driver overhead and strained CPU resources, than an i5 or i7 would.

No, it has the same strain, just fewer resources with which to address it; but I'm not going to dive into my disdain for game and engine developers' weakness in HPC concepts.
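For readers who want to see what "Multi-Engine" (async compute) means at the API level, here is a minimal, hypothetical D3D12 sketch of the two-queue setup argued about above: a direct (graphics) queue plus a separate compute queue, ordered with a fence. It assumes an already created ID3D12Device and pre-recorded command lists, and omits error handling; whether the GPU actually executes the two queues concurrently, and how much that helps, is exactly the hardware question debated in this thread.

```cpp
// Minimal sketch: submitting work on a direct queue and a compute queue in D3D12.
#include <d3d12.h>
#include <wrl.h>
using Microsoft::WRL::ComPtr;

void SubmitAsyncCompute(ID3D12Device* device,
                        ID3D12CommandList* computeWork,
                        ID3D12CommandList* graphicsWork)
{
    ComPtr<ID3D12CommandQueue> gfxQueue, computeQueue;

    D3D12_COMMAND_QUEUE_DESC gfxDesc = {};
    gfxDesc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;    // graphics + compute + copy
    device->CreateCommandQueue(&gfxDesc, IID_PPV_ARGS(&gfxQueue));

    D3D12_COMMAND_QUEUE_DESC compDesc = {};
    compDesc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;  // compute + copy only
    device->CreateCommandQueue(&compDesc, IID_PPV_ARGS(&computeQueue));

    // Fence used to order the two queues only where they actually depend on each other.
    ComPtr<ID3D12Fence> fence;
    device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&fence));

    // Kick off the compute work on its own queue...
    computeQueue->ExecuteCommandLists(1, &computeWork);
    computeQueue->Signal(fence.Get(), 1);

    // ...and make the graphics queue wait for it only at the point of dependency.
    gfxQueue->Wait(fence.Get(), 1);
    gfxQueue->ExecuteCommandLists(1, &graphicsWork);
}
```

In a real renderer the queues would be created once at startup and the fence values would increment each frame; the takeaway is just that async compute is opt-in at submission time, while the actual overlap (or lack of it) is decided by the hardware and driver.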