"ARCTURUS WILL BE BEAST!! So, I had predicted that Navi10 is going to be a tiny super efficient chip that is going to form the basis of the chiplet architecture. Looking that the layout of RDNA it seems that that may indeed be the case. Make no mistake, MCM on a GPU is completely different than on a CPU because stream-processors are like thousands of threads that needs to work coherently towards a single output, vs a CPU where threads can be assigned asynchronously. However here are the key differences I noted in the RDNA architecture vs GCN:
1) Multi-level cache, just like Zen 2
2) Double the load bandwidth compared to Vega!
3) It's similar to Bulldozer, but that's a benefit for GPU efficiency given that it's very dedicated to gaming, i.e. rendering and shading
4) Designed for higher frequencies (just like Bulldozer)
5) Much lower latency
6) 4x Asynchronous Compute Engines, one per shader complex
7) Low power consumption
8) 1 clock cycle per instruction vs 4 cycles for Vega. Very important if scaling out with MCM
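To put a number on point 8, here is a rough back-of-the-envelope sketch. It is my own illustration, not anything from AMD: the function name `issue_cycles` is made up, and the wavefront and SIMD widths (64 threads on 16-wide SIMDs for GCN/Vega, 32 threads on 32-wide SIMDs for RDNA) are the published figures as far as I know.

```python
import math

def issue_cycles(wave_size: int, simd_width: int) -> int:
    """Cycles one SIMD needs to push a whole wavefront through one instruction.

    Simplified model: the SIMD processes `simd_width` threads per clock,
    so a wavefront of `wave_size` threads takes ceil(wave_size / simd_width) clocks.
    """
    return math.ceil(wave_size / simd_width)

# Assumed figures: GCN/Vega runs 64-thread wavefronts on 16-lane SIMDs,
# RDNA runs 32-thread wavefronts on 32-lane SIMDs.
gcn = issue_cycles(wave_size=64, simd_width=16)
rdna = issue_cycles(wave_size=32, simd_width=32)

print(f"GCN/Vega: {gcn} cycles per wavefront instruction")
print(f"RDNA:     {rdna} cycle per wavefront instruction")
```

This ignores everything else in the pipeline (scheduling, memory stalls, dual-issue), so treat it only as the arithmetic behind the "1 vs 4 cycles" claim.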
Looking at the die layout, it seems like the memory controllers together with the L2 cache (since L1 is now on the shader complex), Geometry Engine, Command Processor, Asynchronous Compute Engines, Multimedia Engine, Display Engine and PCIe 4.0 circuitry would all become the I/O die. Note that this does not apply to Radeon VII or MI50/60, since those are compute cards and rely on Infinity Fabric, High Bandwidth Cache Controllers and memory architectures that are entirely different from those of gaming cards.
Therefore, my best guess is that Navi10 has successfully dethroned the RTX 2070; that much is 100% evident. In fact, this erases Nvidia's margins and forces price drops. Navi is TINY and AMD retains plenty of margin even in a price war. Whatever 'Super' Nvidia releases is meaningless: if you have a 1440p Adaptive-Sync monitor, Navi is simply the best ROI. Nvidia's game is pure marketing strategy whereas Navi is pure technical.
Navi, when updated to a chiplet design (which might become Arcturus), will be quite high-clocked. There will be Infinity Fabric connecting to the I/O die, just like Zen 2. This is the reason Zen engineers were working with the Radeon team. The I/O die would obviously use HBM to feed the chiplets. The I/O die could also include ray tracing: since it's already doing the geometry, why not perform the RT before sending the data onwards for rendering/shading? We could see Navi 20 having 80 or more CUs, since the final rendering will be spread across the now chiplet-shaders (which have L1 caches just like Zen 2), and the output will then be assembled back in the I/O die, which has the beefier L2 cache and the Display Engine for the final output. RT won't even be an issue since there's plenty of horsepower with the chiplet scale-out design. This architecture can even scale via Crossfire over PCIe 4.0 since only one I/O die is the master. It could also be on 7nm EUV, which would be genius!
Meanwhile, Frankenstein $nvda Turing will need a complete overhaul to achieve anything close to this. It was a gimmick from the start, a tactic to hold the gaming community hostage, something that Nvidia suckers continue to have Stockholm syndrome over. Turing by itself cannot even be compatible with 7nm because it's a huge monolithic design. I haven't seen what Ampere looks like, but based on the lack/withholding of information I suspect that Nvidia built the wrong architecture for the assumed node and additional tweaking is required.
I hold Nvidia in much higher regard than Intel, btw. Nvidia has some really good engineers. However, AMD has an ace executive team that has envisioned the future far better than Jensen.
Arcturus will be a beast! This is possibly why AMD is winning all the cloud and console designs. For compute you don't really need an I/O die; the I/O die is the Epyc Rome connected via Infinity Fabric, with over a hundred PCIe 4.0 lanes per socket, which is why AMD won the exascale supercomputing design with Cray." |