Sunday, January 30, 2022

The State of the CPU Market, early 2022

It's been a year of shakeups in the high-end CPU market, what with catastrophic supply chain shortages, the rise of AMD, and Pat Gelsinger's spearheading of Intel's return to competency. Now that the dust has mostly settled, it's interesting to look at what's hot, and what's not.

The 5950X is still king...

Alder Lake i9 really gives AMD a run for its money, but the 5950X is still the king of workstation CPUs, especially now that you can actually buy it. It has aggressive boost clocks which allow it to beat every Skylake and Cascade Lake Xeon Platinum (including the 28-core flagships), a consistent internal design which scales well in every application, a consistent and manageable 142W power limit, and bonus server features like ECC support. Alder Lake (ADL) is good, but the 250W PL2 really kills it for workstation use (a good 12900K build requires serious effort to get right), and because of the heterogeneous internal layout it fails to scale on some operating systems and in some applications.

Scaling is really important in this era of 16, 32, and 64-core processors; many applications completely fail to scale past 16 cores, and even those that do exhibit much less than linear returns. As a result, those 16 highly clocked cores punch above their weight when it comes to real-world results - a 4.x GHz Zen 3 core isn't actually twice as fast as a 2.8 GHz Skylake core, but 16 4.x GHz Zen 3 cores can still outperform 28 Skylake cores because the 12 extra cores are doing less work.
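The trade-off above can be roughly sketched with Amdahl's law. This is only an illustration: the function name, the 1.5x per-core speed advantage, and the parallel fractions are assumptions picked for the example, not measurements of any real application.

```python
def relative_throughput(cores, per_core_speed, parallel_fraction):
    """Throughput relative to one baseline core, using Amdahl's law.

    per_core_speed is the speed of each core relative to the baseline core
    (assumed to apply equally to the serial and parallel parts).
    """
    serial_fraction = 1.0 - parallel_fraction
    speedup = 1.0 / (serial_fraction + parallel_fraction / cores)
    return per_core_speed * speedup


# Assumption: a fast core is 1.5x a slow core (less than the ~2x clock
# ratio might suggest, in line with the text above).
FAST_CORE_SPEED = 1.5

for p in (1.0, 0.99, 0.95, 0.90):
    fast = relative_throughput(16, FAST_CORE_SPEED, p)
    slow = relative_throughput(28, 1.0, p)
    print(f"parallel fraction {p:.2f}: "
          f"16 fast cores = {fast:5.2f}, 28 slow cores = {slow:5.2f}")
```

Under these assumed numbers, the 28-core part only comes out ahead when the workload is almost perfectly parallel; once a few percent of the work is serial, the 16 faster cores win.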

...Except when it's not

The elephant in the room, of course, is single-threaded performance. Alder Lake outruns Zen 3 by about 20%, which is an enormous leap in x86 performance. For all intents and purposes, ADL is an 8-core CPU with 8 bonus cores that you can't really rely on. If you commit to the 8-core life (which encompasses a lot of applications), Alder Lake suddenly looks a lot more enticing, because 8 Golden Cove cores are 20% faster than 8 Zen 3 cores.

Of course, part of the reason ADL can do this is that Intel 10 ESF (marketed as "Intel 7") is a high performance node designed to scale to aggressive clocks at high voltages, while TSMC N7 is a SoC node designed for lower clocks and voltages. The price you pay is that those 8 Golden Cove cores draw twice as much power as 8 Zen 3 cores to perform 20% more work, which isn't very good engineering.
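To put the efficiency gap in one number, here is a small sketch of the arithmetic. It uses only the rough figures quoted above (20% more work at twice the power); the function and variable names are made up for the example.

```python
def perf_per_watt(relative_perf, relative_power):
    """Efficiency relative to a baseline with perf = 1.0 and power = 1.0."""
    return relative_perf / relative_power


# Baseline: 8 Zen 3 cores. Figures for Golden Cove are the rough ones
# quoted in the text (1.2x the work, 2x the power), not measurements.
zen3_efficiency = perf_per_watt(1.0, 1.0)
golden_cove_efficiency = perf_per_watt(1.2, 2.0)

ratio = golden_cove_efficiency / zen3_efficiency
print(f"Golden Cove perf/W relative to Zen 3: {ratio:.2f}")
```

In other words, by these figures the 8 Golden Cove cores deliver about 60% of the work per watt of 8 Zen 3 cores.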

In the end, Zen 3 and Alder Lake are mostly complementary products. If your workflow is interactive content creation, gaming, or design work, ADL is right for you. If you're building a machine mostly to handle long renders and simulations, the 5950X is the best processor under $2000 for the job.

What about HEDT?

HEDT is a curious thing. In the beginning, desktop and server parts were cut from the same silicon. Starting with Nehalem, Intel experimented with bifurcating their designs into laptop (Lynnfield) and server (Gulftown) variants with rather drastically differing designs - Lynnfield was a SoC with an on-die PCIe root complex offering 16 lanes, Gulftown was a traditional design with an off-package PCIe controller offering 48 lanes. The bifurcation makes sense - laptops rarely need more than 16 PCIe lanes, whereas servers need dozens, or even hundreds, of lanes for accelerators, storage, and networking. The bifurcation really took off starting with Sandy Bridge; Intel aggressively marketed the 2600K, which was cut from the mobile silicon. Sandy Bridge-E, the server-based variant, filled a niche, but the platform was expensive and, to top it off, unlocked processors based on the full 8-core SB-EP die were never released.

Since then, HEDT has come and gone - it hit a real low during the Broadwell-EP era, but experienced a resurgence with Skylake-X, which competed against the dubious-and-not-really-recommendable Threadripper 1 and 2. Unfortunately, Ice Lake-SP gives off Broadwell-EP vibes - namely, the process it is built on does not have the frequency headroom required to make a compelling desktop platform. This leaves AMD relatively unchallenged in the high-end space:
• The 24-core 3960X is currently a dubious choice over the 5950X - supply is poor, power consumption is high, and performance is not that much better. If you need balanced performance with good PCIe it's not a bad choice, but there are cheaper (Skylake-X, used Skylake-SP) and faster (the other Threadrippers) offerings in the category.
• The 32-core 3970X is a good processor for most applications. Thanks to the blessing (or curse) of multicore scaling, it comes within striking distance of the 3990X in most applications at half the price, while offering the full suite of Threadripper features.
• The 64-core behemoth 3990X is...not a very good choice, mostly due to extreme pricing ("$1 per X") and really bad scaling. Fortunately, it wields a very competent implementation of turbo, so it is never slower than the 3970X.
• Threadripper Pro ("Epyc-W") is everything you've ever wanted, but is expensive and platform options are limited.
There are also a few interesting choices in the server space, with the usual caveats (long POST times, no audio, no RGB):
• Dual Rome or Milan 64-core processors offer unmatched multithreaded performance, but not much can take advantage of 256 threads.
• Dual 32-core Epycs are an interesting choice, offering performance comparable to a 3990X but with four times the aggregate memory bandwidth for all your sparse matrix needs.
• Dual low-end Ice Lake (e.g. 5320, 6330) offers AVX-512 support and high memory bandwidth at a price and performance comparable to those of a 3990X, and may be easier to find in stock. Unfortunately, 2P ICL motherboards are rather expensive.
As far as used options go, Haswell-EP and older are finally ready to retire, unless you really need RDIMM support. A pair of 14-core Haswell processors performs worse than a 5950X at twice the power, with all the caveats of 2P platforms attached. Otherwise:
• Dual Skylake-SP is an OK choice, simply because Skylake Xeons are entering liquidation and Epyc Rome is not. Technologically, Skylake has no redeeming features over Rome, but the fact that you can pick up a pair of 24-core Platinums for slightly more than $1000 is interesting. It's worth noting that only the 2P configuration is interesting; 1P Xeon is generally slower than a 5950X.
• Epyc Naples is bad. Don't do it. Threadripper 1 falls in the same category; the only time you'd consider either of these is if you found a motherboard in the trash or something.

Summary

"My application scales indefinitely with core count"

No, it doesn't. But for this class of trivially-parallelizable application (rendering, map/reduce, dense matrix), the 5950X is a safe bet. The most extreme cases can benefit from one of the high core count platforms (Threadripper, Epyc, Xeon Scalable), but be careful to benchmark the applications first - the 5950X wields a considerable clock speed advantage over the enterprise platforms, which often swings things in its favor.

"I only need 8 cores"

The 12700K is probably your friend here; it's strictly faster than the 5800X. This category encompasses most content creation and all of CAD (minus simulation).

"Give me PCIe or give me death!"

This encompasses all of machine learning, plus anything which streams data in and out of nonvolatile storage. The 3960X is perfect for you, but in case it's out of stock (which it probably is), the winner is...the 10980XE, which is fast enough to feed your accelerators and generally available. Of course, die-hard accelerator enthusiasts are going to look to more exotic platforms, and there, the platform dictates the choice of CPU.

"I'm out of RAM"

If your application requires more than 256GB of memory, Epyc-W is the CPU for you. Unfortunately, it is rather expensive, so the second place prize, and the bang-for-the-buck prize, goes to a pair of used 24-core Xeon Scalable processors, which gets you pretty darn close to Epyc-W for $1200.