Friday, September 16, 2022

The best little telescope that'll never get built

I really love the IMX183 from Sony. For a very good price, you get 20 million BSI pixels with photon-counting levels of read noise and negligible dark current - truly a miracle of the economies of scale. The sensor is also large enough to have good light-gathering capabilities, yet small enough to work with compact optics.

The problem is taking advantage of all 20 million of those pixels. 20 MP isn't that much - a 5400-pixel horizontal resolution leaves you precious little room to crop to 4K, so actually resolving all 20 MP is important. You can't really buy a telescope that resolves 2.4um pixels across the whole field - in the center, anything diffraction-limited will work, but if you want a fast, wide system, good performance in the corners isn't happening. Obviously, the answer is to design your own telescope.
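A quick back-of-the-envelope check of that crop headroom, as a sketch: it takes the ~5400 px width and 2.4um pitch from above and assumes a 3840 px wide 4K UHD delivery frame.

```python
# Back-of-the-envelope crop headroom for 4K UHD output from a ~20 MP sensor.
SENSOR_WIDTH_PX = 5400   # approximate horizontal resolution (from the text)
PIXEL_PITCH_UM = 2.4     # IMX183 pixel pitch
UHD_WIDTH_PX = 3840      # assumed delivery format: 4K UHD


def crop_headroom(sensor_px: int, output_px: int) -> float:
    """Linear zoom factor available before the output frame has to be upsampled."""
    return sensor_px / output_px


headroom = crop_headroom(SENSOR_WIDTH_PX, UHD_WIDTH_PX)
area_fraction = 1 / headroom**2  # share of the sensor area a full-res 4K crop uses
sensor_width_mm = SENSOR_WIDTH_PX * PIXEL_PITCH_UM / 1000

print(f"linear crop headroom: {headroom:.2f}x")
print(f"4K crop uses {area_fraction:.0%} of the sensor area")
print(f"sensor width: {sensor_width_mm:.2f} mm")
```

This works out to about 1.41x of linear headroom, so a full-resolution 4K crop uses roughly half the sensor area; any blur spread over more than a pixel or so eats directly into that margin.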

Now, I have zero interest in making telescopes - grinding your own lenses is ass and generally a poor use of time. If I wanted to spend time doing repetitive tasks, I'd take up embroidery or something. As such, the system is designed to be reasonably manufacturable by an overseas vendor.

The telescope is a Houghton-type design with an integrated field flattener, a focal length of 150mm, an entrance pupil of 76mm (3"), and a 6-degree field of view covering a sensor with a 16mm diagonal. The optical performance is pristine:

Normally, I would not vouch for a form like this - the telescope is much longer than its focal length, the image plane is inside the tube, and two full-aperture refractive elements are needed. However, in this case, we are building a small, tightly-integrated system: polishing a 3" lens is easy, the difference between a 6" and 10" OTA is negligible from a practical standpoint, and the IMX183 can easily be fit into the tube.

The glasses aren't great - the automatic glass substitution tool came up with BSM28 and BSM9, which are moderately costly, infrequently melted glasses. Their chemical durability is poor, roughly on par with ED glass, but 77mm clear filters are cheap and readily available, so the exposed element can sit behind one. The elements don't have any serious manufacturing red flags, though the curvatures are a bit steep compared to your usual refractor doublet.
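The first-order numbers behind the spec (150mm focal length, 76mm pupil, 6-degree field, 2.4um pixels) can be sanity-checked with a few lines of arithmetic. A sketch, assuming a 550nm design wavelength and using the standard Airy first-dark-ring radius of 1.22 x wavelength x f-number:

```python
import math

FOCAL_LENGTH_MM = 150.0
APERTURE_MM = 76.0
FULL_FIELD_DEG = 6.0
PIXEL_PITCH_UM = 2.4
WAVELENGTH_UM = 0.55          # assumption: green design wavelength
ARCSEC_PER_RADIAN = 206265.0

f_number = FOCAL_LENGTH_MM / APERTURE_MM
# Diameter of the image circle for the full field angle.
image_circle_mm = 2 * FOCAL_LENGTH_MM * math.tan(math.radians(FULL_FIELD_DEG / 2))
# Angular size of one pixel on the sky (small-angle approximation).
arcsec_per_pixel = ARCSEC_PER_RADIAN * (PIXEL_PITCH_UM / 1000) / FOCAL_LENGTH_MM
# Radius of the first dark ring of the Airy pattern in the image plane.
airy_radius_um = 1.22 * WAVELENGTH_UM * f_number

print(f"f/{f_number:.2f}")
print(f"image circle: {image_circle_mm:.2f} mm")
print(f"plate scale: {arcsec_per_pixel:.2f} arcsec/pixel")
print(f"Airy radius: {airy_radius_um:.2f} um ({airy_radius_um / PIXEL_PITCH_UM:.2f} px)")
```

At roughly f/2 the Airy core radius comes out around 1.3um, well under one 2.4um pixel, and the ~15.7mm image circle lines up with the 16mm sensor diagonal, so a diffraction-limited design of this form ends up pixel-limited rather than optics-limited.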

Sunday, March 20, 2022

Image a DLP down through a microscope objective and look at it with a camera through the same objective


For some reason it's taken me a while to actually do this, but it's easy to expose high-resolution film or resist with a DLP. All you need is a camera, a beam splitter, a DLP, and some microscope parts.

The DLP is an ordinary DLP projector with the lens removed. I used a TI dev kit because the firmware offers nice features like "disable the LEDs", which can be used to, for example, replace the blue LED with a violet one for exposing resist. Unless you have a really exotic objective, it is probably best to use a g-line (~430-436nm) LED rather than a 405nm LED - parts at that wavelength are hard to find, but objectives will be quite dispersive in the far violet unless they are specifically designed to be apochromatic from violet to red. Your best bet is probably to leave the green LED in place for focusing and replace only the blue LED with a violet one. A regular pico projector will work just fine but is less convenient to work with. You do save about $800 at current street prices, though.

The light from the DLP reflects off a beam splitter and passes through the tube lens, which is just a 200mm a(po)chromat. A 200mm doublet would probably work fine here; you can get better corner performance from a triplet or a quadruplet, but the DLP is not very large, so it might not matter. The tube lens collimates the light and sends it down the microscope objective, which focuses it onto the target.
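The size of one projected DLP pixel at the target is just the micromirror pitch divided by the objective's magnification, and that magnification is the tube lens focal length over the objective focal length. A sketch, assuming a hypothetical 5.4um micromirror pitch and objectives designed for a 200mm tube lens (example values, not necessarily the hardware used here):

```python
TUBE_LENS_MM = 200.0  # tube lens focal length (from the text)


def projected_pixel_um(mirror_pitch_um: float, objective_f_mm: float,
                       tube_f_mm: float = TUBE_LENS_MM) -> float:
    """Size of one DLP micromirror imaged onto the target plane."""
    magnification = tube_f_mm / objective_f_mm  # nominal only if the tube length matches the design
    return mirror_pitch_um / magnification


MIRROR_PITCH_UM = 5.4  # hypothetical example pitch

# Hypothetical objectives: a 10x (f = 20mm) and a 20x (f = 10mm) on a 200mm tube lens.
for label, f_obj in [("10x, f=20mm", 20.0), ("20x, f=10mm", 10.0)]:
    print(f"{label}: {projected_pixel_um(MIRROR_PITCH_UM, f_obj):.3f} um per DLP pixel")
```

Under these assumptions a 10x objective puts each mirror at about 0.54um on the target, which is why even a modest objective gets you into sub-micron features.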

The camera looks down the other beam splitter path and sees the light reflected off the target. Unfortunately, this technique only works with optically smooth targets; otherwise, the camera sees the surface texture of the target rather than the pattern projected from the DLP.

Parfocalizing the camera and the DLP is easier than it seems: the object side of the tube lens is something like f/50, so the depth of focus is very large. Roughly speaking, there is only a small range of distances where the objective comes into focus at all. Either by reading the datasheet or by trial and error, it is possible to roughly set the backfocus; then the camera and target position are adjusted for best focus. Once a starting position is found, the backfocus can be adjusted to minimize spherical aberration (microscope objectives gain some spherical aberration if the conjugates aren't the ones they were designed for).
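To put numbers on why f/50 is so forgiving, one common rule of thumb for total diffraction-limited depth of focus is wavelength / NA^2, with NA roughly 1/(2N) for a cone in air. A sketch, assuming a 550nm focusing LED and a hypothetical 0.3 NA objective for comparison:

```python
def depth_of_focus_um(wavelength_um: float, na: float) -> float:
    """Total diffraction-limited depth of focus, using the rule of thumb lambda / NA^2."""
    return wavelength_um / na**2


def na_from_f_number(f_number: float) -> float:
    """Paraxial numerical aperture of a cone in air: NA ~ 1 / (2N)."""
    return 1.0 / (2.0 * f_number)


WAVELENGTH_UM = 0.55  # assumption: green focusing LED

tube_side = depth_of_focus_um(WAVELENGTH_UM, na_from_f_number(50))  # f/50, from the text
target_side = depth_of_focus_um(WAVELENGTH_UM, 0.3)                 # hypothetical 0.3 NA objective

print(f"tube-lens side: {tube_side / 1000:.1f} mm")
print(f"target side:    {target_side:.1f} um")
```

Millimeters of tolerance on the tube-lens side versus single microns at the target is why the backfocus only needs to be roughly right, while the target position is where the real focusing happens.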

The pixel scale in the capture is 560nm/pixel, so the Windows clock is only a few microns long. Performance is as expected, but it is always entertaining to use the Windows desktop on a 1mm-wide screen :)

Sunday, January 30, 2022

The State of the CPU Market, early 2022

It's been a year of shakeups in the high-end CPU market, what with catastrophic supply chain shortages, the rise of AMD, and Pat Gelsinger's spearheading of Intel's return to competency. Now that the dust has mostly settled, it's interesting to look at what's hot, and what's not.

The 5950X is still king...

Alder Lake i9 really gives AMD a run for its money, but the 5950X is still the king of workstation CPUs, especially now that you can actually buy it. It has aggressive boost clocks that let it beat every Skylake and Cascade Lake Xeon Platinum (including the 28-core flagships), a homogeneous internal design that scales well in every application, a consistent and manageable 142W power limit, and bonus server features like ECC support. ADL is good, but the ~241W PL2 really kills it for workstation use (a good 12900K build requires serious effort to get right), and because of its heterogeneous internal layout, it fails to scale on some operating systems and in some applications.

Scaling is really important in this era of 16-, 32-, and 64-core processors: many applications completely fail to scale past 16 cores, and even those that do see much less than linear returns. As a result, 16 highly clocked cores punch above their weight in real-world results. A 4.x GHz Zen 3 core isn't actually twice as fast as a 2.8 GHz Skylake core, but sixteen 4.x GHz Zen 3 cores can still outperform 28 Skylake cores, because the 12 extra cores contribute progressively less useful work.
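Amdahl's law is one simple way to see this effect. A sketch, where the 1.7x per-core advantage of a Zen 3 core over a 2.8 GHz Skylake core and the parallel fractions are hypothetical illustration values, not measurements:

```python
def amdahl_throughput(cores: int, per_core_speed: float, parallel_fraction: float) -> float:
    """Relative throughput of `cores` identical cores under Amdahl's law.

    The serial part runs on one core; the parallel part is split evenly across all cores.
    """
    serial = 1.0 - parallel_fraction
    return per_core_speed / (serial + parallel_fraction / cores)


ZEN3_SPEED = 1.7     # hypothetical: one ~4.5 GHz Zen 3 core relative to...
SKYLAKE_SPEED = 1.0  # ...one 2.8 GHz Skylake core

for p in (1.0, 0.99, 0.95):
    zen3 = amdahl_throughput(16, ZEN3_SPEED, p)
    sky = amdahl_throughput(28, SKYLAKE_SPEED, p)
    print(f"p={p:.2f}: 16x Zen 3 = {zen3:5.1f}, 28x Skylake = {sky:5.1f}")
```

With a perfectly parallel workload (p = 1.0) the 28 slower cores edge ahead, 28 versus 27.2, but even a 1% serial fraction flips the result in favor of the 16 faster cores, and the gap widens as the serial fraction grows.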

...Except when it's not

The elephant in the room, of course, is single-threaded performance. Alder Lake outruns Zen 3 by about 20%, which is an enormous leap in x86 performance. For all intents and purposes, ADL is an 8-core CPU with 8 bonus cores that you can't really rely on. If you commit to the 8-core life (which encompasses a lot of applications), Alder Lake suddenly looks a lot more enticing, because 8 Golden Cove cores are 20% faster than 8 Zen 3 cores.

Of course, part of the reason ADL can do this is that Intel 7 (formerly 10nm Enhanced SuperFin) is a high-performance node designed to scale to aggressive clocks at high voltages, whereas TSMC N7 is an SoC node designed for lower clocks and voltages. The price you pay is that those 8 Golden Cove cores draw twice as much power as 8 Zen 3 cores to perform 20% more work, which isn't very good engineering.
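Putting those two ratios together gives the energy cost per unit of work. A sketch that takes the 2x power and 1.2x performance figures above at face value:

```python
def energy_per_work_ratio(power_ratio: float, perf_ratio: float) -> float:
    """Energy per unit of work of chip A relative to chip B.

    Energy = power x time, and time per unit of work scales as 1 / performance,
    so the ratio is (P_A / P_B) / (perf_A / perf_B).
    """
    return power_ratio / perf_ratio


ratio = energy_per_work_ratio(power_ratio=2.0, perf_ratio=1.2)
print(f"Golden Cove vs Zen 3: {ratio:.2f}x energy per unit of work")
print(f"performance per watt: {1 / ratio:.0%} of Zen 3")
```

That is roughly two-thirds more energy for every unit of work, or about 60% of Zen 3's performance per watt.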

In the end, Zen 3 and Alder Lake are mostly complementary products. If your workflow is interactive content creation, gaming, or design work, ADL is right for you. If you're building a machine mostly to handle long renders and simulations, the 5950X is the best processor under $2000 for the job.

What about HEDT?

HEDT is a curious thing. In the beginning, desktop and server parts were cut from the same silicon. Starting with Nehalem, Intel experimented with bifurcating their designs into laptop (Lynnfield) and server (Gulftown) variants with rather drastically differing designs - Lynnfield was a SoC with an on-die PCIe root complex offering 16 lanes, Gulftown was a traditional design with an off-package PCIe controller offering 48 lanes. The bifurcation makes sense - laptops rarely need more than 16 PCIe lanes, whereas servers need dozens, or even hundreds, of lanes for accelerators, storage, and networking.

The bifurcation really took off with Sandy Bridge; Intel aggressively marketed the 2600K, which was cut from the mobile silicon. Sandy Bridge-E, the server-based variant, filled a niche, but the platform was expensive, and to top it off, unlocked processors based on the full 8-core SB-EP die were never released.

Since then, HEDT has come and gone - it hit a real low during the Broadwell-EP era, but experienced a resurgence with Skylake-X, which competed against the dubious-and-not-really-recommendable Threadripper 1 and 2. Unfortunately, Ice Lake-SP gives off Broadwell-EP vibes - namely, the process it is built on does not have the frequency headroom required to make a compelling desktop platform. This leaves AMD relatively unchallenged in the high end space:
  • The 24-core 3960X is currently a dubious choice over the 5950X - supply is poor, power consumption is high, and performance is not that much better. If you need balanced performance with good PCIe it's not a bad choice, but there are cheaper (Skylake-X, used Skylake-SP) and faster (the other Threadrippers) offerings in the category.
  • The 32-core 3970X is a good processor for most applications. Thanks to the blessing (or curse) of multicore scaling, it comes within striking distance of the 3990X in most applications at half the price, while offering the full suite of Threadripper features.
  • The 64-core behemoth 3990X is...not a very good choice, mostly due to extreme pricing ("$1 per X") and really bad scaling. Fortunately, it wields a very competent implementation of turbo, so it is never slower than the 3970X.
  • Threadripper Pro ("Epyc-W") is everything you've ever wanted, but is expensive and platform options are limited.
There are also a few interesting choices in the server space, with the usual caveats (long POST times, no audio, no RGB):
  • Dual Rome or Milan 64-core processors offer unmatched multithreaded performance, but not much can take advantage of 256 threads.
  • Dual 32-core Epycs are an interesting choice, offering performance comparable to a 3990X but with four times the aggregate memory bandwidth for all your sparse matrix needs.
  • Dual low-end Ice Lake Xeons (e.g. 5320, 6330) offer AVX-512 support and high memory bandwidth at a price and performance comparable to those of a 3990X, and may be easier to find. Unfortunately, 2P ICL motherboards are rather expensive.
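The 4x bandwidth figure for dual Epyc falls straight out of the channel counts. A sketch assuming DDR4-3200 on both platforms and 64-bit (8-byte) channels; these are theoretical peaks, and sustained bandwidth will be lower:

```python
def peak_bandwidth_gbs(channels: int, mega_transfers: int, bytes_per_transfer: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s for `channels` 64-bit DDR channels."""
    return channels * mega_transfers * bytes_per_transfer / 1000


DDR4_3200 = 3200  # MT/s, assumed on both platforms

tr_3990x = peak_bandwidth_gbs(channels=4, mega_transfers=DDR4_3200)       # quad-channel sTRX4
dual_epyc = peak_bandwidth_gbs(channels=2 * 8, mega_transfers=DDR4_3200)  # 8 channels per socket

print(f"3990X:   {tr_3990x:.1f} GB/s")
print(f"2P Epyc: {dual_epyc:.1f} GB/s")
print(f"ratio:   {dual_epyc / tr_3990x:.1f}x")
```

Under these assumptions that is 102.4 GB/s against 409.6 GB/s, which is what matters for bandwidth-bound work like sparse matrix kernels.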
As far as used options go, Haswell-EP and older are finally ready to retire, unless you really need RDIMM support. A pair of 14-core Haswell processors performs worse than a 5950X at twice the power, with all the caveats of 2P platforms attached. Otherwise:
  • Dual Skylake-SP is an OK choice, simply because Skylake Xeons are entering liquidation and Epyc Rome is not. Technologically, Skylake has no redeeming features over Rome, but the fact that you can pick up a pair of 24-core Platinums for slightly more than $1000 is interesting. Note that only the 2P configuration is interesting; a single Xeon is generally slower than a 5950X.
  • Epyc Naples is bad. Don't do it. Threadripper 1 falls in the same category; the only time you'd consider either of these is if you found a motherboard in the trash or something.

"My application scales indefinitely with core count"

No, it doesn't. But for this class of trivially parallelizable applications (rendering, map/reduce, dense matrix), the 5950X is a safe bet. The most extreme cases can benefit from one of the high-core-count platforms (Threadripper, Epyc, Xeon Scalable), but be careful to benchmark your applications first - the 5950X wields a considerable clock speed advantage over the enterprise platforms, which often swings things in its favor.

"I only need 8 cores"

The 12700K is probably your friend here, it's strictly faster than the 5800X. This category encompasses most content creation and all of CAD (minus simulation).

"Give me PCIe or give me death!"

This encompasses all of machine learning, plus anything that streams data in and out of nonvolatile storage. The 3960X is perfect for you, but in case it's out of stock (which it probably is), the winner is...the 10980XE, which is fast enough to feed your accelerators and generally available. Of course, die-hard accelerator enthusiasts are going to look to more exotic platforms, and there, the platform dictates the choice of CPU.
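In this category, picking a platform mostly comes down to counting lanes. A minimal sketch of a lane budget check: the 48-lane figure is the 10980XE's CPU-attached PCIe 3.0 lane count, while the device list is a hypothetical build and the per-lane bandwidth ignores protocol overhead beyond 128b/130b encoding:

```python
from dataclasses import dataclass

# Approximate per-lane bandwidth in GB/s after 128b/130b encoding (PCIe 3.0 and 4.0).
GBPS_PER_LANE = {3: 8 * 128 / 130 / 8, 4: 16 * 128 / 130 / 8}


@dataclass
class Device:
    name: str
    lanes: int
    count: int = 1


def lanes_needed(devices: list[Device]) -> int:
    """Total CPU lanes consumed if every device gets its full link width."""
    return sum(d.lanes * d.count for d in devices)


CPU_LANES = 48  # Core i9-10980XE, PCIe 3.0
build = [Device("GPU", 16, count=2), Device("NVMe SSD", 4, count=4)]  # hypothetical build

need = lanes_needed(build)
print(f"{need}/{CPU_LANES} lanes, fits: {need <= CPU_LANES}")
print(f"aggregate link bandwidth: {need * GBPS_PER_LANE[3]:.1f} GB/s")
```

Two full-width accelerators plus four NVMe drives already use every lane in this example; on a 24-lane desktop platform the same build would have to drop to x8 links or hang storage off the chipset.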

"I'm out of RAM"

If your application requires more than 256GB of memory, Epyc-W is the CPU for you. Unfortunately, it is rather expensive, so the second-place prize, and the bang-for-the-buck prize, goes to a pair of used 24-core Xeon Scalables, which get you pretty darn close to Epyc-W for $1200.
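Maximum memory capacity is simply sockets x channels x DIMMs per channel x DIMM size. A sketch with assumed DIMM sizes: 32GB unbuffered DIMMs for regular Threadripper (which doesn't take RDIMMs) and 64GB RDIMMs for the other two, all at two DIMMs per channel:

```python
from typing import NamedTuple


class Platform(NamedTuple):
    sockets: int
    channels_per_socket: int
    dimms_per_channel: int
    dimm_gb: int  # assumed DIMM size for this platform


def max_capacity_gb(p: Platform) -> int:
    """Total installable memory when every slot is populated."""
    return p.sockets * p.channels_per_socket * p.dimms_per_channel * p.dimm_gb


PLATFORMS = {
    "Threadripper 3000": Platform(1, 4, 2, 32),
    "Threadripper Pro (Epyc-W)": Platform(1, 8, 2, 64),
    "2P Xeon Scalable (Skylake-SP)": Platform(2, 6, 2, 64),
}

for name, platform in PLATFORMS.items():
    print(f"{name:30s} {max_capacity_gb(platform):5d} GB")
```

This reproduces the 256GB ceiling for regular Threadripper. Larger RDIMMs push the Epyc-W figure higher, while the 2P Skylake-SP configuration already sits at the 768GB-per-socket limit of the non-M SKUs.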