Sunday, March 20, 2022

Image a DLP down through a microscope objective and look at it with a camera through the same objective


For some reason it's taken me a while to actually do this, but it's easy to expose high resolution film or resist with a DLP. All you need is a camera, a beam splitter, a DLP, and some microscope parts.

The DLP is an ordinary DLP projector with the lens removed - I used a TI dev kit because the firmware offers nice features like "disable the LEDs", which can be used to, for example, replace the blue LED with a violet one for exposing resist. Unless you have a really exotic objective, it is probably best to use a g-line (436nm) LED rather than a 405nm one - the 436nm parts are hard to find, but objectives will be quite dispersive in the far violet unless they are specifically designed to be apochromatic from violet to red. Your best bet is probably to leave the green LED in place for focusing and replace the blue LED with a violet one. A regular pico-projector will work just fine, but is less convenient to work with. You do save about $800 at current street prices though.

The light from the DLP reflects off a beam splitter and passes through the tube lens, which is just a 200mm a(po)chromat. A 200mm doublet would probably work fine here, but you can get better corner performance from a triplet or a quadruplet - the DLP is not very large though, so it might not matter.  The tube lens collimates the light and sends it down the microscope objective, which focuses it on the target.

The camera looks down the other beam splitter path and sees the light scattered off the target. Unfortunately, this technique only works with optically smooth targets - otherwise, the camera sees the surface texture of the target and not the imaged patterns displayed on the DLP.

Parfocalizing the camera and the DLP is easier than it seems - the object side of the tube lens is something like f/50, so the depth of field is very large. Roughly speaking, there is only a small range of distances where the objective comes into focus at all. Either by reading the datasheet or by trial and error it is possible to roughly set the backfocus; then the camera and target position are adjusted for best focus. Once a starting position is found, the backfocus can be adjusted to minimize spherical aberration (microscope objectives gain some spherical aberration if the conjugates aren't the right ones).
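
As a sanity check on the "very large" depth of field claim, here's a rough quarter-wave estimate on the DLP side of the relay. The f/50 figure is from the text; the 550nm wavelength (green focusing LED) is an assumption:

```python
# Rough depth-of-field estimate on the tube-lens (DLP) side of the relay.
# Quarter-wave (Rayleigh) criterion: DOF ~ +/- 2 * lambda * N^2.
wavelength = 550e-9  # m, assumed green focusing light
N = 50               # working f-number on the DLP side, from the text

dof_one_side = 2 * wavelength * N**2
print(f"DOF ~ +/- {dof_one_side * 1e3:.2f} mm")
```

A couple of millimeters each way, which is why placing the DLP along the axis is so forgiving.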

The pixel scale in the capture is 560nm/pixel, so the Windows clock is only a few microns long. Performance is as expected but it is always entertaining to use the Windows desktop on a 1mm wide screen :)
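
The 560nm/pixel scale is consistent with a high-magnification objective. A back-of-envelope check, where the 10.8um DMD mirror pitch and 20x objective magnification are assumptions for illustration (only the 200mm tube lens is from the text):

```python
# Back-of-envelope check of the ~560 nm/pixel scale at the target.
dmd_pitch = 10.8e-6   # m, assumed DMD mirror pitch
f_tube = 200e-3       # m, tube lens focal length, from the text
magnification = 20    # assumed objective magnification (200 mm tube standard)

f_objective = f_tube / magnification
# The tube lens collimates, the objective focuses: demagnification is
# f_objective / f_tube, i.e. 1/magnification.
pixel_at_target = dmd_pitch * f_objective / f_tube
print(f"pixel at target: {pixel_at_target * 1e9:.0f} nm")
```

That lands near the measured 560nm/pixel; the exact figure depends on the actual DMD and objective used.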

Sunday, January 30, 2022

The State of the CPU Market, early 2022

It's been a year of shakeups in the high-end CPU market, what with catastrophic supply chain shortages, the rise of AMD, and Pat Gelsinger's spearheading of Intel's return to competency. Now that the dust has mostly settled, it's interesting to look at what's hot, and what's not.

The 5950X is still king...

Alder Lake i9 really gives AMD a run for the money, but the 5950X is still the king of workstation CPUs, especially now that you can buy it. It has aggressive boost clocks which allow it to beat every Skylake and Cascade Lake Xeon Platinum (including the 28-core flagships), a consistent internal design which scales well in every application, a consistent and manageable 142W power limit, and bonus server features like ECC support. ADL is good, but the 250W PL2 really kills it for workstation use (a good 12900K build requires serious effort to get right), and because of the heterogeneous internal layout it fails to scale on some operating systems and in some applications.

Scaling is really important in this era of 16, 32, and 64-core processors; many applications completely fail to scale past 16 cores, and even those that do exhibit much less than linear returns. As a result, those 16 highly clocked cores punch above their weight when it comes to real-world results - a 4.x GHz Zen 3 core isn't actually twice as fast as a 2.8 GHz Skylake core, but 16 4.x GHz Zen 3 cores can still outperform 28 Skylake cores because the 12 extra cores are doing less work.
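
The diminishing-returns argument above is just Amdahl's law. A toy model, where the 90% parallel fraction and the 1.5x per-core advantage for high-clocked Zen 3 are illustrative assumptions, not benchmarks:

```python
def speedup(parallel_fraction, cores):
    """Amdahl's law: the serial part of the workload never scales."""
    return 1.0 / ((1 - parallel_fraction) + parallel_fraction / cores)

p = 0.90              # assumed parallel fraction of a typical workload
zen3_per_core = 1.5   # assumed per-core throughput vs. a Skylake server core

zen3_16_cores = zen3_per_core * speedup(p, 16)
skylake_28_cores = 1.0 * speedup(p, 28)
print(f"16x Zen 3: {zen3_16_cores:.1f}, 28x Skylake: {skylake_28_cores:.1f}")
```

With these numbers the 16 fast cores come out ahead of the 28 slow ones, because cores 17 through 28 contribute very little to the total.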

...Except when it's not

The elephant in the room, of course, is single-threaded performance. Alder Lake outruns Zen 3 by about 20%, which is an enormous leap in x86 performance. For all intents and purposes, ADL is an 8-core CPU with 8 bonus cores that you can't really rely on. If you commit to the 8-core life (which encompasses a lot of applications), Alder Lake suddenly looks a lot more enticing, because 8 Golden Cove cores are 20% faster than 8 Zen 3 cores.

Of course, part of the reason why ADL can do this is because Intel 10 ESF ("7 nm") is a high performance node designed to scale to aggressive clocks at high voltages, and TSMC N7 is a SoC node designed for lower clocks and voltages. The price you pay is that those 8 Golden Cove cores draw twice as much power as 8 Zen 3 cores to perform 20% more work, which isn't very good engineering.

In the end, Zen 3 and Alder Lake are mostly complementary products. If your workflow is interactive content creation, gaming, or design work, ADL is right for you. If you're building a machine mostly to handle long renders and simulations, the 5950X is the best processor under $2000 for the job.

What about HEDT?

HEDT is a curious thing. In the beginning, desktop and server parts were cut from the same silicon. Starting with Nehalem, Intel experimented with bifurcating their designs into laptop (Lynnfield) and server (Gulftown) variants with rather drastically differing designs - Lynnfield was a SoC with an on-die PCIe root complex offering 16 lanes, Gulftown was a traditional design with an off-package PCIe controller offering 48 lanes. The bifurcation makes sense - laptops rarely need more than 16 PCIe lanes, whereas servers need dozens, or even hundreds, of lanes for accelerators, storage, and networking.

The bifurcation really took off starting with Sandy Bridge; Intel aggressively marketed the 2600K, which was cut from the mobile silicon. Sandy Bridge-E, the server-based variant, filled a niche, but the platform was expensive and, to top it off, unlocked processors based on the full 8-core SB-EP die were never released.

Since then, HEDT has come and gone - it hit a real low during the Broadwell-EP era, but experienced a resurgence with Skylake-X, which competed against the dubious-and-not-really-recommendable Threadripper 1 and 2. Unfortunately, Ice Lake-SP gives off Broadwell-EP vibes - namely, the process it is built on does not have the frequency headroom required to make a compelling desktop platform. This leaves AMD relatively unchallenged in the high end space:
  • The 24-core 3960X is currently a dubious choice over the 5950X - supply is poor, power consumption is high, and performance is not that much better. If you need balanced performance with good PCIe it's not a bad choice, but there are cheaper (Skylake-X, used Skylake-SP) and faster (the other Threadrippers) offerings in the category.
  • The 32-core 3970X is a good processor for most applications. Thanks to the blessing (or curse) of multicore scaling, it comes within striking distance of the 3990X in most applications at half the price, while offering the full suite of Threadripper features.
  • The 64-core behemoth 3990X is...not a very good choice, mostly due to extreme pricing ("$1 per X") and really bad scaling. Fortunately, it wields a very competent implementation of turbo, so it is never slower than the 3970X.
  • Threadripper Pro ("Epyc-W") is everything you've ever wanted, but is expensive and platform options are limited.
There are also a few interesting choices in the server space, with the usual caveats (long POST times, no audio, no RGB):
  • Dual Rome or Milan 64-core processors offer unmatched multithreaded performance, but not much can take advantage of 256 threads.
  • Dual 32-core Epycs are an interesting choice, offering performance comparable to a 3990X but with four times the aggregate memory bandwidth for all your sparse matrix needs.
  • Dual low-end Ice Lake (e.g. 5320, 6330) offers AVX-512 support and high memory bandwidth at a price and performance comparable to those of a 3990X, but may be more available. Unfortunately, 2P ICL motherboards are rather expensive.
As far as used options go, Haswell-EP and older are finally ready to retire, unless you really need RDIMM support. A pair of 14-core Haswell processors performs worse than a 5950X at twice the power, with all the caveats of 2P platforms attached. Otherwise:
  • Dual Skylake-SP is an OK choice, simply because Skylake Xeons are entering liquidation and Epyc Rome is not. Technologically, Skylake has no redeeming features over Rome, but the fact that you can pick up a pair of 24-core Platinums for slightly more than $1000 is interesting. It's worth noting only the 2P configuration is interesting; 1P Xeon is generally slower than a 5950X.
  • Epyc Naples is bad. Don't do it. Threadripper 1 falls in the same category, the only times you'd consider either of these is if you found a motherboard in the trash or something.

"My application scales indefinitely with core count"

No, it doesn't. But for this class of trivially-parallelizable application (rendering, map/reduce, dense matrix), the 5950X is a safe bet. The most extreme cases can benefit from one of the high core count platforms (Threadripper, Epyc, Xeon Scalable), but be careful to benchmark the applications first - the 5950X wields a considerable clock speed advantage over the enterprise platforms which often swings things in its favor.

"I only need 8 cores"

The 12700K is probably your friend here; it's strictly faster than the 5800X. This category encompasses most content creation and all of CAD (minus simulation).

"Give me PCIe or give me death!"

This encompasses all of machine learning, plus anything which streams data in and out of nonvolatile storage. The 3960X is perfect for you, but in case it's out of stock (which it probably is), the winner is...the 10980XE, which is fast enough to feed your accelerators and generally available. Of course, die-hard accelerator enthusiasts are going to look to more exotic platforms, and there, the platform dictates the choice of CPU.

"I'm out of RAM"

If your application requires more than 256GB of memory, Epyc-W is the CPU for you. Unfortunately, it is rather expensive, so the second place prize, and the bang-for-the-buck prize, goes to a pair of used 24-core Xeon Scalable, which gets you pretty darn close to Epyc-W for $1200.

Tuesday, September 21, 2021

A Small Astrograph with a Large Payload


Building a large telescope is hard; designing a small telescope is hard. What exactly do I mean by that? Well, there are parts of the telescope that don't scale well with size, for example, the instrument payload, the filters, or the focusing actuators. More often than not, a design which works well on a 1m-class instrument fails to scale down to a 300mm-class instrument because the payload is incompatible with the mechanics, or is so large that it fills the clear aperture of the instrument.

A small telescope should also be...small. A good example of this is the remarkable unpopularity of equatorially-mounted Newtonians; a parabolic mirror with a 3-element corrector offers fast focal ratios and good performance, but an f/4 Newtonian is four times longer than it is wide, which gets unwieldy even for a 300mm diameter instrument.

The Argument for Cassegrain Focus

Prime focus instruments are popular as survey instruments in professional observatories. However, they fail to meet the needs of small instruments because of:

  • Excessive central obscuration. A 5-position, 2" filter wheel is about 200mm in diameter. In order to maintain a reasonable central obstruction, a 400mm clear aperture instrument is required, which is only marginally "small". Any larger-diameter instrumentation requires a 0.6m+ class instrument, which is outside the scope of many installations.
  • Unreasonable length. The fastest commercially available paraboloids are about f/3. Anything faster is special-order and very expensive. An f/3 prime focus system is actually longer than 3 times its diameter because of the equipment required to support the instrument payload.
  • Challenging focusing. For a very large system, actuating the instrument is the correct method for focusing because even the secondary mirror will be several tons. For a small system, reliably actuating 10+ kg of payload with no tilt or slip in a cost-effective fashion is rather unpleasant.
  • Too fast. A short prime focus system is necessarily very fast, complicating filter selection. A very fast system also performs poorly combined with scientific sensors with large pixels.
The commercially available prime focus instruments (Celestron RASA/Starizona Hyperstar, Hubble HNA, Sharpstar HNT) are designed for use with small, moderately-cooled CMOS cameras, possibly with a filter wheel in the case of the Newtonian configurations. The RASA is wholly unsuited for narrowband imaging because a filter wheel would almost cover the entire aperture.

A Cassegrain system solves these issues by (1) allowing for moving-secondary focusing (2) roughly decoupling focal ratio from tube length and (3) moving the focal plane to be outside of the light path.

The 50% Central Obstruction

A 50% CO sounds bad, but by area the light loss is 25%, or less than half a stop. A 300mm nominal instrument with a 50% CO has the light gathering capacity of a 260mm system, which is pretty reasonable. The 50% CO also makes sizing the system an interesting exercise, because at some point the payload will be smaller than the secondary and prime focus makes sense again.
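
The area arithmetic above, as a quick check (both numbers come straight from equating collecting areas):

```python
import math

def effective_aperture(diameter, obstruction_fraction):
    """Equivalent unobstructed aperture for a centrally-obstructed system,
    found by equating collecting areas."""
    return diameter * math.sqrt(1 - obstruction_fraction**2)

light_loss = 0.5**2                       # fraction of area blocked by a 50% CO
d_eff = effective_aperture(300.0, 0.5)    # mm
print(f"light loss: {light_loss:.0%}, effective aperture: {d_eff:.0f} mm")
```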

The Design

The Busack Medial Cassegrain is a really nice telescope that this design draws inspiration from, but it requires two full-aperture elements, each with two polished sides, which makes it ill-suited to mass production. Instead, we build the system as a Schmidt corrector, an f/2 spherical mirror, and a 4E/3G integrated corrector. There's really nothing to it - by allowing the CO to grow and using the corrector to deal with the increasing aberrations, an f/4 SCT is entirely within the realm of possibility. There's a ton of freedom in the basic design; the present example makes the following tradeoffs:
  • f/4 overall system allowing for the use of an f/2 primary (which we know is cheaply manufacturable based on existing SCT's). f/4 also allows for the use of commodity narrowband filters.
  • 400mm overall tube length (not counting back focus) is a good balance between mechanical length and aberrations. 50mm between the corrector and secondary allows ample space for an internally-mounted focus actuator.
  • 160mm back focus allows for generous amounts of instrumentation including filters, tip-tilt correction, and even deformable mirrors.
  • Integrated Schmidt corrector allows for good performance with no optical compromises.
  • Corrector lenses are under 90mm in diameter and made from BK7 and SF11 glass, all easily fabricated using modern computer-controlled polishing.
The total length of the system could also be shortened, and the corrector diameters reduced, by increasing the primary-secondary separation and reducing the back focus, depending on instrument needs. Overall performance is quite good, achieving 4um spot sizes in the center and a high MTF across the field.
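
For reference, the first-order numbers implied by the stated f/4, 300mm design (the 4um spot figure is from the text; everything else follows from the definitions):

```python
aperture = 300.0                       # mm, from the design
f_ratio = 4.0                          # from the design
focal_length = aperture * f_ratio      # mm

plate_scale = 206265.0 / focal_length  # arcsec per mm at the focal plane
spot_arcsec = 0.004 * plate_scale      # angular size of a 4 um spot

print(f"focal length: {focal_length:.0f} mm")
print(f"plate scale: {plate_scale:.1f} arcsec/mm")
print(f"4 um spot: {spot_arcsec:.2f} arcsec")
```

A sub-arcsecond spot across a fast 1200mm system is what makes the 4E/3G corrector worth its complexity.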

Actually Building It?!

Obviously, you are not going to make a 300mm Schmidt corrector and a four-element, 90mm correction assembly at home. This design is probably buildable via standard optical supply chains (the hardest part would be getting someone who is neither Celestron nor Meade to build Schmidt correctors). The correction assembly should also be further improved - there are a huge number of choices for its configuration and the 'correct' one is probably the one that is most manufacturing-friendly.

Shoot me an e-mail in case you are crazy and want to do something with the prescription for this design!

Friday, July 9, 2021

GCA 6100C Wafer Stepper Part 2: the stages

The modern wafer scanner is a truck-sized contraption full of magnets, springs, and slabs of granite capable of accelerating at several g's while maintaining single-digit nanometer positioning accuracy. The motion systems contained within painstakingly try to optimize for dynamic performance by using active vibration damping, voice coils, linear motors, and air bearings, all to increase the value of the machine for its owner (who spent a good fraction of a billion dollars on it).

As it turns out, an 80's stepper is none of these things. Scanners are immensely complex because they are dynamic systems - as the wafer moves in one direction, the reticle moves in the other direction, perfectly synchronized but four times faster. In contrast, steppers are allowed time to settle between steps, which allows for much more leeway in the motion system design. Throughput requirements were also lower; compare the 35 6" wph of an old stepper to the 230 12" wph of a modern scanner.

Old stepper stages are an instructive exercise in the design of a basic precision motion system; in fact, Dr. Trumper used to give this exact stage out as a controls exercise in 2.171. The GCA stages are also particularly interesting from a hardware perspective - they are carefully designed to achieve 40nm positioning accuracy using fairly commodity parts. The only precision parts seem to be the slides for the coarse stage, and even those are ground, not scraped.

The stage architecture

System overview

GCA steppers use a stacked stage architecture. Coarse positioning is done by two conventional mechanical bearing stages stacked on top of each other. Fine positioning is done by a single two-axis flexure stage. Rotational positioning, which only happens for alignment, is done using a simple open-loop, limited travel stage mounted on the fine stage. Focusing, which is done by changing the Z spacing between the lens and the wafer, is done by moving the optical column up and down with a linkage mechanism.

The position feedback system

The fine position feedback on GCA steppers is implemented through a two-axis HP 5501A heterodyne interferometer. Briefly, a stabilized HeNe laser is Zeeman split through a powerful magnet to create two adjacent lines separated by a few MHz with different polarizations. One of these lines is separated with a polarizing beam splitter and reflected off a moving mirror; this line is Doppler shifted due to the velocity of the moving mirror and beat against the stationary component to generate a signal. This signal is compared against a stationary REF signal to derive velocity and position measurements. Heterodyne interferometers are the preferred choice for metrology due to their insensitivity to ambient effects and power fluctuations.
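
The position readout can be sketched as follows. This is an illustration, not GCA's actual electronics: the electronic interpolation factor is an assumption, chosen here so that one count works out to lambda/16, which is suggestively close to the 40nm positioning accuracy of these stages.

```python
# Displacement from accumulated MEAS-REF phase in a heterodyne
# interferometer. Plane-mirror optics double-pass the measurement arm,
# so one full fringe (2*pi of accumulated phase) = lambda/4 of travel.
WAVELENGTH = 632.8e-9  # m, stabilized HeNe

def displacement(phase_counts, counts_per_fringe=4):
    """phase_counts: accumulated counts from the phase meter.
    counts_per_fringe: assumed electronic interpolation factor."""
    fringes = phase_counts / counts_per_fringe
    return fringes * WAVELENGTH / 4

# Smallest resolvable step at the assumed 4x interpolation:
print(f"{displacement(1) * 1e9:.1f} nm per count")
```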

The 5501A is the de facto choice for interferometric metrology; its successor the 5517 is still available from Keysight. A description of the system as found in the GCA steppers is as follows:

The laser points towards the rear of the stepper; a 10707A beam bender and a 10701A 50% beam splitter generate the two axes of excitation. The X and Y stages have identical measurement assemblies; the Y assembly is located to the rear of the stepper (behind the column) and the X assembly is located inside the laser housing. Both assemblies use a plane-mirror interferometer which differentially measures the wafer position against the optical column; the stationary mirror is a corner cube mounted to the column and the moving mirror is a 6" long dielectric quartz block mirror mounted to the wafer stage. The flats are precision shimmed to ensure orthogonality (since it is the orthogonality of the flats which determines the closed-loop orthogonality of the motion).

There are two additional position sensors in the system. The first is a sensor to measure the position of the fine stage relative to the coarse stage. Literature indicates that this is an LVDT, but on the 6100C it appears to be implemented as two photodiodes outputting a sin/cos type signal. The second is a brushed tachometer on each of the coarse stage drive motors, which is used for loop closure by the stock controller.

The coarse stage

The purpose of the coarse stage is to position the fine stage to within 0.001" of its final position. The stage is built as a pair of stacked plain-bearing stages; these stages are driven by brushed DC motors with brushed tachometers for velocity feedback. The motors go through a right-angle gearbox comprising a bevel gear and several spur gear stages before being coupled by a flexible coupling to a long drive shaft which turns a pinion positioned near the center of each stage. This pinion drives a brass rack mounted to the stage which generates the final motions.

The fine stage

The fine stage is constructed as a parallel two-axis flexure stage with a few hundred microns of travel on each axis. The flexures are constructed from discrete parts; the stage is made from cast iron and the flexures themselves are constructed from blue spring steel. Actuation is by moving-coil voice coil motors with samarium-cobalt magnets, and position is read directly from the interferometer system.

The theta stage

The theta stage is a limited travel stage based on a tangent arm design. A (very small) Faulhaber Minimotor is coupled into a high reduction gearbox, which drives a worm gear that turns a segment of a worm wheel. The worm wheel pushes on a linkage which rotates the wafer stage about a pivot point.

Rotation control is entirely open-loop - the wafer is rotated once during the alignment process based on the fiducials observed through the alignment microscopes. A slow open-loop system is acceptable given that the speed of rotational alignment does not significantly affect wafer throughput.

The Z mechanism

The focusing mechanism is a limited-travel (according to literature, about 600um) flexure mechanism. The entire optical column is suspended on two large spring steel plates; a stiff spring counterbalances the weight of the column. A voice coil motor (identical to the fine stage VCMs) actuates a linkage mechanism which moves the column up and down.

Adjusting the mechanism is a bit subtle. The white rod sticking out is actually a tensioning mechanism for the counterbalance; it is possible to aggressively tension the spring to stiffen the assembly for transport. The cap at the end of the rod can be removed to reveal a nut and a piece of threaded rod with a flathead in it. You want to hold the rod in place with a screwdriver and crank on the nut with a wrench until the column just barely 'floats' in place.

Incidentally, this mechanism also reveals a fairly severe weakness of the focusing system - it is extremely undamped. Any disturbance on the column causes the whole assembly to ring like a bell, with the only source of damping being the resistance of the VCM. I think (though there is some information to the contrary) that 6000-series GCA steppers focused once per wafer, relying on wafer leveling to keep the image in resist in focus between fields. If focusing had to be highly dynamic, this ringing could be a real problem.

Sunday, May 23, 2021

GCA 6100C Wafer Stepper Part 1: Intro and Maximus 1000 Light Source

The yellow lights make it look more legitimate

I have always wanted to expose a wafer. I'd written off making my own transistors long ago (nothing that fits in a house is good for feature sizes small enough for interesting logic, and I'm not a good enough analog engineer to design interesting analog), but there are many useful optical and mechanical parts that can be made lithographically.

The usual route to home lithography is a microscope and a DLP, but the resultant ~2mm field sizes are not sufficient for mechanical parts and stitching a 20mm field out of 2mm subfields is very taxing on your motion system. Contact aligners are simple and perform well, but getting submicron resolution for interesting optical parts out of a contact aligner is challenging (the masks also get quite expensive).

The natural solution is to start with a stepper lens (which is basically a giant microscope objective with very bad color correction). There are a few variants - 1:10 lenses with a 10x10mm field, 1:5 lenses with a 14x14mm field, and 1:4 lenses, which weigh several hundred kg and have a 20x20mm field. Stepper lenses also come in several colors: g-line (436nm), i-line (365nm), and DUV (~250nm).
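
The reduction ratio is what makes the reticle side forgiving: everything on the mask is N times larger than on the wafer. A quick table for the three lens families, where the field sizes are from the text and the 0.7um wafer feature is an arbitrary example, not a spec:

```python
# Reticle-side feature and field sizes for the three lens families.
lenses = {  # reduction ratio -> wafer-side field (mm)
    10: (10, 10),
    5: (14, 14),
    4: (20, 20),
}
wafer_feature_um = 0.7  # example wafer-side feature size

reticle = {}
for reduction, field in lenses.items():
    reticle[reduction] = {
        "feature_um": wafer_feature_um * reduction,        # mask feature size
        "field_mm": tuple(reduction * f for f in field),   # mask field size
    }
    print(f"1:{reduction} -> {reticle[reduction]}")
```

A 0.7um wafer feature is a 3.5um mask feature on a 1:5 lens - comfortably within reach of ordinary mask-making, which is the whole appeal over contact printing.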

I wound up with a 1:5 g-line lens; the 1:5 lenses strike a good balance between performance and unwieldiness. I also had a set of stages pulled from a DNA sequencer good for a couple microns of resolution. The rough plan was to stack a fine stage on top of these and use a direct-viewing technique to perform alignment. However, the project quickly went south when I realized building an exposure tool entailed buying the parts out of an...exposure tool. Conveniently, a circa 1985 GCA DSW 6100C showed up for more or less scrap value near me, so one rigging operation later I was the proud owner of a genuine submicron stepper.

The DSW family of steppers are true classics; GCA Mann practically invented the commercial stepper in the late 70's. The GCA steppers remained more or less unchanged until the company's demise; everything from the g-line DSW 4800 to the AutoStep 200 shared a stage design, alignment system, and mechanical construction (unfortunately, they also all shared a terrible 70's-grade electronics package!). A number of GCA tools still survive in university fabs, mostly converted to manual operation. Briefly, the design consists of:

  • A cast-iron base with a cast-iron 'bridge' holding the optical column.
  • A stacked stage consisting of two coarse mechanical bearing stages driven by servomotors, two fine flexure stages constructed as a single unit driven by voice coil actuators, and an open-loop, limited-travel rotation stage driven by DC motors.
  • Feedback provided by an HP 5500-series interferometer that meters the displacement between two mirrors mounted to the optical column and two flats mounted to the fine stage.
  • A reticle alignment stage consisting of a small flexure actuator and fine-pitch screws for adjustment.
  • A focusing system using a photoelectric height sensor and a linkage mechanism that adjusts the entire optical column height (!) with a travel range of around 1mm.
  • An alignment system using two fixed microscopes to align the origin and rotation of the wafer.
  • A high-pressure mercury arc lamp with a homogenizer and filter (MAXIMUS) to illuminate the reticle with Kohler illumination of the appropriate wavelength.
My copy showed up in an interesting state of disrepair - the laser and alignment microscopes were missing (why anyone would want the alignment microscopes is beyond me), and the Maximus made rattling noises. The first step was to repair the light source.

Inside the Maximus 1000

Life before LEDs was bad. Arc lamps produce a concentrated point of light a few mm across, and turning that into uniform illumination across a 4" reticle is challenging. Now, normal people use an elliptical collector, a condenser lens, and a fly's-eye homogenizer to produce uniform illumination, but not GCA.

Instead, the inside of the Maximus looks like this:

The arc lamp goes in the center; the four identical assemblies each collect 1/4 of the arc lamp output.

The top left is a condenser lens assembly. The diagonal mirror is a cold mirror (it dumps IR into a heatsink not shown); the round filter below it is a narrowband filter for the design wavelength (in this case, 365nm). So far, reasonable. But, where you would usually see a homogenizer after the filter, there is instead a focusing lens. This lens focuses the lamp output into four fiber lightguides, which bundle into a single lightguide on the other end. The output of this lightguide is then imaged onto the reticle in the usual fashion by an illumination lens. This arrangement, while very complex, has a neat benefit: the characteristics of the illumination are solely determined by the light guide. The NA of the fibers sets the illumination field size, and the diameter of the output bundle determines the NA of the illumination. The illumination is perfectly uniform, since every fiber illuminates the whole field; missing fibers will only result in a slight overall loss of intensity.

As luck would have it, practically every screw in the Maximus was loose, and the bulb was snapped in half. The rebuild took a couple hours, and was greatly improved by removing the head from the stepper - dealing with loose lenses is much easier when you are not six feet off the ground (if by some chance you are reading this and also servicing a GCA stepper, removing the Maximus is easy - just pull the four socket head screws at the base of the condenser, un-route the shutter cables and lamp cables, and the unit lifts right off).

I haven't had a chance to check performance yet, as the bulb needs replacement. The Maximus uses Ushio USH-350DP bulbs. Of critical note: the USH-350DP is a two-screw-terminal bulb designed for aligners. The Maximus uses a screw-on "bullet" on one end to convert it to a plug-in type; if you are changing bulbs, don't throw out the plug!

Additional GCA resources

  • Here is a collection of various official GCA manuals scraped off the internet, mostly from university sites. The information in the manuals is helpful for understanding how the system works. If you are intent on actually using the stock GCA controller, the manuals are pretty much mandatory, since the PDP-based software is not very user friendly.
  • Here are various pieces of documentation from third parties (once again, mostly academic fabs). Additionally, there are several good DSW guides:

Sunday, October 11, 2020

The K39 mini-ITX case


Intro: the tiny and elusive K39

The K39 is the world's smallest case with discrete GPU support. No one really knows where it came from; there are several K39 variants listed on Chinese shopping site Taobao. There's even one listed on Amazon, with Prime shipping to boot, but at $148 with no PSU, it's of questionable value.

K39 specifications

The K39 is an odd case; it throws away…every feature…to achieve its small size. It has no drive bays, no external ports, and of course no lighting. Instead, it relies on onboard storage and rear I/O ports (though it is worth noting there are obscure K39 variants with a single USB port and a single 3.5mm jack). However, it is incredibly small - at 3.9 to 4.5L depending on the variant, it is even smaller than the 5.0L NFC SkyReach 4 Mini. Like the S4 Mini, it is limited to short (180mm) video cards. It also supports standard flex-ATX PSUs, though there are many caveats...

The K39 power supply

While the K39 can mount generic flex-ATX PSUs, it is, for all intents and purposes, a case with a built-in PSU. The K39 PSUs are very cheap 80Plus Gold rated units, built from recycled server parts.

In order to make the PSU more compact, the stock cable harnesses are cut short and soldered into a modular breakout board. Innovatively, the PSU uses thin, high-temperature silicone wire, which is both flexible and capable of carrying high currents. Due to cable routing restrictions, the modular cables are more or less essential for operation.

While it is possible to buy a K39 with no power supply, there isn't really a reason to do so. The modular supplies cost much less than the competition, and are available up to 600W, which is much more than the case can dissipate.

The build

The actual computer inside this case is strange and sort of terrible. It uses a long-discontinued ASRock X99 board, a 120W 14-core Xeon, and an R9 Nano. The ASRock board has gained some sort of strange cult following and costs as much now as it did new (or maybe there are a ton of people who need 128GB on an ITX board?). The Xeons are very cheap, but not very fast, with any cost savings over a Ryzen 4000 CPU immediately nulled by high motherboard prices. The Nano was never very good, it performs somewhere between a 1060 6GB and a 1070, but is crippled by its 4GB of VRAM. A dubious perk is that it is almost the fastest short card supported by macOS; the AXRX 5700 ITX can only be imported from China for a huge amount of money.

However, the components serve their purpose as being a maximum challenge testbed for the build. The X99e-ITX/ac is a very hard board to build around, especially with a 47mm cooler restriction. My previous thin X99 build had an 83mm clearance which was still quite difficult to work with - it required a discontinued Cooler Master cooler, custom waterjet stainless brackets, and a machined-down 120mm fan. The Nano's TDP is also at the top end of short cards; the other contenders (the 1070, RX 5700, and 2070) have similar ratings.

Build notes

There's surprisingly little to say here. The K39 variants are all a little awkward to work with because they require a complete disassembly for component installation. On this flavor, the front panel comes off to reveal a freestanding motherboard tray. The I/O shield pops into the outer shell, then the tray with installed motherboard and riser slides in. The PSU and PSU cables go in next, followed by wiring, the GPU, and finally the front panel. This is where the super-flexible PSU cables come in handy; it would be impossible to route normal cables in the case.

The standard PSUs actually have a SATA power connector on them, but there is no room in the case for a 2.5" drive. My understanding is that folks who have 2.5" drives in the case use foam tape to affix them to one of the side panels.


I was targeting a laptop-like acoustic profile for this build; that is, quiet at idle, loud and hot under load. I had originally wanted to use a 1U Dynatron vapor chamber active cooler. Unfortunately, the Dynatron was more or less unusable; it ran extremely hot (60C at idle!) and was amazingly loud. Even with a 50mV undervolt and a custom fan curve that held minimum speed until 75C, the blower would randomly spool up with even one core active.

It was clear that some more "engineering" was needed. Fortunately, 47mm just barely clears a 1U passive cooler (29mm) with a 15mm thick fan stacked on top. To find a fan, I took to the trusty old technique of disassembling stock coolers; stock coolers are often laughed at, but to get sufficient cooling performance out of a small, cheap heatsink requires a serious fan. I ended up using a 70mm, 8.4W fan out of an FM2+ stock cooler:

The fan required some minor machining (it had mounting feet that put it over the height limit). Some brackets were drawn up and printed…

…and the whole thing was put together with some screws and 3M VHB tape.
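As a quick sanity check, the height stack-up works out with a few millimeters to spare (all dimensions are the ones quoted above: the 47mm K39 clearance, a 29mm 1U heatsink, and a 15mm fan):

```python
# Height budget for the K39 cooler stack.
# All dimensions come from the build description above.
clearance_mm = 47   # K39 CPU cooler height limit
heatsink_mm = 29    # 1U passive heatsink
fan_mm = 15         # slim fan stacked on top

stack_mm = heatsink_mm + fan_mm
margin_mm = clearance_mm - stack_mm

print(f"stack: {stack_mm}mm, margin: {margin_mm}mm")  # stack: 44mm, margin: 3mm
assert stack_mm <= clearance_mm
```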

The small 40mm fan is critical; without it, the CPU would cook the SSD enough to severely throttle, meaning a lengthy cooldown period was necessary after heavy loads. It also cools the PCH by about 15C, which is not too bad for such an anemic fan.

The cooler bolts into place neatly, and the VHB seems to handle the high temperatures just fine.


We'll start with the bad news: the 2683 v3 is no longer fast. It does score a healthy 180 fps on KeyShot, but that's merely the performance of a 9900K, a CPU with six fewer cores. On the other hand, it does perform like a $350 CPU for $120, so if rendering is all you do it's not a bad choice.
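To put the "performs like a $350 CPU for $120" claim in perspective, a rough value-per-dollar calculation (the score and prices are the ones quoted above; "retail" stands in for a 9900K-class part at its street price circa 2020):

```python
# Rendering value per dollar, using the figures quoted above:
# a KeyShot score of ~180 for the 2683 v3 at ~$120 used, versus
# a ~$350 retail CPU (9900K-class) with the same score.
keyshot_score = 180.0
xeon_price_usd = 120.0
retail_price_usd = 350.0

xeon_value = keyshot_score / xeon_price_usd
retail_value = keyshot_score / retail_price_usd

print(f"Xeon:   {xeon_value:.2f} points/$")   # 1.50 points/$
print(f"retail: {retail_value:.2f} points/$") # 0.51 points/$
```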

There are a couple of ways to tweak performance. An X99 + Xeon-specific trick is to undervolt the CPU by 50mV. Furthermore, most boards allow custom-tweaked fan curves:

This is pretty necessary with a small, noisy fan; most boards' default curves spin the fan faster than needed at idle. With a tweaked fan curve and small undervolt, the CPU idles at a slightly-warm-but-not-concerning 50C:

And a fairly nominal 66W:

Load is much more interesting. Most boards have some manner of power tuning available. On this particular board, the electrical design current (EDC) was settable, but unfortunately, the limit did not seem to correspond to actual amps. Thankfully, it was monotonic...setting an EDC of 80 resulted in a load power consumption of about 180W:

Delta-over-idle of 114W corresponded to an all-core speed of 2.3GHz and a load temperature of 78C.
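Sketching the arithmetic with the measurements above (66W idle, ~180W loaded at EDC = 80, 14 cores):

```python
# Delta-over-idle power math from the measurements above.
idle_w = 66          # measured idle system power
load_w = 180         # measured load power at EDC = 80
cores = 14
all_core_ghz = 2.3

delta_w = load_w - idle_w        # power attributable to the CPU load
per_core_w = delta_w / cores

print(f"delta over idle: {delta_w}W")  # 114W
print(f"per core: {per_core_w:.1f}W at {all_core_ghz}GHz all-core")
```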

Importantly, neither the SSD nor the DIMMs overheated, though the memory does get quite warm:

Removing the current limit results in an even 200W of power consumption, representing a 134W delta-over-idle.

At this point, the fans get really loud, but temperatures are still in control:

The RAM is now looking uncomfortably hot - some heatspreaders might be warranted...


We learned today that it is possible to dissipate 135W with a 47mm cooler. We also learned the importance of ambient airflow - the 40mm fan doesn't move much air, but was absolutely critical for success. In addition, we learned that Haswell Xeons have underwhelming performance in 2020, though for the price they are pretty solid. Fortunately, a lot of this is still applicable to the upcoming Ryzen 5000 CPUs; 135W is perfect for getting stock performance out of a 105W Ryzen 5000. True masochists might also consider the EPC621D4I; with careful tuning, a 28-core Xeon Platinum may be possible.

Saturday, April 25, 2020

Cambridge Technology 6230H galvo short teardown

Cambridge Technology's galvos are the de facto standard for high performance galvanometer scanning. The highest performing models are moving magnet scanners; these are conceptually similar to a single phase brushless motor. However, in galvo duty the rotor never completes a revolution, instead scanning back and forth over a maximum range of around 40 mechanical degrees (+/- 20 degrees). This range is small enough that rotor position is not part of the torque generation loop (and in fact, with a single set of coils, it is impossible to control the stator current phase); instead, galvos operate as current-amplitude-to-torque converters.

The galvo in this teardown is a 6230H, a mid-sized model still in production. The rotor (second from the left, bottom row) is a radially magnetized, single-piece sintered neodymium magnet with a very long aspect ratio. This aspect ratio maximizes the torque-to-inertia of the rotor - torque scales as LR*R = LR^2, whereas rotor inertia scales as MR^2 = LR^2*R^2 = LR^4, so torque-to-inertia falls off as R^2. I'm not sure why further optimizations weren't made to the shaft, for example, a hollow shaft and/or a shaft made of an exotic alloy would have reduced inertia further, and the CT galvos are not particularly cost-sensitive products.
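Written out explicitly (dropping density and constant factors, with L the rotor length and R its radius):

```latex
% Torque scales as active surface area (L R) times lever arm (R):
\tau \propto L R \cdot R = L R^2
% Rotor inertia scales as mass times radius squared:
I \propto M R^2 = (\rho L R^2)\, R^2 \propto L R^4
% So torque-to-inertia falls off as the square of the radius:
\frac{\tau}{I} \propto \frac{L R^2}{L R^4} = \frac{1}{R^2}
```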

The stator (top left) is epoxied into the galvo housing with (hopefully thermally conductive) epoxy. To avoid saturation, the stator is a complex air-cored winding, similar to what is found in high performance servomotors. Not having stator iron has the added benefit of greatly reducing stator inductance, which would otherwise limit the electrical step response of the system. A coreless stator means the short-term current of the system is limited only by the fusing of the stator windings and, in practice, by demagnetization of the rotor PMs due to off-axis current at the ends of travel (since the stator field does not rotate to stay in sync with the rotor field).

The real voodoo in the Cambridge galvos is the position sensor, consisting of the quadrature photodiode assembly in the bottom left. This is used in conjunction with the IR illuminator (bottom row, third from the left) to measure the rotor position with impressive accuracy: 8uRad short term repeatability equates to more than 16 bits of angle data over the 40 degrees of travel, and 99.9% open-loop linearity is very nearly 10 bits with no additional calibration.
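The bit-depth figures are easy to verify from the quoted specs (a sketch; the exact numbers depend on how the datasheet defines repeatability and linearity):

```python
import math

# Effective bits implied by the quoted sensor specs:
# 8 microradian short-term repeatability over 40 degrees of travel,
# and 99.9% open-loop linearity (i.e. 0.1% error).
travel_rad = math.radians(40)
repeatability_rad = 8e-6
linearity_error = 0.001

repeatability_bits = math.log2(travel_rad / repeatability_rad)
linearity_bits = math.log2(1 / linearity_error)

print(f"repeatability: {repeatability_bits:.1f} bits")  # ~16.4 bits
print(f"linearity:     {linearity_bits:.1f} bits")      # ~10.0 bits
```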

Overall, no surprises - this is a state of the art galvo and the design and construction show it. The motor part is nothing fancy (I'm sure you could copy it with a little help from China), but the position sensor would require quite the R&D to duplicate, especially since the little photodiode "slices" look like custom parts.