Saturday, March 4, 2017

Characterizing Ryzen R7 overclocking behavior

It's been a while since I made an overclocking post, mainly because nothing interesting has happened since Sandy Bridge. Ryzen has been the first new thing in half a decade to hit the market, so I dutifully went and bought an R7 1700 and an Asus PRIME B350-Plus in the name of science.

All tests were done using a 400W Antec server supply. CPU currents were measured using a clamp DC ammeter on the 12V wires going into the CPU 8-pin. The rest of the setup consisted of Windows 10 Enterprise (I made sure Defender and Windows Update were disabled), an XFX HD5750 for basic video output, and a 320GB 7200RPM 2.5" drive. Not the most upscale of setups, but unlikely to affect overclocking figures. Everything was done on an open-air setup in a room with ~20C ambient temperatures - I expect stock cooler mileage to go down slightly in a case with higher ambient temps. Adjustments were made on-the-fly in Ryzen Master.

Stock cooler numbers

Wraith Spire is remarkably good for a stock cooler, able to hold off 120W while remaining reasonably cool. Gone are the days of shitty stock coolers that would run stock-clocked i7's at 90C.

[​IMG]

The bad news: 3.8GHz seemed somewhat unattainable on the stock cooler; there was a distinct feeling of thermal runaway (Prime95 would run until temps hit about 76C or so and then crash, but not hard enough to bring down Windows).

[​IMG]

Power stays in check out to 3.7GHz, though you do lose some efficiency past 3.5GHz; at 3.8GHz it is not great.

High-End Air numbers
I broke out the old Thermalright TRUE 120 for some testing; it's not a modern cooler by any means but it could cool 200+W 45nm processors back in the day so I figured it would be fine. Note that numbers up to 37x are the stock cooler numbers from above; the stock cooler did a good enough job that I didn't bother rerunning the benchmarks.

[​IMG]

40x was pretty much unattainable on my setup. Temperatures climbed rapidly, eventually crashing hard enough to trigger a reboot. I'm pretty sure this is a die thermal resistance limit; it is just not possible to remove 25W per core on Ryzen's die while maintaining the sub-70C temps needed for stability. I tried 1.4375V as well, but power climbed to 204W (17A) and the system similarly rebooted.
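The power figures here come from clamp-meter current on the 12V CPU feed, so the conversion is just volts times amps. A minimal sketch, assuming the rail sits at a nominal 12.0V (a real supply sags a little under load) and that VRM losses get counted as CPU power; the function names are made up for illustration:

```cpp
#include <cassert>

// Input power on the CPU 8-pin from clamp-meter current.
// Assumes the rail is at its nominal 12.0 V; a real supply sags under load.
double cpu_input_watts(double clamp_amps, double rail_volts = 12.0) {
    assert(clamp_amps >= 0.0);
    return clamp_amps * rail_volts;
}

// Average power per active core, ignoring uncore and VRM losses.
double watts_per_core(double total_watts, int active_cores) {
    assert(active_cores > 0);
    return total_watts / active_cores;
}
```

With the 17A reading this gives 204W, or roughly 25W per core across 8 cores.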

4C/8T numbers
Going by the theory that the frequency wall at 39x was thermal, I disabled four cores in hopes that the lateral heat conduction in the die would reduce temperatures substantially.

[​IMG]

While temperatures did go down dramatically, enabling much improved operation at 40x, 41x was very out of reach; temperatures remained in control, but Windows crashed before I could get meaningful power readings.

Prime95 Blend numbers
Not satisfied with the lack of 4GHz, and still going by the thermal stability theory, I tried Blend, which I figured is more representative of enthusiast thermal loads.

[​IMG]

It pulled off 40x fine, albeit requiring slightly more voltage to do so than the quad core test (crash after about 10 minutes of blending at 1.375V). As a bonus, the lower temps also translate into slightly reduced voltage requirements.


Voltage/Power scaling
For science's sake, I left the multiplier at 30x and slowly raised the voltage.

[​IMG]

Not as interesting as I expected; it looks like standard quadratic-ish scaling out to 1300mV, then something happens, resulting in a bump in power consumption. The bump looks like part of the reason we can't get past 40x; if process refinements could move it out we'd get more.
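For reference, "quadratic-ish" means dynamic power at a fixed clock goes roughly as voltage squared. A rough sketch of that baseline, ignoring leakage entirely (which is an assumption, and part of why real curves bend upward); the function name is hypothetical:

```cpp
#include <cassert>

// Dynamic power at a fixed clock scales roughly with voltage squared
// (P ~ C * V^2 * f). Leakage is ignored, which is why this is only "-ish".
double scaled_power_watts(double base_watts, double base_volts, double new_volts) {
    assert(base_volts > 0.0);
    const double ratio = new_volts / base_volts;
    return base_watts * ratio * ratio;
}
```

Measured points that climb noticeably faster than this baseline are where a bump like the one past 1300mV would show up.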

Conclusions

  • Ryzen Master is pretty good. After leaving it running for half an hour while I wrote up this post it crashed, but seemed to work fine when I reopened it. The UI is damn responsive for an overclocking tool, as good as any I've ever used.
  • The PRIME B350-Plus BIOS is not great as far as overclocking goes. In particular, the option to set a fixed voltage is missing, and I have no idea what the 'FID' and 'DID' ratios do (raising DID seems to lower the target frequency??). Presumably this will be remedied in a later BIOS release.
  • Ryzen, or at least my sample, seems to have poor frequency-temperature characteristics past about 75C. On both the 38x stock and 40x aftermarket tests, Windows crashed when the temperature readout in Ryzen Master crossed 75C.
  • It looks like on 8 cores, the thermal limitations come from the ability to pull heat out of the die, not because of the thermal resistance of the heatsink to air. Temperatures climbed rapidly in response to frequency changes, and at least qualitatively, the fins on the TRUE 120 were quite cold.
  • On 4 cores, efficiency is solid. Voltage scaling is not as good as Kaby Lake (which ships at 4.2GHz and typically in the 1.225V neighborhood), but power consumption is in control and temperatures are very low.
  • Ryzen has two distinct bumps in efficiency. I'm wary of saying where they are based on a single sample, but it looks like the first is in the neighborhood of 35x and the second, around 38x. Pushing for those last two bins (or eight, if you want to be picky) causes an insane increase in power consumption on my sample; the sample to sample cutoff may vary but I feel safe in saying that you will lose a whole lot of performance/watt right around 3.8+/- GHz.

Wednesday, May 25, 2016

Bremsproject Season 2 - Part 1: New Electronics

Like many of my colleagues this past semester, I decided it would be fun to build a heavyweight battlebot ("The Dentist") for season 2 of the Battlebots TV show. "The Dentist" involved spinning up a 200KJ energy storage drum, and being as mechanically incompetent as I am, I decided that clearly my contribution to the sport would be some kind of absurd power system involving hundreds of volts and large motors.

I went and bought a few Prius inverters and dusted off the old Bremsthesis bits, and began this year's adventure into turning on someone else's hardware.

New boards, better boards

Wow, these motors have resolvers on them! So much more convenient than our contraption built out of 3D-printed mounts and analog hall sensors. Even more conveniently, there is a rather expensive IC that takes resolver input and outputs quadrature encoder pulses. Rather fortunately, this IC can be sampled from Analog Devices, or purchased from unofficial eBay vendors for about $15.

Implementing the IC using the datasheet notes is fairly straightforward; the only tricky part is getting the output amplifiers right (the sine/cosine generation on the chip itself is the output of a DAC, and is unsuitable for driving a resolver directly):


Layout is a little less straightforward. The following daughtercard works, but has about 1 LSB worth of noise:

Board  || Schematic

I could probably do better, but who needs 12 bits of position accuracy anyway?

The actual interface board for the Prius module remains largely unchanged in design from last year, the only changes being it uses the Morpho headers and is all surface mount, because hell if I'm using sockets in a combat robot.


Board || Schematic


New this year! Prius inverter pinouts

| Pin # | Name | Description | Notes |
|---|---|---|---|
| 1 | IGCT | Gate drive power in | Reverse-protected, ~9-16V |
| 2 | GIVB | Small inverter, current sense V | Redundant with GIVA |
| 3 | GIVA | Small inverter, current sense V | 50mV/A, zero-centered, isolated |
| 4 | GIWB | Small inverter, current sense W | Redundant with GIWA |
| 5 | GIWA | Small inverter, current sense W | 50mV/A, zero-centered, isolated |
| 6 | GWU | Small inverter, phase input W | Inverting, 12V logic, isolated; no float |
| 7 | GVU | Small inverter, phase input V | See above |
| 8 | GUU | Small inverter, phase input U | See above |
| 9 | GIVT | Small inverter temperature | Unknown behavior, probably ratiometric |
| 10 | GFIV | Small inverter fault indicator | Probably open-collector |
| 11 | GSDN | Small inverter ENABLE | 12V logic, setting low floats all phases |
| 12 | MIVT | Large inverter temperature | Unknown behavior, probably ratiometric |
| 13 | MFIV | Large inverter fault indicator | Probably open-collector |
| 14 | MSDN | Large inverter ENABLE | 12V logic, setting low floats all phases |
| 15 | GINV | Gate drive power GND | Isolated |
| 16 | MIWA | Large inverter, current sense W | 25mV/A, zero-centered, isolated |
| 17 | MIWB | Large inverter, current sense W | Redundant with MIWA |
| 18 | MIVA | Large inverter, current sense V | 25mV/A, zero-centered, isolated |
| 19 | MIVB | Large inverter, current sense V | Redundant with MIVA |
| 20 | NC | No connection | |
| 21 | NC | No connection | |
| 22 | NC | No connection | |
| 23 | MWU | Large inverter, phase input U | Inverting, 12V logic, isolated; no float |
| 24 | MWV | Large inverter, phase input V | See above |
| 25 | MWW | Large inverter, phase input W | See above |
| 26 | VH | Bus voltage | Ratio not yet determined |
| 27 | OVH | Inverter overvoltage | Probably open-collector |
| 28 | GND | Chassis ground | Leave unconnected |

Power stage notes: Sides are probably CM400 and CM200-class IGBT's. Stock switching frequency is 5KHz; performance becomes rather poor past about 15KHz as internal delays and deadtime introduce severe distortion in synthesized waveforms. Current sensors saturate at 400 and 200A; the power module has fast overcurrent detection set somewhere around there (the entire inverter will automatically float if the phase currents are too high). Diodes are quite small in comparison to the IGBT's, and are probably not rated to full inverter current.

Sleeker firmware

Thanks to mostly me, the original Brems-code had bloated to rather ungainly proportions, featuring contexts, event loops, buffers, debuggers, and more classes than should ever be in a two-person microcontroller project. It was time to downsize; conveniently enough, Ben was working on an encoder-based FOC controller for his robot leg project, so I decided to borrow bits of his code and mix it with my own. The new code is here.

Tidbits of particular note:

a = new FastPWM(PWMA);
b = new FastPWM(PWMB);
c = new FastPWM(PWMC);

NVIC_EnableIRQ(TIM1_UP_TIM10_IRQn); //Enable TIM1 IRQ

TIM1->DIER |= TIM_DIER_UIE; //enable update interrupt
TIM1->CR1 = 0x40; //CMS = 10, interrupt only when counting up
TIM1->CR1 |= TIM_CR1_ARPE; //autoreload on,
TIM1->RCR |= 0x01; //update event once per up/down count of tim1
TIM1->EGR |= TIM_EGR_UG;

TIM1->PSC = 0x00; //no prescaler, timer counts up in sync with the peripheral clock
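TIM1->ARR = 0x4650; //5 kHz
//Note: |= with an inverted mask sets every CCER bit except CC1NP;
//clearing only CC1NP would be TIM1->CCER &= ~(TIM_CCER_CC1NP)
TIM1->CCER |= ~(TIM_CCER_CC1NP); //Interrupt when low side is on.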
TIM1->CR1 |= TIM_CR1_CEN;

This snippet of code sets up Timer 1 to run at 5Khz in center-aligned mode; that is, the center points of the switching waveforms when all three channels are off are aligned. This allows us to sample current when none of the phases are switching, hopefully reducing sensor noise. TIM1->ARR controls the switching frequency; halving its value doubles the switching frequency.
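The relationship between TIM1->ARR and switching frequency can be sketched as below. In center-aligned mode the counter runs up to ARR and back down, so one PWM period is 2 x ARR timer ticks. The 180 MHz timer clock is an assumption (it is what ARR = 0x4650 at 5 kHz implies with no prescaler), and the helper names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// In center-aligned mode the counter runs 0 -> ARR -> 0, so one PWM period
// takes 2 * ARR timer ticks (with no prescaler).
// timer_hz is an assumption: 180 MHz is what ARR = 0x4650 at 5 kHz implies.
uint32_t arr_for_center_aligned(uint32_t pwm_hz, uint32_t timer_hz = 180000000u) {
    assert(pwm_hz > 0);
    return timer_hz / (2u * pwm_hz);
}

// Inverse of the above: switching frequency for a given ARR value.
uint32_t pwm_hz_from_arr(uint32_t arr, uint32_t timer_hz = 180000000u) {
    assert(arr > 0);
    return timer_hz / (2u * arr);
}
```

Keep in mind that ARR on TIM1 is a 16-bit register, so very low switching frequencies need the prescaler.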

RCC->APB2ENR |= RCC_APB2ENR_ADC1EN; // clock for ADC1
RCC->APB2ENR |= RCC_APB2ENR_ADC2EN; // clock for ADC2

ADC->CCR = 0x00000006; //Regular simultaneous mode, 3 channels

ADC1->CR2 |= ADC_CR2_ADON; //ADC1 on
ADC1->SQR3 = 0x0000004; //PA_4 as ADC1, sequence 0

ADC2->CR2 |= ADC_CR2_ADON; //ADC2 ON
ADC2->SQR3 = 0x00000008; //PB_0 as ADC2, sequence 1

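//MODER bits [2n+1:2n] = 11 put pin n into analog mode
GPIOA->MODER |= (1 << 8); //PA4 analog (ADC1 input)
GPIOA->MODER |= (1 << 9);
GPIOA->MODER |= (1 << 2); //PA1 analog
GPIOA->MODER |= (1 << 3);
GPIOA->MODER |= (1 << 0); //PA0 analog
GPIOA->MODER |= (1 << 1);
GPIOB->MODER |= (1 << 0); //PB0 analog (ADC2 input)
GPIOB->MODER |= (1 << 1);
GPIOC->MODER |= (1 << 2); //PC1 analog
GPIOC->MODER |= (1 << 3);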

This bit configures the ADC's. We are cheating a little here; the ADC's are set up in sequence mode, but the sequences are length 1. Because only two values are necessary for basic operation we don't have to worry about dealing with the sequencing (somewhat useful, as mbed somehow goes and clobbers ADC_EOC flag functionality).

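void zero_current(){
    //Start from zero so a repeated call does not add to a stale offset
    ia_supp_offset = 0.0f;
    ib_supp_offset = 0.0f;
    for (int i = 0; i < 1000; i++) {
        ADC1->CR2 |= 0x40000000; //SWSTART: kick off a conversion
        wait_us(100); //give the conversion time to finish before reading
        ia_supp_offset += (float) (ADC1->DR);
        ib_supp_offset += (float) (ADC2->DR);
    }
    //Average the samples, convert counts to volts, subtract the nominal zero-current voltage
    ia_supp_offset /= 1000.0f;
    ib_supp_offset /= 1000.0f;
    ia_supp_offset = ia_supp_offset / 4096.0f * AVDD - I_OFFSET;
    ib_supp_offset = ib_supp_offset / 4096.0f * AVDD - I_OFFSET;
}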

This function tries to compute an additional offset (caused by sensor drift, etc.) every time the controller resets; this is in addition to any measured calibration errors caused by resistor inaccuracies or AVDD error.
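As a rough standalone sketch of how raw ADC counts become amps and where the supplemental offset fits in: the constants below are hypothetical placeholders (the real AVDD, I_OFFSET and I_SCALE depend on the interface board), and the function names are made up for illustration.

```cpp
#include <cassert>

// Hypothetical calibration values, for illustration only.
constexpr float kAvdd = 3.3f;      // ADC reference, volts
constexpr float kIOffset = 1.65f;  // nominal zero-current voltage at the ADC pin
constexpr float kIScale = 0.005f;  // volts per amp at the ADC pin

// 12-bit ADC counts to volts at the pin.
float counts_to_volts(float counts) { return counts / 4096.0f * kAvdd; }

// Averages raw samples taken at zero current; returns the residual offset in volts.
float supplemental_offset(const float* samples, int n) {
    assert(n > 0);
    float sum = 0.0f;
    for (int i = 0; i < n; i++) sum += samples[i];
    return counts_to_volts(sum / n) - kIOffset;
}

// Full conversion: remove nominal and supplemental offsets, then scale to amps.
float counts_to_amps(float counts, float supp_offset) {
    return (counts_to_volts(counts) - kIOffset - supp_offset) / kIScale;
}
```

If the sensor idles a few counts off mid-scale, the supplemental offset absorbs that, and the same idle reading then converts to zero amps.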

p = pos.GetElecPosition() - POS_OFFSET;
if (p < 0) p += 2 * PI;

float sin_p = sinf(p);
float cos_p = cosf(p);

ia = ((float) adval1 / 4096.0f * AVDD - I_OFFSET - ia_supp_offset) / I_SCALE;
ib = ((float) adval2 / 4096.0f * AVDD - I_OFFSET - ib_supp_offset) / I_SCALE;
ic = -ia - ib;

float u = CURRENT_U;
float v = CURRENT_V;

alpha = u;
beta = 1 / sqrtf(3.0f) * u + 2 / sqrtf(3.0f) * v;

d = alpha * cos_p - beta * sin_p;
q = -alpha * sin_p - beta * cos_p;

float d_err = d_ref - d;
float q_err = q_ref - q;

d_integral += d_err * KI;
q_integral += q_err * KI;

//Anti-windup: clamp the integrators
if (q_integral > INTEGRAL_MAX) q_integral = INTEGRAL_MAX;
if (d_integral > INTEGRAL_MAX) d_integral = INTEGRAL_MAX;
if (q_integral < -INTEGRAL_MAX) q_integral = -INTEGRAL_MAX;
if (d_integral < -INTEGRAL_MAX) d_integral = -INTEGRAL_MAX;
//PI output, clamped to the normalized range [-1, 1]
vd = KP * d_err + d_integral;
vq = KP * q_err + q_integral;
if (vd < -1.0f) vd = -1.0f;
if (vd > 1.0f) vd = 1.0f;
if (vq < -1.0f) vq = -1.0f;
if (vq > 1.0f) vq = 1.0f;


This is the juicy stuff that actually closes the d and q current loops. POS_OFFSET is a sensor offset measured by aligning the motor to the d-axis (phase A high, phases B and C low).

The rest of the code should be fairly self-explanatory; the mbed project should run out of the box on a Nucleo-F446RE.

Sunday, May 1, 2016

View Camera Part 1

Almost as important as the picture: the picture of the thing that took the picture
I take a lot of pictures of small things. Boards, product shots, machines and the like often warrant close-ups, and taking photos at close quarters necessarily implies a shallow depth of field. I'm also a resolution freak, and like having pixel-level sharpness (after all, what good are all those extra pixels if they have no information?). This necessarily implies having a large sensor, or an exotic ultra-fast lens and lots of tiny pixels.

The problem with having a large sensor is that at close range, depth of field falls off as focal length squared, so getting everything in focus on a KAF-22000 is a lot harder than getting everything in focus on a cell phone sensor. Now, I like buttery bokeh as much as the next person, but for applications like project documentation or web store photos having a 2mm slice of your subject in focus is not really acceptable. Stopping down doesn't fix the issue either; on most modern sensors you get to stop down somewhere between f/5.6 and f/11 before diffraction screws you.
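The diffraction limit can be put in rough numbers with the Airy disk diameter, 2.44 x wavelength x f-number. A small sketch, assuming green light at 0.55 microns and treating "Airy disk about as big as a pixel" as a rule of thumb rather than a hard cutoff; the function name is hypothetical:

```cpp
#include <cassert>

// Diameter of the Airy disk (to the first dark ring), in microns.
// 2.44 * wavelength * f-number; 0.55 um (green) is an assumed wavelength.
double airy_disk_um(double f_number, double wavelength_um = 0.55) {
    assert(f_number > 0.0);
    return 2.44 * wavelength_um * f_number;
}
```

That works out to roughly 7.5 microns at f/5.6 and 14.8 microns at f/11, which is the same scale as the pixel pitches discussed in this post.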

The solution (to some extent at least) is to move the plane of focus with camera movements. For a surprising number of project documentation photos, this works great; board shots in particular look natural, with the boards entirely in focus and the Z direction of focus approximately perpendicular to the board.

This post is motivated by a couple things. There is surprisingly little documentation on tabletop type photography with medium (1:5-1:20) reproduction ratios using movements and medium format sized image areas; most things out there are focused on shift-based wide angle photography for architectural or fine art landscape work. Optimizing a camera for wide-angle work is remarkably different from building one for close-up work; the movements involved are much, much smaller (a couple degrees instead of tens of degrees) and most cameras out there are pancake-type cameras with rangefinders and focus on portability. Precision of focus is also much less important since depth of field at f/5.6 on a 35mm lens focused at tens of meters is hundreds of times larger than an f/5.6 90mm lens focused at tens of centimeters. Secondly, not a lot of material out there is focused on building a "cheap" camera (relatively speaking, of course; my system still cost $2000+ and I got real lucky on the digital back). Lastly, a lot of information out on forums is posted by idiots or brand loyalists and just plain wrong.

Camera Body

You need half-degree positioning accuracy on all movements to resolve 9-micron pixels, or quarter-degree for 4.5-micron (A7R, D800) pixels. This means gears; cameras such as the Linhof Technika series, the Fuji GX680's or any of the numerous cheap monorails will not work. And no, your Speed Graphic will not work.

Cameras with all geared movements that aren't ass expensive include:

  • Sinar P/P2/X: Probably the best choice; the Sinar system is ubiquitous and the cameras are beautifully built. The X is probably a better choice than the original P simply because it is newer; the P2 is marginally better but is typically quite expensive.
  • Sinar P3: This is a P2 with smaller standards. Costs about twice as much as a P2 on the secondary market, or about four times as much as the X. If you have machine shop access there is little reason to get a P3, since you won't be able to afford any of the P3-specific accessories (CMV shutters, etc) anyway.
  • Cambo Legend Master: I used to have one of these; the gears aren't great and the camera is gigantic with no option to shrink because of the dovetail + L-bracket design. That being said they are dirt cheap, and form a baseline of sorts for what is acceptable.
  • Cambo Ultima: I've never used one, but it appears to be the same price and build quality as a P3.
  • Rollei X-act2: This guy is one of two cameras (the other being the nonexistent Silvestri S5 Micron) designed from the ground up to be a digital system. It's tiny, but the accessories are expensive (albeit all easily made on a mill) and the limited range of movements is somewhat restrictive for tabletop work.
  • Linhof M679cs: If you're reading this post you can't afford it.

Lenses

Lenses for digital view cameras are sort of a mystery, shrouded in BS lore and almost-as-BS marketing materials. In general, they can be divided into three classes:

  • Symmetric 6-element lenses with a small image circle. This includes the Schneider Digitars, apochromatic enlarging lenses, Fujinon EBC GX lenses, and a handful of obscure 2x3 film lenses. They are usually cheap ($200-500) and resolve 9-micron pixels, but not much more. Among these the notable ones are:
    • APO Componon HM 4.5/90: The highest resolution of Schneider's symmetric lens line. It out-resolves its symmetric brethren slightly, and has such features as multicoating and a fancy blue ring on the lens.
    • Makro Symmar HM 5.6/120: Expensive, but the best macro choice. Even resolves 5-micron pixels at close range. Sometimes available in a barrel-type housing (aperture only, no shutter) on machine vision cameras for slightly cheaper.
    • Sinaron Digital 4/80: This is an 80mm Digitar in a Sinar DB mount. Awkward to use since the DB lensboards lack an externally accessible aperture, but dirt cheap (sub-$150) and ubiquitous since it was the cheapest lens in the old Sinarcam system.
    • Componons, in general, are the same optical design as the matching Digitars, but somewhat cheaper. The APO Componons are in fact the exact same lens; while the apochromatic design contributes very little to their resolving power, the modern coatings will improve contrast somewhat. That being said, they are much more expensive than their non-APO counterparts.
    • GX680 system lenses are rather good; however, their electronic leaf shutters will require nontrivial hacking (which as of this post's writing no one has done) to turn on.
  • Symmetric 8-element lenses based on a wide-angle design: The lore behind these is unclear, but I've heard it said that the first generation Rodenstock Digarons (Apo-Sironar Digital) were rebranded Grandagons. These lenses have flatter fields of view than the 6-element lenses, but are rather rare and expensive. This line most notably contains:
    • Apo-Sironar Digital 90/5.6: As the longest focal length of the 8-element lenses, this lens can almost resolve 5-micron pixels. Marketed as the HR Digaron-W 90/5.6, it claims to be rated for 80 MP sensors, but MTF at 100lp/mm at the corners is less than stellar.
  • Retrofocus 12+ element lenses: Namely, the HR Digaron-S line of lenses. Not actually as expensive as you might think (the 100/4 is $1000 on a good day, including Copal 0 shutter). Also known as the Sinaron Digital HR. All good for 5-micron or smaller pixels. If you're using an A7R as a digital back and want any semblance of resolution, or if you have an IQ180 or a microstepping back, these lenses are more or less mandatory. It is worth noting that you will have limited movements with these lenses on 36x48mm sensors; they were designed for maximum performance on 33x44mm sensors (hence the off focal lengths such as the 60mm).

Sensor

Live view please! You'll have a miserable time focusing without it. Sadly coaxing live view out of a CCD with no electronic shutter whatsoever is exceedingly difficult. Out of the older CCD backs, Sinar has the best live view implementation, while Phase One had no live video until the P+ series. If you can afford them, the CFV-50C or the IQ3-100 are ideal, but if you are reading this post you probably can't. An alternative option is to use an A7R/II, which has a formidable, if somewhat small, sensor - this makes focusing somewhat awkward; if you want a normal focal length some cameras may not be able to focus to infinity without aggressively recessed lensboards and bag bellows.

(...more to come in the next post)

Friday, May 22, 2015

Turning Used Car Parts into Silly Vehicles Part 4: BREMSTHESIS 2

(the nonsense name comes from Bremschopper and the fact that some or all of this project is involved in Nick's bachelor's thesis)

So previously we had gotten field-oriented control more or less working - the next step was to get it reasonably stable, thereby reducing the number of firmware-induced injuries during vehicle testing and the number of dollars spent on replacement bricks.

What is FOC?

FOC (Field-Oriented Control) is a real fancy name for some operations to make current control on a motor better and easier. In math this would be a coordinate transform and a rotation, but here in EE land they are given the names "Clarke Transform" and "Park Transform" and are associated with some mystical hexagram or the other, but I digress...

The inspiration for the transforms comes as follows:

Suppose two of the phase currents with respect to electrical angle are


The Clarke Transform computes

which for the nominal phase currents above gives


The Park Transform further computes


Because these values are DC, the loop no longer needs to run fast enough to sample the sinusoid. Furthermore, Id and Iq are related to the torque produced by the motor and the physical characteristics of the motor, which allows us to directly control and optimize torque generated by the motor for improved efficiency.
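For reference, one common textbook form of the two transforms is sketched below, assuming balanced sinusoidal phase currents of amplitude I and the amplitude-invariant scaling. Axis and sign conventions vary between implementations, so treat this as an illustration and not necessarily the exact expressions used in this project:

```latex
\begin{aligned}
I_a &= I\cos\theta, & I_b &= I\cos\!\left(\theta - \tfrac{2\pi}{3}\right) \\
I_\alpha &= I_a, & I_\beta &= \tfrac{1}{\sqrt{3}}\left(I_a + 2 I_b\right) \\
I_\alpha &= I\cos\theta, & I_\beta &= I\sin\theta \\
I_d &= I_\alpha\cos\theta + I_\beta\sin\theta = I, & I_q &= -I_\alpha\sin\theta + I_\beta\cos\theta = 0
\end{aligned}
```

The third line follows from the second because 2cos(θ - 2π/3) = -cos θ + √3 sin θ, so the cosine terms cancel. The last line is the point of the exercise: the rotating sinusoids collapse into constants.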

Loop Tweaking

The loop worked alright, but under heavy load there were strange clunking noises which happened at around 6Hz, as the plot below shows


There wasn't anything evident in the firmware that was running at 6Hz...

Turns out the clunking was happening due to internal overcurrent shutdown on the brick, which brought up the question why the internal overcurrent cutoff was tripping on our fairly low current set point. Then we remembered the current sensors weren't actually calibrated in the firmware...turns out they were off by a factor of 7 or so.

With the gross miscalibration taken care of the next step was to improve the loop response to allow for higher set points before transients caused the overcurrent protection to trip. Adding some manual low-passing on the throttle and dropping the gains by a factor of 100 or so more or less seemed to fix it (the former is probably equivalent to the latter but as the gains were quite low I was concerned about precision issues with the single-precision floats we were using for the math).
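The "manual low-passing" on the throttle can be sketched as a first-order IIR filter. This is an illustrative version, not the code from the project; the struct name and the smoothing factor are hypothetical:

```cpp
#include <cassert>

// First-order IIR low-pass (exponential smoothing) for the throttle command.
// alpha in (0, 1]: smaller is slower. Any particular value is a tuning choice.
struct ThrottleFilter {
    float alpha;
    float state = 0.0f;

    explicit ThrottleFilter(float a) : alpha(a) { assert(a > 0.0f && a <= 1.0f); }

    // Call once per control loop iteration with the raw throttle reading.
    float update(float raw) {
        state += alpha * (raw - state);
        return state;
    }
};
```

A step on the throttle then reaches the current loop as a gradual ramp, which keeps the set point from jumping faster than the loop can follow.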

More Volts

The controller seemed to work at 300A - sadly performance of the vehicle was quite poor in the torque department. We checked our math, and discovered our gearing was off by a factor of 3 or so. One new sprocket later...

Such waterjet wow

...we decided it was time for more volts. We soldered up a horrifying 120S3P pack:

Buttery
and installed it, upon which we were greeted with a whole slew of random overcurrent shutdowns on the road. Dropping the gains helped (3x the bus voltage made the gains effectively 3x higher) but not that much. A little further investigation led to the conclusion that the shutdowns only happened at high mechanical RPM, so it was time for another round of debugging.

One Last (?) Bug

There were a whole bunch of reasons why the controller could be failing: loop instability, electrical noise, outright errors in the code, some latent hardware issue that we had yet to think of, etc.

First we ruled out loop instability: the output of the loop (green below) was more or less pristine:


We ran some tests with different gains and such to try to get an idea of what was going on, but none of those proved to be too productive. Then we had the bright idea to remove the back wheel for some high-speed, no-load testing, upon which it was immediately revealed that the controller wasn't capable of spinning the motor past a few thousand RPM period, regardless of the load.

This pointed to a phase issue - any phase offset in the sensors would show up as phase offset in the motor position, which would cause all the assumptions of FOC to be violated and make the actual phase current go way up. Because the analog hall sensors had output amplifiers with a 17KHz -3dB point, it was reasonable to assume that the phase lag of the amplifier was nonzero even at a couple KHz. 
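To put a rough number on that lag: if the amplifier is modeled as a single-pole low-pass (an assumption; the real amplifier may have more poles), the phase lag at frequency f is atan(f / f_3dB). The function name below is hypothetical:

```cpp
#include <cassert>
#include <cmath>

// Phase lag in degrees of a single-pole low-pass at frequency f_hz.
// Modeling the hall amplifier as a single pole is an assumption.
double single_pole_lag_deg(double f_hz, double f_3db_hz) {
    assert(f_3db_hz > 0.0);
    const double kPi = 3.14159265358979323846;
    return std::atan(f_hz / f_3db_hz) * 180.0 / kPi;
}
```

With a 17KHz corner, a 2KHz signal lags by about 6.7 degrees, and that lag shows up directly as error in the measured position.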

And indeed, tweaking the phase seemed to improve the situation, but I'll leave the excellent results of that for the next post...




Monday, March 23, 2015

Turning a Machine Vision Camera into a Digital Cinema Camera

(with a title that pretentious it has to be good, right?)

Intro

Recently Sony released the IMX174 sensor, a 16mm-format global shutter CMOS sensor good for 150FPS at 1920x1200. The sensor is currently available in several machine vision cameras for around $1000, including the Point Grey Grasshopper3 and the Basler Ace.

I had been looking for a good high-speed video camera lately that could double as a day-to-day raw camera for general videography; while the Redlake was a perfectly good camera, its absurd light requirements, short record times, and general bulk limited it to special occasions. Inspired by Shane's recent success with using a Grasshopper3 on a Freefly rig for general video, I decided to give it a go using a Basler camera and my own spin on what I thought the UI should look like.

The Camera

I managed to hunt down the camera in question (acA1920-155uc) in stock at Graftek; it was overnighted to me and the next day I had it in hand. The first thing that struck me was how small the head was - at 29mm on a side the camera was barely larger than a dollar coin. The next thing that struck me was how bad the stock viewer was - AOI didn't work, and recording dumped debayered BMP files frame-by-frame to the disk. Basler promised me that AOI would be implemented soon, but didn't have a good answer for the recording issue ("use Labview" was not a valid answer!)

But as I often say, "it's just code"...

First Light

Getting RAW images frame-by-frame out of the camera was easy - there was a Pylon SDK example for grabbing raw framebuffer data. Throughout this project, Irfanview proved to be invaluable for viewing and debayering raw frame data.

It was fairly straightforward to write the raw frames frame-by-frame to disk, but that was only good for under 60FPS due to the overhead of file creation. The logical next step was to buffer the images to an in-memory buffer and then flush the buffer several hundred MB at a time to disk - using this strategy I was able to reliably grab at maximum frame rate...

...until about 5000 frames, at which point disk write speeds would plummet to <150MB/s and frames would start dropping like crazy. I spent a day trying to fix this, thinking it was a resource handling issue on my end, until I realized that the SSD on my development machine was nearly full. Most consumer SSD's have terrible write speeds towards the end of their capacities, as they run out of free blocks and have to start erasing and rewriting entire blocks for even small amounts of data. Deleting some files and trimming the drive seemed to fix the issue, and I could record continuously at maximum frame rate (with frames cropped from 1200p down to 1080p) so long as background processes were closed.
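The buffer-then-flush strategy can be sketched roughly as follows. This is not the recorder's actual code; the class name and interface are hypothetical, and it writes to any std::ostream so the same class works with a std::ofstream (opened with std::ios::binary) for recording or a std::ostringstream for a dry run:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>   // std::ofstream sink for real recording
#include <ostream>
#include <sstream>   // std::ostringstream sink for a dry run
#include <vector>

// Accumulates frames in RAM and writes them out in large chunks, so the
// per-write overhead is paid once per chunk instead of once per frame.
class ChunkedFrameWriter {
public:
    ChunkedFrameWriter(std::ostream& sink, std::size_t chunk_bytes)
        : sink_(sink), chunk_bytes_(chunk_bytes) {
        assert(chunk_bytes > 0);
        buffer_.reserve(chunk_bytes);
    }

    // Copies one frame into the buffer; flushes once the chunk is full.
    void append(const std::uint8_t* frame, std::size_t frame_bytes) {
        buffer_.insert(buffer_.end(), frame, frame + frame_bytes);
        if (buffer_.size() >= chunk_bytes_) flush();
    }

    // Writes whatever is buffered; call once more at the end of recording.
    void flush() {
        if (buffer_.empty()) return;
        sink_.write(reinterpret_cast<const char*>(buffer_.data()),
                    static_cast<std::streamsize>(buffer_.size()));
        buffer_.clear();
    }

    std::size_t buffered_bytes() const { return buffer_.size(); }

private:
    std::ostream& sink_;
    std::size_t chunk_bytes_;
    std::vector<std::uint8_t> buffer_;
};
```

Note that the write itself still blocks the caller, so at high data rates the grab loop stalls during each flush unless the flush is moved to another thread.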

Getting a Preview

After much derping with GDI bitmaps and DirectX, I remembered that SDL existed and, as of version 2.0, was hardware-accelerated. A rough preview using terrible nearest-neighbor debayering was easy. Unfortunately, while the chunk-based disk writing scheme significantly boosted storage performance, it also had the side effect of freezing preview during the disk writes. This was OK for low data rates, but not for high frame rates where the disk would be mostly busy. The natural solution was to run the framebuffer update in a separate thread, which was easier said than done - Pylon ostensibly has callback functions that can be attached to its internal grab thread, but enabling callbacks seemed to disable Pylon's internal buffer update. The solution was to write a wrapper around the Pylon InstantCamera object that ran in an SDL_thread and updated an external buffer.

Finding a Host Computer

I wasn't particularly keen on paying $1000 for a Surface Pro 3 and the firesale on obsolete SP1's had largely ended, but fortunately the Dell Venue 11 Pro uses an M.2 SSD in the Core i3 and i5 versions and costs about $300 refurb. The stock SSD is somewhat lacking - it tops out at 300MB/s and needs a little bit of warming up in the form of a few hundred frames written before it can sustain this speed, but was OK to start with.
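Whether a given SSD is fast enough comes down to simple arithmetic on the raw data rate. A small sketch, assuming 12-bit data is packed at 1.5 bytes per pixel and using decimal megabytes; the function name is hypothetical:

```cpp
#include <cassert>

// Sustained write rate needed for raw recording, in megabytes per second
// (1 MB = 1e6 bytes). bits_per_pixel is 8, or 12 for packed 12-bit data.
double raw_data_rate_mb_s(int width, int height, int bits_per_pixel, double fps) {
    assert(width > 0 && height > 0 && bits_per_pixel > 0 && fps > 0.0);
    const double bytes_per_frame =
        width * static_cast<double>(height) * bits_per_pixel / 8.0;
    return bytes_per_frame * fps / 1e6;
}
```

For example, 8-bit 1920x1080 at 150FPS needs about 311MB/s, slightly more than a drive that tops out at 300MB/s can sustain.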

Also worth noting at this point I ran into random dropped frames, which turned out to be a crappy USB cable; unlike USB 2.0, USB 3.0 actually needs a good cable to sustain maximum transfer rate.

The horrible nearest-neighbor preview also looked pretty awful on the Venue's low-DPI screen (1080p on 11" versus 1600p on 13" on the laptop I had written the code on), so it was time to write a quick-'n-dirty software bilinear debayering function. It runs a bit hot (~40% CPU on the Venue) but seems good enough for 30FPS preview. There were also considerable tearing problems, which turned out to be a combination of thread contention in the buffer update code and a lack of Vsync during the screen update.
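For illustration, a compact software bilinear debayer might look like the sketch below. It is not the function from this project: it assumes an RGGB Bayer layout (the actual sensor's pattern may differ), takes 16-bit containers for the raw samples, and makes no attempt at speed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Color of the Bayer site at (x, y): 0 = R, 1 = G, 2 = B.
// Assumes an RGGB layout; the actual sensor's pattern may differ.
static int bayer_color(int x, int y) {
    if ((y & 1) == 0) return (x & 1) == 0 ? 0 : 1;  // even rows: R G R G ...
    return (x & 1) == 0 ? 1 : 2;                    // odd rows:  G B G B ...
}

// Bilinear debayer: a channel that was sampled at this site is copied through;
// a missing channel is the mean of its samples in the 3x3 neighborhood.
// Output is interleaved RGB, 3 values per pixel.
std::vector<std::uint16_t> debayer_bilinear(const std::vector<std::uint16_t>& raw,
                                            int width, int height) {
    assert(width > 0 && height > 0);
    assert(raw.size() == static_cast<std::size_t>(width) * height);
    std::vector<std::uint16_t> rgb(raw.size() * 3);
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            std::uint32_t sum[3] = {0, 0, 0};
            std::uint32_t count[3] = {0, 0, 0};
            // Gather per-color sums over the 3x3 window, skipping pixels off the edge.
            for (int dy = -1; dy <= 1; dy++) {
                for (int dx = -1; dx <= 1; dx++) {
                    const int nx = x + dx, ny = y + dy;
                    if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;
                    const int c = bayer_color(nx, ny);
                    sum[c] += raw[static_cast<std::size_t>(ny) * width + nx];
                    count[c]++;
                }
            }
            const int own = bayer_color(x, y);
            const std::size_t i = static_cast<std::size_t>(y) * width + x;
            for (int c = 0; c < 3; c++) {
                const std::uint32_t mean = count[c] ? sum[c] / count[c] : 0;
                rgb[i * 3 + c] = (c == own) ? raw[i] : static_cast<std::uint16_t>(mean);
            }
        }
    }
    return rgb;
}
```

On interior pixels this reduces to the usual bilinear kernels: green at a red or blue site is the mean of the four orthogonal neighbors, and red or blue at a green site is the mean of the two adjacent samples of that color.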

Better Post-processing

At this point I could grab raw ADC data to disk, but that wasn't too useful for producing viewable videos. I wrote a quick player that used bilinear debayering to preview the videos, and used Windows file associations to avoid having to write an actual GUI for it (this also made it more touch-friendly; also, cool trick: the Windows "Open With" menu option passes the file name as argv[1] to the program it calls). The 's' key in the player cycles between LUTs (currently there are linear, S-Log, Rec709, and a curve of my own creation that I thought "looked good"), 'd' dumps .RAW files for Irfanview or dcraw, and 'b' dumps bilinear debayered BMP's.

I also wanted better debayering without actually having to write a better debayer-er. After scrolling through more lines of pure global-variable based C than I ever wanted to (it's somewhat horrific that most of the world's raw photo handling stems from that mess), I figured out how to add support for new formats to dcraw. For metadata-less files, dcraw picks the file type based on file size.

Sensor performance

The chart says it all:

Stops of DR versus stops of gain
The sensor has a base ISO of around 80. The plot above shows stops of DR versus gain in 12-bit mode (which is good for 78 FPS). Performance at base ISO is excellent; performance at higher ISOs is terrible enough to the point where digitally applying gain in post is probably better for most applications. At high gain there is severe banding noise originating from a yet unlocated source.

Downloads

Download the package here. You will need the Visual C++ runtime DLL's, Basler's Pylon 4 runtime (you'll probably want to uninstall Pylon 5), an acA1920-155uc camera, a USB 3.0 port, and a reasonably fast SSD. Use Pylon Viewer to set the bit depth and use either grab_12p_fs or grab_8_fs to record files. To use the player, associate .aca files with play_12p.exe and .ac8 files with play_8.exe. Inside the player, 'd' dumps RAW frames, 'b' dumps BMP's, and 's' cycles between curves.