Thursday, March 29, 2018

Cineforming!

Intro

Last November, the excellent Cineform codec went open-source. Cineform is a high-quality intermediate codec in the same spirit as DNxHR and Prores, with the notable distinction that it is based on wavelet, as opposed to DCT, compression.

Wavelets are great for editing; because the underlying transforms operate on the entire frame, wavelet codecs are free of the banding and blocking artifacts that other codecs suffer from when heavily manipulated. The best-known wavelet codec is probably RED's .R3D format, which holds up in post-production almost as well as uncompressed RAW.

Cineform has a few other cool tricks up its sleeve. Firstly, it is fast; the whole program is written using hand-tuned SSE2 intrinsics. It also supports RAW, which is convenient; encoded RAW files can be debayered during decoding into a large variety of RGB or YUV output formats, which helps in maintaining a simple workflow - any editor which supports Cineform can transparently load compressed RAW files.

Benchmarks

I wanted to do some basic benchmarking on 12-bit 4K RAW files to get an idea of what kind of performance the encoder is capable of. All tests were done on a single core of an i7-4712HQ, which for all intents and purposes is a 3GHz Haswell core. As encoding is trivially parallelized (each core encodes one frame), the results should scale almost perfectly to many-core systems.

The test image chosen was the famous 'Bliss' background from Windows XP:


As Bliss is only available in an 8-bit format, for 12-bit encoding tests, the bottom 4 bits were populated with noise (a worst case scenario for the encoder). Frame rates were calculated by measuring the time it took to encode the same frame 10 times with a high resolution timer. As the frames do not fit in L3, discrepancies caused by cached data should not be an issue.


Analysis

All four quality settings can fit a 4K 120 fps 12-bit stream under the bandwidth of a SATA 3.0 link. Furthermore, the data rates are under 350 MB/s, so there exist production SSD's that can sustain the requisite write speeds. Unfortunately, FILMSCAN1 and HIGH require pretty beefy processors (8+ cores) to sustain 120 fps; a 6c/12t 65W Coffee Lake is borderline even with HT (you don't get much headroom for running a preview, rearranging data, etc.). An 8700K (6c/12t, 95W) can handle it with room to spare, but at the expense of power consumption - 8700K's are actually more than 95W under heavy load. MEDIUM and LOW easily fit on a 65W processor. The upcoming Ice Lake (8c/16t, 10nm) processors should improve the situation, allowing for 4K 120 fps to be compressed on a 65W processor at the highest quality setting.

Going beyond, 4K 240 fps seems within reach. Using existing (Q2 '18) hardware, LOW and MEDIUM are borderline for a hotly clocked 8700K, with likelihood of consistent performance increasing if data reordering and preview generation are offloaded. Moving to more exotic hardware, the largest Skylake Xeon-D processors (D-2183IT, D-2187NT, and D-2191) should capable of compressing HIGH in real time, if not at 240 fps then almost certainly at 200 (a lot will depend on thermals, implementation, HT efficiency, and scaling, especially since Xeon-D is very much a constant current, not constant performance, processor).

Anything faster than 4K 240 fps (e.g. a full implementation of the CMV12000, which can do 4K 420 fps) will require some kind of tethered server with at least a 24c Epyc or 18c Xeon-SP processor (and the obvious winner here is Epyc, which is much cheaper than the Xeon).

Quick Update: a Faster Processor

Running a simple test on an aggressively tuned processor (8700K@4.9GHz) we get FILMSCAN1 25.5 fps, HIGH 28.9 fps, MEDIUM 39.6 fps, LOW 50.6 fps. 4.9 GHz is a little beyond the guaranteed frequency range of an 8700K (they can all do 4.7GHz, which is the max single core turbo applied to all cores), but practically all samples can do it anyway. This suggests a neat rule of thumb: LOW is good for twice the frame rate of FILMSCAN1, both in data rate and compression speed.

Addendum: Cineform's packed 12-bit RAW format

I have never seen such an esoteric way to pack 12-bit pixels (and after spending many hours trying to figure it out, I now understand why the poor guy who had to crack the ADFGVX cipher became physically ill while doing it).

The data is packed in rows of most significant bytes interleaved with rows of least significant nibbles (two to a byte). Furthermore, two rows of MSB's (each IMAGE_WIDTH bytes long) are packed, followed by one full-width row (also IMAGE_WIDTH bytes long) containing the least-significant nibbles of the previous two image rows.

To add to the confusion, the rows are packed as R R R ... R G G G ... G or G G G ... G B B B ... B (depending on the which row of the bayer filter the data is from); in other words, the even-column data is packed in a half row, followed by the odd-column data. This results in a final format like so:

R R R ... R G G G ... G
G G G ... G B B B ... B
LSN LSN LSN ... LSN

I am not sure why the data is packed like this (for all I know it's not, and there is a bug in my packing code...) but I suspect it is for some kind of SSE2 efficiency reasons. I also haven't deciphered how the least significant nibbles are packed (there is no easy way to inspect 12-bit image data), but hopefully it is similar to the most significant bytes...

3 comments:

  1. woho this would be awesome for my miniraw project :) i have the same camera like shane and use his software but cineform raw recording would be a dream you think 1920x1080 120fps 12bit raw is possible? cheers janosch

    ReplyDelete
  2. 1080p 120 fps on the highest setting should be possible if you use a recorder that has a 15W dual core processor (Surface Pro type devices, not the 7W Core M based ones). You will probably want to find some way to offload the preview to the GPU if it isn't already. Certainly MEDIUM and LOW are easy for anything with a Core processor.
    I have a similar camera to Shane's (Basler, not Point Grey, but the same sensor) and the next step clearly seems to be to rewrite my recorder to take advantage of Cineform, so I'll make an updated post when that happens.

    ReplyDelete
    Replies
    1. that would be awesome:) miniarw that shoots cineform raw and the hopefuly resolve will read the cineform raw im sadly very bad at coading but i guess there are similiar sdk samples from basler for your camera? i use shanes great software which uses slimdx for the preview hmm my miniraw has an intel core i7 3770t can i test the encoder somehow with a simple tool? damn i need to get my coading skills better :D any other way to contact you? email? facebook or sth? ::) cheers janosch

      Delete