A new display engine of Munipack
Munipack notes: a new display engine.
Development of the Munipack GUI was suspended for a long time, blocked by the display engine: the part of the code which renders and draws FITS images in the View window.
It was my own fault. I supposed that modern computers are so fast that any image can be displayed and processed within a single blink of an eye. I was so wrong.
I developed the engine as the very first part of the GUI, having little experience with event-driven processing, and with the patterns of non-interactive batch operations in my mind. Unfortunately, that approach led to an inflexible, unresponsive design and a poor image processing experience (slow display, tuning, etc.).
The last drop in my full cup of patience was the preparation of a new Debian release. I had no free time due to my teaching duties during the Autumn semester ‘20. As a consequence, the GUI in version 0.5.14 is crippled: the unfinished experimental design of the floating magnifier as an undocked window was released to the public.
To free my hands, and creativity, I spent the first quarter of ‘21 developing a new display engine. The development finished with revision 1604, uploaded on 4 April 2021.
FITS load
FITS files are considered large, so loading takes a while. Interactive work is incompatible with the way I had implemented it (batch processing): every extension as a whole (header + image data, header + table data, …), as provided by the fits_read_[2,3]d() and fits_read_colnull_*() routines, was loaded as one uninterruptible block.
To get appropriate GUI responsiveness, I broke the FITS load into chunks whose size is a multiple of the basic FITS buffer size of 2880 bytes, as recommended by Chapter 13, Optimizing Programs, of the cfitsio library documentation.
The video captures loading of a 2k ⨉ 2k pixel file with an artificial delay of 250 ms between chunks. The Moon phase in the extension list shows the fraction of the image already loaded. The image itself is intentionally dimmed; it is scaled appropriately in intensity immediately after the load phase has finished.
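The chunk arithmetic can be sketched as follows. This is only an illustration under stated assumptions, not the Munipack code: the names Chunk, kFitsBlock and plan_chunks are hypothetical, and the number of 2880-byte blocks per chunk is a free parameter chosen here for the example.

```cpp
#include <cstddef>
#include <vector>

// One piece of the image to be read in a single step.
struct Chunk {
    std::size_t first;  // zero-based index of the first pixel in the chunk
    std::size_t count;  // number of pixels in the chunk
};

// FITS files are organised in blocks of 2880 bytes.
constexpr std::size_t kFitsBlock = 2880;

// Split npixels (each bytes_per_pixel wide) into chunks whose size is
// blocks_per_chunk * 2880 bytes; the last chunk holds the remainder.
// Hypothetical helper, not part of cfitsio or Munipack.
std::vector<Chunk> plan_chunks(std::size_t npixels,
                               std::size_t bytes_per_pixel,
                               std::size_t blocks_per_chunk)
{
    std::vector<Chunk> plan;
    if (bytes_per_pixel == 0 || blocks_per_chunk == 0) return plan;
    const std::size_t per_chunk =
        blocks_per_chunk * kFitsBlock / bytes_per_pixel;
    if (per_chunk == 0) return plan;
    for (std::size_t first = 0; first < npixels; first += per_chunk) {
        const std::size_t left = npixels - first;
        plan.push_back({first, left < per_chunk ? left : per_chunk});
    }
    return plan;
}
```

In a GUI, each planned chunk would be read in one step (for instance by a cfitsio routine which accepts a first element and an element count, such as fits_read_img, whose first element is one-based), and control would return to the event loop between the steps.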
Network FITS load
A slow load can be expected over a network. The following command uses the internal cFITSIO driver directly:
xmunipack https://integral.physics.muni.cz:/ftp/praktikum/20150319/colour/IMG_8457.fits
The load in chunks seems not to work at all. I found, by inspecting the library source code, that the cFITSIO http driver does not pass on any chunk during the load; rather, it buffers the whole file. Perhaps FITS is not generally intended to be read serially, like a stream. Although I arranged the load as a stream, it has no effect.
The only way to enable serial load is to use the commands in strict order (fits_movabs_hdu()), and to follow, again, the recommendations of Chapter 13.
Just for the record, an external utility which sends the downloaded data through a pipe, together with the stream driver of cfitsio, provides a workaround:
curl https://integral.physics.muni.cz:/ftp/praktikum/20150319/colour/IMG_8457.fits | \
    xmunipack stream://
Multi-Thread display rendering
All computers these days, including mobile devices, have multiple cores, so it is natural to use the mostly sleeping cores for long computing operations such as rendering of images. Implementing rendering this way was the primary goal of the new engine.
The video demonstrates rendering of an image broken into chunks (tiles). The tiles are rendered in bursts of eight (my computer has eight cores), each followed by an artificial 250 ms break, and the cycle repeats. Rendering of a single tile can finish before the others; the order of display is random, and some tiles can be seen temporarily protruding.
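The burst scheme can be illustrated by a minimal sketch with C++11 threads. It is not the code of xmunipack: Tile, make_tiles, render_tile and render_in_bursts are hypothetical names, the "rendering" is just a linear scaling of floats to 8-bit grey (assuming hi > lo), and the artificial break between bursts is left out.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Half-open pixel ranges [x0,x1) x [y0,y1) of one tile.
struct Tile { int x0, y0, x1, y1; };

// Cut a width x height image into tiles of at most `size` pixels per side.
std::vector<Tile> make_tiles(int width, int height, int size)
{
    std::vector<Tile> tiles;
    for (int y = 0; y < height; y += size)
        for (int x = 0; x < width; x += size)
            tiles.push_back({x, y, std::min(x + size, width),
                                   std::min(y + size, height)});
    return tiles;
}

// Stand-in for the real rendering: scale floats in [lo,hi] to 0..255.
void render_tile(const std::vector<float>& image,
                 std::vector<std::uint8_t>& out,
                 int width, const Tile& t, float lo, float hi)
{
    for (int y = t.y0; y < t.y1; ++y)
        for (int x = t.x0; x < t.x1; ++x) {
            const std::size_t i = static_cast<std::size_t>(y) * width + x;
            const float f = (image[i] - lo) / (hi - lo);
            out[i] = static_cast<std::uint8_t>(
                255.0f * std::min(1.0f, std::max(0.0f, f)));
        }
}

// Render all tiles, at most `burst` threads at a time. Tiles never
// overlap, so every thread writes to its own part of `out`.
void render_in_bursts(const std::vector<float>& image,
                      std::vector<std::uint8_t>& out,
                      int width, int height, int tile_size,
                      std::size_t burst, float lo, float hi)
{
    const std::vector<Tile> tiles = make_tiles(width, height, tile_size);
    if (burst == 0) burst = 1;
    for (std::size_t first = 0; first < tiles.size(); first += burst) {
        const std::size_t last = std::min(first + burst, tiles.size());
        std::vector<std::thread> pool;
        for (std::size_t k = first; k < last; ++k)
            pool.emplace_back([&, k] {
                render_tile(image, out, width, tiles[k], lo, hi);
            });
        for (auto& th : pool) th.join();   // wait for the whole burst
    }
}
```

A natural choice for burst is std::thread::hardware_concurrency(), which may return 0 when the number of cores is unknown; the sketch falls back to one thread in that case.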
Reference counting classes
There is an important pitfall related to threads for classes which use the reference counting technique.
Reference counting classes maintain an internal counter. The counter tracks the references to the contained data (image data). It is increased when another object points to the same data, and decreased when a connected object disappears. It’s a handy way of semi-automatic memory management (a kind of garbage collector).
It works very smoothly in a single thread, but it is untrustworthy in multi-threaded applications: every reference counted class must be protected against uncontrolled updates by other threads.
I lost a lot of time detecting the problem. The counter operations are normally invisible, by design, and they look like side effects of object creation and assignment. The only way to reveal activity on them is careful debugging (prints inside) of constructors and destructors; only such gentle monitoring reveals the pitfall.
I consider this a principal issue of reference counted classes. Perhaps it is the reason why B. Stroustrup did not include them in the C++ core. The problem is not specific to the wxWidgets implementation. I also plan to inspect the C++11 implementation of smart pointers. C++11 also offers atomic operations which can help to solve the problem.
With the pitfall uncovered, it is possible to use such classes with the help of synchronisation objects (mutexes). Much worse is that all these operations frequently suspend threads, which significantly slows down thread processing and introduces disturbing display effects.
I also discovered an important property: blocks of non-overlapping memory can be updated by different threads without any risk. A mutex protects the whole array, but that matters only when every thread uses the whole array; in my particular case, every memory cell is considered an independent unit.
In this light, I completely removed reference counting containers from rendering. The memory is used directly. It is much more effective, faster and less visually distracting.
I’m a little bit frustrated. All C++ programming guides recommend the reference counting technique with glory. The glory has faded here, in the real world.
I’m becoming a little bit wary of the wrapper classes approach. It proclaims false safety compared to the direct use of malloc/free for memory operations. On the other hand, it can be very handy in many other cases.
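How the C++11 atomic operations might help can be shown by a small sketch. It is an assumption about one possible remedy, not the wxWidgets or Munipack implementation; RefCounter and hammer are hypothetical names. With a plain int in place of std::atomic<int>, the concurrent updates below would be a data race and increments could be lost.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical minimal counter of a reference counted class.
class RefCounter {
public:
    // A new object starts to share the data.
    void acquire() { count_.fetch_add(1, std::memory_order_relaxed); }

    // An object stops sharing the data; returns true when it was
    // the last one, so the caller may free the data.
    bool release()
    {
        return count_.fetch_sub(1, std::memory_order_acq_rel) == 1;
    }

    int value() const { return count_.load(); }

private:
    std::atomic<int> count_{1};   // the creator holds the first reference
};

// Let several threads take and drop references at the same time
// and report the final value of the counter.
int hammer(RefCounter& rc, int nthreads, int nloops)
{
    std::vector<std::thread> pool;
    for (int t = 0; t < nthreads; ++t)
        pool.emplace_back([&rc, nloops] {
            for (int i = 0; i < nloops; ++i) {
                rc.acquire();   // like a copy constructor
                rc.release();   // like a destructor
            }
        });
    for (auto& th : pool) th.join();
    return rc.value();
}
```

Every acquire() is paired with a release(), so the counter returns to its initial value whatever the interleaving of the threads. The sketch protects only the counter itself; access to the shared data still needs its own care.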
Question of HW acceleration
The developed software rendering is slow for large (colour) frames. Acceleration by graphics hardware is a seductive alternative.
The problem is that the OpenGL standards do not support, as far as I can tell, the arbitrary computations which I need for conversions between the CIE XYZ and CIE Lab colour spaces, as well as for ITT curves (intensity tables are widely used in 8-bit graphics, but here the data are floats and functions generalise the tables). I know that any computation can be done via OpenCL libraries, but it requires a lot of additional code, and it is also hardware specific.
Portability is important for me. I also believe that computers will have more and more cores in the future, so for now I prefer software rendering over the hardware alternative.
And finally, the most time consuming operation is the shrinking of images in the functions GetSubShrink() and MeanRitchie() of xmunipack/fitsgeometry.cpp.
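To give an idea why shrinking is costly, here is a sketch of the plainest variant, averaging of square blocks. It is not the algorithm of GetSubShrink() or MeanRitchie(), whose internals are not described here; shrink_mean is a hypothetical name, and incomplete blocks at the right and top edges are simply dropped.

```cpp
#include <cstddef>
#include <vector>

// Shrink a row-major width x height image by an integer factor,
// replacing every factor x factor block by its arithmetic mean.
std::vector<float> shrink_mean(const std::vector<float>& src,
                               std::size_t width, std::size_t height,
                               std::size_t factor)
{
    if (factor == 0 || src.size() < width * height) return {};
    const std::size_t w = width / factor;    // size of the shrunk image
    const std::size_t h = height / factor;
    std::vector<float> out(w * h);
    for (std::size_t j = 0; j < h; ++j)
        for (std::size_t i = 0; i < w; ++i) {
            double sum = 0.0;
            for (std::size_t dj = 0; dj < factor; ++dj)
                for (std::size_t di = 0; di < factor; ++di)
                    sum += src[(j * factor + dj) * width + (i * factor + di)];
            out[j * w + i] = static_cast<float>(sum / (factor * factor));
        }
    return out;
}
```

Every input pixel is visited once, so the cost grows with the size of the original frame, not with the size of the shrunk result.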
A pit of coordinate origin
Any programming goal is very difficult, and everybody expects a hard way with many pitfalls, but here is the very best one: the pit of the Century.
Somebody, who has my respect, chose the screen coordinate origin at the top left corner, with the vertical axis oriented from top to bottom. The choice may be inspired by the writing, and reading, of texts in the Western world. Such a good idea, isn’t it? Unfortunately, this origin and axis orientation are uncommon in the mathematical Cartesian world.
It brings many, very funny, transformations between (x,y), the mathematical coordinates (important for FITS), and (x’,y’), the screen coordinates, where H is the image height.
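As a rough illustration, the simplest of these transformations, the vertical flip, may be written as below. This assumes continuous coordinates in which the image spans 0 to H vertically; with zero-based pixel indices the constant would be H - 1 instead.

```latex
\begin{aligned}
x' &= x, \\
y' &= H - y.
\end{aligned}
```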
And now, what about the coordinates (u’,v’) in a sub-window of size (w,h), relative to a point (x’,y’)? (The solution can be found in OnMouseMotion() in xmunipack/display.cpp.)
That question was a warm-up. Let’s continue: suppose that we want to zoom, or shrink, the image. What now? (See Entry() in xmunipack/render.cpp.)
I spent (lost) many days deriving the transformations, which look so trivial now. All the transformations are necessary for rendering the tiles and for updating only the related parts of the display, because each redraw of the whole window is costly in time.
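One possible set of answers can be sketched as follows. It is not the code of display.cpp or render.cpp, and all names (Point, to_screen, to_image, to_subwindow, inside) are hypothetical. The assumptions are continuous coordinates, an image spanning 0 to H vertically, a zoom factor applied about the origin, and a sub-window whose top left corner sits at the given screen point.

```cpp
struct Point { double x, y; };

// Mathematical (FITS-like, y up) coordinates to screen (y down)
// coordinates for an image of height H shown at the given zoom.
Point to_screen(Point p, double H, double zoom = 1.0)
{
    return {zoom * p.x, zoom * (H - p.y)};
}

// The inverse: screen coordinates back to mathematical ones.
Point to_image(Point s, double H, double zoom = 1.0)
{
    return {s.x / zoom, H - s.y / zoom};
}

// Coordinates (u',v') of screen point s inside a sub-window whose
// top left corner is at the screen point `corner`.
Point to_subwindow(Point s, Point corner)
{
    return {s.x - corner.x, s.y - corner.y};
}

// Is the sub-window point u inside a sub-window of size w x h?
bool inside(Point u, double w, double h)
{
    return u.x >= 0 && u.x < w && u.y >= 0 && u.y < h;
}
```

For example, with H = 100 and zoom 2 the point (10, 30) lands at the screen position (20, 140), and to_image() maps it back to (10, 30).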
Screen video capture
All the video shots were captured by GStreamer:
gst-launch-1.0 ximagesrc xid=0x... ! \
    video/x-raw ! videoconvert ! vp8enc ! webmmux ! \
    filesink location=...webm
The captured window has xid=0x... (acquired by xwininfo).