Mali-Cetus preview – Driving display
Here in the ARM Mali team we don’t usually talk about our products until they’re launched to the public and ready to start hitting silicon. Occasionally, however, we make an exception for something too exciting to keep to ourselves. This time last year we brought you a sneak preview of our latest video processor, Mali-V61, way ahead of launch and back when it was still known by its codename, Egil. This year is no different, and we’re so excited about our upcoming display processor, built on a brand new, innovative architecture, that we can’t wait another minute to give you the details.
Display – hiding in plain sight
Display processing isn’t an area that gets a lot of press. We see it working right in front of us every day, yet we take it so much for granted that we rarely consider what it is and how it works. The display processor is the final stage in the application processor of a device like your smartphone. Its main function is to drive the pixels processed by the GPU and VPU to the display panel for viewing. Seems pretty simple, right? It’s actually working a lot harder than you might think. As we demand more and more from our CPUs and GPUs, the display processor can take on an even heavier role, offloading software-centric functionality from the CPU and GPU to fixed-function hardware to ease real-time performance needs and fit the ever-shrinking mobile power envelope. The display processor performs many feature-rich functions, including multi-layer composition, orthogonal rotation, high-quality up- and downscaling, colour and gamut management, and dual display mode. Many of our partners retain their own proprietary in-house display image processing algorithms to differentiate their offerings, which means our display architecture must also provide a means to interface to such functions without going out to system memory. This is achieved through the co-processor interface, which enables multiple stages for differentiation in the display pipeline. Functions can be daisy-chained on the co-processor interface in sequence if more than two stages are needed.
Codenamed Cetus, this next generation display processor builds on the awesome technology of our previous display products, Mali-DP500, Mali-DP550 and Mali-DP650, to address the growing requirements of the industry. So what’s driving these requirements? Several up-and-coming technologies, that’s what. We all know VR is pushing the boundaries of high performance graphics processing, but did you know it affects display as well? In a VR headset the display sits so much closer to your eye than in traditional products that perceived quality is harder to maintain, and we therefore need more, higher quality pixels in the same space. The refresh rate required for an immersive VR experience is also a factor, and means that modern display processors need to be able to run 4K x 2K displays at 60-90 FPS. High Dynamic Range (HDR) is another factor in the evolution of display requirements. It enables a significantly higher fidelity viewing experience and is driving displays to support HDR video standards such as HDR10, HDR10+, Hybrid Log-Gamma (HLG) and Dolby Vision. Other factors, such as multi-window mode, which allows several activities to be visible on-screen at a time, and command mode panels, which self-refresh content to save power, are also adding to these heightened requirements and demanding more from the display processor.
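To put the 4K x 2K at 60-90 FPS requirement into numbers, here is a rough back-of-the-envelope sketch. It assumes "4K x 2K" means 3840 x 2160 (some panels are 4096 x 2160) and ignores blanking intervals; the function name is ours, purely for illustration.

```python
def pixel_rate(width: int, height: int, fps: int) -> int:
    """Active pixels per second, ignoring blanking intervals."""
    return width * height * fps

# Assumption: "4K x 2K" is taken as 3840 x 2160.
WIDTH, HEIGHT = 3840, 2160

for fps in (60, 90):
    rate = pixel_rate(WIDTH, HEIGHT, fps)
    print(f"{fps} FPS: {rate / 1e6:.1f} Mpixel/s")
# 60 FPS: 497.7 Mpixel/s
# 90 FPS: 746.5 Mpixel/s
```

Moving from 60 to 90 FPS adds half as many pixels again to every second of output, which is the kind of pressure the new architecture is meant to absorb.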
Driving the future of display
With these growing pressures, it became apparent that it was time to overhaul the very architecture on which our display processors are built and address the power and quality optimizations required. Now that we’ve seen what we need and why, let’s take a look at how we addressed those needs.
As smartphones become the device of choice for more and more consumers, we need to be able to do more with them and, importantly, we need to be able to do more simultaneously. Mali-Cetus’s new architecture supports up to 8 separate composition layers for Android-N devices when driving a single display or up to 4 layers per display in dual display output mode. This is facilitated by improvements to both the Layer Processing Units (LPU) and the Composition Unit and added flexibility that enables intelligent resource sharing when driving a single display.
The main role of the two LPUs in Mali-Cetus is to read the video or graphics layers from the system buffer and direct them into separate pipelines before feeding them to the Composition Unit. The AFBC Decoder Subsystem is built into the LPU and can decompress up to four display layers per LPU, so eight in total. The Composition Unit is then responsible for the alpha blending of these layers and for managing the data flow to and from the Scaling Subsystem and Co-Processor Interface (CPI). It then directs the data to the display output. In addition, the memory system can write the composition result back to memory for offline composition or further processing, as well as for use as a virtual display by transmission over Wi-Fi.
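The alpha blending step in the Composition Unit can be illustrated with a minimal sketch of "source over" blending for a single pixel. This is not the hardware's actual data path: it assumes non-premultiplied alpha, floating-point linear RGB, and layers supplied bottom-first, and the names `blend_over` and `compose` are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float]  # linear RGB, each component in [0, 1]

def blend_over(dst: Pixel, src: Pixel, alpha: float) -> Pixel:
    """Blend src over dst using non-premultiplied alpha."""
    return tuple(alpha * s + (1.0 - alpha) * d for s, d in zip(src, dst))

def compose(layers: List[Tuple[Pixel, float]],
            background: Pixel = (0.0, 0.0, 0.0)) -> Pixel:
    """Composite (colour, alpha) layers, bottom layer first."""
    out = background
    for colour, alpha in layers:
        out = blend_over(out, colour, alpha)
    return out

# An opaque red layer with a half-transparent blue layer on top.
print(compose([((1.0, 0.0, 0.0), 1.0), ((0.0, 0.0, 1.0), 0.5)]))
# (0.5, 0.0, 0.5)
```

With up to eight layers, the same blend is simply repeated once per layer for every output pixel, which is why doing it in fixed-function hardware saves GPU cycles.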
The Composition Unit contains the Scaling Subsystem, which is made up of two high-quality scaling engines with 12-bit-per-component precision, software-programmable filter coefficients and initial phase, and an image enhancer with an edge detection mechanism. This allows the system to perform simultaneous scaling before and after composition. It also enables the software to optimize scaling efficiency by parallelizing across the available scaling engines. For example, when downscaling a 4K layer, it can split the layer horizontally and process the two parts in parallel on two separate scaling engines. When driving a single display, intelligent resource sharing between the two composition units means that 4 scaling operations can be performed in parallel.
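The split-and-scale idea can be sketched in a few lines. The example below assumes the layer is cut into a left and a right half and uses a trivial 2:1 box filter as a stand-in for the real programmable scaler; the two halves run one after the other here, whereas the hardware would run them concurrently. All function names are hypothetical.

```python
from typing import List

Row = List[int]

def downscale_2x(rows: List[Row]) -> List[Row]:
    """2:1 box-filter downscale in both directions (stand-in for the real filter)."""
    out = []
    for y in range(0, len(rows) - 1, 2):
        top, bottom = rows[y], rows[y + 1]
        out.append([
            (top[x] + top[x + 1] + bottom[x] + bottom[x + 1]) // 4
            for x in range(0, len(top) - 1, 2)
        ])
    return out

def split_downscale_2x(rows: List[Row]) -> List[Row]:
    """Downscale the left and right halves separately, then join them."""
    mid = len(rows[0]) // 2
    mid -= mid % 2  # keep the seam on a 2-pixel boundary
    left = downscale_2x([r[:mid] for r in rows])   # scaling engine 0
    right = downscale_2x([r[mid:] for r in rows])  # scaling engine 1
    return [l + r for l, r in zip(left, right)]

frame = [[(x * 7 + y * 13) % 256 for x in range(16)] for y in range(8)]
print(split_downscale_2x(frame) == downscale_2x(frame))  # True
```

A box filter never reads across the seam, so the halves join perfectly. A multi-tap filter would need each half to fetch a few extra pixels from its neighbour to avoid a visible join.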
Figure: Better Android window composition capabilities
Side by Side Processing
Mali-Cetus supports a new mode known as Side-By-Side (SBS) processing. The SBS feature splits a frame in half and processes each half in parallel whilst only enabling one display output. One half of the frame is processed by LPU0, CU0 and DU0 and the other half by LPU1, CU1 and DU1. In effect, this halves the ACLK frequency needed for a given performance point. That provides the throughput necessary to target 4K at 90 FPS for next-gen premium mobile and VR devices, and allows further power savings by targeting super underdrive voltage implementations.
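The effect of SBS on clock frequency can be approximated with simple arithmetic. The sketch below assumes each pipeline handles one pixel per ACLK cycle, a 3840 x 2160 frame, and no blanking overhead; none of these figures come from the Mali-Cetus specification, and `required_clock_hz` is a made-up helper.

```python
def required_clock_hz(width: int, height: int, fps: int,
                      pipelines: int = 1, pixels_per_clock: int = 1) -> float:
    """Clock needed per pipeline to sustain the pixel rate (blanking ignored)."""
    return width * height * fps / (pipelines * pixels_per_clock)

single = required_clock_hz(3840, 2160, 90)            # one pipeline
sbs = required_clock_hz(3840, 2160, 90, pipelines=2)  # side-by-side mode

print(f"single pipeline: {single / 1e6:.1f} MHz")  # single pipeline: 746.5 MHz
print(f"side-by-side:    {sbs / 1e6:.1f} MHz")     # side-by-side:    373.2 MHz
```

A lower clock target is also what makes lower-voltage implementations practical, since timing is easier to meet at reduced voltage.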
Performing orthogonal rotation on video YUV layers that are not in ARM Frame Buffer Compression (AFBC) format severely impacts system performance and uses far more power than is necessary or sustainable. The Mali-Cetus display architecture therefore removes the rotation of non-AFBC layers from the real-time path to reduce the risk of underrun, which would result in visible artefacts. The AFBC Direct Memory Access (DMA) unit in the display processor is responsible for reading non-AFBC layers, converting them to AFBC (a conversion from a linear to a block format) and then writing them back to system memory so that the LPU can pick them up as part of the real-time path. Our new display architecture also features an MMU cache with dramatically improved efficiency, as well as a DRAM access pattern optimized for rotated layers.
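To give a feel for what "linear to block format" means, here is a simplified sketch that regroups a row-major buffer into square tiles. It shows only the reordering, not AFBC's compression, and the default 16 x 16 tile size and the function name are assumptions for illustration.

```python
from typing import List

def linear_to_blocks(pixels: List[int], width: int, height: int,
                     block: int = 16) -> List[List[int]]:
    """Regroup a row-major buffer into block x block tiles, tiles in raster order."""
    if width % block or height % block:
        raise ValueError("width and height must be multiples of the block size")
    tiles = []
    for by in range(0, height, block):
        for bx in range(0, width, block):
            tiles.append([
                pixels[(by + y) * width + (bx + x)]
                for y in range(block)
                for x in range(block)
            ])
    return tiles

# A 4 x 4 image split into 2 x 2 tiles.
print(linear_to_blocks(list(range(16)), width=4, height=4, block=2))
# [[0, 1, 4, 5], [2, 3, 6, 7], [8, 9, 12, 13], [10, 11, 14, 15]]
```

Reading a rotated image from a linear buffer means walking down columns, so each pixel comes from a different row in memory. Once pixels are grouped into tiles, vertically adjacent pixels sit close together in memory, so a rotated read touches far fewer distinct memory regions.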
Figure: AFBC DMA Unit
Display Output Unit
The Display Output Unit (DOU) is the final stage in the display processor system and performs a variety of functions prior to sending the image out to the display. It’s capable of RGB to RGB conversion, gamma correction, RGB to YUV conversion, dithering, 4:4:4 to 4:2:2/4:2:0 chroma down-sampling for HDMI 2.x and eDP/DP, and a 1:2 display split for dual-link MIPI DSI panels. The backend subsystem within it is responsible for display timing control and synchronization. It outputs 10 bits per component and also introduces Tearing Effect input control logic for command mode panels which support panel self-refresh. Command mode panels are important because they can provide significant power savings. Instead of the application processor sending a frame to the panel every 1/60 s for 60 FPS, it waits for the panel to signal when it needs a new frame. The panel self-refreshes content that doesn’t change instead of relying on the processor to continuously send the frame and generate the timing. Variable refresh rate is also supported by programming the vertical front porch (VFP), which adjusts the blanking time between frames.
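The link between the vertical front porch and refresh rate is easy to see with a little arithmetic: at a fixed pixel clock, adding blanking lines makes each frame take longer. The sketch below uses the common 1080p60 timing purely as an illustrative example; it is not a Mali-Cetus configuration, and the helper names are hypothetical.

```python
def refresh_rate_hz(pixel_clock_hz: float, h_total: int, v_active: int,
                    vfp: int, v_sync: int, vbp: int) -> float:
    """Refresh rate for a given pixel clock and frame timing."""
    v_total = v_active + vfp + v_sync + vbp
    return pixel_clock_hz / (h_total * v_total)

def vfp_for_refresh(pixel_clock_hz: float, h_total: int, v_active: int,
                    v_sync: int, vbp: int, target_hz: float) -> int:
    """Front porch (in lines) that brings the refresh rate closest to target_hz."""
    v_total = round(pixel_clock_hz / (h_total * target_hz))
    return v_total - v_active - v_sync - vbp

# Illustrative 1080p timing: 148.5 MHz pixel clock, 2200 clocks per line.
CLOCK, H_TOTAL, V_ACTIVE, V_SYNC, VBP = 148_500_000, 2200, 1080, 5, 36

print(refresh_rate_hz(CLOCK, H_TOTAL, V_ACTIVE, 4, V_SYNC, VBP))    # 60.0
print(vfp_for_refresh(CLOCK, H_TOTAL, V_ACTIVE, V_SYNC, VBP, 48.0)) # 285
```

Stretching the front porch from 4 to 285 lines drops this panel from 60 Hz to roughly 48 Hz without touching the pixel clock or the active picture.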
Figure: Display Output Unit (DOU)
System Performance boost with ARM SMMU
It’s not just the display that reaps the benefits of the new architecture, however. The various optimizations provide better overall performance as part of the Multimedia Subsystem by using an SMMU strategy based on ARM CoreLink. This allows simpler display processor and SMMU integration in the system, support for the ARM TrustZone TZMPv1 and TZMPv2 architectures, and a lower SMMU area compared to previous generation products. This optimized subsystem can result in a latency tolerance more than four times that of the previous Mali-DP650 display processor.
Finally, as we’ve mentioned before, HDR is a huge focus for the future of the industry. With this new architecture, we were able to seamlessly integrate ARM Assertive Display with Mali-Cetus using one of its coprocessor interfaces. This constitutes the first HDR solution available from ARM and a major step forward in display technology. It supports HDR10 and HLG HDR video, alpha-blending (composition) of HDR video with SDR layers as well as HDR tone mapping for both HDR and SDR display panels.
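As a rough illustration of what HDR tone mapping involves, the sketch below decodes an HDR10 signal value to luminance using the PQ transfer function from SMPTE ST 2084, then compresses it for an SDR panel. The tone curve is a simple Reinhard-style stand-in chosen for brevity; it is not the Assertive Display algorithm, and the 100 nit SDR reference level is an assumption.

```python
# SMPTE ST 2084 (PQ) constants.
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32

def pq_to_nits(signal: float) -> float:
    """PQ EOTF: non-linear signal in [0, 1] to luminance in cd/m^2 (nits)."""
    p = signal ** (1.0 / M2)
    return 10000.0 * (max(p - C1, 0.0) / (C2 - C3 * p)) ** (1.0 / M1)

def tone_map_to_sdr(nits: float, sdr_reference: float = 100.0) -> float:
    """Reinhard-style stand-in curve; returns relative luminance in [0, 1)."""
    x = nits / sdr_reference  # assumption: 100 nits as the SDR reference level
    return x / (1.0 + x)

for code in (0.25, 0.5, 0.75, 1.0):
    nits = pq_to_nits(code)
    print(f"PQ {code:.2f} -> {nits:8.1f} nits -> SDR {tone_map_to_sdr(nits):.3f}")
```

Decoding to linear light first is also what makes it possible to blend HDR video with SDR layers, since both need to be in a common luminance domain before they are alpha-blended.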
Figure: Coprocessor interface and HDR composition
We work closely with display vendors and standards bodies to enable optimized display solutions and emerging AR/VR display technologies, with the aim of optimizing system performance and power and reducing the complexity of porting and integration. The display processor is a complex beast, so there are of course many other things we could talk about. For now, though, you’ll have to wait for the official launch later in the year to see exactly how this translates into end user devices and the next generation of fantastic products from our broad and diverse ecosystem of partners.
All information is provided "as is" and without warranty or representation. This document may be shared freely, attributed and unmodified. Arm is a registered trademark of Arm Limited (or its subsidiaries). All brands or product names are the property of their respective holders. © 1995-2020 Arm Group.