24 Jun 2017

etnaviv/vivante Update

Lucas Stach suggested I also run the test (glmark2-es2) with --off-screen. Here are those results from a recent build.

Note that with etnaviv:
  • the benchmark will sometimes segfault (usually somewhere in the [conditionals] tests)
  • the [ideas] test will sometimes cause a GPU hangcheck
  • the [terrain] test complains: "etna_draw_vbo:199: compiled shaders are not okay"
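
A minimal sketch of how the three display modes could be run back-to-back on the board. The function name run_modes and the "windowed" label are made up for illustration; --fullscreen and --off-screen are the glmark2 options compared in this post. Since the etnaviv run sometimes segfaults, a failing run is reported rather than stopping the loop.

```shell
# Run a benchmark command once per display mode, printing the 1-minute load
# average beforehand so that a busy system can be spotted in the results.
# run_modes is a hypothetical helper; usage: run_modes glmark2-es2
run_modes() {
    for flag in "" --fullscreen --off-screen; do
        # The first field of /proc/loadavg is the 1-minute load average.
        echo "== mode: ${flag:-windowed}  loadavg: $(cut -d' ' -f1 /proc/loadavg)"
        # ${flag:+"$flag"} expands to nothing when flag is empty.
        "$@" ${flag:+"$flag"} || echo "!! exited with status $?"
    done
}
```

On the board this would be called as "run_modes glmark2-es2" after exporting DISPLAY=:0.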




wandboard

                 vivante                          etnaviv
GL_VENDOR        Vivante Corporation              etnaviv
GL_RENDERER      Vivante GC880                    Gallium 0.4 on Vivante GC880 rev 5106
GL_VERSION       OpenGL ES 3.0 V5.0.11.p8.41671   OpenGL ES 2.0 Mesa 17.1.1

                 vivante                                  etnaviv
                 (no flag)  --fullscreen  --off-screen    (no flag)  --fullscreen  --off-screen
glmark2 score    116        52            193             47         56            138
/proc/loadavg    0.41       0.15          0.15            0.26       0.50          0.15
[build]          204        141           451             80         136           437
[build]          259        158           713             84         160           614
[texture]        201        114           523             76         108           294
[texture]        202        113           492             76         100           286
[texture]        199        109           478             75         97            259
[shading]        205        122           392             81         119           335
[shading]        170        61            184             70         76            160
[shading]        108        39            111             51         52            97
[shading]        81         30            83              42         39            69
[bump]           124        55            128             57         69            119
[bump]           230        72            281             78         98            236
[bump]           203        61            223             70         73            163
[effect2d]       62         15            68              33         26            47
[effect2d]       23         5             23              14         9             16
[pulsar]         185        84            414             75         102           372
[desktop]        31         9             31              15         10            18
[desktop]        104        33            122             42         39            71
[buffer]         48         32            53              30         33            46
[buffer]         49         32            52              29         32            47
[buffer]         55         35            62              36         38            51
[ideas]          44         44            43              5          10            0
[jellyfish]      61         19            62              29         23            38
[terrain]        1          0             1               2          1             2
[shadow]         92         29            107             38         34            62
[refract]        20         9             20              10         8             11
[conditionals]   209        77            383             69         78            169
[conditionals]   61         19            65              33         26            47
[conditionals]   204        76            347             65         76            159
[function]       114        36            130             49         47            90
[function]       34         11            35              30         24            42
[loop]           105        34            119             49         46            88
[loop]           105        34            119             49         45            88
[loop]           55         18            69              30         24            42

9 Jun 2017

GPU Support with OpenEmbedded (etnaviv/vivante)

Introduction

This series of articles assumes some familiarity with OpenEmbedded, but if you haven't used it before, hopefully you can still follow along. If you'd like to give these build instructions a try, start off by reading and trying the examples in The Yocto Project's Quick Start Guide. Hopefully that will get you and your machine set up.

There are many SoCs that incorporate the Vivante GPU. I happen to have the Wandboard Dual, so that's the board I'll be using for my tests.

Build Setup

To begin an OpenEmbedded build (assuming your Linux build machine has all the necessary packages), we need to choose some place on our computer to which we have rwx access, and grab the necessary metadata. The first chunks of metadata contain all the base, generic things:

$ git clone git://git.openembedded.org/openembedded-core
$ git clone git://git.openembedded.org/meta-openembedded

Then we need to add BSP metadata. Most BSPs consist of one layer, but in the case of the Wandboard we need the generic freescale BSP layer (meta-freescale) and the BSP that specifically supports the Wandboard (meta-freescale-3rdparty). The Wandboard has an i.MX6 SoC on it which was made by Freescale. In 2015 NXP merged with Freescale. Although the i.MX6 is, technically, an NXP SoC, the layer retains the "freescale" name.

$ git clone https://git.yoctoproject.org/git/meta-freescale
$ git clone https://github.com/Freescale/meta-freescale-3rdparty

How did I possibly know I needed those two layers? By consulting the layer index. The layer index is where layer maintainers go to make their layers known, and it's a great place for users to go to find supported machines, software, distros, etc. If you're not working with the Wandboard, then you'll need to consult the layer index to figure out which layers you need for your specific hardware.

Now we need the tool that uses all this metadata and actually performs the build:

$ git clone git://git.openembedded.org/bitbake

Now that we have all the pieces in place, we set up our shell:

$ . openembedded-core/oe-init-build-env build bitbake/
You had no conf/local.conf file. This configuration file has therefore been
created for you with some default values. You may wish to edit it to, for
example, select a different MACHINE (target hardware). See conf/local.conf
for more information as common configuration options are commented.

You had no conf/bblayers.conf file. This configuration file has therefore been
created for you with some default values. To add additional metadata layers
into your configuration please add entries to conf/bblayers.conf.

The Yocto Project has extensive documentation about OE including a reference
manual which can be found at:
    http://yoctoproject.org/documentation

For more information about OpenEmbedded see their website:
    http://www.openembedded.org/


### Shell environment set up for builds. ###

You can now run 'bitbake <target>'

Common targets are:
    core-image-minimal
    core-image-sato
    meta-toolchain
    meta-ide-support

You can also run generated qemu images with a command like 'runqemu qemux86'

This creates the build directory ("build") that was specified on the shell setup line. Now we tell bitbake about our additional layers:

$ bitbake-layers add-layer ../meta-freescale
Parsing recipes: 100% |#########################################################| Time: 0:00:12
Parsing of 925 .bb files complete (0 cached, 925 parsed). 1408 targets, 149 skipped, 0 masked, 0 errors.
$ bitbake-layers add-layer ../meta-freescale-3rdparty
Parsing recipes: 100% |#########################################################| Time: 0:00:07
Parsing of 962 .bb files complete (0 cached, 962 parsed). 1445 targets, 185 skipped, 0 masked, 0 errors.
$ bitbake-layers add-layer ../meta-openembedded/meta-oe
Parsing recipes: 100% |#########################################################| Time: 0:00:13
Parsing of 1614 .bb files complete (0 cached, 1614 parsed). 2226 targets, 265 skipped, 0 masked, 0 errors.

Vivante Build

By default, the "freescale" BSP layers assume the user wants to build an image using the vivante binary blob. This blob isn't "free", so in order to use it, you have to agree to its EULA. To do that, read the "EULA" file you'll find at the top level of the meta-freescale BSP layer that was cloned earlier. Once you've read that file and agreed to it, you can proceed with this build. If you don't or can't agree to the EULA, then you can proceed directly to the Etnaviv build.

When you set up your shell earlier, it created a boilerplate configuration file for you. From the "build" directory that was created for you during setup, open the conf/local.conf file with your favourite text editor and add the following lines at the top (be sure to leave the rest of the file as-is!):

ACCEPT_FSL_EULA = "1"
CORE_IMAGE_EXTRA_INSTALL += "openbox glmark2"
DISTRO_FEATURES_append = " opengl x11"
IMAGE_FEATURES += "x11"
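
If you would rather script this edit than do it by hand, something along these lines should work. It is only a sketch: prepend_to is a made-up helper name, and it simply puts whatever it reads on stdin above the existing contents of the file, leaving the rest untouched.

```shell
# Prepend whatever arrives on stdin to the file named by $1, keeping the
# existing contents unchanged below it. prepend_to is a hypothetical helper.
prepend_to() {
    target="$1"
    tmp="$(mktemp)" || return 1
    # New lines first, then the original file, written back in place.
    cat - "$target" > "$tmp" && cat "$tmp" > "$target"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Example (from the build directory, only after reading and accepting the EULA):
#   prepend_to conf/local.conf <<'EOF'
#   ACCEPT_FSL_EULA = "1"
#   CORE_IMAGE_EXTRA_INSTALL += "openbox glmark2"
#   DISTRO_FEATURES_append = " opengl x11"
#   IMAGE_FEATURES += "x11"
#   EOF
```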

Once that's done you can run your build:

$ MACHINE=wandboard bitbake core-image-full-cmdline
Parsing recipes: 100% |#########################################################| Time: 0:00:14
Parsing of 1614 .bb files complete (0 cached, 1614 parsed). 2226 targets, 249 skipped, 0 masked, 0 errors.
NOTE: Resolving any missing task queue dependencies

Build Configuration:
BB_VERSION        = "1.34.0"
BUILD_SYS         = "x86_64-linux"
NATIVELSBSTRING   = "opensuse-42.2"
TARGET_SYS        = "arm-oe-linux-gnueabi"
MACHINE           = "wandboard"
DISTRO            = "nodistro"
DISTRO_VERSION    = "nodistro.0"
TUNE_FEATURES     = "arm armv7a vfp thumb neon callconvention-hard cortexa9"
TARGET_FPU        = "hard"
meta              = "master:186882ca62bf683b93cd7a250963921b89ba071f"
meta-freescale    = "master:98d57b06d88cb22129bd417a9a3edbaf24612460"
meta-freescale-3rdparty = "master:fd3962a994b2f477d3e81fa7083f6b3d4e666df5"
meta-oe           = "master:41cf832cc9abd6f2293a6d612463a34a53a9a52a"

Initialising tasks: 100% |######################################################| Time: 0:00:04
NOTE: Executing SetScene Tasks
NOTE: Executing RunQueue Tasks
NOTE: Tasks Summary: Attempted 4369 tasks of which 2317 didn't need to be rerun and all succeeded.

From this successful build we'll find our device image located at:

tmp-glibc/deploy/images/wandboard/core-image-full-cmdline-wandboard.wic.gz

If you unzip this wic file, it can be dd'ed directly to a microSD card. The card can then be inserted into the Wandboard and the board powered up.

$ gzip -d < tmp-glibc/deploy/images/wandboard/core-image-full-cmdline-wandboard.wic.gz > core-image-full-cmdline-wandboard.wic
$ su
Password:
# dd if=core-image-full-cmdline-wandboard.wic of=/dev/sdi bs=10M
21+1 records in
21+1 records out
222298112 bytes (222 MB, 212 MiB) copied, 85.6691 s, 2.6 MB/s
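
A word of caution: /dev/sdi is simply where the card showed up on my machine, and dd will happily overwrite whatever device it is given. As a rough safety net, a check like the following could be run first. This is a sketch under assumptions: is_mounted is a made-up helper, and it uses a plain prefix match on the first field of the mount table, so /dev/sdi also matches /dev/sdi1.

```shell
# Succeed if the device (or one of its partitions) appears in the mount table.
# The optional second argument only exists so the check can be tried against
# a sample file instead of /proc/mounts. is_mounted is a hypothetical helper.
is_mounted() {
    dev="$1"
    mounts="${2:-/proc/mounts}"
    # The first field of each line is the mounted device, e.g. /dev/sdi1.
    awk -v d="$dev" 'index($1, d) == 1 { found = 1 } END { exit !found }' "$mounts"
}

# Example:
#   if is_mounted /dev/sdi; then echo "unmount it first, or re-check the device name"; fi
```

Running sync after dd finishes, before pulling the card, is also a good habit.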

I like to interact with embedded boards via a serial console. On the Wandboard, this means using a DE-9 serial cable. Once the board boots up I log in and run the benchmark application:

OpenEmbedded nodistro.0 wandboard /dev/ttymxc0

wandboard login: D-BUS per-session daemon address is: unix:abstract=/tmp/dbus-fd5NI0apUa,guid=772b87be1cee0f1d2acde6c25938e674
Using calibration data stored in /etc/pointercal.xinput
Invalid format 42060
unable to find device EETI eGalax Touch Screen
INFO: width=1920, height=1080
Obt-Message: Failed to open an Input Method
Openbox-Message: X server does not support locale.
Openbox-Message: Cannot set locale modifiers for the X server.

root
root@wandboard:~# uname -a
Linux wandboard 4.1.15-1.1.0-ga-wandboard+g8b015473d340 #1 SMP PREEMPT Wed Jun 7 23:42:49 EDT 2017 armv7l armv7l armv7l GNU/Linux
root@wandboard:~# export DISPLAY=:0
root@wandboard:~# glmark2-es2
=======================================================
    glmark2 2014.03
=======================================================
    OpenGL Information
    GL_VENDOR:     Vivante Corporation
    GL_RENDERER:   Vivante GC880
    GL_VERSION:    OpenGL ES 3.0 V5.0.11.p8.41671
=======================================================
[build] use-vbo=false: FPS: 206 FrameTime: 4.854 ms
[build] use-vbo=true: FPS: 246 FrameTime: 4.065 ms
[texture] texture-filter=nearest: FPS: 200 FrameTime: 5.000 ms
[texture] texture-filter=linear: FPS: 200 FrameTime: 5.000 ms
[texture] texture-filter=mipmap: FPS: 199 FrameTime: 5.025 ms
[shading] shading=gouraud: FPS: 205 FrameTime: 4.878 ms
[shading] shading=blinn-phong-inf: FPS: 170 FrameTime: 5.882 ms
[shading] shading=phong: FPS: 108 FrameTime: 9.259 ms
[shading] shading=cel: FPS: 81 FrameTime: 12.346 ms
[bump] bump-render=high-poly: FPS: 124 FrameTime: 8.065 ms
[bump] bump-render=normals: FPS: 220 FrameTime: 4.545 ms
[bump] bump-render=height: FPS: 203 FrameTime: 4.926 ms
libpng warning: iCCP: known incorrect sRGB profile
[effect2d] kernel=0,1,0;1,-4,1;0,1,0;: FPS: 62 FrameTime: 16.129 ms
libpng warning: iCCP: known incorrect sRGB profile
[effect2d] kernel=1,1,1,1,1;1,1,1,1,1;1,1,1,1,1;: FPS: 23 FrameTime: 43.478 ms
[pulsar] light=false:quads=5:texture=false: FPS: 183 FrameTime: 5.464 ms
libpng warning: iCCP: known incorrect sRGB profile
[desktop] blur-radius=5:effect=blur:passes=1:separable=true:windows=4: FPS: 31 FrameTime: 32.258 ms
libpng warning: iCCP: known incorrect sRGB profile
[desktop] effect=shadow:windows=4: FPS: 103 FrameTime: 9.709 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 49 FrameTime: 20.408 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata: FPS: 49 FrameTime: 20.408 ms
[buffer] columns=200:interleave=true:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 57 FrameTime: 17.544 ms
[ideas] speed=duration: FPS: 44 FrameTime: 22.727 ms
[jellyfish] <default>: FPS: 61 FrameTime: 16.393 ms
[terrain] <default>: FPS: 1 FrameTime: 1000.000 ms
[shadow] <default>: FPS: 92 FrameTime: 10.870 ms
[refract] <default>: FPS: 20 FrameTime: 50.000 ms
[conditionals] fragment-steps=0:vertex-steps=0: FPS: 209 FrameTime: 4.785 ms
[conditionals] fragment-steps=5:vertex-steps=0: FPS: 61 FrameTime: 16.393 ms
[conditionals] fragment-steps=0:vertex-steps=5: FPS: 203 FrameTime: 4.926 ms
[function] fragment-complexity=low:fragment-steps=5: FPS: 114 FrameTime: 8.772 ms
[function] fragment-complexity=medium:fragment-steps=5: FPS: 34 FrameTime: 29.412 ms
[loop] fragment-loop=false:fragment-steps=5:vertex-steps=5: FPS: 105 FrameTime: 9.524 ms
[loop] fragment-steps=5:fragment-uniform=false:vertex-steps=5: FPS: 105 FrameTime: 9.524 ms
[loop] fragment-steps=5:fragment-uniform=true:vertex-steps=5: FPS: 55 FrameTime: 18.182 ms
=======================================================
                                  glmark2 Score: 115
=======================================================

Some of the relevant packages in this build include:
  • libegl-mesa_2:17.1.1
  • libgles2-mesa_2:17.1.1
  • libgl-mesa_2:17.1.1
  • xserver-xorg_2:1.19.3
  • kernel-4.1.15-1.1.0-ga-wandboard+g8b015473d340
  • libc6_2.25
  • glmark2_2014.03+0+7215c0f337
  • cross-compiler: gcc-6.3.0

Etnaviv Build

Switching to a build that uses etnaviv isn't very hard. Keeping the bottom part of the configuration file as it was found, modify the top of conf/local.conf so that it looks like:

MACHINEOVERRIDES .= ":use-mainline-bsp"
CORE_IMAGE_EXTRA_INSTALL += "openbox glmark2"
DISTRO_FEATURES_append = " opengl x11"
IMAGE_FEATURES += "x11"

Since you're no longer using the binary blob, agreeing to the EULA is no longer required. Telling the build you want to switch to more "upstream" components is just a matter of adding the MACHINEOVERRIDES line.
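
To confirm the override was picked up before committing to a full build, the parsed environment can be inspected with "bitbake -e". The wrapper below is only a sketch (show_var is a made-up name); the value it prints for MACHINEOVERRIDES should include use-mainline-bsp.

```shell
# Print the final value of one BitBake variable as seen by a recipe or image.
# show_var is a hypothetical helper; "bitbake -e" dumps the parsed environment
# as VAR="value" lines.
show_var() {
    var="$1"
    target="$2"
    bitbake -e "$target" | grep "^${var}="
}

# Example (from the build directory, in the shell that was set up earlier):
#   export MACHINE=wandboard
#   show_var MACHINEOVERRIDES core-image-full-cmdline
```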

Building:

$ MACHINE=wandboard bitbake core-image-full-cmdline
Parsing recipes: 100% |#########################################################| Time: 0:00:17
Parsing of 1614 .bb files complete (0 cached, 1614 parsed). 2226 targets, 265 skipped, 0 masked, 0 errors.
NOTE: There are 199 recipes to be removed from sysroot wandboard, removing...
NOTE: Resolving any missing task queue dependencies

Build Configuration:
BB_VERSION        = "1.34.0"
BUILD_SYS         = "x86_64-linux"
NATIVELSBSTRING   = "opensuse-42.2"
TARGET_SYS        = "arm-oe-linux-gnueabi"
MACHINE           = "wandboard"
DISTRO            = "nodistro"
DISTRO_VERSION    = "nodistro.0"
TUNE_FEATURES     = "arm armv7a vfp thumb neon callconvention-hard"
TARGET_FPU        = "hard"
meta              = "master:186882ca62bf683b93cd7a250963921b89ba071f"
meta-freescale    = "master:98d57b06d88cb22129bd417a9a3edbaf24612460"
meta-freescale-3rdparty = "master:fd3962a994b2f477d3e81fa7083f6b3d4e666df5"
meta-oe           = "master:41cf832cc9abd6f2293a6d612463a34a53a9a52a"

Initialising tasks: 100% |######################################################| Time: 0:00:07
NOTE: Executing SetScene Tasks
NOTE: Executing RunQueue Tasks
NOTE: Tasks Summary: Attempted 4396 tasks of which 1407 didn't need to be rerun and all succeeded.

Unpack the wic file, dd it to a microSD card, and boot it up on the Wandboard:

OpenEmbedded nodistro.0 wandboard /dev/ttymxc0

wandboard login: Error: No calibratable devices found.
Obt-Message: Failed to open an Input Method
Openbox-Message: X server does not support locale.
Openbox-Message: Cannot set locale modifiers for the X server.

root
root@wandboard:~# uname -a
Linux wandboard 4.9.21-fslc+gb69ecd63c123 #1 SMP Thu Jun 8 02:34:26 EDT 2017 armv7l armv7l armv7l GNU/Linux
root@wandboard:~# export DISPLAY=:0
root@wandboard:~# glmark2-es2
=======================================================
    glmark2 2014.03
=======================================================
    OpenGL Information
    GL_VENDOR:     etnaviv
    GL_RENDERER:   Gallium 0.4 on Vivante GC880 rev 5106
    GL_VERSION:    OpenGL ES 2.0 Mesa 17.1.1
=======================================================
[build] use-vbo=false: FPS: 81 FrameTime: 12.346 ms
[build] use-vbo=true:[   59.956033] random: crng init done
 FPS: 91 FrameTime: 10.989 ms
[texture] texture-filter=nearest: FPS: 80 FrameTime: 12.500 ms
[texture] texture-filter=linear: FPS: 78 FrameTime: 12.821 ms
[texture] texture-filter=mipmap: FPS: 75 FrameTime: 13.333 ms
[shading] shading=gouraud: FPS: 87 FrameTime: 11.494 ms
[shading] shading=blinn-phong-inf: FPS: 68 FrameTime: 14.706 ms
[shading] shading=phong: FPS: 51 FrameTime: 19.608 ms
[shading] shading=cel: FPS: 42 FrameTime: 23.810 ms
[bump] bump-render=high-poly: FPS: 57 FrameTime: 17.544 ms
[bump] bump-render=normals: FPS: 74 FrameTime: 13.514 ms
[bump] bump-render=height: FPS: 66 FrameTime: 15.152 ms
libpng warning: iCCP: known incorrect sRGB profile
[effect2d] kernel=0,1,0;1,-4,1;0,1,0;: FPS: 34 FrameTime: 29.412 ms
libpng warning: iCCP: known incorrect sRGB profile
[effect2d] kernel=1,1,1,1,1;1,1,1,1,1;1,1,1,1,1;: FPS: 14 FrameTime: 71.429 ms
[pulsar] light=false:quads=5:texture=false: FPS: 75 FrameTime: 13.333 ms
libpng warning: iCCP: known incorrect sRGB profile
[desktop] blur-radius=5:effect=blur:passes=1:separable=true:windows=4: FPS: 15 FrameTime: 66.667 ms
libpng warning: iCCP: known incorrect sRGB profile
[desktop] effect=shadow:windows=4: FPS: 42 FrameTime: 23.810 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 30 FrameTime: 33.333 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata: FPS: 29 FrameTime: 34.483 ms
[buffer] columns=200:interleave=true:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 34 FrameTime: 29.412 ms
[ideas] speed=duration:[  253.131165] etnaviv-gpu 130000.gpu: hangcheck detected gpu lockup!
[  253.137434] etnaviv-gpu 130000.gpu:      completed fence: 11419
[  253.143413] etnaviv-gpu 130000.gpu:      active fence: 11420
[  253.150061] etnaviv-gpu 130000.gpu: hangcheck recover!
[  257.691146] etnaviv-gpu 130000.gpu: hangcheck detected gpu lockup!
[  257.697403] etnaviv-gpu 130000.gpu:      completed fence: 11420
[  257.703374] etnaviv-gpu 130000.gpu:      active fence: 11421
[  257.709247] etnaviv-gpu 130000.gpu: hangcheck recover!
[  263.931124] etnaviv-gpu 130000.gpu: hangcheck detected gpu lockup!
[  263.937380] etnaviv-gpu 130000.gpu:      completed fence: 11423
[  263.943352] etnaviv-gpu 130000.gpu:      active fence: 11425
[  263.949221] etnaviv-gpu 130000.gpu: hangcheck recover!
[  269.131129] etnaviv-gpu 130000.gpu: hangcheck detected gpu lockup!
[  269.137383] etnaviv-gpu 130000.gpu:      completed fence: 11425
[  269.143355] etnaviv-gpu 130000.gpu:      active fence: 11427
[  269.149324] etnaviv-gpu 130000.gpu: hangcheck recover!
 FPS: 0 FrameTime: inf ms
[jellyfish] <default>: FPS: 29 FrameTime: 34.483 ms
[terrain] <default>:error: compile failed!
etna_draw_vbo:199: compiled shaders are not okay
error: compile failed!
etna_draw_vbo:199: compiled shaders are not okay
error: compile failed!
etna_draw_vbo:199: compiled shaders are not okay
error: compile failed!
etna_draw_vbo:199: compiled shaders are not okay
error: compile failed!
etna_draw_vbo:199: compiled shaders are not okay
error: compile failed!
etna_draw_vbo:199: compiled shaders are not okay
error: compile failed!
etna_draw_vbo:199: compiled shaders are not okay
error: compile failed!
etna_draw_vbo:199: compiled shaders are not okay
error: compile failed!
etna_draw_vbo:199: compiled shaders are not okay
error: compile failed!
etna_draw_vbo:199: compiled shaders are not okay
error: compile failed!
etna_draw_vbo:199: compiled shaders are not okay
error: compile failed!
etna_draw_vbo:199: compiled shaders are not okay
error: compile failed!
etna_draw_vbo:199: compiled shaders are not okay
error: compile failed!
etna_draw_vbo:199: compiled shaders are not okay
error: compile failed!
etna_draw_vbo:199: compiled shaders are not okay
error: compile failed!
etna_draw_vbo:199: compiled shaders are not okay
error: compile failed!

etna_draw_vbo:199: compiled shaders are not okay
error: compile failed!
etna_draw_vbo:199: compiled shaders are not okay
error: compile failed!
etna_draw_vbo:199: compiled shaders are not okay
error: compile failed!
etna_draw_vbo:199: compiled shaders are not okay
error: compile failed!
etna_draw_vbo:199: compiled shaders are not okay
error: compile failed!
etna_draw_vbo:199: compiled shaders are not okay
error: compile failed!
etna_draw_vbo:199: compiled shaders are not okay
error: compile failed!
etna_draw_vbo:199: compiled shaders are not okay
error: compile failed!
etna_draw_vbo:199: compiled shaders are not okay
error: compile failed!
etna_draw_vbo:199: compiled shaders are not okay
 FPS: 2 FrameTime: 500.000 ms
[shadow] <default>: FPS: 39 FrameTime: 25.641 ms
[refract] <default>: FPS: 10 FrameTime: 100.000 ms
[conditionals] fragment-steps=0:vertex-steps=0: FPS: 70 FrameTime: 14.286 ms
[conditionals] fragment-steps=5:vertex-steps=0: FPS: 33 FrameTime: 30.303 ms
[conditionals] fragment-steps=0:vertex-steps=5: FPS: 65 FrameTime: 15.385 ms
[function] fragment-complexity=low:fragment-steps=5: FPS: 50 FrameTime: 20.000 ms
[function] fragment-complexity=medium:fragment-steps=5: FPS: 30 FrameTime: 33.333 ms
[loop] fragment-loop=false:fragment-steps=5:vertex-steps=5: FPS: 49 FrameTime: 20.408 ms
[loop] fragment-steps=5:fragment-uniform=false:vertex-steps=5: FPS: 49 FrameTime: 20.408 ms
[loop] fragment-steps=5:fragment-uniform=true:vertex-steps=5: FPS: 30 FrameTime: 33.333 ms
=======================================================
                                  glmark2 Score: 47
=======================================================
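
Because the serial console interleaves kernel messages with the benchmark output, a result is sometimes split across lines (see [build] use-vbo=true and [ideas] above). A small filter along these lines can still pull one FPS number per test out of a saved log. It is a sketch only: fps_per_test is a made-up name, and it assumes test names consist of lowercase letters and digits, as they do in this log.

```shell
# Print "<test> <fps>" for every result in a glmark2 log read from stdin.
# Kernel messages may split a result across lines, so the test name is
# remembered until the matching "FPS:" field shows up.
# fps_per_test is a hypothetical helper.
fps_per_test() {
    awk '
        /^\[[a-z0-9]+\] / { name = $1 }   # e.g. "[build]"; kernel timestamps do not match
        {
            for (i = 1; i < NF; i++)
                if ($i == "FPS:") print name, $(i + 1)
        }
    '
}
```

For example, after saving the console output to a file: fps_per_test < etnaviv.log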

Some of the relevant packages in this build include:
  • libegl-mesa_2:17.1.1
  • libgles2-mesa_2:17.1.1
  • libgl-mesa_2:17.1.1
  • xserver-xorg_2:1.19.3
  • kernel-image-4.9.21-fslc
  • libc6_2.25
  • glmark2_2014.03+0+7215c0f337
  • cross-compiler: gcc-6.3.0

Results

My test currently consists of running glmark2-es2 (i.e. the OpenGL ES2 version of glmark2). As of today, the etnaviv support isn't as full-featured as the binary blob. However, using etnaviv doesn't require any EULAs, and it lets you use a newer kernel. Thanks to how well the freescale layers are organized/maintained, switching between the two builds is quite easy.

Here's a side-by-side comparison:

                 vivante                          etnaviv
GL_VENDOR        Vivante Corporation              etnaviv
GL_RENDERER      Vivante GC880                    Gallium 0.4 on Vivante GC880 rev 5106
GL_VERSION       OpenGL ES 3.0 V5.0.11.p8.41671   OpenGL ES 2.0 Mesa 17.1.1

glmark2(-es2) is a set of individual tests that are run back-to-back. They can be run individually, but calling "glmark2-es2" by itself simply invokes all of them sequentially.

                     vivante   etnaviv
glmark2-es2 score    115       47
[build]              206       81
[build]              246       91
[texture]            200       80
[texture]            200       78
[texture]            199       75
[shading]            205       87
[shading]            170       68
[shading]            108       51
[shading]            81        42
[bump]               124       57
[bump]               220       74
[bump]               203       66
[effect2d]           62        34
[effect2d]           23        14
[pulsar]             183       75
[desktop]            31        15
[desktop]            103       42
[buffer]             49        30
[buffer]             49        29
[buffer]             57        34
[ideas]              44        (hangcheck)
[jellyfish]          61        29
[terrain]            1         2 (shader compile failed)
[shadow]             92        39
[refract]            20        10
[conditionals]       209       70
[conditionals]       61        33
[conditionals]       203       65
[function]           114       50
[function]           34        30
[loop]               105       49
[loop]               105       49
[loop]               55        30
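
To make the comparison easier to read, the etnaviv figure can be expressed as a percentage of the vivante one. The filter below is a rough sketch: add_ratio is a made-up name, and it expects one "label vivante etnaviv" triple per line with no spaces in the label. Rows without two plain numbers, such as the hangcheck row, get "n/a".

```shell
# Append etnaviv-as-a-percentage-of-vivante to "label vivante etnaviv" lines.
# add_ratio is a hypothetical helper; labels must not contain spaces.
add_ratio() {
    awk '{
        if ($2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ && $2 > 0)
            printf "%s %s %s %.0f%%\n", $1, $2, $3, 100 * $3 / $2
        else
            print $1, $2, $3, "n/a"
    }'
}

# Example:
#   printf '%s\n' 'score 115 47' '[build] 206 81' | add_ratio
```

For the overall scores of 115 and 47 this works out to roughly 41%.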

7 Jun 2017

GPU Support with OpenEmbedded (Introduction)

Synopsis

Traditionally, an embedded device that included a couple of buttons and a 2x16 text display was considered state-of-the-art. These days, an increasing number of embedded projects are using graphical displays, potentially touch-enabled. This trend appears to be growing. If an embedded product is going to use a graphics system, it would be best if as much of the graphics processing as possible were offloaded from the CPU to the GPU.

Being able to quickly put together a basic image for an embedded device that includes accelerated graphics support is the starting point for more and more projects. Ideally the project's time should be spent developing the application which runs on the device, rather than on trying to build the basic image with functioning accelerated graphics.

Modern GPUs include multiple logical subunits for different jobs: multimedia units for video playback, compute units for computation offloading, rendering units for drawing, and many others. My primary interest is with rendering on X11.

OpenEmbedded (OE) is a great tool for building and maintaining images for embedded devices (as well as for building and maintaining embedded distributions). In this series of articles I want to take a look at how well (or not) OE supports GPUs and GPU acceleration. GPU drivers and acceleration are huge topics, and I won't pretend to know or write much about them. Rather, I'll be looking at this topic from an "image building" point-of-view.

GPU Support Options

When a vendor ships a GPU, they usually provide some sort of software for it. But usually that software is in the form of a binary blob exposed via a high-level API (such as OpenGL). From a software point-of-view, interfacing with a GPU requires many moving parts. On the one side is the kernel, on the other side is the application itself; in between are many other components. When a vendor ships a binary blob, it is built against a specific version/branch of each of these components. This means that the moment you pick a specific board/SoC for your project, you are already locking into a specific kernel version for your product. Your product will forever be locked to that version, unless the GPU vendor decides to release a newer version of the blob for your given GPU. Worse still, even though the kernel that you're being locked into says (for example) "3.10", in most cases you're forced to use your vendor's branch of "3.10". Which really means: "at some point this was 3.10, but now (1000+ patches later) it could only be best described as '3.10-ish'".

Many embedded projects like to use (or at least experiment with using) the PREEMPT_RT patch. But not every kernel that is released has an associated PREEMPT_RT patch. So if the kernel you're being forced to use doesn't have an associated PREEMPT_RT patch, you'll either have to invest the effort in trying to get the closest PREEMPT_RT patch working with your specific kernel, or forgo using PREEMPT_RT altogether. In some cases, although your kernel might be advertised as a given version, and although there might be a PREEMPT_RT patch for that kernel version, the vendor patches that have been added make applying the PREEMPT_RT patch difficult.

Similarly, support for new features is being added to the kernel every day. If your GPU vendor is locking you into an older kernel, you'll either have to back-port the new features to the older kernel yourself, or not be able to take advantage of the new features in your product.

Another potential "gotcha" when using a GPU vendor's binary blob is device support. Sometimes a GPU vendor will only decide to support a specific OS (Android, and not Linux at all) or a specific display server (Xorg vs Wayland vs Mir...) or API (OpenGL vs OpenGL ES (1, 2, 3?) vs Vulkan...) in their binary blob (or some small subset thereof). In many companies, the people who develop the product aren't the same people who choose the board/SoC (and there might be no communication between these two groups). This means the SoC gets chosen based on factors such as availability, size, or price without any consideration for how the product will need to be coded if such restrictions are in place.

There are also security implications of using older kernels...

...and the list continues.

An open-source GPU driver provides you with the most flexibility in choosing which version of which components you want to use in your product, as well as the most flexibility in how to implement your product. You can choose to use the pure upstream sources, or any variation thereof. You can decide to use OpenGL ES on X11, if that's what you prefer. As well, it lets you experiment with various projects the wider community is working on. Do you want to create a product that uses virtualization, accelerated graphics, PREEMPT_RT, and supports the latest TPM2.0 devices? No problem. Want to try that with a binary blob that locks you into some version of a 3.4 kernel...? That might be a little more difficult. Your GPU vendor can't possibly predict what sort of product you'll want to create or how you'll want to create it.

In summary there are two options: use the vendor-supplied binary blobs which limit your flexibility, or use an open-source graphics driver and get to make more of the decisions yourself.

Open-Source GPU Projects

There are a number of projects whose goal is to create open-source drivers for a GPU family:
Additionally, Intel already provides and supports free and open-source drivers for the GPUs in their chipsets. Yay Intel! If only all companies who produce GPUs were so like-minded! For one thing, there would be no need for a write-up such as this one.

Note: not all open-source GPU projects provide support for every subunit or function a GPU implements nor provide support for every API (etc). Most of these projects are "works in progress". Having said that, however, most of these projects are quite mature and offer excellent capabilities (in some cases exceeding the capabilities of the vendor blobs!) and at least offer the ability to adapt to your needs.



Why OpenEmbedded?

Getting the right versions of each of these components configured with the correct options, installing them to the correct locations, setting up a cross-compiler, cross-compiling all the code, and tweaking them with proper configuration files in the image is not a trivial undertaking. Just assembling the right set of components isn't trivial because the implementation details of how acceleration is achieved for different GPUs vary!

OpenEmbedded provides the metadata, the "recipes", that describe the low-level details of how to configure and build various components. It allows the user to focus on higher-level details, instead of getting bogged down in the minutiae of setting up sysroots for cross-compilation and making sure the compiler gets passed the right parameters. Do you want your image to include the "xdpyinfo" program? Just add it to the list. Do you want to build an image with musl instead of glibc? Just add the correct layer and set the variable indicating which C library to use. Then let OpenEmbedded handle the details; the commands you type are the same regardless.

There are, of course, other build systems for generating images. The point of this article, however, is to survey the state-of-the-art in graphics support with respect to OpenEmbedded. This is not meant to be a series of articles on the state of open-source graphics support in general nor a comparison of graphics support from various build systems.

Summary

For each GPU family, I would like to write an article describing how to use OpenEmbedded to create one of two images: one image using the vendor blob, and one image using the open-source replacement. As a basis, it would be great if it were easy for anyone to create either of these images. This would allow the user to quickly start their base images and choose their GPU support.

Going further, I'd like to then run the same software on each image and provide performance statistics and general feedback.

Hopefully the information in these articles will:
  • provide concise information to help users get their images built and running easily and quickly
  • provide a comparison between the various GPU families and provide a software support matrix
  • help make it easy for developers to become involved in developing and debugging open-source graphics drivers

Caveat

As always, please try to remember that software is an ever-evolving entity. As I write this article (early June 2017) I try to be as correct as possible. But that doesn't mean I'm always correct, and that doesn't mean that what is correct right now is still correct an hour from now. So if you're reading these articles many years into the future, please try to remember that everything evolves and there will be a time at which all of what's written here stops being true, or possible, or whatever.