27 May 2015

ARM SBCs and SoCs

The following table shows which boards are examples of which architectures/processors:




The following table shows the most likely big.LITTLE pairings:






 If you have a manufacturer and/or SoC in mind and would like to know which board you'd need to buy:






Sources:
http://en.wikipedia.org/wiki/Comparison_of_single-board_computers
http://www.arm.com/products/processors/cortex-a/
https://www.96boards.org
http://www.linux.com/news/embedded-mobile/mobile-linux/831550-survey-best-linux-hacker-sbcs-for-under-200

22 May 2015

Work Area Wire Spool Holder

I'm happy I finally found the time and parts to put together a wire spool holder for my work area!




I found all the necessary parts at my local Home Depot: a threaded rod, angle brackets, a couple of nuts, and a couple of screws. I did have to drill out one hole on each of the angle brackets in order to fit the rod I had chosen, but only by millimetres.

14 May 2015

Spelling and Grammar in a Technical Publication

Does it matter? To me it does. When I'm trying to read something technical and I come across errors in spelling and grammar (non-technical errors), it throws my concentration out the window. I'm not claiming to be an expert in the rules and spelling of the English language (or any language), and although you'll find errors in my writing, I do try and I do care.

I try to read as much as I can, and I particularly like books. In the computer field there are certain publishers; everyone in this field knows who they are, and everyone has their favourites. There's one in particular, unfortunately, that seems to do quite a bad job of editing. Their books seem to have higher-than-normal levels of non-technical errors, making them difficult for me to read. Some of their books are fine, which leads me to believe that most of the editing is left to the author, and we're simply observing his or her mastery of both the technical topic at hand and the English language.

So I was quite pleased when this particular publisher got in touch with me last year to ask if I'd be interested in being a reviewer for some books that were in the process of being written. Here was my chance not only to review the technical details of a book, but also to try to make sure all the non-technical stuff was good too. So, initially, I said "yes".

Then they sent me an email describing how to be a reviewer for them. One of the very first instructions clearly stated that my job was to review the technical aspects of the material and barred me from making any suggestions or corrections to any non-technical aspects of the book. So I then changed my mind and said "no". Looking at some of their publications, I wouldn't want someone to see my name as a "reviewer" and wonder how I didn't notice so many obvious mistakes!

This same publisher had a sale recently, so for a very low price I purchased a couple of ebooks from them, one of which had not yet been published. I figured that for such a low price I could take the risk that they were poorly written, and hope to glean whatever nuggets of technical information I could despite the distractions. The book was published the other day and, in what I see as a very ironic twist, they sent me the following email to inform me of the change in status:


:-)

P.S. I hope this post doesn't dissuade a technical person (perhaps a non-native speaker of English) from writing. I'm not trying to poke fun at people's language abilities. But a publisher of English-language books, I think, needs to be held to a higher standard!

8 May 2015

OE Builds in a VM

Everyone knows that doing an OE build can take a bit of time (there are good reasons why this is so), so it follows that performing an OE build in a VM will take even longer. But when you do a build "natively" you potentially get to use all of the computer's free memory and all of its processing power; a VM running on that same machine only gets whatever portion of those resources you give it, which in my case was half the processing power and memory.

The question I wanted to answer was: if I performed a build "natively" but only used half the memory+cpu of a computer, how would that compare to a build in a VM that thought it was using the full computer's resources but had been constrained to only half the resources due to the use of the VM?

When performing an OE build there are two variables you can set to restrict the amount of CPU resources that are used: PARALLEL_MAKE and BB_NUMBER_THREADS. But using these variables isn't really the same as building on a machine with half the processing resources; the initial parse, for example, will use all available CPUs (BitBake can't know how much you want to restrict the build until it has actually finished parsing all the configuration files). Plus, there are no OE variables you can tweak to say "only use this much memory during the build".
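
Concretely, that restriction is just a couple of lines in conf/local.conf. Something along these lines is what I mean (I believe the stock defaults compute both values from oe.utils.cpu_count(), so this simply cuts that in half):

# in conf/local.conf: use half the detected CPUs for both make and BitBake tasks
PARALLEL_MAKE = "-j ${@oe.utils.cpu_count()//2}"
BB_NUMBER_THREADS = "${@oe.utils.cpu_count()//2}"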

So in order to better measure a VM's performance we need to find a better way to perform a restricted "native" build (other than just tweaking some configuration parameters). Answer: cgroups.

To be honest, I was hoping to demonstrate that if I used qemu+kvm, with virtio drivers everywhere I could (disk, network, etc.), the performance of a VM would at least approach that of a "native" build restricted via cgroups to the same amount of resources as the VM had. My findings, however, didn't bear that out.
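
For concreteness, a stripped-down qemu invocation along those lines looks something like the following; the CPU count, memory size, file names, and mount tag are made up for illustration (half of an 8-CPU/16GB machine), and the actual scripts I use are linked under Specifics below:

$ qemu-system-x86_64 \
      -enable-kvm -cpu host \
      -smp 4 -m 8192 \
      -drive file=builder.qcow2,if=virtio \
      -netdev user,id=net0 -device virtio-net-pci,netdev=net0 \
      -virtfs local,path=$HOME/downloads,mount_tag=downloads,security_model=none \
      -nographic

Inside the guest, a shared directory exported that way gets mounted with something like:

$ mount -t 9p -o trans=virtio,version=9p2000.L downloads /mnt/downloads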

First I'll present my results (since that's really what you want to see), then I'll describe my test procedure (for you to pick holes in ;-) ). I tried my tests on two computers and ran each test 5 times.

First I ran a build on the "raw/native" computer using all its resources (C=1):

Run   Computer 1   Computer 2
1     00:24:19     01:02:42
2     00:25:03     01:02:19
3     00:24:51     01:02:34
4     00:24:55     01:02:21
5     00:25:00     01:02:57
avg   00:24:50     01:02:35

I'm using "C" to represent the CPU resources; "1" meaning "all CPUs", i.e. using a value of "oe.utils.cpu_count()" for both PARALLEL_MAKE and BB_NUMBER_THREADS. A value of "0.5" means I've adjusted these parameters to be half of what "oe.utils.cpu_count()" would give.

Then I ran the same build again on the "native" machine, but this time using BB_NUMBER_THREADS/PARALLEL_MAKE to use only half the CPUs/threads (C=0.5):

Run   Computer 1   Computer 2
1     00:24:20     00:57:10
2     00:24:34     00:57:32
3     00:25:11     00:57:08
4     00:24:39     01:03:31
5     00:24:31     01:01:50
avg   00:24:39     00:59:26

Counter-intuitively, when restricting the resources, the builds performed ever-so-slightly better than when allowing the build to use all of the computer's resources. Perhaps these builds aren't CPU-bound, and this set just happened to come out slightly better than the "full resources" builds. Or perhaps (for this workload) these Intel CPUs aren't really able to make much use of CPU threads; it's CPU cores that count. (??)

Then I performed the same set of builds on the "native" computers, but after having restricted their resources via cgroups.

C=1 (restricted by half via cgroups, same for memory):

Run   Computer 1   Computer 2
1     00:28:05     01:05:57
2     00:28:18     01:05:33
3     00:28:21     01:05:46
4     00:28:14     01:05:22
5     00:27:37     01:05:37
avg   00:28:07     01:05:39

C=0.5 (restricted by half via cgroups, same for memory, with the BitBake parallelism settings additionally halved):

Run   Computer 1   Computer 2
1     00:27:20     01:04:08
2     00:27:15     01:07:39
3     00:27:08     01:07:42
4     00:27:17     01:07:57
5     00:27:13     01:08:16
avg   00:27:15     01:07:08

So there's obviously a difference between restricting a build's resources via BB_NUMBER_THREADS/PARALLEL_MAKE versus setting hard limits using cgroups. That's not too surprising. But again there's very little difference between using "cpu_count()" worth of CPU resources versus using half that amount; in fact, for Computer 1 the build time improved slightly.

Now here's the part where I used a VM running under qemu+kvm. I had been hoping these times would be comparable to the times I obtained when restricting the build via cgroups, but that wasn't the case.

C=1 (restricted by half via VM, same for memory):

Run   Computer 1   Computer 2
1     00:41:36     01:45:42
2     00:41:22     01:47:52
3     00:41:41     01:44:31
4     00:41:16     01:50:25
5     00:41:12     01:41:41
avg   00:41:25     01:46:02

C=0.5 (restricted by half via the VM, same for memory, with the BitBake parallelism settings additionally halved):

Run   Computer 1   Computer 2
1     00:42:02     01:30:23
2     00:42:07     01:34:43
3     00:43:05     01:31:14
4     00:43:14     01:36:46
5     00:42:12     01:42:05
avg   00:42:32     01:35:02

Analysis

Using the raw/native builds as the reference (comparing each constrained build against the raw build at the same C value):
  • constraining the build to use half the machine's resources via cgroups results in build times that are from 4.90% to 13.22% slower.
  • performing the same build in a qemu+kvm VM results in build times that are from 59.90% to 72.55% slower.
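
To show where those percentages come from, here are the two endpoints worked out by hand:
  • cgroups, Computer 2, C=1: 01:05:39 = 3939 s vs 01:02:35 = 3755 s; 3939 / 3755 ≈ 1.0490, i.e. 4.90% slower
  • VM, Computer 1, C=0.5: 00:42:32 = 2552 s vs 00:24:39 = 1479 s; 2552 / 1479 ≈ 1.7255, i.e. 72.55% slower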

Specifics

  • bitbake core-image-minimal
  • fido release, e4f3cf8950106bd420e09f463f11c4e607462126, 2138 tasks
  • DISTRO=poky (meta-poky)
  • a "-c fetchall" was performed initially, then all directories but "conf" were deleted and the timed build was performed with "BB_NO_NETWORK=1"
  • between each build everything would be deleted except for the "conf" directory; therefore no sstate or tmp or cache etc
  • /tmp implemented on a tmpfs
  • because the VMs had limited disk space, all builds were performed with "INHERIT+=rm_work"
  • to help load/manage the VMs I use a set of scripts I created here: https://github.com/twoerner/qemu_scripts
  • in the VMs, both the "Download" directory and the source/recipes are mounted and shared from the host (using virtio)
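
Putting those last few bullets together, each timed run boiled down to roughly the following sketch. The paths are illustrative, it assumes DL_DIR points outside the build directory so the fetched sources survive the cleanup, and the conf/local.conf additions only need to happen once since conf/ is preserved between runs:

$ cd build
$ bitbake -c fetchall core-image-minimal          # one-time, while still online
$ echo 'BB_NO_NETWORK = "1"' >> conf/local.conf   # from here on, no network access
$ echo 'INHERIT += "rm_work"' >> conf/local.conf  # keep disk usage down in the VMs

# then, for each timed run:
$ ls | grep -v '^conf$' | xargs rm -rf            # delete everything except conf/
$ time bitbake core-image-minimal
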
To restrict a build using cgroups I created a cgroup named "oebuild" in both the "cpuset" and "memory" groups and placed my shell in them.

So, for example, say I'm running a shell, bash, and its PID is 1234. As root, go to /sys/fs/cgroup and:
  • mkdir cpuset/oebuild
  • mkdir memory/oebuild
If your system has 8 CPUs and 16GB of RAM (adjust your numbers accordingly):
  • echo "0-3" > cpuset/oebuild/cpuset.cpus
  • echo 0 > cpuset/oebuild/cpuset.mems
  • echo 8G > memory/oebuild/memory.limit_in_bytes
And finally ("1234" is the PID of the shell I want to put in the "oebuild" cgroup, I then use this shell to run the build):
  • echo 1234 > cpuset/oebuild/tasks
  • echo 1234 > memory/oebuild/tasks
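
As an aside (not what I used for the numbers above), if the libcgroup tools are installed, cgexec can start a fresh shell directly inside both groups instead of echoing the PID by hand:

$ cgexec -g cpuset,memory:oebuild bash
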
To confirm your shell is now running in this cgroup, run the following command in your shell:
$ cat /proc/self/cgroup
10:hugetlb:/
9:perf_event:/
8:blkio:/
7:net_cls,net_prio:/
6:freezer:/
5:devices:/
4:memory:/oebuild
3:cpu,cpuacct:/
2:cpuset:/oebuild
1:name=systemd:/user.slice/user-1000.slice/session-625.scope
Here you see both the "memory" and "cpuset" cgroups are constrained by the oebuild cgroup for this process (i.e. /proc/self).
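
To double-check from within that shell that the limits really are in effect, something like the following should do. nproc respects the CPU affinity mask, so with the 8-CPU/16GB example above it should report 4, and the memory limit should read back as 8G expressed in bytes:

$ nproc                                                    # should now report 4 instead of 8
$ cat /sys/fs/cgroup/memory/oebuild/memory.limit_in_bytes  # 8589934592, i.e. 8G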