
CI, and CI, and CI, oh my! (then more CI)

· 9 min read



By Chris Evich (GitHub)

I wanted to write a detailed post about the CI setup we use for exercising proposed changes to libpod (podman repo).  Unfortunately this topic (and automation in general) is so big, most readers would end up on the floor, sound asleep, in a puddle of their own drool.  Instead, I will keep your fidget-spinner twirling, by jumping around several topics.

Starting with an overview of why we chose to use Cirrus CI, I'll provide a short 3-step walk-through of how it works, along with lots of links.  Then, we'll go into more detail regarding VM Image orchestration, before connecting that back to our Cirrus-CI configuration.

Why Cirrus-CI

I once said "testing code is at least 10x harder than writing it". This is especially true when a software-engineer believes their code is "perfectly good" (meaning, tons of bugs). At the same time, test automation is generally only as reliable as it is simple (and it's never simple). Which brings me back to around July/August of '18:

The libpod project was considered by many to be "perfectly good", but its automation was definitely not simple. One part or another was constantly getting jacked up. At the time, automation was split across two totally different services, operating with incompatible yet duplicate configurations. A third service is a downstream consumer of libpod, but at the time was also under consideration to take over pull-request automation from the first two:

  • Travis

    • With Ubuntu Trusty only a few years old, we ran tests on a platform version nobody was using, with bleeding-edge code jammed on top.  Some OS X tests ran, and we think at least one person looked at the results, some of the time, every once in a while.
    • Required a contrived containerized environment to work around host-side limitations.  Fixes for fake environments almost never improve reality, e.g. it's impossible to test or fix AppArmor or SELinux problems from inside a container.
    • The tests did not represent reality.  Most people would never run container tools within a container, and certain security tools like SELinux and AppArmor would not be tested running inside this environment.
  • PAPR

    • An internal "maintenance mode" service, meaning only bug-fixes, no new features. Supported by a single, talented engineer from another group, who was perfectly happy to be working on something else.
    • Fortunately it does have great support for running things on Atomic Host, which we still use to maintain our insanity...I mean, double-check some things.
    • The underlying infrastructure was unpredictably reliable, mainly due to frequent dog-food poisoning.
  • OpenShift

    • An elegant, impressive piece of machinery, with tests so numerous that most other projects would have trouble calling up enough drool.
    • Fantastic at testing containers and at-scale orchestration.  However, it's way too complex for our low-level, host-side poking of runtimes and userspace.
    • Downstream from libpod by weeks or months, depending on the platform (RHEL, for example).
    • Both Travis and PAPR already demonstrated the pain of testing host-side libraries/tools within a container, no further lessons or reruns required.

As if this vegetarian sausage wasn't already dripping with liquid goodness.  The smallest little network blip, and you have to re-run the entire suite.  The importance of network speed and robustness can never be overstated. So I set out on a mission against complexity, toward being able to reliably and frequently ruin engineers' "perfectly good" code before it merges.

GET OFF MY LAWWWWWN!

The Cirrus CI killer feature: you can selfishly bring your own cloud and everything else to make it work, and not have to share with Billy Bob's Used Tire and Doughnut Shop.  You're the master of the entire host and runtime environment: OS, kernel, packages, updates, everything!  Then, with the Cirrus CI app on your code repository, testing follows this simple automated sequence:

  1. Create VMs (or containers) in your cloud, using your encrypted credentials.
  2. Follow instructions you've spelled out like B-A-S-H.
  3. Show green on exit(0) - the "pretty" engineer's code is properly spoiled (i.e. functional).

So Cirrus CI gives all the power for success, and/or blasting giant, perfectly round, holes in your own two feet! Our CI experience can be as simple or complex as we like, and reliability will match that of major cloud providers and the inverse of our cleverness. What could possibly go wrong? :D
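
For the impatient, here's roughly how that three-step sequence looks as configuration. This is only an illustrative sketch with made-up image and script names; the real file we use is covered in the sections below:

```yaml
# Hypothetical, minimal .cirrus.yml sketch of the sequence above
gcp_credentials: ENCRYPTED[...]          # your cloud, your (encrypted) keys

example_task:
  gce_instance:                          # 1. Cirrus creates this VM in *your* GCE project
    image_name: "some-image-we-built"    # made-up image name
  test_script: ./run_all_the_tests.sh    # 2. instructions spelled out like B-A-S-H
  # 3. the task shows green when every script exits 0
```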

VM Image Orchestration

Implementing the bowels of any CI/Automation stack usually begins with orchestrating the initial operating system state.  Therefore, for efficiency's sake, it's handy to cache this work before exercising project-code changes. Otherwise, it's a complete waste of (expensive) engineer-time to constantly install, update, and configure all aspects of the system during every test run.

As recommended by Cirrus CI, we utilize a tool by the inventors of Vagrant: Packer.  I was able to make it do things in a matter of minutes, as packer is fairly brain-dead-simple.  It accepts a JSON file, which I have simplified as YAML for readability. A simple (non-functional) example will demonstrate the basic ideas:

```yaml
---
variables:  # all up-front, no guessing allowed!
  foo: "bar"  # simple
  build_image_suffix: "-libpod-{{env `COMMIT_SHA`}}"  # from env. var

builders:  # Where to do stuff

  - type: "googlecompute"   # TONS of others supported too
    image_name: '{{build_name}}{{user `build_image_suffix`}}'
    # ... more details ...

  - type: "googlecompute"
    # ...other OSes...

provisioners:  # How to do stuff

  - type: "shell"
    script: "/path/to/{{build_name}}_setup.sh"  # macro looks up OS

post-processors:  # Where to stick stuff

  - - type: 'googlecompute-export'
      paths: ...  # name of storage bucket where VM Image will rest
```

In English, the above translates to:

1. Use the provided variables like `foo`, and fill the variable `build_image_suffix`
from the environment variable `$COMMIT_SHA`.
2. Spin up some VMs in GCE.
3. Upload and execute a shell script on each VM (in parallel).
4. Assuming success, store the resulting VM image into a storage bucket for
later use as needed; unused images expire and get automatically deleted after a time.

Perhaps that's over-simplifying things a little, but
packer provides mostly [just the bear-necessities](https://www.packer.io/docs/provisioners/index.html)
(sorry, [song is stuck in my head](https://www.youtube.com/watch?v=08NlhjpVFsU)). Roughly ten
minutes after running a simple `packer build` command, the VMs are automatically torn down and their disks
saved.  At a time of our choosing, an image can be imported from the storage bucket,
then a small PR tossed up to activate the images for Cirrus.
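
In practice, that activation PR is tiny. As a hedged illustration (the new suffix below is made up), once packer publishes images named with a fresh commit-SHA suffix, the .cirrus.yml file covered in the next section just needs its image_name values bumped:

```yaml
# Hypothetical excerpt of the "activation" change to .cirrus.yml
gce_instance:
  # image_name: "fedora-28-libpod-a250386d"   # previous image
  image_name: "fedora-28-libpod-0f00ba42"     # new suffix (made up for illustration)
```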

### Packer → Cirrus-CI Connection

Next up the stack, we'll dig into some basic details of the Cirrus CI system.  If you've used
services like Travis before, this example .cirrus.yml file won't be too surprising (simplified
somewhat for example purposes):

```yaml
---

# Safely stored details about accessing our cloud
gcp_credentials: ENCRYPTED[blahblah]

env:  # environment and behavioral values for all tasks and scripts
   # Where to clone the source code into
   CIRRUS_WORKING_DIR: "/var/tmp/go/src/github.com/containers/libpod"
   SCRIPT_BASE: ./contrib/cirrus  # saves some typing (below)

testing_task:  # One particular set of things to do

   gce_instance:  # What kind of VM to use
       image_name:  # Same as image_name produced by packer (above)

   script:  # Step by step
       - $SCRIPT_BASE/setup_environment.sh   # does what it says
       - $SCRIPT_BASE/unit_test.sh           # this too
       - $SCRIPT_BASE/integration_test.sh    # and this

```

With Cirrus CI "installed" on a GitHub repository, upon any pull-request change, Cirrus CI will step in to kick things off within GCE, then report the results back in your pull request.

However, we also need to test more than one OS.  This is easily accomplished in Cirrus CI by using what they call a matrix modification. Roughly translated into simple country-folk speak: "we done messed up our YAML parser to do more fancier things, and stuff". This is illustrated in part by an excerpt from our actual .cirrus.yml file in the libpod repository:

```yaml
...cut...

testing_task:

  gce_instance:
    image_project: "libpod-123456"
    zone: "us-central1-a"
    cpu: 2
    memory: "4Gb"
    disk: 200
    matrix:
      image_name: "ubuntu-18-libpod-a250386d"  # <-- name from packer
      image_name: "fedora-28-libpod-a250386d"
      image_name: "fedora-29-libpod-a250386d"

...cut...
```

The above will automatically duplicate the testing_task three times, running a different VM image for each. You can run a matrix across other items as well, like environment variables. There are also options for filtering your matrix, and adding dependencies between tasks. I'd spell those out for you, but it's liable to suck the lubrication from your fidget-spinner.
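
For the brave, here's a rough, hedged sketch of what those two options can look like (the task names and condition are made up; consult the Cirrus CI docs for the real details). Filtering uses only_if expressions, and ordering uses depends_on:

```yaml
# Hypothetical sketch of matrix filtering and task dependencies
build_task:
  only_if: $CIRRUS_BRANCH != 'docs'   # skip building on a made-up docs-only branch
  build_script: make

testing_task:
  depends_on:
    - build                           # don't bother testing if the build failed
  test_script: make test
```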

Good looks and clean presentation

Another Cirrus CI feature we utilize has to do with the way the scripting output is presented. This includes what you don't see, like extraneous buttons and widgets. The way details are presented can be critical for debugging. Here's how we leverage that simplicity:

```yaml
testing_task:

   ...cut...

   setup_environment_script: $SCRIPT_BASE/setup_environment.sh
   unit_test_script: $SCRIPT_BASE/unit_test.sh
   integration_test_script: $SCRIPT_BASE/integration_test.sh

   ...cut...
```

It's possible to have multiple scripts or commands per _script section.  Because we dedicate one script per section, the output is presented in bite-size pieces.

This makes it super easy to find what you're looking for. If the unit tests fail with a complaint about some invalid environment variable, it's easier to drop down that box than to go scrolling through a giant wall of text (though that's sometimes necessary also). On the other hand, if the output were all jammed into a single _script block, tracking down problems might get too challenging for my old-fogy sensibilities. Mind you, I've only celebrated my 38th birthday four times so far...and remember exactly zero of what happened those nights.
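
As mentioned above, a single named section can also carry several commands if we ever wanted to lump things together; a quick, hedged sketch (reusing the script names from the excerpt):

```yaml
# Hypothetical: several commands folded into one named section
unit_test_script:
  - $SCRIPT_BASE/setup_environment.sh
  - $SCRIPT_BASE/unit_test.sh
```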

Conclusion

There are many other details I could get into, but sadly, my coffee mug is empty and I can see that I forgot to wash it (again).  Nevertheless, if you need some simple nuts-and-bolts automation, I highly recommend Cirrus-CI. It's free (as in beer) to use for open-source projects. Google Compute Engine is also pseudo-free for quite a while, since they give you a very generous and substantial startup credit.

Other than finding a new mug or my soap, if there are any burning questions here, or snide remarks there, please feel free to find me in #podman on Freenode (IRC). Unless the question is too-smart, I might even be able to answer it. Until then, may your pretty code keep its bugs well hidden and out of sight.