RPM Version Comparison

I was doing some research into how RPM compares versions, as it appeared to be more complex than simple semver comparisons. Turns out it is super wacky. One of the Puppet authors wrote a blog post that explains what’s going on much better than I could. I’m going to copy it here so I have my own copy in case the blog ever goes away.


Package Naming and Parsing

RPM package names are made up of five parts: the package name, epoch, version, release, and architecture. This format is commonly referred to by the acronym NEVRA. The epoch is not always included; it is assumed to be zero (0) on any package that lacks it explicitly. The format for the whole string is n-e:v-r.a. For my purposes, I was only really concerned with comparing the EVR portion; Puppet knows about package names, and the bug herein was with what Puppet calls the “version” (EVR in yum/rpm parlance). Parsing is pretty simple:

  • If there is a : in the string, everything before it is the epoch. If not, the epoch is zero.
  • If there is a - in the remaining string, everything before the first - is the version, and everything after it is the release. If there isn’t one, the release is considered null/nil/None/whatever.
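
The two parsing rules above can be sketched in a few lines of Python. This is a minimal illustration, not yum's or Puppet's actual parser; the function name parse_evr is made up for this example.

```python
def parse_evr(evr):
    """Split an EVR string such as '1:2.3.4-5.el7' into (epoch, version, release)."""
    epoch = "0"  # a missing epoch is treated as zero
    if ":" in evr:
        # Everything before the colon is the epoch.
        epoch, evr = evr.split(":", 1)
    release = None  # a missing release stays None
    if "-" in evr:
        # Everything before the first dash is the version, the rest is the release.
        version, release = evr.split("-", 1)
    else:
        version = evr
    return epoch, version, release


print(parse_evr("1:2.3.4-5.el7"))  # ('1', '2.3.4', '5.el7')
print(parse_evr("2.3.4"))          # ('0', '2.3.4', None)
```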

How Yum Compares EVR

Once the package string is parsed into its EVR components, yum calls rpmUtils.miscutils.compareEVR(), which does some data type massaging for the inputs, and then calls out to rpm.labelCompare() (found in rpm.git/python/header-py.c). labelCompare() sets each epoch to “0” if it was null/None, and then uses compare_values() to compare each EVR portion, which in turn falls through to a function called rpmvercmp() (see below). The algorithm for labelCompare() is as follows:

  1. Set each epoch value to 0 if it’s null/None.
  2. Compare the epoch values using compare_values(). If they’re not equal, return that result, else move on to the next portion (version). The logic within compare_values() is that if one is empty/null and the other is not, the non-empty one is greater, and that ends the comparison. If neither of them is empty/not present, compare them using rpmvercmp() and follow the same logic; if one is “greater” (newer) than the other, that’s the end result of the comparison. Otherwise, move on to the next component (version).
  3. Compare the versions using the same logic.
  4. Compare the releases using the same logic.
  5. If all of the components are “equal”, the packages are the same.
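
As a rough sketch of that flow in Python (not the C source): the names label_compare and compare_values mirror the functions described above, while numeric_cmp is a hypothetical stand-in for rpmvercmp() that only handles all-digit parts, so the example stays short.

```python
def compare_values(a, b, part_cmp):
    """Compare one EVR part; a missing part loses to a present one."""
    if a is None and b is None:
        return 0
    if a is not None and b is None:
        return 1
    if a is None and b is not None:
        return -1
    return part_cmp(a, b)


def label_compare(evr1, evr2, part_cmp):
    """Compare two (epoch, version, release) tuples; returns -1, 0 or 1."""
    e1, v1, r1 = evr1
    e2, v2, r2 = evr2
    # Step 1: a missing epoch counts as "0".
    e1 = "0" if e1 is None else e1
    e2 = "0" if e2 is None else e2
    # Steps 2-4: the first part that differs decides the result.
    for a, b in ((e1, e2), (v1, v2), (r1, r2)):
        result = compare_values(a, b, part_cmp)
        if result != 0:
            return result
    # Step 5: every part compared equal.
    return 0


def numeric_cmp(a, b):
    """Hypothetical stand-in for rpmvercmp(); only valid for all-digit parts."""
    return (int(a) > int(b)) - (int(a) < int(b))


print(label_compare((None, "2", "1"), ("0", "2", "3"), numeric_cmp))  # -1
print(label_compare(("1", "2", "1"), (None, "9", "9"), numeric_cmp))  # 1
print(label_compare(("0", "2", None), ("0", "2", "1"), numeric_cmp))  # -1
```

The second call shows why the epoch matters: a higher epoch wins even though the version and release are lower.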

The real magic, obviously, happens in rpmvercmp(), the rpm library function to compare two versions (or epochs, or releases). That’s also where the madness happens.

How RPM Compares Version Parts

RPM is written in C. Converting all of the buffer and pointer processing for these strings over to Ruby was quite a pain. That being said, I didn’t make this up, this is actually the algorithm that rpmvercmp() (lib/rpmvercmp.c) uses to compare version “parts” (epoch, version, release). This function returns 0 if the strings are equal, 1 if a (the first string argument) is newer than b (the second string argument), or -1 if a is older than b. Also keep in mind that this uses pointers in C, so it works by removing a sequence of 0 or more characters from the front of each string, comparing them, and then repeating for the remaining characters in each string until something is unequal, or a string reaches its end.

  1. If the strings are binary equal (a == b), they’re equal, return 0.
  2. Loop over the strings, left-to-right.
    1. Trim anything that’s not [A-Za-z0-9] or tilde (~) from the front of both strings.
    2. If both strings start with a tilde, discard it and move on to the next character.
    3. If string a starts with a tilde and string b does not, return -1 (string a is older); and the inverse if string b starts with a tilde and string a does not.
    4. End the loop if either string has reached zero length.
    5. If the first character of a is a digit, pop the leading chunk of continuous digits from each string (which may be an empty string for b if only a starts with digits). If a begins with a letter, do the same for leading letters.
    6. If the segment from b had 0 length, return 1 if the segment from a was numeric, or -1 if it was alphabetic. The logical result of this is that if a begins with numbers and b does not, a is newer (return 1). If a begins with letters and b does not, then a is older (return -1). If the leading character(s) from a and b were both numbers or both letters, continue on.
    7. If the leading segments were both numeric, discard any leading zeros and whichever one is longer wins. If a is longer than b (without leading zeros), return 1, and vice-versa. If they’re of the same length, continue on.
    8. Compare the leading segments with strcmp() (or <=> in Ruby). If that returns a non-zero value, then return that value. Else continue to the next iteration of the loop.
  3. If the loop ended (nothing has been returned yet, either both strings are totally the same or they’re the same up to the end of one of them, like with “1.2.3” and “1.2.3b”), then the longest wins – if what’s left of a is longer than what’s left of b, return 1. Vice-versa for if what’s left of b is longer than what’s left of a. And finally, if what’s left of them is the same length, return 0.
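
To see these rules in action, here is a condensed Python sketch that follows the numbered steps above, with a few comparisons traced through it. It is my reading of the list, not a port of lib/rpmvercmp.c, and the name rpmvercmp_sketch is made up for this example.

```python
import re


def rpmvercmp_sketch(a, b):
    """Return 1 if a is newer than b, -1 if older, 0 if equal."""
    if a == b:  # step 1
        return 0
    while True:
        # Step 2.1: trim leading separators (anything not alphanumeric or tilde).
        a = re.sub(r"^[^A-Za-z0-9~]+", "", a)
        b = re.sub(r"^[^A-Za-z0-9~]+", "", b)
        # Steps 2.2 and 2.3: a tilde sorts before everything else.
        if a.startswith("~") or b.startswith("~"):
            if not a.startswith("~"):
                return 1
            if not b.startswith("~"):
                return -1
            a, b = a[1:], b[1:]
            continue
        # Step 2.4: stop once either string is used up.
        if not a or not b:
            break
        # Step 2.5: pop a run of digits or a run of letters, decided by a.
        numeric = a[0].isdigit()
        pattern = r"[0-9]*" if numeric else r"[A-Za-z]*"
        seg_a = re.match(pattern, a).group()
        seg_b = re.match(pattern, b).group()
        a, b = a[len(seg_a):], b[len(seg_b):]
        # Step 2.6: the segment types differ.
        if not seg_b:
            return 1 if numeric else -1
        # Step 2.7: for numbers, drop leading zeros; more digits wins.
        if numeric:
            seg_a, seg_b = seg_a.lstrip("0"), seg_b.lstrip("0")
            if len(seg_a) != len(seg_b):
                return 1 if len(seg_a) > len(seg_b) else -1
        # Step 2.8: plain string comparison of the two segments.
        if seg_a != seg_b:
            return 1 if seg_a > seg_b else -1
    # Step 3: whichever string still has characters left is newer.
    if not a and not b:
        return 0
    return 1 if a else -1


print(rpmvercmp_sketch("1.10", "1.9"))     # 1  (10 has more digits than 9)
print(rpmvercmp_sketch("1.0~rc1", "1.0"))  # -1 (tilde sorts before the end of string)
print(rpmvercmp_sketch("1.2.3", "1.2.3b")) # -1 (b has characters left over)
print(rpmvercmp_sketch("1.0a", "1.0.1"))   # -1 (letters are older than digits)
print(rpmvercmp_sketch("010", "10"))       # 0  (leading zeros are ignored)
```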

I also found a GitHub repo with a pure Python implementation of this, which avoids loading the C library into Python. Here is the main code (again, just copying to make sure I have my own copy):

#
# Copyright (c) SAS Institute Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

from __future__ import print_function
from __future__ import unicode_literals
import re


class Vercmp(object):
    R_NONALNUMTILDE = re.compile(br"^([^a-zA-Z0-9~]*)(.*)$")
    R_NUM = re.compile(br"^([\d]+)(.*)$")
    R_ALPHA = re.compile(br"^([a-zA-Z]+)(.*)$")

    @classmethod
    def compare(cls, first, second):
        first = first.encode("ascii", "ignore")
        second = second.encode("ascii", "ignore")
        while first or second:
            m1 = cls.R_NONALNUMTILDE.match(first)
            m2 = cls.R_NONALNUMTILDE.match(second)
            m1_head, first = m1.group(1), m1.group(2)
            m2_head, second = m2.group(1), m2.group(2)
            if m1_head or m2_head:
                # Ignore junk at the beginning
                continue

            # handle the tilde separator, it sorts before everything else
            if first.startswith(b'~'):
                if not second.startswith(b'~'):
                    return -1
                first, second = first[1:], second[1:]
                continue
            if second.startswith(b'~'):
                return 1

            # If we ran to the end of either, we are finished with the loop
            if not first or not second:
                break

            # grab first completely alpha or completely numeric segment
            m1 = cls.R_NUM.match(first)
            if m1:
                m2 = cls.R_NUM.match(second)
                if not m2:
                    # numeric segments are always newer than alpha segments
                    return 1
                isnum = True
            else:
                m1 = cls.R_ALPHA.match(first)
                m2 = cls.R_ALPHA.match(second)
                isnum = False

            if not m1:
                # this cannot happen, as we previously tested to make sure that
                # the first string has a non-null segment
                return -1  # arbitrary
            if not m2:
                return 1 if isnum else -1

            m1_head, first = m1.group(1), m1.group(2)
            m2_head, second = m2.group(1), m2.group(2)

            if isnum:
                # throw away any leading zeros - it's a number, right?
                m1_head = m1_head.lstrip(b'0')
                m2_head = m2_head.lstrip(b'0')

                # whichever number has more digits wins
                m1hlen = len(m1_head)
                m2hlen = len(m2_head)
                if m1hlen < m2hlen:
                    return -1
                if m1hlen > m2hlen:
                    return 1

            # Same number of chars
            if m1_head < m2_head:
                return -1
            if m1_head > m2_head:
                return 1
            # Both segments equal
            continue

        m1len = len(first)
        m2len = len(second)
        if m1len == m2len == 0:
            return 0
        if m1len != 0:
            return 1
        return -1


def vercmp(first, second):
    return Vercmp.compare(first, second)

VirtualBox Addon Installation in CentOS

OK, so you have a minimal install of CentOS in VirtualBox and you want to install the Guest Addons to help your VM keep its clock correct and share folders with your host OS. What are the dependencies? How do you get this to install?

These directions work for CentOS 5, 6 and 7.

Install some packages needed to build the kernel module.

yum install epel-release
yum install dkms gcc make bzip2 perl
yum install kernel-devel-$(uname -r)

We need EPEL to get dkms.

We install dkms to ensure that the addons are rebuilt when we upgrade the kernel in the future.

We do the fancy bit with the uname to ensure we get the right version of kernel-devel for the currently running kernel. If you’ve updated the kernel since booting you should reboot before installing kernel-devel and the addons or else you might have some problems building them.

Mount the Guest Addons CD and run the installer.

mkdir /mnt/cdrom
mount /dev/sr0 /mnt/cdrom/
/mnt/cdrom/VBoxLinuxAdditions.run

Answer yes a few times, and you are done!

Virtualbox Guest Addons Manual

CentOS VirtualBox HowTo

What is a minimal install?

In my post Basics of Kickstart I talked about doing a minimal install. So what is included in the @core group for each major version of CentOS? I’ll document here for the current versions as it’s possible these values will change.

CentOS 5.11

  • Mandatory Packages
    • SysVinit
    • authconfig
    • basesystem
    • bash
    • centos-release
    • coreutils
    • cpio
    • dhclient
    • dhcpv6-client
    • e2fsprogs
    • ed
    • file
    • filesystem
    • glibc
    • hdparm
    • hmaccalc
    • initscripts
    • iproute
    • iputils
    • kbd
    • kudzu
    • libgcc
    • libhugetlbfs
    • libtermcap
    • mkinitrd
    • openssh-server
    • passwd
    • policycoreutils
    • prelink
    • procps
    • readline
    • redhat-logos
    • rootfiles
    • rpm
    • selinux-policy-targeted
    • setools
    • setserial
    • setup
    • shadow-utils
    • sysklogd
    • termcap
    • util-linux
    • vim-minimal
    • yum
  • Default Packages
    • Deployment_Guide-en-US
    • grub
    • sysfsutils
    • udftools

CentOS 6.8

  • Mandatory Packages
    • acl
    • attr
    • audit
    • authconfig
    • basesystem
    • bash
    • coreutils
    • cpio
    • cronie
    • dhclient
    • e2fsprogs
    • filesystem
    • glibc
    • initscripts
    • iproute
    • iptables
    • iptables-ipv6
    • iputils
    • kbd
    • ncurses
    • openssh-server
    • passwd
    • policycoreutils
    • procps
    • rootfiles
    • rpm
    • rsyslog
    • selinux-policy-targeted
    • setup
    • shadow-utils
    • sudo
    • system-config-firewall-base
    • util-linux-ng
    • vim-minimal
    • yum
  • Default Packages
    • aic94xx-firmware
    • atmel-firmware
    • b43-openfwwf
    • bfa-firmware
    • efibootmgr
    • grub
    • ipw2100-firmware
    • ipw2200-firmware
    • ivtv-firmware
    • iwl100-firmware
    • iwl1000-firmware
    • iwl3945-firmware
    • iwl4965-firmware
    • iwl5000-firmware
    • iwl5150-firmware
    • iwl6000-firmware
    • iwl6000g2a-firmware
    • iwl6050-firmware
    • kernel-firmware
    • kexec-tools
    • libertas-usb8388-firmware
    • netxen-firmware
    • postfix
    • ql2100-firmware
    • ql2200-firmware
    • ql23xx-firmware
    • ql2400-firmware
    • ql2500-firmware
    • rdma
    • rt61pci-firmware
    • rt73usb-firmware
    • xorg-x11-drv-ati-firmware
    • zd1211-firmware

CentOS 7.2.1511

  • Mandatory Packages
    • audit
    • basesystem
    • bash
    • biosdevname
    • btrfs-progs
    • coreutils
    • cronie
    • curl
    • dhclient
    • e2fsprogs
    • filesystem
    • firewalld
    • glibc
    • hostname
    • initscripts
    • iproute
    • iprutils
    • iptables
    • iputils
    • irqbalance
    • kbd
    • kexec-tools
    • less
    • man-db
    • ncurses
    • openssh-clients
    • openssh-server
    • parted
    • passwd
    • plymouth
    • policycoreutils
    • procps-ng
    • rootfiles
    • rpm
    • rsyslog
    • selinux-policy-targeted
    • setup
    • shadow-utils
    • sudo
    • systemd
    • tar
    • tuned
    • util-linux
    • vim-minimal
    • xfsprogs
    • yum
  • Default Packages
    • NetworkManager
    • NetworkManager-team
    • NetworkManager-tui
    • aic94xx-firmware
    • alsa-firmware
    • dracut-config-rescue
    • ivtv-firmware
    • iwl100-firmware
    • iwl1000-firmware
    • iwl105-firmware
    • iwl135-firmware
    • iwl2000-firmware
    • iwl2030-firmware
    • iwl3160-firmware
    • iwl3945-firmware
    • iwl4965-firmware
    • iwl5000-firmware
    • iwl5150-firmware
    • iwl6000-firmware
    • iwl6000g2a-firmware
    • iwl6000g2b-firmware
    • iwl6050-firmware
    • iwl7260-firmware
    • iwl7265-firmware
    • kernel-tools
    • libsysfs
    • linux-firmware
    • microcode_ctl
    • postfix
    • rdma

So if you use --nobase in your kickstart file, you will get everything in Mandatory and Default lists. You can choose to exclude some of the defaults with minus (-) to slim up your install though, or use --nodefaults to skip them entirely.

To find these lists yourself, just ask yum! yum groupinfo core

Basics of Kickstart

If you need to install CentOS over and over, one useful thing is to create a kickstart file. This is a text config file that directs the install program and can make an install entirely unattended.

You can find a full reference of all options on Red Hat’s documentation site: 23.3. Kickstart Syntax Reference.

Sources

So what do we need? And what are some nice addons?

First, we can say we want to install. This is optional, but encouraged. It is followed by a source for the packages. This can be local media like cdrom or harddrive, a network share like nfs, or my favorite, url.

The URL you specify here should be to the os folder on a mirror and have as a subfolder repodata. This URL will have all the packages needed to install CentOS. You can find a list of mirrors on the CentOS site, or just provide a mirrorlist URL instead. A mirrorlist URL will give YUM a place to fetch a list of mirrors to try and it will attempt to get the fastest one.

You can also specify additional repos for the installer to pull packages from as it sees fit or that you specify. I like to at least include the updates repo, so that we install the latest packages on the first try, and don’t have to do a yum update after the install. Here is our kickstart file so far.

# Do an install
install

# From this hard coded URL
# url --url=http://mirror.its.sfu.ca/mirror/CentOS/7/os/x86_64/
# Or better yet, from a mirrorlist with variables
url --mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=os
# Extra repos let us install the latest versions
repo --name="Updates" --mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=updates
# Optional if you want packages from EPEL
repo --name="epel" --mirrorlist=http://mirrors.fedoraproject.org/mirrorlist?repo=epel-$releasever&arch=$basearch
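
To make the variables in the mirrorlist URL concrete, here is a small Python sketch of what the expanded URL would look like. It only illustrates the substitution and is not how the installer does it; the values 7 and x86_64 are assumed for the example, and the expand helper is hypothetical.

```python
from string import Template

# The mirrorlist URL from the kickstart file above, variables left in place.
MIRRORLIST = "http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=os"


def expand(url, releasever, basearch):
    """Fill in $releasever and $basearch the way the URL is meant to be read."""
    return Template(url).substitute(releasever=releasever, basearch=basearch)


print(expand(MIRRORLIST, "7", "x86_64"))
# http://mirrorlist.centos.org/?release=7&arch=x86_64&repo=os
```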

Install Settings

We can select the kind of display we want from the installer: graphical, text, or cmdline. I choose cmdline as it provides the most helpful debug output. text mode gives you the classic ncurses display.

We can also set what the installer should do after finishing. I find reboot to be the most helpful, but you can also choose halt and poweroff.

Lastly, we’ll disable firstboot. That’s the “helper” you get on the first boot up asking you to make a user and such. Since we are trying to automate things, we don’t want to be bothered.

# Install mode
cmdline

# reboot when finished the install
reboot

# Disable firstboot
firstboot --disabled

System Config

Here we specify some settings for the system we are building. We’ll set the language, keyboard layout, timezone, and SELinux. We’ll also set the default password storage policy to the strongest available.

# System settings
lang en_US.UTF-8
keyboard us
authconfig --enableshadow --passalgo=sha512
selinux --disabled
timezone UTC

Network

Most places I use Linux there is Software Defined Networking (SDN) and it handles all the firewalling, so I just disable the firewall in the system. We also want to turn off IPv6 as it’s just extra junk we don’t need. And we’ll stick to DHCP here.

network --bootproto dhcp --noipv6
firewall --disabled

Root Password

This just sets the root password. You can grab the hash for an existing user from /etc/shadow or just use a plaintext password.

#rootpw --plaintext mycoolpassword
rootpw --iscrypted $6$BHils6Q1$hTRN8PUTpmQG6y7bkeSPqWrWxCV9uja9EMhsmf5qk4rDhdnKHznYiz5CxBmFqiaO14I7utwu7ToH6y7gMwFeq/
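
If you are not sure which algorithm an /etc/shadow hash uses, the fields between the dollar signs tell you. A minimal Python sketch, assuming the plain $id$salt$digest layout (hashes carrying a rounds= field are not handled here); the helper name is made up for this example.

```python
# Scheme identifiers used by crypt(3) style hashes.
SCHEMES = {"1": "MD5", "5": "SHA-256", "6": "SHA-512"}


def describe_crypt_hash(value):
    """Split a '$id$salt$digest' string into (algorithm, salt, digest)."""
    # The string starts with '$', so the first piece is empty.
    _, scheme_id, salt, digest = value.split("$", 3)
    return SCHEMES.get(scheme_id, "unknown"), salt, digest


algo, salt, _digest = describe_crypt_hash(
    "$6$BHils6Q1$hTRN8PUTpmQG6y7bkeSPqWrWxCV9uja9EMhsmf5qk4rDhdnKHznYiz5CxBmFqiaO14I7utwu7ToH6y7gMwFeq/"
)
print(algo, salt)  # SHA-512 BHils6Q1
```

The $6$ prefix lines up with the --passalgo=sha512 setting used earlier.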

Disk Space

Now we want to specify the disk layout. Do we want basic partitions or LVM? How big should stuff be? I usually go with a 1GB swap, 1GB /tmp and the rest as root disk. I add some safety options to /tmp to make sure evil things don’t try and exec from there.

# Set up the drive
bootloader --location=mbr
zerombr
clearpart --all --initlabel
part swap --asprimary --size=1024
part /tmp --fstype=ext4 --asprimary --size=1024 --fsoptions="defaults,nosuid,noexec"
part / --fstype=ext4 --grow --asprimary --size=100

Package Selection

I like to install the minimum possible and handle the rest with config management. The minimal install uses the @core group and not the @base group. @core includes a lot of packages by default that we probably don’t need: WiFi drivers, RAID card drivers, and junk like that. I’m usually building an image for VM use, so I can exclude most of that by putting a minus (-) in front of the name. You can also use an asterisk (*) as a wildcard to match a bunch of packages. There are a few packages from @base I do like to include though, like acpid.

%packages --nobase
acpid
-aic94xx-firmware
-alsa-firmware
-bfa-firmware
-ivtv-firmware
-iwl*-firmware
-rdma
%end

Post Script

After the install is complete, you can run some shell scripts before the reboot to help get your system just right. I make some tweaks to grub and re-install it. Then I import all the RPM keys, so that when I run yum it doesn’t ask about importing them the first time.

%post
# Reduce timeout for faster boot
sed -i 's/GRUB_TIMEOUT=5/GRUB_TIMEOUT=1/' /etc/default/grub
# Set consoles for proper logging and vnc
# Be noisy to help debugging
sed -i 's/GRUB_CMDLINE_LINUX="crashkernel=auto rhgb quiet"/GRUB_CMDLINE_LINUX="crashkernel=auto console=ttyS0,115200n8 console=tty0"/' /etc/default/grub
# Rebuild grub config
grub2-mkconfig -o /boot/grub2/grub.cfg
# Import all the keys
/bin/rpm --import /etc/pki/rpm-gpg/*
%end

And with all that, we are done with a basic kickstart. Be sure to read the docs and customize as you see fit!

yum clean all – will miss stuff

So I learnt that yum clean all won’t delete folders from /var/cache/yum for repos that aren’t currently enabled. This can bloat VM images you are building. So it’s best to just rm -rf /var/cache/yum to make sure you get everything after a yum clean all.

Here is the full set of yum cleaning I do in my image build scripts to make them as small as possible.

yum history new
yum clean all
rm -f /var/lib/rpm/__db*
rm -rf /var/cache/yum
rm -rf /var/lib/yum
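
To check how much a leftover cache would add to an image, a small Python sketch that totals the file sizes under a directory can help. The dir_size helper is hypothetical; run it before and after the cleanup to compare.

```python
import os


def dir_size(path):
    """Return the total size in bytes of regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Skip symlinks so link targets are not counted.
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


# A missing directory simply reports 0 bytes.
print(dir_size("/var/cache/yum"))
```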