Git Config

Here is my fancy Git setup to make things pretty and easy. First is the .gitconfig:

[user]
	name = Steven Barre
	email = steven@stevenbarre.com
[core]
	editor = vim
[merge]
	tool = vimdiff
[color]
	ui = auto
[alias]
        br = branch
        ci = commit
        co = checkout
        dc = diff --cached
        di = diff
        last = log -1 HEAD
        lr = log --pretty=format:"%C(yellow)%h\\ %ad%Cred%d\\ %Creset%s%Cblue\\ [%cn]" --decorate --date=relative --graph
        ls = log --pretty=format:"%C(yellow)%h\\ %ad%Cred%d\\ %Creset%s%Cblue\\ [%cn]" --decorate --date=short --graph
        la = log --pretty=format:"%C(yellow)%h\\ %ad%Cred%d\\ %Creset%s%Cblue\\ [Committed\\ by:\\ %cn]%Cgreen\\ [Authored\\ by:\\ %an]" --decorate --date=short --graph
        search = grep --break --heading --line-number -P
        st = status
        unstage = reset HEAD --
[push]
        default = current

This gives you a bunch of nice aliases, colorizes the output, and sets Vim as the default editor and vimdiff as the merge tool.

The default push action lets you just say “git push” on a new branch and have it create a matching branch on the remote. Note that push.default = current does not by itself set the new branch as the upstream; use “git push -u” for that. See the git-config docs for reference.

Next we’ll add some goodies to .bashrc to give us a pretty prompt.

First, we need to get the extra git shell functions loaded. We do this by linking the contrib file into /etc/profile.d:

# sudo ln -s /usr/share/git-core/contrib/completion/git-prompt.sh /etc/profile.d/

Then put this into your .bashrc

# Show colors for branch name and indicators
export GIT_PS1_SHOWCOLORHINTS=1
# Show unstaged (*) and staged (+)
export GIT_PS1_SHOWDIRTYSTATE=1
# Show if something is stashed ($)
export GIT_PS1_SHOWSTASHSTATE=1
# Show if there are untracked files (%)
export GIT_PS1_SHOWUNTRACKEDFILES=1

# User@Host:pwd $
# User@host:pwd (branch) $
export PROMPT_COMMAND='__git_ps1 "\[\033[31;1m\]\u\[\033[0m\]@\[\033[34;1m\]\h\[\033[0m\]:\[\033[33m\]\w\[\033[0m\]" " \$ "'

Here is what it looks like.

[Screenshot: git-terminal]

Why .bashrc? Because it’s loaded for all interactive bash shells, not just ones created when logging in. This is important if you are using screen or a GUI terminal.

I’ve created a repo on GitHub to hold all my dotfiles to keep track of things like this.

RPM Version Comparison

I was doing some research into how RPM compares versions, as it appeared to be more complex than simple semver comparisons. Turns out it is super wacky. One of the Puppet authors wrote a blog post that explains what’s going on much better than I could. I’m going to copy it here so I have my own copy in case the blog ever goes away.


Package Naming and Parsing

RPM package names are made up of five parts: the package name, epoch, version, release, and architecture. This format is commonly referred to by the acronym NEVRA. The epoch is not always included; it is assumed to be zero (0) on any packages that lack it explicitly. The format for the whole string is n-e:v-r.a. For my purposes, I was only really concerned with comparing the EVR portion; Puppet knows about package names and the bug herein was with what Puppet calls the “version” (EVR in yum/rpm parlance). Parsing is pretty simple:

  • If there is a : in the string, everything before it is the epoch. If not, the epoch is zero.
  • If there is a - in the remaining string, everything before the first - is the version, and everything after it is the release. If there isn’t one, the release is considered null/nil/None/whatever.
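
The two parsing rules above can be sketched in a few lines of Python. This is only an illustration of the description, not the code yum or Puppet actually runs; the function name parse_evr is made up for this example, and keeping the parts as strings (with None for a missing release) is my own assumption.

```python
def parse_evr(evr):
    """Split an EVR string such as '1:2.3-4.el7' into (epoch, version, release).

    Hypothetical helper that follows the two rules described above.
    """
    # Rule 1: everything before the first ':' is the epoch, else epoch is zero.
    if ":" in evr:
        epoch, rest = evr.split(":", 1)
    else:
        epoch, rest = "0", evr

    # Rule 2: everything before the first '-' is the version, the rest is
    # the release. Without a '-', the release is treated as missing (None).
    if "-" in rest:
        version, release = rest.split("-", 1)
    else:
        version, release = rest, None

    return epoch, version, release


print(parse_evr("1:2.3-4.el7"))  # ('1', '2.3', '4.el7')
print(parse_evr("2.3"))          # ('0', '2.3', None)
```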

How Yum Compares EVR

Once the package string is parsed into its EVR components, yum calls rpmUtils.miscutils.compareEVR(), which does some data type massaging for the inputs, and then calls out to rpm.labelCompare() (found in rpm.git/python/header-py.c). labelCompare() sets each epoch to “0” if it was null/None, and then uses compare_values() to compare each EVR portion, which in turn falls through to a function called rpmvercmp() (see below). The algorithm for labelCompare() is as follows:

  1. Set each epoch value to 0 if it’s null/None.
  2. Compare the epoch values using compare_values(). If they’re not equal, return that result, else move on to the next portion (version). The logic within compare_values() is that if one is empty/null and the other is not, the non-empty one is greater, and that ends the comparison. If neither of them is empty/not present, compare them using rpmvercmp() and follow the same logic; if one is “greater” (newer) than the other, that’s the end result of the comparison. Otherwise, move on to the next component (version).
  3. Compare the versions using the same logic.
  4. Compare the releases using the same logic.
  5. If all of the components are “equal”, the packages are the same.
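
As a rough sketch of those five steps in Python (not the actual C code from header-py.c): the names compare_values and label_compare mirror the functions mentioned above, but the signatures are my own. The comparator is passed in as a parameter, and numeric_cmp below is only a placeholder standing in for rpmvercmp() so the example runs on its own. Treating two missing values as equal is also an assumption on my part.

```python
def compare_values(a, b, vercmp):
    """Compare one EVR component; a missing value loses to a present one."""
    if not a and not b:
        return 0  # assumption: two missing values count as equal
    if a and not b:
        return 1
    if b and not a:
        return -1
    return vercmp(a, b)


def label_compare(evr1, evr2, vercmp):
    """Compare two (epoch, version, release) tuples; returns -1, 0 or 1."""
    e1, v1, r1 = evr1
    e2, v2, r2 = evr2
    # Step 1: a missing epoch is treated as "0".
    e1 = e1 or "0"
    e2 = e2 or "0"
    # Steps 2-4: epoch, then version, then release; first difference wins.
    for a, b in ((e1, e2), (v1, v2), (r1, r2)):
        result = compare_values(a, b, vercmp)
        if result != 0:
            return result
    # Step 5: every component compared equal.
    return 0


def numeric_cmp(a, b):
    """Placeholder comparator for purely numeric strings (NOT rpmvercmp)."""
    x, y = int(a), int(b)
    return (x > y) - (x < y)


print(label_compare((None, "2", "1"), ("1", "1", "9"), numeric_cmp))   # -1
print(label_compare(("0", "2", "1"), ("0", "2", None), numeric_cmp))   # 1
```

In the first call the epoch decides the result (0 versus 1), so the higher version on the left never gets looked at. In the second, epoch and version tie and the side that has a release wins.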

The real magic, obviously, happens in rpmvercmp(), the rpm library function to compare two versions (or epochs, or releases). That’s also where the madness happens.

How RPM Compares Version Parts

RPM is written in C. Converting all of the buffer and pointer processing for these strings over to Ruby was quite a pain. That being said, I didn’t make this up, this is actually the algorithm that rpmvercmp() (lib/rpmvercmp.c) uses to compare version “parts” (epoch, version, release). This function returns 0 if the strings are equal, 1 if a (the first string argument) is newer than b (the second string argument), or -1 if a is older than b. Also keep in mind that this uses pointers in C, so it works by removing a sequence of 0 or more characters from the front of each string, comparing them, and then repeating for the remaining characters in each string until something is unequal, or a string reaches its end.

  1. If the strings are binary equal (a == b), they’re equal, return 0.
  2. Loop over the strings, left-to-right.
    1. Trim anything that’s not [A-Za-z0-9] or tilde (~) from the front of both strings.
    2. If both strings start with a tilde, discard it and move on to the next character.
    3. If string a starts with a tilde and string b does not, return -1 (string a is older); and the inverse if string b starts with a tilde and string a does not.
    4. End the loop if either string has reached zero length.
    5. If the first character of a is a digit, pop the leading chunk of continuous digits from each string (which may be '' for b if only a starts with digits). If a begins with a letter, do the same for leading letters.
    6. If the segment from b had 0 length, return 1 if the segment from a was numeric, or -1 if it was alphabetic. The logical result of this is that if a begins with numbers and b does not, a is newer (return 1). If a begins with letters and b does not, then a is older (return -1). If the leading character(s) from a and b were both numbers or both letters, continue on.
    7. If the leading segments were both numeric, discard any leading zeros and whichever one is longer wins. If a is longer than b (without leading zeroes), return 1, and vice-versa. If they’re of the same length, continue on.
    8. Compare the leading segments with strcmp() (or <=> in Ruby). If that returns a non-zero value, then return that value. Else continue to the next iteration of the loop.
  3. If the loop ended (nothing has been returned yet, either both strings are totally the same or they’re the same up to the end of one of them, like with “1.2.3” and “1.2.3b”), then the longest wins – if what’s left of a is longer than what’s left of b, return 1. Vice-versa for if what’s left of b is longer than what’s left of a. And finally, if what’s left of them is the same length, return 0.

I also found a GitHub repo for a pure Python implementation of this, which avoids loading the C library into Python. Here is the main code (again, just copying to make sure I have my own copy):

#
# Copyright (c) SAS Institute Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

from __future__ import print_function
from __future__ import unicode_literals
import re


class Vercmp(object):
    R_NONALNUMTILDE = re.compile(br"^([^a-zA-Z0-9~]*)(.*)$")
    R_NUM = re.compile(br"^([\d]+)(.*)$")
    R_ALPHA = re.compile(br"^([a-zA-Z]+)(.*)$")

    @classmethod
    def compare(cls, first, second):
        first = first.encode("ascii", "ignore")
        second = second.encode("ascii", "ignore")
        while first or second:
            m1 = cls.R_NONALNUMTILDE.match(first)
            m2 = cls.R_NONALNUMTILDE.match(second)
            m1_head, first = m1.group(1), m1.group(2)
            m2_head, second = m2.group(1), m2.group(2)
            if m1_head or m2_head:
                # Ignore junk at the beginning
                continue

            # handle the tilde separator, it sorts before everything else
            if first.startswith(b'~'):
                if not second.startswith(b'~'):
                    return -1
                first, second = first[1:], second[1:]
                continue
            if second.startswith(b'~'):
                return 1

            # If we ran to the end of either, we are finished with the loop
            if not first or not second:
                break

            # grab first completely alpha or completely numeric segment
            m1 = cls.R_NUM.match(first)
            if m1:
                m2 = cls.R_NUM.match(second)
                if not m2:
                    # numeric segments are always newer than alpha segments
                    return 1
                isnum = True
            else:
                m1 = cls.R_ALPHA.match(first)
                m2 = cls.R_ALPHA.match(second)
                isnum = False

            if not m1:
                # this cannot happen, as we previously tested to make sure that
                # the first string has a non-null segment
                return -1  # arbitrary
            if not m2:
                return 1 if isnum else -1

            m1_head, first = m1.group(1), m1.group(2)
            m2_head, second = m2.group(1), m2.group(2)

            if isnum:
                # throw away any leading zeros - it's a number, right?
                m1_head = m1_head.lstrip(b'0')
                m2_head = m2_head.lstrip(b'0')

                # whichever number has more digits wins
                m1hlen = len(m1_head)
                m2hlen = len(m2_head)
                if m1hlen < m2hlen:
                    return -1
                if m1hlen > m2hlen:
                    return 1

            # Same number of chars
            if m1_head < m2_head:
                return -1
            if m1_head > m2_head:
                return 1
            # Both segments equal
            continue

        m1len = len(first)
        m2len = len(second)
        if m1len == m2len == 0:
            return 0
        if m1len != 0:
            return 1
        return -1


def vercmp(first, second):
    return Vercmp.compare(first, second)

UTF8 in PHP and MySQL

Iñtërnâtiônàlizætiøn

Can your code handle that? I found a lot of great info on the PHP WACT site for making your code and database work well with complex character sets. This is important even if your intended audience is North American and English-speaking only. MS Word uses smart quotes, which are non-ASCII characters and will cause havoc on your site if you don’t handle them as UTF8.

So what’s the tl;dr of the PHP WACT site?

  1. Set all MySQL tables and columns to utf8_general_ci (or another collation of your choice).
  2. Ensure your connection to the database is UTF8 with SET NAMES 'utf8';
  3. Send a header to declare your page is UTF8. This also ensures POST content is sent to you in UTF8. The browser will help convert for you. header('Content-Type: text/html; charset=utf-8');
  4. Use htmlspecialchars() for making user submitted or untrusted text safe to display in HTML or XML. It will do the bare minimum and nothing more. htmlspecialchars($utf8_string, ENT_COMPAT, 'UTF-8');

Max Woolf has a GitHub repo of Naughty Strings you can use in testing your code to make sure it supports everything.

Bonus reading: Emoji and MySQL use utf8mb4