First commit for GSoC

June 1, 2009 in en, google, gsoc, open source, projects, software

Recently I finished all of my duties as a student for this term, so I could spend the weekend catching up on GSoC (I am one week behind schedule). In the end it turned out to be a pretty productive weekend.

I’ll summarize the basic architecture without any images (I’ll probably create them later this week, once everything settles down). There are two core packages:

  • Matchbox
  • Tinderbox

Matchbox is the master server: it knows what still needs to be compiled and collects all the information. There is always exactly one Matchbox, but there can be several Tinderboxes. These machines connect to Matchbox and ask for the next package to emerge (compile). After emerging a package, a Tinderbox collects information about the files in the package, the USE flags, the emerge environment and the error logs from the compile phase. This information is then sent back to Matchbox, and the Tinderbox asks for another package to emerge. Repeat while true.
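The request/emerge/report cycle described above can be sketched without any networking. This is only an illustration written for Python 3: `FakeMatchbox`, `fake_emerge` and `tinderbox_loop` are hypothetical names, not code from the project, and the real Tinderbox would talk to Matchbox over a socket and actually run emerge.

```python
from collections import deque


class FakeMatchbox:
    """In-memory stand-in for the Matchbox server (hypothetical, no networking)."""

    def __init__(self, packages):
        self.queue = deque(packages)  # packages still waiting to be emerged
        self.results = []             # information sent back by Tinderboxes

    def get_next_package(self):
        return self.queue.popleft() if self.queue else None

    def add_package_info(self, info):
        self.results.append(info)


def fake_emerge(package):
    """Placeholder for running emerge; returns the kind of data the post lists."""
    return {
        "package": package,
        "files": [],
        "use_flags": [],
        "environment": {},
        "error_log": "",
    }


def tinderbox_loop(matchbox, emerge=fake_emerge):
    """Ask for a package, emerge it, report back, repeat until nothing is left."""
    done = 0
    while True:
        package = matchbox.get_next_package()
        if package is None:  # a real Tinderbox would presumably wait and retry
            break
        matchbox.add_package_info(emerge(package))
        done += 1
    return done


matchbox = FakeMatchbox(["app-editors/vim", "sys-apps/portage"])
print(tinderbox_loop(matchbox))  # number of packages processed
```

The sketch stops when the queue is empty only so that it terminates; the loop in the post runs forever.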

The first thing I did was create a basic data model for storing data about compiled packages: which USE flags were used, error logs and so on. A lot of things are not in the model yet, for example information about the Tinderboxes, but for now this will do. The UML diagram is in the following picture:

This model should allow efficient storage of data and a lot of flexibility to boot. There can be several versions of the same package (of course), and packages can also change category (this happens quite often). We can also collect different data sets based on USE flags.
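To make those three properties concrete, here is a rough Python 3 sketch. It is not the model from the UML diagram: the class and field names (`PackageVersion`, `BuildResult`, `ResultStore`) are invented for illustration, and the only point is that the category lives with each version and that results are keyed by the USE flag set.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PackageVersion:
    # The category is stored per version, so a package can move between
    # categories without losing its older records (an assumption of this sketch).
    category: str
    name: str
    version: str


@dataclass
class BuildResult:
    package: PackageVersion
    use_flags: frozenset
    files: list = field(default_factory=list)
    error_log: str = ""


class ResultStore:
    """Keeps one result per (package version, USE flag set) pair."""

    def __init__(self):
        self._results = {}

    def add(self, result):
        self._results[(result.package, result.use_flags)] = result

    def for_package(self, name):
        return [r for r in self._results.values() if r.package.name == name]


store = ResultStore()
old = PackageVersion("app-misc", "foo", "1.0")
moved = PackageVersion("app-text", "foo", "1.1")  # same package, new category
store.add(BuildResult(old, frozenset({"python"})))
store.add(BuildResult(old, frozenset({"python", "gtk"}), error_log="configure failed"))
store.add(BuildResult(moved, frozenset({"python"})))
print(len(store.for_package("foo")))
```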

With the basic data model in place it was time for some serious prototyping :-) Naturally I decided to split the implementation into two parts, one for each core module (more to come later). Matchbox is a simple listening server waiting for incoming connections. I wanted to simplify network communication for myself, so I used the Python module pickle. This module can serialize class instances and basic data types into a byte string (classes and functions themselves are pickled by name, so both sides need the same definitions). Because of this I was able to use objects as network messages. Objects representing the Matchbox command set:

class MatchboxCommand(object): pass

class GetNextPackage(MatchboxCommand):
    pass

class AddPackageInfo(MatchboxCommand):
    def __init__(self, package_info):
        self.package_info = package_info

On the other side, Tinderbox understands these replies (for now):

class MatchboxReply(object): pass

class GetNextPackageReply(MatchboxReply):
    def __init__(self, package_name, version, use_flags):
        self.package_name = package_name
        self.version = version
        self.use_flags = use_flags
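As a quick check that these classes survive the trip through pickle, here is a small round trip written for Python 3. The class definitions are repeated so the snippet runs on its own; the package name, version and USE flags are made-up sample values.

```python
import pickle


class MatchboxCommand(object): pass

class GetNextPackage(MatchboxCommand):
    pass

class MatchboxReply(object): pass

class GetNextPackageReply(MatchboxReply):
    def __init__(self, package_name, version, use_flags):
        self.package_name = package_name
        self.version = version
        self.use_flags = use_flags


# Serialize a reply to bytes, as it would be before going over the socket.
reply = GetNextPackageReply("app-editors/vim", "7.2", ["python", "-gtk"])
wire = pickle.dumps(reply)

# The receiving side gets an equivalent object back, attributes included.
decoded = pickle.loads(wire)
print(type(decoded).__name__, decoded.package_name, decoded.version, decoded.use_flags)
```

Note that pickle records only the class name and the instance attributes, so Matchbox and Tinderbox both have to import the same class definitions for this to work.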

Communication (simplified) goes something like this:
Tinderbox
msg = GetNextPackage()
msg_pickled = pickle.dumps(msg)
sock.sendall(msg_pickled)

Matchbox
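data = sock.recv(4096)  # simplified: assumes the whole pickle arrives in one read
command = pickle.loads(data)
if isinstance(command, GetNextPackage):
    # assumed here to return a (package_name, version, use_flags) tuple
    package_name, version, use_flags = get_next_package_to_emerge()
    msg = GetNextPackageReply(package_name, version, use_flags)
    msg_pickled = pickle.dumps(msg)
    sock.sendall(msg_pickled)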
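The simplified exchange above reads from the socket once, but TCP does not preserve message boundaries, so a larger message (an error log, say) may arrive in pieces or glued to the next one. One common way around this, sketched here for Python 3 and not taken from the project, is to prefix every pickled message with its length. `send_msg`, `recv_exact` and `recv_msg` are hypothetical helper names, and the tuples stand in for real message objects.

```python
import pickle
import socket
import struct

HEADER = struct.Struct("!I")  # 4-byte big-endian payload length


def send_msg(sock, obj):
    """Pickle obj and send it preceded by its length."""
    payload = pickle.dumps(obj)
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_exact(sock, n):
    """Keep reading until exactly n bytes have arrived."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def recv_msg(sock):
    """Read one length-prefixed message and unpickle it."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return pickle.loads(recv_exact(sock, length))


# Two messages sent back to back still come out as two separate objects.
a, b = socket.socketpair()
send_msg(a, ("app-editors/vim", "7.2", ["python"]))
send_msg(a, ("sys-apps/portage", "2.1.6", []))
first = recv_msg(b)
second = recv_msg(b)
a.close()
b.close()
print(first, second)
```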

There is one BIG caveat to this kind of communication: it is very easy to tamper with. This is straight from the pickle documentation:

Warning: The pickle module is not intended to be secure against erroneous or maliciously constructed data. Never unpickle data received from an untrusted or unauthenticated source.
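If pickle stays, one possible mitigation for the "unauthenticated source" part of that warning is to sign each message with a shared secret and refuse to unpickle anything whose signature does not match. The following Python 3 sketch uses the standard hmac module; the key, `sign` and `verify` are hypothetical and not part of the project.

```python
import hashlib
import hmac
import pickle

SHARED_KEY = b"replace-with-a-real-secret"  # hypothetical; distributed out of band
DIGEST_SIZE = hashlib.sha256().digest_size


def sign(payload, key=SHARED_KEY):
    """Prepend an HMAC-SHA256 tag to the pickled payload."""
    return hmac.new(key, payload, hashlib.sha256).digest() + payload


def verify(blob, key=SHARED_KEY):
    """Return the payload only if its tag matches; otherwise raise."""
    tag, payload = blob[:DIGEST_SIZE], blob[DIGEST_SIZE:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("bad signature, refusing to unpickle")
    return payload


# A correctly signed message is accepted and unpickled.
blob = sign(pickle.dumps({"package_name": "app-editors/vim"}))
message = pickle.loads(verify(blob))

# Flipping a single bit in transit makes verification fail.
tampered = blob[:-1] + bytes([blob[-1] ^ 1])
try:
    verify(tampered)
    rejected = False
except ValueError:
    rejected = True
print(message, rejected)
```

This only proves that a message came from someone holding the key. A compromised Tinderbox could still send a malicious pickle, so switching to a data-only format would be the stricter option.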

We will have to decide whether to reimplement this part or trust the Gentoo infrastructure. So what do we have for now?
  • Basic communication between Matchbox/Tinderbox
  • Compiling works, with file list/emerge environment/stdout/stderr/etc. being sent back to Matchbox

There is still much more ahead of us:

  • package selection on Matchbox side
  • block resolution on Tinderbox
  • the rest of the services (web interface, client, etc.)

Since GSoC students haven’t got git repositories on Gentoo servers just yet, you can see the code in gentoo-collagen@github. So long and thanks for all the fish (for now).