jctld -- A Job Control Daemon

Latest is jctld 0.9.6

About jctld

jctld is a job/process control system for clusters of machines. It arose from the need for a reasonably capable job control system that has the sorts of features we need/want on our cluster (the TUNA pi-cluster) and that is free software. The desired features:

The whole system consists mainly of two programs -- 'jctld' and 'jcli'. Machines are divided into three main categories:

Configuration is handled using a couple of XML files, and is designed so that clients and servers (and, if needed, users) can share the same config file. Public/private key pairs are used to identify clients and users to servers, and vice versa; both programs take a '--genkeypair' argument that can be used to generate keys. In a real-world installation (which is what jctld is aiming for), each user creates their own key pair, which an administrator adds to the server config.

Access and control are handled through sets of privileges, which can be assigned to single users or groups of users. The privileges specify what the associated user or group is allowed to do, or not allowed to do -- 'deny's are searched before 'allow's. The way it works should be apparent in the sample configuration shown below.

Configuring jctld

The general structure of a configuration file is as follows:

    <?xml version="1.0" encoding="iso-8859-1"?>
    <jctld>
        <keys path="/path/to/keys/"/>
        <!-- user definitions -->
        <!-- group definitions -->
        <!-- privilege definitions -->
        <!-- server definitions -->
        <!-- host/client definitions -->
    </jctld>

Because the configuration can be for a client, the top-level element may be <jcli> rather than <jctld>. The <keys> entry specifies where keys may be found. This path is prepended to each keyfile name unless the name starts with a slash or a tilde (a tilde is interpreted as the invoking user's home directory).
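
The keyfile lookup rule can be sketched in a few lines of Python. This is only an illustration of the rule as described here, not jctld's actual code; the function name 'resolve_keyfile' is made up for the example, and joining with a path separator (rather than plain string concatenation) is an assumption.

```python
import os

def resolve_keyfile(keys_path, keyfile):
    """Return the path to use for `keyfile`, given the <keys path="..."> value.

    Hypothetical helper: illustrates the lookup rule only.
    """
    if keyfile.startswith("~"):
        # Tilde: relative to the invoking user's home directory.
        return os.path.expanduser(keyfile)
    if keyfile.startswith("/"):
        # Leading slash: the name is already absolute, use it unchanged.
        return keyfile
    # Anything else is looked up under the configured key directory
    # (assumption: joined as a path, not concatenated as a raw string).
    return os.path.join(keys_path, keyfile)
```

With the <keys> entry shown earlier, a 'pubkey' value of 'frmb.pub' would resolve to '/path/to/keys/frmb.pub'.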

The 'users' section defines the known users. For an end-user (using 'jcli'), this might only contain a single entry for themselves. For example:

    <!-- user definitions for the server -->
        <user name="frmb" realname="Fred" pubkey="frmb.pub" priv="superuser" />
        <user name="phw" realname="Peter" pubkey="phw.pub" />
        <user name="pssc" realname="Phill" pubkey="pssc.pub" priv="superuser" />
        <user name="ats" realname="Adam" pubkey="ats.pub" />
        <user name="mig" realname="minimum intrusion grid" pubkey="mig.pub" unixuser="miguser" jobshell="/bin/bash -c" />

    <!-- user definitions for the user -->
        <user name="frmb" realname="Fred" privkey="frmb.key" />

As far as controlling jobs is concerned, at least some users need to be real users (i.e. the 'name' attribute matches the system username, or the 'unixuser' attribute does). Other users may be entirely artificial (e.g. scripts and other automated things). The "mig" user above additionally has a 'jobshell' attribute, indicating that bash should be used to interpret commands (normally this might be some useful local wrapper script).

The 'groups' section defines logical groups of users -- unrelated to user-groups on unix systems. The main use of groups is for assigning privileges to whole sets of users, and they are only relevant to jctld (not jcli). For example:

        <group name="tunausers" description="tuna-project users" privs="tunauser">
            <user name="frmb" />
            <user name="phw" />
            <user name="ats" />
        </group>

This gives all 'tunausers' (the three listed) the privileges specified by 'tunauser'. Like groups, privileges are only relevant for jctld. The definitions above refer to two privileges, 'superuser' and 'tunauser'. For example:

        <priv name="superuser">
            <grant operations="*">
                <!-- allow superusers to do anything to anyone in 'allusers' -->
                <group name="allusers" />
            </grant>
        </priv>
        <priv name="tunauser">
            <grant operations="suspend,resume,list">
                <!-- allow tuna-users to suspend, resume and list each other's jobs -->
                <group name="tunausers" />
            </grant>
        </priv>
        <priv name="monitor">
            <grant operations="list">
                <!-- allow monitors to list jobs -->
                <group name="allusers" />
            </grant>
            <restrict operations="start,kill,suspend,resume">
                <!-- do not allow monitors to start, kill, suspend or resume jobs -->
                <group name="allusers" />
            </restrict>
        </priv>
        <priv name="@self">
            <grant operations="start,kill,suspend,resume,list" />
        </priv>

The penultimate set of privileges defined, 'monitor', restricts any user with that privilege to listing jobs only (assuming they are not in 'allusers'). A special privilege group '@self' defines to what extent users can control their own jobs -- restrictions will always override this, however.
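
The "restrictions before grants" ordering can be sketched in Python as follows. This is a simplified model, not jctld's implementation: the 'Rule' and 'Priv' classes and the 'is_allowed' function are invented for the example, '@self' handling is left out, and refusing an operation when no rule matches is an assumption.

```python
from dataclasses import dataclass, field

@dataclass
class Rule:
    operations: set  # operation names, or {"*"} for everything
    groups: set      # groups whose members' jobs the rule applies to

@dataclass
class Priv:
    grants: list = field(default_factory=list)
    restricts: list = field(default_factory=list)

def rule_matches(rule, operation, owner_groups):
    """True if the rule covers `operation` on a job whose owner is in `owner_groups`."""
    op_ok = "*" in rule.operations or operation in rule.operations
    return op_ok and bool(rule.groups & owner_groups)

def is_allowed(privs, operation, owner_groups):
    """privs: privileges held by the acting user; owner_groups: groups of the job's owner."""
    # Restrictions are searched first, so a matching <restrict> always wins.
    for priv in privs:
        if any(rule_matches(r, operation, owner_groups) for r in priv.restricts):
            return False
    for priv in privs:
        if any(rule_matches(r, operation, owner_groups) for r in priv.grants):
            return True
    return False  # assumption: no matching grant means the operation is refused

# The 'tunauser' and 'monitor' privileges from the sample configuration.
tunauser = Priv(grants=[Rule({"suspend", "resume", "list"}, {"tunausers"})])
monitor = Priv(
    grants=[Rule({"list"}, {"allusers"})],
    restricts=[Rule({"start", "kill", "suspend", "resume"}, {"allusers"})],
)
```

Under this model, a user holding both 'tunauser' and 'monitor' could list jobs but not suspend them, because the restriction in 'monitor' is found before the grant in 'tunauser'.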

The 'servers' section lists jctld servers, to which users and clients/hosts connect. When initialising, 'jctld' will search this section for an entry matching its own hostname (overridden with '-N, --name'). Server definitions may appear in all configurations. For example:

    <!-- server definitions for the server -->
        <server name="tadpole" privkey="tadpole.key">
            <inet hostname="*" port="5020" />
            <unix path="/tmp/.jctldsocket" />
        </server>

    <!-- server definitions for the user -->
        <server name="tadpole" pubkey="tadpole.pub">
            <inet hostname="tadpole" port="5020" />
        </server>

'inet' servers may also control what is allowed to connect, for example:

    <inet hostname="tadpole" port="5020">
        <allow hostname="tadpole" />
        <deny hostname="" />
        <allow network="" netmask="" />
    </inet>

When jctld accepts a connection, it will search through the list of allows/denys in the given order -- the above, for instance, will allow "" but deny "". The default behaviour is to allow; to change this insert a match-all 'deny' rule at the end:

    <deny network="" netmask="" />
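
The first-match-wins evaluation with a default of "allow" can be sketched in Python as below. This is not jctld's code: the 'connection_allowed' function and the rule representation are invented for the example, the addresses are placeholders from the documentation ranges, and comparing 'hostname' rules against an already-resolved peer name is an assumption.

```python
import ipaddress

def in_network(ip, network, netmask):
    """True if IPv4 address `ip` lies in `network` under `netmask`."""
    mask = int(ipaddress.IPv4Address(netmask))
    return (int(ipaddress.IPv4Address(ip)) & mask) == (int(ipaddress.IPv4Address(network)) & mask)

def connection_allowed(rules, peer_ip, peer_hostname=None):
    """rules: list of (action, attributes) pairs, in configuration order."""
    for action, attrs in rules:
        if "hostname" in attrs:
            # Assumption: the peer's name has already been resolved by the caller.
            matched = peer_hostname == attrs["hostname"]
        else:
            matched = in_network(peer_ip, attrs["network"], attrs["netmask"])
        if matched:
            return action == "allow"  # the first matching rule decides
    return True  # no rule matched: the default behaviour is to allow

# Hypothetical rule list: one host allowed by name, half of a network denied,
# the rest of that network allowed, then a match-all deny at the end.
rules = [
    ("allow", {"hostname": "tadpole"}),
    ("deny", {"network": "192.0.2.128", "netmask": "255.255.255.128"}),
    ("allow", {"network": "192.0.2.0", "netmask": "255.255.255.0"}),
    ("deny", {"network": "0.0.0.0", "netmask": "0.0.0.0"}),
]
```

Because the rules are checked in order, a narrow 'deny' has to appear before the broader 'allow' that would otherwise match the same address.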

The 'hosts' section lists jctld clients/hosts -- the machines that actually run jobs. As with 'servers', jctld will search this section for a matching entry (by hostname). Host definitions are only relevant for jctld -- jcli does not use them. For example:

        <host name="tadpole" privkey="tadpole-cli.key" pubkey="tadpole-cli.pub" />

Controlling jobs

Once jctld is up and running (on at least one machine), the 'jcli' command can be used to control jobs. If only one server is found in a configuration file, jcli will attempt to connect to it. If multiple servers are found, one must be selected on the command-line with '-s, --server'. If no configuration is used, the server options (including public key) can be specified on the command-line; user information must come from a configuration file, however (currently). For example:

   bash$ jcli -s tadpole:5020

If started without any additional command-line arguments, jcli goes into interactive mode and displays the prompt as shown (once connected to the server). Otherwise it attempts to execute the command given, with any optional arguments. For example:

   bash$ jcli version
   server pieship (jctld) version 0.9.1, user frmb

Many of the commands need a way of specifying jobs, both single jobs and groups of jobs. To this end, jctld supports fairly rudimentary job specifications, 'jobspec's. These are built up in the following way:

     [ user | group ] [ $name ] [ @host ] [ :pid ]

For example:

    frmb                all jobs belonging to user 'frmb'
    @tadpole            all jobs on host 'tadpole'
    tunausers@pi02      all jobs belonging to users in the group 'tunausers' on host 'pi02'
    @tadpole:25963      job with PID 25963 on host 'tadpole'
    frmb$vi             all jobs called 'vi' belonging to user 'frmb'
    $xterm              all jobs called 'xterm'
    $bash@pi25          all jobs called 'bash' on host 'pi25'
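
As a rough illustration of the jobspec syntax, the following Python sketch splits a jobspec into its four optional parts. It is not taken from jctld: 'parse_jobspec' is a made-up name, and the exact set of characters permitted in each part is an assumption. Whether the leading part names a user or a group cannot be decided from the syntax alone, so it is returned under the single key 'who'.

```python
import re

# [ user | group ] [ $name ] [ @host ] [ :pid ]
_JOBSPEC = re.compile(
    r"(?P<who>[^$@:]+)?"        # user or group name
    r"(?:\$(?P<name>[^@:]+))?"  # $name: job (command) name
    r"(?:@(?P<host>[^:]+))?"    # @host: host the job runs on
    r"(?::(?P<pid>\d+))?"       # :pid: process ID
)

def parse_jobspec(spec):
    """Split a jobspec string into a dict with keys who, name, host and pid."""
    match = _JOBSPEC.fullmatch(spec)
    if match is None:
        raise ValueError(f"invalid jobspec: {spec!r}")
    parts = match.groupdict()
    if parts["pid"] is not None:
        parts["pid"] = int(parts["pid"])
    return parts
```

For example, parse_jobspec("@tadpole:25963") gives a host of 'tadpole' and a pid of 25963, with 'who' and 'name' left as None.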

The supported commands are as follows:

Command                               Description                                             Privileges required
help [command]                        display command summary, or more information on a
                                      particular command
version                               request version from server
clist                                 list all active hosts (job-capable machines)
ulist                                 list users with brief job information                   'list'
jlist [jobspec]                       list jobs, all jobs if no specification given           'list'
jsummary [jobspec]                    job summary, of all jobs if no specification given      'list'
usummary [jobspec]                    job summary by user, of all jobs if no specification    'list'
                                      given
jdetail [jobspec]                     detailed job summary, of all jobs if no specification   'list'
                                      given
suspend <jobspec>                     suspend job(s), sends SIGSTOP to them                   'suspend'
resume <jobspec>                      resume job(s), sends SIGCONT to them                    'resume'
renice <jobspec> <nicespec>           renice a job (alter its scheduling priority), range     'renice', 'hinice' to set
                                      within that supported by the system                     higher-than-normal priority
kill <jobspec> [signal]               kill job(s), sends SIGTERM if no 'signal' given         'kill'
start <jobspec> <command> [args ...]  start job, 'command' and 'args' will be passed to the   'start'
                                      user's configured 'jobshell' if set, otherwise
                                      executed outright; 'jobspec' is used to specify a user
                                      and/or host
refresh                               refresh jobs                                            'reload'
reload                                ask server/hosts to reload configuration and clear      'reload'
                                      caches

The following commands are not yet supported:

Command                               Description                                             Privileges required
lock <jobspec> <lock-release>         lock the state of a process to prevent others           'lock'
                                      changing, 'lock-release' specifies who (user/group)
                                      can release the lock (in addition to lock setter)
unlock <jobspec>                      unlock the state of a process                           'lock', or specified in
                                                                                              lock's 'lock-release'
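
The signals used by 'suspend', 'resume' and 'kill' can be summarised in a short Python sketch. This only illustrates the mapping given in the table, not how a jctld host actually delivers signals; 'signal_for' is a made-up name, and the accepted spellings of the optional 'signal' argument (a number, or a name with or without the 'SIG' prefix) are an assumption.

```python
import signal

# Signals sent by each command when no explicit signal is given.
_DEFAULT_SIGNALS = {
    "suspend": signal.SIGSTOP,
    "resume": signal.SIGCONT,
    "kill": signal.SIGTERM,
}

def signal_for(command, sig=None):
    """Return the signal to send for `command`; `sig` is kill's optional argument."""
    if command == "kill" and sig is not None:
        if sig.isdigit():
            return signal.Signals(int(sig))  # numeric form, e.g. "9"
        name = sig.upper()
        if not name.startswith("SIG"):
            name = "SIG" + name              # accept "HUP" as well as "SIGHUP"
        return signal.Signals[name]
    return _DEFAULT_SIGNALS[command]
```

A host process could then deliver the result with os.kill(pid, signal_for("suspend")) for each PID selected by the jobspec.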

Interactive mode uses GNU's readline library, so command-line editing should be sane. The output generated by commands is particular to the type of client -- only a stdio client is supported currently (later ones may include fluffy graphical applications for monitoring, etc.). A RAPP client is currently under construction. For example:

    pieship> clist
    host-name     njobs  idle    load      memory       swap      last-contact
    pi02          213    48.79   0.97     873/1009      57/854    18    
    pi03          65     0.17    2.45     997/1009      25/854    18    
    pi04          14     0.20    2.41     963/1009      12/854    18    
    pi05          13     49.57   1.00     872/1009      12/854    18    
    pi06          71     0.15    2.13     947/1009      29/854    18    
    pi07          17     0.23    2.34     938/1009      11/854    17    
    pi08          14     49.82   1.00     853/1009      17/854    17    
    pi09          15     0.22    2.40     976/1009      11/854    17    
    pi10          12     49.68   0.99     978/1009       0/854    17    
    pi11          11     49.75   1.00     821/1009       0/854    17    
    pi13          14     49.10   1.72     998/1009       0/854    14    
    pi14          15     0.18    2.33     998/1009       0/854    13    
    pi15          18     0.42    2.08     975/1009       0/854    13    
    pi16          15     0.23    2.24     990/1009       0/854    13    
    pi17          3      49.87   0.99     774/1009       0/854    13    
    pi18          19     49.74   1.03     795/1009       0/854    12    
    pi19          15     0.27    2.61     967/1009       0/854    12    


jctld protocol

The protocol used over TCP or UNIX sockets deals in whole-line chunks. For UDP, the equivalent lines are sent as UDP packets (with some exceptions).
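
On a stream socket (TCP or UNIX), dealing in whole lines means the receiver has to buffer partial reads until a line is complete. A minimal Python sketch of that buffering follows; the 'LineBuffer' class is invented for the example, and the use of a newline as the line terminator (with an optional carriage return stripped) is an assumption, since the line format is not described here.

```python
class LineBuffer:
    """Accumulates bytes from a stream socket and hands back whole lines."""

    def __init__(self):
        self._pending = b""

    def feed(self, data):
        """Add received bytes; return the complete lines now available."""
        self._pending += data
        # Everything before the last newline is complete; the remainder
        # (possibly empty) is kept until more data arrives.
        *lines, self._pending = self._pending.split(b"\n")
        return [line.rstrip(b"\r") for line in lines]
```

Each chunk returned by recv() would be passed to feed(), and only the lines it returns would be handed to the protocol handler.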


Download jctld

Several additional libraries are needed to compile jctld. These are: libstatgrab, libreadline, libexpat, and libgcrypt.

Last modified: Tue Aug 29 16:29:28 2006 by Fred Barnes.