jctld -- A Job Control Daemon
Latest is jctld 0.9.6
jctld is a job/process control system for clusters of machines. It arose from the need for a semi-capable job control system with
the sorts of features that we need/want on our cluster (the TUNA pi-cluster),
and that is free software. The desired features:
- TCP, UDP or UNIX-socket operation
- Fairly lightweight in terms of CPU and memory
- See what jobs are running on machines
- Be able to start/kill/suspend/resume/renice jobs on machines
- Basic node monitoring (e.g. uptime/load)
- Security (trust by private-public key pairs)
- Anything else that we might want to add later on..
The whole system consists mainly of two programs -- 'jctld' and 'jcli'. Machines are divided broadly into three categories:
- Clients: the nodes in a cluster that run jobs and other things, including 'jctld'
- Servers: machines that monitor and control clients, using 'jctld'
- Users: machines from which a user controls jobs and other things, using 'jcli'
Configuration is handled using a couple of XML files, and geared such that clients and servers (and if needed, users) can share the same config file.
Public/private key pairs are used to identify clients and users to servers, and vice versa; both programs take a '--genkeypair' argument that
can be used to generate keys. In a real-world installation (as it's aiming for), a user will create themselves a key-pair, which an administrator adds
to the server config.
Access and control are handled through sets of privileges, which can be assigned to single users or groups of users. The privileges specify what the
associated user or group is allowed to do, or not allowed to do -- 'deny's are searched before 'allow's. The way it works should be apparent in the
sample configuration shown below.
The general structure of a configuration file is as follows:
<?xml version="1.0" encoding="iso-8859-1"?>
<jctld>
<keys path="/path/to/keys/"/>
<!-- -->
<!-- -->
<!-- -->
<!-- -->
<!-- -->
</jctld>
Because the configuration can be for 'jcli' rather than 'jctld', the top-level element may be <jcli> rather than <jctld>. The <keys>
entry specifies where keys may be found; its path is prepended to key file names unless the name starts with a slash or a tilde (a tilde is interpreted as the invoking user's home directory).
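The key-path rule above can be sketched in a few lines of Python. This only illustrates the rule as described and is not jctld's own code; the function name `resolve_keyfile` and the absolute path in the second call are made up for the example.

```python
import os.path

def resolve_keyfile(keys_path, keyfile):
    """Return the path a key file name would resolve to under the rule above."""
    if keyfile.startswith("/"):
        return keyfile                       # absolute path: used as-is
    if keyfile.startswith("~"):
        return os.path.expanduser(keyfile)   # tilde: invoking user's home directory
    return os.path.join(keys_path, keyfile)  # otherwise: prefix with the <keys> path

# "frmb.pub" comes from the sample configuration; the second path is hypothetical.
print(resolve_keyfile("/path/to/keys/", "frmb.pub"))
print(resolve_keyfile("/path/to/keys/", "/etc/jctld/tadpole.key"))
```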
The 'users' section defines the known users. For an end-user (using 'jcli'), this might only contain a single entry for themselves. For example:
<!-- -->
<users>
<user name="frmb" realname="Fred" pubkey="frmb.pub" priv="superuser" />
<user name="phw" realname="Peter" pubkey="phw.pub" />
<user name="pssc" realname="Phill" pubkey="pssc.pub" priv="superuser" />
<user name="ats" realname="Adam" pubkey="ats.pub" />
<user name="mig" realname="minimum intrusion grid" pubkey="mig.pub" unixuser="miguser" jobshell="/bin/bash -c" />
</users>
<!-- -->
<users>
<user name="frmb" realname="Fred" privkey="frmb.key" />
</users>
As far as controlling jobs is concerned, at least some users need to be real users (i.e. the 'name' attribute matches the system username, or the 'unixuser' attribute does).
Other users may be entirely artificial (e.g. scripts and other automated things). The "mig" user above additionally has a 'jobshell' attribute, indicating that bash should be
used to interpret commands (normally this might be some useful local wrapper script).
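To make the 'unixuser' and 'jobshell' attributes more concrete, here is a rough Python sketch that reads a 'users' section and works out which system account and command line a job might end up with. It rests on two assumptions that are not confirmed above: that 'unixuser' simply overrides 'name' when present, and that the command and its arguments are joined into one string and appended as the final argument of the job shell. The function names are hypothetical.

```python
import shlex
import xml.etree.ElementTree as ET

USERS_XML = """
<users>
  <user name="frmb" realname="Fred" pubkey="frmb.pub" priv="superuser" />
  <user name="mig" realname="minimum intrusion grid" pubkey="mig.pub"
        unixuser="miguser" jobshell="/bin/bash -c" />
</users>
"""

def load_users(xml_text):
    """Map each user name to its effective system account and job shell."""
    users = {}
    for elem in ET.fromstring(xml_text).iter("user"):
        users[elem.get("name")] = {
            # Assumption: 'unixuser' overrides 'name' when it is present.
            "unixuser": elem.get("unixuser", elem.get("name")),
            "jobshell": elem.get("jobshell"),
        }
    return users

def build_argv(user, command, args):
    """Build the argument vector for a job started on behalf of 'user'."""
    if user["jobshell"]:
        # Assumption: the whole command line is passed as one shell argument.
        return shlex.split(user["jobshell"]) + [shlex.join([command, *args])]
    return [command, *args]  # no job shell: execute the command directly

users = load_users(USERS_XML)
print(users["mig"]["unixuser"], build_argv(users["mig"], "make", ["-j4"]))
print(users["frmb"]["unixuser"], build_argv(users["frmb"], "make", ["-j4"]))
```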
The 'groups' section defines logical groups of users -- unrelated to user-groups on unix systems. The main use of groups is to assign privileges to whole sets of users;
they are only relevant to jctld (not jcli). For example:
<groups>
<group name="tunausers" description="tuna-project users" privs="tunauser">
<user name="frmb" />
<user name="phw" />
<user name="ats" />
</group>
</groups>
This gives all 'tunausers' (the three listed) the privileges specified by 'tunauser'. Like groups, privileges are only relevant for jctld. The examples above refer to two privileges, 'superuser' and 'tunauser';
these might be defined as follows:
<privileges>
<priv name="superuser">
<grant operations="*">
<!-- -->
<group name="allusers" />
</grant>
</priv>
<priv name="tunauser">
<grant operations="suspend,resume,list">
<!-- -->
<group name="tunausers" />
</grant>
</priv>
<priv name="monitor">
<grant operations="list">
<!-- -->
<group name="allusers" />
</grant>
<restrict operations="start,kill,suspend,resume">
<!-- -->
<group name="allusers" />
</restrict>
</priv>
<priv name="@self">
<grant operations="start,kill,suspend,resume,list" />
</priv>
</privileges>
The penultimate set of privileges defined, 'monitor', restricts any user with that privilege to listing jobs only (assuming they are not in 'allusers'). A special
privilege group, '@self', defines to what extent users control their own jobs -- restrictions will always override this, however.
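The interaction between grants, restrictions and '@self' can be sketched as follows. This is one possible reading of the sample, not jctld's actual algorithm. It assumes that the group named inside a 'grant' or 'restrict' identifies whose jobs the rule covers, that 'allusers' contains every configured user (the sample does not define it), and that anything not granted is refused. All names in the code are hypothetical.

```python
# Hypothetical in-memory form of the sample <groups> and <privileges> sections.
# 'allusers' is not defined in the sample configuration; its membership is assumed.
GROUPS = {
    "tunausers": {"frmb", "phw", "ats"},
    "allusers": {"frmb", "phw", "pssc", "ats", "mig"},
}
# Each rule is (operations, group whose jobs it covers); None means "own jobs".
PRIVILEGES = {
    "superuser": {"grant": [({"*"}, "allusers")], "restrict": []},
    "tunauser": {"grant": [({"suspend", "resume", "list"}, "tunausers")], "restrict": []},
    "monitor": {
        "grant": [({"list"}, "allusers")],
        "restrict": [({"start", "kill", "suspend", "resume"}, "allusers")],
    },
    "@self": {"grant": [({"start", "kill", "suspend", "resume", "list"}, None)], "restrict": []},
}

def rule_matches(rule, operation, job_owner):
    operations, group = rule
    covers_operation = "*" in operations or operation in operations
    covers_owner = group is None or job_owner in GROUPS.get(group, set())
    return covers_operation and covers_owner

def collect(names, kind):
    """All 'grant' or 'restrict' rules from the named privileges."""
    return [rule for name in names for rule in PRIVILEGES[name][kind]]

def may(user, user_privs, operation, job_owner):
    names = list(user_privs)
    if user == job_owner:
        names.append("@self")  # '@self' only applies to the user's own jobs
    # Restrictions are checked first, so they override any grant (including '@self').
    if any(rule_matches(r, operation, job_owner) for r in collect(names, "restrict")):
        return False
    return any(rule_matches(r, operation, job_owner) for r in collect(names, "grant"))

print(may("phw", ["tunauser"], "suspend", "frmb"))  # granted via 'tunauser'
print(may("ats", ["monitor"], "kill", "ats"))       # restriction beats '@self'
```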
The 'servers' section lists jctld servers, to which users and clients/hosts connect. When initialising, 'jctld' will search this section for
an entry matching its own hostname (overridden with '-N, --name'). Server definitions may appear in all configurations. For example:
<!-- -->
<servers>
<server name="tadpole" privkey="tadpole.key">
<inet hostname="*" port="5020" />
<unix path="/tmp/.jctldsocket" />
</server>
</servers>
<!-- -->
<servers>
<server name="tadpole" pubkey="tadpole.pub">
<inet hostname="tadpole" port="5020" />
</server>
</servers>
'inet' servers may also control what is allowed to connect, for example:
<inet hostname="tadpole" port="5020">
<allow hostname="tadpole" />
<deny hostname="192.168.16.20" />
<allow network="192.168.16.0" netmask="255.255.255.0" />
</inet>
When jctld accepts a connection, it will search through the list of allows/denys in the given order -- the above, for instance, will allow "192.168.16.21" but deny "192.168.16.20". The default
behaviour is to allow; to change this insert a match-all 'deny' rule at the end:
<deny network="0.0.0.0" netmask="0.0.0.0" />
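The ordered allow/deny search can be illustrated with a small Python sketch. It assumes that the first matching rule decides the outcome, which is what the example above implies, and it compares 'hostname' rules literally against the peer's address or name rather than resolving them, which a real implementation would presumably do. The rule representation and function names are hypothetical.

```python
from ipaddress import IPv4Address

# The rules from the <inet> example above, in configuration order.
RULES = [
    ("allow", {"hostname": "tadpole"}),
    ("deny", {"hostname": "192.168.16.20"}),
    ("allow", {"network": "192.168.16.0", "netmask": "255.255.255.0"}),
]
CATCH_ALL_DENY = ("deny", {"network": "0.0.0.0", "netmask": "0.0.0.0"})

def rule_matches(rule, peer_ip, peer_name=None):
    if "hostname" in rule:
        # Simplification: literal comparison, no name resolution.
        return rule["hostname"] in (peer_ip, peer_name)
    mask = int(IPv4Address(rule["netmask"]))
    network = int(IPv4Address(rule["network"]))
    return (int(IPv4Address(peer_ip)) & mask) == (network & mask)

def accept(rules, peer_ip, peer_name=None):
    for action, rule in rules:
        if rule_matches(rule, peer_ip, peer_name):
            return action == "allow"  # assumption: first matching rule wins
    return True                       # default behaviour is to allow

print(accept(RULES, "192.168.16.21"))                 # allowed by the network rule
print(accept(RULES, "192.168.16.20"))                 # denied by the explicit rule
print(accept(RULES, "10.0.0.1"))                      # no rule matches: default allow
print(accept(RULES + [CATCH_ALL_DENY], "10.0.0.1"))   # denied by the catch-all rule
```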
The 'hosts' section lists jctld clients/hosts -- the machines that actually run jobs. As with 'servers', jctld will search this section for a matching entry (by hostname). Host
definitions are only relevant for jctld -- jcli does not use them. For example:
<hosts>
<host name="tadpole" privkey="tadpole-cli.key" pubkey="tadpole-cli.pub" />
</hosts>
Once jctld is up and running (on at least one machine), the 'jcli' command can be used to control jobs. If only one server is found in a configuration file, jcli will attempt to
connect to it. If multiple servers are found, one must be selected on the command-line with '-s, --server'. If no configuration is used, the server options (including
public key) can be specified on the command-line; user information must come from a configuration file, however (currently). For example:
bash$ jcli -s tadpole:5020
jcli>
If started without any additional command-line arguments, jcli goes into interactive mode and displays the prompt as shown (once connected to the server). Otherwise it attempts to execute
the command given, with any optional arguments. For example:
bash$ jcli version
server pieship (jctld) version 0.9.1, user frmb
bash$
Many of the commands need a way of specifying jobs, both single jobs and groups of jobs. To this end, jctld supports fairly rudimentary job specifications, 'jobspec's. These are
built up in the following way:
[ user | group ] [ $name ] [ @host ] [ :pid ]
For example:
| frmb | all of frmb's jobs (user) |
| @tadpole | all jobs on host 'tadpole' |
| tunausers@pi02 | all jobs belonging to users in the group 'tunausers' on host 'pi02' |
| @tadpole:25963 | job with PID 25963 on host 'tadpole' |
| frmb$vi | all jobs called 'vi' belonging to user 'frmb' |
| $xterm | all jobs called 'xterm' |
| $bash@pi25 | all jobs called 'bash' on host 'pi25' |
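A jobspec of this shape can be taken apart with a single regular expression. The Python sketch below is an illustration only and not the parser jctld uses. The field name `who` (the user-or-group part) is made up, and the sketch assumes that names and hosts never contain '$', '@' or ':' and that an empty jobspec means "no restriction".

```python
import re

# [ user | group ] [ $name ] [ @host ] [ :pid ], every part optional, in this order.
JOBSPEC = re.compile(
    r"(?P<who>[^$@:]+)?"         # user or group name
    r"(?:\$(?P<name>[^@:]+))?"   # $name: job (process) name
    r"(?:@(?P<host>[^:]+))?"     # @host: host the job runs on
    r"(?::(?P<pid>\d+))?"        # :pid: process ID
)

def parse_jobspec(spec):
    """Split a jobspec into its parts; parts that are absent come back as None."""
    match = JOBSPEC.fullmatch(spec)
    if match is None:
        raise ValueError(f"malformed jobspec: {spec!r}")
    fields = match.groupdict()
    if fields["pid"] is not None:
        fields["pid"] = int(fields["pid"])
    return fields

for spec in ("frmb", "@tadpole:25963", "frmb$vi", "$bash@pi25"):
    print(spec, parse_jobspec(spec))
```

The parser cannot tell a user from a group by syntax alone, so the `who` field would still have to be looked up against the configured users and groups.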
The supported commands are as follows:
| Command | Description | Privileges required |
| help [command] | display command summary, or more information on a particular command | |
| version | request version from server | |
| clist | list all active hosts (job-capable machines) | |
| ulist | list users with brief job information | 'list' |
| jlist [jobspec] | list jobs, all jobs if no specification given | 'list' |
| jsummary [jobspec] | job summary, of all jobs if no specification given | 'list' |
| usummary [jobspec] | job summary by user, of all jobs if no specification given | 'list' |
| jdetail [jobspec] | detailed job summary, of all jobs if no specification given | 'list' |
| suspend <jobspec> | suspend job(s), sends SIGSTOP to them | 'suspend' |
| resume <jobspec> | resume job(s), sends SIGCONT to them | 'resume' |
| renice <jobspec> <nicespec> | renice a job (alter its scheduling priority), range within that supported by the system | 'renice', 'hinice' to set higher-than-normal priority |
| kill <jobspec> [signal] | kill job(s), sends SIGTERM if no 'signal' given | 'kill' |
| start <jobspec> <command> [args ...] | start job, 'command' and 'args' will be passed to the user's configured 'jobshell' if set, otherwise executed outright; 'jobspec' is used to specify a user and/or host | 'start' |
| refresh | refresh jobs | 'reload' |
| reload | ask server/hosts to reload configuration and clear caches | 'reload' |
The following commands are not yet supported:
| lock <jobspec> <lock-release> | lock the state of a process to prevent others changing, 'lock-release' specifies who (user/group) can release the lock (in addition to lock setter) | 'lock' |
| unlock <jobspec> | unlock the state of a process | 'lock', or specified in lock's 'lock-release' |
| .. | .. | .. |
Interactive mode uses GNU's readline library, so command-line editing should be sane. The output generated by commands is particular to the type of
client; only a stdio client is currently supported (later ones may include fluffy graphical applications for monitoring, etc.), and a RAPP client is
under construction. For example, 'clist' output in the stdio client:
pieship> clist
host-name  njobs  idle   load  memory    swap    last-contact
--------------------------------------------------------------------------
pi02       213    48.79  0.97  873/1009  57/854  18
pi03       65     0.17   2.45  997/1009  25/854  18
pi04       14     0.20   2.41  963/1009  12/854  18
pi05       13     49.57  1.00  872/1009  12/854  18
pi06       71     0.15   2.13  947/1009  29/854  18
pi07       17     0.23   2.34  938/1009  11/854  17
pi08       14     49.82  1.00  853/1009  17/854  17
pi09       15     0.22   2.40  976/1009  11/854  17
pi10       12     49.68  0.99  978/1009  0/854   17
pi11       11     49.75  1.00  821/1009  0/854   17
pi13       14     49.10  1.72  998/1009  0/854   14
pi14       15     0.18   2.33  998/1009  0/854   13
pi15       18     0.42   2.08  975/1009  0/854   13
pi16       15     0.23   2.24  990/1009  0/854   13
pi17       3      49.87  0.99  774/1009  0/854   13
pi18       19     49.74  1.03  795/1009  0/854   12
pi19       15     0.27   2.61  967/1009  0/854   12
pieship>
incomplete
The protocol used over TCP or UNIX sockets deals in whole-line chunks. For UDP, the equivalent lines are sent as UDP packets (with some exceptions).
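Since the message format itself is not documented here, the following Python sketch only illustrates the whole-line framing idea on a stream socket: the receiver buffers incoming bytes and hands on complete lines. The newline terminator, the ASCII encoding, the function names and the placeholder payloads are all assumptions and do not describe the real wire protocol.

```python
import socket

def send_line(sock, text):
    """Send one message as a single newline-terminated line."""
    sock.sendall(text.encode("ascii") + b"\n")

def read_lines(sock):
    """Yield complete lines from a stream socket until the peer closes it."""
    buffer = b""
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            break  # connection closed; any unterminated partial line is dropped
        buffer += chunk
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            yield line.decode("ascii")

# A connected socket pair stands in for a TCP or UNIX-socket connection.
sender, receiver = socket.socketpair()
send_line(sender, "first placeholder line")
send_line(sender, "second placeholder line")
sender.close()
print(list(read_lines(receiver)))
receiver.close()
```

Over UDP the buffering step would not be needed, since each line is said to travel in its own packet.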
incomplete
Several additional libraries are needed to compile jctld. These are: libstatgrab, libreadline, libexpat, and libgcrypt.