Fred's Home Page

Main
About me
Crumble
csh is bad
Debian
FredCam
Guppy
Hardware
Help out
Java-glossary
Job-control
KRoC/Linux (old)
Lambda
Life-stuff
Links
Linux
Mesh
Misc
Music
occam
occam upgrades
occam tutorial
OpenUPSd
Pictures
Programming
Projects (PhD)
Publications
Quick gdb
RAPP
RMoX
Software
UNIX crashcourse
UWG
UWGBuilder
WG / WGBuilder
XDM-Choose
XHPD

oScript: A proposal for an occam-like scripting language

F.R.M. Barnes and P.H. Welch
Computing laboratory, University of Kent, Canterbury. CT2 7NF
September 2003. Last modified December 2003

Abstract: The `Communicating Sequential Processes' (CSP) process algebra, as described by Hoare in [1], provides a powerful semantics for reasoning about concurrent processes and their interactions. As such, CSP has formed the basis of concurrency handling in several languages and language extensions. Most notably, occam [2], whose compiler strictly manages concurrency to prevent aliasing and race-hazard errors. For Java [3], JCSP [4] exists and provides a means of managing concurrent Java `Thread' objects -- abstracted into `CSProcess' objects. Mechanisms for channel communication and event alternation (external choice) are provided, of course. As well as occam and Java run-time systems, CCSP [9, 10] and C++CSP [11] provide libraries for C and C++ respectively.

Network distribution of JCSP-based Java programs is provided by `JCSP.net' [5, 8], enabling the easy construction of vast parallel processing systems. Unlike occam, Java (and JCSP) cannot enforce certain safety guarantees, aliasing in particular. Therefore, a certain amount of care is required on the part of the programmer to ensure that any object aliasing is safe. Automatic checking of Java is possible, but non-trivial -- commercial checking tools exist for JCSP.net, produced by Quickstone Technologies Limited. Quickstone also produce a version of JCSP.net for Microsoft's J#, `J#.NET', now included in the JCSP.net package. Similar support for the KRoC [6] occam environment is currently under development in the form of KRoC.net [7].

Connecting (composing) concurrent systems using these networking infrastructures is still somewhat specific, although various different `transport' layers can be supported. A more general abstraction is provided by `scripting' languages, in particular `bash' and `sh', that provide mechanisms for executing (typically UNIX) programs concurrently, using pipes to communicate data between programs. Scripting languages provide for rapid development of code, often being interpreted by a suitable (and sometimes complex) run-time system.

This document describes a simplistic occam-like scripting language, that aims to facilitate the composition of programs and processes using occam/CSP style concurrency. This initial proposal concentrates on scripting to `wire-up' components on the local machine. A longer-term aim is to provide a distributed and dynamic behaviour, in order to facilitate the rapid `wiring-up' of networked components. Target applications include, but are certainly not limited to: general scripting, internet-aware embedded devices, and business process modelling.


[ introduction | types | syntax | caveats | future work | references ]


1. Introduction and Motivation

This document presents a proposal for an occam-like scripting language, with CSP-based semantics. The primary aim of this language is to facilitate the easy composition of concurrent programs and processes. In the `sh' scripting language, for example, a pipeline of processes is easily expressed:

    cat input.file | sed -e 's/foo/bar/g' | tr '[]' '<>' | sort | uniq > output.file

Such process pipelines can also be viewed as networks of communicating processes:

Fig 1: process pipeline
Fig 1: `sh' process pipeline.

Unlike occam, the `channels' in this network are actually pipes (buffered streams). Furthermore, some processes (such as `sort') must completely buffer all input before any output can be produced -- that would be un-characteristic of an occam process.

It would seem reasonable that network diagrams of communicating `programs' (as shown above), can be translated into `sh' code for script-style execution. And in many cases, this is indeed true. For the above example, the reverse transformation is obvious. However, there are types of network (data-flow) which are easily designed but hard to implement in this `sh' scripting world. For example:

hard to implement process pipeline
Fig 2: a potentially desirable `sh' process pipeline.

The implementation problems come from the two processes `delta' and `merge' -- programming equivalent functionality in `sh' (or more probably `bash') is difficult. One solution is to use temporary files and sequential composition:

modified process pipeline
Fig 3: an implementable `sh' process pipeline.

Even this arrangement of processes is not entirely trivial to implement -- the `SEQ' can generally only follow a single process, or pipeline of processes. A working translation of this network (additionally removing the temporary files), could be:

    cat input.file | grep '^X' | sed -e 's/xxx/yyy/g' > tmpfile1
    cat input.file | grep -v '^X' | sed -e 's/foo/bar/g' > tmpfile2
    cat tmpfile1 tmpfile2 | sort | uniq > output.file
    rm tmpfile1 tmpfile2

This style of implementation is almost lock-step parallelism, but with a restriction that the parallelism is a process pipeline. More accurately, however, this implementation is a sequential program with a limited form of parallelism.

The scripting language proposed here would allow the network shown in Fig 2 to be implemented easily. As an example, this program could be written as:

    #! /usr/local/bin/oscript --c-style

    stream a, b, c, b2, c2, d, e, f
    par {
        exec ("cat input.file", null, a!, null)
        delta (a?, {b!, c!})

        stream local
        par {
            exec ("grep '^X'", b?, local!, null)
            exec ("sed -e 's/xxx/yyy/g'", local?, b2!, null)
        }

        stream local
        par {
            exec ("grep -v '^X'", c?, local!, null)
            exec ("sed -e 's/foo/bar/g'", local?, c2!, null)
        }

        merge ("\n", {b2?, c2?}, d!)
        exec ("sort", d?, e!, null)
        exec ("uniq", e?, f!, null)
        write (f?, "output.file")
    }

Although powerful, this is a slightly cumbersome representation -- and scripting languages are designed to be quick. As such, a special syntax for `pipeline' construction could be introduced:

    #! /usr/local/bin/oscript --c-style

    pipeline ("cat input.file" || < \
        ("grep '^X'" || "sed -e 's/xxx/yyy/g'), \
        ("grep -v '^X'" || "sed -e 's/foo/bar/g') \
        > || "sort" || "uniq" || write (_?, "output.file"), null, null, null)

A formal definition of how processes are constructed in a pipeline like this is given in the syntax below. It is also the case that the `delta' and `merge' processes are created and placed automatically, through the use of `<' (for delta) and `>' (for merge).

The two code examples above are the most expanded and succinct versions. A realistic implementation would probably fall somewhere in-between, for example:

    #! /usr/local/bin/oscript --c-style

    stream a, b
    par {
        exec ("cat input.file", null, a!, null)

        # delta and merge pipe-lines
        pipeline (< ("grep '^X'" || "sed -e 's/xxx/yyy/g'"), \
            ("grep -v '^X'" || "sed -e 's/foo/bar/g'")>, \
            a?, b!, null)

        # remaining process pipeline
        pipeline ("sort" || "uniq" || write (_?, "output.file"), b?, null, null)
    }

2. Type System

The type-system of oScript is similar to occam's, supporting basic-types and channel-types. Additionally, three other types are supported: tuples, streams and processes.

2.1. Tuples

Tuples provide a form of anonymous record type for oScript. This also allows for limited type polymorphism -- i.e. channels of an arbitrary tuple whose type is checked at run-time. The cost of any run-time type-check will be constant (based on a type-hash, rather than equality of a type-tree).

The type structure of tuple channel or variable need not remain the same; it can change at run-time as required by the application.

The following shows an example code fragment demonstrating the use of tuple types:

    chan tuple c, d, e
    par {
        seq {
            c ! 42
            c ! "hello, world!"
        }

        string x; int y
        seq {
            c ? y
            c ? x
            d ! [y,x]
        }

        tuple v
        seq {
            d ? v
            v[0] = 36
            e ! v
        }

        string x; int y
        seq {
            e ? [y,x]
        }
    }

The above code fragment also gives an example of channel communication in oScript. The special `tuple' type keyword is used in declarations of tuple channel-protocols and tuple variables. Tuples are formed using square brackets -- as a constant record would be formed in occam. In cases where a tuple channel is used with a non-tuple variable/expression, oScript assumes a one-element tuple.

... to be completed ...

2.2. Streams

The `stream' type is roughly equivalent of a buffered counted byte-array channel in occam, to be implemented using operating-system `pipes'. Streams are used to transfer data between executing programs (started using `exec' or `pipeline') and/or built-in processes.

Like channels in occam, streams have two conceptual ends -- a reading end and a writing end. The declaration of a stream, for example:

    stream c

declares two ends, referred to as `c?' (reading) and `c!' (writing).

The unit of communication in a `stream' is the `string'. When a stream is used for input, the run-time system will attempt to gather an entire line of characters before the input completes. This generally makes more sense for a script that is processing the text in some way.

The run-time system will provide three default streams, `stdin', `stdout' and `stderr'. These are the three standard streams available to most applications. If these streams are unavailable, the run-time system will provide `null' alternatives.

2.3. Processes

... to be completed ...

3. Syntax

The general rules for program layout follow that of other scripting languages:

  1. Comments begin with `#' and extend to the end of the line.
  2. Statements are separated using line-breaks or the semi-colon `;' character.
  3. Lines may be broken using a backslash `\'.
  4. Blank lines are ignored.

The following table gives a BNF style grammar for the proposed scripting language, ignoring blank-lines, comments and continuations. Line-breaks and semi-colons are deemed to be the `break' token. Furthermore, `{' and `}' may provide or absorb a `break' on either side, in order to fit the grammar. Replicated structures are given as "{n,token}", where `n' is the minimum number (usually 0 or 1) and `token' is the replicated token -- separated with `break's. A non-keyword sequence of characters produces the `name' token, and double-quoted sequences (allowing all characters with backslash escapes) produce the `string' token.

TokenExpansion(s)Description
start::=processscript start symbol
a.param::=streamstream parameter
|nameother named parameter
|stringstring literal
a.params::=(null)empty parameter list
|a.paramsingle/last actual parameter
|a.param  `,'  a.paramsparameter list
builtin::= `delta'  `('   stream  `,'  stream.array  `)'split a single stream into multiple streams
| `exec'  `('   string  `,'   stream  `,'   stream  `,'   stream  `)'execute program connected to three streams
|`merge'  `('   stream.array  `,'  stream  `)'merge multiple streams into a single stream
|`pipeline'  `('   pipeline  `,'   stream  `,'   stream  `,'   stream  `)'      process pipeline
|`read'  `('   string  `,'   stream  `)'read file to stream
|`write'  `('   stream  `,'   string  `)'write stream to file
c.process::=`par'  `{'parallel processes
    {1,process}
`}'
|`seq'  `{'sequential processes
    {1,process}
`}'
declaration      ::=    type  namesvariable/stream declarations
|definitionnamed process definition
definition::=name  `('  f.params  `)'process definition
`{'
    process
`}'
f.name::=nameordinary name
|name`?'stream input name
|name`!'stream output name
f.names::=f.namesingle/last formal name
|f.name  `,'  f.nameslist of formal names
f.param::=type  f.namesformal parameter(s)
f.params::=(null)empty parameter list
|f.paramsingle/last parameter
|f.param  `,'  f.paramsparameter list
names::=namesingle/last name
|name  `,'  nameslist of names
pipe.item::=stringexecute program
|processparallel process
|`<'  pipelines  `>'delta-in and merge-out of sub pipelines
pipeline::=pipe.itemsingle/last pipeline component
|pipe.item  `||'  pipelinepipeline of processes
pipelines::=`('  pipeline  `)'single/last pipeline
|`('  pipeline  `)'   `,'  pipelineslist of pipelines
process::=c.processprocess constructor
|builtinpre-defined process
|{1,declaration} break
process
declaration(s) in scope for the body of the process
|name  `('  a.params  `)'instance
|name  `?'  nameinput
|name  `!'  a.paramoutput
|name  `='  a.paramassignment
stream::=name `?'input stream identifier
|name `!'output stream identifier
|`null'null stream
|`_?'anonymous input stream identifier (for pipelines)
|`_!'anonymous output stream identifier (for pipelines)
stream.array::=`{'  streams  '}'array of streams
streams::=streamsingle/last stream
|stream  `,'  streamslist of streams
type::=`stream'stream (pipe) type
|`string'variable-length string type

The grammar given is sufficient for expressing arbitrary configurations of processes and `streams'. More will be required, however. In particular, we would wish to allow connections to JCSP/KRoC network-aware applications, via a suitable `channel name server' (CNS) or other directory-style service.

One CSP feature that is currently missing is support for alternation (external choice). This will be added (and will follow in a similar style to `seq' and `par'), once these preliminary ideas have been investigated and some of it implemented.

4. Caveats

... to be completed ...

5. Future Work

The only communication mechanism provided in the language, so far, is a `stream'. This is largely for convenience -- most operating-systems support a similar concept of `streams', often implemented as `pipes'. Controlled buffering is not guaranteed, however. A related OS feature, the socket-pair, does have controllable buffering, that the application can make use of (if required).

A much tigher (and largely internal) communication mechanism will also be provided. Specifically one that has CSP synchronisation semantics (and zero buffering). In order to communicate with networked occam and Java applications, specific communication handling will be required -- to decode occam's PROTOCOLs and Java's serialised objects, or to generate items of the same. Importantly, controllable synchronisation will remain.

References

[1]      `Communicating Sequential Processes', C.A.R. Hoare, 1985. Published by Prentice-Hall. ISBN: 0-13-153271-5.
[2] `occam2 Reference Manual', Inmos Limited, 1988. Published by Prentice-Hall. ISBN: 0-13-629312-3.
[3] `The Java Language Specification', B. Joy, J. Gosling and G. Steele, 1996. Published by Addison-Wesley. ISBN: 0-20-163451-1.
[4] `CSP for Java (JCSP)', P.H. Welch, 1999. Available at http://www.cs.ukc.ac.uk/projects/ofa/jcsp/.
[5] `CSP Networking for Java (JCSP.net)', P.H. Welch, J.R. Aldous and J. Foster, 2002. In proceedings of `Computational Science - ICCS 2002', P.M.A. Sloot, C.J.K. Tan, J.J. Dongarra and A.G. Hoekstra (editors), Springer-Verlag, `Lecture Notes in Computer Science' vol. 2330, April 2002. Pages 695-708. ISSN: 0302-9743.
[6] `The KRoC Home Page', P.H. Welch, J. Moores, F.R.M. Barnes and D.C. Wood, 2000. Available at http://www.cs.ukc.ac.uk/projects/ofa/kroc/.
[7] `Flexible, Transparent and Dynamic occam Networking with KRoC.net', M. Schweigler, F.R.M. Barnes and P.H. Welch, 2003. In proceedings of `Communicating Process Architectures 2003', IOS Press, `Concurrent Systems Engineering' vol. 61, September 2003. Pages 199-224. ISSN: 1383-7575. ISBN: 1-58603-381-6.
[8] JCSP Network Edition, Quickstone Technologies Limited. http://www.quickstone.com/xcsp/jcspnetworkedition/.
[9] `The Design and Implementation of occam/CSP Support for a Range of Languages and Platforms', J. Moores, Ph.D. thesis, Computing Laboratory, University of Kent, Canterbury, Kent, UK.
[10] `CCSP -- a Portable CSP-based Run-time System Supporting C and occam', J. Moores, 1999. In proceedings of `WoTUG-22: Architectures, Languages and Techniques for Concurrent Systems', IOS Press, `Concurrent Systems Engineering' vol. 57, April 1999. Pages 147-168. ISSN: 1383-7575. ISBN: 90-5199-480-X.
[11] `An Introduction to the Kent C++CSP Library', N.C.C. Brown and P.H. Welch, 2003. In proceedings of `Communicating Process Architectures 2003', IOS Press, `Concurrent Systems Engineering' vol. 61, September 2003. Pages 139-156. ISSN: 1383-7575. ISBN: 1-58603-381-6.
Last modified: 2003-12-17 01:03:56.000000000 +0000 by Fred Barnes [ds] [plain]
Page generated: Sun Apr 28 11:39:34 2013
Valid XHTML 1.0! Valid CSS!