# Possible PhD Project Outlines

This page describes in outline the kinds of research projects that I am currently interested in supervising for students who wish to work towards a PhD.  Any such work would be done within the Verification and Testing research group (the VT group), to which I belong, and so each of these projects fits into one or more of the main themes in which this group is interested.

These descriptions are only outlines, as an essential feature of any research work at this level is that the student, rather than the supervisor, has to develop the project into something that they own.  Also, while outlines are given for the following project titles, these are not intended to be restrictive.  Thus, if potential students have already made some investigation into the background of these topics, and have identified related ideas that they might wish to pursue, then I would be interested to discuss these with them.

# Causality in the Data Flow Algebra

This project would primarily contribute to the formal systems development theme of the VT group, although it could be developed in a direction where it might also link to the software testing theme.

The Data Flow Algebra (DFA) is an approach to modelling the design of software systems that has been developed within this department.  One of its features is that it is structured into three layers (the topological, event and computational layers), and an important part of the basis of this project is the way in which the event and computational layers relate to each other.  Specifically, in the event layer choices between alternative sequences of actions are non-deterministic, and it is this feature that gives rise to the axioms
s1 ; (s2 | s3) = (s1 ; s2) | (s1 ; s3) and
(s1 | s2) ; s3 = (s1 ; s3) | (s2 ; s3).
In the early stages of developing a design this assumption of non-deterministic choice is reasonable, since it is unlikely to have been decided just how or where in a system the choices are to be made. As the design process progresses, though, so these decisions need to be fixed, in order to allocate the responsibility for making each choice to a specific process.  It would therefore be useful to extend the event layer notation so as to allow these design decisions to be documented, and the goal of this project is to investigate how such an extension would affect the DFA framework.
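
Both distributivity laws can be illustrated with a small executable sketch under a trace semantics for the event layer; the Python encoding below (expressions as nested tuples, and the `traces` function) is purely illustrative, and is not part of the DFA notation itself:

```python
# Toy trace semantics for event-layer expressions (illustrative only):
# an expression denotes a set of action sequences, sequencing concatenates
# trace sets, and non-deterministic choice takes their union.

def traces(expr):
    """Return the set of traces denoted by an event-layer expression."""
    if isinstance(expr, str):                # atomic step, e.g. "s1"
        return {(expr,)}
    op, left, right = expr
    if op == ";":                            # sequencing: concatenate traces
        return {a + b for a in traces(left) for b in traces(right)}
    if op == "|":                            # choice: union of trace sets
        return traces(left) | traces(right)
    raise ValueError(f"unknown operator {op!r}")

# s1 ; (s2 | s3)  =  (s1 ; s2) | (s1 ; s3)
lhs = (";", "s1", ("|", "s2", "s3"))
rhs = ("|", (";", "s1", "s2"), (";", "s1", "s3"))
assert traces(lhs) == traces(rhs) == {("s1", "s2"), ("s1", "s3")}

# (s1 | s2) ; s3  =  (s1 ; s3) | (s2 ; s3)
lhs2 = (";", ("|", "s1", "s2"), "s3")
rhs2 = ("|", (";", "s1", "s3"), (";", "s2", "s3"))
assert traces(lhs2) == traces(rhs2)
```

Because choice is interpreted simply as union, the two sides of each axiom denote the same trace set, which is exactly why the non-deterministic reading makes the forms interchangeable.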

In particular, there are at least two, and possibly three, issues that need to be investigated.  One is that there seems to be a "natural" form of an event layer expression for any allocation of responsibility for a choice to a particular process. Thus, the form
s1 ; (s2 | s3)
could be taken as implying that the choice is made by the source processes of s2 and s3, whereas the form
(s1 ; s2) | (s1 ; s3)
could be taken as implying that the choice is made by the source process of s1.  Given that the axioms of the event layer must still hold, though, a consequence is that the allocation of the determining process must be denoted explicitly.  This then suggests that there may be some sort of "well-formedness" criterion that can be defined, to indicate which of the various equivalent forms of an event layer expression naturally matches the decisions about determining processes.
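
One very rough way of making this concrete is to annotate each choice with the process intended to decide it, and then check that annotation against the syntactic form.  The sketch below is entirely hypothetical: the tuple encoding, the annotation scheme and the candidate well-formedness rule are assumptions made purely for illustration, not part of the DFA:

```python
# Hypothetical extension: atomic steps carry their source process, and each
# choice is annotated with the process responsible for deciding it.  The
# candidate "well-formedness" rule sketched here says the decider must be
# the source of the first action inside the choice: for  s1 ; (s2 | s3)
# that is the source of s2 or s3, while for  (s1 ; s2) | (s1 ; s3)  it is
# the source of s1 -- matching the "natural" readings described above.

def source(expr):
    """Source process of the first action of an expression."""
    if isinstance(expr, tuple) and expr[0] in (";", "|"):
        return source(expr[1])
    name, proc = expr                  # atomic step: (name, source process)
    return proc

def well_formed(expr):
    """Check every annotated choice against the candidate rule."""
    if not isinstance(expr, tuple) or expr[0] not in (";", "|"):
        return True                    # atomic steps are always well-formed
    if expr[0] == "|":
        _, left, right, decider = expr
        return (decider in (source(left), source(right))
                and well_formed(left) and well_formed(right))
    _, left, right = expr
    return well_formed(left) and well_formed(right)

# s1 comes from process P; the alternatives s2 and s3 come from process Q,
# so annotating the choice with Q is well-formed in this form.
e = (";", ("s1", "P"), ("|", ("s2", "Q"), ("s3", "Q"), "Q"))
assert well_formed(e)
```

In the factored form, by contrast, the same rule would require the choices to be annotated with P, the source of s1, which is one candidate reading of the criterion suggested above.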

The second issue is that decisions about which processes make which choices are clearly important for the computational layer of a DFA specification, as this layer will (amongst other things) need to specify for each process how it makes the required decisions.  What is not clear, though, is how this information from an extended event layer would be used within the computational layer: for instance, would it be best used in a forward direction (ie to determine the structure of the computational layer specification), or in a backwards direction (ie to check that the computational layer specification is consistent with the extended event layer)?  What is obvious, though, is that the precise way in which it is used will depend on the specification language being used for the computational layer, and so investigating this issue will almost certainly require comparing the approaches in several different computational layer specification languages.

The third issue is then that, having made design decisions about the allocation of causality within a system, these decisions would presumably affect the sets of tests that one would want to apply to that system.  At this stage it is not clear precisely how they would affect the testing, beyond the observation that one would probably need to test whether choices between the different possible behaviours of the system were actually being made in a way that was consistent with these decisions about causality.  How this might be done is still very much an open question.

# Components in the Data Flow Algebra

This project would primarily contribute to the formal systems development theme of the VT group, although it could be developed in a direction where it might also link to the software testing or empirical software engineering themes.

The Data Flow Algebra (DFA) is an approach to modelling the design of software systems that has been developed within this department.  One of its features is that it is structured into three layers (the topological, event and computational layers), and an important part of the basis of this project is that currently the topological layer is defined in terms of a flat structure of processes.  In many respects this is an over-simplified model, and an obvious extension would be to allow a hierarchical structure, so that a process within the specification of one system could be a complete sub-system that then had its own DFA specification.  Such an extension would then give rise to the need for consistency checking at the event layer, to ensure that the behaviour of a sub-system was consistent with the way in which it was being used at the next layer up in the hierarchy.

In the simplest case, such consistency rules would require an exact correspondence between the behaviour of a sub-system and the way in which it was used, but such a simple case is not necessarily appropriate to a system that is being developed using components-off-the-shelf (COTS).  Here, it is often the case that a system will not need to use all of the facilities that a COTS provides, and so the consistency rules will need to reflect this.  On the other hand, COTS may in their turn require the systems in which they are used to support certain patterns of behaviour that they can generate, and so the consistency rules would need to reflect these sorts of requirements as well.  Furthermore, there may be other kinds of requirements for the relative patterns of behaviour that have not yet been identified in this framework.
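
A simple (and certainly over-simplified) reading of these two kinds of requirement, phrased over finite sets of behaviour patterns, might look as follows; all names here are hypothetical:

```python
# Illustrative sketch (all names hypothetical): consistency between a COTS
# component and a system that uses it, phrased over finite sets of
# behaviour patterns.  Rule 1: the system need only use a subset of the
# facilities the component offers.  Rule 2: any pattern the component can
# generate on its own initiative must be one the system is prepared to
# accept.

def consistent(offered, used, generated, accepted):
    """offered/used: call patterns; generated/accepted: component-initiated
    patterns; each is modelled as a set of action sequences."""
    return used <= offered and generated <= accepted

offered   = {("open", "read", "close"), ("open", "write", "close")}
used      = {("open", "read", "close")}      # only part of the facilities
generated = {("timeout",)}                   # component-initiated behaviour
accepted  = {("timeout",), ("error",)}
assert consistent(offered, used, generated, accepted)
```

The interesting research questions start where this sketch stops: real patterns of behaviour are infinite-state, and the "other kinds of requirements" mentioned above may not reduce to set inclusion at all.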

Thus, the goal of this project is to develop ways of characterising this notion of "patterns of behaviour", so as to allow the necessary rules for this consistency checking to be developed within the existing DFA framework.  It then seems fairly obvious that these rules should affect the way in which such components are tested, both in isolation and when they are integrated into larger systems, but it is certainly not clear yet precisely how these interactions between testing and formal consistency checking should be structured, and what the best balance should be between testing and formal checking.  As part of this, it might also be appropriate to conduct some empirical work within the framework of the Observatory, to examine how COTS are actually used in software development, and hence to establish what other requirements for consistency might be needed between COTS and the systems in which they are used.

# Integrating Communicating Stream X-Machine Systems and the Data Flow Algebra

This project would primarily contribute to the formal systems development theme of the VT group, although it could also be developed in a direction where it would have some links to the software testing theme.

Communicating Stream X-Machine Systems (CSXMS) and the Data Flow Algebra (DFA) are both approaches to modelling the design of software systems that have been developed within this department, aimed particularly at systems that could potentially be either concurrent or distributed, or both, although they are both general models in the sense that systems do not necessarily have to be either concurrent or distributed in order for them to be applicable.  In focusing on these domains, though, the two kinds of models take rather different approaches:  CSXMS are essentially constructed in a bottom-up fashion, by starting from individual X-machines and composing them into complete systems;  while a DFA model is essentially a top-down one, in that it starts from a description of how the processes within a system are intended to interact, and then models the required behaviour of individual processes within this framework.  While the precise relationship between top-down and bottom-up methods of working is still the subject of some debate, a common perception is that in practice designers often wish to use both, switching from one to the other as appropriate.  This suggests that it would be useful to integrate these two models, to try to produce an approach that could support both top-down and bottom-up methods of working together, and this is the goal of this project.

At the very least, developing such an integration would require three aspects to be explored, and possibly four.  One aspect is that of supporting top-down working within the CSXMS approach, which requires (amongst other possibilities) a way of checking whether a CSXMS model is consistent with the topological and event layers of a DFA specification.  It appears that the simplest approach to this would probably be to work on a process-by-process basis, but what is not clear yet is whether this could be done most effectively by working within one of the X-machine or DFA notations, and translating the other to it, or whether there might be advantages in translating both to some third notation, such as CCS or another process algebra.
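
The "third notation" route can be caricatured as follows: translate a process from each model into a plain labelled transition system, and then compare the two.  The sketch below uses finite trace equivalence for simplicity, although a CCS-style treatment would more likely use simulation or bisimulation; all names and encodings are hypothetical:

```python
# Both models translated into bare labelled transition systems, given as
# (source, action, target) triples, and compared by bounded-depth trace
# equivalence (illustrative only -- a process-algebraic treatment would
# more likely compare them up to bisimulation).

def traces_to(depth, lts, state):
    """All action sequences of length <= depth starting from `state`."""
    if depth == 0:
        return {()}
    result = {()}
    for (src, action, dst) in lts:
        if src == state:
            result |= {(action,) + t for t in traces_to(depth - 1, lts, dst)}
    return result

# Two toy translations of the same request/acknowledge process.
from_csxms = [("q0", "req", "q1"), ("q1", "ack", "q0")]
from_dfa   = [("p0", "req", "p1"), ("p1", "ack", "p0")]
assert traces_to(4, from_csxms, "q0") == traces_to(4, from_dfa, "p0")
```

Even this caricature shows where the real difficulty lies: the translations themselves, not the final comparison, are where the semantic content of each notation has to be captured faithfully.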

The second aspect is that of supporting bottom-up working within the DFA approach, by allowing the computational layer of a DFA specification to be expressed in terms of the X-machine components of a CSXMS.  Here, the main issue that is not yet clear is which of the different variants of the CSXMS notation would be most suitable for this purpose, and since these variants are still being developed, it could well be that the most appropriate approach might actually be to create some new variant of the CSXMS model for this purpose.

The third aspect is that there currently exist separate toolsets for CSXMS models and for DFA models, and so in principle an integration of the modelling approaches should also involve integration of the toolsets.  In practice, though, it is not appropriate for PhD projects to have software development as their main focus, and so this aspect has to be regarded as secondary to the theoretical development in the first two aspects.  On the other hand, the practical aspect of making the integrated models usable must not be ignored, and so it would be necessary for this project at least to aim to develop the requirements for such an integrated toolset, and possibly also to produce some prototypes as part of this process of establishing requirements.

The fourth aspect then stems from one of the main reasons for being interested in these kinds of models, which is that they have enabled very strong results to be obtained about the amount of testing that needs to be done in order to achieve complete testing of certain kinds of systems.  There is much more to be done in this direction, though, and one of the reasons for wanting to integrate these two kinds of model is the belief that such an integration should provide a good basis for extending the existing results in useful ways.  Thus, if satisfactory solutions can be obtained to at least some of the issues described above, then extending the work to examine the testing results would be one possible way of building on such solutions.

# Relational X-Machine Models

This project would primarily contribute to the formal systems development theme of the VT group, although it also has strong links to the software testing theme.

Communicating Stream X-Machine Systems (CSXMS) are an approach to modelling the design of software systems, developed within this department, which has at its core the use of X-machines to model individual processes.  This X-machine model defines a very precise structure for the relationships between inputs, outputs, state transitions and processing functions, but it then leaves very open the internal structures of the machine memory and the processing functions that operate over it.  Since all of the results that have been obtained for the testing of X-machines and of CSXMS are based on the assumption that one can completely test all of the processing functions, it is obviously important to have a systematic method for testing these.  This therefore suggests that there may be advantages in constraining the structures of the memory and processing functions, so as to facilitate the derivation of the required test cases.
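
The overall shape of the model can be sketched, with much simplification, as follows: transitions between control states are labelled with processing functions, each of which consumes one input from the input stream, updates the memory, and emits one output.  The encoding below is illustrative only, not the formal definition:

```python
# Much-simplified sketch of a stream X-machine (illustrative encoding):
# each processing function either declines an input (returns None) or
# consumes it, producing one output and an updated memory.

def run(transitions, state, memory, inputs):
    """Drive the machine over an input stream; return the output stream."""
    outputs = []
    for inp in inputs:
        for (src, fn, dst) in transitions:
            if src != state:
                continue
            result = fn(inp, memory)
            if result is not None:        # this function handles the input
                out, memory = result
                outputs.append(out)
                state = dst
                break
        else:
            raise ValueError(f"no applicable function in state {state!r}")
    return outputs

# A counter machine: 'inc' adds to a running total, 'show' reports it.
def inc(inp, mem):
    return ("ok", mem + inp[1]) if inp[0] == "inc" else None

def show(inp, mem):
    return (mem, mem) if inp[0] == "show" else None

machine = [("s", inc, "s"), ("s", show, "s")]
outs = run(machine, "s", 0, [("inc", 2), ("inc", 3), ("show", None)])
# outs == ["ok", "ok", 5]: two updates acknowledged, then the total reported
```

Note that the machine structure (states and transitions) is completely fixed here, while the bodies of `inc` and `show` are arbitrary Python: this is exactly the openness of the processing functions that the project proposes to constrain.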

The goal of this project is therefore to investigate the feasibility of one particular approach to imposing such constraints, which is based on the semantic structures that underlie the specification language Z.  These semantic structures rely heavily on a relational model, and so it is suggested that the memory variables should be restricted to storing either primitive values or relations.  Then, in the most restricted form, the processing functions would in principle be limited to simple sequences of primitive operations on these, namely extracting tuples from relations or values from tuples, assigning new values to variables, or inserting, replacing or deleting tuples within relations.  In practice, though, it might be necessary (or at least convenient) to relax some of these restrictions a little:  for instance, by extending the relations to allow orderings for the tuples to be defined, so that individual tuples could be identified by their position within the ordering;  or by allowing processing functions to include conditional constructions as well as simple sequences.  A trickier aspect of this is whether processing functions should also be allowed to include iterative constructions:  on the one hand it can be argued that an important part of the machine model is the use of the control states to represent the different stages in any iteration, but on the other hand machines that incorporate iterations that are modelled in this way may as a result not possess the stream property, which is important for the generation of test cases.
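
Under these restrictions a relation can be sketched as a set of tuples together with the primitive operations listed above; the operation names and the example processing function below are hypothetical, chosen purely for illustration:

```python
# Sketch of the proposed restricted repertoire (operation names are
# hypothetical): memory variables hold either primitive values or relations
# (modelled here as sets of tuples), and processing functions are built
# only from the primitives below.

def select(rel, pred):
    """Extract the tuples of a relation that satisfy a predicate."""
    return {t for t in rel if pred(t)}

def insert(rel, t):
    """Insert a tuple into a relation."""
    return rel | {t}

def delete(rel, pred):
    """Delete the tuples of a relation that satisfy a predicate."""
    return {t for t in rel if not pred(t)}

def replace(rel, pred, t):
    """Replace the matching tuples by a single new tuple."""
    return delete(rel, pred) | {t}

# A processing function written under these restrictions: record a seat
# booking, refusing it if the seat is already taken.
def book(memory, seat, name):
    bookings = memory["bookings"]
    if select(bookings, lambda t: t[0] == seat):
        return memory, "refused"
    return {**memory, "bookings": insert(bookings, (seat, name))}, "booked"

mem = {"bookings": {("1A", "jones")}}
mem, out = book(mem, "2B", "smith")
assert out == "booked" and ("2B", "smith") in mem["bookings"]
```

Note that even this small example already uses a conditional construction, ie one of the relaxations of the most restricted form that is discussed above.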

To explore these issues, it is envisaged that the research might involve using three different approaches in some combination.  One of these approaches will be to construct case studies, probably by adapting existing ones (of which there are many) rather than by having to create completely new ones.  Another approach will be to try to determine how the derivation of test cases for such processing functions would be affected by the different possibilities for these restrictions, which ideally should lead to the development of systematic methods for producing complete sets of test cases for the functions.  The third approach would involve extending the ideas from single X-machines to CSXMS, and then trying to determine how far they could be used to provide general semantic models for at least some of the design notations that are incorporated into the UML.  This third approach could, however, possibly involve a lot of work, and so if the first two approaches appeared to be producing useful results then following up this aspect fully could well need to become a separate project in its own right.

# Design Transformations and Communicating Stream X-Machine System Models

This project would primarily contribute to the formal systems development theme of the VT group, although it could also be developed in a direction where it would have some links to the software testing theme.

Communicating Stream X-Machine Systems (CSXMS) are an approach to modelling the design of software systems, developed within this department, which has at its core the use of X-machines to model individual processes.  One of the strengths of this approach is that, as well as modelling the design of any software system, these machine models can also be used on the one hand to define the specification for a system, and on the other to model in some detail the actual structure of the code that implements a system.

Of course, if the same kind of model is being used for each of these stages in developing a system (viz specification, design and implementation), then the different models need in some way to be consistent with each other, since they are all describing the same system.  Indeed, in practice it will usually be the case that the design model for a system will be developed from the model used for the initial specification of the requirements, and similarly that the implementation model will be developed from the one used for the design.  This project is concerned with studying the steps by which this development process happens, where each step will involve some sort of transformation of the model.

Many of these transformations are well-known at an informal level, where they involve the use of what are often called "design patterns" to replace some aspect of a requirements model by a design that has been found by experience to be appropriate for meeting that particular kind of requirement.  At the level of formal models, though, the use of such transformations is not always as well defined:  in some simple cases the introduction of a design pattern corresponds to the mathematical process of a refinement step, but in many cases the transformations represent what one might describe as "sideways" steps, rather than the "forwards" steps that are implicit in the notion of a chain of refinements.
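
One standard formal reading of a "forwards" step is that the transformed model is simulated by the original, so that the transformation introduces no new behaviour; a "sideways" step would fail such a condition, and instead preserve some other chosen property.  The check below is a generic sketch over bare transition systems, not a CSXMS-specific refinement theory:

```python
# Generic sketch (not a CSXMS-specific theory): a "forwards" refinement
# step can be read as requiring the transformed model to be simulated by
# the original.  Models are transition systems of (source, action, target)
# triples.

def simulated_by(conc, abst, relation):
    """True if `relation` is a simulation: every step of the concrete
    model is matched by an abstract step keeping related states related."""
    for (c, a) in relation:
        for (src, act, dst) in conc:
            if src != c:
                continue
            if not any(s == a and x == act and (dst, d) in relation
                       for (s, x, d) in abst):
                return False
    return True

# The transformed model drops the 'cancel' option: behaviour is narrowed,
# never extended, so the original still simulates it.
spec    = [("a0", "pay", "a1"), ("a0", "cancel", "a0"), ("a1", "ship", "a0")]
refined = [("b0", "pay", "b1"), ("b1", "ship", "b0")]
assert simulated_by(refined, spec, {("b0", "a0"), ("b1", "a1")})
```

A "sideways" transformation, by contrast, would typically add or rearrange behaviour so that no such simulation relation exists in either direction, which is precisely why it falls outside a simple chain of refinements.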

The aim of this project is therefore to explore the process of using design transformations, by analysing how different transformations could be represented in terms of CSXMS models.  There are various possible approaches to this:  a very formal approach might lead to what would effectively be a theory of refinement for CSXMS models, while a more practically oriented approach might lead to a way of classifying design patterns in terms of the kinds of transformations of the models that they involve, and hence the kinds of properties of the models that they preserve.