Co-Investigator / Researcher:
Prof. Pierre Dupont
(Université Catholique de Louvain)
Dr. Leon Moonen
(Delft University of Technology)
Dr. Andy Zaidman
(Delft University of Technology)
This is an EPSRC-funded project (EP/F065825/1) that is based at the Department of Computer Science at the University of Sheffield. It will run from 2009 to 2012. For further information please contact one of the investigators.
REGI follows on from the AutoAbstract grant (EP/C511883/1), which ran from October 2005 - December 2008.
Software systems pervade modern life; they control everything from fly-by-wire aircraft and financial transfer systems to ABS breaking systems in cars and cooking modes in microwaves. It is an accepted fact that, as software systems evolve and their requirements change, they become increasingly complex and difficult to maintain. Different developers carry out (sometimes conflicting) changes to address different bugs or add different features, and before long the original design of the system is neglected and it becomes impossible to understand exactly how the system operates.
One way around this problem is to keep a high-level design document that specifies exactly how the system is supposed to behave. A major benefit of developing such specifications along with the actual software itself is the fact that powerful testing and verification techniques exist that can be used to determine whether the actual system conforms to the specification. However, one major problem hampers the routine use of such specifications: as the software system evolves, they become tedious and time-consuming to generate and maintain separately from the code.
With this project we will produce a technique that addresses the above problem by automating the process of generating a specification. Given a system that has no specification (or only a partial one), our technique will analyse and probe the system by both running it and looking at its static structure. It will produce a complete document specifying the behaviour of the system as a collection of state machines. Often, depending on the facet of behaviour that is of interest, it is useful to display certain aspects at a greater level of detail than others (for a financial system for example, the developer might be interested in details sub-prime loan management and not the transaction processing system). For this reason, state machines that are generated will be presented as a hierarchy.
Devising a practical technique to automatically reverse engineer such specifications is challenging; the system in question needs to be suitably sampled to identify relevant behaviour, and the final specification will have to be a valid generalisation of these samples. How do we identify the set(s) of samples that can be used as a basis for generating the specification? How do we go about building an accurate specification of the system from this set of samples? One key realisation that will be investigated in this work is the fact that identical challenges to these arise in the field of grammar inference, where the challenge is to build a grammar (which can be represented by a state machine) from a sample of sentences in the target language. Grammar inference is a very mature field, with many very powerful techniques, but has never been linked to the similar problem of reverse-engineering specifications from software systems. This work will explore the relationship between the two fields of reverse-engineering and grammar inference, and will produce a set of approaches based on and extending existing techniques that, when combined, will produce a practical technique that generates complete and accurate hierarchy of state machines of a software system.
StateChum - our grammar inference framework