JWalk software testing tool suite

Lazy systematic unit testing for agile methods

You are here: JWalk Home / Weblog / 2007 Entries /
Department of Computer Science

The JWalk Weblog, 2007

This is the JWalk weblog for 2007! It charts the rise of JWalk from the prototype version 0.7 to the much more successful 0.8 version, which went live on 9th March. This, the first widely-publicised version, introduced robust empirical techniques for determining whether an object's state changed, or did not change, rather than estimates based on the signatures of methods. It saw the first journal publication devoted to JWalk, starting with a set of mixed reviews (in January) and ending with the Springer publication in Automated Software Engineering in December.

Monday, 3 December 2007

Finally, we know the volume and number in Automated Software Engineering, in which the large JWalk article will appear in print (it has already been published by SpringerLink in their Early View online system - see 8 September this year). The final published citation for this paper is: A J H Simons, JWalk: a tool for lazy systematic testing of Java classes by introspection and user interaction, Automated Software Engineering, 14 (4), December, ed. B. Nuseibeh, (Boston: Springer, 2007), 369-418.

Thursday, 15 November 2007

An undergraduate student, Neil Griffiths, has been working on an integrated Java editor that combines this with JWalk. So far, the work has been slow, but he has now an architecture that allows for the usual kind of syntax-sensitive editing. Watch this project for news about a visual JWalk tool.

Wednesday, 24 October 2007

Prof. Mathai Joseph and colleagues from the Tata Research Development and Design Centre (Pune, India) visited us today. They showed some interest in JWalk; and reported on major challenges that Tata Consultancy Services were facing.

Wednesday, 17 October 2007

Today has finally brought some good news: I have managed to fix the problem in misclassifying functions that are secretly mutators for only some of the time. Now, I don't need to modify a secret transaction-count variable to signal mutations, because it no longer matters if a function appears to change from an observer to a mutator, on different invocations! JWalk now judges each test sequence individually, to determine whether methods in the prefix were observers or mutators, at the point they were executed. There is no longer any need for methods to be classified uniformly as mutating, or non-mutating, as before. I have made a new interim build of JWalk with this fix included.

The fix is to have two kinds of keys into the saved oracles. One is a full key for the actual method sequence, and the other is a short key for a shorter sequence that falls into the same equivalence-class (with no observers in the prefix). As before, JWalk first tries to access previously-stored oracle values using the full key, but falls back to using a short key, if it needs to predict this result. However, when confirming a new result, JWalk always stores this under the long key. This avoids the problem we had with later confirmations overwriting the earlier confirmations. This had been due to the use of a single key (the short key) for all sequences in the same equivalence class.

Monday, 1 October 2007

I came up with one approach that would allow JWalk to infer the correct algebraic status of methods. If a design-for-test condition is imposed that mutators must always modify something (such as a transaction-count, in the case of Wallet), then JWalk can always identify mutators correctly. This is slightly intrusive, but less bother than annotating all methods with their algebraic category. I would still like to find a better solution that did not require this design-for-test condition.

Friday, 28 September 2007

I have now replicated the strange behaviour reported by Arne-Michael, when testing his Wallet class. I have some ideas about this. I think at the moment the mutator method pay(int) is being misclassified as an observer in some early test cases, when by luck it fails to modify the state of the Wallet, after being asked to pay out too much; and then reclassified as a mutator in later test cases. I haven't worked out in detail what the consequences are for oracle confirmations, but it looks as if the later confirmations must be undoing the oracles learned for the early confirmations, which is why the same oracle set involving this method is confirmed repeatedly, upon retesting.

This is a new kind of issue not previously encountered. In early versions of JWalk, we used the method signatures alone to estimate whether a method was a mutator or an observer, supposing that programmers would adhere to clean programming styles (c.f. the style recommended by Eiffel). Later, we admitted the possibility of functional methods that modified the test class as a side-effect. This was because we could now determine independently whether the state of test objects had been modified, or not. Now, we have a new subtlety to the problem, which is that the same method can appear to be a mutator, or an observer, on different occasions. To fix this might require further annotation of methods, to indicate their intention; however, I am unhappy at that prospect, because one of the charms of JWalk is how it infers the specification automatically.

Tuesday, 25 September 2007

I had a message today from Arne-Michael Toersel (Fachhochshule Stralsund, Germany) to say that he'd discovered some unusual behaviour when testing a bank-account style of class called Wallet. JWalk seems to learn a set of test confirmations in the earlier test cycles, which fail to be useful in later cycles, where the tool requests confirmation of similar results again. Furthermore, after confirming these results in the later cycle, JWalk appears to forget the confirmations from the earlier test cycle, when the class is retested. This shouldn't happen! The only thing that should cause repeated re-confirmation is if the test class is producing random results.

Thursday, 13 September 2007

JWalk 0.8 Beta was on demonstration today at the poster session at the industrial and academic testing conference TaicPart '07, held in Cumberland Lodge, Windsor Great Park. Christopher Thomson and I had an A1 poster entitled: Lazy systematic unit testing: JWalk versus JUnit, in which we showed how it was possible to test up to two orders of magnitude more test cases using JWalk's automatic test case construction, than using an expert's manual test cases written for JUnit, for the same invested time and effort.

Crowds gathered round the JWalk stand; and several people promised that they would be downloading and experimenting with the tool. See the JWalk home page for a picture of this event! I may work out eventually how to select a random photo from this set. We were busy from 5.30 to 7.00pm, without a break.

Wednesday, 12 September 2007

I drove down to TaicPart in the small hours of this morning. This is a testing conference, equally devoted to academia and industry. The short paper by myself and Christopher Thomson appears in the proceedings: Lazy systematic unit testing: JWalk versus JUnit; and a poster describing the same piece of work will be on display tomorrow.

Just a word about here: Cumberland Lodge is a fantastic old building, donated by Queen Elizabeth the Queen Mother (wife of King George VI) to an educational foundation, for the building of international trust among young people. Well, we have delegates from as far afield as Texas and New Zealand. Also, the situation in Windsor Great Park is fantastic. Unfortunately, there has been an outbreak of foot-and-mouth disease in the nearby village of Egham, so we are forbidden to walk in the park, in case we spread the disease.

Saturday, 8 September 2007

Today, Springer Online published the first major JWalk article! This is now available from the Early View online section as: JWalk: a tool for lazy systematic testing of Java classes by introspection and user interaction, and will later appear in the December, 2007 printed version of the Springer journal Automated Software Engineering. My thanks to the copy-editing staff for the quick turnaround, after initial proofs were received. This is the first article to read, if you're interested in the details of how JWalk works.

Wednesday, 5 September 2007

Today, Peter Lamb demonstrated his extension to JWalk during the poster session for MSc projects. It works as advertised in his dissertation! One of the best parts was his detailed evaluation of how cycling the order of presentation of test values affects the overall testing process. It seems that doing this blind has the effect of constructing large numbers of initial test cases which fail (due to out-of-bounds values supplied at construction, for example). So, for some classes, testing using an explicit equivalence-partition approach is less effective than testing with the default value generation approach, which allows more, and longer test sequences to be constructed by the tool.

This seems to indicate that searching randomly for the ideal orderings of test inputs is much less effective than supplying these using custom generators. However, this requires more manual effort in designing the right kind of generator to test the equivalence partitions of the class, which goes slightly against the desire for automation. We think more work is needed on this, to find the right level of programmer interaction with the tool to provide suitable values from equivalence partitions.

Wednesday, 29 August 2007

Today, Peter Lamb handed in his dissertation on Extending the JWalk testing tool by addition of category partition testing. Peter has done a very good job here. The main testing assumption of the existing JWalk tool is state-based, namely that faults are mainly introduced by the unexpected interleavings of an object's methods with its state. Peter has considered the alternative testing assumption, that faults may be caused by method arguments being supplied from different equivalence classes, which result in different behaviour.

Peter's solution involves wrapping each standard JWalk test cycle inside an outer loop, which selects different sequences of test values chosen from equivalence classes. Peter has some default partitions and also an automated way of selecting high, low and mid-range values near, or on a boundary. Then, the sets of values are presented in different orders, on each subsequent test cycle. To make this work, a separate saved oracle has to be built for each presented order of values, so a clever indexing system was developed to prevent the number of combinations exploding.

Wednesday, 18 July 2007

Prof. Bashar Nuseibeh, the editor for the Springer journal Automated Software Engineering, has just mailed me with the good news that the first major JWalk article has now been accepted for publication! It will appear in the December, 2007 edition. This paper has been revised twice and has the title: JWalk: a tool for lazy, systematic testing of Java classes by design introspection and user interaction. It's a long paper, giving the background, the architecture of the tool, the test generation algorithms, the different testing modes, examples of generated tests, and a comparison with other current research into automatic specification inference and state-based object-oriented testing.

Monday, 18 June 2007

Today I received confirmation from Silvia Ceballos, Conference Publishing Services Editor for the IEEE Computer Society, that the short paper submitted by myself and Chris Thomson has been accepted for publication at TaicPart '07 and will appear in the IEEE proceedings. This is a shorter version of our original submission, now entitled: Lazy systematic unit testing: JWalk versus JUnit. It compares how much testing can be accomplished using JWalk, compared to an expert testing using JUnit. JWalk wins by two orders of magnitude! The JUnit expert nearly exercises the transition cover of the tested classes. JWalk tests all states and transitions to depth three, in less time than it took the expert to think up the test cases.

Tuesday, 5 June 2007

Aaargh! The long paper submitted to TaicPart '07 has been rejected in favour of a shorter submission. One of the referees was concerned that there might be an overlap with the longer paper submitted to Automated Software Engineering, which in any case hasn't yet been accepted. In fact, the two papers concentrate on quite different areas - one on the architecture of the tool, and the other on testing experiences using JWalk in competition with JUnit. Oh well, Chris Thomson and I shall have to rewrite the ten-page paper as a shorter one, with all the salient results in a single table. Must remember to cite submitted work as well as published work, in future.

Friday, 18 May 2007

Today an improved release of JWalk 0.8 Beta went live! This includes the bug-fix for reporting failed test cases at the end of the cycle, and also incorporates the improvement suggested by Dimitris Dranidis to the LazyOracle, so that both the last pass-result and fail-result may coexist in a testing cycle. Previously, only one of these results would be used (the pass-result, or fail-result, if the test had previously passed or failed). This saves re-training oracles to recognise a correct result again after incorrect results have also been confirmed. It was actually easy to offer this improvement, since the LazyOracle already saved pass- and fail-results separately. The other change is that failure to detect all states in the anticipated state-space now dispatches a DynamicAnalysis report before raising the StateSpaceException. This allows the human tester to see more easily which of the states have, and have not, been discovered so far.

Thursday, 17 May 2007

Finally I had some time to look at the problem with listing the failed tests again at the end of a test cycle. Test failures are certainly being indicated interactively, but the tool is supposed to list all failures, including those that were confirmed automatically, at the end of the cycle.

Eventually, I tracked the fault down to the last refactoring of the code (finished on 9 March). The dispatchTestFailures method was iterating correctly over all failed tests, but it was calling dispatchTestReport with a lower priority than expected, because the test-results had been automatically confirmed. Normally, this allows a ReportListener to choose to ignore the report (because it does not need manual verification). However, in this case, we do want the ReportListener to print the failed test again. The fix was simple: dispatch the TestReport without any indication of its having been confirmed. It is then treated just like an event for manual inspection, with a higher priority, so is displayed to the human tester.

Friday, 11 May 2007

The JWalk seminar yesterday went down so well, that I was invited to give a guest lecture at City College, to the second-year Java programmers learning about algorithms and data structures. I did a series of tests on the standard Stack example, showing the protocol-waling, algebra-walking and state-walking modes. We also did algebra-testing and state-testing, based on the oracle developed earlier. The latter tests large numbers of test-cases, predicted from the earlier algebraic information captured by the test oracle. The list of failed tests is still not being repeated at the end.

Thursday, 10 May 2007

I was in Thessaloniki today, visiting the South-East European Research Centre. I gave a seminar on JWalk, entirely conducted from the website pages and by running example tests. Everything worked much as expected. We found we could capture the output from the command-line utility version of JWalk inside a basic TextPad editor, which is another way of integrating JWalk with third-party tools. This worked for the exploring modes, but did not work with the interactive testing modes, for which we had to open a proper DOS command console and interact with the tool.

When performing testing with oracles, the failed test cases are all supposed to be listed again at the end of the test. I found that this list was not being displayed properly. This may be a fault in JWalk and I need to investigate it when I get back to Sheffield. Also, Dimitris Dranidis suggested an improvement to the way LazyOracle recalls previous saved results. At the moment, it remembers the last saved result for each test sequence. If it retained the correct and incorrect results independently, then it would not have to be retrained during mutation testing. An oracle could be constructed for the correct specification, then prompt the tester to confirm incorrect results after a mutation, then automatically check the corrected object. Previously, the correct answers had to be re-confirmed. This is an obvious piece of maintenance to perform on LazyOracle.

Wednesday, 2 May, 2007

Today, Ben Solway handed in his undergraduate project dissertation. This is the project in which we had prototyped the orthogonal predicate system for detecting object states, which I first introduced in JWalk 0.8 Alpha (see earlier blog for 23 January). As a result of this change, the completeness check after a state-space search had to be reconsidered. Ben's work deals with this issue, as described in the blog for 25th April. In brief, every detected predicate must have returned true or false at least once, during a state-space search, otherwise JWalk thinks that there are undetected states. We released a minor upgrade to JWalk 0.8 to reflect the new failure-detection policy.

A consequence of this change is that mutation-testing with fault-seeded objects may behave differently. Before, in JWalk 0.7, test reports would simply detect fewer states and the programmer would notice, for example, that a simple Stack never left its Empty state, if the stack counter was not incremented properly. Now, the fact that the predicate isEmpty never returns false is treated as a failure to discover the object's state-space. So, client programs should treat the raising of a StateSpaceException as an indicator of a possible fault with the tested class, rather than just the failure of JWalk to discover all of an object's state space. We will address this in a future release.

Wednesday, 25 April, 2007

Since October, undergraduate student Ben Solway has been developing a parallel version of JWalk 0.8, with the brief to design the new way in which the state space of the class is discovered. This resulted in the new strategy (reported below in March), in which JWalk uses the product of all outcomes of state predicates to predict the maximum number of distinct object states. For example, a ReservableBook may independently return true or false for its two predicates isOnLoan and isReserved, resulting in four distinct states: Default, OnLoan, Reserved and Reserved&OnLoan. As a consequence, when predicates are not independent, because they measure the same dimension, such as a Stack's predicates isEmpty and isFull, JWalk will continue to search for a possible state in which these predicates are both true. The only way to halt is to fail gracefully after a certain search depth is reached.

This affects how JWalk judges whether its state-space search has failed. Before, in JWalk 0.7, every predicate had to be disjoint, with the possibility of a single Default state in which no predicate returned true. Non-disjoint errors could be detected immediately and reported, during the state-space search. Now, there is no disjointness requirement and the completeness check can only be carried out once the search has terminated. The completeness check is now only approximate, since it cannot know which combinations of predicates should return true at the same time. The heuristic is that every predicate must have returned true or false at least once. If a predicate only returns one boolean value constantly, then clearly there is an intended state which has not been detected.

Friday, 20 April, 2007

Today, Chris Thomson and I submitted our finished JWalk paper to the TaicPart conference call for papers. The title is: "Lazy Systematic Unit Testing with JWalk". The abstract reads:

"Lazy systematic unit testing is a rapid method for testing and re-testing object-oriented classes, in which automatically generated functional test sets evolve in parallel with the intended design of the class. Tests are compiled from state-based specifications that are inferred on-the-fly from the tested unit, using hints from the programmer and predictive rules. When a class is modified, or extended by inheritance, the testing tool JWalk regenerates a test set from scratch, that is complete according to the chosen coverage criterion, automatically executing any old tests that are still valid, but constructing new tests that exercise all novel paths and changed specifications. When compared against expert testing with JUnit, the JWalk tool produces complete test sets more quickly and provides better state and transition coverage when retesting modified and extended classes, incrementally building test sets containing many hundreds of functional tests, for relatively little effort."

Friday, 13 April, 2007

Today I submitted an abstract for the forthcoming TaicPart conference (Testing: Academic and Industrial Conference, Practice and Research Techniques). The paper will be about a competition between JWalk and JUnit testing. The JUnit side will be covered by Chris Thomson, who helped to integrate JWalk with the IBM Eclipse environment earlier. This should be interesting!

Friday, 9 March, 2007

JWalk 0.8 Beta went live today, at 17.00 hrs! We are very pleased with this release, as it significantly improves on the performance of earlier JWalk versions. Mutation sequences are detected dynamically, according to whether the test object's state is really modified. Previously, they were predicted from void and non-void signatures. Algebraic exploration and testing use the new mutation sequences to drive the test object into new states; likewise the state discovery phase of state-based exploration.

The new strategy for inferring design states detects a novel state for each distinct boolean product vector obtained from the predicates defined for the test class. This allows predicates on orthogonal dimensions to overlap, while predicates partitioning the same dimension still work as expected, except that they never return true simultaneously. This means that searching must continue until the depth limit is reached. In the next minor release, we will allow such parameters to be set through the JWalker API.

Tuesday, 6 March, 2007

We have settled on a strategy that implements the execute() method as a template method. It is redefined only when major algorithmic changes occur, such as prior dynamic analysis of states, or loading and unloading oracles. The main subroutines inspectClass, executeTestSeries and executeTestCycle are shared by many classes. The latter is the only method that needs to be redefined for the different testing modes, for example, if different tallies are kept of successful and failed tests, or if the results are confirmed through oracles. The same executeTestSeries method now works for all of the different test strategies.

Another change between version 0.8 and 0.7 is in standardising on the meaning of discarded paths. Earlier, we realised that the previous algorithm actually counted the number of failed edges, rather than total number of discarded test sequences, and this varied in the algebraic and other approaches (because the algebraic approaches grew fewer edges in the first place). So, we have now made this all the same. The discarded test sequences now also include discarded paths containing observers in their prefixes, for the algebraic approaches. This yields apparently more discarded paths, since we are now also summing the pruned edges over all depths, rather than just counting the current pruned edges.

Thursday 3 March, 2007

The next version JWalk 0.8 Alpha is going through a major refactoring at the moment. All the different TestStrategy classes currently redefine the execute() method wholesale. This seems inelegant. The notion of growing test sets and evaluating them in cycles is so generic that it would be much better to be able to factor this out. The main differences are: mutation sequences only are grown during algebraic exploration and state-space discovery, compared to all sequences for protocol-exploration and state-based testing; the initial test cycle is the state cover for the state-based approaches and a set of constructors for the other approaches; and the state-based approaches refer to their starting state, while others do not.

As a basis for this, the core algorithms provided by ProtocolExplorer must be revisited to make them generic enough to accept testing strategies that refer to states, as well as those which do not. The algorithm for extending the previous test cycle to create the next cycle must be allowed to distinguish between all method sequences and mutator-method sequences. In the Alpha revision, we had two separate methods; but this is being replaced by one method and a predicate which decides whether old sequences are prunable. The isPrunable method returns true if a sequence terminated (raised an exception), but can be redefined for algebraic exploration to signal that sequences ending in observers may also be pruned. The predicate is sensitive to a state variable in the state-exploring modes, and prunes observer sequences only during the state-space discovery phase.

Monday 26 February, 2007

I finally finished the thorough-going revision to this website, in support of the older, stable JWalk 0.7 Beta. Earlier, I had rewritten the download page and the user guide page to accommodate the org.jwalk packaging of the tool. Over the week, I also added many of the old test examples to the examples page, including examples of exercising the Java kernel classes, examples of interleaving local and inherited methods, examples of testing third-party software and finally some mutation-testing, in which JWalk is trained on correct software and then oracles detect all faults in fault-seeded versions of the same class. That's right - all the faults were detected! It took me a while to fix the inside JWalk section of the site, since this required using javadoc with the right settings to pull in package-level information (as well as public information). I edited the customising page, to reflect the removal of reset() from the Generator interface, which makes customisation even simpler. I may add a few more custom generators to the supplied set with JWalk 0.8. And last of all, I updated this weblog, which had been left unattended for a while.

Friday 23 February, 2007

Only just finished the resubmission for Automated Software Engineering in time for this weekend's deadline. As part of the revision, I had a new idea, which was to judge how good JWalk is in terms of "oracle efficiency", that is, the percentage of testing that can be automated for each manual intervention. The figures are quite satisfying - as the oracle set grows, more and more of the testing process can be automated, until well over 90% of the cases are automatically validated on each longer cycle. Obviously, this reaches 100% if you retest to the same depth. This argument is worth pushing!

Friday 9 February, 2007

Finished re-structuring the introduction and writing the survey of other approaches (JCrasher, ROSTRA, SYMSTRA, Java Pathfinder; specification miners like DAIKON, Agitator; algebraic inference by Henkel and Diwan, etc.) Now, I need to present JWalk more succinctly, then give references to more examples of usage than before.

Monday 5 February, 2007

Finally free of exam scripts! Right now, I'm busy revising the JWalk paper for Automated Software Engineering. I'm about to upload a bunch of test examples on longer cases than the examples previously published, in support of this paper. I hope this will prove sufficient to answer the referees' concerns.

Thursday 25 January, 2007

Exam scripts have started pouring in - will have to suspend work on JWalk 0.8 for the moment.

Tuesday 23 January, 2007

A version of JWalk 0.8 Alpha was tested today. This is a major release and significantly improves on the functionality of version 0.7. The detection of observer and mutator sequences is based on whether the methods actually do, or do not, modify the state of tested objects, rather than on predictions made from the type signatures of the methods. This means that methods may return a result and also modify the object. Likewise, void-methods may happen to have no side-effects, in which case they are ignored in mutator sequences.

The state-space discovery algorithm has also been changed to support both orthogonal and non-orthogonal partitioning of the object's state space using natural predicates owned by the class. So, for example, a LibraryBook class may have predicates isOnLoan and isReserved, from which JWalk will infer four qualitatively different states, which are named: Default, OnLoan, Reserved and Reserved&OnLoan, corresponding to all possible products of boolean outcomes. I still need to clean up the website for the stable JWalk 0.7 Beta release, however. Will I be able to make the deadline for the revised JWalk paper?

Monday 15 January, 2007

I have discovered that JWalk's technique for predicting test results for equivalence-classes of test sequences bears a distant similarity with other work on partial order reduction. The work by Visser and co-workers on Java Pathfinder is interesting to compare with JWalk. This tool executes bytecodes like an imperative program, and tries to collapse the sets of execution traces using partial order reduction. Traces that converge allow the merging of states (from which they start). Essentially, JWalk does partial order reduction on the signatures of test-sequences, based on an assumption about the mutator/observer nature of the operations.

Thursday 11 January, 2007

I have found some different correspondences between JWalk's different exploring modes and the notion of state abstraction in the work of Tao Xie and his co-workers. His overall goal is similar: automated software testing in the absence of specifications. I found some similarities in state-abstraction with his earlier tool ROSTRA. This appears to have been subsumed in work on another tool SABICU, devoted to object-oriented unit testing. JWalk's protocol-walking and algebra-walking modes are similar to ROSTRA's "all method paths" and "all mutator paths" modes. JWalk's state-walking is quite different, depending on a partitioning of the class's variable space into design states, based upon natural predicates in the class's interface.

Monday 8 January, 2007

I have been finding some similarities between JWalk's exploring modes and the work by Christoph Csallner and Yannis Smaragdakis on their tool JCrasher. In JWalk's protocol-walking and algebra-walking modes, the tool signals exceptions, like JCrasher. JCrasher is a "robustness-testing" tool, and therefore expects software to be robust, and exceptions to be worthy of investigation. It uses heuristics to classify raised exceptions into genuine faults, or violations of the unit's preconditions. JWalk is a "conformance-testing" tool, so is interested in all results, whether exceptional or not. When trained with oracles, JWalk can also detect a semantic faults, which are not exceptions. The original JCrasher paper appeared in Software Practice and Experience.

Friday 5 January, 2007

Started working on JWalk 0.8 today. This next release will have a different rule for inducing the state-space and an improved rule for deciding the difference between (algebraic) observers and other kinds of operation. The trick is to exercise function-methods to determine whether they do, or do not have side effects. If they do, they must also be added to the set of mutator-methods. The new state-space rule will be based on the products of boolean values returned by predicates. This will allow programmers to behave in the most natural way, with regard to adding predicates to classes. At the moment, while no predicate is required for the Default state, child class predicates must explicitly partition the states of the parent class. If they don't, this is reported as a violation of mutual exclusion. The current rule is motivated by the desire to distinguish state refinement from ill-formed state predicates.

Back to earlier entries...
Regent Court, 211 Portobello, Sheffield S1 4DP, United Kingdom