Many software applications have “macro” capabilities, allowing users to record keystrokes and then replay them verbatim to save time on repetitive tasks. Programming by demonstration (PbD) is a more sophisticated technique in which the application translates the semantics (i.e., the meaning) of users’ actions into programming language notation. Because the PbD engine generalizes the recorded actions to a functional level, the resulting programs are more than rote sequences of keystrokes. This can help nonprogrammers robustly automate their computer-based tasks.
In most existing PbD systems, user actions are based on physical phenomena such as maze traversal, but automating such game-like tasks is of limited interest to most computer users. One user group that could benefit from task automation without programming is molecular biologists, who analyze large genetic datasets ("bioinformatics") but typically have no programming skills. Visual workflow systems help automate bioinformatics analyses, but through empirical studies I identify five significant barriers to their use.
I propose here that a domain-savvy PbD system can mitigate these barriers by inferring data analysis workflows from molecular biologists' Web browsing sessions. I call this technique Workflow by Demonstration (WbD). The success of this approach depends on replacing the typical physical-phenomenon model of PbD with a biology-specific data model and hypertext interface. Emerging informatics standards ("Semantic Web" technologies) facilitate the use of common data models across different data providers on the Web. Because many molecular biology resources are Web-based, this work uses Semantic Web technologies to support WbD.
Biologists were given pen-and-paper workflow design tasks, which revealed the types of data flow they intuitively understood. These data flow types defined the workflow "code" a WbD system should support, and the hypertext demonstration actions corresponding to each were then modeled. A browser, Seahawk, implements these action-to-code mappings. User studies evaluating Seahawk show that biologists could 1) demonstrate Web-based analyses for realistic tasks, 2) understand the automatically generated workflows, and 3) use those workflows in the workflow environment Taverna. This suggests that WbD is a viable technique for bioinformatics. Although the data model was biology-specific, the underlying semantic technologies are domain-agnostic. The techniques described here may therefore be applicable to novice programmers in other domains.