Polymorphic task decomposition is an extension of standard task-action grammars. It is a view of UI design that focuses on the actions triggered by user activity: on this, do that.
What follows is a reflection on polymorphic task decomposition, written back in 2003, looking at the original paper by Savidis, Paramythis, Akoumianakis and Stephanidis in 1995. Whilst I do take issue with parts of their paper, it is still an interesting and valid approach to the problem of modelling accessible/adaptable user interfaces. Their paper is listed on my bibliography page and I recommend reading it (requires access to the ACM Digital Library).
The comments were written to help crystallize both the similarities and the differences between the approach of Savidis and my own work.
Savidis’ idea is an extension of standard task-action grammars. It is a view of UI design that focuses on the actions triggered by user activity. The example he uses is “Delete File”. Having identified these core actions, they are then functionally decomposed into simpler activities, with a notation to express sequence, repetition, etc. See Figure 1 (repeated from his paper) below.
The figure shows a tree hierarchy with Delete File at the top, splitting into two branches called Direct Manipulation and Modal Dialogue. Each of these also bifurcates: direct manipulation into Select File and Select Delete (which is associated with a sketch representing two ways to select); Modal Dialogue into Select Delete, Select File, and Confirm Delete (select delete is associated with a sketch representing two further ways to Select Delete).
Savidis would call this diagram multi-modal and polymorphic. It is multi-modal in that there are many ways to accomplish the same task: direct manipulation and a guided modal dialogue. It is polymorphic in that some actions decompose into alternate representations and associated interactions. It is at this polymorphic level that adaptability for specific user impairments is noted, e.g. scanning versus point-and-click for ‘Select Delete’ in Figure 1. Not mentioned is that this is also the case at the modal level, where a complex modal dialogue may be inappropriate for some users with cognitive impairments (especially on a small screen, where the dialogue may obscure the graphical context of the action).
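As a concrete illustration (my own sketch, not notation from the paper), the Figure 1 hierarchy could be captured as a small data structure in which a task either decomposes into an ordered sequence of sub-tasks or offers alternative, polymorphic styles:

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    """A node in a polymorphic task hierarchy (illustrative only)."""
    name: str
    sequence: list["Task"] = field(default_factory=list)      # ordered sub-tasks
    alternatives: list["Task"] = field(default_factory=list)  # polymorphic styles

# The 'Delete File' example from Figure 1, with its two modalities.
delete_file = Task("Delete File", alternatives=[
    Task("Direct Manipulation", sequence=[
        Task("Select File"),
        Task("Select Delete", alternatives=[
            Task("Point and click"), Task("Scanning")]),
    ]),
    Task("Modal Dialogue", sequence=[
        Task("Select Delete", alternatives=[
            Task("Point and click"), Task("Scanning")]),
        Task("Select File"),
        Task("Confirm Delete"),
    ]),
])

def leaf_styles(task):
    """Enumerate the concrete interaction styles reachable from a task."""
    if not task.sequence and not task.alternatives:
        return [task.name]
    styles = []
    for t in task.sequence + task.alternatives:
        styles.extend(leaf_styles(t))
    return styles
```

Walking the tree with `leaf_styles(delete_file)` surfaces both scanning and point-and-click leaves under each modality, which is exactly where the adaptability sits.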
Within this task hierarchy, tasks are either abstract or concrete. Abstract tasks, such as “Select”, are those capable of polymorphism. Savidis argues for a process of maximizing abstract tasks in the hierarchy to maximize polymorphic capability and, through this, to maximize inclusion (my word). To that end, he recommends revisiting first-pass task hierarchy designs with a view to abstraction.
For each mode/style of a task, the properties of the user and the action are identified. See Figure 2 below (repeated from his paper). Of most interest to me is the list of properties that the user must possess, or that are important in the effective use of the mode. Example classes of properties given are: “general computing expertise, domain-specific knowledge, role in an organizational context, motor abilities, sensory abilities, mental abilities”. Savidis does not define a finite list of properties, but simply states that “the broader the set of values, the higher the differentiation capabilities among individual end-users”. This is essentially my argument about my capability model (and my deeply unloved ATNAC paper) [which I will describe in a later blog post], except that I would argue that for any particular model of user capability, the property list is finite. But only for that model.
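A minimal sketch of that idea, with property names entirely of my own invention: each polymorphic style declares the user properties it requires, and the styles a given user can operate are filtered against a simple user model.

```python
# Illustrative only: property names are mine, not from the paper.
STYLE_REQUIREMENTS = {
    "point-and-click": {"fine_motor_control": True},
    "scanning":        {"single_switch_input": True},
}

def applicable_styles(user_properties):
    """Return the interaction styles whose requirements the user meets."""
    return [style for style, needs in STYLE_REQUIREMENTS.items()
            if all(user_properties.get(k) == v for k, v in needs.items())]

# A user with a motor impairment but a switch device:
print(applicable_styles({"fine_motor_control": False,
                         "single_switch_input": True}))  # ['scanning']
```

The broader the property vocabulary, the finer the filter can discriminate between users, which is the differentiation point Savidis is making.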
Savidis proposes a runtime model with three subsystems. See Figure 3 below. The user modelling subsystem approximates to my capability + preference models, plus inferred user capability at runtime. The design logic subsystem applies the rules identified for each node in the task hierarchy, guided by the user information server. The dialogue control subsystem is responsible for rendering the interaction specified in the tasks selected by the decision making module.
Savidis goes on to discuss how this fits within the process of UI design and requirements capture, placing his work (the Unified Interface Design Method) in the context of Design Rationale and Design Space Analysis.
3 How this relates to my work
How Polymorphic Task Decomposition (PTD [my acronym]) relates to my work really depends upon your starting point.
PTD starts with a verb. My work starts with a noun.
PTD starts with a verb because it starts with an implied data structure that is navigated and manipulated as a result of tasks, which may decompose into sub-tasks, undertaken on behalf of the user. The ‘Delete File’ example given in the paper is an example of this. This is a very structured analysis/structured design view of the world, almost straight out of an Ed Yourdon or Tom DeMarco book (both key players in structured design in the 1980s), where you have a module calling tree that manipulates content in an entity-relationship diagram. As this paper is from 1995, perhaps that’s not so surprising.
3.1 UI as Tree
PTD views the UI as a giant tree of tasks and sub-tasks, some mutually exclusive. See Figure 4 below (taken from the paper).
Figure 4 is used in the paper to consider the impact of polymorphism at different levels of the task tree. As a practical example of such a task tree, consider Microsoft Word.
Word works with a top-level (mostly fixed) menu bar, context-sensitive options on mouse buttons, and keyboard short-cuts. Selecting, say, the file menu, gives a sub-menu and context-sensitive short-cuts. Word, on the Mac at least, has a “send to” option in the file menu to email the current document, or to send it directly to Excel.
Figure 4 is the inverse of the Word menu hierarchy. Because PTD is task-based, the top-level tasks are largely the leaves of the menu tree. The top level tasks in the cut-down example of Word are “Email document” and “Send document to Excel”; this is level A in the hierarchy above.
Taking “Send document to Excel” as our top-level action, the next level of task is a multi-modal choice between a direct keyboard short-cut, and modal menus (like in the ‘Delete file’ example in the paper): “Send by keyboard shortcut”, “Send by modal menu”.
Taking “Send by modal menu” as the task to decompose, Microsoft’s designers require the user to choose between kinds of thing to send to. There are no short-cuts at this level – the user must stick with modal menus.
To have selected “Send by modal menu”, the user must first have selected “Select perform actions on file”. So “Send by modal menu” sits in a sequence of 3 tasks: “Select perform actions on file”, then “Send by modal menu”, then “Send document to Excel”. It is that ordering of tasks that drives the rendering of the interface.
At each node in the tree, the relevant user properties are then identified, particularly those where alternate modalities (e.g. keyboard short-cuts and sub-menus) are identified. It is the properties identified here that are used to select appropriate rendering. In this case, both modalities are rendered, with reminders of the short-cuts rendered on menu-items.
PTD allows for multiple alternate renderings of any task node, but constrained to the one task hierarchy given. It can be different views of the same content; it cannot be different task-level navigation (that would be an alternate modality, which would be expressed in the diagram explicitly). This becomes an issue if you need to re-organize content for a particular user or a particular device, say one with a smaller screen: whilst the content being navigated is stable, the task hierarchy needs to be dynamic.
PTD does in fact allow for relatively complex rules about order and repetition, but it isn’t designed to cope with dynamic rule-based generation of navigation structure. And that problem is the old Structured Analysis/Structured Design problem of defining program control almost independently of data structure: change the data structure slightly and the whole calling tree falls apart. Change the requirements slightly (e.g. splitting content between two pages) and the whole task tree falls apart.
3.2 UI as Association
My own work [with a project name of Carnforth] has a view of the UI as a navigable relational model of content. The current model is in four layers of increasing abstraction:
content –> semantics –> navigation –> exception
The content layer inventories tangible content that may be expressed through the UI.
The semantics layer describes the relationships between content. It also identifies equivalent content, precedence, and longevity. For example the content layer may contain text elements, and the semantics layer may describe the elements as an ordered list or as the content of a table (or its headings). In terms of comparison with PTD, equivalent content is of note. Because PTD is about actions, there is no obvious way to identify equivalent content except by alternate modes. Because of the tree nature of the PTD hierarchy, this means that each time alternate content is added, a completely new branch starts in the tree, making it difficult and cumbersome to replace content within the tree without repeating parts of the original. I can imagine allowing branches to merge and allowing more complex dependencies between siblings within the tree, but there are no examples in the paper (or anything to say that it is not possible).
The navigation layer describes navigable transitions across the semantic layer. For example, the semantic layer may contain an ordered list of text and a set of bitmaps, with the navigation layer containing navigation associations from list elements to specific images to display. The navigation layer may also identify groups of content (“pages”) where user “selection” of elements within the groups may infer navigation transitions to another group (“hyperlink”). The meanings associated with the navigation associations have a counterpart in PTD’s “tasks”.
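The list-to-image example above might be sketched like this (element and page names are mine, purely illustrative): content elements, a semantic grouping, and navigable transitions between groups (“pages”), from which “hyperlinks” are inferred.

```python
# Content layer: tangible elements (illustrative names).
content = {"thumb1": "bitmap", "thumb2": "bitmap",
           "img1": "bitmap", "img2": "bitmap"}

# Semantic layer: thumb1 is an equivalent (reduced) form of img1, etc.
equivalents = {"thumb1": "img1", "thumb2": "img2"}

# Navigation layer: pages group content; selecting an element within a
# group infers a navigation transition to another group ("hyperlink").
pages = {"gallery": ["thumb1", "thumb2"],
         "view1": ["img1"], "view2": ["img2"]}
transitions = {("gallery", "thumb1"): "view1",
               ("gallery", "thumb2"): "view2"}

def select(page, element):
    """Infer the navigation transition for a user selection, if any."""
    return transitions.get((page, element), page)  # stay put if no link

print(select("gallery", "thumb2"))  # view2
```

The point of the separation is that `pages` and `transitions` can be regenerated for a different user or device without touching `content` or `equivalents`.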
PTD tasks describe actions on events that cause UI rendering.
PTD’s rendering rules are described on a per-action basis.
It is possible with PTD to describe additional polymorphic tasks at any node of the task hierarchy without affecting navigation through the overall tree.
Carnforth associations describe possible transitions between UI renderings.
Carnforth’s rendering rules are described on a content basis.
An action in Carnforth is a function of the transition:
Action = Transition(SourceContent, TargetContent)
Many transitions may map to the same action.
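That many-to-one mapping can be shown in a few lines (content names invented for illustration):

```python
# Transitions keyed by (source content, target content), each naming an
# action; many transitions may map onto the same action.
transition_actions = {
    ("list", "detail_1"): "open-item",
    ("list", "detail_2"): "open-item",   # different transition, same action
    ("detail_1", "list"): "back",
    ("detail_2", "list"): "back",
}

def action(source_content, target_content):
    """Action = Transition(SourceContent, TargetContent), as a lookup."""
    return transition_actions.get((source_content, target_content))

assert action("list", "detail_1") == action("list", "detail_2") == "open-item"
```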
It is possible with Carnforth to add/remove additional transitions at will so long as the semantic meaning of the rendered content is observed. Since “pages” are defined at navigation level, this means that, say, a website’s navigation may be automatically reconfigured to suit a user’s (or a device’s) capabilities without a loss or change of semantic meaning.
Carnforth also possesses an exception layer, its highest level of abstraction. Exception handling may also exist in PTD as explicit sub-tasks (as an exclusive alternative to the standard action) or within the low-level rendering of the UI (PTD allows detail to be hidden within tasks if required).
Carnforth’s exception layer allows for different exception handling strategies to suit individual users and contexts. For example, if a transition in the navigation layer fails, say a broken link on a web page, then a strategy appropriate to the user may be selected. If the page is being rendered in audio for a blind user, then perhaps a message is spoken but the existing page is retained. If the page is being read in a mobile environment, then perhaps a “Retry? Yes/No” dialogue is offered (in case the user’s train went into a tunnel, say). The typical default case may be to return a “404 error” page. In each case the transitions on the navigation layer are augmented/replaced/removed.
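A sketch of that strategy selection, with strategy names of my own choosing:

```python
# Illustrative only: per-context exception strategies for a failed
# navigation transition (e.g. a broken link).
def broken_link_strategy(context):
    """Choose how to handle a failed navigation transition."""
    if context.get("modality") == "audio":
        return "speak-error-keep-page"    # blind user: don't lose the page
    if context.get("mobile"):
        return "offer-retry-dialogue"     # train went into a tunnel?
    return "render-404-page"              # typical default

print(broken_link_strategy({"modality": "audio"}))  # speak-error-keep-page
```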
3.3 Server Technologies
A UI does not live in isolation, and provides only one part of the overall application. In the case of the Web, and many data-intensive applications, e.g. Lotus Notes, the UI forms the client end of a client-server architecture.
Client-server architectures pose a problem for adaptable/adaptive interfaces. Taking Web pages as an example, a number of architectures exist where the server side dynamically generates web pages for presentation on the client side. In this arrangement, the server-generated pages tend to make buttons and hyperlinks fetch a new page on selection e.g. the ‘Google Search’ button on Google causes a form to be sent to the server resulting in a new results page being returned to the client. This corresponds directly to concrete tasks in the PTD task hierarchy. This makes it very difficult at the client side to adapt content, which traditionally is where assistive technology is located.
For Carnforth, a change in the architecture of client-server web pages is required. The client (assuming AT stays in the client) needs to be able to query for content from any of the four layers, allowing the server to encapsulate resolution of that request (the internal data structures of the application may be radically different). For existing web technology, this means AJAX-like applications with more intelligence in the client pages.
This also means that the idea of a “task” is no longer the same as a page refresh (which is the classic HTML view), although the effect may often be the same. Tasks are instead inferred from a transition signature: the originating page, the target page (assuming an XMLHttpRequest) and the provided parameters from the encapsulating HTML form. The originating and target “pages” represent nodes in the navigation layer.
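A hypothetical sketch of such a signature (the function and names are mine): two submissions of the same form differ only in parameter values, so they infer the same task.

```python
# Illustrative only: a task identified by its transition signature.
def task_signature(origin_page, target_page, params):
    """Identify a task by origin, target, and the *names* of its parameters."""
    return (origin_page, target_page, tuple(sorted(params)))

# Two searches with different query values are the same task:
a = task_signature("search", "results", {"q": "cats"})
b = task_signature("search", "results", {"q": "dogs"})
assert a == b  # same parameter names => same inferred task
```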
3.4 PTD existing within Carnforth
For all the modelling of relations within Carnforth, dialogue sequences must and do exist within the UI. And if they exist, they need to be designed.
There are two places where this makes sense within Carnforth.
One is in the design and description of interaction metaphors in my Metaphor domain [the PhD developed a range of relational models, each representing a problem domain]. In this case it is defining “patterns” to be applied to an abstract UI.
The other is in the “prototyping process”. There is always a question with these sorts of tools and techniques about how they would be used in practice. My working idea has always been: hard-code a web page/user interface, throw it at my four-layer model, and augment the model with sufficient content, semantics, navigation rules, and exception patterns that it becomes adaptable. The more you do, the more accessible the finished product. That initial prototyping calls out for PTD, with or without the polymorphism component (but better with).
In practice, I do wonder if PTD is ever likely to be used. The WYSIWYG view of the world is deeply ingrained in web/UI developers, and it is difficult to see people stopping to consider task hierarchies, when tools such as Dreamweaver encourage top-down design of menus and layout.
4 Reflection on PTD in 2016
I can’t say that I see a great deal of polymorphic task decomposition in 2016; then again, I don’t see many people using my CISNA model either (which is what this work evolved into). As I said in the last section above, designers and developers have a very ingrained way of working, and their idea of adapting content for disabled users is to throw some alternative content into a very fixed hierarchy of content. The best we see is probably Responsive Design, where designers adapt layout, and occasionally add/remove content, based on screen size. Hopefully Responsive Design will become the thin end of the wedge into adaptive user interfaces.