Evolution of Selenium

As we made strides toward automating web testing at work, I caught myself asking an innocuous question: what incremental improvements could be made to Selenium UI-Element? Defining elements, especially for more complex pages, takes considerable effort, and I wondered whether some simple enhancements could make the task a bit easier.

But then I stopped myself. Why not think big? What is the true potential of an idea like UI-Element in the framework of Selenium?

Firefox ships with the SQLite database engine, which extensions can make use of, as Google Gears has recently demonstrated. With this kind of storage it would be fairly straightforward to capture a snapshot of the DOM tree at each step while recording a test with the IDE. In fact, we wouldn’t really call it recording at all. We’d call it training.
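A minimal sketch of what such a training store might hold, assuming one row per recorded step containing a (command, target, value, DOM tree) tuple plus a sequence number, so that a session can be replayed in order. Inside Firefox an extension would reach SQLite through the browser’s own storage interface in JavaScript; Python’s standard `sqlite3` module stands in for it here, and the table layout and the functions `open_store`, `record_step`, and `replay` are hypothetical.

```python
import sqlite3

# Hypothetical schema: one row per recorded step, ordered by a
# per-session sequence number so the session can be replayed later.
SCHEMA = """
CREATE TABLE IF NOT EXISTS steps (
    session_id INTEGER NOT NULL,
    seq        INTEGER NOT NULL,
    command    TEXT NOT NULL,
    target     TEXT,
    value      TEXT,
    dom        TEXT NOT NULL,   -- serialized DOM snapshot, e.g. outerHTML
    PRIMARY KEY (session_id, seq)
)
"""

def open_store(path=":memory:"):
    """Open (or create) the hypothetical training store."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn

def record_step(conn, session_id, command, target, value, dom):
    """Append one (command, target, value, DOM tree) tuple; return its seq."""
    (next_seq,) = conn.execute(
        "SELECT COALESCE(MAX(seq), -1) + 1 FROM steps WHERE session_id = ?",
        (session_id,),
    ).fetchone()
    conn.execute(
        "INSERT INTO steps VALUES (?, ?, ?, ?, ?, ?)",
        (session_id, next_seq, command, target, value, dom),
    )
    conn.commit()
    return next_seq

def replay(conn, session_id):
    """Return a session's steps in the order they were recorded."""
    return conn.execute(
        "SELECT command, target, value, dom FROM steps "
        "WHERE session_id = ? ORDER BY seq",
        (session_id,),
    ).fetchall()

# Usage: record two steps of one session, then read them back in order.
conn = open_store()
page = "<html><body><input id='user'/></body></html>"
record_step(conn, 1, "open", "/login", "", page)
record_step(conn, 1, "type", "//input[@id='user']", "alice", page)
print([row[0] for row in replay(conn, 1)])  # ['open', 'type']
```

Storing the full snapshot with every step is wasteful, but it keeps each step independently analyzable, which is what the later distillation needs.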

I think an expert system could rapidly induce the mapping between object labels and object finders (in the case of UI-Element, a finder is a function returning an XPath). Here’s what that process might look like:

  1. The tester provides data to the system by recording a large number of interactions with the application, covering all of its features and workflows. The system records (command, target, value, DOM tree) tuples in order, so that the entire recording session can be reproduced if need be, and persists all of this data in a datastore.
  2. The system distills the data, with the following goals:
    1. Derive a list of all elements interacted with
    2. Associate each element with the set of pages it appears on
    3. Associate each element with the set of commands enacted upon it
    4. Associate each element with a set of values related to it
    5. Create a graph of the common workflows
  3. The system now solicits more user input. For each element found above, it redisplays pages containing the element (to refresh the user’s memory and to serve as a reference) and asks the user wizard-style questions like the following:
    1. What is the name of this element? Its purpose?
    2. Can this element appear on the page more than once? If so, please click on all other occurrences of the element on this sample page.
    3. Does this element have a direct association with any other elements on the page? If so, click those elements.
    4. Please indicate the relationship of the current element to each associated element: child, parent, or peer.
    5. Here is a list of attributes that might help accurately define the XPath for this element. Do any of these class names, ids, etc. appear semantically relevant to the element? Please rank them in order of relevance.
  4. Using the answers, the system automatically deduces an XPath-generating function for each element that accounts for multiple occurrences on the same page as well as relationships to other relevant elements on that page. It has the original DOM trees at its disposal for this analysis.
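The distillation step (step 2) can be sketched in a few lines. This assumes each recorded step has already been reduced to a (page, command, target, value) tuple, where `page` is some page identifier such as the URL, and it keys elements by their raw target locator. That is a simplification: the same element may be recorded under different locators, and reconciling them would require the stored DOM snapshots. `ElementProfile` and `distill` are hypothetical names.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field

@dataclass
class ElementProfile:
    """Hypothetical per-element summary built from recorded sessions."""
    pages: set = field(default_factory=set)     # pages the element appears on
    commands: set = field(default_factory=set)  # commands enacted upon it
    values: set = field(default_factory=set)    # values typed/selected for it

def distill(sessions):
    """sessions: list of sessions, each an ordered list of
    (page, command, target, value) tuples (an assumed reduced form)."""
    elements = defaultdict(ElementProfile)
    workflow = Counter()  # (from_page, to_page) -> number of transitions seen
    for steps in sessions:
        prev_page = None
        for page, command, target, value in steps:
            if target:  # goals 1-4: per-element associations
                profile = elements[target]
                profile.pages.add(page)
                profile.commands.add(command)
                if value:
                    profile.values.add(value)
            if prev_page is not None and page != prev_page:
                workflow[(prev_page, page)] += 1  # goal 5: workflow graph edge
            prev_page = page
    return dict(elements), workflow

# Usage: a single short session that logs in and then logs out.
session = [
    ("/login", "type", "id=user", "alice"),
    ("/login", "click", "id=submit", ""),
    ("/home", "click", "link=Logout", ""),
]
elements, workflow = distill([session])
print(sorted(elements))  # ['id=submit', 'id=user', 'link=Logout']
```

The workflow graph here is just a weighted edge list of page transitions; frequently traversed edges would correspond to the common workflows.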

Associations expressed in natural language can be automatically encoded into element finder logic. The user does what he or she does best, providing domain-specific knowledge about the application, and does not have to worry about the gritty details of encoding a mapping.
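As a rough illustration of encoding answers into finder logic, here is a sketch in which the wizard answers for one element become a function returning an XPath. Everything here is an assumption rather than UI-Element’s actual API (real UI-Element finders are written in JavaScript): the `Answers` structure, the choice to use only the two highest-ranked attributes, support for the parent relationship only (not child or peer), and the absence of escaping for quotes in attribute values.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Answers:
    """Hypothetical wizard answers for a single element."""
    tag: str                            # e.g. "a", "input"
    ranked_attributes: list             # [(name, value), ...], most relevant first
    parent_xpath: Optional[str] = None  # set if the user marked a parent element
    repeats: bool = False               # can it appear more than once per page?

def build_finder(answers: Answers) -> Callable[..., str]:
    """Turn wizard answers into a function that returns an XPath."""
    # Only the two most relevant attributes are used (an arbitrary choice);
    # values containing a single quote are not escaped.
    predicates = "".join(
        f"[@{name}='{value}']" for name, value in answers.ranked_attributes[:2]
    )
    base = f"{answers.parent_xpath or ''}//{answers.tag}{predicates}"

    def finder(index: int = 1) -> str:
        if answers.repeats:
            # XPath positions are 1-based; the parentheses apply the index
            # to the whole node-set rather than per parent.
            return f"({base})[{index}]"
        return base

    return finder

# Usage: a repeating link inside a results container.
link = build_finder(Answers(
    tag="a",
    ranked_attributes=[("class", "result-link")],
    parent_xpath="//div[@id='results']",
    repeats=True,
))
print(link(3))  # (//div[@id='results']//a[@class='result-link'])[3]
```

The parentheses matter: `//a[3]` selects every link that is the third `a` child of its parent, whereas `(//a)[3]` selects the third link in document order, which is what "the third occurrence on the page" means.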

I don’t think something like this is out of reach for Selenium.

2 Responses to “Evolution of Selenium”

  1. picardo Says:

    I think the greatest improvement would be a more usable interface, something that invites non-programmers and non-experts. It seems to me the IDE is stuck in an earlier paradigm where the interface didn’t matter as much as functionality, and the more functionality a program had, the better. But nowadays non-experts are coming to open source projects in great numbers, as Coding Horror author Jeff Atwood and Jakob Nielsen point out, and I think it would be great if someone could take all this existing (and, by the way, powerful) functionality you have created and translate it into usability. Even more accessible documentation would be really helpful. Case in point: I’ve been navigating OpenQA for three days trying to nail down exactly what Selenium IDE is capable of, and I came across your original article on the UI extension just this morning; it was linked from some obscure corner of a page. It turns out that Selenium doesn’t have the capability I require, so I need to either extend it to modify web pages and save files or go elsewhere, but I wish I had figured that out in the first few hours of my search.

  2. Administrator Says:

    You’re right about the documentation; I too wish there were more of it and that it were more clearly laid out. For example, the best Selenese reference is tucked away in a releases subdirectory:

    http://release.openqa.org/selenium-core/nightly/reference.html

    I actually like the Selenium IDE. It’s a pretty simple interface, and does pretty much what it claims to do - allow recording of browser actions, and facilitates exporting to various “driveable” formats. What in particular do you find lacking in the IDE, in terms of usability? Or functionality? Do you believe the solution would be effective for all users of the IDE, or for your specific use cases?

    I’m curious.
