Can You Hear Me Now?: Real Testing

For about a year and a half, I owned a Motorola E815 mobile phone. I loved the thing. It worked flawlessly until, one day, the Bluetooth feature stopped working and I could no longer pair a headset with it. I called Verizon Wireless, which agreed there was a physical malfunction and offered to replace the phone with a refurbished unit. I took them up on the offer and received a replacement within three days.

Along with the replacement unit came a two-page printout of very cryptic test results. From what I could tell, they had hooked the refurbished unit up to a computer and run a bunch of unit tests on the phone to prove to me and to themselves that I would receive a functioning unit. The tests came in two flavors:

  1. Happy Path
    “A well-defined test case that uses known input, that executes without exception and that produces an expected output” (http://en.wikipedia.org/wiki/Happy_path). In other words, the computer testing my phone made phone calls, used the built-in contact list, and exercised other common functionality in ordinary ways.
  2. Boundary Condition
    Read any of the Pragmatic Unit Testing books (available in both Java and C# flavors) and you will learn that software often fails on unexpected input and boundary conditions–really large numbers, really large negative numbers, zero, null values, full hard disks, or anything else the developer wasn’t expecting when s/he was writing code.
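The distinction between the two flavors can be sketched with a tiny, invented example. To be clear, nothing here is Verizon's or Motorola's actual test code; the `is_valid_us_number` function and its ten-digit rule are assumptions made up purely for illustration.

```python
# Hypothetical example: happy path vs. boundary conditions for a
# simple phone-number validator. The function and its rules are
# invented for illustration.

def is_valid_us_number(number):
    """Return True if `number` is a string of exactly ten digits."""
    return (
        isinstance(number, str)
        and len(number) == 10
        and number.isdigit()
    )

# Happy path: known input, expected output, no surprises.
assert is_valid_us_number("3015550123")

# Boundary conditions: the inputs the developer wasn't expecting.
assert not is_valid_us_number(None)           # null value
assert not is_valid_us_number("")             # empty string
assert not is_valid_us_number("301555012")    # one digit short
assert not is_valid_us_number("30155501234")  # one digit long
assert not is_valid_us_number("301555O123")   # letter O, not zero
assert not is_valid_us_number(3015550123)     # wrong type entirely
```

The happy path proves the code works when everything goes right; the boundary cases probe the edges where software most often breaks.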

I clearly remember thinking “Wow, yet another reason to like Verizon Wireless. They really tested this replacement phone.”

The funny thing was that the number two (2) button on the phone didn’t work all the time. After trying to live with the inconvenience of a fickle button, I called Verizon to get another replacement. Again I received a refurbished phone along with the same two-page printout of slightly different but successful test results. All the buttons worked this time, but the speaker buzzed as if it were overdriven whenever someone talked to me, even with the volume at its lowest setting. After trying to live with that inconvenience, I again called for a replacement. Another refurbished phone arrived with accompanying test results, and this time one out of every three attempts to flip the phone open resulted in a power reset.

And then it dawned on me: Verizon (or Motorola, I’m not quite sure which) probably spends much time, effort, and money creating well-thought-out, automated happy path and boundary condition tests to run on phones before shipping them out. However, I have a high degree of confidence that a human never actually tried to make a phone call with any of the phones I received. I noticed all three replacements were broken during the first calls I tried to make with them. All that time, effort, and money were wasted (in my situation, at least). Once I realized the testing process for refurbished units was broken, I decided to cough up the money and buy a brand-new phone. (Which I dropped the other day, shattering the external screen. We’ll see how long I can live with that nuisance.)

The moral of this long story is not to bash Verizon. (Their network truly is everything it’s hyped up to be.) The moral is that real testing needs to be done. Verizon should be making real phone calls using real humans–or at least a robotic device that simulates a human’s interaction with its phones.

Integrated test suites that know the guts of an implementation and execute at lightning speed are great–let’s not discount those. However, we must ensure that real testing takes place from the deepest parts of the system all the way out to the point of human touch. Obviously, requiring humans to test every part of a product by hand is inhumane and grossly cost-inefficient. (This is particularly true for multiple iterations of regression testing–don’t laugh, I’ve seen it happen.) Testers should strike a balance: automated but realistic, simulated interaction tests with software, web sites, and product interfaces. They should use application test suites that actually click software buttons, and tools like Sahi, Selenium, or Watir to click web-based hyperlinks and check checkboxes. This type of testing provides a nice balance of automation and human interaction simulation.
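To make the idea concrete, here is a toy sketch in Python. The `Phone` class and its keypad behavior are entirely invented for illustration; the point is that the test drives the product through the same surface a human touches–the buttons–so a broken button fails the test even when the internals are sound.

```python
# Toy human-touch simulation (the Phone class is invented for
# illustration): drive the product through its buttons, the way a
# person would, instead of calling internal APIs directly.

class Phone:
    def __init__(self):
        self.display = ""
        self.in_call = False

    def press(self, key):
        """Simulate a finger pressing one keypad button."""
        if key == "SEND":
            # Connect only if a full ten-digit number was dialed.
            self.in_call = len(self.display) == 10
        elif key == "END":
            self.in_call = False
            self.display = ""
        else:
            self.display += key

def test_place_a_call():
    phone = Phone()
    for digit in "3015550123":   # dial by pressing each key in turn
        phone.press(digit)
    phone.press("SEND")
    assert phone.in_call         # call connected via the keypad
    phone.press("END")
    assert not phone.in_call     # hung up via the keypad

test_place_a_call()
```

A unit test that called an internal `dial()` method directly could pass on a phone whose number two button was dead; exercising the keys themselves is what catches that class of defect.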

In short, testing should involve traditional, automated happy path and boundary condition tests; automated human-touch simulations; and, finally, real human-touch. The order of importance will depend on what exactly is being tested; just make sure all three happen on your project or else I might be blogging about you too.

Usable Trash Cans and Metro Lines

My design life has been altered by three really good books:

  1. Donald Norman’s The Design of Everyday Things
  2. Edward Tufte’s Visual Explanations
  3. Steve Krug’s Don’t Make Me Think

After reading them, I can’t help but regularly see how I might go about fixing broken designs or simply improving ones that already work.

Last night was no exception.

Exhibit A: The Unusable Trash Can

While looking for a place to discard the remains of my dinner, I passed a row of recycling bins twice. Patrick, being a more intelligent individual, actually read the print on the recycling bins and noticed that one was really a trash can:

Unusable Trash Can

I’m all for reading and intelligent thinking, but whoever designed this fleet of waste bins could have done two things to aid their usability:

  1. Use a different color. Gestalt psychology teaches us that our brains tend to be holistic. When we see things that look the same, we initially believe they actually are the same–or at least highly similar. I saw three blue bins and assumed all three were for recycling. I was wrong.
  2. Remove the conflicting text. I don’t know about yours, but my mind juxtaposes recycling and waste. (I think it’s because of all the positive “marketing” I’ve heard over the years about the benefits of recycling over simply throwing things in the trash.) I read “recycling” and stopped reading because I wasn’t looking for a recycling bin; I was looking for a trash can. It was right there in front of me.

Exhibit B: The Red Line

Patrick and I had two options as to which Metro station we wanted to start our trip from. He picked Grosvenor-Strathmore over White Flint because he knew that more trains visited Grosvenor and that we would be on our way quicker if it was our starting point.

The Red Line

Both stations are on the Red Line, and no other lines intersect Grosvenor. So why and how can more trains visit it? Naturally, demand for the Metro increases the closer you get to the heart of DC, and Metro handles this demand by allowing trains to reverse direction at this particular station.

How are ignorant people like me supposed to know this helpful information? As I was asking myself this question, my mind subconsciously jumped to Minard’s map of Napoleon’s march, and I thought it would be nice if the thickness of the Metro lines on signs and printed material were proportional to the frequency of train visits. In short, a thin line would mean fewer train visits and a thick one would mean more.

Obviously, this idea breaks down if the train schedule is dynamic (which it isn’t) or if a train breaks down on the tracks blocking traffic, which, unfortunately, my sister can attest to. However, under normal conditions, it reflects reality and would probably prove useful as people plan their trips without having to inspect a daunting, six-page train schedule table.

Although neither thought is mind-blowing, both struck me as nice ones to reflect on and share.

(See Patrick’s post on Subway Maps and Scope Creep.)