IBM Disaster Recovery

Balancing complexity with usability in enterprise software

Designers tend to live by the credo “less is more,” but is this always the right approach? Some tasks are inherently complex, with sophisticated users who need a lot of capabilities. Just as you couldn’t design a nuclear power plant control room or an airplane cockpit with only a few buttons, you can’t strip much of the enterprise software I worked on at IBM down to a handful of controls, because it involves a high degree of technical complexity. The challenge isn’t to remove all the complexity, but to ensure that usability doesn’t suffer because of it.

This is the story of designing a new product that keeps systems continuously available during planned and unplanned downtime by replicating and synchronizing data. Don’t worry if that sounds overwhelming; I explain it all below. I also explain how I navigated this complexity, both personally and in terms of creating logical user flows.

My role on this project was UX Designer. I collaborated closely with one other designer under the supervision of a team lead. The project spanned roughly three months.

A disaster recovery plan means having a backup database ready to take over until you can fix your damaged database.

WHAT IS DISASTER RECOVERY?

Disaster recovery is about having a plan in place to manage and recover from a disaster scenario. A “disaster” could be an IT bug, an electrical surge, an earthquake, or really anything that takes down your primary database or technical infrastructure. The goal of disaster recovery is to minimize disruption to business operations by reducing downtime and limiting data loss. Businesses take disaster recovery very seriously because every second they’re offline is money that’s going down the drain.

Here’s how a disaster recovery plan works. A user sets up one or more backup databases. When there’s a disaster, they switch over from their main database to a backup. You can think of it a little bit like a hospital having a backup generator in case anything goes wrong. Admins work on repairing their primary database, and when it’s fixed, they copy changes from the backup over to the main database and get it going again.
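To make the sequence above concrete, here is a minimal sketch of the switch-over and switch-back steps in Python. It is a toy model under stated assumptions, not how IIAS or its replication technology works: the `Database` and `RecoveryPlan` classes, their method names, and the in-memory `rows` dictionary are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Database:
    # Hypothetical stand-in for a real database: just a name and some rows.
    name: str
    healthy: bool = True
    rows: dict = field(default_factory=dict)


class RecoveryPlan:
    """Toy model of one primary database with one backup (assumed names)."""

    def __init__(self, primary: Database, backup: Database):
        self.primary = primary
        self.backup = backup
        self.active = primary  # the database currently serving requests

    def write(self, key, value):
        # Writes go to whichever database is currently active.
        self.active.rows[key] = value
        # While the primary is active, copy each change to the backup.
        if self.active is self.primary and self.backup.healthy:
            self.backup.rows[key] = value

    def fail_over(self):
        # Disaster: the primary is down, so the backup takes over.
        self.primary.healthy = False
        self.active = self.backup

    def fail_back(self):
        # The primary is repaired: copy the backup's changes over,
        # then make the primary the active database again.
        self.primary.rows.update(self.backup.rows)
        self.primary.healthy = True
        self.active = self.primary


plan = RecoveryPlan(Database("primary"), Database("backup"))
plan.write("order-1", "paid")
plan.fail_over()
plan.write("order-2", "pending")  # handled by the backup during the outage
plan.fail_back()
print(plan.active.name, sorted(plan.primary.rows))
# prints: primary ['order-1', 'order-2']
```

The point of the sketch is the order of events: changes made while the backup is in charge have to be copied back before the primary resumes, otherwise the outage would cost data as well as time.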

Creating a storyboard was a good way to ensure everyone was on the same page about the process.


THE GOAL

We designed this new feature for the IBM Integrated Analytics System (IIAS), an on-premises appliance. I didn’t have access to customers in the early phases of the project, but I did have access to many subject matter experts (SMEs) and technical staff who work with existing IIAS clients. I collaborated with these experts throughout the entire process, holding daily discussions to understand the problem space, discuss user needs, and refine interface options.

Through our discussions we identified our main user persona — Emmet, The Systems Administrator. IBM already had a database of validated personas so we had a good foundation to start with.


We also identified the core problem and came up with a clear goal to work towards.

As-is Pain Point

The System Administrator does not have a simple way to understand if their system is recoverable, what if any recovery plan is in place, what state the system will be in after a recovery, and if there are any warning signs that recoverability is at risk.

To-be Opportunity

The System Administrator will be able to see whether the system is ready for a disaster, have realistic expectations of what a recovery process will achieve, and be able to understand if there are any risks or potential issues with the data replication process.

DESIGN CHALLENGES


Challenge #1: Gaining alignment

Throughout this project, we struggled to gain consensus among our stakeholders and collaborators. At the outset, they had a hard time defining the project requirements and explaining the end goals. Because my design colleague and I had no initial domain knowledge, hearing conflicting objectives made the situation even more challenging. To get everyone on the same page, we spent time fleshing out use cases and defining clear user stories. We created storyboards to better understand the end-to-end process of disaster recovery. We also created many rough hand-drawn sketches of interface options. We based these sketches on assumptions, many of which we later invalidated. But the goal wasn’t to be perfect; it was to get the product managers, SMEs, and developers talking about their varied visions for the product. Through this process, we learned more about the product requirements and what was technically feasible in time for the launch. As things progressed, we faced scope creep, but we worked with the broader team to develop a roadmap and phased approach so that some of our ideas could be tabled for a later product release.

We started with a bare UI and added the complexity back in as needed. Each iteration allowed us to gather information about the objects and actions that were truly mission critical. Hand-drawn sketches were a strategic choice — we wanted our designs to feel rough and easy to change. No one would be upset or hesitant to point out if things were wrong.

Challenge #2: Complexity vs simplicity

An important consideration was how much we should reveal to the user about what was happening on the product’s backend. The underlying technology behind data replication is complex, involving things like Qcapture, Qapply, and Qsubs. Although we knew our persona Emmet was very technically minded, we wanted to ensure he felt in control but not overwhelmed with information. We worked hard to balance hiding some things “behind the curtain” while still giving Emmet enough information to know his disaster recovery plan is in order. A big part of this involved striking a balance between using the very specific terminology that Emmet knows and expects to see and keeping the product conversational rather than jargon-heavy.

To further assist the user, we created a series of ten help drawers that appear on the side of the screen to provide contextual guidance. These give users information about the task at hand and the associated terminology as they need it. I wrote the content for these drawers, aiming to keep it simple but robust.

Challenge #3: Nailing down the user workflow

As I worked on this product, I found that setting up a disaster recovery plan is a complex task: a lot of steps need to take place to get things up and running. Some of these steps need to happen in a very particular order because the backend depends on it. However, this sequence did not feel natural or intuitive from a user perspective. My teammate and I tried changing the order of actions, but something would “break”; then we changed it again and a different issue appeared. It was very tricky to balance what needed to happen from a technical perspective with how a user might actually go about initiating a disaster recovery plan.

Finding the “right” order was crucial, so we created a clickable prototype in order to discover the optimal sequencing. We conducted user testing at various points in this process and learned that most people struggled to figure out the steps they needed to take based on our existing designs. However, we also learned what steps they expected to take, which enabled us to come up with a few different paths or workflows a user could follow. Based on all of our testing and exploration, we introduced step navigation to walk users through the process of setting up their disaster recovery plan. This helped users avoid confusion about what to do next, while still giving them the flexibility to maneuver in a couple of different directions based on their needs.
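One way to picture the ordering problem described above is as a set of steps with prerequisites. The sketch below, in Python 3.9 or later, uses the standard library's `graphlib.TopologicalSorter` to produce an order that respects every prerequisite, and a small check for which steps a user may start next. The step names and their dependencies are hypothetical, made up for illustration; they are not the actual IIAS setup workflow, and this is not how the shipped step navigation was implemented.

```python
from graphlib import TopologicalSorter

# Hypothetical setup steps. Each step maps to the set of steps
# that must be finished before it can begin.
prerequisites = {
    "connect_backup_system": set(),
    "choose_data_to_replicate": {"connect_backup_system"},
    "configure_alerts": {"connect_backup_system"},
    "start_replication": {"choose_data_to_replicate"},
    "verify_recovery_readiness": {"start_replication"},
}


def wizard_order(prereqs):
    """Return one step order in which every prerequisite comes first."""
    return list(TopologicalSorter(prereqs).static_order())


def can_start(step, completed, prereqs):
    """A step is available once all of its prerequisites are completed."""
    return prereqs[step] <= set(completed)


order = wizard_order(prerequisites)
print(order)

# After connecting the backup system, the user has two valid next moves.
done = ["connect_backup_system"]
print([s for s in prerequisites if s not in done and can_start(s, done, prerequisites)])
```

Modelling the constraints this way separates the two concerns we kept colliding with: the backend's hard ordering rules live in the prerequisites, while any step whose prerequisites are met stays open to the user, which is where the flexibility comes from.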

Whiteboarding with architects and SMEs to capture the user workflow.

We created a clickable prototype in InVision for usability testing. We worked with a user researcher to define tasks and questions for testing. System Administrators struggled with the workflow, which led us to introduce a wizard pattern.

FINAL PRODUCT

We created high-fidelity designs of the end-to-end workflow and held regular playbacks with the development team to ensure our designs were technically feasible and within scope. We also produced detailed documentation in order to hand off to our visual designers. We worked with them to explain our design intentions and made UX refinements as needed.

PRODUCT ROADMAP

I’m happy with the product we delivered and it has been shipped to great success. Of course, there’s always more that could be done, and we identified some of these things during the design process but decided to table them for later product releases due to time constraints. Primarily, the roadmap includes scaling the product to be able to support not just one, but multiple “backup” databases. Ideally, these databases would all be able to replicate to each other (a many-to-many relationship) so that if one or even multiple databases go down, there’s always a backup somewhere. Our tabled designs included wireframes for giving the user the ability to specify which particular bits of data would be replicated and to/from which databases. We also provided an “enterprise” view of this information so the user could monitor what is happening across all their databases and be alerted to any issues.

LEARNINGS

One of the key things I took away from this project was a heightened understanding of the role that designers play as the bridge between technical constraints and user needs. I had to work hard to advocate for the user and their expectations regarding workflows to ensure the development team and SMEs didn’t simply implement what made sense to them based on their deep understanding of the backend infrastructure. We had to create an experience that felt smooth and intuitive to our end user, and it was my job as the designer to ensure that happened. It was important to me that despite the technical complexity of the task, we didn’t sacrifice usability.

I also developed better techniques for rapidly acquiring knowledge in complex domains. Designing highly technical, large-scale, enterprise software means being comfortable with ambiguity and aware that you “don’t know what you don’t know”. The technical experts that you work with will often assume you know certain things (due to their own depth of knowledge) and neglect to mention them — unless you ask, which of course you can’t because you don’t know to! Listening “between the lines”, shamelessly asking the “dumb” questions, conducting independent research, and using design methodologies to uncover hidden information have helped me greatly.

The many challenges I faced throughout this project sharpened my intuition about which design tools to use to overcome specific obstacles in the design process. I always perform design tasks with an objective in mind, and this project was a clear illustration of that.