Puts more focus on U.X. in the RFD #43447

Merged
merged 7 commits into from
Jun 26, 2024
155 changes: 142 additions & 13 deletions rfd/0000-rfds.md
@@ -117,10 +117,151 @@ something like the following.
```
# Required Approvers
* Engineering: @zmb3 && (@codingllama || @nklaassen)
* Security: (@reedloden || @jentfoo)
* Security: (@rjones || @klizhentas)
* Product: (@xinding33 || @klizhentas)
```

### UX

Always start the RFD with a user experience section, and start that section with user stories. Every other part of your design (security, scale and privacy) will flow from the UX, not vice versa.

#### User stories

Explore UI, CLI and API user experience by going through scenarios that users would go through while solving specific problems.

In each story, explain specific step-by-step UI, CLI and API requests/responses that the user would observe,
as if you are writing a step by step guide for a user who knows as little as possible about Teleport.

If you find too many steps or concepts that end users would have to learn, start again and reduce them to a minimum.

In each user story, think about failure modes - what will happen if your integration fails?

**Example: Alice integrates Okta via UI**

Here is an example of a UI-driven user story:

Alice is a system administrator and she would like to integrate Okta with Teleport. She does not know anything about Teleport except the basics, but she has detailed Okta knowledge.

She logs into Teleport, looks for "Integrations", quickly finds an Okta tile and clicks on it.

In the Okta tile, she is asked to add a name for her Okta tenant. She can find the tenant name in Okta's UI, and the information
bubble shows her how to do that.

The next step for Alice is to locate the SCIM bearer token. Alice needs to go back to Okta, create a Teleport API services
app in the Okta catalog, copy the SCIM token and paste it back into Teleport. Teleport's UI directs her to do just that.

Alice copies the token into the Teleport UI. Let's assume she makes a mistake, and the token is broken or lacks the required permissions.

Alice is directed to test the integration. The test fails and shows her the error that Okta returned:

`Insufficient permissions when synchronizing a user`. Teleport shows the detailed response from the Okta service, offers to check the token permissions and lets her run the test again.

Finally, Alice figures out the right permission set on Okta's side and the Teleport test passes.

Teleport tries a test sync run and gives Alice the chance to tweak the integration parameters. If Alice is happy with the settings, she clicks save.

#### Make failure modes a first class citizen.

Administrators and system managers spend most of their day debugging integration
issues, failures and errors. Make their day pleasant by building user experiences
for most common failure scenarios:

* What if the integration fails after its setup? Can Alice learn that it's broken, then find out where to go to troubleshoot it?
* What if Alice needs to tweak the parameters of the integration after setup? Can she go back to the integration and test it?
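
As a minimal sketch of the two questions above, an integration could remember the outcome of its last test so that the UI can show that it is broken and let the admin re-run the test after changing a parameter. All names here (`Integration`, `IntegrationStatus`, `check`) are hypothetical and are not part of Teleport's API.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class IntegrationStatus:
    healthy: bool
    detail: str  # upstream error text, kept verbatim so the admin can act on it


class Integration:
    """Hypothetical integration that remembers the result of its last test."""

    def __init__(self, name: str, check: Callable[[], None]):
        self.name = name
        self._check = check  # assumed to raise an exception on failure
        self.last_status: Optional[IntegrationStatus] = None

    def test(self) -> IntegrationStatus:
        # Run the check and store the outcome, so a failure that happens
        # after setup stays visible instead of being lost in a log.
        try:
            self._check()
            self.last_status = IntegrationStatus(True, "OK")
        except Exception as err:
            self.last_status = IntegrationStatus(False, str(err))
        return self.last_status


# Usage: the first test fails, Alice fixes the token, the re-test passes.
settings = {"token_has_permissions": False}


def okta_check() -> None:
    if not settings["token_has_permissions"]:
        raise PermissionError("Insufficient permissions when synchronizing a user")


okta = Integration("okta", okta_check)
first = okta.test()
settings["token_has_permissions"] = True
second = okta.test()
```

Keeping the last status on the integration itself is one possible design; it gives the UI a single place to read from when it decides whether to show a warning.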

#### Build Poka-Yoke Devices

In manufacturing, a poka-yoke device is anything that prevents an error within the manufacturing process or makes defects visible.

Translated to Teleport, this means building a UX that prevents people from making mistakes.

For example, if an admin changes a role mapping in a way that would lock them out and leave no other admins, Teleport could prevent the error by blocking the action:

"You can't unassign yourself, because there will be no more admins left."
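
The guard described above can be sketched as follows. This is an illustration under stated assumptions, not Teleport's implementation: `Cluster`, `LockoutError` and `unassign_role` are hypothetical names, and the role model is reduced to a plain mapping of users to role names.

```python
from dataclasses import dataclass, field


class LockoutError(Exception):
    """Raised when a change would leave the cluster without any admin."""


@dataclass
class Cluster:
    # Hypothetical model: user name -> set of role names.
    user_roles: dict[str, set[str]] = field(default_factory=dict)

    def admins(self) -> set[str]:
        return {user for user, roles in self.user_roles.items() if "admin" in roles}

    def unassign_role(self, actor: str, user: str, role: str) -> None:
        """Remove a role, refusing if that would remove the last admin."""
        if role == "admin" and self.admins() == {user}:
            who = "yourself" if actor == user else user
            raise LockoutError(
                f"You can't unassign {who}, because there will be no more admins left."
            )
        self.user_roles.get(user, set()).discard(role)


# Usage: Alice is the only admin, so the change is blocked before it applies.
cluster = Cluster({"alice": {"admin"}, "bob": {"editor"}})
try:
    cluster.unassign_role("alice", "alice", "admin")
except LockoutError as err:
    print(err)
```

The check runs before any state changes, so a blocked action leaves the role assignments untouched.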

#### Make UX that reduces information overload and work

Let's take a look at Gmail. When a user clicks on an e-mail, they are offered the option "Filter messages like this". Instead of making the user delete or move messages one by one, Gmail offers to write, test and set up a rule that also applies to all other similar messages.

This reduces the amount of manual, tedious work, and works well for one message or a thousand.

When possible, build UX that reduces the number of steps and does extra work on users' behalf, instead of prompting them to do work that can be automated.

#### Think through the Day One and Day Two user experiences

As Day 1 users, we don't have any domain knowledge of the product; we are novices.

That's why the Day 1 flow should be the first user story we think through. It does not have to be scalable, but it must be easy.

For example, as a Day 1 user, I need a step-by-step guide on how to add one or two servers and databases without learning about RBAC, configs and other Teleport internals. In the UI, the Day 1 flow guides the user each step of the way to enroll a server, test its connection and get to success in the minimum number of steps.

As a Day 2 user, I'm concerned about setting up a feature at scale. My Day 2 user experience is different, and I know a bit more about Teleport.

For example, I would like to spend a bit more time setting up Teleport to automatically discover all my AWS resources and add them to the cluster.

Here are two imaginary examples demonstrating how the Day 1 and Day 2 CLI UX differ.

**Example: Day 1 CLI certificates**

As a Day 1 user, I would like to issue certificates to two services to set up mTLS in my cluster.

```bash
tbot join service-a --cluster=teleport.example.com
[1] Joining to cluster teleport.example.com...
[2] Issuing a certificate to ./tbot/certs/service-a/cert.pem and key.pem..

To test using this certificate, try:

curl https://teleport.example.com/ --cert ... --key ... --cacert ...
```

On Day 1, we keep the number of new concepts and ideas that users need to think about to a minimum, and automate most of the steps.

This flow does not have to cover all possible scenarios, just the 80% most common ones, to
get the user to success as fast as possible.

**Example: Day 2 CLI certificates**

The UX in the previous example won't scale for Day 2, as there are many configuration options to consider, so for a Day 2 user we can offer something more flexible at the expense of added complexity.


```bash
tbot bootstrap service-a --cluster=teleport.example.com

[1] Generating tbot.yaml for service-a in ./tbot/configuration/tbot.yaml
[2] Generating service-a role...
[3] Generating systemd unit ./tbot/certs/service-a/cert.pem and key.pem...
[4] Starting a daemon...
```

In this case, instead of a simple one-liner, we generate detailed step-by-step parts and instruct users on how to configure them.

#### Make error and info messages actionable.

Make sure error and info messages give specific instructions and enough information to act on.

Explore common failure modes and how users can recover from them.

Here are a couple of examples of messages that need work:

> Please review the access list "My-Awesome-Team", the review is due in 4 days.

This message lacks a link to the access list and any specific steps users need to take to review it.

> Failed to set up Okta integration - "Bad request".

This is one of the most frustrating error messages users can encounter: they don't see any logs, have no way to re-test the integration or trigger the error again,
and all they can do is reach out to support.
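
One way to make such a message actionable is to have the error carry its cause, the next steps and a link, rather than a bare string. The sketch below is illustrative only: `ActionableError` is a hypothetical type, the next steps are examples, and the URL is a placeholder rather than a real Teleport route.

```python
from typing import Iterable, Optional


class ActionableError(Exception):
    """Hypothetical error that says what failed, why, and what to do next."""

    def __init__(
        self,
        summary: str,
        cause: str,
        next_steps: Iterable[str],
        link: Optional[str] = None,
    ):
        super().__init__(summary)
        self.summary = summary
        self.cause = cause
        self.next_steps = list(next_steps)
        self.link = link

    def render(self) -> str:
        # Build the text shown to the user: cause first, then numbered steps.
        lines = [f"{self.summary}: {self.cause}", "What you can do:"]
        lines += [f"  {i}. {step}" for i, step in enumerate(self.next_steps, start=1)]
        if self.link:
            lines.append(f"More details: {self.link}")
        return "\n".join(lines)


# Usage: the Okta failure, rewritten with a cause, next steps and a link.
err = ActionableError(
    summary="Failed to set up Okta integration",
    cause="Okta returned 'Bad request' when validating the SCIM token",
    next_steps=[
        "Check that the SCIM token was copied in full",
        "Re-run the integration test",
    ],
    link="https://teleport.example.com/integrations/okta",  # placeholder URL
)
print(err.render())
```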

#### Consider Cloud UX from the start.

Cloud is a first class citizen. The feature setup can no longer rely on static teleport.yaml configuration, as this automatically
excludes all cloud customers.

#### Upgrade UX

Consider the UX of configuration changes and their impact on Teleport upgrades.

### Security

Describe the security considerations for your design doc.
@@ -149,18 +290,6 @@ Describe the privacy considerations for your design doc.
and how it will be retained/deleted
* Explore if there are sufficient logs showing any data access or modification

### UX

Describe the UX changes and impact of your design doc.
(Non-exhaustive list below.)

* Explore UI, CLI and API user experience by diving through common scenarios
that users would go through
* Show UI, CLI and API requests/responses that the user would observe
* Make error messages actionable, explore common failure modes and how users can
recover
* Consider the UX of configuration changes and their impact on Teleport upgrades
* Consider the UX scenarios for Cloud users

### Proto Specification
