Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add spo dls to internal knowledge search #141

Merged
merged 21 commits into from
Jan 12, 2024

Conversation

seanstory
Copy link
Member

@seanstory seanstory commented Dec 18, 2023

Description

The goals of this PR are:

  • to introduce Sharepoint Online as a searchable content type
  • to provide an example of using document level security in a search experience

To that end, several significant changes were made.

  1. The configuration of a single API key was replaced with configuration for a username/password. This is because API keys cannot be used to create other API keys with different role descriptors.
  2. A new configuration of a "search persona" was added. This defaults to a hardcoded string of "admin", but if the configured search application has an index with DLS enabled, the first 30 identities will be fetched from the identities index and will be populated as options in the dropdown.
  3. The SearchApplicationSettings model was augmented to include these new settings

Requested Review

React is not my wheelhouse, and I know the diff is not currently well-organized. Please feel free to provide "obvious" guidance. "Sean, function X obviously should be in file Y" is perfectly acceptable here. 😄

If we can, let's leave styling review out for now. Maybe one day this project will get a face lift, but I'm probably not the right person to invest in that - we're focused on functionality ATM.

Disclaimer

It should be noted - this is not intended to be example code that could be mimicked for a production search application - this is NOT SECURE, and merely demonstrates how changing API keys based on a persona from an identity index can provide different views into a search application's data. In a real setup, the management of API keys would be done by a backend server that sits between the browser and Elasticsearch.

dls-internal-search

@seanstory seanstory marked this pull request as ready for review January 3, 2024 20:13
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a stop-gap until elastic/search-application-client#33 is merged and released

REACT_APP_SEARCH_APP_ENDPOINT=http://localhost:9200
REACT_APP_SEARCH_USER=elastic
REACT_APP_SEARCH_PASSWORD=changeme
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry if I'm misunderstanding something here, but are we asking people to send their account password to the front end? I can understand that it is convenient to have the front end do everything without a back end, but this is not a practice that should be encouraged, I think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally agree. See this slack thread: https://elastic.slack.com/archives/C01795T48LQ/p1702066837681769?thread_ts=1702066380.039599&cid=C01795T48LQ

specifically:

I don’t think we want to build an e2e app that is going to be the skeleton a customer could start from. But we want to give them some examples, using the sample app that exists already

and

An example application that customers could actually run with is not as small a thing.

This sounds to me like too big effort to start with.

Basically it came down to scope, and the fact that this isn't intended to be used for production, but to simply demo on fake data. Once this PR is in a clean place, I'll be starting on a blog/tutorial that will reference this example, but discuss how insecure it is and how a real architecture for DLS would need a backend server between Elasticsearch and the browser that could do some of these elements for you securely.

are we asking people to send their account password to the front end?

Even with the above context, I'd planned to use only API keys. Unfortunately, there's a 3-year-old "known issue" where ES API keys cannot be used to create API keys for other users/privileges. See this slack thread: https://elastic.slack.com/archives/C0D8ST60Y/p1703017952223129
So I was faced with either switching to basic auth, or introducing something like JWT as an additional realm and part of setup. And again since this isn't indended to be used for production but just as an example/demo, the extra scope does not seem worth it IMO.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue I have is that this is not demonstrating something that has any practical value. If you are going to show how to do something in a bad way that nobody should use, then why show it in the first place?

I have no doubt there are reasons why this is this way, but the reality is that the decision of wether this is supposed to be used in production or not is not ours to make. People who download this example will make this decision, and if you give them something that is badly designed but works, the majority of people will use it without any concerns, because they'll assume that if we do it we think it's okay for them to do it too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue I have is that this is not demonstrating something that has any practical value. If you are going to show how to do something in a bad way that nobody should use, then why show it in the first place?

I think there's some distance between "not intended for production" and "no practical value". 🙂
The issue that we've seen is that DLS is really confusing for our Field folks. Despite a LOT of documentation we've written on the subject, they struggle to understand the moving pieces enough to be able to put together a POC. The goal here is to provide something similar to this existing demo, but populated by dynamic search personas, rather than hardcoded ones. CC @serenachou and @danajuratoni in case y'all would like to add some more context.

the majority of people will use it without any concerns

I think that's a bit of a leap, especially if we go out of our way to flag that this is not production-ready or secure. We could definitely go further than I have so far though - would it assuage some of your concerns if we added a banner to the top of the search experience to note that this is not for production? And/or if we added a large disclaimer at the top of the example app's README?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this knowledge base app - since it's not meant for production in current scope, let's clearly document the password strategy here with a warning that customers should look to more secure methologies.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main 2 goals of this initiative are

  1. provide customers and Field with guidance how to set up DLS for POCs. We've seen multiple instances where customers and Field colleagues were not able to successfully set up DLS after many days of research. Providing an example on how this can be achieved, has immense value.
  2. additionally, the example app should be an easy to reuse tool that @lio-p can use to create eden demos.

For DLS specifically, this example successfully shows how permissions should be set up. However, the goal wasn't to show a best practice for an entire search application architecture. The security concerns voiced out in this thread are valid concerns, and the question I see raised is: does the value of this example app that zooms in on setting up DLS justify deferring / separately addressing generic search experience design & development "best practices" concerns? cc: @serenachou
From the DLS POV, my strong opinion is that we need an example of how DLS works e2e. And this was brought up numerous times already. Sean's work is a great starting point we can use in our discussions, and this initiative should be continued to also showcase our recommendation for designing a search experience, as @serenachou and @radhapolisetty prioritize.

@miguelgrinberg a followup question here is - is the goal to have all example apps in elasticsearch-labs follow specific design standards for search experiences? If so, what are these?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danajuratoni Since this is a public repository where people come to get examples and ideas to help them build their own projects, I think it is reasonable to expect that we will be promoting safe and industry accepted patterns. I'm not setting a high bar here, just that we should follow standard practices, especially with regards to security.

The version of this application as modified in this PR asks the user to configure their Elastic account password in their front end, and uses it from the front end in a way that is intended to only be used in a server-to-server communication. This is unacceptable, we should never promote any applications that require having sensitive information available in the browser. Even if we feel this is just a demo, not for production, etc. the risk is too high, because people are not going to read the disclaimers and all the "buts" that we can come up with and will copy the insecure solution and run with it no matter the risk.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @miguelgrinberg, I'm going to pick this up and add a quick backend to this PR. Checking to make sure we're aligned on what we'd want in this demo: I think adding authentication and SSL to this backend would be overkill, but happy to hear your thoughts.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @sphilipse, yes, I would also consider SSL and authentication overkill for a demo, in fact we don't have them in the search tutorial and RAG examples either.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay cool, I added a backend in 3440e55, let me know if that matches your expectations @miguelgrinberg and then I think this should be close to ready for merging.

@seanstory seanstory requested a review from yansavitski January 4, 2024 14:06
Copy link
Member Author

@seanstory seanstory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few little things, but overall I think this is great. Thanks so much!

@seanstory seanstory merged commit 0849905 into main Jan 12, 2024
1 check passed
@seanstory seanstory deleted the seanstory/add-spo-dls-to-internal-knowledge-search branch January 12, 2024 18:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants