Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Serval App MVP #192

Merged
merged 15 commits into from
Oct 24, 2023
Merged

Serval App MVP #192

merged 15 commits into from
Oct 24, 2023

Conversation

Enkidu93
Copy link
Collaborator

@Enkidu93 Enkidu93 commented Oct 23, 2023

Fixes #132.

Once this is merged, I can deploy it.

I can also pretty up code down the line if you' prefer, but it's all pretty straightforward.


This change is Reviewable

@Enkidu93 Enkidu93 requested a review from johnml1135 October 23, 2023 16:11
@codecov-commenter
Copy link

Codecov Report

All modified and coverable lines are covered by tests ✅

📢 Thoughts on this report? Let us know!.

@johnml1135
Copy link
Collaborator

samples/ServalApp/README.md line 11 at r1 (raw file):

When the Serval API is updated, use the tool [swagger-to](https://pypi.org/project/swagger-to/) to generate a new `serval_client_module.py` using the following command:

swagger_to_py_client.py --swagger_path path/to/swagger.json --outpath serval_client_module.py

Where is the json file? Where should I run the command? i.e., go to this sub folder of the repo and run it exactly like this...

@johnml1135
Copy link
Collaborator

samples/ServalApp/serval_app.py line 23 at r1 (raw file):

            print(f"\tStarted {build}")
            session.delete(build)
            email_server.send_build_started_email(build.email)

The email sending may fail. Delete the build after trying to send.

@johnml1135
Copy link
Collaborator

samples/ServalApp/serval_app.py line 29 at r1 (raw file):

            print(f"\tFaulted {build}")
            session.delete(build)
            email_server.send_build_faulted_email(build.email, error=data)

Delete after you do everything else.

@johnml1135
Copy link
Collaborator

samples/ServalApp/serval_app.py line 37 at r1 (raw file):

            email_server.send_build_completed_email(build.email, '\n'.join([f"{'|'.join(pretranslation.refs)}\t{pretranslation.translation}" for pretranslation in pretranslations]))

        def update(build:Build, email_server:ServalAppEmailServer, data=None):

Is this never called?

@johnml1135
Copy link
Collaborator

samples/ServalApp/serval_app.py line 66 at r1 (raw file):

            while(True):
                send_updates(email_server)
                sleep(300) #Once every five minutes...

This would be nice to configure - especially for testing.

@johnml1135
Copy link
Collaborator

samples/ServalApp/serval_app.py line 41 at r1 (raw file):

        serval_auth = ServalBearerAuth()
        client = RemoteCaller(url_prefix="https://prod.serval-api.org",auth=serval_auth)

This should be configurable - for pointing to ext-qa.

@johnml1135
Copy link
Collaborator

samples/ServalApp/serval_email_module.py line 39 at r1 (raw file):

        msg['From'] = self.sender_address
        msg['To'] = recipient_address
        msg['Subject'] = 'Your NMT build job is complete!'

There should be more information for distinguishing this. The initial and final emails should have an ID number that is shared (can use the build ID). Also, there should be a place for a "name of build" that the user can enter - in case they are making many translations and want to distinguish between them.

@johnml1135
Copy link
Collaborator

samples/ServalApp/serval_app.py line 100 at r1 (raw file):

    def submit():
        engine = json.loads(client.translation_engines_create(TranslationEngineConfig(source_language=st.session_state['source_language'],target_language=st.session_state['target_language'],type='Nmt',name=f'serval_app_engine:{st.session_state["email"]}')))

As per the other comment, we should allow the user to name an arbitrary string for the engine/build name.

@johnml1135
Copy link
Collaborator

samples/ServalApp/serval_app.py line 100 at r1 (raw file):

Previously, johnml1135 (John Lambert) wrote…

As per the other comment, we should allow the user to name an arbitrary string for the engine/build name.

That is, an arbitrary string that we strip of all special characters and normalize.

@johnml1135
Copy link
Collaborator

samples/ServalApp/serval_email_module.py line 39 at r1 (raw file):

Previously, johnml1135 (John Lambert) wrote…

There should be more information for distinguishing this. The initial and final emails should have an ID number that is shared (can use the build ID). Also, there should be a place for a "name of build" that the user can enter - in case they are making many translations and want to distinguish between them.

Also, the source and target languages, along with the names of the Paratext files uploaded should be included in one or both emails.

@johnml1135
Copy link
Collaborator

samples/ServalApp/serval_app.py line 113 at r1 (raw file):

            )
        )
        build = json.loads(client.translation_engines_start_build(engine['id'], TranslationBuildConfig(pretranslate=[PretranslateCorpusConfig(corpus_id=corpus["id"], text_ids= [] if st.session_state['source_files'][0].name[-4:] == '.zip' else list(map(lambda f: f.name, st.session_state['source_files'])))], options="{\"max_steps\":10}")))

max steps 10!?!?!?!??! At very least have this be set by an environment variable - so we can make a deployment of this either for test or real life.

@johnml1135
Copy link
Collaborator

samples/ServalApp/serval_app.py line 126 at r1 (raw file):

        st.session_state['source_language'] = st.text_input(label="Source language tag*", placeholder="en")
        if st.session_state.get('source_language','') == '' and tried_to_submit:
            st.error("Please enter a source language tag before submitting", icon='⬆️')

Point to NLLB-200 language codes - provide a link.

@johnml1135
Copy link
Collaborator

samples/ServalApp/serval_app.py line 153 at r1 (raw file):

            if already_active_build_for(st.session_state['email']):
                st.session_state['tried_to_submit'] = True
                st.session_state['error'] = "There is already an a pending or active build associated with this email address. Please wait for the previous build to finish."

Are we sure we want to do this? There may be one person supporting multiple teams. How about "You already have {number} build job[s] running. Are you sure you want to add another one?"

@johnml1135
Copy link
Collaborator

samples/ServalApp/serval_app.py line 154 at r1 (raw file):

                st.session_state['tried_to_submit'] = True
                st.session_state['error'] = "There is already an a pending or active build associated with this email address. Please wait for the previous build to finish."
                st.rerun()

Also, we should be able to log into the email address sending these emails to periodically check who and how many translations are being made.

@johnml1135
Copy link
Collaborator

samples/ServalApp/serval_app.py line 166 at r1 (raw file):

                st.session_state['error'] = "Some required fields were left blank. Please fill in all fields above"
                st.rerun()
        st.markdown("<sub>\* Use IETF tags if possible. See [here](https://en.wikipedia.org/wiki/IETF_language_tag) for more information on IETF tags.</sub>", unsafe_allow_html=True)

Also call out NLLB-200 language codes.

@johnml1135
Copy link
Collaborator

samples/ServalApp/serval_email_module.py line 69 at r1 (raw file):

If you are experiencing difficulties using this application, please contact eli_lowry@sil.org.

Thank you!

Here is where all the info about the build should be dumped - all the files uploaded, the name chosen, the source and target language, etc.

@johnml1135
Copy link
Collaborator

Did you use black formatting? Where are the poetry, launch, updated gitignore, etc. files?

Copy link
Collaborator

@johnml1135 johnml1135 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 7 of 7 files at r1, all commit messages.
Reviewable status: all files reviewed, 18 unresolved discussions (waiting on @Enkidu93)


samples/ServalApp/builds.db line 0 at r1 (raw file):
This should not be uploaded to Github - but rather should be ignored.


samples/ServalApp/serval_app.py line 33 at r1 (raw file):

        def completed(build:Build, email_server:ServalAppEmailServer, data=None):
            print(f"\tCompleted {build}")
            session.delete(build)

Delete after you do everything else.


samples/ServalApp/serval_app.py line 49 at r1 (raw file):

                started(build, email_server)
            else:
                responses.get(build_update.state, update)(build, email_server, build_update.message)

For all the times that it is just normally Pending or Active, what happens? There is no responses.get for normal updates. Will it throw an exception?
`


samples/ServalApp/serval_app.py line 101 at r1 (raw file):

    def submit():
        engine = json.loads(client.translation_engines_create(TranslationEngineConfig(source_language=st.session_state['source_language'],target_language=st.session_state['target_language'],type='Nmt',name=f'serval_app_engine:{st.session_state["email"]}')))
        source_files = [json.loads(client.data_files_create(st.session_state['source_files'][i], format="Paratext" if st.session_state['source_files'][i].name[-4:] == '.zip' else "Text")) for i in range(len(st.session_state['source_files']))]

not all Paratext zips have a zip extension. Either we add the other potential extensions (and say which ones we support), or we give the users a choice.

@johnml1135
Copy link
Collaborator

samples/ServalApp/pyproject.toml line 9 at r3 (raw file):

[tool.poetry.dependencies]
python = ">=3.8,<3.9.7 || >3.9.7,<4.0"

Why not 3.9.7? Why not >=3.11?

@johnml1135
Copy link
Collaborator

samples/ServalApp/pyproject.toml line 14 at r3 (raw file):

requests = "^2.31.0"
SQLAlchemy = "^2.0.22"

Obviously, you can update the versions. But there should be dev dependencies.

[tool.poetry.group.dev.dependencies]
pytest = "^7.4.2"
black = "^23.3.0"
flake8 = "^3.9.2"
isort = "^5.9.3"
ipykernel = "^6.7.0"
jupyter = "^1.0.0"
pyright = "^1.1.331"

@johnml1135
Copy link
Collaborator

samples/ServalApp/serval_app.py line 95 at r3 (raw file):

                        get_update(build, email_server)
                    except Exception as e:
                        print(f"\tFailed to update {build} because of exception {e}")

Any reason for all the prints rather than actually using a logger? Then it would be more straightforward to log the exception.

@johnml1135
Copy link
Collaborator

samples/ServalApp/serval_app.py line 100 at r1 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

Also, at the moment, as a default, I have the name as 'serval_app_engine:'; is that reasonable?

That should be fine. It will just be for us to look at. As for filtering out bad characters - it may not actually matter, as long as there is appropriate escaping and any json or string representations are not messed up.

@johnml1135
Copy link
Collaborator

samples/ServalApp/serval_app.py line 101 at r1 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

Sounds good.

@ddaspit - what are the range of paratext zip extensions?

@johnml1135
Copy link
Collaborator

samples/ServalApp/serval_app.py line 153 at r2 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

The person cannot check emails until they've been authenticated, but I could make it per client ID if you like as well. Do you expect to give unique creds to most users?

As this is not for many users - probably one per group. I will assume that the creds will only be shared between a known small group of users (and they can see everything on the Serval site itself if they share creds). I would just want security between credentials.
Yes, unique per Email:ClientID combo.

@johnml1135
Copy link
Collaborator

samples/ServalApp/serval_app.py line 166 at r1 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

OK, so would you like me to list those on the page? Or just keep it as is with the IETF tag link (although, of course, all of those might not be supported by NLLB)?

Conversely, you could provide a link to the Serval swagger doc that talks about language codes: f"{os.environ.get("SERVAL_HOST_URL")}/swagger/index.html#/Translation%20Engines/TranslationEngines_Create"

@johnml1135
Copy link
Collaborator

samples/ServalApp/serval_email_module.py line 39 at r1 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

Right, but the class is tied to the db. I can just add fields - not a big deal. Just pushing for simplicity as much as possible.

Whatever implementation you feel is best. A full record of all the information that they put in (including the ClientID, but not the ClientSecret) would in my mind be the best set of information. It should also include the engine and build ID in Serval. The failure or complete emails should just reference the build name that they gave us (which may not be unique) and the engine and build ID (which are).

@johnml1135
Copy link
Collaborator

samples/ServalApp/builds.db line at r1 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

OK, I've removed it. In the past (and ONLY with sqlite dbs), I've been instructed to check in empty dbs for ease, but it really doesn't make a huge difference in this case - even with streamlit's oddities.

If there is no empty DB, will SQLite crash, or will it automatically create one? If it crashes, can you add code to automatically create an empty file if it doesn't exist?

@johnml1135
Copy link
Collaborator

samples/ServalApp/serval_app.py line 153 at r1 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

Alrighty. I'll leave it then as is. I don't think cancelling a build is necessary at this time given the use-cases and the simplicity of the app, but if you both think I should add it, I can as a "You already have a running build. Would you like to cancel it first?"

Let's cross the bridge when we come to it. I like your idea though about asking the user first.

Copy link
Collaborator

@johnml1135 johnml1135 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 10 of 10 files at r3, all commit messages.
Reviewable status: all files reviewed, 13 unresolved discussions (waiting on @ddaspit and @Enkidu93)

Copy link
Collaborator Author

@Enkidu93 Enkidu93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 13 unresolved discussions (waiting on @ddaspit and @johnml1135)


samples/ServalApp/.flake8 line 5 at r3 (raw file):

Previously, johnml1135 (John Lambert) wrote…

per-file-ignores = **/*.pyi:E252,E301,E302,E305,E501,E701,E704,E741,F401,F811,F821
exclude =
.git,
pycache,
.venv

May I ask why this is necessary? Doesn't flake8 know to only examine .py files? I thought it did. Also, I don't intend to check in the pycache and I'm not running flake8 on startup, so it that needed?


samples/ServalApp/pyproject.toml line 9 at r3 (raw file):

Previously, johnml1135 (John Lambert) wrote…

Why not 3.9.7? Why not >=3.11?

There's a conflict among the dependencies. This was poetry's recommended solution.


samples/ServalApp/pyproject.toml line 14 at r3 (raw file):

Previously, johnml1135 (John Lambert) wrote…

Obviously, you can update the versions. But there should be dev dependencies.

[tool.poetry.group.dev.dependencies]
pytest = "^7.4.2"
black = "^23.3.0"
flake8 = "^3.9.2"
isort = "^5.9.3"
ipykernel = "^6.7.0"
jupyter = "^1.0.0"
pyright = "^1.1.331"

OK, should I be using 'pyright'? Do you want me to write tests? (it'd be kind of difficult given that it's pretty much all UI) Why jupyter etc.? (Or is this just copy-pasted from machine.py?) I'll go ahead and add the ones I think make sense, and you can correct me.


samples/ServalApp/serval_app.py line 100 at r1 (raw file):

Previously, johnml1135 (John Lambert) wrote…

That should be fine. It will just be for us to look at. As for filtering out bad characters - it may not actually matter, as long as there is appropriate escaping and any json or string representations are not messed up.

OK.


samples/ServalApp/serval_app.py line 153 at r1 (raw file):

Previously, johnml1135 (John Lambert) wrote…

Let's cross the bridge when we come to it. I like your idea though about asking the user first.

OK, sounds good 👍.


samples/ServalApp/serval_app.py line 166 at r1 (raw file):

Previously, johnml1135 (John Lambert) wrote…

Conversely, you could provide a link to the Serval swagger doc that talks about language codes: f"{os.environ.get("SERVAL_HOST_URL")}/swagger/index.html#/Translation%20Engines/TranslationEngines_Create"

True. I wonder if that defeats some of the purpose though given swagger's complexity and the fact that we're trying to hide that. I could include it as a "For more information, see..." kind of thing?


samples/ServalApp/serval_app.py line 153 at r2 (raw file):

Previously, johnml1135 (John Lambert) wrote…

As this is not for many users - probably one per group. I will assume that the creds will only be shared between a known small group of users (and they can see everything on the Serval site itself if they share creds). I would just want security between credentials.
Yes, unique per Email:ClientID combo.

OK, the only security complication there is that it would require storing the client ids in the db. That should be alright, correct?


samples/ServalApp/serval_app.py line 95 at r3 (raw file):

Previously, johnml1135 (John Lambert) wrote…

Any reason for all the prints rather than actually using a logger? Then it would be more straightforward to log the exception.

I should just use the streamlit logger if possible - good call.


samples/ServalApp/serval_email_module.py line 39 at r1 (raw file):

Previously, johnml1135 (John Lambert) wrote…

Whatever implementation you feel is best. A full record of all the information that they put in (including the ClientID, but not the ClientSecret) would in my mind be the best set of information. It should also include the engine and build ID in Serval. The failure or complete emails should just reference the build name that they gave us (which may not be unique) and the engine and build ID (which are).

OK, so you do or do not want to have a list of file names as well?


samples/ServalApp/builds.db line at r1 (raw file):

Previously, johnml1135 (John Lambert) wrote…

If there is no empty DB, will SQLite crash, or will it automatically create one? If it crashes, can you add code to automatically create an empty file if it doesn't exist?

Yes, I already took care of it. It will now recreate as needed. The complication (and the reason I was avoiding it) is because streamlit is awkward in that it reruns the entire file at times, so it just adds more complexity and simultaneous db sessions to check and possibly create each time, but it really shouldn't be an issue.

@Enkidu93
Copy link
Collaborator Author

.vscode/settings.json line 35 at r3 (raw file):

Previously, johnml1135 (John Lambert) wrote…

"editor.formatOnSave": true

I believe there's a general setting (line 3 of this file) that takes care of this. Format on save works on my end.

@johnml1135
Copy link
Collaborator

.vscode/settings.json line 35 at r3 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

I believe there's a general setting (line 3 of this file) that takes care of this. Format on save works on my end.

I only saw the diff - now I see it.

@johnml1135
Copy link
Collaborator

samples/ServalApp/.flake8 line 5 at r3 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

May I ask why this is necessary? Doesn't flake8 know to only examine .py files? I thought it did. Also, I don't intend to check in the pycache and I'm not running flake8 on startup, so it that needed?

If it is ok without it, then let it be. I was just comparing to Machine.py.

@johnml1135
Copy link
Collaborator

samples/ServalApp/pyproject.toml line 9 at r3 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

There's a conflict among the dependencies. This was poetry's recommended solution.

Good enough- as long as it is intentional.

@johnml1135
Copy link
Collaborator

samples/ServalApp/pyproject.toml line 14 at r3 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

OK, should I be using 'pyright'? Do you want me to write tests? (it'd be kind of difficult given that it's pretty much all UI) Why jupyter etc.? (Or is this just copy-pasted from machine.py?) I'll go ahead and add the ones I think make sense, and you can correct me.

Add the ones that make sense - that you need for your development. We don't need testing.

@johnml1135
Copy link
Collaborator

samples/ServalApp/serval_app.py line 95 at r3 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

I should just use the streamlit logger if possible - good call.

Yes, that sounds good.

Copy link
Collaborator Author

@Enkidu93 Enkidu93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 7 of 12 files reviewed, 13 unresolved discussions (waiting on @ddaspit and @johnml1135)


.vscode/settings.json line 35 at r3 (raw file):

Previously, johnml1135 (John Lambert) wrote…

I only saw the diff - now I see it.

Done.


samples/ServalApp/.flake8 line 5 at r3 (raw file):

Previously, johnml1135 (John Lambert) wrote…

If it is ok without it, then let it be. I was just comparing to Machine.py.

Done.


samples/ServalApp/pyproject.toml line 9 at r3 (raw file):

Previously, johnml1135 (John Lambert) wrote…

Good enough- as long as it is intentional.

Done.


samples/ServalApp/pyproject.toml line 14 at r3 (raw file):

Previously, johnml1135 (John Lambert) wrote…

Add the ones that make sense - that you need for your development. We don't need testing.

Done.


samples/ServalApp/serval_app.py line 37 at r1 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

(Sorry, the line numbers changed after reformatting). Yes, I can rename it.

Done.


samples/ServalApp/serval_app.py line 153 at r2 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

OK, the only security complication there is that it would require storing the client ids in the db. That should be alright, correct?

I went ahead with this. Should be fine. I


samples/ServalApp/serval_app.py line 95 at r3 (raw file):

Previously, johnml1135 (John Lambert) wrote…

Yes, that sounds good.

Done.


samples/ServalApp/serval_email_module.py line 39 at r1 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

OK, so you do or do not want to have a list of file names as well?

I went ahead and included it all - including file names.


samples/ServalApp/serval_email_module.py line 69 at r1 (raw file):

Previously, johnml1135 (John Lambert) wrote…

Here is where all the info about the build should be dumped - all the files uploaded, the name chosen, the source and target language, etc.

Done.

@Enkidu93
Copy link
Collaborator Author

samples/ServalApp/serval_app.py line 101 at r1 (raw file):

Previously, johnml1135 (John Lambert) wrote…

@ddaspit - what are the range of paratext zip extensions?

I went ahead and added just .tar(.gz) until further notice.

@johnml1135
Copy link
Collaborator

samples/ServalApp/serval_app.py line 166 at r1 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

True. I wonder if that defeats some of the purpose though given swagger's complexity and the fact that we're trying to hide that. I could include it as a "For more information, see..." kind of thing?

That sounds good. If it's easy, it should be easy. If it's challenging, there should be the ability to find the right answer.

@johnml1135
Copy link
Collaborator

samples/ServalApp/serval_app.py line 101 at r1 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

I went ahead and added just .tar(.gz) until further notice.

good enough.

@johnml1135
Copy link
Collaborator

samples/ServalApp/serval_email_module.py line 39 at r1 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

I went ahead and included it all - including file names.

good enough.

Copy link
Collaborator

@johnml1135 johnml1135 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 5 of 5 files at r4, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @ddaspit and @Enkidu93)

@Enkidu93
Copy link
Collaborator Author

samples/ServalApp/serval_app.py line 166 at r1 (raw file):

Previously, johnml1135 (John Lambert) wrote…

That sounds good. If it's easy, it should be easy. If it's challenging, there should be the ability to find the right answer.

Done.

Copy link
Collaborator

@johnml1135 johnml1135 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r5, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @Enkidu93)

@johnml1135 johnml1135 merged commit 3ac210c into main Oct 24, 2023
1 check passed
@ddaspit ddaspit deleted the serval_app_init branch October 24, 2023 21:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Serval App
4 participants