Retry hangfire jobs #197

Merged
merged 9 commits into from
Jul 2, 2024

Conversation

johnml1135
Collaborator

@johnml1135 johnml1135 commented Apr 30, 2024

#158



@johnml1135 johnml1135 requested a review from ddaspit April 30, 2024 18:11
@codecov-commenter

codecov-commenter commented Apr 30, 2024

Codecov Report

Attention: Patch coverage is 49.26591% with 311 lines in your changes missing coverage. Please review.

Project coverage is 67.04%. Comparing base (1011777) to head (e52be9d).

| Files | Patch % | Lines |
|---|---|---|
| ...Machine.AspNetCore/Services/SmtTransferBuildJob.cs | 0.00% | 108 Missing ⚠️ |
| ...chine.AspNetCore/Services/ServalPlatformService.cs | 0.00% | 52 Missing ⚠️ |
| ...chine.AspNetCore/Services/ClearMLMonitorService.cs | 0.00% | 40 Missing ⚠️ |
| ...ore/Services/ServalPlatformOutboxMessageHandler.cs | 57.57% | 27 Missing and 1 partial ⚠️ |
| ...spNetCore/Services/MessageOutboxDeliveryService.cs | 77.77% | 24 Missing and 2 partials ⚠️ |
| ...NetCore/Configuration/IMachineBuilderExtensions.cs | 0.00% | 24 Missing ⚠️ |
| src/SIL.Machine.AspNetCore/Services/FileSystem.cs | 0.00% | 13 Missing ⚠️ |
| ...achine.AspNetCore/Services/MessageOutboxService.cs | 84.21% | 7 Missing and 2 partials ⚠️ |
| ...Core/Configuration/IServiceCollectionExtensions.cs | 0.00% | 3 Missing ⚠️ |
| ...Machine.AspNetCore/Services/ModelCleanupService.cs | 33.33% | 2 Missing ⚠️ |
| ... and 4 more | | |
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #197      +/-   ##
==========================================
- Coverage   67.30%   67.04%   -0.26%     
==========================================
  Files         447      455       +8     
  Lines       35432    35925     +493     
  Branches     4736     4762      +26     
==========================================
+ Hits        23848    24087     +239     
- Misses      10496    10745     +249     
- Partials     1088     1093       +5     


Contributor

@ddaspit ddaspit left a comment

Eventually, I would like to generalize the transactional outbox. There are other places where I would like to use it.

Reviewed 5 of 5 files at r1, all commit messages.
Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @johnml1135)


src/SIL.Machine.AspNetCore/Configuration/IMachineBuilderExtensions.cs line 270 at r1 (raw file):

                        )
                );
                o.AddRepository<PlatformMessage>("platform_message_outbox");

The naming convention for collections is a plural noun, so if we change the name of the doc to OutboxMessage, it would be outbox_messages.


src/SIL.Machine.AspNetCore/Models/PlatformMessage.cs line 3 at r1 (raw file):

namespace SIL.Machine.AspNetCore.Models;

public record PlatformMessage : IEntity

We might want to use this for other messages in the future, so I would just call this Message or OutboxMessage.


src/SIL.Machine.AspNetCore/Models/PlatformMessage.cs line 7 at r1 (raw file):

    public string Id { get; set; } = "";
    public int Revision { get; set; } = 1;
    public required string Method { get; init; }

I think an enum would be better.


src/SIL.Machine.AspNetCore/Services/PlatformMessageOutboxService.cs line 5 at r1 (raw file):

namespace SIL.Machine.AspNetCore.Services;

public class PlatformMessageOutboxService(

You should separate the hosted service from the service that enqueues messages. The hosted service should only run on a single server (the engine server). Messages can be enqueued from any server. You won't be able to use the _messageInOutbox flag. You also won't need to lock in the ProcessMessagesAsync method.


src/SIL.Machine.AspNetCore/Services/PlatformMessageOutboxService.cs line 67 at r1 (raw file):

        bool allMessagesSuccessfullySent = true;
        IReadOnlyList<PlatformMessage> messages = await _messages.GetAllAsync();

Optimally, we would watch the collection for changes using the SubscribeAsync method. If we use polling, it might be more efficient to use ExistsAsync to check if the outbox is empty.


src/SIL.Machine.AspNetCore/Services/PlatformMessageOutboxService.cs line 88 at r1 (raw file):

                switch (message.Method)
                {
                    case "BuildStartedAsync":

You can drop the Async from the names.


src/SIL.Machine.AspNetCore/Services/PlatformMessageOutboxService.cs line 205 at r1 (raw file):

        // log error
        message.Attempts++;
        await _messages.UpdateAsync(m => m.Id == message.Id, b => b.Set(m => m.Attempts, message.Attempts));

You can use the Inc operator.

@johnml1135
Collaborator Author

Is there a good way to generalize it? I couldn't think of a way to convert function calls to strings (or even binary data) that could be stored in MongoDB. Unless there is some package that can automatically capture the gRPC calls and turn them into an outbox, I don't have a good solution. I couldn't find one in a quick search.

@johnml1135
Collaborator Author

src/SIL.Machine.AspNetCore/Configuration/IMachineBuilderExtensions.cs line 270 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

The naming convention for collections is a plural noun, so if we change the name of the doc to OutboxMessage, it would be outbox_messages.

Done.

@johnml1135
Collaborator Author

src/SIL.Machine.AspNetCore/Models/PlatformMessage.cs line 3 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

We might want to use this for other messages in the future, so I would just call this Message or OutboxMessage.

Done.

@johnml1135
Collaborator Author

src/SIL.Machine.AspNetCore/Models/PlatformMessage.cs line 7 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

I think an enum would be better.

Done.

@johnml1135
Collaborator Author

src/SIL.Machine.AspNetCore/Services/PlatformMessageOutboxService.cs line 5 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

You should separate the hosted service from the service that enqueues messages. The hosted service should only run on a single server (the engine server). Messages can be enqueued from any server. You won't be able to use the _messageInOutbox flag. You also won't need to lock in the ProcessMessagesAsync method.

OK, I'll subscribe to changes in MongoDB instead so that the turnaround time is short.

@johnml1135
Collaborator Author

src/SIL.Machine.AspNetCore/Services/PlatformMessageOutboxService.cs line 67 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

Optimally, we would watch the collection for changes using the SubscribeAsync method. If we use polling, it might be more efficient to use ExistsAsync to check if the outbox is empty.

Switched to a subscription, checking first with ExistsAsync.

@johnml1135
Collaborator Author

src/SIL.Machine.AspNetCore/Services/PlatformMessageOutboxService.cs line 88 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

You can drop the Async from the names.

Ok - I see that Async is not relevant when we are queueing up the message to be sent.

@johnml1135
Collaborator Author

src/SIL.Machine.AspNetCore/Services/PlatformMessageOutboxService.cs line 205 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

You can use the Inc operator.

Done

@johnml1135
Collaborator Author

src/SIL.Machine.AspNetCore/Configuration/IMachineBuilderExtensions.cs line 183 at r2 (raw file):

                .UseRecommendedSerializerSettings()
                .UseMongoStorage(connectionString, GetMongoStorageOptions())
                .UseFilter(new AutomaticRetryAttribute { Attempts = 3 })

Here is what is likely the issue: https://discuss.hangfire.io/t/recurring-jobs-do-not-automatically-get-retried-after-application-crash-net-core-service/9160/2

Contributor

@ddaspit ddaspit left a comment

I have some ideas on how we could generalize it. You don't need to worry about it right now.

Also, inserting a message to the outbox and the corresponding model update need to occur in the same transaction. For example, the IBuildJobService.BuildJobStartedAsync call and the IPlatformService.BuildStartedAsync call should occur in the same transaction.

Reviewed 7 of 8 files at r2, 3 of 3 files at r3, all commit messages.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @johnml1135)


src/SIL.Machine.AspNetCore/Services/IMessageOutboxService.cs line 5 at r3 (raw file):

public interface IMessageOutboxService
{
    public Task EnqueueMessageAsync(

We need to ensure that messages are delivered in order for a particular build. I would pass in a groupId to this method that is used to ensure that all messages with the same group id are delivered in order. In the case of a build, the groupId would be the build id.
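
A minimal sketch of the per-group ordering described above, using in-memory tuples instead of the real repository. The `Index` field (a per-outbox sequence number) and the method names other than `BuildStarted` are assumptions made for illustration, not names taken from this PR.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical message shape: (GroupId, Index, Method). Index stands in for a
// sequence number assigned at enqueue time; GroupId would be the build id.
var pending = new List<(string GroupId, int Index, string Method)>
{
    ("build2", 4, "BuildStarted"),
    ("build1", 3, "BuildCompleted"),
    ("build1", 1, "BuildStarted"),
    ("build2", 5, "BuildCompleted"),
    ("build1", 2, "InsertPretranslations"),
};

// Group by GroupId, then order each group by its sequence number so that
// messages for the same build are delivered in the order they were enqueued.
Dictionary<string, List<string>> deliveryOrder = pending
    .GroupBy(m => m.GroupId)
    .ToDictionary(
        g => g.Key,
        g => g.OrderBy(m => m.Index).Select(m => m.Method).ToList()
    );

foreach ((string groupId, List<string> methods) in deliveryOrder)
    Console.WriteLine($"{groupId}: {string.Join(" -> ", methods)}");
```

Ordering is only enforced within a group here; messages from different groups could still be delivered independently of one another.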


src/SIL.Machine.AspNetCore/Services/ServalPlatformService.cs line 118 at r3 (raw file):

            );
        }
        await _outboxService.EnqueueMessageAsync(

I'm concerned about the size of the MongoDB document. This could potentially get very big and there is a size limit on a MongoDB document. Maybe we could keep the pretranslate.trg.json file until we have successfully sent the pretranslations to Serval.


src/SIL.Machine.AspNetCore/Services/MessageOutboxHandlerService.cs line 39 at r3 (raw file):

            if (message.Attempts > 3) // will fail the 5th time
            {
                await PermanentlyFailedMessage(message, e);

It could potentially fail 5 times very quickly. An expiration time would be better.
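
To illustrate the difference, here is a small sketch of an age-based check. The helper name `IsExpired` and the four-day window are illustrative assumptions, not the PR's implementation.

```csharp
using System;

// Hypothetical helper: a message is abandoned only once it is older than the
// configured expiration, regardless of how many attempts have been made.
static bool IsExpired(DateTimeOffset created, DateTimeOffset now, TimeSpan expiration) =>
    now - created > expiration;

TimeSpan expiration = TimeSpan.FromDays(4); // illustrative value
DateTimeOffset created = new DateTimeOffset(2024, 5, 1, 12, 0, 0, TimeSpan.Zero);

// Several failures within the first minute: the message is still retried.
bool afterOneMinute = IsExpired(created, created.AddMinutes(1), expiration);

// Still undelivered five days later: give up.
bool afterFiveDays = IsExpired(created, created.AddDays(5), expiration);

Console.WriteLine($"{afterOneMinute} {afterFiveDays}");
```

With an attempt counter, a short outage can exhaust all retries in seconds; with an age check, the number of retries depends on how long the receiving server stays unavailable.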

@johnml1135
Collaborator Author

Ok fixed.

@johnml1135
Collaborator Author

src/SIL.Machine.AspNetCore/Services/IMessageOutboxService.cs line 5 at r3 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

We need to ensure that messages are delivered in order for a particular build. I would pass in a groupId to this method that is used to ensure that all messages with the same group id are delivered in order. In the case of a build, the groupId would be the build id.

Major updates.

@johnml1135
Collaborator Author

src/SIL.Machine.AspNetCore/Services/MessageOutboxHandlerService.cs line 39 at r3 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

It could potentially fail 5 times very quickly. An expiration time would be better.

We should talk about this - what is the desired behavior if a message is unable to be sent and why? Why would we be retrying multiple times and what might resolve itself - and how we should respond?

@johnml1135
Collaborator Author

src/SIL.Machine.AspNetCore/Services/MessageOutboxHandlerService.cs line 38 at r4 (raw file):

            m => m.GroupId,
            m => m,
            (key, element) => element.OrderBy(m => m.Created).ToList()

From MongoDB:

Returns a new [ObjectId](https://www.mongodb.com/docs/manual/reference/bson-types/#std-label-objectid). The 12-byte [ObjectId](https://www.mongodb.com/docs/manual/reference/bson-types/#std-label-objectid) consists of:

-   A 4-byte timestamp, representing the ObjectId's creation, measured in seconds since the Unix epoch.
    
-   A 5-byte random value generated once per process. This random value is unique to the machine and process.
    
-   A 3-byte incrementing counter, initialized to a random value.

Since we only have one process, the random value is constant, so the order is determined by the seconds and the counter, which should work. In contrast, UtcNow relies on the underlying system timer, whose resolution may be as coarse as 15 ms, so two messages could end up with the same timestamp.
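
As a sketch of the layout quoted above, the creation time can be read from the first 8 hex digits of a 24-digit ObjectId string. The helper name `ObjectIdTimestamp` is hypothetical, and the second id below is made up by changing only the counter digits of the first.

```csharp
using System;
using System.Globalization;

// Decode the creation time stored in the first 4 bytes (8 hex digits) of an
// ObjectId string. The value is seconds since the Unix epoch.
static DateTimeOffset ObjectIdTimestamp(string objectId)
{
    if (objectId.Length != 24)
        throw new ArgumentException("An ObjectId string has 24 hex digits.", nameof(objectId));
    uint seconds = uint.Parse(
        objectId.Substring(0, 8),
        NumberStyles.HexNumber,
        CultureInfo.InvariantCulture
    );
    return DateTimeOffset.FromUnixTimeSeconds(seconds);
}

// Two ids that differ only in the trailing counter bytes.
DateTimeOffset first = ObjectIdTimestamp("668312727c4e5395711e3e2a");
DateTimeOffset second = ObjectIdTimestamp("668312727c4e5395711e3e2b");

Console.WriteLine(first.ToString("u"));
Console.WriteLine(first == second); // same second, so the timestamp alone cannot order them
```

The timestamp part has one-second resolution, so any ordering between ids created in the same second has to come from the remaining bytes.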

@johnml1135
Collaborator Author

src/SIL.Machine.AspNetCore/Services/MessageOutboxHandlerService.cs line 39 at r3 (raw file):

Previously, johnml1135 (John Lambert) wrote…

We should talk about this - what is the desired behavior if a message is unable to be sent and why? Why would we be retrying multiple times and what might resolve itself - and how we should respond?

4 days - basically, how long could the Serval server conceivably be down?

@johnml1135 johnml1135 force-pushed the retry_hangfire_jobs branch from f41b5a1 to 3e19b1d Compare May 8, 2024 13:48
@johnml1135
Collaborator Author

src/SIL.Machine.AspNetCore/Services/ServalPlatformService.cs line 118 at r3 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

I'm concerned about the size of the MongoDB document. This could potentially get very big and there is a size limit on a MongoDB document. Maybe we could keep the pretranslate.trg.json file until we have successfully sent the pretranslations to Serval.

Saved the large files to disk.

Contributor

@ddaspit ddaspit left a comment

Reviewable status: 3 of 24 files reviewed, 5 unresolved discussions (waiting on @johnml1135)


src/SIL.Machine.AspNetCore/Services/MessageOutboxHandlerService.cs line 38 at r4 (raw file):

This seems like a relevant quote from the documentation:

While ObjectId values should increase over time, they are not necessarily monotonic. This is because they:

  • Only contain one second of temporal resolution, so ObjectId values created within the same second do not have a guaranteed ordering, and

  • Are generated by clients, which may have differing system clocks.

Unfortunately, this means we can't rely on the ObjectId. This means we need to use a sequence number to guarantee the ordering. We have two options:

  1. Create an outbox collection that stores the current outbox state. The outbox state would contain the next sequence number. Use the UpdateAsync method to increment the sequence number and return the new value. The MongoDB update should be atomic.
  2. Create a unique index on the sequence number field in the message collection. In a loop, query the database for the message with the max sequence number, increment the number, and attempt to insert the message with the new sequence number.

I think I would prefer option 1, since it doesn't require a loop and could be used to store other metadata about the outbox. It would also allow us to have multiple outboxes, which could be useful.
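
Option 1 can be sketched with an in-memory stand-in for the outbox state collection. This is only an illustration under stated assumptions: the dictionary and the lock replace the database and its atomic update, and `NextIndex` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// In-memory stand-in for the outbox state collection: outbox name -> last index issued.
var outboxState = new Dictionary<string, int>();
var gate = new object();

// Increment and return in one step, mimicking an atomic "increment and return
// the new value" update. The lock plays the role of the database's atomicity.
int NextIndex(string outboxId)
{
    lock (gate)
    {
        outboxState.TryGetValue(outboxId, out int current); // 0 when the outbox is new
        outboxState[outboxId] = current + 1;
        return current + 1;
    }
}

// Many concurrent producers, for example an engine server and a build server.
int[] issued = await Task.WhenAll(
    Enumerable.Range(0, 1000).Select(_ => Task.Run(() => NextIndex("ServalPlatform")))
);

Console.WriteLine($"distinct: {issued.Distinct().Count()}, max: {issued.Max()}");
```

Because each caller receives the value produced by its own increment, no two messages share a sequence number, and keeping one counter per outbox name leaves room for multiple outboxes.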

@johnml1135
Collaborator Author

src/SIL.Machine.AspNetCore/Services/MessageOutboxHandlerService.cs line 38 at r4 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

This seems like a relevant quote from the documentation:

While ObjectId values should increase over time, they are not necessarily monotonic. This is because they:

  • Only contain one second of temporal resolution, so ObjectId values created within the same second do not have a guaranteed ordering, and

  • Are generated by clients, which may have differing system clocks.

Unfortunately, this means we can't rely on the ObjectId. This means we need to use a sequence number to guarantee the ordering. We have two options:

  1. Create an outbox collection that stores the current outbox state. The outbox state would contain the next sequence number. Use the UpdateAsync method to increment the sequence number and return the new value. The MongoDB update should be atomic.
  2. Create a unique index on the sequence number field in the message collection. In a loop, query the database for the message with the max sequence number, increment the number, and attempt to insert the message with the new sequence number.

I think I would prefer option 1, since it doesn't require a loop and could be used to store other metadata about the outbox. It would also allow us to have multiple outboxes, which could be useful.

Since we only have one client generating these (that is, for one build or engine ID, which is what we are concerned about) I believe the current solution should be acceptable.

Contributor

@ddaspit ddaspit left a comment

Reviewable status: 3 of 24 files reviewed, 5 unresolved discussions (waiting on @johnml1135)


src/SIL.Machine.AspNetCore/Services/MessageOutboxHandlerService.cs line 38 at r4 (raw file):

Previously, johnml1135 (John Lambert) wrote…

Since we only have one client generating these (that is, for one build or engine ID, which is what we are concerned about) I believe the current solution should be acceptable.

The engine server and the build server can generate messages. Even if we had only one client generating messages, I don't want our design to limit our ability to scale.

Collaborator Author

@johnml1135 johnml1135 left a comment

Reviewable status: 2 of 25 files reviewed, 4 unresolved discussions (waiting on @ddaspit)


src/SIL.Machine.AspNetCore/Services/MessageOutboxHandlerService.cs line 38 at r4 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

The engine server and the build server can generate messages. Even if we had only one client generating messages, I don't want our design to limit our ability to scale.

Option 1 is done.

@johnml1135 johnml1135 force-pushed the retry_hangfire_jobs branch from f174807 to 2aa8fe8 Compare May 20, 2024 17:54
@johnml1135 johnml1135 force-pushed the retry_hangfire_jobs branch from 2aa8fe8 to ba69059 Compare June 3, 2024 17:48
@johnml1135 johnml1135 force-pushed the retry_hangfire_jobs branch from ba69059 to 1b274be Compare June 20, 2024 15:56
Contributor

@ddaspit ddaspit left a comment

Reviewed 2 of 16 files at r4, 7 of 9 files at r5, 3 of 4 files at r6, 16 of 16 files at r8, all commit messages.
Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @johnml1135)


src/SIL.Machine.AspNetCore/Configuration/IMachineBuilderExtensions.cs line 267 at r8 (raw file):

                );
                o.AddRepository<OutboxMessage>("outbox_messages");
                o.AddRepository<Sequence>("outbox_message_index");

The collection should be named message_outboxes to be consistent with the collection naming conventions that we currently use.


src/SIL.Machine.AspNetCore/Models/Sequence.cs line 3 at r8 (raw file):

namespace SIL.Machine.AspNetCore.Models;

public record Sequence : IEntity

This should be named MessageOutbox.


src/SIL.Machine.AspNetCore/Services/ClearMLMonitorService.cs line 234 at r8 (raw file):

            async (ct) =>
            {
                await using (await @lock.WriterLockAsync(cancellationToken: ct))

I don't think that the lock will work properly inside of a transaction. The lock needs to be acquired and released outside of the transaction.
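
The ordering being asked for can be sketched as follows. This is a simplified illustration, not the PR's code: `SemaphoreSlim` stands in for the distributed lock, and `WithTransactionAsync` is a hypothetical stand-in for the data access context.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

var events = new List<string>();
var buildLock = new SemaphoreSlim(1, 1);

// Hypothetical stand-in for a data access context: runs the callback between
// a begin and a commit.
async Task WithTransactionAsync(Func<CancellationToken, Task> body, CancellationToken ct)
{
    events.Add("begin transaction");
    await body(ct);
    events.Add("commit transaction");
}

async Task UpdateBuildAsync(CancellationToken ct)
{
    // Acquire the lock first, outside of the transaction.
    await buildLock.WaitAsync(ct);
    events.Add("lock acquired");
    try
    {
        await WithTransactionAsync(
            _ =>
            {
                events.Add("update model + enqueue outbox message");
                return Task.CompletedTask;
            },
            ct
        );
    }
    finally
    {
        // Release only after the transaction has finished.
        buildLock.Release();
        events.Add("lock released");
    }
}

await UpdateBuildAsync(CancellationToken.None);
Console.WriteLine(string.Join(" > ", events));
```

With this shape the lock spans the whole transaction, so it is never released before the commit.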


src/SIL.Machine.AspNetCore/Services/MessageOutboxService.cs line 30 at r8 (raw file):

            )
        )!;
        string id = Sequence.IndexToObjectIdString(outboxIndex.CurrentIndex);

The order index and the message id should be separate properties.


src/SIL.Machine.AspNetCore/Services/MessageOutboxService.cs line 38 at r8 (raw file):

            RequestContent = requestContent
        };
        if (requestContent.Length > MaxDocumentSize)

I would prefer to always save the pretranslations to disk, so that the behavior is consistent and easier to debug. This would also allow us to avoid deserializing and serializing the pretranslations an extra time. The RequestContent could be the path to the pretranslations json file.


src/SIL.Machine.AspNetCore/Services/MessageOutboxHandlerService.cs line 5 at r8 (raw file):

namespace SIL.Machine.AspNetCore.Services;

public class MessageOutboxHandlerService(

MessageOutboxDeliveryService is a clearer name.


src/SIL.Machine.AspNetCore/Services/MessageOutboxHandlerService.cs line 17 at r8 (raw file):

    private readonly ILogger<MessageOutboxHandlerService> _logger = logger;
    protected TimeSpan Timeout { get; set; } = TimeSpan.FromSeconds(10);
    protected TimeSpan MessageExpiration { get; set; } = TimeSpan.FromDays(4);

This should be configurable.

@johnml1135
Collaborator Author

src/SIL.Machine.AspNetCore/Models/Sequence.cs line 3 at r8 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

This should be named MessageOutbox.

So the message index should be named MessageOutbox and the message itself should be OutboxMessage? That seems confusing. What about OutboxIndex?

@johnml1135
Collaborator Author

src/SIL.Machine.AspNetCore/Services/ClearMLMonitorService.cs line 234 at r8 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

I don't think that the lock will work properly inside of a transaction. The lock needs to be acquired and released outside of the transaction.

Done

johnml1135 and others added 4 commits June 27, 2024 16:20
- correctly handle scoped services in background services
- abstract file handling of content in outbox services
- merge Id and Context in Outbox model
- Consistently use strings for outbox message method identifiers
- split up tests into true unit tests
- fix properties in outbox models
- fix lifetime of new services
Contributor

@ddaspit ddaspit left a comment

I would really like to try to get this in, so I went ahead and fixed a couple of issues in the PR instead of adding more comments.

Reviewed 1 of 24 files at r11.
Reviewable status: 17 of 57 files reviewed, 7 unresolved discussions

@johnml1135
Collaborator Author

src/SIL.Machine.AspNetCore/Services/ClearMLMonitorService.cs line 86 at r13 (raw file):

            }

            var dataAccessContext = scope.ServiceProvider.GetRequiredService<IDataAccessContext>();

What is the benefit of changing how the dataAccessContext is accessed?

@johnml1135
Collaborator Author

src/SIL.Machine.AspNetCore/Services/MessageOutboxDeliveryService.cs line 23 at r13 (raw file):

    {
        using IServiceScope scope = _services.CreateScope();
        var messages = scope.ServiceProvider.GetRequiredService<IRepository<OutboxMessage>>();

What is the benefit of a custom scope creation for running this?

@johnml1135
Collaborator Author

src/SIL.Machine.AspNetCore/Services/MessageOutboxService.cs line 63 at r13 (raw file):

            {
                await using Stream fileStream = _fileSystem.OpenWrite(filePath);
                await contentStream.CopyToAsync(fileStream, cancellationToken);

By doing this rather than moving the file, all the data needs to stream through the server instead of just the address changing. This is true for local files, S3 files and memory files. Is this extra work acceptable?

Contributor

@ddaspit ddaspit left a comment

Reviewed 4 of 24 files at r11, 36 of 36 files at r13, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @johnml1135)


src/SIL.Machine.AspNetCore/Services/ClearMLMonitorService.cs line 86 at r13 (raw file):

Previously, johnml1135 (John Lambert) wrote…

What is the benefit of changing how the dataAccessContext is accessed?

This was a required change. The data access context has a scoped lifetime and this service has a singleton lifetime, so it isn't possible to inject directly into this class.


src/SIL.Machine.AspNetCore/Services/MessageOutboxService.cs line 63 at r13 (raw file):

Previously, johnml1135 (John Lambert) wrote…

By doing this rather than moving the file, all the data needs to stream through the server instead of just the address changing. This is true for local files, S3 files and memory files. Is this extra work acceptable?

Yes, it should be fine. We are only copying the file down from the S3 bucket, which is something that we have to do at some point. We aren't performing extra serialization/deserialization or anything like that. My plan is to pull this out of the Machine library so that it can be used in other servers like Serval. This change makes it more generic, abstracts the file handling away from other services, and makes it less coupled to the S3 bucket.


src/SIL.Machine.AspNetCore/Services/MessageOutboxDeliveryService.cs line 23 at r13 (raw file):

Previously, johnml1135 (John Lambert) wrote…

What is the benefit of a custom scope creation for running this?

This was a required change. See my comment in the ClearMLMonitorService.

@johnml1135
Collaborator Author

src/SIL.Machine.AspNetCore/Services/ModelCleanupService.cs line 15 at r13 (raw file):

    protected override async Task DoWorkAsync(IServiceScope scope, CancellationToken cancellationToken)
    {
        var engines = scope.ServiceProvider.GetRequiredService<IRepository<TranslationEngine>>();

So, let's see if I understand. We have a few recurring tasks, namely ClearMLMonitorService, MessageOutboxDeliveryService, ModelCleanupService and SmtTransferEngineCommitService, and we want to create their services each time they run rather than having one long-lived instance for the whole class. My best guess is that keeping the scoped database access, etc. short-lived saves memory.

@johnml1135
Collaborator Author

src/SIL.Machine.AspNetCore/Models/OutboxMessage.cs line 8 at r13 (raw file):

    public int Revision { get; set; } = 1;
    public required int Index { get; init; }
    public required string OutboxRef { get; init; }

OutboxRef doesn't work (see previous comment). To be an ID, OutboxRef needs to be a 24-digit hex value, but it holds a name instead. The error when trying to run an E2E test is:

machine-engine-cntr  |       MongoDB.Bson.BsonSerializationException: An error occurred while serializing the OutboxRef property of class SIL.Machine.AspNetCore.Models.OutboxMessage: 'ServalPlatform' is not a valid 24 digit hex string.

The issue is that the 24-digit value is not known in advance; it is generated when the document is created. We need a key that is a name rather than a 24-digit hex value, so my solution was to key off an OutboxName and to use the message types so that fewer values have to be passed around.

@johnml1135
Collaborator Author

src/SIL.Machine.AspNetCore/Services/ClearMLMonitorService.cs line 86 at r13 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

This was a required change. The data access context has a scoped lifetime and this service has a singleton lifetime, so it isn't possible to inject directly into this class.

It was possible, but I agree that it is not preferred. - https://ercanerdogan.medium.com/how-to-use-scoped-services-within-singleton-services-in-asp-net-216de79b4a8d#:~:text=The%20main%20goal%20of%20using,available%20in%20the%20singleton%20service.

Contributor

@ddaspit ddaspit left a comment

Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @johnml1135)


src/SIL.Machine.AspNetCore/Models/OutboxMessage.cs line 8 at r13 (raw file):

Previously, johnml1135 (John Lambert) wrote…

OutboxRef doesn't work (see previous comment). To be an ID, OutboxRef needs to be a 24-digit hex value, but it holds a name instead. The error when trying to run an E2E test is:

machine-engine-cntr  |       MongoDB.Bson.BsonSerializationException: An error occurred while serializing the OutboxRef property of class SIL.Machine.AspNetCore.Models.OutboxMessage: 'ServalPlatform' is not a valid 24 digit hex string.

The issue is that the 24-digit value is not known in advance; it is generated when the document is created. We need a key that is a name rather than a 24-digit hex value, so my solution was to key off an OutboxName and to use the message types so that fewer values have to be passed around.

Oops, you are correct. Let me fix that.


src/SIL.Machine.AspNetCore/Services/ClearMLMonitorService.cs line 86 at r13 (raw file):

Previously, johnml1135 (John Lambert) wrote…

It was possible, but I agree that it is not preferred. - https://ercanerdogan.medium.com/how-to-use-scoped-services-within-singleton-services-in-asp-net-216de79b4a8d#:~:text=The%20main%20goal%20of%20using,available%20in%20the%20singleton%20service.

This article explains the solution that we are already using here.


src/SIL.Machine.AspNetCore/Services/ModelCleanupService.cs line 15 at r13 (raw file):

Previously, johnml1135 (John Lambert) wrote…

So, let's see if I understand. We have a few recurring tasks, namely ClearMLMonitorService, MessageOutboxDeliveryService, ModelCleanupService and SmtTransferEngineCommitService, and we want to create their services each time they run rather than having one long-lived instance for the whole class. My best guess is that keeping the scoped database access, etc. short-lived saves memory.

The scoped lifetime is used for services that need to live during a single request. In order to properly handle data access contexts and transactions during a request, repositories and the data access context need to have a scoped lifetime. Unfortunately, this means that when they are used in non-scoped contexts, such as singleton services, we have to do the extra work of first creating a scope and then retrieving the scoped services.

Contributor

@ddaspit ddaspit left a comment

Reviewed 1 of 1 files at r14, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @johnml1135)


src/SIL.Machine.AspNetCore/Models/OutboxMessage.cs line 8 at r13 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

Oops, you are correct. Let me fix that.

Done

@johnml1135
Collaborator Author

src/SIL.Machine.Serval.EngineServer/appsettings.json line 36 at r14 (raw file):

  },
  "MessageOutbox": {
    "MessageExpirationInHours": 48

This is no longer relevant - it is not hours but "Timeout"

@johnml1135
Collaborator Author

src/SIL.Machine.AspNetCore/Services/MessageOutboxService.cs line 57 at r14 (raw file):

                HasContentStream = contentStream is not null
            };
        string filePath = Path.Combine(_options.CurrentValue.DataDir, outboxMessage.Id);

Two issues:

System.IO.DirectoryNotFoundException: Could not find a part of the path '/app/src/SIL.Machine.Serval.JobServer/outbox/668312727c4e5395711e3e2a

First, the DataDir is never created, so there is a "directory not found" exception. Second, the Docker containers don't have write access to the folder being used (the app folder). We will either have to give the Docker containers explicit permissions or switch to a different directory.

One potential solution is to use the user profile directory:

Environment.GetFolderPath(Environment.SpecialFolder.UserProfile);

and then create the folder when starting the delivery service.
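
A minimal sketch of the "create the folder first" step. A temporary folder stands in for the outbox data directory here; the actual location is an assumption and would come from configuration.

```csharp
using System;
using System.IO;

// Hypothetical location: a temp folder stands in for the outbox data directory.
string dataDir = Path.Combine(Path.GetTempPath(), "outbox-sketch-" + Guid.NewGuid().ToString("N"));
string filePath = Path.Combine(dataDir, "668312727c4e5395711e3e2a");

// Directory.CreateDirectory does nothing if the directory already exists, so
// it is safe to call on every startup. Without it, writing the file below
// throws DirectoryNotFoundException.
Directory.CreateDirectory(dataDir);

File.WriteAllBytes(filePath, new byte[] { 1, 2, 3 });
bool written = File.Exists(filePath);

// Clean up the sketch's temporary folder.
Directory.Delete(dataDir, recursive: true);
Console.WriteLine(written);
```

This only addresses the missing directory; write permissions inside the container still depend on which directory is chosen.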

@johnml1135
Collaborator Author

src/SIL.Machine.AspNetCore/Services/MessageOutboxService.cs line 57 at r14 (raw file):

Previously, johnml1135 (John Lambert) wrote…

Two issues:

System.IO.DirectoryNotFoundException: Could not find a part of the path '/app/src/SIL.Machine.Serval.JobServer/outbox/668312727c4e5395711e3e2a

First, the DataDir is never created, which causes the "directory not found" exception above. Second, the Docker containers don't have write access to the folder you are using (the app folder). We will either have to give the Docker containers explicit permissions or switch to a different directory.

One potential solution is to use the user profile directory:

Environment.GetFolderPath(Environment.SpecialFolder.UserProfile);

and then create the folder when starting the delivery service.

Moreover, the pretranslations are created on the job server but are sent by the engine server. The two servers don't share a folder, so we will need an external data store: either we repurpose the EnginesDir or create something new.

@johnml1135
Collaborator Author

src/SIL.Machine.AspNetCore/Services/MessageOutboxService.cs line 57 at r14 (raw file):

Previously, johnml1135 (John Lambert) wrote…

Moreover, the pretranslations are created on the job server but are sent by the engine server. The two servers don't share a folder, so we will need an external data store: either we repurpose the EnginesDir or create something new.

I'm working on it now.

Contributor

@ddaspit ddaspit left a comment

Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @johnml1135)


src/SIL.Machine.AspNetCore/Services/MessageOutboxService.cs line 57 at r14 (raw file):

Previously, johnml1135 (John Lambert) wrote…

I'm working on it now.

I figured this would just go under the /var/lib/machine volume mount. Something like /var/lib/machine/messages


src/SIL.Machine.Serval.EngineServer/appsettings.json line 36 at r14 (raw file):

Previously, johnml1135 (John Lambert) wrote…

This setting is no longer relevant: it is no longer expressed in hours but as a "Timeout".

Good catch.

Contributor

@ddaspit ddaspit left a comment

Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @johnml1135)


src/SIL.Machine.AspNetCore/Services/MessageOutboxService.cs line 57 at r14 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

I figured this would just go under the /var/lib/machine volume mount. Something like /var/lib/machine/messages

Should we create the folder in the code or can we expect that the folder has already been created?

@johnml1135
Collaborator Author

src/SIL.Machine.AspNetCore/Services/MessageOutboxService.cs line 57 at r14 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

Should we create the folder in the code or can we expect that the folder has already been created?

We should create it in code. I pushed a commit.
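
For readers following the thread, a minimal sketch of what creating the folder in code could look like. This is not the committed change. `OutboxFiles` and `GetMessagePath` are hypothetical names, and a temp folder stands in for the `/var/lib/machine/messages` location suggested above.

```csharp
using System;
using System.IO;

public static class OutboxFiles
{
    // Hypothetical helper: ensures the data directory exists, then returns the
    // file path for a message, mirroring the Path.Combine call under review.
    public static string GetMessagePath(string dataDir, string messageId)
    {
        // CreateDirectory creates any missing parent directories and
        // does nothing if the directory already exists.
        Directory.CreateDirectory(dataDir);
        return Path.Combine(dataDir, messageId);
    }
}

public static class Program
{
    public static void Main()
    {
        // Assumption: a temp folder stands in for the mounted volume.
        string dataDir = Path.Combine(Path.GetTempPath(), "outbox-sketch", "messages");
        string filePath = OutboxFiles.GetMessagePath(dataDir, "example-message-id");

        File.WriteAllText(filePath, "payload");
        Console.WriteLine(File.Exists(filePath));
    }
}
```

Creating the directory only addresses the `DirectoryNotFoundException`. The process still needs write access to the chosen location, which is why the thread moves the folder under a volume mount.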

Contributor

@ddaspit ddaspit left a comment

Reviewed 5 of 5 files at r15, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @johnml1135)

Contributor

@ddaspit ddaspit left a comment

Reviewable status: 56 of 58 files reviewed, all discussions resolved (waiting on @johnml1135)


src/SIL.Machine.Serval.EngineServer/appsettings.json line 36 at r14 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

Good catch.

Done

Contributor

@ddaspit ddaspit left a comment

:lgtm:

Reviewed 2 of 2 files at r16, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @johnml1135)

Collaborator Author

@johnml1135 johnml1135 left a comment

Reviewed 1 of 3 files at r3, 1 of 17 files at r4, 4 of 11 files at r5, 11 of 16 files at r8, 4 of 24 files at r11, 36 of 36 files at r13, 1 of 1 files at r14, 3 of 5 files at r15, 2 of 2 files at r16, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @johnml1135)

@johnml1135 johnml1135 merged commit d622eb8 into master Jul 2, 2024
4 checks passed
@johnml1135 johnml1135 deleted the retry_hangfire_jobs branch July 2, 2024 11:48
3 participants