Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

colossus s3 api #5150

Open
wants to merge 61 commits into
base: master
Choose a base branch
from

Conversation

ignazio-bovo
Copy link
Contributor

@ignazio-bovo ignazio-bovo commented May 16, 2024

The implementation is the same as suggested here:
#4981 (last comment by Mokhtar)

Instead of utilising a package in order to handle different types of cloud provider I have created a provider agnostic interface which integrates in the colossus logic

I believe this will allow us to easily add support for other providers in the future, by using their official sdk.
This also was useful in order to keep the changes to existing colossus logic to a minimal level (as the logic is already complex by itself).

The files localCache has been enhanced in order to keep track of which files are on the bucket and which files are on the local volume.
Only "accepted" data objects are stored on the bucket

Ignazio Bovo added 25 commits April 19, 2024 14:10
…rovements

made sure that every SyncTask execute internal invocation throws an error to be handled by the top level SyncTask.execute explicitly to guarantee no throw
… and add logic to getFile

get file now will try to pipe the stream into the response object with appropriate header settings
every accepted object will be uploaded to he storage bucket if the connection is enabled. Basically the logic stays the same as before for asset in the pending folder it's just the destination that has changed now : either LocalVolume/uploadsDir or storageBucket
Copy link
Member

@mnaamani mnaamani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good start.

The awsConnectionHandler implementation needs to be fixed to properly do uploading to the s3 storage.

The integration bash scripts and docker-compose files to run tests need some changes to get them working. I have made local changes on my machine to get tests working. I can open a PR to your branch if you like.

One jest unit test is failing.

Some limitations I see in the current implementation is that only a single object store can be configured at a time. I think that is fine since object stores are more easily scalable and operator should not need to frequently change which bucket to use. It might be practical from an organizational point of view perhaps to be able to specify a subfolder in the s3 bucket to use for storing the dataobjects.

ENABLE_STORAGE_PROVIDER=true

## Specify the cloud storage provider to use:
CLOUD_STORAGE_PROVIDER_NAME=aws
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ENABLE_STORAGE_PROVIDER and CLOUD_STORAGE_PROVIDER_NAME probably make more sense as command line arguments.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also we can reduce from two params to just one, the provider name. If not empty string it means we want to use a cloud storage provider.

@@ -167,6 +172,7 @@
"ensure": "yarn format && yarn lint --fix && yarn build",
"checks": "tsc --noEmit --pretty && prettier ./src --check && yarn lint",
"start": "./bin/run server",
"test:integration:cloudProvider": "jest --detectOpenHandles './src/services/storageProviders/tests/**/syncService.test.ts'",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

running the test I got one failed test:

Mokhtars-MBP:storage-node mokhtar$ yarn test:integration:cloudProvider
yarn run v1.22.19
$ jest --detectOpenHandles './src/services/storageProviders/tests/**/syncService.test.ts'
  Invalid testPattern ./src/services/storageProviders/tests/**/syncService.test.ts supplied. Running all tests instead.
  Invalid testPattern ./src/services/storageProviders/tests/**/syncService.test.ts supplied. Running all tests instead.
 PASS  src/services/storageProviders/tests/integration/setupLocalCache.test.ts (13.784 s)
 PASS  src/services/storageProviders/tests/integration/acceptObject.test.ts
 FAIL  src/services/storageProviders/tests/integration/acceptObjectService.test.ts
  ● accept pending object flow test suite › accept pending object happy path › should call accept objects/maxTxBatchSize times

    expect(jest.fn()).toHaveBeenCalledTimes(expected)

    Expected number of calls: 2
    Received number of calls: 0

      70 |
      71 |       expect(acceptPendingDataObjectsBatch).toHaveBeenCalledTimes(2)
    > 72 |       expect(acceptObject).toHaveBeenCalledTimes(2)
         |                            ^
      73 |       expect(addDataObjectIdToCache).toHaveBeenCalledTimes(2)
      74 |     })
      75 |   })

      at Object.<anonymous> (src/services/storageProviders/tests/integration/acceptObjectService.test.ts:72:28)

Test Suites: 1 failed, 2 passed, 3 total
Tests:       1 failed, 23 passed, 24 total

.env Outdated Show resolved Hide resolved
Comment on lines 10 to 13
- DEFAULT_REGION=${AWS_REGION}
- AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}
- AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}
- LOCALSTACK_ENDPOINT=${LOCALSTACK_ENDPOINT}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at https://docs.localstack.cloud/references/configuration/
default_region is deprecated, there are no key configuration, localstack just ignores secret and key ids for now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And here I think you meant to configure LOCALSTACK_HOST. LOCALSTACK_ENDPOINT is not a configuration option for localstack anymore

tests/network-tests/run-tests.sh Outdated Show resolved Hide resolved
storage-node/src/services/helpers/acceptObject.ts Outdated Show resolved Hide resolved
* @returns A promise that resolves to an instance of IConnectionHandler if the provider is enabled, otherwise undefined.
* @throws An error if the storage cloud provider type is invalid or unsupported.
*/
export async function parseConfigOptionAndBuildConnection(): Promise<IConnectionHandler | undefined> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not need to be an async function.

* @param filePath - The file path of the file to upload.
* @returns A promise that resolves when the upload is complete or rejects with an error.
*/
uploadFileToRemoteBucket(filename: string, filePath: string): Promise<any>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should define how to handle case when an key already exists on the store.

storage-node/src/services/webApi/controllers/filesApi.ts Outdated Show resolved Hide resolved

echo "Staring storage infrastructure"
export LOCALSTACK_ENABLED=true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LOCALSTACK_ENABLED is not passed to colossus-1 and colossus-2 in the docker-compose.yml file so they are not using the localstack, and instead public aws. So they fail to access s3 bucket.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants