Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

s3 based case store #11

Merged
merged 155 commits into from
Nov 12, 2024
Merged

s3 based case store #11

merged 155 commits into from
Nov 12, 2024

Conversation

jonenst
Copy link
Contributor

@jonenst jonenst commented Jun 28, 2023

Please check if the PR fulfills these requirements

  • The commit message follows our guidelines
  • Tests for the changes have been added (for bug fixes / features)
  • Docs have been added / updated (for bug fixes / features)

Does this PR already have an issue describing the problem?

NO

What kind of change does this PR introduce?

Feature

What is the current behavior?

cases stored in folder

What is the new behavior (if this is a feature change)?
cases stored in external s3 api

Does this PR introduce a breaking change or deprecate an API?

  • The Breaking Change or Deprecated label has been added
  • The migration guide has been updated in the github wiki (What changes might users need to make in their application due to this PR?)
    NO

Other information:

not finished, need all API to be efficient. For now everything is downloaded everytime

Signed-off-by: HARPER Jon <[email protected]>
@sonarqubecloud
Copy link

SonarCloud Quality Gate failed.    Quality Gate failed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot E 1 Security Hotspot
Code Smell A 26 Code Smells

84.9% 84.9% Coverage
0.0% 0.0% Duplication

idea Catch issues before they fail your Quality Gate with our IDE extension sonarlint SonarLint

@ghazwarhili ghazwarhili changed the title [WIP] s3 based case store s3 based case store Oct 26, 2023
Copy link
Contributor

@FranckLecuyer FranckLecuyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests: OK
Code review:

  • remove some code smells
  • fix security hotspot
  • fix DCO

REHILI Ghazwa (Externe) added 6 commits November 5, 2023 17:03
Signed-off-by: Rehili Ghazwa <[email protected]>
Signed-off-by: Rehili Ghazwa <[email protected]>
Signed-off-by: Rehili Ghazwa <[email protected]>
Signed-off-by: Rehili Ghazwa <[email protected]>
Signed-off-by: Rehili Ghazwa <[email protected]>
aws:
s3:
# using host ip address for s3 endpoint because spring cloud aws 2.x doesn't allow
# spring.cloud.aws.s3.path-style-access-enabled (only spring cloud aws 3.x), to be revisisted ?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this be revisted now ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Signed-off-by: Etienne Homer <[email protected]>
Signed-off-by: Etienne Homer <[email protected]>
Signed-off-by: Etienne Homer <[email protected]>
Signed-off-by: Etienne Homer <[email protected]>
Signed-off-by: Etienne Homer <[email protected]>
Signed-off-by: Etienne Homer <[email protected]>
Copy link
Collaborator

@etiennehomer etiennehomer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code ok.
Tests ok for filesystem and S3

Signed-off-by: Etienne Homer <[email protected]>
Signed-off-by: Etienne Homer <[email protected]>
return new CaseException(Type.TEMP_FILE_PROCESS, "Error processing temporary case file: " + uuid, e);
}

public static CaseException importZipContent(UUID uuid, Exception e) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

importZipContent and processTempFile unused to be removed

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

}

public Set<String> listName(UUID caseUuid, String regex) {
List<String> fileNames;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

redundancy calls of getOriginalFilename
String originalFilename = getOriginalFilename(caseUuid);

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

List<String> fileNames;
if (isCompressedCaseFile(getOriginalFilename(caseUuid))) {
// For a compressed file basename.xml.gz, listName() should return ['basename.xml']. That's why we remove the compression extension to the filename.
fileNames = List.of(removeExtension(getOriginalFilename(caseUuid), "." + getCompressionFormat(caseUuid)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to eliminate the need for if-else then if ...

return Set.of(removeExtension(originalFilename, "." + getCompressionFormat(caseUuid)))
                    .stream()
                    .filter(name -> name.matches(regex))
                    .collect(Collectors.toSet());

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes but the actual form of the code allow to filter with the regex at only one place and the logic is easier to comprehend that way

fileNames = List.of(removeExtension(getOriginalFilename(caseUuid), "." + getCompressionFormat(caseUuid)));
} else {
List<S3Object> s3Objects = getCaseFileSummaries(caseUuid);
fileNames = s3Objects.stream().map(obj -> Paths.get(obj.key()).toString().replace(CASES_PREFIX + caseUuid.toString() + DELIMITER, "")).toList();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use getCaseName(caseUuid) to replace Paths.get(obj.key()).toString().replace(CASES_PREFIX + caseUuid.toString() + DELIMITER, "")

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get what you want to do. Here, we need to clean the key from the bucket name and the caseUuid. We can't do much with getCaseName(caseUuid)

// the original archive name has to be filtered.
fileNames = fileNames.stream().filter(name -> !name.equals(getOriginalFilename(caseUuid))).toList();
// each subfile hase been gzipped -> we have to remove the gz extension (only one, the one we added).
fileNames = fileNames.stream().map(name -> removeExtension(name, GZIP_EXTENSION)).toList();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fileNames = fileNames.stream()
                    .filter(name -> !name.equals(originalFilename))
                    .map(name -> removeExtension(name, GZIP_EXTENSION))
                    .collect(Collectors.toList());

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

// To optimize copy, files to copy are not downloaded on the case-server. They are directly copied on the S3 server.
CopyObjectRequest copyObjectRequest = CopyObjectRequest.builder()
.sourceBucket(bucketName)
.sourceKey(CASES_PREFIX + sourcecaseUuid + "/" + fileName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DELIMITER

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

.sourceBucket(bucketName)
.sourceKey(CASES_PREFIX + sourcecaseUuid + "/" + fileName)
.destinationBucket(bucketName)
.destinationKey(CASES_PREFIX + caseUuid + "/" + fileName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DELIMITER

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

try {
s3Client.copyObject(copyObjectRequest);
} catch (S3Exception e) {
throw new ResponseStatusException(HttpStatus.NOT_FOUND, "Source file " + caseUuid + "/" + fileName + NOT_FOUND);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DELIMITER

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

if (!objectsToDelete.isEmpty()) {
DeleteObjectsRequest deleteObjectsRequest = DeleteObjectsRequest.builder()
.bucket(bucketName)
.delete(Delete.builder().objects(objectsToDelete).build())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.delete(delete -> delete.objects(objectsToDelete))

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

if (!objectsToDelete.isEmpty()) {
DeleteObjectsRequest deleteObjectsRequest = DeleteObjectsRequest.builder()
.bucket(bucketName)
.delete(Delete.builder().objects(objectsToDelete).build())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.delete(delete -> delete.objects(objectsToDelete))

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

int firstSlash = key.indexOf(DELIMITER);
int secondSlash = key.indexOf(DELIMITER, firstSlash + 1);
return key.substring(secondSlash + 1);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both methods parseUuidFromKey and parseFilenameFromKey are finding the indices of firstSlash and secondSlash, which are the same calculations in both. We can calculate these indices only once and reuse them

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if it's worth it. If we wanted to do this, we should create a pojo with all the parsed info and then someone can read the differents parts..

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes too complicated. let's keep it simple

return getCaseMetaDataEntity(caseUuid).getOriginalFilename();
}

// key format is "gsi-cases/UUID/filename"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// key format is "gsi-cases/UUID/filename"
// key format is "gsi-cases/UUID/path/to/file"

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


private List<S3Object> getCaseFileSummaries(UUID caseUuid) {
List<S3Object> files = getCasesSummaries(uuidToKeyPrefix(caseUuid));
if (files.size() > 1) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to remove

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

return files;
}

private List<CaseInfos> infosFromDownloadCaseFileSummaries(List<S3Object> objectSummaries) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename remove "Download" we should use it only for fileaccess to better understand the code ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

method removed

Signed-off-by: Etienne Homer <[email protected]>
Signed-off-by: Etienne Homer <[email protected]>
Signed-off-by: Etienne Homer <[email protected]>
Signed-off-by: Etienne Homer <[email protected]>
Signed-off-by: Etienne Homer <[email protected]>
Signed-off-by: Etienne Homer <[email protected]>
Copy link

sonarqubecloud bot commented Nov 7, 2024

@etiennehomer etiennehomer merged commit ed4c357 into main Nov 12, 2024
4 checks passed
@etiennehomer etiennehomer deleted the case_s3 branch November 12, 2024 09:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.