A download server for linux packages for AdoptOpenJDK #1002

Closed
aahlenst opened this issue Nov 19, 2019 · 11 comments

@aahlenst
Contributor

We've been using an Artifactory instance sponsored by JFrog for roughly half a year to host Linux packages of AdoptOpenJDK. There's work underway to host our flavour of JMC there, too. I'd like to reconsider whether we're on the right track here:

  • What are our needs?
  • Do we favour a managed solution like Artifactory?
    • Is Artifactory the right solution, after all?
  • Do we favour hosting it ourselves?

There are various reasons I'd like to reconsider our choice of Artifactory:

  • We're dependent on the external provider, in this case JFrog.
  • We don't have full control over the URL (it's https://adoptopenjdk.jfrog.io/). As a consequence, migrating to a different service (if we ever have to do it) is going to be a pain and take a long time.
  • The automatic generation of Debian package indices is broken (upstream issue), rendering our automation inoperative. According to JFrog, the ETA for a fix is within the next two quarters.
  • The support for Eclipse update sites (JMC) isn't great.
  • We might want to host things that don't fit into Artifactory at all, like packages for Alpine Linux.

My objective is to collect a list of requirements first so that we can check the various options out before coming up with an actionable proposal.

@aahlenst
Contributor Author

From the perspective of the Linux packages:

  • We've amassed approx. 20 GB of Debian/Ubuntu packages and 180 GB of RPMs since May 2019 (6 months). I expect that we need around 500 GB of storage per year for releases alone, because in the first months we didn't build all OpenJ9 variants. We have 5 TB of nightly builds.
  • I'm leaning towards using upstream tooling to host the package feeds, i.e., reprepro and createrepo. This means I'd need a server that has a complete local copy of all the packages so that the package indexes and metadata can be generated and signed with our GPG key (increased security required).
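
The storage estimate can be checked with a quick calculation. This is only a sketch: it extrapolates the 6-month figures quoted above linearly, and the function name is made up for illustration.

```python
# Figures quoted above: packages amassed between May and November 2019.
DEB_GB = 20
RPM_GB = 180
MONTHS_OBSERVED = 6


def annual_release_storage_gb(deb_gb: float, rpm_gb: float, months: float) -> float:
    """Linearly extrapolate the observed package storage to 12 months."""
    return (deb_gb + rpm_gb) * 12 / months


linear_estimate = annual_release_storage_gb(DEB_GB, RPM_GB, MONTHS_OBSERVED)
print(f"Linear extrapolation: {linear_estimate:.0f} GB per year")
```

The linear figure comes out below the 500 GB estimate, which is consistent with the note that not all OpenJ9 variants were built during the first months.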

@thegreystone

thegreystone commented Nov 22, 2019

Sounds excellent. Having the JMC update sites on a download server would be great. Something along the lines of:

https://<baseurl>/jmc/updatesites/latest/ide/
https://<baseurl>/jmc/updatesites/latest/rcp/
https://<baseurl>/jmc/updatesites/7.0.0/ide/
https://<baseurl>/jmc/updatesites/7.0.0/rcp/
https://<baseurl>/jmc/updatesites/7.1.0/ide/
https://<baseurl>/jmc/updatesites/7.1.0/rcp/
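
A minimal sketch of how such URLs could be generated from a version and a flavour. The helper names and the example host are hypothetical; only the path layout is taken from the list above.

```python
from typing import Iterable, List

FLAVOURS = ("ide", "rcp")


def update_site_url(base_url: str, version: str, flavour: str) -> str:
    """Build one update site URL following the layout proposed above."""
    if flavour not in FLAVOURS:
        raise ValueError(f"unknown flavour: {flavour!r}")
    return f"{base_url.rstrip('/')}/jmc/updatesites/{version}/{flavour}/"


def all_update_sites(base_url: str, versions: Iterable[str]) -> List[str]:
    """List every version/flavour combination."""
    return [update_site_url(base_url, v, f) for v in versions for f in FLAVOURS]


# "example.invalid" stands in for the not-yet-decided <baseurl>.
for url in all_update_sites("https://example.invalid", ["latest", "7.0.0", "7.1.0"]):
    print(url)
```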

@thegreystone

Note that once we have published the update sites, we should re-spin and re-publish the application builds, including 7.0.0 and 7.1.0, with correct overrides for the URLs. Then it will finally be possible to install the optional plug-ins. :)

@aahlenst
Contributor Author

Rough idea using AWS terminology:

[Figure: architecture sketch]

The Jenkins nodes push build artifacts to an upload server using restricted SFTP. The upload server keeps a local copy of all files. It is responsible for generating package indices and signing files. This cannot be done on Jenkins nodes because reprepro needs all packages on a local disk to generate the package indices. The upload server syncs its local copy of all files with an S3 bucket. From there, our users download the files via CloudFront.

The AdoptOpenJDK GPG key needs to be stored on the upload server. Therefore, it has to be locked down.
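
The steps on the upload server could look roughly like the sketch below. It only builds and prints the command lines and does not run anything. The directory layout, the codename, the package file name, and the bucket name are all assumptions made for illustration, and it assumes reprepro is configured to sign via `SignWith` in its `conf/distributions`.

```python
import shlex
from typing import List


def plan_publish(local_root: str, deb_file: str, codename: str, bucket: str) -> List[List[str]]:
    """Return the commands the upload server would run, in order."""
    deb_repo = f"{local_root}/deb"  # hypothetical layout
    rpm_repo = f"{local_root}/rpm"  # hypothetical layout
    return [
        # Add the uploaded .deb and regenerate the (signed) Debian indices.
        ["reprepro", "-b", deb_repo, "includedeb", codename, deb_file],
        # Refresh the RPM repository metadata from the local package tree.
        ["createrepo", "--update", rpm_repo],
        # Sign the RPM repository metadata with the locally stored GPG key.
        ["gpg", "--detach-sign", "--armor", f"{rpm_repo}/repodata/repomd.xml"],
        # Mirror the complete local copy to the bucket behind the CDN.
        ["aws", "s3", "sync", local_root, f"s3://{bucket}/"],
    ]


plan = plan_publish(
    local_root="/srv/packages",
    deb_file="/srv/incoming/example-package.deb",
    codename="buster",
    bucket="example-bucket",
)
for command in plan:
    print(" ".join(shlex.quote(arg) for arg in command))
```

Indexing and signing come before the sync, so the bucket only ever receives a tree whose metadata matches its packages.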

Questions:

  • Does that look okay to you?
  • Although I used AWS terminology, what would be our preferred provider?
  • Does SFTP work for everyone?

As soon as we have a proposal everybody is happy with, I'll do a test setup so that we can verify that it actually works as expected.

@thegreystone

thegreystone commented Dec 2, 2019

Sounds reasonable to me, but I'm not directly involved in this part. Patrick (@reinhapa), what do you think?

@reinhapa

reinhapa commented Jan 9, 2020

I have no specific opinion about this, but I will need some help getting the update sites working later in the process...

@thegreystone

@aahlenst - when do you think the test setup will be available?

@aahlenst
Contributor Author

I cannot give any estimates. It certainly won't happen before mid-February unless someone steps up to help. Happy to talk anyone through it.

@aahlenst
Contributor Author

Requirements we have:

  • Support for deb, rpm, apk (Alpine), Eclipse P2
  • 500 GB of releases per year
  • 2 TB of nightly builds per year
  • 20 TB of bandwidth per month via CDN
  • Custom domain via SSL (e.g., packages.adoptopenjdk.net)

The storage and bandwidth requirements are estimates. It's very hard to get that info out of Artifactory.

I did some further research on options:

  • Self-hosting with OSS (createrepo and friends) is rather expensive because of storage and bandwidth requirements. We need a full local view of the entire package trees to generate and sign indexes, so we need some TBs of block storage. Just to fulfill the requirements for one year, we'd need to spend around $3,200 for the machine at AWS (t3a.medium with 3 TB of EBS, no backup). Bandwidth via CloudFront is another $20,000 per year (seems a bit high?). We might be able to reduce the spending on bandwidth with Cloudflare, Fastly, ... Backup and people to operate that would come on top. I looked at Hetzner, too, where we host Jenkins. They would be significantly cheaper, but they do not really have good backup facilities for that amount of data. Another drawback of self-hosting: no API.
  • Self-hosting with Sonatype Nexus Pro: Would be cheaper because Nexus Pro can use S3 (and only S3, no Azure Blob Storage or something like that) and S3 is significantly cheaper than EBS. Nexus OSS does not support S3. Drawbacks: We'd still have to operate everything ourselves, and Sonatype did not seem that interested in working with us.
  • Using GitHub, Azure Artifacts, AWS CodeArtifact, GCP Artifact Registry: They do not support the formats we need.
  • PackageCloud.io: They do not support P2 or APK, but still interesting.
  • CloudSmith: They do not support P2, but are willing to add it.
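
To sanity-check the self-hosting numbers, the arithmetic can be written out. This is a rough sketch using only the estimates from this comment; it assumes 1 TB = 1000 GB, and the function name is made up for illustration.

```python
# Estimates taken from the requirements and the self-hosting option above.
MACHINE_USD_PER_YEAR = 3200      # t3a.medium with 3 TB of EBS, no backup
CDN_USD_PER_YEAR = 20000         # bandwidth via CloudFront
CDN_TB_PER_MONTH = 20
RELEASES_TB_PER_YEAR = 0.5
NIGHTLIES_TB_PER_YEAR = 2


def implied_cdn_rate_usd_per_gb(usd_per_year: float, tb_per_month: float,
                                gb_per_tb: int = 1000) -> float:
    """Per-GB price implied by a yearly bill and a monthly traffic volume."""
    return usd_per_year / (tb_per_month * 12 * gb_per_tb)


storage_tb = RELEASES_TB_PER_YEAR + NIGHTLIES_TB_PER_YEAR
rate = implied_cdn_rate_usd_per_gb(CDN_USD_PER_YEAR, CDN_TB_PER_MONTH)
total_usd = MACHINE_USD_PER_YEAR + CDN_USD_PER_YEAR

print(f"Storage needed in year one: {storage_tb} TB")
print(f"Implied CDN price: {rate:.4f} USD per GB")
print(f"Machine plus bandwidth: {total_usd} USD per year")
```

The first-year storage need fits within the 3 TB of EBS mentioned above, and the implied per-GB rate is the number to compare against a CDN provider's published price list.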

@lskillen

Hey folks / @aahlenst; Lee from @cloudsmith-io here. We're happy to help if we can. We're firm believers in data portability and reducing vendor lock-in, which is why we offer things like the custom domains support. If P2 is critical, we can see about prioritising it for you.

@karianna karianna changed the title from "A download server for AdoptOpenJDK" to "A download server for linux packages for AdoptOpenJDK" Jun 16, 2020
@karianna karianna added this to the June 2020 milestone Jun 16, 2020
@Haroon-Khel Haroon-Khel modified the milestones: June 2020, July 2020 Jul 3, 2020
@Haroon-Khel Haroon-Khel modified the milestones: July 2020, August 2020 Aug 18, 2020
@Haroon-Khel Haroon-Khel modified the milestones: August 2020, October 2020 Oct 5, 2020
@Haroon-Khel Haroon-Khel modified the milestones: February 2021, March 2021 Mar 2, 2021
@Haroon-Khel Haroon-Khel modified the milestones: March 2021, April 2021 Apr 6, 2021
@Haroon-Khel Haroon-Khel modified the milestones: April 2021, May 2021 May 18, 2021
@sxa
Member

sxa commented Apr 4, 2024

I'm going to close this since we're currently remaining with JFrog with Fastly fronting it, although that is giving intermittent HTTP/403 responses for some users as per adoptium/adoptium-support#923

@sxa sxa closed this as completed Apr 4, 2024