2i2c Access Policy Hackdays: every-2-months Mondays 12-1pm PT #7

Open
jules32 opened this issue Jan 13, 2024 · 31 comments

Comments

@jules32
Contributor

jules32 commented Jan 13, 2024

Notes doc

Hello! This is a short mini hackday for us to meet about 2i2c access policy for the NASA Openscapes Hub and beyond.

January 29 from 3-5 ET; initial time set with @erinmr @yuvipanda @betolink @jmunroe; and open to others; I'm happy to send a calendar invite if you want to attend.

Background
Erin and Yuvi have been working on this throughout the fall; this repository's README describes a lot of this work, which is also linked in the Earthdata Cloud Cookbook. We had many exciting pair-wise conversations about ideas forward at AGU. This hackday is a chance to come together to share and develop a plan to move forward in 2024.

Some starting topics to shape the agenda:

  • Finishing documentation and supporting 2i2c user onboarding through GitHub Teams
  • "Fledging": where do researchers go when they leave the hub after a workshop?
@jules32 jules32 moved this to In Progress This Week in MainPlanning Jan 13, 2024
@eeholmes

Some comments that might be helpful re the "fledging" step. I don't use the Openscapes 2i2c hub very often except when helping with teaching for Openscapes. But I use the Docker images for the Openscapes hub all the time. If I want to teach the Earthdata Cloud Book content, then I want to use the Openscapes image. It saves me so much work to do that and I can use those images to fire up a development environment in many different ways.

So as you think about "fledging" people, I would think about the importance of the (Docker) images (corn and py-rocket). Having those, which you need anyway, is a huge resource for those who are "fledging". Those images (and the files to make them) are provider agnostic. For example, coiled.io is certainly not required when one wants to fledge from Openscapes. I think Carl's work on how to use one image (or files to create the image) as the base for entry to spinning up compute platforms in different ways is a great example.

But maintaining images with the idea that others will use them does require a bit more effort. It doesn't mean that you create the image as a "product" like the rocker project does. But it does mean a little more effort at clean image code (files) and some documentation. Kind of like code that one writes to "get the job done" without expectation that anyone else will look at it versus code that you plan for others to look at and re-use.

Something quite unique is the py-rocket image actually. That gets both a Python geospatial and R geospatial environment. Perhaps a kind of big image, but it is quite powerful to be able to jump back and forth between those environments or combine them in Quarto.

@yuvipanda
Contributor

/cc @cboettig as well, particularly in reference to the images that @eeholmes is talking about. I've been working with him a little on improving the Python support in upstream rocker (rocker-org/rocker-versioned2#718 and friends). Would be useful to see what more we can do to improve that :)

@jules32 jules32 moved this from In Progress This Week to Backlog / Upcoming in MainPlanning Jan 15, 2024
@jules32 jules32 moved this from Backlog / Upcoming to Upcoming This Year in MainPlanning Jan 15, 2024
@yuvipanda
Contributor

A few areas I'd like to explore

Home directory usage policy

There isn't really a stated policy on how the home directory is to be used, and due to technical limitations it's really hard to actually limit the amount of space people use (although it is possible to monitor it via Grafana). I'd love for an explicitly written-down home directory usage policy to exist, and to find ways to implement it as well.

https://github.com/berkeley-dsep-infra/datahub/blob/staging/docs/policy/storage-retention.rst is an example of one for the UC Berkeley JupyterHubs I used to maintain. It is specifically fitted to their use case, so I don't recommend adopting it as is, but it can serve as inspiration.

Work on allowing people with unaccepted GitHub invites to join

This requires switching from a GitHub OAuth app to a GitHub App (see the difference). That is a fair amount of engineering work and will need to be prioritized appropriately.

"fledging" - where do people go after?

This to me is the most exciting and important concept to develop further strategically. I don't have any active thoughts right now, but will think about it some more.

@yuvipanda
Contributor

Another idea!

Automate the 'application' process some more?

I would love to understand how people 'apply' to get access to the hub. This is something the Pangeo project has struggled with as well (https://discourse.pangeo.io/t/unable-to-sign-up-for-pangeo/3974), so it would be useful to at least describe this process better (I know a Google Sheet is involved somewhere) and see if we can build a community-based solution that works for multiple groups.

@eeholmes

I am managing one 2i2c hub and also multiple homegrown hubs.

Managing storage

This has been really hard and I am still pretty much unable to set up shared storage and persistent user storage the way I want. The whole PVC and PV setup, plus configuring the storage on the provider side, is complicated and mysterious. I managed to get help from Azure for setting up a shared drive but that was 2 hours of 1:1 help. I haven't been able to get any help since.

Managing storage costs

I quickly discovered that storage costs were going to take up the lion's share of costs. Right now on Azure my 100 GB of default user persistent storage is $8 a month per user -- even if they have almost nothing there. Meh. I am doing something wrong. But I think users don't actually need persistent cloud storage in the hub. I feel like we could brainstorm ideas that would achieve the effect of persistent storage without paying a cloud provider for it.

Ideas

  • Minimal persistent storage for most users. Rather users clone repos. The cloning happens automatically when you spin up the hub.
  • User persistent storage on a timer. Like 48 hrs default except for hackweeks. So norm is to not keep random junk in the user volume. Push to GitHub.
  • Use shared drive better so that users don't all have copies of the same thing.
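One way the "persistent storage on a timer" idea could be checked is sketched below: it lists user home directories in which nothing has been modified within a cutoff window. This is only an illustration under assumptions. The layout (one subdirectory per user under a single root) and the names `latest_mtime` and `stale_home_dirs` are hypothetical, and it only reports candidates; what to do with them (notify, archive, delete) is a policy question the thread leaves open.

```python
import os
import time
from pathlib import Path
from typing import List, Optional


def latest_mtime(directory: Path) -> float:
    """Return the newest modification time of the directory or anything inside it."""
    newest = directory.stat().st_mtime
    for root, dirs, files in os.walk(directory):
        for name in dirs + files:
            try:
                newest = max(newest, os.lstat(os.path.join(root, name)).st_mtime)
            except OSError:
                # The file may have vanished or be unreadable; skip it.
                continue
    return newest


def stale_home_dirs(
    home_root: Path,
    max_age_hours: float = 48.0,  # the "48 hrs default" from the idea above
    now: Optional[float] = None,
) -> List[str]:
    """List user directories under home_root with no changes in the last max_age_hours."""
    now = time.time() if now is None else now
    cutoff = now - max_age_hours * 3600
    return sorted(
        entry.name
        for entry in home_root.iterdir()
        if entry.is_dir() and latest_mtime(entry) < cutoff
    )
```

A hackweek exception could be handled by passing a larger `max_age_hours` for that hub rather than changing the default.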

Fledging

I think here we need to experiment with lots of different ideas. Here are some of the ideas that I am experimenting with. I think hubs are great for building community or for intensive shared development (hackweeks, workshops, classes, teams working intensively on something). Devcontainers allow individuals to use environments developed by others and reduce the set-up barrier/wall. Note many people I work with do not have admin access on their computers. So barrier = brick wall. Installing things is just not going to happen except for a minority.

  • Lots more teaching of workshops to observe how people use what they have learned for next steps.
  • Helping people set up their own hubs. I plan to return to India in September and hope to set up a few hubs on local servers at various institutes of friends.
  • Experimenting more with Carl's devcontainers.

@yuvipanda
Contributor

I quickly discovered that storage costs were going to take up the lion's share of costs.

@eeholmes I assume you were using the default PVC provisioner on zero to jupyterhub, which gives each user a persistent storage device regardless of whether they use it or not. This is super expensive and unsustainable. On AWS, we use EFS instead, and this is much cheaper - you only pay for what you use. On Azure, we use Azure Files, which is much, much better, as there's one shared pool for everyone rather than one disk per user. jupyterhub/zero-to-jupyterhub-k8s#421 is the long-running upstream issue about providing better documentation for this setup.
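To make the difference concrete, here is a rough arithmetic sketch comparing one provisioned disk per user with a shared pool billed on actual usage. The only figure taken from this thread is $8 a month for 100 GB, i.e. $0.08 per GB-month. The user count, the per-user usage, and the assumption that the shared pool costs the same per GB are all hypothetical placeholders, so treat the output as an illustration of the billing model and not as real EFS or Azure Files pricing.

```python
PRICE_PER_GB_MONTH = 8 / 100  # from the thread: $8 per month for 100 GB


def per_user_disk_cost(n_users, provisioned_gb_per_user, price_per_gb_month):
    """Monthly cost when every user gets a fixed-size disk, used or not."""
    return n_users * provisioned_gb_per_user * price_per_gb_month


def shared_pool_cost(used_gb_by_user, price_per_gb_month):
    """Monthly cost when one shared pool is billed on what is actually stored."""
    return sum(used_gb_by_user) * price_per_gb_month


# Hypothetical workload: 50 users who each keep about 2 GB in their home directory.
n_users = 50
used_gb = [2] * n_users

provisioned = per_user_disk_cost(n_users, 100, PRICE_PER_GB_MONTH)
pooled = shared_pool_cost(used_gb, PRICE_PER_GB_MONTH)  # same rate is an assumption

print(f"one 100 GB disk per user: ${provisioned:.2f}/month")
print(f"shared pool, pay for use: ${pooled:.2f}/month")
```

With provisioned disks the bill scales with the number of accounts; with a shared pool it scales with the data people actually keep.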

@eeholmes

@yuvipanda Yeah I am using the default provisioner and yes I discovered that I don't want to use that. Fortunately NOAA is paying me to experiment and learn things so I didn't have to pay for my mistake. Actually it's only $400 or so but I clearly need to figure out a better way. I will look at the link to learn about the better set-up. Unfortunately it's been hard for me to get help from Azure support on this, but I can solve that. I need to have someone above me to tell Azure to pay attention to my requests.

@jules32
Contributor Author

jules32 commented Jan 23, 2024

Thanks all for these ideas. How does this sound for a starting agenda?

Purpose: Huddle about current status of NASA Openscapes 2i2c policies and plan next steps based on specific community needs and what has worked in other hubs
Outcomes: Have shared goals going forward, some next steps planned
Process: 2 hour call that is 1 hr discussion towards a plan, 45 mins hacktime, 15 mins wrapup

Draft Agenda

  • Quick intros (5 mins)
  • Onboarding to 2i2c (20 mins) - Erin & Yuvi
    • Current process - Erin screenshares
      • Access Policy. Apply via Google Form > admin adds to GitHub Team (3 tiers) > participant accepts the invite during workshop
    • Next steps vision - Yuvi
      • i.e. unaccepted GitHub invites, improve application process
    • Discussion
  • Maintaining users in 2i2c (10 mins) -
    • Current: Managing storage and storage costs
    • Vision: Tech & policy questions: what do other groups do? Home directory usage policy?
  • "Fledging" from 2i2c (20 mins) -
    • Big Question: Where do researchers go when they leave the hub after a workshop?
    • Current: we don't remove people
    • Goal: researchers can continue to iterate their workflows in the Cloud and eventually add their own credit card?
    • How to automate removing from Hub & how long do people stay?
    • Maintaining docker images
  • Hacktime in breakout rooms (45 mins)
  • Shareouts & next steps (15 mins)
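For the "admin adds to GitHub Team" step in the current process, a minimal sketch of what automating it might look like is below. It builds (but does not send) a request to the GitHub REST API's "add or update team membership for a user" endpoint. The organization name is taken from the repository URL mentioned elsewhere in this thread; the team slug `workshop-participants`, the username, and the function name are hypothetical, and the token would need permission to manage the organization's teams.

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"


def build_team_invite_request(org, team_slug, username, token):
    """Build a PUT request that adds (or invites) a user to a GitHub team."""
    url = f"{GITHUB_API}/orgs/{org}/teams/{team_slug}/memberships/{username}"
    body = json.dumps({"role": "member"}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical values for illustration only; nothing is sent here.
request = build_team_invite_request(
    org="NASA-Openscapes",
    team_slug="workshop-participants",  # assumed team name
    username="octocat",
    token="REPLACE_WITH_A_REAL_TOKEN",
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it. For someone who is not yet an organization member, the membership stays pending until they accept the invite, which is the "unaccepted GitHub invites" situation discussed above.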

@jules32 jules32 moved this from This Quarter Upcoming to This Week In Progress in MainPlanning Jan 24, 2024
@jules32
Contributor Author

jules32 commented Jan 31, 2024

A great hackday, thank you all! Here are the notes (published view) and the next steps:

  • Meet again for 1 hour in 6 weeks - Julie
  • start using s3 buckets - Luis
  • Bri 2 tickets - done
  • CRON job ticket - Luis
  • GitHub Classrooms - someone to explore?
  • Home directory policies - if/else (Yuvi can advise on tech)
  • Create nasa-openscapes-workshops organization - Julie can create, then follow up on how 2i2c uses this; is there documentation?

@stefaniebutland stefaniebutland moved this from This Week In Progress to This Quarter Upcoming in MainPlanning Feb 7, 2024
@stefaniebutland
Member

Next hackday is March 11

@stefaniebutland stefaniebutland changed the title Hackday: 2i2c Access Policy for NASA Openscapes Hub - Jan 29 Hackday: 2i2c Access Policy for NASA Openscapes Hub Feb 20, 2024
@yuvipanda
Contributor

Is there an agenda for today?

@jules32
Contributor Author

jules32 commented Mar 11, 2024

Yes! I've just started it; Google Doc linked from our calendar invite. Please add other topics!

@jules32
Contributor Author

jules32 commented Mar 11, 2024

Some of the conversation from today's Hackday:

Onboarding to 2i2c

@BriannaLind

@yuvipanda @jules32 is this meeting happening March 22 sometime?

@jules32
Contributor Author

jules32 commented Apr 17, 2024

Yes! April 22, 12-1pm. You should have an invite already. The agenda can be updates and follow ups from what folks have been working on.

@BriannaLind

@jules32 okay, I'll update the board with that info: #7

@BriannaLind

Next tag up: April 22, 2024 12-1 PT

@yuvipanda
Contributor

I've invited @batpad, the new JupyterHub team lead for NASA VEDA / GHG center to this as well.

@jules32 jules32 moved this from This Quarter Upcoming to This Week In Progress in MainPlanning Apr 22, 2024
@jules32
Contributor Author

jules32 commented Apr 22, 2024

Hi all, here's today's light agenda with Zoom info.

I propose we start off with “pitch your update/topic”, briefly < 3 mins so that everyone with an update/question has a chance to share out loud. Then we’ll decide as a group where to discuss/dig in further.

Please feel free to write notes in the Agenda ahead of time!

@jules32
Contributor Author

jules32 commented May 6, 2024

Today's call with JupyterHub leads

Tasha Snow (CryoCloud), Alexy Shiklamonkov (NASA), Ramon Ramirez-linan (Open Science Studio at SMCE), Tess Jaffe (NASA FORNACS initiative, Astro data center that is DAAC equivalent), Wei Ji Leong (DevelopmentSeed, VEDA), Yuvi Panda (Jupyter and 2i2c), Julie Lowndes (NASA Openscapes)

Talking points:

Openscapes, NASA Openscapes project -

Our JupyterHub: purpose: build a community across NASA data center support to have a common set of tutorials and teaching approach for new learners. Teaching workshops early and often over 3+ years; and iterating/improving support (tutorials, teaching, and tech).

  • onboarding new learners, meet where they are. earthaccess, Docker base images (Python, R, MATLAB, BYO image)
  • fledging - where do they go. tech, cost, "portable" base images
  • maintenance - 2i2c Access Policies, GitHub Teams, storage, cost - (Grafana, monthly automated reports). workshop-planning. Onboarding new teachers.

Paste in chat:
Earthdata Cloud Cookbook https://nasa-openscapes.github.io/earthdata-cloud-cookbook/; 2i2c Access Policies (stabilizing this spring, then will be in the Cookbook) https://github.com/NASA-Openscapes/2i2cAccessPolicies

@BriannaLind

What is the next date for this meeting? @ebolch @amfriesz I think you may want to attend

@eeholmes

Were there notes taken? I missed as I had turned off my alarm for the 3-day holiday!

@BriannaLind

Were there notes taken? I missed as I had turned off my alarm for the 3-day holiday!

notes are here

@jules32
Contributor Author

jules32 commented May 29, 2024

Our next 6-weekly call is Monday, June 3, 3pm ET. It should already be on Bri and Aaron's calendars but happy to readd and share the zoom link

@BriannaLind

Our next 6-weekly call is Monday, June 3, 3pm ET. It should already be on Bri and Aaron's calendars but happy to readd and share the zoom link

Can we please add Erik Bolch ([email protected])? I don't think I will be able to make it because I am serving on a NASA panel that day.

@jules32
Contributor Author

jules32 commented May 30, 2024

Done!

@jules32
Contributor Author

jules32 commented Jun 3, 2024

Today's topic will be around #11;

Yuvi (zooming with Julie) has temporarily shut down Hub: Set cluster size to 0. Will reduce standard cost.

When we receive credits: email [email protected] and Slack tag Yuvi.

We will investigate: why the increase in Cloud compute? Following May 30 workshop? What workflows/policies do we need from here?


@jules32 jules32 changed the title Hackday: 2i2c Access Policy for NASA Openscapes Hub Hackdays every-2-months: 2i2c Access Policy for NASA Openscapes Hub Jun 25, 2024
@jules32 jules32 changed the title Hackdays every-2-months: 2i2c Access Policy for NASA Openscapes Hub 2i2c Access Policy Hackdays: every-2-months for NASA Openscapes Hub Jun 25, 2024
@jules32 jules32 changed the title 2i2c Access Policy Hackdays: every-2-months for NASA Openscapes Hub 2i2c Access Policy Hackdays: every-2-months Mondays 12-1pm PT Jun 25, 2024
@jules32
Contributor Author

jules32 commented Jul 15, 2024

Hello! Today might be a smaller group, but we can still plan to meet with any updates people have. Andy Teucher is on vacation but I can share about his progress on Monthly AWS usage reports and would love to get feedback.

@jules32
Contributor Author

jules32 commented Jul 15, 2024

Ending action items from today:

  • Figure out offboarding process for workshop-hub users using Shared Password. - Julie for first start on this, figure out what we want our policy to be with Mentors, then work with Andy & Yuvi to implement
  • Hackday for storage - Eli, Luis, Carl, Andy, Yuvi - figure out when, maybe fall?
  • Check out the binder and see if and how Openscapes can use it, and what more is needed. - Jim TODO; write documentation explainer/invitation and share

@yuvipanda
Contributor

/cc @colliand for last action item :)

@jules32
Contributor Author

jules32 commented Aug 26, 2024

A small group today; Andy Teucher, Julie Lowndes, Luis Lopez, Mahsa Jami.

We screenshared and discussed some recent updates (more notes in our doc):
