From 17e10eb56cd01a5d2bb538df6d93286a55c78258 Mon Sep 17 00:00:00 2001 From: "E. G. Patrick Bos" Date: Thu, 2 Nov 2023 21:20:43 +0100 Subject: [PATCH 01/10] remove prospector from python chapter --- best_practices/language_guides/python.md | 17 ++++++++--------- 1 file changed, 8 insertions(+), 9 deletions(-) diff --git a/best_practices/language_guides/python.md b/best_practices/language_guides/python.md index e48b683a..d49c64dc 100644 --- a/best_practices/language_guides/python.md +++ b/best_practices/language_guides/python.md @@ -107,7 +107,8 @@ pip install -e . ``` The `-e` flag will install your package in editable mode, i.e. it will create a symlink to your package in the installation location instead of copying the package. This is convenient when developing, because any changes you make to the source code will immediately be available for use in the installed version. -Set up continuous integration to test your installation setup. Use `pyroma` (can be run as part of `prospector`) as a linter for your installation configuration. +Set up continuous integration to test your installation setup. +You can use `pyroma` as a linter for your installation configuration. ### Packaging and distributing your package For packaging your code, you can either use `pip` or `conda`. Neither of them is [better than the other](https://jakevdp.github.io/blog/2016/08/25/conda-myths-and-misconceptions/) -- they are different; use the one which is more suitable for your project. `pip` may be more suitable for distributing pure python packages, and it provides some support for binary dependencies using [`wheels`](http://pythonwheels.com). `conda` may be more suitable when you have external dependencies which cannot be packaged in a wheel. @@ -137,16 +138,14 @@ For packaging your code, you can either use `pip` or `conda`. 
Neither of them is The style guide for Python code is [PEP8](http://www.python.org/dev/peps/pep-0008/) and for docstrings it is [PEP257](https://www.python.org/dev/peps/pep-0257/). We highly recommend following these conventions, as they are widely agreed upon to improve readability. To make following them significantly easier, we recommend using a linter. -Many linters exists for Python, [`prospector`](https://github.com/landscapeio/prospector) is a tool for running a suite of linters, it supports, among others: +Many linters exist for Python. +We have long promoted use of [`prospector`](https://github.com/landscapeio/prospector), a tool for running a suite of linters, including, among others [pycodestyle](https://github.com/PyCQA/pycodestyle), [pydocstyle](https://github.com/PyCQA/pydocstyle), [pyflakes](https://pypi.python.org/pypi/pyflakes), [pylint](https://www.pylint.org/), [mccabe](https://github.com/PyCQA/mccabe) and [pyroma](https://github.com/regebro/pyroma). -* [pycodestyle](https://github.com/PyCQA/pycodestyle) -* [pydocstyle](https://github.com/PyCQA/pydocstyle) -* [pyflakes](https://pypi.python.org/pypi/pyflakes) -* [pylint](https://www.pylint.org/) -* [mccabe](https://github.com/PyCQA/mccabe) -* [pyroma](https://github.com/regebro/pyroma) +However, we have [since 2023 been switching](https://github.com/NLeSC/python-template/issues/336) to [Ruff](https://github.com/astral-sh/ruff). +It is much faster and aims to support most of the functionality that `prospector` does (see the website for the complete function parity overview). +It can be configured in a `pyproject.toml` section. -Make sure to set strictness to `veryhigh` for best results. `prospector` has its own configuration file, like the [.prospector.yml default in the Python template](https://github.com/NLeSC/python-template/blob/main/%7B%7Bcookiecutter.directory_name%7D%7D/.prospector.yml), but also supports configuration files for any of the linters that it runs. 
Most of the above tools can be integrated in text editors and IDEs for convenience. +Most of the above tools can be integrated in text editors and IDEs for convenience. Autoformatting tools like [`yapf`](https://github.com/google/yapf) and [`black`](https://black.readthedocs.io/en/stable/index.html) can automatically format code for optimal readability. `yapf` is configurable to suit your (team's) preferences, whereas `black` enforces the style chosen by the `black` authors. The [`isort`](http://timothycrosley.github.io/isort/) package automatically formats and groups all imports in a standard, readable way. From ce6405dc16a2921bc77e32ce4d8820ac8a5cdd5e Mon Sep 17 00:00:00 2001 From: "E. G. Patrick Bos" Date: Fri, 3 Nov 2023 08:59:51 +0100 Subject: [PATCH 02/10] python: updates to conda details --- best_practices/language_guides/python.md | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/best_practices/language_guides/python.md b/best_practices/language_guides/python.md index d49c64dc..27768948 100644 --- a/best_practices/language_guides/python.md +++ b/best_practices/language_guides/python.md @@ -66,7 +66,8 @@ Installation of packages that are not using `wheel`, but have a lot of non-Pytho The disadvantage of Conda is that the package needs to have a Conda build recipe. Many Conda build recipes already exist, but they are less common than the `setuptools` configuration that generally all Python packages have. -There are two main distributions of Conda: [Anaconda](https://docs.anaconda.com/anaconda/install/) and [Miniconda](https://docs.conda.io/projects/continuumio-conda/en/latest/user-guide/install/index.html). Anaconda is large and contains a lot of common packages, like numpy and matplotlib, whereas Miniconda is very lightweight and only contains Python. If you need more, the `conda` command acts as a package manager for Python packages. 
+There are two main "official" distributions of Conda: [Anaconda](https://docs.anaconda.com/anaconda/install/) and [Miniconda](https://docs.conda.io/projects/miniconda/en/latest/) (and variants of the latter like miniforge, explained below). +Anaconda is large and contains a lot of common packages, like numpy and matplotlib, whereas Miniconda is very lightweight and only contains Python. If you need more, the `conda` command acts as a package manager for Python packages. If installation with the `conda` command is too slow for your purposes, it is recommended that you use [`mamba`](https://github.com/mamba-org/mamba) instead. For environments where you do not have admin rights (e.g. DAS-6) either Anaconda or Miniconda is highly recommended since the installation is very straightforward. @@ -76,8 +77,8 @@ A possible downside of Anaconda is the fact that this is offered by a commercial Do note that since 2020, [Anaconda has started to ask money from large institutes](https://www.anaconda.com/blog/anaconda-commercial-edition-faq) for downloading packages from their [main channel (called the `default` channel)](https://docs.conda.io/projects/conda/en/latest/user-guide/concepts/channels.html#what-is-a-conda-channel) through `conda`. This does not apply to universities and most research institutes, but could apply to some government institutes that also perform research and definitely applies to large for-profit companies. Be aware of this when choosing the distribution channel for your package. -An alternative installer that avoids this problem altogether because it only installs packages from `conda-forge` by default is [miniforge](https://github.com/conda-forge/miniforge). -There is also a mambaforge version that uses the faster `mamba` by default. +An alternative, community-driven Conda distribution that avoids this problem altogether because it only installs packages from `conda-forge` by default is [miniforge](https://github.com/conda-forge/miniforge). 
+Miniforge includes both the faster `mamba` as well as the traditional `conda`. ## Building and packaging code From edec21f9b02554a4d8d2d7bb345adbbf7bbcf700 Mon Sep 17 00:00:00 2001 From: "E. G. Patrick Bos" Date: Fri, 3 Nov 2023 09:01:04 +0100 Subject: [PATCH 03/10] python: fix psycopg link --- best_practices/language_guides/python.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/best_practices/language_guides/python.md b/best_practices/language_guides/python.md index 27768948..fa12b6ac 100644 --- a/best_practices/language_guides/python.md +++ b/best_practices/language_guides/python.md @@ -288,7 +288,7 @@ It is good practice to restart the kernel and run the notebook from start to fin ### Database Interface -* [psycopg](http://initd.org/psycopg/) is an [PostgreSQL](http://www.postgresql.org) adapter +* [psycopg](https://www.psycopg.org/) is a [PostgreSQL](http://www.postgresql.org) adapter * [cx_Oracle](http://cx-oracle.sourceforge.net) enables access to [Oracle](https://www.oracle.com/database/index.html) databases * [monetdb.sql](https://www.monetdb.org/Documentation/SQLreference/Programming/Python) is [monetdb](https://www.monetdb.org) Python client From cfe939acc9052b1bf66002253f013c0dc2363f59 Mon Sep 17 00:00:00 2001 From: maltelueken Date: Fri, 3 Nov 2023 16:16:53 +0100 Subject: [PATCH 04/10] Replace r-studio with posit in R language guide links --- best_practices/language_guides/r.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/best_practices/language_guides/r.md b/best_practices/language_guides/r.md index a8ab8c7b..299498a0 100644 --- a/best_practices/language_guides/r.md +++ b/best_practices/language_guides/r.md @@ -11,7 +11,7 @@ R is particularly popular in the social, health, and biological sciences where i One of the strengths of R is the large number of available open source statistical packages, often developed by domain experts. 
For example, R-package [Seewave](http://rug.mnhn.fr/seewave/) is specialised in sound analyses. Packages are typically released on CRAN [The Comprehensive R Archive Network](http://cran.r-project.org). A few remarks for readers familiar with Python: -* Compared with Python, R does not need a notebook to program interactively. In [RStudio](https://www.rstudio.com/), an IDE that is installed separately, the user can run sections of the code by selecting them and pressing Ctrl+Enter. Consequently the user can quickly transition from working with scripts to working interactively using the Ctrl+Enter. +* Compared with Python, R does not need a notebook to program interactively. In [RStudio](https://posit.co/products/open-source/rstudio/), an IDE that is installed separately, the user can run sections of the code by selecting them and pressing Ctrl+Enter. Consequently the user can quickly transition from working with scripts to working interactively using the Ctrl+Enter. * Numbering in R starts with 1 and not with 0. ### Recommended sources of information @@ -30,7 +30,7 @@ To install R check detailed description at [CRAN website](http://cran.r-project. #### IDE R programs can be written in any text editor. R code can be run from the command line or interactively within R environment, that can be started with `R` command in the shell. To quit R environment type `q()`. -[RStudio](http://www.rstudio.com/products/RStudio/) is a free powerful integrated development environment (IDE) for R. It features editor with code completion, command line environment, file manager, package manager and history lookup among others. You will have to install RStudio in addition to installing R. Please note that updating RStudio does not automatically update R and the other way around. +[RStudio](https://posit.co/products/open-source/rstudio/) is a free powerful integrated development environment (IDE) for R. 
It features editor with code completion, command line environment, file manager, package manager and history lookup among others. You will have to install RStudio in addition to installing R. Please note that updating RStudio does not automatically update R and the other way around. Within RStudio you can work on ad-hoc code or create a project. Compared with Python an R project is a bit like a virtual environment as it preserves the workspace and installed packages for that project. Creating a project is needed to build an R package. A project is created via the menu at the top of the screen. @@ -71,7 +71,7 @@ However, externally contributed plotting packages may offer easier syntax or con In summary, it is good to familiarize yourself with both the basic plotting functions as well as the contributed graphics packages. In theory, the basic plot functions can do everything that ggplot2 can do, it is mostly a matter of how much you like either syntax and how much freedom you need to tailor the visualisation to your use case. ## Building interactive web applications with shiny -Thanks to [shiny.app](http://shiny.rstudio.com) it is possible to make interactive web application in R without the need to write javascript or html. +Thanks to [shiny.app](https://shiny.posit.co/) it is possible to make interactive web application in R without the need to write javascript or html. ## Building reports with knitr [knitr](https://yihui.name/knitr/) is an R package designed to build dynamic reports in R. It's possible to generate on the fly new pdf or html documents with results of computations embedded inside. 
@@ -116,8 +116,8 @@ R function documentation offers plenty of space to document the functionality, i # Available templates * https://rapporter.github.io/rapport/ -* http://shiny.rstudio.com/articles/templates.html -* http://rmarkdown.rstudio.com/developer_document_templates.html +* https://shiny.posit.co/r/articles/build/templates/ +* https://bookdown.org/yihui/rmarkdown/document-templates.html # Testing, Checking, Debugging and Profiling @@ -130,7 +130,7 @@ See also [checking](http://r-pkgs.had.co.nz/check.html) and [testing](http://r-p Continuous integration should be done with an [online service](../testing.md#Online-services-for-continuous-integration), see [Chapter](../testing.md) on testing. ### Debugging and Profiling -Debugging is possible in RStudio, see [link](https://support.rstudio.com/hc/en-us/articles/205612627-Debugging-with-RStudio). For profiling tips see [link](http://adv-r.had.co.nz/Profiling.html) +Debugging is possible in RStudio, see [link](https://support.posit.co/hc/en-us/articles/205612627-Debugging-with-RStudio). For profiling tips see [link](http://adv-r.had.co.nz/Profiling.html) # Not in this tutorial yet: From 9643c9515c58034cc51375cb8b793751e3722009 Mon Sep 17 00:00:00 2001 From: maltelueken Date: Fri, 3 Nov 2023 16:17:35 +0100 Subject: [PATCH 05/10] Update maintainer of R language guide --- best_practices/language_guides/r.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/best_practices/language_guides/r.md b/best_practices/language_guides/r.md index 299498a0..ebe6232e 100644 --- a/best_practices/language_guides/r.md +++ b/best_practices/language_guides/r.md @@ -1,6 +1,6 @@ # What is R? -*Page maintainer: unmaintained* +*Page maintainer: Malte Lüken* [@maltelueken](https://github.com/maltelueken) R is a functional programming language and software environment for statistical computing and graphics: https://www.r-project.org/. 
From 2a34fe2290a53ddac48dd0a5f20ab5e1ea527110 Mon Sep 17 00:00:00 2001 From: Bouwe Andela Date: Mon, 6 Nov 2023 14:18:35 +0100 Subject: [PATCH 06/10] Fix broken links in bash chapter --- best_practices/language_guides/bash.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/best_practices/language_guides/bash.md b/best_practices/language_guides/bash.md index 67f30610..3b904569 100644 --- a/best_practices/language_guides/bash.md +++ b/best_practices/language_guides/bash.md @@ -86,7 +86,7 @@ Here we list the most commonly used Bash tools that are built to manipulate The nice thing about these tools is that you can combine them by streaming the output of one tool to become the input of the next tool. Have a look at the -[tutorial](https://swcarpentry.github.io/shell-novice/04-pipefilter/index.html) +[tutorial](https://swcarpentry.github.io/shell-novice/04-pipefilter.html) for an introduction. This can be done by creating [pipelines](https://www.gnu.org/savannah-checkouts/gnu/bash/manual/bash.html#Pipelines) @@ -164,7 +164,7 @@ This will bring you all the advantages of a fully-fledged programming language recommended programming language at the Netherlands eScience Center. If you do not mind having an extra dependency and would like to use the features and commands available in the shell from Python, the -[sh](https://amoffat.github.io/sh/) library is a nice option. +[sh](https://sh.readthedocs.io) library is a nice option. 
Disclaimer: if you are an experienced Bash developer, there might be situations where using a Bash script solves your problem faster or in a more portable way From 6964773e62928e9dd7ee8d3b6016616aa4ca920d Mon Sep 17 00:00:00 2001 From: Bouwe Andela Date: Mon, 6 Nov 2023 14:30:14 +0100 Subject: [PATCH 07/10] Remove nlesc specific chapter --- _sidebar.md | 3 - nlesc_specific/e-infrastructure/das5.md | 175 ------------------ .../e-infrastructure/e-infrastructure.md | 100 ---------- 3 files changed, 278 deletions(-) delete mode 100644 nlesc_specific/e-infrastructure/das5.md delete mode 100644 nlesc_specific/e-infrastructure/e-infrastructure.md diff --git a/_sidebar.md b/_sidebar.md index 66eec357..08d48c94 100644 --- a/_sidebar.md +++ b/_sidebar.md @@ -19,6 +19,3 @@ * [C and C++](/best_practices/language_guides/ccpp.md) * [Fortran](/best_practices/language_guides/fortran.md) * [Contributing to this Guide](/CONTRIBUTING.md) -* NLeSC specific - * [Access to (Dutch) e-Infrastructure](/nlesc_specific/e-infrastructure/e-infrastructure.md) - * [DAS-5](/nlesc_specific/e-infrastructure/das5.md) diff --git a/nlesc_specific/e-infrastructure/das5.md b/nlesc_specific/e-infrastructure/das5.md deleted file mode 100644 index 50c91202..00000000 --- a/nlesc_specific/e-infrastructure/das5.md +++ /dev/null @@ -1,175 +0,0 @@ -# DAS-5 - -*This text is about DAS-5. However, most of the advice should also be -applicable to its successor [DAS-6](https://www.cs.vu.nl/das/home.shtml).* - -This text gives a couple of practical hints to get you started using the -DAS-5 quickly. It is intended for people with little to no experience -using compute clusters. - -First of all, and this is the most important point in this text: read -the usage policy and make sure you understand every word of it: -https://www.cs.vu.nl/das5/usage.shtml - -The DAS-5 consists of multiple cluster sites, the largest one is located -at the VU, which you can reach using by the hostname -`fs0.das5.cs.vu.nl`. 
The firewall requires that your IP is whitelisted, -which means you will be able to access the DAS from the eScience Center -office, but not directly when you are somewhere else. To use the DAS -from anywhere you can use eduVPN. - -When you login in it means you are logged into the headnode, this node -should not be used for any computational work. The cluster uses a -reservation system, if you want to use any node that is not the head -node, you must use the reservation system to gain access to a compute -node. The reserveration system on DAS-5 is called Slurm, you can see all -running jobs on the cluster using `squeue` and cancel any of your -running jobs with `scancel `. - -The files in your home directory `/home/username/` will be backed up -automatically, if you accidently delete an important file you can email -the maintainer and kindly request him to put back an old version of the -file. If you have to store large data sets put them under -`/var/scratch/username/`, the scratch space is not backed up. - -You can use the command `module` to gain access to a large set of -preinstalled software. Use `module list` to see what modules are -currently loaded and `module avail` to see all available modules. You -can load or unload modules with the 'module load' and `module unload`. -You may want to add some of the modules you frequently use to your -bashrc. Note that all that these modules do is add or remove stuff from -your `PATH` and `LD_LIBRARY_PATH` environment variables. If you need -software that is not preinstalled, you can install it into your home -directory. For installing Python packages, you have to use Anaconda or -`pip install --user`. - -If you want an interactive login on any of the compute nodes through the -reservation system, you could use: `srun -N 1 --pty bash`. 
The srun -command is used to run a program on a compute node, -N specifies the -number of nodes, --pty specifies this is an interactive job, bash is the -name of the program being launched. This reservation is only cancelled -when you logout of the interactive session, please observe the rules -regarding reservation lengths. - -To access the nodes you've reserved quickly it's a good idea to generate -an ssh key and add your own public key to your 'authorized_keys' file. -This will allow you to ssh to nodes you have reserved without password -prompts. - -To reserve a node with a particular GPU you have to specify to srun what -kind of node you want. I have the following alias in my bashrc, because -I use it all the time: -`alias gpurun="srun -N 1 -C TitanX --gres=gpu:1"` -If you prefix any command with `gpurun` the command will be executed on -one of the compute nodes with an Nvidia GTX Titan X GPU in them. You can -also type `gpurun --pty bash` to get an interactive login on such a -node. - - -## Running Jupyter Notebooks on DAS-5 nodes - -If you have a Jupyter notebook that needs a powerfull GPU it can be -useful to run the notebook not on your laptop, but on a GPU-equipped -DAS-5 node instead. - -### How to set it up - -It can be a bit tricky to get this to work. In short, what you need is -to install jupyter, for example using the following command: -``` -pip install jupyter -``` -And it's recommended that you add this alias to your .bashrc file: -``` -`alias notebook-server="srun -N 1 -C TitanX --gres=gpu:1 bash -c 'hostname; XDG_RUNTIME_DIR= jupyter notebook --ip=* --no-browser'"` -``` -Now you can start the server with the command ``notebook-server``. - -You just need to connect to your jupyter notebook server after this. -The easiest way to do this is to start firefox on the headnode (fs0) and connect to the node that was printed by the ``notebook-server`` command. 
Depending on what node you got from the scheduler you can go to the address ``http://node0XX:8888/``. For more details and different ways of connecting to the server see the longer explanation below. - -### More detailed explanation - -First of all, you need to install jupyter into your DAS-5 account. I -recommend using miniconda, but any Python environment works. If you are -using the native Python 2 installation on the DAS don't forget to add -the `--user` option to the following pip command. You can install -Jupyter using: `pip install jupyter`. - -Now comes the tricky bit, we are going to connect to the headnode of the DAS5 and reserve -a node through the reservation system and start a notebook server on that node. -You can use the following alias for that, I suggest storing it in your .bashrc file: -`alias notebook-server="srun -N 1 -C TitanX --gres=gpu:1 bash -c 'hostname; XDG_RUNTIME_DIR= jupyter notebook --ip=* --no-browser'"` - -Let's first explain what this alias actually does for you. -The first part of the command is similar to the `gpurun` alias explained above. If you -do not require a GPU in your node, please remove the `-C TitanX --gres=gpu:1` part. -Now let's take a look at what the rest of this command is doing. - -On the node that we reserve through `srun` we execute the following bash command: -`hostname; XDG_RUNTIME_DIR= jupyter notebook --ip=* --no-browser'` -This is actually two commands, the first only prints the name of the host, -which is important because you'll need to connect to that node later. The -second command starts with unsetting the environment variable XDG_RUNTIME_DIR. - -On the DAS, we normally do not have access to the default directory -pointed to by the environment variable XDG_RUNTIME_DIR. The Jupyter notebook -server wants to use this directory for storing temporary files, if -XDG_RUNTIME_DIR is not set it will just use /tmp or something for -which it does have permission to access. 
- -The notebook server that we start would normally only listen to -connections from localhost, which is the node on which the notebook -server is running. That is why we pass the `--ip=*` option, to configure the -notebook server to listen to incoming connections from the headnode. Be warned -that this is actually highly insecure and should only be used within trusted -environments with strict access control, like the DAS-5 system. - -We also need the ``--no-browser`` no browser option, because we do not want to run the browser on the DAS node. - -You can type ``notebook-server`` now to actually reserve a node and start the jupyter notebook server. - -Now that we have a running Jupyter notebook server, there are 2 different approaches to connect to our notebook server: - 1. run your browser locally and setup a socks proxy to forward your http traffic to the headnode of the DAS - 2. starting a browser on the headnode of the DAS and use X-forwarding to access that browser - -Approach 1 is very much recommended, but if you can't get it to work, you can defer to option 2. - -### Using a SOCKS proxy - -In this step, we will create an ssh tunnel that we will use to forward -our http traffic, effectively turning the headnode of the DAS into your -private proxy server. Make sure you that you can connect to the headnode -of the DAS, for example using a VPN. -If you are using another ssh host in between, it makes sense to configure your SSH client with a proxyjump or use proxycommand. -The following command is rather handy, you might want to -save it in your bashrc: -`` alias dasproxy="ssh -fNq -D 8080 @fs0.das5.cs.vu.nl" `` -Do not forget to replace `` with your own username on the DAS. - -Option `-f` stands for background mode, which means the process started with this command will keep running in the background, `-N` means there is no command to be executed on the remote host, and `-q` stands for quiet mode, meaning that most output will be surpressed. 
- -After executing the above ssh command, start your local browser and -configure your browser to use the proxyserver. Manually configure the proxy -as a "Socks v5" proxy with the address 'localhost' and port 8080. -Do not forget to also tick the box to also proxy DNS traffic over this proxy. - -After changing these settings navigate to the page `http://node0XX:8888/`, -where `node0XX` should be replaced with the hostname of the node you -are running the notebook server on. Now in the browser open your -notebook and get started using notebooks on a remote server! - -### Using X-Forwarding - -Using another terminal, create an `ssh -X` connection to the headnode of -the DAS-5. Note that, it is very important that you use `ssh -X` for the -whole chain of connections to node, including the one used to connect to -the headnode of the DAS and any number of intermediate servers you are -using. This also requires that you have an X server on your local -machine, if you are running Windows I recommend installing VirtualBox -with a Linux GuestOS. - -On the headnode type `firefox http://node0XX:8888/`, where `node0XX` -should be replaced with the hostname of the node you are running the -notebook server on. Now in the browser open your notebook and get -started using notebooks on a remote server! diff --git a/nlesc_specific/e-infrastructure/e-infrastructure.md b/nlesc_specific/e-infrastructure/e-infrastructure.md deleted file mode 100644 index 0e247425..00000000 --- a/nlesc_specific/e-infrastructure/e-infrastructure.md +++ /dev/null @@ -1,100 +0,0 @@ -# Access to (Dutch) e-Infrastructure - -To successfully run a project and to make sure the project is sustainable after it has ended, it is important to choose the e-Infrastructure carefully. Examples of e-Infrastructure used by eScience Center projects are High Performance Computing machines (Supercomputers, Grids, Clusters), Clouds, data storage infrastructure, and web application servers. 
- -In general PI's will already have access to (usually local) e-Infrastructure, and are encouraged to think about what e-Infrastructure they need in the project proposal. Still, many also request our help in finding suitable e-Infrastructure during the project. - -Which infrastructure is best very much depends on the project, so we will not attempt to describe the optimal infrastructure here. Instead, we describe what is most commonly used, and how to gain access to this e-Infrastructure. - -Lack of e-Infrastructure should never be a reason for not being able to to a project (well). If you ever find yourself without proper e-Infrastructure, come talk to the Efficient Computing team. We should be able to get you going quickly. - -## SURF - -SURF is the most obvious supplier of e-Infrastructure for Netherlands eScience Center projects. For all e-Infrastructure needs we usually first look to SURF. This does not mean SURF is our exclusive e-Infrastructure provider. We use whatever infrastructure is best for the project, provided by SURF or otherwise. - -### Getting access to SURF infrastructure - -In general access to SURFsara resources is free of charge for scientists in The Netherlands. For most infrastructure gaining access is a matter of filling in a simple web-form, which you can do yourself on behalf of the scientists in the project. Exceptions are the Cartesius and Lisa, for which a more involved process is required. For these machines, only the PI of a project can submit (or anyone else with an NWO Iris account). - -The Netherlands eScience Center also has access to the infrastructure provided by SURFnet. Access is normally done on a per-organization basis, so may vary from one project partner to the next. - -### Available systems at SURF - -Here we list some of the most likely to be used resources at SURF. 
See the [overview of SURF services and products](https://www.surf.nl/en/research-it), and [detailed information on the SURFsara infrastructure](https://userinfo.surfsara.nl/systems). - -SURFsara: - -- **Snellius**: Snellius is the Dutch national supercomputer. Snellius is a general purpose capability system and is designed to be a well balanced system. If you need one or more of: many cores, large symmetric multi-processing nodes, high memory, a fast interconnect, a lot of work space on disk, or a fast I/O subsystem then Snellius is the machine of choice. -- **Lisa**: National Cluster. Similar machines as the Cartesius (the previous Dutch national supercomputer), without the interconnect (about 8000 cores in total). Storage also more limited. Lisa is typically designed to run lots of small (1 to 16 core) applications at the same time. -- **Grid**: Same machines again, now with a Grid Middleware. Not recommended for use in eScience Center projects. -- **HPC Cloud**: On demand computing infrastructure. Nice if you need longer running services, or have a lot of special software requirements. -- **Hadoop**: Big Data analytics framework. -- **BeeHub**: Lots of storage with a webDAV interface. -- **Elvis**: Remote rendering cluster. Creates a remote desktop session to a Linux machine with powerful Nvidia Graphics installed. -- **Data Archive**: Secure, long-term storage of research data on tape. Access to archive included with Cartesius and Lisa project accounts. - -SURFnet: - -- **SURFconext**: Federated identity management. Allows scientists to login to services using their home organization account. Best known example is SURFspot. Can be added to custom services as well. -- **SURFdrive**: Dropbox-like service hosted by SURF. - -Ask questions to: helpdesk@surfsara.nl. - -## DAS-5 - -The Netherlands eScience Center participates in the [DAS-5 (Distributed ASCI Supercomputer)](http://www.cs.vu.nl/das5), a system for experimental computer science. 
Though not intended for production work, it is great for developing software on, especially HPC, parallel and/or distributed software. - -DAS-5 consists of 6 clusters at 5 different locations in the Netherlands, with a total of about 200 machines, over 3000 cores, and about 800Tb total storage. These clusters are connected with dedicated lightpaths. Internally, each cluster has a fast interconnect. DAS-5 also contains an ever increasing amount of accelerators (mostly GPU's). - -DAS-5 is explicitly meant as an experimentation platform: any job should be able to run instantly, long queue times should be avoided. Running long jobs is therefore not allowed during working hours. During nights and weekends these rules do not apply. See [the usage policy](http://www.cs.vu.nl/das5/usage.shtml). - -Any eScience Center employee can get a DAS-5 account, usually available within a few hours. - -## Security and convenience when committing code to GitHub from a cluster - -When accessing a cluster, it is generally [safer to use a pair of keys than to login using a username and password](https://superuser.com/questions/303358/why-is-ssh-key-authentication-better-than-password-authentication). There is a [guide on how to setup those keys](https://www.cyberciti.biz/faq/how-to-set-up-ssh-keys-on-linux-unix/). Make sure you encrypt your private key and that it is not automatically decrypted when you login to your local machine. -Make a separate pair of keys to access your GitHub account following [GitHub's instructions](https://help.github.com/articles/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent/). It involves [uploading your public key to your GitHub account](https://help.github.com/articles/adding-a-new-ssh-key-to-your-github-account/) and [testing your connection](https://help.github.com/articles/testing-your-ssh-connection/). - -When committing code from a cluster to GitHub, one needs to store an encrypted private key in the $HOME/.ssh directory on the cluster. 
This is inconvenient, because it requires submitting a password to unlock the private key. This password has to be resubmitted when SSHing to a local node from the head node. To bypass this inconvenience [SSH agent forwarding](https://developer.github.com/guides/using-ssh-agent-forwarding/) is recommended. It is very simple. On your local machine, make a $HOME/.ssh/config file to contain the following: -``` -Host example.com - ForwardAgent yes -``` -Replace example.com by the head node of your cluster, i.e. the node you use to login to. -Next, -``` -chmod 600 $HOME/.ssh/config. -``` -Done! - -The only remaining problem is that SSH keys cannot be used when git cloning was done using https instead of SSH, but that can be [corrected](http://stackoverflow.com/questions/6565357/git-push-requires-username-and-password): -``` -git remote set-url origin git@github.com:username/repo.git -``` -## Commercial Clouds - -If needed a project can use commercial cloud resources, normally only if all SURF resources do not meet the requirements. As long as the costs are within limits these can come out of the eScience Center general project budget, for larger amounts the PI will need to provide funding. - -We do not have an official standard commercial cloud provider, but have the most experience with Amazon AWS. - -## Procolix - -If a more long term infrastructure is needed which cannot be provided by SURF, the default company we use for managed hosting is [Procolix](https://www.procolix.com/). Procolix hosts our eduroam/surfconext authentication machines. - -In principle the eScience Center will not pay for infrastructure needed by projects. In these cases the PIs will have to pay the bill. - -## GitHub Pages - -If a project is in need of a website or webapp using only static content (javascript, html, etc), it is also possible to host this at github. See https://pages.github.com/ - -## Local Resources - -A scientist may have access to locally available infrastructure. 
- -## Other - -This list does not include any resources from Nikhef, CWI, RUG, Target, etc, as these are (as far as we know) not open to all scientists. - -## Avoid if possible - -Try to avoid using self-managed resources (the proverbial machine under the Postdoc's desk). This may seem an easy solution at first, but will most probably require significant effort over the course of the project. It also increases the changes of the infrastructure disappearing at some random moment after the project has finished. From ed2bf29fefc102a130b211f4a169c9d4ebb65315 Mon Sep 17 00:00:00 2001 From: Bouwe Andela Date: Mon, 6 Nov 2023 14:36:25 +0100 Subject: [PATCH 08/10] Remove communication chapter --- _sidebar.md | 1 - best_practices/communication.md | 47 --------------------------------- 2 files changed, 48 deletions(-) delete mode 100644 best_practices/communication.md diff --git a/_sidebar.md b/_sidebar.md index 66eec357..15f32729 100644 --- a/_sidebar.md +++ b/_sidebar.md @@ -4,7 +4,6 @@ * [Version Control](/best_practices/version_control.md) * [Code Quality](/best_practices/code_quality.md) * [Code Review](/best_practices/code_review.md) - * [Communication](/best_practices/communication.md) * [Testing](/best_practices/testing.md) * [Releases](/best_practices/releases.md) * [Documentation](/best_practices/documentation.md) diff --git a/best_practices/communication.md b/best_practices/communication.md deleted file mode 100644 index d6b1b94f..00000000 --- a/best_practices/communication.md +++ /dev/null @@ -1,47 +0,0 @@ -# Communication - -Communication to the outside world is important for visibility of Netherlands eScience Center projects and for building -the user base. - -Communication to other developers is a way to build community and contributors. It also increases -our visibility in development world. 
- -## Home page - -The software should have a homepage with all the necessary introduction information, links to documentation, source code (github) and latest release download (e.g. [github.io pages](https://pages.github.com/)) - -The page should be created at the latest when the software is ready to be seen by the outside world. It is the place where people will learn about software, so it is important to describe its goals and functionality. -It should be targeted towards non-programming users (unless software is meant for programers i.e library) but should have -pointers for developers to more advanced resources (README.md) - -## Discussion list - -Github issues, mailing list, not private email, for all project related -discussions from the beginning of the project - -There should be no private discussions about the project. Therefore once discussions are started -(in the email), either move them to github issues or if they don’t fit into issues format any more, -create the mailing list. - -## Demo docker image in dockerhub (with Dockerfile) - -When applies, usually for services. - -If software is the service, Docker image should be created at a very early stage. This will allow for easier testing and platform -independent use. - -## An online demo - -Only for web applications - -Online demo should be available since first stable release. -When the website is the user interface for researchers, make sure there is a development version -running somewhere so that they can *play around with it* and give usability feedback. - -## Screencast - -For most software it should be possible to create a screencast. This is very useful for people to get a quick impression of what exactly you are doing without diving into the code itself. In case your software does not have a graphical user interface, even a screencast of a terminal session can be quite informative. Try to add audio, or at least subtitles, so people know what is going on in the video. 
- -At the Netherlands eScience Center we gather screencasts in our [Youtube Channel](https://www.youtube.com/user/NLeScienceCenter). - - From d4b029815e04c8cfc73ccf39807150e17c3aa2b4 Mon Sep 17 00:00:00 2001 From: "E. G. Patrick Bos" Date: Mon, 6 Nov 2023 16:41:20 +0100 Subject: [PATCH 09/10] address review comments --- best_practices/language_guides/python.md | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/best_practices/language_guides/python.md b/best_practices/language_guides/python.md index fa12b6ac..1a3a091a 100644 --- a/best_practices/language_guides/python.md +++ b/best_practices/language_guides/python.md @@ -140,16 +140,17 @@ For packaging your code, you can either use `pip` or `conda`. Neither of them is The style guide for Python code is [PEP8](http://www.python.org/dev/peps/pep-0008/) and for docstrings it is [PEP257](https://www.python.org/dev/peps/pep-0257/). We highly recommend following these conventions, as they are widely agreed upon to improve readability. To make following them significantly easier, we recommend using a linter. Many linters exists for Python. -We have long promoted use of [`prospector`](https://github.com/landscapeio/prospector), a tool for running a suite of linters, including, among others [pycodestyle](https://github.com/PyCQA/pycodestyle), [pydocstyle](https://github.com/PyCQA/pydocstyle), [pyflakes](https://pypi.python.org/pypi/pyflakes), [pylint](https://www.pylint.org/), [mccabe](https://github.com/PyCQA/mccabe) and [pyroma](https://github.com/regebro/pyroma). - -However, we have [since 2023 been switching](https://github.com/NLeSC/python-template/issues/336) to [Ruff](https://github.com/astral-sh/ruff). -It is much faster and aims to support most of the functionality that `prospector` does (see the website for the complete function parity overview). -It can be configured in a `pyproject.toml` section. +The most popular one is currently [Ruff](https://github.com/astral-sh/ruff). 
+Although it is new (see the website for the complete function parity comparison with alternatives), it works well and has an active community. +An alternative is [`prospector`](https://github.com/landscapeio/prospector), a tool for running a suite of linters, including, among others [pycodestyle](https://github.com/PyCQA/pycodestyle), [pydocstyle](https://github.com/PyCQA/pydocstyle), [pyflakes](https://pypi.python.org/pypi/pyflakes), [pylint](https://www.pylint.org/), [mccabe](https://github.com/PyCQA/mccabe) and [pyroma](https://github.com/regebro/pyroma). +Some of these tools have seen decreasing community support recently, but it is still a good alternative, having been a defining community default for years. Most of the above tools can be integrated in text editors and IDEs for convenience. Autoformatting tools like [`yapf`](https://github.com/google/yapf) and [`black`](https://black.readthedocs.io/en/stable/index.html) can automatically format code for optimal readability. `yapf` is configurable to suit your (team's) preferences, whereas `black` enforces the style chosen by the `black` authors. The [`isort`](http://timothycrosley.github.io/isort/) package automatically formats and groups all imports in a standard, readable way. +Ruff can do autoformatting as well and can function as a drop-in replacement of `black` and `isort`. + ## Testing From 0d1d86c4db01723c440436616f518eb1dfc66869 Mon Sep 17 00:00:00 2001 From: Patrick Bos Date: Wed, 8 Nov 2023 15:37:58 +0100 Subject: [PATCH 10/10] fix lychee.toml (#306) --- lychee.toml | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/lychee.toml b/lychee.toml index 7e8d1dc8..b7eaa462 100644 --- a/lychee.toml +++ b/lychee.toml @@ -1,6 +1,6 @@ # Lychee configuration file # See https://github.com/lycheeverse/lychee/blob/master/lychee.example.toml exclude_all_private = true -exclude_mail = true -progress = false -verbose = true +include_mail = false +no_progress = true +verbose = "info"