Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move all packages from Depends to Imports #22

Open
arilamstein opened this issue Apr 9, 2021 · 4 comments
Open

Move all packages from Depends to Imports #22

arilamstein opened this issue Apr 9, 2021 · 4 comments
Labels
check Necessary to pass R CMD CHECK

Comments

@arilamstein
Copy link
Contributor

When you run R CMD CHECK you get this NOTE:

N checking package dependencies (3.4s)
Depends: includes the non-default packages:
'codec', 'strtools', 'packtools', 'vartools', 'funtools',
'difftools', 'dplyr', 'lubridate', 'readr', 'readxl', 'rpivotTable',
'data.table', 'digest', 'downloader', 'glue', 'lazyeval', 'purrr',
'stringr', 'tibble', 'tidyselect'
Adding so many packages to the search path is excessive and importing
selectively is preferable.

Depends is somewhat dangerous, because forces the package to be loaded. This means that if someone is working locally, they might have their existing function / data names overridden when they load your package. Even worse, we don't control the packages we are loading. So after a call to update.packages(), people might get different results when they reload the package.

Fixing this, I believe, requires us to first fix #21. The reason is that if we move something from Depends to Imports now, and our namespace does not declare a specific importFrom call in the NAMESPACE file, R CMD CHECK will create a fatal error and crash. So we must first ensure that we are explicitly importing all necessary functions from each package. Then we can move the packages to Imports.

@arilamstein arilamstein added the check Necessary to pass R CMD CHECK label Apr 9, 2021
@arilamstein
Copy link
Contributor Author

As an update, my recent PR reduces this to two packages in Depends: dplyr and tidyselect.

I intentionally left tidyselect as-is because we currently have another PR that I think is related to it, and I wanted to reduce the chance of have any conflicts there.

I also intentionally left dplyr as-is because R CMD check triggers a lot of "no visible global function definition" NOTEs. I think (but am not sure) that some of those are dplyr functions.

@dholstius
Copy link
Member

@arilamstein
Copy link
Contributor Author

Let's see:

... can we close this?

My personal preference is to keep this open, and have it be the next thing I address in this repo. If you look at the DESCRIPTION file you'll see that there are still two packages listed in Depends: dplyr and tidyselect.

I know that you mentioned that you sometimes like to put packages in Depends because then they're available to everyone. But I think that a better way to achieve that goal is to add a call like library(tidyverse) to the bottom of /usr/lib/R/etc/Rprofile.site (or whatever the actual file is). That is, I think we should aim to have packages be as compact and tight as possible, and use the .Rprofile file to make broader changes to everyone's environment. Does that seem reasonable to you?

Also, when I run check I also see the following:

  extract_lookup_table: no visible global function definition for ‘last’
  extract_tree : graft: no visible global function definition for
    ‘data_frame’
  extract_tree: no visible global function definition for ‘data_frame’
  filter_categories: no visible global function definition for ‘first’
  filter_facilities: no visible global function definition for ‘first’
  pivot_table: no visible global function definition for ‘first’
  print_tbl: no visible global function definition for ‘humanize’
  print_tbl: no visible global function definition for ‘replace_which’
  print_tbl: no visible global function definition for ‘total_each’
  print_tbl: no visible global function definition for ‘one_of’
  read_xls: no visible global function definition for ‘regex’
  scale_at: no visible global function definition for ‘enquo’
  select_first: no visible global function definition for ‘all_of’
  select_first: no visible global function definition for ‘everything’
  select_last: no visible global function definition for ‘all_of’
  select_last: no visible global function definition for ‘everything’
  wildcard_join: no visible global function definition for ‘one_of’
  with_hierarchy: no visible global function definition for ‘num_range’
  with_hierarchy: no visible global function definition for ‘na.omit’
  write_csv: multiple local function definitions for ‘format_dbl’ with
    different formal arguments
  Undefined global functions or variables:
    . .name .wt all_of cat_id data_frame ems_unit enquo everything fac_id
    first format_nonbreaking humanize label last median na.omit node
    num_range one_of parent pol_abbr pol_id regex replace_which
    total_each year
  Consider adding
    importFrom("stats", "median", "na.omit")
  to your NAMESPACE file.

These are related to this issue. In that I think we can really only safely move packages from Depends to Imports when R knows exactly which package we want it to use for a particular function call. And here check is still saying "I don't know".

@dholstius
Copy link
Member

dholstius commented Apr 26, 2021

I know that you mentioned that you sometimes like to put packages in Depends because then they're available to everyone. But I think that a better way to achieve that goal is to add a call like library(tidyverse) to the bottom of /usr/lib/R/etc/Rprofile.site (or whatever the actual file is). That is, I think we should aim to have packages be as compact and tight as possible, and use the .Rprofile file to make broader changes to everyone's environment. Does that seem reasonable to you?

This is a good question. A few years ago, IIRC, the team wanted folks to have a clean-ish session by default, such that you'd always need to invoke library(inventory) at the top of scripts and such. Folks do sometimes do work that doesn't rely on the inventory package. And, we sometimes do development work on one of the subpackages, and I'm not sure that plays well with devtools::load_all(). (It's possible to start up R –no-environ on the command line. Is that possible with RStudio?)

So I'm not yet persuaded that adding library(inventory) to the site profile is the right thing to do. That said, I'm still open to it—and to other suggestions too!

Let's pose the question "What should the team's default R environment look like?" to the current core team. We've had some new hires, some training, and some retirements in the past few years. So there may be some new perspectives and/or preferences.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
check Necessary to pass R CMD CHECK
Projects
None yet
Development

No branches or pull requests

2 participants