Different dataset_id could link to the same dataset #59

Open
hongyuanjia opened this issue Sep 19, 2022 · 1 comment

@hongyuanjia
Member

hongyuanjia commented Sep 19, 2022

dataset_id cannot be used as the unique identifier of a dataset, because it is specific to a data node: the same dataset replicated on several nodes gets a different dataset_id on each, with the node name appended after the "|". This does not cause any problems for esgf_query(), but it does result in duplicated entries in the results of init_cmip6_index() when replica is set to TRUE. dataset_pid should be used as the unique dataset identifier when building the index.

q <- epwshiftr::esgf_query(
    activity = "ScenarioMIP",
    variable = "tas",
    frequency = "day",
    experiment = "ssp585",
    source = "AWI-CM-1-1-MR",
    variant = "r1i1p1f1",
    replica = TRUE,
    latest = TRUE,
    resolution = "100 km",
    limit = 10000L,
    data_node = NULL
)

q[, .(dataset_id, dataset_pid)]
#>                                                                                        dataset_id
#> 1:   CMIP6.ScenarioMIP.AWI.AWI-CM-1-1-MR.ssp585.r1i1p1f1.day.tas.gn.v20190529|esgf-data1.llnl.gov
#> 2: CMIP6.ScenarioMIP.AWI.AWI-CM-1-1-MR.ssp585.r1i1p1f1.day.tas.gn.v20190529|esgf-data3.diasjp.net
#> 3:       CMIP6.ScenarioMIP.AWI.AWI-CM-1-1-MR.ssp585.r1i1p1f1.day.tas.gn.v20190529|esgf.ceda.ac.uk
#> 4:       CMIP6.ScenarioMIP.AWI.AWI-CM-1-1-MR.ssp585.r1i1p1f1.day.tas.gn.v20190529|esgf.nci.org.au
#>                                          dataset_pid
#> 1: hdl:21.14100/a336f13f-a4d3-3b57-a45a-8f27f0ba01b8
#> 2: hdl:21.14100/a336f13f-a4d3-3b57-a45a-8f27f0ba01b8
#> 3: hdl:21.14100/a336f13f-a4d3-3b57-a45a-8f27f0ba01b8
#> 4: hdl:21.14100/a336f13f-a4d3-3b57-a45a-8f27f0ba01b8

unique(q[, -c("dataset_id", "data_node")])
#>    mip_era activity_drs institution_id     source_id experiment_id member_id
#> 1:   CMIP6  ScenarioMIP            AWI AWI-CM-1-1-MR        ssp585  r1i1p1f1
#>    table_id frequency grid_label  version nominal_resolution variable_id
#> 1:      day       day         gn 20190529             100 km         tas
#>              variable_long_name variable_units
#> 1: Near-Surface Air Temperature              K
#>                                          dataset_pid
#> 1: hdl:21.14100/a336f13f-a4d3-3b57-a45a-8f27f0ba01b8

Created on 2022-09-19 with reprex v2.0.2
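
One way the proposed fix could look, as a minimal sketch only: it uses a toy data.table that mirrors the dataset_id, data_node and dataset_pid columns shown above, and it is not the actual implementation of init_cmip6_index(). The names idx and nodes are hypothetical and introduced here for illustration.

```r
library(data.table)

# Toy stand-in for an esgf_query() result with replica = TRUE:
# one dataset hosted on two data nodes (values copied from the output above).
q <- data.table(
    dataset_id = c(
        "CMIP6.ScenarioMIP.AWI.AWI-CM-1-1-MR.ssp585.r1i1p1f1.day.tas.gn.v20190529|esgf-data1.llnl.gov",
        "CMIP6.ScenarioMIP.AWI.AWI-CM-1-1-MR.ssp585.r1i1p1f1.day.tas.gn.v20190529|esgf.ceda.ac.uk"
    ),
    data_node = c("esgf-data1.llnl.gov", "esgf.ceda.ac.uk"),
    dataset_pid = "hdl:21.14100/a336f13f-a4d3-3b57-a45a-8f27f0ba01b8"
)

# Treat dataset_pid as the dataset key: keep the first row per PID.
idx <- unique(q, by = "dataset_pid")

# Optionally keep track of every node that hosts each dataset.
nodes <- q[, .(data_nodes = list(data_node)), by = dataset_pid]
```

Here idx holds a single row for the dataset, while nodes still records both hosting nodes, so replica information is not lost when the duplicates are collapsed.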

@hongyuanjia hongyuanjia self-assigned this Sep 19, 2022
@hongyuanjia hongyuanjia added the bug Something isn't working label Sep 19, 2022
@hongyuanjia hongyuanjia added this to the v0.2.0 milestone Sep 19, 2022
@hongyuanjia
Member Author

Ref: Identifiers (Returned Metadata Fields)

@hongyuanjia hongyuanjia modified the milestones: v0.1.4, v0.2.0 Mar 12, 2024