Unable to retrieve JP2 files from webpage #5

Open
jennyferpinto opened this issue Nov 5, 2017 · 3 comments

Comments

@jennyferpinto

Based on the lat and long provided in the query parameters, we use BigQuery to retrieve the base_url and granule_id. We then use these to construct the URL where the JP2 files are located, for example: https://console.cloud.google.com/storage/browser/gcp-public-data-sentinel-2/tiles/32/D/PG/S2A_MSIL1C_20171014T073921_N0205_R063_T32DPG_20171014T073917.SAFE/GRANULE/L1C_T32DPG_A012072_20171014T073917/IMG_DATA

However, we are supposed to be returning the individual JP2 URLs.

Could you give us some information as to how this can be done? We thought we would maybe need to scrape the webpage for the URLs but the source page doesn't show any links.

@jf87
Owner

jf87 commented Nov 6, 2017

Hi Jennyfer,

I think there are at least two different ways to do that:

(1) You can parse the metadata xml file (e.g., in your example, this one). It contains a granule_list field that holds the image paths.

(2) Because the images are stored in a storage bucket, you can use the storage bucket API to list the individual files (objects).
There is a command-line utility for that, e.g., this returns the images in your example:

gsutil ls gs://gcp-public-data-sentinel-2/tiles/32/D/PG/S2A_MSIL1C_20171014T073921_N0205_R063_T32DPG_20171014T073917.SAFE/GRANULE/L1C_T32DPG_A012072_20171014T073917/IMG_DATA/

You could do the same programmatically from within a Go program.
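
As a rough sketch of option (2) using only the Go standard library: the Cloud Storage JSON API exposes an object listing at https://storage.googleapis.com/storage/v1/b/BUCKET/o, and objects in a public bucket can be fetched from https://storage.googleapis.com/BUCKET/OBJECT. The helper names (listURL, parseListing, publicURL) are made up for this illustration, and the sketch assumes the bucket allows anonymous listing, which the gsutil command above suggests but which is not verified here.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/url"
	"os"
	"strings"
)

// listURL builds a JSON API objects.list request for one page of results.
func listURL(bucket, prefix, pageToken string) string {
	q := url.Values{}
	q.Set("prefix", prefix)
	if pageToken != "" {
		q.Set("pageToken", pageToken)
	}
	return "https://storage.googleapis.com/storage/v1/b/" + url.PathEscape(bucket) + "/o?" + q.Encode()
}

// parseListing extracts the object names and the next page token (if any)
// from an objects.list JSON response body.
func parseListing(body []byte) (names []string, next string, err error) {
	var page struct {
		Items []struct {
			Name string `json:"name"`
		} `json:"items"`
		NextPageToken string `json:"nextPageToken"`
	}
	if err := json.Unmarshal(body, &page); err != nil {
		return nil, "", err
	}
	for _, item := range page.Items {
		names = append(names, item.Name)
	}
	return names, page.NextPageToken, nil
}

// publicURL returns the HTTPS URL of an object in a publicly readable bucket.
func publicURL(bucket, name string) string {
	u := url.URL{Scheme: "https", Host: "storage.googleapis.com", Path: "/" + bucket + "/" + name}
	return u.String()
}

func main() {
	const bucket = "gcp-public-data-sentinel-2"
	const prefix = "tiles/32/D/PG/S2A_MSIL1C_20171014T073921_N0205_R063_T32DPG_20171014T073917.SAFE/GRANULE/L1C_T32DPG_A012072_20171014T073917/IMG_DATA/"

	token := ""
	for {
		resp, err := http.Get(listURL(bucket, prefix, token))
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			return
		}
		body, err := io.ReadAll(resp.Body)
		resp.Body.Close()
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			return
		}
		if resp.StatusCode != http.StatusOK {
			fmt.Fprintln(os.Stderr, "list request failed:", resp.Status)
			return
		}
		names, next, err := parseListing(body)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			return
		}
		for _, name := range names {
			if strings.HasSuffix(name, ".jp2") {
				fmt.Println(publicURL(bucket, name)) // one URL per JP2 band image
			}
		}
		if next == "" {
			return // no more pages
		}
		token = next
	}
}
```

The listing is paginated, so the loop keeps requesting pages until the response no longer carries a nextPageToken.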

@jennyferpinto
Author

Thank you! That was very helpful.

I was able to access the JP2 links by running the command you sent:

gsutil ls gs://gcp-public-data-sentinel-2/tiles/32/D/PG/S2A_MSIL1C_20171014T073921_N0205_R063_T32DPG_20171014T073917.SAFE/GRANULE/L1C_T32DPG_A012072_20171014T073917/IMG_DATA/

However, I then tried to use the storage bucket API but it says this exact bucket above doesn't exist, which is weird. (the code is below)

I don't think it's an authentication issue because it's a public bucket and I shouldn't have to authenticate to retrieve the files.

I did find this "If you only wish to access public data, you can create an unauthenticated client with:"

client, err := storage.NewClient(ctx, option.WithoutAuthentication())

But the WithoutAuthentication() method doesn't exist in the option library.

package main

import (
	"context"
	"fmt"

	"cloud.google.com/go/storage"
	"google.golang.org/api/iterator"
)

func main() {

	var s = "gs://gcp-public-data-sentinel-2/tiles/32/D/PG/S2A_MSIL1C_20171014T073921_N0205_R063_T32DPG_20171014T073917.SAFE/GRANULE/L1C_T32DPG_A012072_20171014T073917/IMG_DATA"
	list(s)
}

func list(bucket string) {
	ctx := context.Background()

	client, err := storage.NewClient(ctx)
	if err != nil {
		fmt.Println(err)
	}

	it := client.Bucket(bucket).Objects(ctx, nil)
	for {
		attrs, err := it.Next()
		if err == iterator.Done {
			break
		}
		if err != nil {
			fmt.Println(err)
			return // attrs is nil on error, so stop before using it
		}
		fmt.Println(attrs.Name)
	}
}

@jf87
Owner

jf87 commented Nov 6, 2017

Hi Jennyfer,

The bucket name is just "gcp-public-data-sentinel-2"; the rest of your path is an object-name prefix, not part of the bucket name.
The bucket holds the entire dataset, so you need to pass a query to filter the listing (with a nil query you get every object):

func (b *BucketHandle) Objects(ctx context.Context, q *Query) *ObjectIterator

See here for the Query type.
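
A minimal sketch of that fix, assuming you start from the same gs:// string as in the code above: split it into the bucket name and an object-name prefix, then pass the prefix through the Prefix field of storage.Query. The helper name splitGCSPath is made up for this illustration; only the standard library is used so the splitting logic can be run on its own, and the storage client call is shown as a comment.

```go
package main

import (
	"fmt"
	"strings"
)

// splitGCSPath splits "gs://bucket/some/dir" into the bucket name and an
// object-name prefix ending in "/" (empty when the path names only a bucket).
func splitGCSPath(gcsPath string) (bucket, prefix string, err error) {
	const scheme = "gs://"
	if !strings.HasPrefix(gcsPath, scheme) {
		return "", "", fmt.Errorf("not a gs:// path: %q", gcsPath)
	}
	parts := strings.SplitN(strings.TrimPrefix(gcsPath, scheme), "/", 2)
	bucket = parts[0]
	if bucket == "" {
		return "", "", fmt.Errorf("missing bucket name in %q", gcsPath)
	}
	if len(parts) == 2 && parts[1] != "" {
		// Normalize to a single trailing slash so only objects inside the
		// directory match, not siblings that share the same name stem.
		prefix = strings.TrimSuffix(parts[1], "/") + "/"
	}
	return bucket, prefix, nil
}

func main() {
	s := "gs://gcp-public-data-sentinel-2/tiles/32/D/PG/S2A_MSIL1C_20171014T073921_N0205_R063_T32DPG_20171014T073917.SAFE/GRANULE/L1C_T32DPG_A012072_20171014T073917/IMG_DATA"
	bucket, prefix, err := splitGCSPath(s)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("bucket:", bucket)
	fmt.Println("prefix:", prefix)

	// With the cloud.google.com/go/storage client from the code above, the
	// listing call would then become:
	//   it := client.Bucket(bucket).Objects(ctx, &storage.Query{Prefix: prefix})
}
```

Each attrs.Name returned by the iterator is then the full object name inside the bucket, starting with the prefix.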
