Unable to retrieve JP2 files from webpage #5

jennyferpinto · 2017-11-05T14:17:38Z

Based on the lat and long provided in the query parameters, we are using BigQuery to retrieve the base_url and granule_id. We then use these to construct the URL where the JP2 files are located, such as: https://console.cloud.google.com/storage/browser/gcp-public-data-sentinel-2/tiles/32/D/PG/S2A_MSIL1C_20171014T073921_N0205_R063_T32DPG_20171014T073917.SAFE/GRANULE/L1C_T32DPG_A012072_20171014T073917/IMG_DATA

However, we are supposed to be returning the individual JP2 URLs.

Could you give us some information as to how this can be done? We thought we would maybe need to scrape the webpage for the URLs but the source page doesn't show any links.

jf87 · 2017-11-06T10:56:36Z

Hi Jennyfer,

I think there are at least two different ways to do that:

(1) You can parse the metadata XML file (e.g., in your example, this one). It contains a granule_list field that holds the image paths.

(2) Because the images are stored in a storage bucket, you can use the storage bucket API to list the individual files (objects).
There is a command-line utility for that; e.g., this returns the images in your example:

gsutil ls gs://gcp-public-data-sentinel-2/tiles/32/D/PG/S2A_MSIL1C_20171014T073921_N0205_R063_T32DPG_20171014T073917.SAFE/GRANULE/L1C_T32DPG_A012072_20171014T073917/IMG_DATA/

You could do the same programmatically from within a Go program.

jennyferpinto · 2017-11-06T14:26:09Z

Thank you! That was very helpful.

I was able to list the JP2 links by running the command you sent:

gsutil ls gs://gcp-public-data-sentinel-2/tiles/32/D/PG/S2A_MSIL1C_20171014T073921_N0205_R063_T32DPG_20171014T073917.SAFE/GRANULE/L1C_T32DPG_A012072_20171014T073917/IMG_DATA/

However, I then tried to use the storage bucket API but it says this exact bucket above doesn't exist, which is weird. (the code is below)

I don't think it's an authentication issue because it's a public bucket and I shouldn't have to authenticate to retrieve the files.

I did find this "If you only wish to access public data, you can create an unauthenticated client with:"

client, err := storage.NewClient(ctx, option.WithoutAuthentication())

But the WithoutAuthentication() method doesn't exist in the option library.

package main

import (
	"context"
	"fmt"

	"cloud.google.com/go/storage"
	"google.golang.org/api/iterator"
)

func main() {

	var s = "gs://gcp-public-data-sentinel-2/tiles/32/D/PG/S2A_MSIL1C_20171014T073921_N0205_R063_T32DPG_20171014T073917.SAFE/GRANULE/L1C_T32DPG_A012072_20171014T073917/IMG_DATA"
	list(s)
}

func list(bucket string) {
	ctx := context.Background()

	client, err := storage.NewClient(ctx)
	if err != nil {
		fmt.Println(err)
	}

	it := client.Bucket(bucket).Objects(ctx, nil)
	for {
		attrs, err := it.Next()
		if err == iterator.Done {
			break
		}
		if err != nil {
			fmt.Println(err)
		}
		fmt.Println(attrs.Name)
	}
}

jf87 · 2017-11-06T14:45:39Z

Hi Jennyfer,

The bucket name is actually "gcp-public-data-sentinel-2".
That bucket holds the entire dataset, so you need to pass a query that filters the results (otherwise you get everything):

func (b *BucketHandle) Objects(ctx context.Context, q *Query) *ObjectIterator

See here for the Query type.
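
To make the bucket/prefix split concrete, here is a minimal standard-library sketch. The helpers splitGCSPath and publicURL and the object name used in main are hypothetical; only the bucket name and the path come from this thread. The resulting prefix is what would go into the Prefix field of the Query passed to Objects, and the https://storage.googleapis.com/BUCKET/OBJECT form is the usual direct URL for a publicly readable object.

```go
package main

import (
	"fmt"
	"strings"
)

// splitGCSPath splits "gs://bucket/some/dir" into the bucket name and an
// object-name prefix ending in "/". (Hypothetical helper for illustration.)
func splitGCSPath(gsURL string) (bucket, prefix string, err error) {
	if !strings.HasPrefix(gsURL, "gs://") {
		return "", "", fmt.Errorf("not a gs:// URL: %q", gsURL)
	}
	parts := strings.SplitN(strings.TrimPrefix(gsURL, "gs://"), "/", 2)
	bucket = parts[0]
	if bucket == "" {
		return "", "", fmt.Errorf("missing bucket name in %q", gsURL)
	}
	if len(parts) == 2 {
		// Normalize to exactly one trailing slash so only objects
		// under this "directory" match the prefix.
		if p := strings.TrimRight(parts[1], "/"); p != "" {
			prefix = p + "/"
		}
	}
	return bucket, prefix, nil
}

// publicURL builds the direct HTTPS URL of a publicly readable object.
func publicURL(bucket, object string) string {
	return "https://storage.googleapis.com/" + bucket + "/" + object
}

func main() {
	s := "gs://gcp-public-data-sentinel-2/tiles/32/D/PG/S2A_MSIL1C_20171014T073921_N0205_R063_T32DPG_20171014T073917.SAFE/GRANULE/L1C_T32DPG_A012072_20171014T073917/IMG_DATA"
	bucket, prefix, err := splitGCSPath(s)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("bucket:", bucket)
	fmt.Println("prefix:", prefix)

	// With the client library, the listing call would then look like:
	//   it := client.Bucket(bucket).Objects(ctx, &storage.Query{Prefix: prefix})
	// Each attrs.Name returned by it.Next() is a full object name.
	// "example_B01.jp2" below is a made-up name standing in for one of them.
	fmt.Println(publicURL(bucket, prefix+"example_B01.jp2"))
}
```

When iterating, returning or breaking out of the loop on a non-nil error other than iterator.Done avoids calling attrs.Name on a nil result.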
