test s3 ingest through ftp-ingest directory #91

7yl4r opened this issue Feb 21, 2019 · 4 comments
7yl4r commented Feb 21, 2019

As discussed earlier today, the next step in automating s3 processing is to make sure .SEN3 files load properly via the ftp ingest DAG.

Specifically, in that file the BashOperator ingest_s3a_ol_1_efr loads all files in /srv/imars-objects/ftp-ingest/fl_sen3/ that match S3A_OL_1_EFR___*.SEN3 using the imars_etl.load command.
find and xargs list the matching files and split them up across load calls; mv then moves each file to trash after it is loaded.
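
To make the list / load / move-to-trash flow concrete, here is a minimal Python sketch of the same idea. It is not the DAG code (which uses find, xargs and mv inside a BashOperator); `ingest_matching` and the `load` callable are hypothetical names standing in for the call to imars_etl.load.

```python
import fnmatch
import shutil
from pathlib import Path
from typing import Callable, List


def ingest_matching(src: Path, trash: Path, pattern: str,
                    load: Callable[[Path], None]) -> List[Path]:
    """Load each regular file in src whose name matches pattern, then trash it.

    `load` is a hypothetical stand-in for the real loader. It should raise on
    failure so that a file which did not load is left where it is.
    """
    trash.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in sorted(src.iterdir()):
        # same filter as `find -type f -name PATTERN`
        if not path.is_file() or not fnmatch.fnmatchcase(path.name, pattern):
            continue
        load(path)                          # an exception here skips the move
        dest = trash / path.name
        shutil.move(str(path), str(dest))   # the `mv` step
        moved.append(dest)
    return moved
```

Calling it with the directory and glob from above (`S3A_OL_1_EFR___*.SEN3`) and a loader that only prints the path is a cheap way to see which files would be picked up.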

steps

Here are the steps to test this:

  1. manually put a .SEN3 (zipped) file into /srv/imars-objects/ftp-ingest/fl_sen3. This file should cover some of Florida (30n, 24s, -80e, -84w).
  2. create or wait for a DagRun to start for DAG ftp_ingest_na on imars-airflow-test
  3. modify DAG code & retest if DAG is failing, else proceed
  4. check that the loaded file metadata is correct in the MySQL db.
    1. from userproc start the mysql client & connect to imars_product_metadata: mysql --host imars-sql-hydra --port 3306 --database imars_product_metadata -u imars -p
    2. do a mysql search for your new file: SELECT * FROM file WHERE product_id=36 AND date_time='$FILE_DT'; where $FILE_DT is as described below.
    3. the table returned should be the new file & all metadata
  5. check that you can extract the file object using imars_etl.extract
    1. on userproc do imars-etl extract -v "product_id=36 AND date_time='$FILE_DT'" where $FILE_DT is as described below
    2. once complete the file should be placed in your working directory
    3. verify the file matches original using diff, sha1sum or your preferred method.
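
For the last check (5.3), a small Python sketch of the sha1 comparison, in case diff or sha1sum is not handy. The function names are made up for illustration; only the standard-library hashlib is used.

```python
import hashlib
from pathlib import Path


def sha1_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-1 of a file, reading it in chunks to bound memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(original: Path, extracted: Path) -> bool:
    """True when both files hash to the same SHA-1 digest."""
    return sha1_of(original) == sha1_of(extracted)
```

The hex string from `sha1_of` should equal the first column that `sha1sum` prints for the same file.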

$FILE_DT

$FILE_DT should be the datetime of the granule you are testing in a format like: '2018-06-22 16:25:25' for /srv/imars-objects/gom/s3a_ol_1_efr/S3A_OL_1_EFR____20180622T162525.SEN3.
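
If it helps, $FILE_DT can be derived from the filename rather than typed by hand. This is only a sketch: `file_dt` is a hypothetical helper, and it assumes the first `YYYYMMDDTHHMMSS` stamp in the name is the one the date_time column stores.

```python
import os
import re
from datetime import datetime

# first "_YYYYMMDDTHHMMSS" stamp in the filename
_STAMP = re.compile(r"_(\d{8}T\d{6})")


def file_dt(path: str) -> str:
    """Format the first timestamp in a granule filename as 'YYYY-MM-DD HH:MM:SS'."""
    name = os.path.basename(path)
    match = _STAMP.search(name)
    if match is None:
        raise ValueError(f"no timestamp found in {name!r}")
    stamp = datetime.strptime(match.group(1), "%Y%m%dT%H%M%S")
    return stamp.strftime("%Y-%m-%d %H:%M:%S")


print(file_dt("/srv/imars-objects/gom/s3a_ol_1_efr/S3A_OL_1_EFR____20180622T162525.SEN3"))
```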


Once this is verified to work, we can move on to ensuring the s3 processing is correct.


7yl4r commented Feb 27, 2019

I have added a more user-friendly interface to the metadata db (more detail @ USF-IMARS/imars-etl#34).

This might make step (4) above a bit easier.

  • check that the loaded file metadata is correct in the MySQL db.
    1. go to this query on imars-physalis.marine.usf.edu:3000 & log in (same credentials as imars-airflow-test)
    2. enter $file_dt, click run
    3. the table returned should be the new file & all metadata

@sebastiandig

We're getting closer to finishing step 2. I'm still getting SyntaxError: filepath does not match pattern

path: S3A_OL_1_EFR____20180609T152151_20180609T152451_20180610T200648_0179_032_125_2520_LN1_O_NR_002.zip
pattern: S3{sat_id}_OL_1_EFR____{dt_Y:4d}{dt_m:2d}{dt_d:2d}T{dt_H:2d}{dt_M:2d}{dt_S:2d}.zip

I would try this myself, but I don't have permission: change data.py, line 253 to
S3{sat_id}_OL_1_EFR____%Y%m%dT%H%M%S_{end_date:08d}T{end_t:06d}_{ing_date:08d}T{ing_t:06d}_{duration:04d}_{cycle:03d}_{orbit:03d}_{frame:04d}_{proc_location}_O_N{r_or_t}_{base_collection:03d}.zip
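
To show why the short pattern rejects this file, here is a rough regex translation of both patterns. This is an illustration only: imars-etl uses its own format-string patterns, not these regexes, and the character classes for sat_id, proc_location and r_or_t are my assumptions.

```python
import re

# Rough regex equivalent of the short pattern: a single timestamp, then ".zip".
SHORT = re.compile(r"S3(?P<sat_id>[A-Z])_OL_1_EFR____(?P<start>\d{8}T\d{6})\.zip")

# Rough regex equivalent of the proposed long pattern (field names as above).
LONG = re.compile(
    r"S3(?P<sat_id>[A-Z])_OL_1_EFR____"
    r"(?P<start>\d{8}T\d{6})_"
    r"(?P<end_date>\d{8})T(?P<end_t>\d{6})_"
    r"(?P<ing_date>\d{8})T(?P<ing_t>\d{6})_"
    r"(?P<duration>\d{4})_(?P<cycle>\d{3})_(?P<orbit>\d{3})_(?P<frame>\d{4})_"
    r"(?P<proc_location>[A-Z0-9]+)_O_N(?P<r_or_t>[A-Z])_"
    r"(?P<base_collection>\d{3})\.zip"
)

name = ("S3A_OL_1_EFR____20180609T152151_20180609T152451_20180610T200648"
        "_0179_032_125_2520_LN1_O_NR_002.zip")

print(SHORT.fullmatch(name))             # None: text after the first timestamp is not ".zip"
print(LONG.fullmatch(name).groupdict())  # every field is captured
```

The short pattern expects `.zip` right after the first timestamp, so everything from `_20180609T152451` onward makes the match fail.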


7yl4r commented Mar 4, 2019

Oh hey, I think that is the longer filename pattern (called slstr in data.py)?
Try using --ingest_key slstr instead of dhus_abbrev in ftp.py.

Also I have fixed the permissions on imars-etl so you should be able to edit now.


7yl4r commented Mar 4, 2019

Oh also note that you can try parts of that command out on userproc:

# should list the files:
find /srv/imars-objects/ftp-ingest/fl_sen3/  -type f    -name "S3A_OL_1_EFR___*.zip"

# try to load the file (and maybe get a permissions error)?
imars-etl load --product_id 36 --sql "status_id=3 AND area_id=12 AND provenance='sebastian'" --ingest_key slstr --duplicates_ok --nohash /srv/imars-objects/ftp-ingest/fl_sen3/S3A_OL_1_EFR____20180609T152151_20180609T152451_20180610T200648_0179_032_125_2520_LN1_O_NR_002.zip

oh yeah and imars-etl has a --dry_run option for testing:

imars-etl load --dry_run --product_id 36 --sql "status_id=3 AND area_id=12 AND provenance='sebastian'" --ingest_key slstr --duplicates_ok --nohash /srv/imars-objects/ftp-ingest/fl_sen3/S3A_OL_1_EFR____20180609T152151_20180609T152451_20180610T200648_0179_032_125_2520_LN1_O_NR_002.zip
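
If you end up running this for several files, a tiny wrapper that builds the same command can help avoid typos. This is just a sketch: `load_argv` is a hypothetical helper, and it only reuses the flags shown in the commands above.

```python
import shlex
from typing import List


def load_argv(filepath: str, provenance: str, dry_run: bool = True) -> List[str]:
    """Build the `imars-etl load` argument list used in the commands above."""
    argv = ["imars-etl", "load"]
    if dry_run:
        argv.append("--dry_run")  # test first; drop this for the real load
    argv += [
        "--product_id", "36",
        "--sql", f"status_id=3 AND area_id=12 AND provenance='{provenance}'",
        "--ingest_key", "slstr",
        "--duplicates_ok",
        "--nohash",
        filepath,
    ]
    return argv


argv = load_argv(
    "/srv/imars-objects/ftp-ingest/fl_sen3/"
    "S3A_OL_1_EFR____20180609T152151_20180609T152451_20180610T200648"
    "_0179_032_125_2520_LN1_O_NR_002.zip",
    provenance="sebastian",
)
print(shlex.join(argv))  # shell-quoted command line, ready to paste
```

On userproc the list could be passed to `subprocess.run(argv, check=True)`; rerun with `dry_run=False` once the dry run looks right.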
