Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Add a block/task to an existing job #451

Merged
merged 16 commits into from
Mar 3, 2020

Conversation

eliemichel
Copy link
Contributor

@eliemichel eliemichel commented Jun 26, 2019

Hello,

I started working on issue #292 (Add a block/task to an existing job) and this is a work in progress PR so not meant to be merged right away. Feel free to review and discuss it though.

Done

  • Core feature for appending tasks to existing block
  • JSON API for appending tasks to an existing block
  • Making the newly created tasks Ready, so that they start immediately
  • Adding an "append new blocks" mechanism
  • Prevent afwatch from crashing when adding new blocks
  • Add methods to the Python API
  • Add commands to afcmd (@lithorus)
  • Update documentation

Known issues

Numeric block

I'm not sure it makes sense to append tasks to numeric blocks, so I did not implement it. Only non-numeric block can have tasks appent.

Block dependencies

I haven't checked that the newly created tasks respect the dependency system, but they should without too much trouble. However, a conceptual issue occurs for blocks that depend on the one where new tasks are appent because now a job/block can switch from "DONE" back to "RUN". Since it was once Done, dependent jobs may have started already, so what should they become? Restart? Pause? Continue?

Anyway I see no use of making a job depend on one that may be extended with new tasks eventually, so the question may actually never be an issue in real life applications.

afwatch

When appending new blocks to a job that is opened in afwatch, one must reopen the job window to see the update. It feels like a minor issue so I don't plan on fixing it.

API

JSON

The JSON API changes are new operation types:

  • append_tasks which takes a tasks list following the very same spec as when submitting new jobs, e.g.:
{
 "action":
 {
  "user_name"  : "elie",
  "host_name"  : "my_pc",
  "type"       : "jobs",
  "ids"        : [3],
  "block_ids"  : [0],
  "operation"  :
  {
   "type"      : "append_tasks",
   "tasks"     :[
    {
        "name"    : "Extra Process A",
        "command" : "python -c \"print('Process EXTRA task A')\""
    },
    {
        "name"    : "Extra Process B",
        "command" : "python -c \"print('Process EXTRA task B')\""
    }
    ]
  }
 }
}
  • append_blocks which takes a blocks list also following jobs submission spec, e.g.:
{
 "action":
 {
  "user_name"  : "elie",
  "host_name"  : "my_pc",
  "type"       : "jobs",
  "ids"        : [3],
  "operation"  :
  {
   "type"      : "append_blocks",
   "blocks"    :[
    {
      "name"              : "New numeric block",
      "tasks_name"        : "frames @#@-@#@",
      "service"           : "generic",
      "parser"            : "generic",
      "flags"             : 1,
      "frame_first"       : 1,
      "frame_last"        : 100,
      "frames_per_task"   : 10,
      "frames_inc"        : 2,
      "command"           : "python -c \"print('Process frames @#@-@#@')\"",
      "working_directory" : "E:\\tmp"
    },
    {
      "name"              : "New non numeric block",
      "tasks_name"        : "frames @#@-@#@",
      "service"           : "generic",
      "parser"            : "generic",
      "working_directory" : "E:\\tmp",
      "tasks"             : [
      {
        "command" : "python -c \"print('Process task A')\""
      },
      {
        "command" : "python -c \"print('Process task B')\""
      }
      ]
    }
    ]
  }
 }
}

Python

Two command methods have been added to the Cmd object in af.py:

  • appendBlocks(jobId, blocks)

Example:

import af

block = af.Block('generic', 'generic')
block.setCommand("python -c \"print('Process frames @#@-@#@')\"")
block.setNumeric(1, 100, 10)

cmd = af.Cmd()
print(cmd.appendBlocks(3, [block]))
  • appendTasks(jobId, blockId, tasks)

Example:

import af

task = af.Task('test')
task.setCommand("python -c \"print('Process task A')\"")

cmd = af.Cmd()
print(cmd.appendTasks(3, 0, [task]))

@lithorus
Copy link
Member

Really nice work. I can add the methods to the afcmd module when it has been pulled in.

@eliemichel
Copy link
Contributor Author

eliemichel commented Jun 26, 2019

Indeed, I'll add afcmd to the todo-list for the record!
Little update: new tasks start automatically, I'm now working on appending full blocks to jobs

@timurhai
Copy link
Member

Hi!
Just checked this. Great work.
Sorry i did not know that we can use checkable checkboxes here.
And i think that i checked (unchecked) some occasionally ((

@timurhai
Copy link
Member

If some new block will be added, job state should be recalculated automatically in v_refresh() at the next afserver heartbeat. Or you have tried and it was not?

@lithorus
Copy link
Member

Added a pull request to the pull request :)

@lithorus
Copy link
Member

  • Job.appendBlocks(blocks)

Example:

import afcmd

block = af.Block('generic', 'generic')
block.setCommand("python -c \"print('Process frames @#@-@#@')\"")
block.setNumeric(1, 100, 10)

job = afcmd.getJob(3)
print(job.appendBlocks([block]))

blockCopy = job.blocks[0]
print(job.appendBlocks([blockCopy]))
  • Block.appendTasks(tasks)

Example:

import afcmd

task = af.Task('test')
task.setCommand("python -c \"print('Process task A')\"")

job = afcmd.getJob(3)
block = job.blocks[0]
print(block.appendTasks([tasks]))

@eliemichel
Copy link
Contributor Author

aha didn't know that one could PR a PR, thanks @lithorus I'll merge it beginning next week!

@timurhai no worry for the check boxes, I fixed them ;) I did not try to rely on v_refresh only, I'll give it a shot!

Also, do you think of anything that may be indirectly broken by this new feature, and so that I should take care of checking?

Remove debug

Co-Authored-By: Elie Michel <[email protected]>
@timurhai
Copy link
Member

@eliemichel i do not think that lots of things can break on a new block(s) append.

Depend blocks are stored as a list of integers (blocks numbers)
https://github.com/CGRU/cgru/blob/master/afanasy/src/server/block.h#L88
to not to run regular expressions check for each block on each cycle.
So it performs once, on creation and on block parameters change:
https://github.com/CGRU/cgru/blob/master/afanasy/src/server/block.cpp#L524

And job progress and its storing should be checked.

Also afwatch/webgi should reconstruct job - may be just close it and reopen.

@yannci
Copy link
Contributor

yannci commented Jul 2, 2019

Hey Elie,
nice work, I appreciate it! Just to give you some feedback for now:

I was able to build afanasy on Centos7 and Centos6 so far. Everythings working there fine. I also applied @lithorus changes to afcmd and tested this here in our environment. This works fine, too.

I will have a deeper look into this at the end of the week and will let you know if I find any issues or if there is something that needs to be changed.

Thanks so far!
Yannic

@eliemichel
Copy link
Contributor Author

@yannci Thanks a lot for your feedback! :)

@timurhai
Copy link
Member

timurhai commented Jul 5, 2019

Hi!
Sorry i am very busy this days!
But i will definitely test it!

@eliemichel
Copy link
Contributor Author

@timurhai What is the recommended way to update to the documentation? Is the website managed through a git repository? Is there anywhere I should put doc within cgru repo?

@timurhai
Copy link
Member

timurhai commented Jul 11, 2019

Hi!
Sorry i have not enough time for Afanasy.
And so will be for a several mounths!

For now all documentation is here:
http://cgru.info/afanasy/job
That sources are here:
https://github.com/CGRU/cgru.github.com/tree/master/content
(This is another CGRU repo)
And there is no content engine.
You should write html by hand.

I (and not only me) think that this is no good.
So i have a plan to split documentation.
Keep cgru.info site for just an intoduction, brief info.
And detailed documentation keep on the other site, here:
https://cgru.readthedocs.io
Using readthedocs hosting, content engine and so on.
That sources are here:
https://github.com/CGRU/cgru/tree/master/docs
This way there is no need to write html by hand.
We can use a simple markup:
https://github.com/CGRU/cgru/blob/master/docs/installation/index.rst
(that GitHub understands too)

But this new detailed documentation is just started and empty.
(I just created a sceleton for it)

@eliemichel
Copy link
Contributor Author

I added doc to the website project: CGRU/cgru.github.com#7

And I introduced a howto section in the new doc to put a guide about adding new tasks/blocks: https://github.com/eliemichel/cgru/blob/append-tasks/docs/howto/appendtask/index.rst

I think the howto is a good format as long as there is no automated doc generation. And even after it can still remain usefull for such specific task.

@timurhai
Copy link
Member

Hi. I still have not enough time for all issues of the project.
Also i started to write afwatch panel: #453
much earlier that this issue (even w/o commits). At first, i want to finish panel, as i keeping that code changes in mind, and do not have enough time for 2 issues this time (yet). AfWatch code changes should not conflict.

ps
i should finish some hard work, after it i will take a BIG vocation (i must after such work), and than i should have more time for the project to think about all issues/requests and close them.

@timurhai timurhai mentioned this pull request Feb 17, 2020
@timurhai timurhai merged commit 5fc2b07 into CGRU:master Mar 3, 2020
@timurhai
Copy link
Member

timurhai commented Mar 3, 2020

Hi!
I will check all this in the nearest feature (definitely before a new release).

@timurhai
Copy link
Member

Hello!

Much time passed...

How is going with this issue?
Somebody using it? It works fine? Is it totally completed?

After the code review, the first thing that I want to change is storing tasks and blocks in std::vector<int32_t>, to not to use double pointers. But that code is ancient, written long before this issue, it works and do not need immediate fix.

@yannci
Copy link
Contributor

yannci commented Jan 25, 2021

Hi Timur,

I am currently testing this feature while developing a PDG workflow with Houdini and Afanasy, which was one of the reasons for this feature (see #292 and #436). After merging my PR (#489) this works quite well for me. I didn't observe any crashes or problems so far. There is still the PR regarding the flags to indicate wether a job has a appended blocks or tasks (#490) open which might come in handy in afwatch etc.

Otherwise this works so far. In regard to the refactoring the use of double pointers I don't have an opinion so far since I didn't dig that deep in the code.

Best,
Yannic

@timurhai
Copy link
Member

Hi Yannic!

I think that adding a block/task feature we (i) need to see/test/develop from PDG side.
As now this is the only issue that needs and uses this feature.
For now i am have not used PDGs yet, but soon i should start.

Is there some other common (not Houdini) things that i should to check for this issue? Some c++ code?

The entire Houdini PDG/TOPs connection i can check only when i will be familiar with PDG/TOPs.

timurhai added a commit that referenced this pull request Jul 6, 2021
… contains new block/task ids.

Now if there is Action::answer_kind is not set
(an empty string, used to be 'error','log','info'),
Action::answer data will be written as a raw string in 'object' field.
This was server can return a custom JSON object containing some data as an answer on action.

Needed on appending new tasks on a dynamic Houdini PDG job.

References #451, #436.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants