Make IrrBAW "Fibreable" a-la Naughty Dog #214

Open
devshgraphicsprogramming opened this issue Jan 25, 2019 · 11 comments
Labels
enormous large < task size < epic

Comments

@devshgraphicsprogramming
Collaborator

devshgraphicsprogramming commented Jan 25, 2019

We want to achieve this, so that we can control the pre-emption of threads and affinity
http://twvideo01.ubm-us.net/o1/vault/gdc2015/presentations/Gyrling_Christian_Parallelizing_The_Naughty.pdf

NOTE: Engine should still work and be compatible with non-fibered/jobbed execution.

However we don't want to take responsibility for the job-scheduling.
We can build a default scheduler for the user to build on, but it should be easily replaceable (like IAssetLoaderOverride).
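
A minimal sketch of what such a replaceable scheduler hook could look like. `IJobScheduler`, `InlineScheduler`, `Engine`, `setScheduler` and `runJob` are hypothetical names for illustration, not existing IrrBAW API. The assumed default simply runs each job inline, which keeps non-fibered execution working.

```cpp
#include <functional>
#include <memory>
#include <utility>

// Hypothetical interface: the engine only submits work,
// the user-supplied scheduler decides where and when it runs.
class IJobScheduler {
public:
    virtual ~IJobScheduler() = default;
    virtual void submit(std::function<void()> job) = 0;
};

// Hypothetical default: executes each job immediately on the calling thread.
class InlineScheduler final : public IJobScheduler {
public:
    void submit(std::function<void()> job) override { job(); }
};

// Hypothetical engine-side holder; the scheduler is swapped in the same way
// an override object would be.
class Engine {
public:
    Engine() : scheduler_(std::make_shared<InlineScheduler>()) {}

    // Null pointers are ignored so the engine always has a usable scheduler.
    void setScheduler(std::shared_ptr<IJobScheduler> scheduler) {
        if (scheduler) scheduler_ = std::move(scheduler);
    }

    void runJob(std::function<void()> job) { scheduler_->submit(std::move(job)); }

private:
    std::shared_ptr<IJobScheduler> scheduler_;
};
```

A user would derive from `IJobScheduler`, push submitted jobs into their own fiber or thread pool, and pass an instance to `setScheduler`.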

We shall provide fiber-safe replacements for std::mutex and std::condition_variable.
The library should be able to switch between normal C++11 threading and fiber threading via a compile flag (swapping std::mutex and std::condition_variable for the alternates).
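
One way the compile-flag switch could be arranged, as a sketch only. `SKETCH_USE_FIBERS`, the `sketch` namespace, `yield_hook` and `guarded_increment` are assumptions made up for this example. The fiber-side mutex shown here is a simple spin-and-yield lock, which is just one possible design; a condition variable replacement would need the same treatment and is not shown.

```cpp
#include <atomic>
#include <mutex>
#include <thread>

namespace sketch {

#ifdef SKETCH_USE_FIBERS  // hypothetical compile flag
// Hypothetical hook: the external scheduler installs a function that switches
// to another fiber. The fallback merely yields the OS thread.
using yield_fn = void (*)();
inline yield_fn& yield_hook() {
    static yield_fn fn = [] { std::this_thread::yield(); };
    return fn;
}

// Never blocks the OS thread: on contention it hands control to the scheduler.
class mutex {
public:
    void lock() {
        while (locked_.exchange(true, std::memory_order_acquire)) yield_hook()();
    }
    bool try_lock() { return !locked_.exchange(true, std::memory_order_acquire); }
    void unlock() { locked_.store(false, std::memory_order_release); }

private:
    std::atomic<bool> locked_{false};
};
#else
using mutex = std::mutex;  // plain C++11 threading
#endif

// Engine code is written once against sketch::mutex and works with either build.
inline int guarded_increment(mutex& m, int& counter) {
    std::lock_guard<mutex> lock(m);
    return ++counter;
}

}  // namespace sketch
```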

We should look into Boost.Context and how it achieves stack allocation, register saving, etc.
https://www.boost.org/doc/libs/1_69_0/libs/context/doc/html/index.html

For some ideas for synch primitives:
https://www.boost.org/doc/libs/1_69_0/libs/fiber/doc/html/index.html
https://www.boost.org/doc/libs/1_69_0/libs/coroutine2/doc/html/index.html
https://www.boost.org/doc/libs/1_69_0/doc/html/lockfree.html

@devshgraphicsprogramming devshgraphicsprogramming added the enormous large < task size < epic label Jan 25, 2019
@manhnt9
Contributor

manhnt9 commented Jan 29, 2019

ASIO has this capability; I used it to create a thread pool for job scheduling.
I'm describing how it works just to give you more design ideas, I don't think you're interested in using asio in IrrlichtBAW though 😃

from anywhere in the code
  submit lambdas (or any function objects) - call these jobs

spawn multiple threads
  call asio's run function
  jobs are automatically executed on these threads
  jobs which are wrapped in something called strand won't be called in parallel
  jobs can have order dependency too

So you could consider having similar I/O-service-style objects for scheduling jobs, maybe.
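
The post-and-run pattern described above can be sketched with the standard library alone. This is not asio's API: `JobService`, `post`, `stop`, `run` and `run_jobs_on_pool` are hypothetical names, and strands and ordering dependencies are left out.

```cpp
#include <atomic>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for an asio-style service: post() jobs from anywhere,
// call run() on every thread that should execute them.
class JobService {
public:
    void post(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push_back(std::move(job));
        }
        cv_.notify_one();
    }

    // Lets run() return once the queue has been drained.
    void stop() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_all();
    }

    // Executes jobs on the calling thread until stop() was called
    // and no jobs are left.
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // only reachable when stopping_
                job = std::move(jobs_.front());
                jobs_.pop_front();
            }
            job();  // run outside the lock so other workers can dequeue
        }
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<std::function<void()>> jobs_;
    bool stopping_ = false;
};

// Spawns `threads` workers, posts `count` jobs, and returns how many ran.
inline int run_jobs_on_pool(int threads, int count) {
    JobService service;
    std::atomic<int> executed{0};
    std::vector<std::thread> workers;
    for (int i = 0; i < threads; ++i)
        workers.emplace_back([&service] { service.run(); });
    for (int i = 0; i < count; ++i)
        service.post([&executed] { ++executed; });
    service.stop();
    for (auto& worker : workers) worker.join();
    return executed.load();
}
```

Note that a job here always runs to completion on the worker that picked it up; it cannot be paused halfway through.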

I've learnt a bit about fiber but haven't coded with it yet. Will do soon since I'm also interested in it.
Probably gonna use Boost.Fiber in my engine.

@devshgraphicsprogramming
Collaborator Author

Jobs are not powerful enough; I need a std::mutex and std::condition_variable replacement that can "pause" a job (save its whole stack) and "resume" it at a later time.

I considered Boost.Fiber but I don't like the fact it schedules your fibers for you.

I want fiber scheduling to live outside of IrrBAW and boost, ergo the reason for using Boost::Context.
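
To make the "pause" and "resume" idea concrete, here is a minimal stackful fiber. It deliberately uses POSIX `ucontext` instead of Boost.Context, only so the sketch needs no external library; Boost.Context plays the same role with a faster context switch. The `Fiber` class and `demo_pause_resume` are hypothetical, and the 64 KiB stack size is an arbitrary assumption.

```cpp
#include <ucontext.h>  // POSIX; not available on every platform

#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical minimal stackful fiber. The caller decides when to resume it,
// so all scheduling policy stays outside this class.
class Fiber {
public:
    explicit Fiber(std::function<void(Fiber&)> entry,
                   std::size_t stack_bytes = 64 * 1024)
        : entry_(std::move(entry)), stack_(stack_bytes) {
        getcontext(&self_);
        self_.uc_stack.ss_sp = stack_.data();
        self_.uc_stack.ss_size = stack_.size();
        self_.uc_link = &caller_;  // where to continue when entry_ returns
        makecontext(&self_, &Fiber::trampoline, 0);
    }
    Fiber(const Fiber&) = delete;
    Fiber& operator=(const Fiber&) = delete;

    // Switch into the fiber; returns when it yields or finishes.
    void resume() {
        if (done_) return;
        current() = this;
        swapcontext(&caller_, &self_);
    }

    // Called from inside the fiber: saves its registers (its stack stays
    // untouched in stack_) and returns control to whoever called resume().
    void yield() { swapcontext(&self_, &caller_); }

    bool done() const { return done_; }

private:
    static Fiber*& current() {
        static thread_local Fiber* fiber = nullptr;
        return fiber;
    }
    static void trampoline() {
        Fiber* self = current();
        self->entry_(*self);
        self->done_ = true;
    }

    std::function<void(Fiber&)> entry_;
    std::vector<char> stack_;
    ucontext_t caller_{};
    ucontext_t self_{};
    bool done_ = false;
};

// Records the interleaving of fiber and caller.
inline std::vector<int> demo_pause_resume() {
    std::vector<int> trace;
    Fiber fiber([&trace](Fiber& self) {
        int local = 10;              // lives on the fiber's own stack
        trace.push_back(local);
        self.yield();                // "pause"
        trace.push_back(local + 1);  // "resume": local is still intact
    });
    fiber.resume();        // runs until the yield
    trace.push_back(-1);   // the caller does other work in between
    fiber.resume();        // continues right after the yield
    return trace;
}
```

A fiber-aware mutex would call something like `yield()` instead of blocking the thread, and the external scheduler would call `resume()` once the lock is free.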

@manhnt9
Contributor

manhnt9 commented Jan 30, 2019

What do you think about coroutines?

Well I think I forgot, I probably want Intel TBB more than Boost.Fiber for my game engine.

@devshgraphicsprogramming
Collaborator Author

Coroutine is still stackless... i.e. coroutine is a generalized routine (routine = function call).
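
To illustrate what "stackless" means in practice, here is a hand-written equivalent of a tiny stackless coroutine. `CountTo` and `sum_all` are hypothetical names; the point is only that every value surviving a suspension lives in the object, and suspending is an ordinary return.

```cpp
// Hand-written stackless "coroutine": all state that must survive a suspension
// is stored in the object itself, not on a separate stack.
class CountTo {
public:
    explicit CountTo(int limit) : limit_(limit) {}

    // Each call resumes from the stored state and "suspends" by returning.
    // Returns false once the sequence is exhausted.
    bool resume(int& out) {
        if (next_ >= limit_) return false;
        out = next_++;
        return true;
    }

private:
    int limit_;
    int next_ = 0;
};

// Drives the coroutine to completion and sums everything it produced.
inline int sum_all(int limit) {
    CountTo generator(limit);
    int total = 0;
    int value = 0;
    while (generator.resume(value)) total += value;
    return total;
}
```

Because suspension is just a return from `resume`, a function called from inside it cannot suspend on its behalf. That is the limitation for a mutex that has to pause a job from deep inside a call chain.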

@manhnt9
Contributor

manhnt9 commented Mar 25, 2020

I'm also considering this: https://github.com/dougbinks/enkiTS

@devshgraphicsprogramming
Collaborator Author

devshgraphicsprogramming commented Apr 24, 2020

Mutexes, Barriers, wait for all, async file I/O and custom schedulers
https://github.com/lewissbaker/cppcoro

@devshgraphicsprogramming
Collaborator Author

http://www.1024cores.net/home/lock-free-algorithms/tricks/fibers

We should probably benchmark using the case of filtering a 4096^2 or 8192^2 image's mip-maps on a CPU with the following techniques:

  • OpenMP
  • C++11 threading with semaphores
  • thread based tasking (thread pools)
  • Forward-launched C++20 coroutines [stackless] (we launch tasks up front rather than by examining a chain of awaits)
  • fcontext-based fibers with a custom scheduler (Naughty Dog style), i.e. stackful coroutines, NOT boost::fiber
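
A sketch of the per-task workload such a benchmark could share across all of the techniques above. The 2x2 box filter, the single-channel float layout, and the names `Image`, `downsample_rows` and `next_mip` are assumptions; the thread does not specify the filter. Each call owns a disjoint range of destination rows, so any of the listed methods can split the work by rows.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical single-channel float image, row-major.
struct Image {
    std::size_t width = 0;
    std::size_t height = 0;
    std::vector<float> texels;
};

// Fills destination rows [row_begin, row_end) with a 2x2 box filter of `src`.
// Assumes `dst` is exactly half the size of `src` and `src` has even dimensions.
inline void downsample_rows(const Image& src, Image& dst,
                            std::size_t row_begin, std::size_t row_end) {
    for (std::size_t y = row_begin; y < row_end; ++y) {
        for (std::size_t x = 0; x < dst.width; ++x) {
            const std::size_t sx = 2 * x;
            const std::size_t sy = 2 * y;
            const float sum = src.texels[sy * src.width + sx] +
                              src.texels[sy * src.width + sx + 1] +
                              src.texels[(sy + 1) * src.width + sx] +
                              src.texels[(sy + 1) * src.width + sx + 1];
            dst.texels[y * dst.width + x] = 0.25f * sum;
        }
    }
}

// Serial reference: produces the next mip level in a single task.
inline Image next_mip(const Image& src) {
    Image dst;
    dst.width = src.width / 2;
    dst.height = src.height / 2;
    dst.texels.resize(dst.width * dst.height);
    downsample_rows(src, dst, 0, dst.height);
    return dst;
}
```

Varying how many rows go into one `downsample_rows` call is one way to vary the task size.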

We should gather data about the performance characteristics of the following hardware:

  • An ARMv8.2 chip like Nvidia Orin or whatever they'll put in the new Switch consoles
  • A consumer/server grade 32 thread CPU
  • A server grade 128 thread CPU

We should gather a performance chart of time-to-finish vs. task size for every "concurrency method"
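
A possible shape for the measurement loop behind such a chart, as a sketch. `Sample`, `Method` and `sweep` are hypothetical names; each concurrency method is assumed to be wrapped in a callable that runs all of its tasks to completion before returning.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// One point on the time-to-finish vs. task size chart.
struct Sample {
    std::size_t task_size;
    double microseconds;
};

// Assumed contract: run `task_count` tasks of `task_size` work units each
// and return only after all of them have finished.
using Method = std::function<void(std::size_t task_count, std::size_t task_size)>;

// Runs the same total amount of work once per task size and times each run.
inline std::vector<Sample> sweep(const Method& method, std::size_t total_work,
                                 const std::vector<std::size_t>& task_sizes) {
    std::vector<Sample> samples;
    for (std::size_t size : task_sizes) {
        if (size == 0) continue;  // a zero-sized task would never finish the work
        const std::size_t count = (total_work + size - 1) / size;  // round up
        const auto start = std::chrono::steady_clock::now();
        method(count, size);
        const auto stop = std::chrono::steady_clock::now();
        const double us =
            std::chrono::duration<double, std::micro>(stop - start).count();
        samples.push_back({size, us});
    }
    return samples;
}
```

A real benchmark would repeat each measurement several times and report a median or minimum instead of a single run.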

@devshgraphicsprogramming
Collaborator Author

Unlike your traditional coroutine vs. fiber benchmark, we'll be focusing on tasks with a duration in microseconds, not nanoseconds

@devshgraphicsprogramming
Collaborator Author

Really want/need this in core C++23
www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p0876r9.pdf

@devshgraphicsprogramming
Collaborator Author

Asynchronous I/O requires an IFile implementation that either implements virtual memory caching of the contents (with pages aligned to, and sized in multiples of, 4096 bytes) OR reads the whole file in at once (or at least caches it in contiguous memory).

Probably an IFilePool would be useful to amortize cache costs.
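
For the page-aligned caching mentioned above, the core arithmetic is mapping a byte range to the pages it touches. A small sketch, where `PageRange` and `pages_for` are hypothetical helpers and only the 4096-byte page size comes from the comment:

```cpp
#include <cstdint>

constexpr std::uint64_t kPageSize = 4096;  // page size named in the comment above

// A run of consecutive cache pages.
struct PageRange {
    std::uint64_t first;  // index of the first page touched
    std::uint64_t count;  // number of pages touched
};

// Pages covered by the byte range [offset, offset + size).
inline PageRange pages_for(std::uint64_t offset, std::uint64_t size) {
    const std::uint64_t first = offset / kPageSize;
    if (size == 0) return {first, 0};
    const std::uint64_t last = (offset + size - 1) / kPageSize;
    return {first, last - first + 1};
}
```

A read request would fetch any of these pages that are missing from the cache and then copy the requested bytes out of them.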

2 participants