Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Native support for multiple agent types #1894

Merged
merged 6 commits into from
Dec 18, 2023

Conversation

EwoutH
Copy link
Member

@EwoutH EwoutH commented Nov 30, 2023

On 2023-12-17 this PR was fully updated based on all feedback.

This PR adds a new agents dictionary to the Mesa Model class, enabling native support for handling multiple agent types within models. This way all modules can know which agents and agents type are in the model at any given time, by calling model.agents.

Motivation

NetLogo has had agent types, called breeds, built-in from the start. It works perfectly in all NetLogo components, because it's a first class citizen and all components need to be designed to consider different breeds.

In Mesa, agent types are an afterthought at best. Almost nothing is currently designed with multiple agent types in mind. That has caused several issues and limitations over the years, including:

Especially in scheduling, space and datacollection, lack of a native, consistent construct for agent types severely limits the possibilities. With the discussion about patches and "empty" this discussion done again. You might want empty to refer to all agents or only a subset of types or single type. That's currently cumbersome to implement.

Basically, by always having dictionary available of which agents of which types are in the model, you can always trust on a consistent construct to iterate over agents and agent types.

Conceptualisation

@tpike3 has made a really nice conceptual overview how Mesa now looks with this construct:

Screenshot_312

Implementation Details

  • The Model class now uses a defaultdict(list) to manage agents. This eliminates the need for explicit checks when adding new agent types.
  • Updated the Agent class to automatically register itself with the model upon creation, leveraging the defaultdict functionality.
  • Modified the remove method in the Agent class, ensuring safe removal of agents.

Advantages

In the agent initialization, the agent gets added to the Model's agents dictionary. Thereby, you can always, from everywhere:

  • Acces the available agent types (using the model.agent_types property)
  • Iterate over the different agent types, by:
    for agent_type, agents in self.agents.items():
        yield agent_type, agents
  • Get all the agents of a certain type by model.agents_by_type[AgentType]

This will make proper multiple agent support in all Mesa components (the scheduler, grid, datacollector and visualisation) easier, because there is now a consistent model variable they can access.

Backwards compatibility

This PR itself breaks nothing, it just adds a dictionary to the model.

If you have a single agent, looping over the agents will just mean that only one agent type is iterated over (since there is always at least one agent type). So this work well with both model with a single and multiple agent types.

Future implications

While this PR itself won't break current models, if we modify the scheduler, datacollector or grids to use this construct, it might start breaking models. By using weak references (weakref), we might be able to limit backwards breaking changes.

Review

Since this is a very important change to the core of Mesa, I recommend really critically reviewing this PR, both conceptually (what are the implications for users) and implementation wise (is this the most efficient, scalable and flexible solution).

Any feedback is appreciated!

Copy link

codecov bot commented Nov 30, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (76229a5) 77.35% compared to head (2154cb1) 77.53%.
Report is 4 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1894      +/-   ##
==========================================
+ Coverage   77.35%   77.53%   +0.17%     
==========================================
  Files          15       15              
  Lines        1007     1015       +8     
  Branches      220      221       +1     
==========================================
+ Hits          779      787       +8     
  Misses        197      197              
  Partials       31       31              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

mesa/model.py Outdated
@@ -41,6 +41,29 @@ def __init__(self, *args: Any, **kwargs: Any) -> None:
self.running = True
self.schedule = None
self.current_id = 0
self.agents_by_type = {}

def create_agent(self, agent_type, *args, **kwargs):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rht Could we type hint that agent_type should be an instance of Agent? How would that work?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I am not mistaken it should be the same as in the agent class where you pass in the model model: Model so it would be agent:Agent you will need to import the Agent module though.

@rht
Copy link
Contributor

rht commented Nov 30, 2023

I'd say a less drastic change would be to modify the schedulers. Currently, there needs to be a permutation of [random activation, staged activation, simultaneous activation, discrete event, ...] x [1 agent, multiple agent]. This needs to be reduced to 1D of [random activation, staged activation, simultaneous activation, discrete event, ...] that is multiple type by default.

There are 2 problems:

  • if you specify agents_by_type in the model, the implementation-specific collection container will be tied to the model class and can't be changed. There could be a fast backend typed dict implementation via Numba/Cython/etc, and if you solder in the agents_by_type, it is no longer modular.
  • the agents container is replicated 3x in 3 objects: the model, scheduler, and space.

@rht
Copy link
Contributor

rht commented Nov 30, 2023

the agents container is replicated 3x in 3 objects: the model, scheduler, and space.

This is not the case given that you said the scheduler will access model.agents_by_type directly, so that it is only 2x.

@rht
Copy link
Contributor

rht commented Nov 30, 2023

There could be a fast backend typed dict implementation via Numba/Cython/etc, and if you solder in the agents_by_type, it is no longer modular.

Actually, it could still be modular even in the model class. There could be an argument container_backend that could be default, numba_typeddict, etc.

Note this all can be implemented without breaking compatibility by modifying the schedulers to by multiple type by default, so can happen without having to wait until Mesa 3.0. The remaining issue of whether to put the agents_by_type in the model or in the scheduler can be decided later.

@rht
Copy link
Contributor

rht commented Nov 30, 2023

Another problem is that this is limited to grouping based on agent classes. Sometimes you want to "groupby" agents with various other metrics (inside/outside a bar in the El Farol model, the condition in Epstein civil violence model, or solvent/insolvent bank). This is the same issue I raised in the multi-agent data collection thread #348 (comment). See also https://github.com/tpike3/multilevel_mesa.

@EwoutH
Copy link
Member Author

EwoutH commented Nov 30, 2023

Thanks for the extensive replies!

if you specify agents_by_type in the model, the implementation-specific collection container will be tied to the model class and can't be changed. There could be a fast backend typed dict implementation via Numba/Cython/etc, and if you solder in the agents_by_type, it is no longer modular.

Sorry, I don’t understand the problem fully. Could you maybe give an example?

This is not the case given that you said the scheduler will access model.agents_by_type directly, so that it is only 2x.

Is there a fundamental reason we couldn’t keep everything in the model and let both the scheduler and the grid access if from there.

The remaining issue of whether to put the agents_by_type in the model or in the scheduler can be decided later.

For me, model feels like the right place. It’s the big overarching construct, from which other modules can access.

Another problem is that this is limited to grouping based on agent classes. Sometimes you want to "groupby" agents with various other metrics (inside/outside a bar in the El Farol model, the condition in Epstein civil violence model, or solvent/insolvent bank).

I think you’re spot on here: this is another problem. I’m suggesting native support for different kind (classes) of agents throughout the model, like NetLogo has breeds. You’re suggesting a universal way to filter agents on current state values. State values are parametric differences, while different classes are structural. So I think that are two different problems.

@rht
Copy link
Contributor

rht commented Nov 30, 2023

if you specify agents_by_type in the model, the implementation-specific collection container will be tied to the model class and can't be changed. There could be a fast backend typed dict implementation via Numba/Cython/etc, and if you solder in the agents_by_type, it is no longer modular.

Sorry, I don’t understand the problem fully. Could you maybe give an example?

I already replied that it is not a problem, because you could pass a custom agent container constructor to the class.

Is there a fundamental reason we couldn’t keep everything in the model and let both the scheduler and the grid access if from there.

The grid/space is its own agent container that stores spatial relations between the agents. Needed for getting the neighbors.

I think you’re spot on here: this is another problem. I’m suggesting native support for different kind (classes) of agents throughout the model, like NetLogo has breeds. You’re suggesting a universal way to filter agents on current state values. State values are parametric differences, while different classes are structural. So I think that are two different problems.

If I were to summarize. #348 (comment) does collect() based on various ways to structure the agents. In this case, it is step() on various ways to structure the agents.

@rht
Copy link
Contributor

rht commented Nov 30, 2023

I suppose I need to describe some concrete examples. I replaced model.agents_by_type with model.agents so that it is more concise.

# staged activation by type and condition
# Both wolf and sheep
model.agents.do_each("move")
model.agents.get_by(class=Sheep).do_each("eat_or_die")
model.agents.get_by(class=Wolf).do_each("hunt_or_die")
# Both wolf and sheep
model.agents.get_by(condition=lambda a: a.check_is_ready()).do_each("reproduce")

# RiskAverseTrader and AdaptiveTrader
# This is a string because a group could be a class that both classes inherit from, or for some conditional.
# The group argument is for when it is final, i.e. unchanged after the model initialization. For dynamic grouping, use the condition argument.
model.agents.get_by(group="trader").do_each("trade")
  • It's a bit more verbose than NetLogo's.
  • It's not declarative, because I suppose the scheduler is meant to describe chronology of actions.

@EwoutH
Copy link
Member Author

EwoutH commented Nov 30, 2023

This is a great example and I really like the cleanness. It reads well from left to right, like proper Python.

It's also nice to see the step not being one solid thing, but being modular and flexible. The Staged and ByType schedulers are also not that necessary anymore, and could maybe be replaced.

I have another idea, I will try to work it out tomorrow. But I already like this discussion and talking about the different options!

@rht
Copy link
Contributor

rht commented Nov 30, 2023

I have another idea, I will try to work it out tomorrow. But I already like this discussion and talking about the different options!

I was using system 1 (in this context) to write that. But deliberating on system 2 is better, and need some time.

@EwoutH
Copy link
Member Author

EwoutH commented Dec 1, 2023

I started typing out this long detailed reply, only come to the conclusion that I basically ended up with the same solution you did.

  • model.agents solves the problem to access all agents from everywhere
  • get_by() solves the problem to easily and quickly select a subset of agents

I have still a few questions however:

  1. What is agents exactly? Still a dict? Or a dataframe?
  2. How does get_by() work on agents? Is it a method of it? (I imagined get_by as a model method, returning a set of agents)
  3. What arguments does do_each() take? An agent or model function? As string? And probably something to randomize the order?

I would really like to see an implementation for your approach, since I think it's the right direction! We might still encounter some minor things but generally I really like it.

Okay, is my idea. I don't necessarily know if it's better, but maybe we can reach best of both.

It's based these premises:

  • A different agent class is by definition an different agent type. If you model an agent like a different class, there is some reason you felt the need to subclass, which could either be having different attributes or different methods. Otherwise you could just define an attribute for classification, like strategy is "risk averse" or "adaptive".
  • Since a different class has different attributes or methods, it is not guaranteed to be compatible among functions, so they should (by default) not be called together

Then, like I said before, I see this as two different problems:

  1. A model-wide construct to keep track of which agents and agent types there are, which can be accessed from all modules
  2. A efficient way to select/filter agent based on type, properties or states.

(note that in Python, a property and a state are both modelled as an agent attribute. How ever, the convention is to call things that don't change during the model run properties (they are static) and things that do change states (they are dynamic).

Now, the solution for problem 1) I have offered in this PR. I think that still is the best way to do it, just keep a dictionary of types. Is conceptually easy, straightforward to use and aligns well with PEP 20 in my opinion.

There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.

(I know I'm cheating since I'm Dutch)

Then, the solution for problem 2. I would propose to have an select_agents method added in the Model class that can select a subsection of agents based on different criteria. It could looks like this:

Edit: Started working on it and we basically have the same solution.

@tpike3
Copy link
Member

tpike3 commented Dec 2, 2023

I started typing out this long detailed reply, only come to the conclusion that I basically ended up with the same solution you did.

  • model.agents solves the problem to access all agents from everywhere
  • get_by() solves the problem to easily and quickly select a subset of agents

I have still a few questions however:

  1. What is agents exactly? Still a dict? Or a dataframe?
  2. How does get_by() work on agents? Is it a method of it? (I imagined get_by as a model method, returning a set of agents)
  3. What arguments does do_each() take? An agent or model function? As string? And probably something to randomize the order?

I would really like to see an implementation for your approach, since I think it's the right direction! We might still encounter some minor things but generally I really like it.

@EwoutH So this would be my take based on this conversation

Agents are python objects that are stored in a dictionary in mesa.time.py. Type only exists right now in RandomActivationByType to do get_by() should be fairly trivial to reference the dictionary key which will then return a list of those agent objects.*

do_each would then reference the function in that agent type object, So you would randomize the order of agents in the list which is already part of RandomActivationByType returned when you called get_by() and then iterate through the list executing the function passed in via do each()

*This leads to the question of whether the agents by type should be more basic (e.g. part of the time.py base scheduler), but code a little test a little :)

@EwoutH
Copy link
Member Author

EwoutH commented Dec 3, 2023

Thanks for getting back!

do_each would then reference the function in that agent type object, So you would randomize the order of agents in the list which is already part of RandomActivationByType returned when you called get_by() and then iterate through the list executing the function passed in via do each()

Wait, so you first get all the agent step functions, and then filter them? That sounds less efficient than first filtering the list of agents and then executing the step functions of those agents (but I'm not sure).

Originally, I imagined a model.select_agents() method that returned a selection of agents. Something like (from the model perspective):

rich_traders = self.select_agents(class=Trader, attributes=["wealth">=5])
total_wealth_rich_traders = sum(a.wealth for a in rich_traders)

However, I liked rth's proposal just as much, especially the do_each() part.

This leads to the question of whether the agents by type should be more basic (e.g. part of the time.py base scheduler), but code a little test a little :)

So in this proposal and in my opinion, definitely yes. And even more general than only in time.py, it should be in the model by default so you can always trust in any Mesa Module, Class and method that the type is available and you can iterate over it.

Agents are python objects that are stored in a dictionary in mesa.time.py.

This PR proposes to change that aspect (see the code diff and commit/PR messages). The agent types should be accessible from both the model (model.agents_by_type, or model.agents like rht proposed) and each agent (agent.agent_type, which is just the Class).


I'm now really curious for a full (draft) implementation of rht's proposal. @rht are you willing to make one?

@tpike3
Copy link
Member

tpike3 commented Dec 8, 2023

Apologies for the delay, This is a pretty big architectural change, but merits discussion. And it makes me realize our current architectural drawing is out of date (below).

My thought, would be to discuss the architecture as I think it will reduce the cognitive load on the discussion. The idea would be I update the architecture design and then we can discuss around that. Let me know what you think.

Mesa CSSSA

@EwoutH
Copy link
Member Author

EwoutH commented Dec 13, 2023

@tpike3 looking forward to your new conceptualisation!

AgentPy also uses a construct like this in almost all their example models:

class VirusModel(ap.Model):
    def setup(self):
        # Create agents
        self.agents = ap.AgentList(self, self.p.population, Person)
class WealthModel(ap.Model):
    def setup(self):
        # Create agents
        self.agents = ap.AgentList(self, self.p.agents, WealthAgent)

So the concept of having one single variable that contains all the agents - in our case stored by type - in not new in Python ABM modelling.

@EwoutH
Copy link
Member Author

EwoutH commented Dec 16, 2023

Todo:
- Write tests for create_agent() method
- Update Boltzmann Wealth Model / tutorial

Edit: Both are not relevant anymore, the new implementation already takes care of both.

@EwoutH
Copy link
Member Author

EwoutH commented Dec 17, 2023

@quaquel thanks for you feedback. I looked at it, and have the following thoughts:

  1. Using class names (.__name__) instead of direct class reference: I think going with class references is more logical in Mesa. We might want to filter, select, check or do other things depending on the type. Having the class reference themselves is then more flexible, it allows direct method access, inspection and type checking.
    • As example, we might have a class Predator, which is a subclass of Animal. If we want to filter all animals, we can easily do that with the class reference (with isinstance()), but not with the class name.
  2. Using the __subclasses__ dunder method: I didn't know this existed and this is very smart! The only concern I have with using that __subclasses__() method in Python does not track instantiation; it only tracks class definitions. So if we have some definitions but don't use them in a model run, there might be classes where there are no instances in the model of, leading to unexpected behavior.
    class Model:
        @property
        def agent_types(self):
            return Agent.__subclasses__()
    Considering that, an variable that's updated by the create_agent() and remove_agent() methods might be more fitting.
  3. Specifying type as Property in the Agent Class instead of assigning it on init: I see the advantages of having it be a property and it's probably best practice. My only concern is the readability for less experienced programmers. I don't think I've ever encountered a model where the Agent class is changed during the model run, because that's the only case I can think of where it would be beneficial to define type as a property.
    class Agent:
        @property
        def type(self):
            return self.__class__
  4. Using Weak References is a really interesting idea. It would looks something like this right?
    class Model:
        def __init__(self, *args: Any, **kwargs: Any) -> None:
            self.agents_by_type = weakref.WeakValueDictionary()
    
        def create_agent(self, agent_type: type, *args, **kwargs):
            """Creates an agent of the given type with a unique ID."""
            agent = agent_type(self.next_id(), self, *args, **kwargs)
            self.agents_by_type.setdefault(agent.type, weakref.WeakSet()).add(agent)
            return agent
    It would provide benefits in automatic cleanup and provide a backup if agents are removed using something else than the remove_agent() method, preventing memory leaks. However, I think the model would be the only place that they are explicitly referenced, assuming we modify the schedulers to use the model's agents_by_type in the future. If that only reference is a weak reference, it might . Considering that, I'm slightly leaning towards a robust remove_agent() method and keeping it a regular dict. That would be in line with the concept that in the end, the Model knows what Agents are part of it.
    • It might be interesting to use weak references in the scheduler in the future however!

Thanks again for you suggestions, I'm learning a lot diving deeper in those Python constructs and thinking critically about how they would apply in our situation. So if you have the time, keep it coming!


@rht, @Corvince and @wang-boy, I'm also really curious on your takes on this!

@quaquel
Copy link
Member

quaquel commented Dec 17, 2023

Just some thoughts without diving too deeply into MESA itself but coming at it as a generic devs/abm modeler with ample Python experience who also happens to teach ABM with MESA to MSc. students (and has build a pPython connector to NetLogo without ever having built a single NetLogo model).

I agree that having an agent_by_type overview at the model level is a good idea.

Similarly, keeping track of agents within the model should not be a concern of the user. This means using weakrefs everywhere (so in all schedulers and spaces) except in one place. You need to use hard refs in one place because, otherwise, agent instances are garbage collected once all references have been removed.Model.agent_by_type might be an ideal place for this one hard reference. Using weakrefs means no endless bookkeeping and removes possible locations for memory leaks.

What I don't like is burdening the user with additional api calls when creating and killing agent instances (i.e., create_agent and remove_agent). In particular, since all of this can, in principle, be handled within the agent class. As long as the user subclasses this and calls super in the __init__, the rest should just work. This means probably tying into some of Python's dunder methods. For example, __new__ and __subclasses__ can play a role in this.

When creating a new Agent instance, this should handle the creation of the identifier (do you even really need this....?) as well as adding the instance to Model.agent_by_type. If you want to remove an agent, you probably need a kill method on the agent class. This kill method should then remove the agent from Model.agent_by_type. This means that you have to think carefully about the data structure for Model.agent_by_type. It seems like defauldict(set) might be the easiest solution, although some benchmarking is probably in order. The reason for a kill method on the agent is that there is no Python equivalent for __new__ or __init__. The destructor dunder method is __del__, but this is only called once all references to the instance have been removed.

As an aside, I think looking at NetLogo for how you want to handle anything is strange. NetLogo is based on a now 50-year-old language. Why confine yourself to choices made 50 years ago if you can access a much more modern language with very sophisticated OO features?

@EwoutH
Copy link
Member Author

EwoutH commented Dec 17, 2023

Right, I forgot that Agents have access to their model instance from very start, since they’re initiated with it. I’m my head is was a problem, because I thought some things needed to happen from the model scope (so in a model method like create_agent), but since the agent knows it’s model, it can just happen from the agent __init__.

Thanks, I’m going to try to move the create and kill logic into the agent.

Edit: Not having to move people to start using create_agent() is also a major benefit that improved backwards compatibility. Obviously the way to go.

@EwoutH
Copy link
Member Author

EwoutH commented Dec 17, 2023

I reimplemented this PR fully in c29ab08 based on the discussions and updated the PR description. In short:

  • I removed the create_agent() model method, that's now handled in the Agent init
  • I replaced and renamed the agent_by_type dict to a agents defaultdict
  • I added a remove() method to the Agent
  • I use type(Agent) instead of Agent.__class__ (its more pythonic)
  • I added a agent_types property that gives a list of all agent classes in the model

With these changes, there is now no backwards incompatible behavior and users don't need to modify existing models.

mesa/agent.py Outdated Show resolved Hide resolved
Tracks agents in the model with a defaultdict.

This PR adds a new `agents` dictionary to the Mesa `Model` class, enabling native support for handling multiple agent types within models. This way all modules can know which agents and agents type are in the model at any given time, by calling `model.agents`.

NetLogo has had agent types, called [`breeds`](https://ccl.northwestern.edu/netlogo/docs/dict/breed.html), built-in from the start. It works perfectly in all NetLogo components, because it's a first class citizen and all components need to be designed to consider different breeds.

In Mesa, agent types are an afterthought at best. Almost nothing is currently designed with multiple agent types in mind. That has caused several issues and limitations over the years, including:

- projectmesa#348
- projectmesa#1142
- projectmesa#1162

Especially in scheduling, space and datacollection, lack of a native, consistent construct for agent types severely limits the possibilities. With the discussion about patches and "empty" this discussion done again. You might want empty to refer to all agents or only a subset of types or single type. That's currently cumbersome to implement.

Basically, by always having dictionary available of which agents of which types are in the model, you can always trust on a consistent construct to iterate over agents and agent types.

- The `Model` class now uses a `defaultdict` to store agents, ensuring a set is automatically created for each new agent type.
- The `Agent` class has been updated to leverage this feature, simplifying the registration process when an agent is created.
- The `remove` method in the `Agent` class now uses `discard`, providing a safer way to remove agents from the model.
In the tests the MockModel currently isn't properly initialized, because super().__init__() isn't called, and thus the Model init isn't ran. This commit fixes that.
@EwoutH
Copy link
Member Author

EwoutH commented Dec 17, 2023

Final review checklist

  • PR description is updated including conceptual model
  • No breaking changes or user actions needed
  • Docstring is updated
  • Tests are added
  • Commits are cleaned up:
  • CI passes
  • Code coverage is increased

@tpike3 and @jackiekazil please do final review. Let's get this merged!

@Corvince
Copy link
Contributor

I am sorry for putting on brakes on this, but I think one detail needs to be changed.

This stores agents in sets. While this appears to be the right datatype it has one big disadvantage: In Python the order of items in a set is non-deterministic. So in contrast to dictionaries they don't preserve insertion order.

Since mesa is used for research we shouldn't introduce non-deterministic behavior to allow completely reproducible results.

Since dictionaries work very similar, you can probably (ab)use dictionary keys instead of using a set.

Otherwise this really looks very clean and since it has no breaking changes very mergeable.

@EwoutH
Copy link
Member Author

EwoutH commented Dec 17, 2023

In Python the order of items in a set is non-deterministic. So in contrast to dictionaries they don't preserve insertion order.

Fair point. Changed from a defaultdict of sets to a defaultdict of lists (see a8bb15f).

@quaquel
Copy link
Member

quaquel commented Dec 18, 2023

You are taking a massive performance hit by going to lists when you want to remove agents.

I also fail to understand why, for research purposes, the order needs to be deterministic. The primary purpose of this is to keep track of agents per type. If you want deterministic behavior, you can always do a sort on ID and return a list. This would solve the performance loss while also ensuring deterministic behavior. Also, there are more sophisticated data structures that could be created that have the retrieval speed for removal of a set while providing deterministic behavior (e.g., ordered sets or some more involved linked list kind of thing such as a dict of dicts: [type]:{[id]:instance}).

So the question becomes: what is done more frequently: removing agents from the dict or retrieving the collection of agents for a specific agent type?

@EwoutH
Copy link
Member Author

EwoutH commented Dec 18, 2023

I did a benchmark to see the impact. Since most models never remove agents, it tool the Wolf-Sheep model from mesa-examples and ran 5 times with the default settings, 125 times. Here are the results:

benchmark_results

The t-test is insignificant.

from scipy import stats
stats.ttest_ind(list_results, set_results)

TtestResult(statistic=-1.2073182676004544, pvalue=0.22845980311728672, df=248.0)

mesa_with_wolfsheep_benchmark.zip

Given that with a model that frequently deletes agents, the performance impact is unnoticeable, I would say lets go with the lists.

If we find some weird edge-case later we can always patch it. Perfect is the enemy of done.

@EwoutH
Copy link
Member Author

EwoutH commented Dec 18, 2023

@quaquel did some benchmarks (thanks!):

self.agents = defaultdict(list)
self.agents[type(agent)].append(agent)

Performance for adding and removing 1000 agents: 2.06 ms ± 3.11 µs

self.agents = defaultdict(set)
self.agents[type(agent)].add(agent)

Performance for adding and removing 1000 agents: 439 µs ± 1.3 µs

self.agents = defaultdict(dict)
self.agents[type(agent)][agent] = None

Performance for adding and removing 1000 agents: 460 µs ± 2.11 µs

All options are imperfect:

  • Lists are slow (and allow duplicates)
  • Sets don't keep insertion order (making runs non-deterministic)
  • dicts are ugly in usage (due to their redundant None values)

What we need in an OrderedSet: https://discuss.python.org/t/add-orderedset-to-stdlib/12730/85

Unfortunately, that doesn't exist in the standard library (yet).

There are some packages, the most popular being ordered-set: https://pypi.org/project/ordered-set/

from ordered_set import OrderedSet

self.agents: defaultdict[type, OrderedSet] = defaultdict(OrderedSet)

But we have to add that to the requirements.

Personally, I'm happy with:

  • Using lists
  • Using the OrderedSet from ordered-set

The dict with all values None I find simply too ugly from and UX point of view. Teaching Mesa is complicated enough as it's already. You wouldn't have to specify .keys() everytime you want some set of agents.

@EwoutH
Copy link
Member Author

EwoutH commented Dec 18, 2023

Benchmarked ordered-set, it's extremely slow (22.9 ms ± 1.09 ms, written in pure Python).

There's another package called Ordered-set-37 and it's a lot faster, almost as fast as the other set/dict options (615 µs ± 2.34 µs).

So, let's vote:

  • 🚀: use defaultdict(dict) = None
  • 🎉: use defaultdict(OrderedSet)
  • ❤️: use defaultdict(list)

@Corvince
Copy link
Contributor

Thanks for the performance benchmarks! I think for large lists, list should even perform much worse.

I voted for dict. It's a bit ugly, but it's also what happens for Ordered Set 37 and I think explicitly handling it in mesa instead of a hidden in a dependency is more explicit

@tpike3
Copy link
Member

tpike3 commented Dec 18, 2023

self.agents = defaultdict(dict)
self.agents[type(agent)][agent] = None

This is really great @EwoutH! Substantive discussion and forward progress. I am very much leaning to option 1 but instead of the None would be possible to do

self.agents[type(agent)][agent.unique_id] = agent

This will get rid of the None values and although you will have things like .keys() I think that can also be a plus as it will give users more flexibility. In addition I would not be opposed to adding those simple functions (e.g. .keys(), .values() to model so users can get lists of agent ids or agent objects in an effort to mitigate your UX concerns. Let me know what you think

@tpike3
Copy link
Member

tpike3 commented Dec 18, 2023

Thanks for the performance benchmarks! I think for large lists, list should even perform much worse.

I voted for dict. It's a bit ugly, but it's also what happens for Ordered Set 37 and I think explicitly handling it in mesa instead of a hidden in a dependency is more explicit

I agree and tried to think of a middle ground, let me know what you think

@Corvince
Copy link
Contributor

Ah I didn't catch the last part of your comment. Dict should only be used internally and we should return a list via a getter. Turning the keys into a list should be negligible in terms of performance

@EwoutH
Copy link
Member Author

EwoutH commented Dec 18, 2023

Thanks for both your responses. My biggest concern is that with the dict you can't simply do

model.agents[SomeType]

anlymore. Now you have to add .keys(), and that's just ugly. I already hear a hunderd students ask "What does the .keys() do?" and then I have to explain it's a stupid implementation detail.

You can indeed solve that by hiding everything behind utility function, and while I'm a big fan of utility functions, in this case I think it just shouldn't be needed. It doesn't follow the spirit of PEP 20.

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Flat is better than nested.
Readability counts.
Special cases aren't special enough to break the rules.

Having agents be a dict is just so much more explainable than having it be a nested dict, even if the values are just None.

@tpike3
Copy link
Member

tpike3 commented Dec 18, 2023

I registered my vote, but this is a great discussion and I am excited to see how it ends up. My two cents for what is worth are very much in line with @Corvince

However, I do appreciate @EwoutH concerns having my own students struggle with dictionaries. Regardless, great job @EwoutH however this ends up it is going to be an awesome move forward.

@EwoutH
Copy link
Member Author

EwoutH commented Dec 18, 2023

Okay, it seems the majority is for a dict with None values. Let's do that.

I propose to handle all agent selection utility in the proposed select_agents() method from #1905.

This commit introduces dics nested into a single defaultdict for agent storage, using agent instances as inner keys and None as inner values. The decision is based on:

1. Determinism: A dictionary ensures deterministic behavior for reproducible research outcomes.
2. Performance: Benchmarks showed that dictionaries offer a good balance between performance and functionality, especially in models with frequent agent updates.
3. Usability and Explicitness: While the use of None values is unconventional, this approach is practical and avoids the complexity of nested structures or external dependencies. It aligns with clear and explicit coding practices, making the framework more accessible and maintainable.

The choice of a defaultdict with None values addresses the need for deterministic behavior, performance efficiency, and clarity in implementation. Additional utility functions like `select_agents()` will be added to enhance usability.

Discussion for historical reference: projectmesa#1894 (comment)
@EwoutH
Copy link
Member Author

EwoutH commented Dec 18, 2023

Implemented in 2154cb1. Let's get this merged.

Copy link
Member

@tpike3 tpike3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM--- Thanks @EwoutH

And I always appreciate you improving the docstrings as you go!

@tpike3 tpike3 merged commit 9495a5a into projectmesa:main Dec 18, 2023
13 checks passed
@EwoutH
Copy link
Member Author

EwoutH commented Dec 18, 2023

Thanks everyone for all the feedback and critical thinking. We've made a great step here!

Next up: Selecting subsets of agents: #1905.

@Corvince
Copy link
Contributor

Thanks for both your responses. My biggest concern is that with the dict you can't simply do

model.agents[SomeType]

anlymore. Now you have to add .keys(), and that's just ugly. I already hear a hunderd students ask "What does the .keys() do?" and then I have to explain it's a stupid implementation detail.

Yes, this should absolutely be the right syntax and I think we should make this work. I think we can implement our own OrderedSet (light) class by simply creating a class that inherits from dict and setting getitem. And we would use this to set the agents internally as _agents: defaultdict[type, OrderedSet] and public as agents: dict[type, list(agent)].

See also the second part of this answer

I think this should work. But for now lets celebrate the merge! 🚀

EwoutH added a commit to EwoutH/mesa that referenced this pull request Dec 18, 2023
This commit introduces a new feature to the Mesa Agent-Based Modeling framework: a select_agents method. This enhancement is inspired by NetLogo's n-of functions and a smart idea from @rht as discussed in projectmesa#1894. It aims to provide a flexible and efficient way to select and filter agents in a model, based on a wide range of criteria.

#### Features
The `select_agents` method offers several key functionalities:

1. **Selective Quantity (`n`):** Specify the number of agents to select. If `n` is omitted, all matching agents are chosen.

2. **Criteria-Based Sorting (`sort` and `direction`):** Sort agents based on one or more attributes. The sorting order can be set individually for each attribute.

3. **Functional Filtering (`filter`):** Use a lambda function or a callable to apply complex filter conditions.

4. **Type-Based Filtering (`agent_type`):** Filter agents by their class type, allowing for selection among specific subclasses.

5. **Flexible Size Handling (`up_to`):** When `True`, the method returns up to `n` agents, which is useful when the available agent count is less than `n`.
EwoutH added a commit to EwoutH/mesa that referenced this pull request Dec 18, 2023
This commit introduces a new feature to the Mesa Agent-Based Modeling framework: a select_agents method. This enhancement is inspired by NetLogo's n-of functions and a smart idea from @rht as discussed in projectmesa#1894. It aims to provide a flexible and efficient way to select and filter agents in a model, based on a wide range of criteria.

#### Features
The `select_agents` method offers several key functionalities:

1. **Selective Quantity (`n`):** Specify the number of agents to select. If `n` is omitted, all matching agents are chosen.

2. **Criteria-Based Sorting (`sort` and `direction`):** Sort agents based on one or more attributes. The sorting order can be set individually for each attribute.

3. **Functional Filtering (`filter`):** Use a lambda function or a callable to apply complex filter conditions.

4. **Type-Based Filtering (`agent_type`):** Filter agents by their class type, allowing for selection among specific subclasses.

5. **Flexible Size Handling (`up_to`):** When `True`, the method returns up to `n` agents, which is useful when the available agent count is less than `n`.
@EwoutH EwoutH added the feature Release notes label label Jan 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Release notes label
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants