-
Notifications
You must be signed in to change notification settings - Fork 115
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor algorithms package #104
Comments
I wonder if |
I also wonder if the |
I've also noticed that quite a lot of algorithms classes are non-final and expose protected members for extension via inheritance. I wonder if there's a way we can make those algorithms extendable via composition and/or the Builder pattern instead. (This is something which I have rather limited experience with, so it's not clear to me yet how we'd investigate this.) |
The Layout algorithms can be (and in some cases are) used by some people who want a layout algorithm but want to do their own rendering. Layout and rendering can, and should, be kept separate. The one fly in the ointment is that currently they use a Dimension instance to specify the size of the canvas, and we should create our own class so as to remove that dependency on Swing. We could conceivably flatten the package hierarchy in that way as well. I suggest that we need a design doc for figuring out where things go; issue updates is not a good place for that sort of discussion. The fact that most algorithms classes are non-final and either don't need to be (because subclassing them is not really appropriate or useful) or could be customized in another way is certainly something we can look into as part of this overall issue. |
I'd like to see us remove from all Layouts, the dependencies on java.awt.Dimension and java.awt.geom.Point2D I'd also like to see the Layout implementations be only a Function<V,P> that holds no cached locations state. We can provide delegates that do caching and leave users the option of implementing their own preferred caching mechanisms if they choose. |
Using delegate classes to do caching for Layouts sounds like a rather good idea to me. I imagine we'd replace |
Glad you like the idea about caching. |
Oh yeah, of course! Having methods like |
Hmm, perhaps as an alternative to delegate classes, we could allow customisation through a Builder. Or alternatively, we could have two specific Static Factory Methods for each Layout: one that returns a "normal" Layout, and another that returns a caching variant. @tomnelson Do you have any thoughts about these two ideas? |
I was thinking that it would be nice to have Layout implementations that contain only the functional algorithm code and no persisted state of the graph. Those functional classes could be used (by delegates... subclasses ....) that apply their own caching, including doing things like using persistent storage for very large graphs. |
Okay, that sounds reasonable to me. 👍 |
I agree that we should have our own internal versions of |
I'm not entirely sure what you're proposing that a I think you'd still want to have a single Note that So perhaps what this proposal boils down to is the notion that the I don't think that there's much harm in doing something like that, and one could argue that it's a cleaner design in some ways, but I'm not sure that our users would derive much benefit from it, either, at least with our current visualization architecture. In practice, if you're trying to lay out a graph that's so big that you can't store the node locations in memory, you're already dead in the water, because there's no way that either the force-directed layouts or the rendering are going to tolerate you trying to visualize a few million nodes (they might not even if we had better (read: any) spatial data structures, but they sure won't now). There's also a real question in my mind of the utility of trying to render such a graph at all, but that's a different discussion. Are there anticipated benefits that I'm missing? I'm certainly open to using builders for creating layouts, and that would make such a change easier in future. But I'd like to hear more about the anticipated benefits that providing this flexibility would provide. (Side point: I don't think that there's any particular point in |
I'm starting with the notion that Layout is not part of the
jung-visualization module. Maybe there are cases where someone wants to
make a graph, apply a layout algorithm, and ask what nodes are within some
distance from some point. I think that would work for extremely large
graphs that have locations stored on disk. Whatever the possible
applications are, we did not put Layout in jung-visualization, so let's not
let visualization challenges limit the design of our Layouts.
You brought a lot of clarity to my initial (naive) ideas. I think that my
real target is to separate the way we store locations from the way we apply
an algorithm. We already mostly have this: AbstractLayout contains the code
that stores locations while the subclass implementations have the various
algorithms. Perhaps if AbstractLayout had a way to inject the thing that
stores the locations.... maybe we are almost there (or close) by changing
the old lazy maps to the CacheBuilder. I believe that your paragraph 4 says
it perfectly.
As to the replacement of java.awt.Dimension and java.awt.geom.Point2D, We
seem to be leaning towards writing new classes to add to jung. My
preference is to use something 'good' that someone else wrote, unless it is
kind of obscure. We went down that path with collections-generic way back
when there we no better alternatives.
Instead of writing our own Dimension and Point, we could use the Point2i
and Point2d classes from the vecmath package. vecmath seems like a
reasonable dependency (to me) and the 2 classes I mentioned seem to have
equivalent features to those in the awt classes, without the awt baggage.
…On Fri, Sep 15, 2017 at 2:10 PM, jrtom ***@***.***> wrote:
I'm not entirely sure what you're proposing that a Layout would look like
in this proposed architecture. I guess you're talking about splitting
Layout into two types: one for the algorithm that calculates node
positions, and one that stores and retrieves them.
I think you'd still want to have a single Layout type, though, unless we
decide that we no longer want to support animating changes in layout. That
is, if the visualization system only knows about the type that reports node
location, then it has no way to tell the Layout to update positions, either
in response to user input or in response to time passing/the visualization
updating. It's been a while since I've looked at this aspect of the
visualization code, but IIRC it's the visualization system's responsibility
to call Layout.step().
On the other hand, the algorithm necessarily must be able to both read
from and write to the node locations, so the type that includes the
algorithm may as well also expose the node locations on request. (Otherwise
the visualization system would have to keep track of both of these things,
and would probably end up creating a wrapper type to keep them together
anyway, for convenience.)
Note that Layout itself (that is, the interface) doesn't have any notion
at all of where the position information is being stored. All it has is the
apply() method that returns a position for a given node.
So perhaps what this proposal boils down to is the notion that the Layout
instances that we provide should have the option of being given an instance
of something like a (writeable) Function on creation (which would
presumably default to an internal Map, as usual).
I don't think that there's much harm in doing something like that, and one
could argue that it's a cleaner design in some ways, but I'm not sure that
our users would derive much benefit from it, either, at least with our
current visualization architecture. In practice, if you're trying to lay
out a graph that's so big that you can't store the node locations in
memory, you're already dead in the water, because there's no way that
either the force-directed layouts or the rendering are going to tolerate
you trying to visualize a few million nodes (they might not even if we had
better (read: any) spatial data structures, but they sure won't now).
There's also a real question in my mind of the utility of trying to render
such a graph at all, but that's a different discussion.
Are there anticipated benefits that I'm missing?
I'm certainly open to using builders for creating layouts, and that would
make such a change easier in future. But I'd like to hear more about the
anticipated benefits that providing this flexibility would provide.
(Side point: I don't think that there's any particular point in Layout
implementing Function any more, now that we're in a Java 8 world. The one
place I can think of where it's at all relevant is that it makes it
marginally easier to have a Layout whose initial positions depend on
another's, and even that capability wouldn't go away, we'd just either need
to use a Layout::getPosition reference or provide an overload of
setInitializer that took a Layout. Am I missing anything?)
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#104 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAGbB4alGE1J7ZOg_CdLxbCgtDidRU5Oks5sir2TgaJpZM4Ouyqr>
.
|
Hi @jrtom, as it's not terribly obvious to me which parts of your last comment were directed at who, I'll do my best to answer the part which I think you meant for me. If I've missed anything, please let me know. :)
I admit that I was just throwing vague, potentially naive thoughts to the wind, in the hope that it would sprout some fruitful ideas. So I haven't given much thought about the benefits and drawbacks that a Builder would bring. But as to what I have thought so far, here is what I'd imagine a Layout usage to look like (assuming it operates on Graph<V> graph = ...;
Dimension dimension = ...;
Layout<V> layout =
Layout.builder()
.graph(graph)
.initializerFunction((V v, Point2D point) -> ...)
.size(dimension)
.resetRunnable(...)
.locationFunction(...)
.initializeRunnable(...)
.build(); However, having written this idea up as code, I can see a few drawbacks:
|
The problem I see with Point2d at least (which I found a javadoc for here) is that its IMO, immutable tuples would be better suited here. |
@jbduncan |
@tomnelson Oh, I hadn't considered that. I personally suggest we look for a way to make use of immutable tuples, nonetheless. But if we come to the conclusion that it's impractical to remove the mutability because it objectively worsens performance or it is just physically impossible, then I'll happily concede. :) |
FWIW, I believe that the Javafx equivalent of Point is immutable. That is one of the few things I noticed when I glanced at how Javafx would interact with our Layout classes (that use mutable Point2D). I like the idea of them being immutable in our algorithms module, and not letting visualization dictate how our Layouts are designed. (I'm not suggesting using the javafx Point in Layout!) |
@jbduncan, I'm responding to your comment first because it's easiest. :) The short version is: I should not have put those two sentences in the same paragraph, because I did not intend to conceptually link them. :) The "flexibility" that I was referring to was the notion of separately specifying the storage for the nodes. I'm pretty much on board with the notion of using builders for constructing most things, as I think they make the code more self-documenting (and eliminate the combinatorial explosion of constructors). The major question to my mind in re: Layout builders would be whether we should have a single entry point for all builders, e.g.: Layout<N> layout = Layout.forGraph(graph)
.fruchtermanReingold() // or algorithm(FRUCHTERMAN_REINGOLD [1])
.size(dimension)
.build(); or an entry point for each algorithm: Layout<N> layout = FRLayout.forGraph(graph).size(dimension).build(); Option (1) advantages:
Option (2) advantages:
[1] We might want to consider renaming our algorithms according to how they work, not after the names of the authors of the algorithm. Most people probably have no idea which one is which based on the existing names. |
@jrtom Option (1) tempts me the most, because of the reasons you explained above and because of the fact that we'd be using an interface static method for I'm very interested by your idea in Option (1) to allow the selected algorithm to affect the type of the returned builder. Do you have any thoughts yet on what the implementation would look like, especially for user-provided algorithms? |
Responding to the notion of using someone else's library for analogues to Dimension/Point2D: (1) As a general statement, I would prefer to keep the number of dependencies that JUNG has to a minimum. It simplifies licensing weirdness (which we've had problems with in the past) and just generally reduces coordination costs. In this case the surface area for these types inside our library would probably be fairly limited, i.e., they'd be used by the (2) Arguably what we really want for these things are value types, which Java doesn't yet support. In lieu of those, I'd probably prefer to take a dependency on @autovalue (for these and for other types, as discussed in #94 ) so that we can at least minimize the boilerplate code (and bugs), and then migrate to using value types later, perhaps. I agree that these types should be immutable, and +100 on not letting the visualization model dictate our algorithm class designs. |
@jbduncan re: option (1), my basic idea is to have a top-level Builder class that you use to get the basic instance for the algorithm and then have specialized Builder classes that provide options that are specific to the algorithm, or at least the kind of algorithm. Guava's MultimapBuilder is an example of this kind of pattern. Whether we would have a separate Builder type for each algorithm, or have kinds, perhaps even a class hierarchy, of builders (e.g., static vs. dynamic) is an open question. I wouldn't necessarily insist on using an interface static method, although we certainly could. (I'm so used to the idea that I can't use most of the cool Java 8 stuff in developing for Guava because it has to be backwards compatible that my reflex is "we can't use that feature", so by all means call out this sort of option where it makes sense; I was mostly using a shorthand. :) |
Responding to @tomnelson :
I will point out that right now we can't efficiently support this, either, because we don't make use of spatial data structures that would make this feasible: in order to answer this question right now in JUNG, one has to iterate over all of the nodes and calculate their respective distances from the specified point. Also, this representation would also implicitly presume that either location updates are infrequent or that we've got a pretty hefty back end that can handle high-frequency updates, such as what the force-directed layouts do. So possibly this is something that we should only support for a static layout (not necessarily, just putting it out there). And in fact for static layouts, this makes sense: given Another more radical angle to consider: maybe we should take this even farther and not support automated (algorithm-driven) updates to a layout while it's being rendered. At that point the rendering code actually wouldn't need a |
I agree that spatial data structures would be a very good thing in some jung-algorithms Layout implementations. Is it enough of a priority to go before some of the other work being done? For our currently single-threaded layout implementations, it will be interesting to see the performance difference between calling setLocation on every node's mutable Point in every 'step' of the algorithm compared with removing a Point from the locations Map, making a new Point, then storing it in the Map, along with the gc of all those abandoned Points. |
@tomnelson We've gotten along (slowly) without spatial data structures for this long, we can continue to do so for as long as we need to; it's a question of where we want to put our energies. Side note: You make a good point about mutable data structures in the context of updates; it is almost certainly faster to update those objects than it is to create new ones. So perhaps focusing on immutability is not the best option here. I'd be interested to hear your thoughts (and @jbduncan 's) on some of the more out-there suggestions I made yesterday. |
I think that your statements about the static vs dynamic layouts comes down to whether we have them implementing IterativeContext or not. I see no reason to change that. It would be a shame to lose the pretty animations. I'm not that concerned right now about making the visualization code easier. I mostly just want to know who's got the Graph. I'm thinking of writing a (naive) version of something like SpringLayout that uses spatial data. That way, you will have to stop saying 'since we don't have spatial data structures' and can only say 'since we don't have any GOOD spatial data structures'. I would like to have a starting point to improve on. Maybe some design ideas will emerge. |
@tomnelson I'm not particularly advocating for getting rid of the capability to update visualizations while the layout is updating, I'm just trying to [encourage us to] explore the design space a bit more. :) Regarding "who's got the graph", IMO the answer should be that the visualization model has the graph; that seems like the right place to put it. This implies that if the visualization model's graph is replaced, it's the model's job to create a new layout for the new graph; it also implies that the model must listen for updates to the graph, and dispatch updates to the layout (and the rendering piece) as appropriate. The one fly in the ointment with that plan is that you want some way to be able to enable the scenario in which the user wants to visualize a new graph, but because the new graph has the same node set as the old one [or a subset], they want to use the same node positions. So we should think about how that should work. Yes, there's a lot of duplication in the I would be happy to have someone provide an implementation of |
I'm saddened that we use the word 'Layout' instead of calling it 'TheThingThatHoldsTheGraphAndItsNodeLocationsAndCanComputeThemWithAnAlgorithm' because if we named it the latter, there would be less of a problem with getting the graph-like-thing from it. I really don't like having the graph-like-thing stateful in 2 (or more!) places and requiring event dispatch to keep them in sync, Oh, you want a Non trivial implementation..... Sorry, I'm starting easy! I have to go back to old versions where the visualization still works to enjoy testing it though. |
Something has to hold the graph instance for visualization purposes. I agree that, ideally, there would be only one such instance so that we don't have to worry about keeping the instances in sync. So let's walk through this a bit. The input requirements of layouts (in terms of what they need to know about the graph topology) differ:
(Side note: I've wanted for years to have a better initialization state for the force-directed layouts than "random positions"; a relatively simple option would be to start with a node, traverse the graph, and place nodes at random positions that are a function of the positions of already-placed nodes that they're connected to. It should help force-directed layouts converge a lot faster.) There is no (and will be no) single interface that all of these things implement. A The visualization system, by contrast, presumes that it has a graph and can at least iterate over the node and edge sets, and recover the endpoints for the edges. (I don't recall a context at the moment in which the visualization system needs to know anything further about the topology.) Note that some of this information (which node is the root? how are values associated with edges?) are not relevant to the visualization system. Kinds of synchronization issues that we have to worry about:
I'd have to go digging through the visualization code, but I believe that currently, when paint() is called, everything gets re-rendered, even if only a small part of the graph has changed. I would imagine that if we had finer-grained information about such changes that we could make repainting way faster, but maybe I'm just terminally confused. (Note that [More on this in a bit; I'm about to start my commute and switch computers.] |
We already have a trivial implementation of a spatial data structure, it's the one where you have to iterate through all the nodes and get their locations from the layout. :) A slightly less trivial version would be to divide the layout canvas into a grid, and then cause each grid section to know what nodes were in it (each node could "know" what section it was in via an appropriate transformation of its coordinates). At that point if you want to know what's within radius X of a point, you get its grid, figure out what grids comprise the area defined by the radius, and iterate over those grids to find out what's in them; you can even do so in an appropriate pattern if you're looking for the closest element. (This is basically something like a single-level quadtree, I think.) |
That's a lot to respond to. Here is my summary: I made a grid-based spatial data structure today, very much like you described. |
Agreed on the basic requirements of a Going back to my mini-lecture... Layouts and visualizations have different requirements for what kinds of information they need about the graph on which they're operating, although in some cases they do overlap. Once you've figured out the positions for each node, the graph topology (and the algorithm by which the positions are determined) become irrelevant. I think this may be a key insight. Put a different way: a layout algorithm is a function that takes node positions (which may be either explicitly specified or unspecified) and some information about the graph, and generates a new set of node positions. In this formulation:
So at this point we can set it up like this:
// this one needs input locations and a graph
Function<N, Location> nodeRepulsion(Function<N, Location> nodeLocations, Graph<N> graph)
// this one needs only a set of nodes
Function<N, Location> circleLayout(Set<N> nodes)
// this one needs only a root and a way of getting the successors (children) of a node
Function<N, Location> treeLayout)(N root, SuccessorsFunction<N> successorsFunction) Note that force-directed algorithms that have separate alternating steps (e.g. node repulsion and edge length adjustment) can actually be made into separate algorithms and composed. One thing that the above design is lacking is a unified interface for layout algorithms, so that the ones that need iteration can use a single interface, but I haven't figured one out, or even determined that one necessarily must exist. For now, we'll presume a @FunctionalInterface
interface GraphLayoutAlgorithm<N> {
Function<N, Location> apply(Function<N, Location> locations, Graph<N> graph);
} The iterative wrapper for such an algorithm might then look something like this: class GraphLayoutIterator<N, E> {
// these are initialized by a Builder that must specify at least one or the other
int maxIterations;
Predicate<Function<N, Location>> convergencePredicate;
public static Function<N, Location> iterate(Graph<N>graph, GraphLayoutAlgorithm<N> algorithm) {
// iterate until we've hit max iterations or the convergence predicate is satisfied
// return the final node locations
}
} (Note that a similar thing can happen to things like At this point, I think we've got a design where the layout algorithms don't need to maintain a reference to a graph, because they are given that information as they need it; they're basically stateless. And that means that the visualization system--specifically, the (I really need to start creating design docs for these discussions, doing it in the comments on an issue is kind of terrible. :P ) Anyway, I'd be very interested in getting reactions to this proposal, but so far I think I like it; it seems to make things simpler overall, by making the individual components simpler and composable. |
Awesome to hear about the grid-based spatial data structure--thanks! |
I need to read this about 5 more times before I can give a better answer. For now, here is my gut reaction: We have this thing that is like a mediator between positional storage, an algorithm, and a graph. I believe that if we conflate the idea of the 'mediator' thing with the 'algorithm', then our need for purity in the way we define items from the study of graph theory cause us to have thoughts like: 'Why should FRLayout be a source of the graph it uses?' If we just think of this thing as a place where all 3 of those meet, then there is no shame in using it as a vehicle for all of them. For example, in some imaginary rendering toolkit, there could be a graphics context like thing that carries around the colors, pen-styles, fonts, etc to draw stuff. I could say something like: 'It is not the responsibility of the graphics context to tell you what color it is using internally. If you want to know the color, you need to store the color separately and do your own thing to keep it in sync with what the graphics context is using". Please design Layout the way that feels best from a Graph Theory perspective. The visualization code will have to adapt to that, probably by making a new class that is a mediator between 'Layout' and Graph. :( I will continue to play with spatial data structures. What I have now is better suited to detecting collisions in an animation than for graph layout, but it is a start. I'll push a branch with it when I have something that is not too embarrassing. It will likely be a pre-common.graph branch so we can 'see' it work. |
Your example: public static Function<N, Location> iterate(Graphgraph, GraphLayoutAlgorithm algorithm) { // iterate until we've hit max iterations or the convergence predicate is satisfied |
re: static method of More in a separate comment. |
A short comment about One of the points I was trying to make in my most recent mini-lecture was that the requirements for what a layout algorithm needs to know about a graph, and what the visualization system needs to know, are not necessarily the same in all cases. To me, that's one important reason for experimenting with making them separate. The next steps here from me will be to put a design doc together to give us a place to thrash these ideas out in a better venue than issue comments, and to then figure out in more detail what the APIs, and the resultant user code, would look like. I've started on one and will link to it from here once it's somewhat more complete. FYI, I don't necessarily insist on the above-outlined design; it's almost guaranteed that anything I come up with will have flaws that are more easily spotted by others. But I think that it's worth exploring. |
Getting back (for a moment) to the question of where Layout and layout implementations should be (algorithms or visualization) I believe that for anyone building a new visualization system (as I have experimented/played with in Javascript and Python), they will want to run the layout algorithms on the (javascript or python) client side so they will have to be converted too. Even in JavaFX, you would probably want to move the layouts over to visualization, if only to deal with conversion of the Point and Dimension classes. |
I have a branch where I removed all java.awt.Dimension usage from jung-algorithms. The layouts only utilize the width and height internally so this is very low-hanging fruit (other than a lot of editing of source). |
Potential areas for improvement:
The text was updated successfully, but these errors were encountered: