Add occurrences to output dumps.
pbloem committed May 9, 2022
1 parent c80ae2e commit f90af05
Showing 5 changed files with 65 additions and 11 deletions.
31 changes: 27 additions & 4 deletions README.md
@@ -54,7 +54,6 @@ java -jar motive.jar --type fast --file data.txt --minsize 3 --maxsize 10 --samp
The "full" experiment includes the precise DS model as well. This is a bit slower.
```bash
java -jar motive.jar --type full --file data.txt --minsize 3 --maxsize 5 --samples 100000 --maxmotifs 30

```

### Input Data format
@@ -67,13 +66,37 @@ You can find some examples [packaged with the Nodes library](https://github.com/

The GML format is also supported with the switch ``--filetype gml``. This is not well tested, so your mileage may vary.

#### Large data

If your graph is large (on the order of millions of edges), you may need to use a disk-backed store. For this, the graph should be pre-loaded. This is done with the command:

```bash
java -jar motive.jar --type preload --file data.txt
```

The graph will be loaded into a disk-backed database called `graph.db`. You can then run an experiment with the type `fast.disk`.

```bash
java -jar motive.jar --type fast.disk --file data.db
```

Make sure to use the `-Xmx` argument (before the `-jar` argument) to set the heap size as large as you can on both commands. For a node with 64 GB of memory, we found that `-Xmx56g` was the largest setting that gave stable results.
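
To confirm that a given `-Xmx` setting actually took effect on your machine, a small stand-alone check can help. The sketch below is not part of motive; the class name `HeapCheck` and its helper `toGiB` are made up for illustration. It only uses the standard `Runtime.maxMemory()` call, which reports the upper bound the JVM will attempt to use for the heap.

```java
final class HeapCheck {
    /** Converts a byte count to whole gibibytes, rounding down. */
    static long toGiB(long bytes) {
        return bytes / (1024L * 1024L * 1024L);
    }

    public static void main(String[] args) {
        // maxMemory() reflects the -Xmx setting; the reported value may be
        // slightly below the requested one, depending on the JVM.
        long maxBytes = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap: " + toGiB(maxBytes) + " GiB (" + maxBytes + " bytes)");
    }
}
```

Running this class with the same `-Xmx` value you intend to pass to `motive.jar` shows whether the JVM accepted the requested heap size before you start a long preload.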


### Output format

The command line tool produces its primary output as a collection of text files. For each (potential) motif found, one .edgelist file is produced. This file captures _only the structure_ of the motif, not its labels in the original graph.
The command line tool produces its primary output as a collection of text files. For each (potential) motif found, one .edgelist file is produced. This file captures _only the structure_ of the motif, not its labels in the original graph. That is, if the motif has the shape `o -> o <- o`, then the corresponding edgelist file may look like

```
0 1
1 2
```

For each motif, there will be a CSV file containing its occurrences. If the motif has k nodes (i.e. k=3 in the example above), then each line in the occurrences file identifies one place where the motif occurs, giving k comma-separated graph node integers (corresponding to the integers in the input file), one for each node in the motif.
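
As a rough illustration of consuming such an occurrences file, the following sketch parses comma-separated rows into lists of node integers. It is not part of motive: the class `OccurrenceReader` and its methods are hypothetical, the sample data is invented, and the reading of a row as "one graph node per motif node" is taken from the description above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

final class OccurrenceReader {
    /** Parses one row such as "12,7,301" into the list [12, 7, 301]. */
    static List<Integer> parseLine(String line) {
        List<Integer> nodes = new ArrayList<>();
        for (String field : line.split(","))
            nodes.add(Integer.parseInt(field.trim()));
        return nodes;
    }

    /** Reads every occurrence from the input, skipping blank lines. */
    static List<List<Integer>> read(Reader in) throws IOException {
        List<List<Integer>> occurrences = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null)
                if (!line.trim().isEmpty())
                    occurrences.add(parseLine(line));
        }
        return occurrences;
    }

    public static void main(String[] args) throws IOException {
        // Invented sample: two occurrences of a 3-node motif.
        String sample = "12,7,301\n45,7,88\n";
        for (List<Integer> occ : read(new StringReader(sample)))
            System.out.println(occ);
    }
}
```

To read a real file, pass a `java.io.FileReader` for one of the `motif.NNN.occurrences.csv` files instead of the `StringReader`.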

The file numbers.csv contains the compression ratios for each motif. metadata.json contains some additional information, mostly required for producing the correct plot.
The file `numbers.csv` contains the compression ratios for each motif. The columns are, in order, the motif frequency, the compression ratio under the ER model, the compression ratio under the EL model, and (if the `full` experiment is run) the compression ratio under the DS model. In practice the EL model provides a good trade-off between speed and motif quality.

Currently, the instances of each motif in the graph are not written to a file. We hope to add this functionality in a future update. From within the code, this information _is_ available. After running the sampling experiment, you can use the method ``occurrences(...)`` in DPlainMotifExtractor and UPlainMotifExtractor to get a list of instances for a given motif.
`metadata.json` contains some additional information, mostly required for producing the correct plot.
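
The column order described for `numbers.csv` (frequency, ER, EL, and optionally DS) can be captured in a small value class. This is only a sketch under assumptions: the class `NumbersRow` is hypothetical, and it assumes comma-separated numeric fields with no header row, which the README does not state explicitly.

```java
final class NumbersRow {
    final double frequency;
    final double erRatio;
    final double elRatio;
    final Double dsRatio; // null when the DS column is absent (non-"full" runs)

    NumbersRow(double frequency, double erRatio, double elRatio, Double dsRatio) {
        this.frequency = frequency;
        this.erRatio = erRatio;
        this.elRatio = elRatio;
        this.dsRatio = dsRatio;
    }

    /** Parses one row; assumes comma-separated numeric fields (an assumption). */
    static NumbersRow parse(String line) {
        String[] fields = line.split(",");
        if (fields.length < 3)
            throw new IllegalArgumentException("Expected at least 3 columns: " + line);
        Double ds = fields.length > 3 ? Double.valueOf(fields[3].trim()) : null;
        return new NumbersRow(
                Double.parseDouble(fields[0].trim()),
                Double.parseDouble(fields[1].trim()),
                Double.parseDouble(fields[2].trim()),
                ds);
    }

    public static void main(String[] args) {
        // Invented values, for illustration only.
        NumbersRow row = parse("40,12.5,10.25");
        System.out.println("frequency=" + row.frequency + " EL=" + row.elRatio
                + " DS present: " + (row.dsRatio != null));
    }
}
```

Keeping the DS value nullable lets the same parser handle output from both the `fast` and `full` experiments.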

### Plotting

10 changes: 7 additions & 3 deletions pom.xml
@@ -20,6 +20,8 @@
<configuration>
<source>1.8</source>
<target>1.8</target>


</configuration>
</plugin>

@@ -73,6 +75,7 @@
</execution>
</executions>
</plugin>

</plugins>
</build>

@@ -89,11 +92,12 @@
<artifactId>json</artifactId>
<version>20160212</version>
</dependency>

<dependency>
<groupId>data2semantics</groupId>
<groupId>com.github.Data2Semantics</groupId>
<artifactId>nodes</artifactId>
<version>0.0.1-SNAPSHOT</version>
</dependency>
<version>v0.1.20</version>
</dependency>
<!-- <dependency> -->
<!-- <groupId>data2semantics</groupId> -->
<!-- <artifactId>nodes</artifactId> -->
2 changes: 1 addition & 1 deletion src/main/java/nl/peterbloem/motive/MotifModel.java
@@ -24,7 +24,7 @@
import java.util.TreeSet;
import java.util.concurrent.ExecutorService;

import javax.jws.WebParam.Mode;
//import javax.jws.WebParam.Mode;

import org.nodes.DGraph;
import org.nodes.DLink;
32 changes: 29 additions & 3 deletions src/main/java/nl/peterbloem/motive/exec/CompareLarge.java
@@ -163,8 +163,7 @@ public void main() throws IOException
for(Graph<String> sub : subsAll)
frequenciesAll.add(ex.frequency((DGraph<String>)sub));

final List<List<List<Integer>>> occurrences =
new ArrayList<List<List<Integer>>>(subsAll.size());
final List<List<List<Integer>>> occurrences = new ArrayList<List<List<Integer>>>(subsAll.size());
for(Graph<String> sub : subsAll)
occurrences.add(ex.occurrences((DGraph<String>)sub));

@@ -281,10 +280,37 @@ public void run() {
int i = 0;
for(Graph<String> sub : subs)
{
// * Write the motif structure as a graph
File graphFile = new File(String.format("motif.%03d.edgelist", i));
Data.writeEdgeList(sub, graphFile);

i++;

// * Write all occurrences of the motif to a file
List<List<Integer>> occs = ex.occurrences((DGraph<String>)sub);

File occFile = new File(String.format("motif.%03d.occurrences.csv", i));
BufferedWriter occWriter = new BufferedWriter(new FileWriter(occFile));

for (List<Integer> occ : occs)
{
boolean first = true;
for (int val : occ)
{
if (first)
first = false;
else
occWriter.write(",");

occWriter.write(val + "");

}

occWriter.write("\n");
}

occWriter.close();

i ++;
}

JSONObject obj = new JSONObject();
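
The occurrence-writing loop added to `CompareLarge.java` in this hunk closes its `BufferedWriter` only if no exception is thrown along the way. As a hedged alternative, the sketch below produces the same comma-separated rows using try-with-resources, so the writer is closed on every path. The class `OccurrenceWriter` and its `write` method are hypothetical and are not part of this commit.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Arrays;
import java.util.List;

final class OccurrenceWriter {
    /** Writes one comma-separated row per occurrence, each ending in a newline. */
    static void write(List<List<Integer>> occurrences, Writer out) throws IOException {
        // try-with-resources flushes and closes the writer even if a write fails.
        try (BufferedWriter writer = new BufferedWriter(out)) {
            for (List<Integer> occ : occurrences) {
                StringBuilder row = new StringBuilder();
                for (int node : occ) {
                    if (row.length() > 0)
                        row.append(',');
                    row.append(node);
                }
                writer.write(row.toString());
                writer.write("\n");
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Invented sample: two occurrences of a 3-node motif.
        StringWriter out = new StringWriter();
        write(Arrays.asList(Arrays.asList(12, 7, 301), Arrays.asList(45, 7, 88)), out);
        System.out.print(out);
    }
}
```

In the loop from the diff, the `Writer` argument would be a `FileWriter` for the `motif.%03d.occurrences.csv` file of the current motif.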
1 change: 1 addition & 0 deletions src/main/java/nl/peterbloem/motive/exec/Run.java
@@ -215,6 +215,7 @@ public static void main(String[] args)
}

Global.log().info("Experiment finished. Time taken: "+(Functions.toc())+" seconds.");

} else if ("fast".equals(type.toLowerCase()))
{
Global.log().info("Experiment type: fast");