This code extends the libDAI library to provide:
- A new parameter learning implementation called Age-Layered Expectation Maximization (ALEM), inspired by the ALPS genetic algorithm work of Greg Hornby of NASA Ames.
- A distributed parameter learning implementation using MapReduce and a population structure, with either ALEM or a large random-restart equivalent that I call Multiple Expectation Maximization (MEM).
This code can be used to recreate the results shown in my paper:
Additionally, this work can be used (with more emphasis on high performance computing) to recreate the algorithms suggested in the following two papers:
# working directory: ./mapreduce
make
./main.sh
This is useful for testing purposes before deploying on Amazon EC2 or another Hadoop cluster.
- Install Hadoop (tested up to v1.1.1)
- Set up environment variables, e.g.
export HADOOP_PREFIX=/home/erik/hadoop/hadoop-1.1.1
Also set up Hadoop for pseudo-distributed operation: http://hadoop.apache.org/docs/r1.1.0/single_node_setup.html
- Run the following (I have hadoop in my path):
# working directory: .../libdai/mapreduce
# WARNING: this clears any existing HDFS data
# (leftover data sometimes seems to cause a bug
# when accessing HDFS)
make clobber
hadoop namenode -format
start-all.sh
- Initialize a Bayesian network (BN) and some data
./scripts/init dat/bnets/asia.net 100 4
- Start Hadoop streaming
make
./scripts/streaming dat/in
- Gander at results
ls out
- Stop Hadoop
stop-all.sh
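The local steps above can be chained into one script. The following is a minimal sketch, not part of this repository: `run`, `check_hadoop_env`, `local_test_run`, and the `DRY_RUN` variable are hypothetical names, and the `.sh` suffix on the start/stop scripts assumes a stock Hadoop 1.x install. With `DRY_RUN=1` it only prints the commands, which is a cheap way to review them before touching HDFS.

```shell
#!/bin/sh
# Sketch only: chains the local pseudo-distributed steps listed above.
# run, check_hadoop_env, local_test_run and DRY_RUN are hypothetical names.

# Print the command; execute it only when DRY_RUN is not 1.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != "1" ]; then
        "$@"
    fi
}

# Fail early if HADOOP_PREFIX does not point at a Hadoop install.
check_hadoop_env() {
    if [ -z "${HADOOP_PREFIX:-}" ]; then
        echo "HADOOP_PREFIX is not set" >&2
        return 1
    fi
    if [ ! -x "$HADOOP_PREFIX/bin/hadoop" ]; then
        echo "no executable at $HADOOP_PREFIX/bin/hadoop" >&2
        return 1
    fi
}

# Arguments are passed straight through to ./scripts/init.
local_test_run() {
    run make clobber &&              # WARNING: clears existing HDFS data
    run hadoop namenode -format &&
    run start-all.sh &&
    run ./scripts/init "$@" &&
    run make &&
    run ./scripts/streaming dat/in &&
    run ls out
    status=$?
    run stop-all.sh                  # stop Hadoop even if a step failed
    return $status
}
```

For example, after sourcing the file from the mapreduce directory, `check_hadoop_env && local_test_run dat/bnets/asia.net 100 4` would run the same sequence as the list above; prefix it with `DRY_RUN=1` to preview the commands only.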
- Build the code
make
- Launch an EC2 instance and send the mapreduce folder to it.
# USER@EC2_HOST is a placeholder for your instance's login and address
scp -rCi ~/Downloads/dai-mapreduce.pem mapreduce USER@EC2_HOST:
- SSH into the instance and launch a cluster.
ssh -Ci ~/Downloads/dai-mapreduce.pem USER@EC2_HOST
# Assumes hadoop-ec2 has been configured.
# Launch a cluster of 10 (small) instances
hadoop-ec2 launch-cluster dai-mapreduce 10
# ... wait a really long time for the cluster to initialize
- Push the mapreduce folder to the cluster master.
hadoop-ec2 push dai-mapreduce mapreduce
- Log in to the cluster master.
hadoop-ec2 login dai-mapreduce
- Initialize a big Bayesian network and start Hadoop streaming
./scripts/init dat/bnets/water.net 1000 10
# Do standard EM, pop-size=10, mappers=10
./scripts/streaming dat/in -u 10 10
# ... wait a few hours
- Collect data!
ls out
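Since the results land in `out`, one simple way to collect them is to bundle that directory into a timestamped archive before copying it off the cluster master. This is a minimal sketch; `archive_results` and the archive naming are hypothetical and not part of this repository.

```shell
#!/bin/sh
# Hypothetical helper: pack a results directory (default: out) into a
# timestamped .tar.gz in the current directory and print the archive name.
archive_results() {
    dir="${1:-out}"
    if [ ! -d "$dir" ]; then
        echo "no such directory: $dir" >&2
        return 1
    fi
    archive="results-$(date +%Y%m%d-%H%M%S).tar.gz"
    tar -czf "$archive" "$dir" && echo "$archive"
}
```

Running `archive_results` in the directory that contains `out` prints the name of the new archive, which can then be copied back with `scp`.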