1 Introduction
This feature is only available in the neo4j-enterprise edition Please add a dependency to the neo4j-enteprise gem and require it (in upcomming 2.0.0 release)
The Neo4j High Availability (HA) project has the following two goals:
- Provide a fault-tolerant database architecture, where several Neo4j slave databases can be configured to be exact replicas of a single Neo4j master database. This allows the end-user system to be fully functional and both read and write to the database in the event of hardware failure.
- Provide a horizontally scaling read-mostly architecture that enables the system to handle much more read load than a single Neo4j database.
Neo4j HA uses a single master and multiple slaves. Both the master and the slaves can accept write requests. A slave handles a write by synchronizing with the master to preserve consistency. Updates to slaves are asynchronous so a write from one slave is not immediately visible on all other slaves. This is the only difference between HA and single node operation (all other ACID characteristics are the same).
2 Installation of ZooKeeper
The example/ha-cluster example contains a complete configuration and setup for running ZooKeeper.
You can also set up zookeeper yourself by using the following instructions:
Go to zookeeper, select a mirror and grab the 3.3.2 release.
Unpack somewhere and create three config files called server1.cfg, server2.cfg and server3.cfg in the conf directory:
#server1.cfg tickTime=2000 initLimit=10 syncLimit=5 dataDir=data/zookeeper1 clientPort=2181 server.1=localhost:2888:3888 server.2=localhost:2889:3889 server.3=localhost:2890:3890
The other two config files will have a different dataDir and clientPort set, but the other parameters are identical to the first one:
#server2.cfg #... dataDir=data/zookeeper2 clientPort=2182 #... #server3.cfg dataDir=data/zookeeper3 clientPort=2183
Create the data directories:
zookeeper-3.3.2$ mkdir -p data/zookeeper1 data/zookeeper2 data/zookeeper3
Next we need to create a file in each data directory called “myid” that contains an id for each server equal to the number in “server.1” “server.2” and “server.3” from the configuration files.
zookeeper-3.3.2$ echo '1' > data/zookeeper1/myid zookeeper-3.3.2$ echo '2' > data/zookeeper2/myid zookeeper-3.3.2$ echo '3' > data/zookeeper3/myid
We are now ready to start the ZooKeeper instances:
zookeeper-3.3.2$ java -cp lib/log4j-1.2.15.jar:zookeeper-3.3.2.jar org.apache.zookeeper.server.quorum.QuorumPeerMain conf/server1.cfg & zookeeper-3.3.2$ java -cp lib/log4j-1.2.15.jar:zookeeper-3.3.2.jar org.apache.zookeeper.server.quorum.QuorumPeerMain conf/server2.cfg & zookeeper-3.3.2$ java -cp lib/log4j-1.2.15.jar:zookeeper-3.3.2.jar org.apache.zookeeper.server.quorum.QuorumPeerMain conf/server3.cfg &
For more information on ZooKeeper see here
3 Configure Neo4j.rb
You must set the Neo4j::Config['ha.db']=true configuration in order to start a HA clustered database (HighlyAvailableGraphDatabase) instead of a local graph database.
If the 'ha.db' configuration value is set to true it will also use the following configuration properties:
ha.db: true
ha.machine_id: 2
ha.server: 'localhost:6002'
ha.zoo_keeper_servers: 'localhost:2181,localhost:2182,localhost:2183'
The default configuration can be found here
4 Chef and Vagrant Scripts
Check out this cookbook.
5 Gotchas
You should only write to slave nodes. You can check if a node is a slave or master by
Neo4j.management(org.neo4j.management.HighAvailability).is_master
You can also get this info from the jconsole or neo4j-shell and the hainfo command, check the monitoring page
For more information, check the neo4j wiki.