Neo4j 3.1 beta3 + docker: Creating a Causal Cluster
Over the weekend I’ve been playing around with docker and learning how to spin up a Neo4j Causal Cluster.
Causal Clustering is Neo4j’s new clustering architecture which makes use of Diego Ongaro’s Raft consensus algorithm to ensure writes are committed on a majority of servers. It’ll be available in the 3.1 series of Neo4j which is currently in beta. I’ll be using BETA3 in this post.
I don’t know much about docker but luckily my colleague Kevin Van Gundy wrote a blog post a couple of weeks ago explaining how to spin up Neo4j inside a docker container which was very helpful for getting me started.
Kevin spins up a single Neo4j server using the latest released version, which at the time of writing is 3.0.7. Since we want to use a beta version we’ll need to use a docker image from the neo4j-experimental repository.
We’re going to create 3 docker instances, each running Neo4j, and have them form a cluster. We’ll name them instance0, instance1, and instance2. We’ll create config files for each instance on the host machine and refer to those from our docker instance. This is the config file for instance0:
/tmp/ce/instance0/conf/neo4j.conf
unsupported.dbms.edition=enterprise
dbms.mode=CORE
dbms.security.auth_enabled=false
dbms.memory.heap.initial_size=512m
dbms.memory.heap.max_size=512m
dbms.memory.pagecache.size=100M
dbms.tx_log.rotation.retention_policy=false
dbms.connector.bolt.type=BOLT
dbms.connector.bolt.enabled=true
dbms.connector.bolt.listen_address=0.0.0.0:7687
dbms.connector.http.type=HTTP
dbms.connector.http.enabled=true
dbms.connector.http.listen_address=0.0.0.0:7474
dbms.connectors.default_listen_address=0.0.0.0
dbms.connectors.default_advertised_address=instance0
causal_clustering.initial_discovery_members=instance0:5000,instance1:5000,instance2:5000
causal_clustering.leader_election_timeout=2s
The only config that changes between instances is dbms.connectors.default_advertised_address which would have a value of instance1 or instance2 for the other members of our cluster.
We can create a docker instance using this config:
docker run --name=instance0 --detach \
--publish=7474:7474 \
--publish=7687:7687 \
--net=cluster \
--hostname=instance0 \
--volume /tmp/ce/instance0/conf:/conf \
--volume /tmp/ce/instance0/data:/data \
neo4j/neo4j-experimental:3.1.0-M13-beta3-enterprise
We create the network 'cluster' referenced on the 4th line like this:
docker network create --driver=bridge cluster
It’s a bit of a pain having to create these config files and calls to docker by hand but luckily Michael has scripted the whole thing for us.
docker.sh
function config {
mkdir -p /tmp/ce/$1/conf
cat > /tmp/ce/$1/conf/neo4j.conf << EOF
unsupported.dbms.edition=enterprise
dbms.mode=CORE
dbms.security.auth_enabled=false
dbms.memory.heap.initial_size=512m
dbms.memory.heap.max_size=512m
dbms.memory.pagecache.size=100M
dbms.tx_log.rotation.retention_policy=false
dbms.connector.bolt.type=BOLT
dbms.connector.bolt.enabled=true
dbms.connector.bolt.listen_address=0.0.0.0:7687
dbms.connector.http.type=HTTP
dbms.connector.http.enabled=true
dbms.connector.http.listen_address=0.0.0.0:7474
dbms.connectors.default_listen_address=0.0.0.0
dbms.connectors.default_advertised_address=${1}
causal_clustering.initial_discovery_members=instance0:5000,instance1:5000,instance2:5000
causal_clustering.leader_election_timeout=2s
EOF
}
function run {
HOST=$1
INSTANCE=instance$HOST
config $INSTANCE
docker run --name=$INSTANCE --detach \
--publish=$[7474+$HOST]:7474 \
--publish=$[7687+$HOST]:7687 \
--net=cluster \
--hostname=$INSTANCE \
--volume /tmp/ce/$INSTANCE/conf:/conf \
--volume /tmp/ce/$INSTANCE/data:/data \
neo4j/neo4j-experimental:3.1.0-M13-beta3-enterprise
}
docker network create --driver=bridge cluster
run 0
run 1
run 2
Once we run the script we can run the following command to check that the cluster has come up:
$ docker logs instance0
Starting Neo4j.
2016-11-13 11:46:55.863+0000 INFO Starting...
2016-11-13 11:46:57.241+0000 INFO Bolt enabled on 0.0.0.0:7687.
2016-11-13 11:46:57.255+0000 INFO Initiating metrics...
2016-11-13 11:46:57.439+0000 INFO Waiting for other members to join cluster before continuing...
2016-11-13 11:47:17.816+0000 INFO Started.
2016-11-13 11:47:18.054+0000 INFO Mounted REST API at: /db/manage
2016-11-13 11:47:19.068+0000 INFO Remote interface available at http://instance0:7474/
Each instance is available at port 7474 but we’ve mapped these to different ports on the host OS by using this line in the parameters we passed to docker run:
--publish=$[7474+$HOST]:7474
We can therefore access each of these Neo4j instances from the host OS at the following ports:
instance0 -> http://localhost:7474
instance1 -> http://localhost:7475
instance2 -> http://localhost:7476
If we open one of those we’ll be confronted with the following dialog:
This is a bit strange as we explicitly disabled security in our config.
The actual problem is that the Neo4j browser is unable to communicate with the underlying database. There are two ways to work around this:</p>
Connect using HTTP instead of BOLT
We can tell the browser to connect to the database using the HTTP protocol rather than BOLT by unticking the checkbox:
Update the BOLT host
Or we can update the Bolt host value to refer to a host:port value that’s accessible from the host OS. Each server is accessible from port 7687 but we mapped those ports to different ports on the host OS with this flag that we passed to docker run:
--publish=$[7687+$HOST]:7687 \
We can access BOLT from the following ports:
instance0 -> localhost:7687
instance1 -> localhost:7688
instance2 -> localhost:7689
Let’s try changing it for instance2:
You might have to refresh your web browser after you change value but it usually updates automatically. We can run the :sysinfo command in the browser to see the state of our cluster:
And we’re good to go. The full script is available as a gist if you want to give it a try.
Let me know how you get on!
About the author
I'm currently working on short form content at ClickHouse. I publish short 5 minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.