Kubernetes: Spinning up a Neo4j 3.1 Causal Cluster
A couple of weeks ago I wrote a blog post explaining how I’d created a Neo4j causal cluster using Docker containers directly. For my next pet project I wanted to use Kubernetes as an orchestration layer so that I could declaratively change the number of servers in my cluster.
I’d never used Kubernetes before, but a couple of months ago I saw a presentation at the GDG Cloud meetup showing how to use it to create an Elastic cluster.
That presentation introduced me to the idea of a PetSet, an abstraction exposed by Kubernetes that lets us manage a set of pods (containers) with a fixed identity. The documentation explains it better:
A PetSet ensures that a specified number of “pets” with unique identities are running at any given time. The identity of a Pet is comprised of:
a stable hostname, available in DNS
an ordinal index
stable storage: linked to the ordinal & hostname
In my case I need the stable hostname because each member of a Neo4j cluster is given a list of the other cluster members, with which it can form a new cluster or join an existing one. This is the first use case described in the documentation:
PetSet also helps with the 2 most common problems encountered managing such clustered applications:
discovery of peers for quorum
startup/teardown ordering
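To make that concrete, here’s roughly what those stable identities look like for the setup used in the rest of this post, where both the PetSet and the headless service are called neo4j and live in the default namespace. The busybox one-liner is just a sanity check for DNS resolution, not something the cluster needs:
# Each pet gets an ordinal-based pod name and a matching DNS entry via the
# headless service:
#
#   neo4j-0  ->  neo4j-0.neo4j.default.svc.cluster.local
#   neo4j-1  ->  neo4j-1.neo4j.default.svc.cluster.local
#   neo4j-2  ->  neo4j-2.neo4j.default.svc.cluster.local
#
# Quick check from a throwaway pod (once the pets are running):
$ kubectl run -it --rm dns-test --image=busybox --restart=Never -- \
    nslookup neo4j-0.neo4j.default.svc.cluster.local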
So the first thing we need to do is create some stable storage for our pods to use.
We’ll create a cluster of 3 members, so we need one PersistentVolume and a matching PersistentVolumeClaim for each of them. The following script does the job:
volumes.sh
for i in $(seq 0 2); do
  cat <<EOF | kubectl create -f -
kind: PersistentVolume
apiVersion: v1
metadata:
  name: pv${i}
  labels:
    type: local
    app: neo4j
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/tmp/${i}"
EOF
  cat <<EOF | kubectl create -f -
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: datadir-neo4j-${i}
  labels:
    app: neo4j
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
EOF
done;
If we run this script it’ll create 3 volumes and 3 claims, which we can see by running the following commands:
$ kubectl get pv
NAME      CAPACITY   ACCESSMODES   STATUS    CLAIM                     REASON    AGE
pv0       1Gi        RWO           Bound     default/datadir-neo4j-0             7s
pv1       1Gi        RWO           Bound     default/datadir-neo4j-1             7s
pv2       1Gi        RWO           Bound     default/datadir-neo4j-2             7s
$ kubectl get pvc
NAME              STATUS    VOLUME    CAPACITY   ACCESSMODES   AGE
datadir-neo4j-0   Bound     pv0       1Gi        RWO           26s
datadir-neo4j-1   Bound     pv1       1Gi        RWO           26s
datadir-neo4j-2   Bound     pv2       1Gi        RWO           25s
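The claim names aren’t arbitrary: a PetSet creates claims from its volumeClaimTemplates using the pattern <template name>-<petset name>-<ordinal>, so datadir-neo4j-0 through datadir-neo4j-2 line up with the datadir template and the neo4j PetSet defined below. Once the pods are running, a quick way to confirm which claim a pet bound to (it should point at datadir-neo4j-0 for the first one) is:
$ kubectl describe pod neo4j-0 | grep ClaimName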
Next we need to create a PetSet template. After a lot of iterations I ended up with the following, which I saved as neo4j.yaml:
# Headless service to provide DNS lookup
apiVersion: v1
kind: Service
metadata:
  labels:
    app: neo4j
  name: neo4j
spec:
  clusterIP: None
  ports:
    - port: 7474
  selector:
    app: neo4j
---
# new API name
apiVersion: "apps/v1alpha1"
kind: PetSet
metadata:
  name: neo4j
spec:
  serviceName: neo4j
  replicas: 3
  template:
    metadata:
      annotations:
        pod.alpha.kubernetes.io/initialized: "true"
        pod.beta.kubernetes.io/init-containers: '[
          {
            "name": "install",
            "image": "gcr.io/google_containers/busybox:1.24",
            "command": ["/bin/sh", "-c", "echo \"
              unsupported.dbms.edition=enterprise\n
              dbms.mode=CORE\n
              dbms.connectors.default_advertised_address=$HOSTNAME.neo4j.default.svc.cluster.local\n
              dbms.connectors.default_listen_address=0.0.0.0\n
              dbms.connector.bolt.type=BOLT\n
              dbms.connector.bolt.enabled=true\n
              dbms.connector.bolt.listen_address=0.0.0.0:7687\n
              dbms.connector.http.type=HTTP\n
              dbms.connector.http.enabled=true\n
              dbms.connector.http.listen_address=0.0.0.0:7474\n
              causal_clustering.raft_messages_log_enable=true\n
              causal_clustering.initial_discovery_members=neo4j-0.neo4j.default.svc.cluster.local:5000,neo4j-1.neo4j.default.svc.cluster.local:5000,neo4j-2.neo4j.default.svc.cluster.local:5000\n
              causal_clustering.leader_election_timeout=2s\n
              \" > /work-dir/neo4j.conf" ],
            "volumeMounts": [
              {
                "name": "confdir",
                "mountPath": "/work-dir"
              }
            ]
          }
        ]'
      labels:
        app: neo4j
    spec:
      containers:
      - name: neo4j
        image: "neo4j/neo4j-experimental:3.1.0-M13-beta3-enterprise"
        imagePullPolicy: Always
        ports:
        - containerPort: 5000
          name: discovery
        - containerPort: 6000
          name: tx
        - containerPort: 7000
          name: raft
        - containerPort: 7474
          name: browser
        - containerPort: 7687
          name: bolt
        securityContext:
          privileged: true
        volumeMounts:
        - name: datadir
          mountPath: /data
        - name: confdir
          mountPath: /conf
      volumes:
      - name: confdir
  volumeClaimTemplates:
  - metadata:
      name: datadir
      annotations:
        volume.alpha.kubernetes.io/storage-class: anything
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 1Gi
The main thing I had trouble with was getting the members of the cluster to talk to each other. The default Docker config uses hostnames, but I found that the pods were unable to contact each other unless I specified the fully qualified domain name (FQDN) in the config file. We can run the following command to create the PetSet:
$ kubectl create -f neo4j.yaml
service "neo4j" created
petset "neo4j" created
We can check if the pods are up and running by executing the following command:
$ kubectl get pods
NAME      READY     STATUS    RESTARTS   AGE
neo4j-0   1/1       Running   0          2m
neo4j-1   1/1       Running   0          14s
neo4j-2   1/1       Running   0          10s
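The ages reflect the startup ordering mentioned earlier: the PetSet controller creates the pets one at a time, in ordinal order. If you want to watch that happen as the set comes up, the watch flag works nicely (the app=neo4j label comes from the pod template above):
$ kubectl get pods -l app=neo4j -w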
And we can tail Neo4j’s log files like this:
$ kubectl logs neo4j-0
Starting Neo4j.
2016-11-25 16:39:50.333+0000 INFO Starting...
2016-11-25 16:39:51.723+0000 INFO Bolt enabled on 0.0.0.0:7687.
2016-11-25 16:39:51.733+0000 INFO Initiating metrics...
2016-11-25 16:39:51.911+0000 INFO Waiting for other members to join cluster before continuing...
2016-11-25 16:40:12.074+0000 INFO Started.
2016-11-25 16:40:12.428+0000 INFO Mounted REST API at: /db/manage
2016-11-25 16:40:13.350+0000 INFO Remote interface available at http://neo4j-0.neo4j.default.svc.cluster.local:7474/
$ kubectl logs neo4j-1
Starting Neo4j.
2016-11-25 16:39:53.846+0000 INFO Starting...
2016-11-25 16:39:56.212+0000 INFO Bolt enabled on 0.0.0.0:7687.
2016-11-25 16:39:56.225+0000 INFO Initiating metrics...
2016-11-25 16:39:56.341+0000 INFO Waiting for other members to join cluster before continuing...
2016-11-25 16:40:16.623+0000 INFO Started.
2016-11-25 16:40:16.951+0000 INFO Mounted REST API at: /db/manage
2016-11-25 16:40:17.607+0000 INFO Remote interface available at http://neo4j-1.neo4j.default.svc.cluster.local:7474/
$ kubectl logs neo4j-2
Starting Neo4j.
2016-11-25 16:39:57.828+0000 INFO Starting...
2016-11-25 16:39:59.166+0000 INFO Bolt enabled on 0.0.0.0:7687.
2016-11-25 16:39:59.176+0000 INFO Initiating metrics...
2016-11-25 16:39:59.329+0000 INFO Waiting for other members to join cluster before continuing...
2016-11-25 16:40:19.216+0000 INFO Started.
2016-11-25 16:40:19.675+0000 INFO Mounted REST API at: /db/manage
2016-11-25 16:40:21.029+0000 INFO Remote interface available at http://neo4j-2.neo4j.default.svc.cluster.local:7474/
I wanted to log into the servers from my host machine’s browser, so I set up port forwarding for each of them:
$ kubectl port-forward neo4j-0 7474:7474 7687:7687
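That covers neo4j-0; since 7474 and 7687 are now taken on the host, the other two servers can be forwarded to different local ports, along these lines:
$ kubectl port-forward neo4j-1 7475:7474 7688:7687
$ kubectl port-forward neo4j-2 7476:7474 7689:7687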
We can then get an overview of the cluster by running the following procedure:
CALL dbms.cluster.overview()
╒════════════════════════════════════╤═════════════════════════════════════════════════════╤════════╕
│id │addresses │role │
╞════════════════════════════════════╪═════════════════════════════════════════════════════╪════════╡
│81d8e5e2-02db-4414-85de-a7025b346e84│[bolt://neo4j-0.neo4j.default.svc.cluster.local:7687,│LEADER │
│ │ http://neo4j-0.neo4j.default.svc.cluster.local:7474]│ │
├────────────────────────────────────┼─────────────────────────────────────────────────────┼────────┤
│347b7517-7ca0-4b92-b9f0-9249d46b2ad3│[bolt://neo4j-1.neo4j.default.svc.cluster.local:7687,│FOLLOWER│
│ │ http://neo4j-1.neo4j.default.svc.cluster.local:7474]│ │
├────────────────────────────────────┼─────────────────────────────────────────────────────┼────────┤
│a5ec1335-91ce-4358-910b-8af9086c2969│[bolt://neo4j-2.neo4j.default.svc.cluster.local:7687,│FOLLOWER│
│ │ http://neo4j-2.neo4j.default.svc.cluster.local:7474]│ │
└────────────────────────────────────┴─────────────────────────────────────────────────────┴────────┘
So far so good. What if we want 5 servers in the cluster instead of 3? We can run the following command to increase the replica count:
$ kubectl patch petset neo4j -p '{"spec":{"replicas":5}}'
"neo4j" patched
Let’s run that procedure again:
CALL dbms.cluster.overview()
╒════════════════════════════════════╤═════════════════════════════════════════════════════╤════════╕
│id │addresses │role │
╞════════════════════════════════════╪═════════════════════════════════════════════════════╪════════╡
│81d8e5e2-02db-4414-85de-a7025b346e84│[bolt://neo4j-0.neo4j.default.svc.cluster.local:7687,│LEADER │
│ │ http://neo4j-0.neo4j.default.svc.cluster.local:7474]│ │
├────────────────────────────────────┼─────────────────────────────────────────────────────┼────────┤
│347b7517-7ca0-4b92-b9f0-9249d46b2ad3│[bolt://neo4j-1.neo4j.default.svc.cluster.local:7687,│FOLLOWER│
│ │ http://neo4j-1.neo4j.default.svc.cluster.local:7474]│ │
├────────────────────────────────────┼─────────────────────────────────────────────────────┼────────┤
│a5ec1335-91ce-4358-910b-8af9086c2969│[bolt://neo4j-2.neo4j.default.svc.cluster.local:7687,│FOLLOWER│
│ │ http://neo4j-2.neo4j.default.svc.cluster.local:7474]│ │
├────────────────────────────────────┼─────────────────────────────────────────────────────┼────────┤
│28613d06-d4c5-461c-b277-ddb3f05e5647│[bolt://neo4j-3.neo4j.default.svc.cluster.local:7687,│FOLLOWER│
│ │ http://neo4j-3.neo4j.default.svc.cluster.local:7474]│ │
├────────────────────────────────────┼─────────────────────────────────────────────────────┼────────┤
│2eaa0058-a4f3-4f07-9f22-d310562ad1ec│[bolt://neo4j-4.neo4j.default.svc.cluster.local:7687,│FOLLOWER│
│ │ http://neo4j-4.neo4j.default.svc.cluster.local:7474]│ │
└────────────────────────────────────┴─────────────────────────────────────────────────────┴────────┘
Neat! And it’s as easy to go back down to 3 again:
$ kubectl patch petset neo4j -p '{"spec":{"replicas":3}}'
"neo4j" patched
CALL dbms.cluster.overview()
╒════════════════════════════════════╤══════════════════════════════════════════════════════╤════════╕
│id │addresses │role │
╞════════════════════════════════════╪══════════════════════════════════════════════════════╪════════╡
│81d8e5e2-02db-4414-85de-a7025b346e84│[bolt://neo4j-0.neo4j.default.svc.cluster.local:7687, │LEADER │
│ │http://neo4j-0.neo4j.default.svc.cluster.local:7474] │ │
├────────────────────────────────────┼──────────────────────────────────────────────────────┼────────┤
│347b7517-7ca0-4b92-b9f0-9249d46b2ad3│[bolt://neo4j-1.neo4j.default.svc.cluster.local:7687, │FOLLOWER│
│ │http://neo4j-1.neo4j.default.svc.cluster.local:7474] │ │
├────────────────────────────────────┼──────────────────────────────────────────────────────┼────────┤
│a5ec1335-91ce-4358-910b-8af9086c2969│[bolt://neo4j-2.neo4j.default.svc.cluster.local:7687, │FOLLOWER│
│ │http://neo4j-2.neo4j.default.svc.cluster.local:7474] │ │
└────────────────────────────────────┴──────────────────────────────────────────────────────┴────────┘
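One thing to be aware of: scaling the PetSet down removes the extra pods, but as far as I can tell it leaves their PersistentVolumeClaims (and therefore the data for neo4j-3 and neo4j-4) behind, so they need to be deleted by hand if you don’t want them hanging around:
$ kubectl get pvc
$ kubectl delete pvc datadir-neo4j-3 datadir-neo4j-4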
Next I need to look at how to add read replicas to the cluster. Those don’t take part in the membership/quorum algorithm, so I think I’ll be able to use the more common ReplicationController/Pod approach for them.
If you want to play around with this, the code is available as a gist. I’m using minikube for all my experiments, but I’ll hopefully get around to trying this on GCE or AWS soon.