Standalone vs YARN cluster for Flink

2017/01/14 Mingmin

Flink offers two options for setting up a cluster: a standalone cluster, or a cluster based on YARN.

Here I'll compare the two by listing the pros and cons of each.

Standalone mode

pros

no dependency on external components;
easy to add or remove TaskManagers in the cluster;
easy to debug and to retrieve logs;
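As a minimal sketch of how little is involved in adding or removing a TaskManager, assuming a standalone Flink 1.x installation at a hypothetical path /opt/flink: each TaskManager is started or stopped on its own host with bin/taskmanager.sh, and on start it registers with the JobManager named in that host's flink-conf.yaml. The helper `taskmanager_cmd` is hypothetical and only builds the command; it does not run it.

```python
import shlex
from pathlib import PurePosixPath
from typing import List


def taskmanager_cmd(flink_home: str, action: str) -> List[str]:
    """Build the command that starts or stops one TaskManager on the local host.

    Assumes a standalone Flink installation at `flink_home` (hypothetical path)
    whose bin/taskmanager.sh script accepts `start` or `stop`.
    """
    if action not in ("start", "stop"):
        raise ValueError(f"unsupported action: {action!r}")
    script = PurePosixPath(flink_home) / "bin" / "taskmanager.sh"
    return [str(script), action]


if __name__ == "__main__":
    # Run the printed command on the host being added to (or removed from) the cluster.
    print(shlex.join(taskmanager_cmd("/opt/flink", "start")))
    print(shlex.join(taskmanager_cmd("/opt/flink", "stop")))
```

Scaling down is the mirror image: running the `stop` command on a host takes its slots out of the cluster.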

cons

no job isolation, as all slots of a TaskManager share the same JVM (refer to Job Isolation on Flink);
requires a ZooKeeper ensemble for node failure recovery;

YARN mode

More specifically, there are two ways to run Flink on YARN (see YARN Setup):

start a long-running Flink session, which works like a virtual cluster;
run a single Flink job directly on YARN.
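A minimal sketch of the two YARN options, assuming the Flink 1.x command-line options of that era (`yarn-session.sh -n/-jm/-tm` for a session, `flink run -m yarn-cluster -yn/-yjm/-ytm` for a single job); later Flink releases changed some of these flags, so check the YARN Setup page for your version. The helper names, memory sizes, and JAR path are hypothetical, and the helpers only build the commands without running them.

```python
import shlex
from typing import List


def yarn_session_cmd(num_tms: int, jm_mb: int, tm_mb: int) -> List[str]:
    """Option 1: start a long-running Flink session (a virtual cluster) on YARN."""
    return [
        "./bin/yarn-session.sh",
        "-n", str(num_tms),  # number of TaskManager containers
        "-jm", str(jm_mb),   # JobManager memory in MB
        "-tm", str(tm_mb),   # memory per TaskManager in MB
    ]


def yarn_job_cmd(jar: str, num_tms: int, jm_mb: int, tm_mb: int) -> List[str]:
    """Option 2: run a single job on its own per-job Flink cluster in YARN."""
    return [
        "./bin/flink", "run",
        "-m", "yarn-cluster",  # deploy a dedicated cluster for this job
        "-yn", str(num_tms),   # number of TaskManager containers
        "-yjm", str(jm_mb),    # JobManager memory in MB
        "-ytm", str(tm_mb),    # memory per TaskManager in MB
        jar,
    ]


if __name__ == "__main__":
    print(shlex.join(yarn_session_cmd(4, 1024, 4096)))
    print(shlex.join(yarn_job_cmd("./examples/batch/WordCount.jar", 4, 1024, 4096)))
```

In the per-job form each job gets its own containers and can pick its own TaskManager memory, which is where the isolation and per-job resource flexibility listed below come from.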

pros

job isolation provided by YARN;
node failure auto-recovery;
flexible resource capacity per TaskManager for different jobs;

cons

the extra cost of running and operating YARN;
so far, YARN is tied closely to a distributed file system, such as HDFS or the storage services of AWS or Google Cloud;

In our environment, we finally decided to go with YARN, because we value its job isolation far more than the other factors: we need it to support multiple tenants.