Standalone vs YARN cluster for Flink

2017/01/14 Mingmin

Flink offers two options for setting up a cluster: a standalone cluster, or a cluster running on YARN.

Here I'll list the pros and cons of each as a comparison.

Standalone mode

pros

no dependency on external components;
easy to add or remove TaskManagers in the cluster;
easy to debug and to retrieve logs;
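
As a rough illustration of how little is involved in resizing a standalone cluster, the sketch below prints the stock scripts that ship in Flink's bin directory. It is a dry run that only echoes the commands; the install path /opt/flink is an assumption, and the TaskManager commands are meant to be run on the worker host being added or removed.

```shell
#!/bin/sh
# Dry-run sketch: prints the standalone-cluster commands instead of executing them.
# FLINK_HOME defaults to /opt/flink, which is an assumed install path.
FLINK_HOME="${FLINK_HOME:-/opt/flink}"

# Start the JobManager and the TaskManagers listed in the conf directory.
START_CLUSTER="$FLINK_HOME/bin/start-cluster.sh"

# Run on a worker host to add one TaskManager to the running cluster.
ADD_TM="$FLINK_HOME/bin/taskmanager.sh start"

# Run on a worker host to remove its TaskManager from the cluster.
REMOVE_TM="$FLINK_HOME/bin/taskmanager.sh stop"

echo "$START_CLUSTER"
echo "$ADD_TM"
echo "$REMOVE_TM"
```

To actually run a step, execute the printed command on the relevant machine.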

cons

No job isolation, since slots in a TaskManager share the same JVM (see Job Isolation on Flink);
Needs ZooKeeper for node failure recovery;

YARN mode

More specifically, you have two choices with YARN (see yarn setup):

set up a Flink session, which acts like a virtual cluster;
run a Flink job directly on YARN.
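
The two choices above might look roughly like this on the command line. This is a dry-run sketch that only prints the commands; the flags follow the Flink 1.x releases from around the time of this post (newer releases changed the YARN options), and the install path, container count, and memory sizes are assumptions chosen for illustration.

```shell
#!/bin/sh
# Dry-run sketch: prints the two YARN launch commands instead of executing them.
# FLINK_HOME defaults to /opt/flink, which is an assumed install path.
FLINK_HOME="${FLINK_HOME:-/opt/flink}"

# Choice 1: start a long-running Flink session on YARN.
# -n: number of TaskManager containers, -jm/-tm: JobManager/TaskManager memory (MB),
# -d: detached mode. The values are illustrative assumptions.
SESSION_CMD="$FLINK_HOME/bin/yarn-session.sh -n 4 -jm 1024 -tm 4096 -d"

# Choice 2: run a single job directly on YARN with its own dedicated cluster.
# The y-prefixed flags mirror the session options above.
JOB_CMD="$FLINK_HOME/bin/flink run -m yarn-cluster -yn 4 -yjm 1024 -ytm 4096 $FLINK_HOME/examples/batch/WordCount.jar"

echo "$SESSION_CMD"
echo "$JOB_CMD"
```

With a session, later jobs are submitted to the already running cluster with a plain flink run; with the second form, each job brings up and tears down its own containers, which is where the per-job isolation comes from.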

pros

job isolation provided by YARN;
automatic recovery from node failure;
flexible resource capacity per TaskManager for different jobs;

cons

extra cost of running YARN as an external component;
So far, YARN is closely tied to a distributed file system (HDFS/AWS/Google Cloud);

In our environment, we finally decided to go with YARN, because we value isolation much more than the other features in order to support multiple tenants.
