Standalone vs YARN cluster for Flink

Flink offers two ways to set up a cluster: as a standalone cluster, or on top of YARN.

Here I'll list the pros and cons of each to compare them.

Standalone mode

pros

  • no dependency on external components;
  • easy to add or remove TaskManagers in the cluster;
  • easy to debug and to retrieve logs;

cons

  • no job isolation, as slots share the same JVM (see Job Isolation on Flink);
  • needs ZooKeeper for node failure recovery;

YARN mode

More specifically, you have two choices with YARN (see yarn setup):

  • set up a Flink session, which acts like a virtual cluster;
  • run a Flink job directly on YARN.
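
As a rough sketch of these two choices, the helpers below build the command line for each mode. The flags follow the Flink 1.x CLI (`yarn-session.sh` with `-jm`, `-tm`, `-s`, `-d`, and `flink run -m yarn-cluster` with `-yjm`, `-ytm`, `-ys`); later releases changed some of them, so check against your version. The helper names, memory sizes, slot count, and jar path are made up for illustration.

```python
def session_command(jm_mb, tm_mb, slots):
    """Build argv for a long-running Flink session on YARN (Flink 1.x flags)."""
    return [
        "./bin/yarn-session.sh",
        "-jm", f"{jm_mb}m",   # JobManager container memory
        "-tm", f"{tm_mb}m",   # memory per TaskManager container
        "-s", str(slots),     # slots per TaskManager
        "-d",                 # detached: keep the session running in the background
    ]


def per_job_command(jar, jm_mb, tm_mb, slots):
    """Build argv for running a single job directly on YARN (Flink 1.x flags)."""
    return [
        "./bin/flink", "run",
        "-m", "yarn-cluster",  # start a dedicated YARN cluster for this job
        "-yjm", f"{jm_mb}m",
        "-ytm", f"{tm_mb}m",
        "-ys", str(slots),
        jar,
    ]


if __name__ == "__main__":
    # Hypothetical sizes; pass the lists to subprocess.run() to actually launch.
    print(" ".join(session_command(1024, 4096, 4)))
    print(" ".join(per_job_command("./examples/batch/WordCount.jar", 1024, 4096, 4)))
```

With a session, jobs are then submitted to the running cluster with a plain `flink run`; with the per-job form, each job gets its own JobManager and TaskManagers, which is where the isolation comes from.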

pros

  • job isolation provided by YARN;
  • node failure auto-recovery;
  • flexible resource capacity per TaskManager for different jobs;
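
To make the last point concrete: on YARN each job can ask for its own TaskManager size, so the number of containers and the total memory follow from the job's parallelism. A back-of-the-envelope sketch, assuming the job needs one slot per unit of parallelism (default slot sharing) and ignoring YARN's rounding of container sizes; the function name and all numbers are illustrative only.

```python
import math


def yarn_request(parallelism, slots_per_tm, tm_mb, jm_mb=1024):
    """Estimate TaskManager count and total memory (MB) for one job on YARN."""
    # Enough TaskManagers to provide one slot per parallel instance.
    task_managers = math.ceil(parallelism / slots_per_tm)
    # One JobManager container plus all TaskManager containers.
    total_mb = task_managers * tm_mb + jm_mb
    return task_managers, total_mb


if __name__ == "__main__":
    print(yarn_request(8, 4, 4096))    # (2, 9216): a few large TaskManagers
    print(yarn_request(10, 2, 2048))   # (5, 11264): more, smaller TaskManagers
```

In a standalone cluster, by contrast, every job runs in TaskManagers of whatever size the cluster was started with.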

cons

  • extra cost of running YARN;
  • so far, YARN is closely tied to a distributed file system (HDFS/AWS/Google Cloud);

In our environment, we finally decided to go with YARN, because we value job isolation far more than the other factors: we need it to support multiple tenants.