ecelerity-cluster.conf

March 26, 2020 Contributors

Name

ecelerity-cluster.conf — The cluster-specific configuration file included from within ecelerity.conf

Description

ecelerity-cluster.conf configures cluster behavior. The configuration options of the ecelerity-cluster.conf file are discussed here.

# Ensure that we have appropriate privileges
Security {
  user = "ecuser"
  group = "ecuser"
  # Linux
  Capabilities = "cap_net_admin+ep cap_net_bind_service+ep cap_net_raw+ep cap_sys_resource+ep"
  # Solaris
  Privileges = "basic net_privaddr net_bindmlp sys_resource sys_net_config net_rawaccess"
}

ec_logger "ec_logger_cluster" {
  mainlog = "cluster:///var/log/ecelerity/mainlog.cluster=>master"
  paniclog = "cluster:///var/log/ecelerity/paniclog.cluster=>master"
  rejectlog = "cluster:///var/log/ecelerity/rejectlog.cluster=>master"
  acctlog = "cluster:///var/log/ecelerity/acctlog.cluster=>master"
}

bounce_logger "bounce_logger_cluster" {
  bouncelog = "cluster:///var/log/ecelerity/bouncelog.cluster=>master"
}
# The ECCluster_Listener stanza is available as of version 3.0.15
ECCluster_Listener {
  Listen "*:4802" {}
}

cluster {
  #cluster_listener = *:4802 Replaced by the ECCluster_Listener stanza
  #mbus_daemon = 4803 Deprecated in version 3.4
  cluster_group = "ec_cluster"
  control_group = "ec_console"
  logs = [
    rejectlog = "/var/log/ecelerity/rejectlog.cluster"
    paniclog = "/var/log/ecelerity/paniclog.cluster"
    mainlog = "/var/log/ecelerity/mainlog.cluster"
    acctlog = "/var/log/ecelerity/acctlog.cluster"
    bouncelog = "/var/log/ecelerity/bouncelog.cluster"
  ]
  Replicate "inbound_cidr" {}
  Replicate "outbound_cidr" {}
  Replicate "outbound_domains" {}
  Replicate "outbound_binding_domains" {}
  Replicate "shared_named_throttles" {}

  # DuraVIP network topology hints
  Topology "10.1.1.0/24" {
    cidrmask = "32"
    interface = "eth1"
  }
}

Note

IPv6 addresses are much more flexible than IPv4 addresses in terms of their formatting options. They also use a different delimiter character than IPv4 addresses (a colon instead of a period). This means that in certain contexts, an IPv6 address can create parsing ambiguities.

The accepted convention is to require that, in circumstances where a configuration parameter can also contain something other than an IP address, that an IPv6 address must be enclosed in square brackets. In practical terms, this means that things like the Gateway, Routes and Listen options must have IPv6 addresses enclosed in brackets. Others, such as Peer, Relay_Hosts and Prohibited_Hosts do not require the IPv6 address in brackets.

You cannot view the contents of the ecelerity-cluster.conf file using the system console from the cluster manager. You can only view the contents of this file from a cluster node because it is included from the ecelerity.conf file. The ec_logger module defined here applies to cluster nodes only. The ec_logger module defined in the eccluster.conf file on the cluster manager records events that occur on the cluster manager. Since mail does not transit the cluster manager, only the paniclog will have entries.

The Security Stanza

The definition of the Security stanza in the ecelerity-cluster.conf file applies to the cluster nodes only and usually differs from the configuration found in the eccluster.conf file.

For a discussion of the Security stanza options see security.

Clustered Logging

The ec_logger defined in the ecelerity-cluster.conf file is typically one of three ec_loggers used in a cluster configuration. These loggers, with their conventional instance names, are as follows:

  • ec_logger "ec_logger_cluster" – typically defined in the ecelerity-cluster.conf file and used when creating consolidated log files on the cluster manager

  • ec_logger "ec_logger" – the node-specific log files

  • ec_logger "ec_logger_rt" – the node-specific log files used by the web UI

Here we are only concerned with the ec_logger "ec_logger_cluster" logger.

The Momentum clustering module provides two facilities that aid administrators in setting up cluster-wide consolidated logging. The first of these is a node-local clustered I/O layer which is provided by the clustering module as the cluster:// URI schema. A typical log destination looks like cluster:///var/log/ecelerity/mainlog.cluster=>master where cluster:// tells the I/O abstraction layer to use node-local segmented data format, /var/log/ecelerity/mainlog.cluster is the directory in which the node-local log stream will be stored (created on demand), and =>master specifies that a subscriber named "master" should be added to the node-local log stream if it is created on demand.

The second part of the clustered logging solution is the log file service (provided over the ECCluster_Listener). This service lets subscribers connect to Momentum and request a "replay" of logs since their last checkpoint and then checkpoint the reader. This is a durable logging mechanism for aggregation. The log file server is configured in the logs dictionary of the cluster module configuration.

Each logfile that should be serviced in this fashion is given a key name and a corresponding local path that should match the path portion of the cluster:// log destination specified in the other Loggers throughout your configuration.

For an in-depth discussion of consolidated cluster logging see “Log Aggregation”.

ECCluster_Listener

Any direct, point-to-point communication between cluster nodes that does not require membership-wide ordering semantics will be performed over TCP/IP via the port specified in the Listen stanza within the ECCluster_Listener stanza. Any node can establish a connection to the destination node at the address specified by the ECCluster_Listener and point-to-point communication will ensue.

**Configuration Change. ** This option is available as of version 3.0.15 and replaces the cluster_listener option.

The following table displays all options valid in the ECCluster_Listener scope and within a Listen stanza within an ECCluster_Listener scope.

Option/Description Default Scopes
accept_queue_backlog – The accept queue backlog 0 control_listener, eccluster_listener, ecstream_listener, esmtp_listener, http_listener, listen, msgcserver_listener, xmpp_listener
concurrency – Define number of available threads 0 control_listener, eccluster_listener, ecstream_listener, esmtp_listener, http_listener, listen, threadpool, xmpp_listener
disable_nagle_algorithm – Disable nagle algorithm on sockets false control_listener, eccluster_listener, ecstream_listener, esmtp_listener, global, http_listener, listen, xmpp_listener
enable – Enable or disable a listener scope true control_listener, eccluster_listener, ecstream_listener, esmtp_listener, http_listener, listen, msgcserver_listener, xmpp_listener
events_per_iter – Employ when using a Concurrency greater than 1 0 control_listener, eccluster_listener, ecstream_listener, esmtp_listener, http_listener, listen, xmpp_listener
file_mode – File access rights in octal notation 0660 control_listener, eccluster_listener, ecstream_listener, esmtp_listener, http_listener, listen, msgcserver_listener, xmpp_listener
listen_backlog – The listen backlog 500 control_listener, eccluster_listener, ecstream_listener, esmtp_listener, http_listener, listen, xmpp_listener
pool_name – Associate a threadpool with a listener control_listener, eccluster_listener, ecstream_listener, esmtp_listener, http_listener, listen, xmpp_listener
tcp_recv_buffer_size – The size of the TCP receive buffer size 4096 control_listener, eccluster_listener, ecstream_listener, esmtp_listener, http_listener, listen, xmpp_listener
tcp_send_buffer_size – The size of the TCP send buffer 4096 control_listener, eccluster_listener, ecstream_listener, esmtp_listener, http_listener, listen, xmpp_listener

For information regarding IPv6 addresses and Listen stanzas, see the section called “Listeners and IPv6 Addresses”.

Cluster Communications

mbus_daemon

The most important underlying component of the clustering system is the underlying messaging bus. The Momentum clustering module utilizes a messaging bus that provides extended virtual synchrony (EVS) messaging semantics. The Momentum instance will attach to this node over some form of inter-process communication socket (IPC) (currently either AF_INET or AF_UNIX) as specified by the mbus_daemon configuration option.

**Configuration Change. ** As of version 3.4, cluster communication is handled by the msgc modules rather than by mbus. For more information see “msgc – Modules”.

cluster_group

The DuraVIP™ system will coordinate IP ownership responsibilities via the cluster_group EVS group.

control_group

Each node can respond to normal console commands received on the control_group. The cluster console manager utilizes this group to issue cluster-wide configuration commands to update and discover changes in configuration information. For more information about cluster console commands see “Cluster Management Using Console Commands”.

Under normal circumstances, the mbus_daemon, cluster_group and control_group should be left at their default values (as shown in the configuration above).

Replication

The replication component of the clustering module is considered its most powerful and versatile feature. The replicate directive allows you to employ a sound and efficient replication framework to the data managed within Momentum. Such metrics as the number of current connections from a specific netblock are calculated locally by referencing an internal structure called a CIDR tree. By specifying Replicate = "inbound_cidr" {}, we tell the clustering subsystem to share all the local information about inbound connections tracked in its CIDR tree with every other node in the cluster (and vice versa). Using this shared information, the replication system will maintain an aggregated "cluster-wide" CIDR tree representing all inbound connections to the cluster.

The same is possible for outbound connections via Replicate "outbound_cidr" {} as well as outbound connections grouped by destination domain via Replicate "outbound_domains" {}. For outbound connections, it may be desirable to be more granular than aggregating on a cluster-wide premise. This is discussed in more detail in the cluster data replication section.

The Replicate "outbound_binding_domains" {} stanza ensures that the Cluster_Scope_Max_Outbound_Connections option works cluster-wide. This option was introduced in Momentum 3.2 and is included in the default ecelerity-cluster.conf file.

In addition to native Momentum data, it is possible to replicate user controlled data sets as well (such as caches). This provides a transparent and convenient mechanism to cache data from a module level in a medium that is accessible via every node participating in the cluster. This is discussed in more detail in “Data Replication”.

DuraVIP™ Network Topology

The DuraVIP™ featureset maintains the availability of MultiVIP© bindings and listener services on IP addresses despite node failures. Each binding or listener that should be managed in this fashion should be marked with a Enable_Duravip = true option.

Because Momentum is responsible for adding and removing the corresponding IP addresses, more information must be known about the IP networks and physical interfaces on which these IPs will reside. Within the cluster module configuration, options in the Topology scope provide this additional information.

interface

In the Topology configuration shown in “The ecelerity-cluster.conf file”, 10.1.1.0/24 informs the clustering module that IPs in the range specified will be added to the eth1 ethernet interface.

cidrmask

When bringing an IP address online, you must also know the netmask it will be using. The cidrmask option indicates the number of bits in the netmask for a given IP address. Above, we see that the IP address should be added with a /32 netmask (i.e. 255.255.255.255). It is most common to add IP aliases with a 255.255.255.255 netmask, but this can vary between operating systems.

Other Cluster Configuration Options

Find below a list of all cluster options not covered by the categories discussed above. For a complete list of all cluster options see Table 9.4, “cluster options”.

log_group

When this is enabled, the panic log messages are broadcast over spread, using the specified group name. Another spread enabled application, or the spuser tool, can then listen in on paniclog events.

nodename

Override the node name that is used to canonically identify this cluster node. The nodename is determined according to the following logic: When ec_ctl runs, it determines the node name (and subcluster) as configured from cluster.boot and exports EC_SUB_CLUSTER and EC_NODE_NAME to the environment. If you do not explicitly configure the nodename option, the cluster module will look for the EC_NODE_NAME environment variable and take that as the value. If EC_NODE_NAME is not set in the environment, it will use the system hostname, truncated at the first ‘.’. Note also that modules can use the cluster_nodename hook to determine the effective value of the nodename.

nodeaddr

nodeaddr is the canonical cluster address for the node. If not specified, gethostbyname(nodename) is used to determine the address. The address must be routable via the cluster network, and must not be 127.0.0.1.

log_active_interval

This option, along with log_idle_interval, is used to tune centralized logging (logmove). When logmove is actively sending data to the manager, it will sleep for log_active_interval seconds between each segment send. When the job idles (no segments are pending), then it will sleep for log_idle_interval seconds before looking for another segment, The default value for this option is 1.

log_idle_interval

The amount of time to sleep before looking for another segment. The default value for this option is 10.

heartbeat_start_delay

How many seconds to wait after startup before the cluster heartbeat is activated. The default value for this option is 15.

heartbeats_per_sec

How often to send a heartbeat. The heartbeat is used to help detect "byzantine" nodes in the cluster. The default value for this option is 1.

if_check_interval

How often to run through a maintenance cycle to make sure that the interfaces plumbed on the system match up to the cluster internal view. The default value for this option is 30.

if_down_limit

As part of the maintenance cycle, when detecting that we need to plumb an IP address, how long to wait before deciding that we should bring it online. This avoids rapid "flapping". The default value for this option is 4.

duravip_balance_set_size

When balancing DuraVIP™s, how many to process as a batch in response to a balance request. Clusters with large numbers of DuraVIP™s (especially when they are not explicitly preferenced) will take less time to converge if this number is increased. It is imperative that this number be set consistently across all nodes, as inconsistent values across the nodes will result in a cluster that will not converge (since the nodes will not all agree on the same parameters). Therefore, it is strongly recommended that all the nodes be brought down before changing this option. The value of this option must be greater than 1.

arp_all_hosts

When plumbing a DuraVIP™, you can either aggressively send out ARP information to ensure that the network knows about the IP address assignment (true), or target the ARP to specific hosts of interest (false). You may consider changing this to false if your network experiences problems with the burst of ARP traffic around the DuraVIP™ move. The default value for this option is true.

view_mature_time

How long a DuraVIP™ view needs to remain unchanged before considering it "mature". Increasing the value will make the cluster take longer to fully converge and balance DuraVIP™s. Reducing the value will make it take less time. This option should not generally need to be altered, but you may consider doing so if the cluster is experiencing instability. Best to seek advice from support if that is the case. The default value for this option is 5.

view_balance_interval

How often DuraVIP™ views are subject to balancing. This option is similar to view_mature_time and should to be adjusted without consultation with support. The default value for this option is 10.

unconditional_rebind

Whether the full set_binding logic is invoked when assessing messages for internal cluster message moves or whether to use an optimization that avoids calling out to whatever set_binding logic is in place. The default value for this option is true.

view_broadcast_interval

When non-zero, how often to speculatively broadcast a view announcement to the cluster. Should not be needed except in rare cases when the cluster does not seem to be in sync with views; only enable as directed by support. The default value for this option is 0.

See Also

mbusd, mbus.conf.