Errors occurred when dynamically adding a node to an existing pacemaker cluster


samuel yu
I found something abnormal while dynamically adding a node to an existing pacemaker cluster.
My cluster is pcs/cman based, and the corosync/pacemaker versions are:
# rpm -qa |grep corosync
corosynclib-1.4.7-1.el6.x86_64
corosync-1.4.7-1.el6.x86_64
# rpm -qa |grep pacemaker
pacemaker-cli-1.1.12-4.el6.x86_64
pacemaker-1.1.12-4.el6.x86_64
pacemaker-cluster-libs-1.1.12-4.el6.x86_64
pacemaker-libs-1.1.12-4.el6.x86_64

In the current cluster there are 3 existing nodes: node101/node103/node192 (the DC node). Now I want to dynamically add node194 into the cluster, but something goes wrong.
Before adding, the status of the cluster is:
# pcs status
Cluster name: kvm_storage
Last updated: Mon Sep 26 16:16:19 2016
Last change: Thu Sep 22 14:39:56 2016
Stack: cman
Current DC: 172.28.217.192 - partition with quorum
Version: 1.1.11-97629de
6 Nodes configured
18 Resources configured

Online: [ 172.28.217.101 172.28.217.103 172.28.217.192 ]
OFFLINE: [ 172.28.217.193]
.....
.....

On node101, the conf file /etc/cluster/cluster.conf is:
<?xml version="1.0"?>
<cluster config_version="4" name="kvm_storage">
        <logging debug="off"/>
        <clusternodes>
                <clusternode name="172.28.217.101" nodeid="1"/>
                <clusternode name="172.28.217.103" nodeid="2"/>
                <clusternode name="172.28.217.192" nodeid="3"/>
        </clusternodes>
        <logging>
                <logging_daemon debug="on" logfile="/var/log/cluster/corosync.log" name="corosync"/>
        </logging>
        <dlm enable_fencing="0"/>
        <totem token="32000"/>
        <quorumd interval="2" label="qdiskCluster73" min_score="7" tko="9">
                <heuristic program="ping -c 1 172.28.217.126 -t 2 -w 1" score="3"/>
                <heuristic program="ping -c 1 172.28.217.73 -t 2 -w 1" score="3"/>
                <heuristic program="/usr/local/odpm/checkFCStatus.sh" score="5"/>
        </quorumd>
</cluster>
Then I made the conf file of node194 based on node101's, and only added one line:
<clusternode name="172.28.217.194" nodeid="4"/>

Then I started the cman and pcs services, and they started successfully.
I used the command "ccs_sync -f /etc/cluster/cluster.conf" to push the conf file to the other nodes in this cluster.

[root@node194 ~]# service cman status
cluster is running.
But after running "pcs status", I found something abnormal in the OFFLINE section:
[root@node194 ~]# pcs status
Cluster name: kvm_storage
Last updated: Mon Sep 26 16:19:40 2016
Last change: Mon Sep 26 14:56:26 2016
Stack: cman
Current DC: 172.28.217.192 - partition with quorum
Version: 1.1.11-97629de
6 Nodes configured
18 Resources configured

Online: [ 172.28.217.101 172.28.217.103 172.28.217.192 ]
OFFLINE: [ 172.28.217.193 172.28.217.194 Node4 ]

It looks like node194 has been added twice, once in IP form and once in node-name form, so the cluster is not able to bring this node online.

I have checked the relevant log on the DC node192 and found the following in /var/log/cluster/corosync.log:
Sep 26 16:29:24 [4556] node192       crmd:  warning: crm_find_peer: Node 'Node4' and '172.28.217.194' share the same cluster nodeid: 4
Sep 26 16:29:24 [4556] node192       crmd:  warning: crm_find_peer: Node 'Node4' and '172.28.217.194' share the same cluster nodeid: 4
Sep 26 16:29:24 [4556] node192       crmd:   notice: election_count_vote: Election 35241 (current: 35241, owner: 172.28.217.192): Processed no-vote from 172.28.217.194 (Peer is not part of our cluster)

I don't know what to do about this. Is there any parameter in pacemaker's conf file to tie the IP address to the node name?
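In case it helps with diagnosing, the only checks I know of for how pacemaker itself resolves node names are roughly these (output not pasted here):

# name pacemaker/crmd uses for the local node
crm_node -n
# list of peer nodes (id, name, state) as pacemaker sees them
crm_node -l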

If anything is unclear, just mail me.
Many thanks for your great support!!



Re: Errors occurred when dynamically adding a node to an existing pacemaker cluster

samuel yu
BTW, in /var/log/messages, it states:
Sep 27 09:32:52 node194 crmd[12746]:   notice: crm_timer_popped: We appear to be in an election loop, something may be wrong
Sep 27 09:32:52 node194 crmd[12746]:  warning: do_log: FSA: Input I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING

Could this be a bug, or is something wrong in my conf?