How to restart a failed resource

Brian Milnes
Folks,

 It's not clear to me how to clear a failed resource and get it to restart
without clearing the CIB and rebooting.

I have the following setup:

 debian lenny, 2.6.21.7-2.fc8xen, amazon AMI, no serial, heartbeat
2.1.3-6lenny1.

crm_mon shows:

 Resource Group: openvpn_nfs_lvm_ebs
    ebs-1       (aws::ocf:ebs): Started ip-10-244-47-99
    lvm-1       (heartbeat::ocf:LVM):   Started ip-10-244-47-99
    nfs-1       (heartbeat::ocf:Filesystem):    Started ip-10-244-47-99
    openvpn-1   (lsb:openvpn):  Started ip-10-244-47-99

Failed actions:
    ebs-1_start_0 (node=ip-164, call=6, rc=1): complete

The failed action is there because I had a script bug while I was starting
my disk resource (ebs-1). I then reset the fail count:

# crm_failcount -G -U ip-164 -r ebs-1
 name=fail-count-ebs-1 value=1
# crm_failcount -D -U ip-164 -r ebs-1
# crm_failcount -G -U ip-164 -r ebs-1
 name=fail-count-ebs-1 value=0

And yet, when I run

 crm_resource -r openvpn_nfs_lvm_ebs -M -H ip-164

the resource does not move and daemon.log shows:
Aug  6 01:14:36 ip-164 crm_resource: [1199]: ERROR: unpack_rsc_op: Remapping
ebs-1_start_0 (rc=1) on ip-164 to an ERROR
Aug  6 01:14:36 ip-164 crm_resource: [1199]: WARN: unpack_rsc_op: Processing
failed op ebs-1_start_0 on ip-164: Error
Aug  6 01:14:36 ip-164 crm_resource: [1199]: WARN: unpack_rsc_op:
Compatability handling for failed op ebs-1_start_0 on ip-164
Aug  6 01:14:36 ip-164 crm_resource: [1199]: WARN: main: here i am - 3

 Questions:

1) How do I restart a failed resource, after I fix a script or other
problem?

2) Should this have happened?

 Thanks, Brian
_______________________________________________
Linux-HA mailing list
[hidden email]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

Re: How to restart a failed resource

Michael Schwartzkopff
On Thursday, 6 August 2009 03:30:42, Brian Milnes wrote:

> Folks,
>
>  It's just not clear to me how to clear a resource and get it to restart
> without clearing the CIB and
> rebooting.
>
> I have the following setup:
>
>  debian lenny, 2.6.21.7-2.fc8xen, amazon AMI, no serial, heartbeat
> 2.1.3-6lenny1.

Upgrade! At least to version 2.1.4 of heartbeat.
Better, use pacemaker with heartbeat 2.99, or even better,
pacemaker with openais. For Debian packages, see www.clusterlabs.org/howto/

> Crm_mon shows:
>
>  Resource Group: openvpn_nfs_lvm_ebs
>     ebs-1       (aws::ocf:ebs): Started ip-10-244-47-99
>     lvm-1       (heartbeat::ocf:LVM):   Started ip-10-244-47-99
>     nfs-1       (heartbeat::ocf:Filesystem):    Started ip-10-244-47-99
>     openvpn-1   (lsb:openvpn):  Started ip-10-244-47-99
>
> Failed actions:
>     ebs-1_start_0 (node=ip-164, call=6, rc=1): complete
>
> Because I had a script bug while I was starting my disk resource (ebs-1).
> # crm_failcount -G -U ip-164 -r ebs-1
>  name=fail-count-ebs-1 value=1
> # crm_failcount -D -U ip-164 -r ebs-1
>  crm_failcount -G -U ip-164 -r ebs-1
>  name=fail-count-ebs-1 value=0
>
> And yet, when I
>
>  crm_resource -r openvpn_nfs_lvm_ebs -M -H ip-164
>
> the resource does not move and daemon.log shows:
> Aug  6 01:14:36 ip-164 crm_resource: [1199]: ERROR: unpack_rsc_op:
> Remapping ebs-1_start_0 (rc=1) on ip-164 to an ERROR
> Aug  6 01:14:36 ip-164 crm_resource: [1199]: WARN: unpack_rsc_op:
> Processing failed op ebs-1_start_0 on ip-164: Error
> Aug  6 01:14:36 ip-164 crm_resource: [1199]: WARN: unpack_rsc_op:
> Compatability handling for failed op ebs-1_start_0 on ip-164
> Aug  6 01:14:36 ip-164 crm_resource: [1199]: WARN: main: here i am - 3
>
>  Questions:
>
> 1) How do I restart a failed resource, after I fix a script or other
> problem?

crm_resource -C -r openvpn_nfs_lvm_ebs
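
A rough sketch of how this cleanup could be combined with the commands from the original post. It is not a tested procedure: the function name recover_resource and the RUN variable are made up for illustration, and the ordering (reset the fail count, clean up, then move) is an assumption. By default it only prints the commands, so it can be read through without touching a cluster.

```shell
#!/bin/sh
# Dry-run by default: RUN=echo prints each command instead of executing it.
# Set RUN to the empty string on a real cluster node to actually run them.
RUN="${RUN-echo}"

# recover_resource is a hypothetical helper, not part of heartbeat.
recover_resource() {
    rsc="$1"    # primitive whose start failed, e.g. ebs-1
    group="$2"  # resource group that contains it
    node="$3"   # node on which the start failed

    # Reset the fail count for the primitive on that node
    # (command taken from the original post).
    $RUN crm_failcount -D -U "$node" -r "$rsc"

    # Clean up the failed operation so the cluster re-evaluates the group
    # (command taken from the reply above).
    $RUN crm_resource -C -r "$group"

    # Ask the cluster to move the group to the node
    # (command taken from the original post).
    $RUN crm_resource -M -r "$group" -H "$node"
}

recover_resource ebs-1 openvpn_nfs_lvm_ebs ip-164
```

With RUN left at its default, the script prints the three commands in order. Running it with RUN set to the empty string would execute them, which only makes sense after the underlying script bug has been fixed.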

> 2) Should this have happened?

Yes.

--
Dr. Michael Schwartzkopff
MultiNET Services GmbH
Address: Bretonischer Ring 7; 85630 Grasbrunn; Germany
Tel: +49 - 89 - 45 69 11 0
Fax: +49 - 89 - 45 69 11 21
mob: +49 - 174 - 343 28 75

mail: [hidden email]
web: www.multinet.de

Registered office: 85630 Grasbrunn
Commercial register: Amtsgericht München, HRB 114375
Managing directors: Günter Jurgeneit, Hubert Martens

---

PGP Fingerprint: F919 3919 FF12 ED5A 2801 DEA6 AA77 57A4 EDD8 979B
Skype: misch42

Re: How to restart a failed resource

Andrew Beekhof-3
On Mon, Aug 10, 2009 at 12:15 PM, Michael Schwartzkopff <[hidden email]> wrote:

> On Thursday, 6 August 2009 03:30:42, Brian Milnes wrote:
>> Folks,
>>
>>  It's just not clear to me how to clear a resource and get it to restart
>> without clearing the CIB and
>> rebooting.
>>
>> I have the following setup:
>>
>>  debian lenny, 2.6.21.7-2.fc8xen, amazon AMI, no serial, heartbeat
>> 2.1.3-6lenny1.
>
> Upgrade! At least to version 2.1.4 of heartbeat.
> Better use pacemaker and 2.99 of heartbeat or even better:
> pacemaker and openais. Packages for debian see www.clusterlabs.org/howto/

Looks like your paste buffer got confused :-)
The URL is: http://www.clusterlabs.org/wiki/Install#Debian