Symptoms

Odin Automation CP becomes unavailable 5 minutes after the pau service is started.

The service does not report anything on the start attempt:

[root@osscore ~]# service pau start
[root@osscore ~]#

whereas a successful start prints:

[root@osscore ~]# service pau start
Starting pau (via systemctl):                              [  OK  ]
[root@osscore ~]#

Checking /var/log/pa/console.log for "Controller Boot Thread" gives:

180604 18:51:35 INFO  [org.jboss.as.server] (Controller Boot Thread) WFLYSRV0039: Creating http management service using socket-binding (management-http)
180604 18:56:37 ERROR [org.jboss.as.controller.management-operation] (Controller Boot Thread) WFLYCTL0348: Timeout after [300] seconds waiting for service container stability. Operation will roll back. Step that first updated the service container was 'add' at address '[
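
The log check above can be scripted. The following is a minimal sketch, assuming the log format shown in the excerpt; the function name boot_timeout_lines is hypothetical and not part of the product.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the "Controller Boot Thread" entries that report
# the WFLYCTL0348 stability timeout. The log path defaults to the one used
# above; pass another path as the first argument to check a copy of the log.
boot_timeout_lines() {
    local log="${1:-/var/log/pa/console.log}"
    grep -F "(Controller Boot Thread)" "$log" | grep -F "WFLYCTL0348"
}
```

Running boot_timeout_lines with no arguments reads /var/log/pa/console.log. Any output means the boot thread hit the stability timeout; no output means this article probably does not apply.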

Cause

One of the WildFly deployments fails to start, which brings down the whole application server.

Resolution

  1. Make sure that a shortage of hardware resources is not what prevents a successful startup. Check the amount of CPU and RAM available on the node, and if it is low, try increasing the JBoss blocking timeout. Edit the file /usr/local/pem/wildfly-10.1.0.Final/standalone/configuration/standalone-full-ha.xml, find the system-properties section and add the line:

    <property name="jboss.as.management.blocking.timeout" value="300"/>
    

    300 seconds (5 minutes) is the default timeout; try doubling it, that is, set the value to 600.

  2. Check standalone-full-ha.xml for the list of deployments and try disabling them one by one to see if it helps. To disable a deployment, either remove the corresponding section from the file or comment it out:

    • originally:

          <deployments>
          <deployment name="whl-endpoint-war.war" runtime-name="whl-endpoint-war.war">
              <content sha1="cfffa73de5dd20a8346885bbe8fb4b78b3700b77"/>
          </deployment>
          <deployment name="core-ear.ear" runtime-name="core-ear.ear">
              <content sha1="94fa7e6901c123da5b268b71abff0e23e93b31e6"/>
          </deployment>
          <deployment name="pui-war.war" runtime-name="pui-war.war">
              <content sha1="41706b47047eda73b95af09c60f6d95324982a14"/>
          </deployment>
      
    • commented pui-war.war section:

          <deployments>
          <deployment name="whl-endpoint-war.war" runtime-name="whl-endpoint-war.war">
              <content sha1="cfffa73de5dd20a8346885bbe8fb4b78b3700b77"/>
          </deployment>
          <deployment name="core-ear.ear" runtime-name="core-ear.ear">
              <content sha1="94fa7e6901c123da5b268b71abff0e23e93b31e6"/>
          </deployment>
          <!-- <deployment name="pui-war.war" runtime-name="pui-war.war">
              <content sha1="41706b47047eda73b95af09c60f6d95324982a14"/>
          </deployment> -->
      

    Do the same for each application one by one until the service starts correctly.

  3. Investigate why the application could not be deployed. Once the service is up and running, the deployment can be run manually:

    [root@osscore ~]# bash /usr/local/pem/wildfly-10.1.0.Final/bin/jboss-cli.sh
    You are disconnected at the moment. Type 'connect' to connect to the server or 'help' for the list of supported commands.
    [disconnected /] connect
    [standalone@localhost:9990 /] deploy --force /usr/local/pem/u/bss-war.war
    WFLYCTL0344: Operation timed out awaiting service container stability
    

    The likely reason is a problem with remote services. For example, for the Cloud Infrastructure endpoint it could be an OACI Instance Manager API outage; for the BSS application, unavailability of the Billing database.

  4. After the issues are resolved, deploy the removed applications again and restart the service.
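
The steps above can be partly scripted. The following is a minimal sketch, not a supported tool: the function names and the CONF_DEFAULT and JBOSS_CLI variables are hypothetical, the paths are the ones quoted in this article, and it assumes each deployment element starts on its own line, as in the excerpts. The --connect and --command options are standard jboss-cli.sh options.

```shell
#!/usr/bin/env bash
# Paths as quoted in the steps above; override them if the installation differs.
CONF_DEFAULT=/usr/local/pem/wildfly-10.1.0.Final/standalone/configuration/standalone-full-ha.xml
JBOSS_CLI="${JBOSS_CLI:-/usr/local/pem/wildfly-10.1.0.Final/bin/jboss-cli.sh}"

# Copy the configuration aside before editing it, so that the original
# deployment list can be restored later.
backup_conf() {
    local conf="${1:-$CONF_DEFAULT}"
    cp -p "$conf" "$conf.bak.$(date +%Y%m%d%H%M%S)"
}

# Print the configured blocking timeout. Prints nothing if the property is
# absent, in which case the default of 300 seconds applies.
blocking_timeout() {
    local conf="${1:-$CONF_DEFAULT}"
    sed -n 's/.*<property name="jboss\.as\.management\.blocking\.timeout" value="\([^"]*\)".*/\1/p' "$conf"
}

# Print the names of the active deployments. Entries whose line starts with
# "<!--" (commented out as in step 2) are not matched.
list_deployments() {
    local conf="${1:-$CONF_DEFAULT}"
    sed -n 's/^[[:space:]]*<deployment name="\([^"]*\)".*/\1/p' "$conf"
}

# Redeploy one archive non-interactively (steps 3 and 4).
redeploy() {
    bash "$JBOSS_CLI" --connect --command="deploy --force $1"
}
```

A typical sequence would be backup_conf, then list_deployments to see what is still enabled, and finally redeploy /usr/local/pem/u/bss-war.war once the remote service is reachable again.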
