Ansible tips’n’tricks: checking if a systemd service is running

I have been working on an Ansible playbook to update Oracle’s Trace File Analyzer (TFA). If you have been following this blog over the past few months you might remember that I’m a great fan of the tool! Using Ansible makes my life a lot easier: when deploying a new system I can ensure that I’m also installing TFA. Under normal circumstances, TFA should be present when the (initial) deployment playbook finishes. At least in theory.

UPDATE 230216: TFA has been merged into Autonomous Health Framework (AHF), and the remainder of this post has been adapted to the new naming. TFA is part of AHF; it didn’t go away. If anything, it got better! The service to check for is still called oracle-tfa.service, so from that point of view nothing changed.

Check, don’t trust

As we know, life is what happens when you’re making other plans, and I’d rather check whether AHF is installed/configured/running before trying to upgrade it.

I have considered quite a few different ways to do this but in the end decided to check for the oracle-tfa.service: if the service is present, TFA (and AHF) must be as well. There are probably other ways, maybe better ones, but this one works for me.

Checking for the presence of a service

Ansible has shipped a module called service_facts since version 2.5 to facilitate working with services. I also tried the setup module but didn’t find what I needed in its output. Consider the following output, generated when gathering service facts:

TASK [get service facts] *******************************************************
 ok: [192.168.56.11] => {
     "ansible_facts": {
         "services": {
             "NetworkManager-wait-online.service": {
                 "name": "NetworkManager-wait-online.service", 
                 "source": "systemd", 
                 "state": "stopped"
             }, 
             "NetworkManager.service": {
                 "name": "NetworkManager.service", 
                 "source": "systemd", 
                 "state": "running"
             }, 
             "auditd.service": {
                 "name": "auditd.service", 
                 "source": "systemd", 
                 "state": "running"
             }, 

[ many more services ]

            "oracle-tfa.service": {
                 "name": "oracle-tfa.service", 
                 "source": "systemd", 
                 "state": "running"
             }, 

[ many more services ]

This looks ever so slightly complicated! And indeed, it took a little while to understand the syntax for accessing a single service. My first attempts were anything but successful.

Getting the syntax right

Thankfully I wasn’t the only one with the problem, and with a little bit of research I ended up with this code (tested on Ubuntu 22.04 with the distribution-provided Ansible 2.10):

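---
- hosts: databases
  tasks:
    # populates ansible_facts.services with one entry per service
    - name: get service facts
      ansible.builtin.service_facts:

    # the key contains dots and a hyphen, hence the bracket notation
    - name: try to work out how to access the service
      ansible.builtin.debug:
        var: ansible_facts.services["oracle-tfa.service"]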

Awesome! When running this on a system with AHF/TFA installed, it works quite nicely:

TASK [try to work out how to access the service] *******************************
 ok: [192.168.56.11] => {
     "ansible_facts.services[\"oracle-tfa.service\"]": {
         "name": "oracle-tfa.service", 
         "source": "systemd", 
         "state": "running"
     }
 }
 

 PLAY RECAP *********************************************************************
 192.168.56.11            : ok=3    changed=0    unreachable=0    failed=0

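The same code does not find the service on a system without AHF/TFA installed. Note that the debug task itself still reports ok; it merely tells me that the variable is undefined: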

TASK [try to work out how to access the service] *******************************
 ok: [192.168.56.11] => {
     "ansible_facts.services[\"oracle-tfa.service\"]": "VARIABLE IS NOT DEFINED!
      'dict object' has no attribute 'oracle-tfa.service'"
 }
 

 PLAY RECAP *********************************************************************
 192.168.56.11            : ok=3    changed=0    unreachable=0    failed=0
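
The message above is what a lookup of a missing dictionary key produces. For display purposes only, a minimal sketch (assuming the same databases group, and using Jinja2's default filter, which is my addition and not part of the original playbook) prints the service state or a placeholder instead:

```yaml
---
- hosts: databases
  tasks:
    - name: get service facts
      ansible.builtin.service_facts:

    # default({}) turns a missing service entry into an empty dict,
    # the second default() supplies a placeholder for the missing state
    - name: show the service state, or a placeholder if the unit is absent
      ansible.builtin.debug:
        msg: "{{ (ansible_facts.services['oracle-tfa.service'] | default({})).state | default('not installed') }}"
```

The placeholder text 'not installed' is arbitrary. This only makes the output friendlier; it does not stop the play.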

Now the trick is to ensure that I’m not referencing an undefined variable. This isn’t too hard either. Here is a usable playbook:

---
- hosts: databases
  tasks:
    - name: get service facts
      ansible.builtin.service_facts:

    # stop the play if the service is unknown to systemd
    - name: check if the TFA service is present
      ansible.builtin.fail:
        msg: Tracefile Analyzer is not present, why? It should have been there!
      when: ansible_facts.services["oracle-tfa.service"] is not defined

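The play gathers service facts first and then tests for the presence of oracle-tfa.service. Strictly speaking this only proves that the service exists, not that it is running; the state field shown in the output earlier holds that information. I deliberately fail the upgrade process to make the user aware of a situation that should not have happened.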
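
If the service must not only exist but also be running, the state field from the service facts can be added to the condition. The following is a sketch under the same assumptions as above (databases group, oracle-tfa.service as the service name); the message text and the combined condition are my own wording, not something taken from the AHF documentation:

```yaml
---
- hosts: databases
  tasks:
    - name: get service facts
      ansible.builtin.service_facts:

    # the "not in" test comes first so that the state lookup is
    # only evaluated when the service entry actually exists
    - name: fail unless the TFA service is present and running
      ansible.builtin.fail:
        msg: oracle-tfa.service is missing or not running
      when: >-
        'oracle-tfa.service' not in ansible_facts.services
        or ansible_facts.services['oracle-tfa.service'].state != 'running'
```

The folded block scalar (>-) keeps the condition readable and avoids YAML quoting trouble with an expression that starts with a quote character.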

Hope this helps!