Information about Red Hat live kernel patch (kpatch)

Kernel live patching has been around in Linux distributions in various forms since about 2010.

Besides Oracle’s Ksplice, there have been other solutions, and people have been using these live patching capabilities for some time.

However, as a Red Hat employee, I have always been rather skeptical about its safety.

Recently a customer asked me for more detailed information, which gave me a chance to do a bit of research on this topic.

First, Red Hat’s kpatch:
https://access.redhat.com/solutions/2206511

  • It has been formally released and supported since RHEL 8.1 and RHEL 7.7, and on RHEL 7.6 starting with kernel-3.10.0-957.35.1.el7.
  • Red Hat does not provide a kpatch for every kernel fix; live patches are available only for selected Important and Critical CVEs.
  • Kpatch patches are cumulative. – You can’t pick and choose! – It means that when you get a new live kernel patch for the kernel, it will have all the fixes of the previous live kernel patch, along with the new fixes. You can safely upgrade the loaded live kernel patch to a newer version.
  • Starting with RHEL 8.5 and kernel-3.10.0-1160.45.1.el, kernels will receive live kernel patches for 6 months. Therefore customers will need to upgrade the kernel and reboot at least twice per year.

How does kpatch work?

If you’re running a kernel version that supports it, you can (and should) take advantage of live kernel patching. This code execution method works alongside kernel probes and function tracing. Instead of relying on redirection using a breakpoint for kernel probes or a predefined location (in the case of function tracing), live patching is generally done by redirecting the code as close to the function entry as possible.

This new method allows a function to be redirected immediately through an ftrace handler, so instead of calling an older, vulnerable function, the kernel is redirected to a patched version of the function.

To reiterate:
The kpatch kernel patching solution uses the livepatch kernel subsystem to redirect old functions to new ones. When a live kernel patch is applied to a system, the following things happen:

  1. The kernel patch module is copied to the /var/lib/kpatch/ directory and registered for re-application to the kernel by systemd on next boot.
  2. The kpatch module is loaded into the running kernel and the new functions are registered to the ftrace mechanism with a pointer to the location in memory of the new code.
  3. When the kernel accesses the patched function, it is redirected by the ftrace mechanism, which bypasses the original function and redirects the kernel to the patched version of the function.
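
One way to see the result of steps 2 and 3 on a running system is to look at what the livepatch subsystem exposes in sysfs. Below is a minimal sketch, assuming the kernel provides /sys/kernel/livepatch/<patch>/enabled and /sys/kernel/livepatch/<patch>/transition (the upstream livepatch sysfs layout). The function name list_livepatches and its optional directory argument are my own, added only so the function can be pointed at a test directory.

```shell
#!/bin/sh
# Print every live patch registered with the livepatch subsystem,
# together with its "enabled" and "transition" flags.
# $1 (optional): sysfs directory to scan; defaults to the real one.
list_livepatches() {
    dir="${1:-/sys/kernel/livepatch}"
    for patch in "$dir"/*/; do
        # Skip when the glob matched nothing or the entry is not a patch.
        [ -r "${patch}enabled" ] || continue
        name=$(basename "$patch")
        enabled=$(cat "${patch}enabled")
        # "transition" is 1 while tasks are still switching to the new code.
        transition=$(cat "${patch}transition" 2>/dev/null || echo "?")
        printf '%s enabled=%s transition=%s\n' "$name" "$enabled" "$transition"
    done
}

list_livepatches
```

On a system without any live patch loaded this prints nothing. On RHEL, the kpatch list command reports the loaded and installed patch modules as well.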

What are the differences between kpatch and other live kernel patching solutions?

For kpatch vs kGraft, there was a detailed discussion at the Linux Plumbers Conference in 2014.

https://blog.linuxplumbersconf.org/2014/wp-content/uploads/2014/10/LPC2014_LivePatching.txt

More to read:

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html-single/managing_monitoring_and_updating_the_kernel/index#applying-patches-with-kernel-live-patching_managing-monitoring-and-updating-the-kernel

Building an execution environment in a disconnected environment

Today’s post is just a link that I want to remember.

Below is a great summary of the issue and how it can be resolved when you try to build an ansible execution environment in a disconnected environment.

https://cloudautomation.pharriso.co.uk/post/ansible-builder-disconnected/

Creating an ansible execution environment with a container image from a container repository with a self-signed certificate

When you try to build an ansible execution environment, you may need to use a container repository with a self-signed certificate.

This will fail with the following error;

.....
ERROR! Unknown error when attempting to call Galaxy at 'https://<URL>/api': <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1129)>
Error: error building at STEP "RUN ANSIBLE_GALAXY_DISABLE_GPG_VERIFY=1 ansible-galaxy collection install $ANSIBLE_GALAXY_CLI_COLLECTION_OPTS -r requirements.yml --collections-path "/usr/share/ansible/collections"": error while running runtime: exit status 1

This can be resolved by adding ANSIBLE_GALAXY_CLI_COLLECTION_OPTS: "--ignore-certs" to build_arg_defaults:

[svc_aap_install@AGAEALP3001264 ee-infoblox-build]$ cat execution-environment.yml 
---
version: 1

build_arg_defaults:
  EE_BUILDER_IMAGE: '<URL>/ansible-builder-rhel8'
  EE_BASE_IMAGE: '<URL>/ee-supported-rhel8'
  ANSIBLE_GALAXY_CLI_COLLECTION_OPTS : "--ignore-certs"

Testing a newly created execution environment

Based on previous posts, you probably have created an ansible execution environment.

After you have created an execution environment, how do you test it?

The new Ansible CLI tool for running a playbook is called “ansible-navigator”.
It is more than a replacement for the ansible-playbook binary; it has additional features such as:

  • Review and explore available collections
  • Review and explore current Ansible configuration
  • Review and explore Ansible documentation
  • Review execution environment images available locally
  • Review and explore an inventory
  • Run and explore a playbook

To test the newly created execution environment, this is what you need to run:

$ ansible-navigator run --eei localhost/<newly created EE name> --pp never <test ansible playbook>.yml

The important option above is “--pp never”, which sets the pull policy to “never”.
It means that, since the image is already on the localhost, ansible-navigator should not try to download it.
If you forget the option, you will see the following error:

Trying to pull localhost/<new ee name>:latest...
WARN[0000] failed, retrying in 1s ... (1/3). Error: initializing source docker://localhost/<new ee name>:latest: pinging container registry localhost: Get "https://localhost/v2/": dial tcp 127.0.0.1:443: connect: connection refused
WARN[0001] failed, retrying in 1s ... (2/3). Error: initializing source docker://localhost/<new ee name>:latest: pinging container registry localhost: Get "https://localhost/v2/": dial tcp 127.0.0.1:443: connect: connection refused
WARN[0002] failed, retrying in 1s ... (3/3). Error: initializing source docker://localhost/<new ee name>:latest: pinging container registry localhost: Get "https://localhost/v2/": dial tcp 127.0.0.1:443: connect: connection refused
Error: initializing source docker://localhost/<new ee name>:latest: pinging container registry localhost: Get "https://localhost/v2/": dial tcp 127.0.0.1:443: connect: connection refused
[ERROR]: Execution environment pull failed

Also, ansible-navigator runs in its interactive “TUI” mode by default rather than printing output and errors to stdout. If you would like to run ansible-navigator in stdout mode, just add the --mode option:

$ ansible-navigator run --eei localhost/<newly created ee> --pp never <test ansible>.yml --mode stdout
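
The pull-policy rule above can be captured in a small wrapper. This is only a sketch: the helper names pull_policy_for and navigator_cmd, and the image and playbook names in the example call, are made up. The --eei, --pp and --mode options are the ones used in the commands above; falling back to “missing” (another ansible-navigator pull-policy value) for non-local images is my own choice. The script only prints the command rather than running it.

```shell
#!/bin/sh
# Pick a pull policy from the image name: images under localhost/ only
# exist in local storage, so they must never be pulled.
pull_policy_for() {
    case "$1" in
        localhost/*) echo never ;;
        *)           echo missing ;;
    esac
}

# Print the ansible-navigator command line for an image and a playbook.
navigator_cmd() {
    image="$1"
    playbook="$2"
    echo "ansible-navigator run --eei $image --pp $(pull_policy_for "$image") $playbook --mode stdout"
}

navigator_cmd localhost/my-test-ee test-playbook.yml
```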

Prep’ng a RDS Database instance for Satellite 6 installation

This is a note for myself on what had to be done to prepare an RDS database for use with a Red Hat Satellite 6 installation.

UPDATE:
Currently, Red Hat Satellite with RDS doesn’t work, for 2 reasons:

  • For Red Hat Satellite 6.10, PostgreSQL 12.1 is the only version supported as an external database:
    https://access.redhat.com/documentation/en-us/red_hat_satellite/6.10/html/installing_satellite_server_from_a_disconnected_network/performing-additional-configuration#postgresql-as-an-external-database-considerations_satellite
    When I tried the installation, the oldest PostgreSQL version available on RDS was 12.5.
  • The installer requires the “rh-postgresql12-postgresql-evr” package.
    Installation fails because it needs an rh-postgresql12-postgresql-evr package whose version matches the RDS instance, and no such package exists.

postgres=> GRANT foreman to postgres;
GRANT ROLE
postgres=> GRANT candlepin to postgres;
GRANT ROLE
postgres=> GRANT pulp to postgres;
GRANT ROLE
  • Create databases

Ansible Automation Platform – What is Ansible Automation Execution Environment i.e. EE?

With Red Hat’s Ansible Automation Platform 2.x, one of the big changes is the introduction of the Ansible Execution Environment.

Then questions arise:
* What is an Ansible Execution Environment?
* What is it for?

What is an Ansible Automation Execution Environment?

Below is a simple diagram summarising what it is.

High level overview of Automation Execution Environment

It is an optimised container environment that contains required “binaries”, “python+other Libraries” and ansible collections to execute an Ansible playbook(s).

Business/Technical Problems to solve:
– Provide a simplified & consistent execution environment to enhance the automation development experience.

When developers write automation in their own environments and share their Ansible playbooks with other team members, the automation experience can differ considerably from one person’s environment to another (e.g. developing Ansible playbooks on a Mac vs Linux).

By creating and using the Ansible Automation Execution Environment, it provides the same development/execution experience.

– Multiple Python environments to manage, which creates maintenance overhead.

One of the main struggles was that Ansible Tower users needed more and more Python virtual environments as the number of users or use cases increased, e.g. use cases requiring Python 2.7 vs Python 3.x, or modules requiring specific versions of Python libraries.

As a result, multiple Python virtual environments were created within Ansible Tower, and in a cluster of Ansible Tower nodes the administrator had to ensure that all nodes had exactly the same Python virtual environment configuration.

Ansible Automation Platform Execution Environment & tzdata

A colleague and I were testing the migration of a ServiceNow system-provisioning Ansible workflow from Ansible 2.9 to Ansible Automation Platform 2.x when we hit an issue:

No such file or directory: '/usr/share/zoneinfo/zone.tab

My immediate thought was that the “tzdata” package is not installed on the UBI.

So I added “tzdata” to bindep.txt and rebuilt the EE image, but that didn’t help: the build reported that there was nothing to install. In other words, the package is already installed but the file is missing. I confirmed this with an “append” step in execution-environment.yml:

[3/3] STEP 6/6: RUN ls -la /usr/share/zoneinfo/zone.tab
ls: cannot access '/usr/share/zoneinfo/zone.tab': No such file or directory
Error: error building at STEP "RUN ls -la /usr/share/zoneinfo/zone.tab": error while running runtime: exit status 2

To get around this issue, I used an append step to reinstall tzdata with microdnf, as below:

$ cat execution-environment.yml
---
version: 1
dependencies:
  galaxy: requirements.yml
  system: bindep.txt

additional_build_steps:
  prepend:
  append:
    - RUN microdnf reinstall -y tzdata
    - RUN ls -la /usr/share/zoneinfo/zone.tab

[3/3] STEP 6/7: RUN microdnf reinstall -y tzdata
Downloading metadata...
Downloading metadata...
Downloading metadata...
Downloading metadata...
Downloading metadata...
Package                                Repository       Size
Reinstalling:
 tzdata-2021e-1.el8.noarch             ubi-8-baseos 485.0 kB
   replacing tzdata-2021e-1.el8.noarch
Transaction Summary:
 Installing:        0 packages
 Reinstalling:      1 packages
 Upgrading:         0 packages
 Obsoleting:        0 packages
 Removing:          0 packages
 Downgrading:       0 packages
Downloading packages...
Running transaction test...
Reinstalling: tzdata;2021e-1.el8;noarch;ubi-8-baseos
Complete.
--> 0a9e9e0a1ed
[3/3] STEP 7/7: RUN ls -la /usr/share/zoneinfo/zone.tab
-rw-r--r--. 1 root root 19419 Sep 20 16:34 /usr/share/zoneinfo/zone.tab
[3/3] COMMIT servicenow-ee-29

Ansible Automation Platform – developer high-level workflow

Recently I had to explain to a customer what is required to develop Ansible playbooks with Ansible Automation Platform 2.x.

Here is a high-level workflow diagram that I drew;

Ansible Automation Platform – developer high-level workflow

In short, when you are writing a playbook and testing it, you need the following components:

  • Ansible IDE tool – my current favourite is VSCode, because it has so many nice extensions and Red Hat has recently released an Ansible extension for it (the VSCode Ansible extension)
  • Ansible-Core – the command line tool, the language and framework that makes up the foundational content before you bring in your customized content.
  • Ansible-Builder – to build execution environments
  • Ansible-navigator – to run, test playbooks with execution environments

If you haven’t built an execution environment yet, that is the very first thing you need to do.

The 4 files that you need to create are:

  • bindep.txt – Bindep is a tool for checking the presence of binary packages needed to use an application / library, so whatever is defined in this file will be installed.
  • requirements.txt – The Python requirements file, installed with pip install -r …
  • requirements.yml – Outlines the Ansible collection requirements for galaxy to download and include in the execution environment.
  • execution-environment.yml – The definition file that ansible-builder takes as input to produce the build context necessary for creating an Execution Environment image.
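
To make the four files concrete, here is a minimal sketch that scaffolds them into a directory. The function name scaffold_ee is my own, and the package and collection names (tzdata, requests, community.general) are placeholders rather than recommendations; the execution-environment.yml follows the version 1 layout used elsewhere in these notes.

```shell
#!/bin/sh
# Create the four input files for ansible-builder in directory $1.
# All package and collection names below are placeholders.
scaffold_ee() {
    dir="$1"
    mkdir -p "$dir" || return 1

    # System packages (bindep syntax: package name plus platform selector).
    printf '%s\n' 'tzdata [platform:rpm]' > "$dir/bindep.txt"

    # Python packages, installed with pip.
    printf '%s\n' 'requests' > "$dir/requirements.txt"

    # Ansible collections, installed with ansible-galaxy.
    cat > "$dir/requirements.yml" <<'EOF'
---
collections:
  - name: community.general
EOF

    # Definition file (version 1 schema) tying the three files together.
    cat > "$dir/execution-environment.yml" <<'EOF'
---
version: 1
dependencies:
  galaxy: requirements.yml
  python: requirements.txt
  system: bindep.txt
EOF
}

# Example: scaffold_ee ./my-ee && cd ./my-ee && ansible-builder build --tag my-ee
```

The build itself is left as a comment because it needs ansible-builder and a container runtime; the directory and tag name my-ee are only examples.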

Detailed examples can be found in:
https://www.ansible.com/blog/introduction-to-ansible-builder
https://ansible-builder.readthedocs.io/en/latest/

Once the required execution environment is ready, it can be shared with your colleagues, so that everyone works in a consistent environment and collaboration becomes easier.

You can now also start developing an Ansible playbook.

Finally, once you are happy with the playbook and the execution environment, they should be uploaded and managed in the appropriate systems:

  • playbooks – Source Control Management Systems, e.g. github, gitlab….
  • EE image – a container registry, e.g. Automation hub, Quay.io, artifactory…

Then those can be properly leveraged by Ansible Automation Platform.

Ansible Automation Platform (AAP) 2.1 – released

AAP 2.1 was finally released last week.
Here are the release notes: https://access.redhat.com/documentation/en-us/red_hat_ansible_automation_platform/2.1/html/red_hat_ansible_automation_platform_release_notes/index
Here is a blog post from Red Hat: https://www.ansible.com/blog/introducing-red-hat-ansible-automation-platform-2.1

To recap, some of the highlights are:

What’s included in AAP 2.1 – https://access.redhat.com/documentation/en-us/red_hat_ansible_automation_platform/2.1/html/red_hat_ansible_automation_platform_release_notes/platform-introduction#whats-included

Automation Mesh:
This is the newest addition to Ansible Automation Platform, and replaces the isolated nodes feature in 1.2. By combining automation execution environments in version 2.0 with automation mesh in version 2.1, the automation control plane and execution plane are fully decoupled, making it easier to scale automation across the globe. You can now run your automation as close to the source as possible, without being bound to running automation in a single data center. With automation mesh, you can create execution nodes right next to the source (for example, a branch office in Johannesburg, South Africa) while execution is deployed on our automation controller in Durham, NC.

Automation mesh adds:

  • Dynamic cluster capacity. You can increase the amount of execution capacity as you need it.
  • Global scalability. The execution plane is now resilient to network latency and connection interruptions and improves communications.
  • Secure automation. Bi-directional communication between execution nodes and control nodes that include full TLS authentication and end-to-end encryption. 

Satellite 6 and its partitions

This is a note for myself to remember the partitions and recommended sizing that I have been using. The partition table below was created to be in line with various security benchmarks, e.g. CIS/Essential 8.

OS partitions - 75 - 80 G required

/boot - 1 GiB
/ - 15 GiB
/home - 10 GiB
/tmp - 5 GiB
/usr - 15 GiB
/var - 5 GiB
/var/log - 10 GiB
/var/log/audit - 10 GiB
/var/tmp - 5 GiB
swap - 2 G

Satellite partitions - 650 G - 1 TB required

/var/cache/pulp/ - 20 GiB
/var/lib/pulp/ - 480 GiB
/var/lib/mongodb/ - 60 GiB
/var/opt/rh/rh-postgresql12 - 20 GiB
/var/spool/squid/ - 15 GiB

NOTE: From Red Hat Satellite 6.10, Pulp 3 is used; MongoDB is being retired and its data is consolidated into PostgreSQL.

So from 6.10, the recommendations will be to increase PostgreSQL to 45G and remove /var/lib/mongodb.
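
As a quick sanity check of the figures above, the mount point sizes can simply be added up. This is a minimal sketch: the helper name sum_gib is my own, and swap (listed as 2 G) is counted as 2 GiB for simplicity.

```shell
#!/bin/sh
# Add up a list of sizes (in GiB) given as arguments.
sum_gib() {
    total=0
    for size in "$@"; do
        total=$((total + size))
    done
    echo "$total"
}

# OS mount points: /boot / /home /tmp /usr /var /var/log /var/log/audit /var/tmp swap
echo "OS total:            $(sum_gib 1 15 10 5 15 5 10 10 5 2) GiB"
# Satellite before 6.10: pulp cache, pulp, mongodb, postgresql, squid
echo "Satellite (< 6.10):  $(sum_gib 20 480 60 20 15) GiB"
# Satellite 6.10 and later: mongodb removed, postgresql raised to 45
echo "Satellite (>= 6.10): $(sum_gib 20 480 45 15) GiB"
```

The OS mount points come to 78 GiB, within the 75 - 80 G stated above. The Satellite mount points add up to 595 GiB before 6.10 and 560 GiB from 6.10 onwards, both below the 650 G lower bound quoted above.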