Changing AEM Sling mappings with cURL

August 4, 2016

At one of our ANDigital client sites I've had the pleasure of working with Adobe Experience Manager, and more specifically, automating some of the workflows involved in managing environments running AEM. One of the specific issues we've had is changing Apache Sling mappings to reflect a unique hostname that's associated with the environment.

In our specific use case, we are using a generic snapshot of an environment with all the required components in it. Being a snapshot, it has entries with a generic hostname set in the Sling mappings - something we need to change whenever a new environment spins up.

The approach we're currently using involves creating a package in AEM CRX with the correct filters (/etc/map.publish in our case), building it, downloading it, changing the hostname, re-packaging it, and finally uploading and installing it back to the AEM instance.

This is a script to do exactly that, with a locally running AEM CQ5.6 Publish instance:

#!/bin/bash
# A script to amend Sling mappings in a locally-running AEM publisher installation

# Get the new hostname from the command line; abort if it is missing
NEW_ENV_HOSTNAME=${1:?"Usage: $0 new-hostname"}

# First, create the package (the URL is quoted so the shell leaves the "?" alone)
curl -u admin:admin -X POST "http://localhost:4503/crx/packmgr/service/.json/etc/packages/sling-mappings.zip?cmd=create" -d packageName=sling-mappings -d groupName=my_packages

# Add a filter in the package to include /etc/map.publish
curl -u admin:admin -X POST -F 'path=/etc/packages/my_packages/sling-mappings.zip' -F 'packageName=sling-mappings' -F 'groupName=my_packages' -F 'version=1' -F 'filter=[{"root":"/etc/map.publish", "rules":[]}]' -F '_charset_=UTF-8' http://localhost:4503/crx/packmgr/update.jsp

# Ask AEM to build the package; this should block until it's done
curl -u admin:admin -X POST "http://localhost:4503/crx/packmgr/service/.json/etc/packages/my_packages/sling-mappings-1.zip?cmd=build"

# Download the package
curl -u admin:admin -o /tmp/sling-mappings-1.zip http://localhost:4503/etc/packages/my_packages/sling-mappings-1.zip

# Unzip the package
unzip /tmp/sling-mappings-1.zip -d /tmp/sling-mappings/

# Replace the hostname in every mapping (dots are escaped so they only match literal dots; -i needs GNU sed)
find /tmp/sling-mappings/ -type f -exec sed -ri "s/old-generic-hostname\.com/${NEW_ENV_HOSTNAME}/g" {} +

# Re-package the zip
cd /tmp/sling-mappings && zip -r /tmp/sling-mappings-2.zip *

# Install the changed package
curl -u admin:admin -F file=@/tmp/sling-mappings-2.zip -F name="sling-mappings" -F force=true -F install=true http://localhost:4503/crx/packmgr/service.jsp
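The hostname replacement is the step most likely to go subtly wrong, because an unescaped dot in a sed pattern matches any character. Here is a minimal sketch of that step as a reusable function. The name replace_hostname is my own invention, and it assumes GNU sed and that both arguments are plain hostnames (letters, digits, hyphens and dots only):

```shell
#!/bin/bash
# Sketch: replace one hostname with another in every file under a directory.
# Assumes GNU sed (for -i) and plain hostnames as arguments.

replace_hostname() {
    local dir=$1 old=$2 new=$3
    local pattern
    # Escape dots so "." matches a literal dot instead of any character
    pattern=$(printf '%s' "$old" | sed 's/\./\\./g')
    # Edit every regular file under the directory in place
    find "$dir" -type f -exec sed -i "s/${pattern}/${new}/g" {} +
}
```

It could be called as `replace_hostname /tmp/sling-mappings old-generic-hostname.com "$NEW_ENV_HOSTNAME"` in place of the find line in the script.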

The approach is definitely less than optimal (some might even say hacky!) - but with my still very limited experience of AEM, it was enough to get us past this blocker. Done is better than perfect.


Continuous Delivery and Continuous Deployment

June 20, 2016

What's the difference between Continuous Delivery and Continuous Deployment? While this can be an old concept for some, it still is a source of frustration for many.

First, you need to know what Continuous Integration is:

Continuous Integration is the practice of testing code changes automatically, and merging the changes that pass into a common master branch. It is a prerequisite for any Continuous Delivery pipeline.

Continuous Delivery is the confidence of knowing your codebase could be deployed to production at any given point in time. It implies you have a way of automatically testing and rejecting changes that would break your production environment.

Automated Continuous Deployment pipeline

Continuous Deployment is taking Continuous Delivery a step further, meaning every valid code change will be deployed to production automatically.
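The difference boils down to what happens after a change has passed its tests. As a rough illustration only (the function and its return values are made up for this sketch, not taken from any CI tool):

```shell
#!/bin/bash
# Sketch: what a pipeline does with a change, depending on the practice in use.
# $1 = "yes" if the automated tests passed, $2 = "delivery" or "deployment"

pipeline_decision() {
    local tests_passed=$1 mode=$2
    if [ "$tests_passed" != "yes" ]; then
        # Both practices reject a change that fails its tests
        echo "reject"
    elif [ "$mode" = "deployment" ]; then
        # Continuous Deployment: every valid change goes to production
        echo "deploy-automatically"
    else
        # Continuous Delivery: the change is deployable, a human decides when
        echo "await-manual-release"
    fi
}
```

In both cases a failing change never reaches production; the only difference is whether a passing change ships on its own.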

Topmost photo: Getty Images


Terraform + Chef Quick Start

April 14, 2016

I wrote an example of how to use Terraform with Chef (specifically chef-solo) to provision an environment in AWS EC2. While Terraform does include a built-in Chef provisioner, it requires a running Chef Server. This example instead ships cookbooks to the nodes with Terraform's file provisioner, and uses a simple bash script to install and run chef-solo.

To run the example, you will need an AWS account with API credentials (Access Key ID and Secret Access Key).

git clone git@github.com:mjuuso/provisioning_example.git
cd provisioning_example
./run.sh <aws_access_key> <aws_secret_key>

The script will make sure you have all the prerequisites for running the example (terraform, curl, ssh and ssh-keygen installed), generate SSH keys, run terraform and then verify that the load balancer works as expected.
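A prerequisite check like the one described can be done with the shell's built-in `command -v`. The following is a minimal sketch of the idea rather than the actual code in run.sh, and the function name missing_commands is hypothetical:

```shell
#!/bin/bash
# Sketch: print the name of every given command that is not on PATH.
# Returns non-zero if at least one command is missing.

missing_commands() {
    local cmd missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd"
            missing=1
        fi
    done
    return "$missing"
}
```

A wrapper script could then bail out early with `missing_commands terraform curl ssh ssh-keygen || exit 1`.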

This will create the following resources in AWS:

two application instances (t2.micro) running a sample Golang application server
a load balancer instance (t2.micro) with nginx proxying requests to the application nodes in round-robin fashion
an SSH key pair (generated by the wrapper script)
two Security Groups -- one for the load balancer, one for the application nodes
a Virtual Private Cloud (VPC)
two subnets within the VPC, in availability zones eu-west-1a and eu-west-1b
an Internet gateway, a routing table and routing table associations

The example could be trivially extended to include a tier of database servers, or even an autoscaling group for the application servers.

For the code, check out https://github.com/mjuuso/provisioning_example.

Find something to improve? Pull requests are very welcome!


What's the most important tool in DevOps?

April 8, 2016

I was asked the other day what I think is the most important tool in my toolbox. Often DevOps is thought to be merely about automating everything - building beautiful pipelines to ship code. Would that make vim the all-around Swiss army knife of DevOps? No.

DevOps is about the ops team attending architecture meetings from the get-go. It's about having the development team participate in the out-of-hours rota. It's about thinking through how the application is deployed, maintained and monitored before a single line of code is written. DevOps is about QA preventing defects proactively, thinking of quality as a feature of the delivery process instead of "finding bugs".

DevOps is about breaking down silos, sharing responsibility and building tools to collaborate effectively. We have a shared goal, after all. We are doing business. We need to build and ship solutions that somebody is willing to pay for. More often than not this means delivering a set of features by a certain date within a limited budget - the axis of evil, if you will. Without a high-performing team that is a beautiful, yet often unattainable goal.

Sure, automating everything is important. Instrumenting and monitoring applications and the delivery process itself - and iteratively improving - is important. The technical competence to choose the best technology for the job in hand is very important.

But the most important tool in DevOps? It's communication.


First look at Jenkins 2.0: Building Docker images

March 31, 2016

For a while now, I've been on the lookout for the CI/CD automation server that best suits our needs at Artirix. We were using ThoughtWorks Go for a while, but found it a bit cumbersome and demanding to maintain. This brought us to the managed ThoughtWorks Snap-CI, which lured us in with a nice interface, cool debugging capabilities (snap-shell) and integration with GitHub. However, there are a few things in Snap that are starting to hurt us in our daily operations. It has no support for building Docker images (the Docker preview was launched 6 months ago, but they've been quiet ever since), and there's no way for us to easily see what our worker nodes are doing. Trying to run a time-critical manual deployment when (unbeknownst to you) all of your workers are stuck running a 30-minute test in branch x of repo y is definitely not fun.

With the Jenkins 2.0 beta released last week, I decided to give it a go and set up pipelines for a couple of our apps that are deployed as Docker images. I have to admit, I haven't previously been much of a Jenkins fan, in part because of the cluttered interface and hacky support for staged pipelines. It seems that 2.0 aims to fix both of these, and it looks very promising.

While it has been available in earlier Jenkins versions, the Pipelines as Code principle is a particularly good step forward promoted by 2.0. Instead of configuring build stages in the tool, they are written in a special Jenkinsfile stored within the repository. Here is a sample Jenkinsfile to build a Docker image:

node {
    stage 'Checkout'
    /* Checkout the code we are currently running against */
    checkout scm

    stage 'Build'
    /* Build the Docker image with a Dockerfile, tagging it with the build number */
    def app = docker.build "artirix/our-app:${env.BUILD_NUMBER}"

    stage 'Test'
    /* We can run tests inside our new image */
    app.inside {
        sh '/app/run_tests.sh'
    }

    stage 'Publish'
    /* Push the image to Docker Hub, using credentials we have set up separately on the worker node */
    app.push 'latest'
}

To use this, you'll need to install the Pipeline and CloudBees Docker Pipeline plugins.

With the new Organization folder type, setup is a breeze. Jenkins traverses a bunch of repos with the GitHub credentials I gave it, and creates a pipeline on the go for every repo and branch it finds a Jenkinsfile in. As this can be triggered by a webhook from GitHub, we can have Jenkins create a pipeline on demand for every branch we create - effectively allowing us to run tests for pull requests automatically. This is important for us, and similar to what we had set up in Snap-CI.

“Jenkins creates a pipeline on demand for every branch we push to Github.”

Understandably, the beta still has a few problems. For some reason I can't seem to save some configuration options for an organization folder, and I was unable to set up a GitHub webhook to rebuild the organization folder on repository changes. However, the new version is definitely a leap forward, making Jenkins 2.0 a very welcome addition to the CI/CD space.

Image: Nastco/Thinkstock


Rethinking infrastructure

February 22, 2016

When I joined Artirix, the company had just started the journey towards Continuous Delivery. A large part of it is the cultural transformation to DevOps: improving workflows and collaboration between Dev, Ops and QA. But in addition to having the best rowers beautifully in sync, we also need a streamlined and leakless boat. Therefore an equally large part of our journey is optimising and modernising our technical environment - patching the holes that slow our boat down.

Identifying problems

One of the largest challenges we have is configuration drift, i.e. the gap between the actual live production configuration state and the state our configuration management tool describes. Ideally those states are identical. But in the real world, the two configuration sets are often wildly different. In our case this is certainly true, and can be attributed to the way we have historically done configuration management. We run Puppet in a masterless configuration with Capistrano, which means that configuration changes are applied manually on demand. This is both slow (Cap rsyncs the full Puppet repository to every node and applies it locally) and non-enforcing. It's too tempting to make a quick change manually, and update configuration management to reflect that later (or never). While a masterless setup might scale better in some circumstances, the benefits just don't outweigh the disadvantages for us.
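To make the idea of drift concrete, here is a minimal sketch that compares the file our configuration management would write with the file actually on a server. It is only an illustration, not how we detect drift in practice; the function name check_drift and both file arguments are hypothetical:

```shell
#!/bin/bash
# Sketch: report configuration drift between a desired and an actual config file.
# $1 = file as configuration management would render it, $2 = file on the server

check_drift() {
    local desired=$1 actual=$2
    # diff prints the differing lines and exits non-zero when the files differ
    if diff -u "$desired" "$actual"; then
        echo "in sync"
        return 0
    fi
    echo "drift detected"
    return 1
}
```

Every manual "quick change" on a server shows up as a diff here until somebody feeds it back into configuration management.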

“At some point it became easier to just SSH into production, change a setting, and go on with your life.”

Creating environments is a particularly painstaking chore for us. It takes days - sometimes even weeks - to create a new project environment. We need to provision and bootstrap new instances, modify and rewrite Puppet manifests to fit this special case, set up monitoring and backups, and add separate environment configuration in about a dozen components. It's tedious, error-prone and repetitive work - in other words, a prime candidate to be automated.

Another problem with our current environments is that by design, each one is a unique snowflake. For our internal and testing environments, we tend to pack every component neatly in one box. This lowers hosting costs, but makes the environment very different from a production environment, which might span dozens of separate instances. This poses a clear problem with a Continuous Delivery pipeline: even if code functions as expected on every environment prior to production, the final step (deployment to production) is still a risky leap into the unknown.

Setting targets

As we embarked on the journey to transform our infrastructure, the first target we set was simple:

“We want to be able to run Chaos Monkey and feel good about it.”

Simply put, Chaos Monkey, developed by Netflix, is a tool that causes random failures in groups of systems. For an organisation that's used to constant firefighting, the change of mindset required for deliberately starting fires is dramatic (even though it's not unheard of for firefighters to moonlight as arsonists). The reasoning for the change, however, is simple: complex systems fail inevitably. If our infrastructure is resilient enough to withstand Chaos Monkey without catastrophic failure, it implies we have reached a certain milestone in the maturity of infrastructural design and configuration management.

We also set some other goals:

We want to kill configuration drift
We want our infrastructure to be versioned, and thus represented as code
We want our environments to be identical across stages
We want Dev and QA to be able to create and destroy environments on demand
We want to save our clients money by using resources more efficiently

Finding solutions

A lot of our problems will be solved with immutable infrastructure. Instead of managing long-lived servers, resources should be considered disposable. Instead of reconfiguring running servers and deploying new versions of software, we should start from scratch whenever possible. To achieve this, our weapon of choice is Docker with Kubernetes. Obviously, this requires refactoring all of our components to work inside containers - a task that is definitely as interesting as it sounds.

For tackling the snowflake-environment problem, we're relying heavily on autoscaling. The fundamental idea is that every environment should start the same, while production environments will automatically scale to match the load imposed on them. This will come with two benefits: our environments are very much alike across stages, and we can introduce cost savings and flexibility for our clients by automatically scaling the environments up or down depending on the load.

For creating environments on demand, we're building a simple tool with 5 basic functions:

Wrap Terraform to create Kubernetes clusters on AWS EC2
Create environments (namespaces) with specific component (Docker image) versions and services in the Kubernetes clusters
Present stdout from containers, and logs gathered by fals
Automate data migration between environments (MySQL/MariaDB, Redis, S3)
Manage hostnames for environments with AWS Route 53

Effectively a hybrid between an internal PaaS and CaaS, this will allow Dev and QA to create disposable environments on their own. Among other cool stuff, it will ultimately allow us to do things like blue/green deployments to production - increasing our confidence in deployment even further.

What will our future look like?

While our transformation is still very much in progress, I see these improvements as a big win for the business as a whole: it will increase our confidence in deployment, bringing us a step closer to the benefits of true Continuous Delivery. It will eliminate error-prone manual work, and decrease time to market for new features and projects - a real competitive advantage.

We're not only patching the holes that slow our boat down. We're adding motors.


Continuous Delivery with the OSv unikernel

February 17, 2016

Following on from the short introduction to unikernels, I wanted to write a few words on how we at Artirix do Continuous Delivery with OSv - a cloud operating system built on the unikernel principles.

We have an API component written in Scala, which has a test and build pipeline in Snap-CI. Fundamentally the component is very straightforward: it talks to a database, and exposes an API to be used by other components. Historically we had deployed this component as a Docker container, so we saw it as a prime candidate for our unikernel experiments.

The pipeline

Triggered on every commit to master, the pipeline we built looks roughly like this:

1. Pull the code from Github
2. Test it
3. Build & assemble a fat jar to be run in the unikernel
4. Launch an EC2 instance with a prebaked OSv image including its REST API component and the JVM
5. Use OSv’s built-in REST API to send the jar to the running instance
6. Again with the REST API, update the OSv boot command line to run our jar on next reboot
7. Create a new AMI (Amazon Machine Image) based on the running unikernel instance
8. Create a new EC2 Launch Configuration using the new AMI
9. Update an EC2 Auto Scaling Group to use the new Launch Configuration
10. Do a rolling update by stepping the desired capacity of the ASG
11. Clean up old AMIs, Launch Configurations, and the intermediate instance we launched on step 4

Ta-da, a deployed unikernel! The runtime for this pipeline, from code commit to finished deployment, is about 17 minutes. This is fairly high, and could be optimised.

Note that our pipeline only deploys the component to an internal environment for now. We could trivially replicate steps 8-10 for any other environment (staging and production) we want to deploy to.
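For illustration, steps 8 and 9 could look roughly like this with the AWS CLI. This is a sketch under assumptions rather than our actual pipeline code: the function name, the "api-" naming scheme and the AWS variable (which lets you swap the real CLI for echo to do a dry run) are all made up for the example.

```shell
#!/bin/bash
# Sketch of pipeline steps 8 and 9 using the AWS CLI.
# AWS defaults to the real "aws" command; set AWS=echo to print the calls instead.
AWS=${AWS:-aws}

switch_asg_to_ami() {
    local ami_id=$1 version=$2 asg_name=$3
    local lc_name="api-${version}"   # hypothetical naming scheme

    # Step 8: create a Launch Configuration that boots the new AMI
    $AWS autoscaling create-launch-configuration \
        --launch-configuration-name "$lc_name" \
        --image-id "$ami_id" \
        --instance-type t2.micro || return 1

    # Step 9: point the Auto Scaling Group at the new Launch Configuration
    $AWS autoscaling update-auto-scaling-group \
        --auto-scaling-group-name "$asg_name" \
        --launch-configuration-name "$lc_name"
}
```

Deploying to another environment would then be a matter of calling the function again with that environment's Auto Scaling Group name, followed by the rolling update of step 10 (for example with `aws autoscaling set-desired-capacity`).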

Our app ships logs to Loggly for later analysis. We've also had success including New Relic in the JVM.

Limitations with our approach

The minimum billing period for a launched AWS EC2 instance is one hour. Therefore, the above pipeline will cost us an extra instance hour on every deployment. Our first thought was to work around this by purchasing a Reserved Instance and paying everything up front - bringing the hourly price to $0. Unfortunately AWS advised us that fully upfront paid Reserved Instances are in fact pay-744-monthly-hours-up-front instances, so when the number of monthly deployments + the number of running hours from beginning of month reaches 744, we'd start paying normal on-demand instance prices. However, as the component runs on a t2.micro for now, the extra billed hour doesn't really matter.
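The arithmetic is easy to check. Taking the 744 prepaid hours (31 days x 24 hours) at face value, and counting one extra instance hour per deployment as described above, a back-of-the-envelope sketch looks like this (the function name is hypothetical):

```shell
#!/bin/bash
# Sketch: how many prepaid Reserved Instance hours are left this month.
# Assumes 744 prepaid hours per month and one extra billed hour per deployment.
PREPAID_HOURS=744   # 31 days x 24 hours

prepaid_hours_left() {
    local running_hours=$1 deployments=$2
    # A negative result means that many hours are billed at on-demand prices
    echo $(( PREPAID_HOURS - running_hours - deployments ))
}
```

For example, `prepaid_hours_left 700 30` prints 14, while `prepaid_hours_left 720 30` prints -6: six hours would be billed at the normal on-demand rate.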

Another thing we've recognised is the need for automated smoke testing, and optional rollback, of new deployments. While this is arguably key to any CD pipeline, it is perhaps even more so in this case: if our unikernel image fails to start for some reason, the Auto Scaling Group will keep retrying indefinitely. Due to the EC2 billing trait described above, this could become costly quite fast.

While our approach certainly has room for improvement, it has allowed us to quickly prototype packaging and deploying an existing application as a unikernel.


Coming your way: Unikernels

February 17, 2016

At Artirix, our DevOps team is constantly working on improving our infrastructure. I admit, we are selfish human beings - we want to do less firefighting, never wake up to 4am PagerDuty incidents again, and stop doing repetitive work. Luckily, though, all this comes with the added benefit of increased customer value.

We are currently running a somewhat traditional setup in AWS (pictured above). While we agree that containers are the next logical step to take, we’ve had increasing interest in looking one step ahead: unikernels. We see this as a way to further simplify our infrastructure while making it more resilient.

Unikernels are operating systems that are custom-built to run a single application directly on the hypervisor.

So, what are unikernels anyway? Start by thinking about your typical server in the cloud. It's running your application, along with a plethora of supporting services enabled by default. Everything is stacked on top of an operating system that is designed to support a wide variety of platforms, devices and software. Therefore, by design, the operating system contains a lot of components that make absolutely no sense in the cloud. When was the last time you needed a mouse (or a floppy) driver module loaded into your kernel?

In a sense, the rise of containerisation only makes this worse. Consider the following stack for running a containerised application in AWS:

It may seem overly complex - probably because it is. When deploying a carelessly built container, you are effectively installing another operating system on top of an operating system. Now, consider the same application running as a unikernel:

The beauty of unikernels is simplicity. Only the things your application needs are baked into the operating system. There are no floppy or mouse drivers, unless you explicitly decided to build them in. Your application has only the necessary components to fulfil its function and run as a virtual machine directly on the hypervisor.

A unikernel contains the least amount of moving parts needed to do the job.

Needless to say, this has serious security implications as well. The attack surface of a unikernel is typically only a fraction of what a containerised application running on, say, Ubuntu Linux would have. By requiring you to explicitly build in functionality, unikernels provide exceptional visibility into what your stack is actually running.

There are several promising projects around unikernels. We’ve experimented mainly with OSv and rumprun. Take a look at MirageOS, LING, HaLVM and Clive as well.

While unikernels are certainly interesting, there are many obstacles to work around before they reach the maturity required for widespread usage. However, with the recent purchase of Unikernel Systems by Docker, we might see quick developments in unikernel monitoring, debugging and deployment - which are all needed to efficiently utilise the technology.

We’ve started a Meetup group in London for people interested in Unikernels. Become a member and check out the next meetup at http://www.meetup.com/London-Unikernels-Meetup/

p.s. Just kidding, the cover picture is not really of our AWS setup. We are a bit more modern than that. The picture is of ENIAC, built in the 1940s, courtesy of the U.S. Army.
