Tuesday, 21 May 2013

A short story about fish



About six months ago my team embarked on an initiative to upgrade our Continuous Integration (CI) & automation pipeline. For an iOS team, already having a comprehensive suite of automated functional tests hooked up to Jenkins was impressive and gave us a solid foundation. However, our setup had started to creak: inexplicable test failures, brittle tests, and failing tests that miraculously worked if we gave them a swift kick in the nether regions and ran them again! We decided to evolve to a 'CI 2.0' that would be a lot more stable, massively reduce the time we spent nursing our tests and ultimately give us better confidence in our system. We are not done yet, but we have re-learned a number of basics along the way.

Lesson 1: with CI & automation, fast feedback is king
Our monolithic test suites took hours to run. This discouraged developers from running them on every check-in, which is counterproductive: not running them increases the chances of breaking functionality or introducing defects with each commit.

To achieve faster feedback we have split our tests into smaller test suites, each with its own Jenkins job, which means we now start to get feedback in tens of minutes rather than hours.
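A minimal sketch of how such a split might be balanced, assuming a rough duration is known for each test. The function name, test names and timings are made up for illustration, and nothing here reflects the actual Jenkins jobs. It assigns the longest tests first, each to whichever suite is currently shortest.

```python
import heapq


def split_into_suites(durations, suite_count):
    """Greedily spread tests over suites: longest test first, always into
    the suite with the smallest running total."""
    if suite_count < 1:
        raise ValueError("suite_count must be at least 1")
    # Heap entries are (total minutes so far, suite index, test names).
    # The unique suite index breaks ties, so the lists are never compared.
    heap = [(0.0, index, []) for index in range(suite_count)]
    heapq.heapify(heap)
    longest_first = sorted(durations.items(), key=lambda item: (-item[1], item[0]))
    for name, minutes in longest_first:
        total, index, names = heapq.heappop(heap)
        names.append(name)
        heapq.heappush(heap, (total + minutes, index, names))
    # Return the suites in suite-index order as (total minutes, test names).
    return [(total, names) for total, _, names in sorted(heap, key=lambda e: e[1])]


# Hypothetical tests and durations in minutes.
durations = {"login": 30, "checkout": 45, "search": 20, "settings": 10, "onboarding": 25}
suites = split_into_suites(durations, 3)
for total, names in suites:
    print(f"{total:.0f} min: {', '.join(names)}")
```

With these made-up numbers a single run would take 130 minutes, while the longest of the three suites takes 45, so each job can report back much sooner.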

Next steps:
We want to set up a grid system to run the tests in parallel and further reduce the run time.

Lesson 2: Responsibility

When you don't have fast feedback, you also get into a situation where breakages can't easily be traced back to a single code commit, as the 'window of culpability' is open for a few hours or even half a day. Anyone who commits in that period could be at fault. Who does the investigation to see what caused the breakage? Without traceability it's often a case of thinking 'someone else will look into the failing tests', which probably means no one will!

We introduced the concept of a 'Weatherman': someone who is responsible for investigating any breakages and getting them fixed as a priority. This role is rotated on a daily basis so it doesn't become too much of a chore for any one individual.
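A daily rotation like this is easy to compute rather than maintain by hand. The sketch below is purely illustrative: the function name, the team members and the start date are hypothetical, and it assumes the rotation only advances on working days, with nobody on duty at weekends.

```python
from datetime import date, timedelta


def weatherman_for(day, team, start):
    """Return who is on duty on `day`, moving to the next person each
    working day since `start`. Returns None at weekends (an assumption)."""
    if not team:
        raise ValueError("team must not be empty")
    if day < start:
        raise ValueError("day is before the rotation started")
    if day.weekday() >= 5:  # Saturday is 5, Sunday is 6
        return None
    # Count the working days from start up to, but not including, day.
    working_days = sum(
        1
        for offset in range((day - start).days)
        if (start + timedelta(days=offset)).weekday() < 5
    )
    return team[working_days % len(team)]


# Hypothetical team and start date (a Monday).
team = ["Asha", "Ben", "Chen"]
start = date(2013, 5, 20)
print(weatherman_for(date(2013, 5, 21), team, start))
```

Counting only working days means a weekend does not silently skip anyone's turn.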

This made breakages more visible and helped change the culture of the team, so that quality, and the visibility of quality, became a daily priority.

Lesson 3: Granular feedback

I mentioned already that we broke our tests into smaller suites. That is great for fast feedback, but it also gives us granular feedback, which helps when trying to pinpoint breakages. Also, if you have one failing suite you still have confidence in all the other passing suites. With the monolithic test suite we didn't have that confidence.

Lesson 4: Keep your tools sharp & get new ones when you have to

We were using Apple's UI automation tool, but it started to show signs of flakiness and incompatibility with what we wanted to do. You would expect Apple to produce a solid tool, but further investigation showed that they don't really invest much in UI automation, which meant we had to start considering other tools.

We have now started to use Frank for automated acceptance testing, which has led us down a behaviour driven development (BDD) path. (Bonus!)

Lesson 5: Appreciate legacy code

Appreciate when your legacy code is giving you value and not costing you much - you don't always have to migrate everything.

We continue to run our stable legacy tests in parallel with our shiny new Frank automation tests. The legacy tests don't cost us much time in maintenance and still give us good value. It would take months to fully migrate them, so we won't bother unless we have a compelling case for making that investment.

Lesson 6: Adopt a culture of quality

As mentioned previously, the concept of a weatherman, who gives the team its daily weather report, has massively helped change the culture of the team. Quality is everyone's responsibility, but on any given day it is even more so the weatherman's responsibility!

Having experienced the pain of brittle test packs, slow feedback, poor tools etc., the team now knows the value of staying on top of our CI and automation packs and ensuring we have the speed and feedback that we need.

Conclusion 

That's our story. We are more than halfway to having a slick CI pipeline. The improvement we have made is fantastic, bearing in mind that all this was done on top of regular sprint work and deliveries. It took great commitment from the team to improve the culture, environment and toolset.
Previously we were getting some of the value of CI & automation with a considerable maintenance cost. Now we are getting a lot more benefit with a lot less maintenance. Happy days.

Finally, to make sure nobody leaves disappointed... here is a picture of some lovely fish!