“Why do we have to test in production?”

…is a question your management team or others may ask, and it is understandable. Most people older than 35 worked for many years under the golden rule ‘never test in production’, and they have an expectation of a ‘full end-to-end test environment’.

In fact, in some cases ‘end-to-end testing’ has always involved a mix of test and production environments, particularly where a development uses data feeds from different sources.

These days, with more of the tech stack in distributed systems and in ‘the cloud’, a ‘full test environment’ is less likely to be available. The reasons for this are:

  • with more businesses taking advantage of cheaper ‘cloud’ licenses for software applications, those license arrangements may not include access to test environments
  • more businesses run tech stacks of multiple integrated systems and data feeds, rather than a vertically integrated single-application system with full control over all the functions and a hosted test environment
  • maintaining up-to-date, synchronised test environments for all of the integrated applications and data feeds can be prohibitively costly, to the point of being impractical. For example, vendors might not all have the resources or the will to build and maintain test environments or to configure them for your setup, or your organisation might not have the resources or expertise to configure them at the user end, set up test data, etc.

Plus, even if your organisation is big enough to have awesome, fully-connected and maintained test environments over the whole stack (a big if), and even if you are able to run a ‘full end-to-end test’ in that environment, you will still have to test in production as well. There are two main reasons for this:

  1. Given the complexity of the tech stack – multiple apps and data feeds, plus proliferating features and modifications in the applications themselves – the way all these elements interact is increasingly complex and hard to predict. Your test environment is unlikely to replicate the complexity in production. That is, “not all failures can be replicated in test”.
  2. Depending on the system and on your organisation’s setup, it might not be possible to ‘import’ configurations and linkages from test to production. Instead, they might have to be ‘copied’ across (set up again in production) after a successful test run. Of course the configurations should be checked after setup, but even once those checks are done, a production test is still needed to confirm that all is well.
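
The ‘check the configurations after setup’ step in point 2 can sometimes be partly automated. Here is a minimal sketch, assuming both environments can export their settings as simple key/value pairs (many systems cannot, so treat that as an assumption). The function name, the setting names and the example values are all made up for illustration.

```python
# Hypothetical sketch: compare settings exported from test and production.
# The flat key/value format and every key name below are assumptions.

MISSING = "<missing>"  # placeholder shown when a key exists on one side only


def diff_configs(test_cfg, prod_cfg, expected_to_differ=()):
    """Return {key: (test_value, prod_value)} for every unexpected mismatch."""
    skip = set(expected_to_differ)
    differences = {}
    for key in sorted(set(test_cfg) | set(prod_cfg)):
        if key in skip:
            continue  # e.g. endpoint URLs that legitimately differ
        test_value = test_cfg.get(key, MISSING)
        prod_value = prod_cfg.get(key, MISSING)
        if test_value != prod_value:
            differences[key] = (test_value, prod_value)
    return differences


# Made-up example exports from each environment.
test_settings = {
    "api_url": "https://test.example.com",
    "payment_rounding": "half-up",
    "feed_enabled": True,
}
prod_settings = {
    "api_url": "https://prod.example.com",
    "payment_rounding": "half-even",
}

mismatches = diff_configs(test_settings, prod_settings, expected_to_differ=["api_url"])
for key, (test_value, prod_value) in mismatches.items():
    print(f"{key}: test={test_value!r} production={prod_value!r}")
```

A check like this only tells you the settings match. It does not replace the production test itself, because matching settings can still behave differently against real data and real integrations.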

 

So how do we test?

Work out what you need to test, and test what you can. What new thing are you doing, and what do you need to make sure will work, and what do you need to make sure will still work? What do you need to make sure will not happen? Once you have your list, agree how you will test or validate each item.
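
If it helps to keep the list somewhere more structured than a spreadsheet, it could be sketched like this. This is only an illustration: the `PlanItem` fields, the three `kind` labels and the example items are assumptions made up here, not a standard format.

```python
from dataclasses import dataclass


@dataclass
class PlanItem:
    description: str
    kind: str              # "new", "still_works" or "must_not_happen"
    method: str = ""       # how it will be tested or validated; empty = not agreed
    environment: str = ""  # "test", "production" or "both"; empty = not agreed


def not_yet_agreed(items):
    """Return descriptions of items that still lack a method or an environment."""
    return [i.description for i in items if not (i.method and i.environment)]


# Made-up example plan covering the three questions above.
plan = [
    PlanItem("Customer can submit the new refund form", "new",
             "scripted user test", "test"),
    PlanItem("Existing invoices still export to the ledger", "still_works",
             "regression run, then first-day monitoring", "both"),
    PlanItem("No refund is paid twice", "must_not_happen"),
]

for description in not_yet_agreed(plan):
    print("Still to agree:", description)
```

The useful part is the last step: anything that comes back from `not_yet_agreed` is an item where nobody has yet agreed how, or where, it will be validated.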

Talk to the tech team and the vendors. Discuss what you need to test, and get their recommendations on how to test it. (As testing should be part of the development cycle, you should be having these conversations as part of the development sprints anyway.)

Know what the technical team/vendor is testing locally. This can allay concerns you or the business might have about testing. If your vendor is building something, they will have done some level of testing as part of development and more as part of code review or QA; possibly even more as part of proxy user testing. Plan testing with them so you are not doubling up and are not testing the wrong things. These conversations will also help you understand the limits to the test environment, and why some things must be tested in production.

Understand the data and system flows up front. Understand that any Visio or draw.io doc of a system process flow that has NOT been produced by the people who built the system is aspirational and is NOT likely to reflect reality. You need to validate what is actually happening with data flows between the systems. This will give you a good understanding of what to test or check, and where things could go wrong. AND it means that you will be on the same page as the development team and can discuss system flows with fewer frustrations and misunderstandings. (Bonus: it will enhance your discussions with business and users as well, because you will know what the system actually does and can clear up their misunderstandings too.)

Allocate the required space to ‘test in production’. This may be some ‘dummy’ accounts in a ring-fenced area of production, combined with live monitoring on live data, with the developers on hand and ready to jump in and fix issues as they occur. IDEALLY these will be small, config-related issues once you are live, but that is by no means guaranteed. Genuinely unforeseen issues can and do occur, and they are not necessarily the result of a lack of testing or foresight.

Agree before go-live on the approach to take when these issues occur. This may be ‘roll-back’, but the better approach wherever possible is ‘roll-forward’ – i.e., fix fast and iterate. As your system is likely to be cloud-based, this is normal, and it is more widely accepted these days anyway, with everyone’s phone apps and websites constantly being updated, and even occasional outages in critical things like banking apps and Amazon services.

 

Some possible test approaches

  • New feature in an application: TEST environment: thorough user tests. PRODUCTION environment: Check configurations match test, quick validation that screen looks and behaves as in test environment.
  • New high-risk financial transaction function: TEST environment: thorough user tests, happy and unhappy path, as many of the possible scenarios as you can manage, if not all of them (and if not all of them, get sign-off on what you will and will not be testing). PRODUCTION environment: Check configurations match test, and do a small real transaction (e.g., $1), or monitor and validate the first real transactions.
  • New data feed: TEST environment: thorough tests as far as test data is available, to validate the APIs, the data definitions and the data integration into your system. PRODUCTION environment: Migrate or copy all configurations, then monitor and validate the real data over the first days, weeks or months as appropriate until you are happy the data feed works.
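
For the ‘monitor and validate the real data’ step in the data feed example, a small script run over the first batches of production records can do some of the checking for you. This is a rough sketch under stated assumptions: the field names, the expected types and the ISO 8601 timestamp rule are invented for illustration, and your real data definitions would replace them.

```python
from datetime import datetime

# Assumed data definition for the feed; replace with the agreed one.
REQUIRED_FIELDS = {"account_id": str, "amount": float, "timestamp": str}


def validate_record(record):
    """Return a list of problems found in one feed record (empty list = OK)."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            actual = type(record[field]).__name__
            problems.append(f"{field}: expected {expected_type.__name__}, got {actual}")
    timestamp = record.get("timestamp")
    if isinstance(timestamp, str):
        try:
            datetime.fromisoformat(timestamp)
        except ValueError:
            problems.append(f"timestamp is not ISO 8601: {timestamp!r}")
    return problems


def summarise(records):
    """Map the position of each failing record to its list of problems."""
    failures = {}
    for index, record in enumerate(records):
        problems = validate_record(record)
        if problems:
            failures[index] = problems
    return failures


# Made-up sample of the first records received from the live feed.
first_batch = [
    {"account_id": "A-100", "amount": 12.5, "timestamp": "2024-03-01T09:30:00"},
    {"account_id": "A-101", "amount": "12.50", "timestamp": "01/03/2024"},
    {"account_id": "A-102", "amount": 3.0},
]

for index, problems in summarise(first_batch).items():
    print(f"record {index}: {'; '.join(problems)}")
```

A script like this catches structural problems only. Whether the values are actually right for the business still needs someone who knows the data to look at them.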

 

These are just three possible examples. Depending on your build and your risk, your test framework could look quite different. The point is to work out what you need to test, then work with your solution team to agree how you will test it, and not to be locked into any preconceived ideas about what should and should not be done in production.

 

Google ‘test in prod’ or try these links if you want to know more:

Opensource.com: Testing in production: Yes, you can (and should)

Saucelabs.com: Why You Should Be Testing in Production

Segment.com: We test in production. You should too

 
