At Jazz Networks, quality is the responsibility of everyone, from the graduate to
the CEO. Quality matters because we are a security company, and our focus is to
protect you and your organization from potential threats. We want to ensure that
our software always performs exactly as we designed it. A change is not good
enough if it does not provide sufficient test coverage and benchmarking, no
matter who made it. Our testing process is easily reproducible by any Jazz
developer, so fixes can be implemented quickly.
We use, test, and contribute to well-established open source projects
where we can, but we understand that our customers expect us to be experts in,
and fully responsible for, the projects we depend on. Where no suitable
third-party tools exist, we build them in-house using industry best practices.
In seeking quality we have invested heavily in continuous integration (CI) at
Jazz. It is well known that finding and fixing issues later in the development
process costs more, but there are no magic bullets. Unit testing alone can let
integration and usability issues slip through, while testing at higher layers
carries a higher cost in compute time and pipeline latency. At Jazz we have
struck a balance between the two, and we continuously refine the process as we
develop new internal tools and external features, keeping our quality high
whilst minimizing the latency of a change so that fixes can be merged quickly
when needed.
We have run over a million jobs through our pipeline - every commit is tested
and approved before it merges. This is important to ensure our master branch is
always ready to have release branches made from it. We have broken our process
down into a few different steps, described below.
Code review
Every merge request at Jazz is approved by another developer. There is no way
for any developer to merge a change without going through the approval process,
no matter their seniority. It is important to us that any developer can
block a change request by asking whether it has enough testing in place, or if
something is unclear or unnecessary. Code review also spreads knowledge across
the team, as people can observe and contribute to parts of the codebase they
are unfamiliar with before jumping in. All comments must be resolved before any
code can merge.
Static analysis
Static analysis is a technique which finds common developer mistakes using
source code as its input. Developers can run most of these tools live as part
of their own workflow (some as they are typing!), in addition to CI running
them. We use industry-standard tools from Synopsys (Coverity) and Google
(gofmt, go vet, golint), as well as a range of open source tools for every
language we use at Jazz (gometalinter, eslint, elm-analyse, shellcheck and many
others).
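As a small illustration (not from our codebase), go vet catches format-string
mistakes from the source alone. The snippet below compiles cleanly, yet the
analysis flags the mismatched verb before the code is ever run:

```go
package main

import "fmt"

func main() {
	bytesSent := 42
	// The compiler accepts this, but `go vet` reports that the %s verb
	// has an argument of the wrong type (int instead of string) - the
	// kind of bug static analysis finds before any test executes.
	fmt.Printf("sent %s bytes\n", bytesSent)
}
```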
Software quality metrics
There has been much written on the subject of software quality metrics, such
as cyclomatic complexity. Whilst we do not believe that good software quality
metrics are equivalent to good software, we do believe that bad metrics are
indicative of a problem. As a result, we measure and limit them for every
commit into our repository.
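To sketch what such a limit means in practice (illustrative code, not ours), a
tool like gocyclo scores roughly one point per branch in a function, so the
two equivalent functions below are measured very differently:

```go
package metrics

// severityName takes one branch per level, so a complexity checker
// such as gocyclo scores it roughly one point per case.
func severityName(level int) string {
	switch level {
	case 0:
		return "info"
	case 1:
		return "warning"
	case 2:
		return "error"
	case 3:
		return "critical"
	default:
		return "unknown"
	}
}

// severityTable expresses the same mapping as data; the lookup
// function has almost no branches, keeping the metric well under
// any sensible per-commit limit.
var severityTable = map[int]string{
	0: "info", 1: "warning", 2: "error", 3: "critical",
}

func severityNameFlat(level int) string {
	if name, ok := severityTable[level]; ok {
		return name
	}
	return "unknown"
}
```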
Unit testing and microbenchmarks
Unit testing is vital for finding simple issues early on. A change is not
considered finished until it has good test coverage. We have found, like many
others, that test-driven development can find errors earlier and lead to better
design choices. We run all unit tests in our CI environment to catch
regressions, and we add to them as issues are found in our code. Our CI
environment also serves as a place for running microbenchmarks, which check the
performance of small components of the system.
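As a sketch of what this looks like with Go's standard testing package (the
function under test is hypothetical), a regression test and a microbenchmark
can live side by side and both run in CI:

```go
package checksum

import "testing"

// Sum stands in for a small component under test.
func Sum(data []byte) uint32 {
	var s uint32
	for _, b := range data {
		s += uint32(b)
	}
	return s
}

// TestSum is a regression test: when a bug is found in the code,
// a case is added here so the same mistake cannot ship twice.
func TestSum(t *testing.T) {
	if got := Sum([]byte{1, 2, 3}); got != 6 {
		t.Fatalf("Sum = %d, want 6", got)
	}
	if got := Sum(nil); got != 0 {
		t.Fatalf("Sum(nil) = %d, want 0", got)
	}
}

// BenchmarkSum is a microbenchmark; CI can compare ns/op across
// commits to spot performance regressions in small components.
func BenchmarkSum(b *testing.B) {
	data := make([]byte, 4096)
	for i := 0; i < b.N; i++ {
		Sum(data)
	}
}
```

Running `go test -bench .` executes both, and the benchmark numbers can be
archived per commit for comparison.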
Kernel testing framework
The Jazz agent is split into two components, the driver or module (“bottom
half”) and the application layer (“top half”).
The agent’s kernel modules contain a platform-specific layer which we test
in isolation from the main agent software. This architecture helps us find
issues within the bottom half of the agent easily, and allows us to reuse
testing logic between the Linux, Windows and Mac agents. We run these tests on
all our supported platforms in CI, as well as providing automation for
developers to run them locally in virtual machines. They exercise the event
pipeline from the operating system through our kernel module and into the
agent, and check for issues between our kernel module and the operating system
APIs. The tests are necessarily run natively in virtual machines, as they
require access to operating system resources that cannot be shared. As a
result, these tests are among our most expensive in terms of test resources. We
also monitor performance here, to check for any impact we have on the
performance of the system as a whole.
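The harness itself is internal, but the shape of such a test can be sketched
in Go. Everything below is hypothetical: readEvents stands in for the
platform-specific transport that delivers events from the kernel module to
user space:

```go
package kerneltest

import (
	"os"
	"testing"
	"time"
)

// readEvents is a hypothetical helper standing in for the
// platform-specific channel (device node, socket, ETW session, ...)
// over which the kernel module reports events.
func readEvents(timeout time.Duration) ([]string, error) {
	// Platform-specific; elided in this sketch.
	return nil, nil
}

// TestFileOpenEvent performs a real operating system action inside
// the test VM and asserts that the event emerges from the bottom
// half of the agent.
func TestFileOpenEvent(t *testing.T) {
	f, err := os.CreateTemp("", "kernel-test")
	if err != nil {
		t.Fatal(err)
	}
	f.Close()
	defer os.Remove(f.Name())

	events, err := readEvents(5 * time.Second)
	if err != nil {
		t.Fatal(err)
	}
	for _, e := range events {
		if e == "open:"+f.Name() {
			return // the pipeline delivered the event
		}
	}
	t.Fatalf("no open event observed for %s", f.Name())
}
```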
Remote agent testing
Complementary to the kernel module testing, we have created an agent where
the module layer is replaced with a mock or test driver. With this we can
test the top half of the agent cost-efficiently by instantiating many
instances of these tests on a single CI node. At Jazz we use containers for
the vast majority of our testing in order to reproduce environments, and the
remote agent testing is no exception. Given the platform-independent nature
of the top half of the agent, we can build coverage between platforms.
Additionally, this framework lets us benchmark the Jazz Platform itself
without having to instantiate thousands of real, virtualized agents, and it
allows us to inject faults or erroneous data into the platform to measure
their impact.
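One way to picture the mock driver (a sketch with hypothetical types, not our
actual interfaces): the top half consumes events through a narrow interface,
and the test driver satisfies it with synthetic traffic:

```go
package agent

import "time"

// Event is a simplified stand-in for what the bottom half reports.
type Event struct {
	Kind string
	Path string
	Time time.Time
}

// Driver abstracts the platform module ("bottom half"); real
// implementations would talk to a kernel module on Linux, Windows
// or Mac.
type Driver interface {
	Events() <-chan Event
}

// MockDriver replaces the kernel module in remote agent tests: it
// can replay recorded traffic, flood the platform for benchmarking,
// or inject deliberately malformed events, all inside an ordinary
// container with no kernel access required.
type MockDriver struct {
	ch chan Event
}

func NewMockDriver() *MockDriver {
	return &MockDriver{ch: make(chan Event, 1024)}
}

func (m *MockDriver) Events() <-chan Event { return m.ch }

// Inject feeds a synthetic event to the top half exactly as the
// real driver would.
func (m *MockDriver) Inject(e Event) { m.ch <- e }
```

Because the mock satisfies the same interface as the real driver, the top half
under test cannot tell the difference.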
End-to-end testing
As the final backstop before release, we deploy all of our agent platforms
against our internal test deployments and run a variety of tests against them.
This checks agent enrollment, and performs smoke tests of various agent
features. The end-to-end testing leans heavily on the agent crash reporting
system and performance statistics. We gather statistics in the same way as we
do for customer installs, which helps us validate that we can diagnose issues
in the field.
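A smoke test at this layer can be as simple as polling the platform until a
freshly deployed agent reports in. The endpoint and agent name below are
illustrative only, not our real API:

```go
package smoke

import (
	"net/http"
	"testing"
	"time"
)

// TestAgentEnrollment polls a hypothetical platform API until the
// newly deployed agent shows up as enrolled, or the deadline passes.
func TestAgentEnrollment(t *testing.T) {
	const url = "https://platform.test.internal/api/agents/test-agent-01/status"
	deadline := time.Now().Add(2 * time.Minute)
	for time.Now().Before(deadline) {
		resp, err := http.Get(url)
		if err == nil {
			resp.Body.Close()
			if resp.StatusCode == http.StatusOK {
				return // the agent enrolled successfully
			}
		}
		time.Sleep(5 * time.Second)
	}
	t.Fatal("agent did not enroll before the deadline")
}
```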
Long-running tests
We run several agents as testing endpoints long term to check for any
performance degradation and cumulative errors, as well as to test the
upgrade process. This is where we monitor CPU usage and memory
utilization, with alerts when these exceed certain thresholds and demand
attention.
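A minimal sketch of such a threshold check, assuming we sample the Go
runtime's own heap statistics (a real monitor would read OS-level stats for
the whole process and page someone rather than return an error):

```go
package monitor

import (
	"fmt"
	"runtime"
)

// maxHeapBytes is an illustrative threshold, not a real limit.
const maxHeapBytes = 256 << 20 // 256 MiB

// checkMemory samples heap usage and reports a breach. Long-running
// test endpoints would evaluate checks like this periodically and
// raise an alert on failure.
func checkMemory() error {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	if m.HeapAlloc > maxHeapBytes {
		return fmt.Errorf("heap %d bytes exceeds threshold %d", m.HeapAlloc, maxHeapBytes)
	}
	return nil
}
```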
Dogfooding and usability
At Jazz we actively run the agent on every machine we can, including our CI
machines, laptops, desktops and servers. Dogfooding helps us track
performance and usability of the agent when running on real workloads.
Running the Jazz Agent allows us to gather real data which serves as an
input to our machine learning. It also keeps us honest, as we would not ship
software to our customers that we would not be comfortable relying on
ourselves.
Our testing process is not done yet - nor will it ever be. As issues arise we
have a policy of adding regression tests, and we will keep refining our
process as we develop the product further. More layers will be introduced as
we gain access to more resources (in fact this blog post is out of date at the
time of press, and doesn’t cover penetration testing, fuzzing, or how we test
infrastructure), but I can comfortably say that the Jazz Agent has the most
comprehensive testing system I have ever worked with.
References
- Atlassian, “Dogfooding and Frequent Internal Releases”, July 9, 2009.
- IBM, “Monitoring cyclomatic complexity”, March 28, 2006.
- J. McCabe, “A Complexity Measure”, IEEE Transactions on Software Engineering, Vol. SE-2, No. 4, 1976.
- Synopsys, “How much do bugs cost to fix during each phase of the SDLC?”, January 12, 2017.