Architecture

On Dealing With Application Silos

"Insanity is repeating the same mistakes and expecting different results." - Narcotics Anonymous, Nov 1981

A common approach to dealing with large clusters of functionality that are perceived to be locked away in a hard-to-get-at "silo" is to rewrite or replace them with a newer implementation that (it is intended) will offer better capabilities for reuse, maintenance, and so on. But the results are almost invariably not what people were hoping for: typically the original silo will not be fully replaced by the new implementation, which will then become a silo in itself.

Why does this happen?

I think the main obstacle to effective action on silos is that people don't have a clear understanding of why the silos are bad: the first step towards resolving any problem should be to understand what the problem is. So the question that needs answering before anything else is done is: why is an application silo a problem? If someone says the answer to this question is "obvious", then do not be cowed. Lots of clever people I've worked with (in investment banking) have repeatedly failed to fix the problems that silos cause, often without realising that this is the case. So it seems as clear as can be that the answer is not, in fact, "obvious".

I would state the main problems with application silos to be:

  • limited reuse: there will be a lot of useful functionality that other people want to call, but can't, because it's behind a silo boundary. For instance: Legal Entity lookup, Trade Clearing Eligibility, Trade Input/Manipulation, etc.
  • long release cycles: the bigger the silo, the harder it is to test and the longer it will be between releases. A sufficiently long release cycle (on the order of months) will make many forms of development either incredibly expensive or practically impossible. The more a form of development relies on user feedback (GUI development is a good example), the more badly it will be affected.
  • scalability issues: if lots of functionality is lumped together in a single process then we will have problems getting optimal scaling - some aspects will be compute-bound, some IO-bound, some storage-bound - but the same scaling tactic will need to be applied to all aspects inside that process boundary.
  • complexity: in practice, one of the main constraints on the ability of cash-rich but dysfunctional organisations such as banks to improve is the complexity of their systems. Because silos have lots of functionality in one place, with the interactions between the areas of functionality hard to understand, this makes silos very complex (given the functionality they contain).

This tells us where the pain points are: the question is then what we do about them. A few observations:

  • Replacing a silo with another silo is not the answer. The problems are all properties which follow directly from being a silo. So the answer can never be to replace one silo with another. Instead we need to break out the aspects that are most affected into separate services (the granularity can vary - current vogue is for "fine-grained SOA" / "microservices").
  • One should consider not replacing a silo. If there is functionality that no-one else wants, that doesn't need to change, where scalability is not a problem, and where the behaviour is well understood, then why would you want to rewrite it? It's fine where it is.
  • Don't proceed at random. It will be a learning process, so it's best to start with something relatively simple. Once you get going, it's better to tackle areas in order of how painful they are according to the criteria above. I've seen discussion around "what kind of bank infrastructure would a Google set up?" But the kinds of architectures that companies like Amazon, Google and Netflix end up with are not a happy accident: they arise from precisely this kind of analysis, asking what the pain points are around the current architecture, and how they can be addressed.

Versioned Interfaces

Some general principles

For a while, allow new messages to be consumed as old messages

  • Means you can't extend enums with new values but need to go Enum => EnumV2 (see the sketch after this list)
  • Offers the advantage that senders can upgrade without consumers having to upgrade in lockstep
  • Once things have moved sufficiently far along, can deprecate and retire obsolete elements to stop cruft accruing indefinitely
  • But need to support old formats for long enough that senders can be rolled back without blowing up the consumer
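
As a concrete illustration, here is a minimal sketch (in Python; the message shape and field names are hypothetical) of consuming new messages as old messages: the new field is optional, unknown fields are ignored, and the enum is extended by introducing a new EnumV2 type rather than by adding values to the published one.

 from enum import Enum

 class Side(Enum):        # original enum: frozen once published
     BUY = "BUY"
     SELL = "SELL"

 class SideV2(Enum):      # extension goes into a new enum: Enum => EnumV2
     BUY = "BUY"
     SELL = "SELL"
     SHORT_SELL = "SHORT_SELL"

 def parse_trade(msg: dict) -> dict:
     # Consume a message written by an old *or* new sender: the new
     # "sideV2" field is optional, so senders can upgrade without
     # consumers having to upgrade in lockstep.
     return {
         "id": msg["id"],
         "side": Side(msg["side"]),  # always present, with its old meaning
         "side_v2": SideV2(msg["sideV2"]) if "sideV2" in msg else None,
     }

 # Both an old-format and a new-format message are accepted:
 print(parse_trade({"id": 1, "side": "BUY"}))
 print(parse_trade({"id": 2, "side": "SELL", "sideV2": "SHORT_SELL"}))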

Do you change the name of the interface/endpoint, or do you include versioning in your protocol?

This is somewhat controversial: there's a big Stack Overflow thread on best practices for (REST) API versioning.

In REST, the question is whether you embed versioning in endpoints ("/company/apiname/v3.0") or in the "Accept:" line of the header.

My own preference is for versioning the protocol rather than the interface name, as long as the practices above are followed - this avoids the need to change bindings in code, but imposes a certain level of discipline on all involved.
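
To make the two options concrete, here is a minimal sketch (Python standard library only; the endpoint and media-type names are illustrative assumptions, not a real API) of header-based versioning, where the endpoint name stays stable and the version travels in the "Accept:" line:

 from http.server import BaseHTTPRequestHandler, HTTPServer
 import json

 class TradeHandler(BaseHTTPRequestHandler):
     def do_GET(self):
         # The version lives in the protocol (the Accept header), not in
         # the path: a client sends e.g.
         #   Accept: application/vnd.example.trade.v2+json
         accept = self.headers.get("Accept", "")
         if "trade.v2" in accept:
             body = {"id": 1, "side": "BUY", "sideV2": "SHORT_SELL"}
         else:  # default to the old representation for old clients
             body = {"id": 1, "side": "BUY"}
         self.send_response(200)
         self.send_header("Content-Type", "application/json")
         self.end_headers()
         self.wfile.write(json.dumps(body).encode())

 # The URL-versioned alternative would instead expose something like
 # /company/apiname/v3.0 and route on the path; here one stable endpoint
 # serves both versions.
 if __name__ == "__main__":
     HTTPServer(("localhost", 8080), TradeHandler).serve_forever()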

What if you have a big enough change that you can't, or don't want to, bridge the gap?

One might have two things with the same name, but calculated using fundamentally different representations.

Here we are no longer incrementally enriching or evolving; instead we are requiring that the sender and consumer make the same choice between two incompatible representations.

In this case, having different interface/endpoint names seems appropriate.
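
For instance (a minimal sketch in Python; the names and representations are hypothetical), the same quantity might be exposed under two interface names precisely because there is no mapping between the representations:

 def position_v1(trade_id: int) -> dict:
     # Old interface: position expressed as a signed quantity of shares.
     return {"tradeId": trade_id, "quantity": -500}

 def position_v2(trade_id: int) -> dict:
     # New interface: position expressed as notional in a currency.
     # There is no lossless mapping between the two, so sender and
     # consumer must explicitly agree on which interface they are
     # using - hence the different names.
     return {"tradeId": trade_id, "notional": 1250000.0, "ccy": "USD"}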

Does it all depend on the violence of change?

We seem to have two different sorts of interface change - one where the new is an incremental change to the old, and one where the new is a fundamental break. The distinguishing criterion is whether the new can readily be interpreted as the old.

In the "incremental" case, the idea of allowing interpretation works well, and avoids the complexity and hassle of churn in endpoint/interface names.

But in the "fundamental" case, we face a different kind of versioning challenge. Here one must make sure that both the sender and consumer are agreed on their idea of what is being conveyed, and expressing this in endpoint/interface names seems like it could be far more appropriate.

Differentiating between channel and interface

We might have a communication channel where information is passed (at some stage being serialised), and wish it to be shared between related, but distinct, interface calls. To allow ourselves to do this, we should avoid identifying the channel with a single interface: instead, we should add an extra discriminator to information passed through the interface, allowing multiple interfaces to share the channel.
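
A minimal sketch (Python; the interface names are hypothetical) of the idea: every envelope on the shared channel carries an "interface" discriminator, and the consumer dispatches on it:

 import json

 def send(channel: list, interface: str, payload: dict) -> None:
     # The channel only ever carries opaque envelopes; the discriminator
     # says which interface the payload belongs to.
     channel.append(json.dumps({"interface": interface, "payload": payload}))

 def receive(channel: list, handlers: dict) -> None:
     for raw in channel:
         envelope = json.loads(raw)
         handlers[envelope["interface"]](envelope["payload"])

 channel = []
 send(channel, "legal-entity.lookup.v1", {"leId": 42})
 send(channel, "trade-input.v1", {"id": 1, "side": "BUY"})
 receive(channel, {
     "legal-entity.lookup.v1": lambda p: print("lookup:", p),
     "trade-input.v1": lambda p: print("trade:", p),
 })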

Adding such discriminators will get tricky if the information is ultimately reduced to something relatively sparse - such as a csv file.

For files, one possibility is to accompany each file with another file FILENAME.metadata (sketched after this list), which will contain

  • a checksum for the file it supports (MD5 hash or similar)
  • type information so we can disambiguate between different interfaces
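
A minimal sketch (Python standard library only; the metadata field names, and the reading of FILENAME.metadata as "append .metadata to the data file's name", are assumptions) of writing a csv file together with its sidecar:

 import hashlib
 import json

 def write_with_metadata(filename: str, rows: str, interface: str) -> None:
     with open(filename, "w") as f:
         f.write(rows)
     metadata = {
         "md5": hashlib.md5(rows.encode()).hexdigest(),  # checksum of the file
         "interface": interface,  # disambiguates between interfaces
     }
     with open(filename + ".metadata", "w") as f:
         json.dump(metadata, f)

 # Two interfaces can now share the csv "channel": a consumer reads the
 # .metadata file first, verifies the checksum, then dispatches on the
 # interface discriminator.
 write_with_metadata("trades.csv", "id,side\n1,BUY\n", "trade-input.v1")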

Getting a decent solution to this feels tricky, because we have two things fighting each other in the csv format:

  • the simplicity and immediacy of using it
  • the lack of any side channel offered for type/interface information