Study Writeup: Microservices Overview

From Matt Morris Wiki

This is a Study Leave write-up.

What Is It?

From Martin Fowler: Microservices

The term "Microservice Architecture" has sprung up over the last few years to describe a particular way of designing software applications as suites of independently deployable services. While there is no precise definition of this architectural style, there are certain common characteristics around organization around business capability, automated deployment, intelligence in the endpoints, and decentralized control of languages and data.

Why Study It?

Some might say this is just fine-grained SOA - so why all the fuss? The answer is that "Microservice Architecture" sits in the middle of a very interesting cluster of concepts and trends:

  • The DevOps movement
  • Deployment, testing and configuration tools (which have had a lot of interesting change recently)
  • Getting common messaging architecture while decentralising data governance and schemas
  • Increasing dissatisfaction with the monolithic approach in many quarters
  • Conway politics: being able to better align technology and business structures

This page will cover the fundamental ideas, with some historical context added in as I go.

The immediate aim is for me to identify the related clusters of concepts that are worthy of further study in their own right. As a result this page will remain somewhat unstructured for a while, as it's aiming to capture concepts initially rather than provide a cohesive, terse overview.

Toggl code

Toggl code is WRITEUP-MSERV

Deliverables

Max 3 hours of time. Deliver write-up on this page.

Write-Up

What Is Microservice Architecture?

One could describe Microservice Architecture as lightweight, fine-grained SOA without the perceived baggage of WS-* and ESBs. However, when people discuss Microservice Architecture, it typically comes along with the following ideas:

  • DevOps: Services need to evolve over time rather than being written once and then passing to operations/maintenance
  • Virtualisation/Containerisation: The approach requires advances in development and deployment technology
  • Scalability: fine-grained decomposition gives greater freedom in how services can be scaled up according to their respective resource requirements
  • Agility: individual services are simpler to develop and deploy
  • Messaging: the services need to communicate, and messaging buses are increasingly used for this
  • Experimentation: it becomes easier to experiment with new languages in a relatively small and clearly bounded context
  • Robustness: fault isolation is potentially improved (a memory leak in one service affects only that service, rather than bringing down a whole monolithic system)
  • Conway: increased ability to get political/organisation structures and technical structures in better alignment

There are problems:

  • Distribution Complexity: IDEs may not work so well across service boundaries, testing is harder, there are more failure modes, and transactions become much harder
  • Operational Complexity: you may have multiple instances each of many types of service to manage in production
  • Deployment Complexity: you have cross-service dependencies to worry about
  • When To Implement: at the start of the lifecycle the monolith is not big, and starting off with an elaborate distributed architecture makes development slow. So when should Microservice Architecture be adopted?

Communication patterns:

  • API gateway: sits between clients and microservices, offering a coarse-grained API for mobile clients and finer-grained APIs for desktops.
  • Interservice communication
    • Can use synchronous HTTP (REST, SOAP)
    • Can use asynchronous messaging (e.g. an AMQP-based message broker)
    • In practice applications are likely to use a mixture
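
As an illustrative sketch (Python, with a plain function standing in for an HTTP endpoint and a `queue.Queue` standing in for an AMQP broker — none of the names here come from any particular framework), the two interaction styles look like this:

```python
import queue
import threading

# Synchronous style: the caller blocks until the "order" service replies,
# as it would on an HTTP request/response round trip.
def get_order_sync(order_id):
    return {"id": order_id, "status": "shipped"}  # stand-in for an HTTP GET

# Asynchronous style: the caller publishes a message to a broker and moves
# on; a consumer thread in the other service picks the message up later.
broker = queue.Queue()

def order_service_consumer():
    while True:
        msg = broker.get()
        if msg is None:  # shutdown sentinel
            break
        print(f"order service handling {msg}")
        broker.task_done()

threading.Thread(target=order_service_consumer, daemon=True).start()

print(get_order_sync(42))                         # blocking call
broker.put({"event": "order_created", "id": 42})  # fire-and-forget
broker.join()  # wait only so the demo output appears before the script exits
broker.put(None)
```

A real application mixes both: synchronous calls where the caller needs an immediate answer, messages where it doesn't.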

Data management can get a lot more complex

  • Each service has its own database, and the schemas won't necessarily align
  • Reads may require services to cache from other services in order to be timely
  • Updates are even more tricky
    • Distributed transactions are one approach, but they are not much in fashion: REST and NoSQL generally don't do them.
    • Event-driven asynchronous updates via a message broker: this trades consistency for availability
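
A minimal sketch of the event-driven approach (Python; the `Broker` class, service names and stores are invented for illustration, and the in-process delivery stands in for a real message broker):

```python
# Each service keeps its own store, and updates propagate as events
# through a broker stand-in (a plain list of subscribers here).

class Broker:
    def __init__(self):
        self.subscribers = []

    def subscribe(self, handler):
        self.subscribers.append(handler)

    def publish(self, event):
        for handler in self.subscribers:
            handler(event)  # a real broker delivers this asynchronously

broker = Broker()

# Customer service: owns its own database, and caches order counts so
# reads don't need a cross-service call.
customer_db = {"alice": {"orders": 0}}

def on_order_created(event):
    customer_db[event["customer"]]["orders"] += 1

broker.subscribe(on_order_created)

# Order service: writes to its own database, then publishes the event.
order_db = {}

def create_order(order_id, customer):
    order_db[order_id] = {"customer": customer}
    broker.publish({"type": "order_created", "customer": customer})
    # Between the write above and the subscriber running, the two stores
    # disagree: that window is the consistency traded for availability.

create_order(1, "alice")
print(customer_db["alice"]["orders"])
```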

Refactoring:

  • Hardly anyone starts from a green-field state
  • Identify a decent first component to extract; good candidates are:
    • constantly changing
    • clear Bounded Context
    • conflicting resource requirements compared to the monolith as a whole
    • presentation tier

Implementation

Here are some general points:

  • Interfaces
    • Need to select your Bounded Contexts really carefully: too small and you get too much coupling, plus very difficult code-sharing issues; too large and you don't have a coherent set of concepts.
    • You will end up needing to group related services: some degree of hierarchy will be invaluable in enabling people to make sense of things.
    • Services get exposed through client libraries, and the consumers can then build their own layers to orchestrate calls in the way that works for them.
    • Serialisation matters: most latency comes not from the wire but from serialising and deserialising. Netflix have used Google's Protocol Buffers, Apache Thrift and Avro; Avro has turned out best for them.
  • Teams and politics:
    • DevOps is a must: developers need to learn operations and manage the services they write and deploy. This can really help with resilient code that scales because the developers gain production context knowledge and production responsibilities. It's easier to teach developers to operate their code than to teach the operators to look after the developers' code. It goes well with pair programming (so no-one is the sole person on pager duty).
    • Culture and politics are important: microservices shift the boundaries here. One thorny issue can be teams wanting to share implementation libraries across services.
    • There is none of the "build the bank" vs "run the bank" split that I find so irritating; instead there is a continuous culture of change and development, with dev teams also responsible for the ongoing operational management of the services they originate.
  • Monitoring
    • Every call across a service boundary should inject a GUID into the request to enable tracking the communication within the system. This may end up as a large GUID tree as requests fan out.
    • You will need to monitor the deployment environment too.
  • Languages: While Microservice Architecture gives more freedom over languages, don't allow completely unconstrained experimentation. Phil Calcado of SoundCloud refers to a "Cambrian Explosion" of languages: Perl, Julia, Haskell, Erlang, node.js, ... Eventually the "bus factor" turned out to be too low, and tooling consolidated on the JVM (JRuby, Clojure, Scala), with some Go and Ruby for infrastructure and tooling.
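
To make the GUID-tracking point above concrete, here is a hedged sketch (Python; the service functions and the log structure are invented for illustration) of propagating a correlation id so the call tree can be reassembled from logs:

```python
import uuid

# Every call across a service boundary carries the root correlation id,
# plus its own call id and its parent's, so fan-out calls form a tree
# that can be reconstructed from the log entries.

log = []

def traced_call(service, payload, correlation_id=None, parent_id=None):
    call_id = str(uuid.uuid4())
    correlation_id = correlation_id or call_id  # the root call starts the tree
    log.append({"service": service.__name__, "correlation_id": correlation_id,
                "call_id": call_id, "parent_id": parent_id})
    return service(payload, correlation_id, call_id)

def checkout(payload, correlation_id, call_id):
    # fan out to two downstream services, propagating the correlation id
    traced_call(payments, payload, correlation_id, call_id)
    traced_call(shipping, payload, correlation_id, call_id)
    return "ok"

def payments(payload, correlation_id, call_id):
    return "charged"

def shipping(payload, correlation_id, call_id):
    return "queued"

traced_call(checkout, {"order": 42})
# every log entry for this request shares one correlation_id, and the
# parent_id links are enough to rebuild the call tree
print(log)
```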

You need to think hard about the stack for RPC, resilience and concurrency: Netty, Netflix's stack and Finagle are some alternatives. SoundCloud had good experience with Finagle (including adapting it away from Twitter's heavy core Thrift usage towards more HTTP).

Still potentially needing a section

  • Multi-language APIs: Google's Protocol Buffers vs Apache Thrift vs Avro
  • Stacks: Netty, Netflix, Finagle
  • CQRS
  • Event Sourcing

Virtualisation

Specific writeups:

APIs

Specific writeups:

Messaging

Specific writeups:

Why Messaging?

Messaging is a core concept for Microservices. Messaging is an asynchronous model, as opposed to the more synchronous RPC. The asynchronous approach is more complicated, but ultimately the act of communication between services is complicated enough that it is justified to treat it as an entity in its own right.

More specifically, while RPC appears seductive in its relative simplicity, it has the following drawbacks:

  • Non-Local Exceptions: one needs some way of marshalling server-side error information back and enabling the caller to distinguish between local and remote exceptions
  • Indirect Memory Allocation: what if an RPC call takes an array, and the client passes one with a million entries, causing the server to OOM?
  • Blocking Calls: the synchronous mode of RPC calling means one needs to put calls on threads to take care of servers potentially blocking, raising the possibility of thread starvation
  • Static Interfaces: RPC calls need matching interfaces between client and server, which causes problems when trying to evolve them

The extra indirection that messaging introduces allows each of these problems to be solved in whichever way is best for the context.
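
For instance, a sketch of how messaging addresses the blocking-call and non-local-exception drawbacks (Python, with `queue.Queue` standing in for a real broker; the division is just stand-in work, and all names are invented for illustration):

```python
import queue
import threading

# Requests go onto a queue, so the caller never blocks on a slow server,
# and server-side errors come back as ordinary reply messages rather than
# exceptions that must be told apart from local ones.

requests, replies = queue.Queue(), queue.Queue()

def server():
    while True:
        msg = requests.get()
        if msg is None:  # shutdown sentinel
            break
        try:
            result = 100 / msg["value"]  # stand-in for real work
            replies.put({"id": msg["id"], "ok": True, "result": result})
        except ZeroDivisionError as exc:
            # the remote error is marshalled back as data, not re-raised
            replies.put({"id": msg["id"], "ok": False, "error": str(exc)})

threading.Thread(target=server, daemon=True).start()

requests.put({"id": 1, "value": 4})
requests.put({"id": 2, "value": 0})  # will fail remotely
requests.put(None)

received = [replies.get() for _ in range(2)]
for reply in received:
    print(reply)  # the caller can tell remote failures apart cleanly
```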

See here for more on these problems: The case: RPC vs. Messaging

Points To Capture

Further Reading