We’re building the financial platform for the world. Our platform holds over $500 billion in assets, which translates to 1.4 million nodes and 6.3 million edges in our financial graph across 268 thousand accounts. Every day we process hundreds of gigabytes of data from a few hundred sources, and our API server handles 2.5 million requests for thousands of users. With our quickly expanding client base, these numbers are growing rapidly.
In this post, we will go through the tech stack that we’ve used so far to build Addepar and handle these challenges.
Our Tech Choice Philosophy
At Addepar, we use the right tool for the job. We would never choose a technology, library, or language just because it’s sexy.
We’re careful and deliberate about introducing new technologies. Supporting multiple tools is expensive. We consider everything from the ongoing maintenance and operational costs to the effort required to develop and implement best practices, code standards, and tooling across many languages. Consequently, as the description below shows, we mostly try to keep the stack as simple as possible, but no simpler, for our problem domain.
Current Tech Stack
We use Amazon Web Services. We incorporate many AWS features into our system architecture, but we primarily stick to the generic ones (like software-defined networking and elastic scaling of resources) to avoid relying on Platform as a Service offerings for our mission-critical systems. Doing so allows us to be provider-agnostic while leveraging AWS Infrastructure as a Service solutions.
The Addepar platform is a sophisticated, service-oriented system that’s capable of processing hundreds of thousands of financial accounts daily. We have a robust data pipeline for ingesting and transforming custodial and vendor data, a normalization layer that enforces strong guarantees about our financial data, and a computational stack that drives our real-time, web-based analysis views and comprehensive report generation tools.
Languages, Build Tools & Configuration Management
For simplicity and to enable developer productivity, we’re firmly committed to developing all of Addepar’s services in the most recent version of Java (currently Java 8).
We use Facebook’s Buck build tool. We love that Buck supports incremental builds, TestNG (our unit test library of choice) with test result caching, third-party dependency management, and native Python support.
Python is our primary scripting language: we reach for it to quickly solve lightweight problems, often related to automation of our build and test infrastructure, ad-hoc data normalization, performance testing, and API development. There’s even an open source Python library created by one of Addepar’s clients that wraps our publicly exposed REST API in an easy-to-use format.
We also use Python for configuration management. Specifically, we currently rely on the Salt ecosystem, which is written in Python and uses many commonly used Python libraries, such as Jinja. Salt acts as a distribution mechanism, propagating changes from the dedicated Git repository that stores our essential configuration settings onto our infrastructure.
We rely on HashiCorp’s Vault infrastructure for our critical secure configuration settings such as passwords, API keys, and SSL certificates.
Data Storage & Web Technology
Addepar receives and processes a wide variety of data in a multitude of formats every day. This complex and somewhat unwieldy dataset, combined with our goal of persisting every piece of data Addepar has ever received, makes tracking and auditing complicated. Currently, to keep things as simple as possible, we persist daily custodian feed data in MongoDB. We store the data both in the original form in which it was received and in the transformed form produced by our data processing pipeline. This lets us easily go back and ‘reprocess’ a particular piece of data if needed.
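To illustrate why keeping both forms makes reprocessing straightforward, here is a minimal Python sketch. It is only an illustration under stated assumptions: an in-memory dict stands in for the MongoDB collection, and the names `FeedStore`, `FeedRecord`, `normalize_v1`, and `normalize_v2`, along with the record format, are hypothetical rather than Addepar’s actual schema or pipeline.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class FeedRecord:
    raw: str            # payload exactly as received
    transformed: dict   # output of the processing pipeline


@dataclass
class FeedStore:
    """In-memory stand-in for a document store (hypothetical)."""
    transform: Callable[[str], dict]
    records: Dict[str, FeedRecord] = field(default_factory=dict)

    def ingest(self, record_id: str, raw: str) -> dict:
        # Persist the raw payload alongside its transformed form.
        record = FeedRecord(raw=raw, transformed=self.transform(raw))
        self.records[record_id] = record
        return record.transformed

    def reprocess(self, record_id: str) -> dict:
        # Re-run the current transform on the stored raw payload.
        record = self.records[record_id]
        record.transformed = self.transform(record.raw)
        return record.transformed


def normalize_v1(raw: str) -> dict:
    # First version of a hypothetical transform: leaves values untidy.
    symbol, quantity = raw.split(",")
    return {"symbol": symbol, "quantity": quantity}


def normalize_v2(raw: str) -> dict:
    # Improved version: trims, upper-cases, and parses the quantity.
    symbol, quantity = raw.split(",")
    return {"symbol": symbol.strip().upper(), "quantity": float(quantity)}


store = FeedStore(transform=normalize_v1)
store.ingest("feed-001", "aapl, 10")
store.transform = normalize_v2           # the pipeline has since been improved
fixed = store.reprocess("feed-001")      # no need to ask the source to resend
```

Because the raw payload is never overwritten, a later fix to the transform can be applied to old data without going back to the original source.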
Once data is processed and normalized by our pipeline, we persist it in a finalized form in MySQL. For our most frequently accessed financial data, we have a custom in-memory cache that is populated from the database and kept in sync with any data written to it. Right above this cache sit our distributed web servers, which are responsible for serving all public-facing HTTP requests from both our web app and our public REST APIs. We lean heavily on Jetty and Jersey for any use case requiring an HTTP interface, including our public-facing endpoints. Finally, above the server layer, we use NGINX for load balancing, fault tolerance, and zero-downtime deployments.
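One common way to keep a cache in sync with database writes is a write-through pattern, sketched below in Python. This is an assumption made for illustration, not a description of Addepar’s custom cache: SQLite (from the standard library) stands in for MySQL, and the `WriteThroughCache` class and the `prices` table are hypothetical.

```python
import sqlite3
from typing import Dict, Optional


class WriteThroughCache:
    """Hypothetical cache that is updated on every successful database write."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.cache: Dict[str, float] = {}
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS prices (symbol TEXT PRIMARY KEY, price REAL)"
        )

    def write(self, symbol: str, price: float) -> None:
        # Commit to the database first, then update the cache, so the cache
        # never holds a value that failed to persist.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO prices (symbol, price) VALUES (?, ?)",
                (symbol, price),
            )
        self.cache[symbol] = price

    def read(self, symbol: str) -> Optional[float]:
        # Serve from memory when possible; fall back to the database on a miss.
        if symbol in self.cache:
            return self.cache[symbol]
        row = self.conn.execute(
            "SELECT price FROM prices WHERE symbol = ?", (symbol,)
        ).fetchone()
        if row is None:
            return None
        self.cache[symbol] = row[0]
        return row[0]


conn = sqlite3.connect(":memory:")
cache = WriteThroughCache(conn)
cache.write("AAPL", 100.0)
cache.write("AAPL", 101.5)   # an update reaches both the database and the cache
```

Reads of hot data are then served from memory, while the database remains the source of truth.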
Messaging & Distributed Coordination
Since we abide by a service-oriented architecture in the backend, inter-process communication is critical. Addepar has invested fairly heavily in Kafka, particularly for persistent message queues, data replication, and fault tolerance.
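The property that makes a persistent message queue useful for fault tolerance is that messages stay in an append-only log while each consumer group tracks its own committed offset. The toy Python model below illustrates that idea only. It is not Kafka’s client API, and `ToyLog` and the group names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


class ToyLog:
    """Toy in-memory model of an append-only log with per-group offsets."""

    def __init__(self) -> None:
        self.messages: List[str] = []
        self.committed: Dict[str, int] = defaultdict(int)

    def append(self, message: str) -> int:
        # Messages are only ever appended; return the new message's offset.
        self.messages.append(message)
        return len(self.messages) - 1

    def poll(self, group: str, max_messages: int = 10) -> List[str]:
        # Read from the group's last committed offset without advancing it.
        start = self.committed[group]
        return self.messages[start:start + max_messages]

    def commit(self, group: str, count: int) -> None:
        # Advance the group's offset once messages are fully processed.
        self.committed[group] = min(
            self.committed[group] + count, len(self.messages)
        )


log = ToyLog()
for message in ["a", "b", "c"]:
    log.append(message)

first_batch = log.poll("reports", max_messages=2)
log.commit("reports", len(first_batch))

# Suppose the consumer crashes here, after polling but before committing.
lost_batch = log.poll("reports")
after_restart = log.poll("reports")   # the uncommitted message is redelivered
audit_view = log.poll("audit")        # a second group reads independently
```

In this model a crash between polling and committing loses nothing: the restarted consumer picks up from the last committed offset, and other groups are unaffected.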
Kafka’s coordination power is built on ZooKeeper, a highly reliable distributed coordination service that’s also a natural supplement to Kafka. We use Apache Curator, a client library for ZooKeeper, to solve advanced problems like distributed locks and service naming.
On the frontend, we’re committed contributors to the Ember community. We’ve developed a number of open-source projects, including Ember Charts, Ember Table, and Ember Widgets, that we also use actively in our application. Because ours is an Ember application, our templating language is Handlebars/HTMLBars, and some of our dependencies rely on jQuery and Lodash. We use Ember QUnit as our testing framework, which lets us develop rapidly while ensuring a stable experience for our users. To deliver our powerful data visualizations, we use the D3 library. And we currently use Sass to bring the Addepar design language to bear on the application.
While this is the current stack that our system runs on, we’re always looking for ways to improve it. If you have any suggestions for our team, or if you’d like to learn more about engineering at Addepar, please email us at email@example.com!