It’s been a while since I published anything. While I worked at AWS, a corporate PR muzzle barred me from saying much of anything, let alone anything interesting. I’ve also been quiet this past year while working on a new project, Archodex. It’s not ready to show off publicly, yet, but if you need help finding where your secrets are used across your workloads, or figuring out anything else about your platform environment, do get in touch. (You can always reach me at chasedouglas@gmail.com, don’t be a stranger if I can be of any help!)
In between things I built a POC side-project to scratch an itch and learn more about AWS Aurora DSQL. I’m a big believer in the value of a horizontally scalable relational database, and I wanted to find out how well it worked. (If you came here specifically for DSQL takes, you can jump straight to them here.)
Motivation
My password manager has been bugging me. It’s become a bit onerous to manage sharing items with family, and it felt like it got really bloated and slow recently. It turns out there was a good reason it got slower¹, but speed wasn’t my only issue, and it got me wondering what I could do to solve my daily annoyances with it. I wanted my own password manager solution that worked how I wanted it to. And it turns out I can have my cake and eat it too!
Password managers aren’t new. Bruce Schneier created the first password manager in 1997, and although the tech has changed considerably since then, password managers should be considered a commodity rather than a luxury good at this point. It turns out there is such a “commodity” password manager solution via the combination of the official Bitwarden clients (e.g. their browser extensions, iOS and Android apps, and web app) + the community open source Vaultwarden backend service for cross-device syncing.
You might be thinking: Hang on, I’m going to trust a community open source app with all my passwords? This concern is mooted by the fact that all encryption and decryption occurs inside the official Bitwarden clients. The API service, which is what Vaultwarden provides, merely stores and retrieves already encrypted items. In theory, you could even make your vault database and files public because they cannot be decrypted without your master password.
But also, I wanted to play with the new Amazon Aurora DSQL service, which promises a truly serverless² PostgreSQL-compatible(-ish) database :).
A Serverless Password Manager Backend
Having my own password manager with cross-device syncing requires operating a cloud service. But I don’t want to deal with the typical operational headaches of managing an always-on service. I previously built a company to help people run cloud apps with fewer operational headaches using serverless services. I wondered if I could find a password manager with a serverless backend.
Enter Vaultwarden. Vaultwarden is a reimplementation of the official Bitwarden backend API service. What makes it ideal here is that it is written in Rust. Rust software both initializes and executes fast. This is important because serverless APIs (especially low-throughput APIs like a personal Vaultwarden service) often execute with cold starts, where response latency includes process initialization.
Vaultwarden Serverless Architecture
The current Vaultwarden implementation is architected as a stateful single-node service. It saves vaults and other files directly to the filesystem, serves the official Bitwarden web app assets (rebranded as “Vaultwarden” assets) from a directory on disk, and connects to SMTP servers or uses a local Sendmail installation to send emails. And it uses a relational database (Postgres, MySQL, or SQLite) for account records and to store encrypted items. These would all be a pain to operate and maintain long-term as a personal project, and there are AWS Serverless services for each component that greatly reduce this operational toil. Inserting these services results in the following Vaultwarden Serverless architecture:
CloudFront CDN
├─ API Lambda Function
│ ├─ Data S3 Bucket (for encrypted file attachments, etc.)
│ ├─ Aurora DSQL Database
│ └─ Amazon Simple Email Service (SES)
└─ Web App Assets S3 Bucket
You can find the code for this Vaultwarden Serverless POC here.
AWS Lambda for the API Service
AWS Lambda is used for the API service. Personal password manager workloads don’t hit the API frequently (vaults are cached client-side), so we can give our Rust function a ton of memory and CPU to reduce execution time while still staying within the monthly free-tier usage limit. We use the Lambda Function URL feature with the Lambda Web Adapter so we don’t have to modify any code to make the API just work inside a Lambda Function.
Amazon S3 for Data and Web Assets
Amazon S3 is the OG serverless service³. Although folks will tell you it is not a filesystem, it can replace a filesystem as a general store of files. For example, instead of saving vault item attachments to /app/data/attachments/<encrypted name> on the local disk, we can save them to s3://<data bucket>/attachments/<encrypted name> in an S3 bucket. Similarly, instead of serving web assets from /app/web-vault/<asset filename>, we can serve them from s3://<web asset bucket>/<asset filename>. S3 also supports streaming files to clients via presigned URLs, meaning our API Function is left mostly to process API requests rather than act as a proxy for file downloads.
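To make the presigned URL idea concrete, here’s a minimal sketch of generating a time-limited download URL with the aws-sdk-s3 crate. The bucket and key layout mirror the paths above, but the function and variable names are illustrative rather than the POC’s actual code:

use std::time::Duration;

use aws_sdk_s3::presigning::PresigningConfig;

// Generate a time-limited URL so clients download the encrypted attachment
// directly from S3 instead of proxying the bytes through the API function.
async fn attachment_download_url(
    s3: &aws_sdk_s3::Client,
    bucket: &str,
    encrypted_name: &str,
) -> Result<String, Box<dyn std::error::Error>> {
    let presigned = s3
        .get_object()
        .bucket(bucket)
        .key(format!("attachments/{encrypted_name}"))
        .presigned(PresigningConfig::expires_in(Duration::from_secs(300))?)
        .await?;
    Ok(presigned.uri().to_string())
}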
Amazon Simple Email Service (SES) for Sending Emails
Amazon SES is used for sending email. Not because sending email is hard, but because sending email that won’t go straight to spam folders is hard. Once SES is configured to send email from our domain, with DKIM, SPF, and DMARC set up, it can send messages on our behalf with a real chance of reaching inboxes.
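For illustration, sending a message through SES from Rust looks roughly like the sketch below, using the aws-sdk-sesv2 crate (the sender address is a hypothetical placeholder; Vaultwarden’s real email paths go through its own mailer abstraction):

use aws_sdk_sesv2::types::{Body, Content, Destination, EmailContent, Message};

// Send a plain-text email via SES. The from address must belong to a domain
// already verified in SES with DKIM/SPF/DMARC configured.
async fn send_mail(
    ses: &aws_sdk_sesv2::Client,
    to: &str,
    subject: &str,
    text: &str,
) -> Result<(), Box<dyn std::error::Error>> {
    let message = Message::builder()
        .subject(Content::builder().data(subject).build()?)
        .body(Body::builder().text(Content::builder().data(text).build()?).build())
        .build()?;
    ses.send_email()
        .from_email_address("vaultwarden@example.com") // hypothetical sender
        .destination(Destination::builder().to_addresses(to).build())
        .content(EmailContent::builder().simple(message).build())
        .send()
        .await?;
    Ok(())
}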
Amazon CloudFront to Combine Routes and Serve from a Custom Domain
Amazon CloudFront can be fiddly to work with, but it solves a couple of challenges for us. It routes web app asset requests straight to our web assets bucket and API requests to our Lambda Function URL. CloudFront also makes it easy to serve the backend from a custom domain.
Amazon Aurora DSQL for the Database
We finally come to Aurora DSQL, AWS’s first truly serverless relational database offering. Vaultwarden can speak PostgreSQL. But can it work with DSQL, and how much effort will it take?
How Difficult is it to Refactor a Workload for DSQL?
Aurora DSQL is a new service in Preview, meaning most functionality is present and working, but there are both known and unknown issues, some of which will be resolved before GA. The feature scope for GA can also change, meaning we can’t know for sure which features will or won’t be supported by then. In rare cases, Preview functionality has even been removed before GA due to security or operational concerns. Altogether, this means the insights noted here are a point-in-time snapshot of service functionality that may or may not be representative of the service at GA. With that caveat, here are my key learnings from refactoring Vaultwarden to use DSQL:
Connecting / Authenticating
DSQL provides a PostgreSQL-protocol-compatible endpoint, meaning any existing PG client should be able to talk to it. However, DSQL only supports TLS-encrypted connections authenticated via temporary credentials that are valid for only 15 minutes. AWS is absolutely making the right decision here in enforcing authentication and encryption best practices, but it may not be easy to refactor existing applications and libraries to use this approach. Vaultwarden uses the synchronous Diesel crate for database queries, and adding temporary credentials for authentication required a decent amount of new and complex code. Connection encryption and authentication alone will require significant application developer involvement to adopt DSQL for existing workloads. Even if a workload uses Amazon Aurora PostgreSQL with IAM authentication today, it will need to be modified to use the new DSQL SDK to generate signed credentials. You cannot simply swap in an Amazon Aurora DSQL endpoint for an existing Amazon Aurora RDS endpoint.
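To give a sense of the new code involved, here’s a minimal sketch of generating one of those 15-minute credentials with the aws-sdk-dsql crate; the token is then used as the password in an otherwise ordinary TLS PostgreSQL connection (the region and endpoint values are placeholders):

use aws_config::{BehaviorVersion, Region};
use aws_sdk_dsql::auth_token::{AuthTokenGenerator, Config};

// Generate a short-lived (15 minute) token that stands in for a static
// database password. It must be regenerated whenever a new connection is
// opened after the previous token expires.
async fn dsql_password(cluster_endpoint: &str) -> Result<String, Box<dyn std::error::Error>> {
    let sdk_config = aws_config::load_defaults(BehaviorVersion::latest()).await;
    let generator = AuthTokenGenerator::new(
        Config::builder()
            .hostname(cluster_endpoint)
            .region(Region::new("us-east-1")) // placeholder region
            .build()?,
    );
    // Non-admin roles use db_connect_auth_token(); the admin role uses
    // db_connect_admin_auth_token() instead.
    let token = generator.db_connect_auth_token(&sdk_config).await?;
    Ok(token.to_string())
}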
Migrations
DSQL only supports one DDL statement per transaction. It is very common for existing migrations to contain multiple DDL statements, however. For example, a migration may move data from columns in one table to a new table by creating the new table, copying the data to the new table, then deleting columns from the original table. The creation of the new table and the deletion of columns from the old table involve (at least) two separate DDL statements. You also want this entire migration to occur within one transaction to ensure either all the data is copied into the new table, or the database is left in its original state. We don’t want to end up in some half-way state with a new table and some partially copied and/or partially deleted data spread across two tables.
This is a significant cause for concern, but not necessarily a deal-breaker. It means that migrations have to be performed with even more care, and that they should be written such that an operator can manually complete or revert individual migrations if they fail part-way through for any reason. The tradeoff is that while DSQL should make day-to-day operations easier (e.g. because it can horizontally scale, it should have less potential for failing at larger data set sizes), it may make migrations more challenging. At least migrations happen at known points in time, unlike other kinds of failures that can strike at random.
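As a concrete (hypothetical) illustration, the column-move migration described above has to be decomposed into individual statements, each running in its own transaction, rather than one atomic unit. A sketch using Diesel with made-up table names:

use diesel::prelude::*;
use diesel::sql_query;

// Hypothetical migration decomposed for DSQL's one-DDL-per-transaction rule.
// Each execute() runs as its own implicit transaction, so a failure between
// steps leaves the database in an intermediate state that an operator must
// manually roll forward or back.
fn move_email_column(conn: &mut PgConnection) -> QueryResult<()> {
    // Step 1: DDL in its own transaction.
    sql_query("CREATE TABLE user_emails (user_id TEXT, email TEXT)").execute(conn)?;
    // Step 2: DML; this one could be wrapped in an explicit transaction.
    sql_query("INSERT INTO user_emails SELECT id, email FROM users").execute(conn)?;
    // Step 3: DDL in its own transaction.
    sql_query("ALTER TABLE users DROP COLUMN email").execute(conn)?;
    Ok(())
}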
Foreign Keys
DSQL does not support Foreign Keys (FKs). This is a more commonly known DSQL limitation, and was the first concern I audited the Vaultwarden codebase for. Many database schemas define FKs to ensure data consistency. For example, a schema could use FKs to ensure that when a user account record is deleted all the other records associated with the user are also deleted. FKs are also often used for upsert (i.e. insert-if-not-exists-otherwise-update) functionality, though there are other ways this can be accomplished.
Whether FKs are a concern for your workload depends on its schema and existing queries. In Vaultwarden, FKs are defined for all database backends (PostgreSQL, MySQL, and SQLite), and are used in MySQL and SQLite queries for upsertion, but are not used in PostgreSQL queries. The Vaultwarden 2FA database model save() function illustrates this implementation difference.
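As a rough sketch of what the PostgreSQL path does instead, Diesel’s on_conflict upsert targets the primary key directly, with no FK involved (the table and struct here are simplified stand-ins, not Vaultwarden’s actual schema):

use diesel::prelude::*;

diesel::table! {
    twofactor (uuid) {
        uuid -> Text,
        user_uuid -> Text,
        data -> Text,
    }
}

#[derive(Insertable, AsChangeset)]
#[diesel(table_name = twofactor)]
struct TwoFactorRow {
    uuid: String,
    user_uuid: String,
    data: String,
}

// Upsert keyed on the primary key: insert the row, or update it in place if
// a row with the same uuid already exists.
fn save(conn: &mut PgConnection, row: &TwoFactorRow) -> QueryResult<usize> {
    diesel::insert_into(twofactor::table)
        .values(row)
        .on_conflict(twofactor::uuid)
        .do_update()
        .set(row)
        .execute(conn)
}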
To make Vaultwarden work on DSQL I needed to remove all FK definitions from the schema. I accomplished this by initializing a local PostgreSQL database for Vaultwarden, running pg_dump to get all the table definitions, and then searching for and removing all FK definitions. Lastly, I saved this schema as one large initial migration.
Adding Columns with Default Values or Constraints
Later in development I rebased my changes on top of the latest changes from upstream. The upstream changes included a migration that added a column to a table. This is one of the most trivial kinds of migrations that can be performed, as no existing data was touched. However, even this simple migration caused a headache.
It turns out that DSQL cannot add columns with default values or constraints. This isn’t clearly and explicitly stated anywhere in the documentation. The only mention of it is within a page that ostensibly describes the “supported” subsets of PostgreSQL commands. The Supported PostgreSQL features in Aurora DSQL page currently lists the following as a supported operation:

ALTER TABLE <table name> ADD COLUMN <column name> <data type>
Did you catch the issue when your eyes scanned this row of information, which appears at the end of a long table of “supported” statements in the AWS documentation? Let me give you a hint by providing the syntax of the ADD COLUMN clause from the official PostgreSQL docs:
ADD [ COLUMN ] [ IF NOT EXISTS ] column_name data_type [ COLLATE collation ] [ column_constraint [ ... ] ]
...
and column_constraint is:
[ CONSTRAINT constraint_name ]
{ NOT NULL |
NULL |
CHECK ( expression ) [ NO INHERIT ] |
DEFAULT default_expr |
GENERATED ALWAYS AS ( generation_expr ) STORED |
GENERATED { ALWAYS | BY DEFAULT } AS IDENTITY [ ( sequence_options ) ] |
UNIQUE [ NULLS [ NOT ] DISTINCT ] index_parameters |
PRIMARY KEY index_parameters |
REFERENCES reftable [ ( refcolumn ) ] [ MATCH FULL | MATCH PARTIAL | MATCH SIMPLE ]
[ ON DELETE referential_action ] [ ON UPDATE referential_action ] }
[ DEFERRABLE | NOT DEFERRABLE ] [ INITIALLY DEFERRED | INITIALLY IMMEDIATE ]
The problem is that the DSQL ADD COLUMN statement cannot include constraints like a DEFAULT value or a requirement that the value be NOT NULL, let alone all the other potential constraints one might need.
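Under DSQL’s restriction, a migration like this has to be split into a bare ADD COLUMN plus a DML backfill, with the default and NOT NULL semantics enforced by the application instead. A hedged sketch with made-up names:

use diesel::prelude::*;
use diesel::sql_query;

// DSQL accepts ADD COLUMN with only a name and data type, so the DEFAULT
// and NOT NULL parts of a typical migration become separate steps.
fn add_enabled_column(conn: &mut PgConnection) -> QueryResult<()> {
    // Accepted by DSQL: a bare, nullable column.
    sql_query("ALTER TABLE users ADD COLUMN enabled BOOLEAN").execute(conn)?;
    // Backfill what DEFAULT TRUE would have done, as plain DML. The NOT NULL
    // guarantee now lives in application code, since the constraint itself
    // cannot be added.
    sql_query("UPDATE users SET enabled = TRUE").execute(conn)?;
    // Rejected by DSQL (at the time of writing):
    //   ALTER TABLE users ADD COLUMN enabled BOOLEAN NOT NULL DEFAULT TRUE;
    Ok(())
}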
I want to be clear that my main concern is not just that you cannot add a column with constraints after a table is created. That is a real and significant concern, and I sincerely hope AWS addresses it by GA, as it is a very common element of database schemas. Instead, my main concern is that this needs to be documented much better. AWS will quickly burn a lot of goodwill if it confuses customers into thinking their workloads can operate on DSQL, only for them to find out later that they can’t. The DSQL docs have two pages where people are likely to look for issues using the service with existing workloads: Unsupported PostgreSQL features in Aurora DSQL and Known issues in Amazon Aurora DSQL. But this significant ADD COLUMN limitation cannot be found on either page, nor is it clearly stated in the “supported features” page even though the statement is listed there.
Both the underlying ADD COLUMN limitation and this documentation issue give me significant concerns about using DSQL for critical workloads.
Conclusion
There is a lot to love about this Vaultwarden Serverless concept. There are also significant concerns with DSQL, which is a critical piece of the solution. How do we reconcile the concerns with the benefits?
I’ll start with the workload-agnostic takeaways. The core Lambda + S3 + CloudFront + SES components are rock-solid. They can also be used with any off-the-shelf database backend, including other AWS solutions like Aurora Serverless. But based on the learnings from this POC, it would be hard to recommend DSQL for most workloads due to:
The inability to perform multiple schema changes in a single transaction
The lack of support for basic functions like adding a column with constraints
Poor information architecture within the DSQL docs that makes it hard to assess all the limitations relative to upstream PostgreSQL up front
However, my takeaway for Vaultwarden is more nuanced. The database requirements are well defined (e.g. no current usage of Foreign Keys beyond their definitions in the schema) and the data can be easily backed up via pg_dump. Further, the Bitwarden clients cache database-stored items locally and can re-export them all without needing access to the API service. If my API service went down, I could still migrate all my items to the official Bitwarden service (or other third-party service) from an export from any one of my clients. I also feel comfortable operating on top of DSQL for my own personal use, knowing that I can dig into the details and address issues if they arise. But I also wouldn’t recommend others deploy Vaultwarden using DSQL unless they can do the same.
That said, I love the promise of DSQL, and I hope that over time many of these limitations will disappear. It would be great to have a truly serverless relational database that worked for the majority of existing and new workloads.
¹ I’ve since learned that most password managers increased the default number of PBKDF2 iterations they use to derive the vault decryption key from your master password, which literally makes it take longer to unlock a vault. The time it takes to unlock your vault is proportional to the difficulty of brute-force attacking it, and CPUs/GPUs are faster than ever, so increasing the default number of iterations was a good thing. That said, there is a new algorithm on the block, Argon2, which resists brute-force attacks through large memory usage rather than just CPU cycles, meaning vaults can unlock faster at the same level of security. My password manager didn’t support Argon2, giving me another reason to look elsewhere.
² AWS coined the term “Serverless” with the launch of Lambda, where the meaning included “instantaneously scales up and to zero.” Other orgs within AWS got jealous of Lambda’s adoption rate and decided to slap “Serverless” onto new offerings that merely autoscaled, slowly and oftentimes not to zero. /me looks side-eyed at Aurora Serverless, which lacked scale-to-zero until recently, and even then takes up to 15 seconds to resume from zero.
³ Depending on who you ask. SQS launched in beta first; S3 reached GA first.