12 Comments

The timestamps one is interesting. Even if the UTC integer is the canonical representation of a time, I think it still makes sense to record the timezone for the user/operation and what was shown on the user's screen. There are all kinds of situations where someone auditing the books will want to know the local time and not just the time: for instance, if a tax changes at midnight but you have multiple timezones to consider, or if you want a month/quarter/year view of the data.

This can be represented as a tuple: (UTC timestamp, timezone name, ISO-8601 representation in local time).

Author · Jul 11 (edited)

Thank you for the comment! I should have been clearer in the post; that's totally my fault.

My recommendation is centered on the main canonical timestamp in your event-level data. I think it is okay to have additional timestamp fields in other formats that derive from the integer timestamp, as you have proposed, but the source of truth should always be the integer timestamp. Your example of representing it as a tuple is a perfectly reasonable solution for views of your underlying dataset. I believe the event-level dataset (at the lowest level of granularity) should prefer integer timestamps, and it can optionally carry additional time-related fields derived from the integer timestamp. Thank you again for commenting, and I hope you found value in my post!
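A rough sketch of what this could look like in Python (field names like `created_at_utc_micros` and `user_timezone` are made up for illustration, not from the post): the integer UTC timestamp is stored as the source of truth, and the (UTC timestamp, timezone name, local ISO-8601) tuple is derived on read.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Canonical timestamp: an integer number of microseconds since the Unix epoch.
created = datetime(2024, 7, 11, 12, 0, tzinfo=timezone.utc)
event = {
    "event_id": "evt_123",                                           # illustrative id
    "created_at_utc_micros": int(created.timestamp()) * 1_000_000,   # source of truth
    "user_timezone": "America/Los_Angeles",                          # recorded for audit context
}

def local_time_view(evt: dict) -> tuple[int, str, str]:
    """Derive the (UTC timestamp, timezone name, local ISO-8601) tuple on read."""
    ts_micros = evt["created_at_utc_micros"]
    tz_name = evt["user_timezone"]
    utc_dt = datetime.fromtimestamp(ts_micros / 1_000_000, tz=timezone.utc)
    return (ts_micros, tz_name, utc_dt.astimezone(ZoneInfo(tz_name)).isoformat())

print(local_time_view(event))
# (1720699200000000, 'America/Los_Angeles', '2024-07-11T05:00:00-07:00')
```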

Jul 13 (edited) · Liked by wasteman

Interesting read, thanks for sharing!

Totally agree with the timestamps, daylight saving bugs are super annoying. I remember we stored the date and time just as a datetime in the local timezone (Pacific Time) and had to run a script to convert it twice a year when daylight saving time changed - ugh.

For currency amounts, always store them as integers or big integers in micros (see forex notation) to avoid rounding errors altogether.
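A minimal Python sketch of what that could look like (the helper names and `MICROS_PER_UNIT` constant are illustrative, not from the post):

```python
from decimal import Decimal, ROUND_HALF_UP

MICROS_PER_UNIT = 1_000_000  # 1 currency unit (e.g. $1) = 1,000,000 micros

def to_micros(amount: str) -> int:
    """Parse a decimal string like '1.23' into integer micros (1_230_000)."""
    return int((Decimal(amount) * MICROS_PER_UNIT).to_integral_value(ROUND_HALF_UP))

def to_display(micros: int) -> str:
    """Convert integer micros back to a two-decimal display string."""
    return str((Decimal(micros) / MICROS_PER_UNIT).quantize(Decimal("0.01")))

# Integer arithmetic keeps sums exact, unlike repeated float addition.
line_items = [to_micros("0.10")] * 3          # three $0.10 charges
total = sum(line_items)                        # 300_000 micros
assert total == to_micros("0.30")
print(to_display(total))                       # "0.30"
```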

Use an event store for immutability (TimescaleDB, for example). A blockchain is usually unnecessary unless you're dealing with an untrusted entity, it's okay for the data to be public to everyone, and you're fine increasing your storage cost by 1000x :D

Another recommendation for systems like this is to thoroughly test everything with automated tests. 80% test coverage won't do it here; aim for close to 100% and cover happy, error, and edge cases as much as possible.


I am confused by idempotency.

Let's say your system is a web app.

Your frontend (regardless of whether it's an HTML form or a React app) is the one creating the UUID, right?

Then it uses session storage to store and reuse the UUID.

This helps persist the UUID across page refreshes.

Then the backend sends a clear success response that tells the frontend to clear the stored UUID so a new one can be generated.

This is what you meant, right?

Author

This is specific to financial events. So in your example, a session-scoped UUID is not granular enough to uniquely identify a financial event unless your frontend app only allows one financial event per session (which is possible but unlikely). Instead, you need to model your financial events in a way that your system can uniquely identify each financial event and process them to create accounting entries. If your system accidentally sends the same event twice (e.g. bad internet, accidental duplicates in an internal message queue, etc.) and your code is idempotent, we can guarantee that the event is only processed once.
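A minimal sketch of that dedupe idea, assuming Python and SQLite; the `processed_events` table and field names are hypothetical, and a real system would likely use a durable database instead of an in-memory one:

```python
import sqlite3

# Append-only log of processed financial events; the uniqueness constraint on
# event_id is what makes processing idempotent.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE processed_events (
        event_id      TEXT PRIMARY KEY,   -- unique id of the financial event
        amount_micros INTEGER NOT NULL
    )
""")

def process_event(event: dict) -> bool:
    """Record the event once; a duplicate delivery is detected and skipped."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO processed_events (event_id, amount_micros) VALUES (?, ?)",
                (event["event_id"], event["amount_micros"]),
            )
        return True            # first time we have seen this event
    except sqlite3.IntegrityError:
        return False           # duplicate: already processed, safely ignored

evt = {"event_id": "evt_42", "amount_micros": 1_230_000}
assert process_event(evt) is True    # first delivery is processed
assert process_event(evt) is False   # retry/duplicate is deduped
```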


So what I understand from your comment is that, for the business system that generates the financial events in the diagram, you define financial events to be ONLY events generated by that system, yes?

Assuming your business system is a web app or has web forms, then a user who sends a payment via an HTTP POST on the frontend and accidentally sends it twice is handled by the business system. Only once the payment is processed and properly stored does it fire off ONE and ONLY ONE financial event to the sub-ledger, yes?

Author

One clarification I would make is that I would break your financial events into even more granular levels: one for placing the order, one for the payment pending, one for the payment processed, and potentially another to confirm cash has actually been received in your bank account. All of these events are necessary for a clear audit trail of what occurred in your business.

An example of a valid way of processing these events is as follows: 1) placing the order = debit Accounts Receivable, credit Revenue; 2) payment processed and cash received = debit Cash, credit Accounts Receivable. The other financial events are still necessary for debugging and having an audit trail, but they don't necessarily have accounting entries tied to them.
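A tiny Python sketch of that mapping (the event type and account names are just illustrative labels for the two entries above):

```python
# Map each financial event type to its double-entry treatment. Events like
# "payment_pending" are kept for the audit trail but produce no journal entry.
JOURNAL_RULES = {
    "order_placed":      [("accounts_receivable", "debit"), ("revenue", "credit")],
    "payment_processed": [("cash", "debit"), ("accounts_receivable", "credit")],
    "payment_pending":   [],   # audit trail only, no accounting entry
}

def journal_entries(event_type: str, amount_micros: int) -> list[tuple[str, str, int]]:
    """Return the (account, side, amount_micros) legs for a financial event."""
    return [(account, side, amount_micros)
            for account, side in JOURNAL_RULES.get(event_type, [])]

print(journal_entries("order_placed", 1_230_000))
# [('accounts_receivable', 'debit', 1230000), ('revenue', 'credit', 1230000)]
```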

But yes, to the original point, every financial event can be processed only once. Even if accidental duplicate records are sent, your system should be able to dedupe the event and recognize that it has already been processed.

Thanks again for reading my post, and I hope that helps! Feel free to DM if you have more questions as well!


Yes, I intend to DM you to clarify my understanding of what you're saying about idempotency.

Probably later this week.

Jul 11 · Liked by wasteman

> Micros of a currency = smallest unit * 1,000,000. E.g $1.23 = 1,230,000 micros.

Could you elaborate on the definition of "smallest unit" here?

Author · Jul 11 (edited)

Ahh, good question. After reading this over, I realized my statement is totally misleading and incorrect. I probably should have said something like "base currency". So, for example, the dollar is the base currency of USD. "Smallest" is incorrect here because we also have cents, but the cent is not the base currency of USD.

Thanks for pointing this out, I will update the post!

Jul 11 · Liked by wasteman

The high-level data flow implies that the GL is a summary of multiple subledgers. In fact, the subledgers are an explanation of the GL. Leaving aside the consolidation of multiple GLs at the corporate level, and focusing on one business entity/GL, financial software needs to commit the source transaction, the subledger entries, and the GL double entries simultaneously, with integrity and without failure. Whilst I agree with most of your comments on granularity, the recent exposure of the UK Post Office accounting failures was exacerbated by the arbitrary alteration of granular data. A GL should never be re-cast from granular data; to do so is to admit that the GL posting software is fundamentally flawed.

Author

Hi there, thanks for taking the time to read my post!

Sorry if my diagram caused confusion; it was not meant to imply a consistency relationship between the subledger(s) and the GL. Rather, it is an example of a data flow that I have implemented at multiple tech companies, but not the only way of doing things! Committing to both the GL and subledger transactionally is a valid way of implementing an accounting system, but not the only way, as abiding by immutability of data and guaranteeing exactly-once processing can lead you to the same results.

And regarding your example, if I am understanding it correctly, granular data was arbitrarily altered, meaning it was mutated? In that case, they would be breaking the principle of immutability I discussed in the post. I very much agree that the GL should never be "re-cast", in the sense that we don't delete and reinsert data into the GL. Rather, any modifications need to be new events (like the reversals and re-entries described above).
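A minimal Python sketch of that append-only correction pattern (the event IDs, field names, and suffixes are made up for illustration):

```python
# Append-only corrections: instead of mutating or deleting a posted entry,
# append a reversal event and then a corrected re-entry.
ledger = [
    {"event_id": "evt_1", "account": "cash", "side": "debit", "amount_micros": 1_230_000},
]

def reverse_and_repost(ledger, original, corrected_amount_micros):
    """Append a reversal of `original`, then a new entry with the corrected amount."""
    opposite = "credit" if original["side"] == "debit" else "debit"
    ledger.append({**original,
                   "event_id": original["event_id"] + "_reversal",
                   "side": opposite})
    ledger.append({**original,
                   "event_id": original["event_id"] + "_reentry",
                   "amount_micros": corrected_amount_micros})
    # The original row is never updated or deleted; history stays intact.

reverse_and_repost(ledger, ledger[0], 1_200_000)
assert len(ledger) == 3   # original + reversal + corrected re-entry
```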

Thank you again for commenting, and I hope you found value in my post!
