<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[wasteman.codes]]></title><description><![CDATA[A blog about the foolishness I encounter, trying to get computers to do the things I want them to. ]]></description><link>https://substack.wasteman.codes</link><image><url>https://substackcdn.com/image/fetch/$s_!kcpi!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F0025d321-926d-4cf9-94cc-4346ec86411a_268x268.png</url><title>wasteman.codes</title><link>https://substack.wasteman.codes</link></image><generator>Substack</generator><lastBuildDate>Mon, 20 Apr 2026 11:35:58 GMT</lastBuildDate><atom:link href="https://substack.wasteman.codes/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Wasteman]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[wasteman@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[wasteman@substack.com]]></itunes:email><itunes:name><![CDATA[wasteman]]></itunes:name></itunes:owner><itunes:author><![CDATA[wasteman]]></itunes:author><googleplay:owner><![CDATA[wasteman@substack.com]]></googleplay:owner><googleplay:email><![CDATA[wasteman@substack.com]]></googleplay:email><googleplay:author><![CDATA[wasteman]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Correct code isn’t enough]]></title><description><![CDATA[Thoughts on coding agents, after letting them write all my code for months]]></description><link>https://substack.wasteman.codes/p/correct-code-isnt-enough</link><guid 
isPermaLink="false">https://substack.wasteman.codes/p/correct-code-isnt-enough</guid><dc:creator><![CDATA[wasteman]]></dc:creator><pubDate>Fri, 10 Apr 2026 01:50:46 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!kcpi!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F0025d321-926d-4cf9-94cc-4346ec86411a_268x268.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Since the summer of 2025, close to 100% of my code has been written by various coding agents, and I wanted to share some of my experience and the lessons I have learned as I have gotten better at using them.</p><blockquote><p><strong>Wasteman&#8217;s note: </strong><em>I will refer to coding agents interchangeably with &#8220;codex&#8221; in this post, as it&#8217;s the only coding agent I use today.</em></p></blockquote><h1>A harness is not enough</h1><p>When I first started using agents, I was skeptical of the quality of the code they would output, so I was pretty meticulous about reviewing code line by line and slowly adding more rules to make sure codex didn&#8217;t repeat mistakes. The problem I was solving was &#8220;can I get the agent to write <strong>correct</strong> code?&#8221;, and I solved it with a harness of rules and good tests. Over time the harness became really effective, and I could reliably get codex to one-shot tasks.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://substack.wasteman.codes/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading wasteman.codes! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>Eventually I trusted codex enough that I barely reviewed the code and did only very light UX validation before pushing changes through. At some point, however, the quality of codex&#8217;s output started to degrade when I wanted to make changes, especially fundamental changes to my app. Since I had ignored the code for so long, I was not well equipped to debug through all the slop that had accumulated.</p><p>We have all had that manager who knows nothing about your system, is always asking stupid questions about why things take so long, and keeps asking &#8220;shouldn&#8217;t that task be simple?&#8221; I realized I had become that manager, except my employees were a bunch of codex agents. I didn&#8217;t understand the code or the system well enough to ask the right questions and guide codex to write the code the way I wanted.</p><blockquote><p><strong>Wasteman&#8217;s note</strong>: <em>Candidly, I also became this type of manager during my short stint in management, so I&#8217;m not surprised it happened again with non-human agents. Probably more evidence that I should stay an individual contributor.</em></p></blockquote><p>The harness I built solved the narrow problem of whether codex can write &#8220;correct&#8221; code, exactly as specified by my unit tests and rules. 
But it left me with two new problems:</p><ol><li><p>Does codex have enough context to infer what correctness means for a new task?</p></li><li><p>Does it have context on &#8220;why&#8221; we made decisions in the past, and does it have any intuition for what the &#8220;right&#8221; thing to do is in our domain?</p></li></ol><h1>Context is more than code, and it&#8217;s more than documentation</h1><p>One way I tried to solve this context problem was with in-repo documentation explaining &#8220;why&#8221; we did things, in the hope that codex would read it and avoid common pitfalls. It did improve performance for a time, but no matter how rigorous I was about adding decision logs and documentation to the code, I eventually ran into the same problems again.</p><p>The two fundamental problems I observed with this approach are:</p><ol><li><p>The more documentation you provide as context, the more likely codex is to forget it when it compacts its context. So even if the information is there, codex may not follow it because it has forgotten it.</p></li><li><p>Human language is fundamentally a lossy form of communication.</p></li></ol><p>I think (1) can be solved with smarter models, but (2) is a fundamental problem when working with models that can&#8217;t continually learn (by continual learning I mean the model&#8217;s weights actually changing, not just a KV cache in memory). When we speak and write, we aren&#8217;t just regurgitating thoughts saved in our brains. Language is just the interface to our minds, where a much deeper representation of our knowledge exists. 
But there is so much knowledge in our brains that we either don&#8217;t know how to express, or have no words to describe.</p><p>From Peter Naur&#8217;s essay <em><a href="https://gist.github.com/onlurking/fc5c81d18cfce9ff81bc968a7f342fb1">Programming as Theory Building</a></em>:</p><blockquote><p>A main claim of the Theory Building View of programming is that an essential part of any program, the theory of it, is something that could not conceivably be expressed, but is inextricably bound to human beings</p></blockquote><p>This is what I am getting at when I say that context is more than code and documentation. Humans can hold knowledge in a deep representation that cannot be mapped 1:1 onto human language. Peter Naur would say humans can &#8220;build a theory&#8221; of the system, which you cannot simply codify in markdown files. Until AI labs give us a model that can continually learn, this will remain a problem.</p><blockquote><p><strong>Wasteman&#8217;s note</strong>: <em>I don&#8217;t agree with Naur&#8217;s statement that this is inextricably bound to human beings, but I think he is correct that you need an agent with a deeper understanding of the system than the code and documents can provide.</em></p></blockquote><h1>Nothing replaces understanding the code</h1><p>It&#8217;s easy to get enamored with using coding agents imperatively (&#8220;Build me X&#8221;) without any strong opinions on <strong>how</strong> codex should build it. But at some point the code becomes complex enough that codex cannot efficiently solve the problem on its own with its limited context window. This is where your understanding of the system becomes essential to getting the most out of your agent.</p><p>The better you understand the code, the better the questions you can ask, and the better your intuition for whether codex is solving the problem the right way. 
This is why so many of us have observed that senior engineers are the ones who have benefited the most from AI. They know enough to ask the right questions, and they have built years of intuition that they use to guide the agent.</p><h1>Humans are the source of context</h1><p>In my view, the missing piece of the loop for a truly autonomous coding agent is an agent that can continually learn and pass context forward to new subagents completing subtasks. With today&#8217;s models, we have to be this agent. We have deep representations of the systems we have built, compacted in a much more intelligent way than a KV cache of tokens. So we have to be the ones injecting our opinions and &#8220;taste&#8221; about which problems are the right ones to solve, and the right ways to solve them.</p>]]></content:encoded></item><item><title><![CDATA[Engineering Principles for Building Financial Systems]]></title><description><![CDATA[Best practices and principles to create accurate and reliable software based financial systems.]]></description><link>https://substack.wasteman.codes/p/engineering-principles-and-best-practices</link><guid isPermaLink="false">https://substack.wasteman.codes/p/engineering-principles-and-best-practices</guid><dc:creator><![CDATA[wasteman]]></dc:creator><pubDate>Wed, 10 Jul 2024 21:00:19 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F79222ec2-7c6d-4454-8834-6a0cf6fb90fa_268x283.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Accounting hasn't fundamentally changed in the past couple of hundred years. Despite this, there is a lot of confusion about the right way to build software for financial systems.</p><p>In this post, I&#8217;ll share lessons from my years working on financial systems at big tech companies. 
Our focus will be building an accounting system, but the principles apply to more general financial systems as well.</p><p>This post will go over the following:</p><ol><li><p>Basic financial definitions relevant to the post</p></li><li><p>High-level goals of an accounting system</p></li><li><p>Engineering principles to achieve those goals</p></li><li><p>Best practices</p></li></ol><h1>Definitions</h1><ul><li><p><strong>General Ledger (GL)</strong>: The primary accounting record of the company, summarizing all financial transactions over a specific time period. You can think of it as an aggregation of its corresponding sub-ledgers.</p></li><li><p><strong>Sub-ledger</strong>: Contains detailed information about all individual transactions related to a specific GL account. Sub-ledger records hold much more granular data than the general ledger, such as who the specific customer is or the individual line items in an order. How much the data differs between the sub-ledger and the GL depends on the type of business and the volume of data you are working with. Some small businesses can get away with not having any sub-ledgers at all, but they are unlikely to ever need custom software at such a small scale.</p></li><li><p><strong>Financial Record: </strong>The general ledger and sub-ledgers together.</p></li><li><p><strong>Material: </strong>Information is material if misstating it in your financial statements would influence the decisions of a reasonable stakeholder. Note that this definition is somewhat ambiguous by design, as different businesses have different materiality thresholds. For example, what is material for a business making $250,000 of revenue per year will likely not be material for a business making $1 billion in revenue. 
From a design perspective, the main value of this concept is to classify different categories of financial data.</p></li></ul><h1>High Level Data Flow</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!lq1D!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d4c2233-1b32-409a-9d0d-281470ab3469_1767x3914.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!lq1D!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d4c2233-1b32-409a-9d0d-281470ab3469_1767x3914.png 424w, https://substackcdn.com/image/fetch/$s_!lq1D!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d4c2233-1b32-409a-9d0d-281470ab3469_1767x3914.png 848w, https://substackcdn.com/image/fetch/$s_!lq1D!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d4c2233-1b32-409a-9d0d-281470ab3469_1767x3914.png 1272w, https://substackcdn.com/image/fetch/$s_!lq1D!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d4c2233-1b32-409a-9d0d-281470ab3469_1767x3914.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!lq1D!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d4c2233-1b32-409a-9d0d-281470ab3469_1767x3914.png" width="378" height="837.2596153846154" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7d4c2233-1b32-409a-9d0d-281470ab3469_1767x3914.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:3225,&quot;width&quot;:1456,&quot;resizeWidth&quot;:378,&quot;bytes&quot;:79366,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!lq1D!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d4c2233-1b32-409a-9d0d-281470ab3469_1767x3914.png 424w, https://substackcdn.com/image/fetch/$s_!lq1D!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d4c2233-1b32-409a-9d0d-281470ab3469_1767x3914.png 848w, https://substackcdn.com/image/fetch/$s_!lq1D!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d4c2233-1b32-409a-9d0d-281470ab3469_1767x3914.png 1272w, https://substackcdn.com/image/fetch/$s_!lq1D!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d4c2233-1b32-409a-9d0d-281470ab3469_1767x3914.png 1456w" sizes="100vw" fetchpriority="high"></picture>
</div></a></figure></div><h1>Goals</h1><p>The three main goals of your accounting system are to be <strong>(1) Accurate, (2) Auditable and (3) Timely.</strong></p><h2>Accurate</h2><p>The financial record needs to reflect the known state of the business. That statement is broad and open to interpretation, so here are some concrete examples.</p><p>If we sell 10 units of a product that costs $9.99, the corresponding financial records must add up to $99.90. This seems obvious, but when you are aggregating thousands (in many cases millions) of transactions, simple summation or rounding errors between systems can cause material inaccuracies.</p><blockquote><p><strong>Wasteman&#8217;s Note: </strong><em>People say naming is the hardest problem in computer science; I would say a close second is addition. After working on large-scale financial systems for the past few years, I have lost count of how many times the smallest bugs caused large discrepancies in our data. 
Also don&#8217;t get me started on summations over floats. I learned the hard way why you should always use integers.</em></p></blockquote><p>The financial record also needs to be <strong>complete</strong>. More specifically, both the sub-ledger and the general ledger must be a complete representation of all business activity that occurred up to a given point in time. If an event occurred but is not in the financial record, then the system is not complete. Note that this doesn&#8217;t mean eventual consistency is unacceptable. You just need to know when your data will become complete, so you can notify stakeholders that the data has settled.</p><blockquote><p><strong>Wasteman&#8217;s Note: </strong><em>Another surprisingly hard problem is guaranteeing completeness. As your system scales, data hops between many systems, and at each hop data can easily be mutated or dropped by accident.</em></p></blockquote><h2>Auditable</h2><p>Closely related to accuracy, your financial record must be easily auditable so that stakeholders can detect errors and accurately measure the performance of your business. And even if you don&#8217;t care, the IRS definitely does.</p><h2>Timely</h2><p>This one depends entirely on your business and its specific needs. Small businesses can get away with dumping all the numbers near the end of the month, just in time to close the books. Larger businesses generally want to avoid this and have a near real-time system. This allows them to monitor financials within the month, make decisions based on financial data faster, and reduce the rush to close the month or quarter in the first few days after it ends.</p><p>Whatever that need is, your accounting system should meet it, along with whatever <strong>timely</strong> means to your business.</p><blockquote><p><strong>Wasteman&#8217;s note: </strong><em>People tend to get lost in batch vs streaming debates when it comes to timeliness. 
My take is that this distinction isn&#8217;t important for most systems. If you need latency in the range of seconds to minutes, then it matters. But you would be surprised how often I hear people arguing about which to use when the consumer doesn&#8217;t need to see updates more than a couple of times a day. Just because they asked for it doesn&#8217;t mean they need it.</em></p></blockquote><h1>Engineering Principles</h1><p>The three main engineering principles your accounting system should abide by are:</p><ol><li><p>Immutability and durability of data</p></li><li><p>Data should be represented at the smallest grain</p></li><li><p>Code should be idempotent</p></li></ol><h2>Immutable and Durable</h2><p>This principle enables auditability, which helps debugging and in turn accuracy. When data is immutable, you have a record of what the state of the system was at any given time. This makes it really easy to recompute the world from previous states, because no state is ever lost.</p><p>Building on this, once data is stated in the financial record, it cannot be deleted. Any correction to the system must be represented as a new financial transaction. 
For example, let&#8217;s say your system had a bug and reported that a service was sold for $1000 when it should have been $900. To correct this mistake, you should first reverse the accounting entries corresponding to the mistake, then restate the accounting entry at the correct amount.</p><p>It will look something like this:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Li9B!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcac99f71-fe2f-41b1-934e-6064f3ffb813_2336x1984.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Li9B!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcac99f71-fe2f-41b1-934e-6064f3ffb813_2336x1984.png 424w, https://substackcdn.com/image/fetch/$s_!Li9B!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcac99f71-fe2f-41b1-934e-6064f3ffb813_2336x1984.png 848w, https://substackcdn.com/image/fetch/$s_!Li9B!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcac99f71-fe2f-41b1-934e-6064f3ffb813_2336x1984.png 1272w, https://substackcdn.com/image/fetch/$s_!Li9B!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcac99f71-fe2f-41b1-934e-6064f3ffb813_2336x1984.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Li9B!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcac99f71-fe2f-41b1-934e-6064f3ffb813_2336x1984.png" width="1456" height="1237" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cac99f71-fe2f-41b1-934e-6064f3ffb813_2336x1984.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1237,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:87028,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Li9B!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcac99f71-fe2f-41b1-934e-6064f3ffb813_2336x1984.png 424w, https://substackcdn.com/image/fetch/$s_!Li9B!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcac99f71-fe2f-41b1-934e-6064f3ffb813_2336x1984.png 848w, https://substackcdn.com/image/fetch/$s_!Li9B!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcac99f71-fe2f-41b1-934e-6064f3ffb813_2336x1984.png 1272w, https://substackcdn.com/image/fetch/$s_!Li9B!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcac99f71-fe2f-41b1-934e-6064f3ffb813_2336x1984.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>So you can see in the financial record that the balances of Accounts Receivable (AR) and Revenue were $1000 at some point, and were corrected later. Even though those balances were incorrect, we want an audit trail of what the balance was at any given moment.</p><h2>Data recorded at the smallest grain</h2><p>Like the principle above, this is critical for enabling a clear audit trail. Even though financial reports and the general ledger are aggregated, they are computed from more granular events. When the data doesn&#8217;t make sense, you need the most granular data to debug what went wrong.</p><p>Saving data at the lowest granularity also makes it really easy to correct data derived from that dataset. If a single immutable dataset is the source of truth for all views of that data, then to correct a view, all you need to do is fix the underlying data and rerun the pipeline that creates that view. 
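</p><p>To make the last two principles concrete, here is a minimal Python sketch of the $1000 to $900 correction. It is a hypothetical illustration, not code from any real system: the <code>Entry</code> type, the <code>Ledger</code> class, the <code>reverse</code> and <code>balances</code> helpers, and the entry IDs are all made up, and amounts are assumed to be integer cents. The wrong sale is never deleted; it is reversed and restated as new entries, and account balances are a derived view recomputed from the granular entries.</p>

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: an entry cannot be modified after creation
class Entry:
    entry_id: str
    account: str
    debit_cents: int = 0   # integer cents, never floats
    credit_cents: int = 0


class Ledger:
    """Append-only store of granular entries: no updates, no deletes."""

    def __init__(self) -> None:
        self._entries: list[Entry] = []

    def post(self, *entries: Entry) -> None:
        # Double entry: a posting must have equal total debits and credits.
        debits = sum(e.debit_cents for e in entries)
        credits = sum(e.credit_cents for e in entries)
        if debits != credits:
            raise ValueError(f"unbalanced posting: {debits} != {credits}")
        self._entries.extend(entries)

    def entries(self) -> list[Entry]:
        return list(self._entries)  # a copy, so callers cannot rewrite history


def reverse(entries: list[Entry]) -> list[Entry]:
    """Offsetting entries: swap the debits and credits of the originals."""
    return [
        Entry(e.entry_id + "-reversal", e.account,
              debit_cents=e.credit_cents, credit_cents=e.debit_cents)
        for e in entries
    ]


def balances(entries: list[Entry]) -> dict[str, int]:
    """Derived view: net debit balance per account (credit balances are negative)."""
    view: dict[str, int] = defaultdict(int)
    for e in entries:
        view[e.account] += e.debit_cents - e.credit_cents
    return dict(view)


ledger = Ledger()
# The bug: a service sale recorded at $1000 instead of $900.
wrong = [
    Entry("sale-1-ar", "Accounts Receivable", debit_cents=100_000),
    Entry("sale-1-revenue", "Revenue", credit_cents=100_000),
]
ledger.post(*wrong)
# The fix: reverse the wrong entries, then restate at the correct amount.
ledger.post(*reverse(wrong))
ledger.post(
    Entry("sale-1-ar-restated", "Accounts Receivable", debit_cents=90_000),
    Entry("sale-1-revenue-restated", "Revenue", credit_cents=90_000),
)

print(balances(ledger.entries()))  # {'Accounts Receivable': 90000, 'Revenue': -90000}
print(len(ledger.entries()))       # 6: the wrong entries and their reversals are kept
```

<p>Because balances are only ever derived from the entries, the audit trail of the wrong $1000 balance survives, and correcting the view is just a recomputation over the full set of entries.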
</p><p>Similarly, when accountants prepare to close the books, they reconcile account balances against all the transactions that occurred to validate that the books are accurate. When a discrepancy is discovered, you can dig into the exact transaction that might be causing it.</p><h2>Idempotency</h2><p>Every financial event must be processed exactly once; duplicates in the financial record cause obvious inaccuracies. For that reason, all code that produces financial records should be idempotent.</p><h1>Best Practices</h1><p>Over the years, I have run into quite a few gotchas that have caused me a lot of pain. Below are the best practices I recommend to avoid the pitfalls I have personally faced.</p><p><strong>Prefer integers to represent financial amounts</strong>. Integer arithmetic is exact, which makes it much easier to reason about. Certain decimal representations are okay, but avoid floats at all costs.</p><p><strong>Granularity of your financial amounts should support currency conversions with minimal loss of precision</strong>. If you only work with dollars, representing values in cents might be sufficient. If you are a global business, prefer micros or a decimal like <code>DECIMAL(19, 4)</code> (more than 4 decimal places can be used if necessary). Both limit loss of precision when converting between currencies. The decimal choice is quite popular among financial systems, while micros have been the standard for ads financial systems.</p><blockquote><p><strong>Wasteman&#8217;s note: </strong><em>Micros of a currency = amount in the base currency unit * 1,000,000. E.g. $1.23 = 1,230,000 micros. I first came across this when working with Google&#8217;s metrics API.</em></p></blockquote><p><strong>Use consistent rounding methodologies</strong>. At scale, the way you round can create material differences from expected amounts. For example, one rounding methodology is to round half up: a dropped digit of 5 or more rounds up, and 4 or less rounds down. Another valid way is to always round up. 
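</p><p>The integer and rounding advice above can be illustrated with Python&#8217;s standard <code>decimal</code> module. This is a hedged sketch, not a prescribed library: <code>micros_to_cents</code> is a hypothetical helper that converts integer micros to integer cents with one explicit rounding mode. <code>ROUND_HALF_UP</code> stands in for the first methodology, and <code>ROUND_UP</code> (away from zero, which is the same as always rounding up for positive amounts) for the second.</p>

```python
from decimal import ROUND_HALF_UP, ROUND_UP, Decimal

MICROS_PER_CENT = 10_000  # $1 = 1,000,000 micros = 100 cents


def micros_to_cents(micros: int, rounding: str = ROUND_HALF_UP) -> int:
    """Convert integer micros to integer cents using one explicit rounding mode."""
    exact = Decimal(micros) / MICROS_PER_CENT  # exact decimal math, no floats
    return int(exact.quantize(Decimal(1), rounding=rounding))


# Why integers: summing floats drifts, summing integer cents does not.
print(sum([0.10] * 10) == 1.00)  # False
print(sum([10] * 10) == 100)     # True

amounts = [1_234_500, 1_235_000, 2_000_000]  # $1.2345, $1.235, $2.00 in micros
half_up = [micros_to_cents(m, ROUND_HALF_UP) for m in amounts]
always_up = [micros_to_cents(m, ROUND_UP) for m in amounts]
print(half_up, sum(half_up))      # [123, 124, 200] 447
print(always_up, sum(always_up))  # [124, 124, 200] 448
```

<p>The two modes disagree by one cent on the first amount alone, which is exactly the kind of gap that accumulates across millions of transactions when different systems round differently.</p><p>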
All that matters is that you are consistent across the board. When you are dealing with millions of transactions, being off by 1 cent per transaction can lead to material differences (10 million transactions off by 1 cent each is a difference of $100k). This may not be material to your business at that scale, but it&#8217;s material enough for the government to come after you for underpaying taxes.</p><blockquote><p><strong>Wasteman&#8217;s note: </strong><em>If you are a global business, there can be a lot of gotchas with rounding and currency conversions. I would go as far as saying you should build a centralized library/service to handle both rounding and currency conversions. Different governments apply different rounding rules when calculating taxes, so abstracting all these nuances into a single library/service will reduce complexity.</em></p></blockquote><p><strong>Delay currency conversion until the end of calculations</strong>. Converting currencies preemptively can cause loss of precision. Aggregate amounts in their local currency first, and convert only after the aggregation.</p><p><strong>Use integer representations of time</strong>. This one is a little controversial, but I stand by it. Many libraries across different technologies parse timestamps into objects, and they all do it differently. Avoid this headache and just use integers. Unix timestamps, or even integer-based UTC datetimes, work perfectly fine. The fewer data conversions between systems, the better. (Read about Etsy&#8217;s own problems with timestamp types <a href="https://www.etsy.com/codeascraft/the-problem-with-timeseries-data-in-machine-learning-feature-systems">here</a>.)</p><blockquote><p><strong>Wasteman&#8217;s note: </strong><em>I haven&#8217;t even talked about daylight saving time bugs. Using an incrementing integer can help you avoid them altogether. If you really insist on using datetime types, please at least use UTC. 
You would be surprised at how many very large businesses use non-UTC timestamps.</em></p></blockquote><div><hr></div><p>Thanks for reading this post! I am sure I made a controversial statement somewhere, so please feel free to comment and start a discussion. I am very open to learning and hearing other people&#8217;s thoughts. And if you enjoyed this post, consider supporting me by subscribing!</p><p>For further reading, here are some really good blog posts on accounting tailored towards software engineers.</p><ul><li><p><a href="https://drew.thecsillags.com/posts/2017-12-06-accounting-for-software-engineers/">Accounting for Software Engineers</a></p></li><li><p><a href="https://www.moderntreasury.com/journal/accounting-for-developers-part-i#does-accounting-really-matter-in-software-development">Accounting for Developers Part 1</a></p></li><li><p><a href="https://martin.kleppmann.com/2011/03/07/accounting-for-computer-scientists.html">Accounting for Computer Scientists</a></p></li></ul><div><hr></div>]]></content:encoded></item><item><title><![CDATA[Data observability and alerting in plain SQL]]></title><description><![CDATA[Patterns and practical examples of SQL-based alerts for data pipelines]]></description><link>https://substack.wasteman.codes/p/sql-alerting-for-data-pipelines-a</link><guid isPermaLink="false">https://substack.wasteman.codes/p/sql-alerting-for-data-pipelines-a</guid><dc:creator><![CDATA[wasteman]]></dc:creator><pubDate>Wed, 12 Jun 2024 23:57:23 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F0025d321-926d-4cf9-94cc-4346ec86411a_268x268.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I have seen quite a few resources on alerting patterns for traditional backend web applications, but not many tailored to data pipelines. Even though the same principles apply, a large class of errors in data pipelines can only be detected by analyzing the resulting data. </p><p>A really simple way to monitor your data pipelines is to run plain SQL queries against their output and wire those queries into an alerting system. I started using this method years ago when I began working on stream/batch processing systems, and over time found it to be a surprisingly robust way of monitoring data.   
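</p><p>As a rough illustration of that setup, here is a minimal harness in Python using the standard-library <code>sqlite3</code> module. Everything in it is an assumption made for the example: the <code>events</code> table with an <code>event_id</code> column and an integer Unix-seconds <code>event_ts</code> column, the convention that an alert query returns rows only when something is wrong, the hypothetical names <code>ALERT_QUERIES</code>, <code>run_alerts</code>, and <code>notify</code>, and <code>print</code> standing in for a real pager or chat integration. The two queries are SQLite adaptations of the duplicate and staleness checks discussed in this post, not the exact warehouse SQL.</p>
```python
import sqlite3
import time

# Each alert query returns no rows when the data is healthy and one or more
# rows describing the problem when it is not (a convention assumed here).
ALERT_QUERIES = {
    # Duplicates: every event_id that appears more than once.
    "duplicate_event_ids": """
        SELECT event_id, COUNT(*) AS event_count
        FROM events
        GROUP BY event_id
        HAVING COUNT(*) > 1
    """,
    # Staleness: the newest event is older than the cutoff, or the table is empty.
    # event_ts is assumed to be an integer Unix timestamp in seconds.
    "stale_events": """
        SELECT latest
        FROM (SELECT MAX(event_ts) AS latest FROM events)
        WHERE latest IS NULL OR :cutoff > latest
    """,
}


def run_alerts(conn, now, staleness_seconds=18 * 3600):
    """Run every alert query; return {alert_name: offending_rows} for the ones that fire."""
    params = {"cutoff": now - staleness_seconds}
    failures = {}
    for name, sql in ALERT_QUERIES.items():
        rows = conn.execute(sql, params).fetchall()
        if rows:
            failures[name] = rows
    return failures


def notify(name, rows):
    # Stand-in for a real alerting integration (pager, chat, email, ...).
    print(f"ALERT {name}: {rows}")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (event_id TEXT, event_ts INTEGER)")
    now = int(time.time())
    conn.executemany(
        "INSERT INTO events VALUES (?, ?)",
        [("a", now - 60), ("b", now - 30), ("b", now - 30)],
    )
    for name, rows in run_alerts(conn, now).items():
        notify(name, rows)  # prints: ALERT duplicate_event_ids: [('b', 2)]
```
<p>In practice you would point the same loop at your warehouse connection, run it on a schedule, and give each dataset its own threshold.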
</p><p>In this blog post, I will share some basic SQL alerting patterns that I have found useful, and that I hope you will find useful as well.</p><h1>Data Staleness</h1><p>Some datasets are expected to stay fresh within a certain window, a guarantee you will usually document in an SLA for downstream consumers. For example, I worked on an advertising metrics system for many years that had an SLA of 24 hours for fresh data, and we needed to notify consumers of any delays. </p><p>A really simple way of alerting as we approach our SLA is to run a SQL query like this:</p><pre><code>SELECT
    DATEDIFF(HOUR, MAX(event_date_time_utc), CURRENT_TIMESTAMP()) &lt; 18 AS is_fresh
FROM
    source_table;</code></pre><p>It returns a boolean indicating whether the table contains data for events that occurred within the last 18 hours. If the query returns false, we alert. Note that even though our SLA was 24 hours, we alerted our internal team well before then to give us time to push any fixes if need be. If we were certain we would miss our SLA, we notified our consumers. It was standard practice for us to have these types of alerts on all of our consumer-facing datasets, each with a threshold that depended on its SLA.</p><h1>Data Duplicates</h1><p>This is a common alert, especially when working with OLAP data stores that do not enforce primary keys. Redshift, Snowflake, Iceberg, and Delta Lake are databases and table formats I have used that don&#8217;t enforce the uniqueness of data (some let you declare primary keys, but treat them as informational only). For this reason, we needed some way of detecting duplicates early, and a really easy way to do this was with a basic SQL query. Below I will show three different methods I have used depending on the use case.</p><h2>Example 1: Existence of duplicates</h2><pre><code><code>SELECT
    COUNT(DISTINCT event_id) AS distinct_cnt,
    COUNT(*) AS cnt
FROM
    events
HAVING
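    COUNT(*) &gt; COUNT(DISTINCT event_id)</code></code></pre><p>This query counts the total records in the table and compares that with the distinct count of the table&#8217;s unique key. Since <code>COUNT(*)</code> can never be smaller than the distinct count, the query returns a row only when duplicates (or NULL keys, which <code>COUNT(DISTINCT ...)</code> ignores) exist. </p><h2>Example 2: Return the duplicate records</h2><p>If you also want to know which records are duplicated, the following query returns every event_id that appears more than once.</p><pre><code><code>SELECT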
    event_id,
    COUNT(*) AS event_count
FROM
    events
GROUP BY
    event_id
HAVING
    COUNT(*) &gt; 1</code></code></pre><p>Note that if the unique key for your table is not a single column, you can simply group by the entire composite key instead. (The same idea applies to Example 1.)</p><pre><code><code>SELECT
    key1,
    key2,
    ...
    keyn,
    COUNT(*) AS event_count
FROM
    events
GROUP BY
    key1, key2, ..., keyn
HAVING
    COUNT(*) &gt; 1</code></code></pre><h2>Example 3: Approximation for large datasets</h2><p>For very large datasets, the exact queries above may take a long time to compute. An alternative is to use APPROX_COUNT_DISTINCT and alert only when the difference between COUNT(*) and APPROX_COUNT_DISTINCT is greater than the error bound of your APPROX_COUNT_DISTINCT implementation. This example assumes an error bound of 2.5%. The trade-off is that the alert can only catch duplication larger than that bound. Some implementations also accept an optional parameter that reduces the error rate at the cost of query performance. Depending on the SQL engine you use, this function may be implemented differently or have a different name, but the common implementations are based on the <a href="https://en.wikipedia.org/wiki/HyperLogLog">HyperLogLog</a> algorithm or one of its variants.</p><pre><code><code>WITH counts AS (
    SELECT
        APPROX_COUNT_DISTINCT(line_item_id) AS distinct_count,
        COUNT(*) AS total_count
    FROM
        revinfra.ad_rev_rec_event
)
SELECT
    distinct_count,
    total_count
FROM
    counts
WHERE
    ABS(total_count - distinct_count) &gt; (total_count * 0.025);</code></code></pre><h1>Gaps in Continuous Data</h1><p>Imagine a data stream where you expect events to arrive fairly continuously, for example, ad impressions on a large platform like Google. Given the daily volume of searches, there should never be a 10-minute period without any recorded ad impressions.</p><pre><code><code>WITH event_gaps AS (
    SELECT
        event_date_time_utc,
        LEAD(event_date_time_utc) OVER (ORDER BY event_date_time_utc) AS next_event_time,
        DATEDIFF(MINUTE, event_date_time_utc, LEAD(event_date_time_utc) OVER (ORDER BY event_date_time_utc)) AS gap_minutes
    FROM
        ad_impressions
)
SELECT
    event_date_time_utc AS start_time,
    next_event_time AS end_time
FROM
    event_gaps
WHERE
    gap_minutes &gt; 10;</code></code></pre><p>The CTE uses the LEAD window function to pair every event with the timestamp of the event that follows it, and saves the difference for each row as <strong>gap_minutes</strong>. The main query then returns every gap longer than 10 minutes, so any returned row should trigger an alert. Because the last event has no successor, this query cannot detect a gap at the very end of the data; the staleness alert above covers that case.</p><p>In my experience, this type of alert is really useful for detecting missing batches of data from an upstream service. For example, the upstream data we consumed for one application originated as Kafka events that were ETLed to S3 for us to consume. A single write from Kafka to S3 failed for a short period, and this alert caught the issue, so we were able to recover those events later. In theory, the upstream team should have caught this failure, but &#129335;, all that matters is that we caught it.</p><p>I have also found this alert useful for catching bugs I deployed in my own application. I once deployed a code change that passed along the wrong timestamp, making it seem like data was missing at specific hours. Instead of passing along the event_date_timestamp (which represented when the event occurred), I passed along the job_run timestamp, which was the same for every event processed in the same run &#129318;. Luckily, this alert caught the issue, and we fixed our data before it affected downstream consumers. </p><h1>Data Anomalies</h1><p>If you have a dataset that is relatively consistent over time, it may be useful to be alerted to any anomalies. This class of alerts is the trickiest to get right, because the usefulness of your alert depends on the thresholds you set. 
A threshold set too far from normal values will let real issues slip through, while a threshold set too close to them will make the alert noisy.</p><p>This post doesn&#8217;t go into the process of figuring out those thresholds, but below are some sample queries you can use to alert on anomalies. The examples below assume a table <code>revenue_events</code> with columns <code>event_date_time_utc</code> and <code>revenue</code>. </p><p><strong>Note: </strong>The queries based on standard deviations assume that your data roughly follows a normal distribution. </p><h2>Example: Abnormally High/Low Values</h2><pre><code><code>WITH stats AS (
    SELECT
        AVG(revenue) AS avg_amount,
        STDDEV(revenue) AS std_amount
    FROM
        revenue_events
)
SELECT
    event_date_time_utc,
    revenue
FROM
    revenue_events
CROSS JOIN
    stats
WHERE
    revenue &gt; (avg_amount + (3 * std_amount))
    OR revenue &lt; (avg_amount - (3 * std_amount));</code></code></pre><p>The CTE computes the average and standard deviation of revenue across the table using the AVG and STDDEV functions. The main query then returns records that are more than 3 standard deviations away from the average.</p><h2>Example: Value Outside Percentile Range</h2><pre><code>WITH percentile_bounds AS (
    SELECT
        PERCENTILE_APPROX(revenue, 0.05) AS lower_bound,
        PERCENTILE_APPROX(revenue, 0.95) AS upper_bound
    FROM
        revenue_events
)
SELECT
    event_date_time_utc,
    revenue
FROM
    revenue_events
CROSS JOIN
    percentile_bounds
WHERE
    revenue NOT BETWEEN lower_bound AND upper_bound;</code></pre><p>This query returns records below the 5th percentile or above the 95th percentile of the overall distribution. By construction, that is roughly 10% of all rows, so on its own it may not be a particularly useful alert.</p><p>In practice, this query should likely be combined with time bounds, so that you only alert on events that warrant a closer look. For example, if you are a seasonal business, you might want to be alerted to a revenue event below the 5th percentile during a historically high-volume period. </p><h2>Example: Sudden Spikes or Drops</h2><pre><code>WITH revenue_diff AS (
    SELECT
        event_date_time_utc,
        revenue,
        LAG(revenue) OVER (ORDER BY event_date_time_utc) AS prev_revenue,
        revenue - LAG(revenue) OVER (ORDER BY event_date_time_utc) AS diff
    FROM
        revenue_events
)
SELECT
    event_date_time_utc,
    revenue,
    prev_revenue,
    diff
FROM
    revenue_diff
WHERE
    ABS(diff) &gt; (
        SELECT
            AVG(ABS(diff)) + 3 * STDDEV(ABS(diff))
        FROM
            revenue_diff
    );</code></pre><p>The CTE first calculates the difference between each event&#8217;s revenue and that of the event directly preceding it. The main query then returns every row whose absolute diff is more than 3 standard deviations above the average absolute diff.</p>]]></content:encoded></item><item><title><![CDATA[Substack is having replica lag issues]]></title><description><![CDATA[Why do comments you make on Substack not appear immediately? 
In this post, I describe why Substack is having these problems and how they could solve them.]]></description><link>https://substack.wasteman.codes/p/substack-is-having-replica-lag-issues</link><guid isPermaLink="false">https://substack.wasteman.codes/p/substack-is-having-replica-lag-issues</guid><dc:creator><![CDATA[wasteman]]></dc:creator><pubDate>Wed, 07 Dec 2022 21:11:54 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F79222ec2-7c6d-4454-8834-6a0cf6fb90fa_268x283.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Since I started writing on Substack, I have noticed that changes I make to posts quite often don&#8217;t show up immediately. The same goes for comments. You can even try it now: comment on this post, and you will probably see that it takes a bit of time for your comment to show up. 
My friend complained about this, which prompted me to write this post.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gIhQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8b49fa4-14c8-49f0-9298-8aaf9d53502b_638x176.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gIhQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8b49fa4-14c8-49f0-9298-8aaf9d53502b_638x176.png 424w, https://substackcdn.com/image/fetch/$s_!gIhQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8b49fa4-14c8-49f0-9298-8aaf9d53502b_638x176.png 848w, https://substackcdn.com/image/fetch/$s_!gIhQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8b49fa4-14c8-49f0-9298-8aaf9d53502b_638x176.png 1272w, https://substackcdn.com/image/fetch/$s_!gIhQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8b49fa4-14c8-49f0-9298-8aaf9d53502b_638x176.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!gIhQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8b49fa4-14c8-49f0-9298-8aaf9d53502b_638x176.png" width="638" height="176" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/d8b49fa4-14c8-49f0-9298-8aaf9d53502b_638x176.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:176,&quot;width&quot;:638,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:31943,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!gIhQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8b49fa4-14c8-49f0-9298-8aaf9d53502b_638x176.png 424w, https://substackcdn.com/image/fetch/$s_!gIhQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8b49fa4-14c8-49f0-9298-8aaf9d53502b_638x176.png 848w, https://substackcdn.com/image/fetch/$s_!gIhQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8b49fa4-14c8-49f0-9298-8aaf9d53502b_638x176.png 1272w, https://substackcdn.com/image/fetch/$s_!gIhQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8b49fa4-14c8-49f0-9298-8aaf9d53502b_638x176.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><p>I don&#8217;t know the inner workings of Substack&#8217;s system, but my guess is that this is a classic replica lag problem: a violation of read-your-writes consistency.</p><p>I actually wrote a <a 
href="https://www.wasteman.codes/blog/read-your-writes-consistency">post</a> about this a couple of years ago, where I described the problem and how to fix it in a Rails application. If you are interested, feel free to check it out. But back to the question: what is read-your-writes consistency, and how does it cause the weird UX we are seeing on Substack today?</p><p><strong>Read-your-writes consistency</strong> is a guarantee that after you write to a database, your own subsequent reads reflect that write. In Substack&#8217;s case, if I comment on a post, I should see my comment immediately. </p><p>In a very simple single-instance database architecture, you probably won&#8217;t face this problem: every read and write goes to the same machine, and almost every modern database supports some form of transactions with pretty good isolation levels. This problem tends to appear when we re-architect our system to handle more read load. Substack is obviously a read-heavy system: many more people read posts on Substack than write them.</p><p>The most common way to scale a read-heavy system is to provision read replicas of your main database instance. 
Many companies start with a relational database like MySQL or Postgres (Substack uses Postgres, to my understanding), which by default only accepts writes on a single leader instance. As read load increases, you add read replicas of that leader and update your application to send writes to the leader instance and reads to the replicas. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8Aqe!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F54a0d583-6051-4b7e-a6c1-c9abbb0c9c06_533x381.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8Aqe!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F54a0d583-6051-4b7e-a6c1-c9abbb0c9c06_533x381.png 424w, https://substackcdn.com/image/fetch/$s_!8Aqe!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F54a0d583-6051-4b7e-a6c1-c9abbb0c9c06_533x381.png 848w, https://substackcdn.com/image/fetch/$s_!8Aqe!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F54a0d583-6051-4b7e-a6c1-c9abbb0c9c06_533x381.png 1272w, https://substackcdn.com/image/fetch/$s_!8Aqe!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F54a0d583-6051-4b7e-a6c1-c9abbb0c9c06_533x381.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!8Aqe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F54a0d583-6051-4b7e-a6c1-c9abbb0c9c06_533x381.png" width="533" height="381" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/54a0d583-6051-4b7e-a6c1-c9abbb0c9c06_533x381.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:381,&quot;width&quot;:533,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:32624,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8Aqe!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F54a0d583-6051-4b7e-a6c1-c9abbb0c9c06_533x381.png 424w, https://substackcdn.com/image/fetch/$s_!8Aqe!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F54a0d583-6051-4b7e-a6c1-c9abbb0c9c06_533x381.png 848w, https://substackcdn.com/image/fetch/$s_!8Aqe!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F54a0d583-6051-4b7e-a6c1-c9abbb0c9c06_533x381.png 1272w, https://substackcdn.com/image/fetch/$s_!8Aqe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F54a0d583-6051-4b7e-a6c1-c9abbb0c9c06_533x381.png 1456w" sizes="100vw" 
loading="lazy"></picture></div></a></figure></div><p>Another common pattern is to use some form of caching, where reads go to a cache before falling back to the database. The same principle applies here: you have more machines your app can read from instead of reading from the source database directly. </p><p>So where does the problem occur? It occurs when there is replication lag: the time it takes for changes to propagate from the leader instance to all the read replicas. When you comment on a Substack post, the comment is written to the leader instance. 
When your page refreshes, it reads from a read replica that may not have your comment yet. That is why we see this confusing UX, and why it usually resolves itself after we refresh the page a few times.</p><h2>How do we fix it?</h2><p>It really depends on Substack&#8217;s architecture, but a few solutions come to mind. The easiest is to send a user&#8217;s reads to the leader instance for a short time window after that user makes a write. If your app reads from a read replica or a cache by default, force it to read from the leader database right after a write. If your system needs to serve a much larger volume of reads, though, this strategy might not be feasible, because the leader ends up absorbing more of the read load.</p><p>Another way is to re-architect your application around a horizontally scalable database. If multiple instances accept writes, each owning a subset of the data, then instead of routing every post-write read to a single leader instance, you can route it to the instance that stores that specific piece of data. Reads are distributed across multiple instances because the data itself is written across multiple machines. Obviously, this poses its own challenges, and you probably want a replication strategy here as well. But being able to scale horizontally does give you room to serve many more concurrent users. This is a well-studied problem: many horizontally scalable databases support a <a href="https://en.wikipedia.org/wiki/Quorum_(distributed_computing)#:~:text=A%20quorum%20is%20the%20minimum,operation%20in%20a%20distributed%20system.">quorum-based</a> approach, where read and write quorums overlap (R + W &gt; N) so that every read reaches at least one replica holding the latest acknowledged write, which generally prevents this problem.</p><p>If the problem comes from caching, we will need to inspect the semantics of the cache. Is it a write-through cache or a read-through cache? What are we saving in the cache? If a post&#8217;s comments are already cached when a new comment comes in, does the new comment get added to the existing entry, or does it create a new entry? 
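</p><p>To illustrate why these questions matter, here is a minimal Python sketch of a cache-aside (lazily populated) comment cache, where plain dictionaries stand in for the database and a separate cache service. It is not Substack&#8217;s design, which I don&#8217;t know, and the class and method names are hypothetical. It shows how a cached comment list goes stale when a write doesn&#8217;t touch the cache, and how invalidating the entry on write fixes that.</p>
```python
from collections import defaultdict


class CommentService:
    """Toy cache-aside comment store (hypothetical; not Substack's actual design).

    `db` stands in for the database and `cache` for a separate cache service.
    """

    def __init__(self, invalidate_on_write: bool):
        self.db = defaultdict(list)  # post_id -> list of comments (source of truth)
        self.cache = {}              # post_id -> cached tuple of comments
        self.invalidate_on_write = invalidate_on_write

    def read_comments(self, post_id):
        # Cache-aside read: serve the cached entry if present, otherwise
        # load from the database and populate the cache.
        if post_id not in self.cache:
            self.cache[post_id] = tuple(self.db[post_id])
        return self.cache[post_id]

    def add_comment(self, post_id, text):
        self.db[post_id].append(text)
        if self.invalidate_on_write:
            # Drop the cached entry so the next read reloads fresh data.
            self.cache.pop(post_id, None)


stale = CommentService(invalidate_on_write=False)
stale.read_comments("post-1")                # warms the cache with ()
stale.add_comment("post-1", "Nice post!")
print(stale.read_comments("post-1"))         # () : the new comment is missing

fresh = CommentService(invalidate_on_write=True)
fresh.read_comments("post-1")
fresh.add_comment("post-1", "Nice post!")
print(fresh.read_comments("post-1"))         # ('Nice post!',)
```
<p>One caveat: if the reload after an invalidation reads from a lagging replica, the stale list can be cached again, so invalidation works best combined with reading from the leader right after a write.</p><p>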
All of these questions matter when designing the correct solution.</p><p>Obviously, I have no insight into Substack&#8217;s actual architecture, so who knows whether the solutions above would work. But the same principle applies no matter the architecture: you want to read from a machine that has the changes you just wrote. </p>]]></content:encoded></item><item><title><![CDATA[Software Engineering Proverbs]]></title><description><![CDATA[A list of proverbs, most not unique to me]]></description><link>https://substack.wasteman.codes/p/software-engineering-proverbs</link><guid isPermaLink="false">https://substack.wasteman.codes/p/software-engineering-proverbs</guid><dc:creator><![CDATA[wasteman]]></dc:creator><pubDate>Fri, 02 Dec 2022 17:26:44 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F34d6193e-2258-40e7-9e52-69efe9f202f7_520x522.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<ul><li><p>Naming things is the hardest problem in software engineering</p></li><li><p>That new JavaScript framework will become old next month</p></li><li><p>Nobody knows what a monad 
is</p></li><li><p>Making software simple is hard</p></li><li><p>You will look at your code in 1 year and think &#8220;which idiot wrote this?&#8221;</p></li><li><p>All data processing engines converge to SQL</p></li><li><p>Software Incompleteness Theorem: no matter how much you test your code, you missed an edge case</p></li><li><p>Clean code bases don&#8217;t exist</p></li><li><p>For every software principle, there is always a practical use case where it makes sense to break the principle</p></li><li><p>You won&#8217;t refactor it as a fast follow</p></li><li><p>A few hours of planning can save you weeks of coding</p></li><li><p>Good software doesn&#8217;t mean good business</p></li></ul>]]></content:encoded></item><item><title><![CDATA[Coming soon]]></title><description><![CDATA[A blog about the foolishness I encounter, trying to get computers to do the things I want them to.]]></description><link>https://substack.wasteman.codes/p/coming-soon</link><guid isPermaLink="false">https://substack.wasteman.codes/p/coming-soon</guid><dc:creator><![CDATA[wasteman]]></dc:creator><pubDate>Sat, 01 Jan 2022 06:16:49 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!kcpi!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F0025d321-926d-4cf9-94cc-4346ec86411a_268x268.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>A blog about the foolishness I encounter, trying to get computers to do the things I want them to.</strong></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://substack.wasteman.codes/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://substack.wasteman.codes/subscribe?"><span>Subscribe now</span></a></p>]]></content:encoded></item></channel></rss>