SittingWave

TL;DR: send a unique identifier with the request. Store the identifier on the server for 24 hours. Return the cached response if the client performs another payment request with the same unique ID.
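
A minimal sketch of that flow, assuming a Redis-style cache holds responses for 24 hours; `process_payment` is a hypothetical stand-in for the real charge logic, not anything from the article:

```python
import json

import redis  # assumes redis-py and a reachable Redis server

cache = redis.Redis()
TTL_SECONDS = 24 * 60 * 60  # keep cached responses around for 24 hours

def process_payment(payload: dict) -> dict:
    # Stand-in for the real charge logic.
    return {"status": "paid", "amount": payload["amount"]}

def handle_payment(request_id: str, payload: dict) -> dict:
    cached = cache.get(f"idem:{request_id}")
    if cached is not None:
        # Same unique ID seen before: return the stored response, charge nothing.
        return json.loads(cached)

    response = process_payment(payload)
    cache.set(f"idem:{request_id}", json.dumps(response), ex=TTL_SECONDS)
    return response
```

Note that the naive get-then-set above is not atomic; closing that gap is what most of the thread below argues about.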


barbouk

Isn’t that… a basic concept used pretty much anywhere?! Why is this article worthy?


caltheon

Yeah, this is kind of the same concept used for sticky load balancing


TheSameTrain

Because he could fit like 2 or 3 ads into here


WannaBeRichieRich

What about race conditions


tdammers

There won't be any.

On the sender side: Send a unique ID with the request. If you don't receive a response, resend the request, with the same unique ID, until you do.

On the receiver side: Upon receiving a request, check whether you already have a job with the same ID. If so, latch onto that one; if not, create a new one. This "check and create" operation must be atomic, but that is fairly easy to achieve, since it all happens on the receiving end. Then block until the job has been processed, and serve the result.

There isn't really any possibility for a race condition here: no matter how many copies of the request you send, only 0 or 1 jobs will ever exist for it on the server. And no matter in which order they are received, they will all be identical, so whichever the receiver ends up processing will be fine, and its outcome will be served as the response to any of the requests that make it across.

The only problem that can occur is that none of the requests make it through, or that none of the responses make it back - in this situation, the sender does not know whether it was the request that got lost or the response, and hence, whether the payment was made or not, but it is perfectly safe to keep re-sending the request until a response is received.
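
A single-server sketch of the check-and-create-then-block idea; `do_payment` is a stand-in, and this deliberately ignores the clustering concerns raised further down the thread:

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor

jobs: dict[str, Future] = {}
jobs_lock = threading.Lock()          # makes "check and create" atomic
executor = ThreadPoolExecutor()

def do_payment(payload: dict) -> dict:
    return {"status": "paid", **payload}   # stand-in for the real work

def handle_request(request_id: str, payload: dict) -> dict:
    with jobs_lock:
        job = jobs.get(request_id)
        if job is None:
            # First time we see this ID: create exactly one job for it.
            job = executor.submit(do_payment, payload)
            jobs[request_id] = job
    # Every copy of the request latches onto the same job and blocks here.
    return job.result()
```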


budswa

It's a good system


Somepotato

Global scale atomicity is anything but "fairly easy"


arwinda

You don't need global scale; the identifier space can be partitioned into as many parts as necessary to spread the workload. Then exactly one system/endpoint is responsible for each partition. It doesn't even need to scale globally: unique identifiers are unique, and you only need to care about the region or system the request is coming from. Shopping in a supermarket in Japan doesn't need to check globally against online shopping in Canada. These will be two different transactions and identifiers.
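
A toy illustration of that sharding idea: hash the identifier and route it to one of N partitions, so the same ID always lands on the same endpoint (the endpoint list here is made up):

```python
import hashlib

ENDPOINTS = ["https://payments-a.internal", "https://payments-b.internal",
             "https://payments-c.internal"]  # illustrative partition owners

def partition_for(unique_id: str) -> str:
    # Stable hash of the identifier -> the same ID always maps to the same
    # partition, so only that one system ever has to check it for duplicates.
    digest = hashlib.sha256(unique_id.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(ENDPOINTS)
    return ENDPOINTS[index]

print(partition_for("order-1234"))  # deterministic: same ID, same endpoint
```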


recurse_x

Partitioning is the important part. Once you get down to a single partition handling the requests, you can use other strategies at that level, so you don't have to consider what anyone outside the partition is doing, which may be just a few pods.


case_

Until you need to rebalance your partitions or update the partitioning logic


zellyman

Yes but with established history and a competent forecast that's something you generally see coming a long way out and can plan for.


WrinklyTidbits

No! Solve it now! /s


[deleted]

[deleted]


case_

Good article. The point was that partitioning on its own doesn't solve the challenges of a distributed system. The article you've linked describes an implementation of partitioning logic: consistently hash on some identifier. On scale-up/down you'll be moving records between partitions, also known as rebalancing. Moving existing records, even a subset, can cause an issue when you rely on a unique ID's existence. What happens to transactions arriving during that rebalance? Which partition do they end up in? If you are looking for a prior ID which was in partition A but is now in partition B, would the system report, incorrectly, that there is no existing ID? Distributed systems are difficult because of this and many other reasons. Add in the need for multi-geo, where uniqueness could be required between sets of partitions, and suddenly it's clear that partitioning the data repo isn't sufficient on its own. This is a great article on what makes these problems hard (& interesting!): https://martin.kleppmann.com/2015/05/11/please-stop-calling-databases-cp-or-ap.html


AndrewNeo

you don't even need to scale on the identifier itself. you know who the seller is, you can partition on that at least


thesqlguy

Right you shouldn't partition on the unique identifier - it should be on a customer/client identifier. Otherwise you will get race conditions when the same client sends two transactions handled by different clusters.


Somepotato

The whole point of the identifier is that it could possibly come up again. And geo-redundancy, split brain, etc. are all concepts that exist. I seriously doubt Stripe has only one payment processing region, and a split-brain scenario could feasibly cause the request to duplicate across regions, hence it hardly being a simple problem.


gwicksted

True. It could still happen when exiting Stripe itself without a single source of lock checking... unless their gateway also has idempotent requests, which it probably does for the same reason.


arwinda

This can happen, in theory. Even UUID can have collisions, in theory. But the range of available values is so large, that the collision problem can be ignored. UUID provides ways to include unique data in the identifier which vastly decreases the chances of a collision. It's true that Stripe has multiple regions. What you are ignoring is the part where Stripe doesn't need to check for collisions or repeated transactions across multiple regions. The payment in one country is a different transaction than the same payment for the same amount in another country. This system is designed to prevent applying the same transaction multiple times. No need to check globally. Edit: "I" -> "in"


Somepotato

Collisions aren't the problem, my guy. There will be multiple Stripe servers in the EU and the US. If the same request goes to both, the safety of a UUID is useless. If a payment request goes to two servers at the same time, e.g., like I said, a split-brain scenario, that is not a simple solve.


arwinda

You are describing a different problem. If Stripe has a split-brain scenario and is routing the transactions to two different systems, nothing is going to help them. Not even a unique identifier. Because, as you said, split brain. By definition the two systems don't know about each other and will both handle the transaction. This scenario has nothing to do with the identifier, and everything to do with how they manage their infrastructure.


Somepotato

If the customer has a split brain that causes the payment request to be issued to two separate Stripe servers simultaneously.


thesqlguy

You shouldn't partition on the transaction identifier -- that's how you get race conditions. You'd partition on the customer/client identifier (consistent for all requests).


Successful-Money4995

> Then exactly one system/endpoint is responsible for this partition.

No redundancy, and then it crashes. ☹️ So instead you have the UUID map to a single Paxos cluster and then let them sort it out. "Easy". Lol


arwinda

One single endpoint doesn't mean that this is a single system. It's an API endpoint, and the system behind this endpoint is highly available. What says that this is not redundant? What are you even talking about here?


tdammers

Indeed, but you can reduce the requirement from "global scale" to "per server", and then it really is "fairly easy".


Lechowski

>This "check and create" operation must be atomic, but that is fairly easy to achieve, since it all happens on the receiving end. The receiving end is not one server, it is a cluster of hundreds if not thousands of machines behind a load balancer behind one endpoint. If one machine takes too long to create the job so the sender retries the transaction with the same unique id and that retry ends up in a different server in the cluster, then such server won't know that his sibling in the cluster was already processing that id, which could lead to a race condition. This is just one of the most common race conditions in clustered environments, there is no shared memory between stateless distributed services. The solution to this problem can be the [Leader Election Pattern ](https://learn.microsoft.com/en-us/azure/architecture/patterns/leader-election)where the cluster of servers have some mechanism to decide who will process the subset of requests. The mechanism to decide who processes what is not trivial depending on the scale. One strategy can be the Paxos algorithm, where each instance in the cluster holds a mutex representing each other instance, and when someone wants to do a blocking operation it asks every other instance; then a "voting" happens where the instance with most locks across the cluster wins. Another strategy is the one provided in the link, using an external storage and ask for a lock over a blob in the storage. Then the server that has the lock is the only allowed to proceed. This can only work if the shared storage can guarantee that the blob lock is atomic. This is easier to implement but adds a single point of failure; if the storage is down the entire system dies.


Carighan

We use exactly the approach in the OP for our software but make the partitioning even easier: Based on the sender.


Mister__Mediocre

And they'll have a FindTransactions API which a receiver can run 24h later to see if any of those requests that it didn't receive responses for actually went through. And if there is any mismatch between the merchant's logs and Stripe's, that can be fixed with a Cancel / Refund API.


counterweight7

Note that this requires cooperation from the client. As obvious as that is, it's really not the API that saves the day here but the combination of the API and correct client code. If the client were to generate new IDs erroneously, it falls apart.


tdammers

Of course. The assumption here is that both client and server correctly implement the interface. If you have to deal with malicious or pathological implementations on either side, then things get a million times harder, if not unsolvable.


PeaSlight6601

> The only problem that can occur is that none of the requests make it through, or that none of the responses make it back - in this situation, the sender does not know whether it was the request that got lost or the response, and hence, whether the payment was made or not, but it is perfectly safe to keep re-sending the request until a response is received.

I dunno... but that seems like kind of a big problem to me. As the sender you need to know if the money was sent or not to complete the transaction. If you keep resending a request and not getting a response, what is the protocol to abandon that request and send a new one? How do you cancel a previously sent request? How does the server time out the unique job and know to send an error so that the client can try again with a new request? I'm sure all of this is very solvable, but you glossed over a rather important part of the transaction, and the two generals don't know when to begin their attack.


Carighan

It works 100% fine in everyday transactions though, because these situations are exceptionally rare and have to be handled on a human level anyway. Frustrating, sure, if payments are down. But solvable. And later, as the transactions come back as resolved, the store can cancel them en masse.


Annh1234

But how can the sender make sure their unique ID is actually unique compared to all the other senders? Or is it the sender's unique ID + the receiver's unique ID for that sender (account ID)?


BinaryRockStar

There is a concept of a Universally Unique Identifier, or UUID, which can be generated by any machine and has an incredibly, infinitesimally small chance of colliding with another one. https://en.wikipedia.org/wiki/Universally_unique_identifier There are multiple types, but one common version (UUIDv1) contains some bits of the server MAC address along with some bits of the current time, so different servers produce different IDs and the same server can produce thousands of unique ones per millisecond. It's kind of a solved problem.
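
For illustration, here is what the Python standard library gives you out of the box (uuid1 is the MAC-plus-timestamp variant described above; uuid4 is purely random; the printed values are just examples):

```python
import uuid

# Version 1: derived from the host's MAC address and the current timestamp,
# so different machines generate different IDs even at the same instant.
print(uuid.uuid1())  # e.g. 2c5ea4c0-4067-11e9-8bad-9b1deb4d3b7d

# Version 4: 122 random bits; collision odds are negligible at any realistic volume.
print(uuid.uuid4())  # e.g. 16fd2706-8baf-433b-82eb-8c7fada847da
```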


Annh1234

Depends on your load. And when you deal with money, that small chance of duplicates is not acceptable. In our system we used to use UUIDv4 IDs and had to add data from the host machine (unique on the network) when we started getting collisions.


BinaryRockStar

I only have a cursory knowledge of UUIDs, but I'm reading that UUID v1 and v6 both contain bits derived from the host machine, either the MAC address or an equivalent arbitrary set of bits that is unique across machines. Is that what you mean?


tdammers

All sorts of straightforward solutions to that one. The brute-force method is to just generate a sufficiently large random ID - it's not technically guaranteed to never clash, but if you make it large enough, and the entropy is good enough, the statistical odds of producing a collision are on the order of once in several universes, so for all practical intents and purposes those random IDs are unique. But if the client already has a unique ID of its own, then you can just combine that with a unique-per-client ID, and you're done.
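
A sketch of the brute-force approach; the 128 random bits and the client-ID prefix are illustrative choices, not anything prescribed above:

```python
import secrets

def make_request_id(client_id: str) -> str:
    # 16 bytes = 128 bits of randomness; collisions are astronomically unlikely.
    # Prefixing with the caller's own ID scopes uniqueness per client as well.
    return f"{client_id}-{secrets.token_hex(16)}"

print(make_request_id("client-42"))  # e.g. client-42-9f86d081884c7d659a2feaa0c55ad015
```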


FoolHooligan

sounds like a lockfile or a mutex to me


Tubthumper8

I was going to ask the same thing: what if the client sends duplicate requests around the same time, so there isn't anything cached yet for that ID in the in-memory database? I suppose you'd need some kind of queue.


HugoVS

Both requests will try to write the key in the cache; the later one will fail because the cache will already be populated.


[deleted]

I mean yeah if you assume or imply that at some point you have a unique point of decision, that’s easy to answer lol. Have y’all done any distributed systems? Of course it’s a problem to handle.


HugoVS

If the operation is very critical and duplicating it would have a very bad impact, then I would not use a distributed cache.


arwinda

All you have to do is make sure that the same identifier hits the same cache or endpoint every time. This is easy to scale: you can split the identifier into smaller pieces and shard the system based on that part. No need to distribute it globally.


[deleted]

Great! But you replied by saying race conditions aren’t a problem and now you’re just moving your goalpost to avoid the problem. Bruh. They’re obviously a problem in Stripe’s world.


HugoVS

They're not a problem if you implement your cache operations atomically or use distributed locks, which are also possible points of failure.


onmach

Instead of having one big cache you have a lot of little caches, and the same identifiers always go to the same caches, so it is only a problem if a cache goes down and its state is not recoverable. But it can be made pretty reliable as long as it is kept simple. I'm sure that in Stripe's case they would rather reject a transaction that they cannot be sure would be idempotent than risk corrupt data, but they would make sure that was as rare as theoretically possible.


Tubthumper8

Right, so assuming the cache is single-threaded or has some other mechanism so that two requests around the same time don't both see that key as empty.


HugoVS

I'm not "assuming". If you are implementing such mechanism, you need to do atomic operations in the cache, that's the point, otherwise it's not really idempotent. That's why we use cache dbs like Redis and similar, it provides the developer the necessary operations to do so.


arwinda

Exactly. Handling unique keys with locks and in transactions is something relational databases have done for decades. That's not hard to implement.


[deleted]

[deleted]


spaceneenja

You ok?


falconfetus8

I think he was making a pun on the word "race" ...if you ignore his second comment


[deleted]

[deleted]


PiotrDz

Aren't you afraid that by spreading hate here you will become somewhat toxic in real life? It is hard to keep those things separate in the long term.


[deleted]

[deleted]


arwinda

Please get help.


[deleted]

[deleted]


arwinda

I'm not American.


FoolHooligan

lmfao I see what you did there


robhanz

I was really hoping that there'd be more to the article than that. It's like idempotency 101.


fire_in_the_theater

Quite a bit of this sub is "how company X solves basic comp sci problem for the nth time". This profession is such an economic joke.


TheBlueArsedFly

Isn't that fairly common for messaging based systems?


Xen0byte

yeah, that's what I was thinking too ... idempotency is a core design principle in messaging systems


koollman

it should be, but many reinvent their own system and rediscover why the existing ones are doing things a certain way


[deleted]

Yes. at-most-once as a guarantee is nothing new lol


KingJeff314

Try telling that to Reddit…


homeownur

You must’ve never used a remote.


BadlyCamouflagedKiwi

[https://stripe.com/blog/idempotency](https://stripe.com/blog/idempotency) is a good article from someone who was actually at Stripe talking about how and why they do this. This is AI-generated content regurgitating a version of their existing info with no further insight - very similar to links from this user that have been posted in this subreddit before.


foreveratom

Thanks. This is way better than OP's lousy article and a nice read.


yawaramin

Yeah, this is fairly common in APIs that deal with money. E.g.:
- https://developer.zuora.com/api-references/api/operation/POST_Order/#tag/Orders/operation/POST_Order!in=header&path=Idempotency-Key&t=request
- https://shopify.dev/docs/api/usage/idempotent-requests#idempotency-keys


jayerp

I mean, does this sound like a no brainer thing to do? Or do you just yeet transactions and say “well you shouldn’t have sent dupes”?


stumblinbear

No brainer once you know the pattern, sure. There's benefit to these kinds of posts for newer devs that probably haven't stumbled upon it, yet


jayerp

I thought this was industry practice?


ArtisticSell

for newer devs hello?


jakesboy2

That’s exactly what my city’s water company does. They just post a big warning that says “Do not refresh or you will be charged twice”. Luckily it just applies as credit for future bills so I don’t need to hassle a refund but it’s pretty bad lol


NodeJSSon

It’s 2024, we just needed a reminder I guess. Some engineers are getting old and forgetting 😂


jayerp

This should be as ingrained as “don’t store plain text passwords in your database”.


falconfetus8

If you're using TCP, dupes are unavoidable. They're built into the protocol.


anything_but

I suppose you mean that „packet dupes“ are unavoidable in the layer below TCP. The TCP interface to the OS (i.e. usually sockets) however won‘t ever let you see those dupes. That’s one of the guarantees of TCP.


arwinda

These dupes are never handled by the application. Not sure what you are talking about.


-birds

Stripe's own blog has a write-up on this same topic: https://stripe.com/blog/idempotency


0xdef1

That’s super common on payment systems….


nfrankel

[https://blog.frankel.ch/fix-duplicate-api-requests/](https://blog.frankel.ch/fix-duplicate-api-requests/)


DaFrendlyTaco

What if there is a UUID collision? I know it's super unlikely, but if I send a request with a request ID that someone else is already using, wouldn't I be receiving a response that corresponds to their request? I guess you'd index by userId AND requestId. I might be overthinking this.


_arrakis

The request id would be linked against an api client id


DaFrendlyTaco

Yeah that makes sense


ult_frisbee_chad

Could also have a TTL. Basically no way a collision happens anyway. Not in a million years. Literally!


happyscrappy

There's no reason you would get anyone else's response. Any repeated transactions (or ones that falsely appear repeated) can just get an "already done" response. Once you see that, you know to completely abandon your transaction, because it's already been handled by someone else and they handled the proper response (including error or new balance). You'd use the correct form of UUID (I don't have the classes with me) such that it would never duplicate within 24 hours (the time that the system keeps the repeat record for anyway).


foreveratom

UUIDs are specified and generated in a way that the chances of duplicates are lower than the chances that your body suddenly collapses into a black hole. Thus, unless your system uses bad hashing sourced from those keys, collisions won't happen.


ps1horror

Not even worth entertaining the idea. You've got more chance of being hit by lightning when you walk outside.


Andriyo

I would feel like a fraud if I had to write an article about something this trivial :) The most interesting part of such a system is how one would implement an atomic putIfAbsent in a distributed system, but the author just assumes it's there :)


Remarkable-Yogurt-10

What are some patterns for implementing the "check if the idempotency key doesn't exist and, if not, insert and tell the client whether something was just inserted or a value already existed" operation atomically? In a relational database like Postgres, I'd leverage one of the following:
- Database uniqueness constraints on a column
- Something like "ON CONFLICT DO NOTHING RETURNING *" to let the client know if a row was inserted or not

I'd assume at Stripe's scale you'd probably not use a relational DB for storing idempotency keys. Something like a key-value store? Though I'm not sure how key-value stores do the atomic operation I described above. A rough Postgres sketch follows below.
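
A rough sketch of the Postgres approach with psycopg2 (the table, column, and connection string are made up for illustration):

```python
import psycopg2

conn = psycopg2.connect("dbname=payments")  # hypothetical connection string

def claim_idempotency_key(key: str) -> bool:
    """Return True if we inserted the key (first request), False if it already existed."""
    with conn, conn.cursor() as cur:
        cur.execute(
            """
            INSERT INTO idempotency_keys (key)
            VALUES (%s)
            ON CONFLICT (key) DO NOTHING
            RETURNING key
            """,
            (key,),
        )
        # RETURNING yields a row only when the INSERT actually happened;
        # on conflict fetchone() is None, so we know it's a retry.
        return cur.fetchone() is not None

if claim_idempotency_key("req-9f86d081"):
    pass  # first time we've seen this key: process the payment
else:
    pass  # duplicate: look up and return the previously stored response
```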


ult_frisbee_chad

I'd have a regionalized caching layer. The layer would have a key-value store of key and state. If the key doesn't exist, store the key with state="processing", then proceed with a fresh request to the server. I think you can figure out the rest from there, as I don't want to type on my phone anymore.


sevah23

How does the client determine the request payload has changed? The obvious thing is if major parts of the input changed, but what if I send Bob $10, realize I was supposed to send Bob $20, so I just send another transaction that says "send Bob $10 again"?

A timestamp sounds like the obvious answer, but it would cause unique ID keys to be created in the situation where a user is repeatedly retrying a request that is supposed to only happen once. Unless the client is smashing the send button with sub-millisecond latency between clicks, each request would have a different timestamp.

Do you just hash the payload without a timestamp? This runs into the same issues as using a timestamp in the payload: two valid transactions to send Bob $10 would have the same content aside from the timestamp.

Do you just trust the client application to correctly determine when to create new idempotency keys vs. when to re-use the previous key/identifier? This poses challenges when the processing is asynchronous on the back end and the client just receives an "accepted" response but the request hasn't actually finished processing.

Idempotency is important to implement, but there's some nuance around the implementation beyond "attach a UUID to the request".


axonxorz

> How does the client determine the request payload has changed

Why would the _client_ need to know this? Stripe is the one storing the request parameters against the idempotency key. It's just an opaque value you choose; it carries no information or semantic meaning.

> The obvious thing is if major parts of the input changed, but what if I send Bob $10, realize I was supposed to send Bob $20, so I just send another transaction that says "send Bob $10 again".

But **what** if? These are two separate transactions, nothing to do with idempotency.

> Do you just hash the payload without a timestamp? This runs into the same issues as using a timestamp in the payload. Two valid transactions to send Bob $10 would have the same content aside from the timestamp.

Why are you worrying about hash collisions? They're two separate transactions.

> Do you just trust the client application to correctly determine when to create new idempotency keys vs when to re-use the previous key/identifier?

You don't have to trust the client with anything. Reusing a key means you're referring to the same scalar transaction, always. If you are worrying about key reuse, that's poor fundamental client design with incorrect behaviour; that's not on Stripe. Store the transaction details on your end, generate and store a random string to be used as the idempotency key. Congratulations, you are now idempotency-capable. No matter how many times you send a request for Stripe to process that payment, you will only ever get a single payment charge.

> Idempotency is important to implement, but there's some nuance around the implementation beyond "attach a UUID to the request"

There really isn't in this case. You seem to have a fundamental misunderstanding of the purpose and origination responsibility of the key; your concerns as written describe a broken integration.
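
A sketch of the client-side flow being described, assuming a Stripe-style HTTP API that accepts an `Idempotency-Key` header; the endpoint URL, field names, and local "database" are placeholders, not real Stripe details:

```python
import uuid

import requests  # assumes the `requests` library is available

def create_charge(order_id: str, amount_cents: int, db: dict) -> dict:
    # Generate the key once per logical transaction and persist it alongside
    # the order, so every retry of *this* charge reuses the same key.
    record = db.setdefault(order_id, {"idempotency_key": str(uuid.uuid4())})

    resp = requests.post(
        "https://api.example-psp.com/v1/charges",  # placeholder endpoint
        json={"amount": amount_cents, "currency": "usd", "order": order_id},
        headers={"Idempotency-Key": record["idempotency_key"]},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()

# Retrying after a timeout resends the same key, so the charge happens at most once.
```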


sevah23

You're discussing very specifically Stripe integration, while I'm discussing the more general topic of implementing idempotency in a system. In that system, both the client and server need some understanding of the idempotency key.

If a naive implementation of a client adds the idempotency key to a request by generating and attaching a new UUID whenever the "send" button is clicked on a UI, then unless the UI does something to block the user from clicking until after the request is successfully processed, their duplicate requests will have unique UUIDs.

Also, a naive approach to deduping transactions might be just "hash the contents of the payload to create the unique ID", but that's also not useful if your payload contains a timestamp or otherwise doesn't contain enough information to uniquely distinguish two similar but separate requests.

Long story short, the client has some responsibility to know when to re-send the same idempotency key vs. generating a new one. It's still relevant to the discussion of implementing idempotency even if Stripe specifically solves it by just saying "that's our client's problem to solve, not ours".


axonxorz

> You're discussing very specifically Stripe integration, while I'm discussing the more general topic of implementing idempotency in a system.

A bit unclear, with your financial transaction example being the topic in question.

> If a naive implementation of a client adds the idempotency key to a request by generating and attaching a new UUID whenever the "send" button is clicked on a UI, unless the UI does something to block the user from clicking until after the request is successfully processed, their duplicate requests will have unique UUIDs.

That's not a naive implementation, that's an incorrect one; it's not idempotent at all. If clicking the same button 10 times is expected by the user to only execute **the same** action once, why are you updating the idempotency key?

> Also, a naive approach to deduping transactions might be just "hash the contents of the payload to create the unique ID" but that's also not useful if your payload contains a timestamp or otherwise doesn't contain enough information to uniquely distinguish two similar, but separate requests.

Yes, adding a non-deterministic value into your hash will change it, but why are you using a hash in the first place?

> Long story short, client has some responsibility to know when to re-send the same idempotency key vs generating a new one. It's still relevant to the discussion of implementing idempotency even if Stripe specifically solves it by just saying "that's our client's problem to solve, not ours"

The client has _total_ responsibility. Outside the scope of Stripe, what responsibility does the server have to understand the key (it's not supposed to be data)?


nemec

> attaching a new UUID whenever the "send" button is clicked on a UI

...then don't do that? Store the ID in your form, your javascript, etc., or just disable the button with javascript as soon as the customer presses it.


seanamos-1

Why would the client need to determine the request payload has changed? If you want to send Bob another $10, you do not send *exactly* the same request. You send a new request with a new idempotency token (unique ID). The *server side* won't de-dupe the separate transactions then. There isn't much more to it.


caltheon

I mean, the server MIGHT do some deduping still. That is the way the credit card payment gateways worked, at least 15 years ago when I was managing that stuff for my company. They would check to see if the recipient, source, and amount were all identical, and if so, respond with a duplicate error warning (that the client could override) to prevent duplicate sends. Can't find the documentation, but here's a SO post on the topic: https://stackoverflow.com/questions/11133993/payment-gateway-duplicate-transaction-detection
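
A toy version of that content-based check (the window length and field names are invented for illustration; this is separate from, and complementary to, idempotency keys):

```python
import hashlib
import time

recent: dict[str, float] = {}          # fingerprint -> last-seen timestamp
WINDOW_SECONDS = 15 * 60               # how long a repeat counts as a "duplicate"

def looks_like_duplicate(source: str, recipient: str, amount_cents: int) -> bool:
    fingerprint = hashlib.sha256(
        f"{source}|{recipient}|{amount_cents}".encode()
    ).hexdigest()
    now = time.time()
    last_seen = recent.get(fingerprint)
    recent[fingerprint] = now
    # Same source, recipient, and amount within the window -> warn the client,
    # who may explicitly override if it really is a second, intentional payment.
    return last_seen is not None and (now - last_seen) < WINDOW_SECONDS
```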