Quantifying CoinJoin inefficiencies #27
I think this isn't what we are looking for. What we are looking for is every normal-looking blockchain transaction that doesn't have anything to do with coinjoins, and from that set we should get percentages instead.
What does "right after" mean? I think this is somewhat messy, but I see that it makes sense. Maybe it should be a scale. A better question would be: among non-remixed UTXOs, how long did it take for them to be spent?
ACK. Very important.
i think both are in scope, but very different:
this question seems relevant, and i think more general, because right now users are using Wasabi with a certain distribution of input amounts. i also think it's potentially more than just a privacy leak to the coordinator, since it leaves an on-chain footprint. for example, input amounts of exactly 1.0 and 0.5 are very common in the data i looked at (from the beginning until sept 2019), presumably from unit bias, and this interacts with variable denominations to produce change output amounts. specific amounts created can also end up interacting with coin selection when users are remixing coins in the next denomination epoch, which can affect how much users pay in fees. i also believe (but have not yet been able to demonstrate) that this can be used for some privacy attacks. for example, suppose a user is now remixing after a denomination reset, or wants to mix change from a payment of a known amount, but chooses an input that is a bit larger than the optimal size for the now larger denomination - that allows an attacker to conclude that they are likely not one of the users who got a near-optimal change output from a previous mixing transaction. this is an issue because these change output sizes are not arbitrary: they are a function of the user's prior input and the denomination of the round that created it, i.e. a very strong temporal quasi-identifier.
It seems to me like there are several kinds of inefficiencies we might be concerned with:
and i'm not sure if efficiency is the right frame for privacy leaks, but all of these measures interact with it, potentially non-linearly: higher efficiency might reveal less information, so fewer quasi-identifiers to try and combine, but it might also stand out more. since Wasabi coinjoins are easily identified, this has more to do with the txs happening before a coinjoin, i.e. do TXOs that end up going into mixes have a non-typical distribution? and also with the txs happening after, both of which are constrained by the current round structure. framed more quantitatively: what is the degree of connectivity (e.g. average path length; maybe modular decomposition of graphs would be useful) between the coinjoin-related transaction subgraph and its complement in the entire tx graph?
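the "average path length" idea above could be sketched roughly like this. everything here is an assumption about data shapes, not existing code: `edges` is a list of `(spending_txid, funding_txid)` pairs and `coinjoin_txids` is the set of identified Wasabi coinjoin txids; distances are undirected hop counts via multi-source BFS.

```python
from collections import deque

def avg_distance_to_coinjoins(edges, coinjoin_txids):
    """Average undirected hop distance from each reachable non-coinjoin tx
    to the nearest coinjoin tx (multi-source BFS). Hypothetical metric
    sketch; unreachable txs are simply skipped."""
    # build undirected adjacency from the spend edges
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    # seed BFS with all coinjoin txs at distance 0
    dist = {t: 0 for t in coinjoin_txids if t in adj}
    queue = deque(dist)
    while queue:
        node = queue.popleft()
        for neighbor in adj.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    non_cj = [d for n, d in dist.items() if n not in coinjoin_txids]
    return sum(non_cj) / len(non_cj) if non_cj else float("inf")
```

a small average would suggest the coinjoin subgraph is tightly connected to the rest of the tx graph; modular decomposition would need a proper graph library.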
You mean something along these lines? See Sergi's (@sr-gi) paper: Another coin bites the dust: An analysis of dust in UTXO based cryptocurrencies (https://eprint.iacr.org/2018/513.pdf)
Can you elaborate on this? How do you measure UTXO set size efficiency? What does it mean at all? How do you define it?
To me, block space efficiency means that many txs (especially payments) issued after CoinJoin txs could be squished into the CoinJoin itself... this could be quite easily measured.
Really interesting point. I'm not into optimization theory, but I guess this cannot be solved exactly in poly-time. Maybe you can approximate the best solution in poly-time, but I would suppose you cannot do this efficiently, given that the subset-sum problem can essentially be reduced to this optimization problem. Hence, I think this is out of scope... at least the optimal solution.
equal-valued coinjoin means that for an arbitrary amount you will typically end up with change, and then you will also typically end up with change if you later use such outputs for payments. overall, the lower the hamming weight of amounts, the more outputs need to be created to represent a given value ownership distribution.
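to make "hamming weight of amounts" concrete: reading an amount in satoshis as a binary number, its hamming weight is the number of set bits, i.e. the number of power-of-two denomination outputs needed to represent it exactly. a minimal illustration (the function names are mine, not from any existing codebase):

```python
def hamming_weight(sats: int) -> int:
    """Number of set bits in the satoshi amount, i.e. how many
    power-of-two denomination outputs an exact decomposition needs."""
    return bin(sats).count("1")

def pow2_decompose(sats: int) -> list:
    """Decompose an amount into power-of-two denominations,
    one output per set bit, smallest first."""
    return [1 << i for i in range(sats.bit_length()) if (sats >> i) & 1]
```

so an amount of 13 sats has hamming weight 3 and decomposes into outputs of 1, 4 and 8; hamming-weight-1 amounts are the single powers of two.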
yes, and it pertains to the UTXO efficiency i mentioned as well: the same situation that makes you end up with arbitrary amounts that are unlikely to be equal to your prior amount or the amounts you eventually need to spend, thereby increasing your average UTXO set size per wallet, also makes it so that you create and destroy more intermediate outputs, especially if you are trying to avoid linking e.g. change outputs. currently bitcoin privacy almost always requires a tradeoff in block space (or, mostly equivalently, fees paid). there is a perverse incentive to sacrifice privacy (and therefore also hurt fungibility) when trying to reduce transaction costs, so it would be nice to try and quantify this overhead.
if the denomination is fixed then the optimal solution is trivial: equal-valued coinjoins with no change outputs. this is the premise underlying whirlpool's tx design and fee structure, which i think makes a lot of sense since it also enables and incentivizes large graphs of coinjoins. for arbitrary denominations, only using amounts of hamming weight 1 seems hard to beat for fungibility, but obviously adds a large overhead (a roughly constant factor easily quantified by the hamming weight distribution of all outputs). however, arguing that that is optimal requires some sort of model/framework, and if that model also takes into account block space/fees then, like you said, it becomes non-linear and seems likely to be NP-hard. ideally, whatever tx structure we propose should align privacy and block space; these goals are theoretically very compatible: the fewer bits you write to the blockchain, the less there is to analyze in order to break your privacy. unfortunately in practice this is very far from being the case, but i don't think it's unrealistic. i think a pragmatic approach would be to aim for some robust minimal level of privacy by construction, and attempt to approach a pareto frontier of privacy and block space efficiency as a combined objective, but actually favoring block space efficiency (not privacy per block space efficiency), because making the tradeoff less painful seems more likely to change individual behaviour, and that will have compounding effects for privacy & fungibility that are positive sum.
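the subset-sum connection mentioned above can be made explicit: avoiding a change output means finding a subset of the wallet's UTXO values summing exactly to the payment target. a brute-force sketch (hypothetical helper, exponential time, which is exactly why the exact optimization is considered intractable):

```python
from itertools import combinations

def exact_subset(utxo_values, target):
    """Find a subset of UTXO values summing exactly to `target`,
    so the spend needs no change output. Brute force over all
    subsets: O(2^n), illustrating the subset-sum hardness."""
    for size in range(1, len(utxo_values) + 1):
        for combo in combinations(utxo_values, size):
            if sum(combo) == target:
                return list(combo)
    return None  # no changeless spend exists
```

in practice wallets use heuristics or bounded search for this (branch-and-bound style coin selection) rather than the exact exponential search.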
oops, forgot to bring all that rambling back to this issue - the point is we should try to measure this tradeoff, to try and estimate how much privacy wallet users are paying relative to privacy nihilists, and to what extent these function as separate economies (i.e. do those users subsidize the nihilists' privacy in some way?)
I'm making the decision on what should be in the serialized transactions. Before that, just to reiterate what kind of raw data I'm going to collect from the blockchain:
A transaction object would contain:
Is there anything else we care about in a transaction? It's very important to decide on these now, because it'll likely take a week to parse the blockchain to gather all this initial information. (Although the software will finish overnight, I expect it to fail, produce incorrect data, or for something else to come up that forces me to restart the run a few times - that's why my one-week estimate.)
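as a strawman for discussion, the serialized transaction object could look something like the record below. the exact field list is what this thread is deciding, so every field here is an assumption, not the final schema:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class TxRecord:
    """Hypothetical serialized transaction record for the scan.
    Fields are illustrative assumptions, not the agreed schema."""
    txid: str
    block_height: int                # preserves on-chain ordering for metrics
    raw_hex: str                     # full tx hex; inputs/outputs recoverable
    is_wasabi_coinjoin: bool = False # flag set by the coinjoin detector
    input_txids: List[str] = field(default_factory=list)  # for walking tx chains
```

keeping the raw hex covers anything recoverable later (as noted below for script data), at the cost of disk space; the derived fields exist only to avoid re-parsing during metric runs.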
Unfortunately it needs to be. There are dependencies on exploring whole tx chains. For example, if you don't know all the Wasabi transactions, then you won't know which inputs of a specific Wasabi transaction come from other CoinJoin transactions.
I think you misunderstand it. The above data would be collected to speed up further data collection. I didn't mean that after this data collected we will never make a
That's in the transaction hex, so that's covered.
Would be faster and more convenient, but it will ultimately depend on whether I'll have enough free disk space :) For the record, asking for all the Wasabi txs by txid takes 12 minutes for me with an over-optimized algorithm that loses the order of transactions, which is a huge problem for some metrics. So it'd be really good to keep these on disk instead, otherwise we'll spend hours collecting statistics.
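since order matters for some metrics, the on-disk store can be as simple as an append-only file of txids in block order. a minimal sketch (the file layout and function names are assumptions for illustration, not the actual implementation):

```python
def append_txids(path, txids):
    """Append txids in discovery (block) order, one per line,
    so the file itself preserves transaction ordering."""
    with open(path, "a") as f:
        for txid in txids:
            f.write(txid + "\n")

def load_txids(path):
    """Read the txids back in the same order they were written."""
    with open(path) as f:
        return [line.strip() for line in f]
```

later metric runs can then iterate the file instead of re-querying the node, which is where the hours-vs-minutes difference comes from.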
ah, so this is more like an index? in that case yeah this all sounds right to me
I'm ready. Starting the algorithm and praying it won't bloat my disk or be too slow. Otherwise I'll have to optimize. |
Unfortunately it didn't work out. I encountered multiple issues, I'll work on them.
I think it should suffice to get a rough picture. We will not get the whole picture, but a rough lower bound could still be established for the number of "immediate post-mix payments".
Does your script fetch entire blocks containing mixing transactions? Can't you just fetch specific txs by transaction id? For Wasabi txs, I suppose you have all the txids, hence you could just fetch them directly without downloading the whole block. Personally, I would only be interested in Wasabi mixing txs (+ post/pre-mix txs), at least in the very short term; maybe we could extend our scope later.
Good work guys. I think it would be sufficient to have only Wasabi txs for now, and we already have all the txids, so this would make it a lot more efficient. Would still need to get the post-mix txs though...
Index is done and some checks suggest it's correct. I'm moving on to the metrics. |
LOL, I found this comment sitting here unsent for days. By tomorrow, hopefully all the stats I described will be published: the problem that WabiSabi solves, and what statistics can prove that problem exists.
We'll gather other statistics when we want to decide on denominations. I think the other suggestions here should come at that phase. It seems to me that most of @nothingmuch's suggestions are comparative metrics and not standalone ones. So we can score coinjoins and compare them to the new WabiSabi protocol once denominations are figured out, but for now it doesn't make sense to compare Wasabi, Scamourai and other coinjoins to each other. That's separate research.
For a proper research paper we need quantified arguments about CoinJoin inefficiencies. Our goal is that WabiSabi will enable us to somewhat alleviate these inefficiencies.
These are the statistics we were discussing with @nopara73 and @nothingmuch:
The number of input UTXOs in a coinjoin that have less value than the minimum denomination. In current CoinJoin implementations, if these UTXOs are merged, the coordinator can link them. This is a privacy leak we will be able to eliminate with WabiSabi's input merging.
The number of output UTXOs that were redeemed for payment purposes right after the CoinJoin tx. We could use the heuristic that every tx with 2 output UTXOs is a payment. This might be a good approximation of the payment transactions using CoinJoin outputs. With WabiSabi these payments could happen in the CoinJoin tx itself, therefore saving considerable blockspace for the whole community.
Calculating the ratio/number of unmixed change outputs in CoinJoin txs.
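the first two statistics above can be sketched as simple counting passes. the data shapes are assumptions for illustration: a tx is a dict whose "inputs" list holds `(prev_txid, index, value)` triples and whose "outputs" list holds output values.

```python
def count_small_inputs(coinjoin_tx, min_denomination):
    """Statistic 1: inputs worth less than the minimum denomination,
    i.e. the UTXOs that currently must be merged linkably."""
    return sum(1 for _, _, value in coinjoin_tx["inputs"]
               if value < min_denomination)

def looks_like_payment(tx):
    """Statistic 2 heuristic from above: treat any tx with exactly
    two outputs as a payment (payment + change). This is a rough
    lower-bound approximation, not a precise classifier."""
    return len(tx["outputs"]) == 2
```

running `looks_like_payment` over the txs that spend coinjoin outputs within some small block window would give the "immediate post-mix payments" lower bound discussed earlier in the thread.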
Please feel free to add below or edit this issue if you have any meaningful idea we should measure!