
Open Problem: how to securely fix critical runtime issues #4329

bedeho opened this issue Sep 29, 2022 · 13 comments
Labels: post-mainnet, question, runtime


bedeho (Member) commented Sep 29, 2022

Background

Suppose someone identifies a critical security vulnerability and reports it to the council. It is determined that it is important to fix and deploy, however, deploying the fix requires a runtime upgrade, and the code change in an upgrade must presumably be explained. How can the community challenge destructive or malicious upgrades if the code change is not justified transparently?

Question

  1. In what ways could this be dealt with?
  2. How do other major chains, both with and without on-chain upgrades, handle this? A deep dive reviewing incidents (how they were identified, reported, resolved, published and communicated) would be key.
mochet commented Sep 29, 2022

I will give this more thought and do some research.

Some wild ideas follow. Some of these could actually have value in other aspects of the project too, especially if we want to fund things that require an element of confidentiality to be effective (such as funding creative endeavors like movies):

  • An "emergency pause" proposal that pauses most/all pallets until the fix is completed (see the sketch after this list). This would obviously require outreach across various channels and a degree of coordination that is not guaranteed. It is also unclear whether you'd want to pause transfers in such a case.

  • An option to use the forums with encrypted messages that are only readable by whitelisted recipients or elected council members/WG leads.

    • This isn't very desirable as it may lead to an organization where back channel dealings are common.
    • Leaks could happen.
  • The same as above, but with messages that require multisig to be released at a later date: https://me.hashkey.com/en (substrate based)

  • The same as above, but with some sort of time-released lock which enabled the discussion to be publicly revealed at a later date: https://www.timecapsuledapp.com/ (substrate based)

  • This could also be done with non-runtime proposals that have their text/description time-locked for some period. They could include a human-readable part (i.e. CRITICAL: SECURITY VULNERABILITY) while leaving the rest encrypted, so that only certain recipients can see the text until, after a certain date, it is revealed to all.
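To make the "emergency pause" idea above concrete, here is a minimal sketch of how it might look in a Substrate runtime. This is an illustration, not Joystream code: the `pallet_emergency_pause` pallet, its `Paused` storage item and the `EmergencyPause` call variant are all assumptions. The idea is to consult a council-controlled flag in the filter that frame_system applies to every incoming call.

```rust
// Hypothetical sketch: `pallet_emergency_pause`, `Paused` and the
// `EmergencyPause` call variant are assumed to exist for illustration.
use frame_support::traits::Contains;

pub struct EmergencyPauseFilter;

impl Contains<RuntimeCall> for EmergencyPauseFilter {
    fn contains(call: &RuntimeCall) -> bool {
        if pallet_emergency_pause::Paused::<Runtime>::get() {
            // While paused, only allow system calls (which include the
            // `set_code` upgrade path) and calls that manage the pause itself.
            matches!(
                call,
                RuntimeCall::System(..) | RuntimeCall::EmergencyPause(..)
            )
        } else {
            true
        }
    }
}

// Wired into the runtime with:
//   impl frame_system::Config for Runtime {
//       type BaseCallFilter = EmergencyPauseFilter;
//       ...
//   }
```

Whether transfers stay enabled during a pause, as questioned above, then comes down to which call variants the `matches!` arm lets through.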

bedeho (Member, Author) commented Sep 29, 2022

I think the most useful contribution here is to actually review incidents from other major projects first; no need for us to try to invent anything new before we understand what others are already successfully doing.

traumschule (Contributor) commented

This sounds like a builders task, in coordination with the council, to:

  • publish a protocol for updating the runtime (proposals, bounty creation, builders report etc.)
  • review and explain code changes
  • deploy a test network for the suggested change
  • publish an "open stress test" bounty to invite security researchers and blockchain developers
  • have someone in builders able to create an audit
  • the council would then either trust the builder's assessment or order and wait for an external audit

We'll also want to keep working closely with Substrate developers to get notified about security updates and communications (how to responsibly disclose security bugs and announcements).
There's an example of a public security research bounty: https://threatpost.com/tor-project-opens-bounty-program-to-all-researchers/126937/

bedeho (Member, Author) commented Sep 29, 2022

This sounds like a builders task, in coordination with the council, to:

I don't think this gets to the core of the problem. The core of it is that voters need to be able to know what upgrades are being proposed in order to intervene in one of the elections to block a malicious upgrade. But this means actual upgrades must be open-source and explained, which means third-party attackers can inspect the fix in advance and exploit the underlying vulnerability before the upgrade takes effect.

traumschule (Contributor) commented

If the assumption is that the council takes all steps to verify the fix and announces a security upgrade with limited info, so as not to endanger the current chain, that should be fine. After all, you wouldn't want to elect an incompetent or malicious council; some trust is required. The same goes for other privacy/security/legal issues the council will have to handle in cooperation with leads.

mochet commented Sep 30, 2022

I think the most useful contribution here is to actually review incidents from other major projects first; no need for us to try to invent anything new before we understand what others are already successfully doing.

Agreed.

I am doing a lot of searching and putting together some examples, but first I'm writing out the scope of what I understand to be the most applicable/relevant examples for our use case (I'll still list many others, but these seem the most pertinent for our DAO, and I'm outlining what I'm looking for in particular because there are so many successful attacks):

  • Layer 1 blockchains with governance systems
  • No multisig/sudo keys
  • Should ideally assume a very random distribution of tokens and/or large amounts of tokens locked away for long time periods (meaning that it isn't necessarily possible to perform urgent actions)
  • Not really have an on-chain reputation system that can be easily leveraged for authority (although this could still be applicable in our case)
  • Should include forum posts, Github issues/PRs as well as proposals relating to any changes, notifications or rollbacks

Let me know if that scope sounds about right. I'm mainly thinking out loud (:

There are of course plenty (thousands) of successful attacks that have happened, but specific situations where an exploit/critical bug was found, reported and fixed before being exploited (outside of unsolicited bounties, which probably wouldn't count in this type of situation) don't seem to happen very often. But I will continue looking, as this is a very interesting topic.

bedeho (Member, Author) commented Sep 30, 2022

Layer 1 blockchains with governance systems ... No multisig/sudo keys

The governance side is actually not the first-order issue here. Take Bitcoin, for example; here is a critical inflation bug:

https://hackernoon.com/bitcoin-core-bug-cve-2018-17144-an-analysis-f80d9d373362

Somehow the bug had to be fixed by someone, and then this person had to tell everyone to upgrade without immediately signalling how the bug could be abused, while obviously people still needed to understand what they were being asked to run. How did they solve this problem?

Whether this person was a random dev, a council, or whatever is not the first-order issue, but it's worth taking into account when thinking about how things will work for Joystream in the end.

mochet commented Sep 30, 2022

The Polkadot Fellowship is something I stumbled across today. It is quite an off-chain concept and would require a very socially mature community, but it places a heavy emphasis on technical knowledge and expertise of Polkadot's underlying technology, and its runtime specifically: https://github.com/polkadot-fellows/manifesto

I will try to summarize it briefly, as it does seem to provide at least one potential model for developing a group that is highly expert on matters of security, protocol and runtime changes above everything else.

The Fellowship thus aims to embody the expertise over protocol and code design which is utilised by any realisation of the Polkadot meta-protocol (i.e. a node implementation), any realisation of the Polkadot runtime and any code or technology primarily utilised for the routine maintenance of the network and without which would seriously inhibit the network's potential to sustain itself.

In short, if expertise or code is required and primarily used for the Polkadot (Main) Network to continue operating and improving, then it is covered. If it is not then it is not.

There should be additional meetings exclusively for Fellows and (when there are practical numbers) Masters. Information on these meetings should (for security purposes) be kept on a need-to-know basis. Disregarding this is a breach of Fellowship rules.

It does not value simple knowledge of programming languages but seems to purely value core protocol knowledge, expertise, actual contributions to the codebase and the like. The members are ranked, manually inducted, and financially rewarded for being part of the fellowship. They are also expected to "ideally" vote on matters via commit-reveal, and are held to quite stringent standards.
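For reference, commit-reveal here means the standard hash-commitment pattern: each voter first publishes only a hash of their vote together with a secret salt, and later reveals both so anyone can check the reveal against the commitment. A minimal self-contained sketch (the function names are mine; assumes the sha2 crate):

```rust
use sha2::{Digest, Sha256};

/// Commit phase: publish only the hash of (vote || salt).
fn commit(vote: &str, salt: &[u8]) -> [u8; 32] {
    let mut hasher = Sha256::new();
    hasher.update(vote.as_bytes());
    hasher.update(salt);
    hasher.finalize().into()
}

/// Reveal phase: anyone can verify the revealed vote and salt.
fn verify(commitment: &[u8; 32], vote: &str, salt: &[u8]) -> bool {
    commit(vote, salt) == *commitment
}

fn main() {
    let salt = b"per-voter-random-nonce";
    let c = commit("aye", salt);
    assert!(verify(&c, "aye", salt));
    assert!(!verify(&c, "nay", salt));
}
```

The same primitive could back the time-locked proposal descriptions discussed earlier in this thread: publish the hash of the full justification when the proposal is made, reveal the plaintext after enactment, and anyone can verify that what was revealed is what was committed to.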

It was also described in the [Proposal for Common Good Parachains](https://polkadot.network/blog/proposal-for-common-good-parachains/) blogpost:

The new proposal for Polkadot governance introduces a new collective, the Fellowship, which allows a ranked group of experts to express its opinion on sensitive or highly privileged proposals.

Joystream itself may or may not become a parachain, and it may or may not reach a stage where such a group could develop internally, but it is perhaps even possible that extremely critical matters could somehow be shared with such a group at some point in the future.

traumschule (Contributor) commented

Good starting point to learn about responsible disclosure: https://media.ccc.de/search/?q=responsible+disclosure

mochet commented Sep 30, 2022

I'll keep looking, but I found some useful stuff for a good starting point, with some help from @traumschule.

I looked through a few recent exploits (by no means a large number yet) and almost all:

  • Were DeFi/cross-chain/bridge related (which I think makes creator tokens seem like they may be a more attractive vector for attack)
  • Were projects that were already audited
  • One (Acala) was the result of a misconfigured proposal/pallet (Substrate based; this was the one example I found of governance being used quickly and effectively to stop an exploit, but the network has taken 30+ days and still has not fully resumed, which highlights the extreme costs involved).
  • One involved a MITM attack (DNS spoofing) on a UI, which was rectified within an hour (which probably indicates that monitoring of any UIs should be a highly routine task)
  • Dependency supply-chain attacks seem feasible in some situations (Audit Dependencies rmrk-team/rmrk-substrate#46), which could be mitigated by regularly checking dependencies for known vulnerabilities, e.g. with cargo audit (available as an action from the GitHub marketplace: https://github.com/actions-rs/audit-check)

Nonetheless, I think the following links provide quite a good summary of some of the varying approaches to disclosure, and they highlight some of the tooling and processes used. A lot more research could probably be done, but from a surface-level look it seems that cultivating a culture of paying out for responsible disclosures, and maintaining good relationships with auditing companies, probably has very long-term benefits.

Available Tooling

  • GitHub's inbuilt tooling for creating security policies and issuing security advisories
  • https://immunefi.com/hackers/ - placing bounties on finding vulnerabilities

Some crypto project vulnerability processes + bounty information

Security Disclosure Policies for some crypto projects:

Other Issues / Lessons:

  • Lack of a defined process or communication channels for disclosing and/or handling urgent security matters (NOTE: this is not accurate and seems to just reflect poor information flow within the Polkadot ecosystem)
    • Currently, as I’m writing this post, there are some security vulnerabilities in Substrate that could be abused on some Parachains but have not been disclosed by Parity. I suppose they want to have it fixed/deployed in the relaychain before revealing them but I think it is a bad strategy for the ecosystem.
    • I would also add that as a founder/lead on one of the parachains that may be affected by this, I have not received any responsible disclosure on this alleged vulnerability. That also seems very odd to me.
    • https://forum.polkadot.network/t/improving-the-substrate-ecosystem-vulnerabilities-disclosure/38
  • Indecision regarding ownership of, or willingness to finance audits of pallets between different parties:
    • In addition to the difficulty of maintaining it, Parity (to my understanding) doesn't want to take any responsibility for Frontier's pallet/code/issues, making it hard to guarantee its quality and support. (The Moonbeam foundation is currently the one paying for audits in the Frontier/EVM repos.)
    • We’ve been contributing to Frontier for a couple of years now and I think Wei has done great designing and reviewing the project proposals but Alan is right in that a single maintainer can sometimes drag reviewing process depending on that person’s schedule, which is totally fine, but can be simply solved by having more eyes on it.
    • https://forum.polkadot.network/t/making-frontier-a-first-class-citizen/37

Rewinding the chain as a last resort

I didn't come across many recent attempts at "rewinding" chains (like after "The DAO" hack on Ethereum), but it seems like these might have value. The DAO could agree via governance on certain historical points to rewind to, but this wouldn't account for all assets and would probably be highly complex.
Nonetheless, there are services around that archive chains for historical purposes, if this were desirable.

May look into this more when I get a chance, but I think it shows some of the variety in approaches that exist.

bedeho (Member, Author) commented Oct 1, 2022

Great work compiling all of this, but it would be even more useful to write down some actual case studies of how specific incidents were handled; specifically, how was the system updated, so that node operators started running new software, without the attack vector being broadcast?

If you can find a concise answer to this question for 5-10 incidents, then I think we have a good foundation.

mochet commented Oct 7, 2022

BNB chain suffered a significant attack yesterday: https://www.bnbchain.org/en/blog/bnb-chain-ecosystem-update/

Their way of remedying it was to communicate with the validators, and they were able to pause the network rather quickly. It would be interesting to know if that is possible on Substrate.

mochet commented Oct 15, 2022

A good way to browse recent hacks, which also includes the exploits used: https://defillama.com/hacks
