Make `clef` scalable

Organization

Ethereum Foundation

Deadline

N/A

Status

LIVE


INSTRUCTIONS

The clef binary is meant to be a secure key management tool, which can be used to separate key management from the actual node operation.

It uses the same data model as the go-ethereum library, namely:

  • Keystore (directory with encrypted keyfiles)
  • USB-devices (hardware wallets)
  • Other external backend devices accessible via the signer external protocol.

However, one use case does not really fit this model: a user with millions of keys. The keystore backend has two problems at that scale.

Primary problem

  • We use filesystem notifications to become aware of new keyfiles.
  • When we detect a keyfile, we open the file and scan the JSON for the address element, and
  • add it to an internal cache of available addresses.

This works fine for a handful of addresses, but does not scale:

  • Opening millions of files and unmarshalling JSON takes quite some time,
  • The internal caches are not prepared to handle millions of items, which causes extra overhead (sorting, etc.).

To cater for this type of use case, we would need an additional data storage format -- one not based on keystore files. A problem with keystore files is that although the address is commonly part of the filename, this has never been mandated, so the file contents have to be read to learn the address.

Secondary problem

There is also a secondary problem: apart from the actual key data, clef maintains a separate database of metadata, containing

  • ruleset data,
  • passwords

If a user has 5M keystores, it should also be possible to have 5M passwords. Currently, this would probably not scale well. Although each password is individually encrypted in an AES-GCM container (so the entire store doesn't need to be decrypted), I suspect that accessing a single password might be pretty slow, since the whole store is loaded into memory first.
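To illustrate why per-entry encryption helps only if the store can also be looked up per entry, here is a small Go sketch of a password store where each value is sealed separately with AES-GCM and a lookup decrypts exactly one entry. It is not clef's storage code: the `vault` type, its methods and the use of the address as additional authenticated data are assumptions for the example, and the in-memory map stands in for whatever indexed on-disk store a real design would use.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"fmt"
)

// sealed holds one independently encrypted entry.
type sealed struct {
	nonce      []byte
	ciphertext []byte
}

// vault is a hypothetical stand-in for an indexed credential store.
type vault struct {
	aead    cipher.AEAD
	entries map[string]sealed
}

// newVault builds an AES-GCM vault; key must be 16, 24 or 32 bytes.
func newVault(key []byte) (*vault, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	aead, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	return &vault{aead: aead, entries: make(map[string]sealed)}, nil
}

// Put encrypts one password under a fresh random nonce. The address is
// passed as additional data so a ciphertext cannot be moved to another entry.
func (v *vault) Put(addr, password string) error {
	nonce := make([]byte, v.aead.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return err
	}
	ct := v.aead.Seal(nil, nonce, []byte(password), []byte(addr))
	v.entries[addr] = sealed{nonce: nonce, ciphertext: ct}
	return nil
}

// Get decrypts only the entry for addr; no other entry is touched.
func (v *vault) Get(addr string) (string, error) {
	e, ok := v.entries[addr]
	if !ok {
		return "", fmt.Errorf("no password stored for %s", addr)
	}
	pt, err := v.aead.Open(nil, e.nonce, e.ciphertext, []byte(addr))
	if err != nil {
		return "", err
	}
	return string(pt), nil
}

func main() {
	key := make([]byte, 32) // demo key; a real key would come from the master seed
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	v, err := newVault(key)
	if err != nil {
		panic(err)
	}
	if err := v.Put("0xexample", "correct horse"); err != nil {
		panic(err)
	}
	pw, err := v.Get("0xexample")
	if err != nil {
		panic(err)
	}
	fmt.Println("recovered password length:", len(pw))
}
```

The decryption cost per lookup is constant here; what the text identifies as the likely bottleneck is loading the whole container before that lookup can happen.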

Possible solutions

  • Add a generic backend for a database-backed keystore, where a somewhat generic database can be configured. The database should be able to answer queries about which addresses exist, and deliver the encrypted keystore JSON. Users can then configure this with an arbitrary SQL database, either local or remote. This has the upside that it is not up to clef to maintain the integrity and robustness of the database -- it will be up to the user to manage the database.

  • Add a non-generic database, e.g. leveldb or sqlite or something custom. If we do something custom, it's highly important that the db is robust. Database corruption resulting in the loss of keystore data is not acceptable.