Swarm
Until now, we have treated the swarm, whisper, and EVM modules as black boxes: we were interested only in how these modules interact, not in what actually happens inside them. However, to build a firm base for developing smart contracts, which we will define and take up in Chapter 3, Hello World of Ethereum Smart Contract, we need to understand at least the basic internal workings of these modules. This knowledge helps us tune smart contract code so that it performs better when running on the Ethereum blockchain.
So, how do we measure the performance of our code? Unfortunately, the answer is not straightforward, because performance lies along several axes. It is measured in terms of resource consumption, and code consumes a variety of resources. Beyond a certain point, we must trade one resource off against another: we consume more of one resource and less of another to make code truly performant in the real world.
Let us now look at the three important resources used to measure the performance of code: time, space, and network. The time resource is how long code needs to run to completely process a set of operations on a given input and generate an output.
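As a minimal sketch of measuring the time resource, the following Python snippet times a workload with `time.perf_counter` and keeps the best of several runs to reduce noise from the operating system. The `process` function is a hypothetical stand-in workload, not anything from Ethereum itself.

```python
import time

def process(values):
    """Hypothetical workload: sum the squares of the input values."""
    return sum(v * v for v in values)

def measure_time(func, data, repeats=5):
    """Run func(data) several times; return (best elapsed seconds, result)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func(data)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)  # the fastest run is least disturbed by noise
    return best, result

if __name__ == "__main__":
    seconds, output = measure_time(process, list(range(100_000)))
    print(f"output={output}, best time={seconds:.6f} s")
```

Taking the minimum rather than the average is a common choice for micro-benchmarks, since slower runs usually reflect interference rather than the code itself.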
The primary objective of swarm is to allow DApps to efficiently share storage and bandwidth resources for their data, in order to provide the necessary services to end users. Swarm accomplishes this through three crucial ideas:
- Chunk: The basic unit of storage and retrieval in swarm, with a maximum size of 4 KB
- Hash: The cryptographic hash of a data chunk, which serves as the chunk's unique identifier and address
- Manifest: The metadata that maps paths to content hashes, so that content can be retrieved by path
When any blob of data, termed content, is uploaded to swarm, it is chopped up into pieces called chunks. A one-way swarm hash is generated for each chunk, and this hash serves as the chunk's identifier and address. These hash addresses are immutable by nature: modifying the content produces different chunk hashes and therefore different addresses. The hashes of these chunks are themselves bundled into another chunk, which in turn has its own hash. In this way, the content is mapped to a chunk tree, which is essentially a Merkle tree. Even for large content files, such as streaming videos, this hierarchical swarm hash preserves data integrity and allows protected random access.
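The chunk tree described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: it uses SHA3-256 from `hashlib` as a stand-in for swarm's actual hash function, and it omits details of the real chunk format (such as the length/span prefix). It keeps the two key ideas: content is split into chunks of at most 4 KB, and child hashes (32 bytes each, so 128 per chunk) are bundled into intermediate chunks until a single root hash remains.

```python
import hashlib

CHUNK_SIZE = 4096                   # maximum chunk payload: 4 KB
HASH_SIZE = 32                      # SHA3-256 digest length in bytes
BRANCHES = CHUNK_SIZE // HASH_SIZE  # 128 child hashes fit in one intermediate chunk

def chunk_hash(data: bytes) -> bytes:
    # Assumption: SHA3-256 stands in for swarm's real chunk hash.
    return hashlib.sha3_256(data).digest()

def split_into_chunks(content: bytes) -> list[bytes]:
    """Chop content into pieces of at most CHUNK_SIZE bytes."""
    if not content:
        return [b""]
    return [content[i:i + CHUNK_SIZE] for i in range(0, len(content), CHUNK_SIZE)]

def root_hash(content: bytes) -> bytes:
    """Build the chunk tree bottom-up and return its root hash."""
    level = [chunk_hash(c) for c in split_into_chunks(content)]
    # Bundle child hashes into intermediate chunks until one hash remains.
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), BRANCHES):
            intermediate_chunk = b"".join(level[i:i + BRANCHES])  # at most 4096 bytes
            parents.append(chunk_hash(intermediate_chunk))
        level = parents
    return level[0]
```

For example, 10,000 bytes of content split into three leaf chunks, whose three hashes fit in one intermediate chunk; changing a single byte anywhere changes that leaf's hash and, through the tree, the root hash. This is why a root hash both addresses the content and verifies its integrity.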
To access swarm content, we need the manifest, a file that describes a document collection. The collection can be a file system directory, a virtual server, or an index of a database. The manifest is like the table of contents of a book, telling us what content exists and where to find it. It is the metadata that enables uniform resource locator (URL)-based content retrieval by specifying paths and their corresponding content hashes.
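To make the table-of-contents analogy concrete, here is a hypothetical, simplified manifest in Python. The `ManifestEntry` fields, the `Manifest` class, and the fallback to `index.html` for an empty path are illustrative assumptions, not swarm's actual manifest format, and the hash strings are placeholders.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ManifestEntry:
    path: str          # path relative to the manifest root, e.g. "index.html"
    content_hash: str  # hash of the content stored at this path (placeholder values below)
    content_type: str  # MIME type to serve for this path

class Manifest:
    """Hypothetical manifest: maps URL paths to content hashes."""

    def __init__(self, entries):
        self._by_path = {e.path: e for e in entries}

    def resolve(self, url_path: str) -> ManifestEntry:
        """Look up a URL path; an empty path falls back to index.html (assumption)."""
        path = url_path.lstrip("/") or "index.html"
        try:
            return self._by_path[path]
        except KeyError:
            raise KeyError(f"no manifest entry for {url_path!r}") from None

site = Manifest([
    ManifestEntry("index.html", "placeholder-hash-1", "text/html"),
    ManifestEntry("img/logo.png", "placeholder-hash-2", "image/png"),
])
print(site.resolve("/img/logo.png").content_hash)  # placeholder-hash-2
```

Resolving a URL is then a two-step process: the manifest turns the path into a content hash, and the content hash is used to fetch the chunk tree.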
Swarm node addresses define locations in the same address space as the data. Each swarm node participating in the network has its own base address, termed the bzzkey. The bzzkey is derived as the hash of an Ethereum address, the so-called swarm base account of the node. There is no concept of deletion or removal in swarm: once content is uploaded, there is no way to instruct swarm to revoke it. Before swarm was released, Ethereum used the InterPlanetary File System (IPFS).
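Because node addresses and chunk addresses are both 256-bit hashes, they can be compared directly. The sketch below derives a bzzkey-like address by hashing a 20-byte Ethereum address and picks the node "closest" to a chunk. It assumes SHA3-256 as a stand-in for swarm's hash and a Kademlia-style XOR distance; the function names and the sample addresses are hypothetical.

```python
import hashlib

def bzzkey(eth_address_hex: str) -> int:
    """Derive a 256-bit node address by hashing a 20-byte Ethereum address.
    Assumption: SHA3-256 stands in for the hash swarm actually uses."""
    raw = bytes.fromhex(eth_address_hex.removeprefix("0x"))
    return int.from_bytes(hashlib.sha3_256(raw).digest(), "big")

def chunk_address(chunk: bytes) -> int:
    """Chunk address in the same 256-bit space as node addresses."""
    return int.from_bytes(hashlib.sha3_256(chunk).digest(), "big")

def closest_node(chunk_addr: int, node_addrs: list[int]) -> int:
    """Return the node address with the smallest XOR distance to the chunk."""
    return min(node_addrs, key=lambda node: node ^ chunk_addr)

# Hypothetical base accounts for three nodes.
nodes = [bzzkey("0x" + b * 20) for b in ("11", "22", "33")]
target = chunk_address(b"some chunk data")
print(hex(closest_node(target, nodes)))
```

Sharing one address space is what lets the network decide, from the hash alone, which nodes are responsible for storing and serving a given chunk.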