Customer: Tharsis Labs Ltd

Preparer: Orijtech Inc

Date of audit and changes: October 10th 2021 to January 4th 2022

Introduction:

Evmos is a scalable and interoperable Ethereum, built on Proof-of-Stake with fast finality; Ethereum-On-Cosmos.

Mission:

Improve Evmos performance and increase throughput, measured in Transactions Per Second (TPS).

Methodology:

Orijtech Inc collected and examined hundreds of continuous profiles from evmosd, the Evmos daemon. We also read through much of the source code, audited dependencies for possible performance improvements, and shaved plenty of yaks in the hairier parts of the code to find what could be improved. Some of the changes are in dependencies and still need to be merged in, but the performance improvements are nonetheless demonstrable. Our method combined profile-guided and instrumentation-guided improvements with audits of the code and its dependencies. We’ve highlighted some changes that stood out, but a large share of the time was spent on auditing, trying to break functionality and then verifying it.

Implementation of a Transactions-Per-Second counter:

The biggest objective behind all these changes is to improve TPS (Transactions Per Second). To make that improvement, we first have to measure what we are improving. To add this counter, we mailed https://github.com/tharsis/evmos/pull/74, https://github.com/tharsis/evmos/pull/122 and https://github.com/tharsis/evmos/pull/149, which measure transactions per second with a counter in DeliverTx that records both failed and successful transactions. We also added ways to examine TPS scalably and visually, using Prometheus hooked in through OpenCensus; the instructions are detailed at https://evmos.dev/quickstart/run_node.html#recording-transactions-per-second-tps

tendermint/tendermint/internal/libs/protoio MarshalDelimited optimization

PR https://github.com/tendermint/tendermint/pull/7325

By examining the code patterns, we noticed that the code used NewMarshalDelimitedWriter, which required passing in a *bytes.Buffer. When a struct has a .MarshalTo method, there is no need to create a writer at all. For structs that don't implement that method, buffers were expensively created and then discarded on every call; fixing this pattern required a sync.Pool of reusable buffers. Together these changes led to large performance gains, for example:

$ benchstat before.txt after.txt
name                                        old time/op    new time/op     delta
types.VoteSignBytes-8                       705ns ± 3%     573ns ± 6%      -18.74% (p=0.000 n=18+20)
types.CommitVoteSignBytes-8                 8.15µs ± 9%    6.81µs ± 4%     -16.51% (p=0.000 n=20+19)
protoio.MarshalDelimitedWithMarshalTo-8     788ns ± 8%     772ns ± 3%      -2.01%  (p=0.050 n=20+20)
protoio.MarshalDelimitedNoMarshalTo-8       989ns ± 4%     845ns ± 2%      -14.51% (p=0.000 n=20+18)

name                                        old alloc/op   new alloc/op    delta
types.VoteSignBytes-8                       792B ± 0%      600B ± 0%       -24.24%  (p=0.000 n=20+20)
types.CommitVoteSignBytes-8                 9.52kB ± 0%    7.60kB ± 0%     -20.17%  (p=0.000 n=20+20)
protoio.MarshalDelimitedNoMarshalTo-8       808B ± 0%      440B ± 0%       -45.54%  (p=0.000 n=20+20)

name                                        old allocs/op  new allocs/op   delta
types.VoteSignBytes-8                       13.0 ± 0%      10.0 ± 0%       -23.08%  (p=0.000 n=20+20)
types.CommitVoteSignBytes-8                 140 ± 0%       110 ± 0%        -21.43%  (p=0.000 n=20+20)
protoio.MarshalDelimitedNoMarshalTo-8       10.0 ± 0%      7.0 ± 0%        -30.00%  (p=0.000 n=20+20)