Overview
Berachain launched on 6 February this year after more than a year of testnets.
Its community-driven growth attracted a lot of attention,
but there are also some unique aspects of validator operations that are worth sharing.
Technically, Berachain is a modular blockchain that uses the EVM for the execution layer and Cosmos for the consensus layer.
It also has a varied DeFi ecosystem and a unique reward system called POL (Proof of Liquidity), based on a soulbound token called BGT.
We'd like to share some of the issues we encountered on our journey to becoming a Berachain mainnet validator.
Introduction
Many issues arise during the initial onboarding process, starting with the validator testnets.
To be selected as a mainnet validator, stable node operation is essential, and you must respond to issues quickly.
Berachain went through three testnet phases (aArtio, bArtio, and cArtio),
each of which had quite different issues.
(I won't discuss cArtio here because it was not that different from bArtio, the mainnet simulation.)
I'd like to share how we resolved the issues we encountered while running the nodes.
Issues and Solutions
Hardware spec
Recently, nodes of Cosmos-family chains with a DPoS-based consensus layer have been demanding quite high-performance hardware.
Berachain is no exception.
Because of the short block interval (2s) and the use of PebbleDB, it requires high-speed I/O.
For storage, NVMe SSDs are a good choice.
It requires a CPU with at least 8 cores to run multi-threaded workloads and to absorb the peaks caused by garbage collection (GC).
We recommend at least 32GB of memory because of the caching needed for state management, ABCI processing, and peer networking.
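The minimums above can be turned into a quick pre-flight check. The following is only a sketch: the function names are hypothetical, the memory reading assumes a Linux host with /proc/meminfo, and the 10% allowance is our own assumption to account for a nominal 32GB machine reporting slightly less usable memory.

```python
import os

# Thresholds taken from the recommendations in this section.
MIN_CPU_CORES = 8
MIN_MEMORY_GB = 32
# Assumption: a nominal 32GB host reports a bit less than 32 GiB,
# so allow a 10% margin before flagging memory as insufficient.
MEMORY_TOLERANCE = 0.9


def check_hardware(cpu_cores, memory_gb):
    """Return a list of shortfalls; an empty list means the host passes."""
    problems = []
    if cpu_cores < MIN_CPU_CORES:
        problems.append(f"CPU: {cpu_cores} cores, need at least {MIN_CPU_CORES}")
    if memory_gb < MIN_MEMORY_GB * MEMORY_TOLERANCE:
        problems.append(f"Memory: {memory_gb:.1f} GB, need at least {MIN_MEMORY_GB}")
    return problems


def read_memory_gb(meminfo_path="/proc/meminfo"):
    """Read total memory from a Linux meminfo file (MemTotal is reported in kB)."""
    with open(meminfo_path) as f:
        for line in f:
            if line.startswith("MemTotal:"):
                return int(line.split()[1]) / (1024 * 1024)
    raise RuntimeError("MemTotal not found in " + meminfo_path)


def check_local_host():
    """Check the machine this script runs on (Linux only)."""
    return check_hardware(os.cpu_count() or 0, read_memory_gb())
```

Calling check_local_host() on a candidate server returns an empty list when both minimums are met. Disk type and I/O speed are not covered here and still need to be verified separately.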
Region Issue
For nodes in the Cosmos chain family, the region has a significant impact on node performance.
On chains with fast block production, a node's region can give it a significant advantage or disadvantage in consensus participation.
The closer a validator node is to the main cluster of validators, the better it performs.
For Berachain, it is advantageous to operate in Europe or North America.
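One rough way to compare candidate regions is to measure how long it takes to open a TCP connection to a handful of known peers from each region. The sketch below uses TCP connect time as a proxy for network distance; it is not how the consensus layer itself measures anything, and the function names are hypothetical.

```python
import socket
import statistics
import time


def tcp_connect_ms(host, port, timeout=3.0):
    """Return the time in milliseconds to open a TCP connection, or None on failure."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.perf_counter() - start) * 1000.0
    except OSError:
        return None


def median_latency_ms(peers, samples=3, timeout=3.0):
    """Map each (host, port) pair to its median connect time, or None if unreachable."""
    results = {}
    for host, port in peers:
        times = []
        for _ in range(samples):
            elapsed = tcp_connect_ms(host, port, timeout=timeout)
            if elapsed is not None:
                times.append(elapsed)
        results[(host, port)] = statistics.median(times) if times else None
    return results
```

For example, median_latency_ms([("peer.example.org", 26656)]) could be run from a test machine in each region; the host name is a placeholder, and 26656 is only the default CometBFT P2P port, which operators may change.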
Peer Connection Issue
You should advertise your node's public IP address for better peer connectivity.
On the CL side, set external_address in config.toml.
On the EL side, set the extip option (for geth, --nat extip:<IP>) to improve connectivity with other peers.
You should maintain at least 40 peer connections.
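The 40-peer target can be checked on both layers from their RPC endpoints. This is a minimal sketch, assuming the CL exposes the standard CometBFT /net_info endpoint and the EL answers the standard Ethereum net_peerCount JSON-RPC call; the helper names and the URLs in the usage note are hypothetical.

```python
import json
import urllib.request

MIN_PEERS = 40  # target from this section


def cl_peer_count(net_info):
    """Peer count from a CometBFT /net_info response (n_peers is a decimal string)."""
    return int(net_info["result"]["n_peers"])


def el_peer_count(rpc_response):
    """Peer count from a net_peerCount JSON-RPC response (result is a hex string)."""
    return int(rpc_response["result"], 16)


def below_target(count, minimum=MIN_PEERS):
    """True when the node has fewer peers than the target."""
    return count < minimum


def fetch_json(url, payload=None, timeout=5.0):
    """GET the URL, or POST the payload as JSON when one is given."""
    data = json.dumps(payload).encode() if payload is not None else None
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With default ports, the CL count would come from cl_peer_count(fetch_json("http://localhost:26657/net_info")) and the EL count from el_peer_count(fetch_json("http://localhost:8545", {"jsonrpc": "2.0", "id": 1, "method": "net_peerCount", "params": []})). Both URLs assume the RPC interfaces are enabled locally on their default ports.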
aArtio Phase Issue
Berachain was not a modular blockchain during the aArtio testnet.
It was a fully Cosmos-based chain, which was quite unstable at the time.
In the early stages of a testnet, it's important to be prepared for anomalies.
We set up a dashboard based on Prometheus + Grafana and detected anomalies from its metrics.
Our Grafana dashboard is available here:
https://grafana.com/grafana/dashboards/20305-berachain-validator-monitoring/
In addition, in the early testnet of a Cosmos-based chain, nodes with little voting power are usually at a disadvantage in consensus and easily end up jailed.
Therefore, we recommend configuring jail detection and unjail scripts.
We also used Kiln's Cosmos validator monitoring tool to detect jailing:
https://github.com/kilnfi/cosmos-validator-watcher
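For a self-written jail detector on a Cosmos SDK chain, the core of the check can be quite small. The sketch below assumes the node's REST API is enabled and that the validator is queried through the standard Cosmos SDK staking endpoint (/cosmos/staking/v1beta1/validators/{operator_address}); the function names are hypothetical, and sending the actual unjail transaction is left out because it depends on your key setup and node binary.

```python
def jail_status(response):
    """Summarise a Cosmos SDK staking 'validator' query response.

    The 'jailed' field may be omitted when it is false, so default to False.
    """
    validator = response["validator"]
    return {
        "jailed": bool(validator.get("jailed", False)),
        "bonded": validator.get("status") == "BOND_STATUS_BONDED",
    }


def needs_unjail(response):
    """True when the validator is currently jailed and an unjail should be considered."""
    return jail_status(response)["jailed"]


def alert_message(moniker, response):
    """Build a short alert line, or return None when nothing is wrong."""
    status = jail_status(response)
    if status["jailed"]:
        return f"{moniker}: validator is JAILED"
    if not status["bonded"]:
        return f"{moniker}: validator is not bonded"
    return None
```

A cron job or monitoring loop could poll the endpoint, pass the decoded JSON to alert_message, and forward any non-empty result to your alerting channel before deciding whether to unjail.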
After bArtio Phase Issue
From bArtio onward, Berachain changed the node architecture by creating BeaconKit (https://github.com/berachain/beacon-kit), with the EVM as the execution layer and Cosmos Tendermint as the consensus layer.
While BeaconKit supports multiple Ethereum execution clients, we used geth for stability.
Recently, Reth has been widely used because it has better transaction processing performance, but when we ran the two clients on Berachain, there was no significant difference in transaction processing performance. With Reth, we experienced peer connection issues when participating as a genesis validator.
Also, geth takes up less storage, about half as much (based on snap sync).
However, with geth we recommend increasing the rpc-timeout value to 2s in app.toml, because sync issues can occur on the CL: the EL occasionally cannot keep up with the speed at which the CL creates blocks.
And if you plan to expose RPC to the public, we recommend Reth, which is better at handling large TXs.
For the geth monitoring Grafana dashboard, we used this: https://grafana.com/grafana/dashboards/15750-geth-server/
(Don't forget the --metrics option when running the geth client.)
Another useful tool for participating in genesis on Cosmos-family blockchains is tmtop (https://github.com/QuokkaStake/tmtop), which allows you to efficiently monitor the consensus layer in real time.
Early in your genesis participation, it is very important to check whether your node is failing to join consensus, so if you notice any issues with the state of your prevotes and precommits, you should check your node for problems.
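If you want to script this check rather than watch it in tmtop, the consensus RPC exposes vote bit arrays as strings such as "BA{4:xx_x} 30/40 = 0.75", where each x marks a validator whose vote has been received. The sketch below parses that string; it assumes the standard CometBFT /consensus_state response (the prevotes_bit_array and precommits_bit_array fields) and that you know your validator's index in the validator set. The function names are hypothetical.

```python
import re

# Matches the bit-array part, e.g. "BA{4:xx_x}"; whitespace inside is tolerated.
_BIT_ARRAY = re.compile(r"BA\{(\d+):([x_\s]*)\}")


def parse_bit_array(text):
    """Return a list of booleans, one per validator, True where a vote was seen."""
    match = _BIT_ARRAY.search(text)
    if match is None:
        raise ValueError(f"no bit array found in {text!r}")
    size = int(match.group(1))
    bits = "".join(match.group(2).split())
    if len(bits) != size:
        raise ValueError(f"expected {size} bits, got {len(bits)}")
    return [char == "x" for char in bits]


def has_voted(text, validator_index):
    """True if the validator at this index appears in the bit array."""
    return parse_bit_array(text)[validator_index]


def participation(text):
    """Fraction of validators (by count, not voting power) that have voted."""
    bits = parse_bit_array(text)
    return sum(bits) / len(bits) if bits else 0.0
```

Note that participation counts validators, while the ratio printed at the end of the string is weighted by voting power, so the two numbers can differ. If has_voted keeps returning False for your index across several rounds, that is the signal to inspect the node.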
Other Notes: POL Ecosystem
POL is a reward system that is unique to Berachain.
How a Validator chooses to structure their rewards has a significant impact on their incentives.
Understanding the POL system is crucial for a validator's operational revenue and for collaboration with other DeFi companies.
For more information, you can check out this article: https://docs.berachain.com/learn/pol/#pol-lifecycle
This research article https://research.despread.io/report-berachain-eco/ will also help you understand the POL ecosystem.
For long-term validator operation, the keys will be gathering BGT, gathering governance power, and deciding how to structure rewards.
It's also important to remember that BGT has nothing to do with a validator's block voting power, and that to increase voting power, you need to increase the amount of BERA staked.
Conclusion
While operating as a Berachain validator, we ran a Cosmos + EVM modular node and experienced various issues that can arise with both Cosmos and the EVM. I hope that our experience can be a good reference for running a validator node with a similar architecture.