Subtensor Docker crashes on initial sync-up / import (78%) on systems with 8GB RAM no swap #1221

Open
zhedgehog opened this issue Jan 31, 2025 · 1 comment
Labels
bug Something isn't working

Comments

@zhedgehog

Describe the bug

Memory consumption grows very large during the initial sync and state import, causing the Docker container to crash.

I have confirmed this on two separate systems with the same configuration:

4 vCPU
8GB RAM
160GB SSD
no swap (0kb)
Debian 12
Docker
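
To check that a host matches this configuration before reproducing, a minimal sketch along these lines reads the RAM and swap totals from `/proc/meminfo`. The helper name `mem_summary` is hypothetical and not part of the original report.

```shell
#!/usr/bin/env bash
# Print total RAM and total swap in MiB from a meminfo-style file.
# mem_summary is a hypothetical helper name; /proc/meminfo is the Linux default.
mem_summary() {
    local meminfo="${1:-/proc/meminfo}"
    LC_ALL=C awk '
        /^MemTotal:/  { ram  = $2 }   # values are reported in kB
        /^SwapTotal:/ { swap = $2 }
        END { printf "ram_mib=%d swap_mib=%d\n", ram / 1024, swap / 1024 }
    ' "$meminfo"
}

# Only run against the live host when the file exists (Linux).
if [ -r /proc/meminfo ]; then
    mem_summary
fi
```

A host matching the report should show `swap_mib=0` and a `ram_mib` value slightly below 8192, since the kernel reserves part of the 8GB.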

To Reproduce

  1. Create a VM (Debian 12) with 4 vCPU, 8GB RAM, 160GB storage, and no swap (0kb)
  2. Install all updates (apt update && apt dist-upgrade -y)
  3. Set up Docker: https://docs.docker.com/engine/install/debian/
  4. Clone the repo and start a mainnet lite node (script used below)

```bash
# cleanup old subtensor dockers
for d in $(docker container ls -a | grep -i subtensor | cut -d ' ' -f1); do
    echo "[ cleaning up container: $d ]"
    docker container rm "$d" -f
    echo ""
done

for i in $(docker images | grep -i 'subtensor' | cut -d ' ' -f1); do
    echo "[ cleaning up image: $i ]"
    docker rmi "$i"
    echo ""
done

for v in $(docker volume ls | cut -d ' ' -f6 | grep -i 'subtensor'); do
    echo "[ cleaning up volume: $v ]"
    docker volume rm "$v"
    echo ""
done

for n in $(docker network ls | grep -i 'subtensor' | cut -d ' ' -f1); do
    echo "[ cleaning up network: $n ]"
    docker network rm "$n"
    echo ""
done

# cleanup repo
echo "[ cleaning up repository... ]"
cd ~/apps
rm -rf ~/apps/subtensor
echo

# reclone
echo "[ cloning repository... ]"
git clone https://github.com/opentensor/subtensor.git
cd subtensor

# change port mapping
echo "Map 11144 to 9944 for Docker container"
sed -i 's/9944:9944/11144:9944/g' ./docker-compose.yml
sed -i 's/--sync warp/--sync warp --rpc-max-connections 2000/g' ./docker-compose.yml
echo

# to run a lite node on the mainnet:
echo "[ setting up a lite node on the mainnet... ]"
./scripts/run/subtensor.sh -e docker --network mainnet --node-type lite

# set restart policy to always
for d in $(docker ps -a | grep -i 'subtensor' | cut -d ' ' -f1); do
    echo "[ updating container $d restart=always ]"
    docker update --restart=always "$d"
    echo ""
done

echo "[ DONE ]"
echo
```

Expected behavior

The sync should complete successfully and the node should be up and running.

Screenshots

Actual behaviour:

Docker container crashes during import due to OOM:

Last entry in docker log:

2025-01-30 21:42:09 ⚙️ State sync, Importing state, 78%, 408.71 Mib (218 peers), best: #0 (0x2f05…6c03), finalized #0 (0x2f05…6c03), ⬇ 114.0kiB/s ⬆ 4.9kiB/s

dmesg:

[Thu Jan 30 21:46:50 2025] Out of memory: Killed process 3226 (node-subtensor) total-vm:820705016kB, anon-rss:7564512kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:15228kB oom_score_adj:0
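
To put the `anon-rss` figure in perspective, the following sketch converts it from kB to GiB. The helper name `oom_rss_gib` is hypothetical; the sample line is the dmesg entry quoted above.

```shell
#!/usr/bin/env bash
# Convert the anon-rss figure from a kernel OOM-kill line (kB) to GiB.
# oom_rss_gib is a hypothetical helper name used only for this sketch.
oom_rss_gib() {
    # Reads OOM lines on stdin; prints one value per matching line.
    sed -n 's/.*anon-rss:\([0-9][0-9]*\)kB.*/\1/p' |
        LC_ALL=C awk '{ printf "%.2f GiB\n", $1 / 1048576 }'
}

line='[Thu Jan 30 21:46:50 2025] Out of memory: Killed process 3226 (node-subtensor) total-vm:820705016kB, anon-rss:7564512kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:15228kB oom_score_adj:0'
printf '%s\n' "$line" | oom_rss_gib
```

For the line above this prints `7.21 GiB`, so the node process held nearly all of the 8GB of RAM when it was killed. On a live host the same filter can be fed from `dmesg -T | grep -i 'out of memory'`.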

Environment

Debian 12 (x64/AMD64) // Latest subtensor - 1.2.4 (also tested 1.2.3)

Additional context

No response

@zhedgehog zhedgehog added the bug Something isn't working label Jan 31, 2025
@ales-otf
Contributor

This is not a Docker configuration issue. During node sync the whole state is held in memory, which is behavior on the Substrate side. There is an open issue to change this: paritytech/polkadot-sdk#4

Similar issue: paritytech/polkadot-sdk#5053

And the cause of it (starting from this comment): paritytech/polkadot-sdk#5053 (comment)
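
Since the state is held in memory during sync, one possible stopgap on an 8GB host is to give the kernel a swap file to fall back on. The sketch below is an assumption and not a fix from the linked issues: it has not been verified to get the import past 78%, and the path `/swapfile`, the size `8G`, and the helper name `make_swap` are illustrative. The commands need root, so the sketch only prints them unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
# Sketch: create and enable a swap file. Path and size are assumptions,
# and this has not been verified to get the sync past the import stage.
# With DRY_RUN=1 (the default here) the commands are only printed.
make_swap() {
    local path="${1:-/swapfile}" size="${2:-8G}"
    local run=""
    [ "${DRY_RUN:-1}" = "1" ] && run="echo"
    $run fallocate -l "$size" "$path"   # reserve the space
    $run chmod 600 "$path"              # restrict access to root
    $run mkswap "$path"                 # write the swap signature
    $run swapon "$path"                 # enable it until the next reboot
}

make_swap /swapfile 8G
```

Running it as root with `DRY_RUN=0` would apply the changes. Swap enabled this way does not survive a reboot unless it is also added to `/etc/fstab`.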
