How to fix a "database disk image is malformed"

Problem

Sometimes a Storage Node Operator may encounter the "database disk image is malformed" error in their log. This could happen during unplanned shutdown or reboot. The error indicates that one or more of the sqlite3 databases may have become corrupted.

Solution

Firstly, we should try to verify the database with an embedded SQLite3 command. So, we need to have sqlite3 installed (v3.25.2 or later). The installation steps depend on the OS.

  1. Stop the storagenode
  2. Make a backup of all the sqlite3 databases. They are located in the storage folder for your data storage. For example x:\storagenode\storage\bandwidth.db, where x:\storagenode is the data folder you had specified in the --mount type=bind,source=x:\storagenode,destination=/app/config option of the docker run command for your storagenode, or x:\storagenode\storage in case of using the Windows GUI, in the storage.path: option of the config.yaml file.
  3. Check each database for errors. We will use bandwidth.db as an example in this guide.

We will use Docker instead of direct installation (this option is available only for x86_64 CPUs, for arm-based boards you will need to install sqlite3 via the package manager of your OS). See the next tab.

replace ${PWD} with an absolute path to the databases location, or simple switch the current location to there

docker run --rm -it --mount type=bind,source=${PWD},destination=/data sstc/sqlite3 find . -maxdepth 1 -iname "*.db" -print0 -exec sqlite3 '{}' 'PRAGMA integrity_check;' ';'
docker run --rm -it --mount type=bind,source=${PWD},destination=/data sstc/sqlite3 find . -maxdepth 1 -iname "*.db" -print0 -exec sqlite3 '{}' 'PRAGMA integrity_check;' ';'
  1. If you see errors in the output, then the check did not pass. We will unload all uncorrupted data and then load it back. But this could sometimes fail, too. If no errors occur here, you can skip all the following steps and start the storagenode again.

  2. If you were not lucky and the check failed, then please try to fix the corrupted database(s) as shown below.

  3. Open a shell

Open a shell Inside the container:

docker run --rm -it --mount type=bind,source=x:\storagenode\storage,destination=/storage sstc/sqlite3 sh
docker run --rm -it --mount type=bind,source=x:\storagenode\storage,destination=/storage sstc/sqlite3 sh

Tip. You can use tmpfs to restore your databases. It uses memory instead of disk and should take a lot less time than on HDD (you can read more about usage of tmpfs with Docker in the Use tmpfs mounts guide or this forum comment). For Windows or MacOS you must increase the allocated RAM for the docker's VM via Docker desktop application to fit a double size of the greatest corrupted database file in case of usage of tmpfs.

  1. Now run the following commands in the shell. You need to repeat steps 7 to 12 for each corrupted sqlite3 database:
cp /storage/bandwidth.db /storage/bandwidth.db.bak
sqlite3 /storage/bandwidth.db
cp /storage/bandwidth.db /storage/bandwidth.db.bak
sqlite3 /storage/bandwidth.db
  1. You will see a prompt from sqlite3. Run this SQL script:
.mode insert
.output /storage/dump_all.sql
.dump
.exit
.mode insert
.output /storage/dump_all.sql
.dump
.exit
  1. We will edit the SQL file dump_all.sql
{ echo "PRAGMA synchronous = OFF ;"; cat /storage/dump_all.sql; } | grep -v -e TRANSACTION -e ROLLBACK -e COMMIT >/storage/dump_all_notrans.sql
{ echo "PRAGMA synchronous = OFF ;"; cat /storage/dump_all.sql; } | grep -v -e TRANSACTION -e ROLLBACK -e COMMIT >/storage/dump_all_notrans.sql
  1. Remove the corrupted database (make sure that you have a backup!)
rm /storage/bandwidth.db
rm /storage/bandwidth.db
  1. Now we will load the unloaded data into the new database
sqlite3 /storage/bandwidth.db ".read /storage/dump_all_notrans.sql"
sqlite3 /storage/bandwidth.db ".read /storage/dump_all_notrans.sql"
  1. Check that the new database (bandwidth.db in our example) has a size larger than 0:
ls -l /storage/bandwidth.db
ls -l /storage/bandwidth.db
  1. Exit from the container (skip this step, if you use a directly installed sqlite3)
exit
exit
  1. If you are lucky and all corrupted sqlite3 databases are fixed, then you can start the storagenode again.

Warning. If you were not successful with the fix of the database, then your stat is lost.

You need to follow the guide How to fix database: file is not a database error.

Prevention

On Windows: disable the write cache. Consider migrating to the Windows GUI instead of using Docker.

On Unraid: update to the latest version of the platform (the bug is fixed in the 6.8.0-rc5 as seen in this comment) or rollback to version 6.6.7.

On Docker: use the updated docker run command from the documentation: Storage Node

Common Problems

Make sure that you are not using NFS or SMB to connect to the storage, they are not compatible with SQLite. The only working network protocol is iSCSI.

Make sure that your external USB drive has enough power and it does not turn off during operations. It's better to avoid using them and use only internal drives.

Previous
How to change the payment address for storagenode (v3 network)