We recently experienced a bug in MariaDB 10.6.13 (fixed in 10.6.14) that caused our ibdata1 file to grow 5+GB/day. The file grew to over 400GB by the time alerts on disk space usage started to get sent out.
Our databases in total are about 865GB so a logical backup / restore with mysqldump would have taken days at least to do. If you have a large database you can use mydumper to do a logical dump/restore much more quickly since apparently that is still the only way to shrink the ibdata1 file.
Here are some notes / procedures I followed when I rebuilt our DB servers to get ibdata1 back down to a reasonable size:
I use
innodb_file_per_table=1already.I used this tool to determine the space usage in
ibdata1was indeed undo log pages (it took hours to run, this was before I was aware that there was a bug in 10.6.13 and didn't know for sure what was causingibdata1to grow so much).We have one primary and three replicas.
I first took one replica out of production, used
mydumperto dump/restore it, then I usedmariabackupto back up the rebuilt replica. I used that backup to restore the rest of the DB servers (which is much faster).We already have a full backup routine using
mariabackup. Before you do anything, make sure you have a known good full backup!
Rebuilding the first server
Stop all writes to the database.
I do flush tables; set global read_only = 1; to make 100% sure nothing is writing to the DB during backup (and stop replica; on replicas of course, read_only=1 is already set on replicas in our config).
Use mydumper to dump databases. I used this script which will send an email when it's done (if you've got email set up):
#!/usr/bin/bashtime mydumper -t "$(nproc)" \ --compress \ --triggers \ --routines \ --events \ --rows 10000000 \ --verbose 3 \ --build-empty-files \ --regex '^(?!(test|performance_schema|information_schema|sys))' \ --logfile /mnt/mysql_backups/ibdatafix_backup.log \ --outputdir /mnt/mysql_backups/ibdatafix_backup 2>&1echo "DUMP ON $(hostname) DONE" | mail -s "DUMP ON $(hostname) DONE" rootThis took 12 minutes to dump 865GB worth of DBs on a pretty beefy enterprise server with 4 x SSDs in hardware RAID 10. On the older server this takes less than one hour.
Grep the log for any errors.
I tried simply dropping our databases, shutting down MariaDB, and removing ibdata1 but that caused problems so I shut down MariaDB, wiped the entire data dir (rm -rf /var/lib/mysql/*), and ran mysql_install_db to start fresh (make sure you have good backups!).
Then I added this config to move the undo logs out of ibdata1. Adjust for your environment, of course, if you want to do this. At this point I think it's a good idea to do so mainly because there's no way to truncate ibdata1 and I don't want to have to rebuild all of our servers again any time soon.
# About 3GB.innodb_max_undo_log_size = 3221225472# I moved the undo logs into a separate volume. You can set this to# /var/lib/mysql.innodb_undo_directory = /var/lib/mysql_undo_logs# This appears to only work if innodb_undo_directory is set and# the undo logs are separate from ibdata1.innodb_undo_log_truncate = 1innodb_undo_tablespaces = 3Add these to the config temporarily for the restore:
innodb_flush_log_at_trx_commit=0net_read_timeout=60performance_schema=offinnodb_adaptive_hash_index=offsql_mode=NO_ENGINE_SUBSTITUTIONinnodb_doublewrite=offslow_query_log = 0Now start up MySQL / MariaDB and restore:
#!/usr/bin/bashtime myloader \ --compress-protocol \ --directory /mnt/mysql_backups/ibdatafix_backup \ --overwrite-tables \ --queries-per-transaction 1000 \ # The recommendation here is half of your available cores. I set this # too high and it lead to timeouts and a botched restore. --threads 4 \ --verbose 3 > restore.log 2>&1echo "RESTORE ON $(hostname) DONE" | mail -s "RESTORE ON $(hostname) DONE" rootThis took around 12 hours to complete.
Make sure to check restore.log for errors (grep -i error restore.log). My first restore was totally botched because of errors I didn't catch.
Remove the temp config and restart MySQL/MariaDB.
Before I did anything else, I ran pt-table-checksum to make sure the primary and all replicas had identical data (esp. the newly restored replica).
If you need to rebuild other servers (other replicas/the primary) you can use mariabackup (or Percona XtraBackup) which is much faster. Something like this (I ran this on a replica so I added all the slave options):
# On the freshly rebuilt server:rm -rf /mnt/mysql_backups/ibdatafix_backup/*mariabackup --defaults-file=/etc/mysql/my.cnf --parallel=4 \ --compress --compress-threads=4 --safe-slave-backup --slave-info \ --user=root --backup --target-dir=/mnt/mysql_backups/ibdatafix_backup# Then on the other server(s), shut down the DB, wipe /var/lib/mysql, and copy the backup back over:systemctl stop mariadb/mysqlcd /var/lib/mysqlrm -rf *# This is an NFS share mounted on all DB servers.rsync -av /mnt/mysql_backups/ibdatafix_backup .mariabackup --decompress --parallel=8 --remove-original --target-dir=.mariabackup --prepare --use-memory=32G --target-dir=.chown -R mysql:mysql .systemctl start mariadb/mysql# Set up replication etc.This all took about two hours per server to complete (865GB in databases, enterprise servers, 10gig network, 4 x SSD in RAID 10).
Undo log truncation is definitely working now. I had set innodb_max_undo_log_size to 10MB initially and MariaDB was constantly truncating them. It'll log when it does so.
ibdata1 is now holding steady on our highly active servers at just 12MB.