A common cause of website and web service failure is running out of disk space. Either the webserver is no longer able to write to its log files and fails completely (resulting in an error page or no connection at all), or the database server may return an error. MySQL, for example, returns the fairly cryptic: Got error 28 from storage engine. If you are using availability monitoring, it will start alerting you because your page is no longer showing up properly.
That error essentially means the database server can't write new data or temporary files (which are often needed for complex queries). Because the error does not occur on every SQL query, it may only become visible on certain pages or actions.
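If you want to confirm what error 28 stands for, MySQL ships with the perror utility, which translates such error codes into their operating-system meaning:

perror 28

It will report that code 28 is the operating system's "No space left on device" error, confirming the disk-space diagnosis.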
Worst case, the inability to write data may lead to database corruption, requiring a repair after you have freed some space. This can be a problem if you have large MySQL MyISAM tables, because repairing those requires additional free space roughly equal to the size of your largest table: exactly the space you just ran out of, causing the automatic repair to fail. How to repair MySQL tables will be the subject of another post.
Once you are aware of the issue, it's important to find out which partition is full and what is taking up all that space. I'm assuming that you are on a Linux-based server.
The first thing is to find out which partition is in trouble, by running:
df -h
This should show you something like the following (note the 100% on rootfs):
Filesystem      Size  Used Avail Use% Mounted on
rootfs           20G   20G    0M 100% /
none            6.3G  344K  6.3G   1% /run
none            5.0M     0  5.0M   0% /run/lock
none             32G  176K   32G   1% /run/shm
none            100M     0  100M   0% /run/user
/dev/md2         92G  829M   87G   1% /data
/dev/md3        1.8T  155G  1.5T  10% /home
To figure out which particular files are big, you could run du -hs * in the root directory ( / ) and then descend into the biggest directory and refine (a sorted variant of this approach is sketched right after the list below). However, on a production server this will often take too long and should only be used as a last resort. Most often, running the following checks will already tell you what is going on:
du -hs /tmp => if this is particularly big, you are probably not cleaning up temporary files properly after use, or you are writing logs there that are never rotated
du -hs /var/log => if this is particularly big, you may be keeping log files forever or some log has gone haywire; you could consider shipping logs to an external server for long-term archival
du -hs /var/lib/mysql => this is your MySQL database; usually you can't do much about its size, except move it to a different server or partition
du -hs /var/* => if none of the above stand out, the culprit is often still somewhere else under /var
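If these quick checks don't point at the culprit, the descend-and-refine approach mentioned above becomes a lot less painful when you sort the output. A rough sketch, assuming GNU du and sort (the -h flag of sort needs a reasonably recent coreutils):

du -hx --max-depth=1 / 2>/dev/null | sort -h | tail -n 10 # the ten biggest entries directly under /
du -hx --max-depth=1 /var 2>/dev/null | sort -h | tail -n 10 # then descend into the biggest directory and repeat

The -x flag keeps du on the partition you are investigating (so it doesn't wander into other mounts or pseudo-filesystems like /proc), and 2>/dev/null hides permission errors.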
What often happens is that the server comes with a default disk layout that has a small root partition and sometimes also a small partition just for /tmp. Default installs usually put their database files on the root partition, so if your database is particularly big, you may quickly fill up the root partition with database and log files while there is sufficient space somewhere else. In that case you will need to move your database and then use a symlink (ln -s), a bind mount (mount --bind) or a configuration change (the datadir setting in my.cnf) to point MySQL to the new location.
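A minimal sketch of the symlink approach, assuming (as in the df output above) that /home is the partition with room to spare and that your system manages MySQL through service:

service mysql stop # never move the data files while the server is running
mv /var/lib/mysql /home/mysql # example target path on the larger partition
ln -s /home/mysql /var/lib/mysql # or: mount --bind /home/mysql /var/lib/mysql
service mysql start

On some distributions you will also need to tell AppArmor or SELinux about the new path before MySQL is allowed to use it.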
Running out of space happens to everyone at some point; however, to avoid a lot of stress, you should make sure you know about it in advance. This will prevent errors in your service and potential data corruption. To do so, you can use Observu server monitoring, which lets you set alerts at specific levels of disk usage. The notification will include information about which partition is almost full and needs your attention. You can then resolve the issue (without stress) using the hints provided above.
A special recommendation for those still using MySQL MyISAM tables: make sure you leave enough space for repair/recovery. If a single large table makes up most of your data, this means setting warning levels as early as 60 or 70% of disk usage.
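To get a feel for how much headroom a repair would need, you can check the largest MyISAM data files directly; this assumes the default data directory /var/lib/mysql, where each table's data lives in a .MYD file:

ls -lhS /var/lib/mysql/*/*.MYD | head -n 5 # the five largest table data files

Compare the size of the top entry with the free space reported by df -h to decide how early your warning threshold should be.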