Where did my disk space go?

May 11, 2012
Have you ever noticed missing disk space on newly bought hard disks? Here are some possible reasons:
  1. Unit confusion: most commonly, GB is confused with GiB, or MB with MiB.
  2. Reserved space in Linux partitions.
  3. Filesystem overheads.

Unit Confusion

Many people ask: why does my 1 TB hard disk show only about 930 GB? This is purely a confusion over the units and bases implied by prefixes like 'kilo' and 'mega'. The following table, from AskUbuntu, explains the situation.

  Binary (IEC) units                            Decimal (SI) units
  1 Byte           = (2^10)^0 = 1               1 Byte           = (10^3)^0 = 1
  1 Kibibyte (KiB) = (2^10)^1 = 1024            1 Kilobyte (KB)  = (10^3)^1 = 1000
  1 Mebibyte (MiB) = (2^10)^2 = 1048576         1 Megabyte (MB)  = (10^3)^2 = 1000000
  1 Gibibyte (GiB) = (2^10)^3 = 1073741824      1 Gigabyte (GB)  = (10^3)^3 = 1000000000
  1 Tebibyte (TiB) = (2^10)^4 = 1099511627776   1 Terabyte (TB)  = (10^3)^4 = 1000000000000

Much software, including many operating systems, calculates sizes in GiB but labels them GB when displaying them. The confusion gets worse because most hard disks are labelled in true (decimal) GB, so a size reported in GiB looks smaller than the number on the box. Online calculators can help clear this up. NIST gives an interesting historical reason for this confusing use of prefixes:

Once upon a time, computer professionals noticed that 2^10 was very nearly equal to 1000 and started using the SI prefix "kilo" to mean 1024. That worked well enough for a decade or two because everybody who talked kilobytes knew that the term implied 1024 bytes. But, almost overnight a much more numerous "everybody" bought computers, and the trade computer professionals needed to talk to physicists and engineers and even to ordinary people, most of whom know that a kilometer is 1000 meters and a kilogram is 1000 grams.
Then data storage for gigabytes, and even terabytes, became practical, and the storage devices were not constructed on binary trees, which meant that, for many practical purposes, binary arithmetic was less convenient than decimal arithmetic. The result is that today "everybody" does not "know" what a megabyte is. When discussing computer memory, most manufacturers use megabyte to mean 2^20 = 1 048 576 bytes, but the manufacturers of computer storage devices usually use the term to mean 1 000 000 bytes. Some designers of local area networks have used megabit per second to mean 1 048 576 bit/s, but all telecommunications engineers use it to mean 10^6 bit/s. And if two definitions of the megabyte are not enough, a third megabyte of 1 024 000 bytes is the megabyte used to format the familiar 90 mm (3 1/2 inch), "1.44 MB" diskette. The confusion is real, as is the potential for incompatibility in standards and in implemented systems.
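The arithmetic behind the "1 TB disk that shows about 930 GB" question, and behind the "1.44 MB" diskette in the quote, can be checked in a few lines. This is only an illustrative sketch in Python; the constant and function names are made up for this example.

```python
# Decimal (SI) and binary (IEC) unit sizes in bytes.
TB = 10**12   # terabyte, as printed on the hard disk label
GIB = 2**30   # gibibyte, what many tools compute but label "GB"
TIB = 2**40   # tebibyte
MIB = 2**20   # mebibyte


def bytes_to_gib(n_bytes):
    """Convert a byte count to gibibytes (GiB)."""
    return n_bytes / GIB


# A "1 TB" disk expressed in GiB.
one_tb_in_gib = bytes_to_gib(1 * TB)
print(f"1 TB = {one_tb_in_gib:.2f} GiB")  # 1 TB = 931.32 GiB

# Fraction that appears to be "missing" when TB is read as TiB.
shortfall = 1 - TB / TIB
print(f"apparent shortfall: {shortfall:.1%}")

# The "1.44 MB" diskette: 1440 units of 1024 bytes each.
floppy_bytes = 1440 * 1024
print(f"diskette: {floppy_bytes} bytes")
print(f"  = {floppy_bytes / 10**6:.5f} MB (decimal)")
print(f"  = {floppy_bytes / MIB:.5f} MiB (binary)")
```

Nothing is actually lost here: the same number of bytes is simply divided by a larger unit, and the gap grows with each prefix step (about 2.4% at kilo, about 9% at tera).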

Reserved space

When a Linux partition is formatted as ext2/ext3/ext4, 5% of its total space is reserved by default for the super-user (root), so that the operating system can still write to the disk even when it is full for ordinary users. This space is wasted if the drive is used solely for data storage, especially on large partitions. The details of the reserved space can be seen by running the following command (replace sda1 with your partition name) and looking for the "Reserved block count" line:

  sudo tune2fs -l /dev/sda1

The reserved space can be decreased to 1% of the partition's total space by running the following command. To remove the reserved space completely, replace 1 with 0.

  sudo tune2fs -m 1 /dev/sda1

As explained in the Ubuntu Documentation, this command does not change any existing data on the drive, so you can use it on a drive that already contains data. The reservation should be left at 5% on drives containing the /, /var or /tmp filesystems in order to avoid problems.
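To see roughly how much space is held back on a mounted filesystem without root access, one option is to compare the free blocks with the blocks available to unprivileged users. The sketch below uses Python's os.statvfs; the helper names and the choice of "/" as the mount point are assumptions for illustration. The result is an estimate and may not match the tune2fs figure exactly, for example when the filesystem is nearly full.

```python
import os


def reserved_bytes(fragment_size, free_blocks, available_blocks):
    """Bytes that are free on the filesystem but not usable by ordinary users."""
    return (free_blocks - available_blocks) * fragment_size


def reserved_report(mount_point="/"):
    """Return (total_bytes, reserved_bytes, reserved_percent) for a mount point."""
    st = os.statvfs(mount_point)
    total = st.f_blocks * st.f_frsize
    reserved = reserved_bytes(st.f_frsize, st.f_bfree, st.f_bavail)
    percent = 100 * reserved / total if total else 0.0
    return total, reserved, percent


# os.statvfs is only available on Unix-like systems.
if __name__ == "__main__" and hasattr(os, "statvfs"):
    total, reserved, percent = reserved_report("/")
    print(f"total:    {total / 10**9:.1f} GB")
    print(f"reserved: {reserved / 10**9:.1f} GB ({percent:.1f}% of total)")
```

For scale, a 5% reservation on a 1 TB data partition holds back 50 GB, which is why lowering it matters most on large drives.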

Filesystem overheads

File systems like ext2/ext3/ext4/NTFS use file tables to manage the filesystem, and all of these except ext2 also keep a journal. These structures can take up a lot of space (2-3% of drive size). Different file systems allocate this space differently. For example, ext4 allocates all of it during the format itself, so it shows up as used disk space from the start, whereas NTFS allocates it as more files are written to the disk.
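As a rough illustration of where a figure of this size can come from on ext4, the space taken by inode tables alone can be estimated from two format-time parameters. The sketch below assumes 256-byte inodes and one inode per 16384 bytes of disk, which are commonly shipped mke2fs defaults; the actual values on your system (see /etc/mke2fs.conf) may differ, and the journal and other metadata come on top of this.

```python
def inode_table_fraction(inode_size=256, bytes_per_inode=16384):
    """Fraction of the partition used by inode tables (assumed defaults)."""
    return inode_size / bytes_per_inode


def inode_table_bytes(partition_bytes, inode_size=256, bytes_per_inode=16384):
    """Approximate bytes used by inode tables on a partition of the given size."""
    return partition_bytes * inode_size // bytes_per_inode


fraction = inode_table_fraction()
overhead = inode_table_bytes(10**12)  # a 1 TB partition

print(f"inode tables: {fraction:.4%} of the partition")
print(f"on 1 TB: about {overhead / 10**9:.3f} GB")
```

Formatting with fewer inodes (a larger bytes-per-inode ratio) shrinks this overhead, at the cost of limiting how many files the filesystem can hold.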

The exact details are more complicated than this hand-wavy explanation. Many forum posts have discussed this issue.
