SQL server storage principles

An international client of mine had a server hard drive failure late last Friday.

The company, albeit large, was running this smaller application and its SQL server on an external hard drive because of a previous failure the month before. The hardware was brand new, but it was workstation grade rather than server grade. Management had disallowed any new purchases until the budget was formally allocated.

Needless to say, the hardware failed, and management does not understand that it created the problem itself. Their IT department requested a letter setting out the case for their management. So here it is:

As discussed and agreed during the recovery of your SQL server, database applications are very demanding on hard drives and wear them out quickly. Hard drives are typically designed either for the workstation (or home user) market or for the server market. They usually come with warranties ranging from 1 year to 5 years, depending on the design. These warranties usually indicate that the components are of a different grade, and that their MTBF (Mean Time Between Failures, an estimate published for most electronic equipment) is significantly different.
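To make the MTBF comparison more concrete, the sketch below converts an MTBF rating into an approximate chance of a drive failing within one year. It assumes a constant failure rate (the usual exponential model) and round-the-clock operation. The two MTBF figures in the usage lines are hypothetical and are not taken from any specific drive. Manufacturers quote MTBF under a rated workload, so a drive that is pushed well past that workload by a busy database should be expected to do worse than this estimate.

```python
import math

# Assumes the drive is powered on and in use 24 hours a day, every day.
HOURS_PER_YEAR = 24 * 365  # 8760 hours


def annualized_failure_rate(mtbf_hours: float) -> float:
    """Approximate probability that one drive fails within a year.

    Assumes a constant failure rate (exponential model), so
    P(fail within t hours) = 1 - exp(-t / MTBF).
    """
    if mtbf_hours <= 0:
        raise ValueError("MTBF must be positive")
    return 1 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


if __name__ == "__main__":
    # Hypothetical ratings, for illustration only.
    for label, mtbf in [("workstation-style rating", 500_000),
                        ("server-style rating", 1_200_000)]:
        print(f"{label}: {annualized_failure_rate(mtbf):.2%} per year")
```

Roughly doubling the MTBF rating roughly halves the yearly failure probability. Across a fleet of drives, or over several years, that difference becomes significant.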

The problem with using workstation-grade hard drives in a database application that serves many users is that, over time, the busiest parts of the database will usually sit on the same contiguous physical sectors, and those sectors are likely points of failure. A typical home user might take several years to generate the number of reads and writes to a particular sector that a busy database application would cause in just one week.

The risk of data corruption is high. Corruption results in the loss of past data, and in labour hours spent re-capturing and reconciling that data. Labour hours are also lost during the recovery itself. So past work is lost, future work is created, and staff are often demoralised by such an event. There is also the risk of debtors' invoice information being lost. Untrustworthy staff also get an opportunity: when they can see that the system may have reconciling differences that will need to be written off anyway, it is the perfect time for theft.

To mitigate the risk of such failure, two principles are best adhered to. The first is adequate backups, where the best possible was already being done given the hardware constraints. The second is reducing the likelihood of ever needing those backups.

Hard drives are where the information is kept, so it is best to use server-grade hard drives, which are also faster, together with server-grade storage architectures such as RAID 5 or RAID 6 arrays. These arrays allow one (RAID 5) or two (RAID 6) hard drives to fail without losing any data and without creating any downtime. Because of their mechanical nature, hard drives are one of the most common failure points, and they are “doomed” to fail over time. Using a RAID array allows a company to mitigate that risk.
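For readers curious how RAID 5 survives a single failed drive, the sketch below shows the underlying idea: XOR parity. Each stripe stores its data blocks plus one parity block, which is the XOR of all the data blocks. If any one block goes missing, XORing the remaining blocks rebuilds it. This is a simplified illustration under stated assumptions. The function names are hypothetical, and real arrays also rotate parity across drives and work on fixed-size sectors. RAID 6 adds a second, independent parity calculation (commonly Reed-Solomon based) to tolerate two failures, and that part is not shown here.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte blocks together, byte by byte."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks in a stripe must be the same length")
    # zip(*blocks) yields one tuple of byte values per position across blocks.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks: list[bytes]) -> list[bytes]:
    """Hypothetical helper: return the data blocks followed by their parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild_missing(stripe: list[bytes], missing_index: int) -> bytes:
    """Rebuild one lost block (data or parity) from the surviving blocks."""
    survivors = [blk for i, blk in enumerate(stripe) if i != missing_index]
    return xor_blocks(survivors)


if __name__ == "__main__":
    # Three "drives" of data plus one of parity, as in a 4-drive RAID 5 stripe.
    stripe = make_stripe([b"INVO", b"ICE1", b"DEBT"])
    # Simulate losing drive 1 and rebuilding its contents from the other three.
    print(rebuild_missing(stripe, 1))
```

With single parity, a second simultaneous failure in the same stripe leaves too little information to rebuild either block. That is why RAID 6 adds a second parity block, and why a failed drive in a RAID 5 array should be replaced promptly.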