If you have decided to build your own platform (see the article "Dangers of the Clouds"), then you should start by analysing all the data storage options, as that is the most important part.
In simplistic terms the problem looks trivial: you add HDDs as the load grows and you replicate everything.
But how do you measure the stability and efficiency of such a system, and how do you predict the cost of scaling it?
It turns out that HDDs were never meant for long-term data storage. Engineers from the service centers of Seagate, IBM and other vendors, who deal with data recovery, know this very well.
Simple probability dictates that the chance of seeing no failures at all approaches zero as the number of HDDs grows: if each drive independently fails within a given period with probability p, the probability of zero failures among n drives is (1 - p)^n, which shrinks toward zero as n grows. Also take a look at the statistics in Google's paper "Failure Trends in a Large Disk Drive Population": after the first scan error, drives are 39 times more likely to fail within 60 days than drives without scan errors, and after the first reallocation, drives are over 14 times more likely to fail within 60 days than drives without reallocation counts.
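To make this concrete, here is a minimal sketch of that calculation (the 3% annual failure rate per drive is an illustrative assumption, not a measured figure):

# Probability of at least one drive failure in a fleet of n drives,
# assuming independent failures at a fixed annual failure rate (AFR).
AFR = 0.03  # assumed annual failure rate per drive (illustrative)

def p_any_failure(n, afr=AFR):
    # P(at least one of n drives fails within a year) = 1 - (1 - afr)^n
    return 1.0 - (1.0 - afr) ** n

for n in (1, 10, 100, 1000):
    print("%5d drives -> %.1f%% chance of at least one failure" % (n, 100 * p_any_failure(n)))

Under these assumptions, with just 100 drives the chance of at least one failure within a year is already around 95%, so drive replacement has to be treated as routine operations, not as an incident.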
CPU
According to Amdahl's law, the theoretical maximum speedup of a system using multiple processors is calculated with this formula:

S = 1 / ((1 - P) + P / N)

where N is the number of processors, P is the fraction of the workload that can be parallelized, and S is the resulting speedup.
So, with P = 0.5 and 4 CPUs we have:
1 / (0.5 + ((1 - 0.5) / 4)) = 1.6
and with 8 CPUs: 1 / (0.5 + ((1 - 0.5) / 8)) ≈ 1.78
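A few lines of Python make the diminishing returns visible (the parallel fraction P = 0.5 mirrors the example above and is an assumption; real workloads vary):

# Amdahl's law: speedup S = 1 / ((1 - P) + P / N)
# P: fraction of the workload that can run in parallel (assumed 0.5)
# N: number of processors
def amdahl_speedup(p, n):
    return 1.0 / ((1.0 - p) + p / n)

for n in (1, 2, 4, 8, 16, 1000):
    print("%4d CPUs -> speedup %.2fx" % (n, amdahl_speedup(0.5, n)))

Note that even with 1000 processors the speedup never reaches 1 / (1 - P) = 2x for P = 0.5: scaling a single box up runs out of road very quickly.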
That should give you an understanding of why the number of servers grows faster than linearly with load: each processor you add to a server yields a smaller performance improvement than the previous one, so past a certain point you have to scale out with more machines rather than scale a single machine up.
Power Consumption
A CPU consumes approximately 50 watts.
DDR3 memory consumes about 8 watts per DIMM.
An IDE PATA drive draws about 7 watts.
For a modest server with two DIMMs and two drives: 50 + 8*2 + 7*2 = 80 watts.
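To see what that means for the bill, here is a rough sketch (the fleet size of 100 servers and the $0.10/kWh electricity price are illustrative assumptions, and the estimate ignores cooling and power-supply losses):

# Rough electricity cost estimate for a fleet of servers.
# All inputs are illustrative assumptions; substitute your own.
WATTS_PER_SERVER = 80   # from the 50 + 8*2 + 7*2 estimate above
SERVERS = 100           # assumed fleet size
PRICE_PER_KWH = 0.10    # assumed electricity price, USD
HOURS_PER_MONTH = 24 * 30

kwh_per_month = WATTS_PER_SERVER * SERVERS * HOURS_PER_MONTH / 1000.0
print("%.0f kWh/month -> $%.2f/month" % (kwh_per_month, kwh_per_month * PRICE_PER_KWH))

Under these assumptions the fleet draws 5760 kWh and costs about $576 a month in electricity alone, which is exactly the kind of number you need before committing to your own platform.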
Conclusion
Building your own platform is hard, but if you get the technical part right, you will see the benefits within the first few months.