1) For hardware, you want cheap, expendable bare-metal machines.
Look up posts about how Google built their own servers
for reference.
2) For RAID, go with software-only RAID. You will sidestep
problems caused by hardware RAID controllers each using their own
on-disk data format (i.e. disks are not swappable between different makes/models).
3) For the filesystem, look at OpenAFS. CERN is using OpenAFS
to store petabytes of data from the LHC.
4) For the operating system, look at Debian. Coupled with FAI (Fully
Automatic Installation), it will let you deploy multiple servers
to host your files in an automated way.
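
To make point 2 a bit more concrete: on Linux, software RAID is typically handled by the md driver and the mdadm tool. The answer above doesn't name a tool, so treat that choice as an assumption. Below is a minimal sketch in Python that only builds and prints the mdadm command for a new array; it does not run it, since creating an array wipes the member disks. The helper name build_mdadm_create and the device names /dev/md0, /dev/sdb and /dev/sdc are placeholders for illustration.

```python
import shlex


def build_mdadm_create(array, level, devices):
    """Return the argv list that would create a software RAID array with mdadm.

    array   -- md device to create, e.g. "/dev/md0" (placeholder)
    level   -- RAID level, e.g. 1 for mirroring
    devices -- member block devices (placeholders, adjust to your system)
    """
    if len(devices) < 2:
        raise ValueError("need at least two member devices")
    return [
        "mdadm", "--create", array,
        f"--level={level}",
        f"--raid-devices={len(devices)}",
        *devices,
    ]


if __name__ == "__main__":
    # Print the command for review instead of executing it.
    cmd = build_mdadm_create("/dev/md0", 1, ["/dev/sdb", "/dev/sdc"])
    print(shlex.join(cmd))
```

Because the array metadata lives on the disks themselves, the same disks can be moved to another Linux box and assembled there, which is the portability benefit point 2 is about.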