This is the sort of problem where artificially generated (synthetic) training samples work well. Spawn a random set of blocks, run a physics engine for a bit so they pile up, add random lights, and render to an image. Because the renderer knows which object it drew at every pixel, it can also output a ground-truth annotation that labels each pixel with its block type.
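Below is a toy sketch of that pipeline, built on heavy simplifying assumptions. It uses a 2D side-view grid instead of a 3D scene. In place of a physics engine it uses Tetris-style settling, where each block drops straight down until it rests on the floor or on the tallest column beneath it. Lighting is faked with one random light that dims pixels by horizontal distance. The block types, albedo values, and function names (`drop_blocks`, `render`, `make_sample`) are all hypothetical. The point it illustrates is that the label map comes from the same scene data the image is rendered from, so the labels are exact at no extra cost.

```python
import random

# Hypothetical block types: type id -> base brightness (albedo). 0 means background.
BLOCK_TYPES = {1: 0.9, 2: 0.6, 3: 0.3}
BACKGROUND_ALBEDO = 0.05


def drop_blocks(n_blocks, width, height, rng):
    """Pile up random rectangles under simple gravity (a stand-in for a physics engine).

    Returns a height x width grid of type ids; row 0 is the floor.
    Assumes width >= 4 so every block fits horizontally.
    """
    labels = [[0] * width for _ in range(height)]
    heightmap = [0] * width  # first free row in each column
    for _ in range(n_blocks):
        w = rng.randint(1, 4)
        h = rng.randint(1, 3)
        x = rng.randint(0, width - w)
        kind = rng.choice(list(BLOCK_TYPES))
        # The block comes to rest on the highest column under its footprint.
        base = max(heightmap[x:x + w])
        if base + h > height:
            continue  # pile too tall at this spot; skip the block
        for row in range(base, base + h):
            for col in range(x, x + w):
                labels[row][col] = kind
        for col in range(x, x + w):
            heightmap[col] = base + h
    return labels


def render(labels, rng):
    """Shade each pixel from its block type under one randomly placed, randomly bright light."""
    height, width = len(labels), len(labels[0])
    light_x = rng.uniform(0, width)
    intensity = rng.uniform(0.5, 1.5)
    image = []
    for row in range(height):
        out_row = []
        for col in range(width):
            albedo = BLOCK_TYPES.get(labels[row][col], BACKGROUND_ALBEDO)
            # Falloff in (0.5, 1]: pixels farther from the light are darker.
            falloff = 1.0 / (1.0 + abs(col - light_x) / width)
            out_row.append(min(1.0, albedo * intensity * falloff))
        image.append(out_row)
    return image


def make_sample(seed, width=24, height=16, n_blocks=12):
    """Generate one (image, per-pixel label map) training pair from a seed."""
    rng = random.Random(seed)
    labels = drop_blocks(n_blocks, width, height, rng)
    image = render(labels, rng)
    return image, labels
```

A real setup would replace `drop_blocks` with an actual physics simulation and `render` with a 3D renderer that can emit both a color image and a per-pixel object or segmentation pass. The overall structure (simulate the scene, render the image, read the labels from the same scene) stays the same.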