Exploiting on-chip memory concurrency in 3D manycore architectures