Made several minor improvements to the logging routines, especially for MPI and CUDA.
Added a printing routine for CUDA device information.