Return to Section accelerate overview
The GPU package was developed by Mike Brown at ORNL and his collaborators, particularly Trung Nguyen (ORNL). It provides GPU versions of many pair styles, including the 3-body Stillinger-Weber pair style, and for kspace_style pppm for long-range Coulombics. It has the following general features:
Here is a quick overview of how to use the GPU package:
The latter two steps can be done using the "-pk gpu" and "-sf gpu" command-line switches respectively. Or the effect of the "-pk" or "-sf" switches can be duplicated by adding the package gpu or suffix gpu commands respectively to your input script.
Required hardware/software:
To use this package, you currently need to have an NVIDIA GPU and install the NVIDIA Cuda software on your system:
Building LAMMPS with the GPU package:
This requires two steps (a,b): build the GPU library, then build LAMMPS with the GPU package.
You can do both these steps in one line, using the src/Make.py script, described in Section 2.4 of the manual. Type "Make.py -h" for help. If run from the src directory, this command will create src/lmp_gpu using src/MAKE/Makefile.mpi as the starting Makefile.machine:
Make.py -p gpu -gpu mode=single arch=31 -o gpu lib-gpu file mpi
Or you can follow these two (a,b) steps:
(a) Build the GPU library
The GPU library is in lammps/lib/gpu. Select a Makefile.machine (in lib/gpu) appropriate for your system. You should pay special attention to 3 settings in this makefile.
See lib/gpu/Makefile.linux.double for examples of the ARCH settings for different GPU choices, e.g. Fermi vs Kepler. It also lists the possible precision settings:
CUDA_PREC = -D_SINGLE_SINGLE # single precision for all calculations CUDA_PREC = -D_DOUBLE_DOUBLE # double precision for all calculations CUDA_PREC = -D_SINGLE_DOUBLE # accumulation of forces, etc, in double
The last setting is the mixed mode referred to above. Note that your GPU must support double precision to use either the 2nd or 3rd of these settings.
To build the library, type:
make -f Makefile.machine
If successful, it will produce the files libgpu.a and Makefile.lammps.
The latter file has 3 settings that need to be appropriate for the paths and settings for the CUDA system software on your machine. Makefile.lammps is a copy of the file specified by the EXTRAMAKE setting in Makefile.machine. You can change EXTRAMAKE or create your own Makefile.lammps.machine if needed.
Note that to change the precision of the GPU library, you need to re-build the entire library. Do a "clean" first, e.g. "make -f Makefile.linux clean", followed by the make command above.
(b) Build LAMMPS with the GPU package
cd lammps/src make yes-gpu make machine
No additional compile/link flags are needed in Makefile.machine.
Note that if you change the GPU library precision (discussed above) and rebuild the GPU library, then you also need to re-install the GPU package and re-build LAMMPS, so that all affected files are re-compiled and linked to the new GPU library.
Run with the GPU package from the command line:
The mpirun or mpiexec command sets the total number of MPI tasks used by LAMMPS (one or multiple per compute node) and the number of MPI tasks used per node. E.g. the mpirun command in MPICH does this via its -np and -ppn switches. Ditto for OpenMPI via -np and -npernode.
When using the GPU package, you cannot assign more than one GPU to a single MPI task. However multiple MPI tasks can share the same GPU, and in many cases it will be more efficient to run this way. Likewise it may be more efficient to use less MPI tasks/node than the available # of CPU cores. Assignment of multiple MPI tasks to a GPU will happen automatically if you create more MPI tasks/node than there are GPUs/mode. E.g. with 8 MPI tasks/node and 2 GPUs, each GPU will be shared by 4 MPI tasks.
Use the "-sf gpu" command-line switch, which will automatically append "gpu" to styles that support it. Use the "-pk gpu Ng" command-line switch to set Ng = # of GPUs/node to use.
lmp_machine -sf gpu -pk gpu 1 -in in.script # 1 MPI task uses 1 GPU mpirun -np 12 lmp_machine -sf gpu -pk gpu 2 -in in.script # 12 MPI tasks share 2 GPUs on a single 16-core (or whatever) node mpirun -np 48 -ppn 12 lmp_machine -sf gpu -pk gpu 2 -in in.script # ditto on 4 16-core nodes
Note that if the "-sf gpu" switch is used, it also issues a default package gpu 1 command, which sets the number of GPUs/node to 1.
Using the "-pk" switch explicitly allows for setting of the number of GPUs/node to use and additional options. Its syntax is the same as same as the "package gpu" command. See the package command doc page for details, including the default values used for all its options if it is not specified.
Note that the default for the package gpu command is to set the Newton flag to "off" pairwise interactions. It does not affect the setting for bonded interactions (LAMMPS default is "on"). The "off" setting for pairwise interaction is currently required for GPU package pair styles.
Or run with the GPU package by editing an input script:
The discussion above for the mpirun/mpiexec command, MPI tasks/node, and use of multiple MPI tasks/GPU is the same.
Use the suffix gpu command, or you can explicitly add an "gpu" suffix to individual styles in your input script, e.g.
pair_style lj/cut/gpu 2.5
You must also use the package gpu command to enable the GPU package, unless the "-sf gpu" or "-pk gpu" command-line switches were used. It specifies the number of GPUs/node to use, as well as other options.
Speed-ups to expect:
The performance of a GPU versus a multi-core CPU is a function of your hardware, which pair style is used, the number of atoms/GPU, and the precision used on the GPU (double, single, mixed).
See the Benchmark page of the LAMMPS web site for performance of the GPU package on various hardware, including the Titan HPC platform at ORNL.
You should also experiment with how many MPI tasks per GPU to use to give the best performance for your problem and machine. This is also a function of the problem size and the pair style being using. Likewise, you should experiment with the precision setting for the GPU library to see if single or mixed precision will give accurate results, since they will typically be faster.
Guidelines for best performance:
Restrictions:
None.