LAMMPS on Xeon Phi: application hangs and neighbour overflow

I have compiled the latest lammps-unstable version from GitHub with the
Intel 16.0 compilers on a Xeon Phi KNC card with the USER-INTEL package.
I have 2 issues:

1. When I run the in.intel.rhodo input file provided in the
src/USER-INTEL/TEST folder with x y z set to 6 6 5 (replication of the
data file in the x, y and z dimensions), LAMMPS gets stuck if I use a
large number of MPI ranks. For example:

20 nodes, 240 MPI ranks, 20 KNC cards (12-core processor + 1 card per
node) - all good

30 nodes, 360 MPI ranks, 30 KNC cards - stuck at:
LAMMPS (12 Oct 2016)
  using 1 OpenMP thread(s) per MPI task
Intel Package: Affinitizing MPI Tasks to 2 Cores Each

30 nodes, 180 MPI ranks, 30 KNC cards - all good

40 nodes, 480 MPI ranks, 40 KNC cards - stuck at:
LAMMPS (12 Oct 2016)
  using 1 OpenMP thread(s) per MPI task
Intel Package: Affinitizing MPI Tasks to 2 Cores Each

40 nodes, 240 MPI ranks, 40 KNC cards - usually hangs, but sometimes it
works

I don't know what is going on. A similar issue with GPU cards was
attributed to a lack of RAM in previous mails, but if that is the issue
here, how come reducing the number of MPI ranks helps?
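
To make the pattern in the runs above easier to see, here is a small
Python sketch that works out how many MPI ranks share each KNC card in
every configuration. It is not part of LAMMPS; the names RUNS,
ranks_per_card and summarize are made up for illustration, and it
assumes the ranks are spread evenly over the nodes with one card per
node, as described above.

```python
# Runs reported in this post: (nodes, MPI ranks, observed outcome).
RUNS = [
    (20, 240, "all good"),
    (30, 360, "hangs"),
    (30, 180, "all good"),
    (40, 480, "hangs"),
    (40, 240, "usually hangs"),
]


def ranks_per_card(nodes, ranks, cards_per_node=1):
    """Return how many MPI ranks share one coprocessor card.

    Assumes ranks are distributed evenly across nodes (an assumption,
    not something read from the LAMMPS output).
    """
    return ranks // (nodes * cards_per_node)


def summarize(runs):
    """Build one (nodes, ranks, ranks_per_card, outcome) row per run."""
    return [(n, r, ranks_per_card(n, r), outcome) for n, r, outcome in runs]


if __name__ == "__main__":
    for nodes, ranks, per_card, outcome in summarize(RUNS):
        print(f"{nodes:>3} nodes  {ranks:>3} ranks  "
              f"{per_card:>2} ranks/card  {outcome}")
```

By this count, 12 ranks per card works at 20 nodes but hangs at 30 and
40 nodes, while 6 ranks per card works at 30 nodes and mostly hangs at
40. So the hang appears to depend on the total node count as well as on
the number of ranks per card.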