Dear Rams,
From your input/output files, it doesn't seem you are doing anything particularly wrong. It is more that the system is perhaps a bit more challenging. Here are a few ideas that might be of use regarding speed up and convergence:
i) You are having quite large lattice parameters (15 Angstrom in each direction). In this case having a 4x4x4 mesh of k-points is perhaps even more than needed to converge the total energy, so you can try reducing it to a 3x3x3 or 2x2x2 mesh to reduce the computational cost and get some speed up (without compromising quality). You could probably go to 1 k-point just as well, but then you might not get a correct Fermi level sampled. Which brings me to the second point...
ii) Do you expect your system to be metallic? If so, you can test a few values of the smearing parameter to see whether it makes a difference to the convergence behaviour. If you know you need to get a semiconductor, then you might want to try shifting the occupied states down in energy (keyword LEVSHIFT to prevent the system from getting stuck in a conducting state and causing the SCF to get stuck...
iii) You have the DIISALLK keyword in your input file, have you tested that as necessary for the convergence of the undoped cell? It does increase the memory/storage requirements, so could be worth trying with just the default (gamma-point) DIIS...
iv) One alternative is to turn off the DIIS convergence accelerator and revert to "classical" Fock mixing. Albeit slower in convergence than DIIS, it can stabilize the SCF procedure avoiding strong, sudden oscillation in the solution (while DIIS is turned on, the FMIXING is used only in the very first SCF step)...
v) Coming back to the charge oscillations you observed, they do indeed start appearing very quickly in your second run.
Here are the corresponding total energies from your OUTPUT_Run1.d12:
CYC 0 ETOT(AU) -1.128999989618E+05 DETOT -1.13E+05 tst 0.00E+00 PX 1.00E+00
CYC 1 ETOT(AU) -1.129553711002E+05 DETOT -5.54E+01 tst 0.00E+00 PX 1.00E+00
CYC 2 ETOT(AU) -1.129699205883E+05 DETOT -1.45E+01 tst 7.60E-05 PX 9.78E-02
CYC 3 ETOT(AU) -1.123277877060E+05 DETOT 6.42E+02 tst 4.05E+00 PX 8.33E+00
CYC 4 ETOT(AU) -1.123934069238E+05 DETOT -6.56E+01 tst 1.68E-01 PX 6.66E+00
CYC 5 ETOT(AU) -1.128505608381E+05 DETOT -4.57E+02 tst 2.53E-01 PX 1.17E+00
CYC 6 ETOT(AU) -1.126570396187E+05 DETOT 1.94E+02 tst 1.41E-01 PX 1.13E+00
CYC 7 ETOT(AU) -1.129397986733E+05 DETOT -2.83E+02 tst 1.36E-01 PX 1.16E+00
CYC 8 ETOT(AU) -1.128184205493E+05 DETOT 1.21E+02 tst 9.81E-02 PX 6.95E-01
CYC 9 ETOT(AU) -1.129501102455E+05 DETOT -1.32E+02 tst 9.49E-02 PX 6.74E-01
CYC 10 ETOT(AU) -1.129223250516E+05 DETOT 2.78E+01 tst 5.80E-01 PX 6.62E-01
And then from OUTPUT_Run2.d12
CYC 0 ETOT(AU) -1.129223250516E+05 DETOT -1.13E+05 tst 0.00E+00 PX 1.00E+00
CYC 1 ETOT(AU) -8.145313982597E+04 DETOT 3.15E+04 tst 0.00E+00 PX 1.71E+02
CYC 2 ETOT(AU) -1.058417907360E+05 DETOT -2.44E+04 tst 2.27E+00 PX 1.71E+02
CYC 3 ETOT(AU) -7.201760490303E+04 DETOT 3.38E+04 tst 9.37E+00 PX 3.62E+02
CYC 4 ETOT(AU) -3.830339502397E+04 DETOT 3.37E+04 tst 1.36E+02 PX 7.83E+02
CYC 5 ETOT(AU) -5.612117026766E+04 DETOT -1.78E+04 tst 2.27E+01 PX 5.54E+02
CYC 6 ETOT(AU) -1.006357459536E+05 DETOT -4.45E+04 tst 3.23E+01 PX 5.67E+02
CYC 7 ETOT(AU) -1.131945182713E+05 DETOT -1.26E+04 tst 9.68E+00 PX 2.13E+02
CYC 8 ETOT(AU) -1.203304661632E+05 DETOT -7.14E+03 tst 8.51E+00 PX 1.06E+02
CYC 9 ETOT(AU) -1.148704842219E+05 DETOT 5.46E+03 tst 1.77E+01 PX 6.53E+01
CYC 10 ETOT(AU) -1.235795679474E+05 DETOT -8.71E+03 tst 2.63E+00 PX 4.40E+01
In the first run the total energy does not oscillate, hence why the atomic charges are still somewhat as expected. In the second run, the SCF solutions starts fluctuating much more (see difference in total energies) and as a result the charges get destabilized. The SCF might not recover from that at all (even though it seems it could), so I would suggest first trying speeding up the simulation to try and get the job done in one run. Alternatively you might have to go with slower mixing and more restarts, but that is not a guarantee for a solution either. Further options include increasing the DFT grid (XXLGRID, XXXLGRID) or further increasing the last number of the integral tolerance (even though yours is pretty high already).
Hope any of this is helpful in the end!
Cheers,
Aleks