[Wien] Problems with hybrid calculations: case.vectorhf_old file missing

Peter Blaha pblaha at theochem.tuwien.ac.at
Wed Mar 11 09:07:58 CET 2015


Hi,

Fabien Tran already answered you:

For -hf   you must set SCRATCH=./

You   CANNOT USE     /home/matstud/WIENSCRATCH/    as scratch.
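
As a minimal sketch of the setting above, assuming a bash/sh login shell (in csh/tcsh the equivalent would be "setenv SCRATCH ./"), SCRATCH can be pointed at the case directory before the hybrid run is restarted. The run_lapw line is the command from the original post and is left commented out here:

```shell
# Point the WIEN2k scratch directory at the current (case) directory.
# bash/sh syntax; this is an assumption about the shell in use.
export SCRATCH=./
echo "SCRATCH is now: $SCRATCH"

# Then restart the hybrid SCF cycle from inside the case directory, e.g.:
# run_lapw -hf -p
```

The setting only lasts for the current shell session; whether to make it permanent in a shell startup file is a separate choice.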

In addition: run a "small example" first (something like MgO, or maybe a 
4 times larger cell of MgO, i.e. P instead of F in a 1x1x1 supercell) and 
find out the scaling of a hybrid calculation.

I'm pretty sure that with your hardware this system cannot be handled 
(at least not with this 2x2x2 k-mesh). Even your lapw1 steps took 
several hours, and hf is about 100 times slower ...
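
A rough back-of-the-envelope estimate, assuming the "about 100 times" figure applies directly to the lapw1 wall-clock times in the quoted dayfile (3:01:39 for the serial lapw1 step, 39:55.96 for the parallel one). The factor is only an order-of-magnitude guide, so the results are no more precise than that:

```shell
# Wall-clock times taken from the lapw1 lines of the quoted dayfile.
serial_s=$(( 3*3600 + 1*60 + 39 ))   # lapw1 -c     : 3:01:39
parallel_s=$(( 39*60 + 56 ))         # lapw1 -p -c  : 39:55.96, rounded up
factor=100                           # "about 100 times slower" (rough figure)

# Integer hours for a single hf step under this assumption.
serial_h=$(( serial_s * factor / 3600 ))
parallel_h=$(( parallel_s * factor / 3600 ))

echo "hf estimate from serial lapw1:   ~${serial_h} h"
echo "hf estimate from parallel lapw1: ~${parallel_h} h"
```

Under these assumptions a single hf step would take on the order of days, which is why testing the scaling on a small cell first is the safer route.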



On 03/11/2015 01:53 AM, Paul Fons wrote:
> I have an update and some questions on hybrid calculations on a 96 atom cluster.  I am running my initial tests with two 24 core machines connected by Infiniband.  I have included 4 k-points using a 2x2x2 MP grid.  My .machines file is as below.
>
> lapw0:localhost:12
> 1:localhost:12
> 1:localhost:12
> 1:draco-ib:12
> 1:draco-ib:12
> granularity:1
> extrafine:1
>
>
>   I have done a conventional PBE calculation on the same cluster using the above .machines file and the calculation finished without errors in a few hours.  I then initialized a hf calculation using lapw_hf_lapw and specified the same 2x2x2 grid.  I specified 770 bands in my case.inhf as I have 1526 electrons.  The initialization ran without errors.  I then invoked the scf loop using “run_lapw -hf -p” using the same machines file.  The lapw0 and the initial part of the scf loop appear to have run without errors, but the calculation stopped on the second iteration of the SCF loop.  In particular, the second loop failed due to a missing file “/home/matstud/WIENSCRATCH/aCGT.vectorhf_old STATUS: old”.  Before I continue, I should add that the WIENSCRATCH environment variable is correctly set and the directories on both machines exist.  I should add that of course the regular parallel PBE run ran without errors as well and I assumed it used the same scratch directories without error.
>
>   The file in question “aCGT.vectorhf_old” does not exist in either of the WIENSCRATCH directories nor does it exist in the home directory of the calculation.  The two nodes are gemini (localhost) and draco-ib (the Infiniband connected second node).  The contents of the scratch directories on both nodes are listed below, as well as the files with “vector” in their names in the project directory.   The current calculation only involves four k-points and I have been careful not to limit the number of MPI jobs to four.  I have done the calculation twice now with the same errors.  The first time, I tried it immediately after a successful PBE calculation while the second time I tried it after deleting all of the files in the WIENSCRATCH directory, thinking the file error could have been caused by filename confusion.  The end result was the same (reprinted below).  The run stops because the file aCGT.vectorhf_old cannot be found.  Are there any suggestions as to what I might try next to solve the problem?  Thanks in advance for any help.
>
>
>
> The actual run output went as follows:
>
> run_lapw -hf -p -in1new 2
>   LAPW0 END
>   LAPW0 END
>   LAPW1 END
> mv: cannot stat `aCGT.vector': No such file or directory
>   LAPW1 END
>   LAPW1 END
>   LAPW1 END
>   LAPW1 END
> mv: cannot stat `aCGT.vectorhf_old': No such file or directory
>   LAPW2 END
> mv: cannot stat `aCGT.vector': No such file or directory
> LAPW2 - FERMI; weighs written
>   LAPW2 END
>   LAPW2 END
>   LAPW2 END
>   LAPW2 END
>   SUMPARA END
>   CORE  END
> OPEN FAILED
>
>    Above message repeats for a total of 48 times
>
> error with vector files
>
>>    stop error
>
>
>
>
>
> vector files in the different directories.
>
> On gemini (localhost) working directory
>
> ls -l aCGT*vector*
> -rw-rw-r-- 1 matstud matstud 0 Mar  8 19:18 aCGT.vectorhf
>
>
> On gemini (localhost) WIENSCRATCH
>
> matstud at gemini.a04.aist.go.jp:/usr/local/share/wien2k/Fons/aCGT>ls -l $HOME/WIENSCRATCH
> total 3592696
> -rw-rw-r-- 1 matstud matstud 2943235040 Mar  9 13:31 aCGT.vector
> -rw-rw-r-- 1 matstud matstud  367792558 Mar  9 14:11 aCGT.vector_1
> -rw-rw-r-- 1 matstud matstud  367877082 Mar  9 14:11 aCGT.vector_2
>
>
> On draco-ib (remote host) WIENSCRATCH directory
>
> matstud at gemini.a04.aist.go.jp:/usr/local/share/wien2k/Fons/aCGT>ssh draco-ib ls -l WIENSCRATCH
> total 718780
> -rw-r--r-- 1 matstud matstud 367646498 Mar  9 14:04 aCGT.vector_3
> -rw-r--r-- 1 matstud matstud 368373958 Mar  9 14:04 aCGT.vector_4
>
>
>
>
> DAYFILE
>
> cat aCGT.dayfile
>
> Calculating aCGT in /usr/local/share/wien2k/Fons/aCGT
> on gemini.a04.aist.go.jp with PID 45216
> using WIEN2k_14.2 (Release 15/10/2014) in /home/matstud/Wien2K
>
>
>      start 	(Mon Mar  9 10:27:54 JST 2015) with lapw0 (40/99 to go)
>
>      cycle 1 	(Mon Mar  9 10:27:55 JST 2015) 	(40/99 to go)
>
>>    lapw0 -grr -p	(10:27:55) starting parallel lapw0 at Mon Mar  9 10:27:55 JST 2015
> -------- .machine0 : 12 processors
> 755.913u 3.546s 1:06.22 1146.8%	0+0k 184+796936io 0pf+0w
>>    lapw0 -p	(10:29:01) starting parallel lapw0 at Mon Mar  9 10:29:01 JST 2015
> -------- .machine0 : 12 processors
> 622.223u 2.856s 0:54.57 1145.4%	0+0k 48+203264io 0pf+0w
>>    lapw1    -c 	(10:29:56) 20873.217u 161.505s 3:01:39.31 192.9%	0+0k 14448+5913840io 0pf+0w
>>    lapw1  -p   -c 	(13:31:36) starting parallel lapw1 at Mon Mar  9 13:31:36 JST 2015
> ->  starting parallel LAPW1 jobs at Mon Mar  9 13:31:36 JST 2015
> running LAPW1 in parallel mode (using .machines)
> 4 number_of_parallel_jobs
>       localhost localhost localhost localhost localhost localhost localhost localhost localhost localhost localhost localhost(1) 27361.552u 625.141s 39:53.54 1169.2%	0+0k 8+882304io 0pf+0w
>       localhost localhost localhost localhost localhost localhost localhost localhost localhost localhost localhost localhost(1) 27183.185u 653.051s 39:50.16 1164.6%	0+0k 0+719488io 0pf+0w
>       draco-ib draco-ib draco-ib draco-ib draco-ib draco-ib draco-ib draco-ib draco-ib draco-ib draco-ib draco-ib(1) 0.020u 0.024s 33:06.78 0.0%	0+0k 0+0io 0pf+0w
>       draco-ib draco-ib draco-ib draco-ib draco-ib draco-ib draco-ib draco-ib draco-ib draco-ib draco-ib draco-ib(1) 0.023u 0.029s 33:15.23 0.0%	0+0k 0+0io 0pf+0w
>     Summary of lapw1para:
>     localhost	 k=0	 user=0	 wallclock=0
>     draco-ib	 k=0	 user=0	 wallclock=0
> 54550.790u 1281.369s 39:55.96 2330.2%	0+0k 72+1603312io 0pf+0w
>>    lapw2   -c 	(14:11:32) 979.641u 57.954s 9:12.71 187.7%	0+0k 1128+253800io 0pf+0w
>>    lapw2 -p   -c  	(14:20:45) running LAPW2 in parallel mode
>        localhost 261.005u 5.646s 0:25.12 1061.4% 0+0k 64+253704io 0pf+0w
>        localhost 228.920u 5.488s 0:22.16 1057.7% 0+0k 16+199776io 0pf+0w
>        draco-ib 0.033u 0.031s 0:21.96 0.2% 0+0k 8+0io 0pf+0w
>        draco-ib 0.032u 0.033s 0:21.80 0.2% 0+0k 8+0io 0pf+0w
>     Summary of lapw2para:
>     localhost	 user=489.925	 wallclock=47.28
>     draco-ib	 user=0.065	 wallclock=43.76
> 505.406u 13.549s 0:57.49 902.6%	0+0k 594800+654112io 0pf+0w
>>    lcore	(14:21:43) 4.164u 0.365s 0:06.55 69.0%	0+0k 8+69416io 0pf+0w
>>    hf       -p -c 	(14:21:50) running HF in parallel mode
>        localhost ERROR IN OPENING UNIT: 11 FILENAME: /home/matstud/WIENSCRATCH/aCGT.vectorhf_old STATUS: old FORM:unformatted [same message printed 12 times] 0.241u 0.956s 0:00.72 165.2% 0+0k 16+8io 0pf+0w
>        localhost ERROR IN OPENING UNIT: 11 FILENAME: /home/matstud/WIENSCRATCH/aCGT.vectorhf_old STATUS: old FORM:unformatted [same message printed 12 times] 0.240u 0.982s 0:00.73 167.1% 0+0k 16+8io 0pf+0w
>        draco-ib ERROR IN OPENING UNIT: 11 FILENAME: /home/matstud/WIENSCRATCH/aCGT.vectorhf_old STATUS: old FORM:unformatted [same message printed 12 times] 0.031u 0.018s 0:00.93 4.3% 0+0k 8+8io 0pf+0w
>        draco-ib ERROR IN OPENING UNIT: 11 FILENAME: /home/matstud/WIENSCRATCH/aCGT.vectorhf_old STATUS: old FORM:unformatted [same message printed 12 times] 0.022u 0.025s 0:00.86 4.6% 0+0k 8+8io 0pf+0w
>     Summary of hfpara:
>     localhost	 user=0	 wallclock=0
>     draco-ib	 user=0	 wallclock=0
> **  HF crashed!
> 0.755u 2.429s 0:07.75 40.9%	0+0k 96+1352io 0pf+0w
> error: command   /home/matstud/Wien2K/hfcpara -c hf.def   failed
>
>>    stop error

-- 

                                       P.Blaha
--------------------------------------------------------------------------
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-165300             FAX: +43-1-58801-165982
Email: blaha at theochem.tuwien.ac.at    WIEN2k: http://www.wien2k.at
WWW:   http://www.imc.tuwien.ac.at/staff/tc_group_e.php
--------------------------------------------------------------------------