
Incidental user auth error during job submission prevents the job from being submitted again
Closed, Declined (Public)

Description

local-liangent-py@tools-dev:~$ qstat

job-ID prior name user state submit/start at queue slots ja-task-ID

2609557 0.25000 updatedyk local-liange Eqw 02/17/2014 21:30:06 1

local-liangent-py@tools-dev:~$ qstat -j 2609557

job_number: 2609557
exec_file: job_scripts/2609557
submission_time: Mon Feb 17 21:30:06 2014
owner: local-liangent-py
uid: 50498
group: local-liangent-py
gid: 50498
sge_o_home: /data/project/liangent-py
sge_o_log_name: local-liangent-py
sge_o_path: /usr/local/bin:/usr/bin:/bin
sge_o_shell: /bin/sh
sge_o_workdir: /data/project/liangent-py
sge_o_host: tools-login
account: sge
stderr_path_list: NONE:NONE:/data/project/liangent-py/updatedyk.err
hard resource_list: h_vmem=262144k,jobs=1
mail_list: local-liangent-py@tools.wmflabs.org
notify: FALSE
job_name: updatedyk
stdout_path_list: NONE:NONE:/data/project/liangent-py/updatedyk.out
jobshare: 0
hard_queue_list: task
env_list:
job_args: maintenance
script_file: /data/project/liangent-py/scripts/updatedyk/loader.py
error reason 1: can't get password entry for user "local-liangent-py". Either the user does not exist or NIS error!
scheduling info: Job is in error state
local-liangent-py@tools-dev:~$


Version: unspecified
Severity: normal

Details

Reference
bz61798

Event Timeline

bzimport raised the priority of this task from to Needs Triage. Nov 22 2014, 3:04 AM
bzimport added a project: Toolforge.
bzimport set Reference to bz61798.

local-liangent-py@tools-login:~$ crontab -l | tail -n 7

# m h dom mon dow command

PATH=/usr/local/bin:/usr/bin:/bin

0,20,40 * * * * jsub -once -N updatedyk $HOME/scripts/updatedyk/loader.py main
10,30,50 * * * * jsub -once -N updatedyk $HOME/scripts/updatedyk/loader.py maintenance

local-liangent-py@tools-login:~$ tail updatedyk.err
[Sat Feb 22 06:50:05 2014] there is a job named 'updatedyk' already active
[Sat Feb 22 07:00:11 2014] there is a job named 'updatedyk' already active
[Sat Feb 22 07:10:05 2014] there is a job named 'updatedyk' already active
[Sat Feb 22 07:20:06 2014] there is a job named 'updatedyk' already active
[Sat Feb 22 07:30:06 2014] there is a job named 'updatedyk' already active
[Sat Feb 22 07:40:05 2014] there is a job named 'updatedyk' already active
[Sat Feb 22 07:50:06 2014] there is a job named 'updatedyk' already active
[Sat Feb 22 08:00:10 2014] there is a job named 'updatedyk' already active
[Sat Feb 22 08:10:05 2014] there is a job named 'updatedyk' already active
[Sat Feb 22 08:20:06 2014] there is a job named 'updatedyk' already active
local-liangent-py@tools-login:~$

Other jobs were affected as well. It seems we had some technical difficulties at that time.

local-liangent-php@tools-login:~$ qstat

job-ID prior name user state submit/start at queue slots ja-task-ID

1654375 0.43462 httpd-lian local-liange r 11/24/2013 15:08:12 webgrid@tools-webgrid-01.pmtpa 1
2592966 0.26397 php_dispat local-liange r 02/16/2014 16:00:22 continuous@tools-exec-02.pmtpa 1
2609575 0.25000 php_transl local-liange Eqw 02/17/2014 21:31:02 1
2609583 0.25000 php_cleanu local-liange Eqw 02/17/2014 21:33:01 1
2609686 0.25000 php_cleanu local-liange Eqw 02/17/2014 21:53:01 1
local-liangent-php@tools-login:~$

This appears to have been a transient fault during one of pmtpa's numerous fits.
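
A job left in the Eqw state, as in the qstat listings in this task, keeps its name, which is why jsub -once kept reporting "already active" until the stuck job was dealt with. A minimal sketch for spotting such jobs follows. The helper name eqw_job_ids is hypothetical, and it assumes the state is the fifth whitespace-separated column, as in the output shown here.

```shell
# Hypothetical helper: read `qstat` output on stdin and print the job IDs
# whose state column (assumed to be field 5) is exactly "Eqw".
eqw_job_ids() {
    awk '$5 == "Eqw" { print $1 }'
}

# Example use (assumes a Grid Engine client is on PATH):
#   qstat | eqw_job_ids
```

Each ID printed could then be passed to qmod -cj to clear the error state, or to qdel to remove the job so that the next cron run can submit it again. Whether a tool account is permitted to run qmod -cj on this grid is an assumption.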