This note explains a common reason for LAT print queues stalling when they are protocol translated across an IP network.
On a DEC VMS host system, remote printers are normally driven by the LATSYM processor via the LT driver across Ethernet to DECservers. The print queue is mounted on an LTA device, which is defined to point to a specific DECserver via a combination of node name, service name and port name.
ALPHIE>show que lat_test /full
Terminal queue LAT_TEST, idle, on ALPHIE::LTA300:, mounted form DEFAULT
  /BASE_PRIORITY=4 /DEFAULT=(FEED,FORM=DEFAULT) /LIBRARY=HPLJ3SI Lowercase
  /OWNER=[SYSTEM] /PROCESSOR=LATSYM /PROTECTION=(S:M,O:D,G:R,W:S)
ALPHIE>mc latcp show port lta300
Local Port Name: _LTA300:       Local Port Type: Application (Queued)
Local Port State: Inactive
Connected Link:
Target Port Name:               Actual Port Name:
Target Node Name: WOLF          Actual Node Name:
Target Service Name: PRINTER1   Actual Service Name:
LAT (Local Area Transport) is a fast LAN-based protocol. It was never designed to go across wide area networks, so it responds very badly to congestion and packet delay. With an 80 msec ack timer, it is very easy to reveal network problems when LAT is in use. This also means that applications using LAT are very pedantic about timing. If the LAT symbiont does not see data movement on an LTA device to which it is printing for more than 30 seconds, it puts the queue into a stalled state.
When data transfer resumes, the queue continues printing. Since it is common to have a lot of small jobs on the queue, there will be a lot of LAT connections being initiated and torn down in rapid succession. If there is a current virtual circuit between the DECserver hosting the printer and the host that is hosting the print queue, then the print jobs will just appear as extra slots in the virtual circuit. There is therefore no good reason for the queues to stall on an Ethernet unless the printer signals that it cannot print any more.
When we translate to TCP, we pretend to be a LAT DECserver and terminate one end of the LAT virtual circuit. We extract the user data, wrap it into TCP packets, and then route it like normal IP data.
hostname wolf
!
interface Ethernet0
 ip address 172.17.243.2 255.255.248.0
 lat enabled
interface Serial0
 ip address 141.245.41.253 255.255.255.252
!
translate lat PRINTER1 tcp 141.245.40.253 port 4001 quiet
!
This config causes us to advertise the service PRINTER1 as existing on node WOLF, and to accept LAT solicitations for this service.
We then send the print data to the IP address 141.245.40.253, port 4001, which in my samples exists on another router across a WAN link. On the other router we accept the TCP connection, put the user data into a LAT packet and start another LAT session to the target DECserver. The LAT node name, service name and port name do not have to be the same throughout the translation.
!
hostname sjang
!
!
!
interface Loopback0
 ip address 141.245.40.254 255.255.255.0
!
interface Ethernet0
 ip address 172.17.243.4 255.255.248.0
 lat enabled
!
interface Serial1
 ip address 141.245.41.254 255.255.255.252
!
!
translate tcp 141.245.40.253 port 4001 printer lat PRINTER2 port 1 quiet
!
This config will solicit connections to the service PRINTER2, so the target DECserver must be configured to advertise this service.
Note that the TCP -> LAT translation has the keyword "printer" in the statement. This causes us not to accept the translation if we cannot complete the LAT connection to the requested service (PRINTER2), which avoids losing data between the two routers. What actually happens is that we reject the TCP translation, so the originally translating router (LAT -> TCP) has to queue up the connection for a retry.
This can cause timing problems. Consider the situation where we have multiple small print jobs on the host system. The host sends them in rapid succession to the LAT -> TCP router, which accepts the first job and translates it, and off it goes to the printer. The router then accepts the second job milliseconds later and tries to translate it, but the printer is still busy with the first job, so the translation is rejected and we queue it for a retry. The print queue is in the status "Printing" because it is talking to the first router. The printer clears the job in 15-20 seconds and is ready for the next job, but the first router has a default "terminal-queue entry-retry-interval" of 60 seconds, so we don't retry the translation until 60 seconds have passed. That is aeons for the LAT protocol.
In that 60 seconds the print queue will go to a "stalled" state. After 60 seconds we translate the 2nd job, it connects to the printer, we accept the 3rd job from the host, but can't translate because the printer is busy with the 2nd job, so we queue it for 60 seconds. The queue stalls, and so the cycle continues.
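The cycle described above can be sketched as a small simulation. This is only a simplified model under stated assumptions, not a description of how the router is implemented: all jobs are already waiting on the host, every job takes the same time to print, a rejected translation is retried only at multiples of the retry interval, and the queue counts as stalled when a job waits more than the 30 seconds quoted earlier. The function and variable names are made up for illustration.

```python
STALL_AFTER_SECS = 30  # LATSYM stall threshold quoted in the text


def simulate(num_jobs, print_secs, retry_secs):
    """Return (job, wait_secs, stalled) for each job in a simplified model.

    Assumptions (not taken from router internals): every job prints in
    print_secs, and a rejected translation is retried every retry_secs.
    """
    printer_free_at = 0  # time at which the printer can take a new job
    now = 0              # time at which the router accepts the current job
    results = []
    for job in range(1, num_jobs + 1):
        attempt = now
        # The translation is rejected while the printer is busy and is
        # only tried again after a full retry interval.
        while attempt < printer_free_at:
            attempt += retry_secs
        wait = attempt - now
        results.append((job, wait, wait > STALL_AFTER_SECS))
        printer_free_at = attempt + print_secs
        now = attempt  # the next job is accepted once this one connects
    return results


if __name__ == "__main__":
    for retry in (60, 25):
        print("retry interval", retry)
        for job, wait, stalled in simulate(4, print_secs=20, retry_secs=retry):
            print("  job", job, "waited", wait, "s", "STALLED" if stalled else "ok")
```

In this model, a 20 second print job with the default 60 second retry makes every job after the first wait 60 seconds, so the queue stalls each time. With a 25 second retry the wait drops to 25 seconds, which stays under the stall threshold.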
To get around this, set the "terminal-queue entry-retry-interval" to a lower value. Finding the optimum setting takes some experimentation. If someone can check the average duration of a print job being executed on the printer, then make this timer that period + 5 seconds. If the timer is too small, we will be too aggressive in our translation attempts, which just wastes CPU.
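As a rough illustration of the "average job duration + 5 seconds" rule, the helper below works out a starting value from a few measured print times. The function name, the sample durations and the choice to round up are assumptions made for this sketch. The result is only a first guess to be tuned by experiment.

```python
import math
from statistics import mean


def suggested_retry_interval(job_durations_secs, margin_secs=5):
    """Average print time, rounded up, plus a small safety margin.

    job_durations_secs is a list of measured print times in seconds
    (hypothetical input, gathered by timing jobs at the printer).
    """
    if not job_durations_secs:
        raise ValueError("need at least one measured job duration")
    return math.ceil(mean(job_durations_secs)) + margin_secs


if __name__ == "__main__":
    # Example measurements in the 15-20 second range mentioned in the text.
    print(suggested_retry_interval([15, 18, 20]))
```

If the suggested value comes out above 30 seconds, the queue would presumably still reach the stall threshold described earlier, so long-running jobs may need a different approach.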
To see which entries are being queued for re-translation, use the command "show entry".
wolf#show entry
58 waiting 00:00:13 for service PRINTER1 from LAT node ALPHIE
This job, number 58, has been waiting for 13 seconds already. When it reaches 60 seconds it will be tried again.
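If this output is being collected by a script, the waiting time can be pulled out of each line and compared with the retry interval. The sketch below assumes the line format is exactly as in the sample above, which is the only format this note shows. The function name and the returned field names are made up for illustration.

```python
import re

# Matches lines such as:
#   58 waiting 00:00:13 for service PRINTER1 from LAT node ALPHIE
ENTRY_RE = re.compile(
    r"^\s*(?P<entry>\d+)\s+waiting\s+"
    r"(?P<hh>\d+):(?P<mm>\d+):(?P<ss>\d+)\s+"
    r"for service\s+(?P<service>\S+)\s+from LAT node\s+(?P<node>\S+)\s*$"
)


def parse_entry(line, retry_secs=60):
    """Parse one 'show entry' line; return None if it does not match."""
    match = ENTRY_RE.match(line)
    if match is None:
        return None
    waited = (int(match["hh"]) * 3600
              + int(match["mm"]) * 60
              + int(match["ss"]))
    return {
        "entry": int(match["entry"]),
        "service": match["service"],
        "node": match["node"],
        "waited_secs": waited,
        # Seconds left before the next retry, never negative.
        "retry_in_secs": max(retry_secs - waited, 0),
    }


if __name__ == "__main__":
    sample = "58 waiting 00:00:13 for service PRINTER1 from LAT node ALPHIE"
    print(parse_entry(sample))
```

The retry_secs argument defaults to the 60 second default mentioned above and should be changed to match whatever value is configured on the router.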
Below is some sample output from my Alpha host and debug output from the translating routers.
ALPHIE>show que lat_test/all
Terminal queue LAT_TEST, busy, on ALPHIE::LTA300:, mounted form DEFAULT
Entry Jobname Username Blocks Status
----- ------- -------- ------ ------
72 LOGIN CPRICE 2 Printing
73 LOGIN CPRICE 2 Pending
74 LOGIN CPRICE 2 Pending
75 LOGIN CPRICE 2 Pending
ALPHIE>
ALPHIE>show que lat_test/all
Terminal queue LAT_TEST, stalled, on ALPHIE::LTA300:, mounted form DEFAULT
Entry Jobname Username Blocks Status
----- ------- -------- ------ ------
73 LOGIN CPRICE 2 Stalled
74 LOGIN CPRICE 2 Pending
75 LOGIN CPRICE 2 Pending
ALPHIE>
The Lat --> TCP translating router
wolf#show entry
45 waiting 00:00:29 for service PRINTER1 from LAT node ALPHIE
wolf#show trans
Translate From: LAT PRINTER1
To: TCP 141.245.40.253 Port 4001
Quiet
0/0 users active, 2 peak, 123 total, 0 failures
wolf#show entry
45 waiting 00:00:56 for service PRINTER1 from LAT node ALPHIE
wolf#
lattcp2: fork 32 started
lattcp3: fork 33 started
lattcp3: connection remains queued
lattcp2: fork 32 started
lattcp2: connection remains queued
Debug on the TCP --> Lat translating router.
tcplat2: fork started
LAT: O dst=0000.0c8c.7088, int=Ethernet0, type=1, M=1, R=0, len=22, next 0 ref 1
LAT: I int=Ethernet0, src=0000.0c8c.7088, dst=aa00.0400.0414, type=1, M=0, R=0
LAT: O dst=0000.0c8c.7088, int=Ethernet0, type=0, M=1, R=0, len=28, next 0 ref 1
LAT: I int=Ethernet0, src=0000.0c8c.7088, dst=aa00.0400.0414, type=0, M=0, R=1
tcplat2: queuemax = 30
LAT: O dst=0000.0c8c.7088, int=Ethernet0, type=0, M=1, R=0, len=24, next 0 ref 1
LAT: I int=Ethernet0, src=0000.0c8c.7088, dst=aa00.0400.0414, type=0, M=0, R=1
LAT: O dst=0000.0c8c.7088, int=Ethernet0, type=0, M=1, R=0, len=8, next 0 ref 1
LAT: I int=Ethernet0, src=0000.0c8c.7088, dst=aa00.0400.0414, type=0, M=0, R=0
LAT: O dst=0000.0c8c.7088, int=Ethernet0, type=0, M=1, R=0, len=1A0, next 0 ref 1
LAT: I int=Ethernet0, src=0000.0c8c.7088, dst=aa00.0400.0414, type=0, M=0, R=1
tcplat3: fork started
LAT: O dst=0000.0c8c.7088, int=Ethernet0, type=0, M=1, R=0, len=176, next 0 ref 1
LAT: I int=Ethernet0, src=0000.0c8c.7088, dst=aa00.0400.0414, type=0, M=0, R=1
tcplat3: can't get lat connection
LAT: O dst=0000.0c8c.7088, int=Ethernet0, type=0, M=1, R=0, len=8, next 0 ref 1
LAT: I int=Ethernet0, src=0000.0c8c.7088, dst=aa00.0400.0414, type=0, M=0, R=1
LAT: O dst=0000.0c8c.7088, int=Ethernet0, type=0, M=1, R=0, len=8, next 0 ref 1
LAT: I int=Ethernet0, src=0000.0c8c.7088, dst=aa00.0400.0414, type=0, M=0, R=0
tcplat3: fork started
LAT: O dst=0000.0c8c.7088, int=Ethernet0, type=0, M=1, R=0, len=28, next 0 ref 1
LAT: I int=Ethernet0, src=0000.0c8c.7088, dst=aa00.0400.0414, type=0, M=0, R=1
tcplat3: can't get lat connection
LAT: O dst=0000.0c8c.7088, int=Ethernet0, type=0, M=1, R=0, len=8, next 0 ref 1
LAT: I int=Ethernet0, src=0000.0c8c.7088, dst=aa00.0400.0414, type=0, M=0, R=0
LAT: O dst=0000.0c8c.7088, int=Ethernet0, type=0, M=1, R=0, len=C, next 0 ref 1
LAT: I int=Ethernet0, src=0000.0c8c.7088, dst=aa00.0400.0414, type=0, M=0, R=0
LAT: O dst=0000.0c8c.7088, int=Ethernet0, type=2, M=1, R=0, len=A, next 0 ref 1
sjang#show trans
Translate From: TCP 141.245.40.253 Port 4001 Printer
To: LAT PRINTER2 Port 1
Quiet
0/0 users active, 1 peak, 58 total, 38 failures
sjang#
This is the Commserver that I was using as a DECserver, with a printer on port 1, the Aux port.
A>show lat servi
Service Name Rating Interface Node (Address)
PRINTER2 1 Local
A>
Local host statistics:
0/0 circuits, 0/0 sessions, 1/0 services
255 sessions/circuit, circuit timer 80, keep-alive timer 20
Recv: 417 messages (0 duplicates), 420 slots, 36608 bytes
0 bad circuit messages, 12324 service messages (16 used)
Xmit: 423 messages (0 retransmit), 318 slots, 3792 bytes
0 circuit timeouts, 12324 service messages
Total: 73 circuits created, 133 sessions
A>