When connectivity is lost printer prints jumbled characters

ChrisM4 · ‎04-11-2017

My OfficeJet Pro 6970 prints many pages, typically each with a single line of jumbled characters, whenever a print job is interrupted by loss of WiFi connectivity. This sounds like a minor issue, but when used in conjunction with HP Instant Ink, this problem quickly eats into my monthly printing allowance. The nature of WiFi is such that it gets interrupted occasionally, so this could do with being more robust.

I'm not sure why it would behave like this? Is this a bug with the printer driver? Perhaps something left in for backwards compatibility? Either way, it seems like an easy fix - can we have a checkbox somewhere on the printer interface that allows this to be switched off?

dansdaduk · ‎04-12-2017

>> ... Is this a bug with the printer driver?

>> ... Perhaps something left in for backwards compatibility?

No.

Printers such as yours do not 'understand' plain ASCII text, so all of the text, pictures, etc. in the source (Word, Excel, etc.) document are 'converted' by the printer driver into something which the (relatively simple) interpreter in the printer can understand.

In essence, these source document objects are converted into a series of encapsulated, compressed, printer-format raster images; the printer then just unencapsulates and uncompresses these images, and prints the resultant dots.

These encapsulated, compressed images are, by their very nature, highly structured.

Any interruption in the middle of such a sequence means that the structure is compromised, and the printer 'loses its way', then tries to interpret the incoming stream of bytes as best it can.

[A bit like a modern digital TV, which does its best (usually very-pixellated or frozen pictures) if the TV signal is very weak, or interrupted by atmospheric conditions].

>> ... Either way, it seems like an easy fix

Not really; as I see it, there are two problems:

The not inconsiderable difficulty in deciding on the criteria to be used to determine that the structure MAY have been compromised, without generating 'false positives'.

Even if a 'good' algorithm was found, the cost of including this in the relatively simple firmware of devices such as yours might mean that the cost of the printer would increase; it might require additional memory, and/or increased processing power to cope, which would add further to the cost.

ChrisM4 · ‎04-12-2017

Thank you very much for the response. I'll have to take your word for it - although it still doesn't help me with my lost InstantInk allowance. I would however say...

>> A bit like a modern digital TV, which does its best (usually very-pixellated or frozen pictures) if the TV signal is very weak, or interrupted by atmospheric conditions

Digital TV seems much trickier to me? That's a compressed lossy stream (not lossless like files sent to a printer), that has no fixed start or finish (not like printer files, which have a definite start and finish), and has to make use of temporal compression where the frames in something like MPEG2 have to refer to previous frames (not like PostScript or whatever format print documents are transferred in, which is a single blob).

>> The not inconsiderable difficulty in deciding on the criteria to be used to determine that the structure MAY have been compromised, without generating 'false positives'.

There must be a point when the printer starts printing a literal unicode representation of the bytes (or whatever those random strings that are getting printed are), when it knows that is unexpected behaviour - because surely it's had to convert to that? But alternatively, further encapsulation applied by the printer driver? Lead the transmission with the number of bytes the printer should expect and end with a series of sentinel bytes to play safe. Maybe even a checksum to verify the printer got everything? There are some very cheap checksums.
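
As a rough illustration of the framing idea in the paragraph above (a length prefix, a cheap checksum and trailing sentinel bytes), here is a minimal Python sketch. It is purely hypothetical: the names `frame_job`, `unframe_job` and `SENTINEL`, the 8-byte header layout and the choice of CRC-32 are all assumptions made for the example, and nothing here reflects what HP's drivers or firmware actually do.

```python
import struct
import zlib

# Hypothetical end-of-job marker; any agreed byte pattern would do.
SENTINEL = b"\xde\xad\xbe\xef"
HEADER = struct.Struct(">II")  # payload length, CRC-32 (both big-endian)


def frame_job(payload: bytes) -> bytes:
    """Wrap a print job in: length + CRC-32 header, payload, sentinel."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload + SENTINEL


def unframe_job(frame: bytes) -> bytes:
    """Return the payload, or raise ValueError if the frame is incomplete."""
    if len(frame) < HEADER.size + len(SENTINEL):
        raise ValueError("truncated header")
    length, crc = HEADER.unpack(frame[:HEADER.size])
    payload = frame[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    if frame[HEADER.size + length:] != SENTINEL:
        raise ValueError("missing or misplaced end marker")
    if zlib.crc32(payload) != crc:
        raise ValueError("checksum mismatch")
    return payload


job = bytes(range(256)) * 4           # stand-in for raster data
frame = frame_job(job)
interrupted = frame[:len(frame) // 2]  # connection dropped half-way
```

With this layout, `unframe_job(frame)` returns the original job, while `unframe_job(interrupted)` raises `ValueError` instead of handing partial data on to be printed.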

>> Even if a 'good' algorithm was found, the cost of including this in the relatively simple firmware of devices such as yours might mean that the cost of the printer would increase; it might require additional memory, and/or increased processing power to cope, which would add further to the cost.

I think TCP is probably simpler than what I've proposed above, and the printer handles that fine. The printer we are discussing can actually receive and print files from email (very cool!). I think it can handle a checksum? Although I'm sure there's a more efficient solution than the one I've proposed as an example.

dansdaduk · ‎04-12-2017

I can't comment on your references to MPEG2, etc. (I don't know enough about them), but my digital TV still pixellates quite frequently at certain times of year / times of day, on some channels.

I agree that there should be a point where the printer firmware thinks that the data is incorrect, but:

Dealing with it without causing 'false positives' (leading to legitimate prints being aborted) could still perhaps be a problem?

I don't know enough about the formats and error detection / recovery techniques involved, although I would point out that checksum characters are usually associated with error detection, whilst more advanced techniques (such as those used on compact disc, QR Code bar-codes, etc.) can offer error recovery features, but at the expense of perhaps 50% of the data being error-recovery data rather than source data - meaning network traffic increases - and requires more memory and processing power to handle.

By the time that the printer 'realises' that some data is perhaps incorrect, it may have read (and attempted to process) a considerable amount of data.

Bear in mind that a monochrome raster image (at 1 bit-per-pixel, using 600 dots-per-inch resolution) would require 360000 bits, or 45000 bytes, per square inch in uncompressed form; for a Letter size page (8.5" x 11"), this is over 4 MB, although compression techniques used within the printer 'language' would reduce this.

An equivalent 24-bit colour page would require over 96 MB in uncompressed format.
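
The arithmetic behind these figures can be checked with a few lines of Python. This is only a worked calculation under the stated assumptions (600 dots-per-inch, Letter size, no compression); the function name `raster_bytes` is made up for the example.

```python
def raster_bytes(bits_per_pixel: int,
                 width_in: float = 8.5,
                 height_in: float = 11.0,
                 dpi: int = 600) -> int:
    """Size in bytes of an uncompressed raster image of the given size."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8


per_square_inch = raster_bytes(1, width_in=1, height_in=1)  # 45,000 bytes
mono_letter = raster_bytes(1)                               # 4,207,500 bytes
colour_letter = raster_bytes(24)                            # 100,980,000 bytes

print(f"1 bpp, one square inch: {per_square_inch:,} bytes")
print(f"1 bpp, Letter page:     {mono_letter / 2**20:.2f} MiB")
print(f"24 bpp, Letter page:    {colour_letter / 2**20:.2f} MiB")
```

The "over 4 MB" and "over 96 MB" figures correspond to counting a megabyte as 2^20 bytes.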

So if only a very small fraction of the data for a page was 'corrupted', this could result in lots of text being printed - I think that these printers have a very basic text font which is primarily used for diagnostic purpose (but I may be totally misinformed about this, of course), which may be invoked in error situations.

All of the above is conjecture - I don't know enough about the firmware / interpreter on these printers (and I don't work for HP, so can't easily find out, either).

ChrisM4 · ‎04-12-2017

>> Dealing with it without causing 'false positives' (leading to legitimate prints being aborted) could still perhaps be a problem?

I don't think so - and that's why I suggested a tick box option on the printer. If well implemented, it seems highly unlikely that this would be an issue - but the user could temporarily switch it off if the printer rejected a specific document. Or even nicer, it could say on that lovely screen "Warning: error checking failed, do you still want to print?"

>> I don't know enough about the formats and error detection / recovery techniques involved, although I would point out that checksum characters are usually associated with error detection, whilst more advanced techniques (such as those used on compact disc, QR Code bar-codes, etc.) can offer error recovery features, but at the expense of perhaps 50% of the data being error-recovery data rather than source data - meaning network traffic increases - and requires more memory and processing power to handle.

That's fine, I don't need recovery - just checking. QR codes and bar codes are designed so they still operate if they become damaged. My file doesn't become damaged if it fails to print, so that's not a requirement here. I'm happy to send it again if I walk too far from the WiFi, I just don't want the printer to react by spitting out lots of junk and burning through my paper and InstantInk allowance. There would be minimal extra data to send. CRC would do fine here. This works fine on embedded and mobile devices so a printer should be fine too.

>> By the time that the printer 'realises' that some data is perhaps incorrect, it may have read (and attempted to process) a considerable amount of data.

And then it could abort and dispose of that data, and maybe even give me a nice error message. Maybe a negative of this method would be that you have to wait longer for prints. However, I tend to think the file that is sent is already fully unpacked and converted before the printing starts, so this would be negligible?

>> An equivalent 24-bit colour page would require over 96 MB in uncompressed format.

I've just taken your second one here re size. Size doesn't impact much on checksums. It can slow validation a bit - but should be easy for a printer. As I say, really old mobile devices can do this stuff.

>> So if only a very small fraction of the data for a page was 'corrupted', this could result in lots of text being printed - I think that these printers have a very basic text font which is primarily used for diagnostic purpose (but I may be totally misinformed about this, of course), which may be invoked in error situations.

As I originally suspected - maybe it's there for backwards compatibility or debugging. I can throw any data at the printer from an old word processor and it prints it literally to support the old protocols from that device? If this is the case, again, it should be a configuration option, and it should be switched off by default. I think the majority of users are printing via modern OS's and applications.
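
To illustrate the point that job size has little bearing on the cost of a checksum, here is a small sketch showing a CRC-32 computed incrementally, chunk by chunk, so that only a 32-bit running value needs to be kept regardless of how large the job is. The helper name `crc32_streaming` and the 4 KiB chunk size are assumptions for the example; whether printer firmware could or does work this way is not something the thread establishes.

```python
import zlib
from typing import Iterable


def crc32_streaming(chunks: Iterable[bytes]) -> int:
    """CRC-32 over a sequence of chunks, keeping only a running value."""
    crc = 0
    for chunk in chunks:
        crc = zlib.crc32(chunk, crc)  # second argument is the running CRC
    return crc


# Stand-in for a large job arriving over the network in 4 KiB pieces.
job = bytes(range(256)) * 1024
pieces = [job[i:i + 4096] for i in range(0, len(job), 4096)]

streamed = crc32_streaming(pieces)
whole = zlib.crc32(job)
```

Here `streamed` and `whole` are equal, so the receiver never needs the complete job in memory just to validate it.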

Sorry, but still sounds like an easy fix to me.

That's okay, appreciate your input.

dansdaduk · ‎04-13-2017

A few more comments:

The specifications of the PCL3 GUI and PCL3 Advanced PDLs (the ones supported by your printer, according to the datasheet) are not published by HP, so it is difficult to know exactly how the printer may operate.

As far as I can make out, these PDLs are based on PCL3 (a precursor to the PCL4 and PCL5 languages used on later and current LaserJet devices), but with extensions which are incompatible with PCL4/5.

As such, the language will support some basic PCL escape sequences (to control page size, plexing, etc.), and probably some basic text processing capability (using one (or more?) basic (fixed-size) printer-resident bitmap fonts).

The language will also support some of the basic control-code characters, probably including LineFeed (0x0A) and CarriageReturn (0x0D), and definitely including FormFeed (0x0C).

But the bulk of each print job will almost certainly use PCL-format raster-graphic images, albeit using a proprietary (undocumented) compression technique (mode 10).

To illustrate (some of) the above, attached (within a .zip file) is a .prn 'capture' of the Driver Test Page print job generated by the OfficeJet Pro 6970 - PCL3 driver supplied (via Microsoft) on a Windows 10 Pro 64-bit system, together with a (limited) analysis of the file; here is a short extract of that analysis:

0000012119   PCL Parameterised      <Esc>*o0M         Print Quality: Normal
0000012124   PCL Parameterised      <Esc>&l26A        Page Size: A4: 210 mm x 297 mm
0000012130   PCL Parameterised      <Esc>&l0M         Media Type: Plain Paper
0000012135   PCL Parameterised      <Esc>*o5W         Driver Configuration (data length = 5)
0000012140   PCL Binary             [ 5 bytes ]       [ 0b 01 00 00 00 ]
0000012145   PCL Parameterised      <Esc>&u600D       Unit-of-Measure (600 PCL units per inch)
0000012152   PCL Parameterised      <Esc>*t600R       Raster Graphics Resolution (600 dots-per-inch)
0000012159   PCL Parameterised      <Esc>*g12W        Configure Raster Data (data length = 12)
0000012165   PCL Binary             [ 12 bytes ]      Configure Raster Data header
0000012165                          [ 12 bytes ]      [ 06 07 00 01 02 58 02 58 0a 01 20 01 ]
             Format                                   6: identity name not known
             Pen mode (?)                             7
             Component              Count             1
             Component 1            Res. Horizontal   600 pixels-per-inch
             Component 1            Res. Vertical     600 pixels-per-inch
             Component 1            Compression Mode  10
             Component 1            Orientation       1: Landscape
             Component 1            Bits per Pixel    32
             Component 1            Planes per Pixel  1
0000012177   PCL Parameterised      <Esc>*r4891S      Raster Width: Source (4891 pixels)
0000012185   PCL Parameterised      <Esc>&l-2H        Paper Source: id -2 is Printer Dependent
0000012191   PCL Parameterised      <Esc>*r1A         Start Raster Graphics: Left Margin at X
0000012196   PCL Parameterised      <Esc>*p211Y       Cursor Position Vertical   (211 PCL units)
0000012203   PCL Parameterised      <Esc>*b44Y        Y Offset (44 raster lines)
0000012209   PCL Parameterised      <Esc>*b23W        Transfer Raster Data By Row/Block (data length = 23)
0000012215   PCL Binary             [ 23 bytes ]      [ 9f ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ]
0000012231                                            [ 5d 00 d8 78 ff ff 51 ]
0000012238   PCL Parameterised      <Esc>*b0W         Transfer Raster Data By Row/Block (data length = 0)
0000012243   PCL Parameterised      <Esc>*b0W         Transfer Raster Data By Row/Block (data length = 0)
0000012248   PCL Parameterised      <Esc>*b0W         Transfer Raster Data By Row/Block (data length = 0)
0000012253   PCL Parameterised      <Esc>*b0W         Transfer Raster Data By Row/Block (data length = 0)
0000012258   PCL Parameterised      <Esc>*b19W        Transfer Raster Data By Row/Block (data length = 19)
0000012264   PCL Binary             [ 19 bytes ]      [ 9b ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ]
0000012280                                            [ 5d 80 1f ]
0000012283   PCL Parameterised      <Esc>*b22W        Transfer Raster Data By Row/Block (data length = 22)
0000012289   PCL Binary             [ 22 bytes ]      [ 9f ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ]
0000012305                                            [ 5d 80 01 ff ff 51 ]

The full analysis shows:

The job starts with 11000 <NUL> (0x00) bytes; I think that these are to ensure that the printer's buffers are flushed at the start of the job.
There is a fairly anodyne PJL job-level header.
There are a few 'standard' PCL escape sequences (to set paper size, paper source, plex mode, paper type, etc.).
The bulk of the job is made up of raster-transfer sequences; note that the data associated with these sequences can contain ANY byte value (0x00 -> 0xFF).
Towards the end of the job is a FormFeed (0x0C) control-code byte, to ensure that the (last) page is ejected.

If, for example, there was some (undetected) corruption in one of the raster fragments (which can be quite large - remember that in uncompressed mode, a one-inch square image at 600 dots-per-inch in 24-bit colour requires over 1 MB of data to describe it), such that:

The workstation believed that it was sending a raster-sequence fragment of size 500 bytes.
Due to (undetected) corruption, the printer believes that it is to receive and process 300 bytes of raster definition.
The extra 200 bytes are then parsed by the printer and treated as text data, perhaps interspersed with control-code bytes (e.g. FormFeed (0x0C)) or PCL escape sequences. These are introduced by the <Esc> control-code character (0x1B), and the PCL syntax decrees that the printer will ignore them if the following few bytes do not make up a valid/recognised escape sequence. As far as the printer is concerned, it is just interpreting data which has been presented to it.
This can lead to random text data being printed, perhaps over a number of pages.
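
The scenario above can be mimicked with a toy tokenizer. This is a heavily simplified sketch, not a real PCL interpreter: it only recognises sequences of the shape `<Esc>` + one parameterised character + one group character + a numeric value + one terminating character (as in `<Esc>*b23W` from the extract), treats a `W` terminator as "the next N bytes are binary data", silently drops an `<Esc>` that does not start such a sequence, and treats everything else as text. The function name `tokenize` and the sample job bytes are made up for the example.

```python
import re

# <Esc>, parameterised char, group char, numeric value, terminating char.
ESC_SEQ = re.compile(rb"\x1b([\x21-\x2f])([\x60-\x7e])([+-]?\d+)([\x40-\x5e])")


def tokenize(stream: bytes) -> list:
    """Split a byte stream into ('cmd', sequence, data) and ('text', bytes)."""
    tokens = []
    text = bytearray()
    i = 0
    while i < len(stream):
        m = ESC_SEQ.match(stream, i)
        if m is None:
            if stream[i] != 0x1B:      # an unrecognised <Esc> is ignored
                text.append(stream[i])
            i += 1
            continue
        if text:                       # flush any text seen so far
            tokens.append(("text", bytes(text)))
            text.clear()
        end = m.end()
        data = b""
        if m.group(4) == b"W":         # value is a binary data length
            data = stream[end:end + max(int(m.group(3)), 0)]
            end += len(data)
        tokens.append(("cmd", m.group(0)[1:], data))
        i = end
    if text:
        tokens.append(("text", bytes(text)))
    return tokens


raster = b"HELLO\x0cWORLD"             # 11 bytes of 'raster' data
good = b"\x1b*b11W" + raster + b"\x1b*b0W"
bad = good.replace(b"*b11W", b"*b01W")  # one corrupted digit in the length

good_tokens = tokenize(good)
bad_tokens = tokenize(bad)
```

In `good_tokens` all 11 bytes stay inside the raster command. In `bad_tokens` only the first byte is consumed as raster data and the remaining 10 bytes, including the FormFeed (0x0C), surface as a text token, which is the kind of "runaway" output described above.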

Now for some further comments:

The only PCL sequences I'm aware of which make use of checksum bytes are (some) soft font downloads.

None of the raster commands use checksums, although the associated binary data does have defined structures, so (some) inconsistencies can be detected.

As far as I know (it's not my area of expertise) checksums, where used, can detect all single-bit corruptions, and some multiple-bit corruptions, but not all, so even where checksums are used, not all corruptions would be detected.
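
A small experiment shows both halves of that statement for one particular (very simple) checksum. The sketch below assumes an 8-bit additive checksum (sum of all bytes modulo 256), chosen only because it is easy to reason about; the names `additive_checksum` and `flip_bit` are made up, and CRC-32 is shown alongside purely for comparison.

```python
import zlib


def additive_checksum(data: bytes) -> int:
    """8-bit checksum: sum of all byte values, modulo 256."""
    return sum(data) % 256


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with a single bit inverted."""
    out = bytearray(data)
    out[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(out)


original = b"raster row 0123"
swapped = b"raster row 0132"   # last two bytes transposed

# Every possible single-bit corruption changes the additive checksum.
all_single_bit_detected = all(
    additive_checksum(flip_bit(original, i)) != additive_checksum(original)
    for i in range(len(original) * 8)
)

# A multi-bit corruption (two bytes swapped) slips past the additive sum...
swap_missed_by_sum = additive_checksum(swapped) == additive_checksum(original)
# ...but is caught by CRC-32 in this case.
swap_caught_by_crc = zlib.crc32(swapped) != zlib.crc32(original)
```

All three flags come out `True`. A stronger check such as CRC-32 narrows the gap considerably, although no fixed-size check value can detect every possible corruption.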

As mentioned above, the PCL language does not (other than as indicated) use checksums, and to now make changes to the language specification would cause all manner of incompatibilities between the (tens of?) millions of devices using PDLs based on the language syntax, and the (tens of?) millions of workstations with drivers (some of which may be third-party) which generate print jobs using this very well-established syntax.

Where checksums (and other techniques, e.g. flow-control?) may be used is at lower levels (e.g. the TCP/IP transport level, or the Port 9100 protocol which is commonly used with printer connections); I don't know enough about any of these to be able to present meaningful comments.

Even the more business class traditional PDLs such as PCL5 and PCL XL (which require much more memory and processing power, etc. than the PCL3+ raster-based languages) appear to suffer from occasional 'runaway' prints where the print job has been corrupted.

I'm not sure why this occurs with PCL XL, which is more highly structured and uses PageBegin and PageEnd operators to (theoretically) delineate pages. Perhaps what happens is that the device times out when data stops reaching it and aborts the job (but cannot notify the sending workstation because of the connection problems); then the connection is resumed, with the workstation attempting to send the remainder of the original job, but the printer thinks it is now a new job and (without a PCL XL header) assumes that it is just data/PCL5, and treats it as such?

I'm not sure of the situation with the PostScript PDL - sometimes encapsulated binary images are used within what is normally an ASCII text-based syntax.

Note that PCL3/4/5/6, etc. are all 'passive' languages, but PostScript is a proper programming language, which supports constructs such as conditional statements and loops - a PS print job basically consists of a source PostScript program being downloaded to the printer for it to parse and execute. Again, it requires much more (expensive) memory and processing power in the printer (and obviously a PostScript interpreter in the firmware).

ChrisM4 · ‎04-15-2017

Wow. Fantastic overview. Thank you very much.

It's interesting you mention TCP. I do wonder if this is really an issue between the transport layer and the application layer? If I walk away from my printer with my iPad (or someone picks up a cordless phone, switches on a microwave etc. etc.) and the WiFi connection is interrupted, that FIN/ACK won't take place. It sounds like the printer firmware interprets any connection closed signal (regardless of whether it is reset, times out, or finishes cleanly) as an instruction to print.

This isn't the behaviour used by other application layers. If I'm sending an email and the connection is reset, my email client will tell me so and the server will throw away the email at the other end. Not deliver half a corrupted email.
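
The email-style behaviour described above could be sketched as a receiver that only releases a job when the connection closed cleanly and the data ends with an expected end-of-job marker. Everything here is hypothetical: the class `JobReceiver`, the `CloseReason` values, and the assumption that a job ends with the PJL Universal Exit Language sequence (`<Esc>%-12345X`) are illustrative choices, not a description of how this printer's firmware behaves.

```python
from enum import Enum
from typing import Optional


class CloseReason(Enum):
    CLEAN = "clean"      # orderly shutdown by the sender
    RESET = "reset"      # connection reset
    TIMEOUT = "timeout"  # no data received for too long


# Assumed end-of-job marker: the PJL Universal Exit Language sequence.
END_OF_JOB = b"\x1b%-12345X"


class JobReceiver:
    """Buffers incoming data; releases it only if the job looks complete."""

    def __init__(self) -> None:
        self.buffer = bytearray()

    def feed(self, chunk: bytes) -> None:
        self.buffer.extend(chunk)

    def close(self, reason: CloseReason) -> Optional[bytes]:
        """Return the job to print, or None if it should be discarded."""
        complete = (reason is CloseReason.CLEAN
                    and self.buffer.endswith(END_OF_JOB))
        job = bytes(self.buffer) if complete else None
        self.buffer.clear()
        return job


full_job = END_OF_JOB + b"@PJL ENTER LANGUAGE=PCL3GUI\r\n<raster>" + END_OF_JOB

ok = JobReceiver()
ok.feed(full_job)
printed = ok.close(CloseReason.CLEAN)        # job released

dropped = JobReceiver()
dropped.feed(full_job[:20])                  # WiFi lost part-way through
discarded = dropped.close(CloseReason.RESET)  # None: nothing printed
```

One obvious cost of this design is that the whole job must be held until the connection closes, which ties back to the memory concerns raised earlier in the thread; a real device would more likely need to cancel a page in progress than buffer everything.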

dansdaduk · ‎04-15-2017

>> ... mention TCP. I do wonder if this is really an issue between the transport layer and the application layer?

I've no idea, sorry - I'm out of my comfort zone with communications protocols, flow-control techniques, etc., and the interaction between layers.

As my previous response may have indicated, my area of expertise is Page Description Languages (in particular, PCL5 and PCL6 (PCL XL)).
