html to pdf conversion suggestions

Discussion in 'Geek Cave: Computers, Tablets, HT, Phones, Games' started by atomicbob, Jan 17, 2021.

  1. atomicbob

    atomicbob dScope Yoda

    Pyrate BWC MZR
    Joined:
    Sep 27, 2015
    Likes Received:
    18,829
    Trophy Points:
    113
    Location:
    On planet
    Background
    My dScope reporting tool creates html output with hot links from the index to specific data in the pages that follow. Forums don't allow html reports to be posted for very good security reasons. This forum also limits attached file size to 1 MB. Up to now I've been able to use google chrome and print to pdf and keep my reports just under 1 MB.

    Problem
    Google recently changed Chrome, which has increased the pdf output file size. My reports are now running 1.1 to 1.3 MB, and even using jpdftweak with better compression checked I can't get the report pdf file size below 1.018 MB.

    Using other browsers to create pdf reports results in formatting issues that seriously degrade the report's visual quality.

    Question
    Is there a Windows 7 compatible utility for converting html to pdf that will allow me to replicate what I was achieving before?
    I do not wish to use online or cloud based tools.

    Long shot, and I highly doubt it, but is it possible to revert the Chrome browser's pdf print function to its earlier behavior?

    Or would it be possible to have the forum file size limit increased for this particular type of report to 1.5 MB?

    Until I solve this I will be unable to post dScope reports on the various components in the queue now.
     
  2. Armaegis

    Armaegis Friend

    Pyrate BWC
    Joined:
    Sep 27, 2015
    Likes Received:
    7,537
    Trophy Points:
    113
    Location:
    Winnipeg
    @atomicbob
    Do you have any other pdf software on the computer? Readers like Adobe Acrobat, Foxit, Nitro, etc. all install a pdf "printer" that you can select from the print menu, and they usually have settings that let you adjust the file size/quality.

    As a stopgap, I do have a licensed copy of Kofax Power PDF editing software and I'd be willing to try compressing/optimizing some file sizes for you. I was laid off just recently and find myself with an abundance of extra time now, so I'd be happy to help out.
     
  3. Syzygy

    Syzygy Friend

    Pyrate
    Joined:
    Jun 13, 2018
    Likes Received:
    2,144
    Trophy Points:
    93
    Location:
    DFW, Texas
    I sent a PM; I'd like to look at what the tool renders to get a better idea of what would be helpful. If the graphs are SVG-based, we can just run them through a tool that translates them to PNG files that you can upload as images.

    If you need help with some report text also (that is not inside an image), we can look at that too.
     
  4. atomicbob

    atomicbob dScope Yoda

    @Armaegis - very sorry to hear you are on an unpaid corporate mandated "holiday". I hope you find a suitable alternative soon.
    Thank-you for your offer. Kofax Power PDF looks like the type of utility I seek. If it does the job it would be stable for long periods of time, unlike cloud based solutions which can change at any time without notification.
    @Syzygy - thank-you for your offer of assistance too.
     
  5. Armaegis

    Armaegis Friend

    Thanks Bob. I was upset with myself at first, but then I met my replacement who turns out to be the son of the company's new GM... hmm... if that's the way things are going, then I'm better off.
     
  6. Syzygy

    Syzygy Friend

    I think I've found a solution that should work for you.

    Recommended, because it's easiest:

    1. Prerequisite: install Ghostscript on your computer. This is an open-source PostScript® rendering engine (PDF grew out of PostScript and shares its imaging model). Choose the installer under the GNU Affero General Public License column.

    2. Prerequisite: install Python3 on your computer. This is a general-purpose programming and scripting language. I'd choose the Windows 64-bit installer version.

    3. Install the script that drives the PDF compression for you. This is a fairly nice wrapper around the Ghostscript command line needed to do the compression. Most people will name this script cpdf or cpdf.py. This link is to the raw Python script file; select-all, copy, and paste it into a new Notepad document, and save it somewhere easy to get to from the command prompt.

    4. The above script needs a Python library to display its fancy progress bar. You install the library by using the command prompt: pip3 install pyprind

    Now we're all set up to run the script and compress PDFs!

    Each time you have a PDF to compress, and assuming that python3 was added to your PATH by its installer, you will use a command from the command prompt with this pattern: python3 cpdf default [input-pdf-name].pdf [compressed-pdf-name].pdf

    You can put that in a .bat file if it works without doing anything else.

    The first argument to the script is the target for the compressed file; it must be one of the values: default, screen, ebook, printer, prepress. More information is available with the command python3 cpdf types

    -----

    If there are problems, please post here and I'll be happy to help get this up and running for you. I've made educated guesses about some of the Windows-y stuff above, since I quit Windows in 1999 and now run various Unixes. But I can set up a virtual machine and go through this process if necessary to see where the problem lies.

    Possible problems above: python3 may need to be just python; the prior version of the language just reached end of life, so they may have dropped the numeric suffix. Same with pip3.

    Hopefully the Windows Python installer put it on your path for you. Otherwise you'll need to massage the PATH in your environment variables so that CMD.EXE can find Python3.exe
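
    A quick way to check the PATH assumption above is a few lines of Python. This is a minimal sketch and not part of the cpdf script: the helper name find_on_path is made up for illustration, and the candidate names are assumptions to adjust (gswin64c and gswin32c are the console executables the Windows Ghostscript installer normally provides).

```python
import shutil


def find_on_path(candidates):
    """Return the full path of the first candidate found on PATH, else None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    # Candidate names are assumptions; edit them to match your install.
    for label, names in [
        ("Python", ["python3", "python", "py"]),
        ("pip", ["pip3", "pip"]),
        ("Ghostscript", ["gs", "gswin64c", "gswin32c"]),
    ]:
        print(label, "->", find_on_path(names) or "not found on PATH")
```

    Anything reported as not found needs its install folder added to PATH before the commands in this post will run from the command prompt.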

    I ran the tool on the PDF contained in the zip file you emailed, and here's the output, so it looks promising:

    Compressing...
    0% [##############################] 100% | ETA: 00:00:00
    Total time elapsed: 00:00:01
    0
    Title: Compressing...
    Started: 01/17/2021 17:52:50
    Finished: 01/17/2021 17:52:52
    Total time elapsed: 00:00:01

    Compressed!

    dac.pdf is 1.160069MB in size.
    test.pdf is 0.9MB after compression.


    I first tried with the screen output target, and the graphs weren't of acceptable quality for me.
     
  7. Syzygy

    Syzygy Friend

    Ha, alternatively, you can skip steps 2-4 above, and directly use Ghostscript. The command corresponding to the usage I provided above is:

    gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.6 -dPDFSETTINGS=/default -dNOPAUSE -dQUIET -dBATCH -sOutputFile="[compressed-pdf-name].pdf" "[input-pdf-name].pdf"

    And of course you can put that into a .BAT or .CMD file too.

    I'm also hoping the Ghostscript installer put itself on your PATH.

    edit: I took this command directly out of the python script referenced in the prior post. The /default is the output target setting.
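
    For anyone who would rather drive Ghostscript from Python than from a .BAT file, the command above could be wrapped roughly like this. This is a minimal sketch, not the cpdf script itself: the function names build_gs_command and compress_pdf are made up for illustration, and gs_exe defaults to gs as in the command above (on Windows the console executable is typically gswin64c or gswin32c, so that default is an assumption to adjust).

```python
import subprocess

# Output targets named in the thread; each maps to a -dPDFSETTINGS value.
TARGETS = ("default", "screen", "ebook", "printer", "prepress")


def build_gs_command(input_pdf, output_pdf, target="default", gs_exe="gs"):
    """Return the Ghostscript argument list for compressing input_pdf."""
    if target not in TARGETS:
        raise ValueError(f"unknown target {target!r}; expected one of {TARGETS}")
    return [
        gs_exe,
        "-sDEVICE=pdfwrite",
        "-dCompatibilityLevel=1.6",
        f"-dPDFSETTINGS=/{target}",
        "-dNOPAUSE",
        "-dQUIET",
        "-dBATCH",
        f"-sOutputFile={output_pdf}",
        input_pdf,
    ]


def compress_pdf(input_pdf, output_pdf, target="default", gs_exe="gs"):
    """Run Ghostscript; raises CalledProcessError if it exits non-zero."""
    subprocess.run(build_gs_command(input_pdf, output_pdf, target, gs_exe), check=True)


if __name__ == "__main__":
    # Show the command for the file names used in the earlier test run.
    print(" ".join(build_gs_command("dac.pdf", "test.pdf")))
```

    Passing the arguments as a list avoids quoting problems with file names that contain spaces.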

    All those online services are just running Ghostscript to do this for you (and prolly stealing your doc at the same time).


    Another edit: the Windows Ghostscript installer may install a GUI frontend to it. You can probably use it (you'd have to find the appropriate settings in it, as above), and do it with a nice windowy user experience.
     
    Last edited: Jan 17, 2021
  8. atomicbob

    atomicbob dScope Yoda

    Thank-you @Syzygy for putting the effort into these two solutions.

    You have hit upon one of at least two reasons for my dislike of online or cloud based "services". Another is that they can change performance without notice at any time. I will look into Ghostscript.

    A batch file works well, as I am already using sed in a batch file to convert some of the html lines as necessary to preserve format compatibility with browser pdf printing.

    Did you test run the sample html report with png files I sent? Could you send me a copy of the pdf output so I can compare to the previous method output?
     
    Last edited: Jan 17, 2021
  9. atomicbob

    atomicbob dScope Yoda

    @Syzygy - ghostscript solution is very close. Hot links are correctly preserved, formatting is correct. The png graph quality deteriorated a little more than I would like. Is there a switch that leaves the png graphs alone? They were already resized and compressed with a utility called PhotoMarks.
     
  10. Syzygy

    Syzygy Friend

    I didn't try the printer, prepress, or ebook settings. I'd suspect that printer or prepress might be the best, but might increase the file size.
     
  11. Syzygy

    Syzygy Friend

    Sent emails with the PDFs; I didn't want to upload them here and steal your thunder! The gist is that screen is poorest quality (too poor), default looks decent on my MacBook and my 4k monitor at default PDF viewer settings; ebook looks decent too, I didn't study enough to discern a difference from default; and both printer and prepress increase the file size.
     
  12. atomicbob

    atomicbob dScope Yoda

    @Armaegis hit upon the idea that I don't need 32 bit color depth in the graphs. Converting to 16 bit color depth is unnoticeable and allows me to continue to use Google Chrome pdf conversion while remaining under 1 MB file size. I have a utility called IrfanView which allows batch color bit depth conversion. Another step in the posting process, sigh. But at least there is a way, thank-you @Armaegis

    @Syzygy's solution is very close but the quality is just a little less than the color bit depth solution. So thanks to @Syzygy for introducing me to another tool for the box of solutions, Ghostscript.
     
  13. haywood

    haywood Friend

    Pyrate
    Joined:
    Oct 22, 2015
    Likes Received:
    764
    Trophy Points:
    93
    You can probably find a version of Chromium (the open source version of Chrome) from before the patch that made pdfs bigger. Alternatively, something like Brave might not have incorporated that code yet and is based on Chromium, so it should have similar output.
     
  14. Syzygy

    Syzygy Friend

    Excellent! Yeah I didn't get to the next stage…reducing the image sizes independently, hoping we could just compress the PDF further. This should give ~50% space savings on the images.

    There's rarely a need to use >16 bits for these types of things. The 10-bit-per-channel color rendition is just coming to TVs (expensive photography monitors will do 14 bits, IIRC), and 24 or 32 bits per pixel are really only needed for editing overhead in photography/videography.
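
    The ~50% figure follows from simple arithmetic on raw pixel data. This is a minimal sketch under stated assumptions: the function names are made up for illustration, the 1280x720 graph size is hypothetical, and the numbers describe uncompressed pixels only, so the savings in an actual PNG or PDF will differ once compression is applied.

```python
def raw_image_bytes(width, height, bits_per_pixel):
    """Size in bytes of the uncompressed pixel data for one image."""
    return width * height * bits_per_pixel // 8


def savings_fraction(old_bpp, new_bpp):
    """Fraction of raw pixel data saved by reducing the bit depth."""
    return 1 - new_bpp / old_bpp


if __name__ == "__main__":
    # 1280x720 is a hypothetical graph size, not taken from the reports.
    for bpp in (32, 16, 8):
        size = raw_image_bytes(1280, 720, bpp)
        print(f"{bpp:>2} bpp: {size} bytes raw, "
              f"{savings_fraction(32, bpp):.0%} saved vs 32 bpp")
```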
     
  15. Armaegis

    Armaegis Friend

    Awesome, I'm glad to hear it worked out!
     
  16. mitochondrium

    mitochondrium Friend

    Pyrate
    Joined:
    Sep 1, 2017
    Likes Received:
    1,115
    Trophy Points:
    93
    Location:
    A Cell
    Nice to have found a solution. Luckily bit reduction is easier on the eye than on the ear
     
  17. atomicbob

    atomicbob dScope Yoda

    Thanks to conversations with both @Armaegis and @Syzygy I gained a better understanding of png and pdf properties. Armed with more knowledge I found this solution, wkhtmltopdf:
    https://wkhtmltopdf.org/

    It is a lightweight command-line program that plays nicely in a batch file or makefile.

    Here are file size comparisons for a Morpheus dScope report html output converted to pdf:
    1171 K - google chrome version m87, png at 32BPP - where my problem began, filesize exceeds 1000 K forum limit
    975 K - google chrome version m83, png at 32BPP - this is what I used for the Morpheus technical measurements post
    958 K - wkhtmltopdf, png at 32BPP
    785 K - wkhtmltopdf, png at 8BPP, color depth reduction
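
    To put the sizes above in relative terms, the savings against the Chrome m87 output can be worked out as follows. This is a minimal sketch: the sizes are copied from the list above, and the names sizes_k, LIMIT_K and percent_smaller are made up for illustration.

```python
# Sizes in K, copied from the list above.
sizes_k = {
    "Chrome m87, 32BPP": 1171,
    "Chrome m83, 32BPP": 975,
    "wkhtmltopdf, 32BPP": 958,
    "wkhtmltopdf, 8BPP": 785,
}
LIMIT_K = 1000  # forum attachment limit as stated in the list


def percent_smaller(size, baseline):
    """How much smaller size is than baseline, as a percentage."""
    return (baseline - size) / baseline * 100


if __name__ == "__main__":
    baseline = sizes_k["Chrome m87, 32BPP"]
    for name, size in sizes_k.items():
        status = "fits" if size < LIMIT_K else "too big"
        print(f"{name}: {size} K, "
              f"{percent_smaller(size, baseline):.1f}% smaller, {status}")
```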

    wkhtmltopdf restores my previous working method while eliminating the need to open a browser and perform the manual print step. It will remain stable unless I choose to update it, rather than having updates foisted upon the user without option.
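
    For scripted use, the conversion and the size check could be combined roughly like this. This is a minimal sketch under stated assumptions: it uses only the basic wkhtmltopdf <input> <output> form with no extra options, the function names are made up for illustration, and LIMIT_BYTES is one reading of the forum's "1000 K" limit.

```python
import os
import subprocess

LIMIT_BYTES = 1000 * 1024  # assumed reading of the forum's "1000 K" limit


def build_wkhtmltopdf_command(html_path, pdf_path, exe="wkhtmltopdf"):
    """Basic invocation: wkhtmltopdf <input> <output>, with no extra options."""
    return [exe, html_path, pdf_path]


def fits_limit(path, limit_bytes=LIMIT_BYTES):
    """True if the file at path is strictly smaller than the limit."""
    return os.path.getsize(path) < limit_bytes


def convert_and_check(html_path, pdf_path):
    """Convert, then report whether the PDF is small enough to attach."""
    subprocess.run(build_wkhtmltopdf_command(html_path, pdf_path), check=True)
    return fits_limit(pdf_path)
```

    convert_and_check("report.html", "report.pdf") would return False for a report that still needs its graphs reduced in color depth; the file names here are placeholders.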

    *edit*
    The pdf files output from Google Chrome increased in size because of additional metadata tagging of objects in the pdf, beginning with version m85.
     
    Last edited: Jan 18, 2021
  18. rhythmdevils

    rhythmdevils MOT: rhythmdevils audio

    Pyrate
    Joined:
    Apr 15, 2020
    Likes Received:
    12,429
    Trophy Points:
    113
    Location:
    Bay Area, CA
    In case this doesn’t work out, and not to state the obvious, but as a stopgap you could always upload the images to Imgur, where the file size won’t matter. It’s better to attach them to the site because Imgur could disappear, but it would be fine for temporarily keeping your much appreciated work going. Once you figure out how to get lower file sizes, you could attach them then. You probably already thought of this, but since you mentioned not being able to continue your measurements I thought I’d mention it.
     
  19. atomicbob

    atomicbob dScope Yoda

    Thank-you for the thoughts. With the help of forum friends, multiple workable solutions have been presented, with a summary of those most likely to be used in post #17 above.

    The usual challenge of finding time amongst all the other tasks normally in my queue remains.
     
