Posted by Jeff / 22 April 2017 / Tutorials / Hits: 3

Problem Statement:

Maybe it was seeing a young David Copperfield, but I simply could not get a recent article about Google's new JPEG compression tool out of my mind.

Firstly, I love saving.  Wasting time, money, clock cycles, and bytes hurts my very being.  Saving is, for me, efficiency, and efficiency is the ultimate goal of all systems.

But JPEG compression?  This is not something that I typically care about, but for whatever reason I was smitten - I needed to know more.

Off to GitHub to learn more, and wouldn’t you know it - Google was so kind as to offer multiple means of installation.  After a simple brew command I was up and running, Figure 1 - or was I?
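For anyone following along, the brew command in question was presumably along these lines (assuming Homebrew is already installed):

>brew install guetzli <enter>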

Figure 1. Guetzli installed and ???.

Sure, I had the software installed, but I didn’t know anything at all about using it.  I don’t typically expect much from “man COMMAND”, but all I got was “No manual entry for guetzli”.

Wunderbar!

Hmm, what about:

>guetzli <enter>  

Success - a listing of flags with minimal info.  I had what I needed:  --quality #

I figured that this command’s power comes from the quality sought.  Though there was no mention of “limits”, I entered an old standby: “11”.  This was a great choice, as I immediately got an error explaining that I had to pick 84 or greater.

So I needed to figure out how changing this value affected the output.  Playing around, I realized that the amount of time each run required was rather insane.  It is my hope that the systematic analysis offered below will help others understand how best to use this new tool.

Methodology: 

Computer Used

  • Late 2009 iMac 27” 2.8 GHz Intel Core i7

  • 32 GB 1333 MHz DDR3 Memory (yeah, you actually can do that)

  • 1.0 TB SSD Hard drive

  • OS X 10.11.6 

Test Image   

I selected as my test image a photo of myself with my grandmother, shown in Figure 2.  It was shot on a Sony NEX-3 camera.  I felt that it contained a number of complex elements:

  • Geometric lines

  • Color rendering  

  • Human subjects 

Lastly, the image size was 2,892,405 bytes (2.9 MB).  Together these give a number of different attributes that can be tested.  A link to this test image is provided in the Appendix.

Figure 2. Test image (TestImage.jpg) used for this experiment of myself and my Grandmother.

The Command

The general form of this command is:

>guetzli --quality # inputFile.jpg outputFile.jpg

E.g., for encoding at the minimal quality setting of 84:

>guetzli --quality 84 TestImage.jpg test84.jpg <enter>

Procedure 

The first portion of this experiment centered around how the --quality flag affected the time needed as well as the size of the output file.  As time is highly dependent upon the computer, time spent executing the command was normalized as a percentage of the time required at a quality setting of 84.  Treating time as a percentage relative to this baseline allows for easy comparison across different machines.

To simplify the comparison of file size reduction, each output file’s size was likewise normalized as a percentage of the initial test image’s size in bytes.
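As a concrete illustration of this normalization (the numbers in this example are hypothetical, chosen only to show the arithmetic):

time% = 100 × time(quality) / time(84)
size% = 100 × size(output) / size(original)

So a run taking 642 seconds against the 1,284-second quality-84 baseline would be recorded as 50%, and a 1.16 MB output against the 2,892,405-byte original would be recorded as roughly 40%.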

The first order of the day was the creation of a shell script that would increase the quality level, starting at 84 (the program’s stated minimum) and incrementing to 120.  This script would take the test image and apply the guetzli command whilst timing how long the run took.  Each output file was then saved, along with a report of its size.

During initial pre-testing, it was observed that the testing computer was getting a solid workout.  This was seen in Activity Monitor, as well as by touching the back of the computer and feeling the heat given off.  It was therefore decided to pause for 10 minutes between runs, allowing the system to cool off and return to a steady state.
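The script itself isn’t reproduced in this post; the sketch below is a rough reconstruction of what such a sweep might look like, assuming guetzli is on the PATH, TestImage.jpg is in the working directory, and the BSD/macOS form of stat is available.  The output file names and the results log are illustrative.

#!/bin/bash
# Rough reconstruction (not the original script): sweep guetzli quality from 84 to 120,
# timing each run and recording the output file size, with a 10-minute cool-down between runs.
IN=TestImage.jpg
for (( q=84; q<=120; q++ )); do
    start=$(date +%s)
    guetzli --quality "$q" "$IN" "test${q}.jpg"
    end=$(date +%s)
    size=$(stat -f%z "test${q}.jpg")   # output size in bytes (BSD/macOS stat)
    echo "quality=$q time=$((end - start))s size=${size} bytes" >> results.txt
    sleep 600                          # 10-minute cool-down so the machine returns to steady state
done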

The second portion of the experiment is more concerned with whether a particular quality level is actually usable, and is hence highly subjective.  Each output file was saved and a select number chosen for further review and analysis, which consisted of blowing up regions of the image and comparing them against the original.

Results and Discussion:

Quality Flag

Both time spent and output file size were converted to percentages, as detailed in the procedure.  Next, these values were plotted against quality in a scatter plot, Figure 3.  The command offers little in the way of help or a man page explaining the quality flag; however, the data indicated a point of diminishing returns around a quality setting of 100.

Figure 3.  A scatter plot of the data normalized as percentages.

As can be seen from Figure 3, at a quality setting of 100, denoted with a red line, both time and file size plateaued.  Across this plateau, the average time of execution was 740 seconds with a standard deviation of 21 seconds.  The output file size, again as a percentage of the test image’s size, was more telling: the average was 97% with a standard deviation of 0%.

From these results, it is safe to say that one should only run the command with the --quality flag set from 84 up to, but not including, 100.  Should one run every image through this command with a setting of 84?  If you expect your image to be downloaded extensively, then it makes sense to pay the high computational cost upfront.  If, however, you don’t have reason to believe the image will be viewed many times, then the decision is a bit more nuanced.

Both time and file size reduction, over the quality bounds [84, 100), were further fitted with a linear least-squares model, Figure 4.

Figure 4.  Linear modeling of a subset of the data.

Linear Models 

The linear equations for Time(Quality) and SizeReduction(Quality) are shown in Equations 1 and 2 respectively, along with the R² goodness of fit for each model.

  • Time(Quality) = −9.731×10⁻³ (sec/quality) × Quality + 1.816 sec; R² = 0.570 Eqn. 1
  • SizeRed(Quality) = 3.10×10⁻² (%Red/quality) × Quality − 2.163 %Red; R² = 0.999 Eqn. 2
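As a quick sanity check of Equation 2 (a worked example, not additional data): at a quality of 90 the fit gives 3.10×10⁻² × 90 − 2.163 ≈ 0.63, i.e. a normalized size value of roughly 63%, sitting between the ~44% the fit predicts at quality 84 and the ~94% it predicts at quality 100.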

A linear model is a better fit for understanding size reduction as a function of quality, as seen from the R² of ~100% for Equation 2.  Although time as a function of quality appears less suited to a linear model, for the purposes of this experiment we will accept an R² of ~60%.  How best to optimize between these two lines, minimizing time whilst maximizing file size reduction?  This is where it becomes exceedingly difficult to answer definitively, as the processing speed of your computer, connection quality to the internet, and server load are critical factors that will vary greatly.
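One rough way to frame that trade-off (a back-of-the-envelope sketch, not a rigorous model, where N is the number of times the image is served, S the file sizes, and bandwidth the effective transfer rate) is that compression pays off when:

N × (S_original − S_compressed) / bandwidth  >  T_compression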

Indeed, any sort of modeling is borderline wild speculation due to the parameters varying so wildly for each use case.  In the case of the computer used, a late 2009 Apple iMac i7, 1284 seconds was used as the baseline (100%).  If this computer were a web server, providing a 2.5 MB image 100,000 times during the effective lifespan of this file, operated over WiFi (802.11n) with a theoretical speed of 56 MB/sec, then the total time spent transmitting would be roughly 45,000 seconds.  In this scenario, using the highest compression setting would result in a savings of nearly 12 hours.  However, if it is more reasonable to assume that this image will be viewed at most 200 times during its effective lifespan, the result is a net loss of nearly 1275 seconds.  Again, all of this is pure silliness, as we have assumed a zero cost of storage, ignored the ongoing operating costs of a server, etc.

Is there a sweet spot?  Probably not, and if this is being performed professionally then the calculus is exceptionally easy: perform the greatest reduction.  It is much more important to ensure a great user experience, and to increase the likelihood of a sale, than to save a few clock cycles.

Quality Analysis

As for the quality of the output that is generated, regions of the original test image were selected as test patches for:

  1. Geometric lines

  2. Color rendering

  3. Human shapes

The regions selected can be seen in Figure 5 and address the above three measures.  These regions were blown up significantly in Photoshop to provide insight into how they might differ from the uncompressed test image.  As it was shown that images above a quality setting of 100 were effectively the same size, only images below this threshold were considered.  In the interest of time, only images created with quality settings of 84, 89, and 94 were examined.

Figure 5. Regions selected for additional analysis.

The regions addressing how the tool performs on geometric lines are labeled as such in Figure 5.  Each of these labeled portions was then magnified by 444% in the uncompressed and three compressed files.  Finally, a composite image was created to allow for qualitative comparison between all four images.  Figures 6 through 9 present these regions and files for further inspection.

Figure 6.  Geometric portion of the test image.

As expected, at the greatest level of compression (quality 84) the image was a bit more blurred than the uncompressed file, and the straight geometric lines help to observe this phenomenon.  The lines of the guardrail seem to bend slightly clockwise due to this blurring effect.  This “bending” was less pronounced at lower amounts of compression.  Overall, though, the researcher was unable to see any significant differences between the maximally and minimally compressed files.  If you are going to use this tool and have sufficient computational and time resources, then you ought to consider using the lowest possible value for the quality flag.

Figure 7.  Color rendering portion of the test image.

Initially it was thought that an easy-to-perform and easy-to-understand test of dynamic range would be possible.  This proved too difficult, so a comparison was instead made between the uncompressed and compressed images with respect to how well color was rendered.  This comparison was based solely upon visual inspection, with all four images appearing to have the same color reproduction.

It was realized that greater analysis was needed, so three points in the image were selected and their color information extracted using Adobe Photoshop.  The three points are shown in Figure 8 and were selected because they best characterized the red, green, and blue portions of the image.  Next, the color information of these three pixels was determined for each image.

Figure 8. Pixels selected for color analysis.

The four image files were individually analyzed with respect to these three pixels for their color information. This information can be found in Tables 1 through 3.  
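As an aside for readers without Photoshop, a comparable single-pixel readout could be scripted with ImageMagick (assuming it is installed; the pixel coordinates below are placeholders, not the actual points used):

>convert test84.jpg -crop 1x1+1200+800 -colorspace CMYK txt:- <enter>

This prints the color value of the single cropped pixel, which can then be compared across the four files; the numbers may not match Photoshop exactly, since color management differs, but the comparative purpose is the same.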

BLUE PIXEL                     C      M      Y      K
Uncompressed                   81     49     28     5
Quality 84                     81     49     28     5
Quality 89                     81     49     28     5
Quality 94                     81     49     28     5
Average : Standard Deviation   81:0   49:0   28:0   5:0

Table 1.  Blue pixel detailed color information

Using the CMYK color model on the blue pixel, labeled “B” in Figure 8, all four images were processed in the same way.  The pixel was selected using a custom action built in Photoshop, and its color information extracted.  This yielded the same values, with a standard deviation of zero, regardless of whether the file was compressed or not, Table 1.

RED PIXEL                      C      M      Y      K
Uncompressed                   47     70     43     17
Quality 84                     47     70     43     17
Quality 89                     47     70     43     17
Quality 94                     47     70     43     17
Average : Standard Deviation   47:0   70:0   43:0   17:0

Table 2.  Red pixel detailed color information

Table 2 provides the CMYK values obtained for the red pixel, labeled “R” in Figure 8.  Again the same result was seen, with no change in values due to compression.

GREEN PIXEL                    C      M      Y      K
Uncompressed                   72     56     48     26
Quality 84                     72     56     48     26
Quality 89                     72     56     48     26
Quality 94                     72     56     48     26
Average : Standard Deviation   72:0   56:0   48:0   26:0

Table 3.  Green pixel detailed color information

The green pixel, labeled “G” in Figure 8, provided no new information.  The compression did not result in any measurable change using this test method, as seen in Table 3.

Selecting a pixel and determining its CMYK values as a function of compression resulted in no measured change.  This may not be a well-suited test, but it was a first attempt by the researcher to probe the tool’s effect upon color as the quality level changes.

Figure 9.  Human subject portion of the test image - talk about taking one for the team.

This region of the image was selected due to the complexity and organic nature of the forms.  Besides the increased blurriness already seen earlier, there is little difference discernible to this researcher’s eye.  In fact, this blurring may even be considered a slight advantage, due to its smoothing effect.

Conclusion: 

Google’s image compression tool, guetzli, was studied using a variety of methods.  The correct form of the command was determined to be:

>guetzli --quality # inputFile.jpg outputFile.jpg

Here “#” refers to the quality level, which greatly affects both the time required and the resulting file size reduction.  The setting of 84 requires the most time and results in the greatest file size reduction, generating an output file that is around 40% of the uncompressed size.  Due to the complexity of deciding whether or not maximal compression is needed, no general guidance is offered.

As for the actual quality of the compressed image as a function of compression, three levels were analyzed with respect to geometric forms, color variation, and complex or human forms.  Apart from an increase in blurring, compression yielded images that were nearly as good as the original uncompressed image.  No changes to color quality were detected using either visual inspection or CMYK analysis.

The guetzli tool was studied due to the researcher’s complete lack of prior understanding, and a desire to explore something outside of their wheelhouse.  The testing methods deployed may have been too simple, and it would not be surprising if they deviate from commonly used practices.  However, it is hoped that some of the information contained within this experiment may be of help to others exploring this interesting tool.

References: 

Appendix:
