• Aaron Hussey

Technology focus... importance of understanding data compression

Data collected from process parameters is often stored in data historians with time records of one minute or more between samples. For example, a main steam temperature process parameter might be stored as 1050 degrees Fahrenheit (degF) at timestamp 2/2/2020 11:59:00 AM EST. At timestamp 2/2/2020 11:59:01 AM EST, unless the value has changed by more than 0.5 degF, the stored value remains 1050 degF when, in fact, the measured value might have been 1050.4 degF. When considering the dynamic conditions across the entire plant cycle, this practice of “data compression” effectively loses the ability to assess historical operations at a fidelity where dynamic conditions are captured. This matters for two reasons: 1) process control optimization is limited to a macro understanding of dynamic behavior; and 2) equipment reliability programs are based on overall parameter thresholds instead of actual process dynamics.
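The storage behavior described above is commonly called deadband (or exception) compression: a new value is archived only when it differs from the last stored value by more than a configured limit. A minimal sketch in Python, assuming the 0.5 degF deadband from the example (the function name and sample values are illustrative, not from any particular historian product):

```python
def deadband_compress(samples, deadband=0.5):
    """Archive a sample only when it moves more than `deadband` away
    from the last stored value; otherwise repeat the stored value."""
    stored, last = [], None
    for value in samples:
        if last is None or abs(value - last) > deadband:
            last = value
        stored.append(last)
    return stored

# Hypothetical one-second main steam temperature readings in degF.
raw = [1050.0, 1050.4, 1050.6, 1050.2, 1049.4]
print(deadband_compress(raw))  # [1050.0, 1050.0, 1050.6, 1050.6, 1049.4]
```

Note that the 1050.4 degF reading is lost entirely: it never moved far enough from the stored 1050.0 degF value to be archived.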

As an example of the effect data compression has on data fidelity, consider the figure below. On the left, raw main steam temperature (blue) is plotted against compressed (stored) main steam temperature (orange). There are flat regions in the orange line where the stored value does not change, illustrating a loss of data fidelity. On the right, final feedwater temperature is subtracted from main steam temperature; the blue line shows the uncompressed result and the orange line the compressed result. In the highlighted area, the two differ by at least 1.2 degF.
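The right-hand panel illustrates a compounding effect: when two parameters are compressed independently, their deadband errors can add in a derived quantity such as a temperature difference. A short sketch, using hypothetical slow ramps (one rising, one falling) that each stay inside a 0.5 degF deadband:

```python
def deadband_compress(samples, deadband=0.5):
    """Archive a sample only when it moves more than `deadband` away
    from the last stored value; otherwise repeat the stored value."""
    stored, last = [], None
    for value in samples:
        if last is None or abs(value - last) > deadband:
            last = value
        stored.append(last)
    return stored

# Hypothetical readings in degF: main steam drifts up, feedwater drifts down.
main_steam = [1050.0, 1050.2, 1050.4]  # stored as 1050.0 throughout
feedwater  = [630.0, 629.8, 629.6]     # stored as 630.0 throughout

true_diff = [m - f for m, f in zip(main_steam, feedwater)]
stored_diff = [m - f for m, f in zip(deadband_compress(main_steam),
                                     deadband_compress(feedwater))]
# Error in the derived difference grows toward the sum of the two deadbands.
print(round(true_diff[-1] - stored_diff[-1], 1))  # 0.8
```

Each individual signal is within its deadband of the stored value, yet the stored difference is already 0.8 degF off, consistent with the larger discrepancy seen in the figure.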

Figure 2 Main Steam Temperature Data Compression (Left); Main Steam Temperature minus Final Feedwater Temperature (Right)

In order to capture truly dynamic conditions in plant data historians, better data compression algorithms are needed. It would be impractical to store all data at high frequency (e.g., storing every 0.001 degF change in main steam temperature every millisecond). Compression of thousands of process parameters is necessary because of the sheer quantity of data: samples stored every millisecond for 1,000 parameters would quickly fill even the largest data archive systems we have today. The recommendation in the meantime is to evaluate and, where possible, reduce compression limits on data, recognizing that there is a point of diminishing returns for every instrument loop.
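The storage burden of uncompressed archiving is easy to estimate. A back-of-the-envelope sketch, assuming a hypothetical 16 bytes per stored sample (e.g., an 8-byte timestamp plus an 8-byte float; actual historian storage formats vary):

```python
parameters = 1_000
samples_per_second = 1_000   # one sample every millisecond
bytes_per_sample = 16        # assumed: 8-byte timestamp + 8-byte value
seconds_per_day = 86_400

bytes_per_day = parameters * samples_per_second * bytes_per_sample * seconds_per_day
terabytes_per_year = bytes_per_day * 365 / 1e12
print(round(terabytes_per_year))  # 505
```

Roughly half a petabyte per year for just 1,000 parameters, before indexing overhead, which is why selective reduction of compression limits is more practical than abandoning compression altogether.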

Please write me at aaron.hussey@int-analytics with comments or questions.


©2020 by Integral Analytics, LLC.