Sunday, October 17, 2010

Data Mining in Manufacturing & Process Industries

The use of Neural Networks (NN) has found a “home” in the domain of industrial process control. At the same time, NNs are a core function of most popular data mining solutions. NN algorithms have been embedded in process control solutions, yet they are sometimes seen, or even presented, as a bit of a “black box” or “magic box”, largely because most process control engineers find it difficult to rationalize the output of an NN algorithm.


Root Cause Analysis (RCA) has traditionally relied on core statistical applications to identify the causes of plant equipment failure. By use or objective, RCA can be classified as:
  1. Safety-based RCA, which descends from the fields of accident analysis and occupational safety and health.
  2. Production-based RCA, which has its origins in quality control for industrial manufacturing.
  3. Process-based RCA, which is an “add-on” to production-based RCA, with a scope expanded to include business processes.
  4. Failure-based RCA, which is rooted in the practice of failure analysis as employed in engineering and maintenance.
  5. Systems-based RCA, which emerged as an amalgamation of the preceding uses, along with ideas taken from fields such as change management, risk management, and systems analysis.
In the course of an RCA initiative, the need to deal with substantial volumes of process data is well known. A single knowledge discovery problem addressed with data mining can involve hundreds of dependent and independent variables, and analyses performed without deep process knowledge often fail because the characteristics and behavior of the process in question are not sufficiently understood.
Data collection is therefore key: a data mining solution must be able to interface, sometimes in near real time, with plant databases residing in a control system (e.g. an Integrated Control and Safety System) or in a historian database. This capability is what allows an integrated solution to be deployed in a dynamic environment, rather than performing data mining analytics offline.
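As a minimal sketch of the near-real-time side, assume events arrive from the historian or control system as (timestamp in seconds, tag) tuples in time order. A rolling buffer can then keep only the most recent horizon of data for analysis. The class name, the tag names and the 60-second horizon are hypothetical, and no specific historian API is assumed.

```python
from collections import deque
from typing import Deque, List, Tuple

# Hypothetical event record: (timestamp in seconds, tag name).
Event = Tuple[float, str]


class RollingEventBuffer:
    """Keep only events from the last `horizon_s` seconds.

    Sketch only: in practice `push` would be fed by whatever interface the
    control system or historian exposes; no particular interface is assumed.
    """

    def __init__(self, horizon_s: float) -> None:
        self.horizon_s = horizon_s
        self._events: Deque[Event] = deque()

    def push(self, event: Event) -> None:
        # Events are assumed to arrive in timestamp order.
        self._events.append(event)
        self._evict(event[0])

    def _evict(self, now: float) -> None:
        # Drop events older than the horizon, measured from the newest timestamp.
        while self._events and self._events[0][0] < now - self.horizon_s:
            self._events.popleft()

    def snapshot(self) -> List[Event]:
        # Copy of the events currently inside the horizon, oldest first.
        return list(self._events)


if __name__ == "__main__":
    buf = RollingEventBuffer(horizon_s=60.0)
    for ev in [(0.0, "TI-101.HI"), (30.0, "PI-205.LO"), (95.0, "FIC-300.DEV")]:
        buf.push(ev)
    print(buf.snapshot())
```

Pushing events at 0 s, 30 s and 95 s with a 60-second horizon leaves only the 95 s event in the buffer. A deque is used because eviction always happens at the old end, which keeps each removal cheap.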
The approaches, principles or algorithms referred to for the retail industry in the previous section can be extended to the process industries by considering sequence patterns of a similar nature, except that the data sets are much larger, since the time resolution of such sequences can be seconds or even milliseconds.
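One simple way to carry retail-style sequence patterns over to process events is to count how often one event is followed by another within a short time gap. The sketch below assumes the same hypothetical (timestamp in seconds, tag) records. It counts ordered pairs only and is not a full sequential pattern mining algorithm such as GSP or PrefixSpan.

```python
from collections import Counter
from typing import Iterable, Tuple

Event = Tuple[float, str]  # (timestamp in seconds, event tag), hypothetical record


def count_followed_by(events: Iterable[Event], max_gap_s: float) -> Counter:
    """Count ordered pairs (a, b) where tag b occurs after tag a within max_gap_s.

    For each occurrence of a, each distinct follower b is counted at most once.
    Simultaneous events are skipped because their relative order is unknown.
    """
    ordered = sorted(events)  # by timestamp, ties broken by tag
    pairs: Counter = Counter()
    for i, (t_a, a) in enumerate(ordered):
        seen = set()
        for t_b, b in ordered[i + 1:]:
            if t_b - t_a > max_gap_s:
                break  # events are sorted, so all later ones are further away
            if t_b == t_a or b == a or b in seen:
                continue
            seen.add(b)
            pairs[(a, b)] += 1
    return pairs


if __name__ == "__main__":
    log = [(0.0, "TI-101.HI"), (2.5, "PI-205.LO"), (4.0, "XV-400.TRIP"),
           (120.0, "TI-101.HI"), (121.0, "PI-205.LO")]
    print(count_followed_by(log, max_gap_s=5.0).most_common(3))
```

In this example log, TI-101.HI is followed by PI-205.LO within 5 s on two separate occasions, so that pair receives a count of 2. At millisecond resolution, the gap parameter largely determines which patterns are found, so it would have to be chosen with process knowledge.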
Take the scenario of a huge data set with a long sequence of events and alarms: thousands of triggered flags or events, logged operator actions, and changes in battery limit conditions. Add changes being tracked due to heat exchanger fouling, catalyst characteristics, etc., and then try to rationalize their outcome and impact on the behavior of a continuous or discrete variable, or of a number of variables, be they trip events or an alarmed deviation of a safety or quality variable. Try also to visualize the endless number of dimensions (variables) involved, such as those related to plant or process areas, the time dimension itself, the operators involved, and the process unit operations or physical assets associated with the plant areas and units. It quickly becomes obvious that a data mining scenario of high complexity evolves. Yet this is where data mining brings value and makes its money: the data can be explored and analyzed in many ways, and also used for predictive purposes with the same techniques as in the retail example. It is by far a more complex situation than any other industry can offer.
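In the trip scenario above, a first exploratory step could be to look back a fixed interval from each trip and count which alarms, events or logged operator actions recur before trips. This is a minimal sketch with a hypothetical tag naming scheme and look-back length. It only counts co-occurrence in the window and says nothing about causation, which still requires process knowledge.

```python
from collections import Counter
from typing import List, Tuple

Event = Tuple[float, str]  # (timestamp in seconds, tag), hypothetical record


def precursor_frequency(events: List[Event], trip_tag: str,
                        lookback_s: float) -> Counter:
    """Count, per tag, how many trips it preceded within lookback_s seconds.

    A tag is counted at most once per trip, however often it fired in the window.
    """
    trip_times = [t for t, tag in events if tag == trip_tag]
    freq: Counter = Counter()
    for t_trip in trip_times:
        # Distinct non-trip tags in the half-open window [t_trip - lookback_s, t_trip)
        window = {tag for t, tag in events
                  if t_trip - lookback_s <= t < t_trip and tag != trip_tag}
        freq.update(window)
    return freq


if __name__ == "__main__":
    log = [(0.0, "TI-101.HI"), (5.0, "OP:FV-12 manual"), (10.0, "XV-400.TRIP"),
           (100.0, "TI-101.HI"), (108.0, "XV-400.TRIP"), (200.0, "PI-205.LO")]
    for tag, n_trips in precursor_frequency(log, "XV-400.TRIP", 10.0).most_common():
        print(f"{tag}: preceded {n_trips} trip(s)")
```

The scan over all events for each trip is quadratic in the worst case. That is acceptable for a sketch, but a historian-scale log would call for sorting once and using a two-pointer window.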
In the RCA example, once the data have been extracted, transformed and loaded for mining, rules describing associations, or the sequences of items as they occur in a transactional database, can be established. These rules are useful not only for the RCA problem at hand but for many other applications, including exploratory and predictive data mining, for example predicting runaway conditions in a catalytic reactor or preventing off-spec production.
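The support and confidence measures behind association rules can be sketched for the simplest case of single-item rules X -> Y. Each transaction is assumed to be the set of tags observed in one time window. The window definition, tag names and thresholds are assumptions, and this pairwise version simplifies Apriori-style mining, which also handles larger itemsets.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set, Tuple

Rule = Tuple[str, str]  # (antecedent X, consequent Y)


def pairwise_rules(transactions: List[Set[str]], min_support: float,
                   min_confidence: float) -> Dict[Rule, Tuple[float, float]]:
    """Single-item rules X -> Y with their (support, confidence).

    support(X -> Y)    = fraction of transactions containing both X and Y
    confidence(X -> Y) = count(X and Y) / count(X)
    """
    n = len(transactions)
    item_count: Counter = Counter()
    pair_count: Counter = Counter()
    for t in transactions:
        item_count.update(t)
        # sorted() gives each unordered pair one canonical (a, b) key
        pair_count.update(combinations(sorted(t), 2))
    rules: Dict[Rule, Tuple[float, float]] = {}
    for (a, b), count in pair_count.items():
        support = count / n
        if support < min_support:
            continue
        # Evaluate both directions of the pair as candidate rules.
        for x, y in ((a, b), (b, a)):
            confidence = count / item_count[x]
            if confidence >= min_confidence:
                rules[(x, y)] = (support, confidence)
    return rules


if __name__ == "__main__":
    windows = [{"TI-101.HI", "PI-205.LO", "XV-400.TRIP"},
               {"TI-101.HI", "XV-400.TRIP"},
               {"TI-101.HI", "PI-205.LO"},
               {"FIC-300.DEV"}]
    for (x, y), (s, c) in pairwise_rules(windows, 0.5, 0.9).items():
        print(f"{x} -> {y}: support={s:.2f}, confidence={c:.2f}")
```

A high-confidence rule whose consequent is a trip or an off-spec flag is a candidate early-warning indicator. Whether it reflects cause or mere co-occurrence still has to be judged by someone who knows the process.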