All day, every day, companies now collect enormous amounts of data about their products and customers, or simply through the use of their information systems. This increased availability of data raises one important question: how can the gathered data be used to generate meaningful information? The key is to connect the data and discover patterns. There are several buzzwords and terms you may already have heard in this context, such as Data Mining, Multivariate Statistics, Machine Learning and Artificial Intelligence. Sometimes it is hard to tell the difference between them.
In this post, we will explain the differences in an understandable way and present the different methods associated with “digging for data” – the “Gold of the 21st Century”.
Data Mining
Data Mining is the computing process of discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems. (“Data Mining Curriculum”. ACM SIGKDD. 30/04/2006. Retrieved 27/01/2014.)
The aim of Data Mining is to detect correlations in data that are interesting and useful to decision-makers. These can support better decisions and address issues that may have been tackled with suboptimal methods in the past. Statistical significance alone is certainly not sufficient to operationalize how interesting or useful a statement is.
For this reason, Data Mining combines methods from artificial intelligence, pattern recognition and machine learning, as well as models from the respective field of application. In addition, methods of (multivariate) statistics are used, in which several features of related observations serve as the basis, together with procedures for detecting and examining structures. [1]
In contrast to classical statistical approaches, data mining is not limited to testing manually formulated hypotheses; its main focus is the automatic generation of new hypotheses. [2]
The basic tasks involved in data mining are:
- Cluster analysis: Identification of groupings (clusters) within the data, e.g. customer segmentation (see the code sketch after this list).
- Association analysis: Analysis of the frequency of the simultaneous occurrence of objects or events, e.g. a Shopping Cart Analysis.
- Deviation analysis: Identification of unusual data records, e.g. discovery of fraudulent transactions or detection of input errors.
- Principal component analysis: Reduction of the data set to a compact, simplified description, e.g. in questionnaire evaluation (the questions “How satisfied were you with the product?”, “Would you buy the product again?” and “Would you recommend the product?” all relate to one underlying factor and can be summarized).
- Sequence analysis: Identification of patterns in chronologically successive events, such as click behavior on websites.
- Classification: Assignment of previously unknown objects to existing classes or groups, such as identifying spam or assigning customers to credit-rating classes.
- Regression analysis: Identification of relationships between dependent and independent variables, such as the relationship between product sales and marketing measures.
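As a small illustration of the cluster-analysis task above, here is a minimal sketch using scikit-learn's k-means implementation. The customer data, the two features, and the choice of three segments are invented for demonstration, not taken from the article:

```python
# Minimal cluster-analysis sketch for customer segmentation.
# Requires scikit-learn and NumPy; all data below is hypothetical.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical customers: [annual spend in EUR, orders per year]
customers = np.array([
    [12000, 45], [11500, 40], [300, 2], [450, 3],
    [5000, 15], [5200, 18], [250, 1], [13000, 50],
])

# Standardize the features so both carry equal weight in the distance.
X = StandardScaler().fit_transform(customers)

# Group the customers into three segments (k = 3 is an assumption).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
for customer, segment in zip(customers, kmeans.labels_):
    print(customer, "-> segment", segment)
```

Each resulting segment (e.g. high-value frequent buyers vs. occasional shoppers) could then be addressed with its own marketing strategy.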
The following are prerequisites for the application of data mining techniques:
- Analyzing the data in question is actually permitted
- Patterns observed in the past remain valid in the future
- The data is of high quality
- The data set contains the information needed to make forecasts
Data Mining projects can be divided into several phases. The Cross Industry Standard Process for Data Mining (CRISP-DM), developed in an EU project between 1996 and 1999, defines a six-phase, cross-industry standard process for Data Mining:
- Business Understanding: Set business goals, assess situation, set Data Mining targets, create project plan
- Data Understanding: Initial data collection, data description, exploratory data analysis, data quality verification
- Data Preparation: Data selection, data cleansing, data construction, data integration, data transformation and formatting
- Modeling: Selection of the modeling technique, test design, model building, model assessment
- Evaluation: Evaluation of the results, process revision, determination of further action
- Deployment: Deployment planning, implementation, monitoring and maintenance, preparation of the final report, project review
Data Mining is part of a higher-level process known as Knowledge Discovery in Databases (KDD).
Artificial Intelligence
Research in the field of “Artificial Intelligence” (AI) attempts to replicate human perception and human action with machines. Technologies such as machine learning, deep learning, and neural networks, which provide capabilities that until recently were the preserve of human beings, all fall under the umbrella of AI. While “strong” AI describes a previously unattained state in which a machine is capable of everything a human being is, “weak” AI deals with transferring individual human skills to machines, such as recognizing text or image content, playing games, or recognizing speech. Rapid progress has been made in this area in recent years. “Machine Learning”, “Deep Learning”, “Natural Language Processing” (NLP) and “neural networks” are therefore only sub-areas of AI, or even sub-areas within these sub-areas. [3]
Machine Learning
Machine Learning describes mathematical techniques that enable a system (machine) to independently generate knowledge from experience. ML algorithms are used to recognize patterns in existing data sets, to make predictions, or to classify data. Mathematical models can then be used to gain new insights from these patterns. Applications range from music and film recommendations in private life to improving marketing campaigns, customer service, or even logistics routes in business.
Although Machine Learning, like Data Mining, uses linear regression, decision tree algorithms, or cluster analysis, one major difference is that it focuses less on recognizing new patterns in data and more on developing appropriate models to discover known patterns in new data. [4]
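To make this distinction concrete, here is a minimal supervised-learning sketch with scikit-learn: a model is trained on data with known labels and then applied to new, unseen data. The toy features (message length, number of links) and all labels are invented for illustration:

```python
# Minimal classification sketch: learn a known pattern from labeled
# data, then apply it to new data. All values below are hypothetical.
from sklearn.tree import DecisionTreeClassifier

# Training data with known classes: 1 = spam, 0 = not spam.
# Features: [message length in characters, number of links]
X_train = [[900, 12], [850, 9], [120, 0], [200, 1], [700, 7], [150, 0]]
y_train = [1, 1, 0, 0, 1, 0]

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Apply the learned model to previously unseen messages.
X_new = [[780, 10], [90, 0]]
print(model.predict(X_new))  # expected: [1 0], i.e. spam / not spam
```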
Artificial Intelligence and Machine Learning (ML) are not particularly new technologies, but they have only begun to play a role in practical use relatively recently. The prerequisites for learning systems and the corresponding algorithms either did not exist until recently or were simply too expensive to meet. We now have sufficient computing capacity at our disposal, along with access to the very large amounts of data needed to make such solutions viable. According to a recent survey, German companies are already quite advanced: one in five companies in Germany said they were already actively using ML technology, 64 percent said they were taking a close interest in the topic, and as many as four out of five respondents said that ML would be one of the core technologies of the fully digitized company of the future. [5]
At the moment, machine learning, deep learning, and cognitive computing are just some of the better-known notions in a constellation of AI terms that are not easy to define. To distinguish them, two dimensions can be used: the field of application and the degree of autonomy. For the most part, ML systems have been developed and trained for specific applications. For example, they detect defective products during the manufacturing process as part of quality control. The task is clearly defined, and there is no leeway.
Deep Learning
Deep Learning systems, on the other hand, are able to learn independently. Given large amounts of training data, neural networks learn to make decisions on their own and can thus perform specific tasks – for example, identifying cancer cells in medical images.
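As a minimal illustration of a neural network learning from examples, here is a sketch using scikit-learn's small multi-layer perceptron on the classic XOR problem. Real deep-learning systems use far larger networks and dedicated frameworks such as TensorFlow or PyTorch; this toy example only shows the principle of learning a non-linear decision from training data:

```python
# Minimal neural-network sketch: a small multi-layer perceptron
# learns the XOR function, which no linear model can represent.
from sklearn.neural_network import MLPClassifier

X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]

# One hidden layer with 8 units; the lbfgs solver suits tiny data sets.
net = MLPClassifier(hidden_layer_sizes=(8,), solver="lbfgs",
                    max_iter=5000, random_state=1).fit(X, y)
print(net.predict(X))  # ideally [0 1 1 0]
```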
Cognitive Computing
The third type of AI is Cognitive Computing. These systems are distinguished by the fact that they can take on tasks and make decisions, either in an assistance role or even as a substitute for a human being, while dealing with ambiguity and uncertainty. Examples include case management in insurance, customer service hotlines, and diagnostics in hospitals. Even though a high degree of autonomy can already be achieved in such areas, there is still a long way to go to true artificial intelligence with autonomous cognitive abilities. In the meantime, companies would be well advised to take a close look at the feasible use cases, of which there are already many. [6]
Sources:
[1] http://wirtschaftslexikon.gabler.de/Archiv/2346/multivariate-statistik-v9.html
[2] http://wirtschaftslexikon.gabler.de/Archiv/57691/data-mining-v9.html
[3] http://t3n.de/news/ai-machine-learning-nlp-deep-learning-776907
[5] https://www.computerwoche.de/a/machine-learning-darum-geht-s,3330413
[6] https://www.computerwoche.de/a/machine-learning-darum-geht-s,3330413