Data Science in Auditing: What exactly are decision trees and what are they used for?
Machine Learning (ML) and Artificial Intelligence (AI) are both hot topics right now, but the audit industry is struggling to develop suitable use cases. There may be many reasons for this, so with this series on data science we want to give you the grounding you need to understand the methods properly and to find new ideas for use cases. If ML and AI algorithms are failing to gain a foothold merely because the methods are poorly understood, then it is clearly time to look at these topics more closely and to evaluate their opportunities and risks.
In a previous blog post, “How and why? – Artificial Intelligence”, we already discussed the different methods in the fields of ML and AI and gave a clear definition of what exactly a data scientist is. Building on this, this post takes a closer look at decision trees, a machine learning method that can be used for both regression and classification.
The whys and wherefores of decision trees
According to the Munich neuroscientist Ernst Pöppel, each of us makes about 20,000 decisions a day, usually at lightning speed and for the most part unconsciously. It starts in the morning with the snooze button on your alarm clock, then continues with what you choose to spread on your toast at breakfast (assuming you opt for toast, that is!) and ends shortly before you head out the front door with the question of whether to wear a jacket or not today.
Of course, it carries on like this throughout the day. However, various studies have shown that decisions based on data analysis rather than “gut feeling” lead on average to higher productivity and thus higher profits.
As the picture above shows, decision trees are an easy way to understand and visualize decision-making paths based on clearly defined criteria. For classification, for example, the algorithms choose criteria that divide the available data into groups that are each as homogeneous (as “pure”) as possible with respect to the target class, so that the groups differ from one another as much as possible. A decision tree can therefore be very helpful when it comes to making the right decision or getting an overall picture of the opportunities and risks in the company. But more about this in the next section.
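To make the splitting idea more concrete, here is a minimal sketch of one common way to score a candidate criterion: Gini impurity, the measure used by CART-style algorithms. Whether a given tool uses Gini is an assumption; other algorithms use entropy (information gain) instead. The audit outcomes and candidate splits below are hypothetical. The lower the weighted impurity, the purer the resulting groups, so the algorithm would prefer that criterion.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 for a perfectly pure group, higher for mixed groups."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_gini(groups):
    """Size-weighted average impurity of the groups produced by one split."""
    total = sum(len(group) for group in groups)
    return sum(len(group) / total * gini(group) for group in groups)


# Hypothetical audit outcomes for six postings.
findings = ["yes", "yes", "no", "no", "no", "no"]
# Two hypothetical candidate criteria and the groups each would produce.
split_by_doc_type = [["yes", "yes"], ["no", "no", "no", "no"]]  # separates perfectly
split_by_account_type = [["yes", "no", "no"], ["yes", "no", "no"]]  # separates nothing

print(round(gini(findings), 3))                        # 0.444
print(round(weighted_gini(split_by_doc_type), 3))      # 0.0
print(round(weighted_gini(split_by_account_type), 3))  # 0.444
```

Here the document-type criterion would be chosen, because it leaves no mixed group behind, while the account-type split leaves the impurity exactly where it started.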
Why are decision trees so helpful for the audit department?
When it comes to identifying opportunities and risks, decision trees can be used to good effect in the audit department. The reason is simple: a decision tree not only makes it easy to see which criteria played a role in a decision, but also allows the underlying causes to be clearly identified.
That all sounds rather general, so let’s look at a concrete example. Suppose you want to separate critical weekend postings from non-critical ones. You would first filter out all postings made during the week, then manually audit a selection of weekend postings that is as heterogeneous as possible, and feed the results back into the database. The whole thing would then look something like this:
| Document No. | Doc. type | Account type | Username | Saturday | Sunday | Finding |
The first line contains the criteria for the decision tree, and the subsequent lines contain the corresponding characteristics of each posting. Note: the criteria are not exhaustive and are for illustrative purposes only.
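The weekend filter and the Saturday/Sunday columns could be derived along the following lines. This is a sketch under assumptions: the postings, their field names (`doc_no`, `doc_type`, `username`, `posting_date`) and the sample values are hypothetical, and a real extract from your accounting system will look different.

```python
from datetime import date

# Hypothetical postings; the keys loosely mirror the columns of the table.
postings = [
    {"doc_no": "1900000001", "doc_type": "KR", "username": "MEYER", "posting_date": date(2023, 3, 4)},
    {"doc_no": "1400000002", "doc_type": "DZ", "username": "BAUER", "posting_date": date(2023, 3, 5)},
    {"doc_no": "1900000003", "doc_type": "KR", "username": "SCHULZ", "posting_date": date(2023, 3, 6)},
]


def weekend_postings(rows):
    """Drop weekday postings and add the Saturday/Sunday flags used as criteria."""
    result = []
    for row in rows:
        weekday = row["posting_date"].weekday()  # Monday is 0, Sunday is 6
        if weekday >= 5:
            result.append({**row, "saturday": weekday == 5, "sunday": weekday == 6})
    return result


for row in weekend_postings(postings):
    print(row["doc_no"], row["saturday"], row["sunday"])
# 1900000001 True False
# 1400000002 False True
```

The auditor's result for each remaining posting would then be added as the “Finding” value before the tree is built.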
If we now apply an algorithm that generates a decision tree to the table above, the result might look like this:
From the decision tree alone, we can now see very clearly which cases deserve a closer look, and we can classify future data accordingly. Of course, a posting that meets the criteria shown is not necessarily a “finding”; where the outcome is unclear, the algorithms can attach a probability of a “finding” to each case. As an auditor, you therefore know that you need to focus on vendor invoices (KR) posted on a Saturday and on customer payments (DZ) posted by users whose names start with a letter greater than or equal to “K”.
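The two rules read off the example tree can be written down as a plain function, which makes them easy to apply to future postings. The function name and parameters are hypothetical, and the username test is an ordinary character comparison, so it assumes usernames begin with a Latin letter.

```python
def needs_closer_look(doc_type: str, username: str, saturday: bool) -> bool:
    """Apply the two rules from the example tree to one weekend posting."""
    # Rule 1: vendor invoices (KR) posted on a Saturday.
    if doc_type == "KR" and saturday:
        return True
    # Rule 2: customer payments (DZ) by a user whose name starts with "K" or later.
    if doc_type == "DZ" and username[:1].upper() >= "K":
        return True
    return False


print(needs_closer_look("KR", "BAUER", saturday=True))   # True
print(needs_closer_look("DZ", "BAUER", saturday=False))  # False
print(needs_closer_look("DZ", "MEYER", saturday=False))  # True
```

Writing the rules out like this keeps them transparent, which is exactly the advantage of decision trees described above: every flag can be traced back to a named criterion.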
The procedure has of course been simplified for this example, but it clearly shows how audit findings can serve as input for a decision tree and make subsequent audits more efficient. To continuously improve the results, feed the audit results back into the database each time and recalculate the decision tree after each audit, so that new patterns are reflected in the tree.
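The feedback loop can be sketched with a small, self-written CART-style learner that greedily picks the equality test with the lowest weighted Gini impurity. This is a hypothetical minimal sketch, not a specific product; in practice you would more likely use a library implementation. The records and feature names are hypothetical, and the boolean `user_k_or_later` stands in for the username threshold because this sketch only tests equality. Each leaf stores class probabilities, which corresponds to the “probability of a finding” mentioned above.

```python
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def build_tree(rows, labels, features, min_size=1):
    """Recursively split on the (feature == value) test with the lowest weighted Gini."""
    counts = Counter(labels)
    # Leaf: keep class probabilities so unclear cases retain their uncertainty.
    leaf = {"probs": {cls: c / len(labels) for cls, c in counts.items()}}
    if len(counts) == 1 or len(rows) <= min_size:
        return leaf
    best = None
    for feat in features:
        for value in {row[feat] for row in rows}:
            left = [i for i, row in enumerate(rows) if row[feat] == value]
            right = [i for i, row in enumerate(rows) if row[feat] != value]
            if not left or not right:
                continue
            score = (len(left) * gini([labels[i] for i in left])
                     + len(right) * gini([labels[i] for i in right])) / len(rows)
            if best is None or score < best[0]:
                best = (score, feat, value, left, right)
    if best is None or best[0] >= gini(labels):  # no split improves purity
        return leaf
    _, feat, value, left, right = best
    return {
        "feature": feat,
        "value": value,
        "yes": build_tree([rows[i] for i in left], [labels[i] for i in left], features, min_size),
        "no": build_tree([rows[i] for i in right], [labels[i] for i in right], features, min_size),
    }


def predict(tree, row):
    """Follow the tests down to a leaf and return its class probabilities."""
    while "probs" not in tree:
        tree = tree["yes"] if row[tree["feature"]] == tree["value"] else tree["no"]
    return tree["probs"]


FEATURES = ["doc_type", "saturday", "user_k_or_later"]

# Hypothetical audited weekend postings: criteria plus the auditor's result.
history = [
    ({"doc_type": "KR", "saturday": True, "user_k_or_later": False}, "finding"),
    ({"doc_type": "KR", "saturday": False, "user_k_or_later": True}, "ok"),
    ({"doc_type": "DZ", "saturday": False, "user_k_or_later": True}, "finding"),
    ({"doc_type": "DZ", "saturday": True, "user_k_or_later": False}, "ok"),
]


def retrain(history):
    rows = [row for row, _ in history]
    labels = [label for _, label in history]
    return build_tree(rows, labels, FEATURES)


tree = retrain(history)
# After the next audit, append the new results and rebuild the tree.
history.append(({"doc_type": "KR", "saturday": True, "user_k_or_later": True}, "finding"))
tree = retrain(history)
print(predict(tree, {"doc_type": "KR", "saturday": True, "user_k_or_later": False}))
# {'finding': 1.0}
```

Rebuilding from the full history after every audit is the simplest design; it keeps the tree consistent with all results gathered so far, at the cost of recomputing everything each time.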
Data is the gold of the 21st century, and decision trees are just one way of using it sensibly and efficiently in auditing. Such an approach not only lets you carry out audits more effectively and in less time, but also ensures that the auditor’s knowledge is not lost for good in the event of changes at the company. Data science is so diverse that it touches not only computer science but also psychology and the social sciences. You should therefore consider how best to approach the topic, what possibilities exist for your business, and which resources you may need in the future. We have already started to explore this area: we are currently testing various methods presented in scientific papers and are always happy to hear from companies with project ideas they would like to work on together with us and with students specializing in the field. If this sounds interesting to you, please do not hesitate to contact us here.