How machine learning is improving our fraud detection systems

by Aharsh MS · April 29, 2020

Are you aware that in 2020, the number of customers who fell victim to fraud was around 25.4 million, 16 percent higher than in 2019? Advancements in the digital space open up new channels for distributing financial services all over the globe, but they also create a perfect environment for fraudsters.

Cybercriminals all over the world are taking advantage of rapid advancements in technology to carry out attacks involving money laundering, identity theft, mobile fraud and more. One of the most common types of cybercrime is bank and credit card fraud. Most of us shop online, and chances are that we have shared our card details with a third-party service at some point. So the soaring incidence of card fraud shouldn’t come as a surprise. According to McKinsey, worldwide losses from card fraud could be close to $44 billion by 2025.

Along with the direct losses incurred through fraud, many companies lose sales when their fraud management systems generate false positives, that is, when genuine transactions are wrongly declined. False positives account for up to 25% of declined transactions for e-commerce retailers.

Fraudsters are becoming increasingly skilled at finding weak spots in systems. Before a company can patch a vulnerability, they manage to steal sensitive data and cause major losses. In July 2019, a hacker named Paige Thompson broke into a Capital One server and gained access to 140,000 Social Security numbers, 1 million Canadian Social Insurance numbers and 80,000 bank account numbers, plus an undisclosed number of people’s names, addresses, credit scores, credit limits, balances and other information, according to the bank and the U.S. Department of Justice (DOJ). Evidently, fraud is becoming a major problem for companies, and banks must find a way to quickly separate fraudulent transactions from legitimate ones without compromising the customer experience.

Traditional methods of fraud detection

The traditional way to detect fraud relies on a team of analysts who use a rule-based approach to estimate how likely a particular transaction is to be fraudulent. The analysts define rules that describe expected, legitimate behaviour; if an action does not match them, it is treated as anomalous and potentially worth checking. For example, if a credit card transaction is much larger than a customer’s average, a notification is sent out.

A rule-based approach applies a complex set of criteria to identify suspicious transactions. While this can be effective at discovering anomalous transactions that follow a known pattern, it cannot detect fraud that follows a new or unknown pattern. This encourages fraudsters to develop new techniques that exploit loopholes in the rules.

Rules range from traditional statistical ones (e.g. flag as suspect every transaction more than 3 standard deviations from the mean) to business rules (e.g. block a credit card after three consecutive wrong PIN entries). Rules like these let human experts apply their expertise to decisions, but they are difficult and time-consuming to implement well. If the experts overlook a scenario, the resulting anomalies go undetected and nobody suspects them.
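To make the two kinds of rules concrete, here is a minimal Python sketch of both. The function names, the sample purchase history and the use of the sample standard deviation are illustrative assumptions, not part of any particular fraud system.

```python
import statistics
from typing import Sequence


def exceeds_sigma_rule(history: Sequence[float], amount: float, k: float = 3.0) -> bool:
    """Statistical rule: flag an amount more than k standard deviations from the customer's mean."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)  # sample standard deviation; needs at least two values
    return abs(amount - mean) > k * stdev


def should_block_card(pin_attempts: Sequence[bool], max_failures: int = 3) -> bool:
    """Business rule: block once max_failures wrong PINs are entered in a row (False = wrong PIN)."""
    consecutive_failures = 0
    for correct in pin_attempts:
        consecutive_failures = 0 if correct else consecutive_failures + 1
        if consecutive_failures >= max_failures:
            return True
    return False


# Hypothetical purchase history for one customer.
history = [42.0, 38.5, 55.0, 47.2, 40.0, 51.3]
print(exceeds_sigma_rule(history, 49.0))               # False
print(exceeds_sigma_rule(history, 900.0))              # True
print(should_block_card([True, False, False, False]))  # True
print(should_block_card([False, False, True, False]))  # False
```

Both thresholds (k = 3 and three PIN failures) are hard-coded by a person, which is exactly the kind of fixed threshold that the limitations of rule-based systems stem from.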


Why do we need machine learning for fraud detection?

Machine learning is much more effective

Fraudulent transactions tend to follow patterns that authentic transactions don’t. Machine learning algorithms can learn these patterns and estimate whether a particular transaction is legitimate or not. They can be faster and more accurate than human reviewers, because they can detect new or subtle patterns that humans often miss. They can also process large amounts of data quickly and retain what they have learned from it.

ML algorithms can handle data overload

Online fraud is becoming harder to detect as technology becomes more sophisticated. Companies and fraudsters are constantly trying to outsmart each other, which creates a high-pressure environment where companies must process far more data than they usually can. Even a business with a team of top data scientists will find it difficult to detect online fraud as fast as it happens. This is where ML algorithms are so useful: they can work 24/7 and process data in near real time.

Machine learning can fix the problems faced by a rule-based approach

The traditional method of detecting online fraud is the rule-based approach discussed earlier. There are several reasons why machine learning can do a better job: it helps overcome the main limitations of rules.

The main limitations of a rule-based approach are:

Rules are limited by fixed thresholds. Every rule has a threshold, for example “block when there are more than 8 transactions in an hour.” The ideal value for this threshold can change over time. A machine learning model can learn this from the data and adapt; a static rule system cannot.
Rules are limited by being absolute. Each rule makes a “yes” or “no” decision based on its threshold, which makes it inflexible and inaccurate. Machine learning models, on the other hand, produce a continuous risk score (often a probability between 0 and 1, which can be scaled to a range such as 0-1000, similar to a credit score), so different actions can be taken for different risk tolerances.

Rules provide low coverage. In many cases, only a few highly accurate rules can be found. To identify and block more than a small number of risky transactions, companies have to add further rules of decreasing accuracy. The end result can be an unacceptable rate of incorrectly blocked transactions.
Rules have low relative performance. A hybrid approach that combines rules and models will typically outperform rules alone, because machine learning models can identify risky transactions that simple rules can’t find.
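The contrast between a fixed-threshold rule and a scored model can be sketched in a few lines of Python. This is a hypothetical illustration: the 8-transactions-per-hour limit comes from the example above, while the 0-1000 scaling and the review/decline cut-offs (600 and 900) are assumed values that a business would tune to its own risk tolerance.

```python
from datetime import datetime, timedelta
from typing import List


def velocity_rule(timestamps: List[datetime], now: datetime, limit: int = 8) -> bool:
    """Fixed-threshold rule: block when more than `limit` transactions occurred in the last hour."""
    window_start = now - timedelta(hours=1)
    recent = [t for t in timestamps if window_start < t <= now]
    return len(recent) > limit


def to_score(probability: float) -> int:
    """Scale a model's fraud probability (0-1) to a 0-1000 risk score."""
    return round(max(0.0, min(1.0, probability)) * 1000)


def action_for(score: int, review_at: int = 600, decline_at: int = 900) -> str:
    """Map a score to an action; the cut-offs are hypothetical risk-tolerance settings."""
    if score >= decline_at:
        return "decline"
    if score >= review_at:
        return "manual_review"
    return "approve"


now = datetime(2020, 4, 29, 12, 0)
burst = [now - timedelta(minutes=5 * i) for i in range(9)]  # 9 transactions in 40 minutes
print(velocity_rule(burst, now))      # True
print(velocity_rule(burst[:8], now))  # False
for p in (0.12, 0.65, 0.97):
    print(to_score(p), action_for(to_score(p)))
```

The rule can only answer yes or no, and its limit of 8 stays fixed until someone edits it. The score, by contrast, supports a graded response, and moving the cut-offs changes how much risk the business accepts without retraining the model.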

How does machine learning work in fraud detection?

The first step in fraud detection is data collection. The machine learning pipeline processes and analyzes the gathered data and extracts the relevant features from it. The model is then trained on labeled examples that teach it to predict the probability of fraud. The end result is a fraud detection model that is ready to identify fraudulent transactions.

The first step is data input, and this is where ML models and humans differ most. Humans cannot process large data sets and can easily get overwhelmed. Machine learning models, in contrast, thrive on data: the more representative data they are given, the more accurate they generally become.

After data input, the relevant features are extracted. Here, features that describe both good and fraudulent customer behaviour are added. These usually include information on the customer’s location, identity, orders, network and chosen payment method.
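As a rough illustration of this step, the sketch below turns a single transaction into a numeric feature vector covering location, identity, orders, network and payment method. The `Transaction` fields and the feature choices are hypothetical; a real system would derive its features from its own data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Transaction:
    # Hypothetical fields for illustration; real schemas differ between businesses.
    amount: float
    country: str             # where the transaction originated
    home_country: str        # the customer's usual country
    payment_method: str      # e.g. "card", "wallet", "bank_transfer"
    account_age_days: int
    orders_last_24h: int
    accounts_on_device: int  # accounts seen on the same device (a network signal)


PAYMENT_METHODS = ["card", "wallet", "bank_transfer"]


def extract_features(tx: Transaction) -> List[float]:
    """Turn one transaction into a numeric feature vector a model can consume."""
    features = [
        tx.amount,
        1.0 if tx.country != tx.home_country else 0.0,  # location mismatch
        float(tx.account_age_days),                     # identity: how established the account is
        float(tx.orders_last_24h),                      # order velocity
        float(tx.accounts_on_device),                   # network: device shared by many accounts
    ]
    # One-hot encode the chosen payment method.
    features += [1.0 if tx.payment_method == m else 0.0 for m in PAYMENT_METHODS]
    return features


tx = Transaction(250.0, "BR", "US", "wallet", 3, 5, 4)
print(extract_features(tx))  # [250.0, 1.0, 3.0, 5.0, 4.0, 0.0, 1.0, 0.0]
```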

The next step is running a training algorithm. Rather than following hand-written rules, the training algorithm learns the model’s parameters from labeled examples, so that the model can decide whether an operation is legitimate or fraudulent. The more good-quality data a business can provide for the training set, the better the model tends to be.

Finally, when training is complete, the company has a fraud detection model suited to its needs. Trained on good data, such a model can score transactions accurately and very quickly.

Types of ML algorithms

Supervised learning

Supervised learning is one of the most common ways of implementing machine learning. In a supervised learning model, a random sub-sample of all the data is taken and manually classified as either ‘fraudulent’ or ‘non-fraudulent’. Rare events such as fraud are often over-sampled to provide a big enough sample for the ML model. A supervised machine learning algorithm is then trained on these manually classified records. A supervised model is based on predictive data analysis and is only as accurate as the training set provided for it. Its major drawback is that it cannot detect types of fraud that were not included in the historical data set it learned from.
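The over-sampling step can be sketched with simple random over-sampling, which duplicates labeled fraud records until they form a chosen share of the training set. This is one basic technique among several (synthetic methods such as SMOTE are another option); the function name and the 50% target are assumptions for illustration.

```python
import random
from typing import List, Tuple

Row = Tuple[List[float], int]  # (features, label) with 1 = fraudulent, 0 = legitimate


def oversample_minority(rows: List[Row], target_ratio: float = 0.5, seed: int = 0) -> List[Row]:
    """Randomly duplicate fraud rows until they make up roughly target_ratio of the data."""
    if not 0.0 < target_ratio < 1.0:
        raise ValueError("target_ratio must be between 0 and 1")
    rng = random.Random(seed)
    fraud = [r for r in rows if r[1] == 1]
    legit = [r for r in rows if r[1] == 0]
    if not fraud:
        raise ValueError("need at least one labeled fraud example")
    # Solve n_fraud / (n_fraud + n_legit) = target_ratio for n_fraud.
    needed = round(target_ratio * len(legit) / (1.0 - target_ratio))
    extra = [rng.choice(fraud) for _ in range(max(0, needed - len(fraud)))]
    balanced = legit + fraud + extra
    rng.shuffle(balanced)
    return balanced


# Hypothetical labeled sample: 98 legitimate rows and only 2 fraudulent ones.
data = [([float(i)], 0) for i in range(98)] + [([500.0], 1), ([750.0], 1)]
balanced = oversample_minority(data)
print(len(balanced), sum(label for _, label in balanced))  # 196 98
```

Over-sampling should be applied only to the training split, never to the data used to evaluate the model. Otherwise duplicated fraud cases leak into the test set and inflate the measured accuracy.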

Unsupervised learning

In some cases, there may be little or no labeled transaction data available. An unsupervised learning model can be extremely useful in such scenarios. These models continuously process and analyze new data and update themselves accordingly, detecting patterns and flagging those that deviate from legitimate behaviour as potentially fraudulent. In fraud detection, unsupervised learning is often associated with deep learning.

Semi-supervised learning

This type of ML works in situations where labeling data is difficult or too expensive because it requires the experience of human experts. A semi-supervised algorithm learns from a small labeled set together with a larger pool of unlabeled data. It can still learn key group parameters from the unlabeled data even when their group membership is unknown, on the assumption that the patterns it discovers are still valuable.
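One common semi-supervised technique is self-training: fit a model on the few labeled examples, pseudo-label the unlabeled points it is confident about, and repeat. The sketch below applies this to a simple nearest-centroid classifier on two hypothetical features. The classifier, the confidence margin and the data are all illustrative assumptions, and self-training is only one way to do semi-supervised learning.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def centroids(labeled: Sequence[Tuple[Point, int]]) -> Dict[int, Point]:
    """Mean point of each class."""
    grouped: Dict[int, List[Point]] = {}
    for point, label in labeled:
        grouped.setdefault(label, []).append(point)
    return {
        label: (sum(x for x, _ in pts) / len(pts), sum(y for _, y in pts) / len(pts))
        for label, pts in grouped.items()
    }


def self_train(labeled: Sequence[Tuple[Point, int]], unlabeled: Sequence[Point],
               margin: float = 2.0, rounds: int = 5) -> Tuple[Dict[int, Point], List[Point]]:
    """Pseudo-label unlabeled points that are clearly closer to one class centroid, then refit."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        cents = centroids(labeled)
        confident, remaining = [], []
        for p in pool:
            dists = sorted((math.dist(p, c), label) for label, c in cents.items())
            # Accept a pseudo-label only if the nearest centroid beats the runner-up by `margin`.
            if len(dists) > 1 and dists[1][0] - dists[0][0] >= margin:
                confident.append((p, dists[0][1]))
            else:
                remaining.append(p)
        if not confident:
            break
        labeled += confident
        pool = remaining
    return centroids(labeled), pool


# Two labeled examples (0 = legitimate, 1 = fraud) and four unlabeled ones.
labeled = [((1.0, 1.0), 0), ((9.0, 9.0), 1)]
unlabeled = [(1.5, 0.5), (2.0, 2.0), (8.0, 9.5), (5.0, 5.0)]
cents, still_unlabeled = self_train(labeled, unlabeled)
print(still_unlabeled)  # [(5.0, 5.0)], too ambiguous to pseudo-label
```

The margin is the key design choice: pseudo-labeling ambiguous points would let early mistakes reinforce themselves, so borderline cases are left for a human expert.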

Reinforcement learning

A reinforcement learning algorithm lets a machine determine the ideal behaviour within a specific context. It continuously learns from its environment to find actions that minimize risk and maximize reward. To learn its behaviour, the model requires a reinforcement signal, i.e. feedback on how good each action was.
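Full reinforcement learning is hard to show in a few lines, so the sketch below uses its simplest special case, a multi-armed bandit with an epsilon-greedy policy. The system repeatedly picks one of several hypothetical decline thresholds, receives a reward as feedback, and gradually favours the action with the best average reward. The thresholds and the simulated reward function (standing in for real feedback such as fraud losses avoided minus lost sales) are invented for illustration.

```python
import random
from typing import Callable, List


def epsilon_greedy(reward_fn: Callable[[int], float], n_actions: int,
                   rounds: int = 2000, epsilon: float = 0.1, seed: int = 0) -> List[float]:
    """Estimate the average reward of each action using only the feedback signal."""
    rng = random.Random(seed)
    counts = [0] * n_actions
    values = [0.0] * n_actions  # running mean reward per action
    for t in range(rounds):
        if t < n_actions:
            action = t  # try every action once before exploiting
        elif rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore: random action
        else:
            action = max(range(n_actions), key=values.__getitem__)  # exploit: best so far
        reward = reward_fn(action)  # feedback signal from the environment
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]
    return values


# Hypothetical environment: candidate decline thresholds on a 0-1000 risk score and
# their average net benefit per decision (unknown to the learner), plus noise.
thresholds = [500, 700, 900]
mean_net_benefit = [1.0, 3.0, 2.0]
env_rng = random.Random(42)


def simulated_feedback(action: int) -> float:
    return mean_net_benefit[action] + env_rng.gauss(0.0, 0.5)


values = epsilon_greedy(simulated_feedback, len(thresholds))
best = max(range(len(thresholds)), key=values.__getitem__)
print("learned best threshold:", thresholds[best])
```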

Fraud scenarios and their detection

Data scientists use a wide variety of techniques, which are easiest to understand in terms of the problems they solve: classification and regression. Both are used to analyze data and decide whether a transaction is genuine or fraudulent. Common supervised machine learning algorithms for these problems include logistic regression, decision trees, random forests and neural networks.

Logistic regression is a commonly used method that models how input variables relate to the probability of a binary outcome, and how strongly each variable contributes. It can be used to build an algorithm that predicts whether a transaction is ‘good’ or not.
Decision trees learn a set of rules that model customers’ normal behavior and, when trained on examples of fraud, can detect anomalies.
Random forests combine many decision trees, each trained on a random subset of the data and features, into one strong classifier. Boosting techniques are a related ensemble approach that adds weak classifiers one after another, each correcting the errors of the previous ones, to form one strong classifier.
Neural networks are a well-known technique loosely inspired by how the human brain works. They allow algorithms to learn and adapt to patterns of normal behavior and can identify fraud in real time.
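To show what predicting whether a transaction is ‘good’ or not looks like in practice, here is a minimal logistic regression trained with batch gradient descent on the log-loss, written in plain Python. The two features (amount in hundreds and a location-mismatch flag) and the eight labeled rows are made-up toy data; in practice you would use a library implementation and far more data.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Probability that transaction x is fraudulent."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic_regression(X: List[List[float]], y: List[int],
                              lr: float = 0.1, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit weights and bias by gradient descent on the average log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # derivative of log-loss w.r.t. the logit
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Toy data: [amount in hundreds, 1.0 if the country differs from the customer's usual one]
X = [[0.2, 0.0], [0.5, 0.0], [0.3, 0.0], [0.8, 0.0],
     [4.0, 1.0], [3.5, 1.0], [5.0, 0.0], [4.5, 1.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = train_logistic_regression(X, y)
print(round(predict_proba(w, b, [4.2, 1.0]), 2))  # high fraud probability
print(round(predict_proba(w, b, [0.4, 0.0]), 2))  # low fraud probability
```

Because the output is a probability rather than a yes/no answer, it can be scaled into a risk score and paired with review and decline thresholds.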

Unsupervised techniques rely on clustering and density-based algorithms, which group similar data points together or measure how isolated a point is; they are used for anomaly detection. Algorithms used in the unsupervised approach include K-means clustering, Local Outlier Factor and One-Class SVM.

K-means clustering divides a dataset into clusters. The algorithm works iteratively, assigning each data point to one of a predefined number (k) of clusters based on feature similarity.
Local Outlier Factor is an algorithm that calculates the local density of data points and identifies regions of similar density in the data set. Using this notion of locality, one can distinguish points whose density is much lower than that of their neighbours. These points are outliers (potentially fraudulent transactions).
One-Class SVM learns a function for novelty detection. The idea of novelty detection is to detect rare events, i.e. events that happen rarely and of which you therefore have very few samples. The usual way of training a classifier will not work in that case, so One-Class SVM learns the boundary of normal data and flags points that fall outside it.
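A simple way to turn K-means into an anomaly detector is to cluster the transactions and score each one by its distance to the nearest cluster centre; points far from every centre are candidates for review. The sketch below does this in plain Python on two made-up features. The first-k-points initialisation, the distance-based score and the 5.0 cut-off are simplifying assumptions (libraries typically use smarter initialisation such as k-means++).

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def kmeans(points: Sequence[Point], k: int, iterations: int = 100) -> List[Point]:
    """Lloyd's algorithm: alternate between assigning points and recomputing centres."""
    centers = list(points[:k])  # simplistic deterministic initialisation
    for _ in range(iterations):
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        new_centers = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:  # assignments have stabilised
            break
        centers = new_centers
    return centers


def anomaly_scores(points: Sequence[Point], centers: Sequence[Point]) -> List[float]:
    """Distance from each point to its nearest cluster centre."""
    return [min(math.dist(p, c) for c in centers) for p in points]


# Hypothetical transactions described by two scaled features.
points = [(1.0, 1.0), (9.0, 9.0), (1.2, 0.8), (0.9, 1.1),
          (8.8, 9.2), (9.1, 8.9), (1.1, 1.0), (5.0, 20.0)]
centers = kmeans(points, k=2)
scores = anomaly_scores(points, centers)
flagged = [p for p, s in zip(points, scores) if s > 5.0]
print(flagged)  # [(5.0, 20.0)]
```

Here the outlier also drags one centre towards itself. In practice the clusters would usually be fitted on a large volume of mostly legitimate transactions, so a handful of outliers has little influence.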

Large companies around the world have started using machine learning algorithms to prevent issues such as fake accounts, account takeover, payment fraud and promotion abuse. An outdated financial system can result in major losses for organizations. ML fraud prevention providers such as Feedzai claim that a well-trained machine learning solution can identify and prevent 95% of all fraud while minimizing the human labour required during the investigation stage. This is why it is important for organizations to adapt to changing technology and incorporate machine learning methods into their fraud detection systems.


Written by

Aharsh MS

Aharsh is a tech entrepreneur and a visionary. He believes that each of us is here to do something purposeful, something that counts, to take our species beyond any limits. He is greatly inspired by thoughts and ideas that support such purposes and by building a future with profound possibilities, a future where technology negates all human miseries!
