It is commonly acknowledged that consumer segmentation is an ongoing process. This is especially true for healthcare professionals (HCPs), even in an established market: HCPs arrive, leave and move, competitors’ products enter the market, and regulations change, to name only a few influencing factors. In this context, machine learning solutions become a key asset for keeping up with changes in the market. More importantly, even in a static context, they make it possible to identify complex patterns driving the potential of HCPs and to target some who had never been considered before.

What Is Machine Learning, Actually?

Disclaimer: Ask any data scientist, data engineer, statistician, etc. what machine learning is and you will likely get quite different answers. This is not unexpected given how multifaceted the tools and algorithms falling under this term are. So the upcoming definition tries to be pragmatic and oriented towards applications in segmentation and consumer targeting, but it is not meant to encompass all aspects of the field.

Detecting Reproducible Patterns in Data

Machine learning (ML) in its current form was born at the end of the previous century, as a combination of the growing availability of computing power and new paradigms in data modelling. The underlying idea was and still is to use algorithms to detect patterns in data that are not a consequence of chance, i.e. they can be reproduced, and can be used to build predictive models whose accuracy can be clearly established.

Various Forms of Learning

In its most innovative incarnation, ML provides an automated way to acquire knowledge about complex relationships between an observed phenomenon and a series of measurements (often referred to as features) used as input. A classical example in banking is credit card fraud detection, where the outcome is the validity of a payment and the features used to predict it include, among other things, the spending patterns of the card owner. This class of problem is generally called supervised learning.

A second classical group of ML algorithms was developed to address the situation where there is no factual information about the phenomenon to predict, but a series of features relating to it is still available. Again, the goal is to detect patterns that can be related to the phenomenon of interest. A classical application in marketing is the construction of groups of consumers who behave in a similar way, in order to develop targeted marketing campaigns. This type of situation, where the outcome cannot be directly quantified, is often called unsupervised learning. As one would expect, methods in this context are slightly less effective due to the lack of reference data, but this is nevertheless a common situation (see more examples in the next section) in which they can generate strongly actionable results.
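As a rough illustration of this kind of unsupervised grouping, the sketch below runs a plain k-means clustering on a handful of made-up practices. The two features (share of patients in the therapy area and an early-adoption index, both rescaled to the 0-1 range), the values and the function name are assumptions chosen for illustration. They do not come from any real data source, and k-means is only one of many possible clustering methods.

```python
import random


def kmeans(points, k, n_iter=20, seed=0):
    """Plain k-means. Returns (centroids, labels) for a list of feature tuples."""
    rng = random.Random(seed)
    # Start from k distinct observations picked at random.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: attach every point to its closest centroid.
        labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centroids[c])),
            )
            for point in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels


# Hypothetical practices: (therapy-area patient share, early-adoption index).
practices = [
    (0.10, 0.20), (0.15, 0.10), (0.20, 0.15), (0.05, 0.05),
    (0.80, 0.90), (0.90, 0.85), (0.85, 0.95), (0.95, 0.80),
]

centroids, labels = kmeans(practices, k=2)
for practice, label in zip(practices, labels):
    print(practice, "-> group", label)
```

No outcome is supplied at any point: the groups emerge from the features alone, which is what distinguishes this setting from supervised learning. The group numbers themselves are arbitrary, so interpreting each group remains a business task.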

New Tools for Old Problems

Most problems ML tries to address are in no way new, but the algorithms and methods used definitely are. The most recent algorithms can uncover much more complex relationships. In the same way, the question of the predictive ability of models is quite old, with classical work on the topic dating from the 1970s and even earlier, but the computing power now available allows a far better assessment of it and makes it possible to detect reproducible patterns that were previously out of reach. Both supervised and unsupervised learning, as described above, can be applied successfully to a variety of tasks in the context of HCP targeting.

Many Applications For HCP Targeting

There are clearly many situations where ML can improve targeting, segmentation or consumer engagement. Below are a few examples of classical situations with ways to improve the process using various ML-based approaches.

New Product Launch

Let us start with the case of a new product launch. If the market is new for the company launching the product, not much is usually known about the potential of any HCP active in this market for prescribing the new product. On the other hand, a wealth of data may be available about the market or therapy area the product is targeting: think about pharmacy sales, census information, etc. In such a situation, unsupervised learning methods provide sound insights to get started with targeting by identifying practices that are more likely to be good targets for the new product. Add to this doctor-level data, such as the propensity to adopt new therapies early, and ML will give a strong edge for launching the new product.

But it does not end here. Once the sales force starts bringing back feedback about the HCPs, the next natural step consists of incorporating this new intelligence into the ML process. These new data are especially valuable as they offer a more direct measure of the Dr potential. Technically, it means that one can switch to supervised learning methods, i.e. building predictive models. Practically, it means that the targeting becomes even more accurate and the new ML framework can accommodate more data sources and provide even more information.

Updating an Existing Segmentation

In this case, reference data exist at least in the form of existing segments, so supervised learning is the way to go. Often, reference data are also available for a sample of doctors, via market research, a data provider or insider (think rep/MSL) knowledge. The principle is to build a score measuring the potential of HCPs with regard to the segmentation criterion, and then to detect patterns in all the available data that are related to different levels of this score. This makes it possible to build a predictive model that can be applied to all HCPs to assign them a predicted potential. The score can then be used to assign new HCPs to segments and to reassess the current segment assignments.
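The workflow described above can be sketched in a few lines: learn from HCPs whose potential is known, predict a score for the others, then map scores to segments. The nearest-neighbour average used here is a deliberately simple stand-in for whatever predictive model is actually used, and the features, scores, cut-offs and identifiers are all hypothetical.

```python
def predict_potential(labelled, features, k=3):
    """Average the known scores of the k most similar labelled HCPs."""
    by_distance = sorted(
        labelled,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], features)),
    )
    nearest = by_distance[:k]
    return sum(score for _, score in nearest) / len(nearest)


def assign_segment(score, cutoffs=((0.7, "A"), (0.4, "B"))):
    """Map a predicted score to a segment; cut-offs are illustrative only."""
    for threshold, segment in cutoffs:
        if score >= threshold:
            return segment
    return "C"


# Hypothetical reference sample: ((patient index, early-adoption index), known potential).
labelled = [
    ((0.9, 0.8), 0.9), ((0.8, 0.9), 0.8), ((0.7, 0.7), 0.7),
    ((0.5, 0.4), 0.5), ((0.4, 0.5), 0.4),
    ((0.2, 0.1), 0.2), ((0.1, 0.2), 0.1), ((0.1, 0.1), 0.1),
]

# HCPs without a known potential, e.g. newcomers to the market.
unlabelled = {
    "HCP-101": (0.85, 0.85),
    "HCP-102": (0.45, 0.45),
    "HCP-103": (0.15, 0.10),
}

segments = {
    hcp: assign_segment(predict_potential(labelled, features))
    for hcp, features in unlabelled.items()
}
print(segments)
```

Running the same two functions over HCPs that already have a segment gives a predicted segment to compare with the current one, which is one way to reassess existing assignments.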

ML and the Generation of Data Sources

As a last series of examples, it should be emphasised that ML also plays a role in building actionable data sources themselves. This step, usually considered a crucial part of data engineering, is often needed to generate valuable data. As an example, our doctor- and practice-level data sources, which are based upon mining a variety of web resources and online databases, rely on machine learning in two ways. First, validation algorithms were built to discard irrelevant resources, so that the extracted data do relate to the relevant HCPs or practices. Second, an important part of the added value of these data comes from the ability to transform raw results into meaningful and reliable aggregated indicators. For this purpose too, ML algorithms are implemented to detect patterns in the extracted raw indicators and summarise them appropriately for a given purpose.

Immediate & Long-Term Benefits

Aggregating Multiple Data Sources

The most commonly acknowledged direct benefit of working with an ML framework is how efficiently it combines several data sources. With it, you no longer have to ask how a given data source fits into your current segmentation process. Integrating a data source is seamless: after the usual preliminary steps of assessing data quality, redundancy, etc., it is simply added to the mix of existing data sources and the machine learning process is updated. Furthermore, there are few restrictions on the coverage of the data sources: even if they do not include all HCPs or all geographical entities, they can still be integrated.
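To make the point about partial coverage concrete, here is a minimal sketch of one way to combine sources that do not cover the same HCPs: join every source onto a master list, keep the gaps, then fill them while recording which values were missing. The source names, feature names and median-based filling are assumptions for illustration, not a description of a particular product.

```python
from statistics import median


def combine_sources(hcp_ids, sources):
    """Left-join every source onto the master HCP list.

    `sources` maps a source name to {hcp_id: {feature: value}}.
    HCPs absent from a source get None for that source's features.
    """
    columns = sorted({
        f"{name}.{feature}"
        for name, records in sources.items()
        for features in records.values()
        for feature in features
    })
    table = {hcp: dict.fromkeys(columns) for hcp in hcp_ids}
    for name, records in sources.items():
        for hcp, features in records.items():
            if hcp in table:  # ignore records outside the master list
                for feature, value in features.items():
                    table[hcp][f"{name}.{feature}"] = value
    return table


def impute_median(table):
    """Fill gaps with the column median and flag which values were filled."""
    columns = list(next(iter(table.values())))
    for col in columns:
        observed = [row[col] for row in table.values() if row[col] is not None]
        fill = median(observed)
        for row in table.values():
            row[col + ".missing"] = row[col] is None
            if row[col] is None:
                row[col] = fill
    return table


# Two hypothetical sources, neither of which covers all three HCPs.
sources = {
    "pharmacy": {"H1": {"sales": 120.0}, "H2": {"sales": 80.0}},
    "survey": {"H2": {"early_adopter": 1.0}, "H3": {"early_adopter": 0.0}},
}

features = impute_median(combine_sources(["H1", "H2", "H3"], sources))
for hcp, row in features.items():
    print(hcp, row)
```

Keeping the ".missing" flags lets a downstream model treat the absence of coverage as information in its own right rather than silently trusting the filled value.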

Improving Targeting Over Time

Beyond the value of getting a first set of results, once a machine learning strategy has been put into place for a recurrent targeting issue, it becomes pretty easy to update these results and the underlying process when needed. Updating the already used data sources is pretty straightforward. As discussed previously, adding a new data source is not much more complicated. On top of that, the machine learning process can be taught to learn from its mistakes, which means that re-injecting feedback from the field force into the algorithms allows adjusting and improving the accuracy of the process. In many cases, this is done on a yearly or quarterly basis, but in a rapidly expanding market, it can really be done as often as needed.

Generating New Results on Demand

In a similar way, as soon as an ML engine is in place, it can be used at any time to get scores for new HCPs, or for HCPs who have moved to another practice. All that is needed are measures of the same features as those used to build the process. If not all characteristics are available when the scores need to be generated, the approach will still work; the only drawback is a drop in accuracy for the corresponding HCPs. Furthermore, if your ML process is linked to your CRM or any other database, scores can be generated instantly.

Understanding the Drivers of Potential

For a long time, the main criticism of ML had to do with its “black box” appearance, i.e. the model worked, but it seemed impossible to understand which features contributed the most to the predictive ability of the process, and in which way. This criticism no longer holds with the latest class of algorithms, which make it quite easy both to identify the top drivers and to visualise their effect on the predicted scores. Needless to say, such information can be invaluable for better understanding a market and building custom engagement strategies.
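One model-agnostic way to inspect drivers is a partial dependence curve: force one feature to a series of values, average the predictions, and see how much the average moves. The sketch below applies it to a toy model on made-up data in which potential depends, by construction, only on the first feature. The feature names and the nearest-neighbour model are assumptions for illustration; any model exposing a prediction function could be substituted.

```python
def nearest_neighbour_model(rows, targets):
    """Return a predict(features) function backed by a 1-nearest-neighbour lookup."""
    def predict(features):
        best = min(
            range(len(rows)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(rows[i], features)),
        )
        return targets[best]
    return predict


def partial_dependence(predict, rows, column, grid):
    """Average prediction when `column` is forced to each value of `grid`."""
    curve = []
    for value in grid:
        predictions = [
            predict(row[:column] + (value,) + row[column + 1:]) for row in rows
        ]
        curve.append(sum(predictions) / len(predictions))
    return curve


feature_names = ["patient_volume", "practice_size"]  # hypothetical features
rows = [(v, s) for v in (0.2, 0.5, 0.8) for s in (0.2, 0.8)]
targets = [v for v, _ in rows]  # toy rule: potential follows patient volume only

predict = nearest_neighbour_model(rows, targets)

effect = {}
for column, name in enumerate(feature_names):
    grid = sorted({row[column] for row in rows})
    curve = partial_dependence(predict, rows, column, grid)
    effect[name] = max(curve) - min(curve)  # how far the average score moves
    print(name, [round(point, 2) for point in curve])

top_driver = max(effect, key=effect.get)
print("top driver:", top_driver)
```

Plotting each curve against its grid gives the kind of visualisation mentioned above, and ranking features by the spread of their curve gives a simple list of top drivers.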

Measuring the Value of Data Sources

Being able to add (and thus drop) data sources as needed provides another key advantage: comparing the predictive ability of the process with and without a given data source offers a reliable measure of the value of that data source for the corresponding targeting task. So, if you paid a significant amount for this data source, or if you had to put a lot of effort into making it usable, you will be in a position to determine whether it was worth the investment.
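The with-and-without comparison can be sketched as follows: estimate the prediction error using all features, estimate it again after dropping the features of one source, and treat the increase in error as the value of that source. The leave-one-out scheme, the nearest-neighbour model, the source names and the toy data (constructed so that only the second source carries signal) are all assumptions for illustration.

```python
def loo_error(rows, targets, columns):
    """Leave-one-out mean absolute error of a 1-nearest-neighbour model
    that only looks at the given feature columns."""
    total = 0.0
    for i, row in enumerate(rows):
        others = [j for j in range(len(rows)) if j != i]
        nearest = min(
            others,
            key=lambda j: sum((row[c] - rows[j][c]) ** 2 for c in columns),
        )
        total += abs(targets[i] - targets[nearest])
    return total / len(rows)


# Which feature columns come from which (hypothetical) data source.
columns_by_source = {"crm": [0], "claims": [1]}

# Toy data: column 0 from "crm", column 1 from "claims", then known potential.
rows = [(0.1, 0.0), (0.2, 1.0), (0.3, 0.0), (0.4, 1.0), (0.5, 0.0), (0.6, 1.0)]
targets = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]

all_columns = [c for cols in columns_by_source.values() for c in cols]
baseline = loo_error(rows, targets, all_columns)

value = {}
for source, cols in columns_by_source.items():
    remaining = [c for c in all_columns if c not in cols]
    # Value of a source = how much the error grows when it is removed.
    value[source] = loo_error(rows, targets, remaining) - baseline

print("error with all sources:", baseline)
print("value of each source:", value)
```

On real data the differences are far less clear-cut than in this toy case, so the comparison is best repeated over several validation splits before concluding that a source is or is not worth its cost.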

An Invaluable Lever For Any Targeting Problem

Data sources and their usage have become the crux of most marketing strategies. No matter what the topic is, more sources become available over time, with varying scope and aggregation levels. It no longer makes sense to make decisions based on a single dataset when other relevant information is available. Still, the lack of a common ground or aggregation level is often considered a bottleneck, even though it is at the same time what makes using different data sources valuable. In this context, it is hard to compete with machine learning when it comes to making the most of all the data.