Syndicated Knowledge Base (KB) - Australia & NZ

For more than 5 years, crea.science has used internet querying and scraping tools to collect publicly available information and insights on Australian Healthcare Professionals (HCPs), including 25 000+ General Practitioners (GPs). This information is captured in the crea.science Syndicated Knowledge Base, where it is constantly updated and extended.

What Information is Held in the Knowledge Base (KB)?

The KB holds a broad and deep range of information on individual HCPs as well as on their environment (i.e. their medical practice).

The information is extracted from many sources, including directories, practice websites, medical publishing sites and social media sites. It is stored either in raw form (e.g. clinical interests mentioned in biographies) or as flags (e.g. Y/N for whether the HCP has a Twitter account).
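To illustrate the difference between raw and flag storage, here is a minimal Python sketch. The record layout (`ScrapedProfile`, `to_kb_record`) and the rule that a link to twitter.com or x.com means the HCP has a Twitter account are assumptions for illustration, not the KB's actual schema.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse

# Assumption for this sketch: a link to one of these hosts means the HCP has a Twitter account.
TWITTER_HOSTS = {"twitter.com", "www.twitter.com", "x.com", "www.x.com"}


@dataclass
class ScrapedProfile:
    """Hypothetical raw record for one HCP, as collected from the web."""
    name: str
    biography: str
    links: list = field(default_factory=list)


def to_kb_record(profile: ScrapedProfile) -> dict:
    """Keep free text in raw form and reduce the list of links to a Y/N flag."""
    has_twitter = any(urlparse(url).hostname in TWITTER_HOSTS for url in profile.links)
    return {
        "name": profile.name,
        "biography": profile.biography,                  # stored in raw form
        "twitter_account": "Y" if has_twitter else "N",  # stored as a flag
    }


record = to_kb_record(ScrapedProfile(
    name="Dr Example",
    biography="Special interests in diabetes and women's health.",
    links=["https://twitter.com/dr_example", "https://www.linux.com/"],
))
print(record["twitter_account"])  # Y
```

Parsing the hostname, rather than searching the URL for a substring such as "x.com", avoids false positives like www.linux.com.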

The KB currently holds over 300 features, and more are added as new information needs are identified.

Updates are performed constantly to keep pace with HCPs' evolving web profiles.

How is the Knowledge Base Useful to Pharma?

The information held in the KB is traditionally used by Pharma in three ways:

To Build HCP Profiles

To Build Composite HCP Indices

To Build HCP Indices via Extrapolation from Already Validated Reference Sets

HCP profiles: raw data are extracted as features or flags and used to build profiles, e.g. stated clinical interests or use of social media.

Composite HCP indices: Machine Learning techniques are used to combine and investigate all the data and to generate indices, e.g. for potential, influence, early adoption or proficiency.
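How several features collapse into one comparable number per HCP can be shown with a deliberately simple stand-in. This is not a machine learning model: it is an equal-weight composite (the mean of per-feature z-scores), and the feature names and values are hypothetical.

```python
from statistics import mean, pstdev


def composite_index(rows: dict, features: list) -> dict:
    """Equal-weight composite index: the mean of each HCP's per-feature z-scores."""
    stats = {}
    for f in features:
        values = [r[f] for r in rows.values()]
        sd = pstdev(values)
        stats[f] = (mean(values), sd if sd > 0 else 1.0)  # avoid dividing by zero
    return {
        hcp: mean((r[f] - stats[f][0]) / stats[f][1] for f in features)
        for hcp, r in rows.items()
    }


# Hypothetical numeric features per HCP
rows = {
    "gp_001": {"publications": 4, "social_accounts": 3, "talks": 2},
    "gp_002": {"publications": 0, "social_accounts": 1, "talks": 0},
    "gp_003": {"publications": 1, "social_accounts": 2, "talks": 1},
}
scores = composite_index(rows, ["publications", "social_accounts", "talks"])
print(max(scores, key=scores.get))  # gp_001
```

Standardising first puts counts with very different ranges on a common scale; a trained model would instead learn how much each feature should contribute.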

Indices extrapolated from validated reference sets: a reference set of HCPs whose status is already known is used to train the machine learning algorithm on what to look for in the data. Once validated, the model may be used to predict potential or influence for all HCPs in the target population of interest.
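One simple way to train on a reference set, validate, and only then predict is a nearest-centroid classifier checked with leave-one-out accuracy. The sketch assumes numeric feature vectors and 0/1 labels (e.g. validated KOL or not); the classifier choice and the data are hypothetical, and each class needs at least two reference HCPs for the leave-one-out step to work.

```python
from math import dist
from statistics import mean


def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [mean(col) for col in zip(*vectors)]


def fit(reference):
    """reference: list of (feature_vector, label) pairs, labels 0 or 1."""
    return {label: centroid([v for v, y in reference if y == label]) for label in (0, 1)}


def predict(model, vector):
    """Return the label whose class centroid is nearest (Euclidean distance)."""
    return min(model, key=lambda label: dist(vector, model[label]))


def leave_one_out_accuracy(reference):
    """Validate by refitting without each reference HCP in turn and predicting it."""
    hits = 0
    for i, (vector, label) in enumerate(reference):
        model = fit(reference[:i] + reference[i + 1:])
        hits += predict(model, vector) == label
    return hits / len(reference)


# Hypothetical reference set: [publications, talks, social accounts], 1 = validated KOL
reference = [
    ([5, 3, 2], 1), ([4, 4, 3], 1), ([6, 2, 2], 1),
    ([0, 1, 0], 0), ([1, 0, 0], 0), ([0, 0, 1], 0),
]
print(leave_one_out_accuracy(reference))  # 1.0

model = fit(reference)  # refit on the full reference set once validated
unlabelled = {"gp_101": [5, 2, 3], "gp_102": [0, 1, 1]}
print({hcp: predict(model, v) for hcp, v in unlabelled.items()})  # {'gp_101': 1, 'gp_102': 0}
```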


A Glimpse of Our Knowledge Base

Insights on GPs

Clinical Interests

Social Media Usage

Research Activities

Online Self-Promotion

Biography

Quality of Care

…

Covers almost any publicly available information on the Internet

Regular updates to keep pace with GPs' evolving web profiles

Insights on Practices

Use of Digital Technology: online bookings, digital content, blogs, website complexity

Online Repeats

Availability of Allied Health

Business Hours, Bulk Billing

Languages Spoken

…

Internal Knowledge Can be Used to Train the Machine Learning Algorithm

Where the client already has internal knowledge, e.g. a list of validated Key Opinion Leaders (KOLs), the KB data extracted for these KOLs can be used as a reference set to train the algorithm to identify other HCPs with similar profiles across all features in the KB. Starting from a small set of known KOLs, every other HCP can be assigned a potential “KOL Influence Score” based on their similarity to the reference set.
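A minimal sketch of similarity scoring against a reference set of known KOLs, assuming numeric features: each feature is min-max scaled to [0, 1], the known KOLs are averaged into a reference profile, and every other HCP is scored by its Euclidean distance to that profile, rescaled to 0-100. The function name `kol_influence_scores`, the distance measure and the data are illustrative assumptions, not crea.science's scoring method.

```python
from math import dist, sqrt
from statistics import mean


def kol_influence_scores(population: dict, known_kols: set) -> dict:
    """Score each non-reference HCP 0-100 by closeness to the average known-KOL profile.

    population maps a (hypothetical) HCP id to a numeric feature vector.
    """
    # Min-max scale every feature to [0, 1] so no single feature dominates the distance
    columns = list(zip(*population.values()))
    lows = [min(c) for c in columns]
    spans = [(max(c) - min(c)) or 1 for c in columns]
    scaled = {
        hcp: [(x - lo) / span for x, lo, span in zip(vec, lows, spans)]
        for hcp, vec in population.items()
    }
    # Reference profile: the mean scaled vector of the validated KOLs
    reference = [mean(col) for col in zip(*(scaled[h] for h in known_kols))]
    # In the unit hypercube the largest possible distance is sqrt(number of features)
    max_dist = sqrt(len(reference))
    return {
        hcp: round(100 * (1 - dist(vec, reference) / max_dist), 1)
        for hcp, vec in scaled.items()
        if hcp not in known_kols
    }


# Hypothetical features: [publications, conference talks, social media accounts]
population = {
    "gp_001": [12, 5, 3],  # known KOL
    "gp_002": [10, 4, 4],  # known KOL
    "gp_003": [9, 5, 2],
    "gp_004": [0, 1, 0],
    "gp_005": [1, 0, 1],
}
scores = kol_influence_scores(population, {"gp_001", "gp_002"})
print(max(scores, key=scores.get))  # gp_003
```

Distance was chosen over cosine similarity here because cosine ignores magnitude: an HCP with very little activity, but in the same proportions as the KOLs, could otherwise score highly.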

If you already hold data on the digital engagement of some GPs, you can use our Knowledge Base and Machine Learning to extend the predictions to all GPs in the target population.

Ask us how.

Our Data Collection & Generation Process – So Much More Than Querying & Scraping

Web Querying

A method of extracting information from the internet by entering keywords into a search engine, e.g. Google or Microsoft Bing.
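Building the keyword query for one HCP might look like the following sketch. The query template (the name as an exact phrase plus "GP" and location) and the function `build_search_url` are hypothetical; the sketch only constructs the search URL and does not fetch results, since how results are retrieved is not described here.

```python
from urllib.parse import urlencode

# Hypothetical mapping from engine name to its public search URL
SEARCH_BASES = {
    "bing": "https://www.bing.com/search",
    "google": "https://www.google.com/search",
}


def build_search_url(name: str, suburb: str, state: str, engine: str = "bing") -> str:
    """Build a keyword query: the HCP's name as an exact phrase plus 'GP' and location."""
    query = f'"{name}" GP {suburb} {state}'
    return f"{SEARCH_BASES[engine]}?{urlencode({'q': query})}"


print(build_search_url("Jane Citizen", "Parramatta", "NSW"))
# https://www.bing.com/search?q=%22Jane+Citizen%22+GP+Parramatta+NSW
```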

Web Scraping

A method of extracting information from relevant websites.

Scraping goes far beyond querying. An internet query may be used to identify the websites to scrape, and the information scraped from them can include doctors' interests, the number of doctors in the practice, practice email, fax number, online bookings, bulk billing, car parking, etc.

A large number of individual features or characteristics can be scraped in this way. Some may seem irrelevant on their own, but when combined and incorporated into the machine learning algorithm, each feature and its complex relationships with other features can add to the profile.
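Once a practice website has been identified, turning its HTML into features might look like this standard-library Python sketch. The keyword patterns and feature names are hypothetical and deliberately crude; they show the shape of the scraping step described above (extract visible text, then derive Y/N flags and an email field), not crea.science's extraction rules.

```python
import re
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text from an HTML page, ignoring <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


# Hypothetical keyword rules, far cruder than production extraction would need
FLAG_PATTERNS = {
    "online_bookings": r"book (an appointment )?online|online booking",
    "bulk_billing": r"bulk[- ]?bill",
    "car_parking": r"parking",
}
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def scrape_practice(html: str) -> dict:
    """Turn one practice web page into Y/N flags plus a practice email, if found."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.parts)
    features = {
        name: "Y" if re.search(pattern, text, re.IGNORECASE) else "N"
        for name, pattern in FLAG_PATTERNS.items()
    }
    match = EMAIL.search(text)
    features["practice_email"] = match.group(0) if match else None
    return features


page = """<html><body>
<h1>Example Family Practice</h1>
<p>We bulk bill children under 16. Book online 24/7.</p>
<p>Free car parking on site. Email: reception@example.com.au</p>
<script>var parking = true;</script>
</body></html>"""
print(scrape_practice(page))
```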

Data Engineering – Extensive Data Validation & Cleansing

Over the years we have put in place complex data matching, duplicate identification, validation & cleansing algorithms to make sure that we correctly identify HCPs and that we extract relevant and consistent information.
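Duplicate identification can be sketched as name normalisation followed by fuzzy comparison within the same postcode, using Python's `difflib`. The title list, the postcode rule and the 0.9 similarity threshold are assumptions for illustration; real matching would also have to handle HCPs who work at several practices.

```python
import re
from difflib import SequenceMatcher

# Honorifics dropped before comparing names (illustrative, not exhaustive)
TITLES = {"dr", "prof", "professor", "mr", "mrs", "ms", "miss"}


def normalise_name(raw: str) -> str:
    """Lower-case, strip punctuation and titles: 'Dr. Jane Citizen' -> 'jane citizen'."""
    tokens = re.findall(r"[a-z]+", raw.lower())
    return " ".join(t for t in tokens if t not in TITLES)


def is_probable_duplicate(a: dict, b: dict, threshold: float = 0.9) -> bool:
    """Treat two records as the same HCP if they share a postcode and their names nearly match."""
    if a["postcode"] != b["postcode"]:
        return False
    similarity = SequenceMatcher(None, normalise_name(a["name"]), normalise_name(b["name"])).ratio()
    return similarity >= threshold


def deduplicate(records: list) -> list:
    """Keep the first record of each probable-duplicate group."""
    kept = []
    for record in records:
        if not any(is_probable_duplicate(record, k) for k in kept):
            kept.append(record)
    return kept


records = [
    {"name": "Dr Jane Citizen", "postcode": "2150", "source": "directory"},
    {"name": "Jane Citizen", "postcode": "2150", "source": "practice website"},
    {"name": "Dr. Jane Citizen", "postcode": "3000", "source": "publication"},
    {"name": "Dr John Smith", "postcode": "2150", "source": "directory"},
]
print([r["source"] for r in deduplicate(records)])
# ['directory', 'publication', 'directory']
```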

Generating New Features

Using our knowledge of pharma & the business problem at hand, we also generate additional features based on the raw information extracted from queries & scraping results.

For research activities, for example, we typically calculate an HCP's median authorship position across their medical publications, since position in the list of authors may be used as a proxy for influence. We typically generate hundreds of such features.
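The median authorship position can be computed as follows, assuming each publication is represented by its ordered author list and that author names have already been matched to the HCP (exact string matching here is a simplification). The function name and data are hypothetical. A lower value means the HCP tends to appear nearer the front of the author list.

```python
from statistics import median
from typing import List, Optional


def median_author_position(hcp_name: str, author_lists: List[List[str]]) -> Optional[float]:
    """Median 1-based position of hcp_name across the publications that list them."""
    positions = [authors.index(hcp_name) + 1 for authors in author_lists if hcp_name in authors]
    return median(positions) if positions else None


# Hypothetical publication records, each an ordered author list
publications = [
    ["Citizen J", "Smith A", "Lee K"],
    ["Nguyen T", "Citizen J"],
    ["Brown R", "Patel S", "Citizen J", "Wong L"],
    ["Smith A", "Lee K"],
]
print(median_author_position("Citizen J", publications))  # 2
```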

Machine Learning Based Data Aggregation

Typically, results are aggregated into an index or score. For social networks, for example, a higher score is assigned where the search indicates high engagement with social networks.

The nature of the indices depends on the goal of the application, e.g. predicting potential, influence, early adoption, digital proficiency, digital engagement, etc.
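Aggregating several engagement signals into a 0-100 index can be illustrated with a hand-weighted score converted to a percentile rank. This is a simple stand-in rather than a machine learning model; the signal names (`accounts_found`, `posts_last_90d`, `followers`) and the weights are hypothetical.

```python
from bisect import bisect_left
from math import log1p

# Hypothetical engagement signals and hand-set weights (a trained model would learn these)
WEIGHTS = {"accounts_found": 1.0, "posts_last_90d": 0.5, "followers": 0.25}


def raw_engagement(signals: dict) -> float:
    """Weighted sum of log-damped counts; missing signals count as zero."""
    return sum(w * log1p(signals.get(name, 0)) for name, w in WEIGHTS.items())


def engagement_index(population: dict) -> dict:
    """Map each HCP's raw engagement to a 0-100 percentile rank within the population."""
    raw = {hcp: raw_engagement(s) for hcp, s in population.items()}
    ordered = sorted(raw.values())
    denominator = max(len(ordered) - 1, 1)
    # bisect_left counts how many HCPs have strictly lower raw engagement
    return {hcp: round(100 * bisect_left(ordered, x) / denominator) for hcp, x in raw.items()}


population = {
    "gp_001": {"accounts_found": 3, "posts_last_90d": 40, "followers": 2500},
    "gp_002": {"accounts_found": 0},
    "gp_003": {"accounts_found": 1, "posts_last_90d": 2, "followers": 80},
}
print(engagement_index(population))  # {'gp_001': 100, 'gp_002': 0, 'gp_003': 50}
```

A percentile rank makes the index relative to the target population, so the same raw engagement can score differently in a different population.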