Data Science Foundations: Defining Primary Data Types

Data Science is more than a buzzword. It’s a profitable practice that needs to be a part of every company’s retention strategy.

So much data is collected nowadays that if you aren’t analyzing and operationalizing it, you’re missing out on a goldmine. I’ve worked with hundreds of companies, and you’d be surprised just how clear the pattern for success can be once you properly use your data. This blog post is the start of a broader series that will walk you through the latest ways to think about foundational Data Science concepts. You’ll get a blend of advanced topics, tactical best practices, and a few pop culture references here and there. To start, I’m taking you on a journey through the data universe to explore primary sources of data.

What is Data Science?

Data Science helps companies distill meaning from their data and operationalize it so they can build better processes. Modern companies use so many technology solutions that their data becomes disparate and difficult to compile into a hyperdimensional view. At Gainsight, we call this hyperdimensional view a 360-degree view of the customer. Vast data sets like this are aptly called “big data.” It takes an enormous amount of effort to derive insights from them, and that’s where Data Science comes in. Data Scientists use statistical tools, algorithms, and machine-learning models to organize and understand big data. Data Science is one of the most powerful tools a company can leverage, and it directly affects the company’s ability to be predictive and proactive.

Data Universe & Primary Data Source Types

Do you remember learning about primary colors? There’s red, yellow, and blue, and all the colors that we know of are made from those three primary colors. You can also apply this sort of thinking to data collections. Every business has an ecosystem of connected data objects that can be considered a collection. There are approximately eight primary data sources from which all data collections are made (there are some exceptions, but we’ll save those for another time). Familiarizing yourself with these primary data sources will allow you to mix together your own combinations of data sets. You can’t do it alone though. One of the key factors of a successful Data Science strategy is building interdepartmental relationships, so you can access all these types of data sources and undertake cross-functional efforts to put your learnings into action.

I’ve divided and defined the following data sources to make it easier for us to build a framework that we’ll use to take over the universe! No, seriously: “universe” is a term we use in Data Science to describe the relationships between series of data sources in a collection. I’ll be using this term here and there. Now without further ado, here are the eight primary data source types:

Entity-Based Data Sources: Dimensions

Dimensions are a qualitative data source category. Because they’re mostly qualitative in nature and not quantitative, Dimensions are not totaled into a sum. For example, sales region, employee, location, and date are dimensions. In most cases, Dimensions in Data Science consist of correlated temporary constants used as descriptors or to define the entities that interact within a data universe. By that definition, number of employees would also be a dimension, since it describes an account. The aspects that define your products, such as the pricebook, and the descriptors used to characterize customer accounts (e.g., industry and size) are all dimensions. These sources are primarily used to answer categorization and segmentation questions.


Demographic Data

Demographic Data includes information like a contact or account’s industry, size, region, product line, etc. All entities in a universe have demographic dimensions. Most people are familiar with Account Data or Demographic Data as a whole, but you might be surprised to hear that you could be using it incorrectly. Demographic Data is relatively flat, meaning it’s static or slow-changing. However, businesses commonly use this type of data operationally instead of tactically.

While Demographic Data can be used tactically to structure processes, it’s the cognitive analysis prior to process change that Demographic Data influences. I’d argue that the best use of Demographic Data is to segment your customers in combined ways and benchmark their performance with the other data sets. For example, you could determine which segment benefits most from Professional Services engagements, or which product features generate the most tickets at each stage in the lifecycle. You can save your Operations teams a lot of time by understanding your demographics reporting requirements upfront and using these insights to power your operations requests.
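Here’s a minimal sketch of that kind of combined segmentation in Python. The account records, field names, and ticket counts are made up for illustration, and `average_by_segment` is a hypothetical helper rather than part of any particular tool.

```python
from collections import defaultdict

# Hypothetical account records: two demographic dimensions plus one measure.
accounts = [
    {"name": "A", "industry": "Retail", "size": "SMB", "tickets": 12},
    {"name": "B", "industry": "Retail", "size": "SMB", "tickets": 8},
    {"name": "C", "industry": "Retail", "size": "Enterprise", "tickets": 30},
    {"name": "D", "industry": "Finance", "size": "Enterprise", "tickets": 6},
]

def average_by_segment(rows, dimensions, measure):
    """Average a measure for each combination of the given dimensions."""
    totals = defaultdict(lambda: [0, 0])  # segment -> [running sum, row count]
    for row in rows:
        segment = tuple(row[d] for d in dimensions)
        totals[segment][0] += row[measure]
        totals[segment][1] += 1
    return {segment: s / n for segment, (s, n) in totals.items()}

# Benchmark ticket volume across combined industry + size segments.
benchmarks = average_by_segment(accounts, ["industry", "size"], "tickets")
for segment, avg in sorted(benchmarks.items()):
    print(segment, avg)
```

Swapping the list of dimensions (say, region and product line) changes the segmentation without touching the measure, which is the point of keeping the two apart.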

Master Records

Master Records are any objects used in processes by customers or internal teams. It could be lists of products, events, campaigns, or services. For example, when you sell a product, it gets stored within a contract file, or when marketing sends an email, the recipients are added to campaigns. A customer can buy more than one product in their lifetime, participate in many events, and even be a part of multiple campaigns. Master Records group and organize these data sets as lookup files. If you want to see how much a major event impacted a customer, this would be the data source that could answer that. All the objects that store the dimensions of the event, such as date, location, product, or service are stored in their own master record. 

Time/Relationship-Based Data Sources: Measures/Values

Measures are numerical values that mathematical functions and projections can work on. For example, a user logins column is a measure because you can total or average it. In most cases, a Measure/Value in Data Science is bound to the time and relationship-based variables that indicate interactions between entities. For example, an account buys access to three products for 10 users, who each have access to five features in total. The five entities in this relationship would be the account, its users, our company, the products, and the features. The interactions between the five entities in this data universe would be tracked in measures or values, such as number of users, logins per week, and how many features are used.

These measure-based and value-based data sources represent the relationship between a customer and their Assets/Master Records. It’s important that all of the following data sources have adequate lookups to all of the Master Records and Demographic assets they have relationships with. This makes it easier to report by any category, segment, or internal/external dimension without complex effort.
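To make the lookup idea concrete, here’s a small Python sketch. Everything in it is an assumption for illustration: the IDs, the field names, and the `total_by` helper. The measure rows carry only keys, and the lookup tables supply the dimensions you report by.

```python
# Hypothetical lookup tables (entity-based sources).
accounts = {
    1: {"industry": "Retail", "region": "EMEA"},
    2: {"industry": "Finance", "region": "AMER"},
}
products = {
    "p1": {"product_line": "Analytics"},
    "p2": {"product_line": "Surveys"},
}

# Hypothetical measure rows: each one points back to its lookups by key.
weekly_logins = [
    {"account_id": 1, "product_id": "p1", "logins": 40},
    {"account_id": 1, "product_id": "p2", "logins": 10},
    {"account_id": 2, "product_id": "p1", "logins": 25},
]

def total_by(rows, lookup, key, dimension, measure):
    """Sum a measure, grouped by a dimension found through a lookup key."""
    totals = {}
    for row in rows:
        label = lookup[row[key]][dimension]
        totals[label] = totals.get(label, 0) + row[measure]
    return totals

print(total_by(weekly_logins, accounts, "account_id", "region", "logins"))
print(total_by(weekly_logins, products, "product_id", "product_line", "logins"))
```

Because every measure row has a key into each lookup, the same rows can be reported by region, industry, or product line with one function call.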

Subscription Data

Subscription Data is best thought of as any metrics or values that reflect the outcomes in your Contract or Statement of Work (SOW) between you and the customer. This includes MRR/ARR, users, licenses, credits, or opportunities. Subscription Data will help you set your targets. For example, if a customer buys a subscription for 10 users, this needs to be captured and tracked historically. Each user, based on role, should have a set of adoption metrics that quantifies the health, depth, and breadth of the company’s overall user activation. This approach ensures that all stakeholders are aligned and your client processes will match the numbers that are directly called out in the purchase order. We call this “purchase order parity.” If they have add-ons in their contract for any custom consumables like credits or service hours, then you should track that at the account level and benchmark success against its delivery, renewal, and upsell growth plan. The sources below are used to answer the question, “How do we make the biggest impact on these subscription measures?”
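As a rough sketch of tracking a subscription target historically, the Python below compares purchased seats with active users month by month. The snapshot values and the `activation_history` helper are hypothetical, and “active users divided by purchased seats” is just one simple way to express the comparison.

```python
# Hypothetical monthly snapshots: (month, seats purchased, users active).
snapshots = [
    ("2024-01", 10, 4),
    ("2024-02", 10, 7),
    ("2024-03", 10, 9),
]

def activation_history(rows):
    """Return (month, active users / purchased seats) for each snapshot."""
    history = []
    for month, purchased, active in rows:
        if purchased <= 0:
            raise ValueError(f"no purchased seats recorded for {month}")
        history.append((month, active / purchased))
    return history

for month, rate in activation_history(snapshots):
    print(f"{month}: {rate:.0%} of purchased seats active")
```

Keeping the purchased number next to the observed number in every snapshot is what lets you check the account against the purchase order over time.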

Support/Case Data

If you’ve ever called a Customer Support team, you’re data in their system! Whenever you contact Customer Support, you’re given a case number after your conversation. Well, this source is where the data on that engagement is stored. Support Data is primarily used to track risk, gaps in products/services, and needs for documentation. Customers usually create tickets when processes, experiences, or features break down.

Even if you’re an on-premise company that doesn’t have a lot of data on the customer, Support and Case Data is one universal data source you can typically leverage. While aligning your company around support alone will only cause you to act reactively, support is a critical function in identifying gaps in customer-facing processes. The data tracked through these interactions can be used to host regular meetings where you conduct cognitive analysis efforts to share with your support team. Make sure they’re tracking things like severity, priority, escalation, time/days opened, and number of touchpoints. This data will tell you what issues customers are facing and the questions they’re asking the most.

Preparedness Data

Preparedness Data includes information on training, onboarding, implementation, milestones, and professional services. This data can be used much like Support Data to illuminate your Pareto principle insights. In this context, the Pareto principle contends that 80% of the customer experience issues you’ll have will be a result of 20% of the total interactions in your customer lifecycle. Usually, a large portion of this 20% can be found in the projects and milestones that occur in your Preparedness Data or Support Data.
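One way to look for that 20% is to rank interaction types by issue count and keep adding them until they cover 80% of the total. The Python below is a minimal sketch of that idea; the interaction names and counts are invented, and `pareto_head` is a hypothetical helper.

```python
# Hypothetical issue counts per lifecycle interaction (made-up numbers).
issue_counts = {
    "onboarding kickoff": 45, "data migration": 35, "admin training": 5,
    "user training": 4, "go-live": 3, "first review": 3, "renewal": 2,
    "billing": 1, "upsell": 1, "offboarding": 1,
}

def pareto_head(counts, threshold=0.8):
    """Smallest set of top categories whose share of issues meets the threshold."""
    total = sum(counts.values())
    if total == 0:
        return []
    head, running = [], 0
    # Walk categories from most to fewest issues, accumulating their share.
    for name, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        head.append(name)
        running += count
        if running / total >= threshold:
            break
    return head

print(pareto_head(issue_counts))
```

With these sample numbers, two of the ten interaction types account for 80 of the 100 issues. Real data rarely splits this cleanly, so treat the threshold as a starting point.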

Try to track as much Preparedness Data as you can, especially early on in the customer lifecycle. Then, focus your efforts on tracking the moments with the most friction. The earlier you catch an issue, the better you’re able to resolve it and prevent it for future customers, thus extending their Lifetime Value (LTV).

Sentiment Data

Sentiment Data is one of the most popular data sources as of late. It includes Net Promoter Scores (NPS) and Customer Satisfaction Scores (CSAT). Sentiment Data is a vital source of information, but its operational value can be overblown. For example, it’s super insightful for companies to know if a customer will advocate for them or if they’re satisfied. But the hard part is to then understand why, and what can be done at scale to promote satisfaction and prevent dissatisfaction. By nature, NPS and CSAT scores can only allow you to be reactive, as they both are captured after a project has completed or sentiment is felt. The information that Sentiment Data provides can only be applied to fix a negative customer experience or prevent others from having it themselves. Sentiment Data is most effective when combined with other data types so you have wider context and visibility into customer experiences and expectations.

Make sure you capture both quantitative and qualitative Sentiment Data. Relying on one or even an aggregate score can make it difficult to figure out important factors in customer sentiment. It becomes a guessing game at best. Your team could spend hours finding out what customers can say in just one sentence. Prevent this by including required open text fields in both your NPS and CSAT surveys, so customers can provide context. Another NPS best practice is to send out surveys frequently at key milestones in the customer journey. Send out CSAT surveys after onboarding, post-launch milestones, and support interactions, and provide multiple opportunities per year for customers to take NPS surveys.
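To show why the score and the open text belong together, here’s a short Python sketch. It uses the standard NPS convention, which this post doesn’t spell out: scores run from 0 to 10, promoters score 9 or 10, detractors score 0 to 6, and NPS is the percentage of promoters minus the percentage of detractors. The survey rows and function names are hypothetical.

```python
# Hypothetical survey responses: a 0-10 score plus the required open text.
surveys = [
    {"score": 10, "comment": "Onboarding was smooth."},
    {"score": 9, "comment": "Support answers quickly."},
    {"score": 9, "comment": "Reports save us hours."},
    {"score": 7, "comment": "Fine, but setup took a while."},
    {"score": 4, "comment": "Training didn't cover our use case."},
]

def net_promoter_score(rows):
    """Percent promoters (9-10) minus percent detractors (0-6)."""
    if not rows:
        raise ValueError("need at least one response")
    promoters = sum(1 for r in rows if r["score"] >= 9)
    detractors = sum(1 for r in rows if r["score"] <= 6)
    return 100 * (promoters - detractors) / len(rows)

def detractor_comments(rows):
    """The open-text answers that explain the low scores."""
    return [r["comment"] for r in rows if r["score"] <= 6]

print(net_promoter_score(surveys))
print(detractor_comments(surveys))
```

The number tells you where you stand. The comments from the low scorers tell you what to look at first.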

Engagement Data

You can thank Engagement Data for creating the world we live in today. This data source captures interactions with things like email campaigns, events, and community engagement. The modern way we think of driving data at scale, in regards to conversion rates through stages, comes from the department that owns and populates this data: Marketing/Sales. Just as the industrial and warehousing revolutions created the demand for database architecture, Marketing and Sales drove the need to track entities across multiple stages and levels of engagement. Concepts such as the marketing/sales funnel, activation rates, and target population drop-offs are all foundations of cognitive analytics that were laid through the sophistication of marketing and sales tools over the years.

Aligning with Marketing is essential for conducting proper analysis of Engagement Data. Just as our eyes only see when light bounces off something, the impact of marketing efforts can only be truly assessed if it bounces off the prospect/customer and is tracked. Marketing efforts should drive Calls to Action (CTAs) whose outcomes can be tracked and monitored in conjunction with other time/relationship-based data sources.
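Stage-to-stage conversion is the simplest version of this kind of tracking. The Python below is a sketch with made-up stage names and counts, and `stage_conversion` is a hypothetical helper. It divides each stage’s count by the count of the stage before it.

```python
# Hypothetical campaign funnel: ordered (stage, number of people) pairs.
funnel = [
    ("sent", 1000),
    ("opened", 400),
    ("clicked CTA", 100),
    ("registered", 25),
]

def stage_conversion(stages):
    """Conversion rate from each stage to the next one."""
    rates = []
    for (prev_name, prev_count), (name, count) in zip(stages, stages[1:]):
        # An empty previous stage has nothing to convert, so report 0.0.
        rate = count / prev_count if prev_count else 0.0
        rates.append((f"{prev_name} -> {name}", rate))
    return rates

for step, rate in stage_conversion(funnel):
    print(f"{step}: {rate:.0%}")
```

The step with the lowest rate marks the biggest drop-off, which is usually the first place to look when a CTA underperforms.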

Usage Data

Last but not least, we have the fastest-emerging data set of our time: Usage Data. When you hear buzzwords like “big data” and the “internet of things,” it’s most likely in reference to Usage Data. This data source type is where customer behavior, actions, and time are captured. For example, logins, calls, time spent on an activity, clicks, sends, runs, loads, views, and more. Nowadays, phones, apps, and even cars track this kind of data. The boom of Usage Data has been one of the largest drivers of the need for Data Science as a discipline. The advent of big data creates the noise of information that we see today. Usage is one of the most critical sources for predicting customer behavior and acting on it proactively.

It’s important to note that you must capture Usage Data as accurately and granularly as possible. This is necessary for accurate reporting. For example, it’s typical for companies to track website usage by page level. However, the problem with this approach is that it will only tell you what page loaded and not what the user actually did. How many tabs do you have open right now? Now, how many of them are you actually using? Just because someone opens a page doesn’t mean they’re actually interacting with it. So, to get more accurate, granular data, you need to track beyond page level—you need to track at the action level as well.
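Here’s a small Python sketch of the difference. The event rows, action names, and the `idle_viewers` helper are all hypothetical. It assumes each tracked event records the user, the page, and the specific action, so you can separate people who merely loaded a page from people who did something on it.

```python
# Hypothetical tracking events: each row records who did what, and where.
events = [
    {"user": "u1", "page": "/reports", "action": "page_load"},
    {"user": "u1", "page": "/reports", "action": "export_click"},
    {"user": "u2", "page": "/reports", "action": "page_load"},
    {"user": "u3", "page": "/settings", "action": "page_load"},
    {"user": "u3", "page": "/settings", "action": "save_click"},
]

def idle_viewers(rows, page):
    """Users who loaded the page but took no other tracked action on it."""
    loaded, acted = set(), set()
    for row in rows:
        if row["page"] != page:
            continue
        if row["action"] == "page_load":
            loaded.add(row["user"])
        else:
            acted.add(row["user"])
    return loaded - acted

print(idle_viewers(events, "/reports"))  # {'u2'}
```

Page-level tracking alone would count u1 and u2 the same on /reports. The action-level rows show that only u1 actually used the page.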

Be the Creator of Your Own Data Universe

Now you know the eight primary data source types that make up all data universes, how they can be used, and what makes each one unique. Take the next step by conducting an internal assessment of the sources you currently have available. What files do you have? What files don’t you have? Do you have a centralized way of organizing them? Take our Maturity Model assessment test to see where your company stands.

I’m always happy to talk to fellow data enthusiasts! If you have any questions or would like to learn more about the Data Science services we provide at Gainsight, I encourage you to reach out at