
Best Practices And Cautionary Advice For A Successful Data Science Practice


Data science helps organizations deliver value and gain a competitive advantage. It empowers organizations to leverage knowledge to improve business intelligence and inform decisions that enhance customer experience.

With proper collaboration and communication as the foundation, any organization can leverage the benefits of data science. However, adding another level of complexity to the modernization journey inevitably creates new challenges that must be approached carefully.

This article is the first in a two-part series that aims to unpack what you need to know to do data science right. Part one offers a basic overview of data science, with cautionary advice on common problem areas (yellow flags) and a set of best practices. Part two will dive into real-world case studies to better understand how data science can be successfully applied. Let’s dive into part one.

Integrating Data Science into an Organization

To approach data science confidently, it’s essential to understand what it is. TechTarget defines data science as “the field of applying advanced analytics techniques and scientific principles to extract valuable information from data for business decision-making, strategic planning and other uses.”

To break it down into a more digestible explanation, think of data science as resting on three pillars:

  • Analysis: Collecting descriptive information about the past to inform insights about the current state.
  • Statistics: Using experimentation to understand causality.
  • Machine learning: Using historical data to predict the future.

Data science helps organizations confidently deliver products and services in a way that is backed by data-based insights. However, before setting off on a data science journey, there are several reasons to proceed with caution.

Yellow Flags in Data Science

Business intelligence, analytics and data science each have a role in driving business value. However, it’s important to recognize when there may be issues or roadblocks. An organization that can identify the most common yellow flags can avoid wasteful starts and stops in its data science journey.

  • Low engagement. This can happen on both the analyst and the end user side of the house. Low engagement is characterized by a lack of communication and a lack of probing questions around a data request. Often, whether it’s the analyst or the end user, certain decisions or assumptions are made and the outcome is not a match for what the end user wanted. Low engagement hinders the organization from gaining insightful data points that are critical to moving the organization forward.
  • Slow delivery. When requesting data, it should not take months to receive some sort of intel to help make a decision or move forward in a project.
  • Manual reconciliation. Although it’s 2022, many organizations still rely on Excel spreadsheets as workhorses, manually connecting different data sources to create essential insights. There may also be multiple sources of truth embedded in separate spreadsheets or spread across analytics teams that answer similar questions. A lot of time and energy is spent trying to reconcile what the right answer is for a specific question.
  • Misaligned goals. On the data science side of the house, there can be misalignment on organizational goals. For example, a data scientist might be working on an important question. However, that one question answered doesn’t have a broader organizational impact. It’s not tied to the company’s strategic objectives. So the impact is not there for the data scientist or the organization.
  • Black box of data. The black box starts with providing different variables or inputs to a data scientist to develop a machine learning model. The result is recommendations and different predictions that the business receives. However, stakeholders don’t understand the recommendations because they weren’t brought along during the process. The result: they do not trust the data.
  • Lab versus production. Data scientists are often great experimenters. They create proofs of concept that then need to move into a production environment. Some organizations simply don’t have the infrastructure to push them forward to production. And once the models are there, they might decay, meaning the results may become less accurate over time.
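
To make the model decay point concrete, here is a minimal sketch in Python of one way a team might compare a model's recent accuracy against its accuracy at launch. It is an illustration rather than a prescribed method: the function names, the sample predictions, and the 0.05 tolerance are all assumptions made for this example.

```python
from statistics import mean


def accuracy(predictions, actuals):
    """Fraction of predictions that match the observed outcomes."""
    if not predictions or len(predictions) != len(actuals):
        raise ValueError("predictions and actuals must be non-empty and equal in length")
    return mean(1.0 if p == a else 0.0 for p, a in zip(predictions, actuals))


def has_decayed(baseline_accuracy, recent_accuracy, tolerance=0.05):
    """True when recent accuracy has fallen more than `tolerance` below the baseline.

    The tolerance is a hypothetical threshold; a real team would agree on it
    with business stakeholders.
    """
    return (baseline_accuracy - recent_accuracy) > tolerance


# Hypothetical data: predictions versus observed outcomes at launch and recently.
launch_predictions = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
launch_actuals = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]   # 9 of 10 correct
recent_predictions = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
recent_actuals = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1]   # 7 of 10 correct

baseline = accuracy(launch_predictions, launch_actuals)
recent = accuracy(recent_predictions, recent_actuals)

if has_decayed(baseline, recent):
    print(f"Accuracy fell from {baseline:.2f} to {recent:.2f}; review or retrain the model")
```

Running a check like this on a schedule, and sharing the result with stakeholders, is one way to keep a production model from quietly losing accuracy.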

If an organization experiences any of these yellow flags, it should proceed with caution and work to get around them. One way to avoid many of them altogether is to follow best practices that mitigate the yellow flags and ultimately drive better collaboration and communication.

Data Science Best Practices

With a core set of best practices firmly in place, your data science initiatives are likely to see fewer starts and stops and to drive more value for the business. Here are the top best practices to follow for successful data science:

  • Strategic alignment. Every organization needs a data science champion and it needs to start at the top of the executive leadership team. Further, the priorities related to data science and analytics need to be well understood. Then data teams must align to them in terms of how they deliver solutions throughout the organization. The data analysts need to align and feel inspired and motivated in order to keep ticking away at important data decisions at the organization.
  • Domain expert partnership. This best practice means having analysts and data scientists partner with their operational counterparts. Whether this is a clinician on the front line seeing patients or a leader of a business or service line, they must understand what their scope of responsibility is and how that relates to the data that is flowing through the organization. With greater domain expertise come more relevant and useful data solutions.
  • Rapid iteration. Move fast and fail faster. Start piloting data before the team is comfortable with it. Get the data in front of key business partners so they can see what the data science analyst or team is building toward. Their input early on is going to lead to better results, better outcomes, and more relevant, pragmatic data solutions.
  • More data faster. Take the data engineers and partner them with data scientists on day two, when the team is starting to get to know new data sources. This allows engineers to build insight into the quality of the data so that they can create data pipelines for the broader analytics community. The analytics community can then access data faster and answer more questions with the new datasets from the data scientists.
  • Higher quality data. This best practice requires all hands on deck. Every person in the data team needs to be invested in setting up different data process improvements, change control, and automated testing to ensure that the overall quality of the data increases as more is used. Data quality feedback loops with business owners ensure they have skin in the game to improve data quality at the source of data entry.
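
As an illustration of the automated testing mentioned under higher quality data, the following Python sketch flags missing required values and duplicate keys in a batch of records. The record layout, field names, and the `check_records` helper are hypothetical and chosen only for this example; a real pipeline would apply checks that match its own data.

```python
def check_records(records, required_fields, key_field):
    """Return a list of human-readable data quality issues found in `records`."""
    issues = []
    seen_keys = set()
    for index, record in enumerate(records):
        # Flag required fields that are absent or left empty.
        for field in required_fields:
            value = record.get(field)
            if value is None or value == "":
                issues.append(f"row {index}: missing {field}")
        # Flag keys that have already appeared in an earlier row.
        key = record.get(key_field)
        if key is not None:
            if key in seen_keys:
                issues.append(f"row {index}: duplicate {key_field} {key!r}")
            seen_keys.add(key)
    return issues


# Hypothetical sample data for illustration only.
visits = [
    {"visit_id": "V1", "department": "Cardiology"},
    {"visit_id": "V2", "department": ""},
    {"visit_id": "V1", "department": "Oncology"},
]

issues = check_records(visits, required_fields=["visit_id", "department"], key_field="visit_id")
for issue in issues:
    print(issue)
```

A report like this can be sent back to the business owners who enter the data, which supports the feedback loop described above.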

A focus on best practices creates more collaborative teams. When data science practices are properly aligned to organizational goals, business value grows. While best practices will lead an organization down the right path, it’s important to be aware of the yellow flags that can cause inefficient starts and stops. Want to get started on your data science journey? Part two will focus on “A Pragmatic Approach to Data Science.”


By Lisa Acomb, Operations Leader at Exadel