Adapt to the agile & growing needs of your organization. New data types, new data products, and new apps are easily added to the mesh without any impact on established data services.
What is a data mesh?
A data mesh is a network of distributed, interlinked data nodes that use smart APIs to follow the FAIR principles (findable, accessible, interoperable, and reusable).
Another way to understand the concept of a data mesh is to think of it like the world wide web:
- You have your data in one or more data nodes, which are like web servers
- You have a data portal, which is like a web browser
- Data nodes have smart APIs, which allow the data portal to access and utilize any data node, like HTTP / DNS
Just as a web server serves domain-specific content, Tag.bio’s data node (data product) serves domain-driven analysis functionality.
Why this matters:
This decentralized WWW pattern scales better than the centralized data lake pattern.
- Data nodes are developed, published and maintained independently by domain-centric teams
- Data nodes are able to communicate within the data mesh via smart APIs
- Data sources and functionality available within each data node are tightly versioned
- The same data node can be accessed by multiple teams, data portals, R/Python SDKs, and third-party clients
Similar to how a web browser allows a user to browse for and interact with content served by many disparate web servers, the data portal allows a user to browse for and interact with content served by the data nodes. The content can be private or public with access dependent on the user’s authorization roles.
Why this matters:
- A centralized platform for access to registered data nodes
- Ability to cross-compare disparate data in one platform
- Store a reproducible history of all your analysis activities and reusable UDATs (useful data artifacts)
- The portal allows FAIR (findable, accessible, interoperable, reusable) interaction with both data and analyses
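The role-based access described above can be illustrated with a minimal sketch. It is not Tag.bio's implementation: the `NodeEntry` structure, the `visible_nodes` function, and the node and role names are all hypothetical, chosen only to show how a portal might decide which registered nodes a user can browse.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class NodeEntry:
    """A data node as registered in a portal (hypothetical structure)."""
    name: str
    public: bool = False
    allowed_roles: frozenset = field(default_factory=frozenset)


def visible_nodes(entries, user_roles):
    """Return the names of nodes a user may browse.

    A node is visible if it is public, or if the user holds at least
    one of the roles the node allows.
    """
    roles = set(user_roles)
    return [e.name for e in entries if e.public or roles & e.allowed_roles]


# Illustrative registry; node names and roles are made up for this sketch.
REGISTRY = [
    NodeEntry("public-reference", public=True),
    NodeEntry("clinical-trials", allowed_roles=frozenset({"clinician", "admin"})),
    NodeEntry("finance", allowed_roles=frozenset({"admin"})),
]

if __name__ == "__main__":
    print(visible_nodes(REGISTRY, ["clinician"]))
    print(visible_nodes(REGISTRY, []))
```

In this sketch a user with no roles still sees public nodes, while private nodes appear only when one of the user's roles matches.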
For a web browser to communicate with web servers, it needs a common language: HTTP. In a data mesh, the data portal communicates with the data nodes through a common language of its own: the smart API.
Why this matters:
- The smart API is a universal communication protocol for accessing domain-specific data and functionality within data nodes
- It enables each data node to communicate domain-specific language and functionality to end users
- A common way to extract and transfer data from data products to third-party software (e.g., R, Python, Jupyter Notebook, Tableau)
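To make the idea of a common protocol more concrete, here is a small sketch of what a self-describing node interface could look like. It is an illustration under stated assumptions, not the actual smart API: the `DataNode` class, its `describe` and `run` methods, and the example analyses are all invented for this example.

```python
from typing import Any, Callable, Dict, List


class DataNode:
    """Hypothetical data node that publishes its own domain-specific analyses."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._analyses: Dict[str, Callable[..., Any]] = {}

    def register(self, analysis_name: str, func: Callable[..., Any]) -> None:
        """Used by the domain team that owns the node to add an analysis."""
        self._analyses[analysis_name] = func

    # The two methods below stand in for the shared protocol (an assumption).
    def describe(self) -> Dict[str, Any]:
        """Tell any client what this node is and what it can do."""
        return {"node": self.name, "analyses": sorted(self._analyses)}

    def run(self, analysis_name: str, **params: Any) -> Any:
        """Run a named analysis with keyword parameters."""
        if analysis_name not in self._analyses:
            raise KeyError(f"{self.name} has no analysis named {analysis_name!r}")
        return self._analyses[analysis_name](**params)


def discover(nodes: List[DataNode]) -> Dict[str, List[str]]:
    """A generic client: learn what each node offers using only describe()."""
    return {n.name: n.describe()["analyses"] for n in nodes}


# Two nodes from different domains, each with its own vocabulary.
sales = DataNode("sales")
sales.register("total_revenue", lambda orders: sum(orders))

labs = DataNode("labs")
labs.register("mean_value", lambda values: sum(values) / len(values))

if __name__ == "__main__":
    print(discover([sales, labs]))
    print(sales.run("total_revenue", orders=[10, 20, 5]))
```

The point of the design is that the client code in `discover` contains no domain knowledge. Each node keeps its own language, and the shared interface is what lets a portal, an SDK, or a third-party tool talk to all of them the same way.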
Data must be FAIR
Tag.bio’s data mesh focuses on making the data FAIR.
“I know where to look for any of my organization’s data.”
“I can access the data that I need.”
“I can integrate this data with other data.”
“I can use the same data to ask and answer different types of questions.”
Advantages of the data mesh
Attempting to bring all of your data into the same place and the same universal schema is unsustainable at scale. A decentralized data mesh solves that problem, with domain-driven – yet harmonized – data products designed and quickly deployed by smaller, specialized teams.
Each data node (data product) in the mesh can be worked on independently. As each node is containerized, it can be deployed as soon as any changes are ready.
As new data arises, new nodes can be constructed and deployed to the mesh. The same node can be accessed by many portals and teams. This allows your organization to scale your data mesh as you grow.
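As a rough sketch of why adding nodes does not disturb existing ones, consider a registry keyed by node name and version. Everything here is hypothetical: the `Mesh` class, its methods, and the `node://` endpoint strings are placeholders for illustration and do not describe how Tag.bio deploys or registers containers.

```python
class Mesh:
    """Hypothetical registry mapping (node name, version) to an endpoint."""

    def __init__(self):
        self._nodes = {}

    def publish(self, name, version, endpoint):
        """Add a node version. Existing entries are never overwritten."""
        key = (name, version)
        if key in self._nodes:
            raise ValueError(f"{name} {version} is already published")
        self._nodes[key] = endpoint

    def resolve(self, name, version):
        """Look up the endpoint for an exact node version."""
        return self._nodes[(name, version)]

    def versions(self, name):
        """List the published versions of one node, oldest first."""
        return sorted(v for (n, v) in self._nodes if n == name)


mesh = Mesh()
mesh.publish("genomics", (1, 0), "node://genomics/1.0")

# Later: one team ships an update and another team adds a brand-new node.
mesh.publish("genomics", (1, 1), "node://genomics/1.1")
mesh.publish("imaging", (1, 0), "node://imaging/1.0")

if __name__ == "__main__":
    # Clients pinned to the original version still resolve it unchanged.
    print(mesh.resolve("genomics", (1, 0)))
    print(mesh.versions("genomics"))
```

Because each published version is a separate entry, a client that depends on an earlier version keeps working while new versions and new nodes are added alongside it.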
Accelerate time to value
Get value from day one. A single node with a single analysis app can be released within hours, so domain experts can start asking and answering their own questions right away.