Tag.bio data products are the fundamental building block of the data mesh architecture and implement a domain-driven design to ensure that each data product is usable by anyone.
Data products represent a harmonized, composable application layer on top of disparate data sources. Along with employing a universal “Smart” API, they also present a simple, clean, standardized data model for apps and for data scientists who run queries and extract data frames. Data products have the following features:
Ownership of functionality is shareable with domain teams and business units.
All data snapshots, mapping, statistics, ML models, visualizations and reports are versioned for analysis provenance and perfect reproducibility.
Each data product presents its data model to algorithms, visualizations, BI tools, and data scientists as a single, sliceable data frame with common vocabularies and ontology concepts.
Disparate data products can map into an archetypal data model, which enables instant transfer of algorithms and data consumption methodologies.
Each data product utilizes its own git repository (or archetype repository) and is deployed as a Docker container into a robust, fully-managed Kubernetes compute infrastructure.
Data products can be designed, managed, and tested independently, minimizing overhead and maintenance costs as data and usage evolve.
Connect to multiple data technologies – CSV, TSV, SQL, Parquet, data warehouses, and data lakes.
For CDISC, OMOP, EHR, RWE, DNA-Seq, RNA-Seq (incl. single-cell).
Using JSON or YAML to deploy a harmonized data product for novel data sources.
Algorithms, visualizations, reports, ML, and LLM apps.
All disparate, composable data products speak the same API language; specific functionality is enabled using callable methods through the API.
Anyone can utilize the Tag.bio web application to create and share cohorts, statistics, visualizations, reports, and machine learning models – with versioning and provenance.
Data scientists can access any data product (with authorization) to easily slice and extract data frames for ad-hoc analysis & visualization and building of machine learning and generative AI models.
Data analysts, BI tools and other software clients can connect to a data product’s RESTful API or SQL API to slice and extract data frames for downstream use.
Bring the compute to the data (pro-code) – implement pluggable algorithms and visualizations with R scripts, Python scripts, RMarkdown templates and Python notebooks.
Enables fast algorithms, visualization, and ML processing of data.
PNG, PDF, HTML, Plotly, RMarkdown, Python notebooks.
Data products can serve API methods and apps for prediction, classification and generative AI from structured queries or exploratory prompts.
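The idea of mapping disparate sources into an archetypal data model and then slicing the result as a single data frame can be illustrated with a small sketch. This is not Tag.bio's implementation or API: the archetype columns, the two source mappings, and the helpers `harmonize`, `slice_frame` and `adults_with` are all hypothetical names chosen for illustration, and plain Python lists of dictionaries stand in for a real data frame.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]

# Hypothetical archetype: the columns every harmonized data product exposes.
ARCHETYPE_COLUMNS = ("subject_id", "age_years", "diagnosis")

# Hypothetical mappings from each source's column names to archetype names.
CLINIC_A_MAPPING = {"patient": "subject_id", "age": "age_years", "dx": "diagnosis"}
CLINIC_B_MAPPING = {"SUBJID": "subject_id", "AGE": "age_years", "DIAG": "diagnosis"}


def harmonize(rows: List[Row], mapping: Dict[str, str]) -> List[Row]:
    """Rename source columns to archetype columns and drop unmapped ones."""
    harmonized = []
    for row in rows:
        out = {mapping[key]: value for key, value in row.items() if key in mapping}
        missing = [col for col in ARCHETYPE_COLUMNS if col not in out]
        if missing:
            raise ValueError(f"row is missing archetype columns: {missing}")
        harmonized.append(out)
    return harmonized


def slice_frame(
    rows: List[Row], predicate: Callable[[Row], bool], columns: List[str]
) -> List[Row]:
    """Keep rows matching the predicate, projected onto the requested columns."""
    return [{col: row[col] for col in columns} for row in rows if predicate(row)]


def adults_with(diagnosis: str) -> Callable[[Row], bool]:
    """Build a query once; it is written against the archetype, not a source."""
    return lambda row: row["age_years"] >= 18 and row["diagnosis"] == diagnosis


clinic_a = harmonize(
    [
        {"patient": "A1", "age": 54, "dx": "T2D", "site": "north"},
        {"patient": "A2", "age": 12, "dx": "T2D", "site": "north"},
    ],
    CLINIC_A_MAPPING,
)
clinic_b = harmonize(
    [
        {"SUBJID": "B7", "AGE": 61, "DIAG": "T2D"},
        {"SUBJID": "B8", "AGE": 47, "DIAG": "asthma"},
    ],
    CLINIC_B_MAPPING,
)

# The same query runs unchanged against both harmonized sources.
query = adults_with("T2D")
result_a = slice_frame(clinic_a, query, ["subject_id", "age_years"])
result_b = slice_frame(clinic_b, query, ["subject_id", "age_years"])
print(result_a)
print(result_b)
```

Because the query only refers to archetype column names, it transfers between the two sources without modification, which is the property the archetypal data model is meant to provide.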
Tag.bio's Data Mesh presents a network of interconnected data products that adhere to FAIR principles via smart APIs. This framework offers several benefits to customers. It promotes data discoverability and interoperability, ensures that both data and analyses are FAIR-compliant, and expedites the process of connecting real-world data sources.
Additionally, the platform is designed to retain institutional knowledge by automatically saving and cataloging analysis activities, creating a valuable resource for domain experts.
Tag.bio provides a robust suite of features to enhance data collaboration and analysis. It facilitates scalable data mesh deployment with built-in CI/CD pipelines, allowing for the creation of modular, containerized data products that can be geographically distributed with Federated Computational Governance. Data products can be added, modified, or removed independently, ensuring flexibility and adaptability. The platform prioritizes security, running within customers' secure network environments and implementing Single Sign-On (SSO) to simplify user access management.
Tag.bio empowers users to create, share, and analyze data artifacts, supporting data scientists, ML engineers and BI professionals with a comprehensive suite of tools and no-code analyses, along with seamless integration with popular cloud services like AWS, Azure and Google Cloud.
Tag.bio offers a versatile range of APIs and services that enhance data product accessibility and functionality. These APIs support various essential functions, including the Search and Discovery System, R and Python SDKs, Dashboard, Data Products Insights, Monitoring, and more.
Tag.bio's Smart API streamlines access to unique data product functionalities, allowing seamless integration with other data products and facilitating guided analysis apps for domain experts. Designed for universal communication, the API combines diverse elements such as datasets, algorithms, R/Python scripts, and analysis workflows into API calls. By fostering interoperability and adhering to the FAIR (findable, accessible, interoperable, reusable) principles, Tag.bio's API and services empower data scientists, engineers, and domain experts to harness data product capabilities, making the data truly useful.
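One way to picture how datasets and algorithms can be combined into API calls is a registry of named, callable methods that a data product dispatches requests to. The sketch below is only an illustration under stated assumptions: the JSON request shape (`method`, `params`), the `api_method` decorator, the method names, and the toy dataset are all hypothetical and do not describe the actual Tag.bio Smart API.

```python
import json
from statistics import mean
from typing import Any, Callable, Dict, List

# Toy dataset standing in for a data product's harmonized data (illustrative).
DATASET = [
    {"subject_id": "S1", "arm": "treatment", "response": 0.82},
    {"subject_id": "S2", "arm": "treatment", "response": 0.74},
    {"subject_id": "S3", "arm": "control", "response": 0.41},
]

# Hypothetical registry: method name -> callable exposed through the API.
METHODS: Dict[str, Callable[..., Any]] = {}


def api_method(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator that registers a function as a callable API method."""
    def register(func: Callable[..., Any]) -> Callable[..., Any]:
        METHODS[name] = func
        return func
    return register


@api_method("mean_response")
def mean_response(arm: str) -> float:
    """Average response for one study arm, rounded to three decimals."""
    values = [row["response"] for row in DATASET if row["arm"] == arm]
    return round(mean(values), 3)


@api_method("list_methods")
def list_methods() -> List[str]:
    """Let clients discover which methods this data product exposes."""
    return sorted(METHODS)


def handle_request(body: str) -> str:
    """Dispatch a JSON request of the assumed form {"method": ..., "params": {...}}."""
    request = json.loads(body)
    name = request.get("method")
    method = METHODS.get(name)
    if method is None:
        return json.dumps({"error": f"unknown method: {name}"})
    return json.dumps({"result": method(**request.get("params", {}))})


print(handle_request('{"method": "list_methods"}'))
print(handle_request('{"method": "mean_response", "params": {"arm": "treatment"}}'))
```

In this arrangement every data product answers the same request format, while the set of registered methods differs from product to product, so a client can discover and call product-specific functionality without a custom integration.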
Tag.bio Enterprise AI is an all-encompassing solution that simplifies the AI lifecycle by offering a unified platform for building, deploying, and managing Predictive & Generative AI projects.
The Tag.bio platform seamlessly integrates data sets, smart APIs, and advanced statistical and machine learning algorithms into data products, enabling users to uncover valuable insights through user-friendly apps and/or generative AI prompts.
The Tag.bio platform includes:
• Containerized Private Model Registries
• Expansive Developer Studio
• AI/ML Frameworks including scikit-learn, TensorFlow, PyTorch, MXNet, Microsoft CNTK, Keras, LangChain
• Integration with Cloud ML Platforms including Amazon SageMaker, Microsoft Azure AI, Google Vertex AI
• Coordination with agent orchestration systems, such as CrewAI, Azure AutoGen Studio, LangGraph
We help customers navigate the complexity of retrieval-augmented generation (RAG) and fine-tuning within our platform. RAG enhances the quality of responses generated from LLMs by incorporating internal domain-specific knowledge from data products. This augmentation supplements the knowledge encoded in the LLM with retrieved, domain-specific context, resulting in more comprehensive and contextually relevant responses.
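The retrieval step of RAG can be sketched in a few lines. The example below is a simplified illustration, not the platform's pipeline: it uses a bag-of-words cosine similarity in place of learned embeddings and a vector database, the three documents are made-up stand-ins for knowledge held in data products, and the helper names (`retrieve`, `build_prompt`) are hypothetical. No LLM is called; the sketch stops at assembling the augmented prompt.

```python
import math
import re
from collections import Counter
from typing import List

# Made-up snippets standing in for domain knowledge from data products.
DOCUMENTS = [
    "Cohort A contains 120 adult subjects with type 2 diabetes.",
    "The RNA-Seq data product stores single-cell expression counts.",
    "Cohort B contains 45 pediatric subjects with asthma.",
]


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, documents: List[str], k: int = 1) -> List[str]:
    """Return up to k documents most similar to the question."""
    query_vector = Counter(tokenize(question))
    scored = [(cosine(query_vector, Counter(tokenize(doc))), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]


def build_prompt(question: str, context: List[str]) -> str:
    """Place retrieved context ahead of the question for the LLM to use."""
    context_block = "\n".join(f"- {doc}" for doc in context)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context_block}\n"
        f"Question: {question}"
    )


question = "How many subjects with asthma are in cohort B?"
context = retrieve(question, DOCUMENTS, k=1)
prompt = build_prompt(question, context)
print(prompt)
```

A production setup would typically replace the term-count vectors with embeddings stored in a vector database, but the flow is the same: retrieve relevant context, then pass it to the model alongside the question.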
Tag.bio enables users to:
• Perform prompt-tuning to derive insights from data products
• Use RAG with interconnected data products
• Fine-tune models using harmonized internal domain knowledge
Tag.bio's platform provides robust support for LLM agents (LangChain, LlamaIndex, Azure AutoGen Studio, Amazon Bedrock and Google Vertex AI). In addition, it seamlessly integrates with a variety of vector databases, including Pinecone, Chroma DB, Meta Faiss and pgvector, to ensure efficient data handling.
The platform also leverages a spectrum of foundation models such as OpenAI GPT-4o, Anthropic Claude 3 Haiku, Meta Llama 3, TII Falcon 3, Mistral Large and Google Gemini, thereby providing a comprehensive toolkit for users to harness the power of diverse AI technologies.