Understanding and leveraging useful data artifacts (UDATs)

Bringing something revolutionary to data research – full reproducibility and reusability of analysis.

What are UDATs (useful data artifacts)?

UDATs are the distillation of your investigation. A UDAT is a structured data object created when you or an algorithm extracts something useful from the data.

Below are examples of UDATs.

Analysis parameters

  • When you use an analysis app, you usually select some parameters before running your analysis.
  • These parameters are automatically saved in your analysis history, which makes them data artifacts.
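
As a rough illustration of parameters becoming artifacts, the sketch below records the parameters of each run in a history and later replays them. The names `AnalysisHistory`, `ParameterArtifact`, and `mean_of` are hypothetical and chosen only for this example; they are not part of any Tag.bio API.

```python
from dataclasses import dataclass


@dataclass
class ParameterArtifact:
    """Hypothetical record of the parameters used for one analysis run."""
    app: str
    parameters: dict


class AnalysisHistory:
    """Hypothetical history that stores parameters every time an analysis runs."""

    def __init__(self):
        self._entries = []

    def run(self, app, analysis, **parameters):
        # Save the parameters first, so the run is recorded without any
        # explicit "save" step from the user.
        self._entries.append(ParameterArtifact(app=app, parameters=dict(parameters)))
        return analysis(**parameters)

    def artifacts(self):
        return list(self._entries)


def mean_of(values, round_to):
    """Toy analysis: the rounded mean of a list of numbers."""
    return round(sum(values) / len(values), round_to)


history = AnalysisHistory()
result = history.run("summary", mean_of, values=[1.0, 2.0, 4.0], round_to=2)

# Replaying the saved parameters reproduces the original result.
saved = history.artifacts()[0]
rerun = mean_of(**saved.parameters)
print(result, rerun)
```

Because the parameters are stored as plain data, the same record can be replayed later for auditing or handed to a colleague.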

Cohorts

  • A cohort is a group of entities which share common data characteristics. For example, all patients within a specific age range can be a cohort. Cohorts can also be defined as “clusters” or “segments” after running algorithms.
  • Your cohorts are automatically saved in your analysis history whether or not you explicitly save them when you run analyses, which makes them data artifacts.
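
The age-range example above can be sketched as a small data object that holds the cohort definition and resolves it against a list of patients. This is a minimal sketch under assumed names: `AgeRangeCohort`, the `id` and `age` fields, and the sample patients are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgeRangeCohort:
    """Hypothetical cohort definition: patients whose age falls in a range."""
    name: str
    min_age: int
    max_age: int  # inclusive upper bound

    def members(self, patients):
        # Return the ids of all patients that match the definition.
        return [p["id"] for p in patients if self.min_age <= p["age"] <= self.max_age]


# Made-up sample data, for illustration only.
patients = [
    {"id": "p1", "age": 34},
    {"id": "p2", "age": 52},
    {"id": "p3", "age": 47},
    {"id": "p4", "age": 71},
]

middle_aged = AgeRangeCohort(name="age 40-60", min_age=40, max_age=60)
print(middle_aged.members(patients))  # ['p2', 'p3']
```

Storing the definition (the age range) rather than only the resulting member list is one possible design choice; it lets the same cohort be resolved again when the underlying data changes.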

Variable groups

  • A variable group is a group of entity attributes, such as factors, measurements, or conditions. Variable groups are also known as signatures or feature sets.
  • Your variable groups are automatically saved in your analysis history whether or not you explicitly save them when you run analyses, which makes them data artifacts.

Analysis results

There are many types of analysis results. For example:

  • Summary: an analysis on a single cohort
  • Comparison: an analysis comparing cohorts
  • Similarity: similarities or differences between entities, such as nearest-neighbors
  • Correlation: similarities or differences between variables
  • Descriptive models: sophisticated algorithms
  • Projection models: supervised algorithms
  • Systems models: combining cohorts and variable groups with external knowledge

Optimize the reuse of UDATs

The most important aspects of using the FAIR (findable, accessible, interoperable, reusable) principles are reproducibility and reusability.

Something that can be reproduced can be reused. Reusability reduces redundancy and makes your research more efficient. Tag.bio promotes data, analysis, and UDAT reuse.

Findable

“I know where to look for my UDATs.”

Accessible

“I can access any of my UDATs.”

Interoperable

“I can use my UDATs as signals that apply across datasets.”

Reusable

“I can use my UDATs as a starting point for a new analysis.”
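
To make the four statements above concrete, here is a minimal sketch of storing artifacts as JSON, finding them by kind, and reusing a saved cohort and variable group on a different dataset. Everything in it is an assumption made for illustration: the in-memory `store`, the helper functions, the artifact ids, and the sample rows do not reflect how Tag.bio stores UDATs.

```python
import json

# Hypothetical in-memory store: artifact id -> JSON text.
store = {}


def save_artifact(artifact_id, kind, definition):
    store[artifact_id] = json.dumps({"kind": kind, "definition": definition})


def find_artifacts(kind):
    """Findable: list the ids of all artifacts of a given kind."""
    return sorted(aid for aid, text in store.items() if json.loads(text)["kind"] == kind)


def load_artifact(artifact_id):
    """Accessible: load any stored artifact by id."""
    return json.loads(store[artifact_id])


save_artifact("cohort-40-60", "cohort", {"field": "age", "min": 40, "max": 60})
save_artifact("vg-lipids", "variable_group", {"variables": ["ldl", "hdl"]})

# Interoperable and reusable: apply the saved definitions to a new dataset
# (made-up rows) as the starting point for a new analysis.
new_dataset = [
    {"id": "n1", "age": 45, "ldl": 130, "hdl": 50, "bmi": 27},
    {"id": "n2", "age": 66, "ldl": 110, "hdl": 61, "bmi": 24},
    {"id": "n3", "age": 58, "ldl": 150, "hdl": 42, "bmi": 31},
]

cohort = load_artifact("cohort-40-60")["definition"]
variables = load_artifact("vg-lipids")["definition"]["variables"]

selected = [
    {"id": row["id"], **{v: row[v] for v in variables}}
    for row in new_dataset
    if cohort["min"] <= row[cohort["field"]] <= cohort["max"]
]
print(find_artifacts("cohort"))
print(selected)
```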

What can you do with UDATs?

Share them with your team members so they can reproduce your analysis

Use them as a starting point for further investigations

Reproduce them for quality assurance and auditing

Save them for future reference

Publish your findings

And anything else that you can think of!
