
This vignette introduces the {laminr} workflow.

To learn more about LaminDB, see docs.lamin.ai.

Setup

Install {laminr} from CRAN:

install.packages("laminr", dependencies = TRUE)

Install the underlying Python packages LaminDB and Bionty:

laminr::install_lamindb(extra_packages = c("bionty"))

Set the default LaminDB instance:

laminr::lamin_connect("<owner>/<name>")

This instance acts as the default instance for everything that follows. Any data and tracking information will be added to it.

If you don’t have access to an instance, create a local test instance.

lamin_init(storage = "./laminr-intro", modules = c("bionty"))

Import

To start working with {laminr}, import the lamindb module:

ln <- import_module("lamindb")
#> → connected lamindb: testuser1/laminr-intro

This is equivalent to import lamindb as ln in Python.

Walkthrough

This section reproduces the walkthrough from the LaminDB Introduction guide. Only the equivalent {laminr} code is included here; for the accompanying text, follow the link under each heading.

See https://docs.lamin.ai/guide#walkthrough.

Transforms

See https://docs.lamin.ai/guide#transforms.

ln <- import_module("lamindb")
ln$track()
#> → created Transform('fFqSLCzfPR3y0000'), started new Run('gFnVLAHL...') at 2025-04-07 11:34:13 UTC

ln$Transform$df()
#>                uid              key      description     type source_code hash
#> 1 fFqSLCzfPR3y0000 introduction.Rmd introduction.Rmd notebook        <NA> <NA>
#>   reference reference_type space_id _template_id version is_latest
#> 1      <NA>           <NA>        1         <NA>    <NA>      TRUE
#>            created_at created_by_id _aux _branch_code
#> 1 2025-04-07 11:34:13             1 <NA>            1

ln$Run$df()
#>                    uid name          started_at finished_at reference
#> 1 gFnVLAHLmNQ3h9N7LNwI <NA> 2025-04-07 11:34:13        <NA>      <NA>
#>   reference_type _is_consecutive _status_code space_id transform_id report_id
#> 1           <NA>            <NA>            0        1            1      <NA>
#>   _logfile_id environment_id initiated_by_run_id          created_at
#> 1        <NA>           <NA>                <NA> 2025-04-07 11:34:13
#>   created_by_id _aux _branch_code
#> 1             1 <NA>            1

Artifacts

Artifacts are objects that bundle data and associated metadata. An artifact can be any file or folder but is typically a dataset.

See https://docs.lamin.ai/guide#artifacts.

df <- ln$core$datasets$small_dataset1(otype = "DataFrame", with_typo = TRUE)
df
#>         ENSG00000153563 ENSG00000010610 ENSG00000170458 perturbation
#> sample1               1               3               5         DMSO
#> sample2               2               4               6         IFNJ
#> sample3               3               5               7         DMSO
#>         sample_note             cell_type_by_expert cell_type_by_model
#> sample1      was ok                          B cell             B cell
#> sample2  looks naah CD8-positive, alpha-beta T cell             T cell
#> sample3  pretty! 🤩 CD8-positive, alpha-beta T cell             T cell
#>           assay_oid concentration treatment_time_h donor
#> sample1 EFO:0008913          0.1%               24 D0001
#> sample2 EFO:0008913        200 nM               24 D0002
#> sample3 EFO:0008913          0.1%                6  <NA>

artifact <- ln$Artifact$from_df(df, key = "my_datasets/rnaseq1.parquet")$save()
artifact$describe()
#> Artifact .parquet/DataFrame
#> └── General
#>     ├── .uid = 'Q3dgy9yok1Uuclsb0000'
#>     ├── .key = 'my_datasets/rnaseq1.parquet'
#>     ├── .size = 8530
#>     ├── .hash = 'hDTNFwEPy7pY1EKnOHog2A'
#>     ├── .n_observations = 3
#>     ├── .path = 
#>     │   /home/runner/work/laminr/laminr/vignettes/articles/laminr-intro/.lamindb
#>     │   /Q3dgy9yok1Uuclsb0000.parquet
#>     ├── .created_by = testuser1 (Test User1)
#>     ├── .created_at = 2025-04-07 11:34:14
#>     └── .transform = 'introduction.Rmd'

artifact$cache()
#> [1] "/home/runner/work/laminr/laminr/vignettes/articles/laminr-intro/.lamindb/Q3dgy9yok1Uuclsb0000.parquet"

dataset <- artifact$open()
as.data.frame(dataset)
#> # A tibble: 3 × 12
#>   ENSG00000153563 ENSG00000010610 ENSG00000170458 perturbation sample_note
#>             <dbl>           <dbl>           <dbl> <fct>        <chr>      
#> 1               1               3               5 DMSO         was ok     
#> 2               2               4               6 IFNJ         looks naah 
#> 3               3               5               7 DMSO         pretty! 🤩 
#> # ℹ 7 more variables: cell_type_by_expert <fct>, cell_type_by_model <fct>,
#> #   assay_oid <fct>, concentration <chr>, treatment_time_h <dbl>, donor <chr>,
#> #   `__index_level_0__` <chr>

artifact$load()
#>         ENSG00000153563 ENSG00000010610 ENSG00000170458 perturbation
#> sample1               1               3               5         DMSO
#> sample2               2               4               6         IFNJ
#> sample3               3               5               7         DMSO
#>         sample_note             cell_type_by_expert cell_type_by_model
#> sample1      was ok                          B cell             B cell
#> sample2  looks naah CD8-positive, alpha-beta T cell             T cell
#> sample3  pretty! 🤩 CD8-positive, alpha-beta T cell             T cell
#>           assay_oid concentration treatment_time_h donor
#> sample1 EFO:0008913          0.1%               24 D0001
#> sample2 EFO:0008913        200 nM               24 D0002
#> sample3 EFO:0008913          0.1%                6  <NA>

artifact$view_lineage()
#> ✖ `view_lineage()` is not yet implemented. Please view the lineage in the web interface.

df_typo <- df
levels(df$perturbation) <- c("DMSO", "IFNG")
df["sample2", "perturbation"] <- "IFNG"
artifact <- ln$Artifact$from_df(df, key = "my_datasets/rnaseq1.parquet")$save()
#> → creating new artifact version for key='my_datasets/rnaseq1.parquet' (storage: '/home/runner/work/laminr/laminr/vignettes/articles/laminr-intro')
artifact$versions$df()
#>                    uid                         key description   suffix    kind
#> 1 Q3dgy9yok1Uuclsb0000 my_datasets/rnaseq1.parquet        <NA> .parquet dataset
#> 2 Q3dgy9yok1Uuclsb0001 my_datasets/rnaseq1.parquet        <NA> .parquet dataset
#>       otype size                   hash n_files n_observations _hash_type
#> 1 DataFrame 8530 hDTNFwEPy7pY1EKnOHog2A    <NA>              3        md5
#> 2 DataFrame 8530 RVElCjRCfyOAxbLNDEAMOg    <NA>              3        md5
#>   _key_is_virtual _overwrite_versions space_id storage_id schema_id version
#> 1            TRUE               FALSE        1          1      <NA>    <NA>
#> 2            TRUE               FALSE        1          1      <NA>    <NA>
#>   is_latest run_id          created_at created_by_id _aux _branch_code
#> 1     FALSE      1 2025-04-07 11:34:14             1 <NA>            1
#> 2      TRUE      1 2025-04-07 11:34:15             1 <NA>            1
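The typo fix in the chunk above relies on base R factor handling rather than on anything specific to {laminr}. The following is a minimal sketch of that step on a toy data frame, assuming `perturbation` is a factor whose levels are `"DMSO"` and `"IFNJ"` in that order (the toy data is illustrative, not the real dataset):

```r
# Toy stand-in for the example dataset; only the column of interest is kept.
df <- data.frame(
  perturbation = factor(c("DMSO", "IFNJ", "DMSO")),
  row.names = c("sample1", "sample2", "sample3")
)

# Keep a copy that still contains the typo, as the walkthrough does.
df_typo <- df

# Renaming the second level rewrites every "IFNJ" entry to "IFNG" at once.
levels(df$perturbation) <- c("DMSO", "IFNG")

as.character(df$perturbation)
as.character(df_typo$perturbation)
```

Under this assumption, the level rename alone already corrects the value for `sample2`, and `df_typo` keeps the original typo because R copies the data frame on modification.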

Labels

See https://docs.lamin.ai/guide#labels.

bt <- import_module("bionty")

experiment_type <- ln$ULabel(name = "Experiment", is_type = TRUE)$save()
candidate_marker_experiment <- ln$ULabel(
  name = "Candidate marker experiment", type = experiment_type
)$save()

artifact$ulabels$add(candidate_marker_experiment)

cell_type <- bt$CellType$from_source(name = "effector T cell")$save()
artifact$cell_types$add(cell_type)

artifact$describe()
#> Artifact .parquet/DataFrame
#> ├── General
#> │   ├── .uid = 'Q3dgy9yok1Uuclsb0001'
#> │   ├── .key = 'my_datasets/rnaseq1.parquet'
#> │   ├── .size = 8530
#> │   ├── .hash = 'RVElCjRCfyOAxbLNDEAMOg'
#> │   ├── .n_observations = 3
#> │   ├── .path = 
#> │   │   /home/runner/work/laminr/laminr/vignettes/articles/laminr-intro/.lamindb
#> │   │   /Q3dgy9yok1Uuclsb0001.parquet
#> │   ├── .created_by = testuser1 (Test User1)
#> │   ├── .created_at = 2025-04-07 11:34:15
#> │   └── .transform = 'introduction.Rmd'
#> └── Labels
#>     └── .cell_types         bionty.CellType    effector T cell                  
#>         .ulabels            ULabel             Candidate marker experiment

Registries

See https://docs.lamin.ai/guide#registries.

ln$ULabel$df()
#>        uid                        name is_type description reference
#> 2 Su0Niv0T Candidate marker experiment   FALSE        <NA>      <NA>
#> 1 x8GIqLSm                  Experiment    TRUE        <NA>      <NA>
#>   reference_type space_id type_id run_id          created_at created_by_id _aux
#> 2           <NA>        1       1      1 2025-04-07 11:34:15             1 <NA>
#> 1           <NA>        1     NaN      1 2025-04-07 11:34:15             1 <NA>
#>   _branch_code
#> 2            1
#> 1            1

ln$Artifact
#> Artifact
#>   Simple fields
#>     .uid: CharField
#>     .key: CharField
#>     .description: CharField
#>     .suffix: CharField
#>     .kind: CharField
#>     .otype: CharField
#>     .size: BigIntegerField
#>     .hash: CharField
#>     .n_files: BigIntegerField
#>     .n_observations: BigIntegerField
#>     .version: CharField
#>     .is_latest: BooleanField
#>     .created_at: DateTimeField
#>     .updated_at: DateTimeField
#>   Relational fields
#>     .space: Space
#>     .storage: Storage
#>     .run: Run
#>     .schema: Schema
#>     .created_by: User
#>     .ulabels: ULabel
#>     .input_of_runs: Run
#>     .feature_sets: Schema
#>     .collections: Collection
#>     .references: Reference
#>     .projects: Project
#>   Bionty fields
#>     .organisms: bionty.Organism
#>     .genes: bionty.Gene
#>     .proteins: bionty.Protein
#>     .cell_markers: bionty.CellMarker
#>     .tissues: bionty.Tissue
#>     .cell_types: bionty.CellType
#>     .diseases: bionty.Disease
#>     .cell_lines: bionty.CellLine
#>     .phenotypes: bionty.Phenotype
#>     .pathways: bionty.Pathway
#>     .experimental_factors: bionty.ExperimentalFactor
#>     .developmental_stages: bionty.DevelopmentalStage
#>     .ethnicities: bionty.Ethnicity
#>  signature: (*args, **kwargs)

See https://docs.lamin.ai/guide#query-search.

transform <- ln$Transform$get(key = "introduction.Rmd")

ln$Artifact$filter(key__startswith = "my_datasets/")$df()
#>                    uid                         key description   suffix    kind
#> 1 Q3dgy9yok1Uuclsb0000 my_datasets/rnaseq1.parquet        <NA> .parquet dataset
#> 2 Q3dgy9yok1Uuclsb0001 my_datasets/rnaseq1.parquet        <NA> .parquet dataset
#>       otype size                   hash n_files n_observations _hash_type
#> 1 DataFrame 8530 hDTNFwEPy7pY1EKnOHog2A    <NA>              3        md5
#> 2 DataFrame 8530 RVElCjRCfyOAxbLNDEAMOg    <NA>              3        md5
#>   _key_is_virtual _overwrite_versions space_id storage_id schema_id version
#> 1            TRUE               FALSE        1          1      <NA>    <NA>
#> 2            TRUE               FALSE        1          1      <NA>    <NA>
#>   is_latest run_id          created_at created_by_id _aux _branch_code
#> 1     FALSE      1 2025-04-07 11:34:14             1 <NA>            1
#> 2      TRUE      1 2025-04-07 11:34:15             1 <NA>            1

artifacts <- ln$Artifact$filter(transform = transform)$all()

artifacts <- ln$Artifact$filter(
  transform__description__icontains = "intro", ulabels = candidate_marker_experiment
)$all()
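The filter arguments above use double-underscore field lookups such as `key__startswith` and `transform__description__icontains`. As a rough analogy only, the sketch below expresses the same two conditions in base R on a hypothetical data frame; the records and the column name `transform_description` are made up for illustration, and the real queries run against the LaminDB registries:

```r
# Hypothetical records standing in for rows of the Artifact registry.
records <- data.frame(
  key = c("my_datasets/rnaseq1.parquet", "other/notes.txt", "docs/readme.md"),
  transform_description = c("introduction.Rmd", "Cleanup script", "INTRO draft"),
  stringsAsFactors = FALSE
)

# key__startswith = "my_datasets/": prefix match on the key field.
by_prefix <- records[startsWith(records$key, "my_datasets/"), ]

# transform__description__icontains = "intro": case-insensitive substring
# match on the description of the related transform.
by_description <- records[
  grepl("intro", records$transform_description, ignore.case = TRUE),
]

by_prefix$key
by_description$transform_description
```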

ln$Transform$search("intro")$df()
#>                uid              key      description     type source_code hash
#> 1 fFqSLCzfPR3y0000 introduction.Rmd introduction.Rmd notebook        <NA> <NA>
#>   reference reference_type space_id _template_id version is_latest
#> 1      <NA>           <NA>        1         <NA>    <NA>      TRUE
#>            created_at created_by_id _aux _branch_code
#> 1 2025-04-07 11:34:13             1 <NA>            1
ulabels <- ln$ULabel$lookup()
cell_types <- bt$CellType$lookup()

Features

See https://docs.lamin.ai/guide#features.

ln$Feature(name = "temperature", dtype = "float")$save()
#> Feature(uid='9JLyu3L1y7as', name='temperature', dtype='float', array_rank=0, array_size=0, space_id=1, created_by_id=1, run_id=1, created_at=2025-04-07 11:34:17 UTC)

ln$Feature(name = "experiment", dtype = ln$ULabel)$save()
#> Feature(uid='2VSaKqnaKQPq', name='experiment', dtype='cat[ULabel]', array_rank=0, array_size=0, space_id=1, created_by_id=1, run_id=1, created_at=2025-04-07 11:34:17 UTC)

artifact$features$add_values(
  list("temperature" = 21.6, "experiment" = "Candidate marker experiment")
)

artifact$describe()
#> Artifact .parquet/DataFrame
#> ├── General
#> │   ├── .uid = 'Q3dgy9yok1Uuclsb0001'
#> │   ├── .key = 'my_datasets/rnaseq1.parquet'
#> │   ├── .size = 8530
#> │   ├── .hash = 'RVElCjRCfyOAxbLNDEAMOg'
#> │   ├── .n_observations = 3
#> │   ├── .path = 
#> │   │   /home/runner/work/laminr/laminr/vignettes/articles/laminr-intro/.lamindb
#> │   │   /Q3dgy9yok1Uuclsb0001.parquet
#> │   ├── .created_by = testuser1 (Test User1)
#> │   ├── .created_at = 2025-04-07 11:34:15
#> │   └── .transform = 'introduction.Rmd'
#> ├── Linked features
#> │   └── experiment          cat[ULabel]        Candidate marker experiment      
#> │       temperature         float              21.6                             
#> └── Labels
#>     └── .cell_types         bionty.CellType    effector T cell                  
#>         .ulabels            ULabel             Candidate marker experiment

ln$Artifact$features$filter(experiment__contains = "marker experiment")$df()
#>                    uid                         key description   suffix    kind
#> 2 Q3dgy9yok1Uuclsb0001 my_datasets/rnaseq1.parquet        <NA> .parquet dataset
#>       otype size                   hash n_files n_observations _hash_type
#> 2 DataFrame 8530 RVElCjRCfyOAxbLNDEAMOg    <NA>              3        md5
#>   _key_is_virtual _overwrite_versions space_id storage_id schema_id version
#> 2            TRUE               FALSE        1          1      <NA>    <NA>
#>   is_latest run_id          created_at created_by_id _aux _branch_code
#> 2      TRUE      1 2025-04-07 11:34:15             1 <NA>            1

Key use cases

This section reproduces the key use cases from the LaminDB Introduction guide.

See https://docs.lamin.ai/guide#key-use-cases.

Understand data lineage

See https://docs.lamin.ai/guide#understand-data-lineage.

artifact$view_lineage()
#> ✖ `view_lineage()` is not yet implemented. Please view the lineage in the web interface.
transform$view_lineage()
#> ✖ `view_lineage()` is not yet implemented. Please view the lineage in the web interface.
# Example only, not run
ln <- import_module("lamindb")
ln$track()
ln$finish()

# lamin load https://lamin.ai/laminlabs/lamindata/transform/13VINnFk89PE0004

Curate datasets

See https://docs.lamin.ai/introduction#curate-datasets.

perturbation_type <- ln$ULabel(name = "Perturbation", is_type = TRUE)$save()
ln$ULabel(name = "DMSO", type = perturbation_type)$save()
#> ULabel(uid='6HEbX640', name='DMSO', is_type=False, space_id=1, created_by_id=1, run_id=1, type_id=3, created_at=2025-04-07 11:34:17 UTC)
ln$ULabel(name = "IFNG", type = perturbation_type)$save()
#> ULabel(uid='al6kvAtl', name='IFNG', is_type=False, space_id=1, created_by_id=1, run_id=1, type_id=3, created_at=2025-04-07 11:34:17 UTC)

# Load Python built ins to get access to dtypes
py_builtins <- reticulate::import_builtins()

schema <- ln$Schema(
  name = "My DataFrame schema",
  features = list(
    # NOTE: These have dtype=int in the original guide
    ln$Feature(name = "ENSG00000153563", dtype = py_builtins$float)$save(),
    ln$Feature(name = "ENSG00000010610", dtype = py_builtins$float)$save(),
    ln$Feature(name = "ENSG00000170458", dtype = py_builtins$float)$save(),
    ln$Feature(name = "perturbation", dtype = ln$ULabel)$save()
  )
)$save()

curator <- ln$curators$DataFrameCurator(df, schema)
artifact <- curator$save_artifact(key = "my_curated_dataset.parquet")
#> ✓ "perturbation" is validated against ULabel.name
#> → returning existing artifact with same hash: Artifact(uid='Q3dgy9yok1Uuclsb0001', is_latest=True, key='my_datasets/rnaseq1.parquet', suffix='.parquet', kind='dataset', otype='DataFrame', size=8530, hash='RVElCjRCfyOAxbLNDEAMOg', n_observations=3, space_id=1, storage_id=1, run_id=1, created_by_id=1, created_at=2025-04-07 11:34:15 UTC); to track this artifact as an input, use: ln.Artifact.get()
#> ! key my_datasets/rnaseq1.parquet on existing artifact differs from passed key my_curated_dataset.parquet
#> ✓ 4 unique terms (36.40%) are validated for name
#> ! 7 unique terms (63.60%) are not validated for name: 'sample_note', 'cell_type_by_expert', 'cell_type_by_model', 'assay_oid', 'concentration', 'treatment_time_h', 'donor'
#> ✓ loaded 4 Feature records matching name: 'ENSG00000153563', 'ENSG00000010610', 'ENSG00000170458', 'perturbation'
#> ! did not create Feature records for 7 non-validated names: 'assay_oid', 'cell_type_by_expert', 'cell_type_by_model', 'concentration', 'donor', 'sample_note', 'treatment_time_h'
#> → returning existing schema with same hash: Schema(uid='anyCXdQt5yZfzYOjzdHH', name='My DataFrame schema', n=4, itype='Feature', is_type=False, hash='Wj32No5I4aB6ETal2smSxw', minimal_set=True, ordered_set=False, maximal_set=False, space_id=1, created_by_id=1, run_id=1, created_at=2025-04-07 11:34:18 UTC)
#> ! updated otype from None to DataFrame
artifact$describe()
#> Artifact .parquet/DataFrame
#> ├── General
#> │   ├── .uid = 'Q3dgy9yok1Uuclsb0001'
#> │   ├── .key = 'my_datasets/rnaseq1.parquet'
#> │   ├── .size = 8530
#> │   ├── .hash = 'RVElCjRCfyOAxbLNDEAMOg'
#> │   ├── .n_observations = 3
#> │   ├── .path = 
#> │   │   /home/runner/work/laminr/laminr/vignettes/articles/laminr-intro/.lamindb
#> │   │   /Q3dgy9yok1Uuclsb0001.parquet
#> │   ├── .created_by = testuser1 (Test User1)
#> │   ├── .created_at = 2025-04-07 11:34:15
#> │   └── .transform = 'introduction.Rmd'
#> ├── Dataset features/.feature_sets
#> │   └── columns • 4         [Feature]                                           
#> │       perturbation        cat[ULabel]        DMSO, IFNG                       
#> │       ENSG00000153563     float                                               
#> │       ENSG00000010610     float                                               
#> │       ENSG00000170458     float                                               
#> ├── Linked features
#> │   └── experiment          cat[ULabel]        Candidate marker experiment      
#> │       temperature         float              21.6                             
#> └── Labels
#>     └── .cell_types         bionty.CellType    effector T cell                  
#>         .ulabels            ULabel             Candidate marker experiment, DMS…
ln$Artifact$get(ulabels__name = "IFNG")
#> Artifact(uid='Q3dgy9yok1Uuclsb0001', is_latest=True, key='my_datasets/rnaseq1.parquet', suffix='.parquet', kind='dataset', otype='DataFrame', size=8530, hash='RVElCjRCfyOAxbLNDEAMOg', n_observations=3, space_id=1, storage_id=1, run_id=1, schema_id=1, created_by_id=1, created_at=2025-04-07 11:34:15 UTC)

curator <- ln$curators$DataFrameCurator(df_typo, schema)
tryCatch(
  curator$validate(),
  error = function(err) {
    cat(conditionMessage(err))
  }
)
#> • mapping "perturbation" on ULabel.name
#> !   1 term is not validated: 'IFNJ'
#>     → fix typos, remove non-existent values, or save terms via .add_new_from("perturbation")
#> lamindb.errors.ValidationError: 1 term is not validated: 'IFNJ'
#>     → fix typos, remove non-existent values, or save terms via .add_new_from("perturbation")
#> Run `reticulate::py_last_error()` for details.
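The chunk above wraps `curator$validate()` in `tryCatch()` so that the validation error is printed instead of stopping the vignette. The sketch below shows the same error-handling pattern in base R only; `validate_terms()` is a hypothetical helper written for this illustration and is not how LaminDB validates terms:

```r
# Hypothetical validator: raise an error listing values outside `allowed`.
validate_terms <- function(values, allowed) {
  bad <- setdiff(unique(as.character(values)), allowed)
  if (length(bad) > 0) {
    stop(
      sprintf(
        "%d term(s) not validated: %s",
        length(bad),
        paste0("'", bad, "'", collapse = ", ")
      ),
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# Catch the error and keep only its message, as the vignette does.
msg <- tryCatch(
  {
    validate_terms(c("DMSO", "IFNJ", "DMSO"), allowed = c("DMSO", "IFNG"))
    "all terms validated"
  },
  error = function(err) conditionMessage(err)
)
msg
```

Returning `conditionMessage(err)` from the handler keeps the script running while still surfacing which term failed.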

Manage biological registries

See https://docs.lamin.ai/introduction#manage-biological-registries.

cell_types <- bt$CellType$public()
cell_types
#> PublicOntology
#> Entity: CellType
#> Organism: all
#> Source: cl, 2024-08-16
#> #terms: 2959
cell_types$search("gamma-delta T cell") |> head(2)
#>                                  name
#> CL:0000798         gamma-delta T cell
#> CL:4033072 cycling gamma-delta T cell
#>                                                                definition
#> CL:0000798 A T Cell That Expresses A Gamma-Delta T Cell Receptor Complex.
#> CL:4033072                       A(N) Gamma-Delta T Cell That Is Cycling.
#>                                                                                          synonyms
#> CL:0000798 gamma-delta T-cell|gamma-delta T lymphocyte|gammadelta T cell|gamma-delta T-lymphocyte
#> CL:4033072                                                       proliferating gamma-delta T cell
#>                           parents
#> CL:0000798             CL:0000084
#> CL:4033072 CL:4033069, CL:0000798

var_schema <- ln$Schema(
  name = "my_var_schema",
  itype = bt$Gene$ensembl_gene_id,
  dtype = py_builtins$float
)$save()
obs_schema <- ln$Schema(
  name = "my_obs_schema",
  features = list(
    ln$Feature(name = "perturbation", dtype = ln$ULabel)$save()
  )
)$save()
#> β†’ returning existing Feature record with same name: 'perturbation'
anndata_schema <- ln$Schema(
  name = "my_anndata_schema",
  otype = "AnnData",
  components = list("obs" = obs_schema, "var" = var_schema)
)$save()

library(anndata)
adata <- AnnData(
  df[c("ENSG00000153563", "ENSG00000010610", "ENSG00000170458")],
  obs = df[, "perturbation", drop = FALSE]
)
curator <- ln$curators$AnnDataCurator(adata, anndata_schema)
#> ✓ created 1 Organism record from Bionty matching name: 'human'
#> • saving validated records of 'columns'
#> ✓ added 3 records from public with Gene.ensembl_gene_id for "columns": 'ENSG00000010610', 'ENSG00000153563', 'ENSG00000170458'
artifact <- curator$save_artifact(description = "my RNA-seq")
#> ✓ "perturbation" is validated against ULabel.name
#> • path content will be copied to default storage upon `save()` with key `None` ('.lamindb/3HXDBdg8MxRsKTlg0000.h5ad')
#> ✓ storing artifact '3HXDBdg8MxRsKTlg0000' at '/home/runner/work/laminr/laminr/vignettes/articles/laminr-intro/.lamindb/3HXDBdg8MxRsKTlg0000.h5ad'
#> ✓ 3 unique terms (100.00%) are validated for ensembl_gene_id
#> ✓ 1 unique term (100.00%) is validated for name
#> → returning existing schema with same hash: Schema(uid='ev9BnzwcaG8uoUdz5uPz', name='my_obs_schema', n=1, itype='Feature', is_type=False, hash='ELzKeWx4rgC5FKaVtvH-9Q', minimal_set=True, ordered_set=False, maximal_set=False, space_id=1, created_by_id=1, run_id=1, created_at=2025-04-07 11:34:19 UTC)
#> ! updated otype from None to DataFrame
#> ✓ saved 1 feature set for slot: 'var'
artifact$describe()
#> Artifact .h5ad/AnnData
#> ├── General
#> │   ├── .uid = '3HXDBdg8MxRsKTlg0000'
#> │   ├── .size = 19240
#> │   ├── .hash = 'gO44MDqttaaKNyBLVM-zzA'
#> │   ├── .n_observations = 3
#> │   ├── .path = 
#> │   │   /home/runner/work/laminr/laminr/vignettes/articles/laminr-intro/.lamindb
#> │   │   /3HXDBdg8MxRsKTlg0000.h5ad
#> │   ├── .created_by = testuser1 (Test User1)
#> │   ├── .created_at = 2025-04-07 11:34:21
#> │   └── .transform = 'introduction.Rmd'
#> ├── Dataset features/.feature_sets
#> │   ├── var • 3             [bionty.Gene]                                       
#> │   │   CD4                 float                                               
#> │   │   CD8A                float                                               
#> │   │   CD14                float                                               
#> │   └── obs • 1             [Feature]                                           
#> │       perturbation        cat[ULabel]        DMSO, IFNG                       
#> └── Labels
#>     └── .ulabels            ULabel             DMSO, IFNG

genes <- bt$Gene$filter(organism__name = "human")$lookup()
feature_sets <- ln$FeatureSet$filter(genes = genes$cd8a)$all()
ln$Artifact$filter(feature_sets__in = feature_sets)$df()
#>                    uid  key description suffix    kind   otype  size
#> 3 3HXDBdg8MxRsKTlg0000 <NA>  my RNA-seq  .h5ad dataset AnnData 19240
#>                     hash n_files n_observations _hash_type _key_is_virtual
#> 3 gO44MDqttaaKNyBLVM-zzA    <NA>              3        md5            TRUE
#>   _overwrite_versions space_id storage_id schema_id version is_latest run_id
#> 3               FALSE        1          1         4    <NA>      TRUE      1
#>            created_at created_by_id _aux _branch_code
#> 3 2025-04-07 11:34:21             1 <NA>            1

neuron <- bt$CellType$from_source(name = "neuron")$save()
#> ✓ created 1 CellType record from Bionty matching name: 'neuron'
#> ✓ created 3 CellType records from Bionty matching ontology_id: 'CL:0002319', 'CL:0000404', 'CL:0000393'
new_cell_state <- bt$CellType(
  name = "my neuron cell state", description = "explains X"
)$save()
new_cell_state$parents$add(neuron)
new_cell_state$view_parents(distance = 2)

Scale learning

See https://docs.lamin.ai/introduction#scale-learning.

df2 <- ln$core$datasets$small_dataset2(otype = "DataFrame")
adata <- AnnData(
  df2[c("ENSG00000153563", "ENSG00000010610", "ENSG00000004468")],
  obs = df2[, "perturbation", drop = FALSE]
)
curator <- ln$curators$AnnDataCurator(adata, anndata_schema)
#> • saving validated records of 'columns'
#> ✓ added 1 record from public with Gene.ensembl_gene_id for "columns": 'ENSG00000004468'
artifact2 <- curator$save_artifact(key = "my_datasets/my_rnaseq2.h5ad")
#> ✓ "perturbation" is validated against ULabel.name
#> • path content will be copied to default storage upon `save()` with key 'my_datasets/my_rnaseq2.h5ad'
#> ✓ storing artifact 'Dhb8smPQGadISDIP0000' at '/home/runner/work/laminr/laminr/vignettes/articles/laminr-intro/.lamindb/Dhb8smPQGadISDIP0000.h5ad'
#> ✓ 3 unique terms (100.00%) are validated for ensembl_gene_id
#> ✓ 1 unique term (100.00%) is validated for name
#> → returning existing schema with same hash: Schema(uid='ev9BnzwcaG8uoUdz5uPz', name='my_obs_schema', n=1, itype='Feature', is_type=False, hash='ELzKeWx4rgC5FKaVtvH-9Q', minimal_set=True, ordered_set=False, maximal_set=False, space_id=1, created_by_id=1, run_id=1, created_at=2025-04-07 11:34:19 UTC)
#> ! updated otype from None to DataFrame
#> ✓ saved 1 feature set for slot: 'var'

collection <- ln$Collection(
  list(artifact, artifact2),
  key = "my-RNA-seq-collection"
)$save()
collection$describe()
#> Collection 
#> └── General
#>     ├── .uid = 'P1g9vekkwk22Kk4V0000'
#>     ├── .key = 'my-RNA-seq-collection'
#>     ├── .hash = 'DbgO9hDdS-KOydwDZDLp3g'
#>     ├── .created_by = testuser1 (Test User1)
#>     ├── .created_at = 2025-04-07 11:34:25
#>     └── .transform = 'introduction.Rmd'
collection$view_lineage()
#> ✖ `view_lineage()` is not yet implemented. Please view the lineage in the web interface.

collection$load()
#> AnnData object with n_obs × n_vars = 6 × 4
#>     obs: 'perturbation', 'artifact_uid'
collection$artifacts$all()
#> <QuerySet [Artifact(uid='3HXDBdg8MxRsKTlg0000', is_latest=True, description='my RNA-seq', suffix='.h5ad', kind='dataset', otype='AnnData', size=19240, hash='gO44MDqttaaKNyBLVM-zzA', n_observations=3, space_id=1, storage_id=1, run_id=1, schema_id=4, created_by_id=1, created_at=2025-04-07 11:34:21 UTC), Artifact(uid='Dhb8smPQGadISDIP0000', is_latest=True, key='my_datasets/my_rnaseq2.h5ad', suffix='.h5ad', kind='dataset', otype='AnnData', size=19240, hash='Ti7bcnIOlk0fIt2TFW_VAw', n_observations=3, space_id=1, storage_id=1, run_id=1, schema_id=4, created_by_id=1, created_at=2025-04-07 11:34:24 UTC)]>
collection$artifacts$df()
#>                    uid                         key description suffix    kind
#> 3 3HXDBdg8MxRsKTlg0000                        <NA>  my RNA-seq  .h5ad dataset
#> 4 Dhb8smPQGadISDIP0000 my_datasets/my_rnaseq2.h5ad        <NA>  .h5ad dataset
#>     otype  size                   hash n_files n_observations _hash_type
#> 3 AnnData 19240 gO44MDqttaaKNyBLVM-zzA    <NA>              3        md5
#> 4 AnnData 19240 Ti7bcnIOlk0fIt2TFW_VAw    <NA>              3        md5
#>   _key_is_virtual _overwrite_versions space_id storage_id schema_id version
#> 3            TRUE               FALSE        1          1         4    <NA>
#> 4            TRUE               FALSE        1          1         4    <NA>
#>   is_latest run_id          created_at created_by_id _aux _branch_code
#> 3      TRUE      1 2025-04-07 11:34:21             1 <NA>            1
#> 4      TRUE      1 2025-04-07 11:34:24             1 <NA>            1
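The `6 × 4` shape reported by `collection$load()` can be checked by hand: each artifact holds 3 observations, and the two gene sets share two of their three genes. The sketch below does only this arithmetic in base R. It assumes that the combined object stacks observations and takes the union of variables, which matches the printed shape but is not confirmed by the output itself:

```r
# Gene IDs used for the two AnnData objects in this vignette.
vars1 <- c("ENSG00000153563", "ENSG00000010610", "ENSG00000170458")
vars2 <- c("ENSG00000153563", "ENSG00000010610", "ENSG00000004468")

# Observation counts as reported in the n_observations column.
n_obs1 <- 3
n_obs2 <- 3

# Assumption: observations are stacked and variables are unioned.
all_vars <- union(vars1, vars2)
dims <- c(n_obs = n_obs1 + n_obs2, n_vars = length(all_vars))
dims
```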

Other examples

Slice a TileDB-SOMA array store

When artifacts contain TileDB-SOMA array stores they can be opened and sliced using the {tiledbsoma} package.

# Set some environment variables to avoid an issue with {tiledbsoma}
# https://github.com/chanzuckerberg/cellxgene-census/issues/1261
Sys.setenv(TILEDB_VFS_S3_REGION = "us-west-2")
Sys.setenv(AWS_DEFAULT_REGION = "us-west-2")
Sys.setenv(TILEDB_VFS_S3_NO_SIGN_REQUEST = "true")

# Define a filter to select specific cells
value_filter <- paste(
  "tissue == 'brain' &&",
  "cell_type %in% c('microglial cell', 'neuron') &&",
  "suspension_type == 'cell' &&",
  "assay == '10x 3\\' v3'"
)

# Get the artifact containing the CELLxGENE Census TileDB-SOMA store
census_artifact <- ln$Artifact$using("laminlabs/cellxgene")$get("FYMewVq5twKMDXVy0001")
# Open the SOMACollection
soma_collection <- census_artifact$open()
#> → completing transfer to track Artifact('FYMewVq5') as input
#> → mapped records: 
#> → transferred records: Artifact(uid='FYMewVq5twKMDXVy0001'), Storage(uid='oIYGbD74')
#> • adding artifact ids [5] as inputs for run 1, adding parent transform 2
# Slice the store to get a SOMADataFrame containing metadata for the cells of interest
cell_metadata <- soma_collection$get("census_data")$get("homo_sapiens")$obs$read(value_filter = value_filter)
# Concatenate the results to an arrow::Table
cell_metadata <- cell_metadata$concat()
# Convert to a data.frame
cell_metadata <- cell_metadata$to_data_frame()

cell_metadata
#> # A tibble: 66,418 × 28
#>    soma_joinid dataset_id                 assay assay_ontology_term_id cell_type
#>          <int> <fct>                      <fct> <fct>                  <fct>    
#>  1    48182177 c888b684-6c51-431f-972a-6… 10x … EFO:0009922            microgli…
#>  2    48182178 c888b684-6c51-431f-972a-6… 10x … EFO:0009922            microgli…
#>  3    48182185 c888b684-6c51-431f-972a-6… 10x … EFO:0009922            microgli…
#>  4    48182187 c888b684-6c51-431f-972a-6… 10x … EFO:0009922            microgli…
#>  5    48182188 c888b684-6c51-431f-972a-6… 10x … EFO:0009922            microgli…
#>  6    48182189 c888b684-6c51-431f-972a-6… 10x … EFO:0009922            microgli…
#>  7    48182190 c888b684-6c51-431f-972a-6… 10x … EFO:0009922            microgli…
#>  8    48182191 c888b684-6c51-431f-972a-6… 10x … EFO:0009922            microgli…
#>  9    48182192 c888b684-6c51-431f-972a-6… 10x … EFO:0009922            microgli…
#> 10    48182194 c888b684-6c51-431f-972a-6… 10x … EFO:0009922            microgli…
#> # ℹ 66,408 more rows
#> # ℹ 23 more variables: cell_type_ontology_term_id <fct>,
#> #   development_stage <fct>, development_stage_ontology_term_id <fct>,
#> #   disease <fct>, disease_ontology_term_id <fct>, donor_id <fct>,
#> #   is_primary_data <lgl>, observation_joinid <chr>,
#> #   self_reported_ethnicity <fct>,
#> #   self_reported_ethnicity_ontology_term_id <fct>, sex <fct>, …
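The `value_filter` used above is an ordinary character string assembled with `paste()`, so it can be inspected before it is passed to `read()`. The sketch below rebuilds the same string in base R and prints it; how the filter expression is parsed downstream is outside this sketch:

```r
# Same filter as above: paste() joins the pieces with single spaces.
value_filter <- paste(
  "tissue == 'brain' &&",
  "cell_type %in% c('microglial cell', 'neuron') &&",
  "suspension_type == 'cell' &&",
  "assay == '10x 3\\' v3'"
)

# writeLines() shows the string as the filter parser receives it:
# "\\'" in R source becomes a backslash followed by an apostrophe.
writeLines(value_filter)

# Count the clauses joined by "&&".
n_clauses <- length(strsplit(value_filter, " && ", fixed = TRUE)[[1]])
n_clauses
```

Printing the string this way makes the escaped apostrophe in the assay name `10x 3' v3` visible, which is easy to get wrong when quoting by hand.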

Finish tracking

Mark the analysis run as finished to create a time stamp and upload source code to the hub.

ln$finish()
#> ℹ Creating lockfile /home/runner/.cache/lamindb/environments/run_gFnVLAHLmNQ3h9…
#> 
#> ✔ Updated metadata database: 2.36 MB in 4 files.
#> 
#> ℹ Creating lockfile /home/runner/.cache/lamindb/environments/run_gFnVLAHLmNQ3h9…ℹ source packages are missing from RSPM: Could not resolve host: RSPM
#> ℹ Creating lockfile /home/runner/.cache/lamindb/environments/run_gFnVLAHLmNQ3h9…ℹ Updating metadata database
#> ✔ Updating metadata database ... done
#> 
#> ℹ Creating lockfile /home/runner/.cache/lamindb/environments/run_gFnVLAHLmNQ3h9…✔ Created lockfile /home/runner/.cache/lamindb/environments/run_gFnVLAHLmNQ3h9N…
#> ! no html report found; to attach one, create an .html export for your .Rmd file and then run: lamin save introduction.Rmd
#> → finished Run('gFnVLAHL') after 47s at 2025-04-07 11:35:00 UTC

Save a notebook report (not needed for .R scripts)

Save a run report of your notebook (.Rmd or .qmd file) to your instance:

  1. Render the notebook to HTML:
  • In RStudio, click the “Knit” button

  • OR from the command line, run:

    Rscript -e 'rmarkdown::render("introduction.Rmd")'
  • OR use the rmarkdown package in R:

    rmarkdown::render("introduction.Rmd")
  2. Save it to your LaminDB instance:
  • Using {laminr} in R:

    lamin_save("introduction.Rmd")
  • OR using the lamin CLI:

    lamin save introduction.Rmd

Design

See https://docs.lamin.ai/introduction#design for more information on the design of LaminDB.