This vignette introduces the {laminr} workflow.
To learn more about LaminDB, see docs.lamin.ai.
Setup
Install {laminr} from CRAN:
install.packages("laminr", dependencies = TRUE)
Install the underlying Python packages LaminDB and Bionty:
laminr::install_lamindb(extra_packages = c("bionty"))
Set the default LaminDB instance:
laminr::lamin_connect("<owner>/<name>")
This instance acts as the default instance for everything that follows. Any data and tracking information will be added to it.
If you don't have access to an instance, create a local test instance:
lamin_init(storage = "./laminr-intro", modules = c("bionty"))
Import
To start working with {laminr}, import the lamindb module:
ln <- import_module("lamindb")
#> → connected lamindb: testuser1/laminr-intro
This is equivalent to import lamindb as ln in Python.
Walkthrough
This section of the vignette reproduces the walkthrough from the LaminDB Introduction guide. Only the equivalent {laminr} code is included here; for the accompanying text, follow the link in each subsection.
See https://docs.lamin.ai/guide#walkthrough.
Transforms
See https://docs.lamin.ai/guide#transforms.
ln <- import_module("lamindb")
ln$track()
#> → created Transform('fFqSLCzfPR3y0000'), started new Run('gFnVLAHL...') at 2025-04-07 11:34:13 UTC
ln$Transform$df()
#> uid key description type source_code hash
#> 1 fFqSLCzfPR3y0000 introduction.Rmd introduction.Rmd notebook <NA> <NA>
#> reference reference_type space_id _template_id version is_latest
#> 1 <NA> <NA> 1 <NA> <NA> TRUE
#> created_at created_by_id _aux _branch_code
#> 1 2025-04-07 11:34:13 1 <NA> 1
ln$Run$df()
#> uid name started_at finished_at reference
#> 1 gFnVLAHLmNQ3h9N7LNwI <NA> 2025-04-07 11:34:13 <NA> <NA>
#> reference_type _is_consecutive _status_code space_id transform_id report_id
#> 1 <NA> <NA> 0 1 1 <NA>
#> _logfile_id environment_id initiated_by_run_id created_at
#> 1 <NA> <NA> <NA> 2025-04-07 11:34:13
#> created_by_id _aux _branch_code
#> 1 1 <NA> 1
Artifacts
Artifacts are objects that bundle data and associated metadata. An artifact can be any file or folder but is typically a dataset.
See https://docs.lamin.ai/guide#artifacts.
df <- ln$core$datasets$small_dataset1(otype = "DataFrame", with_typo = TRUE)
df
#>         ENSG00000153563 ENSG00000010610 ENSG00000170458 perturbation
#> sample1               1               3               5         DMSO
#> sample2               2               4               6         IFNJ
#> sample3               3               5               7         DMSO
#>         sample_note             cell_type_by_expert cell_type_by_model
#> sample1      was ok                          B cell             B cell
#> sample2  looks naah CD8-positive, alpha-beta T cell             T cell
#> sample3  pretty! 🤩 CD8-positive, alpha-beta T cell             T cell
#>           assay_oid concentration treatment_time_h donor
#> sample1 EFO:0008913          0.1%               24 D0001
#> sample2 EFO:0008913        200 nM               24 D0002
#> sample3 EFO:0008913          0.1%                6  <NA>
artifact <- ln$Artifact$from_df(df, key = "my_datasets/rnaseq1.parquet")$save()
artifact$describe()
#> Artifact .parquet/DataFrame
#> └── General
#>     ├── .uid = 'Q3dgy9yok1Uuclsb0000'
#>     ├── .key = 'my_datasets/rnaseq1.parquet'
#>     ├── .size = 8530
#>     ├── .hash = 'hDTNFwEPy7pY1EKnOHog2A'
#>     ├── .n_observations = 3
#>     ├── .path =
#>     │   /home/runner/work/laminr/laminr/vignettes/articles/laminr-intro/.lamindb
#>     │   /Q3dgy9yok1Uuclsb0000.parquet
#>     ├── .created_by = testuser1 (Test User1)
#>     ├── .created_at = 2025-04-07 11:34:14
#>     └── .transform = 'introduction.Rmd'
artifact$cache()
#> [1] "/home/runner/work/laminr/laminr/vignettes/articles/laminr-intro/.lamindb/Q3dgy9yok1Uuclsb0000.parquet"
dataset <- artifact$open()
as.data.frame(dataset)
#> # A tibble: 3 × 12
#>   ENSG00000153563 ENSG00000010610 ENSG00000170458 perturbation sample_note
#>             <dbl>           <dbl>           <dbl> <fct>        <chr>
#> 1               1               3               5 DMSO         was ok
#> 2               2               4               6 IFNJ         looks naah
#> 3               3               5               7 DMSO         pretty! 🤩
#> # ℹ 7 more variables: cell_type_by_expert <fct>, cell_type_by_model <fct>,
#> #   assay_oid <fct>, concentration <chr>, treatment_time_h <dbl>, donor <chr>,
#> #   `__index_level_0__` <chr>
artifact$load()
#>         ENSG00000153563 ENSG00000010610 ENSG00000170458 perturbation
#> sample1               1               3               5         DMSO
#> sample2               2               4               6         IFNJ
#> sample3               3               5               7         DMSO
#>         sample_note             cell_type_by_expert cell_type_by_model
#> sample1      was ok                          B cell             B cell
#> sample2  looks naah CD8-positive, alpha-beta T cell             T cell
#> sample3  pretty! 🤩 CD8-positive, alpha-beta T cell             T cell
#>           assay_oid concentration treatment_time_h donor
#> sample1 EFO:0008913          0.1%               24 D0001
#> sample2 EFO:0008913        200 nM               24 D0002
#> sample3 EFO:0008913          0.1%                6  <NA>
artifact$view_lineage()
#> ✖ `view_lineage()` is not yet implemented. Please view the lineage in the web interface.
df_typo <- df
levels(df$perturbation) <- c("DMSO", "IFNG")
df["sample2", "perturbation"] <- "IFNG"
artifact <- ln$Artifact$from_df(df, key = "my_datasets/rnaseq1.parquet")$save()
#> → creating new artifact version for key='my_datasets/rnaseq1.parquet' (storage: '/home/runner/work/laminr/laminr/vignettes/articles/laminr-intro')
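The typo fix above works by relabelling a factor level. The following is a minimal base-R sketch of the same steps on a toy data frame; `toy` and `toy_typo` are hypothetical stand-ins, not the dataset returned by LaminDB. It shows that the copy taken before the fix keeps the typo, because R copies data frames on modification.

```r
# Hypothetical toy data, standing in for the `perturbation` column of `df`
toy <- data.frame(
  perturbation = factor(c("DMSO", "IFNJ", "DMSO")),
  row.names = c("sample1", "sample2", "sample3")
)

# Keep a copy that still contains the typo
toy_typo <- toy

# Relabel the levels: "IFNJ" (second level) becomes "IFNG"
levels(toy$perturbation) <- c("DMSO", "IFNG")

# Assigning the corrected value is now valid because "IFNG" is a level
toy["sample2", "perturbation"] <- "IFNG"

fixed <- as.character(toy$perturbation)
with_typo <- as.character(toy_typo$perturbation)
print(fixed)
print(with_typo)
```

Relabelling the levels already changes every "IFNJ" entry, so the explicit assignment to `sample2` acts as a safeguard rather than the fix itself.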
artifact$versions$df()
#> uid key description suffix kind
#> 1 Q3dgy9yok1Uuclsb0000 my_datasets/rnaseq1.parquet <NA> .parquet dataset
#> 2 Q3dgy9yok1Uuclsb0001 my_datasets/rnaseq1.parquet <NA> .parquet dataset
#> otype size hash n_files n_observations _hash_type
#> 1 DataFrame 8530 hDTNFwEPy7pY1EKnOHog2A <NA> 3 md5
#> 2 DataFrame 8530 RVElCjRCfyOAxbLNDEAMOg <NA> 3 md5
#> _key_is_virtual _overwrite_versions space_id storage_id schema_id version
#> 1 TRUE FALSE 1 1 <NA> <NA>
#> 2 TRUE FALSE 1 1 <NA> <NA>
#> is_latest run_id created_at created_by_id _aux _branch_code
#> 1 FALSE 1 2025-04-07 11:34:14 1 <NA> 1
#> 2 TRUE 1 2025-04-07 11:34:15 1 <NA> 1
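Judging from the table above, the two versions share the start of their uid and differ only in the trailing four characters. The sketch below splits the uids copied from that output in base R; the 16 + 4 character split is an assumption read off this output, not a documented guarantee.

```r
# uids copied from the versions table above
uids <- c("Q3dgy9yok1Uuclsb0000", "Q3dgy9yok1Uuclsb0001")

# Assumption: the first 16 characters identify the artifact family,
# the last 4 characters distinguish versions
stem <- substr(uids, 1, 16)
suffix <- substr(uids, 17, 20)

print(data.frame(uid = uids, stem = stem, suffix = suffix))
```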
Labels
See https://docs.lamin.ai/guide#labels.
bt <- import_module("bionty")
experiment_type <- ln$ULabel(name = "Experiment", is_type = TRUE)$save()
candidate_marker_experiment <- ln$ULabel(
name = "Candidate marker experiment", type = experiment_type
)$save()
artifact$ulabels$add(candidate_marker_experiment)
cell_type <- bt$CellType$from_source(name = "effector T cell")$save()
artifact$cell_types$add(cell_type)
artifact$describe()
#> Artifact .parquet/DataFrame
#> ├── General
#> │   ├── .uid = 'Q3dgy9yok1Uuclsb0001'
#> │   ├── .key = 'my_datasets/rnaseq1.parquet'
#> │   ├── .size = 8530
#> │   ├── .hash = 'RVElCjRCfyOAxbLNDEAMOg'
#> │   ├── .n_observations = 3
#> │   ├── .path =
#> │   │   /home/runner/work/laminr/laminr/vignettes/articles/laminr-intro/.lamindb
#> │   │   /Q3dgy9yok1Uuclsb0001.parquet
#> │   ├── .created_by = testuser1 (Test User1)
#> │   ├── .created_at = 2025-04-07 11:34:15
#> │   └── .transform = 'introduction.Rmd'
#> └── Labels
#>     └── .cell_types       bionty.CellType       effector T cell
#>         .ulabels          ULabel                Candidate marker experiment
Registries
See https://docs.lamin.ai/guide#registries.
ln$ULabel$df()
#> uid name is_type description reference
#> 2 Su0Niv0T Candidate marker experiment FALSE <NA> <NA>
#> 1 x8GIqLSm Experiment TRUE <NA> <NA>
#> reference_type space_id type_id run_id created_at created_by_id _aux
#> 2 <NA> 1 1 1 2025-04-07 11:34:15 1 <NA>
#> 1 <NA> 1 NaN 1 2025-04-07 11:34:15 1 <NA>
#> _branch_code
#> 2 1
#> 1 1
ln$Artifact
#> Artifact
#> Simple fields
#> .uid: CharField
#> .key: CharField
#> .description: CharField
#> .suffix: CharField
#> .kind: CharField
#> .otype: CharField
#> .size: BigIntegerField
#> .hash: CharField
#> .n_files: BigIntegerField
#> .n_observations: BigIntegerField
#> .version: CharField
#> .is_latest: BooleanField
#> .created_at: DateTimeField
#> .updated_at: DateTimeField
#> Relational fields
#> .space: Space
#> .storage: Storage
#> .run: Run
#> .schema: Schema
#> .created_by: User
#> .ulabels: ULabel
#> .input_of_runs: Run
#> .feature_sets: Schema
#> .collections: Collection
#> .references: Reference
#> .projects: Project
#> Bionty fields
#> .organisms: bionty.Organism
#> .genes: bionty.Gene
#> .proteins: bionty.Protein
#> .cell_markers: bionty.CellMarker
#> .tissues: bionty.Tissue
#> .cell_types: bionty.CellType
#> .diseases: bionty.Disease
#> .cell_lines: bionty.CellLine
#> .phenotypes: bionty.Phenotype
#> .pathways: bionty.Pathway
#> .experimental_factors: bionty.ExperimentalFactor
#> .developmental_stages: bionty.DevelopmentalStage
#> .ethnicities: bionty.Ethnicity
#> signature: (*args, **kwargs)
Query & search
See https://docs.lamin.ai/guide#query-search.
transform <- ln$Transform$get(key = "introduction.Rmd")
ln$Artifact$filter(key__startswith = "my_datasets/")$df()
#> uid key description suffix kind
#> 1 Q3dgy9yok1Uuclsb0000 my_datasets/rnaseq1.parquet <NA> .parquet dataset
#> 2 Q3dgy9yok1Uuclsb0001 my_datasets/rnaseq1.parquet <NA> .parquet dataset
#> otype size hash n_files n_observations _hash_type
#> 1 DataFrame 8530 hDTNFwEPy7pY1EKnOHog2A <NA> 3 md5
#> 2 DataFrame 8530 RVElCjRCfyOAxbLNDEAMOg <NA> 3 md5
#> _key_is_virtual _overwrite_versions space_id storage_id schema_id version
#> 1 TRUE FALSE 1 1 <NA> <NA>
#> 2 TRUE FALSE 1 1 <NA> <NA>
#> is_latest run_id created_at created_by_id _aux _branch_code
#> 1 FALSE 1 2025-04-07 11:34:14 1 <NA> 1
#> 2 TRUE 1 2025-04-07 11:34:15 1 <NA> 1
artifacts <- ln$Artifact$filter(transform = transform)$all()
artifacts <- ln$Artifact$filter(
transform__description__icontains = "intro", ulabels = candidate_marker_experiment
)$all()
ln$Transform$search("intro")$df()
#> uid key description type source_code hash
#> 1 fFqSLCzfPR3y0000 introduction.Rmd introduction.Rmd notebook <NA> <NA>
#> reference reference_type space_id _template_id version is_latest
#> 1 <NA> <NA> 1 <NA> <NA> TRUE
#> created_at created_by_id _aux _branch_code
#> 1 2025-04-07 11:34:13 1 <NA> 1
ulabels <- ln$ULabel$lookup()
cell_types <- bt$CellType$lookup()
Features
See https://docs.lamin.ai/guide#features.
ln$Feature(name = "temperature", dtype = "float")$save()
#> Feature(uid='9JLyu3L1y7as', name='temperature', dtype='float', array_rank=0, array_size=0, space_id=1, created_by_id=1, run_id=1, created_at=2025-04-07 11:34:17 UTC)
ln$Feature(name = "experiment", dtype = ln$ULabel)$save()
#> Feature(uid='2VSaKqnaKQPq', name='experiment', dtype='cat[ULabel]', array_rank=0, array_size=0, space_id=1, created_by_id=1, run_id=1, created_at=2025-04-07 11:34:17 UTC)
artifact$features$add_values(
list("temperature" = 21.6, "experiment" = "Candidate marker experiment")
)
artifact$describe()
#> Artifact .parquet/DataFrame
#> ├── General
#> │   ├── .uid = 'Q3dgy9yok1Uuclsb0001'
#> │   ├── .key = 'my_datasets/rnaseq1.parquet'
#> │   ├── .size = 8530
#> │   ├── .hash = 'RVElCjRCfyOAxbLNDEAMOg'
#> │   ├── .n_observations = 3
#> │   ├── .path =
#> │   │   /home/runner/work/laminr/laminr/vignettes/articles/laminr-intro/.lamindb
#> │   │   /Q3dgy9yok1Uuclsb0001.parquet
#> │   ├── .created_by = testuser1 (Test User1)
#> │   ├── .created_at = 2025-04-07 11:34:15
#> │   └── .transform = 'introduction.Rmd'
#> ├── Linked features
#> │   └── experiment        cat[ULabel]           Candidate marker experiment
#> │       temperature       float                 21.6
#> └── Labels
#>     └── .cell_types       bionty.CellType       effector T cell
#>         .ulabels          ULabel                Candidate marker experiment
ln$Artifact$features$filter(experiment__contains = "marker experiment")$df()
#> uid key description suffix kind
#> 2 Q3dgy9yok1Uuclsb0001 my_datasets/rnaseq1.parquet <NA> .parquet dataset
#> otype size hash n_files n_observations _hash_type
#> 2 DataFrame 8530 RVElCjRCfyOAxbLNDEAMOg <NA> 3 md5
#> _key_is_virtual _overwrite_versions space_id storage_id schema_id version
#> 2 TRUE FALSE 1 1 <NA> <NA>
#> is_latest run_id created_at created_by_id _aux _branch_code
#> 2 TRUE 1 2025-04-07 11:34:15 1 <NA> 1
Key use cases
This section reproduces the key use cases from the LaminDB Introduction guide.
See https://docs.lamin.ai/guide#key-use-cases.
Understand data lineage
See https://docs.lamin.ai/guide#understand-data-lineage.
artifact$view_lineage()
#> ✖ `view_lineage()` is not yet implemented. Please view the lineage in the web interface.
transform$view_lineage()
#> ✖ `view_lineage()` is not yet implemented. Please view the lineage in the web interface.
# Example only, not run
ln <- import_module("lamindb")
ln$track()
ln$finish()
# lamin load https://lamin.ai/laminlabs/lamindata/transform/13VINnFk89PE0004
Curate datasets
See https://docs.lamin.ai/introduction#curate-datasets.
perturbation_type <- ln$ULabel(name = "Perturbation", is_type = TRUE)$save()
ln$ULabel(name = "DMSO", type = perturbation_type)$save()
#> ULabel(uid='6HEbX640', name='DMSO', is_type=False, space_id=1, created_by_id=1, run_id=1, type_id=3, created_at=2025-04-07 11:34:17 UTC)
ln$ULabel(name = "IFNG", type = perturbation_type)$save()
#> ULabel(uid='al6kvAtl', name='IFNG', is_type=False, space_id=1, created_by_id=1, run_id=1, type_id=3, created_at=2025-04-07 11:34:17 UTC)
# Load the Python built-ins to get access to dtypes such as float
py_builtins <- reticulate::import_builtins()
schema <- ln$Schema(
name = "My DataFrame schema",
features = list(
# NOTE: These have dtype=int in the original guide
ln$Feature(name = "ENSG00000153563", dtype = py_builtins$float)$save(),
ln$Feature(name = "ENSG00000010610", dtype = py_builtins$float)$save(),
ln$Feature(name = "ENSG00000170458", dtype = py_builtins$float)$save(),
ln$Feature(name = "perturbation", dtype = ln$ULabel)$save()
)
)$save()
curator <- ln$curators$DataFrameCurator(df, schema)
artifact <- curator$save_artifact(key = "my_curated_dataset.parquet")
#> ✓ "perturbation" is validated against ULabel.name
#> → returning existing artifact with same hash: Artifact(uid='Q3dgy9yok1Uuclsb0001', is_latest=True, key='my_datasets/rnaseq1.parquet', suffix='.parquet', kind='dataset', otype='DataFrame', size=8530, hash='RVElCjRCfyOAxbLNDEAMOg', n_observations=3, space_id=1, storage_id=1, run_id=1, created_by_id=1, created_at=2025-04-07 11:34:15 UTC); to track this artifact as an input, use: ln.Artifact.get()
#> ! key my_datasets/rnaseq1.parquet on existing artifact differs from passed key my_curated_dataset.parquet
#> ✓ 4 unique terms (36.40%) are validated for name
#> ! 7 unique terms (63.60%) are not validated for name: 'sample_note', 'cell_type_by_expert', 'cell_type_by_model', 'assay_oid', 'concentration', 'treatment_time_h', 'donor'
#> ✓ loaded 4 Feature records matching name: 'ENSG00000153563', 'ENSG00000010610', 'ENSG00000170458', 'perturbation'
#> ! did not create Feature records for 7 non-validated names: 'assay_oid', 'cell_type_by_expert', 'cell_type_by_model', 'concentration', 'donor', 'sample_note', 'treatment_time_h'
#> → returning existing schema with same hash: Schema(uid='anyCXdQt5yZfzYOjzdHH', name='My DataFrame schema', n=4, itype='Feature', is_type=False, hash='Wj32No5I4aB6ETal2smSxw', minimal_set=True, ordered_set=False, maximal_set=False, space_id=1, created_by_id=1, run_id=1, created_at=2025-04-07 11:34:18 UTC)
#> ! updated otype from None to DataFrame
artifact$describe()
#> Artifact .parquet/DataFrame
#> ├── General
#> │   ├── .uid = 'Q3dgy9yok1Uuclsb0001'
#> │   ├── .key = 'my_datasets/rnaseq1.parquet'
#> │   ├── .size = 8530
#> │   ├── .hash = 'RVElCjRCfyOAxbLNDEAMOg'
#> │   ├── .n_observations = 3
#> │   ├── .path =
#> │   │   /home/runner/work/laminr/laminr/vignettes/articles/laminr-intro/.lamindb
#> │   │   /Q3dgy9yok1Uuclsb0001.parquet
#> │   ├── .created_by = testuser1 (Test User1)
#> │   ├── .created_at = 2025-04-07 11:34:15
#> │   └── .transform = 'introduction.Rmd'
#> ├── Dataset features/.feature_sets
#> │   └── columns • 4       [Feature]
#> │       perturbation      cat[ULabel]           DMSO, IFNG
#> │       ENSG00000153563   float
#> │       ENSG00000010610   float
#> │       ENSG00000170458   float
#> ├── Linked features
#> │   └── experiment        cat[ULabel]           Candidate marker experiment
#> │       temperature       float                 21.6
#> └── Labels
#>     └── .cell_types       bionty.CellType       effector T cell
#>         .ulabels          ULabel                Candidate marker experiment, DMS…
ln$Artifact$get(ulabels__name = "IFNG")
#> Artifact(uid='Q3dgy9yok1Uuclsb0001', is_latest=True, key='my_datasets/rnaseq1.parquet', suffix='.parquet', kind='dataset', otype='DataFrame', size=8530, hash='RVElCjRCfyOAxbLNDEAMOg', n_observations=3, space_id=1, storage_id=1, run_id=1, schema_id=1, created_by_id=1, created_at=2025-04-07 11:34:15 UTC)
curator <- ln$curators$DataFrameCurator(df_typo, schema)
tryCatch(
curator$validate(),
error = function(err) {
cat(conditionMessage(err))
}
)
#> • mapping "perturbation" on ULabel.name
#> ! 1 term is not validated: 'IFNJ'
#> → fix typos, remove non-existent values, or save terms via .add_new_from("perturbation")
#> lamindb.errors.ValidationError: 1 term is not validated: 'IFNJ'
#> → fix typos, remove non-existent values, or save terms via .add_new_from("perturbation")
#> Run `reticulate::py_last_error()` for details.
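The `tryCatch()` call above lets the vignette keep running after a validation error and prints only the error message. The following base-R sketch shows the same pattern without LaminDB; `validate_terms()` is a hypothetical stand-in for the curator, not part of {laminr}.

```r
# Hypothetical validator: raises an error if any value is not allowed
validate_terms <- function(values, allowed) {
  bad <- setdiff(unique(values), allowed)
  if (length(bad) > 0) {
    stop(sprintf(
      "%d term(s) not validated: %s",
      length(bad), paste(sQuote(bad, q = FALSE), collapse = ", ")
    ))
  }
  TRUE
}

# Catch the error, print its message, and return FALSE instead of failing
result <- tryCatch(
  validate_terms(c("DMSO", "IFNJ", "DMSO"), allowed = c("DMSO", "IFNG")),
  error = function(err) {
    cat(conditionMessage(err), "\n")
    FALSE
  }
)
print(result)
```

Returning a value from the handler makes the outcome easy to branch on afterwards, which the `cat()`-only handler above does not do.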
Manage biological registries
See https://docs.lamin.ai/introduction#manage-biological-registries.
cell_types <- bt$CellType$public()
cell_types
#> PublicOntology
#> Entity: CellType
#> Organism: all
#> Source: cl, 2024-08-16
#> #terms: 2959
cell_types$search("gamma-delta T cell") |> head(2)
#> name
#> CL:0000798 gamma-delta T cell
#> CL:4033072 cycling gamma-delta T cell
#> definition
#> CL:0000798 A T Cell That Expresses A Gamma-Delta T Cell Receptor Complex.
#> CL:4033072 A(N) Gamma-Delta T Cell That Is Cycling.
#> synonyms
#> CL:0000798 gamma-delta T-cell|gamma-delta T lymphocyte|gammadelta T cell|gamma-delta T-lymphocyte
#> CL:4033072 proliferating gamma-delta T cell
#> parents
#> CL:0000798 CL:0000084
#> CL:4033072 CL:4033069, CL:0000798
var_schema <- ln$Schema(
name = "my_var_schema",
itype = bt$Gene$ensembl_gene_id,
dtype = py_builtins$float
)$save()
obs_schema <- ln$Schema(
name = "my_obs_schema",
features = list(
ln$Feature(name = "perturbation", dtype = ln$ULabel)$save()
)
)$save()
#> → returning existing Feature record with same name: 'perturbation'
anndata_schema <- ln$Schema(
name = "my_anndata_schema",
otype = "AnnData",
components = list("obs" = obs_schema, "var" = var_schema)
)$save()
library(anndata)
adata <- AnnData(
df[c("ENSG00000153563", "ENSG00000010610", "ENSG00000170458")],
obs = df[, "perturbation", drop = FALSE]
)
curator <- ln$curators$AnnDataCurator(adata, anndata_schema)
#> ✓ created 1 Organism record from Bionty matching name: 'human'
#> • saving validated records of 'columns'
#> ✓ added 3 records from public with Gene.ensembl_gene_id for "columns": 'ENSG00000010610', 'ENSG00000153563', 'ENSG00000170458'
artifact <- curator$save_artifact(description = "my RNA-seq")
#> ✓ "perturbation" is validated against ULabel.name
#> • path content will be copied to default storage upon `save()` with key `None` ('.lamindb/3HXDBdg8MxRsKTlg0000.h5ad')
#> ✓ storing artifact '3HXDBdg8MxRsKTlg0000' at '/home/runner/work/laminr/laminr/vignettes/articles/laminr-intro/.lamindb/3HXDBdg8MxRsKTlg0000.h5ad'
#> ✓ 3 unique terms (100.00%) are validated for ensembl_gene_id
#> ✓ 1 unique term (100.00%) is validated for name
#> → returning existing schema with same hash: Schema(uid='ev9BnzwcaG8uoUdz5uPz', name='my_obs_schema', n=1, itype='Feature', is_type=False, hash='ELzKeWx4rgC5FKaVtvH-9Q', minimal_set=True, ordered_set=False, maximal_set=False, space_id=1, created_by_id=1, run_id=1, created_at=2025-04-07 11:34:19 UTC)
#> ! updated otype from None to DataFrame
#> ✓ saved 1 feature set for slot: 'var'
artifact$describe()
#> Artifact .h5ad/AnnData
#> ├── General
#> │   ├── .uid = '3HXDBdg8MxRsKTlg0000'
#> │   ├── .size = 19240
#> │   ├── .hash = 'gO44MDqttaaKNyBLVM-zzA'
#> │   ├── .n_observations = 3
#> │   ├── .path =
#> │   │   /home/runner/work/laminr/laminr/vignettes/articles/laminr-intro/.lamindb
#> │   │   /3HXDBdg8MxRsKTlg0000.h5ad
#> │   ├── .created_by = testuser1 (Test User1)
#> │   ├── .created_at = 2025-04-07 11:34:21
#> │   └── .transform = 'introduction.Rmd'
#> ├── Dataset features/.feature_sets
#> │   ├── var • 3           [bionty.Gene]
#> │   │   CD4               float
#> │   │   CD8A              float
#> │   │   CD14              float
#> │   └── obs • 1           [Feature]
#> │       perturbation      cat[ULabel]           DMSO, IFNG
#> └── Labels
#>     └── .ulabels          ULabel                DMSO, IFNG
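The `AnnData()` call above splits `df` into an expression part and an `obs` part using two different subsetting forms. The base-R sketch below, on a hypothetical toy data frame with made-up column names, shows why `drop = FALSE` matters: without it, selecting a single column returns a bare vector and the sample names are lost.

```r
# Hypothetical toy data frame with sample names as row names
toy <- data.frame(
  g1 = c(1, 2, 3),
  g2 = c(3, 4, 5),
  perturbation = factor(c("DMSO", "IFNG", "DMSO")),
  row.names = c("sample1", "sample2", "sample3")
)

# List-style subsetting keeps a data frame of the selected columns
expr <- toy[c("g1", "g2")]

# drop = FALSE keeps a one-column data frame, including the row names
obs <- toy[, "perturbation", drop = FALSE]

# Without drop = FALSE the result is a plain factor with no sample names
obs_vec <- toy[, "perturbation"]

print(class(obs))
print(class(obs_vec))
```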
genes <- bt$Gene$filter(organism__name = "human")$lookup()
feature_sets <- ln$FeatureSet$filter(genes = genes$cd8a)$all()
ln$Artifact$filter(feature_sets__in = feature_sets)$df()
#> uid key description suffix kind otype size
#> 3 3HXDBdg8MxRsKTlg0000 <NA> my RNA-seq .h5ad dataset AnnData 19240
#> hash n_files n_observations _hash_type _key_is_virtual
#> 3 gO44MDqttaaKNyBLVM-zzA <NA> 3 md5 TRUE
#> _overwrite_versions space_id storage_id schema_id version is_latest run_id
#> 3 FALSE 1 1 4 <NA> TRUE 1
#> created_at created_by_id _aux _branch_code
#> 3 2025-04-07 11:34:21 1 <NA> 1
neuron <- bt$CellType$from_source(name = "neuron")$save()
#> ✓ created 1 CellType record from Bionty matching name: 'neuron'
#> ✓ created 3 CellType records from Bionty matching ontology_id: 'CL:0002319', 'CL:0000404', 'CL:0000393'
new_cell_state <- bt$CellType(
name = "my neuron cell state", description = "explains X"
)$save()
new_cell_state$parents$add(neuron)
new_cell_state$view_parents(distance = 2)
Scale learning
See https://docs.lamin.ai/introduction#scale-learning.
df2 <- ln$core$datasets$small_dataset2(otype = "DataFrame")
adata <- AnnData(
df2[c("ENSG00000153563", "ENSG00000010610", "ENSG00000004468")],
obs = df2[, "perturbation", drop = FALSE]
)
curator <- ln$curators$AnnDataCurator(adata, anndata_schema)
#> • saving validated records of 'columns'
#> ✓ added 1 record from public with Gene.ensembl_gene_id for "columns": 'ENSG00000004468'
artifact2 <- curator$save_artifact(key = "my_datasets/my_rnaseq2.h5ad")
#> ✓ "perturbation" is validated against ULabel.name
#> • path content will be copied to default storage upon `save()` with key 'my_datasets/my_rnaseq2.h5ad'
#> ✓ storing artifact 'Dhb8smPQGadISDIP0000' at '/home/runner/work/laminr/laminr/vignettes/articles/laminr-intro/.lamindb/Dhb8smPQGadISDIP0000.h5ad'
#> ✓ 3 unique terms (100.00%) are validated for ensembl_gene_id
#> ✓ 1 unique term (100.00%) is validated for name
#> → returning existing schema with same hash: Schema(uid='ev9BnzwcaG8uoUdz5uPz', name='my_obs_schema', n=1, itype='Feature', is_type=False, hash='ELzKeWx4rgC5FKaVtvH-9Q', minimal_set=True, ordered_set=False, maximal_set=False, space_id=1, created_by_id=1, run_id=1, created_at=2025-04-07 11:34:19 UTC)
#> ! updated otype from None to DataFrame
#> ✓ saved 1 feature set for slot: 'var'
collection <- ln$Collection(
list(artifact, artifact2),
key = "my-RNA-seq-collection"
)$save()
collection$describe()
#> Collection
#> └── General
#>     ├── .uid = 'P1g9vekkwk22Kk4V0000'
#>     ├── .key = 'my-RNA-seq-collection'
#>     ├── .hash = 'DbgO9hDdS-KOydwDZDLp3g'
#>     ├── .created_by = testuser1 (Test User1)
#>     ├── .created_at = 2025-04-07 11:34:25
#>     └── .transform = 'introduction.Rmd'
collection$view_lineage()
#> ✖ `view_lineage()` is not yet implemented. Please view the lineage in the web interface.
collection$load()
#> AnnData object with n_obs × n_vars = 6 × 4
#>     obs: 'perturbation', 'artifact_uid'
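The 6 × 4 shape reported above is consistent with stacking the observations of both artifacts and taking the union of their genes. The base-R sketch below only reproduces that arithmetic from the gene IDs used earlier in this vignette; that the concatenation is an outer join is an assumption inferred from the output, not something stated in the source.

```r
# Gene IDs used to build the two AnnData objects earlier in this vignette
genes1 <- c("ENSG00000153563", "ENSG00000010610", "ENSG00000170458")
genes2 <- c("ENSG00000153563", "ENSG00000010610", "ENSG00000004468")

# Each dataset contributes three samples
n_obs <- 3 + 3

# Assumption: variables are combined as a union (outer join)
n_vars <- length(union(genes1, genes2))

cat("n_obs x n_vars =", n_obs, "x", n_vars, "\n")
```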
collection$artifacts$all()
#> <QuerySet [Artifact(uid='3HXDBdg8MxRsKTlg0000', is_latest=True, description='my RNA-seq', suffix='.h5ad', kind='dataset', otype='AnnData', size=19240, hash='gO44MDqttaaKNyBLVM-zzA', n_observations=3, space_id=1, storage_id=1, run_id=1, schema_id=4, created_by_id=1, created_at=2025-04-07 11:34:21 UTC), Artifact(uid='Dhb8smPQGadISDIP0000', is_latest=True, key='my_datasets/my_rnaseq2.h5ad', suffix='.h5ad', kind='dataset', otype='AnnData', size=19240, hash='Ti7bcnIOlk0fIt2TFW_VAw', n_observations=3, space_id=1, storage_id=1, run_id=1, schema_id=4, created_by_id=1, created_at=2025-04-07 11:34:24 UTC)]>
collection$artifacts$df()
#> uid key description suffix kind
#> 3 3HXDBdg8MxRsKTlg0000 <NA> my RNA-seq .h5ad dataset
#> 4 Dhb8smPQGadISDIP0000 my_datasets/my_rnaseq2.h5ad <NA> .h5ad dataset
#> otype size hash n_files n_observations _hash_type
#> 3 AnnData 19240 gO44MDqttaaKNyBLVM-zzA <NA> 3 md5
#> 4 AnnData 19240 Ti7bcnIOlk0fIt2TFW_VAw <NA> 3 md5
#> _key_is_virtual _overwrite_versions space_id storage_id schema_id version
#> 3 TRUE FALSE 1 1 4 <NA>
#> 4 TRUE FALSE 1 1 4 <NA>
#> is_latest run_id created_at created_by_id _aux _branch_code
#> 3 TRUE 1 2025-04-07 11:34:21 1 <NA> 1
#> 4 TRUE 1 2025-04-07 11:34:24 1 <NA> 1
Other examples
Slice a TileDB-SOMA array store
When an artifact contains a TileDB-SOMA array store, it can be opened and sliced using the {tiledbsoma} package.
# Set some environment variables to avoid an issue with {tiledbsoma}
# https://github.com/chanzuckerberg/cellxgene-census/issues/1261
Sys.setenv(TILEDB_VFS_S3_REGION = "us-west-2")
Sys.setenv(AWS_DEFAULT_REGION = "us-west-2")
Sys.setenv(TILEDB_VFS_S3_NO_SIGN_REQUEST = "true")
# Define a filter to select specific cells
value_filter <- paste(
"tissue == 'brain' &&",
"cell_type %in% c('microglial cell', 'neuron') &&",
"suspension_type == 'cell' &&",
"assay == '10x 3\\' v3'"
)
# Get the artifact containing the CELLxGENE Census TileDB-SOMA store
census_artifact <- ln$Artifact$using("laminlabs/cellxgene")$get("FYMewVq5twKMDXVy0001")
# Open the SOMACollection
soma_collection <- census_artifact$open()
#> → completing transfer to track Artifact('FYMewVq5') as input
#> → mapped records:
#> → transferred records: Artifact(uid='FYMewVq5twKMDXVy0001'), Storage(uid='oIYGbD74')
#> • adding artifact ids [5] as inputs for run 1, adding parent transform 2
# Slice the store to get a SOMADataFrame containing metadata for the cells of interest
cell_metadata <- soma_collection$get("census_data")$get("homo_sapiens")$obs$read(value_filter = value_filter)
# Concatenate the results to an arrow::Table
cell_metadata <- cell_metadata$concat()
# Convert to a data.frame
cell_metadata <- cell_metadata$to_data_frame()
cell_metadata
#> # A tibble: 66,418 × 28
#>    soma_joinid dataset_id                 assay assay_ontology_term_id cell_type
#>          <int> <fct>                      <fct> <fct>                  <fct>
#>  1    48182177 c888b684-6c51-431f-972a-6… 10x … EFO:0009922            microgli…
#>  2    48182178 c888b684-6c51-431f-972a-6… 10x … EFO:0009922            microgli…
#>  3    48182185 c888b684-6c51-431f-972a-6… 10x … EFO:0009922            microgli…
#>  4    48182187 c888b684-6c51-431f-972a-6… 10x … EFO:0009922            microgli…
#>  5    48182188 c888b684-6c51-431f-972a-6… 10x … EFO:0009922            microgli…
#>  6    48182189 c888b684-6c51-431f-972a-6… 10x … EFO:0009922            microgli…
#>  7    48182190 c888b684-6c51-431f-972a-6… 10x … EFO:0009922            microgli…
#>  8    48182191 c888b684-6c51-431f-972a-6… 10x … EFO:0009922            microgli…
#>  9    48182192 c888b684-6c51-431f-972a-6… 10x … EFO:0009922            microgli…
#> 10    48182194 c888b684-6c51-431f-972a-6… 10x … EFO:0009922            microgli…
#> # ℹ 66,408 more rows
#> # ℹ 23 more variables: cell_type_ontology_term_id <fct>,
#> #   development_stage <fct>, development_stage_ontology_term_id <fct>,
#> #   disease <fct>, disease_ontology_term_id <fct>, donor_id <fct>,
#> #   is_primary_data <lgl>, observation_joinid <chr>,
#> #   self_reported_ethnicity <fct>,
#> #   self_reported_ethnicity_ontology_term_id <fct>, sex <fct>, …
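The value filter in the example above is a single string assembled with `paste()`. The base-R sketch below rebuilds it and prints the result, which makes the quoting visible: the R escape `\\'` leaves a backslash in front of the quote inside `10x 3' v3`, so that the quote does not end the filter's own string literal.

```r
# Same pieces as in the example above; paste() joins them with single spaces
value_filter <- paste(
  "tissue == 'brain' &&",
  "cell_type %in% c('microglial cell', 'neuron') &&",
  "suspension_type == 'cell' &&",
  "assay == '10x 3\\' v3'"
)

# cat() shows the string exactly as it is passed on, without R's own escaping
cat(value_filter, "\n")
```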
Finish tracking
Mark the analysis run as finished to create a time stamp and upload source code to the hub.
ln$finish()
#> ℹ Creating lockfile /home/runner/.cache/lamindb/environments/run_gFnVLAHLmNQ3h9…
#>
#> ✔ Updated metadata database: 2.36 MB in 4 files.
#>
#> ℹ Creating lockfile /home/runner/.cache/lamindb/environments/run_gFnVLAHLmNQ3h9…
#> ℹ source packages are missing from RSPM: Could not resolve host: RSPM
#> ℹ Creating lockfile /home/runner/.cache/lamindb/environments/run_gFnVLAHLmNQ3h9…
#> ℹ Updating metadata database
#> ✔ Updating metadata database ... done
#>
#> ℹ Creating lockfile /home/runner/.cache/lamindb/environments/run_gFnVLAHLmNQ3h9…
#> ✔ Created lockfile /home/runner/.cache/lamindb/environments/run_gFnVLAHLmNQ3h9N…
#> ! no html report found; to attach one, create an .html export for your .Rmd file and then run: lamin save introduction.Rmd
#> → finished Run('gFnVLAHL') after 47s at 2025-04-07 11:35:00 UTC
Save a notebook report (not needed for .R scripts)
Save a run report of your notebook (.Rmd or .qmd file) to your instance:
- Render the notebook to HTML:
  - In RStudio, click the "Knit" button
  - OR render it from the command line
  - OR use the rmarkdown package in R: rmarkdown::render("introduction.Rmd")
- Save it to your LaminDB instance:
  - Using the lamin_save() function in R: lamin_save("introduction.Rmd")
  - OR using the lamin CLI: lamin save introduction.Rmd
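The two steps above can be sketched in R as follows. This is a hedged outline, not the package's prescribed workflow: it assumes the notebook is named `introduction.Rmd`, that {rmarkdown} is installed, and that `lamin_save()` is exported by {laminr} as the text above suggests. Each step is skipped when its prerequisites are missing, so the snippet is safe to run anywhere.

```r
# Assumed notebook name, taken from the examples in this vignette
notebook <- "introduction.Rmd"

# Step 1: render the notebook to HTML
# (skipped if the file or the {rmarkdown} package is not available)
rendered <- FALSE
if (file.exists(notebook) && requireNamespace("rmarkdown", quietly = TRUE)) {
  rmarkdown::render(notebook)
  rendered <- TRUE
}

# Step 2: save the notebook and its report to the connected instance
# (skipped unless rendering succeeded and {laminr} is installed)
if (rendered && requireNamespace("laminr", quietly = TRUE)) {
  laminr::lamin_save(notebook)
}
```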
Design
See https://docs.lamin.ai/introduction#design for more information on the design of LaminDB.