This vignette provides a more detailed introduction to the concepts and features of {laminr}. We’ll start with a brief overview of key concepts and then walk through the basic steps to connect to a LaminDB instance and work with its core components.
Key concepts in LaminDB
Before diving into the practical usage of {laminr}, it’s helpful to understand some core concepts in LaminDB. For a more detailed explanation, refer to the Architecture vignette (vignette("architecture", package = "laminr")).
- Instance: A LaminDB instance is a self-contained environment for storing and managing data and metadata. Think of it like a database or a project directory. Each instance has its own schema, storage location, and metadata database.
- Module: A module is a collection of related registries that provide specific functionality. For example, the core module contains essential registries for general data management, while the bionty module provides registries for biological entities like genes and proteins.
- Registry: A registry is a centralized collection of related records, similar to a table in a database. Each registry holds a specific type of metadata, such as information about artifacts, transforms, or features.
- Record: A record is a single entry within a registry, analogous to a row in a database table. Each record represents a specific entity and combines multiple fields of information.
- Field: A field is a single piece of information within a record, like a column in a database table. For example, an artifact record might have fields for its name, description, and creation date.
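To make the hierarchy concrete, here is a purely illustrative sketch that models it with nested base R lists. It is not how {laminr} represents anything internally. The explicit `core` level and the record layout are assumptions made for illustration; in {laminr} itself, core registries sit directly on the instance object (e.g. `db$Artifact`, shown below). The record values are taken from the myeloid artifact used later in this vignette.

```r
# Illustrative toy only: nested base R lists standing in for laminr objects.
instance <- list(                                  # instance
  core = list(                                     # module
    Artifact = list(                               # registry
      KBW89Mf7IGcekja2hADu = list(                 # record, keyed by its UID
        id = 3659L,                                # field
        description = "Myeloid compartment",       # field
        suffix = ".h5ad"                           # field
      )
    )
  ),
  bionty = list(Gene = list(), CellType = list())  # module with two (empty) registries
)

# Walk the hierarchy: instance -> module -> registry -> record -> field.
record <- instance$core$Artifact$KBW89Mf7IGcekja2hADu
record$description
#> [1] "Myeloid compartment"
names(instance$bionty)
```

The `$` chaining mirrors how you navigate a real connection, for example from `db` to `db$bionty` to a registry.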
Initial setup
Now, let’s set up your environment to use {laminr}.
R setup
- Install the {laminr} package.
install.packages("laminr")
- (Optional) Install suggested dependencies.
install.packages("laminr", dependencies = TRUE)
This includes packages like {anndata} for working with AnnData objects and {s3} for interacting with S3 storage.
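Because these packages are optional, code that depends on them can check for them first. A minimal base R sketch that checks only the two packages named above (the full set of suggested packages may be larger):

```r
# Suggested packages mentioned above; laminr may suggest others as well.
suggested <- c("anndata", "s3")

# requireNamespace() returns TRUE if a package can be loaded, FALSE otherwise.
installed <- vapply(suggested, requireNamespace, logical(1), quietly = TRUE)
installed

# Report anything that is missing instead of failing later.
missing <- suggested[!installed]
if (length(missing) > 0) {
  message("Missing suggested packages: ", paste(missing, collapse = ", "))
}
```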
Connecting to LaminDB from R
Connect to the laminlabs/cellxgene instance from your R session:
The db object now represents your connection to the LaminDB instance. You can explore the available registries (like Artifact, Collection, Feature, etc.) by printing the db object:
db
#> cellxgene
#> Core registries
#> $Run
#> $User
#> $Param
#> $ULabel
#> $Feature
#> $Storage
#> $Artifact
#> $Transform
#> $Collection
#> $FeatureSet
#> $ParamValue
#> $FeatureValue
#> Additional modules
#> bionty
These registries correspond to Python classes in LaminDB.
To access registries within specific modules, use the $ operator. For example, to access the bionty module:
db$bionty
#> bionty
#> Registries
#> $Gene
#> $Source
#> $Tissue
#> $Disease
#> $Pathway
#> $Protein
#> $CellLine
#> $CellType
#> $Organism
#> $Ethnicity
#> $Phenotype
#> $CellMarker
#> $DevelopmentalStage
#> $ExperimentalFactor
The registries in bionty and other modules also correspond to Python classes.
Working with registries
Let’s use the Artifact registry as an example. This registry stores datasets, models, and other data entities.
To see the fields available in the Artifact registry, print the registry object:
db$Artifact
#> Artifact
#> Simple fields
#> id: AutoField
#> key: CharField
#> uid: CharField
#> hash: CharField
#> size: BigIntegerField
#> type: CharField
#> suffix: CharField
#> version: CharField
#> is_latest: BooleanField
#> n_objects: BigIntegerField
#> created_at: DateTimeField
#> updated_at: DateTimeField
#> visibility: SmallIntegerField
#> description: CharField
#> n_observations: BigIntegerField
#> Relational fields
#> run: Run (many-to-one)
#> storage: Storage (many-to-one)
#> ulabels: ULabel (many-to-many)
#> transform: Transform (many-to-one)
#> created_by: User (many-to-one)
#> collections: Collection (many-to-many)
#> feature_sets: FeatureSet (many-to-many)
#> input_of_runs: Run (many-to-many)
#> Bionty fields
#> genes: bionty$Gene (many-to-many)
#> tissues: bionty$Tissue (many-to-many)
#> diseases: bionty$Disease (many-to-many)
#> pathways: bionty$Pathway (many-to-many)
#> proteins: bionty$Protein (many-to-many)
#> organisms: bionty$Organism (many-to-many)
#> cell_lines: bionty$CellLine (many-to-many)
#> cell_types: bionty$CellType (many-to-many)
#> phenotypes: bionty$Phenotype (many-to-many)
#> ethnicities: bionty$Ethnicity (many-to-many)
#> cell_markers: bionty$CellMarker (many-to-many)
#> developmental_stages: bionty$DevelopmentalStage (many-to-many)
#> experimental_factors: bionty$ExperimentalFactor (many-to-many)
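Several simple fields are typed `BigIntegerField`, for example `size`. Artifact sizes in this instance reach hundreds of gigabytes (870700998221 bytes for the 2024-07-01 Census, per the `df()` output below), which is beyond R's 32-bit integer range. This base R sketch shows why such values should be kept as doubles rather than coerced to integer:

```r
# Sizes in bytes, taken from the artifact listings in this vignette.
sizes <- c(635848093433, 870700998221, 691757462)

.Machine$integer.max               # largest value an R integer can hold
#> [1] 2147483647
sizes > .Machine$integer.max
#> [1]  TRUE  TRUE FALSE

# Coercing to integer turns the large values into NA (R warns; suppressed here).
suppressWarnings(as.integer(sizes))

# Doubles represent whole numbers exactly up to 2^53, so these sizes are safe as doubles.
all(sizes < 2^53)
#> [1] TRUE
```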
You can also get a data frame summarising the records in a registry. The limit argument caps the number of rows returned:
db$Artifact$df(limit = 5)
#> id suffix X_accessor n_objects visibility
#> 1 2846 tiledbsoma 290 1
#> 2 3665 tiledbsoma 330 1
#> 3 1270 .h5ad AnnData NA 1
#> 4 2840 .ipynb <NA> NA 0
#> 5 2842 .html <NA> NA 0
#> key
#> 1 cell-census/2023-12-15/soma
#> 2 cell-census/2024-07-01/soma
#> 3 cell-census/2023-07-25/h5ads/7a0a8891-9a22-4549-a55b-c2aca23c3a2a.h5ad
#> 4 <NA>
#> 5 <NA>
#> uid size hash
#> 1 FYMewVq5twKMDXVy0000 635848093433 Mfyw8VuqftX5REITfQH_yg
#> 2 FYMewVq5twKMDXVy0001 870700998221 bzrXBPNvitSVKvb3GG38_w
#> 3 tczTlSHFPOcAcBnfyxKA 1297573950 UlsVvBz9kMzn2r9RdoAAOg
#> 4 JIIPyQX5l9qELPl42d75 36297 gNdUkonYgQJP_Mi3xLzt_g
#> 5 Whyxwf3k2GjJwTPCl1FK 716529 BDGZac3qU3oLVFpO035Qhg
#> description n_observations is_latest X_hash_type
#> 1 Census 2023-12-15 68683222 FALSE md5-d
#> 2 Census 2024-07-01 115556140 TRUE md5-d
#> 3 Supercluster: Hippocampal CA1-3 74979 FALSE md5-n
#> 4 Source of transform G69jtgzKO0eJ6K79 NA FALSE md5
#> 5 Report of run UAAiLAi0BrLvlKnsuvP3 NA FALSE md5
#> type created_at X_key_is_virtual
#> 1 dataset 2024-07-12T12:12:16.091881+00:00 FALSE
#> 2 dataset 2024-07-16T12:52:01.424629+00:00 FALSE
#> 3 <NA> 2023-11-28T21:46:12.685907+00:00 FALSE
#> 4 <NA> 2024-01-29T08:32:13.311741+00:00 TRUE
#> 5 <NA> 2024-01-29T08:32:18.346499+00:00 TRUE
#> updated_at version
#> 1 2024-09-17T13:00:13.714256+00:00 2023-12-15
#> 2 2024-09-17T13:01:23.739635+00:00 2024-07-01
#> 3 2024-01-24T07:10:21.725547+00:00 2023-07-25
#> 4 2024-01-29T08:32:13.311792+00:00 0
#> 5 2024-01-30T09:12:06.027928+00:00 1
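Assuming `df()` returns an ordinary R `data.frame` (as the printed output suggests), it can be post-processed with base R. The sketch below rebuilds a few columns of the output above by hand so that it runs without a connection; with a live connection you would start from `db$Artifact$df(limit = 5)` instead. The last step rests on an assumption worth checking against the LaminDB documentation: the two Census artifacts have UIDs that differ only in their last four characters and carry different `version` values, which suggests that all versions of an artifact share a 16-character UID prefix.

```r
# Stand-in for db$Artifact$df(limit = 5): a few columns copied from the output above.
artifacts <- data.frame(
  id = c(2846L, 3665L, 1270L, 2840L, 2842L),
  uid = c("FYMewVq5twKMDXVy0000", "FYMewVq5twKMDXVy0001", "tczTlSHFPOcAcBnfyxKA",
          "JIIPyQX5l9qELPl42d75", "Whyxwf3k2GjJwTPCl1FK"),
  description = c("Census 2023-12-15", "Census 2024-07-01",
                  "Supercluster: Hippocampal CA1-3",
                  "Source of transform G69jtgzKO0eJ6K79",
                  "Report of run UAAiLAi0BrLvlKnsuvP3"),
  size = c(635848093433, 870700998221, 1297573950, 36297, 716529),  # bytes
  n_observations = c(68683222, 115556140, 74979, NA, NA),
  is_latest = c(FALSE, TRUE, FALSE, FALSE, FALSE)
)

# Human-readable size in decimal gigabytes.
artifacts$size_gb <- round(artifacts$size / 1e9, 2)

# Records with observations, largest first.
with_obs <- artifacts[!is.na(artifacts$n_observations), ]
with_obs <- with_obs[order(with_obs$n_observations, decreasing = TRUE), ]
with_obs[, c("id", "description", "n_observations", "size_gb")]

# Assumption: the first 16 UID characters are shared by all versions of an artifact.
artifacts$uid_stem <- substr(artifacts$uid, 1, 16)
census <- artifacts[artifacts$uid_stem == "FYMewVq5twKMDXVy", ]
census[census$is_latest, "description"]
#> [1] "Census 2024-07-01"
```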
Working with records
You can fetch a specific record from a registry using its ID or UID. For instance, to get the artifact with UID KBW89Mf7IGcekja2hADu:
artifact <- db$Artifact$get("KBW89Mf7IGcekja2hADu")
This artifact contains an AnnData object with myeloid cell data. You can view its metadata:
artifact
#> Artifact(uid='KBW89Mf7IGcekja2hADu', description='Myeloid compartment', key='cell-census/2024-07-01/h5ads/fe52003e-1460-4a65-a213-2bb1a508332f.h5ad', id=3659, run_id=27, hash='SZ5tB0T4YKfiUuUkAL09ZA', size=691757462, type='dataset', suffix='.h5ad', storage_id=2, version='2024-07-01', _accessor='AnnData', is_latest=TRUE, transform_id=22, _hash_type='md5-n', created_at='2024-07-12T12:34:10.345829+00:00', created_by_id=1, updated_at='2024-07-12T12:40:48.837026+00:00', visibility=1, n_observations=51552, _key_is_virtual=FALSE)
For artifact records, you can get more detailed information:
artifact$describe()
#> Artifact(uid='KBW89Mf7IGcekja2hADu', description='Myeloid compartment', key='cell-census/2024-07-01/h5ads/fe52003e-1460-4a65-a213-2bb1a508332f.h5ad', id=3659, run_id=27, hash='SZ5tB0T4YKfiUuUkAL09ZA', size=691757462, type='dataset', suffix='.h5ad', storage_id=2, version='2024-07-01', _accessor='AnnData', is_latest=TRUE, transform_id=22, _hash_type='md5-n', created_at='2024-07-12T12:34:10.345829+00:00', created_by_id=1, updated_at='2024-07-12T12:40:48.837026+00:00', visibility=1, n_observations=51552, _key_is_virtual=FALSE)
#> Provenance
#> $storage = 's3://cellxgene-data-public'
#> $transform = 'Census release 2024-07-01 (LTS)'
#> $run = '2024-07-16T12:49:41.81955+00:00'
#> $created_by = 'sunnyosun'
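Timestamp fields such as `created_at` and `updated_at` are shown as ISO 8601 strings with microseconds and a `+00:00` offset. Assuming they reach R as character strings in that form, a small hypothetical helper (`parse_lamin_time()`, not part of {laminr}) can convert them to `POSIXct` for date arithmetic:

```r
# Timestamps as printed in the artifact metadata above.
created_at <- "2024-07-12T12:34:10.345829+00:00"
updated_at <- "2024-07-12T12:40:48.837026+00:00"

# Hypothetical helper: drop the "+00:00" offset and parse as UTC.
# %OS reads seconds including the fractional part.
parse_lamin_time <- function(x) {
  as.POSIXct(sub("\\+00:00$", "", x), format = "%Y-%m-%dT%H:%M:%OS", tz = "UTC")
}

created <- parse_lamin_time(created_at)
updated <- parse_lamin_time(updated_at)

# Time between creation and the last update of this artifact.
difftime(updated, created, units = "mins")
```

The helper only handles a literal `+00:00` suffix; strings with other offsets would need a different parser.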
Access specific fields of the record using the $ operator:
artifact$id
#> [1] 3659
artifact$uid
#> [1] "KBW89Mf7IGcekja2hADu"
artifact$key
#> [1] "cell-census/2024-07-01/h5ads/fe52003e-1460-4a65-a213-2bb1a508332f.h5ad"
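Field values are ordinary R values, so base R tools apply to them. The sketch below copies `artifact$key` and the storage root shown in the `describe()` output, then derives the file name, extension, full S3 URI, and size in megabytes. For this artifact, the URI matches the path reported by `load()` further down. With a live connection you would use `artifact$key` directly; reading the root via `artifact$storage$root` is an assumption based on the `root` field of the Storage record.

```r
# Values copied from this vignette (artifact$key and the describe() output).
key  <- "cell-census/2024-07-01/h5ads/fe52003e-1460-4a65-a213-2bb1a508332f.h5ad"
root <- "s3://cellxgene-data-public"
size <- 691757462  # bytes, from the artifact metadata

file_name <- basename(key)          # last path component
extension <- tools::file_ext(key)   # file extension without the dot
s3_uri    <- paste(root, key, sep = "/")
size_mb   <- round(size / 1e6, 1)   # decimal megabytes

file_name
#> [1] "fe52003e-1460-4a65-a213-2bb1a508332f.h5ad"
s3_uri
#> [1] "s3://cellxgene-data-public/cell-census/2024-07-01/h5ads/fe52003e-1460-4a65-a213-2bb1a508332f.h5ad"
```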
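Relational fields link to other records. A many-to-one field such as storage returns a single record, while a many-to-many field such as developmental_stages returns a RelatedRecords object: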
artifact$storage
#> Storage(uid='oIYGbD74', root='s3://cellxgene-data-public', id=2, type='s3', region='us-west-2', created_at='2023-09-19T13:17:56.273068+00:00', created_by_id=1, updated_at='2023-10-16T15:04:08.998203+00:00')
artifact$developmental_stages
#> RelatedRecords(field_name='developmental_stages', relation_type='many-to-many', related_to='KBW89Mf7IGcekja2hADu')
For one-to-many and many-to-many relationships, a summary of the related records can be retrieved as a data frame:
artifact$developmental_stages$df()
#> id uid abbr name synonyms
#> 1 422 1xebUrrX NA sixth decade human stage NA
#> 2 423 3yuYMeZt NA seventh decade human stage NA
#> 3 424 2EztBuvx NA eighth decade human stage NA
#> created_at updated_at
#> 1 2023-11-28T23:05:31.450102+00:00 2023-11-28T23:05:31.450106+00:00
#> 2 2023-11-28T23:05:31.450123+00:00 2023-11-28T23:05:31.450127+00:00
#> 3 2023-11-28T23:05:31.450144+00:00 2023-11-28T23:05:31.450149+00:00
#> description
#> 1 Human Stage That Refers To An Individual Who Is Over 50 And Under 60 Years Old.
#> 2 Human Stage That Refers To An Individual Who Is Over 60 And Under 70 Years Old.
#> 3 Human Stage That Refers To An Individual Who Is Over 70 And Under 80 Years Old.
#> ontology_id
#> 1 HsapDv:0000240
#> 2 HsapDv:0000241
#> 3 HsapDv:0000242
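What does many-to-many mean here? The field types listed for the Artifact registry (`AutoField`, `CharField`, and so on) are Django model field types, and Django stores a many-to-many relation as an intermediate table of ID pairs. The toy base R sketch below mimics that idea with the IDs shown above. The link table `artifact_stage` and its column names are hypothetical and do not reflect LaminDB's actual schema.

```r
# Toy tables built from the IDs and values shown above.
artifacts <- data.frame(artifact_id = 3659L, description = "Myeloid compartment")
stages <- data.frame(
  stage_id = c(422L, 423L, 424L),
  name = c("sixth decade human stage", "seventh decade human stage",
           "eighth decade human stage"),
  ontology_id = c("HsapDv:0000240", "HsapDv:0000241", "HsapDv:0000242")
)

# Hypothetical link table: one row per (artifact, stage) pair.
artifact_stage <- data.frame(artifact_id = 3659L, stage_id = c(422L, 423L, 424L))

# Resolve the stages linked to artifact 3659 with two joins.
linked <- merge(merge(artifacts, artifact_stage, by = "artifact_id"),
                stages, by = "stage_id")
linked[, c("artifact_id", "name", "ontology_id")]
```

The same stage could be linked to many other artifacts by adding rows to the link table, which is what makes the relation many-to-many rather than one-to-many.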
Finally, for artifact records only, you can download the associated data:
artifact$cache() # Cache the data locally
artifact$load() # Load the data into memory
#> ℹ s3://cellxgene-data-public/cell-census/2024-07-01/h5ads/fe52003e-1460-4a65-a213-2bb1a508332f.h5ad already exists at /home/runner/.cache/lamindb/cellxgene-data-public/cell-census/2024-07-01/h5ads/fe52003e-1460-4a65-a213-2bb1a508332f.h5ad
#> AnnData object with n_obs × n_vars = 51552 × 36398
#> obs: 'donor_id', 'Predicted_labels_CellTypist', 'Majority_voting_CellTypist', 'Manually_curated_celltype', 'assay_ontology_term_id', 'cell_type_ontology_term_id', 'development_stage_ontology_term_id', 'disease_ontology_term_id', 'self_reported_ethnicity_ontology_term_id', 'is_primary_data', 'organism_ontology_term_id', 'sex_ontology_term_id', 'tissue_ontology_term_id', 'suspension_type', 'tissue_type', 'cell_type', 'assay', 'disease', 'organism', 'sex', 'tissue', 'self_reported_ethnicity', 'development_stage', 'observation_joinid'
#> var: 'gene_symbols', 'feature_is_filtered', 'feature_name', 'feature_reference', 'feature_biotype', 'feature_length'
#> uns: 'cell_type_ontology_term_id_colors', 'citation', 'default_embedding', 'schema_reference', 'schema_version', 'sex_ontology_term_id_colors', 'title'
#> obsm: 'X_umap'
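The message printed by `load()` shows where the cached copy lived on the machine that built this vignette: under `~/.cache/lamindb/`, followed by the bucket name and the key. Inferring from that single example (the location may differ on your system or in other versions), a hypothetical helper `cached_path()` can check whether a file is already present before downloading:

```r
# Hypothetical helper, not part of laminr.
# Assumption: cache root ~/.cache/lamindb, then bucket name, then key.
cached_path <- function(s3_uri, cache_root = file.path("~", ".cache", "lamindb")) {
  relative <- sub("^s3://", "", s3_uri)   # "bucket/key"
  file.path(path.expand(cache_root), relative)
}

uri  <- "s3://cellxgene-data-public/cell-census/2024-07-01/h5ads/fe52003e-1460-4a65-a213-2bb1a508332f.h5ad"
path <- cached_path(uri)
file.exists(path)  # TRUE if a file is present at the inferred location
```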
Currently, {laminr} primarily supports S3 storage. Support for other storage backends will be added in the future. For more information on planned features and the roadmap, refer to the Development vignette (vignette("development", package = "laminr")).