Function parameter description #2

HarryZhang1224 · 2025-02-21T19:44:03Z

Thanks for the amazing tool! It would be great if you could update the function files with a description of each parameter. For example, it is not clear what the parameter mik_graph in sn.pp.prepare_data_batch means or how users should choose a value.

ForwardYang98 · 2025-02-22T05:50:21Z

Thanks for your interest in our work.
Unfortunately, I have been busy with my Ph.D. thesis lately, so it will be a while before I can update the function files.
For your question:
The parameter 'mik_graph' determines the number of nearest neighbors identified for each sample in the multi-view mutual information maximization (MMIM) module. Subsequently, the MMIM module will boost the similarity of the multi-view joint representations of each sample and its nearest neighbors to guide the model to ultimately generate more useful and discriminative joint representations. A detailed description of this can be found in the “Methods” section of our article. In general, we don't need to change the default value of the parameter 'mik_graph'.

HarryZhang1224 · 2025-02-22T14:47:01Z

Thank you! Are there any recommendations for setting the parameters for Xenium data, or for Xenium data across multiple samples (a much higher cell number than the examples given in the paper, gene panel of ~500)?

ForwardYang98 · 2025-02-23T05:28:04Z

Hi Zhang,
While we have not tested scNiche on Xenium data, our scalability analysis on the mouse whole brain MERFISH dataset (129 slices, about 3.7 million cells) shows that scNiche can effectively scale to large datasets containing multiple samples (see the "Scalability analysis of scNiche to large datasets" section of our article). The following are the parameter settings we used on this dataset for your reference:
k_cutoff = 30, epochs = 25, lr = 0.01, batch_num = 500.
The running time of scNiche is about 3h.
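
For a sense of scale, the settings above can be written down and the implied batch size worked out. This is only a sketch: the values are held in a plain dict, the variable names are illustrative, the cell count is the approximate figure quoted above, and how each value is passed to scNiche is not shown here.

```python
# Settings quoted above for the mouse whole-brain MERFISH run.
# Stored in a plain dict for reference only; this does not call scNiche.
merfish_settings = {
    "k_cutoff": 30,
    "epochs": 25,
    "lr": 0.01,
    "batch_num": 500,
}

# "About 3.7 million cells" in the text, so this number is approximate.
n_cells = 3_700_000

# Average number of cells that end up in each batch with these settings.
cells_per_batch = n_cells / merfish_settings["batch_num"]
print(f"~{cells_per_batch:,.0f} cells per batch")  # ~7,400 cells per batch
```

So this run used batches of roughly 7.4k cells each.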

Additionally, based on my personal experience, the following three points may be worth noting in practice:

1. Batch number setting. Usually ~5k cells per batch balances accuracy and computational efficiency.
2. Epoch number setting. For large datasets (e.g., over 1 million cells), starting with a smaller number of epochs is recommended. You can also evaluate the convergence of the model by visualizing the training information stored in adata.uns['loss'].
3. Dimensionality reduction and batch effect removal. Considering that the number of genes measured in Xenium data usually far exceeds the number of cell types that exist, dimensionality reduction (scVI, scArches, or PCA) can help balance the dimensionality of features across different views, allowing for more accurate niche identification.
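
The first two points can be turned into small helpers. This is a rough sketch, not part of scNiche: `suggest_batch_num` and `has_plateaued` are hypothetical names, the 5k default comes from the rule of thumb above, and the plateau check assumes that adata.uns['loss'] holds one loss value per epoch, which may not match the actual stored format. It complements plotting the loss curve rather than replacing it.

```python
import math


def suggest_batch_num(n_cells, cells_per_batch=5000):
    """Hypothetical helper: number of batches giving about `cells_per_batch` cells each."""
    if n_cells <= 0 or cells_per_batch <= 0:
        raise ValueError("n_cells and cells_per_batch must be positive")
    return max(1, math.ceil(n_cells / cells_per_batch))


def has_plateaued(losses, window=5, rel_tol=0.01):
    """Hypothetical helper: crude convergence check on a per-epoch loss sequence.

    Compares the mean loss over the last `window` epochs with the mean over
    the `window` epochs before that. Returns True when the relative
    improvement is below `rel_tol` (this includes the loss getting worse),
    and False when there are fewer than 2 * window values to compare.
    """
    values = [float(x) for x in losses]
    if len(values) < 2 * window:
        return False
    previous = sum(values[-2 * window:-window]) / window
    latest = sum(values[-window:]) / window
    if previous == 0:
        return True
    return (previous - latest) / abs(previous) < rel_tol


# Example: a dataset of one million cells at ~5k cells per batch.
print(suggest_batch_num(1_000_000))  # 200
```

If the stored loss really is a per-epoch sequence, `has_plateaued(adata.uns['loss'])` after a short initial run gives a quick hint about whether more epochs are likely to help. The window and tolerance are arbitrary starting values.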

Overall, it may take some time to find the optimal parameter configuration. If you have any results to share, I would be most interested in seeing them!
