## **Workflow**
### 1. Copy Data

If you are running the pipeline on EBI infrastructure, it will copy the data from the original log file location to the path you specify.
Currently, the original log files are stored in a location that only the `datamover` queue can read. So, as the first step, the pipeline copies (`rsync`) the log files to the location you specified, which can be accessed by the `standard` queue.
Once this job completes, it automatically launches the next, dependent job, which processes the log files and performs the statistical analysis.

!!! note "Running for the first time"

    It can take 2-3 hours to copy the log files on the first run; subsequent runs take only a few minutes.
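The incremental copy step can be sketched in Python. This is only an illustration: the real pipeline shells out to `rsync` from the `datamover` queue, while here `shutil` stands in for it, and all paths and file names are invented for the example.

```python
# Sketch of the copy step: mirror log files into a staging area the
# `standard` queue can read. The real pipeline uses rsync; shutil is
# used here purely for illustration. Paths are hypothetical.
import shutil
from pathlib import Path

src = Path("source_logs")   # hypothetical datamover-only location
dst = Path("staged_logs")   # hypothetical user-specified location

# Set up a fake source log so the sketch is self-contained.
src.mkdir(exist_ok=True)
(src / "access.log").write_text("example log line\n")

dst.mkdir(exist_ok=True)
for log in src.glob("*.log"):
    target = dst / log.name
    # Skip files already staged with the same size (rsync-like behaviour).
    if not target.exists() or target.stat().st_size != log.stat().st_size:
        shutil.copy2(log, target)
```

On a second run, unchanged files are skipped, which is why subsequent runs finish much faster than the first.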
### 2. Process Log Files

This step collects the names of the log files, processes them in parallel, and applies a number of filters to exclude unwanted data.
The processed log files are stored in Parquet, a columnar storage format optimized for reading and writing large datasets.

 
### 3. Produce Statistics Report

Using the Dask framework, the Parquet files are queried and the statistics are generated.
This step produces the statistics report in HTML format and stores it in the location you specified.

 
Detailed workflow steps can be found in the [workflow documentation](../../misc/workflow).