documentation update
sureshhewabi committed Feb 25, 2025
1 parent b873c7f commit 64f6eb9
Showing 6 changed files with 25 additions and 3 deletions.
Binary file added documentation/docs/assets/stat_analysis.png
File renamed without changes.
File renamed without changes.
File renamed without changes.
22 changes: 22 additions & 0 deletions documentation/docs/workflow/overview.md
@@ -1,3 +1,25 @@
## **Workflow**

### 1. Copy Data
If you run the pipeline on EBI infrastructure, it first copies the data from the original log file location to a path you specify.
The original log files are currently stored in a location that only `datamover` can read. As the first step, the pipeline therefore copies (`rsync`) the log files to the location you specified, which the `standard` queue can access.
Once this job completes, it automatically launches the next dependent job, which processes the log files and performs the statistical analysis.

!!! note "Running first time"

    It can take 2-3 hours to copy the log files the first time; subsequent runs take only a few minutes.
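
As a rough illustration of the copy step, the sketch below builds and runs an `rsync` command from Python. The function names and the `-av` flags are assumptions made for this example; the pipeline's actual flags and its job submission to `datamover` are not shown here.

```python
import subprocess
from pathlib import Path
from typing import List


def build_rsync_command(source_dir: str, target_dir: str) -> List[str]:
    """Build an rsync command that mirrors source_dir into target_dir.

    Hypothetical helper: the flags are an assumption, not the pipeline's own.
    """
    # A trailing slash on the source copies the directory's contents
    # rather than the directory itself.
    source = source_dir.rstrip("/") + "/"
    # -a preserves timestamps and permissions; -v lists transferred files.
    # By default rsync skips files whose size and modification time already
    # match, which is why repeat runs are much faster than the first one.
    return ["rsync", "-av", source, target_dir]


def copy_logs(source_dir: str, target_dir: str) -> None:
    """Create the target directory if needed and run the copy."""
    Path(target_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_rsync_command(source_dir, target_dir), check=True)
```

Calling `copy_logs("/path/to/original/logs", "/path/to/your/copy")` with placeholder paths would perform the copy; `check=True` raises an exception if `rsync` exits with an error, so a dependent job is not started on a failed copy.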

### 2. Process Log Files

This step collects the names of the log files, processes them in parallel, and applies a series of filters to exclude unwanted data.
The processed log files are stored in Parquet, a columnar storage format optimized for reading and writing large datasets.

![log_file_parser.png](../assets/log_file_parser.png)
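
The collect, parse, and filter flow described in this step can be sketched with the standard library alone. The log layout (three tab-separated fields), the `*.tsv` file pattern, and the "keep only status 200" filter are all assumptions made for illustration; the pipeline's real format and filters may differ.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import List, Optional


def parse_line(line: str) -> Optional[dict]:
    """Parse one log line; return None for lines that should be filtered out."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 3:
        return None  # malformed line (assumed layout: date, path, status)
    date, path, status = fields
    if status != "200":
        return None  # hypothetical filter: keep successful requests only
    return {"date": date, "path": path, "status": int(status)}


def process_file(log_path: Path) -> List[dict]:
    """Parse a single log file and keep only the records that pass the filters."""
    with log_path.open(encoding="utf-8") as handle:
        return [record for record in map(parse_line, handle) if record is not None]


def process_all(log_dir: Path, workers: int = 4) -> List[dict]:
    """Collect the log file names, then process the files in parallel."""
    log_files = sorted(log_dir.glob("*.tsv"))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_file = list(pool.map(process_file, log_files))
    # Flatten the per-file lists into one list of records.
    return [record for records in per_file for record in records]
```

The resulting records could then be written out in Parquet, for example with pandas `DataFrame.to_parquet`. For CPU-bound parsing, `ProcessPoolExecutor` offers the same `map` interface and may scale better than threads.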

### 3. Produce Statistics Report
The Parquet files are queried with the Dask framework to generate the statistics.
This step produces the statistics report in HTML format and stores it in the location you specified.

![stat_analysis.png](../assets/stat_analysis.png)
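
To show the shape of this step without extra dependencies, the sketch below swaps Dask and Parquet for plain Python collections: it counts records per value of one field and renders the counts as an HTML table. The record layout and both function names are hypothetical; the pipeline itself performs the aggregation with Dask.

```python
from collections import Counter
from html import escape
from typing import Iterable, List, Tuple


def count_by(records: Iterable[dict], key: str) -> List[Tuple[str, int]]:
    """Count records per value of `key`, most frequent first."""
    counts = Counter(record[key] for record in records)
    return counts.most_common()


def render_html(title: str, rows: List[Tuple[str, int]]) -> str:
    """Render (name, count) rows as a minimal HTML report."""
    body = "\n".join(
        f"<tr><td>{escape(str(name))}</td><td>{count}</td></tr>"
        for name, count in rows
    )
    return (
        f"<html><body><h1>{escape(title)}</h1>\n"
        f"<table>\n{body}\n</table></body></html>"
    )
```

Writing the returned string to a file in the chosen output location would complete the step. With Dask, the counting would typically be a group-by over a DataFrame loaded from the Parquet files rather than a `Counter`.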

Detailed workflow steps can be found in the [workflow documentation](../../misc/workflow).
6 changes: 3 additions & 3 deletions documentation/mkdocs.yml
@@ -13,9 +13,9 @@ nav:
- report/report-interpretation.md
- report/report-customization.md
- Miscellaneous:
- - mics/log-files.md
- - mics/workflow.md
- - mics/limitations.md
+ - misc/log-files.md
+ - misc/workflow.md
+ - misc/limitations.md
theme:
name: material
features: