    # m2reprod
    
    
    ---
    
    ## Overview
    
    This project conducts a reproducible data analysis using R and containerized environments to ensure consistent package versions and dependencies. We download and process a dataset, manage dependencies with micromamba, and run analysis scripts organized via a Makefile workflow. This README will guide users through setting up, executing, and understanding each component of the analysis pipeline.
    
    ## Setup Instructions
    
    ### Step 1: Install Micromamba
    To install micromamba, please refer to the "01install.md" documentation, which provides detailed installation and setup instructions.
    
    ### Step 2: Data Download and Integrity Check
    The data is sourced from an external URL and checked for integrity using an MD5 checksum to ensure reproducibility. Follow the instructions in `src/download_data.R` to download and verify the data.
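
The integrity check described above can be sketched in shell. This is a minimal illustration, assuming GNU `md5sum` is available; `verify_md5` is a hypothetical helper and is not taken from `src/download_data.R`, which performs the actual check in R.

```shell
#!/usr/bin/env bash
# Sketch: compare a file's MD5 checksum with an expected value.
# verify_md5 is a hypothetical helper, not part of this repository.
verify_md5() {
    local file="$1" expected="$2" actual
    # md5sum prints "<hash>  <filename>"; keep only the hash.
    actual="$(md5sum "$file" | cut -d ' ' -f 1)"
    if [ "$actual" = "$expected" ]; then
        echo "OK"
    else
        echo "MISMATCH"
        return 1
    fi
}

# Demo on a throwaway file so the snippet runs without the real dataset.
demo_file="$(mktemp)"
printf 'hello\n' > "$demo_file"
demo_md5="$(md5sum "$demo_file" | cut -d ' ' -f 1)"
verify_md5 "$demo_file" "$demo_md5"
```

Returning a non-zero status on a mismatch lets a calling script stop before it extracts or analyses a corrupted download.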
    
    ### Step 3: Creating the Virtual Image
    Before running the workflow, follow the instructions in "01install.md" to build the Apptainer container image (the `.sif` file). The image encapsulates all dependencies, so the analysis runs with the same package versions on every machine.
    
    ### Step 4: Analysis Pipeline Setup
    The analysis is organized into four sequential scripts (`tp1.R` to `tp4.R`) and managed via a Makefile, ensuring that each step only executes if necessary. The Makefile enforces dependencies, avoiding redundant execution.
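
The "only executes if necessary" behaviour comes from the timestamp comparison that `make` performs for each rule: a target is rebuilt when it is missing or older than a prerequisite. The sketch below mimics that check in shell for a single target and prerequisite. `needs_rebuild` and the file names are hypothetical and do not come from `workflows/makefile`; `touch -d` and the `-nt` test assume GNU coreutils and bash.

```shell
#!/usr/bin/env bash
# Sketch of the up-to-date check that make applies to each rule.
# needs_rebuild is a hypothetical helper for illustration only.
needs_rebuild() {
    local target="$1" prereq="$2"
    # Rebuild when the target is missing or the prerequisite is newer.
    if [ ! -e "$target" ] || [ "$prereq" -nt "$target" ]; then
        echo "rebuild"
    else
        echo "up-to-date"
    fi
}

# Demo with throwaway files standing in for a script and its output.
work="$(mktemp -d)"
touch -d '2020-01-01 00:00:00' "$work/step.R"
needs_rebuild "$work/step.out" "$work/step.R"    # output missing: rebuild
touch -d '2021-01-01 00:00:00' "$work/step.out"
needs_rebuild "$work/step.out" "$work/step.R"    # output newer: up-to-date
touch -d '2022-01-01 00:00:00' "$work/step.R"
needs_rebuild "$work/step.out" "$work/step.R"    # script edited: rebuild
```

In a chain of sequential steps, editing an early script makes its output newer than everything downstream, so `make` reruns that step and every later one, while earlier steps are left alone.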
    
    ### Step 5: Execute the Workflow
    To run the workflow, please refer to the "02run" documentation, which provides detailed instructions on executing the workflow within the micromamba environment. The command below runs `make` inside the `.sif` container image:
    
    ```
    apptainer exec results/containers/m2bsgreprod3.sif make -f workflows/makefile
    ```

    ### Scripts
    
    1. **download_data.R**: Downloads the dataset, verifies the MD5 checksum, and extracts the files if valid.
       
    2. **tp1.R**:
       - Loads required libraries.
       - Reads and processes genotype data.
       - Selects a random subset of 250,000 SNPs for analysis.
       - Outputs processed data to `results/tp1`.
    
    3. **tp2.R**:
       - Loads data from the previous script.
       - Initializes additional libraries.
       - Saves the intermediate processed data to `results/tp2`.
    
    4. **tp3.R**:
       - Loads data from `tp2`.
       - Performs additional processing and saves to `results/tp3`.
    
    5. **tp4.R**:
       - Loads data from `tp3`.
       - Finalizes data processing and saves results to `results/tp4`.
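
Since each script writes to its own location under `results/`, a quick way to see how far a run got is to check which of those outputs exist. The sketch below assumes only the `results/tp1` to `results/tp4` paths listed above. `check_outputs` is a hypothetical helper, and whether each path is a file or a directory is not specified here, so it tests for existence only.

```shell
#!/usr/bin/env bash
# Sketch: report which stage outputs exist under a results directory.
# check_outputs is a hypothetical helper; only the tp1..tp4 names
# come from the script descriptions.
check_outputs() {
    local root="$1" missing=0 stage
    for stage in tp1 tp2 tp3 tp4; do
        if [ -e "$root/$stage" ]; then
            echo "$stage: present"
        else
            echo "$stage: missing"
            missing=1
        fi
    done
    return "$missing"
}

# Demo against a temporary directory in which only two stages have run.
demo_root="$(mktemp -d)"
mkdir "$demo_root/tp1" "$demo_root/tp2"
check_outputs "$demo_root" || echo "workflow incomplete"
```

Against the real project, the call would be `check_outputs results` from the repository root.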
    
    --- 
    
    This README covers setting up and executing the project so that each step of the analysis can be reproduced.