Nextflow: A Basic Introduction

Published

2026-03-30

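1 Introduction

Nextflow is a workflow management tool for building reproducible and scalable data analysis pipelines. It is widely used in bioinformatics and data science.

Its core design is dataflow-driven execution: channels carry data between steps, processes do the actual data processing, and a workflow orchestrates the whole pipeline.

2 Basic components

Below is a simple Nextflow pipeline: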

#!/usr/bin/env nextflow

process sayHello {

    input:
    val greeting

    output:
    path 'output.txt'

    script:
    """
    echo '${greeting}' > output.txt
    """
}

params {
    input: String
}

workflow {

    main:
    sayHello(params.input)

    publish:
    first_output = sayHello.out
}

output {
    first_output {
        path '1-hello'
        mode 'copy'
    }
}

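Let's walk through each part of this pipeline in turn.

2.1 process

In Nextflow, a process is the basic unit of execution in a workflow. Following the dataflow-driven design, a process definition focuses on three things: what data it receives (input), what results it produces (output), and how it transforms the data (script).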

process sayHello {

    input:
    val greeting

    output:
    path 'output.txt'

    script:
    """
    echo '${greeting}' > output.txt
    """
}

In practice, processes are usually split out into reusable modules.

First, define the process in the module directory, in modules/fastp/main.nf:

process FASTP {

    input:
    tuple val(sample_id), path(reads)

    output:
    tuple val(sample_id), path("*.fastq.gz")

    script:
    """
    fastp -i ${reads[0]} -o ${sample_id}_R1.fastq.gz
    """
}

Then include it in the workflow in the main script, main.nf:

include { FASTP } from './modules/fastp/main'

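workflow {

    main:
    // An entry workflow cannot declare take:, so the input channel is built here.
    // params.reads is a hypothetical glob, e.g. 'data/*_R{1,2}.fastq.gz'
    ch_reads = channel.fromFilePairs(params.reads)
    FASTP(ch_reads)

}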

2.2 params

params defines the input parameters of a workflow (the pipeline configuration). In newer Nextflow syntax, parameters can be declared with types:

params {
    input: String
}

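In real pipeline development, parameter values are rarely hard-coded in the script. They are usually supplied from outside, either on the command line or in nextflow.config, which keeps the pipeline flexible and easier to maintain.

Passing a value on the command line: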

nextflow run main.nf --input "Hello World"

Setting a value in nextflow.config:

params {
    input = "Hello World"
}

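👉 A common pattern is to set default values in the config and override them on the command line, which gives both stability and flexibility.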

2.3 workflow

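workflow defines the orchestration of the whole pipeline: it connects the individual processes and manages the dataflow between them.

main: defines the main execution logic of the pipeline
publish: declares the data to be exported from the workflow (the export layer)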

workflow {

    main:
    sayHello(params.input)

    publish:
    first_output = sayHello.out
}

A more complex example:

workflow {

    main:
    // create a channel for inputs from a CSV file
    greeting_ch = channel.fromPath(params.input)
                        .splitCsv()
                        .map { line -> line[0] }
    // emit a greeting
    sayHello(greeting_ch)
    // convert the greeting to uppercase
    convertToUpper(sayHello.out)
    // collect all the greetings into one file
    collectGreetings(convertToUpper.out.collect(), params.batch)
    // generate ASCII art of the greetings with cowpy
    cowpy(collectGreetings.out.outfile, params.character)

    publish:
    first_output = sayHello.out
    uppercased = convertToUpper.out
    collected = collectGreetings.out.outfile
    batch_report = collectGreetings.out.report
    cowpy_art = cowpy.out
}
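
The more complex workflow refers to named outputs such as collectGreetings.out.outfile and collectGreetings.out.report. Names like these come from emit: labels on a process's outputs. The definition of collectGreetings is not shown in this post, so the following is only a minimal sketch of what such a process might look like; the file names and shell commands are assumptions made for illustration.

```nextflow
// Hypothetical sketch: the real collectGreetings process is not shown in this post.
process collectGreetings {

    input:
    path input_files   // all greeting files, gathered with .collect()
    val batch_name     // e.g. params.batch

    output:
    // emit: gives each output a name, used as collectGreetings.out.<name>
    path "COLLECTED-${batch_name}-output.txt", emit: outfile
    path "${batch_name}-report.txt", emit: report

    script:
    """
    cat ${input_files} > 'COLLECTED-${batch_name}-output.txt'
    cat ${input_files} | wc -l > '${batch_name}-report.txt'
    """
}
```

A process with a single unnamed output, such as sayHello, is read with plain sayHello.out; emit: names become useful once a process produces more than one output.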

2.4 output

output defines how the pipeline's final results are exported. It takes the data named in the workflow's publish: section and writes it to the specified directories.

output {
    first_output {
        path '1-hello'
        mode 'copy'
    }
}
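
Each name assigned in a publish: section is configured by an entry of the same name in the output block. As a sketch, an output block for the more complex workflow in section 2.3 might look like the following; the directory names are assumptions chosen for illustration, and each path is taken relative to the pipeline's output directory.

```nextflow
// Sketch: one entry per name assigned in the publish: section.
// Directory names below are hypothetical.
output {
    first_output {
        path 'intermediates'
        mode 'copy'
    }
    uppercased {
        path 'intermediates'
        mode 'copy'
    }
    collected {
        path 'collected'
        mode 'copy'
    }
    batch_report {
        path 'collected'
        mode 'copy'
    }
    cowpy_art {
        path 'art'
        mode 'copy'
    }
}
```

Several outputs can share a directory, as first_output and uppercased do here, which is one way to keep intermediate files apart from final results.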

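3 Engineering practices (advanced)

3.1 profile (environment management)

A profile defines the settings for a particular execution environment (for example local, Docker, or HPC), so that the same pipeline can run in different environments.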

profiles {

    standard {
        process.executor = 'local'
    }

    docker {
        process.container = 'ubuntu:22.04'
        docker.enabled = true
    }

    slurm {
        process.executor = 'slurm'
        process.queue = 'normal'
    }
}

Select a profile with the following command:

nextflow run main.nf -profile docker

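3.2 config (layered configuration)

In practice, Nextflow settings are usually managed in separate files rather than in a single file.

A common layout looks like this: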

nextflow.config
conf/
├── base.config
├── docker.config
└── slurm.config

nextflow.config: the main configuration file
conf/: settings split by environment (Docker, HPC, and so on)

Then load the other config files from nextflow.config:

includeConfig 'conf/base.config'
includeConfig 'conf/docker.config'

👉 This is usually combined with profiles to switch between environment-specific settings.
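
One way to combine the two is to place each includeConfig inside the profile it belongs to, so that an environment's file is loaded only when that profile is selected. The sketch below assumes the conf/ layout shown earlier in this section; the contents of those files are not specified here.

```nextflow
// nextflow.config (sketch)

// Settings shared by every environment
includeConfig 'conf/base.config'

profiles {
    // Loaded only with: nextflow run main.nf -profile docker
    docker {
        includeConfig 'conf/docker.config'
    }

    // Loaded only with: nextflow run main.nf -profile slurm
    slurm {
        includeConfig 'conf/slurm.config'
    }
}
```

With this arrangement the main file stays short, and adding a new environment means adding one file under conf/ plus one profile entry.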

3.3 Resource management

Nextflow lets you set CPUs, memory, and run time for each process, which helps the pipeline run reliably.

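process {
    // Default resources (applied to every process)
    memory = 1.GB
    cpus = 1

    // Override for one specific process (here, cowpy)
    withName: 'cowpy' {
        memory = 2.GB
        cpus = 2
    }

    // Upper bounds on what any single task may request
    // (keeps requests within what the system can provide)
    resourceLimits = [
        memory: 750.GB,
        cpus: 200,
        time: 30.d
    ]
}

process.memory / process.cpus: set the default resources
withName: overrides the settings for a specific process
resourceLimits: caps the resources a single task can request; a request above the limit is reduced to the limit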
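
To make the interaction between withName and resourceLimits concrete, here is a small sketch with deliberately low, illustrative limits. The numbers are assumptions chosen so that the process request exceeds the limit.

```nextflow
// nextflow.config (sketch with illustrative values)
process {
    // No single task may request more than this
    resourceLimits = [
        memory: 8.GB,
        cpus: 4
    ]

    withName: 'cowpy' {
        // These requests exceed the limits above, so the task is
        // expected to be submitted with 8.GB and 4 cpus instead
        memory = 16.GB
        cpus = 8
    }
}
```

This is why resourceLimits is usually set once per environment, to match the largest node or queue available, while individual processes keep requesting what they ideally need.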

4 Common commands

4.1 Changing the log file name and path

nextflow -log logs/.nextflow.$(date +%Y%m%d_%H%M%S).log \
run main.nf

4.2 Generating an execution report

nextflow run main.nf -with-report report.html

4.3 Skipping steps that have already completed

nextflow run main.nf -resume