ケィオスの時系列解析メモランダム

時系列解析、生体情報解析などをやわらかく語ります

【Cで高速DMA】長いデータでは畳み込みDMAアルゴリズムは遅すぎる:高速アルゴリズムで解決

今回はC言語プログラムを使って、DMA(Detrending Moving Average Analysis)の高速アルゴリズムの処理速度が如何に速いかを見てみます。素朴に Savitzky-Golay フィルタの畳み込みを計算をする DMA アルゴリズムと、我々のグループが提案している高速DMAアルゴリズムでは、圧倒的に高速アルゴリズムが速いのです。畳み込み版と高速版の2つのアルゴリズムを使ったDMAのCプログラムを、最後に掲載しておきますので、みなさんも自分で試してみてください。自分で実行したい人は最初に「Cプログラム」の節を参照してください。
 DFA(detrended fluctuation analysis)では、高次DFAが手軽に実行できるのに、DMAでは無理とあきらめる人が多いと思います。それは私の責任です。かなり前に高次DMAの高速アルゴリズムの論文を出したのですが、そのプログラムの一般公開がうまくできていませんでした。そのことを反省し、このブログで、最新のプログラムを公開します。

サンプル時系列の生成

 数値実験用に 1/f^ \beta 型のパワースペクトルを示すサンプル時系列を生成するRスクリプトを示しておきます。

#########################################################
# 1/f^beta sample path (f=0 set to 0) and save one CSV
#########################################################
DIR   <- r"(出力フォルダを指定)"
N_raw <- 100001
beta  <- 0.8

if (!dir.exists(DIR)) dir.create(DIR, recursive = TRUE)

N <- round(N_raw / 2) * 2 + 1
M <- (N - 1) / 2

# Frequency vector for R's FFT ordering
f <- c((0:M) / N, -(M:1) / N)

# PSD model: |f|^{-beta}, with f=0 forced to 0
PSD <- numeric(N)
idx <- (f != 0)
PSD[idx] <- abs(f[idx])^(-beta)

# Generate one path
wn <- rnorm(N)
x  <- Re(fft(sqrt(PSD) * fft(wn), inverse = TRUE)) / N

# Save
prefix <- sprintf("tms_fbeta_%03d_N%d", round(beta * 100), N)
fn <- file.path(DIR, sprintf("%s.csv", prefix))
write.csv(data.frame(t = seq_len(N), x = x), fn, row.names = FALSE)

cat("Saved:", normalizePath(fn, winslash = "/"), "\n")

このRスクリプトを使う際は、

DIR   <- r"(出力フォルダを指定)"

の部分にデータを出力するフォルダを指定して下さい。必ず、Cプログラムをコンパイルして出力される実行ファイルも同じフォルダに入れてください。その他のパラメタについては、N_raw が時系列の長さ、beta が、スケーリング指数  \beta の値です。Rスクリプトでは、

N_raw <- 100001
beta  <- 0.8

の部分で指定しています。

Windowsのコマンドプロンプトで実行

 Windowsのコマンドプロンプトで実行する場合は、時系列データと実行ファイル sgconvDMA.exesgfastDMA.exe が含まれるフォルダを、キーボードの[shift] を押したまま、マウスの右ボタンをクリックし、「powershellウィンドウをここで開く」を選択してください(下図)。

フォルダ上でシフトを押しながらマウスの右ボタンをクリックすると「powershellウィンドウをここで開く」が選べるようになる。
 その後、下図のように起動したPowerShellのコマンドプロンプトで cmd と入力して Enter を押してください。そうすれば、昔ながらのコマンドプロンプトの状態になります。

PowerShell で cmd を実行すれば、コマンドプロントになる。

■ DMAの実行

 解析したい時系列のファイル名を tms_fbeta_080_N10001.csv として、その中身は、

t, x
1, 2.2428062
2, -1.7382714
3, 1.8949382
4, 1.2974768
..., ...

のようになっているとします。このデータで解析したいのは2列目の x です。
 たとえば、2次のDMAプログラム sgconvDMA.exe でこの時系列を解析する場合は、コマンドプロンプトで

sgconvDMA.exe  -s 1 -c 2 -d 2 tms_fbeta_080_N10001.csv > convDMA_output.csv

とします。このコマンドの -s 1 は1行目を飛ばす指示、-c 2 は2列目を解析する指示、-d 2 は2次のDMAを使う指示です。これらの指示の後に、入力するファイル名(ここでは、 tms_fbeta_080_N10001.csv)を書いてください。さらに、> convDMA_output.csv を付け加えることで、解析結果が convDMA_output.csv に出力されます。> はリダイレクトの記号で、出力先を指定するために使います。
 解析結果は convDMA_output.csv という名前のファイルに収められます。
 また、sgfastDMA.exe で解析する場合は、

sgfastDMA.exe  -s 1 -c 2 -d 2 tms_fbeta_080_N10001.csv > fastDMA_output.csv

とします。命令のオプション指定などは sgconvDMA.exe と全く同じです。解析結果は fastDMA_output.csv という名前のファイルに収められます。

■ 結果のプロット

 解析結果の確認、プロット、スケーリング指数推定については、以下のRスクリプトを実行してみてください。実行すれば下図が出力されます。fastDMA と convDMA の結果を比較すると、長いスケールでズレがありますので、数値誤差の影響がみられます。これは、必ずしも、fastDMA の数値誤差の蓄積が大きいとは言えませんので注意してください。畳み込み演算も長いスケールでは、数値誤差の蓄積が生じます。

Rスクリプトの実行例

# ------------------------------------------------------------
# Compare convDMA vs fastDMA (3 panels)
#  1) overlay
#  2) convDMA + linear fit (middle 80%)
#  3) fastDMA + linear fit (middle 80%)
# Input files: 2 columns (log10(s_eff), log10(F))
# ------------------------------------------------------------

DIR <- r"(D:\Document\研究\DMAプログラム0926\C_rev_2026)"
FN.conv <- "convDMA_output.csv"
FN.fast <- "fastDMA_output.csv"

trim <- 0.10  # fraction trimmed at both ends (fit uses middle 1-2*trim)

# ---- I/O helpers ----
read_scaling <- function(path) {
  dat <- read.csv(path, header = FALSE)
  colnames(dat) <- c("log10_s", "log10_F")
  dat <- dat[is.finite(dat$log10_s) & is.finite(dat$log10_F), ]
  dat[order(dat$log10_s), , drop = FALSE]
}

fit_scaling <- function(dat, trim = 0.10) {
  n <- nrow(dat)
  if (n < 5) stop("Too few rows to fit: ", n)

  i1 <- max(1L, floor(trim * n))
  i2 <- min(n,  ceiling((1 - trim) * n))
  fit_dat <- dat[i1:i2, , drop = FALSE]

  fit <- lm(log10_F ~ log10_s, data = fit_dat)

  list(
    fit = fit,
    fit_dat = fit_dat,
    i1 = i1, i2 = i2,
    slope = unname(coef(fit)[["log10_s"]]),
    r2 = summary(fit)$r.squared
  )
}

plot_with_fit <- function(dat, res, main, col_fit, pch_all = 16, pch_fit = 16) {
  plot(dat$log10_s, dat$log10_F,
       pch = pch_all, cex = 0.9, col = gray(0.7),
       xlab = expression(log[10](s[eff])),
       ylab = expression(log[10](F(s))),
       main = main)

  points(res$fit_dat$log10_s, res$fit_dat$log10_F, pch = pch_fit, cex = 1.0, col = col_fit)

  xr <- range(res$fit_dat$log10_s)
  pred <- predict(res$fit, newdata = data.frame(log10_s = xr))
  lines(xr, pred, lty = 2, lwd = 2)

  legend("topleft",
         legend = c(sprintf("slope = %.4f", res$slope),
                    sprintf("R^2 = %.4f", res$r2),
                    sprintf("rows %d-%d", res$i1, res$i2)),
         bty = "n")
}

# ---- Load ----
path.conv <- file.path(DIR, FN.conv)
path.fast <- file.path(DIR, FN.fast)
stopifnot(dir.exists(DIR), file.exists(path.conv), file.exists(path.fast))

dat.conv <- read_scaling(path.conv)
dat.fast <- read_scaling(path.fast)

res.conv <- fit_scaling(dat.conv, trim = trim)
res.fast <- fit_scaling(dat.fast, trim = trim)

# ---- Plot (3 panels) ----
op <- par(no.readonly = TRUE)
on.exit(par(op), add = TRUE)
par(mfrow = c(1, 3), mar = c(4.5, 4.5, 2.2, 1), cex = 0.85)

# 1) overlay
plot(dat.conv$log10_s, dat.conv$log10_F,
     pch = 16, cex = 0.9, col = 4,
     xlab = expression(log[10](s[eff])),
     ylab = expression(log[10](F(s))),
     main = "Overlay: convDMA vs fastDMA")
points(dat.fast$log10_s, dat.fast$log10_F, pch = 1, cex = 1.1, col = 2)
legend("topleft",
       legend = c(sprintf("convDMA (slope=%.3f)", res.conv$slope),
                  sprintf("fastDMA (slope=%.3f)", res.fast$slope)),
       pch = c(16, 1), col = c(4, 2), bty = "n")

# 2) convDMA + fit
plot_with_fit(dat.conv, res.conv, main = "convDMA: fit (middle 80%)", col_fit = 4)

# 3) fastDMA + fit
plot_with_fit(dat.fast, res.fast, main = "fastDMA: fit (middle 80%)", col_fit = 2)

# ---- Console summary ----
cat("convDMA : slope =", res.conv$slope, " R^2 =", res.conv$r2, "\n")
cat("fastDMA : slope =", res.fast$slope, " R^2 =", res.fast$r2, "\n")
cat("delta(slope) =", res.fast$slope - res.conv$slope, "\n")

実行時間の比較:高速アルゴリズムの威力

 今回示したプログラムでは、計算時間も出力されます。その出力結果を使って convDMAfastDMA の計算時間を比較してみます。以下で解析に使ったPCは、Intel(R) Core(TM) i9-10900 CPUのマシンで、64GBのメモリを積んでいます。
 N_raw <- 100001\approx 10^ 5)の長さの時系列の解析結果は、convDMA では、

Total data length = 100001
***** centered SG-DMA (even poly order d=2; demean+integrate (profile)) *****
Processing time: 22131.000 [ms]

となり、処理に 22131ミリ秒 = 約22秒かかっています。これに対し、fastDMA では、

Total data length = 100001
***** centered DMA (d=2) *****
Processing time: 145.000 [ms]

となり、処理に 145ミリ秒 = 0.145秒かかってます。これは、1秒以下ですので一瞬で処理が終わっています。convDMA と比べれば、150倍程度早くなっています。
 これで、高速アルゴリズムの速さをわかってもらえると思いますが、さらに、N_raw <- 1000001\approx 10^ 6)の長さの時系列を解析してみると、convDMA では、

Total data length = 1000001
***** centered SG-DMA (even poly order d=2; demean+integrate (profile)) *****
Processing time: 2092115.000 [ms]

となり、処理に 2092115ミリ秒 = 約35分かかりました。これに対し、fastDMA では、

Total data length = 1000001
***** centered DMA (d=2) *****
Processing time: 11174.000 [ms]

となり、処理に 11174ミリ秒 = 約11秒かかりました。つまり、convDMA と比べて fastDMA は、187倍速かったということで。
 時系列が長くなるほど、convDMAfastDMA の処理時間の差はどんどん開いていきます。なぜなら、時系列の長さ  N に対して、convDMA の処理時間が  N^ 2 に比例するのに対し、fastDMA の処理時間は  N に比例するからです。一般に、アルゴリズムの高速化とは、計算複雑性、すなわち処理に必要な計算ステップ数を削減することに本質があります。我々が開発した高速アルゴリズムは、DMA の計算複雑性を劇的に低減しており、その結果として処理時間が大幅に短縮されています。

Cプログラム

 以下のコードをコピーしてファイルに保存してください。ファイル名は sgconvDMA.csgfastDMA.c とします。保存したファイルのコンパイルに gcc を使う場合は、コマンドラインで、

gcc sgconvDMA.c -lm -o sgconvDMA
gcc sgfastDMA.c -lm -o sgfastDMA

を実行してください。これだけでコンパイルできます。より高速で、安全にコンパイルするための推奨コマンドは、

gcc -O3 -std=c99 -Wall -Wextra -pedantic sgconvDMA.c -lm -o sgconvDMA
gcc -O3 -std=c99 -Wall -Wextra -pedantic sgfastDMA.c -lm -o sgfastDMA

です。

 Windows 環境では、Visual Studio に付属する Visual C++ コンパイラ(cl.exe) を使用できます。Developer Command Prompt for Visual Studio(または Developer PowerShell)を起動して実行してください。最小コマンドで、とにかく動かしたい場合は、コマンドラインで、

cl sgconvDMA.c
cl sgfastDMA.c

を実行してください。これだけで、コンパイルできて、

sgconvDMA.exe
sgfastDMA.exe

が生成されます。より高速で、安全にコンパイルするための推奨コマンドは、

cl /O2 /std:c11 /W4 sgconvDMA.c /Fe:sgconvDMA.exe
cl /O2 /std:c11 /W4 sgfastDMA.c /Fe:sgfastDMA.exe

です。

■ 素朴なDMAアルゴリズム sgconvDMA.c

 畳み込みを使うDMAアルゴルズムのCプログラムは以下です。

/*
   SG-DMA (Savitzky–Golay style detrended moving-average scaling analysis)

   This program implements centered SG-DMA with selectable even-order
   polynomial detrending (d = 0, 2, 4, ...), and outputs the scaling
   relationship between scale s and fluctuation F(s).

   Features:
   - Same input/output format and command-line interface as the reference implementation
   - Optional de-meaning and integration to form the profile series
   - Centered-window detrending with polynomial least-squares projection
   - R-like edge handling:
       * edges are skipped for long series (N ≥ 1024)
       * edges are explicitly evaluated for shorter series
   - MSVC-compatible implementation (no variable-length arrays)

   Numerical behavior of the core convolution (residual computation)
   is identical to the reference algorithm.

   Output:
     CSV with two columns:
       log10(s_eff), log10(F(s))
*/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include <time.h>
#include <errno.h>

/* =========================
   Global state
   ========================= */

static const char *g_progname = "sgdma";

/* raw input and working series */
static double *g_x = NULL; /* raw data (selected column) */
static double *g_y = NULL; /* integrated profile (default) or raw (if -i) */

/* scales (odd) */
static long *g_scales = NULL;
static long  g_scales_len = 0;

/* options */
static int  g_without_integration = 0;     /* -i */
static int  g_asymmetric_ends = 0;         /* -a (compat) */
static long g_refresh_interval = 0;        /* -r (compat) */

static int  g_order = 0;                   /* -d: any even >= 0 */
static int  g_use_scale_correction = 1;    /* -S */
static double g_scale_correction_value = 1.0;

/* numerics */
static const double g_pivot_tol = 1e-12;
static const double g_lambda0 = 1e-12;
static const double g_lambda_mult = 10.0;
static const int    g_lambda_tries = 6;
static const long   g_skip_edges_if_N_ge = 1024;

/* =========================
   Error / allocation helpers
   ========================= */

static void die(const char *msg)
{
    fprintf(stderr, "%s: %s\n", g_progname, msg);
    exit(EXIT_FAILURE);
}

static void *xmalloc(size_t nbytes)
{
    void *p = malloc(nbytes);
    if (!p) die("Out of memory (malloc).");
    return p;
}

static void *xcalloc(size_t n, size_t sz)
{
    void *p = calloc(n, sz);
    if (!p) die("Out of memory (calloc).");
    return p;
}

static void *xrealloc(void *p, size_t nbytes)
{
    void *q = realloc(p, nbytes);
    if (!q) die("Out of memory (realloc).");
    return q;
}

/* =========================
   MSVC-safe fopen wrapper
   ========================= */
static FILE *xfopen_read(const char *filename)
{
#ifdef _MSC_VER
    FILE *fp = NULL;
    if (fopen_s(&fp, filename, "r") != 0) return NULL;
    return fp;
#else
    return fopen(filename, "r");
#endif
}

/* =========================
   Usage
   ========================= */

static void print_usage(void)
{
    fprintf(stderr,
        "usage: %s [OPTIONS ...] inputfile\n"
        "OPTIONS:\n"
        "  -d ORDER     detrending polynomial order: any even >= 0 (default: 0)\n"
        "  -S FLAG      scale correction: 1=use s/C(d) (default), 0=use s\n"
        "  -s SKIP      skip first SKIP lines (default: 0)\n"
        "  -c COLUMN    select column index (1-based) (default: 1)\n"
        "  -l MINBOX    minimum box width (odd enforced; default: d+5)\n"
        "  -u MAXBOX    maximum box width (default: N/4)\n"
        "  -a           kept for CLI compatibility (SG-DMA core uses its own edge policy)\n"
        "  -i           without integration (default: with integration)\n"
        "  -r REFRESH   kept for CLI compatibility (not used in SG-DMA core)\n"
        "  -h           print this help\n"
        "\n"
        "Scale correction constants implemented: C(0)=1.00, C(2)=1.93, C(4)=2.74.\n"
        "For other even d, C(d) defaults to 1.00.\n"
        "Output (CSV): log10(s_eff), log10(F(s))\n",
        g_progname
    );
}

/* =========================
   Exact reproduction of original rscale()
   ========================= */

static long generate_scales_exact(long minbox, long maxbox, double boxratio, long **out_scales)
{
    if ((minbox % 2) == 0) minbox += 1;
    if (maxbox < minbox) maxbox = minbox;

    double ratio = (double)maxbox / (double)minbox;
    if (ratio <= 0.0) die("Invalid scale bounds.");

    long rslen = (long)(log10(ratio) / log10(boxratio) + 1.0);
    if (rslen < 1) rslen = 1;

    long *rs = (long *)xmalloc((size_t)rslen * sizeof(long));
    long n = 1;

    rs[0] = minbox;

    for (long ir = 1; (n <= rslen) && (rs[n - 1] < maxbox); ir++) {
        long rw = (long)(minbox * pow(boxratio, (double)ir) + 0.5);
        if (rw > rs[n - 1]) {
            long odd = (rw / 2) * 2 + 1;
            if (n < rslen) rs[n++] = odd;
            else break;
        }
    }

    if (n >= 1 && rs[n - 1] > maxbox) n--;
    if (n < 1) n = 1;

    *out_scales = rs;
    return n;
}

/* =========================
   Fast line tokenizer (in-place)
   ========================= */

static int parse_column_inplace(const char *line, int selcol, double *out_value)
{
    const unsigned char *p = (const unsigned char *)line;
    int col = 0;

    while (*p) {
        while (*p == (unsigned char)' ' || *p == (unsigned char)'\t' || *p == (unsigned char)',') p++;
        if (*p == '\0' || *p == '\n' || *p == '\r') break;

        col++;
        if (col == selcol) {
            char *endptr = NULL;
            errno = 0;
            double v = strtod((const char *)p, &endptr);

            if (endptr == (const char *)p) return 0;
            if (errno != 0) return 0;

            *out_value = v;
            return 1;
        }

        while (*p && *p != (unsigned char)' ' && *p != (unsigned char)'\t' &&
               *p != (unsigned char)',' && *p != (unsigned char)'\n' && *p != (unsigned char)'\r') {
            p++;
        }
    }
    return 0;
}

/* =========================
   Load series
   ========================= */

static long load_series_fast(const char *filename, int skip_lines, int selcol)
{
    FILE *fp = xfopen_read(filename);
    if (!fp) {
        fprintf(stderr, "%s: cannot open file '%s'\n", g_progname, filename);
        return 0;
    }

    /* Large stdio buffer */
    {
        static char io_buf[1 << 20];
        setvbuf(fp, io_buf, _IOFBF, sizeof(io_buf));
    }

    static char line_buf[1 << 20];

    for (int i = 0; i < skip_lines; i++) {
        if (!fgets(line_buf, (int)sizeof(line_buf), fp)) break;
    }

    long cap = 10000;
    long n = 0;
    double *tmp = (double *)xmalloc((size_t)cap * sizeof(double));

    while (fgets(line_buf, (int)sizeof(line_buf), fp)) {
        double v;
        if (!parse_column_inplace(line_buf, selcol, &v)) continue;

        if (n >= cap) {
            cap += 10000;
            tmp = (double *)xrealloc(tmp, (size_t)cap * sizeof(double));
        }
        tmp[n++] = v;
    }

    fclose(fp);

    if (n <= 0) {
        free(tmp);
        fprintf(stderr, "%s: no numeric data read from '%s'\n", g_progname, filename);
        return 0;
    }

    g_x = (double *)xmalloc((size_t)n * sizeof(double));
    memcpy(g_x, tmp, (size_t)n * sizeof(double));
    free(tmp);

    g_y = (double *)xmalloc((size_t)n * sizeof(double));

    if (!g_without_integration) {
        double mean = 0.0;
        for (long i = 0; i < n; i++) mean += g_x[i];
        mean /= (double)n;

        g_y[0] = g_x[0] - mean;
        for (long i = 1; i < n; i++) g_y[i] = g_y[i - 1] + (g_x[i] - mean);
    } else {
        for (long i = 0; i < n; i++) g_y[i] = g_x[i];
    }

    return n;
}

/* =========================
   Linear solver (unchanged)
   ========================= */

static int gauss_solve_inplace(double *A, double *b, double *x, int m, double pivot_tol)
{
    for (int k = 0; k < m; k++) {
        int piv = k;
        double amax = fabs(A[k*m + k]);
        for (int i = k + 1; i < m; i++) {
            double v = fabs(A[i*m + k]);
            if (v > amax) { amax = v; piv = i; }
        }
        if (amax < pivot_tol) return 0;

        if (piv != k) {
            for (int j = k; j < m; j++) {
                double tmp = A[k*m + j];
                A[k*m + j] = A[piv*m + j];
                A[piv*m + j] = tmp;
            }
            double tb = b[k];
            b[k] = b[piv];
            b[piv] = tb;
        }

        double Akk = A[k*m + k];
        for (int i = k + 1; i < m; i++) {
            double f = A[i*m + k] / Akk;
            A[i*m + k] = 0.0;
            for (int j = k + 1; j < m; j++) A[i*m + j] -= f * A[k*m + j];
            b[i] -= f * b[k];
        }
    }

    for (int i = m - 1; i >= 0; i--) {
        double s = b[i];
        for (int j = i + 1; j < m; j++) s -= A[i*m + j] * x[j];
        double aii = A[i*m + i];
        if (fabs(aii) < pivot_tol) return 0;
        x[i] = s / aii;
    }

    return 1;
}

static int solve_with_ridge(const double *G, const double *b, double *v, int m)
{
    double lambda = 0.0;

    for (int trial = 0; trial <= g_lambda_tries; trial++) {
        double *A  = (double *)xmalloc((size_t)m * (size_t)m * sizeof(double));
        double *bb = (double *)xmalloc((size_t)m * sizeof(double));

        memcpy(A,  G, (size_t)m * (size_t)m * sizeof(double));
        memcpy(bb, b, (size_t)m * sizeof(double));

        if (lambda > 0.0) {
            for (int i = 0; i < m; i++) A[i*m + i] += lambda;
        }

        int ok = gauss_solve_inplace(A, bb, v, m, g_pivot_tol);

        free(A);
        free(bb);

        if (ok) {
            for (int i = 0; i < m; i++) if (!isfinite(v[i])) return 0;
            return 1;
        }

        if (trial == 0) lambda = g_lambda0;
        else lambda *= g_lambda_mult;
    }

    return 0;
}

/* =========================
   Build centered residual weights r (MSVC-safe, no VLA)
   ========================= */

static int build_center_r_weights(long M, int d, double *r_out)
{
    const long nwin = 2 * M + 1;
    const int  m = d + 1;
    if (nwin <= d) return 0;

    double *b = (double *)xcalloc((size_t)m, sizeof(double));
    b[0] = 1.0;

    double *G = (double *)xcalloc((size_t)m * (size_t)m, sizeof(double));

    /* allocate powj once, reuse */
    double *powj = (double *)xmalloc((size_t)m * sizeof(double));

    for (long idx = 0; idx < nwin; idx++) {
        double j = (double)(idx - M);

        powj[0] = 1.0;
        for (int k = 1; k < m; k++) powj[k] = powj[k - 1] * j;

        for (int a = 0; a < m; a++) {
            const double pa = powj[a];
            for (int c = 0; c < m; c++) {
                G[a*m + c] += pa * powj[c];
            }
        }
    }

    double *v = (double *)xmalloc((size_t)m * sizeof(double));
    if (!solve_with_ridge(G, b, v, m)) {
        free(powj); free(G); free(b); free(v);
        return 0;
    }

    for (long idx = 0; idx < nwin; idx++) {
        double j = (double)(idx - M);
        double s = 0.0;
        double pj = 1.0;
        for (int k = 0; k < m; k++) {
            s += v[k] * pj;
            pj *= j;
        }
        r_out[idx] = -s;
    }
    r_out[M] += 1.0;

    free(powj);
    free(G);
    free(b);
    free(v);
    return 1;
}

/* =========================
   Build edge residual weights r (MSVC-safe, no VLA)
   ========================= */

static int build_edge_r_weights(long T, int d, long t0, double *r_out)
{
    const int m = d + 1;
    if (T <= d) return 0;
    if (t0 < 0 || t0 >= T) return 0;

    double *b = (double *)xmalloc((size_t)m * sizeof(double));
    double jt = (double)(t0 + 1);
    b[0] = 1.0;
    for (int k = 1; k < m; k++) b[k] = b[k - 1] * jt;

    double *G = (double *)xcalloc((size_t)m * (size_t)m, sizeof(double));

    /* allocate powj once, reuse */
    double *powj = (double *)xmalloc((size_t)m * sizeof(double));

    for (long idx = 0; idx < T; idx++) {
        double j = (double)(idx + 1);

        powj[0] = 1.0;
        for (int k = 1; k < m; k++) powj[k] = powj[k - 1] * j;

        for (int a = 0; a < m; a++) {
            const double pa = powj[a];
            for (int c = 0; c < m; c++) {
                G[a*m + c] += pa * powj[c];
            }
        }
    }

    double *v = (double *)xmalloc((size_t)m * sizeof(double));
    if (!solve_with_ridge(G, b, v, m)) {
        free(powj); free(G); free(b); free(v);
        return 0;
    }

    for (long idx = 0; idx < T; idx++) {
        double j = (double)(idx + 1);
        double s = 0.0;
        double pj = 1.0;
        for (int k = 0; k < m; k++) {
            s += v[k] * pj;
            pj *= j;
        }
        r_out[idx] = -s;
    }
    r_out[t0] += 1.0;

    free(powj);
    free(G);
    free(b);
    free(v);
    return 1;
}

/* =========================
   SG-DMA core: estimate F2
   (keep center convolution intact; safe optimizations only)
   ========================= */

static double estimate_f2_sg(long N, long scale, int d)
{
    if (scale > N) return NAN;
    if ((scale % 2) == 0) return NAN;

    const long M = (scale - 1) / 2;
    const long nwin = 2 * M + 1;
    if (N < nwin) return NAN;
    if (nwin <= d) return NAN;

    double *r = (double *)xmalloc((size_t)nwin * sizeof(double));
    if (!build_center_r_weights(M, d, r)) {
        free(r);
        return NAN;
    }

    double sumsq = 0.0;
    long count = 0;

    /* center region: unchanged convolution */
    for (long i = M; i <= N - M - 1; i++) {
        double acc = 0.0;
        const long base = i - M;
        for (long j = 0; j < nwin; j++) acc += r[j] * g_y[base + j];
        sumsq += acc * acc;
        count++;
    }

    if (N >= g_skip_edges_if_N_ge) {
        free(r);
        if (count <= 0) return NAN;
        return sumsq / (double)count;
    }

    /* edges: reuse buffer; avoid yrev allocation */
    if (M > 0) {
        const long max_left = (M < N ? M : N);

        long maxT = M + max_left;
        if (maxT > N) maxT = N;

        double *re = (double *)xmalloc((size_t)maxT * sizeof(double));

        /* left edge */
        for (long n0 = 0; n0 < max_left; n0++) {
            long T = n0 + M + 1;
            if (T > N) T = N;

            if (build_edge_r_weights(T, d, n0, re)) {
                double acc = 0.0;
                for (long j = 0; j < T; j++) acc += re[j] * g_y[j];
                sumsq += acc * acc;
                count++;
            }
        }

        /* right edge via reversed indexing */
        for (long n0 = 0; n0 < max_left; n0++) {
            long T = n0 + M + 1;
            if (T > N) T = N;

            if (build_edge_r_weights(T, d, n0, re)) {
                double acc = 0.0;
                for (long j = 0; j < T; j++) acc += re[j] * g_y[N - 1 - j];
                sumsq += acc * acc;
                count++;
            }
        }

        free(re);
    }

    free(r);
    if (count <= 0) return NAN;
    return sumsq / (double)count;
}

/* =========================
   Dispatch: estimate F2 for a given scale
   ========================= */

static double estimate_f2(long n, long scale)
{
    if (scale > n) die("Scale exceeds series length.");
    if ((scale % 2) == 0) die("Scale must be odd.");
    if (g_order < 0 || (g_order % 2) != 0) die("ORDER must be an even integer >= 0.");

    long min_required = (long)g_order + 5;
    if (scale < min_required) die("Scale too small for given ORDER (need scale >= d+5).");

    double f2 = estimate_f2_sg(n, scale, g_order);
    if (!isfinite(f2) || f2 <= 0.0) return NAN;
    return f2;
}

/* =========================
   Defaults by order
   ========================= */

static long default_minbox_for_order(int d)
{
    return (long)d + 5;
}

static long default_refresh_for_order(int d)
{
    (void)d;
    return 0;
}

static double default_scale_correction_for_order(int d)
{
    if (d == 0) return 1.0;
    if (d == 2) return 1.93;
    if (d == 4) return 2.74;
    return 1.0;
}

/* =========================
   Main
   ========================= */

int main(int argc, char **argv)
{
    g_progname = (argc > 0 && argv[0]) ? argv[0] : "sgdma";

    int  skip_lines = 0;
    int  selcol = 1;
    long minbox = -1;
    long maxbox = 0;
    const double boxratio = pow(2.0, 1.0/8.0);

    g_order = 0;
    g_use_scale_correction = 1;

    if (argc < 2) {
        print_usage();
        return EXIT_FAILURE;
    }

    int argi = 1;
    while (argi < argc && argv[argi][0] == '-') {
        const char *opt = argv[argi];

        if (strcmp(opt, "-h") == 0) {
            print_usage();
            return EXIT_SUCCESS;

        } else if (strcmp(opt, "-d") == 0) {
            if (++argi >= argc) die("Missing value after -d.");
            g_order = atoi(argv[argi]);
            if (g_order < 0 || (g_order % 2) != 0) die("ORDER must be an even integer >= 0.");

        } else if (strcmp(opt, "-S") == 0) {
            if (++argi >= argc) die("Missing value after -S.");
            g_use_scale_correction = (atoi(argv[argi]) != 0);

        } else if (strcmp(opt, "-s") == 0) {
            if (++argi >= argc) die("Missing value after -s.");
            skip_lines = atoi(argv[argi]);

        } else if (strcmp(opt, "-c") == 0) {
            if (++argi >= argc) die("Missing value after -c.");
            selcol = atoi(argv[argi]);
            if (selcol < 1) die("COLUMN must be >= 1.");

        } else if (strcmp(opt, "-l") == 0) {
            if (++argi >= argc) die("Missing value after -l.");
            minbox = atol(argv[argi]);

        } else if (strcmp(opt, "-u") == 0) {
            if (++argi >= argc) die("Missing value after -u.");
            maxbox = atol(argv[argi]);

        } else if (strcmp(opt, "-a") == 0) {
            g_asymmetric_ends = 1;

        } else if (strcmp(opt, "-i") == 0) {
            g_without_integration = 1;

        } else if (strcmp(opt, "-r") == 0) {
            if (++argi >= argc) die("Missing value after -r.");
            g_refresh_interval = atol(argv[argi]);
            if (g_refresh_interval < 0) g_refresh_interval = 0;

        } else {
            print_usage();
            die("Unknown option.");
        }

        argi++;
    }

    if (argi >= argc) {
        print_usage();
        die("Missing input filename.");
    }

    const char *filename = argv[argi];

    if (minbox < 0) minbox = default_minbox_for_order(g_order);
    g_refresh_interval = default_refresh_for_order(g_order);
    g_scale_correction_value = default_scale_correction_for_order(g_order);

    if (g_asymmetric_ends) {
        fprintf(stderr, "%s: note: -a is accepted, but SG-DMA core uses its own edge policy (R-like).\n", g_progname);
    }
    if (g_refresh_interval > 0) {
        fprintf(stderr, "%s: note: -r is accepted, but SG-DMA core does not use rolling-sum refresh.\n", g_progname);
    }
    if (g_use_scale_correction && !(g_order == 0 || g_order == 2 || g_order == 4)) {
        fprintf(stderr, "%s: note: C(d) is defined only for d=0,2,4. For d=%d, using C(d)=1.0.\n",
                g_progname, g_order);
    }

    long n = load_series_fast(filename, skip_lines, selcol);
    if (n <= 0) return EXIT_FAILURE;

    fprintf(stderr, "Input file: %s\n", filename);
    fprintf(stderr, "Total data length = %ld\n", n);
    fprintf(stderr, "***** centered SG-DMA (even poly order d=%d; %s) *****\n",
            g_order,
            (g_without_integration ? "no integration (-i)" : "demean+integrate (profile)"));

    minbox = (long)(minbox / 2) * 2 + 1;
    long min_required = default_minbox_for_order(g_order);
    if (minbox < min_required) minbox = min_required;
    if ((minbox % 2) == 0) minbox += 1;

    if (maxbox > n / 4 || maxbox <= 0 || minbox > maxbox) {
        maxbox = (long)n / 4;
    }
    if (maxbox < minbox) maxbox = minbox;

    g_scales_len = generate_scales_exact(minbox, maxbox, boxratio, &g_scales);
    if (g_scales_len <= 0) die("Failed to generate scales.");

    double *F2 = (double *)xmalloc((size_t)g_scales_len * sizeof(double));

    clock_t t0 = clock();

    for (long i = 0; i < g_scales_len; i++) {
        F2[i] = estimate_f2(n, g_scales[i]);
        if (!isfinite(F2[i]) || F2[i] <= 0.0) {
            die("Non-finite or non-positive F2 encountered (check data / parameters).");
        }
    }

    clock_t t1 = clock();
    double ms = 1000.0 * (double)(t1 - t0) / (double)CLOCKS_PER_SEC;
    fprintf(stderr, "Processing time: %.3f [ms]\n", ms);

    for (long i = 0; i < g_scales_len; i++) {
        double s = (double)g_scales[i];
        double s_eff = (g_use_scale_correction ? (s / g_scale_correction_value) : s);

        double log10s = log10(s_eff);
        double log10F = 0.5 * log10(F2[i]);

        fprintf(stdout, "%.15g,%.15g\n", log10s, log10F);
    }

    free(g_x);
    free(g_y);
    free(g_scales);
    free(F2);

    return EXIT_SUCCESS;
}

■ 高速DMAアルゴリズム sgfastDMA.c

 我々が開発した高速アルゴリズムのCプログラムは以下です。

/*
   sgdma.c
   Unified Savitzky-Golay-style detrended moving-average scaling analysis
   (centered DMA; detrending orders d = 0, 2, 4)

   Original algorithms:
     DMA0 / DMA4: Ken Kiyono, 26 Sep. 2015
     DMA2:         Ken Kiyono, 20 Nov. 2016
     Modified by:  Ken Kiyono, 27 Jul. 2024
     Public release update: 17 Jan. 2026

   Notes:
   - Some memory-handling / implementation style was inspired by "Numerical Recipes in C"
     (Press, Teukolsky, Vetterling, Flannery).
   - This implementation uses fast rolling-moment updates (O(N) per scale).
   - Optional periodic "refresh" recomputation mitigates floating-point drift.

   Citation request:
     If you use this code (or a modified version) in published research, please cite:
       Tsujimoto, Y., Miki, S., Shimatani, and K. Kiyono,
       “Fast algorithm for scaling analysis with higher-order detrending moving average method,”
       Physical Review E 93, 053304 (2016).

   License:
     CC BY 4.0 (Creative Commons Attribution 4.0 International).
     You may use/copy/redistribute/modify for any purpose, including commercial use,
     provided proper credit is given and changes are indicated.

   Output (CSV):
     Three columns:
       log10(s_eff), log10(F(s)), Emax
     where:
       s_eff = s / C(d)  if scale correction enabled (-S 1, default)
       s_eff = s         if scale correction disabled (-S 0)
     and Emax is a diagnostic: maximum absolute difference (in trend estimate units)
     between rolling-update and refresh-recomputed trend (when refresh is enabled).

   Default correction constants:
       C(0)=1.00, C(2)=1.93, C(4)=2.74

   Options:
     -d ORDER     detrending order: 0, 2, or 4 (default: 0)
     -S FLAG      scale correction: 1=use s/C(d) (default), 0=use s
     -s SKIP      skip first SKIP lines (default: 0)
     -c COLUMN    select 1-based column index (default: 1)
     -l MINBOX    minimum box width (odd enforced; default depends on d)
     -u MAXBOX    maximum box width (default: N/4)
     -a           asymmetric ends (supported for d=0,2; ignored for d=4)
     -i           disable integration (default: integrate demeaned series)
     -r REFRESH   refresh interval (0 disables refresh; default depends on d)
     -h           help

   Important:
   - The legacy "single-scale trend output mode" is intentionally removed.
*/

#include 
#include 
#include 
#include 
#include 
#include 

/* -------------------------
   Small utilities
   ------------------------- */

static const char *g_progname = "sgdma";

static void die(const char *msg)
{
    fprintf(stderr, "%s: %s\n", g_progname, msg);
    exit(EXIT_FAILURE);
}

static void *xmalloc(size_t nbytes)
{
    void *p = malloc(nbytes);
    if (!p) die("Out of memory (malloc).");
    return p;
}

static void *xrealloc(void *p, size_t nbytes)
{
    void *q = realloc(p, nbytes);
    if (!q) die("Out of memory (realloc).");
    return q;
}

/* Force odd >= 1 */
static long force_odd(long x)
{
    if (x < 1) x = 1;
    return (x / 2) * 2 + 1;
}

/* Order-dependent defaults */
static long default_minbox_for_order(int d)
{
    if (d == 0) return 5;
    if (d == 2) return 7;
    return 9; /* d==4 */
}

static long default_refresh_for_order(int d)
{
    /* chosen to balance drift suppression and overhead */
    if (d == 0) return 2000;
    if (d == 2) return 200;
    return 20; /* d==4 */
}

static double default_scale_correction_for_order(int d)
{
    if (d == 0) return 1.0;
    if (d == 2) return 1.93;
    return 2.74; /* d==4 */
}

/* -------------------------
   CLI / usage
   ------------------------- */

static void print_usage(void)
{
    fprintf(stderr,
        "usage: %s [OPTIONS ...] inputfile\n"
        "OPTIONS:\n"
        "  -d ORDER     detrending order: 0, 2, or 4 (default: 0)\n"
        "  -S FLAG      scale correction: 1=use s/C(d) (default), 0=use s\n"
        "  -s SKIP      skip first SKIP lines (default: 0)\n"
        "  -c COLUMN    select column index (1-based) (default: 1)\n"
        "  -l MINBOX    minimum box width (odd enforced; default depends on d)\n"
        "  -u MAXBOX    maximum box width (default: N/4)\n"
        "  -a           asymmetric ends (supported for d=0,2; ignored for d=4)\n"
        "  -i           disable integration (default: integrate demeaned series)\n"
        "  -r REFRESH   refresh interval (0 disables refresh; default depends on d)\n"
        "  -h           print this help\n"
        "\n"
        "Default correction constants C(d): C(0)=1.00, C(2)=1.93, C(4)=2.74\n"
        "Output (CSV): log10(s_eff), log10(F(s)), Emax\n",
        g_progname
    );
}

/* -------------------------
   Scale generator (exact behavior preserved)
   ------------------------- */

static long generate_scales_exact(long minbox, long maxbox, double boxratio, long **out_scales)
{
    /* minbox is assumed odd and >= required minimum */
    minbox = force_odd(minbox);
    if (maxbox < minbox) maxbox = minbox;

    const double ratio = (double)maxbox / (double)minbox;
    if (!(ratio > 0.0)) die("Invalid scale bounds.");

    long rslen = (long)(log10(ratio) / log10(boxratio) + 1.0);
    if (rslen < 1) rslen = 1;

    long *rs = (long *)xmalloc((size_t)rslen * sizeof(long));
    long n = 1;

    rs[0] = minbox;

    for (long ir = 1; (n <= rslen) && (rs[n - 1] < maxbox); ir++) {
        const long rw = (long)(minbox * pow(boxratio, (double)ir) + 0.5);
        if (rw > rs[n - 1]) {
            const long odd = force_odd(rw);
            if (n < rslen) rs[n++] = odd;
            else break;
        }
    }

    if (n >= 1 && rs[n - 1] > maxbox) n--;
    if (n < 1) n = 1;

    *out_scales = rs;
    return n;
}

/* -------------------------
   Fast tokenizer: parse selected column
   - Accepts separators: space/tab/comma
   - Ignores non-numeric rows
   ------------------------- */

static int parse_column_inplace(const char *line, int selcol, double *out_value)
{
    const unsigned char *p = (const unsigned char *)line;
    int col = 0;

    while (*p) {
        while (*p == (unsigned char)' ' || *p == (unsigned char)'\t' || *p == (unsigned char)',') p++;
        if (*p == '\0' || *p == '\n' || *p == '\r') break;

        col++;
        if (col == selcol) {
            char *endptr = NULL;
            errno = 0;
            const double v = strtod((const char *)p, &endptr);
            if (endptr == (const char *)p) return 0; /* no conversion */
            if (errno != 0) return 0;                /* overflow/underflow etc. */
            *out_value = v;
            return 1;
        }

        /* skip token */
        while (*p &&
               *p != (unsigned char)' ' && *p != (unsigned char)'\t' &&
               *p != (unsigned char)',' &&
               *p != (unsigned char)'\n' && *p != (unsigned char)'\r') {
            p++;
        }
    }
    return 0;
}

/* -------------------------
   Context
   ------------------------- */

typedef struct {
    /* data */
    double *x; /* raw series */
    double *y; /* working series (integrated demeaned by default) */
    long n;

    /* options */
    int order;                 /* 0,2,4 */
    int without_integration;   /* -i */
    int asymmetric_ends;       /* -a (only d=0,2) */
    long refresh_interval;     /* -r (0 disables refresh) */
    int use_scale_correction;  /* -S */
    double scale_correction;   /* C(d) */

} DMAContext;

/* -------------------------
   Load series (streaming, large-buffered)
   ------------------------- */

static long load_series_fast(DMAContext *ctx, const char *filename, int skip_lines, int selcol)
{
    FILE *fp = fopen(filename, "r");
    if (!fp) {
        fprintf(stderr, "%s: cannot open file '%s'\n", g_progname, filename);
        return 0;
    }

    /* large stdio buffer: reduces syscalls on huge files */
    {
        static char io_buf[1 << 20]; /* 1 MiB */
        setvbuf(fp, io_buf, _IOFBF, sizeof(io_buf));
    }

    static char line_buf[1 << 20]; /* 1 MiB per line (safety for wide CSVs) */

    for (int i = 0; i < skip_lines; i++) {
        if (!fgets(line_buf, (int)sizeof(line_buf), fp)) break;
    }

    long cap = 1 << 14; /* 16384 initial */
    long n = 0;
    double *tmp = (double *)xmalloc((size_t)cap * sizeof(double));

    while (fgets(line_buf, (int)sizeof(line_buf), fp)) {
        double v;
        if (!parse_column_inplace(line_buf, selcol, &v)) continue;

        if (n >= cap) {
            /* geometric growth to reduce realloc frequency */
            cap = (cap < (1L << 20)) ? (cap * 2) : (cap + (1L << 20));
            tmp = (double *)xrealloc(tmp, (size_t)cap * sizeof(double));
        }
        tmp[n++] = v;
    }
    fclose(fp);

    if (n <= 0) {
        free(tmp);
        fprintf(stderr, "%s: no numeric data read from '%s'\n", g_progname, filename);
        return 0;
    }

    ctx->x = (double *)xrealloc(tmp, (size_t)n * sizeof(double)); /* shrink-to-fit */
    ctx->y = (double *)xmalloc((size_t)n * sizeof(double));
    ctx->n = n;

    if (!ctx->without_integration) {
        /* demean and integrate */
        double mean = 0.0;
        for (long i = 0; i < n; i++) mean += ctx->x[i];
        mean /= (double)n;

        ctx->y[0] = ctx->x[0] - mean;
        for (long i = 1; i < n; i++) ctx->y[i] = ctx->y[i - 1] + (ctx->x[i] - mean);
    } else {
        memcpy(ctx->y, ctx->x, (size_t)n * sizeof(double));
    }

    return n;
}

/* -------------------------
   Center-trend coefficients by order
   (computed per scale)
   ------------------------- */

typedef struct {
    double c1, c2, c3; /* used as: a0 = c1*S0 + c2*S2 + c3*S4 (depending on order) */
} Coeff;

static Coeff coeff_dma0(long k)
{
    Coeff c;
    c.c1 = 1.0 / (double)(2 * k + 1);
    c.c2 = 0.0;
    c.c3 = 0.0;
    return c;
}

static Coeff coeff_dma2(long k)
{
    const double kk  = (double)k;
    const double k2  = kk * kk;
    const double den = 8.0 * k2 * kk + 12.0 * k2 - 2.0 * kk - 3.0;

    Coeff c;
    c.c1 = (9.0 * k2 + 9.0 * kk - 3.0) / den;
    c.c2 = -15.0 / den;
    c.c3 = 0.0;
    return c;
}

static Coeff coeff_dma4(long k)
{
    const double kk = (double)k;
    const double k2 = kk * kk;
    const double k3 = k2 * kk;
    const double k4 = k3 * kk;
    const double k6 = k2 * k4;

    const double den = 180.0 + 72.0 * kk - 800.0 * k2 - 320.0 * k3 + 320.0 * k4 + 128.0 * k6;

    Coeff c;
    c.c1 = (180.0 - 750.0 * kk - 525.0 * k2 + 450.0 * k3 + 225.0 * k4) / den;
    c.c2 = (1575.0 - 1050.0 * kk - 1050.0 * k2) / den;
    c.c3 = 945.0 / den;
    return c;
}

/* -------------------------
   Asymmetric ends (d=0,2)
   ------------------------- */

/* DMA0 end trend uses running average length that grows toward the center */
static double dma0_end_coeff(long m, long scale)
{
    /* c1 = 1 / (((scale-1)/2) + m + 1)  (original definition) */
    return 1.0 / (double)(((scale - 1) / 2) + m + 1);
}

/* SG-like coefficients for DMA2 ends (copied from refined DMA2 implementation)
   coeff array length: (2*k_half+1), indexed by (idx + k_half) for idx in [-k_half, k_half]
*/
static void sg2_fill(double *coeff, long m, long n, long k_half)
{
    for (long i = 0; i < 2 * k_half + 1; i++) coeff[i] = 0.0;

    const double mm = (double)m;
    const double nn = (double)n;

    double mn = 2.0 * mm + nn;
    const double n2 = nn * nn;
    const double n3 = n2 * nn;
    const double m2 = mm * mm;
    const double m3 = m2 * mm;

    const double d = (mn + 3.0) * (mn - 3.0) * (mn - 1.0) * (mn + 1.0) * (mn + 5.0);

    mn = mm * nn;

    for (long idx = -m; idx <= k_half; idx++) {
        const double kx  = (double)idx;
        const double kk2 = kx * kx;

        const double val =
            (90.0 + 720.0 * kk2 - 240.0 * mm + 576.0 * kx * mm + 960.0 * kk2 * mm +
             48.0 * m2 + 1728.0 * kx * m2 + 960.0 * kk2 * m2 + 576.0 * m3 +
             1152.0 * kx * m3 + 288.0 * m2 * m2 - 120.0 * nn - 432.0 * kx * nn -
             960.0 * kk2 * nn - 528.0 * mn - 2304.0 * kx * mn - 1920.0 * kk2 * mn -
             864.0 * m2 * nn - 2304.0 * kx * m2 * nn - 576.0 * m3 * nn +
             84.0 * n2 + 576.0 * kx * n2 + 240.0 * kk2 * n2 + 720.0 * mm * n2 +
             1152.0 * kx * mm * n2 + 720.0 * m2 * n2 - 72.0 * n3 - 144.0 * kx * n3 -
             144.0 * mm * n3 + 18.0 * n2 * n2) / d;

        if (idx >= -k_half && idx <= k_half) coeff[idx + k_half] = val;
    }
}

/* -------------------------
   Core estimators
   ------------------------- */

static double estimate_f2_dma0(const DMAContext *ctx, long scale, double *out_emax)
{
    const double *restrict y = ctx->y;
    const long n = ctx->n;
    const long k = (scale - 1) / 2;

    const Coeff cf = coeff_dma0(k);
    const long nsteps = n - scale;

    double f2 = 0.0;
    double maxdiff = 0.0;

    /* asymmetric ends (optional) */
    if (ctx->asymmetric_ends) {
        double ysumL = 0.0, ysumR = 0.0;

        for (long i = 0; i < k; i++) {
            ysumL += y[i];
            ysumR += y[n - 1 - i];
        }

        for (long i = 0; i < k; i++) {
            ysumL += y[k + i];
            ysumR += y[n - 1 - k - i];

            const double aL = dma0_end_coeff(i, scale) * ysumL;
            const double aR = dma0_end_coeff(i, scale) * ysumR;

            const double dL = y[i] - aL;
            const double dR = y[n - 1 - i] - aR;
            f2 += dL * dL + dR * dR;
        }
    }

    /* center part (rolling sum) */
    double S0 = 0.0;
    for (long j = 0; j < scale; j++) S0 += y[j];

    {
        const double a0 = cf.c1 * S0;
        const double d0 = y[k] - a0;
        f2 += d0 * d0;
    }

    for (long p = 0; p < nsteps; p++) {
        S0 += y[p + scale] - y[p];

        if (ctx->refresh_interval > 0 && (p % ctx->refresh_interval) == 0) {
            const double S0_old = S0;
            double S0_new = 0.0;
            const long base = (p + 1);
            for (long j = 0; j < scale; j++) S0_new += y[base + j];

            S0 = S0_new;

            const double diff = fabs(S0_new - S0_old);
            if (diff > maxdiff) maxdiff = diff;
        }

        const long center = (p + 1) + k;
        const double a0 = cf.c1 * S0;
        const double d  = y[center] - a0;
        f2 += d * d;
    }

    /* Emax in trend units */
    if (out_emax) *out_emax = cf.c1 * maxdiff;

    /* normalization consistent with refined DMA0 code */
    if (ctx->asymmetric_ends) return f2 / (double)(nsteps + 1);
    return f2 / (double)n;
}

static double estimate_f2_dma2(const DMAContext *ctx, long scale, double *out_emax)
{
    const double *restrict y = ctx->y;
    const long n = ctx->n;
    const long k = (scale - 1) / 2;

    const Coeff cf = coeff_dma2(k);
    const long nsteps = n - scale;

    double f2 = 0.0;
    double maxdiff = 0.0;

    /* asymmetric ends (optional) */
    if (ctx->asymmetric_ends) {
        double *coeff = (double *)xmalloc((size_t)(2 * k + 1) * sizeof(double));

        for (long i = 1; i <= k; i++) {
            const long m = i - 1;
            sg2_fill(coeff, m, scale, k);

            double aL = 0.0, aR = 0.0;
            for (long j0 = 0; j0 <= m + k; j0++) {
                const long cidx = (j0 - m);
                const double w = coeff[cidx + k];
                aL += w * y[j0];
                aR += w * y[n - 1 - j0];
            }

            const double dL = y[m] - aL;
            const double dR = y[n - 1 - m] - aR;
            f2 += dL * dL + dR * dR;
        }

        free(coeff);
    }

    /* initial moments on first window */
    double S0 = 0.0, S1 = 0.0, S2 = 0.0;
    for (long j = 0; j < scale; j++) {
        const double t = (double)(j - k);
        const double v = y[j];
        S0 += v;
        S1 += v * t;
        S2 += v * (t * t);
    }

    {
        const double a0 = cf.c1 * S0 + cf.c2 * S2;
        const double d0 = y[k] - a0;
        f2 += d0 * d0;
    }

    for (long p = 0; p < nsteps; p++) {
        const double x_out = y[p];
        const double x_in  = y[p + scale];

        const double S0_old = S0;
        const double S1_old = S1;
        const double S2_old = S2;

        /* rolling updates (refined DMA2 recurrence) */
        S0 = S0_old - x_out + x_in;
        S1 = S1_old - S0_old + x_out * (double)(k + 1) + x_in * (double)k;
        S2 = S2_old - 2.0 * S1_old + S0_old
                - x_out * (double)(k + 1) * (double)(k + 1)
                + x_in  * (double)k       * (double)k;

        if (ctx->refresh_interval > 0 && (p % ctx->refresh_interval) == 0) {
            const double S0r_old = S0;
            const double S2r_old = S2;

            double S0n = 0.0, S1n = 0.0, S2n = 0.0;
            const long base = (p + 1);
            for (long j = 0; j < scale; j++) {
                const double t = (double)(j - k);
                const double v = y[base + j];
                S0n += v;
                S1n += v * t;
                S2n += v * (t * t);
            }
            S0 = S0n; S1 = S1n; S2 = S2n;

            /* compare trend (not raw moments) to get a meaningful drift measure */
            const double diff = fabs(cf.c1 * (S0n - S0r_old) + cf.c2 * (S2n - S2r_old));
            if (diff > maxdiff) maxdiff = diff;
        }

        const long center = (p + 1) + k;
        const double a0 = cf.c1 * S0 + cf.c2 * S2;
        const double d  = y[center] - a0;
        f2 += d * d;
    }

    if (out_emax) *out_emax = maxdiff;

    if (ctx->asymmetric_ends) return f2 / (double)(nsteps + 1);
    return f2 / (double)n;
}

static double estimate_f2_dma4(const DMAContext *ctx, long scale, double *out_emax)
{
    const double *restrict y = ctx->y;
    const long n = ctx->n;
    const long k = (scale - 1) / 2;

    const Coeff cf = coeff_dma4(k);
    const long nsteps = n - scale;

    double maxdiff = 0.0;

    /* initial moments */
    double S0 = 0.0, S1 = 0.0, S2 = 0.0, S3 = 0.0, S4 = 0.0;
    for (long j = 0; j < scale; j++) {
        const double t  = (double)(j - k);
        const double v  = y[j];
        const double t2 = t * t;
        const double t3 = t2 * t;
        const double t4 = t2 * t2;

        S0 += v;
        S1 += v * t;
        S2 += v * t2;
        S3 += v * t3;
        S4 += v * t4;
    }

    double f2;
    {
        const double a0 = cf.c1 * S0 + cf.c2 * S2 + cf.c3 * S4;
        const double d0 = y[k] - a0;
        f2 = d0 * d0;
    }

    const double kp1 = (double)(k + 1);
    const double k0  = (double)k;

    for (long p = 0; p < nsteps; p++) {
        const double x_out = y[p];
        const double x_in  = y[p + scale];

        /* store old values for recurrences and drift diagnostics */
        const double S0_old = S0;
        const double S1_old = S1;
        const double S2_old = S2;
        const double S3_old = S3;
        const double S4_old = S4;

        S0 = S0_old - x_out + x_in;

        S1 = S1_old - S0_old + x_out * kp1 + x_in * k0;

        S2 = S2_old - 2.0 * S1_old + S0_old
             - x_out * (kp1 * kp1) + x_in * (k0 * k0);

        S3 = S3_old - 3.0 * S2_old + 3.0 * S1_old - S0_old
             + x_out * (kp1 * kp1 * kp1) + x_in * (k0 * k0 * k0);

        S4 = S4_old - 4.0 * S3_old + 6.0 * S2_old - 4.0 * S1_old + S0_old
             - x_out * (kp1 * kp1 * kp1 * kp1) + x_in * (k0 * k0 * k0 * k0);

        if (ctx->refresh_interval > 0 && (p % ctx->refresh_interval) == 0) {
            double S0n = 0.0, S1n = 0.0, S2n = 0.0, S3n = 0.0, S4n = 0.0;
            const long base = (p + 1);
            for (long j = 0; j < scale; j++) {
                const double t  = (double)(j - k);
                const double v  = y[base + j];
                const double t2 = t * t;
                const double t3 = t2 * t;
                const double t4 = t2 * t2;

                S0n += v;
                S1n += v * t;
                S2n += v * t2;
                S3n += v * t3;
                S4n += v * t4;
            }

            S0 = S0n; S1 = S1n; S2 = S2n; S3 = S3n; S4 = S4n;

            const double diff = fabs(cf.c1 * (S0n - S0_old) +
                                     cf.c2 * (S2n - S2_old) +
                                     cf.c3 * (S4n - S4_old));
            if (diff > maxdiff) maxdiff = diff;
        }

        const long center = (p + 1) + k;
        const double a0 = cf.c1 * S0 + cf.c2 * S2 + cf.c3 * S4;
        const double d  = y[center] - a0;
        f2 += d * d;
    }

    if (out_emax) *out_emax = maxdiff;

    /* DMA4 normalization consistent with refined DMA4 code */
    return f2 / (double)(nsteps + 1);
}

static double estimate_f2(const DMAContext *ctx, long scale, double *out_emax)
{
    const long n = ctx->n;

    if (scale > n) die("Scale exceeds series length.");
    if ((scale % 2) == 0) die("Scale must be odd.");

    if (ctx->order == 0) {
        if (scale < 5) die("For d=0, scale must be >= 5.");
        return estimate_f2_dma0(ctx, scale, out_emax);
    } else if (ctx->order == 2) {
        if (scale < 7) die("For d=2, scale must be >= 7.");
        return estimate_f2_dma2(ctx, scale, out_emax);
    } else if (ctx->order == 4) {
        if (scale < 9) die("For d=4, scale must be >= 9.");
        return estimate_f2_dma4(ctx, scale, out_emax);
    }

    die("Unsupported order. Use -d 0, -d 2, or -d 4.");
    return 0.0;
}

/* -------------------------
   Main
   ------------------------- */

int main(int argc, char **argv)
{
    g_progname = (argc > 0 && argv[0]) ? argv[0] : "sgdma";

    /* CLI defaults */
    int  skip_lines = 0;
    int  selcol = 1;
    long minbox = -1;   /* order-dependent if not specified */
    long maxbox = 0;
    const double boxratio = pow(2.0, 1.0 / 8.0);

    DMAContext ctx;
    memset(&ctx, 0, sizeof(ctx));

    ctx.order = 0;
    ctx.use_scale_correction = 1;
    ctx.without_integration = 0;
    ctx.asymmetric_ends = 0;
    ctx.refresh_interval = -1; /* sentinel: not set */

    if (argc < 2) {
        print_usage();
        return EXIT_FAILURE;
    }

    /* parse options */
    int argi = 1;
    while (argi < argc && argv[argi][0] == '-') {
        const char *opt = argv[argi];

        if (strcmp(opt, "-h") == 0) {
            print_usage();
            return EXIT_SUCCESS;

        } else if (strcmp(opt, "-d") == 0) {
            if (++argi >= argc) die("Missing value after -d.");
            ctx.order = atoi(argv[argi]);
            if (!(ctx.order == 0 || ctx.order == 2 || ctx.order == 4)) die("ORDER must be 0, 2, or 4.");

        } else if (strcmp(opt, "-S") == 0) {
            if (++argi >= argc) die("Missing value after -S.");
            ctx.use_scale_correction = (atoi(argv[argi]) != 0);

        } else if (strcmp(opt, "-s") == 0) {
            if (++argi >= argc) die("Missing value after -s.");
            skip_lines = atoi(argv[argi]);
            if (skip_lines < 0) skip_lines = 0;

        } else if (strcmp(opt, "-c") == 0) {
            if (++argi >= argc) die("Missing value after -c.");
            selcol = atoi(argv[argi]);
            if (selcol < 1) die("COLUMN must be >= 1.");

        } else if (strcmp(opt, "-l") == 0) {
            if (++argi >= argc) die("Missing value after -l.");
            minbox = atol(argv[argi]);

        } else if (strcmp(opt, "-u") == 0) {
            if (++argi >= argc) die("Missing value after -u.");
            maxbox = atol(argv[argi]);

        } else if (strcmp(opt, "-a") == 0) {
            ctx.asymmetric_ends = 1;

        } else if (strcmp(opt, "-i") == 0) {
            ctx.without_integration = 1;

        } else if (strcmp(opt, "-r") == 0) {
            if (++argi >= argc) die("Missing value after -r.");
            ctx.refresh_interval = atol(argv[argi]);
            if (ctx.refresh_interval < 0) ctx.refresh_interval = 0;

        } else {
            print_usage();
            die("Unknown option.");
        }

        argi++;
    }

    if (argi >= argc) {
        print_usage();
        die("Missing input filename.");
    }
    const char *filename = argv[argi];

    /* apply order-dependent defaults */
    if (minbox < 0) minbox = default_minbox_for_order(ctx.order);
    minbox = force_odd(minbox);

    if (ctx.refresh_interval < 0) {
        ctx.refresh_interval = default_refresh_for_order(ctx.order);
    }
    /* note: -r 0 explicitly disables refresh */

    ctx.scale_correction = default_scale_correction_for_order(ctx.order);

    /* for d=4: ignore -a but accept for CLI compatibility */
    if (ctx.order == 4 && ctx.asymmetric_ends) {
        fprintf(stderr, "%s: note: -a is ignored for d=4.\n", g_progname);
        ctx.asymmetric_ends = 0;
    }

    /* load data */
    const long n = load_series_fast(&ctx, filename, skip_lines, selcol);
    if (n <= 0) return EXIT_FAILURE;

    fprintf(stderr, "Input file: %s\n", filename);
    fprintf(stderr, "Total data length = %ld\n", n);
    fprintf(stderr, "***** centered DMA (d=%d) *****\n", ctx.order);

    /* maxbox default and validation */
    const long min_required = default_minbox_for_order(ctx.order);
    if (minbox < min_required) minbox = min_required;

    if (maxbox <= 0 || maxbox > n / 4 || minbox > maxbox) {
        maxbox = n / 4;
    }
    if (maxbox < minbox) maxbox = minbox;

    /* scales */
    long *scales = NULL;
    const long scales_len = generate_scales_exact(minbox, maxbox, boxratio, &scales);
    if (scales_len <= 0) die("Failed to generate scales.");

    double *F2   = (double *)xmalloc((size_t)scales_len * sizeof(double));
    double *Emax = (double *)xmalloc((size_t)scales_len * sizeof(double));

    /* timing */
    const clock_t t0 = clock();

    for (long i = 0; i < scales_len; i++) {
        double emax = 0.0;
        F2[i] = estimate_f2(&ctx, scales[i], &emax);
        Emax[i] = emax;
    }

    const clock_t t1 = clock();
    const double ms = 1000.0 * (double)(t1 - t0) / (double)CLOCKS_PER_SEC;
    fprintf(stderr, "Processing time: %.3f [ms]\n", ms);

    /* output */
    for (long i = 0; i < scales_len; i++) {
        const double s = (double)scales[i];
        const double s_eff = (ctx.use_scale_correction ? (s / ctx.scale_correction) : s);

        /* use log10() directly (faster and clearer than log()/log(10)) */
        const double log10s = log10(s_eff);
        const double log10F = 0.5 * log10(F2[i]); /* log10(sqrt(F2)) */

        fprintf(stdout, "%.15g,%.15g,%.15g\n", log10s, log10F, Emax[i]);
    }

    /* cleanup */
    free(ctx.x);
    free(ctx.y);
    free(scales);
    free(F2);
    free(Emax);

    return EXIT_SUCCESS;
}

※ もし記事の中で「ここ違うよ」という点や気になるところがあれば、気軽に指摘していただけると助かります。質問や「このテーマも取り上げてほしい」といったリクエストも大歓迎です。必ず対応するとは約束できませんが、できるだけ今後の記事で扱いたいと思います。それと、下のはてなブログランキングはあまり信用できる指標ではなさそうですが (私のブログを読んでいる人は、実際とても少ないです)、押してもらえるとシンプルに励みになります。気が向いたときにポチッとしていただけたら嬉しいです。

この記事に掲載したプログラム使った研究成果の発表について

 ここに掲載したプログラム(または改変版)を用いて論文等を書く場合は、以下の論文を引用してください。

Tsujimoto, Y., Miki, S., Shimatani, and K. Kiyono, “Fast algorithm for scaling analysis with higher-order detrending moving average method,” Physical Review E 93, 053304 (2016).