Backends & performance

Every method has more than one backend, each a concrete implementation of the same math. You select among them with the backend= argument; the results agree to within floating-point round-off regardless of which one runs.

Selecting a backend

pg = cup.periodogram(lc, "GLS", backend="auto")    # default

The four selectors:

| backend= | Meaning |
| --- | --- |
| "auto" (default) | Use the GPU when a CUDA device and the [gpu] extra are present; otherwise the best CPU backend. The same code runs on any machine. |
| "cpu" | Force the best CPU backend for this method. |
| "gpu" | Force the GPU backend. Raises BackendUnavailableError if no GPU is usable. |
| a concrete name | Force a specific implementation, e.g. "finufft", "astropy", "numpy", "cupy", "cufinufft", "numba". |

What each method can run

| Method | CPU backends | GPU backend | "cpu" resolves to |
| --- | --- | --- | --- |
| GLS | finufft, astropy | cufinufft | finufft |
| BLS | numba (with [fast]), astropy, numpy† | cupy | numba if installed, else astropy |
| PDM | numpy | cupy | numpy |
| CE | numpy | cupy | numpy |
| String-Length | numpy | cupy | numpy |
| MHAOV | numpy | cupy | numpy |
| TLS | numpy | cupy | numpy |

† BLS’s numpy backend is a GPU-parity reference, not the product path. It shares one array-module-generic source with the CUDA kernel, so the two agree to floating-point precision, but it is slow: it trades memory traffic for the parallelism that makes the GPU fast. For CPU BLS use numba (the default with [fast]) or astropy, not numpy.
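The selection rules in the two tables above can be sketched as a small lookup. This is only an illustration of the documented behaviour, not cuPeriod's internal code: the names resolve_backend, best_cpu, and BackendUnavailable, and the gpu_usable / numba_installed flags, are hypothetical, and the handling of concrete names (in particular, raising when a GPU backend is named without a usable GPU) is an assumption.

```python
# Sketch of the documented selection rules. Illustrative only: the tables
# are hard-coded here and nothing is imported from cuPeriod.

CPU_BACKENDS = {
    "GLS": ("finufft", "astropy"),
    "BLS": ("numba", "astropy", "numpy"),
    "PDM": ("numpy",),
    "CE": ("numpy",),
    "String-Length": ("numpy",),
    "MHAOV": ("numpy",),
    "TLS": ("numpy",),
}
GPU_BACKEND = {
    "GLS": "cufinufft", "BLS": "cupy", "PDM": "cupy", "CE": "cupy",
    "String-Length": "cupy", "MHAOV": "cupy", "TLS": "cupy",
}


class BackendUnavailable(RuntimeError):
    """Hypothetical stand-in for cuPeriod's BackendUnavailableError."""


def best_cpu(method, numba_installed):
    """Return the backend that "cpu" resolves to, per the table."""
    if method == "BLS":
        return "numba" if numba_installed else "astropy"
    return CPU_BACKENDS[method][0]


def resolve_backend(method, selector="auto", *, gpu_usable=False,
                    numba_installed=False):
    """Map a backend= selector to a concrete backend name."""
    gpu_name = GPU_BACKEND[method]
    if selector == "auto":
        # Prefer the GPU when usable, otherwise the best CPU backend.
        return gpu_name if gpu_usable else best_cpu(method, numba_installed)
    if selector == "cpu":
        return best_cpu(method, numba_installed)
    if selector == "gpu" or selector == gpu_name:
        # Assumption: naming the GPU backend directly behaves like "gpu".
        if not gpu_usable:
            raise BackendUnavailable(f"no usable GPU for {method}")
        return gpu_name
    if selector in CPU_BACKENDS[method]:
        # A real implementation would also check that it is installed.
        return selector
    raise ValueError(f"{selector!r} is not a backend for {method}")


print(resolve_backend("BLS", "cpu"))                   # numba not installed
print(resolve_backend("GLS", "auto", gpu_usable=True))
```

Keeping "cpu" and "auto" as thin wrappers over one best-CPU rule is what lets the same call run unchanged on machines with and without a GPU.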

Inspect the live picture for your install:

for info in cup.list_methods():
    print(info.name, "| all:", info.all_backends, "| available:", info.available_backends)

What’s fast where

The headline numbers from the Validation & benchmarks page (single light curve, ~900 points; an RTX 5070 Ti vs. the CPU backends):

| Method | CPU backend | CPU time | GPU time | GPU speed-up |
| --- | --- | --- | --- | --- |
| GLS | finufft | 0.019 s | 0.007 s | ~3× |
| BLS | numba | 0.187 s | 0.091 s | ~2× |
| PDM | numpy | 1.79 s | 0.010 s | 178× |
| CE | numpy | 0.53 s | 0.012 s | 45× |
| String-Length | numpy | 0.92 s | 0.023 s | 39× |
| MHAOV | numpy | 3.56 s | 0.111 s | 32× |
| TLS | numpy | 4.49 s | 0.042 s | 106× |

How to read this:

  • GLS and BLS already have specialized CPU backends (finufft; the multicore numba box search). They’re so fast on the CPU that the GPU’s marginal gain on a single curve is modest, but the GPU still wins decisively for large grids and big catalogs (see Batch processing).

  • PDM, CE, String-Length, MHAOV, and TLS run on numpy on the CPU, so the GPU’s data parallelism delivers a 30–180× speed-up on a single curve. If you’re searching many periods or many stars with these, the GPU is a large win.

  • cuPeriod’s CPU path already beats the established reference tools it was checked against (GLS ~3× astropy, PDM ~4× PyAstronomy, BLS ~20× astropy’s BoxLeastSquares).
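As a quick check on the reading above, the speed-ups can be recomputed from the published timings. The sketch below uses only the rounded CPU and GPU times from the table, so its ratios can differ by a unit or so from the quoted column (PDM comes out at about 179 rather than 178, presumably because the quoted figure was derived from unrounded timings). The 30× threshold is just the lower end of the range quoted in the text.

```python
# (CPU seconds, GPU seconds) copied from the benchmark table.
TIMINGS = {
    "GLS": (0.019, 0.007),
    "BLS": (0.187, 0.091),
    "PDM": (1.79, 0.010),
    "CE": (0.53, 0.012),
    "String-Length": (0.92, 0.023),
    "MHAOV": (3.56, 0.111),
    "TLS": (4.49, 0.042),
}


def speedups(timings):
    """Return {method: cpu_time / gpu_time}."""
    return {method: cpu / gpu for method, (cpu, gpu) in timings.items()}


ratios = speedups(TIMINGS)

# Print methods from largest to smallest single-curve GPU gain.
for method, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{method:14s} {ratio:6.1f}x")

# Methods where the GPU gives at least a 30x gain on one light curve.
large_win = sorted(m for m, r in ratios.items() if r >= 30)
print(large_win)
```

The methods that clear the threshold are exactly the ones whose only CPU backend is numpy; GLS and BLS stay in the 2–3× range.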

Tip

Write backend="auto" and let cuPeriod choose. Reach for "cpu"/"gpu" or a concrete name only to benchmark, to pin a reference implementation, or to keep a long-running job off a busy GPU.
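For the benchmarking case, a minimal timing harness might look like the following. It is a sketch under stated assumptions: time_backends is a hypothetical helper, run is any callable you supply that takes a backend name (for example a wrapper around cup.periodogram), and the call is assumed to return only once the result is complete. The warm-up call is there because a first call often pays one-off costs such as JIT compilation or GPU context setup.

```python
import time


def time_backends(run, backends, repeats=3):
    """Return {backend: best wall-clock seconds} over `repeats` timed calls.

    `run` is a caller-supplied callable taking a backend name. It is
    assumed to block until the computation has finished.
    """
    results = {}
    for name in backends:
        run(name)  # warm-up call, excluded from the timing
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            run(name)
            best = min(best, time.perf_counter() - start)
        results[name] = best
    return results


# Hypothetical usage with cuPeriod (cup and lc as in the examples above):
#   timings = time_backends(
#       lambda name: cup.periodogram(lc, "PDM", backend=name),
#       ["cpu", "gpu"],
#   )

# Stand-in workload so the sketch runs without cuPeriod installed.
demo = time_backends(lambda name: sum(range(10_000)), ["cpu"], repeats=2)
print(demo)
```

Taking the minimum rather than the mean reduces the influence of unrelated load on the machine.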

When the GPU isn’t available

backend="auto" silently falls back to the CPU, so portable code “just works.” backend="gpu" is explicit and raises BackendUnavailableError when no CUDA device is present or the [gpu] extra is missing. Use it when you want to fail loudly rather than run on the CPU by accident.

try:
    pg = cup.periodogram(lc, "TLS", backend="gpu")
except cup.BackendUnavailableError as e:
    print("no GPU:", e)
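A middle ground between the silent "auto" fallback and failing outright is to try the GPU first and record which backend actually ran. The helper below is a sketch, not part of cuPeriod: run_with_fallback is a hypothetical name, and both the callable and the exception class are passed in so the example stays self-contained. With cuPeriod you would pass cup.BackendUnavailableError as the exception.

```python
def run_with_fallback(run, unavailable_exc, order=("gpu", "cpu")):
    """Try each backend in `order`; return (backend_used, result).

    `run` takes a backend name. `unavailable_exc` is the exception type
    that signals "this backend cannot be used here".
    """
    if not order:
        raise ValueError("order must list at least one backend")
    last_error = None
    for name in order:
        try:
            return name, run(name)
        except unavailable_exc as err:
            last_error = err  # remember why this backend was skipped
    raise last_error


# Hypothetical usage with cuPeriod:
#   used, pg = run_with_fallback(
#       lambda name: cup.periodogram(lc, "TLS", backend=name),
#       cup.BackendUnavailableError,
#   )

# Stand-ins so the sketch runs without cuPeriod or a GPU.
class FakeUnavailable(Exception):
    pass


def fake_run(name):
    if name == "gpu":
        raise FakeUnavailable("no CUDA device")
    return f"result from {name}"


used, result = run_with_fallback(fake_run, FakeUnavailable)
print(used, "->", result)
```

Returning the backend name alongside the result makes the fallback visible in logs, which is the piece that "auto" leaves implicit.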

Next: Tuning: settings & grids — settings and custom grids.