Requirements

v0.1 Scope

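Implement high-performance factorization machine (FM) and field-aware factorization machine (FFM) estimators with a scikit-learn-like Python API. Backend: Rust on CPU, bound to Python with PyO3 and built with maturin. Ground truth: pure-NumPy reference implementations in python/modern_fm/_reference.py.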
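For orientation, the second-order FM score that a reference implementation has to reproduce can be sketched in NumPy as follows. This is a minimal sketch, not the contents of `_reference.py`: the function names and the parameter names `w0`, `w`, and `V` are assumptions made for illustration. The fast version uses the standard identity that reduces the pairwise term to O(n·k); the naive double loop is only a cross-check.

```python
import numpy as np


def fm_predict_dense(X, w0, w, V):
    """Second-order FM scores for dense X of shape (n_samples, n_features).

    w0: scalar bias, w: (n_features,), V: (n_features, k) latent factors.
    Names and shapes are illustrative assumptions.
    """
    X = np.asarray(X, dtype=np.float64)
    linear = X @ w
    # sum_{i<j} <v_i, v_j> x_i x_j
    #   == 0.5 * sum_f [ (sum_i V[i,f] x_i)^2 - sum_i V[i,f]^2 x_i^2 ]
    xv = X @ V
    x2v2 = (X ** 2) @ (V ** 2)
    pairwise = 0.5 * np.sum(xv ** 2 - x2v2, axis=1)
    return w0 + linear + pairwise


def fm_predict_naive(X, w0, w, V):
    """O(n^2 * k) double loop over feature pairs, for correctness checks only."""
    X = np.asarray(X, dtype=np.float64)
    n_samples, n_features = X.shape
    out = w0 + X @ w
    for i in range(n_features):
        for j in range(i + 1, n_features):
            out += (V[i] @ V[j]) * X[:, i] * X[:, j]
    return out
```

A correctness test in the spirit of Phase 1 would assert that both functions agree on random inputs, and later that the Rust backend agrees with the fast version.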

Supported tasks

Regression

Squared loss
Optional MAE metric in evaluation helper
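The regression pieces listed above are small enough to sketch directly. The function names and the 0.5 scaling of the squared loss are assumptions; the scaling is chosen here only so that the gradient with respect to the prediction is the plain residual.

```python
import numpy as np


def squared_loss(y_true, y_pred):
    """Mean of 0.5 * (y_pred - y_true)^2 (the 0.5 factor is an assumed convention)."""
    r = np.asarray(y_pred, dtype=np.float64) - np.asarray(y_true, dtype=np.float64)
    return float(0.5 * np.mean(r ** 2))


def squared_loss_grad(y_true, y_pred):
    """Per-sample derivative of 0.5 * (y_pred - y_true)^2 with respect to y_pred."""
    return np.asarray(y_pred, dtype=np.float64) - np.asarray(y_true, dtype=np.float64)


def mean_absolute_error(y_true, y_pred):
    """MAE as an evaluation metric; it is not used as a training loss here."""
    r = np.asarray(y_pred, dtype=np.float64) - np.asarray(y_true, dtype=np.float64)
    return float(np.mean(np.abs(r)))
```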

Binary classification

Logistic loss
predict_proba, predict, decision_function
class_weight, sample_weight, label_smoothing
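One way the three options above could combine in the binary loss is sketched below. Everything here is an assumption for illustration: the helper names, `class_weight` being a `{label: weight}` dict, class and sample weights being multiplied together, and label smoothing pulling hard 0/1 targets toward 0.5.

```python
import numpy as np


def effective_weights(y, class_weight=None, sample_weight=None):
    """Combine per-class and per-sample weights into one weight per sample."""
    y = np.asarray(y)
    if sample_weight is None:
        w = np.ones(y.shape[0], dtype=np.float64)
    else:
        w = np.asarray(sample_weight, dtype=np.float64).copy()
    if class_weight is not None:
        # class_weight is assumed to be a {label: weight} dict.
        w *= np.array([class_weight[label] for label in y.tolist()], dtype=np.float64)
    return w


def binary_logistic_loss(y, z, sample_weight=None, label_smoothing=0.0):
    """Weighted mean logistic loss for labels y in {0, 1} and raw scores z."""
    y = np.asarray(y, dtype=np.float64)
    z = np.asarray(z, dtype=np.float64)
    # Assumed smoothing convention: targets move from {0, 1} toward 0.5.
    t = y * (1.0 - label_smoothing) + 0.5 * label_smoothing
    # Numerically stable form of -[t*log(sigmoid(z)) + (1-t)*log(1-sigmoid(z))].
    per_sample = np.maximum(z, 0.0) - t * z + np.log1p(np.exp(-np.abs(z)))
    return float(np.average(per_sample, weights=sample_weight))
```

The stable form avoids overflow for large positive or negative scores, which matters once the Rust backend and the reference are compared at float32 precision.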

Multiclass classification

Softmax loss
predict_proba, predict
class_weight, sample_weight, label_smoothing
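A sketch of the softmax loss with label smoothing, under the assumption that labels are integers in `0..n_classes-1` and that smoothing spreads `label_smoothing` uniformly over all classes. The function names are illustrative.

```python
import numpy as np


def softmax(Z):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    Z = np.asarray(Z, dtype=np.float64)
    shifted = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(shifted)
    return E / E.sum(axis=1, keepdims=True)


def softmax_cross_entropy(y, Z, label_smoothing=0.0):
    """Mean cross-entropy between smoothed one-hot targets and softmax(Z)."""
    Z = np.asarray(Z, dtype=np.float64)
    n_samples, n_classes = Z.shape
    # Assumed convention: each class gets label_smoothing / n_classes,
    # and the true class gets the remaining 1 - label_smoothing on top.
    T = np.full((n_samples, n_classes), label_smoothing / n_classes)
    T[np.arange(n_samples), np.asarray(y)] += 1.0 - label_smoothing
    shifted = Z - Z.max(axis=1, keepdims=True)
    log_p = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-(T * log_p).sum(axis=1).mean())
```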

Input types

Must support:

- numpy.ndarray dense, float32/float64
- scipy.sparse.csr_matrix / csr_array
- categorical integer arrays through a helper encoder
- explicit field_ids array for FFM (fit(X, y, field_ids=...), required; no automatic field inference in v0.1)

Nice to have (v0.2+):

- pandas / polars DataFrame
- libffm text format loader
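To illustrate why FFM needs an explicit `field_ids` array, here is a naive reference-style FFM score in NumPy. It assumes `field_ids[i]` is the integer field of feature column `i` and that the factors are stored as `V[feature, field, k]`; both the layout and the function name are assumptions for this sketch.

```python
import numpy as np


def ffm_predict_dense(X, field_ids, w0, w, V):
    """Naive FFM scores for dense X of shape (n_samples, n_features).

    field_ids: (n_features,) integer field of each feature column.
    V: (n_features, n_fields, k) field-aware latent factors (assumed layout).
    """
    X = np.asarray(X, dtype=np.float64)
    field_ids = np.asarray(field_ids)
    n_samples, n_features = X.shape
    out = w0 + X @ w
    for i in range(n_features):
        for j in range(i + 1, n_features):
            # Feature i uses the factor it keeps for j's field, and vice versa.
            vij = V[i, field_ids[j]] @ V[j, field_ids[i]]
            out += vij * X[:, i] * X[:, j]
    return out
```

When every feature stores the same factor for all fields, this reduces to the plain FM score, which gives a convenient sanity check for the reference tests.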

Optimizers

v0.1: SGD, AdaGrad
v0.2: Adam, FTRL-Proximal
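The two v0.1 update rules differ only in how the step is scaled. A minimal sketch, with illustrative function names and an assumed `eps` default, updating parameters in place:

```python
import numpy as np


def sgd_step(param, grad, lr):
    """Plain SGD: one global learning rate for every coordinate."""
    param -= lr * grad
    return param


def adagrad_step(param, grad, accum, lr, eps=1e-8):
    """AdaGrad: per-coordinate step scaled by accumulated squared gradients."""
    accum += grad ** 2
    param -= lr * grad / (np.sqrt(accum) + eps)
    return param, accum
```

AdaGrad needs one accumulator array per parameter array (bias, linear weights, factors), which is extra state that serialization would have to decide whether to persist.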

Regularization

L2 on linear weights (l2_linear)
L2 on latent factors (l2_factors)
v0.2+: optional L1 on linear weights, dropout on pairwise interactions
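How `l2_linear` and `l2_factors` might enter the per-sample FM gradients is sketched below. The penalty is assumed to be `0.5 * l2 * ||param||^2`, so it contributes `l2 * param` to the gradient; leaving the bias unregularized is also an assumption. The function name and signature are hypothetical.

```python
import numpy as np


def fm_sample_gradients(x, dloss_dscore, w, V, l2_linear=0.0, l2_factors=0.0):
    """Gradients of loss + L2 penalty for one dense sample x.

    dloss_dscore is the derivative of the loss with respect to the FM score.
    """
    xv = x @ V  # shape (k,)
    grad_w0 = dloss_dscore  # bias left unregularized (assumed)
    grad_w = dloss_dscore * x + l2_linear * w
    # d score / d V[i, f] = x_i * xv[f] - V[i, f] * x_i^2
    dscore_dV = x[:, None] * xv[None, :] - V * (x ** 2)[:, None]
    grad_V = dloss_dscore * dscore_dV + l2_factors * V
    return grad_w0, grad_w, grad_V
```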

Early stopping

eval_set, eval_metric, patience, min_delta, restore_best_weights
or internal split via early_stopping=True + validation_fraction
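The interaction of `patience` and `min_delta` can be sketched with a small tracker. The class name `EarlyStopper` and its `update` method are hypothetical, and the sketch assumes a lower-is-better metric such as validation loss.

```python
class EarlyStopper:
    """Tracks a validation score and signals when training should stop."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_score = float("inf")
        self.best_epoch = -1
        self.bad_epochs = 0

    def update(self, epoch, score):
        """Record a score (lower is better); return True when training should stop."""
        if score < self.best_score - self.min_delta:
            # Improvement of more than min_delta: reset the patience counter.
            self.best_score = score
            self.best_epoch = epoch
            self.bad_epochs = 0
            return False
        self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

With `restore_best_weights`, the training loop would additionally snapshot the parameters whenever `best_epoch` changes and copy them back after stopping.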

Reproducibility

random_state parameter
deterministic initialization and data shuffling under a fixed seed
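A sketch of what "deterministic under a fixed seed" could mean in practice: a single NumPy `Generator` seeded from `random_state` drives both factor initialization and per-epoch shuffling. The function name, the `init_scale` default, and treating `random_state` as an integer seed are assumptions.

```python
import numpy as np


def init_and_shuffle(n_samples, n_features, k, random_state, n_epochs=2, init_scale=0.01):
    """Return initial factors and one sample permutation per epoch."""
    rng = np.random.default_rng(random_state)
    # Small Gaussian initialization of the latent factors.
    V = rng.normal(0.0, init_scale, size=(n_features, k))
    # One independent permutation per epoch, all drawn from the same stream.
    orders = [rng.permutation(n_samples) for _ in range(n_epochs)]
    return V, orders
```

Because both draws come from one stream, the order of calls matters: the Rust backend would need to consume randomness in the same sequence, or receive the initial factors and permutations from Python, to match the reference exactly.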

Serialization

save_model(path) / load_model(path)
pickle-compatible Python wrapper
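A minimal sketch of how both serialization paths could coexist on a Python wrapper. The class `FMModelSketch`, the attribute names, the choice of NumPy's `.npz` container as the on-disk format, and `load_model` being a classmethod are all assumptions; a wrapper around a Rust object would export its arrays in `__getstate__` in a similar way.

```python
import pickle

import numpy as np


class FMModelSketch:
    """Hypothetical holder for fitted FM parameters."""

    def __init__(self, w0=0.0, w=None, V=None):
        self.w0 = w0
        self.w = w
        self.V = V

    def __getstate__(self):
        # Expose plain Python/NumPy state so pickle never touches native handles.
        return {"w0": self.w0, "w": self.w, "V": self.V}

    def __setstate__(self, state):
        self.w0 = state["w0"]
        self.w = state["w"]
        self.V = state["V"]

    def save_model(self, path):
        # Writing through a file object keeps the path exactly as given.
        with open(path, "wb") as fh:
            np.savez(fh, w0=self.w0, w=self.w, V=self.V)

    @classmethod
    def load_model(cls, path):
        with np.load(path) as data:
            return cls(float(data["w0"]), data["w"], data["V"])
```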

Phase plan within v0.1

Phase 0: docs + package skeleton ← done
Phase 1: Python reference predictions + losses + correctness tests ← current
Phase 2: Rust CPU backend (CSR bridge, FM/FFM predict + SGD/AdaGrad fit)
Phase 3: sklearn API polish (mixins, check_is_fitted, validation)
Phase 4: early_stopping, label_smoothing, class_weight, sample_weight, save/load