Time Series Clustering
Time series clustering is a technique used to group similar time series together. This can be useful for finding patterns in data, detecting anomalies, or reducing the dimensionality of large datasets.
augurs provides several clustering algorithms, including DBSCAN (Density-Based Spatial Clustering of Applications with Noise). DBSCAN is particularly well-suited to time series data because it:
- Doesn't require specifying the number of clusters upfront
- Can find arbitrarily shaped clusters
- Can identify noise points that don't belong to any cluster
- Works well with Dynamic Time Warping (DTW) distance measures
Basic Example
Let's start with a simple example using DBSCAN clustering:
    use augurs::{
        clustering::{DbscanCluster, DbscanClusterer},
        dtw::Dtw,
    };

    // Sample time series data
    const SERIES: &[&[f64]] = &[
        &[0.0, 1.0, 2.0, 3.0, 4.0],
        &[0.1, 1.1, 2.1, 3.1, 4.1],
        &[5.0, 6.0, 7.0, 8.0, 9.0],
        &[5.1, 6.1, 7.1, 8.1, 9.1],
        &[10.0, 11.0, 12.0, 13.0, 14.0],
    ];

    fn main() {
        // Compute distance matrix using DTW
        let distance_matrix = Dtw::euclidean()
            .with_window(2)
            .with_lower_bound(4.0)
            .with_upper_bound(10.0)
            .with_max_distance(10.0)
            .distance_matrix(SERIES);

        // Set DBSCAN parameters
        let epsilon = 0.5;
        let min_cluster_size = 2;

        // Perform clustering
        let clusters = DbscanClusterer::new(epsilon, min_cluster_size)
            .fit(&distance_matrix);

        // Each series is labeled either `Noise` or `Cluster(id)`,
        // where cluster ids are non-zero and start at 1
        assert_eq!(
            clusters,
            vec![
                DbscanCluster::Cluster(1.try_into().unwrap()),
                DbscanCluster::Cluster(1.try_into().unwrap()),
                DbscanCluster::Cluster(2.try_into().unwrap()),
                DbscanCluster::Cluster(2.try_into().unwrap()),
                DbscanCluster::Noise,
            ]
        );
    }
Understanding Parameters
DTW Parameters
- window: Size of the Sakoe-Chiba band used to constrain the DTW computation
- lower_bound: Minimum distance to consider
- upper_bound: Maximum distance to consider
- max_distance: Early termination threshold
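To make the effect of the window parameter concrete, here is a minimal, standard-library-only sketch of DTW constrained to a Sakoe-Chiba band. It is not the augurs implementation: the function name `dtw_distance`, the squared-difference point cost, and the final square root are all assumptions chosen for illustration.

```rust
/// Illustrative DTW distance (hypothetical helper, not part of augurs).
///
/// Uses a squared-difference cost per aligned pair and returns the square
/// root of the cheapest total cost. `window` is the largest allowed
/// difference between aligned indices (the Sakoe-Chiba band half-width);
/// `None` leaves the alignment unconstrained.
fn dtw_distance(a: &[f64], b: &[f64], window: Option<usize>) -> f64 {
    let (n, m) = (a.len(), b.len());
    if n == 0 || m == 0 {
        return f64::INFINITY;
    }
    // The band must be at least |n - m| wide, or no path can reach the end.
    let w = window.map_or(n.max(m), |w| w.max(n.abs_diff(m)));

    // cost[i][j] = cheapest cost of aligning a[..i] with b[..j].
    let mut cost = vec![vec![f64::INFINITY; m + 1]; n + 1];
    cost[0][0] = 0.0;

    for i in 1..=n {
        // Only cells with |i - j| <= w are visited.
        let lo = i.saturating_sub(w).max(1);
        let hi = (i + w).min(m);
        for j in lo..=hi {
            let d = (a[i - 1] - b[j - 1]).powi(2);
            let best = cost[i - 1][j]
                .min(cost[i][j - 1])
                .min(cost[i - 1][j - 1]);
            cost[i][j] = d + best;
        }
    }
    cost[n][m].sqrt()
}
```

A window of 0 forces a point-by-point comparison for equal-length series, while a wider window lets shifted series align at the cost of visiting more cells. For example, `[0, 0, 1, 2, 3]` and `[0, 1, 2, 3, 3]` are identical up to a one-step shift: a window of 1 or more aligns them perfectly, whereas a window of 0 does not.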
DBSCAN Parameters
- epsilon: Maximum distance between two points for one to be considered in the neighborhood of the other
- min_cluster_size: Minimum number of points required to form a dense region
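The following sketch shows how these two parameters interact in a textbook DBSCAN run over a precomputed distance matrix. It is a simplified illustration, not the augurs implementation: the function `dbscan`, the `Option<u32>` label type (`None` for noise, `Some(id)` for a cluster, ids starting at 1), and the choice to count a point as part of its own neighborhood are assumptions.

```rust
/// Illustrative DBSCAN over a square distance matrix (hypothetical helper,
/// not part of augurs). Returns `None` for noise and `Some(id)` for cluster
/// membership, with ids starting at 1.
fn dbscan(dist: &[Vec<f64>], epsilon: f64, min_cluster_size: usize) -> Vec<Option<u32>> {
    let n = dist.len();
    let mut labels: Vec<Option<u32>> = vec![None; n];
    let mut visited = vec![false; n];
    let mut next_id = 0u32;

    // Indices within `epsilon` of point `i`, including `i` itself.
    let neighbors = |i: usize| -> Vec<usize> {
        (0..n).filter(|&j| dist[i][j] <= epsilon).collect()
    };

    for i in 0..n {
        if visited[i] {
            continue;
        }
        visited[i] = true;
        let seeds = neighbors(i);
        if seeds.len() < min_cluster_size {
            // Not a core point: stays noise unless a core point reaches it later.
            continue;
        }
        // `i` is a core point, so it starts a new cluster.
        next_id += 1;
        labels[i] = Some(next_id);
        let mut queue = seeds;
        while let Some(j) = queue.pop() {
            if labels[j].is_none() {
                labels[j] = Some(next_id);
            }
            if visited[j] {
                continue;
            }
            visited[j] = true;
            let reachable = neighbors(j);
            if reachable.len() >= min_cluster_size {
                // `j` is also a core point: keep expanding through it.
                queue.extend(reachable);
            }
        }
    }
    labels
}
```

With this sketch, raising `epsilon` merges nearby groups into fewer, larger clusters, while raising `min_cluster_size` turns small groups into noise.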
Best Practices
- Distance Measure Selection
  - Use DTW for time series that might be shifted or warped
  - Consider the computational cost of DTW for large datasets: a full distance matrix needs one DTW computation per pair of series
  - Experiment with different window sizes to balance accuracy and performance
- Parameter Tuning
  - Start with a relatively large epsilon and reduce it if clusters are too large
  - Set min_cluster_size based on your domain knowledge
  - Use the DTW window parameter to prevent pathological alignments
- Performance Optimization
  - Enable parallel processing for large datasets
  - Use DTW bounds to speed up distance calculations
  - Consider downsampling very long time series
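Two of the suggestions above can be sketched with the standard library alone. The helpers below are hypothetical and not part of augurs. `k_distances` implements a common heuristic for picking a starting epsilon: sort every point's distance to its k-th nearest neighbor and look for a sharp jump. `downsample_mean` shortens a series by averaging fixed-size chunks, which is one simple way to downsample.

```rust
/// Distance from each point to its k-th nearest neighbor (excluding itself),
/// sorted ascending. A sharp jump in the result is a common starting guess
/// for epsilon. Hypothetical helper; expects a square distance matrix.
fn k_distances(dist: &[Vec<f64>], k: usize) -> Vec<f64> {
    let mut out: Vec<f64> = dist
        .iter()
        .enumerate()
        .filter_map(|(i, row)| {
            // Distances from point `i` to every other point, sorted ascending.
            let mut others: Vec<f64> = row
                .iter()
                .enumerate()
                .filter(|&(j, _)| j != i)
                .map(|(_, &d)| d)
                .collect();
            others.sort_by(|a, b| a.total_cmp(b));
            // Skip the point if k is 0 or larger than the number of other points.
            others.get(k.checked_sub(1)?).copied()
        })
        .collect();
    out.sort_by(|a, b| a.total_cmp(b));
    out
}

/// Downsample by averaging consecutive, non-overlapping chunks of `factor`
/// points. The last chunk may be shorter than `factor`.
fn downsample_mean(series: &[f64], factor: usize) -> Vec<f64> {
    assert!(factor > 0, "factor must be positive");
    series
        .chunks(factor)
        .map(|chunk| chunk.iter().sum::<f64>() / chunk.len() as f64)
        .collect()
}
```

Averaging smooths out short spikes, so if spikes are the feature of interest, a different reduction (for example, the maximum of each chunk) may be a better fit.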
Example: Clustering with Multiple Distance Measures
    use augurs::{clustering::DbscanClusterer, dtw::Dtw};

    fn compare_distance_measures(series: &[&[f64]]) {
        // Euclidean DTW
        let euclidean_matrix = Dtw::euclidean()
            .distance_matrix(series);
        let euclidean_clusters = DbscanClusterer::new(0.5, 2)
            .fit(&euclidean_matrix);

        // Manhattan DTW
        let manhattan_matrix = Dtw::manhattan()
            .distance_matrix(series);
        let manhattan_clusters = DbscanClusterer::new(0.5, 2)
            .fit(&manhattan_matrix);

        // Compare results
        println!("Euclidean clusters: {:?}", euclidean_clusters);
        println!("Manhattan clusters: {:?}", manhattan_clusters);
    }
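Printing the two label vectors works for a handful of series, but a single agreement score is easier to read at scale. One option is the Rand index: the fraction of series pairs that both clusterings treat the same way (together in both, or apart in both). The sketch below is a hypothetical helper, not part of augurs. It assumes labels have been converted to `Option<u32>` (`None` for noise), and it treats each noise point as belonging to no cluster.

```rust
/// Rand index between two labelings of the same items (hypothetical helper).
/// Returns a value in [0, 1]; 1.0 means the clusterings group every pair of
/// items identically, regardless of how the cluster ids are numbered.
fn rand_index(a: &[Option<u32>], b: &[Option<u32>]) -> f64 {
    assert_eq!(a.len(), b.len(), "labelings must cover the same items");
    let n = a.len();
    if n < 2 {
        return 1.0;
    }
    // Two items are "together" only if both are in the same (non-noise) cluster.
    let together = |labels: &[Option<u32>], i: usize, j: usize| {
        labels[i].is_some() && labels[i] == labels[j]
    };

    let mut agree = 0usize;
    let mut pairs = 0usize;
    for i in 0..n {
        for j in (i + 1)..n {
            pairs += 1;
            if together(a, i, j) == together(b, i, j) {
                agree += 1;
            }
        }
    }
    agree as f64 / pairs as f64
}
```

Because the score depends only on which pairs are grouped together, two clusterings that differ only in how clusters are numbered still score 1.0.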
Next Steps
- Learn about outlier detection using clustering
- Explore seasonality analysis for clustered time series
- Understand feature extraction for time series