I'm using the Ruptures package and not getting the changepoints I'd expected. I got this result:
Take a look at 2024-05-26. To a human eye, this is clearly a major changepoint: the line has been trending downward and, at that date, suddenly starts climbing. However, Ruptures identifies the changepoint as 2024-06-09 instead, two weeks after the obvious change.
What's going on here? Why is Ruptures picking this date and how can I improve the model?
Here is my code:
import statistics
import numpy as np
import pandas as pd
import ruptures as rpt
import matplotlib.pyplot as plt
# Data Import
data = pd.read_csv('my_data.csv')
# Model calibration
algo = rpt.Pelt(model='l2', min_size=20, jump=1).fit(data)
change_points = algo.predict(pen=statistics.variance(data['effect'])) # Using the variance of the key metric as the penalty.
# List change points:
print("Change points detected at indices:", data.index[change_points[:-1]])
# Visualize data:
rpt.display(data, change_points, figsize=(10, 6))
plt.xlabel('Date')
plt.ylabel('Cumulative Effect')
plt.xticks(ticks=range(0, len(data), len(data)//10), labels=data.index[::len(data)//10])
plt.grid(True)
plt.show()
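One thing that may be relevant: as far as I understand it, model='l2' scores a segment by its squared deviation from the segment mean, so it looks for shifts in level rather than changes in slope. Below is a minimal sketch in plain Python of that idea. The V-shaped series is synthetic, and `l2_cost` and `best_single_split` are made-up helper names, not the Ruptures implementation or my real data. It searches for the single split that minimises the summed cost and compares it with the index where the slope actually flips.

```python
def l2_cost(segment):
    """Sum of squared deviations from the segment mean (a mean-shift cost)."""
    mean = sum(segment) / len(segment)
    return sum((x - mean) ** 2 for x in segment)


def best_single_split(signal, min_size=1):
    """Return the index k minimising l2_cost(signal[:k]) + l2_cost(signal[k:])."""
    candidates = range(min_size, len(signal) - min_size + 1)
    return min(candidates, key=lambda k: l2_cost(signal[:k]) + l2_cost(signal[k:]))


# Synthetic series (illustrative only): falls for 40 steps, then rises for 60.
turn = 40
signal = [float(turn - t) for t in range(turn)] + [float(t - turn) for t in range(turn, 100)]

split = best_single_split(signal, min_size=20)
print("slope changes at index:", turn)
print("best mean-shift split at index:", split)
```

On this toy series the chosen split lands after the turning point, because a piecewise-constant fit gains more by separating the high values at the end of the climb than by cutting at the bottom of the V. If the same thing is happening with my data, a cost that models a trend per segment might be a better fit than 'l2', but that is an assumption I have not verified.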