Search timeseries data by date range

October 23, 2018 by rdagumampan

We’ve always been working with timeseries data and at one point we all once came accross a challenge of searching data points based on a time period. Yesterday, while we designing our API, our team shared this interesting representation from OSISoft API and I thought this could be fun code :)!

For copyright reasons, I have re-draw the visualization. Headers from left are time, headers from right are search results whre [x] are matching rows for given search mode. Red line represents the search parameters.

image

TL;DR? The source code is available here.

Context: Given the following time series data, we would like to search matching data based on time range. A time range is a period for time with Start and End. The underlying query language will be ANSI-SQL92 but would work with any language.

Mode Description
StartInclusive All data that started within the search period.
EndInclusive All data that ended within the search period.
Inclusive All data that started and ended within the search period.
Overlapped All data that overlapped the search period at any point in time.
InProgress All data that started within the search period and end time is Datetime.MaxValue or NULL.

Setup Database

CREATE DATABASE [SEARCHDATAL];
GO

CREATE TABLE [dbo].[TimeseriesData](
	[Id] [nvarchar](50) NOT NULL,
	[StartTimeUtc] [datetime2](7) NOT NULL,
	[EndTimeUtc] [datetime2](7) NOT NULL
) ON [PRIMARY]
GO

Setup Test Data

[TestInitialize]
public void TestSetup()
{
    var timeSeriesData = new List<TimeseriesData> {
        new TimeseriesData{Id="#1", StartTimeUtc = new DateTime(2018,01,10,0,0,0,DateTimeKind.Utc), EndTimeUtc = new DateTime(2018,01,24,0,0,0,DateTimeKind.Utc)},
        new TimeseriesData{Id="#2", StartTimeUtc = new DateTime(2018,01,10,0,0,0,DateTimeKind.Utc), EndTimeUtc = new DateTime(2018,01,20,0,0,0,DateTimeKind.Utc)},
        new TimeseriesData{Id="#3", StartTimeUtc = new DateTime(2018,01,10,0,0,0,DateTimeKind.Utc), EndTimeUtc = new DateTime(2018,01,14,0,0,0,DateTimeKind.Utc)},
        new TimeseriesData{Id="#4", StartTimeUtc = new DateTime(2018,01,10,0,0,0,DateTimeKind.Utc), EndTimeUtc = new DateTime(2018,01,12,0,0,0,DateTimeKind.Utc)},
        new TimeseriesData{Id="#5", StartTimeUtc = new DateTime(2018,01,10,0,0,0,DateTimeKind.Utc), EndTimeUtc = new DateTime(2018,01,11,0,0,0,DateTimeKind.Utc)},

        new TimeseriesData{Id="#6", StartTimeUtc = new DateTime(2018,01,12,0,0,0,DateTimeKind.Utc), EndTimeUtc = new DateTime(2018,01,24,0,0,0,DateTimeKind.Utc)},
        new TimeseriesData{Id="#7", StartTimeUtc = new DateTime(2018,01,12,0,0,0,DateTimeKind.Utc), EndTimeUtc = new DateTime(2018,01,20,0,0,0,DateTimeKind.Utc)},
        new TimeseriesData{Id="#8", StartTimeUtc = new DateTime(2018,01,12,0,0,0,DateTimeKind.Utc), EndTimeUtc = new DateTime(2018,01,17,0,0,0,DateTimeKind.Utc)},

        new TimeseriesData{Id="#9", StartTimeUtc = new DateTime(2018,01,16,0,0,0,DateTimeKind.Utc), EndTimeUtc = new DateTime(2018,01,24,0,0,0,DateTimeKind.Utc)},
        new TimeseriesData{Id="#10", StartTimeUtc = new DateTime(2018,01,16,0,0,0,DateTimeKind.Utc), EndTimeUtc = new DateTime(2018,01,20,0,0,0,DateTimeKind.Utc)},
        new TimeseriesData{Id="#11", StartTimeUtc = new DateTime(2018,01,16,0,0,0,DateTimeKind.Utc), EndTimeUtc = new DateTime(2018,01,18,0,0,0,DateTimeKind.Utc)},

        new TimeseriesData{Id="#12", StartTimeUtc = new DateTime(2018,01,20,0,0,0,DateTimeKind.Utc), EndTimeUtc = new DateTime(2018,01,24,0,0,0,DateTimeKind.Utc)},
        new TimeseriesData{Id="#13", StartTimeUtc = new DateTime(2018,01,21,0,0,0,DateTimeKind.Utc), EndTimeUtc = new DateTime(2018,01,24,0,0,0,DateTimeKind.Utc)},
    };

    using (var connection = new SqlConnection(connectionString))
    {
        connection.Truncate<TimeseriesData>();
        connection.BulkInsert(timeSeriesData);
    }
}

Test / StartInclusive

This is an excerpt in solving the StartInclusive search mode. Solutions to all other modes can be found in the source code.

[TestMethod]
public void SearchStartingBetweenTest()
{
    var startTimeUtc = new DateTime(2018, 01, 12, 0, 0, 0, DateTimeKind.Utc);
    var endTimeUtc = new DateTime(2018, 01, 20, 0, 0, 0, DateTimeKind.Utc);

    var query = @"
        SELECT 
            TD.[Id]
        ,TD.[StartTimeUtc]
        ,TD.[EndTimeUtc]
        FROM [SEARCHDATAL].[dbo].[TimeseriesData] TD
        WHERE
            TD.StartTimeUtc >= @startTimeUtc
            AND TD.StartTimeUtc < @endTimeUtc;
        ";

    var expected = new string[] { "#6", "#7", "#8", "#9", "#10", "#11", };
    using (var connection = new SqlConnection(connectionString))
    {
        var results = connection.ExecuteQuery<TimeseriesData>(query, new { startTimeUtc, endTimeUtc }).ToList();
        results.Count().ShouldBe(6);
        results.All(r => expected.Contains(r.Id)).ShouldBe(true);
        expected.All(r => results.Select(s=> s.Id).Contains(r)).ShouldBe(true);
    }
}

image

Conclusion

At first, the problem looks very complex but it turns out to be quite simple when aided with good data visualization. As it most timeseries problems, visualizing the scenario helps a lot in driving the solution and identifying all possible outcomes.

References

© 2017 | About | Contact | Follow me on Twitter | Powerered by Hucore & Hugo