This guide will demonstrate and provide examples on how to use the generate_series function in PostgreSQL to generate sequential data such as numbers and dates and use them in your queries.
The PostgreSQL generate_series Function Syntax
The generate_series function generates sequential data ranging from a starting value to a stopping value, incremented with a given step value.
The syntax for the generate_series function is as follows:
generate_series(start, stop, step)
Note that:
- The start value is the value the series will start at, and is included in the resulting set of data
- The stop value is the value the series will stop at, and is also included in the resulting set of data if it is am increment of the step value
- The step is the amount each value included in the series is incremented by to get the next value, before the stop value is reached
Generating a Series of Numbers
The simplest example of using the PostgreSQL generate_series function is generating a range of integer numbers:
SELECT generate_series(1, 9, 1);
Above, a SELECT statement outputs the results of a call to the generate_series function. The function is instruction to generate a set of numbers starting at 1, ending at 9, with an increment of 1, resulting in the following output:
generate_series ----------------- 1 2 3 4 5 6 7 8 9 (9 rows)
Decimal numbers can also be provided:
SELECT generate_series(1.1, 1.8, 0.1);
Which results in:
generate_series ----------------- 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 (8 rows)
If the end value is not a full increment away from the preceding value, it is not included in the results:
SELECT generate_series(0, 7, 2); generate_series ----------------- 0 2 4 6 (4 rows)
Generating a Series of Dates
In addition to generating series of integers or decimal numbers, generate_series can also generate a series of dates, with a time interval used as the step value.
SELECT generate_series('2023-01-01'::date, '2023-01-06'::date, '1 day'::interval);
The above statement will generate a list of dates from January 1 to January 6 2023, with a 1 day step/interval between the values, resulting in the following:
generate_series ------------------------ 2023-01-01 00:00:00+00 2023-01-02 00:00:00+00 2023-01-03 00:00:00+00 2023-01-04 00:00:00+00 2023-01-05 00:00:00+00 2023-01-06 00:00:00+00 (6 rows)
Though date objects were supplied, generate_series will return datetime objects, so any time interval can be supplied:
SELECT generate_series('2023-01-01'::date, '2023-01-06'::date, '18 hours'::interval);
Resulting in the following output:
generate_series ------------------------ 2023-01-01 00:00:00+00 2023-01-01 18:00:00+00 2023-01-02 12:00:00+00 2023-01-03 06:00:00+00 2023-01-04 00:00:00+00 2023-01-04 18:00:00+00 2023-01-05 12:00:00+00 (7 rows)
Generating a Series of TimeStamps
Instead of dates, timestamps can be supplied so that the start and stop points, and interval, can be defined down to the second:
SELECT generate_series('2023-01-01 00:00:00'::timestamp, '2023-01-01 05:00:00'::timestamp, '1 hour'::interval);
The above example results in the output:
generate_series --------------------- 2023-01-01 00:00:00 2023-01-01 01:00:00 2023-01-01 02:00:00 2023-01-01 03:00:00 2023-01-01 04:00:00 2023-01-01 05:00:00 (6 rows)
Generating a Series in Reverse Order
To generate a series in descending order, simply supply the start and end versions in reverse (so that the start value is greater than the end value), and provide a negative value for the step/increment:
SELECT generate_series(7, 2, -1);
The above example will result in the following output:
generate_series ----------------- 7 6 5 4 3 2 (6 rows)
Using generate_series in Queries
The generate_series function is useful for grouping data – for example by generating a list of dates, you can then query a database to group data on those dates and then perform additional actions, like summing or averaging the values for that date, and outputting that summarized information. In the example below, the number of new users that are created in a users table for each date in the past week is calculated:
SELECT d::date, COUNT(u.id) AS new_users FROM generate_series(CURRENT_DATE - INTERVAL '1 week', CURRENT_DATE, '1 day') AS d LEFT JOIN users u ON u.created_at::date = d::date GROUP BY d::date ORDER BY d::date;
In the above example, the CURRENT_DATE function is used as the starting value for generate_series, creating a series of dates for the past week. This is then LEFT JOINED to the users table, through the created_at column, and a COUNT is taken. This query returns a result set with two columns: d::date (the date) and new_users (the number of new users for that date).
Unlike simply grouping the values in the created_at column by date, by using generate_series, a row will be created in the results even if no users signed up on that date and it does not appear in the created_at column. Missing date values for dates with no user sign ups would make the data unsuitable for visualization with graphs.
Using generate_series to Generate Data
The generate_series function is also commonly used to generate test or mock data. In the example below, the test_data table is populated with a unique sequential ID and a random value.
INSERT INTO test_data (id, random_number, sequential_date) VALUES (generate_series(1, 10), trunc(random()*10 * 2), generate_series('2023-01-01'::date, '2023-01-10'::date, '1 day'::interval));
The sequential unique ID for each row is generated using generate_series, and a random number is generated along with a sequential date – in this case the data in the test_data table is meaningless, but randomly generated data combined with data from generate_series could be used to create fake data representing people with different birthdays, heights, weights, and other attributes.