If you learn one new T-SQL (i.e., Microsoft SQL Server) concept today it should be ROW_NUMBER(). Introduced in SQL 2005, this function is one of 4 window ranking functions (the others are RANK(), DENSE_RANK() and NTILE()). Oracle SQL has a similar capability.
Let’s first create some sample data we can use for a demonstration.
CREATE TABLE #ROWNUMBER_Demo ( ID INT ,MyDate DATETIME ,Price MONEY ,PRIMARY KEY (ID, MyDate) );
INSERT INTO #ROWNUMBER_Demo SELECT 1, '2012-03-04', 23.22 UNION ALL SELECT 1, '2012-03-15', 25.15 UNION ALL SELECT 1, '2012-05-10', 28.47 UNION ALL SELECT 2, '2012-02-28', 15.10 UNION ALL SELECT 2, '2012-03-22', 18.22 UNION ALL SELECT 2, '2012-05-01', 21.43 UNION ALL SELECT 3, '2012-04-01', 45.06 UNION ALL SELECT 3, '2012-05-12', 48.23 UNION ALL SELECT 3, '2012-06-01', 51.66;
SELECT * FROM #ROWNUMBER_Demo;
The results in our sample table are:
ID MyDate Price 1 2012-03-04 00:00:00.000 23.22 1 2012-03-15 00:00:00.000 25.15 1 2012-05-10 00:00:00.000 28.47 2 2012-02-28 00:00:00.000 15.10 2 2012-03-22 00:00:00.000 18.22 2 2012-05-01 00:00:00.000 21.43 3 2012-04-01 00:00:00.000 45.06 3 2012-05-12 00:00:00.000 48.23 3 2012-06-01 00:00:00.000 51.66
To use the ROW_NUMBER() function you must supply an OVER predicate consisting of PARTITION (optional) and ORDER BY (required). Let’s take a look at an example and see the results.
SELECT ID, MyDate ,rn1=ROW_NUMBER() OVER (PARTITION BY ID ORDER BY MyDate) ,rn2=ROW_NUMBER() OVER (ORDER BY MyDate) FROM #ROWNUMBER_Demo ORDER BY ID, MyDate;
The results returned are:
ID MyDate rn1 rn2 1 2012-03-04 00:00:00.000 1 2 1 2012-03-15 00:00:00.000 2 3 1 2012-05-10 00:00:00.000 3 7 2 2012-02-28 00:00:00.000 1 1 2 2012-03-22 00:00:00.000 2 4 2 2012-05-01 00:00:00.000 3 6 3 2012-04-01 00:00:00.000 1 5 3 2012-05-12 00:00:00.000 2 8 3 2012-06-01 00:00:00.000 3 9
For rn1 (where PARTITION is applied) you see that it assigns row numbers 1, 2, 3 to the rows within each ID (the column specified to PARTITION on) based on the ordering of MyDate. For the case without PARTITION (rn2), the entire set is the partition so the row numbers are 1, 2, …, 9, again based on the ordering of the MyDate column.
Eliminating Duplicates
ROW_NUMBER() is a particularly fast way to eliminate duplicate records. Suppose you want to return only one record within each ID; specifically the one whose date is the latest. You must note that ROW_NUMBER() cannot be used on the WHERE clause, so it is necessary to wrap this query in an outer query as follows:
SELECT ID, MyDate FROM ( SELECT ID, MyDate ,rn1=ROW_NUMBER() OVER (PARTITION BY ID ORDER BY MyDate DESC) FROM #ROWNUMBER_Demo ) a WHERE rn1 = 1 ORDER BY ID;
Note the DESC sort applied to MyDate. These results are:
ID MyDate 1 2012-05-10 00:00:00.000 2 2012-05-01 00:00:00.000 3 2012-06-01 00:00:00.000
Of course, you’re probably saying you can achieve the same results using a GROUP BY (and you’d be correct), like this.
SELECT ID, MyDate=MAX(MyDate) FROM #ROWNUMBER_Demo GROUP BY ID ORDER BY ID;
But try using that query to also return the Price column that corresponds to the MAX date. You cannot! But you can when you use ROW_NUMBER().
SELECT ID, MyDate, Price FROM ( SELECT ID, MyDate, Price ,rn1=ROW_NUMBER() OVER (PARTITION BY ID ORDER BY MyDate DESC) FROM #ROWNUMBER_Demo ) a WHERE rn1 = 1 ORDER BY ID;
Results:
ID MyDate Price 1 2012-05-10 00:00:00.000 28.47 2 2012-05-01 00:00:00.000 21.43 3 2012-06-01 00:00:00.000 51.66
Conclusion
ROW_NUMBER() is a very versatile T-SQL window ranking function. Besides using it to eliminate duplicates, it has a great many other very practical purposes that we’ll explore in future entries on the Aware blog.