Rob Gonda's Blog

MSSQL and Pagination

Ray posted a nice entry about ColdFusion and Pagination, useful and easy to implement. I commented that it should only be used for small recordsets, because even if you do not show all records, transferring 100k entries from the DB to the application server to only display 10 or 20 is not efficient at all.

Some people suggested using LIMIT and OFFSET, which can only by done with mysql.... I hate to admit it, but mssql is so much better than mysql, expect for that ... I can't believe that we still don't have that functionality... anyways, Adam also posted a link to an msdn article explaining a couple of different approaches to tackle this problem.

I, on the other hand, have been using a stored procedure for years to achieve this goal. I have it in production retrieving pages out of a table with over 5 million records... at it works flawlessly. Let me add that you do need indexes, defragging, maintenance, lots of RAM ... this SP doesn't do miracles :)

You can download the SP here. It works perfectly with mssql 2000 / 2005. Let me emphasize that I did not write this. I got it from a sql exchange article a few years ago. Yes, it can be improved I guess, since one of its major limitations is lacking of a way to query over left/right outer joins. You can query aggregated functions, embedded  selects, inner joins (ansi format) ... You can still get around them using views, which is how I've done it in the past... in fact, if the query is too complicated, you might as well write a view for it anyways.

As far as the ColdFusion side, here's an invocation I have to one of my tables:

    <!---
        get multiple users in pagination
    --->
    <cffunction name="getUsers" access="public" output="No" returntype="struct">
        <cfargument name="UserName" required="No" type="string" default="" />
        <cfargument name="Alias" required="No" type="string" default="" />
        <cfargument name="ageFrom" required="No" type="numeric" default="0" />
        <cfargument name="ageTo" required="No" type="numeric" default="0" />
        <cfargument name="gender" required="No" type="string" default="0" />
        <cfargument name="Race" required="No" type="string" default="0" />
        <cfargument name="Admin" required="No" type="numeric" default="0" />
       
        <cfargument name="groupNumber" required="No" type="numeric" default="1" />
        <cfargument name="groupSize" required="No" type="numeric" default="5" />
       
        <cfset var returnStruct = structNew() />
        <cfset var qGetUsers = '' />
        <cfset var recordcount = '' />
       
       
        <cfset var SqlCols = '' />
        <cfset var SqlWhere = '' />
        <cfset var OrderBy = '' />
       
        <!--- columns --->
        <cfsavecontent variable="SqlCols">
            pk_users, UserName, Alias, gender, Age, race
        </cfsavecontent>
       
        <!--- condition --->
        <cfoutput>
            <cfsavecontent variable="SqlWhere">
                0 = 0
                <cfif arguments.Admin>
                    AND    Admin = 1
                <cfelse>
                    AND    Admin = 0
                </cfif>
                <cfif len(arguments.UserName)>
                    AND UserName like '#arguments.UserName#%'
                </cfif>
                <cfif len(arguments.Alias)>
                    AND Alias like '#arguments.Alias#%'
                </cfif>
                <cfif arguments.ageFrom>
                    AND AGE >= #arguments.ageFrom#
                </cfif>
                <cfif arguments.ageTo>
                    AND AGE <= #arguments.ageTo#
                </cfif>
                <cfif arguments.gender neq '0'>
                    AND gender like '#arguments.gender#%'
                </cfif>
                <cfif arguments.Race neq '0'>
                    AND Race = '#arguments.Race#'
                </cfif>
            </cfsavecontent>
        </cfoutput>
       
        <!--- order --->
        <cfsavecontent variable="OrderBy">
            UserName
        </cfsavecontent>
       
       
        <!--- query --->
        <cfstoredproc procedure="sp_selectnextn" datasource="#variables.instance.dsn#">
            <cfprocresult name="qGetUsers" resultset="1">
            <cfprocresult name="recordcount" resultset="2">
             <cfprocparam type="In" cfsqltype="CF_SQL_VARCHAR" dbvarname="TableName" value="users">
             <cfprocparam type="In" cfsqltype="CF_SQL_VARCHAR" dbvarname="Columns" value="#SqlCols#">
             <cfprocparam type="In" cfsqltype="CF_SQL_VARCHAR" dbvarname="IdentityColumn" value="pk_users">
             <cfprocparam type="In" cfsqltype="CF_SQL_VARCHAR" dbvarname="GroupNumber" value="#arguments.groupNumber#">
             <cfprocparam type="In" cfsqltype="CF_SQL_VARCHAR" dbvarname="GroupSize" value="#arguments.groupSize#">
             <cfprocparam type="In" cfsqltype="CF_SQL_VARCHAR" dbvarname="SqlWhere" value="#SqlWhere#">
             <cfprocparam type="In" cfsqltype="CF_SQL_VARCHAR" dbvarname="SqlOrderBy" value="#OrderBy#">
        </cfstoredproc>

        <cfset returnStruct.rs = qGetUsers />
        <cfset returnStruct.rc = recordcount.countAll />
       
        <cfreturn returnStruct />
    </cffunction>

It basically takes the table(s), columns, primary key to use for pagination, page size and number, where statement, and order by statement. There is a limitation of size of your statements ... You won't be able to use a Where clause larger than 4000 characters.

The original SP did not have the full recordcount, which allows to calculate the number of pages. It does require more CPU, but I didn't want to just guess if there's a next page given the count in one particular page.

If you have any comments or suggestions, feel free to add them below.

Enjoy!

TrackBacks
There are no trackbacks for this entry.

Trackback URL for this entry:
http://www.robgonda.com/blog/trackback.cfm?D44EF3CA-3048-7431-E45806503C72CE2B

Comments
Hi Rob,

Thanks alot for sharing very informative proc, to be honest this tech is much better than doing pagination in CF. I really appreacite your efforts and thanks for msdn links as well.

Thanks
# Posted By Sana Ullah | 4/26/06 3:33 AM
Here's an oracle version of the syntax for paging in recordsets , but inline SQL.
# Posted By dc | 4/26/06 8:07 AM
oops here it is....

limitation are that can only order in one column (in this case - bl.name, see how it's used in the aggregate functions)

SELECT
*
FROM
(
SELECT
bl.bl_id AS bl_id
, bl.name AS bl_name

-- psuedo columns for paging
, ROW_NUMBER() OVER (ORDER BY bl.name ASC) AS row_no
, count(*) OVER() AS row_total
, CEIL(count(*) OVER()/20) AS row_page_total
, WIDTH_BUCKET(ROW_NUMBER() OVER (ORDER BY bl.name ASC), 1, (count(*) OVER() + (20 - MOD(count(*) OVER(), 20))) , CEIL(count(*) OVER()/20)) AS row_page
, DECODE(MOD(ROW_NUMBER() OVER (ORDER BY bl.name ASC), 20), 1, 'Y', 'N') AS row_page_start
, DECODE(MOD(ROW_NUMBER() OVER (ORDER BY bl.name ASC), 20), 0, 'Y', 'N') AS row_page_end
FROM
bl
         
WHERE
1 = 1
and lower(bl.bl_id) = ?

GROUP BY
bl.bl_id,bl.name
)

WHERE
row_page = 1

ORDER BY
row_no
# Posted By dc | 4/26/06 8:10 AM
Here is a reliable method I use that is not restricted in its joining options. Even though it uses 3 nested select statements, its actually surprisingly fast. I am currently using itto page through tables using your ajaxcfc :)


<cfset tInnerRowCnt = startRow + recsPerPage - 1>
<!--- if your on the last page, modify the numbers --->
<cfif tInnerRowCnt gt numRecs>
   <cfset innerRowCnt = numRecs>
   <cfset recsPerPage = tInnerRowCnt - startRow + 1>
</cfif>      

<cfquery>
SELECT * FROM (   
   SELECT top #recsPerPage# * FROM (
      SELECT TOP #innerRowCnt# #ColList#
      FROM myTables...
      WHERE ...
      ORDER BY #sortStr#) as foo
   ORDER BY #reverseSortStr#) as bar
ORDER BY #sortStr#
</cfquery>

Here, ReverseSortStr is the Order By clause with asc changed to desc, and desc changed to asc. The other thing to remember is to include the primary key at the end of the ORDER BY clause (if its not already present).
# Posted By Stuart MacAdam | 4/26/06 9:46 AM
Hi Stuart,
The last line in your post is not clear to me. I have been using the same MS SQL Server logic in stored procedure with dynamic ORDER BY clause.
You mentioned that 'primary key at the end of the ORDER BY clause is reqd.' and if I understand correctly, this is not true.
# Posted By Dipak Parikh | 4/26/06 12:44 PM
Great work Rob!
# Posted By Sami Hoda | 4/26/06 12:46 PM
The one problem with this Stored Proc is that it's named with "sp_" which SQL identifies as a system Stored Proc and does a scan of the system DB first, creating a performance hit...

SQL 2005/Express (as well as Oracle among others) make this method moot since you can select specific rows of a recordset in the SQL now. However, this is the nicest SQL 2000 SP I've seen so far.
# Posted By Doug | 4/26/06 4:18 PM
Hi Dipak,

The problem will show itself when you sort by non-unique fields, so I just tack on the primary key at the end to ensure a stable sort. For additional info on this see http://josephlindsay.com/archives/2005/05/27/pagin...
# Posted By Stuart MacAdam | 4/26/06 7:48 PM
It all seams a little over complicated to me, when using ADODB you'd use rs.move(20) then pull back 10 records.
Perhaps "StartRow" should be added to CFQuery to compliment "MaxRows"
# Posted By Rebecca York | 4/27/06 5:01 AM
Rebecca,

It is complicated indeed, but the problem with either StartRow or MaxRows, is that it doesn't really affect the fact that the DB will load the entire table in RAM and trasfer it to the Application Server.

Limiting the query in the database level helps every single piece of the puzzle. The SQL query will take less time to run, the communication between SQL and CF (odbc, jdbc) will be faster, and the load at CF is smaller, because there is no filtering.

Hope this helps.
# Posted By Rob Gonda | 4/27/06 7:43 AM
One way that mysql has always been better than MSSQL is efficiency and speed. You can go a lot farther with cheaper hardware. Also, I've seen benchmarks comparing MSSQL 2000 and earlier to mysql, and mysql can handle a LOT more concurrent users. I can't remember the exact numbers, but MSSQL does fine up to about 2000 users, and then it tanks. MySQL, Oracle, DB2 all keep sailing higher, the more concurrent users you add. We're talking about one server, with the same test used for all the DBs.
# Posted By Jacob Munson | 4/28/06 9:57 AM
Jacob,

I am amazed by your comment. Thus far all I heard (and agreed to) is that mysql is a toy DB and won't handle enterprise load. To my knowledge, Oracle and MsSQL are infinitely better for the enterprise. All the performance test reports list both of them, none list mysql. I know for example that my VoIP CDR database is handling 500 transactions per second, including triggers that run Stored Procedures for billing and real time processing. That's running on a Windows 2003 64bits running MsSQL 2005 with 64 Gigs of RAM ... Besides the fact that mysql hasn't had stored procs and triggers until the latest version (and they still don't work properly), I doubt that it will handle this much load.

Anyways, sql wars can be another topic for a new thread :) this one was meant to give props to mysql for the one piece of functionality that mssql does not have.
# Posted By Rob Gonda | 4/28/06 11:04 AM
Rob, you must be joking!!!! mysql is a toy database? Lower your head in shame right now.
You say: "Thus far all I heard (and agreed to)..." but have you tried it???
Do you have any idea how many very large scale enterprises are using mysql? With VERY large databases?
The setup that you described for your voip cdr is silly. You'd probably need a fraction of that hardware to run that db with mysql.
Seriously man... You should rely on first hand knowledge. Not on hearsay... MSSql is probably one of the lesser databases around. And I KNOW that from first hand experience of over a dozen years of enterprise...

With love,
Ratpointer.
# Posted By Ratpointer | 5/10/06 10:39 AM
This blog is running version 5.9.003. Contact Blog Owner