SQL Query Batch for Analysis of Checkouts by Branch

Collection Health Holds.sql (54.3 KB)
(latest version uploaded 7/31/2025)

This “Collection Health Holds Query” has been in development for over a year.  I first mentioned it in this post and presented a snippet of it at IUG 2024.

Using the filled hold and checkout transactions in circ_trans, output an analysis of checkouts by branch for the [previous] reporting month:

      A.  Reporting month (or first month of the reporting period if adjusted)
      B.  Display order
      C.  Branch/Collection/Pickup Location Name
      D.  Count of holds checked out
      E.  Count of browsed items checked out
      F.  Ratio of D:E
      G.  Count of holds filled from the local collection
      H.  Count of holds transited in from another branch
      I.  Ratio of G:H ± an estimated margin of error calculated as J÷E
      J.  Count of browsed checkouts having a non-local branch location code prefix
      K.  Count of inbound INN-Reach items
      L.  Count of outbound INN-Reach items
      M.  Ratio of K:L
      N.  Count of patron visits having checkouts
      O.  Checkouts per visit, calculated as (D+E)/N
      P.  Coincidental Checkouts (see Wes’ July 10 suggestion in this thread)
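
The ratio columns (F, I and M) are expressed as percentages of a total. As a rough sketch of that arithmetic only (the table and column names below are my own invented labels, not necessarily the ones used in the script), the holds-vs.-browsed split might look like:

```sql
-- Hypothetical illustration: express D (holds checked out) and E (browsed
-- items checked out) as percentages of their combined total, using NULLIF
-- to guard against division by zero at a branch with no checkouts.
SELECT
    branch_name,
    holds_ckos,
    browse_ckos,
    ROUND(100.0 * holds_ckos  / NULLIF(holds_ckos + browse_ckos, 0), 1) AS holds_pct,
    ROUND(100.0 * browse_ckos / NULLIF(holds_ckos + browse_ckos, 0), 1) AS browse_pct
FROM tmpBranchTotals   -- hypothetical temp table name
ORDER BY branch_name;
```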

Most useful if your circ_trans table is configured to retain at least 32 days of history; ours is configured to retain 1,110 days or 36½ months.  You could run it weekly, but the smaller sample size will be more sensitive to spikes in circ activity, closed holidays, unscheduled branch closures, etc.  The ratios are expressed as percentages of a total.

Customization will of course be required for your site, primarily to map location codes and stat groups to branches. Note that this mapping is required in three places in the query batch. (The batch contains 30 SQL statements: table and index creation, inserts, updates, and the final SELECT for the report itself, plus two optional diagnostic queries.)
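
That mapping will look different at every site. Purely as a hedged illustration (the branch prefixes, names, and tmpCheckouts table below are invented for this example, not taken from the script), a prefix-based mapping might take a shape like:

```sql
-- Hypothetical sketch of mapping location-code prefixes to branch names.
-- Sierra location codes conventionally begin with a branch prefix;
-- substitute your own site's codes and branch names.
UPDATE tmpCheckouts
SET branch_name = CASE
        WHEN location_code ~ '^ma' THEN 'Main Library'
        WHEN location_code ~ '^ea' THEN 'East Branch'
        WHEN location_code ~ '^we' THEN 'West Branch'
        ELSE 'Unknown'
    END;
```

Whatever form your mapping takes, remember to apply the identical logic in all three places the post mentions, or the diagnostic queries and the report will disagree.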

In addition to producing insightful statistics, the code presents useful examples of:

  • CTEs (Common Table Expressions)
  • Date functions and date arithmetic using INTERVAL
  • Regular Expression string matching
  • LEFT OUTER JOIN used to find non-matching rows
  • CROSS JOINs
  • UNIONs
  • INTERSECT
  • INNER JOINing a table to itself (self-referential) due to a hierarchical row relationship
  • Using LIMIT and OFFSET for pagination of output
  • Creation of Temporary Tables
  • Building a Primary Key on a temp table
  • Building Indexes on a temp table
  • INSERTing data into temp tables
  • UPDATE statements on temp tables
  • The ROW_NUMBER() Window Function
  • The LAG() Window Function
  • Use of the LATERAL keyword (see the code comments for more about this)
  • The GENERATE_SERIES() function

The concept of “visit” bears some explanation.  Since the scope of this report is limited to checkout activity, so is the calculation of visits.  Patrons visit a library branch for many reasons, but in this report a visit is defined purely as a patron checking out one or more items on a given day at a given branch; if a long enough gap occurs between checkouts, the report counts that as a separate visit.  In the tmpReportParameters table, I have defined that gap as 90 minutes, but I may later extend it to account for patrons who pick up holds and/or browse before and after a program they’re attending.  You are likewise encouraged to reconsider this parameter with your own patrons’ behavior in mind.
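
The gap-based visit logic can be sketched with the LAG() window function mentioned above. This is only an approximation of the idea, with invented table and column names, and with the 90-minute gap hard-coded rather than read from tmpReportParameters:

```sql
-- Hypothetical sketch: start a new "visit" whenever more than 90 minutes
-- elapse between a patron's consecutive checkouts at the same branch.
WITH ordered AS (
    SELECT
        patron_id,
        branch_name,
        transaction_gmt,
        LAG(transaction_gmt) OVER (
            PARTITION BY patron_id, branch_name
            ORDER BY transaction_gmt
        ) AS prev_gmt
    FROM tmpCheckouts          -- hypothetical temp table of checkout rows
)
SELECT
    patron_id,
    branch_name,
    -- A row with no predecessor, or one following a gap longer than the
    -- threshold, marks the start of a new visit.
    COUNT(*) FILTER (
        WHERE prev_gmt IS NULL
           OR transaction_gmt - prev_gmt > INTERVAL '90 minutes'
    ) AS visits
FROM ordered
GROUP BY patron_id, branch_name;
```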

The visits calculation was only added because it is needed for the coincidental checkouts calculation, and the checkouts per visit column was only added because the data to produce it was available within this report.  Checkouts per visit calculations based on door counter data will produce different results, particularly when a parent shows up with four kids (5 door counts) and all of them check out items on the same library card.

When run for the default one-month reporting period, this query batch completes for me in 2-3 minutes.  It naturally takes a wee bit longer if you’re running a YTD report, full year, or even all of circ_trans history… YMMV.  Enjoy!


My director asked for a breakdown of the holds/checkouts from drive-up windows and hold pickup lockers for those branches having those additional pickup locations, so I added support for that yesterday.

Another thing I’ve often wondered is if the hold CKO caused a browse CKO too. Meaning, when someone comes in to pick up their hold, do they also CKO something else as well? Fun consideration for the next iteration?


Wes Osborn:  Another thing I’ve often wondered is if the hold CKO caused a browse CKO too.  Meaning, when someone comes in to pick up their hold, do they also CKO something else as well?

Challenge Accepted, @wesochuck!  I have updated the original post at the top of this thread with the rewritten report and an updated list of the output columns, SQL features used and commentary.  To those who downloaded the earlier version of the report, note that refactoring resulted in renamed tables and columns and the report now takes noticeably longer to complete, despite the addition of a number of indexes.

When submitting this report to my director, I include on a separate tab the output of the raw data drilldown from the commented-out query at the bottom of the script.  Turn on Excel filtering and the origin of curious results is plainly visible in the source circ_trans rows.


After running the drilldown query (the final query commented-out at the bottom of the Collection Health Holds.sql script), an attempt to copy and paste the full results into Excel can easily exceed the capacity of your PC’s clipboard.  To get around this, slice the output into manageable chunks; these LIMIT/OFFSET clauses can be used to output 20,000 rows at a time:

ORDER BY
    checkout_date,
    row_id,
    op_code
LIMIT 20000
OFFSET 0  --20000  --40000  --60000  --80000  --100000
;

I’m trying to gauge potential interest in multiple sessions (2 or 3 parts) at IUG 2026 where we can take a deep dive into the big query which is the subject of this thread. I wanted to offer a pre-con session but there’s no way of doing that without charging a fee to compensate the conference facility for AV and food expenses, plus the added expense of an additional night’s stay at the hotel for each participant. Please like this post if you think you would attend one or more such sessions. The Program Committee will begin meeting in October to consider proposals. Thanks!


This may not be what you’re thinking, but besides the query itself, I think what I might be particularly interested in attending would be a session around the purely developmental side of such a query. How do you plan out the structure, researching what data is available, what approach do you take to building out the code as you work…those sorts of things. What’s the path for someone to go from writing basic queries to this higher order of complexity?

As usual @jmgold, that’s an excellent idea!  Something like what you propose would be most appropriate in the first such session.  The reality is that the end product came about after a long series of rewrites, edits and refactoring over a year and a half as the desired output evolved.
