Titles with the highest and lowest page count - rough draft

This data is messy in the MARC record and the SQL is a vibe code mash up of several different SQL scripts, so double check for accuracy and feel free to hit that edit button on this wiki to cleanup the code as needed.

-- ====================================================================================
-- Set these variables to control the query's output
-- ====================================================================================
DECLARE @TopN int = 5;                  -- Number of titles to show for each category
DECLARE @MinPageCount int = 10;         -- Excludes records with page counts BELOW this value
DECLARE @MaxPageCount int = 5000;       -- Excludes records with page counts ABOVE this value
-- ====================================================================================


-- CTE 1: Extract and clean the page number from the MARC 300$a field.
WITH PageExtraction AS
(
    SELECT
        bts.BibliographicRecordID,
        -- This logic finds the first string of numbers in the 300$a field and attempts to convert it to an integer.
        TRY_CAST(
            LEFT(
                SUBSTRING(bts.Data, PATINDEX('%[0-9]%', bts.Data), 8000),
                PATINDEX('%[^0-9]%', SUBSTRING(bts.Data, PATINDEX('%[0-9]%', bts.Data), 8000) + 'Z') - 1
            )
        AS INT) AS PageCount
    FROM
        polaris.polaris.BibliographicTagsAndSubfields_View AS bts
    WHERE
        bts.TagNumber = 300
        AND bts.Subfield = 'a'
        -- Pre-filter to only process tags that actually contain a number.
        AND PATINDEX('%[0-9]%', bts.Data) > 0
),

-- CTE 2: Filter results, join to get TOM, and rank them by page count within each TOM.
RankedPages AS
(
    SELECT
        pe.BibliographicRecordID,
        br.BrowseTitle,
        pe.PageCount,
        mtom.MARCTypeOfMaterialID,
        mtom.Description AS TypeOfMaterial,
        -- Rank records by page count descending (highest first) FOR EACH TOM.
        ROW_NUMBER() OVER (PARTITION BY mtom.MARCTypeOfMaterialID ORDER BY pe.PageCount DESC) AS RankHighest,
        -- Rank records by page count ascending (lowest first) FOR EACH TOM.
        ROW_NUMBER() OVER (PARTITION BY mtom.MARCTypeOfMaterialID ORDER BY pe.PageCount ASC) AS RankLowest
    FROM
        PageExtraction AS pe
    JOIN
        polaris.polaris.BibliographicRecords AS br
        ON br.BibliographicRecordID = pe.BibliographicRecordID
    JOIN
        polaris.polaris.MARCTypeOfMaterial AS mtom
        ON br.PrimaryMARCTOMID = mtom.MARCTypeOfMaterialID
    WHERE
        -- NEW: Filter for records within the valid page count range.
        pe.PageCount BETWEEN @MinPageCount AND @MaxPageCount
        AND br.RecordStatusID != 4 
        AND EXISTS (
            SELECT 1
            FROM polaris.polaris.ItemRecords AS ir
            WHERE
                ir.AssociatedBibRecordID = pe.BibliographicRecordID
                AND ir.RecordStatusID != 4 
                AND ir.ItemStatusID NOT IN (13, 15) 
        )
)

-- Final SELECT: Combine the highest and lowest records for each TOM.
SELECT
    'Highest Page Count' AS Category,
    rp.TypeOfMaterial,
    rp.MARCTypeOfMaterialID,
    rp.BibliographicRecordID,
    rp.BrowseTitle,
    rp.PageCount,
    rp.RankHighest AS Rank
FROM
    RankedPages AS rp
WHERE
    rp.RankHighest <= @TopN

UNION ALL

SELECT
    'Lowest Page Count' AS Category,
    rp.TypeOfMaterial,
    rp.MARCTypeOfMaterialID,
    rp.BibliographicRecordID,
    rp.BrowseTitle,
    rp.PageCount,
    rp.RankLowest AS Rank
FROM
    RankedPages AS rp
WHERE
    rp.RankLowest <= @TopN
ORDER BY
    rp.TypeOfMaterial,
    Category DESC,
    Rank ASC;
1 Like

Hi @wesochuck.  I’m curious as to the intended application of this query.  Highest and Lowest aside, have you successfully used the page count calculation to estimate linear shelf space for a collection?  It occurs to me that such a calculation would also want to take hardcover/paperback into account, particularly for some children’s titles where the covers are as thick as the content.

It was a request on Discourse that was too long to paste into Discourse - the requester didn’t provide much additional info.

Thank you!

It was a request from a coworker who handles our social media posts. She wants to do a post with the the two books.

1 Like