Finding Potential Duplicate Bibliographic Records with Saved SQL Searches

These saved SQL searches can be used in LEAP or the Polaris Desktop Client as a Find Tool to help identify potential duplicate bibliographic records.

CLC has also published our full-featured UI front end for managing dups: CLC Bib Dedupe Tool Code

The searches below are designed to return BibliographicRecordID values that may represent duplicate bib records based on shared identifiers such as:

  • MARC 035 (OCLC) System Control Numbers

  • ISBNs

  • UPCs

These are intended to support review workflows. They identify potential duplicates, not confirmed duplicates.

Excluding Known Non-Duplicates

Each search also supports excluding records that have already been reviewed and determined not to be duplicates.

This is done through a bib record set used as a “non-duplicates” exclusion list:

where brs.RecordSetID = 31456

Records added to that bib record set will be excluded from future results. This gives staff a way to mark records as reviewed without having to change or suppress the bib records themselves.

You SHOULD change the RecordSetID value if your library uses a different bib record set for this purpose.

Suggested Workflow

  1. Save each query as a SQL search.

  2. Use the saved search in LEAP or the Polaris Desktop Client as a Find Tool.

  3. Review the returned bib records to determine whether they are true duplicates.

  4. Add records that are confirmed not to be duplicates to the designated bib record set.

  5. Future searches will exclude those reviewed non-duplicates automatically.

These searches can help narrow down duplicate review work, especially when paired with holds-based versions that prioritize records currently affecting patron activity.


035 Match — Records with Active Holds

This search finds potential duplicate bib records based on shared MARC 035 System Control Numbers. It only returns records with selected active hold statuses.

select distinct b35.BibliographicRecordID
from polaris.polaris.BibliographicTag035Index b35
join polaris.polaris.SysHoldRequests shr
    on shr.BibliographicRecordID = b35.BibliographicRecordID
left join
(
    select brs.BibliographicRecordID
    from polaris.polaris.BibRecordSets brs
    where brs.RecordSetID = 31456
) not_dupes
    on b35.BibliographicRecordID = not_dupes.BibliographicRecordID
where b35.SystemControlNumber in
(
    select b35.SystemControlNumber
    from polaris.polaris.BibliographicTag035Index b35
    group by b35.SystemControlNumber
    having COUNT(distinct b35.BibliographicRecordID) between 2 and 10
)
and shr.SysHoldStatusID in (1, 3, 4)
and not_dupes.BibliographicRecordID is null

035 Match — All Potential Duplicates

This version finds potential duplicate bib records based on shared MARC 035 System Control Numbers, regardless of hold activity.

select distinct b35.BibliographicRecordID
from polaris.polaris.BibliographicTag035Index b35
left join
(
    select brs.BibliographicRecordID
    from polaris.polaris.BibRecordSets brs
    where brs.RecordSetID = 31456
) not_dupes
    on b35.BibliographicRecordID = not_dupes.BibliographicRecordID
where b35.SystemControlNumber in
(
    select b35.SystemControlNumber
    from polaris.polaris.BibliographicTag035Index b35
    group by b35.SystemControlNumber
    having COUNT(distinct b35.BibliographicRecordID) between 2 and 10
)
and not_dupes.BibliographicRecordID is null

ISBN Match — Records with Holds

This search finds potential duplicate bib records based on shared ISBNs and matching material type. It excludes selected material types and only returns records with selected hold statuses.

select distinct br.BibliographicRecordID
from Polaris.polaris.BibliographicISBNIndex bii
join polaris.polaris.BibliographicRecords br
    on br.BibliographicRecordID = bii.BibliographicRecordID
join polaris.polaris.SysHoldRequests shr
    on shr.BibliographicRecordID = br.BibliographicRecordID
left join
(
    select brs.BibliographicRecordID
    from polaris.polaris.BibRecordSets brs
    where brs.RecordSetID = 31456
) not_dupes
    on bii.BibliographicRecordID = not_dupes.BibliographicRecordID
where br.PrimaryMARCTOMID not in (6, 36, 41, 50)
and not_dupes.BibliographicRecordID is null
and exists
(
    select 1
    from polaris.polaris.BibliographicISBNIndex _bii
    join polaris.Polaris.BibliographicRecords _br
        on _br.BibliographicRecordID = _bii.BibliographicRecordID
    where bii.ISBNString = _bii.ISBNString
    and br.PrimaryMARCTOMID = _br.PrimaryMARCTOMID
    and exists
    (
        select 1
        from polaris.polaris.BibliographicISBNIndex __bii
        where __bii.ISBNString = _bii.ISBNString
    )
    group by _bii.ISBNString, _br.PrimaryMARCTOMID
    having count(distinct _br.BibliographicRecordID) > 1
)
and shr.SysHoldStatusID in (3, 4)

ISBN Match — All Potential Duplicates

This version finds potential duplicate bib records based on shared ISBNs and matching material type, regardless of hold activity.

select distinct bii.BibliographicRecordID
from polaris.polaris.BibliographicISBNIndex bii
left join
(
    select brs.BibliographicRecordID
    from polaris.polaris.BibRecordSets brs
    where brs.RecordSetID = 31456
) not_dupes
    on bii.BibliographicRecordID = not_dupes.BibliographicRecordID
join Polaris.polaris.BibliographicRecords br
    on br.BibliographicRecordID = bii.BibliographicRecordID
where br.PrimaryMARCTOMID not in (6, 36, 41, 50)
and bii.ISBNString in
(
    select isbnstring
    from polaris.polaris.BibliographicISBNIndex as bi
    join polaris.polaris.BibliographicRecords br
        on br.BibliographicRecordID = bi.BibliographicRecordID
    group by isbnstring, br.PrimaryMARCTOMID
    having count(distinct bi.BibliographicRecordID) > 1
)
and not_dupes.BibliographicRecordID is null

UPC Match — Records with Holds

This search finds potential duplicate bib records based on shared UPCs. It only returns matches where the duplicate candidates share the same material type and have selected hold statuses.

select distinct bui.BibliographicRecordID
from polaris.polaris.BibliographicUPCIndex bui
join polaris.polaris.SysHoldRequests shr
    on bui.BibliographicRecordID = shr.BibliographicRecordID
left join
(
    select brs.BibliographicRecordID
    from polaris.polaris.BibRecordSets brs
    where brs.RecordSetID = 31456
) not_dupes
    on bui.BibliographicRecordID = not_dupes.BibliographicRecordID
where bui.UPCNumber in
(
    select bui.UPCNumber
    from polaris.polaris.BibliographicUPCIndex bui
    join polaris.polaris.BibliographicRecords br
        on bui.BibliographicRecordID = br.BibliographicRecordID
    group by bui.UPCNumber
    having COUNT(distinct bui.BibliographicRecordID) >= 2
    and MIN(br.PrimaryMARCTOMID) = MAX(br.PrimaryMARCTOMID)
)
and shr.SysHoldStatusID in (1, 4)
and not_dupes.BibliographicRecordID is null

UPC Match — All Potential Duplicates

This version finds potential duplicate bib records based on shared UPCs and matching material type, regardless of hold activity.

select distinct bui.BibliographicRecordID
from polaris.polaris.BibliographicUPCIndex bui
left join
(
    select brs.BibliographicRecordID
    from polaris.polaris.BibRecordSets brs
    where brs.RecordSetID = 31456
) not_dupes
    on bui.BibliographicRecordID = not_dupes.BibliographicRecordID
where bui.UPCNumber in
(
    select bui.UPCNumber
    from polaris.polaris.BibliographicUPCIndex bui
    join polaris.polaris.BibliographicRecords br
        on bui.BibliographicRecordID = br.BibliographicRecordID
    group by bui.UPCNumber
    having COUNT(distinct bui.BibliographicRecordID) >= 2
    and MIN(br.PrimaryMARCTOMID) = MAX(br.PrimaryMARCTOMID)
)
and not_dupes.BibliographicRecordID is null