These saved SQL searches can be used in LEAP or the Polaris Desktop Client as a Find Tool to help identify potential duplicate bibliographic records.
CLC has also published our full-featured UI front end for managing dups: CLC Bib Dedupe Tool Code
The searches below are designed to return BibliographicRecordID values that may represent duplicate bib records based on shared identifiers such as:
-
MARC 035 (OCLC) System Control Numbers
-
ISBNs
-
UPCs
These are intended to support review workflows. They identify potential duplicates, not confirmed duplicates.
Excluding Known Non-Duplicates
Each search also supports excluding records that have already been reviewed and determined not to be duplicates.
This is done through a bib record set used as a “non-duplicates” exclusion list:
where brs.RecordSetID = 31456
Records added to that bib record set will be excluded from future results. This gives staff a way to mark records as reviewed without having to change or suppress the bib records themselves.
You SHOULD change the RecordSetID value if your library uses a different bib record set for this purpose.
Suggested Workflow
-
Save each query as a SQL search.
-
Use the saved search in LEAP or the Polaris Desktop Client as a Find Tool.
-
Review the returned bib records to determine whether they are true duplicates.
-
Add records that are confirmed not to be duplicates to the designated bib record set.
-
Future searches will exclude those reviewed non-duplicates automatically.
These searches can help narrow down duplicate review work, especially when paired with holds-based versions that prioritize records currently affecting patron activity.
035 Match — Records with Active Holds
This search finds potential duplicate bib records based on shared MARC 035 System Control Numbers. It only returns records with selected active hold statuses.
select distinct b35.BibliographicRecordID
from polaris.polaris.BibliographicTag035Index b35
join polaris.polaris.SysHoldRequests shr
on shr.BibliographicRecordID = b35.BibliographicRecordID
left join
(
select brs.BibliographicRecordID
from polaris.polaris.BibRecordSets brs
where brs.RecordSetID = 31456
) not_dupes
on b35.BibliographicRecordID = not_dupes.BibliographicRecordID
where b35.SystemControlNumber in
(
select b35.SystemControlNumber
from polaris.polaris.BibliographicTag035Index b35
group by b35.SystemControlNumber
having COUNT(distinct b35.BibliographicRecordID) between 2 and 10
)
and shr.SysHoldStatusID in (1, 3, 4)
and not_dupes.BibliographicRecordID is null
035 Match — All Potential Duplicates
This version finds potential duplicate bib records based on shared MARC 035 System Control Numbers, regardless of hold activity.
select distinct b35.BibliographicRecordID
from polaris.polaris.BibliographicTag035Index b35
left join
(
select brs.BibliographicRecordID
from polaris.polaris.BibRecordSets brs
where brs.RecordSetID = 31456
) not_dupes
on b35.BibliographicRecordID = not_dupes.BibliographicRecordID
where b35.SystemControlNumber in
(
select b35.SystemControlNumber
from polaris.polaris.BibliographicTag035Index b35
group by b35.SystemControlNumber
having COUNT(distinct b35.BibliographicRecordID) between 2 and 10
)
and not_dupes.BibliographicRecordID is null
ISBN Match — Records with Holds
This search finds potential duplicate bib records based on shared ISBNs and matching material type. It excludes selected material types and only returns records with selected hold statuses.
select distinct br.BibliographicRecordID
from Polaris.polaris.BibliographicISBNIndex bii
join polaris.polaris.BibliographicRecords br
on br.BibliographicRecordID = bii.BibliographicRecordID
join polaris.polaris.SysHoldRequests shr
on shr.BibliographicRecordID = br.BibliographicRecordID
left join
(
select brs.BibliographicRecordID
from polaris.polaris.BibRecordSets brs
where brs.RecordSetID = 31456
) not_dupes
on bii.BibliographicRecordID = not_dupes.BibliographicRecordID
where br.PrimaryMARCTOMID not in (6, 36, 41, 50)
and not_dupes.BibliographicRecordID is null
and exists
(
select 1
from polaris.polaris.BibliographicISBNIndex _bii
join polaris.Polaris.BibliographicRecords _br
on _br.BibliographicRecordID = _bii.BibliographicRecordID
where bii.ISBNString = _bii.ISBNString
and br.PrimaryMARCTOMID = _br.PrimaryMARCTOMID
and exists
(
select 1
from polaris.polaris.BibliographicISBNIndex __bii
where __bii.ISBNString = _bii.ISBNString
)
group by _bii.ISBNString, _br.PrimaryMARCTOMID
having count(distinct _br.BibliographicRecordID) > 1
)
and shr.SysHoldStatusID in (3, 4)
ISBN Match — All Potential Duplicates
This version finds potential duplicate bib records based on shared ISBNs and matching material type, regardless of hold activity.
select distinct bii.BibliographicRecordID
from polaris.polaris.BibliographicISBNIndex bii
left join
(
select brs.BibliographicRecordID
from polaris.polaris.BibRecordSets brs
where brs.RecordSetID = 31456
) not_dupes
on bii.BibliographicRecordID = not_dupes.BibliographicRecordID
join Polaris.polaris.BibliographicRecords br
on br.BibliographicRecordID = bii.BibliographicRecordID
where br.PrimaryMARCTOMID not in (6, 36, 41, 50)
and bii.ISBNString in
(
select isbnstring
from polaris.polaris.BibliographicISBNIndex as bi
join polaris.polaris.BibliographicRecords br
on br.BibliographicRecordID = bi.BibliographicRecordID
group by isbnstring, br.PrimaryMARCTOMID
having count(distinct bi.BibliographicRecordID) > 1
)
and not_dupes.BibliographicRecordID is null
UPC Match — Records with Holds
This search finds potential duplicate bib records based on shared UPCs. It only returns matches where the duplicate candidates share the same material type and have selected hold statuses.
select distinct bui.BibliographicRecordID
from polaris.polaris.BibliographicUPCIndex bui
join polaris.polaris.SysHoldRequests shr
on bui.BibliographicRecordID = shr.BibliographicRecordID
left join
(
select brs.BibliographicRecordID
from polaris.polaris.BibRecordSets brs
where brs.RecordSetID = 31456
) not_dupes
on bui.BibliographicRecordID = not_dupes.BibliographicRecordID
where bui.UPCNumber in
(
select bui.UPCNumber
from polaris.polaris.BibliographicUPCIndex bui
join polaris.polaris.BibliographicRecords br
on bui.BibliographicRecordID = br.BibliographicRecordID
group by bui.UPCNumber
having COUNT(distinct bui.BibliographicRecordID) >= 2
and MIN(br.PrimaryMARCTOMID) = MAX(br.PrimaryMARCTOMID)
)
and shr.SysHoldStatusID in (1, 4)
and not_dupes.BibliographicRecordID is null
UPC Match — All Potential Duplicates
This version finds potential duplicate bib records based on shared UPCs and matching material type, regardless of hold activity.
select distinct bui.BibliographicRecordID
from polaris.polaris.BibliographicUPCIndex bui
left join
(
select brs.BibliographicRecordID
from polaris.polaris.BibRecordSets brs
where brs.RecordSetID = 31456
) not_dupes
on bui.BibliographicRecordID = not_dupes.BibliographicRecordID
where bui.UPCNumber in
(
select bui.UPCNumber
from polaris.polaris.BibliographicUPCIndex bui
join polaris.polaris.BibliographicRecords br
on bui.BibliographicRecordID = br.BibliographicRecordID
group by bui.UPCNumber
having COUNT(distinct bui.BibliographicRecordID) >= 2
and MIN(br.PrimaryMARCTOMID) = MAX(br.PrimaryMARCTOMID)
)
and not_dupes.BibliographicRecordID is null