Find Tool - Find records with duplicate OCLC numbers

Hello!

I am trying to construct a find tool query to find duplicate OCLC numbers in our system. This would be not where a record has the same 035 twice, but two or more records with the same OCLC number.

I know how to construct queries to find items with duplicate barcodes, and even bibliographic records with duplicate system numbers, but the complexity of querying duplicate data beyond an ID is stumping me. I tried adapting this query by removing the additional columns from the SELECT statements, etc., but got stumped by the t2 JOIN; querying just the t1 FROM worked.

Any thoughts?

Here is a version that shows duplicate 035 if either title has a hold on it.

select distinct b35.BibliographicRecordID
from polaris.polaris.BibliographicTag035Index b35
join polaris.polaris.SysHoldRequests shr
	on shr.BibliographicRecordID = b35.BibliographicRecordID
left join 
(
	select brs.BibliographicRecordID
	from polaris.polaris.BibRecordSets brs
	where brs.RecordSetID = 31456  --if you want to exclude titles from showing up on the results, you can create a record set and put the ID of that record set here
) not_dupes on
	b35.BibliographicRecordID = not_dupes.BibliographicRecordID
where b35.SystemControlNumber in
(
	select b35.SystemControlNumber
	from polaris.polaris.BibliographicTag035Index b35
	group by b35.SystemControlNumber
	having COUNT(distinct b35.BibliographicRecordID) between 2 and 10
)
and shr.SysHoldStatusID in ( 1,3,4 ) and not_dupes.BibliographicRecordID is null

Here is a version that looks for duplicates even if they don’t HAVE holds.

select distinct b35.BibliographicRecordID
from polaris.polaris.BibliographicTag035Index b35
left join 
(
	select brs.BibliographicRecordID
	from polaris.polaris.BibRecordSets brs
	where brs.RecordSetID = 31456 --if you want to exclude titles from showing up on the results, you can create a record set and put the ID of that record set here
) not_dupes on
	b35.BibliographicRecordID = not_dupes.BibliographicRecordID
where b35.SystemControlNumber in
(
	select b35.SystemControlNumber
	from polaris.polaris.BibliographicTag035Index b35
	group by b35.SystemControlNumber
	having COUNT(distinct b35.BibliographicRecordID) between 2 and 10
) and not_dupes.BibliographicRecordID is null

We have records with system control numbers not from OCLC, so I specifically want to narrow it down to 035s with the OCLC prefix. Any thoughts there?

Do your “real” OCLC numbers have a consistent prefix?

We use ’ (OCoLC)|ocm\d|ocn\d|on\d ’ as our regex in things like MarcEdit when doing stuff with 035s.

RegEx isn’t supported natively in SQL server but here is something that might get you close, this is for the non-hold version:

SELECT DISTINCT b35.BibliographicRecordID
FROM polaris.polaris.BibliographicTag035Index b35
LEFT JOIN
(
    SELECT brs.BibliographicRecordID
    FROM polaris.polaris.BibRecordSets brs
    WHERE brs.RecordSetID = 0 -- optional if you want to exclude some records, put them in a record set and then put the record set id in here
) not_dupes ON b35.BibliographicRecordID = not_dupes.BibliographicRecordID
WHERE b35.SystemControlNumber IN
(
    SELECT b35_inner.SystemControlNumber
    FROM polaris.polaris.BibliographicTag035Index b35_inner
    GROUP BY b35_inner.SystemControlNumber
    HAVING COUNT(DISTINCT b35_inner.BibliographicRecordID) BETWEEN 2 AND 10 --you can adjust the upper limit if you'd like
)
AND not_dupes.BibliographicRecordID IS NULL
AND (
    SUBSTRING(b35.SystemControlNumber, 1, 8) LIKE '%OCOLC%'
    OR SUBSTRING(b35.SystemControlNumber, 1, 8) LIKE '%ocm%'
    OR SUBSTRING(b35.SystemControlNumber, 1, 8) LIKE '%ocn%'
    OR SUBSTRING(b35.SystemControlNumber, 1, 8) LIKE '%on%'
)