I am trying to construct a find tool query to find duplicate OCLC numbers in our system. This would be not where a record has the same 035 twice, but two or more records with the same OCLC number.
I know how to construct queries to find items with duplicate barcodes, and even bibliographic records with duplicate system numbers, but the complexity of querying duplicate data beyond an ID is stumping me. I tried adapting this query by removing the additional columns from the SELECT statements, etc., but got stumped by the t2 JOIN; querying just the t1 FROM worked.
Here is a version that shows duplicate 035 if either title has a hold on it.
select distinct b35.BibliographicRecordID
from polaris.polaris.BibliographicTag035Index b35
join polaris.polaris.SysHoldRequests shr
on shr.BibliographicRecordID = b35.BibliographicRecordID
left join
(
select brs.BibliographicRecordID
from polaris.polaris.BibRecordSets brs
where brs.RecordSetID = 31456 --if you want to exclude titles from showing up on the results, you can create a record set and put the ID of that record set here
) not_dupes on
b35.BibliographicRecordID = not_dupes.BibliographicRecordID
where b35.SystemControlNumber in
(
select b35.SystemControlNumber
from polaris.polaris.BibliographicTag035Index b35
group by b35.SystemControlNumber
having COUNT(distinct b35.BibliographicRecordID) between 2 and 10
)
and shr.SysHoldStatusID in ( 1,3,4 ) and not_dupes.BibliographicRecordID is null
Here is a version that looks for duplicates even if they don’t HAVE holds.
select distinct b35.BibliographicRecordID
from polaris.polaris.BibliographicTag035Index b35
left join
(
select brs.BibliographicRecordID
from polaris.polaris.BibRecordSets brs
where brs.RecordSetID = 31456 --if you want to exclude titles from showing up on the results, you can create a record set and put the ID of that record set here
) not_dupes on
b35.BibliographicRecordID = not_dupes.BibliographicRecordID
where b35.SystemControlNumber in
(
select b35.SystemControlNumber
from polaris.polaris.BibliographicTag035Index b35
group by b35.SystemControlNumber
having COUNT(distinct b35.BibliographicRecordID) between 2 and 10
) and not_dupes.BibliographicRecordID is null
RegEx isn’t supported natively in SQL server but here is something that might get you close, this is for the non-hold version:
SELECT DISTINCT b35.BibliographicRecordID
FROM polaris.polaris.BibliographicTag035Index b35
LEFT JOIN
(
SELECT brs.BibliographicRecordID
FROM polaris.polaris.BibRecordSets brs
WHERE brs.RecordSetID = 0 -- optional if you want to exclude some records, put them in a record set and then put the record set id in here
) not_dupes ON b35.BibliographicRecordID = not_dupes.BibliographicRecordID
WHERE b35.SystemControlNumber IN
(
SELECT b35_inner.SystemControlNumber
FROM polaris.polaris.BibliographicTag035Index b35_inner
GROUP BY b35_inner.SystemControlNumber
HAVING COUNT(DISTINCT b35_inner.BibliographicRecordID) BETWEEN 2 AND 10 --you can adjust the upper limit if you'd like
)
AND not_dupes.BibliographicRecordID IS NULL
AND (
SUBSTRING(b35.SystemControlNumber, 1, 8) LIKE '%OCOLC%'
OR SUBSTRING(b35.SystemControlNumber, 1, 8) LIKE '%ocm%'
OR SUBSTRING(b35.SystemControlNumber, 1, 8) LIKE '%ocn%'
OR SUBSTRING(b35.SystemControlNumber, 1, 8) LIKE '%on%'
)