Duplicate case number found in dataset

Issue

Crash Magic will generate an error when rendering a diagram that indicates there is a duplicate case number. This is a little misleading because the program is not capable of importing a duplicate record into the database. In reality, this error more than likely means that there is an error in a crash record that is causing our query to generate multiple rows for a single crash. A common example is when a crash has two drivers for the same vehicle.

Explanation
Crash Magic uses an SQL query to “flatten” your crash data into a single row per crash.  We use field names such as “Direction_V1” or “Movement_V2” to combine multiple rows with field names like “Direction” or “Movement” into this single row.
The most common means of doing this is query the vehicles table like this:
V1 ON (Crash.CaseId = V1.CaseId AND V1.UnitNumber = 1)
V2 ON (Crash.CaseId = V2.CaseId AND V2.UnitNumber = 2)
We use a similar strategy for drivers:
D1 ON (Crash.CaseId = D1.CaseId AND D1.UnitNumber = 1 AND D1.PersonType = ‘Driver’)
D2 ON (Crash.CaseId = D2.CaseId AND D2.UnitNumber = 2 AND D2.PersonType = ‘Driver’)
There are a lot of other examples, but these are the most common.
 
In the event that there is more than one Person record for a crash, that is in unit 1, and is recorded as a driver, a second row will be generated by the query.  This causes an error when diagramming because it would cause multiple crashes to be displayed with the same case number when there was actually only one.
 
Solution
There are a number of strategies for chasing down this problem.  In fact, starting with CM v4, the study query panel identifies and lists such duplicates for easy identification.  However, it is still a bit of a manual process to figure out why this is occurring. Here is a set of steps that can be useful in the process. It is possible do accomplish these steps within the program, but also using direct access to the source database.
  1. Locate the study query, usually called ‘Crashes’ on the Crash Magic administration form.
  2. Select the blue tab on the Crashes query panel. At the bottom of the panel there is a box for entering query criteria. In that box, enter an expression to select just the crash that generated the error.  (i.e. CaseNumber = ‘20181048629’) Then press the “Execute current query” button just to the right.
  3. If the query generates more than one row, it’s time to figure out which row in the data is causing the problem.  Most of the values in the two rows will match. However, some will not.  Typically the the rows that don’t match will be vehicle or person fields.  Identify the non-matching field names.
  4. Now that you’ve identified the fields, find out which table that data is coming from.  Do this by scrolling up or searching for one of those field names. (e.g. InjuredAge_U2_P1) Note the table “alias” at the start of the field name.  In this case, it is D2i.
  5. Scroll down to find this table alias in the ‘JOIN’ statement below.  The example JOIN list here is unusually large. Most configurations have 3 or 4 tables joined at most.  What we’re looking for is the name of the table.  In this case it is dbo.ZINJ.  You can also identify how that table is joined.
    LEFT OUTER JOIN dbo.ZINJ D2i ON ((D2i.CrashKey=C.CrashKey) AND (D2i.UnitNum=2) AND (D2i.Seating=1)))
    From experience, I will be expecting to find two ZINJ rows for this crash with UnitNum=2 and Seating=1.

  6. At this point, you must locate the source data for this record.  This may be a CSV or XML file from your state or your local PD.  You may also have access to this data in an in-house database.  Query the ZINJ table for all rows with the matching CrashKey.  In this case, indeed it was found that there were two Injury records for unit 2 and seating position 1. 
  7. Correct the offending record and re-export / re-import the crash.
Was this article helpful?
0 out Of 5 Stars
5 Stars 0%
4 Stars 0%
3 Stars 0%
2 Stars 0%
1 Stars 0%
5
How can we improve this article?
How Can We Improve This Article?