Recently, I was asked to build a shopping order system for my sister.

Due to the complexity and ambitiousness of the requirement, I chose google spreadsheet initially to build a simple system before starting with a web based system.

Then I got to a case that I need to make a cell validation which ensures there is no ID duplicated in a certain column.


Failed solution


This stack exchange question appears first when I googled.
Based on the accepted answer, I got the first solution by using countif function. It works by following expressions.

  | A    | B |        C
---------------------------------
1 | bar  | 2 | =COUNTIF(A1,A1:A3)
2 | baar | 1 | =COUNTIF(A2,A1:A3)
3 | bar  | 2 | =COUNTIF(A3,A1:A3)
---------------------------------
Column C displays the expressions entered in column B

The solution works perfectly until my sister told me that the validation continuously fails even when there is not any duplicated ID entered.

After debugging, I found out that the countif function does not work as expected in some cases. Check this example

  |          A          | B |        C
------------------------------------------------
1 | 562572814105416318  | 2 | =COUNTIF(A1,A1:A2)
2 | 562572814105416000  | 1 | =COUNTIF(A2,A1:A2)
------------------------------------------------

Wow. Surprised!!. The countif function returns the results which are totally different from what I expected.

I tried checking countif API reference for more information. Because google spreadsheet is not opened source, there is no more information than from the official document.
I guess that the reason of this weird behavior is the fuzzy matching functionality of the countif function, as it is also able to check for pattern with asterisk, number order (>=, =, <=, ...).


Lesson learnt

I did more debugging by removing digits one by one. Then I figured out that when the being compared numbers have more than 15 digits, the countif function rounds all them off.

  |          A             | B |        C
---------------------------------------------------
1 | 1.000.000.000.000.000  | 2 | =COUNTIF(A1,A1:A2)
2 | 1.000.000.000.000.001  | 1 | =COUNTIF(A2,A1:A2)
---------------------------------------------------

To make sure that all other functions do not compare value in the same manner, I also checked other places where I use vlookup.
Fortunately, vlookup does not use fuzzy matching. Thus, values are compared exactly.

  |          A             |  B  |  C  |            D
-------------------------------------------------------------------
1 | 1.000.000.000.000.000  | foo | foo | =VLOOKUP(A1,A1:B2,2,FALSE)
2 | 1.000.000.000.000.001  | bar | bar | =VLOOKUP(A2,A1:B2,2,FALSE)
-------------------------------------------------------------------

Of course countif is very useful in many cases when one wants obtain statistical values from data.
The lesson learnt here is that never use countif when the criterion parameter (the second parameter) is dynamic. You should use this function only when you know what is the value of the criterion exactly (or statically).


Solution

Eventually, I chose MATCH function because of its true equality compare function.

This google spread sheet includes all the test formulas. You can open to check by yourself or make a copy and play with more formulas.