Update:
After you made clear that 60 is the dimension of your space, not the length of the vectors, the answer below does not apply to your case, so I'll keep it just for history.
Since your vectors are normalized, you can employ a kd-tree to find the neighbors within an MBH (minimum bounding hyperrectangle) of incremental hypervolume.
No database I'm aware of has native support for kd-trees, so if you are searching for a limited number of closest entries, you can try to implement the following solution in MySQL:
- Store the projections of the vectors onto each possible 2-dimensional subspace (this takes n * (n - 1) / 2 columns)
- Index each of these columns with a SPATIAL index (see the schema sketch after this list)
- Pick a square MBR of a given area within each projection. The product of these MBRs gives you a hypercube of limited hypervolume, which holds all vectors with a distance not greater than a given one.
- Find all projections within all MBRs using MBRContains
You'll still need to sort within this limited range of values.
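A minimal schema sketch for the 4-dimensional example below, assuming MySQL with a storage engine that supports SPATIAL indexes (MyISAM, or InnoDB from 5.7 on); the table and column names are only illustrative:

CREATE TABLE vectors (
    id  INT NOT NULL PRIMARY KEY,
    p12 POINT NOT NULL,   -- projection onto (x1, x2)
    p13 POINT NOT NULL,   -- projection onto (x1, x3)
    p14 POINT NOT NULL,
    p23 POINT NOT NULL,
    p24 POINT NOT NULL,
    p34 POINT NOT NULL,
    SPATIAL INDEX (p12),
    SPATIAL INDEX (p13),
    SPATIAL INDEX (p14),
    SPATIAL INDEX (p23),
    SPATIAL INDEX (p24),
    SPATIAL INDEX (p34)
) ENGINE=MyISAM;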
For instance, say you have a set of 4-dimensional vectors of magnitude 2:
(2, 0, 0, 0)
(1, 1, 1, 1)
(0, 2, 0, 0)
(-2, 0, 0, 0)
You'll have to store them as follows:
p12 p13 p14 p23 p24 p34
--- --- --- --- --- ---
2,0 2,0 2,0 0,0 0,0 0,0
1,1 1,1 1,1 1,1 1,1 1,1
0,2 0,0 0,0 2,0 2,0 0,0
-2,0 -2,0 -2,0 0,0 0,0 0,0
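Loading the four example vectors into the sketch above could look like this; Point(x, y) is MySQL's point constructor, and each column holds the pair of coordinates named in its header:

-- Each projection stores two coordinates of the original vector,
-- e.g. p13 of (2, 0, 0, 0) is the point (x1, x3) = (2, 0).
INSERT INTO vectors (id, p12, p13, p14, p23, p24, p34) VALUES
    (1, Point( 2, 0), Point( 2, 0), Point( 2, 0), Point(0, 0), Point(0, 0), Point(0, 0)),  -- ( 2, 0, 0, 0)
    (2, Point( 1, 1), Point( 1, 1), Point( 1, 1), Point(1, 1), Point(1, 1), Point(1, 1)),  -- ( 1, 1, 1, 1)
    (3, Point( 0, 2), Point( 0, 0), Point( 0, 0), Point(2, 0), Point(2, 0), Point(0, 0)),  -- ( 0, 2, 0, 0)
    (4, Point(-2, 0), Point(-2, 0), Point(-2, 0), Point(0, 0), Point(0, 0), Point(0, 0));  -- (-2, 0, 0, 0)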
Say you want similarity with the first vector (2, 0, 0, 0) to be greater than 0. Since the dot product with (2, 0, 0, 0) is just 2·x1, and every coordinate of a magnitude-2 vector lies within [-2, 2], this means all matching vectors lie inside the hypercube (0, -2, -2, -2):(4, 2, 2, 2).
You issue the following query:
SELECT *
FROM vectors
WHERE MBRContains(LineFromText('LineString(0 -2, 4 2)'), p12)
  AND MBRContains(LineFromText('LineString(0 -2, 4 2)'), p13)
…
and so on for all six columns.
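Spelled out against the sketch above, the complete query could look like the following; each column's box is the projection of the hypercube onto its pair of coordinates, and the final ORDER BY performs the remaining sort by the true dot product with (2, 0, 0, 0), recovered from the stored projections (ST_X/ST_Y are the MySQL 5.7+ names; older versions use X()/Y()):

-- Prune candidates with the six 2-D bounding boxes, then rank the survivors
-- by the dot product with the query vector (2, 0, 0, 0).
-- The original coordinates are recoverable from the projections:
-- x1 = ST_X(p12), x2 = ST_Y(p12), x3 = ST_Y(p13), x4 = ST_Y(p14).
SELECT  *,
        2 * ST_X(p12) AS dot_product   -- the query vector has only x1 = 2 non-zero
FROM    vectors
WHERE   MBRContains(LineFromText('LineString(0 -2, 4 2)'), p12)   -- x1 in [0, 4], x2 in [-2, 2]
    AND MBRContains(LineFromText('LineString(0 -2, 4 2)'), p13)
    AND MBRContains(LineFromText('LineString(0 -2, 4 2)'), p14)
    AND MBRContains(LineFromText('LineString(-2 -2, 2 2)'), p23)  -- x2, x3 in [-2, 2]
    AND MBRContains(LineFromText('LineString(-2 -2, 2 2)'), p24)
    AND MBRContains(LineFromText('LineString(-2 -2, 2 2)'), p34)
ORDER BY dot_product DESC;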