I'm assuming that the weight is an integer. Here's an approach which joins to a dummy table to increase the row-count per the weight; first, let's prove it just in TSQL:
SET NOCOUNT ON
--DROP TABLE [index]
--DROP TABLE seo
CREATE TABLE [index] ([key] int not null) -- names for fun ;-p
CREATE TABLE seo (url varchar(10) not null, [weight] int not null)
INSERT [index] values(1) INSERT [index] values(2)
INSERT [index] values(3) INSERT [index] values(4)
INSERT [index] values(5) INSERT [index] values(6)
INSERT [index] values(7) INSERT [index] values(8)
INSERT [index] values(9) INSERT [index] values(10)
INSERT [seo] VALUES ('abc',1) INSERT [seo] VALUES ('def',2)
INSERT [seo] VALUES ('ghi',1) INSERT [seo] VALUES ('jkl',3)
INSERT [seo] VALUES ('mno',1) INSERT [seo] VALUES ('mno',1) -- 'mno' twice; combined weight 2
INSERT [seo] VALUES ('pqr',2)
DECLARE @count int, @url varchar(10)
SET @count = 0
DECLARE @check_rand TABLE (url varchar(10) not null)
-- test it lots of times to check distribution roughly matches weights
WHILE @count < 11000
BEGIN
    SET @count = @count + 1
    SELECT TOP 1 @url = [seo].[url]
    FROM [seo]
    INNER JOIN [index] ON [index].[key] <= [seo].[weight]
    ORDER BY NEWID()
    -- this to check distribution
    INSERT @check_rand VALUES (@url)
END
SELECT ISNULL(url, '(total)') AS [url], COUNT(1) AS [hits]
FROM @check_rand
GROUP BY url WITH ROLLUP
ORDER BY url
This outputs something like:
url hits
---------- -----------
(total) 11000
abc 1030
def 1970
ghi 1027
jkl 2972
mno 2014
pqr 1987
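As a sanity check, those hits line up with the weights: the combined weight is 11 ('mno' counted twice), so each unit of weight should account for roughly 11000 / 11 = 1000 hits. A quick Python sketch (illustrative only, not part of the solution) mimics what the [index] join does - duplicating each row once per unit of weight and picking uniformly from the expanded list:

```python
import random
from collections import Counter

# The seo rows from the sample data; 'mno' was inserted twice with
# weight 1, so it carries a combined weight of 2.
rows = [("abc", 1), ("def", 2), ("ghi", 1), ("jkl", 3),
        ("mno", 1), ("mno", 1), ("pqr", 2)]

# The join to [index] duplicates each row once per unit of weight, so
# "ORDER BY NEWID() ... TOP 1" is a uniform pick over this expanded list.
expanded = [url for url, weight in rows for _ in range(weight)]
total_weight = len(expanded)  # 11

draws = 11000
expected = {url: draws * w // total_weight
            for url, w in Counter(expanded).items()}
print(expected)
# {'abc': 1000, 'def': 2000, 'ghi': 1000, 'jkl': 3000, 'mno': 2000, 'pqr': 2000}

# Simulate the uniform pick; counts should land near the expected values.
counts = Counter(random.choice(expanded) for _ in range(draws))
print(dict(counts))
```

The simulated counts wander around the expected values by sampling noise, just as the TSQL run above does.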
Showing that we have the correct overall distribution. Now let's bring that into LINQ-to-SQL; I've added the two tables to a data-context (you will need to create something like the [index] table to do this) - my DBML:
<Table Name="dbo.[index]" Member="indexes">
<Type Name="index">
<Column Name="[key]" Member="key" Type="System.Int32" DbType="Int NOT NULL" CanBeNull="false" />
</Type>
</Table>
<Table Name="dbo.seo" Member="seos">
<Type Name="seo">
<Column Name="url" Type="System.String" DbType="VarChar(10) NOT NULL" CanBeNull="false" />
<Column Name="weight" Type="System.Int32" DbType="Int NOT NULL" CanBeNull="false" />
</Type>
</Table>
Now we'll consume this; in the partial class for the data-context, add a compiled query (for performance) in addition to a Random method that maps to SQL's NEWID():
partial class MyDataContextDataContext
{
    // maps to SQL's NEWID(); never executed client-side, only translated
    [Function(Name = "NEWID", IsComposable = true)]
    public Guid Random()
    {
        throw new NotImplementedException();
    }

    public string GetRandomUrl()
    {
        return randomUrl(this);
    }

    static readonly Func<MyDataContextDataContext, string>
        randomUrl = CompiledQuery.Compile(
            (MyDataContextDataContext ctx) =>
                (from s in ctx.seos
                 from i in ctx.indexes
                 where i.key <= s.weight
                 orderby ctx.Random()
                 select s.url).First());
}
This LINQ-to-SQL query is very similar to the key part of the TSQL we wrote; let's test it:
using (var ctx = CreateContext())
{
    // show sample query
    ctx.Log = Console.Out;
    Console.WriteLine(ctx.GetRandomUrl());
    ctx.Log = null;

    // check distribution
    var counts = new Dictionary<string, int>();
    for (int i = 0; i < 11000; i++) // obviously a bit slower than inside the db
    {
        if (i % 100 == 1) Console.WriteLine(i); // show progress
        string s = ctx.GetRandomUrl();
        int count;
        if (counts.TryGetValue(s, out count)) {
            counts[s] = count + 1;
        } else {
            counts[s] = 1;
        }
    }
    Console.WriteLine("(total)\t{0}", counts.Sum(p => p.Value));
    foreach (var pair in counts.OrderBy(p => p.Key)) {
        Console.WriteLine("{0}\t{1}", pair.Key, pair.Value);
    }
}
This runs the query once to show the generated TSQL is suitable, then (like before) 11k times to check the distribution; output (not including the progress updates):
SELECT TOP (1) [t0].[url]
FROM [dbo].[seo] AS [t0], [dbo].[index] AS [t1]
WHERE [t1].[key] <= [t0].[weight]
ORDER BY NEWID()
-- Context: SqlProvider(Sql2008) Model: AttributedMetaModel Build: 3.5.30729.4926
which doesn't look too bad at all - it has both tables, the range condition and the TOP 1, so it is doing something very similar; data:
(total) 11000
abc 939
def 1893
ghi 1003
jkl 3104
mno 2048
pqr 2013
So again, we've got the right distribution, all from LINQ-to-SQL. Sorted?