views:

71

answers:

1

This is for http://cssfingerprint.com

I collect timing stats about how fast the different methods I use perform on different browsers, etc., so that I can optimize the scraping speed. Separately, I have a report about what each method returns for a handful of URLs with known-correct values, so that I can tell which methods are bogus on which browsers. (Each is different, alas.)

The related tables look like this:

CREATE TABLE `browser_tests` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `bogus` tinyint(1) DEFAULT NULL,
  `result` tinyint(1) DEFAULT NULL,
  `method` varchar(255) DEFAULT NULL,
  `url` varchar(255) DEFAULT NULL,
  `os` varchar(255) DEFAULT NULL,
  `browser` varchar(255) DEFAULT NULL,
  `version` varchar(255) DEFAULT NULL,
  `created_at` datetime DEFAULT NULL,
  `updated_at` datetime DEFAULT NULL,
  `user_agent` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=33784 DEFAULT CHARSET=latin1

CREATE TABLE `method_timings` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `method` varchar(255) DEFAULT NULL,
  `batch_size` int(11) DEFAULT NULL,
  `timing` int(11) DEFAULT NULL,
  `os` varchar(255) DEFAULT NULL,
  `browser` varchar(255) DEFAULT NULL,
  `version` varchar(255) DEFAULT NULL,
  `user_agent` varchar(255) DEFAULT NULL,
  `created_at` datetime DEFAULT NULL,
  `updated_at` datetime DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=28849 DEFAULT CHARSET=latin1

(user_agent is broken down pre-insert into browser, version, and os from a small list of recognized values using regex; I keep the original user-agent string just in case.)

I have a query like this that tells me the average timing for every non-bogus browser / version / method tuple:

select c, avg(bogus) as bog, timing, method, browser, version 
  from browser_tests as b inner join (
    select count(*) as c, round(avg(timing)) as timing, method, 
     browser, version from method_timings 
    group by browser, version, method 
    having c > 10 order by browser, version, timing
  ) as t using (browser, version, method)
  group by browser, version, method 
  having bog < 1 
  order by browser, version, timing;

Which returns something like:

c  bog  tim  method       browser  version
88 0.8333 184 reuse_insert Chrome 4.0.249.89
18 0.0000 238 mass_insert_width Chrome 4.0.249.89
70 0.0400 246 mass_insert Chrome 4.0.249.89
70 0.0400 327 mass_noinsert Chrome 4.0.249.89
88 0.0556 367 reuse_reinsert Chrome 4.0.249.89
88 0.0556 383 jquery Chrome 4.0.249.89
88 0.0556 863 full_reinsert Chrome 4.0.249.89
187 0.0000 105 jquery Chrome 5.0.307.11
187 0.8806 109 reuse_insert Chrome 5.0.307.11
123 0.0000 110 mass_insert_width Chrome 5.0.307.11
176 0.0000 231 mass_noinsert Chrome 5.0.307.11
176 0.0000 237 mass_insert Chrome 5.0.307.11
187 0.0000 314 reuse_reinsert Chrome 5.0.307.11
187 0.0000 372 full_reinsert Chrome 5.0.307.11
12 0.7500 82 reuse_insert Chrome 5.0.335.0
12 0.2500 102 jquery Chrome 5.0.335.0
[...]

I want to modify this query to return only the browser/version/method with the lowest timing - i.e. something like:

88 0.8333 184 reuse_insert Chrome 4.0.249.89
187 0.0000 105 jquery Chrome 5.0.307.11
12 0.7500 82 reuse_insert Chrome 5.0.335.0
[...]

How can I do this, while still returning the method that goes with that lowest timing?

I could filter it app-side, but I'd rather do this in mysql since it'd work better with my caching.

A: 

Get all the min time into a temp table

SELECT 
  ID
  ,MIN(Time) AS MinTime 
INTO
  #tempMinTimes
FROM
  TABLE_NAME or (Sub Query)
GROUP BY
  ID

Then Inner Join onto this temp table on the ID field.

EDIT I only saw now there are duplicate ID's, you need to get rid of duplicate ID's or group on the Method Name as well and then in your join use the ID and Method Name

Eugene Niemand
I think you didn't understand the query I'm using. There are no "duplicate IDs", and your query would not work.
Sai Emrys
sorry I didnt read the question properly, I thought C was a ID I only saw now its the count, but if you generated an unique id my query would work
Eugene Niemand