I have 2 tables resulted from merging the following tables:
Authors
-Aid bigint
-Surname nvarchar(500)
-Email nvarchar(500)
Articles
-ArId varchar(50)
-Year int
-……Some other fields……
ArticleAuthors
-ArId varchar(50)
-Aid bigint
Classifications
-ClassNumber int
-ClassDescription nvarchar(100)
ClassArticles
-ArId varchar(50)
-ClassNumber int
After denormalizing these tables the resulted tables were:
Articles
-FieldId int
-ArId varchar(50)
-ClassNumber int (Foreign key from the Classifications table)
-Year int
Authors
-FieldId int
-ArId varchar(50) (Foreign key from the Articles table)
-Aid bigint
-Surname nvarchar(500)
-Email nvarchar(500)
-Year int
Here are the conditions of the data within the resulted tables:
- SQL Server 2008 database
- The relationships between the two tables are applied physically
- The Authors table has 50 million records
- The Articles table has 20 million records
- The author has written many articles during the same year with different emails
- There are authors in the authors table with ArIds that don’t reference ArIds in the Articles table (Orphan records)
- The values within the Year fields ranges from 2002 and 2009
- The Articles table has a unique clustered index on the [FieldId and Year] fields and this index created on 9 partitions (1 partition per year)
- The Authors table has a non-unique clustered index on the [Year, ArId, Aid] fields and this index is created on the same 9 partition as the Articles table (1 partition per year)
The question is:
We need to create a stored procedure that gets the following result from the two tables [Aid,Surname,Email] under the following conditions:
Authors that have written articles during and after a specific year (AND)
- The total number of articles for the author is greater than a specific number (AND)
- The count of the articles written by the author under a specific ClassNumber is greater than a specific percentage of the total number of his articles (AND)
- Get only the most recent email of the author (in the last year during which he has written an article)
- If the author has more than one email in the same year, get them all.
We need the query to take the least possible time
If anyone can help, Thank you very much.