tags:

views:

5518

answers:

8

I need to get a list of all documents in a site collection, which I believe I can do with either the alldocs table or the alluserdata table (MOSS 2007 SP1) but do not see how I can get the author information for the document. I do not need the contents of the document (e.g. AllDocStreams content)

Something like this:

SELECT     tp_DirName, tp_LeafName, tp_Version, tp_Modified, tp_Created
FROM         AllUserData
WHERE     (tp_ContentType = 'Document') 
AND (tp_LeafName NOT LIKE '%.css') 
AND (tp_LeafName NOT LIKE '%.jpg') 
AND (tp_LeafName NOT LIKE '%.png') 
AND (tp_LeafName NOT LIKE '%.wmf') 
AND (tp_LeafName NOT LIKE '%.gif') 
AND (tp_DirName NOT LIKE '%Template%') 
AND (tp_IsCurrentVersion = 1) 
AND (tp_LeafName NOT LIKE '%.xsl')
ORDER BY tp_SiteId, tp_ListId, tp_DirName, tp_LeafName, tp_IsCurrentVersion DESC

Is there a better way to go about this?

A: 

MOSS provides many webservices out of the box which make life a little easier. They are always worth exploring.

For this particular instance, I think the article, Getting a list of files from a MOSS document library using a SharePoint web service, will be of assistance. If this isn't your exact scenario, it will get you on the right track.

If the Document service doesn't help you, the Search service will I'm sure. Check the documentation for usage.

cciotti
I need to do this at the server / SQL level since IIS is set to time out at 2 minutes (120 seconds) and we have no control over that infrastructure piece as well as in order to actually deploy a code product (e.g. app or webpart) requires a release process that is at least 7 days.
PapaDaniel
This is really about doing legal discovery and just getting a list of all documents in the whole site collection and who created them, when and what are they named.
PapaDaniel
A: 

You can get some of the information from the UserInfo table by joining AllUserData.tp_Author to UserInfo.tp_ID, but messing around in these tables is not recommended and can be very fragile, and also your queries are not guaranteed to work after applying any patches or service packs to SharePoint. I would use either webservices or the SharePoint object model to access the data.

Cruiser
+2  A: 

Why not use the sharepoint object model rather then using the raw database approach? I know that the object model approach does have a performance penalty compared to the database, but MS could change the db schema with the next path. On the other hand the likelyhood of MS breaking their own object model is far less, and as far as I know the recommended way is to use either the object model or the web services.

Kasper
+2  A: 

Don't ever query the SharePoint database directly. This is completely unsupported and can get you into trouble moving forward (for instance, if a service-pack or hotfix modifies schema, then you app is broken).

+2  A: 

People that claim that you cannot query SharePoint databases because it is not supported are wrong. From reading the documentation, it is fine to query the database as long as you use the With (NoLock) clause. It is clearly not supported to update, delete, or insert records.

The below query is supported.

Select * From your_content_database.dbo.AllDocs With (NoLock)

I will post a query that provides the desired result in a few minutes

A: 

The below would return the top 100 largest documents that were added in the last 24 hours to the content database.
Select
   Top 100 W.FullUrl, W.Title, L.tp_Title as ListTitle, A.tp_DirName, A.tp_LeafName, A.tp_id , DS.Content , DS.Size, D.DocLibRowID, D.TimeCreated, D.Size, D.MetaInfoTimeLastModified, D.ExtensionForFile
From
   your_content_database.dbo.AllLists L With (NoLock) join
   your_content_database.dbo.AllUserData A With (NoLock)
      On L.tp_ID=tp_ListId join
   your_content_database.dbo.AllDocs D With (NoLock)
      On A.tp_ListID=D.ListID
      And A.tp_SiteID=D.SiteID
      And A.tp_DirName=D.DirName
      And A.tp_LeafName=D.LeafName join
   your_content_database.dbo.AllDocStreams DS With (NoLock)
      On DS.SiteID=A.tp_SiteID
      And DS.ParentID=D.ParentID
      And DS.ID=D.ID join
    your_content_database.dbo.Webs W With (NoLock) 
      On W.ID=D.WebID
      And W.ID=L.Tp_WebID
      And W.SiteID=A.tp_SiteID
Where
   DS.DeleteTransactionID=0x
   And D.DeleteTransactionID=0x
   And D.IsCurrentVersion=1
   And A.tp_DeleteTransactionID=0x
   And A.tp_IsCurrentVersion=1
   And D.HasStream=1
   And L.tp_DeleteTransactionId=0x
   And ExtensionForFile not in ('webpart','dwp','aspx','xsn','master','rules','xoml')
   And D.MetaInfoTimeLastModified>DateAdd(d,-1,GetDate())
Order by DS.Size desc

A: 

Five reasons not to directly query SharePoint databases http://www.sharepoint4arabs.com/AymanElHattab/Lists/Posts/Post.aspx?ID=99

Ayman El-Hattab
A: 
  • Why don't you use a Content Query web part?
  • Why don't you use a search object to query the same? This would be my preferred solution. Search has most properties already and you can add more if you need them. Search is probably a lot quicker than querying content database(s).

Whether it is supported or not, it is still bad form to query the Content Database directly and any developer who would suggest this as a solution should get a lecture ;). For instance, what happens if an admin creates a second content database to your webapp? If you query goes across site collections it will not return the desired results until you provide for this in code.

ArjanP